* [PATCH v2 00/12] use compiler atomic builtins for app modules
From: Joyce Kong @ 2021-11-16 9:41 UTC
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong

Since DPDK has now adopted the C11 memory model [1], convert the
rte_atomicNN_xxx APIs used in the app modules to the compiler's atomic
built-ins [2].

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
[2] https://doc.dpdk.org/guides/rel_notes/deprecation.html

v2:
Changes requested by Honnappa Nagarahalli:
1. Replace __ATOMIC_RELAXED with suitable memory orderings where shared
   data is synchronized in the pmd_perf and timer test cases.
2. Avoid unnecessary atomic operations in the compress and testpmd modules.
3. Fix some typos.

Joyce Kong (12):
  test/pmd_perf: use compiler atomic builtins for polling sync
  test/ring_perf: use compiler atomic builtins for lcores sync
  test/timer: use compiler atomic builtins for sync
  test/stack_perf: use compiler atomics for lcore sync
  test/bpf: use compiler atomics for calculation
  test/func_reentrancy: use compiler atomics for data sync
  app/eventdev: use compiler atomics for shared data sync
  app/crypto: use compiler atomic builtins for display sync
  app/compress: use compiler atomic builtins for display sync
  app/testpmd: remove atomic operations for port status
  app/bbdev: use compiler atomics for shared data sync
  app: remove unnecessary include of atomic header file

 app/proc-info/main.c | 1 -
 app/test-bbdev/test_bbdev_perf.c | 135 ++++++++----------
 .../comp_perf_test_common.h | 2 +-
 .../comp_perf_test_cyclecount.c | 15 +-
 .../comp_perf_test_throughput.c | 10 +-
 .../comp_perf_test_verify.c | 6 +-
 app/test-crypto-perf/cperf_test_latency.c | 6 +-
 .../cperf_test_pmd_cyclecount.c | 9 +-
 app/test-crypto-perf/cperf_test_throughput.c | 9 +-
 app/test-crypto-perf/cperf_test_verify.c | 9 +-
 app/test-eventdev/evt_main.c | 1 -
 app/test-eventdev/test_order_atq.c | 4 +-
 app/test-eventdev/test_order_common.c | 4 +-
 app/test-eventdev/test_order_common.h | 8 +-
 app/test-eventdev/test_order_queue.c | 4 +-
 app/test-pipeline/config.c | 1 -
 app/test-pipeline/init.c | 1 -
 app/test-pipeline/main.c | 1 -
 app/test-pipeline/runtime.c | 1 -
 app/test-pmd/cmdline.c | 1 -
 app/test-pmd/config.c | 1 -
 app/test-pmd/csumonly.c | 1 -
 app/test-pmd/flowgen.c | 1 -
 app/test-pmd/icmpecho.c | 1 -
 app/test-pmd/iofwd.c | 1 -
 app/test-pmd/macfwd.c | 1 -
 app/test-pmd/macswap.c | 1 -
 app/test-pmd/parameters.c | 1 -
 app/test-pmd/rxonly.c | 1 -
 app/test-pmd/testpmd.c | 58 ++++----
 app/test-pmd/txonly.c | 1 -
 app/test/test_barrier.c | 1 -
 app/test/test_bpf.c | 28 ++--
 app/test/test_func_reentrancy.c | 27 ++--
 app/test/test_mbuf.c | 1 -
 app/test/test_mp_secondary.c | 1 -
 app/test/test_pmd_perf.c | 14 +-
 app/test/test_ring.c | 1 -
 app/test/test_ring_perf.c | 9 +-
 app/test/test_stack_perf.c | 14 +-
 app/test/test_timer.c | 30 ++--
 app/test/test_timer_secondary.c | 1 -
 42 files changed, 197 insertions(+), 226 deletions(-)

--
2.25.1
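The conversion the series applies is essentially a one-to-one mapping from rte_atomic32_xxx calls to GCC `__atomic` builtins on a plain integer. The sketch below is a minimal illustration, not code from the series: it assumes a plain `uint32_t` counter, and the names `counter`, `counter_*` and `counter_demo` are hypothetical. It uses `__ATOMIC_RELAXED` throughout, which is only adequate when the variable does not publish other data. Flags that hand over shared state need release/acquire ordering instead, as the v2 notes above point out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a variable that used to be rte_atomic32_t. */
static uint32_t counter;

static void counter_set(uint32_t v)  /* was rte_atomic32_set() */
{
	__atomic_store_n(&counter, v, __ATOMIC_RELAXED);
}

static uint32_t counter_read(void)   /* was rte_atomic32_read() */
{
	return __atomic_load_n(&counter, __ATOMIC_RELAXED);
}

static void counter_add(uint32_t v)  /* was rte_atomic32_add() */
{
	__atomic_fetch_add(&counter, v, __ATOMIC_RELAXED);
}

static void counter_inc(void)        /* was rte_atomic32_inc() */
{
	__atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
}

static void counter_sub(uint32_t v)  /* was rte_atomic32_sub() */
{
	__atomic_fetch_sub(&counter, v, __ATOMIC_RELAXED);
}

/* Exercises each helper once and returns the final value (0 + 1 + 5 - 2). */
static uint32_t counter_demo(void)
{
	counter_set(0);
	counter_inc();
	counter_add(5);
	counter_sub(2);
	return counter_read();
}
```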
* [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
From: Joyce Kong @ 2021-11-16 9:41 UTC
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for polling sync in pmd_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_pmd_perf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 1df86ce080..546384a50d 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -10,7 +10,6 @@
 #include <rte_cycles.h>
 #include <rte_ethdev.h>
 #include <rte_byteorder.h>
-#include <rte_atomic.h>
 #include <rte_malloc.h>
 #include "packet_burst_generator.h"
 #include "test.h"
@@ -525,7 +524,7 @@ main_loop(__rte_unused void *args)
 	return 0;
 }
 
-static rte_atomic64_t start;
+static uint64_t start;
 
 static inline int
 poll_burst(void *args)
@@ -563,8 +562,7 @@ poll_burst(void *args)
 		num[portid] = pkt_per_port;
 	}
 
-	while (!rte_atomic64_read(&start))
-		;
+	rte_wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE);
 
 	cur_tsc = rte_rdtsc();
 	while (total) {
@@ -616,15 +614,15 @@ exec_burst(uint32_t flags, int lcore)
 	pkt_per_port = MAX_TRAFFIC_BURST;
 	num = pkt_per_port * conf->nb_ports;
 
-	rte_atomic64_init(&start);
-
 	/* start polling thread, but not actually poll yet */
 	rte_eal_remote_launch(poll_burst,
 		(void *)&pkt_per_port, lcore);
 
 	/* Only when polling first */
 	if (flags == SC_BURST_POLL_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
+	else
+		__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
 
 	/* start xmit */
 	i = 0;
@@ -641,7 +639,7 @@ exec_burst(uint32_t flags, int lcore)
 
 	/* only when polling second */
 	if (flags == SC_BURST_XMIT_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
 
 	/* wait for polling finished */
 	diff_tsc = rte_eal_wait_lcore(lcore);
--
2.25.1
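For the SC_BURST_XMIT_FIRST path, the patch pairs an `__ATOMIC_RELEASE` store of `start` in exec_burst() with an `__ATOMIC_ACQUIRE` wait in poll_burst(). The following is a minimal sketch of that pairing, not the test's code. It assumes POSIX threads in place of EAL lcores and a plain spin loop in place of rte_wait_until_equal_64(), whose DPDK implementation may pause or use a platform-specific wait. `payload`, `poller` and `xmit_first_demo` are hypothetical names for whatever state the main thread prepares before releasing the poller.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t start;    /* start flag, as in test_pmd_perf.c */
static uint64_t payload;  /* hypothetical data prepared before the flag is set */

/* Poller side: a spin-wait standing in for
 * rte_wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE). */
static void *poller(void *arg)
{
	(void)arg;
	while (__atomic_load_n(&start, __ATOMIC_ACQUIRE) != 1)
		;
	/* The acquire load read the value written by the release store in
	 * xmit_first_demo(), so the plain write to payload is visible here. */
	return (void *)(uintptr_t)payload;
}

/* Main side of an XMIT_FIRST-style run: launch, do work, then release. */
static uint64_t xmit_first_demo(void)
{
	pthread_t t;
	void *seen;

	__atomic_store_n(&start, 0, __ATOMIC_RELAXED);  /* before launch */
	if (pthread_create(&t, NULL, poller, NULL) != 0)
		return 0;
	payload = 42;                                   /* plain write */
	__atomic_store_n(&start, 1, __ATOMIC_RELEASE);  /* publish payload */
	pthread_join(t, &seen);
	return (uint64_t)(uintptr_t)seen;
}
```

Because the payload write is sequenced before the release store, and the poller's acquire load reads that store, the poller is guaranteed to observe 42. With `__ATOMIC_RELAXED` on both sides, the C11 model would give no such guarantee.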
* RE: [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync
From: Honnappa Nagarahalli @ 2021-11-16 21:30 UTC
To: Joyce Kong; +Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>

>
> Convert rte_atomic usages to compiler atomic built-ins for polling sync in
> pmd_perf test cases.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
<snip>
> @@ -616,15 +614,15 @@ exec_burst(uint32_t flags, int lcore)
>  	pkt_per_port = MAX_TRAFFIC_BURST;
>  	num = pkt_per_port * conf->nb_ports;
>
> -	rte_atomic64_init(&start);
> -
>  	/* start polling thread, but not actually poll yet */
>  	rte_eal_remote_launch(poll_burst,
>  		(void *)&pkt_per_port, lcore);
>
>  	/* Only when polling first */
>  	if (flags == SC_BURST_POLL_FIRST)
> -		rte_atomic64_set(&start, 1);
> +		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
> +	else
> +		__atomic_store_n(&start, 0, __ATOMIC_RELAXED);

These lines need to be moved up, before the call to rte_eal_remote_launch,
so that the update to start is visible to the worker threads.

>
>  	/* start xmit */
>  	i = 0;
> @@ -641,7 +639,7 @@ exec_burst(uint32_t flags, int lcore)
>
>  	/* only when polling second */
>  	if (flags == SC_BURST_XMIT_FIRST)
> -		rte_atomic64_set(&start, 1);
> +		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
>
>  	/* wait for polling finished */
>  	diff_tsc = rte_eal_wait_lcore(lcore);
> --
> 2.25.1
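One way to read the review comment: if `start` is only written after the worker has been launched, the worker races with that write. It may therefore observe a stale value left behind by a previous exec_burst() run, for example a 1 from an earlier run, and start polling too early. Writing the flag before launching avoids this, assuming, as with pthread_create(), that the launch orders earlier writes before the worker starts. The following is a minimal sketch with POSIX threads; `observer` and `init_before_launch` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static uint64_t start;  /* hypothetical stand-in for the test's start flag */

/* Worker: reports the value of start it sees when it begins running. */
static void *observer(void *arg)
{
	(void)arg;
	return (void *)(uintptr_t)__atomic_load_n(&start, __ATOMIC_RELAXED);
}

/* Store the flag *before* launching. pthread_create() synchronizes with
 * the start of the new thread, so even a relaxed store is visible to it. */
static uint64_t init_before_launch(uint64_t value)
{
	pthread_t t;
	void *seen;

	__atomic_store_n(&start, value, __ATOMIC_RELAXED);
	if (pthread_create(&t, NULL, observer, NULL) != 0)
		return UINT64_MAX;
	pthread_join(t, &seen);
	return (uint64_t)(uintptr_t)seen;
}
```

Calling it first with 1 and then with 0 mimics a POLL_FIRST run followed by an XMIT_FIRST run: the second worker sees 0, not the 1 left behind by the first run.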
* [PATCH v2 02/12] test/ring_perf: use compiler atomic builtins for lcores sync
From: Joyce Kong @ 2021-11-16 9:41 UTC
To: Honnappa Nagarahalli, Konstantin Ananyev
Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for lcores sync in ring_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_ring_perf.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index fd82e20412..2d8bb675a3 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -320,7 +320,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, const int esize)
 	return 0;
 }
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 static uint64_t queue_count[RTE_MAX_LCORE];
 
 #define TIME_MS 100
@@ -342,8 +342,7 @@ load_loop_fn_helper(struct thread_params *p, const int esize)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			rte_pause();
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (time_diff < hz * TIME_MS / 1000) {
@@ -398,12 +397,12 @@ run_on_all_cores(struct rte_ring *r, const int esize)
 		param.r = r;
 
 		/* clear synchro and start workers */
-		rte_atomic32_set(&synchro, 0);
+		__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MAIN) < 0)
 			return -1;
 
 		/* start synchro and launch test on main */
-		rte_atomic32_set(&synchro, 1);
+		__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 		lcore_f(&param);
 
 		rte_eal_mp_wait_lcore();
--
2.25.1
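In ring_perf the `synchro` flag carries no data. The workers receive their parameters through rte_eal_mp_remote_launch() and only need to notice that the gate has opened, which is why both sides can use `__ATOMIC_RELAXED`. The following is a minimal sketch of such a start gate. It assumes POSIX threads in place of lcores and a hypothetical `NB_WORKERS`; the `started` counter exists only so the result can be checked.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NB_WORKERS 4  /* hypothetical number of worker lcores */

static uint32_t synchro;
static uint32_t started;  /* hypothetical, only used to check the result */

/* Worker: wait for the gate, then do the (omitted) timed work. */
static void *gated_worker(void *arg)
{
	(void)arg;
	/* stands in for rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED) */
	while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
		;
	__atomic_fetch_add(&started, 1, __ATOMIC_RELAXED);
	return NULL;
}

/* Main: clear the gate, launch the workers, open the gate, wait for them. */
static uint32_t start_gate_demo(void)
{
	pthread_t t[NB_WORKERS];
	int i, n;

	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
	__atomic_store_n(&started, 0, __ATOMIC_RELAXED);
	for (i = 0; i < NB_WORKERS; i++)
		if (pthread_create(&t[i], NULL, gated_worker, NULL) != 0)
			break;
	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
	for (n = 0; n < i; n++)
		pthread_join(t[n], NULL);
	/* the joins order every increment before this read */
	return __atomic_load_n(&started, __ATOMIC_RELAXED);
}
```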
* [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
From: Joyce Kong @ 2021-11-16 9:41 UTC
To: Robert Sanford, Erik Gabriel Carrillo
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic
built-ins for lcore_state and collisions sync.

Also, move 'main_init_workers' outside of
'timer_stress2_main_loop' to guarantee that lcore_state
is initialized correctly before the threads are launched.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_timer.c | 30 +++++++++++++-----------------
 app/test/test_timer_secondary.c | 1 -
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/app/test/test_timer.c b/app/test/test_timer.c
index a10b2fe9da..c97e5c891c 100644
--- a/app/test/test_timer.c
+++ b/app/test/test_timer.c
@@ -102,7 +102,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_random.h>
 #include <rte_malloc.h>
@@ -203,7 +202,7 @@ timer_stress_main_loop(__rte_unused void *arg)
 
 /* Need to synchronize worker lcores through multiple steps. */
 enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };
-static rte_atomic16_t lcore_state[RTE_MAX_LCORE];
+static uint16_t lcore_state[RTE_MAX_LCORE];
 
 static void
 main_init_workers(void)
@@ -211,7 +210,7 @@ main_init_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_WAITING);
+		__atomic_store_n(&lcore_state[i], WORKER_WAITING, __ATOMIC_RELAXED);
 	}
 }
 
@@ -221,11 +220,10 @@ main_start_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_RUN_SIGNAL);
+		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
 	}
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_RUNNING)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_RUNNING, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -235,8 +233,7 @@ main_wait_for_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_FINISHED)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_FINISHED, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -245,9 +242,8 @@ worker_wait_to_start(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	while (rte_atomic16_read(&lcore_state[lcore_id]) != WORKER_RUN_SIGNAL)
-		rte_pause();
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_RUNNING);
+	rte_wait_until_equal_16(&lcore_state[lcore_id], WORKER_RUN_SIGNAL, __ATOMIC_ACQUIRE);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_RUNNING, __ATOMIC_RELEASE);
 }
 
 static void
@@ -255,7 +251,7 @@ worker_finish(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_FINISHED);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_FINISHED, __ATOMIC_RELEASE);
 }
 
 
@@ -281,13 +277,12 @@ timer_stress2_main_loop(__rte_unused void *arg)
 	unsigned int lcore_id = rte_lcore_id();
 	unsigned int main_lcore = rte_get_main_lcore();
 	int32_t my_collisions = 0;
-	static rte_atomic32_t collisions;
+	static uint32_t collisions;
 
 	if (lcore_id == main_lcore) {
 		cb_count = 0;
 		test_failed = 0;
-		rte_atomic32_set(&collisions, 0);
-		main_init_workers();
+		__atomic_store_n(&collisions, 0, __ATOMIC_RELAXED);
 		timers = rte_malloc(NULL, sizeof(*timers) * NB_STRESS2_TIMERS, 0);
 		if (timers == NULL) {
 			printf("Test Failed\n");
@@ -315,7 +310,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 			my_collisions++;
 	}
 	if (my_collisions != 0)
-		rte_atomic32_add(&collisions, my_collisions);
+		__atomic_fetch_add(&collisions, my_collisions, __ATOMIC_RELAXED);
 
 	/* wait long enough for timers to expire */
 	rte_delay_ms(100);
@@ -329,7 +324,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 
 	/* now check that we get the right number of callbacks */
 	if (lcore_id == main_lcore) {
-		my_collisions = rte_atomic32_read(&collisions);
+		my_collisions = __atomic_load_n(&collisions, __ATOMIC_RELAXED);
 		if (my_collisions != 0)
 			printf("- %d timer reset collisions (OK)\n", my_collisions);
 		rte_timer_manage();
@@ -573,6 +568,7 @@ test_timer(void)
 	/* run a second, slightly different set of stress tests */
 	printf("\nStart timer stress tests 2\n");
 	test_failed = 0;
+	main_init_workers();
 	rte_eal_mp_remote_launch(timer_stress2_main_loop, NULL, CALL_MAIN);
 	rte_eal_mp_wait_lcore();
 	if (test_failed)
diff --git a/app/test/test_timer_secondary.c b/app/test/test_timer_secondary.c
index 16a9f1878b..5795c97f07 100644
--- a/app/test/test_timer_secondary.c
+++ b/app/test/test_timer_secondary.c
@@ -9,7 +9,6 @@
 #include <rte_lcore.h>
 #include <rte_debug.h>
 #include <rte_memzone.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_cycles.h>
 #include <rte_mempool.h>
--
2.25.1
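The timer test moves each worker through WORKER_WAITING, WORKER_RUN_SIGNAL, WORKER_RUNNING and WORKER_FINISHED, with release stores on one side and acquire waits on the other. The sketch below reproduces that handshake with POSIX threads under some stated assumptions. `NB_WORKERS`, `result[]` and the helper names are hypothetical. The waits check for "at least this state" rather than exact equality, so a fast worker that has already moved on cannot leave the main thread spinning. That check is a choice made for the sketch, not something the patch does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>

#define NB_WORKERS 3  /* hypothetical number of worker lcores */

enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };

static uint16_t lcore_state[NB_WORKERS];
static int result[NB_WORKERS];  /* hypothetical plain data handed back to main */

/* Spin until the state reaches at least 'want' (states only move forward). */
static void wait_state_at_least(uint16_t *s, uint16_t want)
{
	while (__atomic_load_n(s, __ATOMIC_ACQUIRE) < want)
		;
}

static void *worker(void *arg)
{
	int id = (int)(intptr_t)arg;

	/* worker_wait_to_start() */
	wait_state_at_least(&lcore_state[id], WORKER_RUN_SIGNAL);
	__atomic_store_n(&lcore_state[id], WORKER_RUNNING, __ATOMIC_RELEASE);
	result[id] = id + 1;  /* the "work" */
	/* worker_finish(): the release publishes result[id] */
	__atomic_store_n(&lcore_state[id], WORKER_FINISHED, __ATOMIC_RELEASE);
	return NULL;
}

static int handshake_demo(void)
{
	pthread_t t[NB_WORKERS];
	int i, sum = 0;

	/* main_init_workers(): done before any worker is launched */
	for (i = 0; i < NB_WORKERS; i++)
		__atomic_store_n(&lcore_state[i], WORKER_WAITING, __ATOMIC_RELAXED);
	for (i = 0; i < NB_WORKERS; i++)
		if (pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i) != 0)
			abort();

	/* main_start_workers(): signal every worker, then wait until each runs */
	for (i = 0; i < NB_WORKERS; i++)
		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
	for (i = 0; i < NB_WORKERS; i++)
		wait_state_at_least(&lcore_state[i], WORKER_RUNNING);

	/* main_wait_for_workers(): the acquire makes result[i] visible */
	for (i = 0; i < NB_WORKERS; i++) {
		wait_state_at_least(&lcore_state[i], WORKER_FINISHED);
		sum += result[i];
	}
	for (i = 0; i < NB_WORKERS; i++)
		pthread_join(t[i], NULL);
	return sum;
}
```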
* RE: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
From: Honnappa Nagarahalli @ 2021-11-16 19:52 UTC
To: Joyce Kong, Robert Sanford, Erik Gabriel Carrillo
Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>

>
> Convert rte_atomic usages to compiler atomic built-ins for lcore_state and
> collisions sync.
>
> Also, move 'main_init_workers' outside of 'timer_stress2_main_loop' to
> guarantee lcore_state initialized correctly before the threads launched.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

<snip>
* Re: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
From: David Marchand @ 2021-11-16 20:20 UTC
To: Joyce Kong, Honnappa Nagarahalli
Cc: Robert Sanford, Erik Gabriel Carrillo, dev, nd, Ruifeng Wang

Joyce, Honnappa,

On Tue, Nov 16, 2021 at 10:43 AM Joyce Kong <joyce.kong@arm.com> wrote:
>
> Convert rte_atomic usages to compiler atomic
> built-ins for lcore_state and collisions sync.
>
> Also, move 'main_init_workers' outside of
> 'timer_stress2_main_loop' to guarantee lcore_state
> initialized correctly before the threads launched.

Is this "also" part actually related to the change?
Or is it a separate fix?

>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

--
David Marchand
* RE: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
From: Honnappa Nagarahalli @ 2021-11-16 21:21 UTC
To: David Marchand, Joyce Kong
Cc: Robert Sanford, Erik Gabriel Carrillo, dev, nd, Ruifeng Wang, nd

<snip>

>
> Joyce, Honnappa,
>
> On Tue, Nov 16, 2021 at 10:43 AM Joyce Kong <joyce.kong@arm.com> wrote:
> >
> > Convert rte_atomic usages to compiler atomic built-ins for lcore_state
> > and collisions sync.
> >
> > Also, move 'main_init_workers' outside of 'timer_stress2_main_loop' to
> > guarantee lcore_state initialized correctly before the threads
> > launched.
>
> Is this "also" part actually related to the change?
> Or is it a separate fix?
The 'also' part does not fix a separate problem (i.e. the earlier code did
not have any issues); it just helps keep the code simple.
>
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>
> --
> David Marchand
* Re: [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync
From: David Marchand @ 2021-11-17 9:29 UTC
To: Honnappa Nagarahalli
Cc: Joyce Kong, Robert Sanford, Erik Gabriel Carrillo, dev, nd, Ruifeng Wang

On Tue, Nov 16, 2021 at 10:21 PM Honnappa Nagarahalli
<Honnappa.Nagarahalli@arm.com> wrote:
> > Joyce, Honnappa,
> >
> > On Tue, Nov 16, 2021 at 10:43 AM Joyce Kong <joyce.kong@arm.com> wrote:
> > >
> > > Convert rte_atomic usages to compiler atomic built-ins for lcore_state
> > > and collisions sync.
> > >
> > > Also, move 'main_init_workers' outside of 'timer_stress2_main_loop' to
> > > guarantee lcore_state initialized correctly before the threads
> > > launched.
> >
> > Is this "also" part actually related to the change?
> > Or is it a separate fix?
> The 'also' part does not fix a separate problem (i.e. the earlier code did
> not have any issues); it just helps keep the code simple.

This is indeed better this way.
Thanks.

--
David Marchand
* [PATCH v2 04/12] test/stack_perf: use compiler atomics for lcore sync
From: Joyce Kong @ 2021-11-16 9:41 UTC
To: Olivier Matz
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for lcore sync in stack_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_stack_perf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_stack_perf.c b/app/test/test_stack_perf.c
index 4ee40d5d19..1eae00a334 100644
--- a/app/test/test_stack_perf.c
+++ b/app/test/test_stack_perf.c
@@ -6,7 +6,6 @@
 #include <stdio.h>
 #include <inttypes.h>
 
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_launch.h>
 #include <rte_pause.h>
@@ -24,7 +23,7 @@
  */
 static volatile unsigned int bulk_sizes[] = {8, MAX_BURST};
 
-static rte_atomic32_t lcore_barrier;
+static uint32_t lcore_barrier;
 
 struct lcore_pair {
 	unsigned int c1;
@@ -144,9 +143,8 @@ bulk_push_pop(void *p)
 	s = args->s;
 	size = args->sz;
 
-	rte_atomic32_sub(&lcore_barrier, 1);
-	while (rte_atomic32_read(&lcore_barrier) != 0)
-		rte_pause();
+	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
+	rte_wait_until_equal_32(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	uint64_t start = rte_rdtsc();
 
@@ -175,7 +173,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_stack *s,
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
-		rte_atomic32_set(&lcore_barrier, 2);
+		__atomic_store_n(&lcore_barrier, 2, __ATOMIC_RELAXED);
 
 		args[0].sz = args[1].sz = bulk_sizes[i];
 		args[0].s = args[1].s = s;
@@ -208,7 +206,7 @@ run_on_n_cores(struct rte_stack *s, lcore_function_t fn, int n)
 	int cnt = 0;
 	double avg;
 
-	rte_atomic32_set(&lcore_barrier, n);
+	__atomic_store_n(&lcore_barrier, n, __ATOMIC_RELAXED);
 
 	RTE_LCORE_FOREACH_WORKER(lcore_id) {
 		if (++cnt >= n)
@@ -302,7 +300,7 @@ __test_stack_perf(uint32_t flags)
 	struct lcore_pair cores;
 	struct rte_stack *s;
 
-	rte_atomic32_init(&lcore_barrier);
+	__atomic_store_n(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	s = rte_stack_create(STACK_NAME, STACK_SIZE, rte_socket_id(), flags);
 	if (s == NULL) {
--
2.25.1
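The stack_perf `lcore_barrier` is a countdown barrier. It is armed with the number of participants, each participant decrements it once, and every participant then spins until it reaches zero. The patch uses `__ATOMIC_RELAXED` because the barrier only lines up start times. If data had to be handed across it, the decrement would need at least release ordering and the wait acquire ordering. The following is a minimal two-party sketch that roughly mirrors the case in run_on_core_pair() where one participant is the calling lcore. It assumes POSIX threads; `passed` is a hypothetical counter added only for checking.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static uint32_t lcore_barrier;
static uint32_t passed;  /* hypothetical, only used to check the result */

/* One participant: arrive, then wait for the others (bulk_push_pop() style). */
static void *barrier_participant(void *arg)
{
	(void)arg;
	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
	/* stands in for rte_wait_until_equal_32(&lcore_barrier, 0, __ATOMIC_RELAXED) */
	while (__atomic_load_n(&lcore_barrier, __ATOMIC_RELAXED) != 0)
		;
	/* the timed section (rte_rdtsc() onwards) would start here */
	__atomic_fetch_add(&passed, 1, __ATOMIC_RELAXED);
	return NULL;
}

/* Arm the barrier for two, run one worker thread plus the caller. */
static uint32_t barrier_pair_demo(void)
{
	pthread_t worker;

	__atomic_store_n(&lcore_barrier, 2, __ATOMIC_RELAXED);
	__atomic_store_n(&passed, 0, __ATOMIC_RELAXED);
	if (pthread_create(&worker, NULL, barrier_participant, NULL) != 0)
		return 0;
	barrier_participant(NULL);  /* the calling thread is the second party */
	pthread_join(worker, NULL);
	return __atomic_load_n(&passed, __ATOMIC_RELAXED);
}
```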
* [PATCH v2 05/12] test/bpf: use compiler atomics for calculation
From: Joyce Kong @ 2021-11-16 9:41 UTC
To: Konstantin Ananyev
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins
for calculation in bpf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_bpf.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index e3e9a1b0b5..b8be1e3d30 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -1569,32 +1569,32 @@ test_xadd1_check(uint64_t rc, const void *arg)
 	memset(&dfe, 0, sizeof(dfe));
 
 	rv = 1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = -1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = (int32_t)TEST_FILL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_3;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
 }
--
2.25.1
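test_xadd1_check() computes the expected result of the BPF xadd instructions. It applies the same sequence of additions to `dfe.u32` and `dfe.u64`, including a negative value (`rv = -1`). The sketch below is a minimal illustration of why that works with `__atomic_fetch_add()`. It assumes a hypothetical struct that mirrors just those two fields. The negative value converts to a large unsigned operand, and unsigned addition wraps modulo 2^32 or 2^64, so the net effect is a subtraction. Relaxed ordering is enough because the check runs on a single thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the two fields test_xadd1_check() updates. */
struct xadd_fields {
	uint32_t u32;
	uint64_t u64;
};

/* Add the same signed delta to both fields with relaxed atomic adds.
 * The casts make the modular conversion explicit: -1 becomes
 * UINT32_MAX / UINT64_MAX, and unsigned addition wraps, so the net
 * effect is a subtraction. */
static void xadd_both(struct xadd_fields *f, int64_t delta)
{
	__atomic_fetch_add(&f->u32, (uint32_t)delta, __ATOMIC_RELAXED);
	__atomic_fetch_add(&f->u64, (uint64_t)delta, __ATOMIC_RELAXED);
}
```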
* [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync 2021-11-16 9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong ` (4 preceding siblings ...) 2021-11-16 9:41 ` [PATCH v2 05/12] test/bpf: use compiler atomics for calculation Joyce Kong @ 2021-11-16 9:41 ` Joyce Kong 2021-11-16 9:42 ` [PATCH v2 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong ` (6 subsequent siblings) 12 siblings, 0 replies; 36+ messages in thread From: Joyce Kong @ 2021-11-16 9:41 UTC (permalink / raw) To: Olivier Matz, Andrew Rybchenko, Bruce Richardson, Vladimir Medvedkin, Yipeng Wang, Sameh Gobriel, Anatoly Burakov, Honnappa Nagarahalli, Konstantin Ananyev Cc: dev, nd, Joyce Kong, Ruifeng Wang Convert rte_atomic usages to compiler atomic built-ins for shared data sync in func_reentrancy test cases. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> --- app/test/test_func_reentrancy.c | 27 +++++++++++++-------------- 1 file changed, 13 insertions(+), 14 deletions(-) diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c index 838ab6f0f9..7825c6cb86 100644 --- a/app/test/test_func_reentrancy.c +++ b/app/test/test_func_reentrancy.c @@ -20,7 +20,6 @@ #include <rte_eal.h> #include <rte_per_lcore.h> #include <rte_lcore.h> -#include <rte_atomic.h> #include <rte_branch_prediction.h> #include <rte_ring.h> #include <rte_mempool.h> @@ -54,12 +53,12 @@ typedef void (*case_clean_t)(unsigned lcore_id); #define MAX_LCORES (RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U)) -static rte_atomic32_t obj_count = RTE_ATOMIC32_INIT(0); -static rte_atomic32_t synchro = RTE_ATOMIC32_INIT(0); +static uint32_t obj_count; +static uint32_t synchro; #define WAIT_SYNCHRO_FOR_WORKERS() do { \ if (lcore_self != rte_get_main_lcore()) \ - while (rte_atomic32_read(&synchro) == 0); \ + rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED); \ 
} while(0) /* @@ -72,7 +71,7 @@ test_eal_init_once(__rte_unused void *arg) WAIT_SYNCHRO_FOR_WORKERS(); - rte_atomic32_set(&obj_count, 1); /* silent the check in the caller */ + __atomic_store_n(&obj_count, 1, __ATOMIC_RELAXED); /* silent the check in the caller */ if (rte_eal_init(0, NULL) != -1) return -1; @@ -116,7 +115,7 @@ ring_create_lookup(__rte_unused void *arg) for (i = 0; i < MAX_ITER_ONCE; i++) { rp = rte_ring_create("fr_test_once", 4096, SOCKET_ID_ANY, 0); if (rp != NULL) - rte_atomic32_inc(&obj_count); + __atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED); } /* create/lookup new ring several times */ @@ -183,7 +182,7 @@ mempool_create_lookup(__rte_unused void *arg) my_obj_init, NULL, SOCKET_ID_ANY, 0); if (mp != NULL) - rte_atomic32_inc(&obj_count); + __atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED); } /* create/lookup new ring several times */ @@ -250,7 +249,7 @@ hash_create_free(__rte_unused void *arg) for (i = 0; i < MAX_ITER_ONCE; i++) { handle = rte_hash_create(&hash_params); if (handle != NULL) - rte_atomic32_inc(&obj_count); + __atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED); } /* create mutiple times simultaneously */ @@ -318,7 +317,7 @@ fbk_create_free(__rte_unused void *arg) for (i = 0; i < MAX_ITER_ONCE; i++) { handle = rte_fbk_hash_create(&fbk_params); if (handle != NULL) - rte_atomic32_inc(&obj_count); + __atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED); } /* create mutiple fbk tables simultaneously */ @@ -384,7 +383,7 @@ lpm_create_free(__rte_unused void *arg) for (i = 0; i < MAX_ITER_ONCE; i++) { lpm = rte_lpm_create("fr_test_once", SOCKET_ID_ANY, &config); if (lpm != NULL) - rte_atomic32_inc(&obj_count); + __atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED); } /* create mutiple fbk tables simultaneously */ @@ -445,8 +444,8 @@ launch_test(struct test_case *pt_case) if (pt_case->func == NULL) return -1; - rte_atomic32_set(&obj_count, 0); - rte_atomic32_set(&synchro, 0); + __atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED); + 
__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED); cores = RTE_MIN(rte_lcore_count(), MAX_LCORES); RTE_LCORE_FOREACH_WORKER(lcore_id) { @@ -456,7 +455,7 @@ launch_test(struct test_case *pt_case) rte_eal_remote_launch(pt_case->func, pt_case->arg, lcore_id); } - rte_atomic32_set(&synchro, 1); + __atomic_store_n(&synchro, 1, __ATOMIC_RELAXED); if (pt_case->func(pt_case->arg) < 0) ret = -1; @@ -471,7 +470,7 @@ launch_test(struct test_case *pt_case) pt_case->clean(lcore_id); } - count = rte_atomic32_read(&obj_count); + count = __atomic_load_n(&obj_count, __ATOMIC_RELAXED); if (count != 1) { printf("%s: common object allocated %d times (should be 1)\n", pt_case->name, count); -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
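The synchronization that patch 06 leaves in test_func_reentrancy.c can be reduced to the minimal sketch below. It uses the plain GCC/Clang `__atomic` builtins, so it builds without DPDK. `wait_until_equal_32()` is a hypothetical portable stand-in for `rte_wait_until_equal_32()`; the DPDK helper may use an architecture-specific wait such as WFE on Arm. The other helpers are illustrative reductions of `launch_test()` and the `*_create_lookup()` workers.

Relaxed ordering appears adequate here for two reasons. First, `synchro` only releases the workers at roughly the same moment and publishes no data. Second, `obj_count` is read only after the workers have been joined. That second point assumes the join step (`rte_eal_wait_lcore()`) provides the needed happens-before ordering.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the two globals in test_func_reentrancy.c. */
static uint32_t obj_count;
static uint32_t synchro;

/*
 * Portable sketch of rte_wait_until_equal_32(): spin until *addr equals
 * expected. As in the DPDK generic helper, only acquire or relaxed loads
 * are accepted. A real implementation would add a pause hint in the loop.
 */
static void
wait_until_equal_32(uint32_t *addr, uint32_t expected, int memorder)
{
	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
	while (__atomic_load_n(addr, memorder) != expected)
		;
}

/* Worker side: count one successful creation of the shared object. */
static void
count_object(void)
{
	__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
}

/* Main-lcore side of launch_test(), reduced to its atomic accesses. */
static void
reset_and_release(void)
{
	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);
	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
	/* ... workers would be launched here and spin on synchro ... */
	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
}

/* Main-lcore side: read the final count once the workers are done. */
static uint32_t
read_count(void)
{
	return __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
}
```

In the real test, `launch_test()` expects a count of exactly 1, because only one lcore should succeed in creating the "once" object.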
* [PATCH v2 07/12] app/eventdev: use compiler atomics for shared data sync 2021-11-16 9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong ` (5 preceding siblings ...) 2021-11-16 9:41 ` [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong @ 2021-11-16 9:42 ` Joyce Kong 2021-11-16 9:42 ` [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong ` (5 subsequent siblings) 12 siblings, 0 replies; 36+ messages in thread From: Joyce Kong @ 2021-11-16 9:42 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang Convert rte_atomic usages to compiler atomic built-ins for shared data sync in eventdev cases. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test-eventdev/evt_main.c | 1 - app/test-eventdev/test_order_atq.c | 4 ++-- app/test-eventdev/test_order_common.c | 4 ++-- app/test-eventdev/test_order_common.h | 8 ++++---- app/test-eventdev/test_order_queue.c | 4 ++-- 5 files changed, 10 insertions(+), 11 deletions(-) diff --git a/app/test-eventdev/evt_main.c b/app/test-eventdev/evt_main.c index 3534aabca7..194c980c7a 100644 --- a/app/test-eventdev/evt_main.c +++ b/app/test-eventdev/evt_main.c @@ -6,7 +6,6 @@ #include <unistd.h> #include <signal.h> -#include <rte_atomic.h> #include <rte_debug.h> #include <rte_eal.h> #include <rte_eventdev.h> diff --git a/app/test-eventdev/test_order_atq.c b/app/test-eventdev/test_order_atq.c index 71215a07b6..2fee4b4daa 100644 --- a/app/test-eventdev/test_order_atq.c +++ b/app/test-eventdev/test_order_atq.c @@ -28,7 +28,7 @@ order_atq_worker(void *arg, const bool flow_id_cap) uint16_t event = rte_event_dequeue_burst(dev_id, port, &ev, 1, 0); if (!event) { - if (rte_atomic64_read(outstand_pkts) <= 0) + if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0) break; rte_pause(); continue; @@ -64,7 +64,7 @@ order_atq_worker_burst(void *arg, const bool 
flow_id_cap) BURST_SIZE, 0); if (nb_rx == 0) { - if (rte_atomic64_read(outstand_pkts) <= 0) + if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0) break; rte_pause(); continue; diff --git a/app/test-eventdev/test_order_common.c b/app/test-eventdev/test_order_common.c index d7760061ba..ff7813f9c2 100644 --- a/app/test-eventdev/test_order_common.c +++ b/app/test-eventdev/test_order_common.c @@ -187,7 +187,7 @@ order_test_setup(struct evt_test *test, struct evt_options *opt) evt_err("failed to allocate t->expected_flow_seq memory"); goto exp_nomem; } - rte_atomic64_set(&t->outstand_pkts, opt->nb_pkts); + __atomic_store_n(&t->outstand_pkts, opt->nb_pkts, __ATOMIC_RELAXED); t->err = false; t->nb_pkts = opt->nb_pkts; t->nb_flows = opt->nb_flows; @@ -294,7 +294,7 @@ order_launch_lcores(struct evt_test *test, struct evt_options *opt, while (t->err == false) { uint64_t new_cycles = rte_get_timer_cycles(); - int64_t remaining = rte_atomic64_read(&t->outstand_pkts); + int64_t remaining = __atomic_load_n(&t->outstand_pkts, __ATOMIC_RELAXED); if (remaining <= 0) { t->result = EVT_TEST_SUCCESS; diff --git a/app/test-eventdev/test_order_common.h b/app/test-eventdev/test_order_common.h index cd9d6009ec..92781d9587 100644 --- a/app/test-eventdev/test_order_common.h +++ b/app/test-eventdev/test_order_common.h @@ -48,7 +48,7 @@ struct test_order { * The atomic_* is an expensive operation,Since it is a functional test, * We are using the atomic_ operation to reduce the code complexity. 
*/ - rte_atomic64_t outstand_pkts; + uint64_t outstand_pkts; enum evt_test_result result; uint32_t nb_flows; uint64_t nb_pkts; @@ -95,7 +95,7 @@ static __rte_always_inline void order_process_stage_1(struct test_order *const t, struct rte_event *const ev, const uint32_t nb_flows, uint32_t *const expected_flow_seq, - rte_atomic64_t *const outstand_pkts) + uint64_t *const outstand_pkts) { const uint32_t flow = (uintptr_t)ev->mbuf % nb_flows; /* compare the seqn against expected value */ @@ -113,7 +113,7 @@ order_process_stage_1(struct test_order *const t, */ expected_flow_seq[flow]++; rte_pktmbuf_free(ev->mbuf); - rte_atomic64_sub(outstand_pkts, 1); + __atomic_sub_fetch(outstand_pkts, 1, __ATOMIC_RELAXED); } static __rte_always_inline void @@ -132,7 +132,7 @@ order_process_stage_invalid(struct test_order *const t, const uint8_t port = w->port_id;\ const uint32_t nb_flows = t->nb_flows;\ uint32_t *expected_flow_seq = t->expected_flow_seq;\ - rte_atomic64_t *outstand_pkts = &t->outstand_pkts;\ + uint64_t *outstand_pkts = &t->outstand_pkts;\ if (opt->verbose_level > 1)\ printf("%s(): lcore %d dev_id %d port=%d\n",\ __func__, rte_lcore_id(), dev_id, port) diff --git a/app/test-eventdev/test_order_queue.c b/app/test-eventdev/test_order_queue.c index 621367805a..80eaea5cf5 100644 --- a/app/test-eventdev/test_order_queue.c +++ b/app/test-eventdev/test_order_queue.c @@ -28,7 +28,7 @@ order_queue_worker(void *arg, const bool flow_id_cap) uint16_t event = rte_event_dequeue_burst(dev_id, port, &ev, 1, 0); if (!event) { - if (rte_atomic64_read(outstand_pkts) <= 0) + if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0) break; rte_pause(); continue; @@ -64,7 +64,7 @@ order_queue_worker_burst(void *arg, const bool flow_id_cap) BURST_SIZE, 0); if (nb_rx == 0) { - if (rte_atomic64_read(outstand_pkts) <= 0) + if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0) break; rte_pause(); continue; -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
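The outstanding-packet counter from patch 07 can be reduced to the sketch below, built on the GCC/Clang `__atomic` builtins. `struct order_counter` and the three helpers are hypothetical; they are not the test_order code itself.

One detail is worth keeping in mind. `rte_atomic64_t` wraps a signed 64-bit value, while the patch changes the field to `uint64_t`. As a result, the existing `<= 0` checks now fire only at exactly zero, and a decrement past zero would wrap rather than go negative. The sketch uses `__atomic_fetch_sub` instead of the patch's `__atomic_sub_fetch`, so that a debug build can catch that underflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical reduction of struct test_order to its shared counter. */
struct order_counter {
	uint64_t outstand_pkts;
};

/* Setup: store the number of packets the test expects to retire. */
static void
counter_init(struct order_counter *c, uint64_t nb_pkts)
{
	__atomic_store_n(&c->outstand_pkts, nb_pkts, __ATOMIC_RELAXED);
}

/*
 * Worker: one packet retired. fetch_sub returns the old value, so an
 * accidental decrement past zero (which would wrap) trips the assert.
 * Returns the new remaining count.
 */
static uint64_t
counter_retire(struct order_counter *c)
{
	uint64_t old = __atomic_fetch_sub(&c->outstand_pkts, 1,
					  __ATOMIC_RELAXED);

	assert(old != 0);
	return old - 1;
}

/* Poller: for an unsigned counter, "<= 0" is the same test as "== 0". */
static bool
counter_done(struct order_counter *c)
{
	return __atomic_load_n(&c->outstand_pkts, __ATOMIC_RELAXED) == 0;
}
```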
* [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync 2021-11-16 9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong ` (6 preceding siblings ...) 2021-11-16 9:42 ` [PATCH v2 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong @ 2021-11-16 9:42 ` Joyce Kong 2021-11-16 9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong ` (4 subsequent siblings) 12 siblings, 0 replies; 36+ messages in thread From: Joyce Kong @ 2021-11-16 9:42 UTC (permalink / raw) To: Declan Doherty, Ciara Power Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang Convert rte_atomic_test_and_set usage to compiler atomic CAS operation for display sync in crypto cases. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> --- app/test-crypto-perf/cperf_test_latency.c | 6 ++++-- app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 9 ++++++--- app/test-crypto-perf/cperf_test_throughput.c | 9 ++++++--- app/test-crypto-perf/cperf_test_verify.c | 9 ++++++--- 4 files changed, 22 insertions(+), 11 deletions(-) diff --git a/app/test-crypto-perf/cperf_test_latency.c b/app/test-crypto-perf/cperf_test_latency.c index 69f55de50a..ce49feaba9 100644 --- a/app/test-crypto-perf/cperf_test_latency.c +++ b/app/test-crypto-perf/cperf_test_latency.c @@ -126,7 +126,7 @@ cperf_latency_test_runner(void *arg) uint8_t burst_size_idx = 0; uint32_t imix_idx = 0; - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; if (ctx == NULL) return 0; @@ -307,8 +307,10 @@ cperf_latency_test_runner(void *arg) time_max = tunit*(double)(tsc_max) / tsc_hz; time_min = tunit*(double)(tsc_min) / tsc_hz; + uint16_t exp = 0; if (ctx->options->csv) { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf("\n# lcore, Buffer Size, Burst Size, 
Pakt Seq #, " "cycles, time (us)"); diff --git a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c index fda97e8ab9..ba1f104f72 100644 --- a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c +++ b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c @@ -404,7 +404,7 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx) state.lcore = rte_lcore_id(); state.linearize = 0; - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; static bool warmup = true; /* @@ -449,8 +449,10 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx) continue; } + uint16_t exp = 0; if (!opts->csv) { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf(PRETTY_HDR_FMT, "lcore id", "Buf Size", "Burst Size", "Enqueued", "Dequeued", "Enq Retries", @@ -466,7 +468,8 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx) state.cycles_per_enq, state.cycles_per_deq); } else { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf(CSV_HDR_FMT, "# lcore id", "Buf Size", "Burst Size", "Enqueued", "Dequeued", "Enq Retries", diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-perf/cperf_test_throughput.c index 739ed9e573..51512af2ad 100644 --- a/app/test-crypto-perf/cperf_test_throughput.c +++ b/app/test-crypto-perf/cperf_test_throughput.c @@ -113,7 +113,7 @@ cperf_throughput_test_runner(void *test_ctx) uint8_t burst_size_idx = 0; uint32_t imix_idx = 0; - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; struct rte_crypto_op *ops[ctx->options->max_burst_size]; struct rte_crypto_op *ops_processed[ctx->options->max_burst_size]; @@ -281,8 +281,10 @@ cperf_throughput_test_runner(void *test_ctx) double cycles_per_packet = ((double)tsc_duration / ctx->options->total_ops); + 
uint16_t exp = 0; if (!ctx->options->csv) { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf("%12s%12s%12s%12s%12s%12s%12s%12s%12s%12s\n\n", "lcore id", "Buf Size", "Burst Size", "Enqueued", "Dequeued", "Failed Enq", @@ -302,7 +304,8 @@ cperf_throughput_test_runner(void *test_ctx) throughput_gbps, cycles_per_packet); } else { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf("#lcore id,Buffer Size(B)," "Burst Size,Enqueued,Dequeued,Failed Enq," "Failed Deq,Ops(Millions),Throughput(Gbps)," diff --git a/app/test-crypto-perf/cperf_test_verify.c b/app/test-crypto-perf/cperf_test_verify.c index 1962438034..496eb0de00 100644 --- a/app/test-crypto-perf/cperf_test_verify.c +++ b/app/test-crypto-perf/cperf_test_verify.c @@ -241,7 +241,7 @@ cperf_verify_test_runner(void *test_ctx) uint64_t ops_deqd = 0, ops_deqd_total = 0, ops_deqd_failed = 0; uint64_t ops_failed = 0; - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; uint64_t i; uint16_t ops_unused = 0; @@ -383,8 +383,10 @@ cperf_verify_test_runner(void *test_ctx) ops_deqd_total += ops_deqd; } + uint16_t exp = 0; if (!ctx->options->csv) { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf("%12s%12s%12s%12s%12s%12s%12s%12s\n\n", "lcore id", "Buf Size", "Burst size", "Enqueued", "Dequeued", "Failed Enq", @@ -401,7 +403,8 @@ cperf_verify_test_runner(void *test_ctx) ops_deqd_failed, ops_failed); } else { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf("\n# lcore id, Buffer Size(B), " "Burst Size,Enqueued,Dequeued,Failed Enq," "Failed Deq,Failed Ops\n"); -- 2.25.1 ^ 
permalink raw reply [flat|nested] 36+ messages in thread
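The display-once idiom that patch 08 uses in place of `rte_atomic16_test_and_set()` is sketched below. `display_once_claim()` is a hypothetical helper name.

The key detail is that a failed `__atomic_compare_exchange_n` writes the current value of the flag back into the "expected" argument. For that reason, `exp` must be 0 before every attempt. In the crypto runners this holds because `exp` is freshly initialized before the `csv` branch, and only one of the two CAS calls runs per pass. Relaxed ordering on both success and failure looks sufficient, because the flag guards only a `printf` and publishes no data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true for the first caller only, like rte_atomic16_test_and_set()
 * did for a flag initialized to 0.
 */
static bool
display_once_claim(uint16_t *flag)
{
	/* A failed CAS stores the current *flag (1) into exp, so start at 0. */
	uint16_t exp = 0;

	return __atomic_compare_exchange_n(flag, &exp, 1, false /* strong */,
					   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}
```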
* [PATCH v2 09/12] app/compress: use compiler atomic builtins for display sync 2021-11-16 9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong ` (7 preceding siblings ...) 2021-11-16 9:42 ` [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong @ 2021-11-16 9:42 ` Joyce Kong 2021-11-16 20:15 ` Honnappa Nagarahalli 2021-11-16 9:42 ` [PATCH v2 10/12] app/testpmd: remove atomic operations for port status Joyce Kong ` (3 subsequent siblings) 12 siblings, 1 reply; 36+ messages in thread From: Joyce Kong @ 2021-11-16 9:42 UTC (permalink / raw) Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang Convert rte_atomic_test_and_set usage to compiler atomic CAS operation for display sync. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test-compress-perf/comp_perf_test_common.h | 2 +- .../comp_perf_test_cyclecount.c | 15 +++++++-------- .../comp_perf_test_throughput.c | 10 +++++++--- app/test-compress-perf/comp_perf_test_verify.c | 6 ++++-- 4 files changed, 19 insertions(+), 14 deletions(-) diff --git a/app/test-compress-perf/comp_perf_test_common.h b/app/test-compress-perf/comp_perf_test_common.h index 72705c6a2b..d039e5a29a 100644 --- a/app/test-compress-perf/comp_perf_test_common.h +++ b/app/test-compress-perf/comp_perf_test_common.h @@ -14,7 +14,7 @@ struct cperf_mem_resources { uint16_t qp_id; uint8_t lcore_id; - rte_atomic16_t print_info_once; + uint16_t print_info_once; uint32_t total_bufs; uint8_t *compressed_data; diff --git a/app/test-compress-perf/comp_perf_test_cyclecount.c b/app/test-compress-perf/comp_perf_test_cyclecount.c index c875ddbdac..da55b02b74 100644 --- a/app/test-compress-perf/comp_perf_test_cyclecount.c +++ b/app/test-compress-perf/comp_perf_test_cyclecount.c @@ -466,7 +466,7 @@ cperf_cyclecount_test_runner(void *test_ctx) struct cperf_cyclecount_ctx *ctx = test_ctx; struct comp_test_data *test_data = ctx->ver.options; uint32_t lcore 
= rte_lcore_id(); - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; static rte_spinlock_t print_spinlock; int i; @@ -486,10 +486,12 @@ cperf_cyclecount_test_runner(void *test_ctx) ctx->ver.mem.lcore_id = lcore; + uint16_t exp = 0; /* * printing information about current compression thread */ - if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once)) + if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp, + 1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf(" lcore: %u," " driver name: %s," " device name: %s," @@ -546,9 +548,10 @@ cperf_cyclecount_test_runner(void *test_ctx) (ctx->ver.mem.total_bufs * test_data->num_iter); /* R E P O R T processing */ - if (rte_atomic16_test_and_set(&display_once)) { + rte_spinlock_lock(&print_spinlock); - rte_spinlock_lock(&print_spinlock); + if (display_once == 0) { + display_once = 1; printf("\nLegend for the table\n" " - Retries section: number of retries for the following operations:\n" @@ -576,12 +579,8 @@ cperf_cyclecount_test_runner(void *test_ctx) "setup/op", "[C-e]", "[C-d]", "[D-e]", "[D-d]"); - - rte_spinlock_unlock(&print_spinlock); } - rte_spinlock_lock(&print_spinlock); - printf("%12u" "%6u" "%12zu" diff --git a/app/test-compress-perf/comp_perf_test_throughput.c b/app/test-compress-perf/comp_perf_test_throughput.c index 13922b658c..d3dff070b0 100644 --- a/app/test-compress-perf/comp_perf_test_throughput.c +++ b/app/test-compress-perf/comp_perf_test_throughput.c @@ -329,15 +329,17 @@ cperf_throughput_test_runner(void *test_ctx) struct cperf_benchmark_ctx *ctx = test_ctx; struct comp_test_data *test_data = ctx->ver.options; uint32_t lcore = rte_lcore_id(); - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; int i, ret = EXIT_SUCCESS; ctx->ver.mem.lcore_id = lcore; + uint16_t exp = 0; /* * printing information about current compression thread */ - if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once)) + if 
(__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp, + 1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf(" lcore: %u," " driver name: %s," " device name: %s," @@ -391,7 +393,9 @@ cperf_throughput_test_runner(void *test_ctx) ctx->decomp_gbps = rte_get_tsc_hz() / ctx->decomp_tsc_byte * 8 / 1000000000; - if (rte_atomic16_test_and_set(&display_once)) { + exp = 0; + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) { printf("\n%12s%6s%12s%17s%15s%16s\n", "lcore id", "Level", "Comp size", "Comp ratio [%]", "Comp [Gbps]", "Decomp [Gbps]"); diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c index 5e13257b79..f6e21368e8 100644 --- a/app/test-compress-perf/comp_perf_test_verify.c +++ b/app/test-compress-perf/comp_perf_test_verify.c @@ -388,7 +388,7 @@ cperf_verify_test_runner(void *test_ctx) struct cperf_verify_ctx *ctx = test_ctx; struct comp_test_data *test_data = ctx->options; int ret = EXIT_SUCCESS; - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; uint32_t lcore = rte_lcore_id(); ctx->mem.lcore_id = lcore; @@ -427,8 +427,10 @@ cperf_verify_test_runner(void *test_ctx) ctx->ratio = (double) ctx->comp_data_sz / test_data->input_data_sz * 100; + uint16_t exp = 0; if (!ctx->silent) { - if (rte_atomic16_test_and_set(&display_once)) { + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) { printf("%12s%6s%12s%17s\n", "lcore id", "Level", "Comp size", "Comp ratio [%]"); } -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
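In `cperf_throughput_test_runner()`, a single `exp` variable serves two independent once-flags, which is why the patch resets it with `exp = 0;` before the second CAS. The hypothetical helper below shows why that reset matters. It returns a bitmask recording which flags this call claimed.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Two once-flags checked with one shared 'exp', as in the throughput runner.
 * Bit 0: print_info_once claimed; bit 1: display_once claimed.
 */
static int
claim_both(uint16_t *print_info_once, uint16_t *display_once)
{
	int claimed = 0;
	uint16_t exp = 0;

	if (__atomic_compare_exchange_n(print_info_once, &exp, 1, 0,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED))
		claimed |= 1;

	/*
	 * If the first CAS failed, it stored 1 into exp. Without this reset the
	 * second CAS would compare against 1: it would fail while the flag is
	 * still 0, and succeed (printing the header again) once it is 1.
	 */
	exp = 0;
	if (__atomic_compare_exchange_n(display_once, &exp, 1, 0,
			__ATOMIC_RELAXED, __ATOMIC_RELAXED))
		claimed |= 2;

	return claimed;
}
```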
* RE: [PATCH v2 09/12] app/compress: use compiler atomic builtins for display sync 2021-11-16 9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong @ 2021-11-16 20:15 ` Honnappa Nagarahalli 0 siblings, 0 replies; 36+ messages in thread From: Honnappa Nagarahalli @ 2021-11-16 20:15 UTC (permalink / raw) To: Joyce Kong; +Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd <snip> > > Convert rte_atomic_test_and_set usage to compiler atomic CAS operation for > display sync. > > Signed-off-by: Joyce Kong <joyce.kong@arm.com> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> <snip> ^ permalink raw reply [flat|nested] 36+ messages in thread
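The `cperf_cyclecount_test_runner()` hunk of patch 09 takes a different approach from the CAS idiom. It moves the `display_once` check under `print_spinlock`, so a plain flag suffices, and the lock is taken once per report instead of twice.

As I read the old code, an lcore that lost `rte_atomic16_test_and_set()` could acquire the lock and print its row before the winning lcore printed the legend and header. Holding the lock across both the check and the print rules that out.

The sketch below models this with a hypothetical minimal spinlock built on `__atomic_test_and_set` and `__atomic_clear`, standing in for `rte_spinlock_t`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical minimal stand-in for rte_spinlock_t. */
typedef struct {
	bool locked;
} demo_spinlock_t;

static void
demo_lock(demo_spinlock_t *l)
{
	while (__atomic_test_and_set(&l->locked, __ATOMIC_ACQUIRE))
		;	/* spin until the previous holder clears the flag */
}

static void
demo_unlock(demo_spinlock_t *l)
{
	__atomic_clear(&l->locked, __ATOMIC_RELEASE);
}

static demo_spinlock_t print_spinlock;
static uint16_t display_once;	/* read and written only under print_spinlock */

/*
 * Report step of the cyclecount runner, reduced to its locking.
 * Returns 1 if this call printed the legend and header.
 */
static int
report(unsigned int lcore)
{
	int printed_header = 0;

	demo_lock(&print_spinlock);
	if (display_once == 0) {
		display_once = 1;
		printf("legend and table header\n");
		printed_header = 1;
	}
	printf("row for lcore %u\n", lcore);
	demo_unlock(&print_spinlock);

	return printed_header;
}
```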
* [PATCH v2 10/12] app/testpmd: remove atomic operations for port status 2021-11-16 9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong ` (8 preceding siblings ...) 2021-11-16 9:42 ` [PATCH v2 09/12] app/compress: " Joyce Kong @ 2021-11-16 9:42 ` Joyce Kong 2021-11-16 21:34 ` Honnappa Nagarahalli 2021-11-16 9:42 ` [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong ` (2 subsequent siblings) 12 siblings, 1 reply; 36+ messages in thread From: Joyce Kong @ 2021-11-16 9:42 UTC (permalink / raw) To: Xiaoyun Li; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang The port_status changes do not need to be handled atomically, as they are modified during initialization or through the testpmd prompt instead of multiple threads. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> --- app/test-pmd/testpmd.c | 58 ++++++++++++++++++++++-------------------- 1 file changed, 31 insertions(+), 27 deletions(-) diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index a66dfb297c..ed472cacd2 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -36,7 +36,6 @@ #include <rte_alarm.h> #include <rte_per_lcore.h> #include <rte_lcore.h> -#include <rte_atomic.h> #include <rte_branch_prediction.h> #include <rte_mempool.h> #include <rte_malloc.h> @@ -2521,9 +2520,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi) continue; /* Fail to setup rx queue, return */ - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, - RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); fprintf(stderr, "Fail to configure port %d hairpin queues\n", @@ -2544,9 +2543,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi) continue; /* Fail to setup rx queue, return */ - if 
(rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, - RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); fprintf(stderr, "Fail to configure port %d hairpin queues\n", @@ -2729,8 +2728,9 @@ start_port(portid_t pid) need_check_link_status = 0; port = &ports[pi]; - if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STOPPED, - RTE_PORT_HANDLING) == 0) { + if (port->port_status == RTE_PORT_STOPPED) + port->port_status = RTE_PORT_HANDLING; + else { fprintf(stderr, "Port %d is now not stopped\n", pi); continue; } @@ -2766,8 +2766,9 @@ start_port(portid_t pid) nb_txq + nb_hairpinq, &(port->dev_conf)); if (diag != 0) { - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); @@ -2828,9 +2829,9 @@ start_port(portid_t pid) continue; /* Fail to setup tx queue, return */ - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, - RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); @@ -2880,9 +2881,9 @@ start_port(portid_t pid) continue; /* Fail to setup rx queue, return */ - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, - RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); @@ -2917,16 +2918,18 @@ start_port(portid_t pid) pi, rte_strerror(-diag)); /* Fail to setup rx queue, return */ - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = 
RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); continue; } - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, RTE_PORT_STARTED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STARTED; + else fprintf(stderr, "Port %d can not be set into started\n", pi); @@ -3028,8 +3031,9 @@ stop_port(portid_t pid) } port = &ports[pi]; - if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STARTED, - RTE_PORT_HANDLING) == 0) + if (port->port_status == RTE_PORT_STARTED) + port->port_status = RTE_PORT_HANDLING; + else continue; if (hairpin_mode & 0xf) { @@ -3055,8 +3059,9 @@ stop_port(portid_t pid) RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n", pi); - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set into stopped\n", pi); need_check_link_status = 1; @@ -3119,8 +3124,7 @@ close_port(portid_t pid) } port = &ports[pi]; - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_CLOSED, RTE_PORT_CLOSED) == 1) { + if (port->port_status == RTE_PORT_CLOSED) { fprintf(stderr, "Port %d is already closed\n", pi); continue; } -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
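Patch 10 open-codes the same check-then-assign transition at every call site. The sketch below factors that pattern into a hypothetical helper, `port_transition()`, which is a non-atomic equivalent of `rte_atomic16_cmpset()`. The `DEMO_PORT_*` values are also hypothetical, since testpmd defines its own `RTE_PORT_*` constants.

This plain version is equivalent to the atomic one only under the commit message's assumption that a single thread touches `port_status`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values; testpmd defines its own RTE_PORT_* constants. */
enum {
	DEMO_PORT_STARTED,
	DEMO_PORT_STOPPED,
	DEMO_PORT_HANDLING,
	DEMO_PORT_CLOSED,
};

/*
 * Move *status from 'from' to 'to' and report whether the transition
 * happened. Correct only if one thread owns *status: two concurrent
 * callers could both observe 'from' and both succeed, which is what the
 * atomic cmpset used to prevent.
 */
static bool
port_transition(uint16_t *status, uint16_t from, uint16_t to)
{
	if (*status != from)
		return false;
	*status = to;
	return true;
}
```

In `start_port()`, the normal path corresponds to STOPPED to HANDLING followed by HANDLING to STARTED, and each error path corresponds to HANDLING back to STOPPED.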
* RE: [PATCH v2 10/12] app/testpmd: remove atomic operations for port status
From: Honnappa Nagarahalli @ 2021-11-16 21:34 UTC
  To: Joyce Kong, Xiaoyun Li; +Cc: dev, nd, Joyce Kong, Ruifeng Wang, nd

<snip>

>
> The port_status changes do not need to be handled atomically, as they are
> modified during initialization or through the testpmd prompt instead of
> multiple threads.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>

<snip>
* [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync
From: Joyce Kong @ 2021-11-16 9:42 UTC
  To: Nicolas Chautru; +Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins for shared
data sync in bbdev cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-bbdev/test_bbdev_perf.c | 135 ++++++++++++++-----------------
 1 file changed, 59 insertions(+), 76 deletions(-)

diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c
index 7b4529789b..0fa119a502 100644
--- a/app/test-bbdev/test_bbdev_perf.c
+++ b/app/test-bbdev/test_bbdev_perf.c
@@ -133,7 +133,7 @@ struct test_op_params {
 	uint16_t num_to_process;
 	uint16_t num_lcores;
 	int vector_mask;
-	rte_atomic16_t sync;
+	uint16_t sync;
 	struct test_buffers q_bufs[RTE_MAX_NUMA_NODES][MAX_QUEUES];
 };
 
@@ -148,9 +148,9 @@ struct thread_params {
 	uint8_t iter_count;
 	double iter_average;
 	double bler;
-	rte_atomic16_t nb_dequeued;
-	rte_atomic16_t processing_status;
-	rte_atomic16_t burst_sz;
+	uint16_t nb_dequeued;
+	int16_t processing_status;
+	uint16_t burst_sz;
 	struct test_op_params *op_params;
 	struct rte_bbdev_dec_op *dec_ops[MAX_BURST];
 	struct rte_bbdev_enc_op *enc_ops[MAX_BURST];
@@ -2637,46 +2637,46 @@ dequeue_event_callback(uint16_t dev_id,
 	}
 
 	if (unlikely(event != RTE_BBDEV_EVENT_DEQUEUE)) {
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		printf(
 			"Dequeue interrupt handler called for incorrect event!\n");
 		return;
 	}
 
-	burst_sz = rte_atomic16_read(&tp->burst_sz);
+	burst_sz = __atomic_load_n(&tp->burst_sz, __ATOMIC_RELAXED);
 	num_ops = tp->op_params->num_to_process;
 
 	if (test_vector.op_type == RTE_BBDEV_OP_TURBO_DEC)
 		deq = rte_bbdev_dequeue_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_DEC)
 		deq = rte_bbdev_dequeue_ldpc_dec_ops(dev_id, queue_id,
 				&tp->dec_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_ENC)
 		deq = rte_bbdev_dequeue_ldpc_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 	else /*RTE_BBDEV_OP_TURBO_ENC*/
 		deq = rte_bbdev_dequeue_enc_ops(dev_id, queue_id,
 				&tp->enc_ops[
-					rte_atomic16_read(&tp->nb_dequeued)],
+					__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)],
 				burst_sz);
 
 	if (deq < burst_sz) {
 		printf(
 			"After receiving the interrupt all operations should be dequeued. Expected: %u, got: %u\n",
 			burst_sz, deq);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
-	if (rte_atomic16_read(&tp->nb_dequeued) + deq < num_ops) {
-		rte_atomic16_add(&tp->nb_dequeued, deq);
+	if (__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) + deq < num_ops) {
+		__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2713,7 +2713,7 @@ dequeue_event_callback(uint16_t dev_id,
 
 	if (ret) {
 		printf("Buffers validation failed\n");
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 	}
 
 	switch (test_vector.op_type) {
@@ -2734,7 +2734,7 @@ dequeue_event_callback(uint16_t dev_id,
 		break;
 	default:
 		printf("Unknown op type: %d\n", test_vector.op_type);
-		rte_atomic16_set(&tp->processing_status, TEST_FAILED);
+		__atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED);
 		return;
 	}
 
@@ -2743,7 +2743,7 @@ dequeue_event_callback(uint16_t dev_id,
 	tp->mbps += (((double)(num_ops * tb_len_bits)) / 1000000.0) /
 			((double)total_time / (double)rte_get_tsc_hz());
 
-	rte_atomic16_add(&tp->nb_dequeued, deq);
+	__atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED);
 }
 
 static int
@@ -2781,11 +2781,10 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2833,17 +2832,15 @@ throughput_intr_lcore_ldpc_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2878,11 +2875,10 @@ throughput_intr_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops,
 				num_to_process);
@@ -2923,17 +2919,15 @@ throughput_intr_lcore_dec(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -2968,11 +2962,10 @@ throughput_intr_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3012,17 +3005,15 @@ throughput_intr_lcore_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3058,11 +3049,10 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	rte_atomic16_clear(&tp->processing_status);
-	rte_atomic16_clear(&tp->nb_dequeued);
+	__atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops,
 			num_to_process);
@@ -3104,17 +3094,15 @@ throughput_intr_lcore_ldpc_enc(void *arg)
 			 * the number of operations is not a multiple of
 			 * burst size.
 			 */
-			rte_atomic16_set(&tp->burst_sz, num_to_enq);
+			__atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED);
 
 			/* Wait until processing of previous batch is
 			 * completed
 			 */
-			while (rte_atomic16_read(&tp->nb_dequeued) !=
-					(int16_t) enqueued)
-				rte_pause();
+			rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED);
 		}
 		if (j != TEST_REPETITIONS - 1)
-			rte_atomic16_clear(&tp->nb_dequeued);
+			__atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED);
 	}
 
 	return TEST_SUCCESS;
@@ -3148,8 +3136,7 @@ throughput_pmd_lcore_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3252,8 +3239,7 @@ bler_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3382,8 +3368,7 @@ throughput_pmd_lcore_ldpc_dec(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops);
 	TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops);
@@ -3499,8 +3484,7 @@ throughput_pmd_lcore_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3590,8 +3574,7 @@ throughput_pmd_lcore_ldpc_enc(void *arg)
 
 	bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id];
 
-	while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT)
-		rte_pause();
+	rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 
 	ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq,
 			num_ops);
@@ -3774,7 +3757,7 @@ bler_test(struct active_device *ad,
 	else
 		return TEST_SKIPPED;
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3797,7 +3780,7 @@ bler_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = bler_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3892,7 +3875,7 @@ throughput_test(struct active_device *ad,
 			throughput_function = throughput_pmd_lcore_enc;
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_WAIT);
+	__atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED);
 
 	/* Main core is set at first entry */
 	t_params[0].dev_id = ad->dev_id;
@@ -3915,7 +3898,7 @@ throughput_test(struct active_device *ad,
 				&t_params[used_cores++], lcore_id);
 	}
 
-	rte_atomic16_set(&op_params->sync, SYNC_START);
+	__atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED);
 	ret = throughput_function(&t_params[0]);
 
 	/* Main core is always used */
@@ -3945,29 +3928,29 @@ throughput_test(struct active_device *ad,
 	 * Wait for main lcore operations.
 	 */
 	tp = &t_params[0];
-	while ((rte_atomic16_read(&tp->nb_dequeued) <
-			op_params->num_to_process) &&
-			(rte_atomic16_read(&tp->processing_status) !=
-			TEST_FAILED))
+	while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+		op_params->num_to_process) &&
+		(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+		TEST_FAILED))
 		rte_pause();
 
 	tp->ops_per_sec /= TEST_REPETITIONS;
 	tp->mbps /= TEST_REPETITIONS;
-	ret |= (int)rte_atomic16_read(&tp->processing_status);
+	ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 
 	/* Wait for worker lcores operations */
 	for (used_cores = 1; used_cores < num_lcores; used_cores++) {
 		tp = &t_params[used_cores];
 
-		while ((rte_atomic16_read(&tp->nb_dequeued) <
-				op_params->num_to_process) &&
-				(rte_atomic16_read(&tp->processing_status) !=
-				TEST_FAILED))
+		while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) <
+			op_params->num_to_process) &&
+			(__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) !=
+			TEST_FAILED))
 			rte_pause();
 
 		tp->ops_per_sec /= TEST_REPETITIONS;
 		tp->mbps /= TEST_REPETITIONS;
-		ret |= (int)rte_atomic16_read(&tp->processing_status);
+		ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED);
 	}
 
 	/* Print throughput if test passed */
-- 
2.25.1
* [PATCH v2 12/12] app: remove unnecessary include of atomic header file
From: Joyce Kong @ 2021-11-16 9:42 UTC
  To: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Olivier Matz, Anatoly Burakov, Honnappa Nagarahalli,
	Konstantin Ananyev
  Cc: dev, nd, Joyce Kong, Ruifeng Wang

Remove the unnecessary rte_atomic.h included in app modules.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/proc-info/main.c         | 1 -
 app/test-pipeline/config.c   | 1 -
 app/test-pipeline/init.c     | 1 -
 app/test-pipeline/main.c     | 1 -
 app/test-pipeline/runtime.c  | 1 -
 app/test-pmd/cmdline.c       | 1 -
 app/test-pmd/config.c        | 1 -
 app/test-pmd/csumonly.c      | 1 -
 app/test-pmd/flowgen.c       | 1 -
 app/test-pmd/icmpecho.c      | 1 -
 app/test-pmd/iofwd.c         | 1 -
 app/test-pmd/macfwd.c        | 1 -
 app/test-pmd/macswap.c       | 1 -
 app/test-pmd/parameters.c    | 1 -
 app/test-pmd/rxonly.c        | 1 -
 app/test-pmd/txonly.c        | 1 -
 app/test/test_barrier.c      | 1 -
 app/test/test_mbuf.c         | 1 -
 app/test/test_mp_secondary.c | 1 -
 app/test/test_ring.c         | 1 -
 20 files changed, 20 deletions(-)

diff --git a/app/proc-info/main.c b/app/proc-info/main.c
index a4271047e6..ebe2d77264 100644
--- a/app/proc-info/main.c
+++ b/app/proc-info/main.c
@@ -27,7 +27,6 @@
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
 #include <rte_log.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_string_fns.h>
 #include <rte_metrics.h>
diff --git a/app/test-pipeline/config.c b/app/test-pipeline/config.c
index 33f3f1c827..daf838948b 100644
--- a/app/test-pipeline/config.c
+++ b/app/test-pipeline/config.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/init.c b/app/test-pipeline/init.c
index c738019041..eee0719b67 100644
--- a/app/test-pipeline/init.c
+++ b/app/test-pipeline/init.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/main.c b/app/test-pipeline/main.c
index 72e4797ff2..1e16794183 100644
--- a/app/test-pipeline/main.c
+++ b/app/test-pipeline/main.c
@@ -22,7 +22,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_lcore.h>
diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c
index 159192bcd8..d939a85d7e 100644
--- a/app/test-pipeline/runtime.c
+++ b/app/test-pipeline/runtime.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_prefetch.h>
 #include <rte_branch_prediction.h>
diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index 4f51b259fe..4e93f535ff 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c
index 26cadf39f7..d8b5032b58 100644
--- a/app/test-pmd/config.c
+++ b/app/test-pmd/config.c
@@ -27,7 +27,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8526d9158a..e0b00abe8c 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index 5737eaa105..9ceef3b54a 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
index 8f1d68a83a..3a85ec3dd1 100644
--- a/app/test-pmd/icmpecho.c
+++ b/app/test-pmd/icmpecho.c
@@ -20,7 +20,6 @@
 #include <rte_cycles.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memory.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 83d098adcb..19cd920f70 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -23,7 +23,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memcpy.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index ac50d0b9f8..812a0c721f 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 310bca06af..4627ff83e9 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 0974b0a38f..2f4f944efa 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -30,7 +30,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_interrupts.h>
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index c78fc4609a..d1a579d8d8 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 34bb538379..b8497e733d 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test/test_barrier.c b/app/test/test_barrier.c
index c27f8a0742..898c2516ed 100644
--- a/app/test/test_barrier.c
+++ b/app/test/test_barrier.c
@@ -24,7 +24,6 @@
 #include <rte_memory.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
 #include <rte_pause.h>
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index f93bcef8a9..d53126710f 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test/test_mp_secondary.c b/app/test/test_mp_secondary.c
index 5b6f05dbb1..021ca0547f 100644
--- a/app/test/test_mp_secondary.c
+++ b/app/test/test_mp_secondary.c
@@ -28,7 +28,6 @@
 #include <rte_lcore.h>
 #include <rte_errno.h>
 #include <rte_branch_prediction.h>
-#include <rte_atomic.h>
 #include <rte_ring.h>
 #include <rte_debug.h>
 #include <rte_log.h>
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fb8532a409..bde33ab4a1 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
-- 
2.25.1
* Re: [PATCH v2 12/12] app: remove unnecessary include of atomic header file
From: David Marchand @ 2021-11-16 20:23 UTC
  To: Joyce Kong
  Cc: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Olivier Matz, Anatoly Burakov, Honnappa Nagarahalli,
	Konstantin Ananyev, dev, nd, Ruifeng Wang

On Tue, Nov 16, 2021 at 10:44 AM Joyce Kong <joyce.kong@arm.com> wrote:
>
> Remove the unnecessary rte_atomic.h included in app modules.
>
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

After patch, I still see:

$ git grep rte_atomic.h app/
app/test/commands.c:#include <rte_atomic.h>
app/test/test_atomic.c:#include <rte_atomic.h>
app/test/test_event_timer_adapter.c:#include <rte_atomic.h>

I can understand why the test_atomic would depend on rte_atomic.h :-) but
not the rest.
Is there a reason? or is it just a miss?


-- 
David Marchand
* RE: [PATCH v2 12/12] app: remove unnecessary include of atomic header file
From: Joyce Kong @ 2021-11-17 7:05 UTC
  To: David Marchand
  Cc: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li,
	Olivier Matz, Anatoly Burakov, Honnappa Nagarahalli,
	Konstantin Ananyev, dev, nd, Ruifeng Wang

<snip>

> Subject: Re: [PATCH v2 12/12] app: remove unnecessary include of atomic
> header file
>
> On Tue, Nov 16, 2021 at 10:44 AM Joyce Kong <joyce.kong@arm.com> wrote:
> >
> > Remove the unnecessary rte_atomic.h included in app modules.
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
>
> After patch, I still see:
>
> $ git grep rte_atomic.h app/
> app/test/commands.c:#include <rte_atomic.h>
> app/test/test_atomic.c:#include <rte_atomic.h>
> app/test/test_event_timer_adapter.c:#include <rte_atomic.h>
>
> I can understand why the test_atomic would depend on rte_atomic.h :-) but
> not the rest.
> Is there a reason? or is it just a miss?
>
> --
> David Marchand

Hi David,

I checked the rest and it was a miss. Thanks for the reminder; I will update this in v3.

Joyce
* [PATCH v3 00/12] use compiler atomic builtins for app modules
From: Joyce Kong @ 2021-11-17 8:21 UTC
  Cc: dev, honnappa.nagarahalli, nd, Joyce Kong

Since DPDK has adopted the C11 memory model[1], replace the
rte_atomicNN_xxx APIs with compiler atomic built-ins in app modules[2].

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
[2] https://doc.dpdk.org/guides/rel_notes/deprecation.html

v3:
1. In the pmd_perf test case, initialize the polling start flag before
calling rte_eal_remote_launch, so that the update is visible to the
worker thread. (Honnappa Nagarahalli)
2. Remove the remaining rte_atomic.h includes missed in v2. (David Marchand)

v2:
By Honnappa Nagarahalli:
1. Replace the RELAXED memory orderings with suitable ones for shared
data sync in the pmd_perf and timer test cases.
2. Avoid unnecessary atomic operations in the compress and testpmd modules.
3. Fix some typos.

Joyce Kong (12): test/pmd_perf: use compiler atomic builtins for polling sync test/ring_perf: use compiler atomic builtins for lcores sync test/timer: use compiler atomic builtins for sync test/stack_perf: use compiler atomics for lcore sync test/bpf: use compiler atomics for calculation test/func_reentrancy: use compiler atomics for data sync app/eventdev: use compiler atomics for shared data sync app/crypto: use compiler atomic builtins for display sync app/compress: use compiler atomic builtins for display sync app/testpmd: remove atomic operations for port status app/bbdev: use compiler atomics for shared data sync app: remove unnecessary include of atomic header file app/proc-info/main.c | 1 - app/test-bbdev/test_bbdev_perf.c | 135 ++++++++---------- .../comp_perf_test_common.h | 2 +- .../comp_perf_test_cyclecount.c | 15 +- .../comp_perf_test_throughput.c | 10 +- .../comp_perf_test_verify.c | 6 +- app/test-crypto-perf/cperf_test_latency.c | 6 +- .../cperf_test_pmd_cyclecount.c | 9 +- app/test-crypto-perf/cperf_test_throughput.c | 9 +- app/test-crypto-perf/cperf_test_verify.c | 9 +- app/test-eventdev/evt_main.c | 1 - app/test-eventdev/test_order_atq.c | 4 +- app/test-eventdev/test_order_common.c | 4 +- app/test-eventdev/test_order_common.h | 8 +- app/test-eventdev/test_order_queue.c | 4 +- app/test-pipeline/config.c | 1 - app/test-pipeline/init.c | 1 - app/test-pipeline/main.c | 1 - app/test-pipeline/runtime.c | 1 - app/test-pmd/cmdline.c | 1 - app/test-pmd/config.c | 1 - app/test-pmd/csumonly.c | 1 - app/test-pmd/flowgen.c | 1 - app/test-pmd/icmpecho.c | 1 - app/test-pmd/iofwd.c | 1 - app/test-pmd/macfwd.c | 1 - app/test-pmd/macswap.c | 1 - app/test-pmd/parameters.c | 1 - app/test-pmd/rxonly.c | 1 - app/test-pmd/testpmd.c | 58 ++++---- app/test-pmd/txonly.c | 1 - app/test/commands.c | 1 - app/test/test_barrier.c | 1 - app/test/test_bpf.c | 28 ++-- app/test/test_event_timer_adapter.c | 1 - app/test/test_func_reentrancy.c | 27 ++-- app/test/test_mbuf.c | 1 - 
app/test/test_mp_secondary.c | 1 - app/test/test_pmd_perf.c | 23 +-- app/test/test_ring.c | 1 - app/test/test_ring_perf.c | 9 +- app/test/test_stack_perf.c | 14 +- app/test/test_timer.c | 30 ++-- app/test/test_timer_secondary.c | 1 - 44 files changed, 203 insertions(+), 231 deletions(-) -- 2.25.1 ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync

From: Joyce Kong @ 2021-11-17 8:21 UTC
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins for polling
sync in pmd_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_pmd_perf.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/app/test/test_pmd_perf.c b/app/test/test_pmd_perf.c
index 1df86ce080..a6bac9d45e 100644
--- a/app/test/test_pmd_perf.c
+++ b/app/test/test_pmd_perf.c
@@ -10,7 +10,6 @@
 #include <rte_cycles.h>
 #include <rte_ethdev.h>
 #include <rte_byteorder.h>
-#include <rte_atomic.h>
 #include <rte_malloc.h>
 #include "packet_burst_generator.h"
 #include "test.h"
@@ -525,7 +524,7 @@ main_loop(__rte_unused void *args)
 	return 0;
 }
 
-static rte_atomic64_t start;
+static uint64_t start;
 
 static inline int
 poll_burst(void *args)
@@ -563,8 +562,7 @@ poll_burst(void *args)
 		num[portid] = pkt_per_port;
 	}
 
-	while (!rte_atomic64_read(&start))
-		;
+	rte_wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE);
 
 	cur_tsc = rte_rdtsc();
 	while (total) {
@@ -616,16 +614,19 @@ exec_burst(uint32_t flags, int lcore)
 	pkt_per_port = MAX_TRAFFIC_BURST;
 	num = pkt_per_port * conf->nb_ports;
 
-	rte_atomic64_init(&start);
+	/* only when polling first */
+	if (flags == SC_BURST_POLL_FIRST)
+		__atomic_store_n(&start, 1, __ATOMIC_RELAXED);
+	else
+		__atomic_store_n(&start, 0, __ATOMIC_RELAXED);
 
-	/* start polling thread, but not actually poll yet */
+	/* start polling thread
+	 * if in POLL_FIRST mode, poll once launched;
+	 * otherwise, not actually poll yet
+	 */
 	rte_eal_remote_launch(poll_burst,
 		(void *)&pkt_per_port, lcore);
 
-	/* Only when polling first */
-	if (flags == SC_BURST_POLL_FIRST)
-		rte_atomic64_set(&start, 1);
-
 	/* start xmit */
 	i = 0;
 	while (num) {
@@ -641,7 +642,7 @@ exec_burst(uint32_t flags, int lcore)
 
 	/* only when polling second */
 	if (flags == SC_BURST_XMIT_FIRST)
-		rte_atomic64_set(&start, 1);
+		__atomic_store_n(&start, 1, __ATOMIC_RELEASE);
 
 	/* wait for polling finished */
 	diff_tsc = rte_eal_wait_lcore(lcore);
-- 
2.25.1
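The XMIT_FIRST path above relies on a release/acquire pair: the main lcore stores `start = 1` with `__ATOMIC_RELEASE` after transmitting, and the poller waits with `__ATOMIC_ACQUIRE`, so everything written before the release store is visible to the poller once it sees the flag. A minimal sketch of that handoff, assuming POSIX threads in place of EAL lcores and a plain spin loop in place of `rte_wait_until_equal_64()`; `start_flag`, `payload`, `poller` and `run_xmit_first` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t start_flag;   /* plays the role of 'start' in test_pmd_perf.c */
static uint64_t payload;      /* plain data written by main before releasing */

/* Poller: spin until start_flag == 1 with ACQUIRE ordering, a portable
 * stand-in for rte_wait_until_equal_64(&start, 1, __ATOMIC_ACQUIRE). */
static void *poller(void *arg)
{
	uint64_t *seen = arg;

	while (__atomic_load_n(&start_flag, __ATOMIC_ACQUIRE) != 1)
		;  /* DPDK code would call rte_pause() here */
	*seen = payload;  /* ordered after main's write by the acquire/release pair */
	return NULL;
}

/* "XMIT_FIRST" order: write the data, then release the poller.
 * Returns the payload value the poller observed. */
static uint64_t run_xmit_first(uint64_t value)
{
	pthread_t t;
	uint64_t seen = 0;
	int rc;

	/* Thread creation orders this store, so RELAXED is enough here. */
	__atomic_store_n(&start_flag, 0, __ATOMIC_RELAXED);
	rc = pthread_create(&t, NULL, poller, &seen);
	assert(rc == 0);

	payload = value;                                     /* plain write */
	__atomic_store_n(&start_flag, 1, __ATOMIC_RELEASE);  /* publish it */

	rc = pthread_join(t, NULL);
	assert(rc == 0);
	(void)rc;
	return seen;
}
```

In POLL_FIRST mode the patch stores `start = 1` with RELAXED ordering before `rte_eal_remote_launch()`; the sketch's RELAXED reset before `pthread_create()` leans on the same idea, namely that launching the thread already orders earlier writes.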
* [PATCH v3 02/12] test/ring_perf: use compiler atomic builtins for lcores sync

From: Joyce Kong @ 2021-11-17 8:21 UTC
To: Honnappa Nagarahalli, Konstantin Ananyev
Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins for lcores
sync in ring_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_ring_perf.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_ring_perf.c b/app/test/test_ring_perf.c
index fd82e20412..2d8bb675a3 100644
--- a/app/test/test_ring_perf.c
+++ b/app/test/test_ring_perf.c
@@ -320,7 +320,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_ring *r, const int esize)
 	return 0;
 }
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 static uint64_t queue_count[RTE_MAX_LCORE];
 
 #define TIME_MS 100
@@ -342,8 +342,7 @@ load_loop_fn_helper(struct thread_params *p, const int esize)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			rte_pause();
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (time_diff < hz * TIME_MS / 1000) {
@@ -398,12 +397,12 @@ run_on_all_cores(struct rte_ring *r, const int esize)
 		param.r = r;
 
 		/* clear synchro and start workers */
-		rte_atomic32_set(&synchro, 0);
+		__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 		if (rte_eal_mp_remote_launch(lcore_f, &param, SKIP_MAIN) < 0)
 			return -1;
 
 		/* start synchro and launch test on main */
-		rte_atomic32_set(&synchro, 1);
+		__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 		lcore_f(&param);
 
 		rte_eal_mp_wait_lcore();
-- 
2.25.1
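In ring_perf the `synchro` flag only tells the workers when to begin; it publishes no data, which is why RELAXED ordering is used on both sides. The sketch below assumes a portable spin helper standing in for `rte_wait_until_equal_32()` (the name `wait_until_equal_32`, the pthread-based launch and `run_gate` are hypothetical) and checks that every worker gets past the gate once it is opened.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical portable equivalent of rte_wait_until_equal_32(): spin until
 * *addr == expected, loading with the caller-chosen memory order. This sketch
 * only accepts the two orders used in this series. */
static inline void
wait_until_equal_32(uint32_t *addr, uint32_t expected, int memorder)
{
	assert(memorder == __ATOMIC_ACQUIRE || memorder == __ATOMIC_RELAXED);
	while (__atomic_load_n(addr, memorder) != expected)
		;  /* rte_pause() in DPDK code */
}

static uint32_t synchro;   /* start gate, as in test_ring_perf.c */
static uint32_t started;   /* how many workers passed the gate */

static void *worker(void *arg)
{
	(void)arg;
	/* RELAXED suffices: the gate says "go" but hands over no data. */
	wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
	__atomic_fetch_add(&started, 1, __ATOMIC_RELAXED);
	return NULL;
}

static uint32_t run_gate(int nb_workers)
{
	pthread_t t[8];
	int i, rc = 0;

	assert(nb_workers > 0 && nb_workers <= 8);
	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);  /* clear gate */
	__atomic_store_n(&started, 0, __ATOMIC_RELAXED);
	for (i = 0; i < nb_workers; i++) {
		rc = pthread_create(&t[i], NULL, worker, NULL);
		assert(rc == 0);
	}
	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);  /* open gate */
	for (i = 0; i < nb_workers; i++) {
		rc = pthread_join(t[i], NULL);
		assert(rc == 0);
	}
	(void)rc;
	return __atomic_load_n(&started, __ATOMIC_RELAXED);
}
```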
* [PATCH v3 03/12] test/timer: use compiler atomic builtins for sync

From: Joyce Kong @ 2021-11-17 8:21 UTC
To: Robert Sanford, Erik Gabriel Carrillo
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins for lcore_state
and collisions sync.

Also, move 'main_init_workers' out of 'timer_stress2_main_loop' so that
lcore_state is guaranteed to be initialized before the worker threads
are launched.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_timer.c | 30 +++++++++++++-----------------
 app/test/test_timer_secondary.c | 1 -
 2 files changed, 13 insertions(+), 18 deletions(-)

diff --git a/app/test/test_timer.c b/app/test/test_timer.c
index a10b2fe9da..c97e5c891c 100644
--- a/app/test/test_timer.c
+++ b/app/test/test_timer.c
@@ -102,7 +102,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_random.h>
 #include <rte_malloc.h>
@@ -203,7 +202,7 @@ timer_stress_main_loop(__rte_unused void *arg)
 
 /* Need to synchronize worker lcores through multiple steps. */
 enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };
-static rte_atomic16_t lcore_state[RTE_MAX_LCORE];
+static uint16_t lcore_state[RTE_MAX_LCORE];
 
 static void
 main_init_workers(void)
@@ -211,7 +210,7 @@ main_init_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_WAITING);
+		__atomic_store_n(&lcore_state[i], WORKER_WAITING, __ATOMIC_RELAXED);
 	}
 }
 
@@ -221,11 +220,10 @@ main_start_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		rte_atomic16_set(&lcore_state[i], WORKER_RUN_SIGNAL);
+		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
 	}
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_RUNNING)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_RUNNING, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -235,8 +233,7 @@ main_wait_for_workers(void)
 	unsigned i;
 
 	RTE_LCORE_FOREACH_WORKER(i) {
-		while (rte_atomic16_read(&lcore_state[i]) != WORKER_FINISHED)
-			rte_pause();
+		rte_wait_until_equal_16(&lcore_state[i], WORKER_FINISHED, __ATOMIC_ACQUIRE);
 	}
 }
 
@@ -245,9 +242,8 @@ worker_wait_to_start(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	while (rte_atomic16_read(&lcore_state[lcore_id]) != WORKER_RUN_SIGNAL)
-		rte_pause();
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_RUNNING);
+	rte_wait_until_equal_16(&lcore_state[lcore_id], WORKER_RUN_SIGNAL, __ATOMIC_ACQUIRE);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_RUNNING, __ATOMIC_RELEASE);
 }
 
 static void
@@ -255,7 +251,7 @@ worker_finish(void)
 {
 	unsigned lcore_id = rte_lcore_id();
 
-	rte_atomic16_set(&lcore_state[lcore_id], WORKER_FINISHED);
+	__atomic_store_n(&lcore_state[lcore_id], WORKER_FINISHED, __ATOMIC_RELEASE);
 }
 
 
@@ -281,13 +277,12 @@ timer_stress2_main_loop(__rte_unused void *arg)
 	unsigned int lcore_id = rte_lcore_id();
 	unsigned int main_lcore = rte_get_main_lcore();
 	int32_t my_collisions = 0;
-	static rte_atomic32_t collisions;
+	static uint32_t collisions;
 
 	if (lcore_id == main_lcore) {
 		cb_count = 0;
 		test_failed = 0;
-		rte_atomic32_set(&collisions, 0);
-		main_init_workers();
+		__atomic_store_n(&collisions, 0, __ATOMIC_RELAXED);
 		timers = rte_malloc(NULL, sizeof(*timers) * NB_STRESS2_TIMERS, 0);
 		if (timers == NULL) {
 			printf("Test Failed\n");
@@ -315,7 +310,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 			my_collisions++;
 	}
 	if (my_collisions != 0)
-		rte_atomic32_add(&collisions, my_collisions);
+		__atomic_fetch_add(&collisions, my_collisions, __ATOMIC_RELAXED);
 
 	/* wait long enough for timers to expire */
 	rte_delay_ms(100);
@@ -329,7 +324,7 @@ timer_stress2_main_loop(__rte_unused void *arg)
 
 	/* now check that we get the right number of callbacks */
 	if (lcore_id == main_lcore) {
-		my_collisions = rte_atomic32_read(&collisions);
+		my_collisions = __atomic_load_n(&collisions, __ATOMIC_RELAXED);
 		if (my_collisions != 0)
 			printf("- %d timer reset collisions (OK)\n", my_collisions);
 		rte_timer_manage();
@@ -573,6 +568,7 @@ test_timer(void)
 	/* run a second, slightly different set of stress tests */
 	printf("\nStart timer stress tests 2\n");
 	test_failed = 0;
+	main_init_workers();
 	rte_eal_mp_remote_launch(timer_stress2_main_loop, NULL, CALL_MAIN);
 	rte_eal_mp_wait_lcore();
 	if (test_failed)
diff --git a/app/test/test_timer_secondary.c b/app/test/test_timer_secondary.c
index 16a9f1878b..5795c97f07 100644
--- a/app/test/test_timer_secondary.c
+++ b/app/test/test_timer_secondary.c
@@ -9,7 +9,6 @@
 #include <rte_lcore.h>
 #include <rte_debug.h>
 #include <rte_memzone.h>
-#include <rte_atomic.h>
 #include <rte_timer.h>
 #include <rte_cycles.h>
 #include <rte_mempool.h>
-- 
2.25.1
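The timer test uses `lcore_state` as a per-worker handshake: a RELEASE store of a new state publishes whatever was written before it, and the ACQUIRE wait on the other side makes those writes visible. The sketch below is a reduced, pthread-based version. It skips the WORKER_RUNNING acknowledgement, and `input`, `output`, `wait_state` and `run_handshake` are hypothetical names. Like the relocated `main_init_workers()` call, it sets every state to WORKER_WAITING before any worker is started.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

enum { WORKER_WAITING = 1, WORKER_RUN_SIGNAL, WORKER_RUNNING, WORKER_FINISHED };

#define NB_WORKERS 3
static uint16_t lcore_state[NB_WORKERS];
static int input[NB_WORKERS];    /* written by main before RUN_SIGNAL */
static int output[NB_WORKERS];   /* written by a worker before FINISHED */

/* Stand-in for rte_wait_until_equal_16(s, want, __ATOMIC_ACQUIRE). */
static void wait_state(uint16_t *s, uint16_t want)
{
	while (__atomic_load_n(s, __ATOMIC_ACQUIRE) != want)
		;
}

static void *worker(void *arg)
{
	int id = (int)(intptr_t)arg;

	/* Like worker_wait_to_start(): ACQUIRE pairs with main's RELEASE,
	 * so input[id] is visible here. */
	wait_state(&lcore_state[id], WORKER_RUN_SIGNAL);
	output[id] = input[id] * 2;
	/* Like worker_finish(): RELEASE publishes output[id] to main. */
	__atomic_store_n(&lcore_state[id], WORKER_FINISHED, __ATOMIC_RELEASE);
	return NULL;
}

static int run_handshake(void)
{
	pthread_t t[NB_WORKERS];
	int i, rc = 0, sum = 0;

	/* Initialize before launching, so no worker can see a stale state. */
	for (i = 0; i < NB_WORKERS; i++)
		__atomic_store_n(&lcore_state[i], WORKER_WAITING, __ATOMIC_RELAXED);
	for (i = 0; i < NB_WORKERS; i++) {
		rc = pthread_create(&t[i], NULL, worker, (void *)(intptr_t)i);
		assert(rc == 0);
	}
	for (i = 0; i < NB_WORKERS; i++) {   /* main_start_workers() */
		input[i] = i + 1;
		__atomic_store_n(&lcore_state[i], WORKER_RUN_SIGNAL, __ATOMIC_RELEASE);
	}
	for (i = 0; i < NB_WORKERS; i++) {   /* main_wait_for_workers() */
		wait_state(&lcore_state[i], WORKER_FINISHED);
		sum += output[i];
	}
	for (i = 0; i < NB_WORKERS; i++) {
		rc = pthread_join(t[i], NULL);
		assert(rc == 0);
	}
	(void)rc;
	return sum;
}
```

With inputs 1, 2 and 3 the workers return 2, 4 and 6, so `run_handshake()` yields 12 on every run.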
* [PATCH v3 04/12] test/stack_perf: use compiler atomics for lcore sync

From: Joyce Kong @ 2021-11-17 8:21 UTC
To: Olivier Matz
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins for lcore sync
in stack_perf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_stack_perf.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_stack_perf.c b/app/test/test_stack_perf.c
index 4ee40d5d19..1eae00a334 100644
--- a/app/test/test_stack_perf.c
+++ b/app/test/test_stack_perf.c
@@ -6,7 +6,6 @@
 #include <stdio.h>
 #include <inttypes.h>
 
-#include <rte_atomic.h>
 #include <rte_cycles.h>
 #include <rte_launch.h>
 #include <rte_pause.h>
@@ -24,7 +23,7 @@
  */
 static volatile unsigned int bulk_sizes[] = {8, MAX_BURST};
 
-static rte_atomic32_t lcore_barrier;
+static uint32_t lcore_barrier;
 
 struct lcore_pair {
 	unsigned int c1;
@@ -144,9 +143,8 @@ bulk_push_pop(void *p)
 	s = args->s;
 	size = args->sz;
 
-	rte_atomic32_sub(&lcore_barrier, 1);
-	while (rte_atomic32_read(&lcore_barrier) != 0)
-		rte_pause();
+	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
+	rte_wait_until_equal_32(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	uint64_t start = rte_rdtsc();
 
@@ -175,7 +173,7 @@ run_on_core_pair(struct lcore_pair *cores, struct rte_stack *s,
 	unsigned int i;
 
 	for (i = 0; i < RTE_DIM(bulk_sizes); i++) {
-		rte_atomic32_set(&lcore_barrier, 2);
+		__atomic_store_n(&lcore_barrier, 2, __ATOMIC_RELAXED);
 
 		args[0].sz = args[1].sz = bulk_sizes[i];
 		args[0].s = args[1].s = s;
@@ -208,7 +206,7 @@ run_on_n_cores(struct rte_stack *s, lcore_function_t fn, int n)
 		int cnt = 0;
 		double avg;
 
-		rte_atomic32_set(&lcore_barrier, n);
+		__atomic_store_n(&lcore_barrier, n, __ATOMIC_RELAXED);
 
 		RTE_LCORE_FOREACH_WORKER(lcore_id) {
 			if (++cnt >= n)
@@ -302,7 +300,7 @@ __test_stack_perf(uint32_t flags)
 	struct lcore_pair cores;
 	struct rte_stack *s;
 
-	rte_atomic32_init(&lcore_barrier);
+	__atomic_store_n(&lcore_barrier, 0, __ATOMIC_RELAXED);
 
 	s = rte_stack_create(STACK_NAME, STACK_SIZE, rte_socket_id(), flags);
 	if (s == NULL) {
-- 
2.25.1
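stack_perf's `lcore_barrier` works as a single-use counting barrier: it is armed with the number of participants, each participant decrements it once, and nobody proceeds until it reaches zero. A sketch under the assumption that pthreads replace lcores (`bulk_worker`, `passed` and `run_barrier` are hypothetical names). The counter is only re-armed after all participants have been joined, because re-arming while a thread is still spinning could strand that thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t lcore_barrier;   /* as in test_stack_perf.c */
static uint32_t passed;          /* threads that got through the barrier */

static void *bulk_worker(void *arg)
{
	(void)arg;
	/* Arrive: one fewer participant still to come. */
	__atomic_fetch_sub(&lcore_barrier, 1, __ATOMIC_RELAXED);
	/* Depart only once everybody has arrived (counter reached zero);
	 * rte_wait_until_equal_32(&lcore_barrier, 0, __ATOMIC_RELAXED)
	 * in the patch. */
	while (__atomic_load_n(&lcore_barrier, __ATOMIC_RELAXED) != 0)
		;
	__atomic_fetch_add(&passed, 1, __ATOMIC_RELAXED);
	return NULL;
}

static uint32_t run_barrier(uint32_t n)
{
	pthread_t t[8];
	uint32_t i;
	int rc = 0;

	assert(n > 0 && n <= 8);
	/* Arm before launching, like the store of 2 in run_on_core_pair(). */
	__atomic_store_n(&lcore_barrier, n, __ATOMIC_RELAXED);
	__atomic_store_n(&passed, 0, __ATOMIC_RELAXED);
	for (i = 0; i < n; i++) {
		rc = pthread_create(&t[i], NULL, bulk_worker, NULL);
		assert(rc == 0);
	}
	for (i = 0; i < n; i++) {
		rc = pthread_join(t[i], NULL);
		assert(rc == 0);
	}
	(void)rc;
	return __atomic_load_n(&passed, __ATOMIC_RELAXED);
}
```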
* [PATCH v3 05/12] test/bpf: use compiler atomics for calculation

From: Joyce Kong @ 2021-11-17 8:21 UTC
To: Konstantin Ananyev
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins for calculation
in bpf test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_bpf.c | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/app/test/test_bpf.c b/app/test/test_bpf.c
index e3e9a1b0b5..b8be1e3d30 100644
--- a/app/test/test_bpf.c
+++ b/app/test/test_bpf.c
@@ -1569,32 +1569,32 @@ test_xadd1_check(uint64_t rc, const void *arg)
 	memset(&dfe, 0, sizeof(dfe));
 
 	rv = 1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = -1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = (int32_t)TEST_FILL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_1;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_MUL_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_2;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	rv = TEST_JCC_3;
-	rte_atomic32_add((rte_atomic32_t *)&dfe.u32, rv);
-	rte_atomic64_add((rte_atomic64_t *)&dfe.u64, rv);
+	__atomic_fetch_add(&dfe.u32, rv, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&dfe.u64, rv, __ATOMIC_RELAXED);
 
 	return cmp_res(__func__, 1, rc, &dfe, dft, sizeof(dfe));
 }
-- 
2.25.1
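Because `__atomic_fetch_add()` works on any suitably aligned integer object, the casts to `rte_atomic32_t *` and `rte_atomic64_t *` are no longer needed. The delta is converted to the object's own unsigned type, so a negative `rv` wraps modulo 2^32 or 2^64 rather than needing a separate subtract. A small sketch with a hypothetical `struct counters` standing in for the test's `dfe` fields, and a signed 64-bit delta as an assumption (the hunk does not show how the test declares `rv`):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the u32/u64 fields of the test's struct. */
struct counters {
	uint32_t u32;
	uint64_t u64;
};

/* Add a signed delta to both fields. The builtin converts rv to each
 * field's unsigned type, so negative deltas wrap modulo 2^32 / 2^64. */
static void xadd_both(struct counters *c, int64_t rv)
{
	__atomic_fetch_add(&c->u32, rv, __ATOMIC_RELAXED);
	__atomic_fetch_add(&c->u64, rv, __ATOMIC_RELAXED);
}
```

For example, starting from zero, adding -5 leaves `u32 == UINT32_MAX - 4` and `u64 == UINT64_MAX - 4`, and a following +10 brings both back to 5.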
* [PATCH v3 06/12] test/func_reentrancy: use compiler atomics for data sync

From: Joyce Kong @ 2021-11-17 8:21 UTC
To: Olivier Matz, Andrew Rybchenko, Bruce Richardson, Vladimir Medvedkin,
    Honnappa Nagarahalli, Konstantin Ananyev, Anatoly Burakov, Yipeng Wang,
    Sameh Gobriel
Cc: dev, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins for shared data
sync in func_reentrancy test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_func_reentrancy.c | 27 +++++++++++++--------------
 1 file changed, 13 insertions(+), 14 deletions(-)

diff --git a/app/test/test_func_reentrancy.c b/app/test/test_func_reentrancy.c
index 838ab6f0f9..7825c6cb86 100644
--- a/app/test/test_func_reentrancy.c
+++ b/app/test/test_func_reentrancy.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
@@ -54,12 +53,12 @@ typedef void (*case_clean_t)(unsigned lcore_id);
 
 #define MAX_LCORES (RTE_MAX_MEMZONE / (MAX_ITER_MULTI * 4U))
 
-static rte_atomic32_t obj_count = RTE_ATOMIC32_INIT(0);
-static rte_atomic32_t synchro = RTE_ATOMIC32_INIT(0);
+static uint32_t obj_count;
+static uint32_t synchro;
 
 #define WAIT_SYNCHRO_FOR_WORKERS() do { \
 	if (lcore_self != rte_get_main_lcore()) \
-		while (rte_atomic32_read(&synchro) == 0); \
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED); \
 } while(0)
 
 /*
@@ -72,7 +71,7 @@ test_eal_init_once(__rte_unused void *arg)
 
 	WAIT_SYNCHRO_FOR_WORKERS();
 
-	rte_atomic32_set(&obj_count, 1); /* silent the check in the caller */
+	__atomic_store_n(&obj_count, 1, __ATOMIC_RELAXED); /* silent the check in the caller */
 	if (rte_eal_init(0, NULL) != -1)
 		return -1;
 
@@ -116,7 +115,7 @@ ring_create_lookup(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		rp = rte_ring_create("fr_test_once", 4096, SOCKET_ID_ANY, 0);
 		if (rp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -183,7 +182,7 @@ mempool_create_lookup(__rte_unused void *arg)
 					my_obj_init, NULL,
 					SOCKET_ID_ANY, 0);
 		if (mp != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create/lookup new ring several times */
@@ -250,7 +249,7 @@ hash_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_hash_create(&hash_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple times simultaneously */
@@ -318,7 +317,7 @@ fbk_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		handle = rte_fbk_hash_create(&fbk_params);
 		if (handle != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -384,7 +383,7 @@ lpm_create_free(__rte_unused void *arg)
 	for (i = 0; i < MAX_ITER_ONCE; i++) {
 		lpm = rte_lpm_create("fr_test_once", SOCKET_ID_ANY, &config);
 		if (lpm != NULL)
-			rte_atomic32_inc(&obj_count);
+			__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
 	}
 
 	/* create mutiple fbk tables simultaneously */
@@ -445,8 +444,8 @@ launch_test(struct test_case *pt_case)
 	if (pt_case->func == NULL)
 		return -1;
 
-	rte_atomic32_set(&obj_count, 0);
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 
 	cores = RTE_MIN(rte_lcore_count(), MAX_LCORES);
 	RTE_LCORE_FOREACH_WORKER(lcore_id) {
@@ -456,7 +455,7 @@ launch_test(struct test_case *pt_case)
 		rte_eal_remote_launch(pt_case->func, pt_case->arg, lcore_id);
 	}
 
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 
 	if (pt_case->func(pt_case->arg) < 0)
 		ret = -1;
@@ -471,7 +470,7 @@ launch_test(struct test_case *pt_case)
 			pt_case->clean(lcore_id);
 	}
 
-	count = rte_atomic32_read(&obj_count);
+	count = __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
 	if (count != 1) {
 		printf("%s: common object allocated %d times (should be 1)\n",
 			pt_case->name, count);
-- 
2.25.1
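The func_reentrancy cases release all lcores through `synchro` at once and then check that `obj_count` is exactly 1, meaning only one caller managed to create the uniquely named object. The sketch below reproduces that accounting with pthreads and a hypothetical `create_once()` that succeeds only for its first caller (a compare-and-swap on a flag, standing in for the name check inside `rte_ring_create()` and similar calls); `racer` and `launch_racers` are likewise illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t obj_count;
static uint32_t synchro;
static uint32_t name_taken;   /* hypothetical "fr_test_once" registry slot */

/* Hypothetical create(): succeeds only for the first caller, like creating
 * an object whose name must be unique. */
static int create_once(void)
{
	uint32_t expected = 0;

	return __atomic_compare_exchange_n(&name_taken, &expected, 1, 0,
					   __ATOMIC_ACQ_REL, __ATOMIC_RELAXED);
}

static void *racer(void *arg)
{
	(void)arg;
	while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
		;  /* WAIT_SYNCHRO_FOR_WORKERS() */
	if (create_once())
		__atomic_fetch_add(&obj_count, 1, __ATOMIC_RELAXED);
	return NULL;
}

static uint32_t launch_racers(int n)
{
	pthread_t t[8];
	int i, rc = 0;

	assert(n > 0 && n <= 8);
	__atomic_store_n(&obj_count, 0, __ATOMIC_RELAXED);
	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
	__atomic_store_n(&name_taken, 0, __ATOMIC_RELAXED);
	for (i = 0; i < n; i++) {
		rc = pthread_create(&t[i], NULL, racer, NULL);
		assert(rc == 0);
	}
	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);  /* release all at once */
	for (i = 0; i < n; i++) {
		rc = pthread_join(t[i], NULL);
		assert(rc == 0);
	}
	(void)rc;
	/* RELAXED is enough: in this sketch pthread_join() already orders the
	 * racers' increments before this load. */
	return __atomic_load_n(&obj_count, __ATOMIC_RELAXED);
}
```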
* [PATCH v3 07/12] app/eventdev: use compiler atomics for shared data sync

From: Joyce Kong @ 2021-11-17 8:21 UTC
To: Jerin Jacob
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins for shared data
sync in eventdev cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test-eventdev/evt_main.c | 1 -
 app/test-eventdev/test_order_atq.c | 4 ++--
 app/test-eventdev/test_order_common.c | 4 ++--
 app/test-eventdev/test_order_common.h | 8 ++++----
 app/test-eventdev/test_order_queue.c | 4 ++--
 5 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/app/test-eventdev/evt_main.c b/app/test-eventdev/evt_main.c
index 3534aabca7..194c980c7a 100644
--- a/app/test-eventdev/evt_main.c
+++ b/app/test-eventdev/evt_main.c
@@ -6,7 +6,6 @@
 #include <unistd.h>
 #include <signal.h>
 
-#include <rte_atomic.h>
 #include <rte_debug.h>
 #include <rte_eal.h>
 #include <rte_eventdev.h>
diff --git a/app/test-eventdev/test_order_atq.c b/app/test-eventdev/test_order_atq.c
index 71215a07b6..2fee4b4daa 100644
--- a/app/test-eventdev/test_order_atq.c
+++ b/app/test-eventdev/test_order_atq.c
@@ -28,7 +28,7 @@ order_atq_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_atq_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
diff --git a/app/test-eventdev/test_order_common.c b/app/test-eventdev/test_order_common.c
index d7760061ba..ff7813f9c2 100644
--- a/app/test-eventdev/test_order_common.c
+++ b/app/test-eventdev/test_order_common.c
@@ -187,7 +187,7 @@ order_test_setup(struct evt_test *test, struct evt_options *opt)
 		evt_err("failed to allocate t->expected_flow_seq memory");
 		goto exp_nomem;
 	}
-	rte_atomic64_set(&t->outstand_pkts, opt->nb_pkts);
+	__atomic_store_n(&t->outstand_pkts, opt->nb_pkts, __ATOMIC_RELAXED);
 	t->err = false;
 	t->nb_pkts = opt->nb_pkts;
 	t->nb_flows = opt->nb_flows;
@@ -294,7 +294,7 @@ order_launch_lcores(struct evt_test *test, struct evt_options *opt,
 
 	while (t->err == false) {
 		uint64_t new_cycles = rte_get_timer_cycles();
-		int64_t remaining = rte_atomic64_read(&t->outstand_pkts);
+		int64_t remaining = __atomic_load_n(&t->outstand_pkts, __ATOMIC_RELAXED);
 
 		if (remaining <= 0) {
 			t->result = EVT_TEST_SUCCESS;
diff --git a/app/test-eventdev/test_order_common.h b/app/test-eventdev/test_order_common.h
index cd9d6009ec..92781d9587 100644
--- a/app/test-eventdev/test_order_common.h
+++ b/app/test-eventdev/test_order_common.h
@@ -48,7 +48,7 @@ struct test_order {
 	 * The atomic_* is an expensive operation,Since it is a functional test,
 	 * We are using the atomic_ operation to reduce the code complexity.
 	 */
-	rte_atomic64_t outstand_pkts;
+	uint64_t outstand_pkts;
 	enum evt_test_result result;
 	uint32_t nb_flows;
 	uint64_t nb_pkts;
@@ -95,7 +95,7 @@ static __rte_always_inline void
 order_process_stage_1(struct test_order *const t,
 		struct rte_event *const ev, const uint32_t nb_flows,
 		uint32_t *const expected_flow_seq,
-		rte_atomic64_t *const outstand_pkts)
+		uint64_t *const outstand_pkts)
 {
 	const uint32_t flow = (uintptr_t)ev->mbuf % nb_flows;
 	/* compare the seqn against expected value */
@@ -113,7 +113,7 @@ order_process_stage_1(struct test_order *const t,
 	 */
 	expected_flow_seq[flow]++;
 	rte_pktmbuf_free(ev->mbuf);
-	rte_atomic64_sub(outstand_pkts, 1);
+	__atomic_sub_fetch(outstand_pkts, 1, __ATOMIC_RELAXED);
 }
 
 static __rte_always_inline void
@@ -132,7 +132,7 @@ order_process_stage_invalid(struct test_order *const t,
 	const uint8_t port = w->port_id;\
 	const uint32_t nb_flows = t->nb_flows;\
 	uint32_t *expected_flow_seq = t->expected_flow_seq;\
-	rte_atomic64_t *outstand_pkts = &t->outstand_pkts;\
+	uint64_t *outstand_pkts = &t->outstand_pkts;\
 	if (opt->verbose_level > 1)\
 		printf("%s(): lcore %d dev_id %d port=%d\n",\
 			__func__, rte_lcore_id(), dev_id, port)
diff --git a/app/test-eventdev/test_order_queue.c b/app/test-eventdev/test_order_queue.c
index 621367805a..80eaea5cf5 100644
--- a/app/test-eventdev/test_order_queue.c
+++ b/app/test-eventdev/test_order_queue.c
@@ -28,7 +28,7 @@ order_queue_worker(void *arg, const bool flow_id_cap)
 		uint16_t event = rte_event_dequeue_burst(dev_id, port,
 					&ev, 1, 0);
 		if (!event) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
@@ -64,7 +64,7 @@ order_queue_worker_burst(void *arg, const bool flow_id_cap)
 				BURST_SIZE, 0);
 
 		if (nb_rx == 0) {
-			if (rte_atomic64_read(outstand_pkts) <= 0)
+			if (__atomic_load_n(outstand_pkts, __ATOMIC_RELAXED) <= 0)
 				break;
 			rte_pause();
 			continue;
-- 
2.25.1
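The order tests count `outstand_pkts` down as packets complete and stop the workers when it reaches zero. Since this patch makes the counter a `uint64_t`, the existing `<= 0` checks can only be true when it is exactly 0, so correctness depends on each packet being subtracted exactly once. A pthread-based sketch of that countdown, with a hypothetical per-worker packet share (`struct worker_arg`) in place of the event device; `order_worker` and `run_order` are illustrative names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t outstand_pkts;   /* unsigned, as after this patch */
static uint64_t processed;

struct worker_arg {
	uint64_t my_pkts;            /* hypothetical per-worker share */
};

static void *order_worker(void *p)
{
	struct worker_arg *w = p;

	while (1) {
		if (w->my_pkts == 0) {
			/* Nothing to dequeue: stop once every packet is accounted
			 * for. For an unsigned counter "<= 0" only matches 0. */
			if (__atomic_load_n(&outstand_pkts, __ATOMIC_RELAXED) == 0)
				break;
			continue;
		}
		w->my_pkts--;   /* "process" one packet */
		__atomic_fetch_add(&processed, 1, __ATOMIC_RELAXED);
		__atomic_sub_fetch(&outstand_pkts, 1, __ATOMIC_RELAXED);
	}
	return NULL;
}

static uint64_t run_order(uint64_t per_worker, int nb_workers)
{
	pthread_t t[8];
	struct worker_arg args[8];
	int i, rc = 0;

	assert(nb_workers > 0 && nb_workers <= 8);
	__atomic_store_n(&outstand_pkts, per_worker * nb_workers, __ATOMIC_RELAXED);
	__atomic_store_n(&processed, 0, __ATOMIC_RELAXED);
	for (i = 0; i < nb_workers; i++) {
		args[i].my_pkts = per_worker;
		rc = pthread_create(&t[i], NULL, order_worker, &args[i]);
		assert(rc == 0);
	}
	for (i = 0; i < nb_workers; i++) {
		rc = pthread_join(t[i], NULL);
		assert(rc == 0);
	}
	(void)rc;
	return __atomic_load_n(&processed, __ATOMIC_RELAXED);
}
```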
* [PATCH v3 08/12] app/crypto: use compiler atomic builtins for display sync 2021-11-17 8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong ` (6 preceding siblings ...) 2021-11-17 8:21 ` [PATCH v3 07/12] app/eventdev: use compiler atomics for shared " Joyce Kong @ 2021-11-17 8:21 ` Joyce Kong 2021-11-17 8:21 ` [PATCH v3 09/12] app/compress: " Joyce Kong ` (4 subsequent siblings) 12 siblings, 0 replies; 36+ messages in thread From: Joyce Kong @ 2021-11-17 8:21 UTC (permalink / raw) To: Declan Doherty, Ciara Power Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang Convert rte_atomic_test_and_set usage to compiler atomic CAS operation for display sync in crypto cases. Signed-off-by: Joyce Kong <joyce.kong@arm.com> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> --- app/test-crypto-perf/cperf_test_latency.c | 6 ++++-- app/test-crypto-perf/cperf_test_pmd_cyclecount.c | 9 ++++++--- app/test-crypto-perf/cperf_test_throughput.c | 9 ++++++--- app/test-crypto-perf/cperf_test_verify.c | 9 ++++++--- 4 files changed, 22 insertions(+), 11 deletions(-) diff --git a/app/test-crypto-perf/cperf_test_latency.c b/app/test-crypto-perf/cperf_test_latency.c index 69f55de50a..ce49feaba9 100644 --- a/app/test-crypto-perf/cperf_test_latency.c +++ b/app/test-crypto-perf/cperf_test_latency.c @@ -126,7 +126,7 @@ cperf_latency_test_runner(void *arg) uint8_t burst_size_idx = 0; uint32_t imix_idx = 0; - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; if (ctx == NULL) return 0; @@ -307,8 +307,10 @@ cperf_latency_test_runner(void *arg) time_max = tunit*(double)(tsc_max) / tsc_hz; time_min = tunit*(double)(tsc_min) / tsc_hz; + uint16_t exp = 0; if (ctx->options->csv) { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf("\n# lcore, Buffer Size, Burst 
Size, Pakt Seq #, " "cycles, time (us)"); diff --git a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c index fda97e8ab9..ba1f104f72 100644 --- a/app/test-crypto-perf/cperf_test_pmd_cyclecount.c +++ b/app/test-crypto-perf/cperf_test_pmd_cyclecount.c @@ -404,7 +404,7 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx) state.lcore = rte_lcore_id(); state.linearize = 0; - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; static bool warmup = true; /* @@ -449,8 +449,10 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx) continue; } + uint16_t exp = 0; if (!opts->csv) { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf(PRETTY_HDR_FMT, "lcore id", "Buf Size", "Burst Size", "Enqueued", "Dequeued", "Enq Retries", @@ -466,7 +468,8 @@ cperf_pmd_cyclecount_test_runner(void *test_ctx) state.cycles_per_enq, state.cycles_per_deq); } else { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf(CSV_HDR_FMT, "# lcore id", "Buf Size", "Burst Size", "Enqueued", "Dequeued", "Enq Retries", diff --git a/app/test-crypto-perf/cperf_test_throughput.c b/app/test-crypto-perf/cperf_test_throughput.c index 739ed9e573..51512af2ad 100644 --- a/app/test-crypto-perf/cperf_test_throughput.c +++ b/app/test-crypto-perf/cperf_test_throughput.c @@ -113,7 +113,7 @@ cperf_throughput_test_runner(void *test_ctx) uint8_t burst_size_idx = 0; uint32_t imix_idx = 0; - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; struct rte_crypto_op *ops[ctx->options->max_burst_size]; struct rte_crypto_op *ops_processed[ctx->options->max_burst_size]; @@ -281,8 +281,10 @@ cperf_throughput_test_runner(void *test_ctx) double cycles_per_packet = ((double)tsc_duration / 
ctx->options->total_ops); + uint16_t exp = 0; if (!ctx->options->csv) { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf("%12s%12s%12s%12s%12s%12s%12s%12s%12s%12s\n\n", "lcore id", "Buf Size", "Burst Size", "Enqueued", "Dequeued", "Failed Enq", @@ -302,7 +304,8 @@ cperf_throughput_test_runner(void *test_ctx) throughput_gbps, cycles_per_packet); } else { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf("#lcore id,Buffer Size(B)," "Burst Size,Enqueued,Dequeued,Failed Enq," "Failed Deq,Ops(Millions),Throughput(Gbps)," diff --git a/app/test-crypto-perf/cperf_test_verify.c b/app/test-crypto-perf/cperf_test_verify.c index 1962438034..496eb0de00 100644 --- a/app/test-crypto-perf/cperf_test_verify.c +++ b/app/test-crypto-perf/cperf_test_verify.c @@ -241,7 +241,7 @@ cperf_verify_test_runner(void *test_ctx) uint64_t ops_deqd = 0, ops_deqd_total = 0, ops_deqd_failed = 0; uint64_t ops_failed = 0; - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; uint64_t i; uint16_t ops_unused = 0; @@ -383,8 +383,10 @@ cperf_verify_test_runner(void *test_ctx) ops_deqd_total += ops_deqd; } + uint16_t exp = 0; if (!ctx->options->csv) { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf("%12s%12s%12s%12s%12s%12s%12s%12s\n\n", "lcore id", "Buf Size", "Burst size", "Enqueued", "Dequeued", "Failed Enq", @@ -401,7 +403,8 @@ cperf_verify_test_runner(void *test_ctx) ops_deqd_failed, ops_failed); } else { - if (rte_atomic16_test_and_set(&display_once)) + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf("\n# lcore id, Buffer Size(B), " "Burst Size,Enqueued,Dequeued,Failed Enq," "Failed Deq,Failed 
Ops\n"); -- 2.25.1
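The crypto-perf (and compress-perf) runners all replace rte_atomic16_test_and_set() with the same compare-and-swap pattern. Below is a minimal, self-contained sketch of it. It assumes the GCC/Clang __atomic builtins, and claim_display_once() and print_header_once() are hypothetical helpers, not DPDK API. The first caller to move the flag from 0 to 1 gets true and every later caller gets false. As in the patch, RELAXED ordering is used because the flag only decides which lcore prints the header and publishes no other data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Plain integer flag, as in "static uint16_t display_once;" from the patch. */
static uint16_t display_once;

/*
 * Hypothetical helper: returns true for exactly one caller, the first one
 * that moves *flag from 0 to 1. RELAXED ordering suffices because the flag
 * guards no other shared data; it only picks the lcore that prints.
 */
static bool claim_display_once(uint16_t *flag)
{
	uint16_t exp = 0;

	assert(flag != NULL);
	/* On failure, exp is overwritten with the value found in *flag. */
	return __atomic_compare_exchange_n(flag, &exp, 1, false,
					   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

/* Example caller: print the table header only once across all lcores. */
void print_header_once(void)
{
	if (claim_display_once(&display_once))
		printf("%12s%12s%12s\n\n", "lcore id", "Buf Size", "Burst Size");
}
```

One detail to keep in mind: when the CAS fails, __atomic_compare_exchange_n() writes the value it found back into exp. Resetting exp = 0 before reusing it for a second flag, as the compress-perf throughput runner does, is therefore the safe pattern. The crypto-perf runners perform at most one CAS per call with their exp, so they do not need the reset.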
* [PATCH v3 09/12] app/compress: use compiler atomic builtins for display sync
From: Joyce Kong @ 2021-11-17 8:21 UTC
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic16_test_and_set usage to a compiler atomic CAS operation for display sync.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-compress-perf/comp_perf_test_common.h | 2 +-
 .../comp_perf_test_cyclecount.c | 15 +++++++--------
 .../comp_perf_test_throughput.c | 10 +++++++---
 app/test-compress-perf/comp_perf_test_verify.c | 6 ++++--
 4 files changed, 19 insertions(+), 14 deletions(-)

diff --git a/app/test-compress-perf/comp_perf_test_common.h b/app/test-compress-perf/comp_perf_test_common.h index 72705c6a2b..d039e5a29a 100644 --- a/app/test-compress-perf/comp_perf_test_common.h +++ b/app/test-compress-perf/comp_perf_test_common.h @@ -14,7 +14,7 @@ struct cperf_mem_resources { uint16_t qp_id; uint8_t lcore_id; - rte_atomic16_t print_info_once; + uint16_t print_info_once; uint32_t total_bufs; uint8_t *compressed_data; diff --git a/app/test-compress-perf/comp_perf_test_cyclecount.c b/app/test-compress-perf/comp_perf_test_cyclecount.c index c875ddbdac..da55b02b74 100644 --- a/app/test-compress-perf/comp_perf_test_cyclecount.c +++ b/app/test-compress-perf/comp_perf_test_cyclecount.c @@ -466,7 +466,7 @@ cperf_cyclecount_test_runner(void *test_ctx) struct cperf_cyclecount_ctx *ctx = test_ctx; struct comp_test_data *test_data =
ctx->ver.options; uint32_t lcore = rte_lcore_id(); - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; static rte_spinlock_t print_spinlock; int i; @@ -486,10 +486,12 @@ cperf_cyclecount_test_runner(void *test_ctx) ctx->ver.mem.lcore_id = lcore; + uint16_t exp = 0; /* * printing information about current compression thread */ - if (rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once)) + if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp, + 1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf(" lcore: %u," " driver name: %s," " device name: %s," @@ -546,9 +548,10 @@ cperf_cyclecount_test_runner(void *test_ctx) (ctx->ver.mem.total_bufs * test_data->num_iter); /* R E P O R T processing */ - if (rte_atomic16_test_and_set(&display_once)) { + rte_spinlock_lock(&print_spinlock); - rte_spinlock_lock(&print_spinlock); + if (display_once == 0) { + display_once = 1; printf("\nLegend for the table\n" " - Retries section: number of retries for the following operations:\n" @@ -576,12 +579,8 @@ cperf_cyclecount_test_runner(void *test_ctx) "setup/op", "[C-e]", "[C-d]", "[D-e]", "[D-d]"); - - rte_spinlock_unlock(&print_spinlock); } - rte_spinlock_lock(&print_spinlock); - printf("%12u" "%6u" "%12zu" diff --git a/app/test-compress-perf/comp_perf_test_throughput.c b/app/test-compress-perf/comp_perf_test_throughput.c index 13922b658c..d3dff070b0 100644 --- a/app/test-compress-perf/comp_perf_test_throughput.c +++ b/app/test-compress-perf/comp_perf_test_throughput.c @@ -329,15 +329,17 @@ cperf_throughput_test_runner(void *test_ctx) struct cperf_benchmark_ctx *ctx = test_ctx; struct comp_test_data *test_data = ctx->ver.options; uint32_t lcore = rte_lcore_id(); - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; int i, ret = EXIT_SUCCESS; ctx->ver.mem.lcore_id = lcore; + uint16_t exp = 0; /* * printing information about current compression thread */ - if 
(rte_atomic16_test_and_set(&ctx->ver.mem.print_info_once)) + if (__atomic_compare_exchange_n(&ctx->ver.mem.print_info_once, &exp, + 1, 0, __ATOMIC_RELAXED, __ATOMIC_RELAXED)) printf(" lcore: %u," " driver name: %s," " device name: %s," @@ -391,7 +393,9 @@ cperf_throughput_test_runner(void *test_ctx) ctx->decomp_gbps = rte_get_tsc_hz() / ctx->decomp_tsc_byte * 8 / 1000000000; - if (rte_atomic16_test_and_set(&display_once)) { + exp = 0; + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) { printf("\n%12s%6s%12s%17s%15s%16s\n", "lcore id", "Level", "Comp size", "Comp ratio [%]", "Comp [Gbps]", "Decomp [Gbps]"); diff --git a/app/test-compress-perf/comp_perf_test_verify.c b/app/test-compress-perf/comp_perf_test_verify.c index 5e13257b79..f6e21368e8 100644 --- a/app/test-compress-perf/comp_perf_test_verify.c +++ b/app/test-compress-perf/comp_perf_test_verify.c @@ -388,7 +388,7 @@ cperf_verify_test_runner(void *test_ctx) struct cperf_verify_ctx *ctx = test_ctx; struct comp_test_data *test_data = ctx->options; int ret = EXIT_SUCCESS; - static rte_atomic16_t display_once = RTE_ATOMIC16_INIT(0); + static uint16_t display_once; uint32_t lcore = rte_lcore_id(); ctx->mem.lcore_id = lcore; @@ -427,8 +427,10 @@ cperf_verify_test_runner(void *test_ctx) ctx->ratio = (double) ctx->comp_data_sz / test_data->input_data_sz * 100; + uint16_t exp = 0; if (!ctx->silent) { - if (rte_atomic16_test_and_set(&display_once)) { + if (__atomic_compare_exchange_n(&display_once, &exp, 1, 0, + __ATOMIC_RELAXED, __ATOMIC_RELAXED)) { printf("%12s%6s%12s%17s\n", "lcore id", "Level", "Comp size", "Comp ratio [%]"); } -- 2.25.1
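In the cyclecount runner the patch drops the atomic for display_once entirely. The legend check now happens inside the print_spinlock critical section that the report row already needs, so a plain flag is enough. The sketch below shows that shape. It uses a POSIX mutex as a stand-in for rte_spinlock_t (an assumption made so the example builds without DPDK), and report_row() and legend_printed are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

/* POSIX mutex standing in for DPDK's rte_spinlock_t print_spinlock. */
static pthread_mutex_t print_lock = PTHREAD_MUTEX_INITIALIZER;
/* Plain flag: only read or written while print_lock is held. */
static uint16_t legend_printed;

/*
 * Hypothetical report routine: the first caller prints the legend, and
 * every caller prints its own row, all inside one critical section.
 * Returns 1 if this call printed the legend, 0 otherwise.
 */
int report_row(unsigned int lcore_id)
{
	int printed_legend = 0;

	pthread_mutex_lock(&print_lock);
	if (legend_printed == 0) {
		legend_printed = 1;
		printf("\nLegend for the table\n");
		printed_legend = 1;
	}
	/* The legend has been emitted before any row. */
	assert(legend_printed == 1);
	printf("%12u\n", lcore_id);
	pthread_mutex_unlock(&print_lock);

	return printed_legend;
}
```

A likely side benefit of this layout: because the legend and the rows are printed under the same lock, no lcore can print its row before the legend. The previous CAS-then-lock sequence did not guarantee that ordering.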
* [PATCH v3 10/12] app/testpmd: remove atomic operations for port status
From: Joyce Kong @ 2021-11-17 8:21 UTC
To: Xiaoyun Li
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Changes to port_status do not need to be handled atomically, as the field is modified during initialization or through the testpmd prompt rather than by multiple threads.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-pmd/testpmd.c | 58 ++++++++++++++++++++++--------------------
 1 file changed, 31 insertions(+), 27 deletions(-)

diff --git a/app/test-pmd/testpmd.c b/app/test-pmd/testpmd.c index a66dfb297c..ed472cacd2 100644 --- a/app/test-pmd/testpmd.c +++ b/app/test-pmd/testpmd.c @@ -36,7 +36,6 @@ #include <rte_alarm.h> #include <rte_per_lcore.h> #include <rte_lcore.h> -#include <rte_atomic.h> #include <rte_branch_prediction.h> #include <rte_mempool.h> #include <rte_malloc.h> @@ -2521,9 +2520,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi) continue; /* Fail to setup rx queue, return */ - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, - RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); fprintf(stderr, "Fail to configure port %d hairpin queues\n", @@ -2544,9 +2543,9 @@ setup_hairpin_queues(portid_t pi, portid_t p_pi, uint16_t cnt_pi) continue; /* Fail to setup rx queue, return */ - if
(rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, - RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); fprintf(stderr, "Fail to configure port %d hairpin queues\n", @@ -2729,8 +2728,9 @@ start_port(portid_t pid) need_check_link_status = 0; port = &ports[pi]; - if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STOPPED, - RTE_PORT_HANDLING) == 0) { + if (port->port_status == RTE_PORT_STOPPED) + port->port_status = RTE_PORT_HANDLING; + else { fprintf(stderr, "Port %d is now not stopped\n", pi); continue; } @@ -2766,8 +2766,9 @@ start_port(portid_t pid) nb_txq + nb_hairpinq, &(port->dev_conf)); if (diag != 0) { - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); @@ -2828,9 +2829,9 @@ start_port(portid_t pid) continue; /* Fail to setup tx queue, return */ - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, - RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); @@ -2880,9 +2881,9 @@ start_port(portid_t pid) continue; /* Fail to setup rx queue, return */ - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, - RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); @@ -2917,16 +2918,18 @@ start_port(portid_t pid) pi, rte_strerror(-diag)); /* Fail to setup rx queue, return */ - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = 
RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set back to stopped\n", pi); continue; } - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, RTE_PORT_STARTED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STARTED; + else fprintf(stderr, "Port %d can not be set into started\n", pi); @@ -3028,8 +3031,9 @@ stop_port(portid_t pid) } port = &ports[pi]; - if (rte_atomic16_cmpset(&(port->port_status), RTE_PORT_STARTED, - RTE_PORT_HANDLING) == 0) + if (port->port_status == RTE_PORT_STARTED) + port->port_status = RTE_PORT_HANDLING; + else continue; if (hairpin_mode & 0xf) { @@ -3055,8 +3059,9 @@ stop_port(portid_t pid) RTE_LOG(ERR, EAL, "rte_eth_dev_stop failed for port %u\n", pi); - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_HANDLING, RTE_PORT_STOPPED) == 0) + if (port->port_status == RTE_PORT_HANDLING) + port->port_status = RTE_PORT_STOPPED; + else fprintf(stderr, "Port %d can not be set into stopped\n", pi); need_check_link_status = 1; @@ -3119,8 +3124,7 @@ close_port(portid_t pid) } port = &ports[pi]; - if (rte_atomic16_cmpset(&(port->port_status), - RTE_PORT_CLOSED, RTE_PORT_CLOSED) == 1) { + if (port->port_status == RTE_PORT_CLOSED) { fprintf(stderr, "Port %d is already closed\n", pi); continue; } -- 2.25.1
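The testpmd patch replaces each rte_atomic16_cmpset() on port_status with an ordinary comparison followed by an assignment. That is only correct because a single control thread changes port_status. The sketch below folds that transition into one helper. port_status_transition() and the PORT_* values are hypothetical: testpmd keeps its own RTE_PORT_* constants and open-codes the check at each call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative states; testpmd defines its own RTE_PORT_* values. */
enum { PORT_STOPPED, PORT_STARTED, PORT_CLOSED, PORT_HANDLING };

/*
 * Non-atomic "compare and set": if *status equals from, store to and
 * return true; otherwise leave *status untouched and return false.
 * Safe only while a single thread owns *status (not across lcores).
 */
static bool port_status_transition(uint16_t *status, uint16_t from, uint16_t to)
{
	assert(status != NULL);
	if (*status != from)
		return false;
	*status = to;
	return true;
}
```

With such a helper, an error path in start_port() could read "if (!port_status_transition(&port->port_status, RTE_PORT_HANDLING, RTE_PORT_STOPPED)) fprintf(stderr, ...);", which behaves like the open-coded if/else in the patch.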
* [PATCH v3 11/12] app/bbdev: use compiler atomics for shared data sync
From: Joyce Kong @ 2021-11-17 8:21 UTC
To: Nicolas Chautru
Cc: dev, honnappa.nagarahalli, nd, Joyce Kong, Ruifeng Wang

Convert rte_atomic usages to compiler atomic built-ins for shared data sync in the bbdev test cases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test-bbdev/test_bbdev_perf.c | 135 ++++++++++++++++-----------------------
 1 file changed, 59 insertions(+), 76 deletions(-)

diff --git a/app/test-bbdev/test_bbdev_perf.c b/app/test-bbdev/test_bbdev_perf.c index 7b4529789b..0fa119a502 100644 --- a/app/test-bbdev/test_bbdev_perf.c +++ b/app/test-bbdev/test_bbdev_perf.c @@ -133,7 +133,7 @@ struct test_op_params { uint16_t num_to_process; uint16_t num_lcores; int vector_mask; - rte_atomic16_t sync; + uint16_t sync; struct test_buffers q_bufs[RTE_MAX_NUMA_NODES][MAX_QUEUES]; }; @@ -148,9 +148,9 @@ struct thread_params { uint8_t iter_count; double iter_average; double bler; - rte_atomic16_t nb_dequeued; - rte_atomic16_t processing_status; - rte_atomic16_t burst_sz; + uint16_t nb_dequeued; + int16_t processing_status; + uint16_t burst_sz; struct test_op_params *op_params; struct rte_bbdev_dec_op *dec_ops[MAX_BURST]; struct rte_bbdev_enc_op *enc_ops[MAX_BURST]; @@ -2637,46 +2637,46 @@ dequeue_event_callback(uint16_t dev_id, } if (unlikely(event != RTE_BBDEV_EVENT_DEQUEUE)) { -
rte_atomic16_set(&tp->processing_status, TEST_FAILED); + __atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED); printf( "Dequeue interrupt handler called for incorrect event!\n"); return; } - burst_sz = rte_atomic16_read(&tp->burst_sz); + burst_sz = __atomic_load_n(&tp->burst_sz, __ATOMIC_RELAXED); num_ops = tp->op_params->num_to_process; if (test_vector.op_type == RTE_BBDEV_OP_TURBO_DEC) deq = rte_bbdev_dequeue_dec_ops(dev_id, queue_id, &tp->dec_ops[ - rte_atomic16_read(&tp->nb_dequeued)], + __atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)], burst_sz); else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_DEC) deq = rte_bbdev_dequeue_ldpc_dec_ops(dev_id, queue_id, &tp->dec_ops[ - rte_atomic16_read(&tp->nb_dequeued)], + __atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)], burst_sz); else if (test_vector.op_type == RTE_BBDEV_OP_LDPC_ENC) deq = rte_bbdev_dequeue_ldpc_enc_ops(dev_id, queue_id, &tp->enc_ops[ - rte_atomic16_read(&tp->nb_dequeued)], + __atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)], burst_sz); else /*RTE_BBDEV_OP_TURBO_ENC*/ deq = rte_bbdev_dequeue_enc_ops(dev_id, queue_id, &tp->enc_ops[ - rte_atomic16_read(&tp->nb_dequeued)], + __atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED)], burst_sz); if (deq < burst_sz) { printf( "After receiving the interrupt all operations should be dequeued. 
Expected: %u, got: %u\n", burst_sz, deq); - rte_atomic16_set(&tp->processing_status, TEST_FAILED); + __atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED); return; } - if (rte_atomic16_read(&tp->nb_dequeued) + deq < num_ops) { - rte_atomic16_add(&tp->nb_dequeued, deq); + if (__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) + deq < num_ops) { + __atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED); return; } @@ -2713,7 +2713,7 @@ dequeue_event_callback(uint16_t dev_id, if (ret) { printf("Buffers validation failed\n"); - rte_atomic16_set(&tp->processing_status, TEST_FAILED); + __atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED); } switch (test_vector.op_type) { @@ -2734,7 +2734,7 @@ dequeue_event_callback(uint16_t dev_id, break; default: printf("Unknown op type: %d\n", test_vector.op_type); - rte_atomic16_set(&tp->processing_status, TEST_FAILED); + __atomic_store_n(&tp->processing_status, TEST_FAILED, __ATOMIC_RELAXED); return; } @@ -2743,7 +2743,7 @@ dequeue_event_callback(uint16_t dev_id, tp->mbps += (((double)(num_ops * tb_len_bits)) / 1000000.0) / ((double)total_time / (double)rte_get_tsc_hz()); - rte_atomic16_add(&tp->nb_dequeued, deq); + __atomic_fetch_add(&tp->nb_dequeued, deq, __ATOMIC_RELAXED); } static int @@ -2781,11 +2781,10 @@ throughput_intr_lcore_ldpc_dec(void *arg) bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id]; - rte_atomic16_clear(&tp->processing_status); - rte_atomic16_clear(&tp->nb_dequeued); + __atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED); + __atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED); - while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT) - rte_pause(); + rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops, num_to_process); @@ -2833,17 +2832,15 @@ throughput_intr_lcore_ldpc_dec(void *arg) * the number of operations is not a multiple of * burst size. 
*/ - rte_atomic16_set(&tp->burst_sz, num_to_enq); + __atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED); /* Wait until processing of previous batch is * completed */ - while (rte_atomic16_read(&tp->nb_dequeued) != - (int16_t) enqueued) - rte_pause(); + rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED); } if (j != TEST_REPETITIONS - 1) - rte_atomic16_clear(&tp->nb_dequeued); + __atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED); } return TEST_SUCCESS; @@ -2878,11 +2875,10 @@ throughput_intr_lcore_dec(void *arg) bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id]; - rte_atomic16_clear(&tp->processing_status); - rte_atomic16_clear(&tp->nb_dequeued); + __atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED); + __atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED); - while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT) - rte_pause(); + rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops, num_to_process); @@ -2923,17 +2919,15 @@ throughput_intr_lcore_dec(void *arg) * the number of operations is not a multiple of * burst size. 
*/ - rte_atomic16_set(&tp->burst_sz, num_to_enq); + __atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED); /* Wait until processing of previous batch is * completed */ - while (rte_atomic16_read(&tp->nb_dequeued) != - (int16_t) enqueued) - rte_pause(); + rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED); } if (j != TEST_REPETITIONS - 1) - rte_atomic16_clear(&tp->nb_dequeued); + __atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED); } return TEST_SUCCESS; @@ -2968,11 +2962,10 @@ throughput_intr_lcore_enc(void *arg) bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id]; - rte_atomic16_clear(&tp->processing_status); - rte_atomic16_clear(&tp->nb_dequeued); + __atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED); + __atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED); - while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT) - rte_pause(); + rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops, num_to_process); @@ -3012,17 +3005,15 @@ throughput_intr_lcore_enc(void *arg) * the number of operations is not a multiple of * burst size. 
*/ - rte_atomic16_set(&tp->burst_sz, num_to_enq); + __atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED); /* Wait until processing of previous batch is * completed */ - while (rte_atomic16_read(&tp->nb_dequeued) != - (int16_t) enqueued) - rte_pause(); + rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED); } if (j != TEST_REPETITIONS - 1) - rte_atomic16_clear(&tp->nb_dequeued); + __atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED); } return TEST_SUCCESS; @@ -3058,11 +3049,10 @@ throughput_intr_lcore_ldpc_enc(void *arg) bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id]; - rte_atomic16_clear(&tp->processing_status); - rte_atomic16_clear(&tp->nb_dequeued); + __atomic_store_n(&tp->processing_status, 0, __ATOMIC_RELAXED); + __atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED); - while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT) - rte_pause(); + rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops, num_to_process); @@ -3104,17 +3094,15 @@ throughput_intr_lcore_ldpc_enc(void *arg) * the number of operations is not a multiple of * burst size. 
*/ - rte_atomic16_set(&tp->burst_sz, num_to_enq); + __atomic_store_n(&tp->burst_sz, num_to_enq, __ATOMIC_RELAXED); /* Wait until processing of previous batch is * completed */ - while (rte_atomic16_read(&tp->nb_dequeued) != - (int16_t) enqueued) - rte_pause(); + rte_wait_until_equal_16(&tp->nb_dequeued, enqueued, __ATOMIC_RELAXED); } if (j != TEST_REPETITIONS - 1) - rte_atomic16_clear(&tp->nb_dequeued); + __atomic_store_n(&tp->nb_dequeued, 0, __ATOMIC_RELAXED); } return TEST_SUCCESS; @@ -3148,8 +3136,7 @@ throughput_pmd_lcore_dec(void *arg) bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id]; - while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT) - rte_pause(); + rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops); TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops); @@ -3252,8 +3239,7 @@ bler_pmd_lcore_ldpc_dec(void *arg) bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id]; - while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT) - rte_pause(); + rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops); TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops); @@ -3382,8 +3368,7 @@ throughput_pmd_lcore_ldpc_dec(void *arg) bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id]; - while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT) - rte_pause(); + rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = rte_bbdev_dec_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops); TEST_ASSERT_SUCCESS(ret, "Allocation failed for %d ops", num_ops); @@ -3499,8 +3484,7 @@ throughput_pmd_lcore_enc(void *arg) bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id]; - while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT) - rte_pause(); + 
rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops); @@ -3590,8 +3574,7 @@ throughput_pmd_lcore_ldpc_enc(void *arg) bufs = &tp->op_params->q_bufs[GET_SOCKET(info.socket_id)][queue_id]; - while (rte_atomic16_read(&tp->op_params->sync) == SYNC_WAIT) - rte_pause(); + rte_wait_until_equal_16(&tp->op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = rte_bbdev_enc_op_alloc_bulk(tp->op_params->mp, ops_enq, num_ops); @@ -3774,7 +3757,7 @@ bler_test(struct active_device *ad, else return TEST_SKIPPED; - rte_atomic16_set(&op_params->sync, SYNC_WAIT); + __atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED); /* Main core is set at first entry */ t_params[0].dev_id = ad->dev_id; @@ -3797,7 +3780,7 @@ bler_test(struct active_device *ad, &t_params[used_cores++], lcore_id); } - rte_atomic16_set(&op_params->sync, SYNC_START); + __atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = bler_function(&t_params[0]); /* Main core is always used */ @@ -3892,7 +3875,7 @@ throughput_test(struct active_device *ad, throughput_function = throughput_pmd_lcore_enc; } - rte_atomic16_set(&op_params->sync, SYNC_WAIT); + __atomic_store_n(&op_params->sync, SYNC_WAIT, __ATOMIC_RELAXED); /* Main core is set at first entry */ t_params[0].dev_id = ad->dev_id; @@ -3915,7 +3898,7 @@ throughput_test(struct active_device *ad, &t_params[used_cores++], lcore_id); } - rte_atomic16_set(&op_params->sync, SYNC_START); + __atomic_store_n(&op_params->sync, SYNC_START, __ATOMIC_RELAXED); ret = throughput_function(&t_params[0]); /* Main core is always used */ @@ -3945,29 +3928,29 @@ throughput_test(struct active_device *ad, * Wait for main lcore operations. 
*/ tp = &t_params[0]; - while ((rte_atomic16_read(&tp->nb_dequeued) < - op_params->num_to_process) && - (rte_atomic16_read(&tp->processing_status) != - TEST_FAILED)) + while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) < + op_params->num_to_process) && + (__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) != + TEST_FAILED)) rte_pause(); tp->ops_per_sec /= TEST_REPETITIONS; tp->mbps /= TEST_REPETITIONS; - ret |= (int)rte_atomic16_read(&tp->processing_status); + ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED); /* Wait for worker lcores operations */ for (used_cores = 1; used_cores < num_lcores; used_cores++) { tp = &t_params[used_cores]; - while ((rte_atomic16_read(&tp->nb_dequeued) < - op_params->num_to_process) && - (rte_atomic16_read(&tp->processing_status) != - TEST_FAILED)) + while ((__atomic_load_n(&tp->nb_dequeued, __ATOMIC_RELAXED) < + op_params->num_to_process) && + (__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED) != + TEST_FAILED)) rte_pause(); tp->ops_per_sec /= TEST_REPETITIONS; tp->mbps /= TEST_REPETITIONS; - ret |= (int)rte_atomic16_read(&tp->processing_status); + ret |= (int)__atomic_load_n(&tp->processing_status, __ATOMIC_RELAXED); } /* Print throughput if test passed */ -- 2.25.1
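The bbdev patch maps rte_atomic16_read/set/add onto __atomic_load_n/__atomic_store_n/__atomic_fetch_add. It also turns the "while (read == SYNC_WAIT) rte_pause();" loops into rte_wait_until_equal_16(..., SYNC_START, ...), which is equivalent as long as sync only ever holds those two values. The sketch below shows the same primitives outside DPDK. wait_until_equal_16(), add_dequeued() and start_workers() are hypothetical, simplified stand-ins (the DPDK helper may additionally issue a pause or WFE hint in its loop), and the SYNC_* values are illustrative. RELAXED ordering mirrors the patch. It is enough when the flag or counter carries no other data. Publishing data alongside the flag would instead call for a RELEASE store paired with ACQUIRE loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values; the bbdev test defines its own SYNC_WAIT/SYNC_START. */
#define SYNC_WAIT  0
#define SYNC_START 1

/* Hypothetical stand-in for rte_wait_until_equal_16(addr, v, __ATOMIC_RELAXED). */
static void wait_until_equal_16(const uint16_t *addr, uint16_t expected)
{
	assert(addr != NULL);
	while (__atomic_load_n(addr, __ATOMIC_RELAXED) != expected)
		; /* a real implementation would add a CPU pause hint here */
}

/* Worker side: account for deq dequeued ops and return the updated count. */
static uint16_t add_dequeued(uint16_t *nb_dequeued, uint16_t deq)
{
	return (uint16_t)(__atomic_fetch_add(nb_dequeued, deq, __ATOMIC_RELAXED) + deq);
}

/* Main side: release the workers spinning in wait_until_equal_16(). */
static void start_workers(uint16_t *sync_flag)
{
	__atomic_store_n(sync_flag, SYNC_START, __ATOMIC_RELAXED);
}
```

In a real run, the main lcore would store SYNC_WAIT, launch the workers (each calling wait_until_equal_16(&flag, SYNC_START)), and then call start_workers(), which is the order the patched bler_test() and throughput_test() follow.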
* [PATCH v3 12/12] app: remove unnecessary include of atomic header file
From: Joyce Kong @ 2021-11-17 8:22 UTC
To: Maryam Tahhan, Reshma Pattan, Cristian Dumitrescu, Xiaoyun Li, Erik Gabriel Carrillo, Olivier Matz, Anatoly Burakov, Honnappa Nagarahalli, Konstantin Ananyev
Cc: dev, nd, Joyce Kong, Ruifeng Wang

Remove unnecessary includes of rte_atomic.h from app modules.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/proc-info/main.c | 1 -
 app/test-pipeline/config.c | 1 -
 app/test-pipeline/init.c | 1 -
 app/test-pipeline/main.c | 1 -
 app/test-pipeline/runtime.c | 1 -
 app/test-pmd/cmdline.c | 1 -
 app/test-pmd/config.c | 1 -
 app/test-pmd/csumonly.c | 1 -
 app/test-pmd/flowgen.c | 1 -
 app/test-pmd/icmpecho.c | 1 -
 app/test-pmd/iofwd.c | 1 -
 app/test-pmd/macfwd.c | 1 -
 app/test-pmd/macswap.c | 1 -
 app/test-pmd/parameters.c | 1 -
 app/test-pmd/rxonly.c | 1 -
 app/test-pmd/txonly.c | 1 -
 app/test/commands.c | 1 -
 app/test/test_barrier.c | 1 -
 app/test/test_event_timer_adapter.c | 1 -
 app/test/test_mbuf.c | 1 -
 app/test/test_mp_secondary.c | 1 -
 app/test/test_ring.c | 1 -
 22 files changed, 22 deletions(-)

diff --git a/app/proc-info/main.c b/app/proc-info/main.c index a4271047e6..ebe2d77264 100644 --- a/app/proc-info/main.c +++ b/app/proc-info/main.c @@ -27,7 +27,6 @@ #include <rte_per_lcore.h> #include <rte_lcore.h> #include <rte_log.h> -#include <rte_atomic.h> #include <rte_branch_prediction.h> #include <rte_string_fns.h> #include <rte_metrics.h> diff --git a/app/test-pipeline/config.c b/app/test-pipeline/config.c
index 33f3f1c827..daf838948b 100644 --- a/app/test-pipeline/config.c +++ b/app/test-pipeline/config.c @@ -21,7 +21,6 @@ #include <rte_eal.h> #include <rte_per_lcore.h> #include <rte_launch.h> -#include <rte_atomic.h> #include <rte_cycles.h> #include <rte_prefetch.h> #include <rte_lcore.h> diff --git a/app/test-pipeline/init.c b/app/test-pipeline/init.c index c738019041..eee0719b67 100644 --- a/app/test-pipeline/init.c +++ b/app/test-pipeline/init.c @@ -21,7 +21,6 @@ #include <rte_eal.h> #include <rte_per_lcore.h> #include <rte_launch.h> -#include <rte_atomic.h> #include <rte_cycles.h> #include <rte_prefetch.h> #include <rte_lcore.h> diff --git a/app/test-pipeline/main.c b/app/test-pipeline/main.c index 72e4797ff2..1e16794183 100644 --- a/app/test-pipeline/main.c +++ b/app/test-pipeline/main.c @@ -22,7 +22,6 @@ #include <rte_eal.h> #include <rte_per_lcore.h> #include <rte_launch.h> -#include <rte_atomic.h> #include <rte_cycles.h> #include <rte_prefetch.h> #include <rte_lcore.h> diff --git a/app/test-pipeline/runtime.c b/app/test-pipeline/runtime.c index 159192bcd8..d939a85d7e 100644 --- a/app/test-pipeline/runtime.c +++ b/app/test-pipeline/runtime.c @@ -21,7 +21,6 @@ #include <rte_eal.h> #include <rte_per_lcore.h> #include <rte_launch.h> -#include <rte_atomic.h> #include <rte_cycles.h> #include <rte_prefetch.h> #include <rte_branch_prediction.h> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index 4f51b259fe..4e93f535ff 100644 --- a/app/test-pmd/cmdline.c +++ b/app/test-pmd/cmdline.c @@ -24,7 +24,6 @@ #include <rte_eal.h> #include <rte_per_lcore.h> #include <rte_lcore.h> -#include <rte_atomic.h> #include <rte_branch_prediction.h> #include <rte_ring.h> #include <rte_mempool.h> diff --git a/app/test-pmd/config.c b/app/test-pmd/config.c index 26cadf39f7..d8b5032b58 100644 --- a/app/test-pmd/config.c +++ b/app/test-pmd/config.c @@ -27,7 +27,6 @@ #include <rte_eal.h> #include <rte_per_lcore.h> #include <rte_lcore.h> -#include <rte_atomic.h> #include 
<rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/csumonly.c b/app/test-pmd/csumonly.c
index 8526d9158a..e0b00abe8c 100644
--- a/app/test-pmd/csumonly.c
+++ b/app/test-pmd/csumonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index 5737eaa105..9ceef3b54a 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/icmpecho.c b/app/test-pmd/icmpecho.c
index 8f1d68a83a..3a85ec3dd1 100644
--- a/app/test-pmd/icmpecho.c
+++ b/app/test-pmd/icmpecho.c
@@ -20,7 +20,6 @@
 #include <rte_cycles.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memory.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/iofwd.c b/app/test-pmd/iofwd.c
index 83d098adcb..19cd920f70 100644
--- a/app/test-pmd/iofwd.c
+++ b/app/test-pmd/iofwd.c
@@ -23,7 +23,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_memcpy.h>
 #include <rte_mempool.h>
diff --git a/app/test-pmd/macfwd.c b/app/test-pmd/macfwd.c
index ac50d0b9f8..812a0c721f 100644
--- a/app/test-pmd/macfwd.c
+++ b/app/test-pmd/macfwd.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/macswap.c b/app/test-pmd/macswap.c
index 310bca06af..4627ff83e9 100644
--- a/app/test-pmd/macswap.c
+++ b/app/test-pmd/macswap.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/parameters.c b/app/test-pmd/parameters.c
index 0974b0a38f..2f4f944efa 100644
--- a/app/test-pmd/parameters.c
+++ b/app/test-pmd/parameters.c
@@ -30,7 +30,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_interrupts.h>
diff --git a/app/test-pmd/rxonly.c b/app/test-pmd/rxonly.c
index c78fc4609a..d1a579d8d8 100644
--- a/app/test-pmd/rxonly.c
+++ b/app/test-pmd/rxonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test-pmd/txonly.c b/app/test-pmd/txonly.c
index 34bb538379..b8497e733d 100644
--- a/app/test-pmd/txonly.c
+++ b/app/test-pmd/txonly.c
@@ -24,7 +24,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_mbuf.h>
diff --git a/app/test/commands.c b/app/test/commands.c
index 76f6ee5d23..2dced3bc44 100644
--- a/app/test/commands.c
+++ b/app/test/commands.c
@@ -25,7 +25,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_malloc.h>
diff --git a/app/test/test_barrier.c b/app/test/test_barrier.c
index c27f8a0742..898c2516ed 100644
--- a/app/test/test_barrier.c
+++ b/app/test/test_barrier.c
@@ -24,7 +24,6 @@
 #include <rte_memory.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
 #include <rte_pause.h>
diff --git a/app/test/test_event_timer_adapter.c b/app/test/test_event_timer_adapter.c
index 12c00e678e..25bac2d155 100644
--- a/app/test/test_event_timer_adapter.c
+++ b/app/test/test_event_timer_adapter.c
@@ -5,7 +5,6 @@
 
 #include <math.h>
 
-#include <rte_atomic.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_debug.h>
diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index f93bcef8a9..d53126710f 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -21,7 +21,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_ring.h>
 #include <rte_mempool.h>
diff --git a/app/test/test_mp_secondary.c b/app/test/test_mp_secondary.c
index 5b6f05dbb1..021ca0547f 100644
--- a/app/test/test_mp_secondary.c
+++ b/app/test/test_mp_secondary.c
@@ -28,7 +28,6 @@
 #include <rte_lcore.h>
 #include <rte_errno.h>
 #include <rte_branch_prediction.h>
-#include <rte_atomic.h>
 #include <rte_ring.h>
 #include <rte_debug.h>
 #include <rte_log.h>
diff --git a/app/test/test_ring.c b/app/test/test_ring.c
index fb8532a409..bde33ab4a1 100644
--- a/app/test/test_ring.c
+++ b/app/test/test_ring.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_malloc.h>
 #include <rte_ring.h>
-- 
2.25.1
* Re: [PATCH v3 00/12] use compiler atomic builtins for app modules
From: David Marchand @ 2021-11-17 10:02 UTC
To: Joyce Kong
Cc: dev, Honnappa Nagarahalli, nd

On Wed, Nov 17, 2021 at 9:22 AM Joyce Kong <joyce.kong@arm.com> wrote:
>
> Since atomic operations have been adopted in DPDK now[1],
> change rte_atomicNN_xxx APIs to compiler atomic built-ins
> in app modules[2].
>
> [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
> [2] https://doc.dpdk.org/guides/rel_notes/deprecation.html
>
> v3:
> 1. In pmd_perf test case, move the initialization of polling
>    start before calling rte_eal_remote_launch, so the update
>    is visible to the worker threads. (Honnappa Nagarahalli)
> 2. Remove the remaining rte_atomic.h includes missed in v2. (David Marchand)
>
> v2:
> By Honnappa Nagarahalli:
> 1. Replace the RELAXED barriers with suitable ones for shared
>    data sync in pmd_perf and timer test cases.
> 2. Avoid unnecessary atomic operations in compress and testpmd
>    modules.
> 3. Fix some typos.
>
> Joyce Kong (12):
>   test/pmd_perf: use compiler atomic builtins for polling sync
>   test/ring_perf: use compiler atomic builtins for lcores sync
>   test/timer: use compiler atomic builtins for sync
>   test/stack_perf: use compiler atomics for lcore sync
>   test/bpf: use compiler atomics for calculation
>   test/func_reentrancy: use compiler atomics for data sync
>   app/eventdev: use compiler atomics for shared data sync
>   app/crypto: use compiler atomic builtins for display sync
>   app/compress: use compiler atomic builtins for display sync
>   app/testpmd: remove atomic operations for port status
>   app/bbdev: use compiler atomics for shared data sync
>   app: remove unnecessary include of atomic header file

There were cleanups of unneeded rte_atomic.h inclusions along the
series: I moved all of them to the last patch so that each patch
focuses on what its commit log describes.

Series applied, thanks.

-- 
David Marchand
Thread overview: 36+ messages
2021-11-16  9:41 [PATCH v2 00/12] use compiler atomic builtins for app modules Joyce Kong
2021-11-16  9:41 ` [PATCH v2 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
2021-11-16 21:30 ` Honnappa Nagarahalli
2021-11-16  9:41 ` [PATCH v2 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
2021-11-16  9:41 ` [PATCH v2 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
2021-11-16 19:52 ` Honnappa Nagarahalli
2021-11-16 20:20 ` David Marchand
2021-11-16 21:21 ` Honnappa Nagarahalli
2021-11-17  9:29 ` David Marchand
2021-11-16  9:41 ` [PATCH v2 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
2021-11-16  9:41 ` [PATCH v2 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
2021-11-16  9:41 ` [PATCH v2 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
2021-11-16  9:42 ` [PATCH v2 07/12] app/eventdev: use compiler atomics for shared data sync Joyce Kong
2021-11-16  9:42 ` [PATCH v2 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
2021-11-16  9:42 ` [PATCH v2 09/12] app/compress: use compiler atomic builtins for display sync Joyce Kong
2021-11-16 20:15 ` Honnappa Nagarahalli
2021-11-16  9:42 ` [PATCH v2 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
2021-11-16 21:34 ` Honnappa Nagarahalli
2021-11-16  9:42 ` [PATCH v2 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
2021-11-16  9:42 ` [PATCH v2 12/12] app: remove unnecessary include of atomic header file Joyce Kong
2021-11-16 20:23 ` David Marchand
2021-11-17  7:05 ` Joyce Kong
2021-11-17  8:21 ` [PATCH v3 00/12] use compiler atomic builtins for app modules Joyce Kong
2021-11-17  8:21 ` [PATCH v3 01/12] test/pmd_perf: use compiler atomic builtins for polling sync Joyce Kong
2021-11-17  8:21 ` [PATCH v3 02/12] test/ring_perf: use compiler atomic builtins for lcores sync Joyce Kong
2021-11-17  8:21 ` [PATCH v3 03/12] test/timer: use compiler atomic builtins for sync Joyce Kong
2021-11-17  8:21 ` [PATCH v3 04/12] test/stack_perf: use compiler atomics for lcore sync Joyce Kong
2021-11-17  8:21 ` [PATCH v3 05/12] test/bpf: use compiler atomics for calculation Joyce Kong
2021-11-17  8:21 ` [PATCH v3 06/12] test/func_reentrancy: use compiler atomics for data sync Joyce Kong
2021-11-17  8:21 ` [PATCH v3 07/12] app/eventdev: use compiler atomics for shared data sync Joyce Kong
2021-11-17  8:21 ` [PATCH v3 08/12] app/crypto: use compiler atomic builtins for display sync Joyce Kong
2021-11-17  8:21 ` [PATCH v3 09/12] app/compress: use compiler atomic builtins for display sync Joyce Kong
2021-11-17  8:21 ` [PATCH v3 10/12] app/testpmd: remove atomic operations for port status Joyce Kong
2021-11-17  8:21 ` [PATCH v3 11/12] app/bbdev: use compiler atomics for shared data sync Joyce Kong
2021-11-17  8:22 ` [PATCH v3 12/12] app: remove unnecessary include of atomic header file Joyce Kong
2021-11-17 10:02 ` [PATCH v3 00/12] use compiler atomic builtins for app modules David Marchand