DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test
@ 2021-06-04  9:46 Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 1/8] test/ticketlock: use GCC atomic builtins for lcores sync Joyce Kong
                   ` (10 more replies)
  0 siblings, 11 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-04  9:46 UTC (permalink / raw)
  To: thomas, david.marchand, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Since C11 memory model is adopted in DPDK now[1], use GCC's 
atomic builtins in test cases.

[1]https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/

Joyce Kong (8):
  test/ticketlock: use GCC atomic builtins for lcores sync
  test/spinlock: use GCC atomic builtins for lcores sync
  test/rwlock: use GCC atomic builtins for lcores sync
  test/mcslock: use GCC atomic builtins for lcores sync
  test/mempool: remove unused variable for lcores sync
  test/mempool_perf: use GCC atomic builtins for lcores sync
  test/service_cores: use GCC atomic builtins for lock sync
  test/rcu_perf: use GCC atomic builtins for data sync

 app/test/test_mcslock.c       | 13 +++--
 app/test/test_mempool.c       |  5 --
 app/test/test_mempool_perf.c  | 12 ++---
 app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
 app/test/test_rwlock.c        |  9 ++--
 app/test/test_service_cores.c | 36 +++++++------
 app/test/test_spinlock.c      | 10 ++--
 app/test/test_ticketlock.c    |  9 ++--
 8 files changed, 93 insertions(+), 99 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v1 1/8] test/ticketlock: use GCC atomic builtins for lcores sync
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
@ 2021-06-04  9:46 ` Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 2/8] test/spinlock: " Joyce Kong
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-04  9:46 UTC (permalink / raw)
  To: thomas, david.marchand, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lcores sync
in ticketlock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
---
 app/test/test_ticketlock.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_ticketlock.c b/app/test/test_ticketlock.c
index 7aab8665b..9aa212fa9 100644
--- a/app/test/test_ticketlock.c
+++ b/app/test/test_ticketlock.c
@@ -9,7 +9,6 @@
 #include <sys/queue.h>
 #include <unistd.h>
 
-#include <rte_atomic.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_eal.h>
@@ -49,7 +48,7 @@ static rte_ticketlock_t tl_tab[RTE_MAX_LCORE];
 static rte_ticketlock_recursive_t tlr;
 static unsigned int count;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_ticketlock_per_core(__rte_unused void *arg)
@@ -112,7 +111,7 @@ load_loop_fn(void *func_param)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
+		while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
 			;
 
 	begin = rte_rdtsc_precise();
@@ -155,11 +154,11 @@ test_ticketlock_perf(void)
 	printf("\nTest with lock on %u cores...\n", rte_lcore_count());
 
 	/* Clear synchro and start workers */
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v1 2/8] test/spinlock: use GCC atomic builtins for lcores sync
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 1/8] test/ticketlock: use GCC atomic builtins for lcores sync Joyce Kong
@ 2021-06-04  9:46 ` Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 3/8] test/rwlock: " Joyce Kong
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-04  9:46 UTC (permalink / raw)
  To: thomas, david.marchand, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lcores sync
in spinlock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_spinlock.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/app/test/test_spinlock.c b/app/test/test_spinlock.c
index 054fb43a9..77b9b7086 100644
--- a/app/test/test_spinlock.c
+++ b/app/test/test_spinlock.c
@@ -17,7 +17,6 @@
 #include <rte_lcore.h>
 #include <rte_cycles.h>
 #include <rte_spinlock.h>
-#include <rte_atomic.h>
 
 #include "test.h"
 
@@ -49,7 +48,7 @@ static rte_spinlock_t sl_tab[RTE_MAX_LCORE];
 static rte_spinlock_recursive_t slr;
 static unsigned count = 0;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_spinlock_per_core(__rte_unused void *arg)
@@ -111,7 +110,8 @@ load_loop_fn(void *func_param)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0);
+		while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
+			;
 
 	begin = rte_get_timer_cycles();
 	while (lcount < MAX_LOOP) {
@@ -150,11 +150,11 @@ test_spinlock_perf(void)
 	printf("\nTest with lock on %u cores...\n", rte_lcore_count());
 
 	/* Clear synchro and start workers */
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v1 3/8] test/rwlock: use GCC atomic builtins for lcores sync
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 1/8] test/ticketlock: use GCC atomic builtins for lcores sync Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 2/8] test/spinlock: " Joyce Kong
@ 2021-06-04  9:46 ` Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 4/8] test/mcslock: " Joyce Kong
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-04  9:46 UTC (permalink / raw)
  To: thomas, david.marchand, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lcores sync
in rwlock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_rwlock.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_rwlock.c b/app/test/test_rwlock.c
index b47150a86..ef89ae44c 100644
--- a/app/test/test_rwlock.c
+++ b/app/test/test_rwlock.c
@@ -13,7 +13,6 @@
 #include <rte_memory.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_rwlock.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
@@ -36,7 +35,7 @@
 
 static rte_rwlock_t sl;
 static rte_rwlock_t sl_tab[RTE_MAX_LCORE];
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 enum {
 	LC_TYPE_RDLOCK,
@@ -102,7 +101,7 @@ load_loop_fn(__rte_unused void *arg)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
+		while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
 			;
 
 	begin = rte_rdtsc_precise();
@@ -136,12 +135,12 @@ test_rwlock_perf(void)
 	printf("\nRwlock Perf Test on %u cores...\n", rte_lcore_count());
 
 	/* clear synchro and start workers */
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	if (rte_eal_mp_remote_launch(load_loop_fn, NULL, SKIP_MAIN) < 0)
 		return -1;
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(NULL);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v1 4/8] test/mcslock: use GCC atomic builtins for lcores sync
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
                   ` (2 preceding siblings ...)
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 3/8] test/rwlock: " Joyce Kong
@ 2021-06-04  9:46 ` Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 5/8] test/mempool: remove unused variable " Joyce Kong
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-04  9:46 UTC (permalink / raw)
  To: thomas, david.marchand, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lcores sync
in mcslock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_mcslock.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
index 80eaecc90..e6bdeb966 100644
--- a/app/test/test_mcslock.c
+++ b/app/test/test_mcslock.c
@@ -17,7 +17,6 @@
 #include <rte_lcore.h>
 #include <rte_cycles.h>
 #include <rte_mcslock.h>
-#include <rte_atomic.h>
 
 #include "test.h"
 
@@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
 
 static unsigned int count;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_mcslock_per_core(__rte_unused void *arg)
@@ -76,7 +75,7 @@ load_loop_fn(void *func_param)
 	rte_mcslock_t ml_perf_me;
 
 	/* wait synchro */
-	while (rte_atomic32_read(&synchro) == 0)
+	while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
 		;
 
 	begin = rte_get_timer_cycles();
@@ -102,15 +101,15 @@ test_mcslock_perf(void)
 	const unsigned int lcore = rte_lcore_id();
 
 	printf("\nTest with no lock on single core...\n");
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 	printf("Core [%u] Cost Time = %"PRIu64" us\n",
 			lcore, time_count[lcore]);
 	memset(time_count, 0, sizeof(time_count));
 
 	printf("\nTest with lock on single core...\n");
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	lock = 1;
-	rte_atomic32_set(&synchro, 1);
 	load_loop_fn(&lock);
 	printf("Core [%u] Cost Time = %"PRIu64" us\n",
 			lcore, time_count[lcore]);
@@ -118,11 +117,11 @@ test_mcslock_perf(void)
 
 	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
 
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v1 5/8] test/mempool: remove unused variable for lcores sync
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
                   ` (3 preceding siblings ...)
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 4/8] test/mcslock: " Joyce Kong
@ 2021-06-04  9:46 ` Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 6/8] test/mempool_perf: use GCC atomic builtins " Joyce Kong
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-04  9:46 UTC (permalink / raw)
  To: thomas, david.marchand, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Remove the unused synchro variable as there is no lcores
sync in mempool function test.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_mempool.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 3adadd673..7675a3e60 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_spinlock.h>
@@ -57,8 +56,6 @@
 		goto label;						\
 	} while (0)
 
-static rte_atomic32_t synchro;
-
 /*
  * save the object number in the first 4 bytes of object data. All
  * other bytes are set to 0.
@@ -491,8 +488,6 @@ test_mempool(void)
 	};
 	const char *default_pool_ops = rte_mbuf_best_mempool_ops();
 
-	rte_atomic32_init(&synchro);
-
 	/* create a mempool (without cache) */
 	mp_nocache = rte_mempool_create("test_nocache", MEMPOOL_SIZE,
 		MEMPOOL_ELT_SIZE, 0, 0,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v1 6/8] test/mempool_perf: use GCC atomic builtins for lcores sync
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
                   ` (4 preceding siblings ...)
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 5/8] test/mempool: remove unused variable " Joyce Kong
@ 2021-06-04  9:46 ` Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 7/8] test/service_cores: use GCC atomic builtins for lock sync Joyce Kong
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-04  9:46 UTC (permalink / raw)
  To: thomas, david.marchand, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lcores sync
in mempool_perf testcases. Meanwhile, remove unnecessary synchro
init as it would be set to 0 when launching cores.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_mempool_perf.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index d7d0aaa33..9271378aa 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_spinlock.h>
@@ -83,7 +82,7 @@
 static int use_external_cache;
 static unsigned external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 /* number of objects in one bulk operation (get or put) */
 static unsigned n_get_bulk;
@@ -145,7 +144,8 @@ per_lcore_mempool_test(void *arg)
 
 	/* wait synchro for workers */
 	if (lcore_id != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0);
+		while (__atomic_load_n(&synchro, __ATOMIC_RELAXED) == 0)
+			;
 
 	start_cycles = rte_get_timer_cycles();
 
@@ -198,7 +198,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 	int ret;
 	unsigned cores_save = cores;
 
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 
 	/* reset stats */
 	memset(stats, 0, sizeof(stats));
@@ -223,7 +223,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 	}
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 
 	ret = per_lcore_mempool_test(mp);
 
@@ -288,8 +288,6 @@ test_mempool_perf(void)
 	const char *default_pool_ops;
 	int ret = -1;
 
-	rte_atomic32_init(&synchro);
-
 	/* create a mempool (without cache) */
 	mp_nocache = rte_mempool_create("perf_test_nocache", MEMPOOL_SIZE,
 					MEMPOOL_ELT_SIZE, 0, 0,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v1 7/8] test/service_cores: use GCC atomic builtins for lock sync
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
                   ` (5 preceding siblings ...)
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 6/8] test/mempool_perf: use GCC atomic builtins " Joyce Kong
@ 2021-06-04  9:46 ` Joyce Kong
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 8/8] test/rcu_perf: use GCC atomic builtins for data sync Joyce Kong
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-04  9:46 UTC (permalink / raw)
  To: thomas, david.marchand, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lock sync
in service core testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_service_cores.c | 36 +++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index 37d7172d5..9d908d44e 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -53,18 +53,20 @@ static int32_t dummy_cb(void *args)
 static int32_t dummy_mt_unsafe_cb(void *args)
 {
 	/* before running test, the initialization has set pass_test to 1.
-	 * If the cmpset in service-cores is working correctly, the code here
+	 * If the cas in service-cores is working correctly, the code here
 	 * should never fail to take the lock. If the lock *is* taken, fail the
 	 * test, because two threads are concurrently in a non-MT safe callback.
 	 */
 	uint32_t *test_params = args;
-	uint32_t *atomic_lock = &test_params[0];
+	uint32_t *lock = &test_params[0];
 	uint32_t *pass_test = &test_params[1];
-	int lock_taken = rte_atomic32_cmpset(atomic_lock, 0, 1);
+	uint32_t exp = 0;
+	int lock_taken = __atomic_compare_exchange_n(lock, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED);
 	if (lock_taken) {
 		/* delay with the lock held */
 		rte_delay_ms(250);
-		rte_atomic32_clear((rte_atomic32_t *)atomic_lock);
+		__atomic_store_n(lock, 0, __ATOMIC_RELAXED);
 	} else {
 		/* 2nd thread will fail to take lock, so set pass flag */
 		*pass_test = 0;
@@ -83,13 +85,15 @@ static int32_t dummy_mt_safe_cb(void *args)
 	 *    that 2 threads are running the callback at the same time: MT safe
 	 */
 	uint32_t *test_params = args;
-	uint32_t *atomic_lock = &test_params[0];
+	uint32_t *lock = &test_params[0];
 	uint32_t *pass_test = &test_params[1];
-	int lock_taken = rte_atomic32_cmpset(atomic_lock, 0, 1);
+	uint32_t exp = 0;
+	int lock_taken = __atomic_compare_exchange_n(lock, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED);
 	if (lock_taken) {
 		/* delay with the lock held */
 		rte_delay_ms(250);
-		rte_atomic32_clear((rte_atomic32_t *)atomic_lock);
+		__atomic_store_n(lock, 0, __ATOMIC_RELAXED);
 	} else {
 		/* 2nd thread will fail to take lock, so set pass flag */
 		*pass_test = 1;
@@ -622,9 +626,9 @@ service_threaded_test(int mt_safe)
 	TEST_ASSERT_EQUAL(0, rte_service_lcore_add(slcore_2),
 			"mt safe lcore add fail");
 
-	/* Use atomic locks to verify that two threads are in the same function
-	 * at the same time. These are passed to the unit tests through the
-	 * callback userdata parameter
+	/* Use locks to verify that two threads are in the same function
+	 * at the same time. These are passed to the unit tests through
+	 * the callback userdata parameter.
 	 */
 	uint32_t test_params[2];
 	memset(test_params, 0, sizeof(uint32_t) * 2);
@@ -713,7 +717,7 @@ service_mt_safe_poll(void)
 }
 
 /* tests a NON mt safe service with two cores, the callback is serialized
- * using the atomic cmpset.
+ * using the cas.
  */
 static int
 service_mt_unsafe_poll(void)
@@ -735,17 +739,17 @@ delay_as_a_mt_safe_service(void *args)
 	RTE_SET_USED(args);
 	uint32_t *params = args;
 
-	/* retrieve done flag and atomic lock to inc/dec */
+	/* retrieve done flag and lock to add/sub */
 	uint32_t *done = &params[0];
-	rte_atomic32_t *lock = (rte_atomic32_t *)&params[1];
+	uint32_t *lock = &params[1];
 
 	while (!*done) {
-		rte_atomic32_inc(lock);
+		__atomic_add_fetch(lock, 1, __ATOMIC_RELAXED);
 		rte_delay_us(500);
-		if (rte_atomic32_read(lock) > 1)
+		if (__atomic_load_n(lock, __ATOMIC_RELAXED) > 1)
 			/* pass: second core has simultaneously incremented */
 			*done = 1;
-		rte_atomic32_dec(lock);
+		__atomic_sub_fetch(lock, 1, __ATOMIC_RELAXED);
 	}
 
 	return 0;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v1 8/8] test/rcu_perf: use GCC atomic builtins for data sync
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
                   ` (6 preceding siblings ...)
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 7/8] test/service_cores: use GCC atomic builtins for lock sync Joyce Kong
@ 2021-06-04  9:46 ` Joyce Kong
  2021-06-04 19:57 ` [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Stephen Hemminger
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-04  9:46 UTC (permalink / raw)
  To: thomas, david.marchand, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Covert rte_atomic usages to GCC atomic builtins in rcu_perf
testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
index 3017e7112..cf7b158d2 100644
--- a/app/test/test_rcu_qsbr_perf.c
+++ b/app/test/test_rcu_qsbr_perf.c
@@ -30,8 +30,8 @@ static volatile uint32_t thr_id;
 static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
 static struct rte_hash *h;
 static char hash_name[8];
-static rte_atomic64_t updates, checks;
-static rte_atomic64_t update_cycles, check_cycles;
+static uint64_t updates, checks;
+static uint64_t update_cycles, check_cycles;
 
 /* Scale down results to 1000 operations to support lower
  * granularity clocks.
@@ -81,8 +81,8 @@ test_rcu_qsbr_reader_perf(void *arg)
 	}
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&update_cycles, cycles);
-	rte_atomic64_add(&updates, loop_cnt);
+	__atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);
 
 	/* Make the thread offline */
 	rte_rcu_qsbr_thread_offline(t[0], thread_id);
@@ -113,8 +113,8 @@ test_rcu_qsbr_writer_perf(void *arg)
 	} while (loop_cnt < 20000000);
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&check_cycles, cycles);
-	rte_atomic64_add(&checks, loop_cnt);
+	__atomic_fetch_add(&check_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&checks, loop_cnt, __ATOMIC_RELAXED);
 	return 0;
 }
 
@@ -130,10 +130,10 @@ test_rcu_qsbr_perf(void)
 
 	writer_done = 0;
 
-	rte_atomic64_clear(&updates);
-	rte_atomic64_clear(&update_cycles);
-	rte_atomic64_clear(&checks);
-	rte_atomic64_clear(&check_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	printf("\nPerf Test: %d Readers/1 Writer('wait' in qsbr_check == true)\n",
 		num_cores - 1);
@@ -168,15 +168,15 @@ test_rcu_qsbr_perf(void)
 	rte_eal_mp_wait_lcore();
 
 	printf("Total quiescent state updates = %"PRIi64"\n",
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 	printf("Cycles per %d quiescent state updates: %"PRIi64"\n",
 		RCU_SCALE_DOWN,
-		rte_atomic64_read(&update_cycles) /
-		(rte_atomic64_read(&updates) / RCU_SCALE_DOWN));
-	printf("Total RCU checks = %"PRIi64"\n", rte_atomic64_read(&checks));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&updates, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
+	printf("Total RCU checks = %"PRIi64"\n", __atomic_load_n(&checks, __ATOMIC_RELAXED));
 	printf("Cycles per %d checks: %"PRIi64"\n", RCU_SCALE_DOWN,
-		rte_atomic64_read(&check_cycles) /
-		(rte_atomic64_read(&checks) / RCU_SCALE_DOWN));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&checks, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
 	rte_free(t[0]);
 
@@ -193,8 +193,8 @@ test_rcu_qsbr_rperf(void)
 	size_t sz;
 	unsigned int i, tmp_num_cores;
 
-	rte_atomic64_clear(&updates);
-	rte_atomic64_clear(&update_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
 
 	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -220,11 +220,11 @@ test_rcu_qsbr_rperf(void)
 	rte_eal_mp_wait_lcore();
 
 	printf("Total quiescent state updates = %"PRIi64"\n",
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 	printf("Cycles per %d quiescent state updates: %"PRIi64"\n",
 		RCU_SCALE_DOWN,
-		rte_atomic64_read(&update_cycles) /
-		(rte_atomic64_read(&updates) / RCU_SCALE_DOWN));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&updates, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
 	rte_free(t[0]);
 
@@ -241,8 +241,8 @@ test_rcu_qsbr_wperf(void)
 	size_t sz;
 	unsigned int i;
 
-	rte_atomic64_clear(&checks);
-	rte_atomic64_clear(&check_cycles);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -266,10 +266,10 @@ test_rcu_qsbr_wperf(void)
 	/* Wait until all readers have exited */
 	rte_eal_mp_wait_lcore();
 
-	printf("Total RCU checks = %"PRIi64"\n", rte_atomic64_read(&checks));
+	printf("Total RCU checks = %"PRIi64"\n", __atomic_load_n(&checks, __ATOMIC_RELAXED));
 	printf("Cycles per %d checks: %"PRIi64"\n", RCU_SCALE_DOWN,
-		rte_atomic64_read(&check_cycles) /
-		(rte_atomic64_read(&checks) / RCU_SCALE_DOWN));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&checks, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
 	rte_free(t[0]);
 
@@ -317,8 +317,8 @@ test_rcu_qsbr_hash_reader(void *arg)
 	} while (!writer_done);
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&update_cycles, cycles);
-	rte_atomic64_add(&updates, loop_cnt);
+	__atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);
 
 	rte_rcu_qsbr_thread_unregister(temp, thread_id);
 
@@ -389,10 +389,10 @@ test_rcu_qsbr_sw_sv_1qs(void)
 
 	writer_done = 0;
 
-	rte_atomic64_clear(&updates);
-	rte_atomic64_clear(&update_cycles);
-	rte_atomic64_clear(&checks);
-	rte_atomic64_clear(&check_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -453,8 +453,8 @@ test_rcu_qsbr_sw_sv_1qs(void)
 	}
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&check_cycles, cycles);
-	rte_atomic64_add(&checks, i);
+	__atomic_fetch_add(&check_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&checks, i, __ATOMIC_RELAXED);
 
 	writer_done = 1;
 
@@ -467,12 +467,12 @@ test_rcu_qsbr_sw_sv_1qs(void)
 
 	printf("Following numbers include calls to rte_hash functions\n");
 	printf("Cycles per 1 quiescent state update(online/update/offline): %"PRIi64"\n",
-		rte_atomic64_read(&update_cycles) /
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 
 	printf("Cycles per 1 check(start, check): %"PRIi64"\n\n",
-		rte_atomic64_read(&check_cycles) /
-		rte_atomic64_read(&checks));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&checks, __ATOMIC_RELAXED));
 
 	rte_free(t[0]);
 
@@ -511,7 +511,7 @@ test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 
 	printf("Perf test: 1 writer, %d readers, 1 QSBR variable, 1 QSBR Query, Non-Blocking QSBR check\n", num_cores);
 
-	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+	__atomic_store_n(&thr_id, 0, __ATOMIC_RELAXED);
 
 	if (all_registered == 1)
 		tmp_num_cores = num_cores;
@@ -570,8 +570,8 @@ test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 	}
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&check_cycles, cycles);
-	rte_atomic64_add(&checks, i);
+	__atomic_fetch_add(&check_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&checks, i, __ATOMIC_RELAXED);
 
 	writer_done = 1;
 	/* Wait and check return value from reader threads */
@@ -583,12 +583,12 @@ test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 
 	printf("Following numbers include calls to rte_hash functions\n");
 	printf("Cycles per 1 quiescent state update(online/update/offline): %"PRIi64"\n",
-		rte_atomic64_read(&update_cycles) /
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 
 	printf("Cycles per 1 check(start, check): %"PRIi64"\n\n",
-		rte_atomic64_read(&check_cycles) /
-		rte_atomic64_read(&checks));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&checks, __ATOMIC_RELAXED));
 
 	rte_free(t[0]);
 
@@ -619,10 +619,10 @@ test_rcu_qsbr_main(void)
 		return TEST_SKIPPED;
 	}
 
-	rte_atomic64_init(&updates);
-	rte_atomic64_init(&update_cycles);
-	rte_atomic64_init(&checks);
-	rte_atomic64_init(&check_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	num_cores = 0;
 	RTE_LCORE_FOREACH_WORKER(core_id) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
                   ` (7 preceding siblings ...)
  2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 8/8] test/rcu_perf: use GCC atomic builtins for data sync Joyce Kong
@ 2021-06-04 19:57 ` Stephen Hemminger
  2021-06-11  8:40 ` David Marchand
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
  10 siblings, 0 replies; 49+ messages in thread
From: Stephen Hemminger @ 2021-06-04 19:57 UTC (permalink / raw)
  To: Joyce Kong
  Cc: thomas, david.marchand, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang, dev, nd

On Fri,  4 Jun 2021 04:46:16 -0500
Joyce Kong <joyce.kong@arm.com> wrote:

> Since C11 memory model is adopted in DPDK now[1], use GCC's 
> atomic builtins in test cases.
> 
> [1]https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
> 
> Joyce Kong (8):
>   test/ticketlock: use GCC atomic builtins for lcores sync
>   test/spinlock: use GCC atomic builtins for lcores sync
>   test/rwlock: use GCC atomic builtins for lcores sync
>   test/mcslock: use GCC atomic builtins for lcores sync
>   test/mempool: remove unused variable for lcores sync
>   test/mempool_perf: use GCC atomic builtins for lcores sync
>   test/service_cores: use GCC atomic builtins for lock sync
>   test/rcu_perf: use GCC atomic builtins for data sync
> 
>  app/test/test_mcslock.c       | 13 +++--
>  app/test/test_mempool.c       |  5 --
>  app/test/test_mempool_perf.c  | 12 ++---
>  app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
>  app/test/test_rwlock.c        |  9 ++--
>  app/test/test_service_cores.c | 36 +++++++------
>  app/test/test_spinlock.c      | 10 ++--
>  app/test/test_ticketlock.c    |  9 ++--
>  8 files changed, 93 insertions(+), 99 deletions(-)
> 

Thanks, I did this for pflock tests during review cycle

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
                   ` (8 preceding siblings ...)
  2021-06-04 19:57 ` [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Stephen Hemminger
@ 2021-06-11  8:40 ` David Marchand
  2021-06-11 10:45   ` Joyce Kong
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
  10 siblings, 1 reply; 49+ messages in thread
From: David Marchand @ 2021-06-11  8:40 UTC (permalink / raw)
  To: Joyce Kong
  Cc: Thomas Monjalon, Olivier Matz, Andrew Rybchenko,
	Van Haaren Harry, Honnappa Nagarahalli,
	Ruifeng Wang (Arm Technology China),
	dev, nd

On Fri, Jun 4, 2021 at 11:46 AM Joyce Kong <joyce.kong@arm.com> wrote:
>
> Since C11 memory model is adopted in DPDK now[1], use GCC's
> atomic builtins in test cases.
>
> [1]https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
>
> Joyce Kong (8):
>   test/ticketlock: use GCC atomic builtins for lcores sync
>   test/spinlock: use GCC atomic builtins for lcores sync
>   test/rwlock: use GCC atomic builtins for lcores sync
>   test/mcslock: use GCC atomic builtins for lcores sync
>   test/mempool: remove unused variable for lcores sync
>   test/mempool_perf: use GCC atomic builtins for lcores sync
>   test/service_cores: use GCC atomic builtins for lock sync
>   test/rcu_perf: use GCC atomic builtins for data sync

Just a nit, use rte_wait_until_equal() instead of those sync loops.
Rest lgtm, thanks.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test
  2021-06-11  8:40 ` David Marchand
@ 2021-06-11 10:45   ` Joyce Kong
  0 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-11 10:45 UTC (permalink / raw)
  To: David Marchand
  Cc: thomas, Olivier Matz, Andrew Rybchenko, Van Haaren Harry,
	Honnappa Nagarahalli, Ruifeng Wang, dev, nd

<snip>
> >
> > Since C11 memory model is adopted in DPDK now[1], use GCC's atomic
> > builtins in test cases.
> >
> > [1]https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-
> mod
> > el/
> >
> > Joyce Kong (8):
> >   test/ticketlock: use GCC atomic builtins for lcores sync
> >   test/spinlock: use GCC atomic builtins for lcores sync
> >   test/rwlock: use GCC atomic builtins for lcores sync
> >   test/mcslock: use GCC atomic builtins for lcores sync
> >   test/mempool: remove unused variable for lcores sync
> >   test/mempool_perf: use GCC atomic builtins for lcores sync
> >   test/service_cores: use GCC atomic builtins for lock sync
> >   test/rcu_perf: use GCC atomic builtins for data sync
> 
> Just a nit, use rte_wait_until_equal() instead of those sync loops.
> Rest lgtm, thanks.
> 

Thanks for your comment, I'll do the related changes.
> 
> --
> David Marchand


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
                   ` (9 preceding siblings ...)
  2021-06-11  8:40 ` David Marchand
@ 2021-06-16  2:54 ` Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 1/8] test/ticketlock: use GCC atomic builtins for lcores sync Joyce Kong
                     ` (9 more replies)
  10 siblings, 10 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-16  2:54 UTC (permalink / raw)
  To: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Since C11 memory model is adopted in DPDK now[1], use GCC's 
atomic builtins in test cases.

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/

v2:
 Use rte_wait_until_equal() instead of original sync loops.
 <David Marchand>

v1:
 The initial version.

Joyce Kong (8):
  test/ticketlock: use GCC atomic builtins for lcores sync
  test/spinlock: use GCC atomic builtins for lcores sync
  test/rwlock: use GCC atomic builtins for lcores sync
  test/mcslock: use GCC atomic builtins for lcores sync
  test/mempool: remove unused variable for lcores sync
  test/mempool_perf: use GCC atomic builtins for lcores sync
  test/service_cores: use GCC atomic builtins for lock sync
  test/rcu: use GCC atomic builtins for data sync

 app/test/test_mcslock.c       | 14 +++--
 app/test/test_mempool.c       |  5 --
 app/test/test_mempool_perf.c  | 11 ++--
 app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
 app/test/test_rwlock.c        | 10 ++--
 app/test/test_service_cores.c | 36 +++++++------
 app/test/test_spinlock.c      |  9 ++--
 app/test/test_ticketlock.c    | 10 ++--
 8 files changed, 91 insertions(+), 102 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v2 1/8] test/ticketlock: use GCC atomic builtins for lcores sync
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
@ 2021-06-16  2:54   ` Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 2/8] test/spinlock: " Joyce Kong
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-16  2:54 UTC (permalink / raw)
  To: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lcores sync
in ticketlock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_ticketlock.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/app/test/test_ticketlock.c b/app/test/test_ticketlock.c
index 7aab8665b..242c13647 100644
--- a/app/test/test_ticketlock.c
+++ b/app/test/test_ticketlock.c
@@ -9,7 +9,6 @@
 #include <sys/queue.h>
 #include <unistd.h>
 
-#include <rte_atomic.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_eal.h>
@@ -49,7 +48,7 @@ static rte_ticketlock_t tl_tab[RTE_MAX_LCORE];
 static rte_ticketlock_recursive_t tlr;
 static unsigned int count;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_ticketlock_per_core(__rte_unused void *arg)
@@ -112,8 +111,7 @@ load_loop_fn(void *func_param)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			;
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_rdtsc_precise();
 	while (lcore_count[lcore] < MAX_LOOP) {
@@ -155,11 +153,11 @@ test_ticketlock_perf(void)
 	printf("\nTest with lock on %u cores...\n", rte_lcore_count());
 
 	/* Clear synchro and start workers */
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v2 2/8] test/spinlock: use GCC atomic builtins for lcores sync
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 1/8] test/ticketlock: use GCC atomic builtins for lcores sync Joyce Kong
@ 2021-06-16  2:54   ` Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 3/8] test/rwlock: " Joyce Kong
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-16  2:54 UTC (permalink / raw)
  To: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lcores sync
in spinlock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_spinlock.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_spinlock.c b/app/test/test_spinlock.c
index 054fb43a9..3f5937230 100644
--- a/app/test/test_spinlock.c
+++ b/app/test/test_spinlock.c
@@ -17,7 +17,6 @@
 #include <rte_lcore.h>
 #include <rte_cycles.h>
 #include <rte_spinlock.h>
-#include <rte_atomic.h>
 
 #include "test.h"
 
@@ -49,7 +48,7 @@ static rte_spinlock_t sl_tab[RTE_MAX_LCORE];
 static rte_spinlock_recursive_t slr;
 static unsigned count = 0;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_spinlock_per_core(__rte_unused void *arg)
@@ -111,7 +110,7 @@ load_loop_fn(void *func_param)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0);
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (lcount < MAX_LOOP) {
@@ -150,11 +149,11 @@ test_spinlock_perf(void)
 	printf("\nTest with lock on %u cores...\n", rte_lcore_count());
 
 	/* Clear synchro and start workers */
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v2 3/8] test/rwlock: use GCC atomic builtins for lcores sync
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 1/8] test/ticketlock: use GCC atomic builtins for lcores sync Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 2/8] test/spinlock: " Joyce Kong
@ 2021-06-16  2:54   ` Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 4/8] test/mcslock: " Joyce Kong
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-16  2:54 UTC (permalink / raw)
  To: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lcores sync
in rwlock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_rwlock.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/app/test/test_rwlock.c b/app/test/test_rwlock.c
index b47150a86..f2d1c8883 100644
--- a/app/test/test_rwlock.c
+++ b/app/test/test_rwlock.c
@@ -13,7 +13,6 @@
 #include <rte_memory.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_rwlock.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
@@ -36,7 +35,7 @@
 
 static rte_rwlock_t sl;
 static rte_rwlock_t sl_tab[RTE_MAX_LCORE];
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 enum {
 	LC_TYPE_RDLOCK,
@@ -102,8 +101,7 @@ load_loop_fn(__rte_unused void *arg)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			;
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_rdtsc_precise();
 	while (lcount < MAX_LOOP) {
@@ -136,12 +134,12 @@ test_rwlock_perf(void)
 	printf("\nRwlock Perf Test on %u cores...\n", rte_lcore_count());
 
 	/* clear synchro and start workers */
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	if (rte_eal_mp_remote_launch(load_loop_fn, NULL, SKIP_MAIN) < 0)
 		return -1;
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(NULL);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v2 4/8] test/mcslock: use GCC atomic builtins for lcores sync
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
                     ` (2 preceding siblings ...)
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 3/8] test/rwlock: " Joyce Kong
@ 2021-06-16  2:54   ` Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 5/8] test/mempool: remove unused variable " Joyce Kong
                     ` (5 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-16  2:54 UTC (permalink / raw)
  To: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lcores sync
in mcslock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_mcslock.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
index 80eaecc90..52e45e7e2 100644
--- a/app/test/test_mcslock.c
+++ b/app/test/test_mcslock.c
@@ -17,7 +17,6 @@
 #include <rte_lcore.h>
 #include <rte_cycles.h>
 #include <rte_mcslock.h>
-#include <rte_atomic.h>
 
 #include "test.h"
 
@@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
 
 static unsigned int count;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_mcslock_per_core(__rte_unused void *arg)
@@ -76,8 +75,7 @@ load_loop_fn(void *func_param)
 	rte_mcslock_t ml_perf_me;
 
 	/* wait synchro */
-	while (rte_atomic32_read(&synchro) == 0)
-		;
+	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (lcount < MAX_LOOP) {
@@ -102,15 +100,15 @@ test_mcslock_perf(void)
 	const unsigned int lcore = rte_lcore_id();
 
 	printf("\nTest with no lock on single core...\n");
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 	printf("Core [%u] Cost Time = %"PRIu64" us\n",
 			lcore, time_count[lcore]);
 	memset(time_count, 0, sizeof(time_count));
 
 	printf("\nTest with lock on single core...\n");
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	lock = 1;
-	rte_atomic32_set(&synchro, 1);
 	load_loop_fn(&lock);
 	printf("Core [%u] Cost Time = %"PRIu64" us\n",
 			lcore, time_count[lcore]);
@@ -118,11 +116,11 @@ test_mcslock_perf(void)
 
 	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
 
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v2 5/8] test/mempool: remove unused variable for lcores sync
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
                     ` (3 preceding siblings ...)
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 4/8] test/mcslock: " Joyce Kong
@ 2021-06-16  2:54   ` Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 6/8] test/mempool_perf: use GCC atomic builtins " Joyce Kong
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-16  2:54 UTC (permalink / raw)
  To: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Remove the unused synchro variable as there is no lcores
sync in mempool function test.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_mempool.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 3adadd673..7675a3e60 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_spinlock.h>
@@ -57,8 +56,6 @@
 		goto label;						\
 	} while (0)
 
-static rte_atomic32_t synchro;
-
 /*
  * save the object number in the first 4 bytes of object data. All
  * other bytes are set to 0.
@@ -491,8 +488,6 @@ test_mempool(void)
 	};
 	const char *default_pool_ops = rte_mbuf_best_mempool_ops();
 
-	rte_atomic32_init(&synchro);
-
 	/* create a mempool (without cache) */
 	mp_nocache = rte_mempool_create("test_nocache", MEMPOOL_SIZE,
 		MEMPOOL_ELT_SIZE, 0, 0,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v2 6/8] test/mempool_perf: use GCC atomic builtins for lcores sync
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
                     ` (4 preceding siblings ...)
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 5/8] test/mempool: remove unused variable " Joyce Kong
@ 2021-06-16  2:54   ` Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 7/8] test/service_cores: use GCC atomic builtins for lock sync Joyce Kong
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-16  2:54 UTC (permalink / raw)
  To: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lcores sync
in mempool_perf testcases. Meanwhile, remove unnecessary synchro
init as it would be set to 0 when launching cores.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_mempool_perf.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index d7d0aaa33..8f629736e 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_spinlock.h>
@@ -83,7 +82,7 @@
 static int use_external_cache;
 static unsigned external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 /* number of objects in one bulk operation (get or put) */
 static unsigned n_get_bulk;
@@ -145,7 +144,7 @@ per_lcore_mempool_test(void *arg)
 
 	/* wait synchro for workers */
 	if (lcore_id != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0);
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	start_cycles = rte_get_timer_cycles();
 
@@ -198,7 +197,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 	int ret;
 	unsigned cores_save = cores;
 
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 
 	/* reset stats */
 	memset(stats, 0, sizeof(stats));
@@ -223,7 +222,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 	}
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 
 	ret = per_lcore_mempool_test(mp);
 
@@ -288,8 +287,6 @@ test_mempool_perf(void)
 	const char *default_pool_ops;
 	int ret = -1;
 
-	rte_atomic32_init(&synchro);
-
 	/* create a mempool (without cache) */
 	mp_nocache = rte_mempool_create("perf_test_nocache", MEMPOOL_SIZE,
 					MEMPOOL_ELT_SIZE, 0, 0,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v2 7/8] test/service_cores: use GCC atomic builtins for lock sync
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
                     ` (5 preceding siblings ...)
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 6/8] test/mempool_perf: use GCC atomic builtins " Joyce Kong
@ 2021-06-16  2:54   ` Joyce Kong
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 8/8] test/rcu: use GCC atomic builtins for data sync Joyce Kong
                     ` (2 subsequent siblings)
  9 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-16  2:54 UTC (permalink / raw)
  To: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to GCC atomic builtins for lock sync
in service core testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_service_cores.c | 36 +++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index 37d7172d5..9d908d44e 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -53,18 +53,20 @@ static int32_t dummy_cb(void *args)
 static int32_t dummy_mt_unsafe_cb(void *args)
 {
 	/* before running test, the initialization has set pass_test to 1.
-	 * If the cmpset in service-cores is working correctly, the code here
+	 * If the cas in service-cores is working correctly, the code here
 	 * should never fail to take the lock. If the lock *is* taken, fail the
 	 * test, because two threads are concurrently in a non-MT safe callback.
 	 */
 	uint32_t *test_params = args;
-	uint32_t *atomic_lock = &test_params[0];
+	uint32_t *lock = &test_params[0];
 	uint32_t *pass_test = &test_params[1];
-	int lock_taken = rte_atomic32_cmpset(atomic_lock, 0, 1);
+	uint32_t exp = 0;
+	int lock_taken = __atomic_compare_exchange_n(lock, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED);
 	if (lock_taken) {
 		/* delay with the lock held */
 		rte_delay_ms(250);
-		rte_atomic32_clear((rte_atomic32_t *)atomic_lock);
+		__atomic_store_n(lock, 0, __ATOMIC_RELAXED);
 	} else {
 		/* 2nd thread will fail to take lock, so set pass flag */
 		*pass_test = 0;
@@ -83,13 +85,15 @@ static int32_t dummy_mt_safe_cb(void *args)
 	 *    that 2 threads are running the callback at the same time: MT safe
 	 */
 	uint32_t *test_params = args;
-	uint32_t *atomic_lock = &test_params[0];
+	uint32_t *lock = &test_params[0];
 	uint32_t *pass_test = &test_params[1];
-	int lock_taken = rte_atomic32_cmpset(atomic_lock, 0, 1);
+	uint32_t exp = 0;
+	int lock_taken = __atomic_compare_exchange_n(lock, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED);
 	if (lock_taken) {
 		/* delay with the lock held */
 		rte_delay_ms(250);
-		rte_atomic32_clear((rte_atomic32_t *)atomic_lock);
+		__atomic_store_n(lock, 0, __ATOMIC_RELAXED);
 	} else {
 		/* 2nd thread will fail to take lock, so set pass flag */
 		*pass_test = 1;
@@ -622,9 +626,9 @@ service_threaded_test(int mt_safe)
 	TEST_ASSERT_EQUAL(0, rte_service_lcore_add(slcore_2),
 			"mt safe lcore add fail");
 
-	/* Use atomic locks to verify that two threads are in the same function
-	 * at the same time. These are passed to the unit tests through the
-	 * callback userdata parameter
+	/* Use locks to verify that two threads are in the same function
+	 * at the same time. These are passed to the unit tests through
+	 * the callback userdata parameter.
 	 */
 	uint32_t test_params[2];
 	memset(test_params, 0, sizeof(uint32_t) * 2);
@@ -713,7 +717,7 @@ service_mt_safe_poll(void)
 }
 
 /* tests a NON mt safe service with two cores, the callback is serialized
- * using the atomic cmpset.
+ * using the cas.
  */
 static int
 service_mt_unsafe_poll(void)
@@ -735,17 +739,17 @@ delay_as_a_mt_safe_service(void *args)
 	RTE_SET_USED(args);
 	uint32_t *params = args;
 
-	/* retrieve done flag and atomic lock to inc/dec */
+	/* retrieve done flag and lock to add/sub */
 	uint32_t *done = &params[0];
-	rte_atomic32_t *lock = (rte_atomic32_t *)&params[1];
+	uint32_t *lock = &params[1];
 
 	while (!*done) {
-		rte_atomic32_inc(lock);
+		__atomic_add_fetch(lock, 1, __ATOMIC_RELAXED);
 		rte_delay_us(500);
-		if (rte_atomic32_read(lock) > 1)
+		if (__atomic_load_n(lock, __ATOMIC_RELAXED) > 1)
 			/* pass: second core has simultaneously incremented */
 			*done = 1;
-		rte_atomic32_dec(lock);
+		__atomic_sub_fetch(lock, 1, __ATOMIC_RELAXED);
 	}
 
 	return 0;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v2 8/8] test/rcu: use GCC atomic builtins for data sync
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
                     ` (6 preceding siblings ...)
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 7/8] test/service_cores: use GCC atomic builtins for lock sync Joyce Kong
@ 2021-06-16  2:54   ` Joyce Kong
  2021-06-17 15:21   ` [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test Tyler Retzlaff
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
  9 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-06-16  2:54 UTC (permalink / raw)
  To: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang
  Cc: dev, nd

Covert rte_atomic usages to GCC atomic builtins in rcu_perf
testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
index 3017e7112..cf7b158d2 100644
--- a/app/test/test_rcu_qsbr_perf.c
+++ b/app/test/test_rcu_qsbr_perf.c
@@ -30,8 +30,8 @@ static volatile uint32_t thr_id;
 static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
 static struct rte_hash *h;
 static char hash_name[8];
-static rte_atomic64_t updates, checks;
-static rte_atomic64_t update_cycles, check_cycles;
+static uint64_t updates, checks;
+static uint64_t update_cycles, check_cycles;
 
 /* Scale down results to 1000 operations to support lower
  * granularity clocks.
@@ -81,8 +81,8 @@ test_rcu_qsbr_reader_perf(void *arg)
 	}
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&update_cycles, cycles);
-	rte_atomic64_add(&updates, loop_cnt);
+	__atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);
 
 	/* Make the thread offline */
 	rte_rcu_qsbr_thread_offline(t[0], thread_id);
@@ -113,8 +113,8 @@ test_rcu_qsbr_writer_perf(void *arg)
 	} while (loop_cnt < 20000000);
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&check_cycles, cycles);
-	rte_atomic64_add(&checks, loop_cnt);
+	__atomic_fetch_add(&check_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&checks, loop_cnt, __ATOMIC_RELAXED);
 	return 0;
 }
 
@@ -130,10 +130,10 @@ test_rcu_qsbr_perf(void)
 
 	writer_done = 0;
 
-	rte_atomic64_clear(&updates);
-	rte_atomic64_clear(&update_cycles);
-	rte_atomic64_clear(&checks);
-	rte_atomic64_clear(&check_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	printf("\nPerf Test: %d Readers/1 Writer('wait' in qsbr_check == true)\n",
 		num_cores - 1);
@@ -168,15 +168,15 @@ test_rcu_qsbr_perf(void)
 	rte_eal_mp_wait_lcore();
 
 	printf("Total quiescent state updates = %"PRIi64"\n",
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 	printf("Cycles per %d quiescent state updates: %"PRIi64"\n",
 		RCU_SCALE_DOWN,
-		rte_atomic64_read(&update_cycles) /
-		(rte_atomic64_read(&updates) / RCU_SCALE_DOWN));
-	printf("Total RCU checks = %"PRIi64"\n", rte_atomic64_read(&checks));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&updates, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
+	printf("Total RCU checks = %"PRIi64"\n", __atomic_load_n(&checks, __ATOMIC_RELAXED));
 	printf("Cycles per %d checks: %"PRIi64"\n", RCU_SCALE_DOWN,
-		rte_atomic64_read(&check_cycles) /
-		(rte_atomic64_read(&checks) / RCU_SCALE_DOWN));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&checks, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
 	rte_free(t[0]);
 
@@ -193,8 +193,8 @@ test_rcu_qsbr_rperf(void)
 	size_t sz;
 	unsigned int i, tmp_num_cores;
 
-	rte_atomic64_clear(&updates);
-	rte_atomic64_clear(&update_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
 
 	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -220,11 +220,11 @@ test_rcu_qsbr_rperf(void)
 	rte_eal_mp_wait_lcore();
 
 	printf("Total quiescent state updates = %"PRIi64"\n",
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 	printf("Cycles per %d quiescent state updates: %"PRIi64"\n",
 		RCU_SCALE_DOWN,
-		rte_atomic64_read(&update_cycles) /
-		(rte_atomic64_read(&updates) / RCU_SCALE_DOWN));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&updates, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
 	rte_free(t[0]);
 
@@ -241,8 +241,8 @@ test_rcu_qsbr_wperf(void)
 	size_t sz;
 	unsigned int i;
 
-	rte_atomic64_clear(&checks);
-	rte_atomic64_clear(&check_cycles);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -266,10 +266,10 @@ test_rcu_qsbr_wperf(void)
 	/* Wait until all readers have exited */
 	rte_eal_mp_wait_lcore();
 
-	printf("Total RCU checks = %"PRIi64"\n", rte_atomic64_read(&checks));
+	printf("Total RCU checks = %"PRIi64"\n", __atomic_load_n(&checks, __ATOMIC_RELAXED));
 	printf("Cycles per %d checks: %"PRIi64"\n", RCU_SCALE_DOWN,
-		rte_atomic64_read(&check_cycles) /
-		(rte_atomic64_read(&checks) / RCU_SCALE_DOWN));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&checks, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
 	rte_free(t[0]);
 
@@ -317,8 +317,8 @@ test_rcu_qsbr_hash_reader(void *arg)
 	} while (!writer_done);
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&update_cycles, cycles);
-	rte_atomic64_add(&updates, loop_cnt);
+	__atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);
 
 	rte_rcu_qsbr_thread_unregister(temp, thread_id);
 
@@ -389,10 +389,10 @@ test_rcu_qsbr_sw_sv_1qs(void)
 
 	writer_done = 0;
 
-	rte_atomic64_clear(&updates);
-	rte_atomic64_clear(&update_cycles);
-	rte_atomic64_clear(&checks);
-	rte_atomic64_clear(&check_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -453,8 +453,8 @@ test_rcu_qsbr_sw_sv_1qs(void)
 	}
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&check_cycles, cycles);
-	rte_atomic64_add(&checks, i);
+	__atomic_fetch_add(&check_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&checks, i, __ATOMIC_RELAXED);
 
 	writer_done = 1;
 
@@ -467,12 +467,12 @@ test_rcu_qsbr_sw_sv_1qs(void)
 
 	printf("Following numbers include calls to rte_hash functions\n");
 	printf("Cycles per 1 quiescent state update(online/update/offline): %"PRIi64"\n",
-		rte_atomic64_read(&update_cycles) /
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 
 	printf("Cycles per 1 check(start, check): %"PRIi64"\n\n",
-		rte_atomic64_read(&check_cycles) /
-		rte_atomic64_read(&checks));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&checks, __ATOMIC_RELAXED));
 
 	rte_free(t[0]);
 
@@ -511,7 +511,7 @@ test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 
 	printf("Perf test: 1 writer, %d readers, 1 QSBR variable, 1 QSBR Query, Non-Blocking QSBR check\n", num_cores);
 
-	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+	__atomic_store_n(&thr_id, 0, __ATOMIC_RELAXED);
 
 	if (all_registered == 1)
 		tmp_num_cores = num_cores;
@@ -570,8 +570,8 @@ test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 	}
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&check_cycles, cycles);
-	rte_atomic64_add(&checks, i);
+	__atomic_fetch_add(&check_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&checks, i, __ATOMIC_RELAXED);
 
 	writer_done = 1;
 	/* Wait and check return value from reader threads */
@@ -583,12 +583,12 @@ test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 
 	printf("Following numbers include calls to rte_hash functions\n");
 	printf("Cycles per 1 quiescent state update(online/update/offline): %"PRIi64"\n",
-		rte_atomic64_read(&update_cycles) /
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 
 	printf("Cycles per 1 check(start, check): %"PRIi64"\n\n",
-		rte_atomic64_read(&check_cycles) /
-		rte_atomic64_read(&checks));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&checks, __ATOMIC_RELAXED));
 
 	rte_free(t[0]);
 
@@ -619,10 +619,10 @@ test_rcu_qsbr_main(void)
 		return TEST_SKIPPED;
 	}
 
-	rte_atomic64_init(&updates);
-	rte_atomic64_init(&update_cycles);
-	rte_atomic64_init(&checks);
-	rte_atomic64_init(&check_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	num_cores = 0;
 	RTE_LCORE_FOREACH_WORKER(core_id) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
                     ` (7 preceding siblings ...)
  2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 8/8] test/rcu: use GCC atomic builtins for data sync Joyce Kong
@ 2021-06-17 15:21   ` Tyler Retzlaff
  2021-06-17 23:26     ` Honnappa Nagarahalli
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
  9 siblings, 1 reply; 49+ messages in thread
From: Tyler Retzlaff @ 2021-06-17 15:21 UTC (permalink / raw)
  To: Joyce Kong
  Cc: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang, dev, nd

On Tue, Jun 15, 2021 at 09:54:51PM -0500, Joyce Kong wrote:
> Since C11 memory model is adopted in DPDK now[1], use GCC's 
> atomic builtins in test cases.

as previously discussed these atomics are not "C11" they are
direct use of gcc builtins. please don't incorporate C11 into the title
of the patches or commit messages since it isn't.

please do not integrate a patch that directly uses gcc builtins and
extensions please maintain abstractions under the rte_ namespace.
specifically this patch substantially increases coupling to a single
compiler implementation reducing portability.

as previously requested, please establish at a minimum macros in the
rte_ namespace for this.

thanks.

> 
> [1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/
> 
> v2:
>  Use rte_wait_until_equal() instead of original sync loops.
>  <David Marchand>
> 
> v1:
>  The initial version.
> 
> Joyce Kong (8):
>   test/ticketlock: use GCC atomic builtins for lcores sync
>   test/spinlock: use GCC atomic builtins for lcores sync
>   test/rwlock: use GCC atomic builtins for lcores sync
>   test/mcslock: use GCC atomic builtins for lcores sync
>   test/mempool: remove unused variable for lcores sync
>   test/mempool_perf: use GCC atomic builtins for lcores sync
>   test/service_cores: use GCC atomic builtins for lock sync
>   test/rcu: use GCC atomic builtins for data sync
> 
>  app/test/test_mcslock.c       | 14 +++--
>  app/test/test_mempool.c       |  5 --
>  app/test/test_mempool_perf.c  | 11 ++--
>  app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
>  app/test/test_rwlock.c        | 10 ++--
>  app/test/test_service_cores.c | 36 +++++++------
>  app/test/test_spinlock.c      |  9 ++--
>  app/test/test_ticketlock.c    | 10 ++--
>  8 files changed, 91 insertions(+), 102 deletions(-)
> 
> -- 
> 2.17.1

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-17 15:21   ` [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test Tyler Retzlaff
@ 2021-06-17 23:26     ` Honnappa Nagarahalli
  2021-06-23 10:24       ` Thomas Monjalon
  0 siblings, 1 reply; 49+ messages in thread
From: Honnappa Nagarahalli @ 2021-06-17 23:26 UTC (permalink / raw)
  To: Tyler Retzlaff, Joyce Kong
  Cc: thomas, david.marchand, stephen, olivier.matz, andrew.rybchenko,
	harry.van.haaren, Ruifeng Wang, dev, nd, Honnappa Nagarahalli,
	techboard, nd

<snip>
> 
> On Tue, Jun 15, 2021 at 09:54:51PM -0500, Joyce Kong wrote:
> > Since C11 memory model is adopted in DPDK now[1], use GCC's atomic
> > builtins in test cases.
> 
> as previously discussed these atomics are not "C11" they are direct use of gcc
> builtins. please don't incorporate C11 into the title of the patches or commit
> messages since it isn't.
GCC supports 2 types of built-in atomics, __atomic_xxx[1] and __sync_xxx [2]. We need a way to distinguish between them. We are using "C11" as [1] says they match C++11 memory model.

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
[2] https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html

> 
> please do not integrate a patch that directly uses gcc builtins and extensions
> please maintain abstractions under the rte_ namespace.
This is just another wrapper which is not required according to the current requirements we are working with.

> specifically this patch substantially increases coupling to a single compiler
> implementation reducing portability.
> 
> as previously requested, please establish at a minimum macros in the rte_
> namespace for this.
This needs to be discussed at the Techboard. I have CCed the Techboard. The Techboard meets once in 2 weeks. The details are at [3]. Next meeting is on 6/30 at 10am CST. Can you please attend and make your case?

Alternately, you can send an email to techboard@dpdk.org explaining your case and we will debate and make a decision.

[3] https://core.dpdk.org/techboard/

> 
> thanks.
> 
> >
> > [1]
> > https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-
> model/
> >
> > v2:
> >  Use rte_wait_until_equal() instead of original sync loops.
> >  <David Marchand>
> >
> > v1:
> >  The initial version.
> >
> > Joyce Kong (8):
> >   test/ticketlock: use GCC atomic builtins for lcores sync
> >   test/spinlock: use GCC atomic builtins for lcores sync
> >   test/rwlock: use GCC atomic builtins for lcores sync
> >   test/mcslock: use GCC atomic builtins for lcores sync
> >   test/mempool: remove unused variable for lcores sync
> >   test/mempool_perf: use GCC atomic builtins for lcores sync
> >   test/service_cores: use GCC atomic builtins for lock sync
> >   test/rcu: use GCC atomic builtins for data sync
> >
> >  app/test/test_mcslock.c       | 14 +++--
> >  app/test/test_mempool.c       |  5 --
> >  app/test/test_mempool_perf.c  | 11 ++--
> > app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
> >  app/test/test_rwlock.c        | 10 ++--
> >  app/test/test_service_cores.c | 36 +++++++------
> >  app/test/test_spinlock.c      |  9 ++--
> >  app/test/test_ticketlock.c    | 10 ++--
> >  8 files changed, 91 insertions(+), 102 deletions(-)
> >
> > --
> > 2.17.1

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-17 23:26     ` Honnappa Nagarahalli
@ 2021-06-23 10:24       ` Thomas Monjalon
  2021-06-23 16:02         ` Tyler Retzlaff
  2021-06-29 17:04         ` Honnappa Nagarahalli
  0 siblings, 2 replies; 49+ messages in thread
From: Thomas Monjalon @ 2021-06-23 10:24 UTC (permalink / raw)
  To: Tyler Retzlaff, Honnappa Nagarahalli
  Cc: Joyce Kong, dev, david.marchand, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, Ruifeng Wang, dev, nd,
	techboard, nd

18/06/2021 01:26, Honnappa Nagarahalli:
> > On Tue, Jun 15, 2021 at 09:54:51PM -0500, Joyce Kong wrote:
> > > Since C11 memory model is adopted in DPDK now[1], use GCC's atomic
> > > builtins in test cases.
> > 
> > as previously discussed these atomics are not "C11" they are direct use of gcc
> > builtins. please don't incorporate C11 into the title of the patches or commit
> > messages since it isn't.
> 
> GCC supports 2 types of built-in atomics,
> __atomic_xxx[1] and __sync_xxx [2].
> We need a way to distinguish between them.
> We are using "C11" as [1] says they match C++11 memory model.

I agree it would be more correct to mention "compiler builtin"
as it is not strictly the C11 API.

> [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> [2] https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
> 
> > 
> > please do not integrate a patch that directly uses gcc builtins and extensions
> > please maintain abstractions under the rte_ namespace.
> 
> This is just another wrapper which is not required
> according to the current requirements we are working with.

Yes such wrapper is not required *today*.
We have 2 options:
	1/ introduce a wrapper now to anticipate any future issue
	2/ introduce a wrapper later when required

Given we already use these builtins, we should not block this patchset.
If it is decided to change the policy, then we'll replace the calls
to the compiler builtins in all the codebase.

> > specifically this patch substantially increases coupling to a single compiler
> > implementation reducing portability.
> > 
> > as previously requested, please establish at a minimum macros in the rte_
> > namespace for this.
> 
> This needs to be discussed at the Techboard. I have CCed the Techboard.
> The Techboard meets once in 2 weeks. The details are at [3].
> Next meeting is on 6/30 at 10am CST. Can you please attend and make your case?

I agree to discuss options 1 or 2 in a techboard meeting.

> Alternately, you can send an email to techboard@dpdk.org
> explaining your case and we will debate and make a decision.
> 
> [3] https://core.dpdk.org/techboard/
> 
> > 
> > thanks.
> > 
> > > [1]
> > > https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-
> > model/




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-23 10:24       ` Thomas Monjalon
@ 2021-06-23 16:02         ` Tyler Retzlaff
  2021-06-29 17:04         ` Honnappa Nagarahalli
  1 sibling, 0 replies; 49+ messages in thread
From: Tyler Retzlaff @ 2021-06-23 16:02 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Honnappa Nagarahalli, Joyce Kong, dev, david.marchand, stephen,
	olivier.matz, andrew.rybchenko, harry.van.haaren, Ruifeng Wang,
	nd, techboard

On Wed, Jun 23, 2021 at 12:24:55PM +0200, Thomas Monjalon wrote:
> 18/06/2021 01:26, Honnappa Nagarahalli:
> 
> Yes such wrapper is not required *today*.
> We have 2 options:
> 	1/ introduce a wrapper now to anticipate any future issue
> 	2/ introduce a wrapper later when required
> 
> Given we already use these builtins, we should not block this patchset.
> If it is decided to change the policy, then we'll replace the calls
> to the compiler builtins in all the codebase.

agreed. on the condition that the community accepts that a broad
tree-wide change (or set of changes) will have to be accepted later
to introduce the abstraction.

> > This needs to be discussed at the Techboard. I have CCed the Techboard.
> > The Techboard meets once in 2 weeks. The details are at [3].
> > Next meeting is on 6/30 at 10am CST. Can you please attend and make your case?
> 
> I agree to discuss options 1 or 2 in a techboard meeting.

option 2 (when required) is acceptable. in general it would be good to
communicate that this isn't the only abstraction that will be
introduced to improve portability. there are areas other than atomics
that need to be addressed in the platform/toolchain matrix.

thanks


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-23 10:24       ` Thomas Monjalon
  2021-06-23 16:02         ` Tyler Retzlaff
@ 2021-06-29 17:04         ` Honnappa Nagarahalli
  2021-06-30 18:51           ` Tyler Retzlaff
  1 sibling, 1 reply; 49+ messages in thread
From: Honnappa Nagarahalli @ 2021-06-29 17:04 UTC (permalink / raw)
  To: thomas, Tyler Retzlaff
  Cc: Joyce Kong, dev, david.marchand, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, Ruifeng Wang, dev, nd,
	techboard, nd, nd

<snip>
> 
> 18/06/2021 01:26, Honnappa Nagarahalli:
> > > On Tue, Jun 15, 2021 at 09:54:51PM -0500, Joyce Kong wrote:
> > > > Since C11 memory model is adopted in DPDK now[1], use GCC's atomic
> > > > builtins in test cases.
> > >
> > > as previously discussed these atomics are not "C11" they are direct
> > > use of gcc builtins. please don't incorporate C11 into the title of
> > > the patches or commit messages since it isn't.
> >
> > GCC supports 2 types of built-in atomics, __atomic_xxx[1] and
> > __sync_xxx [2].
> > We need a way to distinguish between them.
> > We are using "C11" as [1] says they match C++11 memory model.
> 
> I agree it would be more correct to mention "compiler builtin"
> as it is not strictly the C11 API.
The log already mentions "GCC's C11 atomic builtins". I think that is correct enough and represents the change correctly.

> 
> > [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html
> > [2] https://gcc.gnu.org/onlinedocs/gcc-4.1.1/gcc/Atomic-Builtins.html
> >
> > >
> > > please do not integrate a patch that directly uses gcc builtins and
> > > extensions please maintain abstractions under the rte_ namespace.
> >
<snip>


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-29 17:04         ` Honnappa Nagarahalli
@ 2021-06-30 18:51           ` Tyler Retzlaff
  2021-06-30 19:06             ` Honnappa Nagarahalli
  0 siblings, 1 reply; 49+ messages in thread
From: Tyler Retzlaff @ 2021-06-30 18:51 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: thomas, Joyce Kong, dev, david.marchand, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, Ruifeng Wang, nd, techboard

On Tue, Jun 29, 2021 at 05:04:55PM +0000, Honnappa Nagarahalli wrote:
> <snip>
> > 
> > 18/06/2021 01:26, Honnappa Nagarahalli:
> > > > On Tue, Jun 15, 2021 at 09:54:51PM -0500, Joyce Kong wrote:
> > > > > Since C11 memory model is adopted in DPDK now[1], use GCC's atomic
> > > > > builtins in test cases.
> > > >
> > > > as previously discussed these atomics are not "C11" they are direct
> > > > use of gcc builtins. please don't incorporate C11 into the title of
> > > > the patches or commit messages since it isn't.
> > >
> > > GCC supports 2 types of built-in atomics, __atomic_xxx[1] and
> > > __sync_xxx [2].
> > > We need a way to distinguish between them.
> > > We are using "C11" as [1] says they match C++11 memory model.
> > 
> > I agree it would be more correct to mention "compiler builtin"
> > as it is not strictly the C11 API.
> The log already mentions "GCC's C11 atomic builtins". I think that is correct enough and represents the change correctly.

it's misleading and does not attract the correct reviewers particularly
due to prominence in the commit/mail subject.

please change it to "Use GCC atomic builtins" which describes clearly
the actual change without ambiguity.  using "C11" implies the patch is
adding code that uses C11 stdatomic.h and it doesn't.


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-30 18:51           ` Tyler Retzlaff
@ 2021-06-30 19:06             ` Honnappa Nagarahalli
  2021-06-30 19:38               ` Tyler Retzlaff
  0 siblings, 1 reply; 49+ messages in thread
From: Honnappa Nagarahalli @ 2021-06-30 19:06 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: thomas, Joyce Kong, dev, david.marchand, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, Ruifeng Wang, nd, techboard,
	nd

<snip>

> > >
> > > 18/06/2021 01:26, Honnappa Nagarahalli:
> > > > > On Tue, Jun 15, 2021 at 09:54:51PM -0500, Joyce Kong wrote:
> > > > > > Since C11 memory model is adopted in DPDK now[1], use GCC's
> > > > > > atomic builtins in test cases.
> > > > >
> > > > > as previously discussed these atomics are not "C11" they are
> > > > > direct use of gcc builtins. please don't incorporate C11 into
> > > > > the title of the patches or commit messages since it isn't.
> > > >
> > > > GCC supports 2 types of built-in atomics, __atomic_xxx[1] and
> > > > __sync_xxx [2].
> > > > We need a way to distinguish between them.
> > > > We are using "C11" as [1] says they match C++11 memory model.
> > >
> > > I agree it would be more correct to mention "compiler builtin"
> > > as it is not strictly the C11 API.
> > The log already mentions "GCC's C11 atomic builtins". I think that is correct
> enough and represents the change correctly.
> 
> it's misleading and does not attract the correct reviewers particularly due to
> prominence in the commit/mail subject.
> 
> please change it to "Use GCC atomic builtins" which describes clearly the
> actual change without ambiguity.  using "C11" implies the patch is adding
> code that uses C11 stdatomic.h and it doesn't.
As I mentioned earlier in this thread, GCC supports 2 types of atomics. "Use GCC atomic builtins" does not help distinguish between them. In "GCC's C11 atomic builtins" - "C11" indicates which atomics we are using, "atomic builtins" indicates that we are NOT using APIs from stdatomic.h

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-30 19:06             ` Honnappa Nagarahalli
@ 2021-06-30 19:38               ` Tyler Retzlaff
  2021-06-30 20:25                 ` Honnappa Nagarahalli
  0 siblings, 1 reply; 49+ messages in thread
From: Tyler Retzlaff @ 2021-06-30 19:38 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: thomas, Joyce Kong, dev, david.marchand, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, Ruifeng Wang, nd, techboard

On Wed, Jun 30, 2021 at 07:06:31PM +0000, Honnappa Nagarahalli wrote:
> <snip>
> 
> As I mentioned earlier in this thread, GCC supports 2 types of atomics. "Use GCC atomic builtins" does not help distinguish between them. In "GCC's C11 atomic builtins" - "C11" indicates which atomics we are using, "atomic builtins" indicates that we are NOT using APIs from stdatomic.h

if you need a term to distinguish the two sets of atomics in gcc you can
qualify it with "Memory Model Aware" which is straight from the gcc
manual.

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-30 19:38               ` Tyler Retzlaff
@ 2021-06-30 20:25                 ` Honnappa Nagarahalli
  2021-06-30 21:49                   ` Tyler Retzlaff
  0 siblings, 1 reply; 49+ messages in thread
From: Honnappa Nagarahalli @ 2021-06-30 20:25 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: thomas, Joyce Kong, dev, david.marchand, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, Ruifeng Wang, nd, techboard,
	nd

<snip>

> >
> > As I mentioned earlier in this thread, GCC supports 2 types of
> > atomics. "Use GCC atomic builtins" does not help distinguish between
> > them. In "GCC's C11 atomic builtins" - "C11" indicates which atomics
> > we are using, "atomic builtins" indicates that we are NOT using APIs
> > from stdatomic.h
> 
> if you need a term to distinguish the two sets of atomics in gcc you can qualify
> it with "Memory Model Aware" which is straight from the gcc manual.
"Memory model aware" sounds too generic. The same page [1] also makes it clear that the built-in functions match the requirements for the C11 memory model.

There are also several patches merged in the past which do not use the term "memory model aware". I would prefer to be consistent.

[1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-30 20:25                 ` Honnappa Nagarahalli
@ 2021-06-30 21:49                   ` Tyler Retzlaff
  2021-06-30 22:41                     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 49+ messages in thread
From: Tyler Retzlaff @ 2021-06-30 21:49 UTC (permalink / raw)
  To: Honnappa Nagarahalli
  Cc: thomas, Joyce Kong, dev, david.marchand, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, Ruifeng Wang, nd, techboard

On Wed, Jun 30, 2021 at 08:25:44PM +0000, Honnappa Nagarahalli wrote:
> <snip>
> 
> > >
> > > As I mentioned earlier in this thread, GCC supports 2 types of
> > > atomics. "Use GCC atomic builtins" does not help distinguish between
> > > them. In "GCC's C11 atomic builtins" - "C11" indicates which atomics
> > > we are using, "atomic builtins" indicates that we are NOT using APIs
> > > from stdatomic.h
> > 
> > if you need a term to distinguish the two sets of atomics in gcc you can qualify
> > it with "Memory Model Aware" which is straight from the gcc manual.
> "Memory model aware" sounds too generic. The same page [1] also makes it clear that the built-in functions match the requirements for the C11 memory model.

allow me to put your interpretation of the manual that you linked side
by side with what the manual text actually says verbatim.

your text from above
  "built-in functions match the requirements for the C11 memory model."

the actual text from your link
  "built-in functions approximately match the requirements for the C++11 memory model."

* you've chosen to drop approximately from the wording to try and make
  your argument.

* you've also chosen to substitute C11 in place of C++11. again
  presumably for the same reason.

in fact the entire page does not mention C11 even once, it also goes on
to highlight a specific deviation from C++11 with this excerpt "because
of a deficiency in C++11's semantics for memory_order_consume"

> There are also several patches merged in the past which do not use the term "memory model aware". I would prefer to be consistent.

i prefer the history represent the change. that previous submitters and
reviewers lacked precision is not my concern nor is consistency a reason
to continue documenting history incorrectly.

i'm waiting to ack the change, it's up to you. you've already spent more
time arguing than it would have taken to submit a v2 correcting the
problem.

> 
> [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-30 21:49                   ` Tyler Retzlaff
@ 2021-06-30 22:41                     ` Honnappa Nagarahalli
  2021-07-13  7:28                       ` Joyce Kong
  0 siblings, 1 reply; 49+ messages in thread
From: Honnappa Nagarahalli @ 2021-06-30 22:41 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: thomas, Joyce Kong, dev, david.marchand, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, Ruifeng Wang, nd, techboard,
	nd

<snip>

> >
> > > >
> > > > As I mentioned earlier in this thread, GCC supports 2 types of
> > > > atomics. "Use GCC atomic builtins" does not help distinguish
> > > > between them. In "GCC's C11 atomic builtins" - "C11" indicates
> > > > which atomics we are using, "atomic builtins" indicates that we
> > > > are NOT using APIs from stdatomic.h
> > >
> > > if you need a term to distinguish the two sets of atomics in gcc you
> > > can qualify it with "Memory Model Aware" which is straight from the gcc
> manual.
> > "Memory model aware" sounds too generic. The same page [1] also makes
> it clear that the built-in functions match the requirements for the C11
> memory model.
> 
> allow me to put your interpretation of the manual that you linked side by side
> with what the manual text actually says verbatim.
> 
> your text from above
>   "built-in functions match the requirements for the C11 memory model."
> 
> the actual text from your link
>   "built-in functions approximately match the requirements for the C++11
> memory model."
> 
> * you've chosen to drop approximately from the wording to try and make
>   your argument.
I am not sure how this makes a difference to our arguments. For ex: there are no other built in functions that "exactly" match the C++11 memory model supported by GCC.

> 
> * you've also chosen to substitute C11 in place of C++11. again
>   presumably for the same reason.
> 
> in fact the entire page does not mention C11 even once, it also goes on to
> highlight a specific deviation from C++11 with this excerpt "because of a
> deficiency in C++11's semantics for memory_order_consume"
I do not have a problem to call it C++11. IMO, calling it "GCC's C++11 ..." will address this deviation and the approximation.

> 
> > There are also several patches merged in the past which do not use the term
> "memory model aware". I would prefer to be consistent.
> 
> i prefer the history represent the change. that previous submitters and
> reviewers lacked precision is not my concern nor is consistency a reason to
> continue documenting history incorrectly.
Ok. As I mentioned, it is just my preference.

> 
> i'm waiting to ack the change, it's up to you. you've already spent more time
> arguing than it would have taken to submit a v2 correcting the problem.
I am not arguing for the sake of arguing. You are trying to correct few mistakes here (I truly appreciate that) and I am trying to explain my POV and making corrections as needed. I am sure we will conclude soon.

> 
> >
> > [1] https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-06-30 22:41                     ` Honnappa Nagarahalli
@ 2021-07-13  7:28                       ` Joyce Kong
  2021-07-14 11:44                         ` Thomas Monjalon
  0 siblings, 1 reply; 49+ messages in thread
From: Joyce Kong @ 2021-07-13  7:28 UTC (permalink / raw)
  To: david.marchand
  Cc: thomas, Honnappa Nagarahalli, Tyler Retzlaff, dev, stephen,
	olivier.matz, andrew.rybchenko, harry.van.haaren, Ruifeng Wang,
	techboard, nd

Hi David,

Since we have some discussion about the atomic operations now, I changed the commit message from "C11 atomics"(which has been widely used in previous commit) to "GCC atomic built-ins".
What's your opinion about whether keeping the previous message for the consistency or using the new description?

Joyce

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Thursday, July 1, 2021 6:41 AM
> To: Tyler Retzlaff <roretzla@linux.microsoft.com>
> Cc: thomas@monjalon.net; Joyce Kong <Joyce.Kong@arm.com>;
> dev@dpdk.org; david.marchand@redhat.com;
> stephen@networkplumber.org; olivier.matz@6wind.com;
> andrew.rybchenko@oktetlabs.ru; harry.van.haaren@intel.com; Ruifeng
> Wang <Ruifeng.Wang@arm.com>; nd <nd@arm.com>; techboard@dpdk.org;
> nd <nd@arm.com>
> Subject: RE: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
> 
> <snip>
> 
> > >
> > > > >
> > > > > As I mentioned earlier in this thread, GCC supports 2 types of
> > > > > atomics. "Use GCC atomic builtins" does not help distinguish
> > > > > between them. In "GCC's C11 atomic builtins" - "C11" indicates
> > > > > which atomics we are using, "atomic builtins" indicates that we
> > > > > are NOT using APIs from stdatomic.h
> > > >
> > > > if you need a term to distinguish the two sets of atomics in gcc
> > > > you can qualify it with "Memory Model Aware" which is straight
> > > > from the gcc
> > manual.
> > > "Memory model aware" sounds too generic. The same page [1] also
> > > makes
> > it clear that the built-in functions match the requirements for the
> > C11 memory model.
> >
> > allow me to put your interpretation of the manual that you linked side
> > by side with what the manual text actually says verbatim.
> >
> > your text from above
> >   "built-in functions match the requirements for the C11 memory model."
> >
> > the actual text from your link
> >   "built-in functions approximately match the requirements for the
> > C++11 memory model."
> >
> > * you've chosen to drop approximately from the wording to try and make
> >   your argument.
> I am not sure how this makes a difference to our arguments. For ex: there
> are no other built in functions that "exactly" match the C++11 memory model
> supported by GCC.
> 
> >
> > * you've also chosen to substitute C11 in place of C++11. again
> >   presumably for the same reason.
> >
> > in fact the entire page does not mention C11 even once, it also goes
> > on to highlight a specific deviation from C++11 with this excerpt
> > "because of a deficiency in C++11's semantics for
> memory_order_consume"
> I do not have a problem to call it C++11. IMO, calling it "GCC's C++11 ..." will
> address this deviation and the approximation.
> 
> >
> > > There are also several patches merged in the past which do not use
> > > the term
> > "memory model aware". I would prefer to be consistent.
> >
> > i prefer the history represent the change. that previous submitters
> > and reviewers lacked precision is not my concern nor is consistency a
> > reason to continue documenting history incorrectly.
> Ok. As I mentioned, it is just my preference.
> 
> >
> > i'm waiting to ack the change, it's up to you. you've already spent
> > more time arguing than it would have taken to submit a v2 correcting the
> problem.
> I am not arguing for the sake of arguing. You are trying to correct few
> mistakes here (I truly appreciate that) and I am trying to explain my POV and
> making corrections as needed. I am sure we will conclude soon.
> 
> >
> > >
> > > [1]
> > > https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test
  2021-07-13  7:28                       ` Joyce Kong
@ 2021-07-14 11:44                         ` Thomas Monjalon
  0 siblings, 0 replies; 49+ messages in thread
From: Thomas Monjalon @ 2021-07-14 11:44 UTC (permalink / raw)
  To: david.marchand, Joyce Kong
  Cc: Honnappa Nagarahalli, Tyler Retzlaff, dev, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, Ruifeng Wang, techboard, nd

13/07/2021 09:28, Joyce Kong:
> Hi David,
> 
> Since we have some discussion about the atomic operations now, I changed the commit message from "C11 atomics"(which has been widely used in previous commit) to "GCC atomic built-ins".
> What's your opinion about whether keeping the previous message for the consistency or using the new description?

Given clang adopted the same syntax as GCC,
I prefer "compiler atomic builtins".



^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v3 0/8] use compiler atomic builtins for test
  2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
                     ` (8 preceding siblings ...)
  2021-06-17 15:21   ` [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test Tyler Retzlaff
@ 2021-07-20  3:51   ` Joyce Kong
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 1/8] test/ticketlock: use compiler atomics for lcores sync Joyce Kong
                       ` (8 more replies)
  9 siblings, 9 replies; 49+ messages in thread
From: Joyce Kong @ 2021-07-20  3:51 UTC (permalink / raw)
  To: thomas, david.marchand, roretzla, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang
  Cc: dev, nd

Since atomic operations have been adopted in DPDK now[1], change
rte_atomicNN_xxx APIs to compiler's atomic built-ins in test cases.

[1] https://www.dpdk.org/blog/2021/03/26/dpdk-adopts-the-c11-memory-model/

v3:
 Change 'GCC atomic builtins' to 'compiler atomic builtins'
 as clang adopted the same syntax as GCC.<Thomas Monjalon>

v2:
 Use rte_wait_until_equal() instead of original sync loops.
 <David Marchand>

v1:
 The initial version.

Joyce Kong (8):
  test/ticketlock: use compiler atomics for lcores sync
  test/spinlock: use compile atomics for lcores sync
  test/rwlock: use compiler atomics for lcores sync
  test/mcslock: use compiler atomics for lcores sync
  test/mempool: remove unused variable for lcores sync
  test/mempool_perf: use compiler atomics for lcores sync
  test/service_cores: use compiler atomics for lock sync
  test/rcu: use compiler atomics for data sync

 app/test/test_mcslock.c       | 14 +++--
 app/test/test_mempool.c       |  5 --
 app/test/test_mempool_perf.c  | 11 ++--
 app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
 app/test/test_rwlock.c        | 10 ++--
 app/test/test_service_cores.c | 36 +++++++------
 app/test/test_spinlock.c      |  9 ++--
 app/test/test_ticketlock.c    | 10 ++--
 8 files changed, 91 insertions(+), 102 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v3 1/8] test/ticketlock: use compiler atomics for lcores sync
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
@ 2021-07-20  3:51     ` Joyce Kong
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 2/8] test/spinlock: use compile " Joyce Kong
                       ` (7 subsequent siblings)
  8 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-07-20  3:51 UTC (permalink / raw)
  To: thomas, david.marchand, roretzla, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins for lcores
sync in ticketlock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_ticketlock.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/app/test/test_ticketlock.c b/app/test/test_ticketlock.c
index 7aab8665bc..242c136478 100644
--- a/app/test/test_ticketlock.c
+++ b/app/test/test_ticketlock.c
@@ -9,7 +9,6 @@
 #include <sys/queue.h>
 #include <unistd.h>
 
-#include <rte_atomic.h>
 #include <rte_common.h>
 #include <rte_cycles.h>
 #include <rte_eal.h>
@@ -49,7 +48,7 @@ static rte_ticketlock_t tl_tab[RTE_MAX_LCORE];
 static rte_ticketlock_recursive_t tlr;
 static unsigned int count;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_ticketlock_per_core(__rte_unused void *arg)
@@ -112,8 +111,7 @@ load_loop_fn(void *func_param)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			;
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_rdtsc_precise();
 	while (lcore_count[lcore] < MAX_LOOP) {
@@ -155,11 +153,11 @@ test_ticketlock_perf(void)
 	printf("\nTest with lock on %u cores...\n", rte_lcore_count());
 
 	/* Clear synchro and start workers */
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v3 2/8] test/spinlock: use compile atomics for lcores sync
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 1/8] test/ticketlock: use compiler atomics for lcores sync Joyce Kong
@ 2021-07-20  3:51     ` Joyce Kong
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 3/8] test/rwlock: use compiler " Joyce Kong
                       ` (6 subsequent siblings)
  8 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-07-20  3:51 UTC (permalink / raw)
  To: thomas, david.marchand, roretzla, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins for lcores
sync in spinlock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_spinlock.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/app/test/test_spinlock.c b/app/test/test_spinlock.c
index 054fb43a9f..3f59372300 100644
--- a/app/test/test_spinlock.c
+++ b/app/test/test_spinlock.c
@@ -17,7 +17,6 @@
 #include <rte_lcore.h>
 #include <rte_cycles.h>
 #include <rte_spinlock.h>
-#include <rte_atomic.h>
 
 #include "test.h"
 
@@ -49,7 +48,7 @@ static rte_spinlock_t sl_tab[RTE_MAX_LCORE];
 static rte_spinlock_recursive_t slr;
 static unsigned count = 0;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_spinlock_per_core(__rte_unused void *arg)
@@ -111,7 +110,7 @@ load_loop_fn(void *func_param)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0);
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (lcount < MAX_LOOP) {
@@ -150,11 +149,11 @@ test_spinlock_perf(void)
 	printf("\nTest with lock on %u cores...\n", rte_lcore_count());
 
 	/* Clear synchro and start workers */
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v3 3/8] test/rwlock: use compiler atomics for lcores sync
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 1/8] test/ticketlock: use compiler atomics for lcores sync Joyce Kong
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 2/8] test/spinlock: use compile " Joyce Kong
@ 2021-07-20  3:51     ` Joyce Kong
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 4/8] test/mcslock: " Joyce Kong
                       ` (5 subsequent siblings)
  8 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-07-20  3:51 UTC (permalink / raw)
  To: thomas, david.marchand, roretzla, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins for lcores
sync in rwlock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_rwlock.c | 10 ++++------
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/app/test/test_rwlock.c b/app/test/test_rwlock.c
index b47150a86a..f2d1c8883c 100644
--- a/app/test/test_rwlock.c
+++ b/app/test/test_rwlock.c
@@ -13,7 +13,6 @@
 #include <rte_memory.h>
 #include <rte_per_lcore.h>
 #include <rte_launch.h>
-#include <rte_atomic.h>
 #include <rte_rwlock.h>
 #include <rte_eal.h>
 #include <rte_lcore.h>
@@ -36,7 +35,7 @@
 
 static rte_rwlock_t sl;
 static rte_rwlock_t sl_tab[RTE_MAX_LCORE];
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 enum {
 	LC_TYPE_RDLOCK,
@@ -102,8 +101,7 @@ load_loop_fn(__rte_unused void *arg)
 
 	/* wait synchro for workers */
 	if (lcore != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0)
-			;
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_rdtsc_precise();
 	while (lcount < MAX_LOOP) {
@@ -136,12 +134,12 @@ test_rwlock_perf(void)
 	printf("\nRwlock Perf Test on %u cores...\n", rte_lcore_count());
 
 	/* clear synchro and start workers */
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	if (rte_eal_mp_remote_launch(load_loop_fn, NULL, SKIP_MAIN) < 0)
 		return -1;
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(NULL);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v3 4/8] test/mcslock: use compiler atomics for lcores sync
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
                       ` (2 preceding siblings ...)
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 3/8] test/rwlock: use compiler " Joyce Kong
@ 2021-07-20  3:51     ` Joyce Kong
  2021-07-28  9:56       ` Olivier Matz
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 5/8] test/mempool: remove unused variable " Joyce Kong
                       ` (4 subsequent siblings)
  8 siblings, 1 reply; 49+ messages in thread
From: Joyce Kong @ 2021-07-20  3:51 UTC (permalink / raw)
  To: thomas, david.marchand, roretzla, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins for lcores
sync in mcslock testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_mcslock.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
index 80eaecc90a..52e45e7e2a 100644
--- a/app/test/test_mcslock.c
+++ b/app/test/test_mcslock.c
@@ -17,7 +17,6 @@
 #include <rte_lcore.h>
 #include <rte_cycles.h>
 #include <rte_mcslock.h>
-#include <rte_atomic.h>
 
 #include "test.h"
 
@@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
 
 static unsigned int count;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 static int
 test_mcslock_per_core(__rte_unused void *arg)
@@ -76,8 +75,7 @@ load_loop_fn(void *func_param)
 	rte_mcslock_t ml_perf_me;
 
 	/* wait synchro */
-	while (rte_atomic32_read(&synchro) == 0)
-		;
+	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	begin = rte_get_timer_cycles();
 	while (lcount < MAX_LOOP) {
@@ -102,15 +100,15 @@ test_mcslock_perf(void)
 	const unsigned int lcore = rte_lcore_id();
 
 	printf("\nTest with no lock on single core...\n");
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 	printf("Core [%u] Cost Time = %"PRIu64" us\n",
 			lcore, time_count[lcore]);
 	memset(time_count, 0, sizeof(time_count));
 
 	printf("\nTest with lock on single core...\n");
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	lock = 1;
-	rte_atomic32_set(&synchro, 1);
 	load_loop_fn(&lock);
 	printf("Core [%u] Cost Time = %"PRIu64" us\n",
 			lcore, time_count[lcore]);
@@ -118,11 +116,11 @@ test_mcslock_perf(void)
 
 	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
 
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 	load_loop_fn(&lock);
 
 	rte_eal_mp_wait_lcore();
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v3 5/8] test/mempool: remove unused variable for lcores sync
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
                       ` (3 preceding siblings ...)
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 4/8] test/mcslock: " Joyce Kong
@ 2021-07-20  3:51     ` Joyce Kong
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 6/8] test/mempool_perf: use compiler atomics " Joyce Kong
                       ` (3 subsequent siblings)
  8 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-07-20  3:51 UTC (permalink / raw)
  To: thomas, david.marchand, roretzla, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang
  Cc: dev, nd

Remove the unused synchro variable as there is no lcores
sync in mempool function test.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_mempool.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/app/test/test_mempool.c b/app/test/test_mempool.c
index 3adadd6731..7675a3e605 100644
--- a/app/test/test_mempool.c
+++ b/app/test/test_mempool.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_spinlock.h>
@@ -57,8 +56,6 @@
 		goto label;						\
 	} while (0)
 
-static rte_atomic32_t synchro;
-
 /*
  * save the object number in the first 4 bytes of object data. All
  * other bytes are set to 0.
@@ -491,8 +488,6 @@ test_mempool(void)
 	};
 	const char *default_pool_ops = rte_mbuf_best_mempool_ops();
 
-	rte_atomic32_init(&synchro);
-
 	/* create a mempool (without cache) */
 	mp_nocache = rte_mempool_create("test_nocache", MEMPOOL_SIZE,
 		MEMPOOL_ELT_SIZE, 0, 0,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v3 6/8] test/mempool_perf: use compiler atomics for lcores sync
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
                       ` (4 preceding siblings ...)
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 5/8] test/mempool: remove unused variable " Joyce Kong
@ 2021-07-20  3:51     ` Joyce Kong
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 7/8] test/service_cores: use compiler atomics for lock sync Joyce Kong
                       ` (2 subsequent siblings)
  8 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-07-20  3:51 UTC (permalink / raw)
  To: thomas, david.marchand, roretzla, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins for lcores
sync in mempool_perf testcases. Meanwhile, remove unnecessary
synchro init as it would be set to 0 when launching cores.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_mempool_perf.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/app/test/test_mempool_perf.c b/app/test/test_mempool_perf.c
index d7d0aaa334..8f629736e8 100644
--- a/app/test/test_mempool_perf.c
+++ b/app/test/test_mempool_perf.c
@@ -20,7 +20,6 @@
 #include <rte_eal.h>
 #include <rte_per_lcore.h>
 #include <rte_lcore.h>
-#include <rte_atomic.h>
 #include <rte_branch_prediction.h>
 #include <rte_mempool.h>
 #include <rte_spinlock.h>
@@ -83,7 +82,7 @@
 static int use_external_cache;
 static unsigned external_cache_size = RTE_MEMPOOL_CACHE_MAX_SIZE;
 
-static rte_atomic32_t synchro;
+static uint32_t synchro;
 
 /* number of objects in one bulk operation (get or put) */
 static unsigned n_get_bulk;
@@ -145,7 +144,7 @@ per_lcore_mempool_test(void *arg)
 
 	/* wait synchro for workers */
 	if (lcore_id != rte_get_main_lcore())
-		while (rte_atomic32_read(&synchro) == 0);
+		rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
 
 	start_cycles = rte_get_timer_cycles();
 
@@ -198,7 +197,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 	int ret;
 	unsigned cores_save = cores;
 
-	rte_atomic32_set(&synchro, 0);
+	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
 
 	/* reset stats */
 	memset(stats, 0, sizeof(stats));
@@ -223,7 +222,7 @@ launch_cores(struct rte_mempool *mp, unsigned int cores)
 	}
 
 	/* start synchro and launch test on main */
-	rte_atomic32_set(&synchro, 1);
+	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
 
 	ret = per_lcore_mempool_test(mp);
 
@@ -288,8 +287,6 @@ test_mempool_perf(void)
 	const char *default_pool_ops;
 	int ret = -1;
 
-	rte_atomic32_init(&synchro);
-
 	/* create a mempool (without cache) */
 	mp_nocache = rte_mempool_create("perf_test_nocache", MEMPOOL_SIZE,
 					MEMPOOL_ELT_SIZE, 0, 0,
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v3 7/8] test/service_cores: use compiler atomics for lock sync
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
                       ` (5 preceding siblings ...)
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 6/8] test/mempool_perf: use compiler atomics " Joyce Kong
@ 2021-07-20  3:51     ` Joyce Kong
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 8/8] test/rcu: use compiler atomics for data sync Joyce Kong
  2021-07-30 21:58     ` [dpdk-dev] [PATCH v3 0/8] use compiler atomic builtins for test Thomas Monjalon
  8 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-07-20  3:51 UTC (permalink / raw)
  To: thomas, david.marchand, roretzla, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang
  Cc: dev, nd

Convert rte_atomic usages to compiler atomic built-ins for lock
sync in service_cores testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_service_cores.c | 36 +++++++++++++++++++----------------
 1 file changed, 20 insertions(+), 16 deletions(-)

diff --git a/app/test/test_service_cores.c b/app/test/test_service_cores.c
index 37d7172d53..ece104054e 100644
--- a/app/test/test_service_cores.c
+++ b/app/test/test_service_cores.c
@@ -53,18 +53,20 @@ static int32_t dummy_cb(void *args)
 static int32_t dummy_mt_unsafe_cb(void *args)
 {
 	/* before running test, the initialization has set pass_test to 1.
-	 * If the cmpset in service-cores is working correctly, the code here
+	 * If the CAS in service-cores is working correctly, the code here
 	 * should never fail to take the lock. If the lock *is* taken, fail the
 	 * test, because two threads are concurrently in a non-MT safe callback.
 	 */
 	uint32_t *test_params = args;
-	uint32_t *atomic_lock = &test_params[0];
+	uint32_t *lock = &test_params[0];
 	uint32_t *pass_test = &test_params[1];
-	int lock_taken = rte_atomic32_cmpset(atomic_lock, 0, 1);
+	uint32_t exp = 0;
+	int lock_taken = __atomic_compare_exchange_n(lock, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED);
 	if (lock_taken) {
 		/* delay with the lock held */
 		rte_delay_ms(250);
-		rte_atomic32_clear((rte_atomic32_t *)atomic_lock);
+		__atomic_store_n(lock, 0, __ATOMIC_RELAXED);
 	} else {
 		/* 2nd thread will fail to take lock, so set pass flag */
 		*pass_test = 0;
@@ -83,13 +85,15 @@ static int32_t dummy_mt_safe_cb(void *args)
 	 *    that 2 threads are running the callback at the same time: MT safe
 	 */
 	uint32_t *test_params = args;
-	uint32_t *atomic_lock = &test_params[0];
+	uint32_t *lock = &test_params[0];
 	uint32_t *pass_test = &test_params[1];
-	int lock_taken = rte_atomic32_cmpset(atomic_lock, 0, 1);
+	uint32_t exp = 0;
+	int lock_taken = __atomic_compare_exchange_n(lock, &exp, 1, 0,
+					__ATOMIC_RELAXED, __ATOMIC_RELAXED);
 	if (lock_taken) {
 		/* delay with the lock held */
 		rte_delay_ms(250);
-		rte_atomic32_clear((rte_atomic32_t *)atomic_lock);
+		__atomic_store_n(lock, 0, __ATOMIC_RELAXED);
 	} else {
 		/* 2nd thread will fail to take lock, so set pass flag */
 		*pass_test = 1;
@@ -622,9 +626,9 @@ service_threaded_test(int mt_safe)
 	TEST_ASSERT_EQUAL(0, rte_service_lcore_add(slcore_2),
 			"mt safe lcore add fail");
 
-	/* Use atomic locks to verify that two threads are in the same function
-	 * at the same time. These are passed to the unit tests through the
-	 * callback userdata parameter
+	/* Use locks to verify that two threads are in the same function
+	 * at the same time. These are passed to the unit tests through
+	 * the callback userdata parameter.
 	 */
 	uint32_t test_params[2];
 	memset(test_params, 0, sizeof(uint32_t) * 2);
@@ -713,7 +717,7 @@ service_mt_safe_poll(void)
 }
 
 /* tests a NON mt safe service with two cores, the callback is serialized
- * using the atomic cmpset.
+ * using the CAS.
  */
 static int
 service_mt_unsafe_poll(void)
@@ -735,17 +739,17 @@ delay_as_a_mt_safe_service(void *args)
 	RTE_SET_USED(args);
 	uint32_t *params = args;
 
-	/* retrieve done flag and atomic lock to inc/dec */
+	/* retrieve done flag and lock to add/sub */
 	uint32_t *done = &params[0];
-	rte_atomic32_t *lock = (rte_atomic32_t *)&params[1];
+	uint32_t *lock = &params[1];
 
 	while (!*done) {
-		rte_atomic32_inc(lock);
+		__atomic_add_fetch(lock, 1, __ATOMIC_RELAXED);
 		rte_delay_us(500);
-		if (rte_atomic32_read(lock) > 1)
+		if (__atomic_load_n(lock, __ATOMIC_RELAXED) > 1)
 			/* pass: second core has simultaneously incremented */
 			*done = 1;
-		rte_atomic32_dec(lock);
+		__atomic_sub_fetch(lock, 1, __ATOMIC_RELAXED);
 	}
 
 	return 0;
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* [dpdk-dev] [PATCH v3 8/8] test/rcu: use compiler atomics for data sync
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
                       ` (6 preceding siblings ...)
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 7/8] test/service_cores: use compiler atomics for lock sync Joyce Kong
@ 2021-07-20  3:51     ` Joyce Kong
  2021-07-23 19:52       ` Andrew Rybchenko
  2021-07-30 21:58     ` [dpdk-dev] [PATCH v3 0/8] use compiler atomic builtins for test Thomas Monjalon
  8 siblings, 1 reply; 49+ messages in thread
From: Joyce Kong @ 2021-07-20  3:51 UTC (permalink / raw)
  To: thomas, david.marchand, roretzla, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang
  Cc: dev, nd

Covert rte_atomic usages to compiler atomic built-ins in
rcu_perf testcases.

Signed-off-by: Joyce Kong <joyce.kong@arm.com>
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
index 3017e71120..cf7b158d22 100644
--- a/app/test/test_rcu_qsbr_perf.c
+++ b/app/test/test_rcu_qsbr_perf.c
@@ -30,8 +30,8 @@ static volatile uint32_t thr_id;
 static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
 static struct rte_hash *h;
 static char hash_name[8];
-static rte_atomic64_t updates, checks;
-static rte_atomic64_t update_cycles, check_cycles;
+static uint64_t updates, checks;
+static uint64_t update_cycles, check_cycles;
 
 /* Scale down results to 1000 operations to support lower
  * granularity clocks.
@@ -81,8 +81,8 @@ test_rcu_qsbr_reader_perf(void *arg)
 	}
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&update_cycles, cycles);
-	rte_atomic64_add(&updates, loop_cnt);
+	__atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);
 
 	/* Make the thread offline */
 	rte_rcu_qsbr_thread_offline(t[0], thread_id);
@@ -113,8 +113,8 @@ test_rcu_qsbr_writer_perf(void *arg)
 	} while (loop_cnt < 20000000);
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&check_cycles, cycles);
-	rte_atomic64_add(&checks, loop_cnt);
+	__atomic_fetch_add(&check_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&checks, loop_cnt, __ATOMIC_RELAXED);
 	return 0;
 }
 
@@ -130,10 +130,10 @@ test_rcu_qsbr_perf(void)
 
 	writer_done = 0;
 
-	rte_atomic64_clear(&updates);
-	rte_atomic64_clear(&update_cycles);
-	rte_atomic64_clear(&checks);
-	rte_atomic64_clear(&check_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	printf("\nPerf Test: %d Readers/1 Writer('wait' in qsbr_check == true)\n",
 		num_cores - 1);
@@ -168,15 +168,15 @@ test_rcu_qsbr_perf(void)
 	rte_eal_mp_wait_lcore();
 
 	printf("Total quiescent state updates = %"PRIi64"\n",
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 	printf("Cycles per %d quiescent state updates: %"PRIi64"\n",
 		RCU_SCALE_DOWN,
-		rte_atomic64_read(&update_cycles) /
-		(rte_atomic64_read(&updates) / RCU_SCALE_DOWN));
-	printf("Total RCU checks = %"PRIi64"\n", rte_atomic64_read(&checks));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&updates, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
+	printf("Total RCU checks = %"PRIi64"\n", __atomic_load_n(&checks, __ATOMIC_RELAXED));
 	printf("Cycles per %d checks: %"PRIi64"\n", RCU_SCALE_DOWN,
-		rte_atomic64_read(&check_cycles) /
-		(rte_atomic64_read(&checks) / RCU_SCALE_DOWN));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&checks, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
 	rte_free(t[0]);
 
@@ -193,8 +193,8 @@ test_rcu_qsbr_rperf(void)
 	size_t sz;
 	unsigned int i, tmp_num_cores;
 
-	rte_atomic64_clear(&updates);
-	rte_atomic64_clear(&update_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
 
 	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -220,11 +220,11 @@ test_rcu_qsbr_rperf(void)
 	rte_eal_mp_wait_lcore();
 
 	printf("Total quiescent state updates = %"PRIi64"\n",
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 	printf("Cycles per %d quiescent state updates: %"PRIi64"\n",
 		RCU_SCALE_DOWN,
-		rte_atomic64_read(&update_cycles) /
-		(rte_atomic64_read(&updates) / RCU_SCALE_DOWN));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&updates, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
 	rte_free(t[0]);
 
@@ -241,8 +241,8 @@ test_rcu_qsbr_wperf(void)
 	size_t sz;
 	unsigned int i;
 
-	rte_atomic64_clear(&checks);
-	rte_atomic64_clear(&check_cycles);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -266,10 +266,10 @@ test_rcu_qsbr_wperf(void)
 	/* Wait until all readers have exited */
 	rte_eal_mp_wait_lcore();
 
-	printf("Total RCU checks = %"PRIi64"\n", rte_atomic64_read(&checks));
+	printf("Total RCU checks = %"PRIi64"\n", __atomic_load_n(&checks, __ATOMIC_RELAXED));
 	printf("Cycles per %d checks: %"PRIi64"\n", RCU_SCALE_DOWN,
-		rte_atomic64_read(&check_cycles) /
-		(rte_atomic64_read(&checks) / RCU_SCALE_DOWN));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		(__atomic_load_n(&checks, __ATOMIC_RELAXED) / RCU_SCALE_DOWN));
 
 	rte_free(t[0]);
 
@@ -317,8 +317,8 @@ test_rcu_qsbr_hash_reader(void *arg)
 	} while (!writer_done);
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&update_cycles, cycles);
-	rte_atomic64_add(&updates, loop_cnt);
+	__atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);
 
 	rte_rcu_qsbr_thread_unregister(temp, thread_id);
 
@@ -389,10 +389,10 @@ test_rcu_qsbr_sw_sv_1qs(void)
 
 	writer_done = 0;
 
-	rte_atomic64_clear(&updates);
-	rte_atomic64_clear(&update_cycles);
-	rte_atomic64_clear(&checks);
-	rte_atomic64_clear(&check_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
 
@@ -453,8 +453,8 @@ test_rcu_qsbr_sw_sv_1qs(void)
 	}
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&check_cycles, cycles);
-	rte_atomic64_add(&checks, i);
+	__atomic_fetch_add(&check_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&checks, i, __ATOMIC_RELAXED);
 
 	writer_done = 1;
 
@@ -467,12 +467,12 @@ test_rcu_qsbr_sw_sv_1qs(void)
 
 	printf("Following numbers include calls to rte_hash functions\n");
 	printf("Cycles per 1 quiescent state update(online/update/offline): %"PRIi64"\n",
-		rte_atomic64_read(&update_cycles) /
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 
 	printf("Cycles per 1 check(start, check): %"PRIi64"\n\n",
-		rte_atomic64_read(&check_cycles) /
-		rte_atomic64_read(&checks));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&checks, __ATOMIC_RELAXED));
 
 	rte_free(t[0]);
 
@@ -511,7 +511,7 @@ test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 
 	printf("Perf test: 1 writer, %d readers, 1 QSBR variable, 1 QSBR Query, Non-Blocking QSBR check\n", num_cores);
 
-	__atomic_store_n(&thr_id, 0, __ATOMIC_SEQ_CST);
+	__atomic_store_n(&thr_id, 0, __ATOMIC_RELAXED);
 
 	if (all_registered == 1)
 		tmp_num_cores = num_cores;
@@ -570,8 +570,8 @@ test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 	}
 
 	cycles = rte_rdtsc_precise() - begin;
-	rte_atomic64_add(&check_cycles, cycles);
-	rte_atomic64_add(&checks, i);
+	__atomic_fetch_add(&check_cycles, cycles, __ATOMIC_RELAXED);
+	__atomic_fetch_add(&checks, i, __ATOMIC_RELAXED);
 
 	writer_done = 1;
 	/* Wait and check return value from reader threads */
@@ -583,12 +583,12 @@ test_rcu_qsbr_sw_sv_1qs_non_blocking(void)
 
 	printf("Following numbers include calls to rte_hash functions\n");
 	printf("Cycles per 1 quiescent state update(online/update/offline): %"PRIi64"\n",
-		rte_atomic64_read(&update_cycles) /
-		rte_atomic64_read(&updates));
+		__atomic_load_n(&update_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&updates, __ATOMIC_RELAXED));
 
 	printf("Cycles per 1 check(start, check): %"PRIi64"\n\n",
-		rte_atomic64_read(&check_cycles) /
-		rte_atomic64_read(&checks));
+		__atomic_load_n(&check_cycles, __ATOMIC_RELAXED) /
+		__atomic_load_n(&checks, __ATOMIC_RELAXED));
 
 	rte_free(t[0]);
 
@@ -619,10 +619,10 @@ test_rcu_qsbr_main(void)
 		return TEST_SKIPPED;
 	}
 
-	rte_atomic64_init(&updates);
-	rte_atomic64_init(&update_cycles);
-	rte_atomic64_init(&checks);
-	rte_atomic64_init(&check_cycles);
+	__atomic_store_n(&updates, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&update_cycles, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&checks, 0, __ATOMIC_RELAXED);
+	__atomic_store_n(&check_cycles, 0, __ATOMIC_RELAXED);
 
 	num_cores = 0;
 	RTE_LCORE_FOREACH_WORKER(core_id) {
-- 
2.17.1


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v3 8/8] test/rcu: use compiler atomics for data sync
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 8/8] test/rcu: use compiler atomics for data sync Joyce Kong
@ 2021-07-23 19:52       ` Andrew Rybchenko
  2021-07-28  7:07         ` Joyce Kong
  0 siblings, 1 reply; 49+ messages in thread
From: Andrew Rybchenko @ 2021-07-23 19:52 UTC (permalink / raw)
  To: Joyce Kong, thomas, david.marchand, roretzla, stephen,
	olivier.matz, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang
  Cc: dev, nd

On 7/20/21 6:51 AM, Joyce Kong wrote:
> Covert rte_atomic usages to compiler atomic built-ins in
> rcu_perf testcases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>   app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
>   1 file changed, 49 insertions(+), 49 deletions(-)
> 
> diff --git a/app/test/test_rcu_qsbr_perf.c b/app/test/test_rcu_qsbr_perf.c
> index 3017e71120..cf7b158d22 100644
> --- a/app/test/test_rcu_qsbr_perf.c
> +++ b/app/test/test_rcu_qsbr_perf.c
> @@ -30,8 +30,8 @@ static volatile uint32_t thr_id;
>   static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
>   static struct rte_hash *h;
>   static char hash_name[8];
> -static rte_atomic64_t updates, checks;
> -static rte_atomic64_t update_cycles, check_cycles;
> +static uint64_t updates, checks;
> +static uint64_t update_cycles, check_cycles;
>   
>   /* Scale down results to 1000 operations to support lower
>    * granularity clocks.
> @@ -81,8 +81,8 @@ test_rcu_qsbr_reader_perf(void *arg)
>   	}
>   
>   	cycles = rte_rdtsc_precise() - begin;
> -	rte_atomic64_add(&update_cycles, cycles);
> -	rte_atomic64_add(&updates, loop_cnt);
> +	__atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
> +	__atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);

Shouldn't __atomic_add_fetch() be used instead since it pseudo-code is
a bit simpler. What is the best option if return value is not actually
used?

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v3 8/8] test/rcu: use compiler atomics for data sync
  2021-07-23 19:52       ` Andrew Rybchenko
@ 2021-07-28  7:07         ` Joyce Kong
  0 siblings, 0 replies; 49+ messages in thread
From: Joyce Kong @ 2021-07-28  7:07 UTC (permalink / raw)
  To: Andrew Rybchenko, thomas, david.marchand, roretzla, stephen,
	olivier.matz, harry.van.haaren, Honnappa Nagarahalli,
	Ruifeng Wang
  Cc: dev, nd

> -----Original Message-----
> From: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
> Sent: Saturday, July 24, 2021 3:52 AM
> To: Joyce Kong <Joyce.Kong@arm.com>; thomas@monjalon.net;
> david.marchand@redhat.com; roretzla@linux.microsoft.com;
> stephen@networkplumber.org; olivier.matz@6wind.com;
> harry.van.haaren@intel.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>
> Cc: dev@dpdk.org; nd <nd@arm.com>
> Subject: Re: [PATCH v3 8/8] test/rcu: use compiler atomics for data sync
> 
> On 7/20/21 6:51 AM, Joyce Kong wrote:
> > Covert rte_atomic usages to compiler atomic built-ins in rcu_perf
> > testcases.
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >   app/test/test_rcu_qsbr_perf.c | 98 +++++++++++++++++------------------
> >   1 file changed, 49 insertions(+), 49 deletions(-)
> >
> > diff --git a/app/test/test_rcu_qsbr_perf.c
> > b/app/test/test_rcu_qsbr_perf.c index 3017e71120..cf7b158d22 100644
> > --- a/app/test/test_rcu_qsbr_perf.c
> > +++ b/app/test/test_rcu_qsbr_perf.c
> > @@ -30,8 +30,8 @@ static volatile uint32_t thr_id;
> >   static struct rte_rcu_qsbr *t[RTE_MAX_LCORE];
> >   static struct rte_hash *h;
> >   static char hash_name[8];
> > -static rte_atomic64_t updates, checks; -static rte_atomic64_t
> > update_cycles, check_cycles;
> > +static uint64_t updates, checks;
> > +static uint64_t update_cycles, check_cycles;
> >
> >   /* Scale down results to 1000 operations to support lower
> >    * granularity clocks.
> > @@ -81,8 +81,8 @@ test_rcu_qsbr_reader_perf(void *arg)
> >   	}
> >
> >   	cycles = rte_rdtsc_precise() - begin;
> > -	rte_atomic64_add(&update_cycles, cycles);
> > -	rte_atomic64_add(&updates, loop_cnt);
> > +	__atomic_fetch_add(&update_cycles, cycles, __ATOMIC_RELAXED);
> > +	__atomic_fetch_add(&updates, loop_cnt, __ATOMIC_RELAXED);
> 
> Shouldn't __atomic_add_fetch() be used instead since it pseudo-code is a bit
> simpler. What is the best option if return value is not actually used?

If the return value is not used, like the situations here, the instructions for __atomic_fetch_add() and __atomic_add_fetch() would be the same on X86 and Arm for gcc and clang that I have tried.
If the return value is used, __atomic_add_fetch() would do two more instructions('mov' 'add') than __atomic_fetch_add() to return the calculation result.
Based on experiments here: https://godbolt.org/ .

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/8] test/mcslock: use compiler atomics for lcores sync
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 4/8] test/mcslock: " Joyce Kong
@ 2021-07-28  9:56       ` Olivier Matz
  2021-07-29  7:19         ` Joyce Kong
  0 siblings, 1 reply; 49+ messages in thread
From: Olivier Matz @ 2021-07-28  9:56 UTC (permalink / raw)
  To: Joyce Kong
  Cc: thomas, david.marchand, roretzla, stephen, andrew.rybchenko,
	harry.van.haaren, honnappa.nagarahalli, ruifeng.wang, dev, nd

Hi Joyce,

On Mon, Jul 19, 2021 at 10:51:21PM -0500, Joyce Kong wrote:
> Convert rte_atomic usages to compiler atomic built-ins for lcores
> sync in mcslock testcases.
> 
> Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  app/test/test_mcslock.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)
> 
> diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c
> index 80eaecc90a..52e45e7e2a 100644
> --- a/app/test/test_mcslock.c
> +++ b/app/test/test_mcslock.c
> @@ -17,7 +17,6 @@
>  #include <rte_lcore.h>
>  #include <rte_cycles.h>
>  #include <rte_mcslock.h>
> -#include <rte_atomic.h>
>  
>  #include "test.h"
>  
> @@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
>  
>  static unsigned int count;
>  
> -static rte_atomic32_t synchro;
> +static uint32_t synchro;
>  
>  static int
>  test_mcslock_per_core(__rte_unused void *arg)
> @@ -76,8 +75,7 @@ load_loop_fn(void *func_param)
>  	rte_mcslock_t ml_perf_me;
>  
>  	/* wait synchro */
> -	while (rte_atomic32_read(&synchro) == 0)
> -		;
> +	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
>  
>  	begin = rte_get_timer_cycles();
>  	while (lcount < MAX_LOOP) {
> @@ -102,15 +100,15 @@ test_mcslock_perf(void)
>  	const unsigned int lcore = rte_lcore_id();
>  
>  	printf("\nTest with no lock on single core...\n");
> -	rte_atomic32_set(&synchro, 1);
> +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
>  	load_loop_fn(&lock);
>  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
>  			lcore, time_count[lcore]);
>  	memset(time_count, 0, sizeof(time_count));
>  
>  	printf("\nTest with lock on single core...\n");
> +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
>  	lock = 1;
> -	rte_atomic32_set(&synchro, 1);

nit: is there a reason for moving this line?


>  	load_loop_fn(&lock);
>  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
>  			lcore, time_count[lcore]);
> @@ -118,11 +116,11 @@ test_mcslock_perf(void)
>  
>  	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
>  
> -	rte_atomic32_set(&synchro, 0);
> +	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
>  	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
>  
>  	/* start synchro and launch test on main */
> -	rte_atomic32_set(&synchro, 1);
> +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
>  	load_loop_fn(&lock);

I have a more general question. Please forgive my ignorance about the
C++11 atomic builtins and memory model. Both gcc manual and C11 standard
are not that easy to understand :)

In all the patches of this patchset, __ATOMIC_RELAXED is used. My
understanding is that it does not add any inter-thread ordering
constraint. I suppose that in this particular case, we rely on
the call to rte_eal_mp_remote_launch() being a compiler barrier,
and the function itself to be a memory barrier. This ensures that
worker threads sees synchro=0 until it is set to 1 by the master.
Is it correct?

What is the reason for using the atomic API here? Wouldn't a standard
affectation work too? (I mean "synchro = 1;")


>  
>  	rte_eal_mp_wait_lcore();
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/8] test/mcslock: use compiler atomics for lcores sync
  2021-07-28  9:56       ` Olivier Matz
@ 2021-07-29  7:19         ` Joyce Kong
  2021-07-29  7:58           ` Olivier Matz
  0 siblings, 1 reply; 49+ messages in thread
From: Joyce Kong @ 2021-07-29  7:19 UTC (permalink / raw)
  To: Olivier Matz
  Cc: thomas, david.marchand, roretzla, stephen, andrew.rybchenko,
	harry.van.haaren, Honnappa Nagarahalli, Ruifeng Wang, dev, nd

Hi Olivier,

> -----Original Message-----
> From: Olivier Matz <olivier.matz@6wind.com>
> Sent: Wednesday, July 28, 2021 5:57 PM
> To: Joyce Kong <Joyce.Kong@arm.com>
> Cc: thomas@monjalon.net; david.marchand@redhat.com;
> roretzla@linux.microsoft.com; stephen@networkplumber.org;
> andrew.rybchenko@oktetlabs.ru; harry.van.haaren@intel.com; Honnappa
> Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; dev@dpdk.org; nd <nd@arm.com>
> Subject: Re: [PATCH v3 4/8] test/mcslock: use compiler atomics for lcores
> sync
> 
> Hi Joyce,
> 
> On Mon, Jul 19, 2021 at 10:51:21PM -0500, Joyce Kong wrote:
> > Convert rte_atomic usages to compiler atomic built-ins for lcores sync
> > in mcslock testcases.
> >
> > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  app/test/test_mcslock.c | 14 ++++++--------
> >  1 file changed, 6 insertions(+), 8 deletions(-)
> >
> > diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c index
> > 80eaecc90a..52e45e7e2a 100644
> > --- a/app/test/test_mcslock.c
> > +++ b/app/test/test_mcslock.c
> > @@ -17,7 +17,6 @@
> >  #include <rte_lcore.h>
> >  #include <rte_cycles.h>
> >  #include <rte_mcslock.h>
> > -#include <rte_atomic.h>
> >
> >  #include "test.h"
> >
> > @@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
> >
> >  static unsigned int count;
> >
> > -static rte_atomic32_t synchro;
> > +static uint32_t synchro;
> >
> >  static int
> >  test_mcslock_per_core(__rte_unused void *arg) @@ -76,8 +75,7 @@
> > load_loop_fn(void *func_param)
> >  	rte_mcslock_t ml_perf_me;
> >
> >  	/* wait synchro */
> > -	while (rte_atomic32_read(&synchro) == 0)
> > -		;
> > +	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
> >
> >  	begin = rte_get_timer_cycles();
> >  	while (lcount < MAX_LOOP) {
> > @@ -102,15 +100,15 @@ test_mcslock_perf(void)
> >  	const unsigned int lcore = rte_lcore_id();
> >
> >  	printf("\nTest with no lock on single core...\n");
> > -	rte_atomic32_set(&synchro, 1);
> > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> >  	load_loop_fn(&lock);
> >  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
> >  			lcore, time_count[lcore]);
> >  	memset(time_count, 0, sizeof(time_count));
> >
> >  	printf("\nTest with lock on single core...\n");
> > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> >  	lock = 1;
> > -	rte_atomic32_set(&synchro, 1);
> 
> nit: is there a reason for moving this line?

I meant to use __atomic_store_n() instead of rte_atomic32_set() to set synchro,
but put the operation to the line up 'lock=1' by mistake, will change it.

> > 
> >  	load_loop_fn(&lock);
> >  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
> >  			lcore, time_count[lcore]);
> > @@ -118,11 +116,11 @@ test_mcslock_perf(void)
> >
> >  	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
> >
> > -	rte_atomic32_set(&synchro, 0);
> > +	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
> >  	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
> >
> >  	/* start synchro and launch test on main */
> > -	rte_atomic32_set(&synchro, 1);
> > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> >  	load_loop_fn(&lock);
> 
> I have a more general question. Please forgive my ignorance about the
> C++11 atomic builtins and memory model. Both gcc manual and C11
> standard
> are not that easy to understand :)
> 
> In all the patches of this patchset, __ATOMIC_RELAXED is used. My
> understanding is that it does not add any inter-thread ordering constraint. I
> suppose that in this particular case, we rely on the call to
> rte_eal_mp_remote_launch() being a compiler barrier, and the function itself
> to be a memory barrier. This ensures that worker threads sees synchro=0
> until it is set to 1 by the master.
> Is it correct?
> 

Yes, you are right. __ATOMIC_RELAXED would introduce no barrier, and the worker
threads would sync with master thread by 'synchro'.

> What is the reason for using the atomic API here? Wouldn't a standard
> affectation work too? (I mean "synchro = 1;")
> 

Here, __atomic_store_n(__ATOMIC_RELAXED) is used to ensure worker threads
see 'synchro=1' after it is changed by the master. And a standard affection can not
ensure worker threads get the new value.

> 
> >
> >  	rte_eal_mp_wait_lcore();
> > --
> > 2.17.1
> >

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/8] test/mcslock: use compiler atomics for lcores sync
  2021-07-29  7:19         ` Joyce Kong
@ 2021-07-29  7:58           ` Olivier Matz
  0 siblings, 0 replies; 49+ messages in thread
From: Olivier Matz @ 2021-07-29  7:58 UTC (permalink / raw)
  To: Joyce Kong
  Cc: thomas, david.marchand, roretzla, stephen, andrew.rybchenko,
	harry.van.haaren, Honnappa Nagarahalli, Ruifeng Wang, dev, nd

On Thu, Jul 29, 2021 at 07:19:13AM +0000, Joyce Kong wrote:
> Hi Olivier,
> 
> > -----Original Message-----
> > From: Olivier Matz <olivier.matz@6wind.com>
> > Sent: Wednesday, July 28, 2021 5:57 PM
> > To: Joyce Kong <Joyce.Kong@arm.com>
> > Cc: thomas@monjalon.net; david.marchand@redhat.com;
> > roretzla@linux.microsoft.com; stephen@networkplumber.org;
> > andrew.rybchenko@oktetlabs.ru; harry.van.haaren@intel.com; Honnappa
> > Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> > <Ruifeng.Wang@arm.com>; dev@dpdk.org; nd <nd@arm.com>
> > Subject: Re: [PATCH v3 4/8] test/mcslock: use compiler atomics for lcores
> > sync
> > 
> > Hi Joyce,
> > 
> > On Mon, Jul 19, 2021 at 10:51:21PM -0500, Joyce Kong wrote:
> > > Convert rte_atomic usages to compiler atomic built-ins for lcores sync
> > > in mcslock testcases.
> > >
> > > Signed-off-by: Joyce Kong <joyce.kong@arm.com>
> > > Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > > Acked-by: Stephen Hemminger <stephen@networkplumber.org>
> > > ---
> > >  app/test/test_mcslock.c | 14 ++++++--------
> > >  1 file changed, 6 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/app/test/test_mcslock.c b/app/test/test_mcslock.c index
> > > 80eaecc90a..52e45e7e2a 100644
> > > --- a/app/test/test_mcslock.c
> > > +++ b/app/test/test_mcslock.c
> > > @@ -17,7 +17,6 @@
> > >  #include <rte_lcore.h>
> > >  #include <rte_cycles.h>
> > >  #include <rte_mcslock.h>
> > > -#include <rte_atomic.h>
> > >
> > >  #include "test.h"
> > >
> > > @@ -43,7 +42,7 @@ rte_mcslock_t *p_ml_perf;
> > >
> > >  static unsigned int count;
> > >
> > > -static rte_atomic32_t synchro;
> > > +static uint32_t synchro;
> > >
> > >  static int
> > >  test_mcslock_per_core(__rte_unused void *arg) @@ -76,8 +75,7 @@
> > > load_loop_fn(void *func_param)
> > >  	rte_mcslock_t ml_perf_me;
> > >
> > >  	/* wait synchro */
> > > -	while (rte_atomic32_read(&synchro) == 0)
> > > -		;
> > > +	rte_wait_until_equal_32(&synchro, 1, __ATOMIC_RELAXED);
> > >
> > >  	begin = rte_get_timer_cycles();
> > >  	while (lcount < MAX_LOOP) {
> > > @@ -102,15 +100,15 @@ test_mcslock_perf(void)
> > >  	const unsigned int lcore = rte_lcore_id();
> > >
> > >  	printf("\nTest with no lock on single core...\n");
> > > -	rte_atomic32_set(&synchro, 1);
> > > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> > >  	load_loop_fn(&lock);
> > >  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
> > >  			lcore, time_count[lcore]);
> > >  	memset(time_count, 0, sizeof(time_count));
> > >
> > >  	printf("\nTest with lock on single core...\n");
> > > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> > >  	lock = 1;
> > > -	rte_atomic32_set(&synchro, 1);
> > 
> > nit: is there a reason for moving this line?
> 
> I meant to use __atomic_store_n() instead of rte_atomic32_set() to set synchro,
> but put the operation to the line up 'lock=1' by mistake, will change it.
> 
> > > 
> > >  	load_loop_fn(&lock);
> > >  	printf("Core [%u] Cost Time = %"PRIu64" us\n",
> > >  			lcore, time_count[lcore]);
> > > @@ -118,11 +116,11 @@ test_mcslock_perf(void)
> > >
> > >  	printf("\nTest with lock on %u cores...\n", (rte_lcore_count()));
> > >
> > > -	rte_atomic32_set(&synchro, 0);
> > > +	__atomic_store_n(&synchro, 0, __ATOMIC_RELAXED);
> > >  	rte_eal_mp_remote_launch(load_loop_fn, &lock, SKIP_MAIN);
> > >
> > >  	/* start synchro and launch test on main */
> > > -	rte_atomic32_set(&synchro, 1);
> > > +	__atomic_store_n(&synchro, 1, __ATOMIC_RELAXED);
> > >  	load_loop_fn(&lock);
> > 
> > I have a more general question. Please forgive my ignorance about the
> > C++11 atomic builtins and memory model. Both gcc manual and C11
> > standard
> > are not that easy to understand :)
> > 
> > In all the patches of this patchset, __ATOMIC_RELAXED is used. My
> > understanding is that it does not add any inter-thread ordering constraint. I
> > suppose that in this particular case, we rely on the call to
> > rte_eal_mp_remote_launch() being a compiler barrier, and the function itself
> > to be a memory barrier. This ensures that worker threads sees synchro=0
> > until it is set to 1 by the master.
> > Is it correct?
> > 
> 
> Yes, you are right. __ATOMIC_RELAXED would introduce no barrier, and the worker
> threads would sync with master thread by 'synchro'.
> 
> > What is the reason for using the atomic API here? Wouldn't a standard
> > affectation work too? (I mean "synchro = 1;")
> > 
> 
> Here, __atomic_store_n(__ATOMIC_RELAXED) is used to ensure worker threads
> see 'synchro=1' after it is changed by the master. And a standard affection can not
> ensure worker threads get the new value.

So, if I understand correctly, using __atomic_store() acts as if the variable is
volatile, and this is indeed needed to ensure visibility from other worker
threads.

I did some tests to convince myself: https://godbolt.org/z/3qWYeneGf

Thank you for the clarification.

> > 
> > >
> > >  	rte_eal_mp_wait_lcore();
> > > --
> > > 2.17.1
> > >

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/8] use compiler atomic builtins for test
  2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
                       ` (7 preceding siblings ...)
  2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 8/8] test/rcu: use compiler atomics for data sync Joyce Kong
@ 2021-07-30 21:58     ` Thomas Monjalon
  8 siblings, 0 replies; 49+ messages in thread
From: Thomas Monjalon @ 2021-07-30 21:58 UTC (permalink / raw)
  To: Joyce Kong
  Cc: david.marchand, roretzla, stephen, olivier.matz,
	andrew.rybchenko, harry.van.haaren, honnappa.nagarahalli,
	ruifeng.wang, dev, nd

> Joyce Kong (8):
>   test/ticketlock: use compiler atomics for lcores sync
>   test/spinlock: use compile atomics for lcores sync
>   test/rwlock: use compiler atomics for lcores sync
>   test/mcslock: use compiler atomics for lcores sync
>   test/mempool: remove unused variable for lcores sync
>   test/mempool_perf: use compiler atomics for lcores sync
>   test/service_cores: use compiler atomics for lock sync
>   test/rcu: use compiler atomics for data sync

Applied, thanks.




^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2021-07-30 21:58 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-04  9:46 [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Joyce Kong
2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 1/8] test/ticketlock: use GCC atomic builtins for lcores sync Joyce Kong
2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 2/8] test/spinlock: " Joyce Kong
2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 3/8] test/rwlock: " Joyce Kong
2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 4/8] test/mcslock: " Joyce Kong
2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 5/8] test/mempool: remove unused variable " Joyce Kong
2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 6/8] test/mempool_perf: use GCC atomic builtins " Joyce Kong
2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 7/8] test/service_cores: use GCC atomic builtins for lock sync Joyce Kong
2021-06-04  9:46 ` [dpdk-dev] [PATCH v1 8/8] test/rcu_perf: use GCC atomic builtins for data sync Joyce Kong
2021-06-04 19:57 ` [dpdk-dev] [PATCH v1 0/8] use GCC's C11 atomic builtins for test Stephen Hemminger
2021-06-11  8:40 ` David Marchand
2021-06-11 10:45   ` Joyce Kong
2021-06-16  2:54 ` [dpdk-dev] [PATCH v2 " Joyce Kong
2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 1/8] test/ticketlock: use GCC atomic builtins for lcores sync Joyce Kong
2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 2/8] test/spinlock: " Joyce Kong
2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 3/8] test/rwlock: " Joyce Kong
2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 4/8] test/mcslock: " Joyce Kong
2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 5/8] test/mempool: remove unused variable " Joyce Kong
2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 6/8] test/mempool_perf: use GCC atomic builtins " Joyce Kong
2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 7/8] test/service_cores: use GCC atomic builtins for lock sync Joyce Kong
2021-06-16  2:54   ` [dpdk-dev] [PATCH v2 8/8] test/rcu: use GCC atomic builtins for data sync Joyce Kong
2021-06-17 15:21   ` [dpdk-dev] [PATCH v2 0/8] use GCC's C11 atomic builtins for test Tyler Retzlaff
2021-06-17 23:26     ` Honnappa Nagarahalli
2021-06-23 10:24       ` Thomas Monjalon
2021-06-23 16:02         ` Tyler Retzlaff
2021-06-29 17:04         ` Honnappa Nagarahalli
2021-06-30 18:51           ` Tyler Retzlaff
2021-06-30 19:06             ` Honnappa Nagarahalli
2021-06-30 19:38               ` Tyler Retzlaff
2021-06-30 20:25                 ` Honnappa Nagarahalli
2021-06-30 21:49                   ` Tyler Retzlaff
2021-06-30 22:41                     ` Honnappa Nagarahalli
2021-07-13  7:28                       ` Joyce Kong
2021-07-14 11:44                         ` Thomas Monjalon
2021-07-20  3:51   ` [dpdk-dev] [PATCH v3 0/8] use compiler " Joyce Kong
2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 1/8] test/ticketlock: use compiler atomics for lcores sync Joyce Kong
2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 2/8] test/spinlock: use compile " Joyce Kong
2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 3/8] test/rwlock: use compiler " Joyce Kong
2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 4/8] test/mcslock: " Joyce Kong
2021-07-28  9:56       ` Olivier Matz
2021-07-29  7:19         ` Joyce Kong
2021-07-29  7:58           ` Olivier Matz
2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 5/8] test/mempool: remove unused variable " Joyce Kong
2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 6/8] test/mempool_perf: use compiler atomics " Joyce Kong
2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 7/8] test/service_cores: use compiler atomics for lock sync Joyce Kong
2021-07-20  3:51     ` [dpdk-dev] [PATCH v3 8/8] test/rcu: use compiler atomics for data sync Joyce Kong
2021-07-23 19:52       ` Andrew Rybchenko
2021-07-28  7:07         ` Joyce Kong
2021-07-30 21:58     ` [dpdk-dev] [PATCH v3 0/8] use compiler atomic builtins for test Thomas Monjalon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).