DPDK patches and discussions
* [PATCH 00/21] use rte optional stdatomic API
@ 2023-10-16 23:08 Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 01/21] power: fix use of rte stdatomic Tyler Retzlaff
                   ` (22 more replies)
  0 siblings, 23 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Note the first 2 patches are redundant and will be removed before merge.
The series is a work in progress and is being submitted to get initial CI
results.
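
For reference, a minimal sketch of the conversion pattern applied throughout
the series (the helper names and variables below are illustrative only, not
taken from any particular patch):

  #include <stdint.h>

  #include <rte_stdatomic.h>

  /* before: gcc builtin intrinsics on a plain type */
  static uint32_t old_state;

  static inline int
  old_try_claim(void)
  {
          uint32_t exp = __atomic_load_n(&old_state, __ATOMIC_ACQUIRE);

          /* note the extra "weak" flag argument (0 here) */
          return __atomic_compare_exchange_n(&old_state, &exp, 1, 0,
                          __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
  }

  /* after: rte stdatomic API on an RTE_ATOMIC() specified type; the weak
   * flag is dropped and the strong CAS variant is used
   */
  static RTE_ATOMIC(uint32_t) new_state;

  static inline int
  new_try_claim(void)
  {
          uint32_t exp = rte_atomic_load_explicit(&new_state,
                          rte_memory_order_acquire);

          return rte_atomic_compare_exchange_strong_explicit(&new_state, &exp, 1,
                          rte_memory_order_acquire, rte_memory_order_relaxed);
  }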

Tyler Retzlaff (21):
  power: fix use of rte stdatomic
  event/cnxk: remove single use of rte stdatomic
  power: use rte optional stdatomic API
  bbdev: use rte optional stdatomic API
  eal: use rte optional stdatomic API
  eventdev: use rte optional stdatomic API
  gpudev: use rte optional stdatomic API
  ipsec: use rte optional stdatomic API
  mbuf: use rte optional stdatomic API
  mempool: use rte optional stdatomic API
  rcu: use rte optional stdatomic API
  pdump: use rte optional stdatomic API
  stack: use rte optional stdatomic API
  telemetry: use rte optional stdatomic API
  vhost: use rte optional stdatomic API
  cryptodev: use rte optional stdatomic API
  distributor: use rte optional stdatomic API
  ethdev: use rte optional stdatomic API
  hash: use rte optional stdatomic API
  timer: use rte optional stdatomic API
  ring: use rte optional stdatomic API

 drivers/event/cnxk/cnxk_tim_worker.c   |   2 +-
 drivers/event/cnxk/cnxk_tim_worker.h   |   4 +-
 drivers/net/mlx5/mlx5_hws_cnt.h        |   2 +-
 lib/bbdev/rte_bbdev.c                  |   6 +-
 lib/bbdev/rte_bbdev.h                  |   2 +-
 lib/cryptodev/rte_cryptodev.c          |  22 +++---
 lib/cryptodev/rte_cryptodev.h          |  16 ++---
 lib/distributor/distributor_private.h  |   4 +-
 lib/distributor/rte_distributor.c      |  54 +++++++--------
 lib/eal/common/eal_common_launch.c     |  10 +--
 lib/eal/common/eal_common_mcfg.c       |   2 +-
 lib/eal/common/eal_common_proc.c       |  14 ++--
 lib/eal/common/eal_common_thread.c     |  26 +++----
 lib/eal/common/eal_common_trace.c      |   8 +--
 lib/eal/common/eal_common_trace_ctf.c  |   4 +-
 lib/eal/common/eal_memcfg.h            |   2 +-
 lib/eal/common/eal_private.h           |   4 +-
 lib/eal/common/eal_trace.h             |   4 +-
 lib/eal/common/rte_service.c           | 122 ++++++++++++++++-----------------
 lib/eal/freebsd/eal.c                  |  20 +++---
 lib/eal/include/rte_epoll.h            |   3 +-
 lib/eal/linux/eal.c                    |  26 +++----
 lib/eal/linux/eal_interrupts.c         |  42 ++++++------
 lib/eal/ppc/include/rte_atomic.h       |   6 +-
 lib/eal/windows/rte_thread.c           |   8 ++-
 lib/ethdev/ethdev_driver.h             |  16 ++---
 lib/ethdev/ethdev_private.c            |   6 +-
 lib/ethdev/rte_ethdev.c                |  24 +++----
 lib/ethdev/rte_ethdev.h                |  16 ++---
 lib/ethdev/rte_ethdev_core.h           |   2 +-
 lib/eventdev/rte_event_timer_adapter.c |  66 +++++++++---------
 lib/eventdev/rte_event_timer_adapter.h |   2 +-
 lib/gpudev/gpudev.c                    |   6 +-
 lib/gpudev/gpudev_driver.h             |   2 +-
 lib/hash/rte_cuckoo_hash.c             | 116 +++++++++++++++----------------
 lib/hash/rte_cuckoo_hash.h             |   6 +-
 lib/ipsec/ipsec_sqn.h                  |   2 +-
 lib/ipsec/sa.h                         |   2 +-
 lib/mbuf/rte_mbuf.h                    |  20 +++---
 lib/mbuf/rte_mbuf_core.h               |   4 +-
 lib/mempool/rte_mempool.h              |   4 +-
 lib/pdump/rte_pdump.c                  |  14 ++--
 lib/pdump/rte_pdump.h                  |   8 +--
 lib/power/power_acpi_cpufreq.c         |  33 ++++-----
 lib/power/power_amd_pstate_cpufreq.c   |  14 ++--
 lib/power/power_cppc_cpufreq.c         |  25 +++----
 lib/power/power_pstate_cpufreq.c       |  31 +++++----
 lib/rcu/rte_rcu_qsbr.c                 |  48 ++++++-------
 lib/rcu/rte_rcu_qsbr.h                 |  68 +++++++++---------
 lib/ring/rte_ring_c11_pvt.h            |  33 ++++-----
 lib/ring/rte_ring_core.h               |  10 +--
 lib/ring/rte_ring_generic_pvt.h        |   3 +-
 lib/ring/rte_ring_hts_elem_pvt.h       |  22 +++---
 lib/ring/rte_ring_peek_elem_pvt.h      |   6 +-
 lib/ring/rte_ring_rts_elem_pvt.h       |  27 ++++----
 lib/stack/rte_stack.h                  |   2 +-
 lib/stack/rte_stack_lf_c11.h           |  24 +++----
 lib/stack/rte_stack_lf_generic.h       |  18 ++---
 lib/telemetry/telemetry.c              |  18 ++---
 lib/timer/rte_timer.c                  |  50 +++++++-------
 lib/timer/rte_timer.h                  |   6 +-
 lib/vhost/vdpa.c                       |   3 +-
 lib/vhost/vhost.c                      |  42 ++++++------
 lib/vhost/vhost.h                      |  39 ++++++-----
 lib/vhost/vhost_user.c                 |   6 +-
 lib/vhost/virtio_net.c                 |  58 +++++++++-------
 lib/vhost/virtio_net_ctrl.c            |   6 +-
 67 files changed, 674 insertions(+), 647 deletions(-)

-- 
1.8.3.1


* [PATCH 01/21] power: fix use of rte stdatomic
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 02/21] event/cnxk: remove single " Tyler Retzlaff
                   ` (21 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

* rte stdatomic functions operate on RTE_ATOMIC(T) specified types, not
  regular T; add the missing specifier to the amd_pstate_power_info state
  field (see the sketch below)
* use rte_memory_order_xxx instead of __ATOMIC_XXX
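
For illustration only (my reading of rte_stdatomic.h, not part of the
change): with enable_stdatomic=true the rte_atomic_xxx functions map to the
C11 generic atomic functions, which require an _Atomic qualified object, and
the RTE_ATOMIC() specifier is what provides that qualification:

  #include <stdint.h>

  #include <rte_stdatomic.h>

  /* illustrative struct only, not the real amd_pstate_power_info */
  struct example_power_info {
          /* RTE_ATOMIC(uint32_t) expands to an _Atomic qualified uint32_t
           * when enable_stdatomic=true and to a plain uint32_t otherwise,
           * so the field is accepted by both implementations of the
           * rte_atomic_xxx functions
           */
          RTE_ATOMIC(uint32_t) state;
  };

  static inline void
  example_set_state(struct example_power_info *pi, uint32_t v)
  {
          rte_atomic_store_explicit(&pi->state, v, rte_memory_order_release);
  }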

Fixes: 1ed04d33cf19 ("power: support amd-pstate cpufreq driver")
Cc: sivaprasad.tummala@amd.com

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/power/power_amd_pstate_cpufreq.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/lib/power/power_amd_pstate_cpufreq.c b/lib/power/power_amd_pstate_cpufreq.c
index ee58395..dbd9d2b 100644
--- a/lib/power/power_amd_pstate_cpufreq.c
+++ b/lib/power/power_amd_pstate_cpufreq.c
@@ -47,7 +47,7 @@ enum power_state {
  */
 struct amd_pstate_power_info {
 	uint32_t lcore_id;                   /**< Logical core id */
-	uint32_t state;                      /**< Power in use state */
+	RTE_ATOMIC(uint32_t) state;          /**< Power in use state */
 	FILE *f;                             /**< FD of scaling_setspeed */
 	char governor_ori[28];               /**< Original governor name */
 	uint32_t curr_idx;                   /**< Freq index in freqs array */
@@ -370,7 +370,7 @@ struct amd_pstate_power_info {
 	 */
 	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state),
 					&exp_state, POWER_ONGOING,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"in use\n", lcore_id);
 		return -1;
@@ -408,12 +408,12 @@ struct amd_pstate_power_info {
 	RTE_LOG(INFO, POWER, "Initialized successfully for lcore %u "
 			"power management\n", lcore_id);
 
-	rte_atomic_store_explicit(&(pi->state), POWER_USED, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_USED, rte_memory_order_release);
 
 	return 0;
 
 fail:
-	rte_atomic_store_explicit(&(pi->state), POWER_UNKNOWN, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_UNKNOWN, rte_memory_order_release);
 	return -1;
 }
 
@@ -448,7 +448,7 @@ struct amd_pstate_power_info {
 	 */
 	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state),
 					&exp_state, POWER_ONGOING,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"not used\n", lcore_id);
 		return -1;
@@ -468,12 +468,12 @@ struct amd_pstate_power_info {
 	RTE_LOG(INFO, POWER, "Power management of lcore %u has exited from "
 			"'userspace' mode and been set back to the "
 			"original\n", lcore_id);
-	rte_atomic_store_explicit(&(pi->state), POWER_IDLE, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_IDLE, rte_memory_order_release);
 
 	return 0;
 
 fail:
-	rte_atomic_store_explicit(&(pi->state), POWER_UNKNOWN, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_UNKNOWN, rte_memory_order_release);
 
 	return -1;
 }
-- 
1.8.3.1


* [PATCH 02/21] event/cnxk: remove single use of rte stdatomic
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 01/21] power: fix use of rte stdatomic Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 03/21] power: use rte optional stdatomic API Tyler Retzlaff
                   ` (20 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

The variable operated on by the single use of rte stdatomic was not
RTE_ATOMIC(T) specified. Remove the use of rte stdatomic for now to fix the
LLVM build with enable_stdatomic=true; event/cnxk will be converted to rte
stdatomic in a later series.
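
A rough sketch of the failure mode (my reading of rte_stdatomic.h, not text
from this patch; the helper below is hypothetical and only mirrors the load
in the diff):

  #include <stdint.h>

  static inline uint64_t
  example_load_w1(uint64_t *w1)
  {
          /* with enable_stdatomic=true the rte call maps to
           * atomic_load_explicit(), which clang rejects because w1 is not
           * a pointer to an _Atomic qualified type:
           *
           *   return rte_atomic_load_explicit(w1, rte_memory_order_acquire);
           *
           * the gcc builtin accepts a plain object, so it builds everywhere
           * until the driver is converted
           */
          return __atomic_load_n(w1, __ATOMIC_ACQUIRE);
  }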

Fixes: 14a4aa9eae71 ("event/cnxk: support get remaining ticks")
Cc: pbhagavatula@marvell.com

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 drivers/event/cnxk/cnxk_tim_worker.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/event/cnxk/cnxk_tim_worker.c b/drivers/event/cnxk/cnxk_tim_worker.c
index ae4bf33..944490d 100644
--- a/drivers/event/cnxk/cnxk_tim_worker.c
+++ b/drivers/event/cnxk/cnxk_tim_worker.c
@@ -193,7 +193,7 @@
 		return -ENOENT;
 
 	bkt = (struct cnxk_tim_bkt *)evtim->impl_opaque[1];
-	sema = rte_atomic_load_explicit(&bkt->w1, rte_memory_order_acquire);
+	sema = __atomic_load_n(&bkt->w1, __ATOMIC_ACQUIRE);
 	if (cnxk_tim_bkt_get_hbt(sema) || !cnxk_tim_bkt_get_nent(sema))
 		return -ENOENT;
 
-- 
1.8.3.1


* [PATCH 03/21] power: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 01/21] power: fix use of rte stdatomic Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 02/21] event/cnxk: remove single " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 04/21] bbdev: " Tyler Retzlaff
                   ` (19 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding
rte_atomic_xxx optional stdatomic API

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/power/power_acpi_cpufreq.c   | 33 +++++++++++++++++----------------
 lib/power/power_cppc_cpufreq.c   | 25 +++++++++++++------------
 lib/power/power_pstate_cpufreq.c | 31 ++++++++++++++++---------------
 3 files changed, 46 insertions(+), 43 deletions(-)

diff --git a/lib/power/power_acpi_cpufreq.c b/lib/power/power_acpi_cpufreq.c
index 6e57aca..8b55f19 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/lib/power/power_acpi_cpufreq.c
@@ -7,6 +7,7 @@
 #include <stdlib.h>
 
 #include <rte_memcpy.h>
+#include <rte_stdatomic.h>
 #include <rte_string_fns.h>
 
 #include "power_acpi_cpufreq.h"
@@ -41,13 +42,13 @@ enum power_state {
  * Power info per lcore.
  */
 struct acpi_power_info {
-	unsigned int lcore_id;                   /**< Logical core id */
+	unsigned int lcore_id;               /**< Logical core id */
 	uint32_t freqs[RTE_MAX_LCORE_FREQS]; /**< Frequency array */
 	uint32_t nb_freqs;                   /**< number of available freqs */
 	FILE *f;                             /**< FD of scaling_setspeed */
 	char governor_ori[32];               /**< Original governor name */
 	uint32_t curr_idx;                   /**< Freq index in freqs array */
-	uint32_t state;                      /**< Power in use state */
+	RTE_ATOMIC(uint32_t) state;          /**< Power in use state */
 	uint16_t turbo_available;            /**< Turbo Boost available */
 	uint16_t turbo_enable;               /**< Turbo Boost enable/disable */
 } __rte_cache_aligned;
@@ -249,9 +250,9 @@ struct acpi_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"in use\n", lcore_id);
 		return -1;
@@ -289,15 +290,15 @@ struct acpi_power_info {
 	RTE_LOG(INFO, POWER, "Initialized successfully for lcore %u "
 			"power management\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_USED,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_USED,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
@@ -321,9 +322,9 @@ struct acpi_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"not used\n", lcore_id);
 		return -1;
@@ -344,15 +345,15 @@ struct acpi_power_info {
 			"'userspace' mode and been set back to the "
 			"original\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_IDLE,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_IDLE,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
diff --git a/lib/power/power_cppc_cpufreq.c b/lib/power/power_cppc_cpufreq.c
index fc9cffe..bb70f6a 100644
--- a/lib/power/power_cppc_cpufreq.c
+++ b/lib/power/power_cppc_cpufreq.c
@@ -6,6 +6,7 @@
 #include <stdlib.h>
 
 #include <rte_memcpy.h>
+#include <rte_stdatomic.h>
 
 #include "power_cppc_cpufreq.h"
 #include "power_common.h"
@@ -49,8 +50,8 @@ enum power_state {
  * Power info per lcore.
  */
 struct cppc_power_info {
-	unsigned int lcore_id;                   /**< Logical core id */
-	uint32_t state;                      /**< Power in use state */
+	unsigned int lcore_id;               /**< Logical core id */
+	RTE_ATOMIC(uint32_t) state;          /**< Power in use state */
 	FILE *f;                             /**< FD of scaling_setspeed */
 	char governor_ori[32];               /**< Original governor name */
 	uint32_t curr_idx;                   /**< Freq index in freqs array */
@@ -353,9 +354,9 @@ struct cppc_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"in use\n", lcore_id);
 		return -1;
@@ -393,12 +394,12 @@ struct cppc_power_info {
 	RTE_LOG(INFO, POWER, "Initialized successfully for lcore %u "
 			"power management\n", lcore_id);
 
-	__atomic_store_n(&(pi->state), POWER_USED, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_USED, rte_memory_order_release);
 
 	return 0;
 
 fail:
-	__atomic_store_n(&(pi->state), POWER_UNKNOWN, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_UNKNOWN, rte_memory_order_release);
 	return -1;
 }
 
@@ -431,9 +432,9 @@ struct cppc_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"not used\n", lcore_id);
 		return -1;
@@ -453,12 +454,12 @@ struct cppc_power_info {
 	RTE_LOG(INFO, POWER, "Power management of lcore %u has exited from "
 			"'userspace' mode and been set back to the "
 			"original\n", lcore_id);
-	__atomic_store_n(&(pi->state), POWER_IDLE, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_IDLE, rte_memory_order_release);
 
 	return 0;
 
 fail:
-	__atomic_store_n(&(pi->state), POWER_UNKNOWN, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_UNKNOWN, rte_memory_order_release);
 
 	return -1;
 }
diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c
index 52aa645..5ca5f60 100644
--- a/lib/power/power_pstate_cpufreq.c
+++ b/lib/power/power_pstate_cpufreq.c
@@ -12,6 +12,7 @@
 #include <inttypes.h>
 
 #include <rte_memcpy.h>
+#include <rte_stdatomic.h>
 
 #include "rte_power_pmd_mgmt.h"
 #include "power_pstate_cpufreq.h"
@@ -59,7 +60,7 @@ struct pstate_power_info {
 	uint32_t non_turbo_max_ratio;        /**< Non Turbo Max ratio  */
 	uint32_t sys_max_freq;               /**< system wide max freq  */
 	uint32_t core_base_freq;             /**< core base freq  */
-	uint32_t state;                      /**< Power in use state */
+	RTE_ATOMIC(uint32_t) state;          /**< Power in use state */
 	uint16_t turbo_available;            /**< Turbo Boost available */
 	uint16_t turbo_enable;               /**< Turbo Boost enable/disable */
 	uint16_t priority_core;              /**< High Performance core */
@@ -555,9 +556,9 @@ struct pstate_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"in use\n", lcore_id);
 		return -1;
@@ -600,15 +601,15 @@ struct pstate_power_info {
 	RTE_LOG(INFO, POWER, "Initialized successfully for lcore %u "
 			"power management\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_USED,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_USED,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
@@ -633,9 +634,9 @@ struct pstate_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are under done the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"not used\n", lcore_id);
 		return -1;
@@ -658,15 +659,15 @@ struct pstate_power_info {
 			"'performance' mode and been set back to the "
 			"original\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_IDLE,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_IDLE,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
-- 
1.8.3.1


* [PATCH 04/21] bbdev: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (2 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 03/21] power: use rte optional stdatomic API Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 05/21] eal: " Tyler Retzlaff
                   ` (18 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding
rte_atomic_xxx optional stdatomic API

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/bbdev/rte_bbdev.c | 6 +++---
 lib/bbdev/rte_bbdev.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/bbdev/rte_bbdev.c b/lib/bbdev/rte_bbdev.c
index 155323e..cfebea0 100644
--- a/lib/bbdev/rte_bbdev.c
+++ b/lib/bbdev/rte_bbdev.c
@@ -208,7 +208,7 @@ struct rte_bbdev *
 		return NULL;
 	}
 
-	__atomic_fetch_add(&bbdev->data->process_cnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&bbdev->data->process_cnt, 1, rte_memory_order_relaxed);
 	bbdev->data->dev_id = dev_id;
 	bbdev->state = RTE_BBDEV_INITIALIZED;
 
@@ -250,8 +250,8 @@ struct rte_bbdev *
 	}
 
 	/* clear shared BBDev Data if no process is using the device anymore */
-	if (__atomic_fetch_sub(&bbdev->data->process_cnt, 1,
-			      __ATOMIC_RELAXED) - 1 == 0)
+	if (rte_atomic_fetch_sub_explicit(&bbdev->data->process_cnt, 1,
+			      rte_memory_order_relaxed) - 1 == 0)
 		memset(bbdev->data, 0, sizeof(*bbdev->data));
 
 	memset(bbdev, 0, sizeof(*bbdev));
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index d12e2e7..e1aee08 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -482,7 +482,7 @@ struct rte_bbdev_data {
 	uint16_t dev_id;  /**< Device ID */
 	int socket_id;  /**< NUMA socket that device is on */
 	bool started;  /**< Device run-time state */
-	uint16_t process_cnt;  /** Counter of processes using the device */
+	RTE_ATOMIC(uint16_t) process_cnt;  /** Counter of processes using the device */
 };
 
 /* Forward declarations */
-- 
1.8.3.1


* [PATCH 05/21] eal: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (3 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 04/21] bbdev: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 06/21] eventdev: " Tyler Retzlaff
                   ` (17 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding
rte_atomic_xxx optional stdatomic API

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/common/eal_common_launch.c    |  10 +--
 lib/eal/common/eal_common_mcfg.c      |   2 +-
 lib/eal/common/eal_common_proc.c      |  14 ++--
 lib/eal/common/eal_common_thread.c    |  26 ++++----
 lib/eal/common/eal_common_trace.c     |   8 +--
 lib/eal/common/eal_common_trace_ctf.c |   4 +-
 lib/eal/common/eal_memcfg.h           |   2 +-
 lib/eal/common/eal_private.h          |   4 +-
 lib/eal/common/eal_trace.h            |   4 +-
 lib/eal/common/rte_service.c          | 122 +++++++++++++++++-----------------
 lib/eal/freebsd/eal.c                 |  20 +++---
 lib/eal/include/rte_epoll.h           |   3 +-
 lib/eal/linux/eal.c                   |  26 ++++----
 lib/eal/linux/eal_interrupts.c        |  42 ++++++------
 lib/eal/ppc/include/rte_atomic.h      |   6 +-
 lib/eal/windows/rte_thread.c          |   8 ++-
 16 files changed, 152 insertions(+), 149 deletions(-)

diff --git a/lib/eal/common/eal_common_launch.c b/lib/eal/common/eal_common_launch.c
index 0504598..5320c3b 100644
--- a/lib/eal/common/eal_common_launch.c
+++ b/lib/eal/common/eal_common_launch.c
@@ -18,8 +18,8 @@
 int
 rte_eal_wait_lcore(unsigned worker_id)
 {
-	while (__atomic_load_n(&lcore_config[worker_id].state,
-			__ATOMIC_ACQUIRE) != WAIT)
+	while (rte_atomic_load_explicit(&lcore_config[worker_id].state,
+			rte_memory_order_acquire) != WAIT)
 		rte_pause();
 
 	return lcore_config[worker_id].ret;
@@ -38,8 +38,8 @@
 	/* Check if the worker is in 'WAIT' state. Use acquire order
 	 * since 'state' variable is used as the guard variable.
 	 */
-	if (__atomic_load_n(&lcore_config[worker_id].state,
-			__ATOMIC_ACQUIRE) != WAIT)
+	if (rte_atomic_load_explicit(&lcore_config[worker_id].state,
+			rte_memory_order_acquire) != WAIT)
 		goto finish;
 
 	lcore_config[worker_id].arg = arg;
@@ -47,7 +47,7 @@
 	 * before the worker thread starts running the function.
 	 * Use worker thread function as the guard variable.
 	 */
-	__atomic_store_n(&lcore_config[worker_id].f, f, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&lcore_config[worker_id].f, f, rte_memory_order_release);
 
 	rc = eal_thread_wake_worker(worker_id);
 
diff --git a/lib/eal/common/eal_common_mcfg.c b/lib/eal/common/eal_common_mcfg.c
index 2a785e7..dabb80e 100644
--- a/lib/eal/common/eal_common_mcfg.c
+++ b/lib/eal/common/eal_common_mcfg.c
@@ -30,7 +30,7 @@
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 
 	/* wait until shared mem_config finish initialising */
-	rte_wait_until_equal_32(&mcfg->magic, RTE_MAGIC, __ATOMIC_RELAXED);
+	rte_wait_until_equal_32(&mcfg->magic, RTE_MAGIC, rte_memory_order_relaxed);
 }
 
 int
diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index f20a348..728815c 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -33,7 +33,7 @@
 #include "eal_filesystem.h"
 #include "eal_internal_cfg.h"
 
-static int mp_fd = -1;
+static RTE_ATOMIC(int) mp_fd = -1;
 static rte_thread_t mp_handle_tid;
 static char mp_filter[PATH_MAX];   /* Filter for secondary process sockets */
 static char mp_dir_path[PATH_MAX]; /* The directory path for all mp sockets */
@@ -404,7 +404,7 @@ struct pending_request {
 	struct sockaddr_un sa;
 	int fd;
 
-	while ((fd = __atomic_load_n(&mp_fd, __ATOMIC_RELAXED)) >= 0) {
+	while ((fd = rte_atomic_load_explicit(&mp_fd, rte_memory_order_relaxed)) >= 0) {
 		int ret;
 
 		ret = read_msg(fd, &msg, &sa);
@@ -652,7 +652,7 @@ enum async_action {
 		RTE_LOG(ERR, EAL, "failed to create mp thread: %s\n",
 			strerror(errno));
 		close(dir_fd);
-		close(__atomic_exchange_n(&mp_fd, -1, __ATOMIC_RELAXED));
+		close(rte_atomic_exchange_explicit(&mp_fd, -1, rte_memory_order_relaxed));
 		return -1;
 	}
 
@@ -668,7 +668,7 @@ enum async_action {
 {
 	int fd;
 
-	fd = __atomic_exchange_n(&mp_fd, -1, __ATOMIC_RELAXED);
+	fd = rte_atomic_exchange_explicit(&mp_fd, -1, rte_memory_order_relaxed);
 	if (fd < 0)
 		return;
 
@@ -1282,11 +1282,11 @@ enum mp_status {
 
 	expected = MP_STATUS_UNKNOWN;
 	desired = status;
-	if (__atomic_compare_exchange_n(&mcfg->mp_status, &expected, desired,
-			false, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+	if (rte_atomic_compare_exchange_strong_explicit(&mcfg->mp_status, &expected, desired,
+			rte_memory_order_relaxed, rte_memory_order_relaxed))
 		return true;
 
-	return __atomic_load_n(&mcfg->mp_status, __ATOMIC_RELAXED) == desired;
+	return rte_atomic_load_explicit(&mcfg->mp_status, rte_memory_order_relaxed) == desired;
 }
 
 bool
diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
index 668b1ed..c422ea8 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -191,8 +191,8 @@ unsigned rte_socket_id(void)
 		/* Set the state to 'RUNNING'. Use release order
 		 * since 'state' variable is used as the guard variable.
 		 */
-		__atomic_store_n(&lcore_config[lcore_id].state, RUNNING,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&lcore_config[lcore_id].state, RUNNING,
+			rte_memory_order_release);
 
 		eal_thread_ack_command();
 
@@ -201,8 +201,8 @@ unsigned rte_socket_id(void)
 		 * are accessed only after update to 'f' is visible.
 		 * Wait till the update to 'f' is visible to the worker.
 		 */
-		while ((f = __atomic_load_n(&lcore_config[lcore_id].f,
-				__ATOMIC_ACQUIRE)) == NULL)
+		while ((f = rte_atomic_load_explicit(&lcore_config[lcore_id].f,
+				rte_memory_order_acquire)) == NULL)
 			rte_pause();
 
 		rte_eal_trace_thread_lcore_running(lcore_id, f);
@@ -219,8 +219,8 @@ unsigned rte_socket_id(void)
 		 * are completed before the state is updated.
 		 * Use 'state' as the guard variable.
 		 */
-		__atomic_store_n(&lcore_config[lcore_id].state, WAIT,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&lcore_config[lcore_id].state, WAIT,
+			rte_memory_order_release);
 
 		rte_eal_trace_thread_lcore_stopped(lcore_id);
 	}
@@ -242,7 +242,7 @@ struct control_thread_params {
 	/* Control thread status.
 	 * If the status is CTRL_THREAD_ERROR, 'ret' has the error code.
 	 */
-	enum __rte_ctrl_thread_status status;
+	RTE_ATOMIC(enum __rte_ctrl_thread_status) status;
 };
 
 static int control_thread_init(void *arg)
@@ -259,13 +259,13 @@ static int control_thread_init(void *arg)
 	RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
 	params->ret = rte_thread_set_affinity_by_id(rte_thread_self(), cpuset);
 	if (params->ret != 0) {
-		__atomic_store_n(&params->status,
-			CTRL_THREAD_ERROR, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&params->status,
+			CTRL_THREAD_ERROR, rte_memory_order_release);
 		return 1;
 	}
 
-	__atomic_store_n(&params->status,
-		CTRL_THREAD_RUNNING, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&params->status,
+		CTRL_THREAD_RUNNING, rte_memory_order_release);
 
 	return 0;
 }
@@ -310,8 +310,8 @@ static uint32_t control_thread_start(void *arg)
 
 	/* Wait for the control thread to initialize successfully */
 	while ((ctrl_thread_status =
-			__atomic_load_n(&params->status,
-			__ATOMIC_ACQUIRE)) == CTRL_THREAD_LAUNCHING) {
+			rte_atomic_load_explicit(&params->status,
+			rte_memory_order_acquire)) == CTRL_THREAD_LAUNCHING) {
 		rte_delay_us_sleep(1);
 	}
 
diff --git a/lib/eal/common/eal_common_trace.c b/lib/eal/common/eal_common_trace.c
index d2eac2d..6ad87fc 100644
--- a/lib/eal/common/eal_common_trace.c
+++ b/lib/eal/common/eal_common_trace.c
@@ -97,7 +97,7 @@ struct trace_point_head *
 bool
 rte_trace_is_enabled(void)
 {
-	return __atomic_load_n(&trace.status, __ATOMIC_ACQUIRE) != 0;
+	return rte_atomic_load_explicit(&trace.status, rte_memory_order_acquire) != 0;
 }
 
 static void
@@ -157,7 +157,7 @@ rte_trace_mode rte_trace_mode_get(void)
 	prev = rte_atomic_fetch_or_explicit(t, __RTE_TRACE_FIELD_ENABLE_MASK,
 		rte_memory_order_release);
 	if ((prev & __RTE_TRACE_FIELD_ENABLE_MASK) == 0)
-		__atomic_fetch_add(&trace.status, 1, __ATOMIC_RELEASE);
+		rte_atomic_fetch_add_explicit(&trace.status, 1, rte_memory_order_release);
 	return 0;
 }
 
@@ -172,7 +172,7 @@ rte_trace_mode rte_trace_mode_get(void)
 	prev = rte_atomic_fetch_and_explicit(t, ~__RTE_TRACE_FIELD_ENABLE_MASK,
 		rte_memory_order_release);
 	if ((prev & __RTE_TRACE_FIELD_ENABLE_MASK) != 0)
-		__atomic_fetch_sub(&trace.status, 1, __ATOMIC_RELEASE);
+		rte_atomic_fetch_sub_explicit(&trace.status, 1, rte_memory_order_release);
 	return 0;
 }
 
@@ -526,7 +526,7 @@ rte_trace_mode rte_trace_mode_get(void)
 
 	/* Add the trace point at tail */
 	STAILQ_INSERT_TAIL(&tp_list, tp, next);
-	__atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	/* All Good !!! */
 	return 0;
diff --git a/lib/eal/common/eal_common_trace_ctf.c b/lib/eal/common/eal_common_trace_ctf.c
index c6775c3..04c4f71 100644
--- a/lib/eal/common/eal_common_trace_ctf.c
+++ b/lib/eal/common/eal_common_trace_ctf.c
@@ -361,10 +361,10 @@
 	if (ctf_meta == NULL)
 		return -EINVAL;
 
-	if (!__atomic_load_n(&trace->ctf_fixup_done, __ATOMIC_SEQ_CST) &&
+	if (!rte_atomic_load_explicit(&trace->ctf_fixup_done, rte_memory_order_seq_cst) &&
 				rte_get_timer_hz()) {
 		meta_fixup(trace, ctf_meta);
-		__atomic_store_n(&trace->ctf_fixup_done, 1, __ATOMIC_SEQ_CST);
+		rte_atomic_store_explicit(&trace->ctf_fixup_done, 1, rte_memory_order_seq_cst);
 	}
 
 	rc = fprintf(f, "%s", ctf_meta);
diff --git a/lib/eal/common/eal_memcfg.h b/lib/eal/common/eal_memcfg.h
index d5c63e2..60e2089 100644
--- a/lib/eal/common/eal_memcfg.h
+++ b/lib/eal/common/eal_memcfg.h
@@ -42,7 +42,7 @@ struct rte_mem_config {
 	rte_rwlock_t memory_hotplug_lock;
 	/**< Indicates whether memory hotplug request is in progress. */
 
-	uint8_t mp_status; /**< Multiprocess status. */
+	RTE_ATOMIC(uint8_t) mp_status; /**< Multiprocess status. */
 
 	/* memory segments and zones */
 	struct rte_fbarray memzones; /**< Memzone descriptors. */
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index ebd496b..4d2e806 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -24,11 +24,11 @@ struct lcore_config {
 	int pipe_main2worker[2];   /**< communication pipe with main */
 	int pipe_worker2main[2];   /**< communication pipe with main */
 
-	lcore_function_t * volatile f; /**< function to call */
+	RTE_ATOMIC(lcore_function_t *) volatile f; /**< function to call */
 	void * volatile arg;       /**< argument of function */
 	volatile int ret;          /**< return value of function */
 
-	volatile enum rte_lcore_state_t state; /**< lcore state */
+	volatile RTE_ATOMIC(enum rte_lcore_state_t) state; /**< lcore state */
 	unsigned int socket_id;    /**< physical socket id for this lcore */
 	unsigned int core_id;      /**< core number on socket for this lcore */
 	int core_index;            /**< relative index, starting from 0 */
diff --git a/lib/eal/common/eal_trace.h b/lib/eal/common/eal_trace.h
index d66bcfe..ace2ef3 100644
--- a/lib/eal/common/eal_trace.h
+++ b/lib/eal/common/eal_trace.h
@@ -50,7 +50,7 @@ struct trace_arg {
 struct trace {
 	char *dir;
 	int register_errno;
-	uint32_t status;
+	RTE_ATOMIC(uint32_t) status;
 	enum rte_trace_mode mode;
 	rte_uuid_t uuid;
 	uint32_t buff_len;
@@ -65,7 +65,7 @@ struct trace {
 	uint32_t ctf_meta_offset_freq;
 	uint32_t ctf_meta_offset_freq_off_s;
 	uint32_t ctf_meta_offset_freq_off;
-	uint16_t ctf_fixup_done;
+	RTE_ATOMIC(uint16_t) ctf_fixup_done;
 	rte_spinlock_t lock;
 };
 
diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index 9e2aa4a..3fc2b9a 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -43,8 +43,8 @@ struct rte_service_spec_impl {
 	rte_spinlock_t execute_lock;
 
 	/* API set/get-able variables */
-	int8_t app_runstate;
-	int8_t comp_runstate;
+	RTE_ATOMIC(int8_t) app_runstate;
+	RTE_ATOMIC(int8_t) comp_runstate;
 	uint8_t internal_flags;
 
 	/* per service statistics */
@@ -52,24 +52,24 @@ struct rte_service_spec_impl {
 	 * It does not indicate the number of cores the service is running
 	 * on currently.
 	 */
-	uint32_t num_mapped_cores;
+	RTE_ATOMIC(uint32_t) num_mapped_cores;
 } __rte_cache_aligned;
 
 struct service_stats {
-	uint64_t calls;
-	uint64_t cycles;
+	RTE_ATOMIC(uint64_t) calls;
+	RTE_ATOMIC(uint64_t) cycles;
 };
 
 /* the internal values of a service core */
 struct core_state {
 	/* map of services IDs are run on this core */
 	uint64_t service_mask;
-	uint8_t runstate; /* running or stopped */
-	uint8_t thread_active; /* indicates when thread is in service_run() */
+	RTE_ATOMIC(uint8_t) runstate; /* running or stopped */
+	RTE_ATOMIC(uint8_t) thread_active; /* indicates when thread is in service_run() */
 	uint8_t is_service_core; /* set if core is currently a service core */
 	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
-	uint64_t loops;
-	uint64_t cycles;
+	RTE_ATOMIC(uint64_t) loops;
+	RTE_ATOMIC(uint64_t) cycles;
 	struct service_stats service_stats[RTE_SERVICE_NUM_MAX];
 } __rte_cache_aligned;
 
@@ -314,11 +314,11 @@ struct core_state {
 	 * service_run and service_runstate_get function.
 	 */
 	if (runstate)
-		__atomic_store_n(&s->comp_runstate, RUNSTATE_RUNNING,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->comp_runstate, RUNSTATE_RUNNING,
+			rte_memory_order_release);
 	else
-		__atomic_store_n(&s->comp_runstate, RUNSTATE_STOPPED,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->comp_runstate, RUNSTATE_STOPPED,
+			rte_memory_order_release);
 
 	return 0;
 }
@@ -334,11 +334,11 @@ struct core_state {
 	 * service_run runstate_get function.
 	 */
 	if (runstate)
-		__atomic_store_n(&s->app_runstate, RUNSTATE_RUNNING,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->app_runstate, RUNSTATE_RUNNING,
+			rte_memory_order_release);
 	else
-		__atomic_store_n(&s->app_runstate, RUNSTATE_STOPPED,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->app_runstate, RUNSTATE_STOPPED,
+			rte_memory_order_release);
 
 	rte_eal_trace_service_runstate_set(id, runstate);
 	return 0;
@@ -354,14 +354,14 @@ struct core_state {
 	 * Use load-acquire memory order. This synchronizes with
 	 * store-release in service state set functions.
 	 */
-	if (__atomic_load_n(&s->comp_runstate, __ATOMIC_ACQUIRE) ==
+	if (rte_atomic_load_explicit(&s->comp_runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING &&
-	    __atomic_load_n(&s->app_runstate, __ATOMIC_ACQUIRE) ==
+	    rte_atomic_load_explicit(&s->app_runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING) {
 		int check_disabled = !(s->internal_flags &
 			SERVICE_F_START_CHECK);
-		int lcore_mapped = (__atomic_load_n(&s->num_mapped_cores,
-			__ATOMIC_RELAXED) > 0);
+		int lcore_mapped = (rte_atomic_load_explicit(&s->num_mapped_cores,
+			rte_memory_order_relaxed) > 0);
 
 		return (check_disabled | lcore_mapped);
 	} else
@@ -392,15 +392,15 @@ struct core_state {
 			uint64_t end = rte_rdtsc();
 			uint64_t cycles = end - start;
 
-			__atomic_store_n(&cs->cycles, cs->cycles + cycles,
-				__ATOMIC_RELAXED);
-			__atomic_store_n(&service_stats->cycles,
+			rte_atomic_store_explicit(&cs->cycles, cs->cycles + cycles,
+				rte_memory_order_relaxed);
+			rte_atomic_store_explicit(&service_stats->cycles,
 				service_stats->cycles + cycles,
-				__ATOMIC_RELAXED);
+				rte_memory_order_relaxed);
 		}
 
-		__atomic_store_n(&service_stats->calls,
-			service_stats->calls + 1, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&service_stats->calls,
+			service_stats->calls + 1, rte_memory_order_relaxed);
 	} else {
 		s->spec.callback(userdata);
 	}
@@ -420,9 +420,9 @@ struct core_state {
 	 * Use load-acquire memory order. This synchronizes with
 	 * store-release in service state set functions.
 	 */
-	if (__atomic_load_n(&s->comp_runstate, __ATOMIC_ACQUIRE) !=
+	if (rte_atomic_load_explicit(&s->comp_runstate, rte_memory_order_acquire) !=
 			RUNSTATE_RUNNING ||
-	    __atomic_load_n(&s->app_runstate, __ATOMIC_ACQUIRE) !=
+	    rte_atomic_load_explicit(&s->app_runstate, rte_memory_order_acquire) !=
 			RUNSTATE_RUNNING ||
 	    !(service_mask & (UINT64_C(1) << i))) {
 		cs->service_active_on_lcore[i] = 0;
@@ -472,11 +472,11 @@ struct core_state {
 	/* Increment num_mapped_cores to reflect that this core is
 	 * now mapped capable of running the service.
 	 */
-	__atomic_fetch_add(&s->num_mapped_cores, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&s->num_mapped_cores, 1, rte_memory_order_relaxed);
 
 	int ret = service_run(id, cs, UINT64_MAX, s, serialize_mt_unsafe);
 
-	__atomic_fetch_sub(&s->num_mapped_cores, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_sub_explicit(&s->num_mapped_cores, 1, rte_memory_order_relaxed);
 
 	return ret;
 }
@@ -489,13 +489,13 @@ struct core_state {
 	const int lcore = rte_lcore_id();
 	struct core_state *cs = &lcore_states[lcore];
 
-	__atomic_store_n(&cs->thread_active, 1, __ATOMIC_SEQ_CST);
+	rte_atomic_store_explicit(&cs->thread_active, 1, rte_memory_order_seq_cst);
 
 	/* runstate act as the guard variable. Use load-acquire
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	while (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) ==
+	while (rte_atomic_load_explicit(&cs->runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING) {
 
 		const uint64_t service_mask = cs->service_mask;
@@ -513,7 +513,7 @@ struct core_state {
 			service_run(i, cs, service_mask, service_get(i), 1);
 		}
 
-		__atomic_store_n(&cs->loops, cs->loops + 1, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&cs->loops, cs->loops + 1, rte_memory_order_relaxed);
 	}
 
 	/* Switch off this core for all services, to ensure that future
@@ -526,7 +526,7 @@ struct core_state {
 	 * this store, ensuring that once this store is visible, the service
 	 * lcore thread really is done in service cores code.
 	 */
-	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_SEQ_CST);
+	rte_atomic_store_explicit(&cs->thread_active, 0, rte_memory_order_seq_cst);
 	return 0;
 }
 
@@ -539,8 +539,8 @@ struct core_state {
 	/* Load thread_active using ACQUIRE to avoid instructions dependent on
 	 * the result being re-ordered before this load completes.
 	 */
-	return __atomic_load_n(&lcore_states[lcore].thread_active,
-			       __ATOMIC_ACQUIRE);
+	return rte_atomic_load_explicit(&lcore_states[lcore].thread_active,
+			       rte_memory_order_acquire);
 }
 
 int32_t
@@ -646,13 +646,13 @@ struct core_state {
 
 		if (*set && !lcore_mapped) {
 			lcore_states[lcore].service_mask |= sid_mask;
-			__atomic_fetch_add(&rte_services[sid].num_mapped_cores,
-				1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&rte_services[sid].num_mapped_cores,
+				1, rte_memory_order_relaxed);
 		}
 		if (!*set && lcore_mapped) {
 			lcore_states[lcore].service_mask &= ~(sid_mask);
-			__atomic_fetch_sub(&rte_services[sid].num_mapped_cores,
-				1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_sub_explicit(&rte_services[sid].num_mapped_cores,
+				1, rte_memory_order_relaxed);
 		}
 	}
 
@@ -709,13 +709,13 @@ struct core_state {
 			 * store-release memory order here to synchronize
 			 * with load-acquire in runstate read functions.
 			 */
-			__atomic_store_n(&lcore_states[i].runstate,
-				RUNSTATE_STOPPED, __ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&lcore_states[i].runstate,
+				RUNSTATE_STOPPED, rte_memory_order_release);
 		}
 	}
 	for (i = 0; i < RTE_SERVICE_NUM_MAX; i++)
-		__atomic_store_n(&rte_services[i].num_mapped_cores, 0,
-			__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&rte_services[i].num_mapped_cores, 0,
+			rte_memory_order_relaxed);
 
 	return 0;
 }
@@ -735,8 +735,8 @@ struct core_state {
 	/* Use store-release memory order here to synchronize with
 	 * load-acquire in runstate read functions.
 	 */
-	__atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
-		__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
+		rte_memory_order_release);
 
 	return rte_eal_wait_lcore(lcore);
 }
@@ -755,7 +755,7 @@ struct core_state {
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	if (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) !=
+	if (rte_atomic_load_explicit(&cs->runstate, rte_memory_order_acquire) !=
 			RUNSTATE_STOPPED)
 		return -EBUSY;
 
@@ -779,7 +779,7 @@ struct core_state {
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	if (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) ==
+	if (rte_atomic_load_explicit(&cs->runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING)
 		return -EALREADY;
 
@@ -789,7 +789,7 @@ struct core_state {
 	/* Use load-acquire memory order here to synchronize with
 	 * store-release in runstate update functions.
 	 */
-	__atomic_store_n(&cs->runstate, RUNSTATE_RUNNING, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&cs->runstate, RUNSTATE_RUNNING, rte_memory_order_release);
 
 	rte_eal_trace_service_lcore_start(lcore);
 
@@ -808,7 +808,7 @@ struct core_state {
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	if (__atomic_load_n(&lcore_states[lcore].runstate, __ATOMIC_ACQUIRE) ==
+	if (rte_atomic_load_explicit(&lcore_states[lcore].runstate, rte_memory_order_acquire) ==
 			RUNSTATE_STOPPED)
 		return -EALREADY;
 
@@ -820,8 +820,8 @@ struct core_state {
 		int32_t enabled = service_mask & (UINT64_C(1) << i);
 		int32_t service_running = rte_service_runstate_get(i);
 		int32_t only_core = (1 ==
-			__atomic_load_n(&rte_services[i].num_mapped_cores,
-				__ATOMIC_RELAXED));
+			rte_atomic_load_explicit(&rte_services[i].num_mapped_cores,
+				rte_memory_order_relaxed));
 
 		/* if the core is mapped, and the service is running, and this
 		 * is the only core that is mapped, the service would cease to
@@ -834,8 +834,8 @@ struct core_state {
 	/* Use store-release memory order here to synchronize with
 	 * load-acquire in runstate read functions.
 	 */
-	__atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
-		__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
+		rte_memory_order_release);
 
 	rte_eal_trace_service_lcore_stop(lcore);
 
@@ -847,7 +847,7 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->loops, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->loops, rte_memory_order_relaxed);
 }
 
 static uint64_t
@@ -855,7 +855,7 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->cycles, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->cycles, rte_memory_order_relaxed);
 }
 
 static uint64_t
@@ -863,8 +863,8 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->service_stats[service_id].calls,
-		__ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].calls,
+		rte_memory_order_relaxed);
 }
 
 static uint64_t
@@ -872,8 +872,8 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->service_stats[service_id].cycles,
-		__ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].cycles,
+		rte_memory_order_relaxed);
 }
 
 typedef uint64_t (*lcore_attr_get_fun)(uint32_t service_id,
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 39a2868..568e06e 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -597,8 +597,8 @@ static void rte_eal_init_alert(const char *msg)
 		return -1;
 	}
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-					__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+					rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		rte_eal_init_alert("already called initialization.");
 		rte_errno = EALREADY;
 		return -1;
@@ -622,7 +622,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (fctret < 0) {
 		rte_eal_init_alert("Invalid 'command line' arguments.");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -636,20 +636,20 @@ static void rte_eal_init_alert(const char *msg)
 	if (eal_plugins_init() < 0) {
 		rte_eal_init_alert("Cannot init plugins");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
 	if (eal_trace_init() < 0) {
 		rte_eal_init_alert("Cannot init trace");
 		rte_errno = EFAULT;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
 	if (eal_option_device_parse()) {
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -683,7 +683,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (rte_bus_scan()) {
 		rte_eal_init_alert("Cannot scan the buses for devices");
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -736,7 +736,7 @@ static void rte_eal_init_alert(const char *msg)
 		if (ret < 0) {
 			rte_eal_init_alert("Cannot get hugepage information.");
 			rte_errno = EACCES;
-			__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 			return -1;
 		}
 	}
@@ -915,8 +915,8 @@ static void rte_eal_init_alert(const char *msg)
 	static uint32_t run_once;
 	uint32_t has_run = 0;
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+			rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		RTE_LOG(WARNING, EAL, "Already called cleanup\n");
 		rte_errno = EALREADY;
 		return -1;
diff --git a/lib/eal/include/rte_epoll.h b/lib/eal/include/rte_epoll.h
index 01525f5..ae0cf20 100644
--- a/lib/eal/include/rte_epoll.h
+++ b/lib/eal/include/rte_epoll.h
@@ -13,6 +13,7 @@
 
 #include <stdint.h>
 
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -38,7 +39,7 @@ enum {
 
 /** interrupt epoll event obj, taken by epoll_event.ptr */
 struct rte_epoll_event {
-	uint32_t status;           /**< OUT: event status */
+	RTE_ATOMIC(uint32_t) status;           /**< OUT: event status */
 	int fd;                    /**< OUT: event fd */
 	int epfd;       /**< OUT: epoll instance the ev associated with */
 	struct rte_epoll_data epdata;
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 5f4b2fb..57da058 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -967,7 +967,7 @@ static void rte_eal_init_alert(const char *msg)
 rte_eal_init(int argc, char **argv)
 {
 	int i, fctret, ret;
-	static uint32_t run_once;
+	static RTE_ATOMIC(uint32_t) run_once;
 	uint32_t has_run = 0;
 	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
 	char thread_name[RTE_THREAD_NAME_SIZE];
@@ -983,8 +983,8 @@ static void rte_eal_init_alert(const char *msg)
 		return -1;
 	}
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-					__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+					rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		rte_eal_init_alert("already called initialization.");
 		rte_errno = EALREADY;
 		return -1;
@@ -1008,14 +1008,14 @@ static void rte_eal_init_alert(const char *msg)
 	if (fctret < 0) {
 		rte_eal_init_alert("Invalid 'command line' arguments.");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
 	if (eal_plugins_init() < 0) {
 		rte_eal_init_alert("Cannot init plugins");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1027,7 +1027,7 @@ static void rte_eal_init_alert(const char *msg)
 
 	if (eal_option_device_parse()) {
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1061,7 +1061,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (rte_bus_scan()) {
 		rte_eal_init_alert("Cannot scan the buses for devices");
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1125,7 +1125,7 @@ static void rte_eal_init_alert(const char *msg)
 		if (ret < 0) {
 			rte_eal_init_alert("Cannot get hugepage information.");
 			rte_errno = EACCES;
-			__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 			return -1;
 		}
 	}
@@ -1150,7 +1150,7 @@ static void rte_eal_init_alert(const char *msg)
 			 internal_conf->syslog_facility) < 0) {
 		rte_eal_init_alert("Cannot init logging.");
 		rte_errno = ENOMEM;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1158,7 +1158,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (rte_eal_vfio_setup() < 0) {
 		rte_eal_init_alert("Cannot init VFIO");
 		rte_errno = EAGAIN;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 #endif
@@ -1345,11 +1345,11 @@ static void rte_eal_init_alert(const char *msg)
 int
 rte_eal_cleanup(void)
 {
-	static uint32_t run_once;
+	static RTE_ATOMIC(uint32_t) run_once;
 	uint32_t has_run = 0;
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-					__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+					rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		RTE_LOG(WARNING, EAL, "Already called cleanup\n");
 		rte_errno = EALREADY;
 		return -1;
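
For reference, the eal.c hunks above all convert the same run-once guard.
A minimal standalone sketch of that guard using the rte_ atomics follows;
the names are illustrative only and do_setup() is a stand-in, not a DPDK call.

#include <rte_stdatomic.h>

/* stand-in for the real initialization work protected by the guard */
static int
do_setup(void)
{
	return 0;
}

static RTE_ATOMIC(uint32_t) run_once;

static int
init_once(void)
{
	uint32_t has_run = 0;

	/* only the first caller proceeds; a pure guard needs no ordering */
	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
			rte_memory_order_relaxed, rte_memory_order_relaxed))
		return -1;

	if (do_setup() < 0) {
		/* roll the guard back so initialization can be retried */
		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
		return -1;
	}

	return 0;
}
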
diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index 24fff3d..d4919df 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -1266,9 +1266,9 @@ struct rte_intr_source {
 		 * ordering below acting as a lock to synchronize
 		 * the event data updating.
 		 */
-		if (!rev || !__atomic_compare_exchange_n(&rev->status,
-				    &valid_status, RTE_EPOLL_EXEC, 0,
-				    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+		if (!rev || !rte_atomic_compare_exchange_strong_explicit(&rev->status,
+				    &valid_status, RTE_EPOLL_EXEC,
+				    rte_memory_order_acquire, rte_memory_order_relaxed))
 			continue;
 
 		events[count].status        = RTE_EPOLL_VALID;
@@ -1283,8 +1283,8 @@ struct rte_intr_source {
 		/* the status update should be observed after
 		 * the other fields change.
 		 */
-		__atomic_store_n(&rev->status, RTE_EPOLL_VALID,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&rev->status, RTE_EPOLL_VALID,
+				rte_memory_order_release);
 		count++;
 	}
 	return count;
@@ -1374,10 +1374,10 @@ struct rte_intr_source {
 {
 	uint32_t valid_status = RTE_EPOLL_VALID;
 
-	while (!__atomic_compare_exchange_n(&ev->status, &valid_status,
-		    RTE_EPOLL_INVALID, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
-		while (__atomic_load_n(&ev->status,
-				__ATOMIC_RELAXED) != RTE_EPOLL_VALID)
+	while (!rte_atomic_compare_exchange_strong_explicit(&ev->status, &valid_status,
+		    RTE_EPOLL_INVALID, rte_memory_order_acquire, rte_memory_order_relaxed)) {
+		while (rte_atomic_load_explicit(&ev->status,
+				rte_memory_order_relaxed) != RTE_EPOLL_VALID)
 			rte_pause();
 		valid_status = RTE_EPOLL_VALID;
 	}
@@ -1402,8 +1402,8 @@ struct rte_intr_source {
 		epfd = rte_intr_tls_epfd();
 
 	if (op == EPOLL_CTL_ADD) {
-		__atomic_store_n(&event->status, RTE_EPOLL_VALID,
-				__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&event->status, RTE_EPOLL_VALID,
+				rte_memory_order_relaxed);
 		event->fd = fd;  /* ignore fd in event */
 		event->epfd = epfd;
 		ev.data.ptr = (void *)event;
@@ -1415,13 +1415,13 @@ struct rte_intr_source {
 			op, fd, strerror(errno));
 		if (op == EPOLL_CTL_ADD)
 			/* rollback status when CTL_ADD fail */
-			__atomic_store_n(&event->status, RTE_EPOLL_INVALID,
-					__ATOMIC_RELAXED);
+			rte_atomic_store_explicit(&event->status, RTE_EPOLL_INVALID,
+					rte_memory_order_relaxed);
 		return -1;
 	}
 
-	if (op == EPOLL_CTL_DEL && __atomic_load_n(&event->status,
-			__ATOMIC_RELAXED) != RTE_EPOLL_INVALID)
+	if (op == EPOLL_CTL_DEL && rte_atomic_load_explicit(&event->status,
+			rte_memory_order_relaxed) != RTE_EPOLL_INVALID)
 		eal_epoll_data_safe_free(event);
 
 	return 0;
@@ -1450,8 +1450,8 @@ struct rte_intr_source {
 	case RTE_INTR_EVENT_ADD:
 		epfd_op = EPOLL_CTL_ADD;
 		rev = rte_intr_elist_index_get(intr_handle, efd_idx);
-		if (__atomic_load_n(&rev->status,
-				__ATOMIC_RELAXED) != RTE_EPOLL_INVALID) {
+		if (rte_atomic_load_explicit(&rev->status,
+				rte_memory_order_relaxed) != RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event already been added.\n");
 			return -EEXIST;
 		}
@@ -1474,8 +1474,8 @@ struct rte_intr_source {
 	case RTE_INTR_EVENT_DEL:
 		epfd_op = EPOLL_CTL_DEL;
 		rev = rte_intr_elist_index_get(intr_handle, efd_idx);
-		if (__atomic_load_n(&rev->status,
-				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID) {
+		if (rte_atomic_load_explicit(&rev->status,
+				rte_memory_order_relaxed) == RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event does not exist.\n");
 			return -EPERM;
 		}
@@ -1500,8 +1500,8 @@ struct rte_intr_source {
 
 	for (i = 0; i < (uint32_t)rte_intr_nb_efd_get(intr_handle); i++) {
 		rev = rte_intr_elist_index_get(intr_handle, i);
-		if (__atomic_load_n(&rev->status,
-				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID)
+		if (rte_atomic_load_explicit(&rev->status,
+				rte_memory_order_relaxed) == RTE_EPOLL_INVALID)
 			continue;
 		if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
 			/* force free if the entry valid */
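
The epoll hunks above preserve the acquire/release protocol on rev->status.
A reduced sketch of that protocol, with illustrative names and state values,
is shown here; it is not code from the tree.

#include <stdint.h>
#include <rte_stdatomic.h>

#define EV_VALID 1u
#define EV_EXEC  2u

static RTE_ATOMIC(uint32_t) ev_status = EV_VALID;

/* Claim the event for processing. The acquire CAS pairs with the release
 * store in ev_publish(), so event data written before publishing is
 * visible to the thread that wins the CAS.
 */
static int
ev_claim(void)
{
	uint32_t valid = EV_VALID;

	return rte_atomic_compare_exchange_strong_explicit(&ev_status, &valid,
			EV_EXEC, rte_memory_order_acquire, rte_memory_order_relaxed);
}

/* Publish updated event data: the release store keeps the data writes
 * ordered before the status change becomes visible.
 */
static void
ev_publish(void)
{
	rte_atomic_store_explicit(&ev_status, EV_VALID, rte_memory_order_release);
}
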
diff --git a/lib/eal/ppc/include/rte_atomic.h b/lib/eal/ppc/include/rte_atomic.h
index 7382412..645c713 100644
--- a/lib/eal/ppc/include/rte_atomic.h
+++ b/lib/eal/ppc/include/rte_atomic.h
@@ -48,7 +48,7 @@
 static inline int
 rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
 {
-	return __atomic_compare_exchange(dst, &exp, &src, 0, rte_memory_order_acquire,
+	return rte_atomic_compare_exchange_strong_explicit(dst, &exp, src, rte_memory_order_acquire,
 		rte_memory_order_acquire) ? 1 : 0;
 }
 
@@ -90,7 +90,7 @@ static inline int rte_atomic16_dec_and_test(rte_atomic16_t *v)
 static inline int
 rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
 {
-	return __atomic_compare_exchange(dst, &exp, &src, 0, rte_memory_order_acquire,
+	return rte_atomic_compare_exchange_strong_explicit(dst, &exp, src, rte_memory_order_acquire,
 		rte_memory_order_acquire) ? 1 : 0;
 }
 
@@ -132,7 +132,7 @@ static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
 static inline int
 rte_atomic64_cmpset(volatile uint64_t *dst, uint64_t exp, uint64_t src)
 {
-	return __atomic_compare_exchange(dst, &exp, &src, 0, rte_memory_order_acquire,
+	return rte_atomic_compare_exchange_strong_explicit(dst, &exp, src, rte_memory_order_acquire,
 		rte_memory_order_acquire) ? 1 : 0;
 }
 
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index acf6484..145ac4b 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -9,6 +9,7 @@
 #include <rte_eal.h>
 #include <rte_common.h>
 #include <rte_errno.h>
+#include <rte_stdatomic.h>
 #include <rte_thread.h>
 
 #include "eal_windows.h"
@@ -19,7 +20,7 @@ struct eal_tls_key {
 
 struct thread_routine_ctx {
 	rte_thread_func thread_func;
-	bool thread_init_failed;
+	RTE_ATOMIC(bool) thread_init_failed;
 	void *routine_args;
 };
 
@@ -168,7 +169,8 @@ struct thread_routine_ctx {
 thread_func_wrapper(void *arg)
 {
 	struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)arg;
-	const bool thread_exit = __atomic_load_n(&ctx.thread_init_failed, __ATOMIC_ACQUIRE);
+	const bool thread_exit = rte_atomic_load_explicit(
+		&ctx.thread_init_failed, rte_memory_order_acquire);
 
 	free(arg);
 
@@ -237,7 +239,7 @@ struct thread_routine_ctx {
 	}
 
 resume_thread:
-	__atomic_store_n(&ctx->thread_init_failed, thread_exit, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ctx->thread_init_failed, thread_exit, rte_memory_order_release);
 
 	if (ResumeThread(thread_handle) == (DWORD)-1) {
 		ret = thread_log_last_error("ResumeThread()");
-- 
1.8.3.1



* [PATCH 06/21] eventdev: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (4 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 05/21] eal: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 07/21] gpudev: " Tyler Retzlaff
                   ` (16 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with the
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 drivers/event/cnxk/cnxk_tim_worker.h   |  4 +--
 lib/eventdev/rte_event_timer_adapter.c | 66 +++++++++++++++++-----------------
 lib/eventdev/rte_event_timer_adapter.h |  2 +-
 3 files changed, 36 insertions(+), 36 deletions(-)
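
The core of this patch is the release/acquire pairing around the timer
state. A minimal sketch of that pairing follows; the struct, field and
function names are illustrative and not taken from the tree.

#include <stdint.h>
#include <rte_stdatomic.h>

struct tim_sketch {
	uintptr_t impl_opaque[2];
	RTE_ATOMIC(uint8_t) state; /* 0 = not armed, 1 = armed */
};

static void
tim_arm(struct tim_sketch *t, uintptr_t chunk, uintptr_t bkt)
{
	t->impl_opaque[0] = chunk;
	t->impl_opaque[1] = bkt;
	/* release: the opaque fields above are published before the state */
	rte_atomic_store_explicit(&t->state, 1, rte_memory_order_release);
}

static int
tim_is_armed(const struct tim_sketch *t)
{
	/* acquire: pairs with the release store in tim_arm() */
	return rte_atomic_load_explicit(&t->state, rte_memory_order_acquire) == 1;
}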

diff --git a/drivers/event/cnxk/cnxk_tim_worker.h b/drivers/event/cnxk/cnxk_tim_worker.h
index f0857f2..f530d8c 100644
--- a/drivers/event/cnxk/cnxk_tim_worker.h
+++ b/drivers/event/cnxk/cnxk_tim_worker.h
@@ -314,7 +314,7 @@
 
 	tim->impl_opaque[0] = (uintptr_t)chunk;
 	tim->impl_opaque[1] = (uintptr_t)bkt;
-	__atomic_store_n(&tim->state, RTE_EVENT_TIMER_ARMED, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->state, RTE_EVENT_TIMER_ARMED, rte_memory_order_release);
 	cnxk_tim_bkt_inc_nent(bkt);
 	cnxk_tim_bkt_dec_lock_relaxed(bkt);
 
@@ -425,7 +425,7 @@
 
 	tim->impl_opaque[0] = (uintptr_t)chunk;
 	tim->impl_opaque[1] = (uintptr_t)bkt;
-	__atomic_store_n(&tim->state, RTE_EVENT_TIMER_ARMED, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->state, RTE_EVENT_TIMER_ARMED, rte_memory_order_release);
 	cnxk_tim_bkt_inc_nent(bkt);
 	cnxk_tim_bkt_dec_lock_relaxed(bkt);
 
diff --git a/lib/eventdev/rte_event_timer_adapter.c b/lib/eventdev/rte_event_timer_adapter.c
index 427c4c6..2746670 100644
--- a/lib/eventdev/rte_event_timer_adapter.c
+++ b/lib/eventdev/rte_event_timer_adapter.c
@@ -630,12 +630,12 @@ struct swtim {
 	uint32_t timer_data_id;
 	/* Track which cores have actually armed a timer */
 	struct {
-		uint16_t v;
+		RTE_ATOMIC(uint16_t) v;
 	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
 	/* Track which cores' timer lists should be polled */
-	unsigned int poll_lcores[RTE_MAX_LCORE];
+	RTE_ATOMIC(unsigned int) poll_lcores[RTE_MAX_LCORE];
 	/* The number of lists that should be polled */
-	int n_poll_lcores;
+	RTE_ATOMIC(int) n_poll_lcores;
 	/* Timers which have expired and can be returned to a mempool */
 	struct rte_timer *expired_timers[EXP_TIM_BUF_SZ];
 	/* The number of timers that can be returned to a mempool */
@@ -669,10 +669,10 @@ struct swtim {
 
 	if (unlikely(sw->in_use[lcore].v == 0)) {
 		sw->in_use[lcore].v = 1;
-		n_lcores = __atomic_fetch_add(&sw->n_poll_lcores, 1,
-					     __ATOMIC_RELAXED);
-		__atomic_store_n(&sw->poll_lcores[n_lcores], lcore,
-				__ATOMIC_RELAXED);
+		n_lcores = rte_atomic_fetch_add_explicit(&sw->n_poll_lcores, 1,
+					     rte_memory_order_relaxed);
+		rte_atomic_store_explicit(&sw->poll_lcores[n_lcores], lcore,
+				rte_memory_order_relaxed);
 	}
 
 	ret = event_buffer_add(&sw->buffer, &evtim->ev);
@@ -719,8 +719,8 @@ struct swtim {
 		sw->stats.evtim_exp_count++;
 
 		if (type == SINGLE)
-			__atomic_store_n(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
+				rte_memory_order_release);
 	}
 
 	if (event_buffer_batch_ready(&sw->buffer)) {
@@ -846,7 +846,7 @@ struct swtim {
 
 	if (swtim_did_tick(sw)) {
 		rte_timer_alt_manage(sw->timer_data_id,
-				     sw->poll_lcores,
+				     (unsigned int *)(uintptr_t)sw->poll_lcores,
 				     sw->n_poll_lcores,
 				     swtim_callback);
 
@@ -1027,7 +1027,7 @@ struct swtim {
 
 	/* Free outstanding timers */
 	rte_timer_stop_all(sw->timer_data_id,
-			   sw->poll_lcores,
+			   (unsigned int *)(uintptr_t)sw->poll_lcores,
 			   sw->n_poll_lcores,
 			   swtim_free_tim,
 			   sw);
@@ -1142,7 +1142,7 @@ struct swtim {
 	uint64_t cur_cycles;
 
 	/* Check that timer is armed */
-	n_state = __atomic_load_n(&evtim->state, __ATOMIC_ACQUIRE);
+	n_state = rte_atomic_load_explicit(&evtim->state, rte_memory_order_acquire);
 	if (n_state != RTE_EVENT_TIMER_ARMED)
 		return -EINVAL;
 
@@ -1201,15 +1201,15 @@ struct swtim {
 	 * The atomic compare-and-swap operation can prevent the race condition
 	 * on in_use flag between multiple non-EAL threads.
 	 */
-	if (unlikely(__atomic_compare_exchange_n(&sw->in_use[lcore_id].v,
-			&exp_state, 1, 0,
-			__ATOMIC_RELAXED, __ATOMIC_RELAXED))) {
+	if (unlikely(rte_atomic_compare_exchange_strong_explicit(&sw->in_use[lcore_id].v,
+			&exp_state, 1,
+			rte_memory_order_relaxed, rte_memory_order_relaxed))) {
 		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
 			      lcore_id);
-		n_lcores = __atomic_fetch_add(&sw->n_poll_lcores, 1,
-					     __ATOMIC_RELAXED);
-		__atomic_store_n(&sw->poll_lcores[n_lcores], lcore_id,
-				__ATOMIC_RELAXED);
+		n_lcores = rte_atomic_fetch_add_explicit(&sw->n_poll_lcores, 1,
+					     rte_memory_order_relaxed);
+		rte_atomic_store_explicit(&sw->poll_lcores[n_lcores], lcore_id,
+				rte_memory_order_relaxed);
 	}
 
 	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
@@ -1223,7 +1223,7 @@ struct swtim {
 	type = get_timer_type(adapter);
 
 	for (i = 0; i < nb_evtims; i++) {
-		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		n_state = rte_atomic_load_explicit(&evtims[i]->state, rte_memory_order_acquire);
 		if (n_state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
@@ -1235,9 +1235,9 @@ struct swtim {
 
 		if (unlikely(check_destination_event_queue(evtims[i],
 							   adapter) < 0)) {
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed);
 			rte_errno = EINVAL;
 			break;
 		}
@@ -1250,15 +1250,15 @@ struct swtim {
 
 		ret = get_timeout_cycles(evtims[i], adapter, &cycles);
 		if (unlikely(ret == -1)) {
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR_TOOLATE,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed);
 			rte_errno = EINVAL;
 			break;
 		} else if (unlikely(ret == -2)) {
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR_TOOEARLY,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed);
 			rte_errno = EINVAL;
 			break;
 		}
@@ -1267,9 +1267,9 @@ struct swtim {
 					  type, lcore_id, NULL, evtims[i]);
 		if (ret < 0) {
 			/* tim was in RUNNING or CONFIG state */
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR,
-					__ATOMIC_RELEASE);
+					rte_memory_order_release);
 			break;
 		}
 
@@ -1277,8 +1277,8 @@ struct swtim {
 		/* RELEASE ordering guarantees the adapter specific value
 		 * changes observed before the update of state.
 		 */
-		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
+				rte_memory_order_release);
 	}
 
 	if (i < nb_evtims)
@@ -1320,7 +1320,7 @@ struct swtim {
 		/* ACQUIRE ordering guarantees the access of implementation
 		 * specific opaque data under the correct state.
 		 */
-		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		n_state = rte_atomic_load_explicit(&evtims[i]->state, rte_memory_order_acquire);
 		if (n_state == RTE_EVENT_TIMER_CANCELED) {
 			rte_errno = EALREADY;
 			break;
@@ -1346,8 +1346,8 @@ struct swtim {
 		 * to make sure the state update data observed between
 		 * threads.
 		 */
-		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
+				rte_memory_order_release);
 	}
 
 	return i;
diff --git a/lib/eventdev/rte_event_timer_adapter.h b/lib/eventdev/rte_event_timer_adapter.h
index fbdddf8..49e646a 100644
--- a/lib/eventdev/rte_event_timer_adapter.h
+++ b/lib/eventdev/rte_event_timer_adapter.h
@@ -498,7 +498,7 @@ struct rte_event_timer {
 	 * implementation specific values to share between the arm and cancel
 	 * operations.  The application should not modify this field.
 	 */
-	enum rte_event_timer_state state;
+	RTE_ATOMIC(enum rte_event_timer_state) state;
 	/**< State of the event timer. */
 	uint8_t user_meta[];
 	/**< Memory to store user specific metadata.
-- 
1.8.3.1



* [PATCH 07/21] gpudev: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (5 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 06/21] eventdev: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 08/21] ipsec: " Tyler Retzlaff
                   ` (15 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with the
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/gpudev/gpudev.c        | 6 +++---
 lib/gpudev/gpudev_driver.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)
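
The device reference count touched here is a plain counter with no
ordering requirements, so the conversion keeps relaxed ordering. A minimal
sketch with illustrative names:

#include <stdint.h>
#include <rte_stdatomic.h>

static RTE_ATOMIC(uint16_t) process_refcnt;

static void
dev_attach(void)
{
	/* pure counter: no ordering implied, relaxed is sufficient */
	rte_atomic_fetch_add_explicit(&process_refcnt, 1, rte_memory_order_relaxed);
}

static void
dev_release(void)
{
	rte_atomic_fetch_sub_explicit(&process_refcnt, 1, rte_memory_order_relaxed);
}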

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 8f12abe..6845d18 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -228,7 +228,7 @@ struct rte_gpu *
 	dev->mpshared->info.numa_node = -1;
 	dev->mpshared->info.parent = RTE_GPU_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
-	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&dev->mpshared->process_refcnt, 1, rte_memory_order_relaxed);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -277,7 +277,7 @@ struct rte_gpu *
 
 	TAILQ_INIT(&dev->callbacks);
 	dev->mpshared = shared_dev;
-	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&dev->mpshared->process_refcnt, 1, rte_memory_order_relaxed);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "attached device %s (id %d) of total %d",
@@ -340,7 +340,7 @@ struct rte_gpu *
 
 	gpu_free_callbacks(dev);
 	dev->process_state = RTE_GPU_STATE_UNUSED;
-	__atomic_fetch_sub(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_sub_explicit(&dev->mpshared->process_refcnt, 1, rte_memory_order_relaxed);
 	gpu_count--;
 
 	return 0;
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 42898c7..0b1e7f2 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -69,7 +69,7 @@ struct rte_gpu_mpshared {
 	/* Device info structure. */
 	struct rte_gpu_info info;
 	/* Counter of processes using the device. */
-	uint16_t process_refcnt; /* Updated by this library. */
+	RTE_ATOMIC(uint16_t) process_refcnt; /* Updated by this library. */
 };
 
 struct rte_gpu {
-- 
1.8.3.1



* [PATCH 08/21] ipsec: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (6 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 07/21] gpudev: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 09/21] mbuf: " Tyler Retzlaff
                   ` (14 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with the
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/ipsec/ipsec_sqn.h | 2 +-
 lib/ipsec/sa.h        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/ipsec/ipsec_sqn.h b/lib/ipsec/ipsec_sqn.h
index 505950e..984a9dd 100644
--- a/lib/ipsec/ipsec_sqn.h
+++ b/lib/ipsec/ipsec_sqn.h
@@ -128,7 +128,7 @@
 
 	n = *num;
 	if (SQN_ATOMIC(sa))
-		sqn = __atomic_fetch_add(&sa->sqn.outb, n, __ATOMIC_RELAXED) + n;
+		sqn = rte_atomic_fetch_add_explicit(&sa->sqn.outb, n, rte_memory_order_relaxed) + n;
 	else {
 		sqn = sa->sqn.outb + n;
 		sa->sqn.outb = sqn;
diff --git a/lib/ipsec/sa.h b/lib/ipsec/sa.h
index ce4af8c..4b30bea 100644
--- a/lib/ipsec/sa.h
+++ b/lib/ipsec/sa.h
@@ -124,7 +124,7 @@ struct rte_ipsec_sa {
 	 * place from other frequently accessed data.
 	 */
 	union {
-		uint64_t outb;
+		RTE_ATOMIC(uint64_t) outb;
 		struct {
 			uint32_t rdidx; /* read index */
 			uint32_t wridx; /* write index */
-- 
1.8.3.1



* [PATCH 09/21] mbuf: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (7 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 08/21] ipsec: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 10/21] mempool: " Tyler Retzlaff
                   ` (13 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with the
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/mbuf/rte_mbuf.h      | 20 ++++++++++----------
 lib/mbuf/rte_mbuf_core.h |  4 ++--
 2 files changed, 12 insertions(+), 12 deletions(-)
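
The refcount helpers keep their acquire-release ordering so the thread
that drops the last reference observes every prior write to the mbuf. A
minimal sketch of the converted update, with illustrative names:

#include <stdint.h>
#include <rte_stdatomic.h>

static RTE_ATOMIC(uint16_t) refcnt = 1;

/* returns the post-update value, like the in-tree refcnt update helper */
static uint16_t
refcnt_update(int16_t value)
{
	/* acq_rel: the thread that sees the count reach zero also sees
	 * all earlier writes made to the buffer by other threads
	 */
	return rte_atomic_fetch_add_explicit(&refcnt, value,
			rte_memory_order_acq_rel) + value;
}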

diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 913c459..b8ab477 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -361,7 +361,7 @@ struct rte_pktmbuf_pool_private {
 static inline uint16_t
 rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 {
-	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&m->refcnt, rte_memory_order_relaxed);
 }
 
 /**
@@ -374,15 +374,15 @@ struct rte_pktmbuf_pool_private {
 static inline void
 rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 {
-	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&m->refcnt, new_value, rte_memory_order_relaxed);
 }
 
 /* internal */
 static inline uint16_t
 __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
-	return __atomic_fetch_add(&m->refcnt, value,
-				 __ATOMIC_ACQ_REL) + value;
+	return rte_atomic_fetch_add_explicit(&m->refcnt, value,
+				 rte_memory_order_acq_rel) + value;
 }
 
 /**
@@ -463,7 +463,7 @@ struct rte_pktmbuf_pool_private {
 static inline uint16_t
 rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
 {
-	return __atomic_load_n(&shinfo->refcnt, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&shinfo->refcnt, rte_memory_order_relaxed);
 }
 
 /**
@@ -478,7 +478,7 @@ struct rte_pktmbuf_pool_private {
 rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
 	uint16_t new_value)
 {
-	__atomic_store_n(&shinfo->refcnt, new_value, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&shinfo->refcnt, new_value, rte_memory_order_relaxed);
 }
 
 /**
@@ -502,8 +502,8 @@ struct rte_pktmbuf_pool_private {
 		return (uint16_t)value;
 	}
 
-	return __atomic_fetch_add(&shinfo->refcnt, value,
-				 __ATOMIC_ACQ_REL) + value;
+	return rte_atomic_fetch_add_explicit(&shinfo->refcnt, value,
+				 rte_memory_order_acq_rel) + value;
 }
 
 /** Mbuf prefetch */
@@ -1315,8 +1315,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	 * Direct usage of add primitive to avoid
 	 * duplication of comparing with one.
 	 */
-	if (likely(__atomic_fetch_add(&shinfo->refcnt, -1,
-				     __ATOMIC_ACQ_REL) - 1))
+	if (likely(rte_atomic_fetch_add_explicit(&shinfo->refcnt, -1,
+				     rte_memory_order_acq_rel) - 1))
 		return 1;
 
 	/* Reinitialize counter before mbuf freeing. */
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index e9bc0d1..bf761f8 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -497,7 +497,7 @@ struct rte_mbuf {
 	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
 	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
 	 */
-	uint16_t refcnt;
+	RTE_ATOMIC(uint16_t) refcnt;
 
 	/**
 	 * Number of segments. Only valid for the first segment of an mbuf
@@ -674,7 +674,7 @@ struct rte_mbuf {
 struct rte_mbuf_ext_shared_info {
 	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
 	void *fcb_opaque;                        /**< Free callback argument */
-	uint16_t refcnt;
+	RTE_ATOMIC(uint16_t) refcnt;
 };
 
 /** Maximum number of nb_segs allowed. */
-- 
1.8.3.1



* [PATCH 10/21] mempool: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (8 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 09/21] mbuf: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 11/21] rcu: " Tyler Retzlaff
                   ` (12 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with the
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/mempool/rte_mempool.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
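
The macro keeps the fast path non-atomic for registered lcores and only
uses an atomic add for the shared slot used by unregistered threads. A
simplified sketch of that split (counter names are illustrative):

#include <stdint.h>
#include <rte_config.h>
#include <rte_lcore.h>
#include <rte_stdatomic.h>

static uint64_t put_objs[RTE_MAX_LCORE];      /* per-lcore, plain adds */
static RTE_ATOMIC(uint64_t) put_objs_shared;  /* unregistered threads  */

static void
stat_add(unsigned int n)
{
	unsigned int lcore_id = rte_lcore_id();

	if (lcore_id < RTE_MAX_LCORE)
		put_objs[lcore_id] += n;      /* no atomics on the fast path */
	else
		rte_atomic_fetch_add_explicit(&put_objs_shared, n,
				rte_memory_order_relaxed);
}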

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index f70bf36..df87cd2 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -327,8 +327,8 @@ struct rte_mempool {
 		if (likely(__lcore_id < RTE_MAX_LCORE))                         \
 			(mp)->stats[__lcore_id].name += (n);                    \
 		else                                                            \
-			__atomic_fetch_add(&((mp)->stats[RTE_MAX_LCORE].name),  \
-					   (n), __ATOMIC_RELAXED);              \
+			rte_atomic_fetch_add_explicit(&((mp)->stats[RTE_MAX_LCORE].name),  \
+					   (n), rte_memory_order_relaxed);              \
 	} while (0)
 #else
 #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)
-- 
1.8.3.1



* [PATCH 11/21] rcu: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (9 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 10/21] mempool: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 12/21] pdump: " Tyler Retzlaff
                   ` (11 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with the
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/rcu/rte_rcu_qsbr.c | 48 +++++++++++++++++------------------
 lib/rcu/rte_rcu_qsbr.h | 68 +++++++++++++++++++++++++-------------------------
 2 files changed, 58 insertions(+), 58 deletions(-)
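
Besides the rename, note the signature change visible in these hunks:
__atomic_compare_exchange() took the desired value by pointer plus a weak
flag, while the rte_ API takes the desired value directly and encodes
strong vs. weak in the function name. A small sketch of the resulting
retry loop (variable and function names are illustrative):

#include <stdint.h>
#include <rte_stdatomic.h>

static RTE_ATOMIC(uint64_t) thread_bmap;

static void
register_bit(unsigned int id)
{
	uint64_t old_bmap, new_bmap;

	old_bmap = rte_atomic_load_explicit(&thread_bmap, rte_memory_order_relaxed);
	do {
		new_bmap = old_bmap | (1UL << id);
		/* on failure old_bmap is refreshed with the current value */
	} while (!rte_atomic_compare_exchange_strong_explicit(&thread_bmap,
			&old_bmap, new_bmap,
			rte_memory_order_release, rte_memory_order_relaxed));
}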

diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c
index 17be93e..4dc7714 100644
--- a/lib/rcu/rte_rcu_qsbr.c
+++ b/lib/rcu/rte_rcu_qsbr.c
@@ -102,21 +102,21 @@
 	 * go out of sync. Hence, additional checks are required.
 	 */
 	/* Check if the thread is already registered */
-	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_RELAXED);
+	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_relaxed);
 	if (old_bmap & 1UL << id)
 		return 0;
 
 	do {
 		new_bmap = old_bmap | (1UL << id);
-		success = __atomic_compare_exchange(
+		success = rte_atomic_compare_exchange_strong_explicit(
 					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					&old_bmap, &new_bmap, 0,
-					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
+					&old_bmap, new_bmap,
+					rte_memory_order_release, rte_memory_order_relaxed);
 
 		if (success)
-			__atomic_fetch_add(&v->num_threads,
-						1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&v->num_threads,
+						1, rte_memory_order_relaxed);
 		else if (old_bmap & (1UL << id))
 			/* Someone else registered this thread.
 			 * Counter should not be incremented.
@@ -154,8 +154,8 @@
 	 * go out of sync. Hence, additional checks are required.
 	 */
 	/* Check if the thread is already unregistered */
-	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_RELAXED);
+	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_relaxed);
 	if (!(old_bmap & (1UL << id)))
 		return 0;
 
@@ -165,14 +165,14 @@
 		 * completed before removal of the thread from the list of
 		 * reporting threads.
 		 */
-		success = __atomic_compare_exchange(
+		success = rte_atomic_compare_exchange_strong_explicit(
 					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					&old_bmap, &new_bmap, 0,
-					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
+					&old_bmap, new_bmap,
+					rte_memory_order_release, rte_memory_order_relaxed);
 
 		if (success)
-			__atomic_fetch_sub(&v->num_threads,
-						1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_sub_explicit(&v->num_threads,
+						1, rte_memory_order_relaxed);
 		else if (!(old_bmap & (1UL << id)))
 			/* Someone else unregistered this thread.
 			 * Counter should not be incremented.
@@ -227,8 +227,8 @@
 
 	fprintf(f, "  Registered thread IDs = ");
 	for (i = 0; i < v->num_elems; i++) {
-		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_ACQUIRE);
+		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_acquire);
 		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
 		while (bmap) {
 			t = __builtin_ctzl(bmap);
@@ -241,26 +241,26 @@
 	fprintf(f, "\n");
 
 	fprintf(f, "  Token = %" PRIu64 "\n",
-			__atomic_load_n(&v->token, __ATOMIC_ACQUIRE));
+			rte_atomic_load_explicit(&v->token, rte_memory_order_acquire));
 
 	fprintf(f, "  Least Acknowledged Token = %" PRIu64 "\n",
-			__atomic_load_n(&v->acked_token, __ATOMIC_ACQUIRE));
+			rte_atomic_load_explicit(&v->acked_token, rte_memory_order_acquire));
 
 	fprintf(f, "Quiescent State Counts for readers:\n");
 	for (i = 0; i < v->num_elems; i++) {
-		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_ACQUIRE);
+		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_acquire);
 		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
 		while (bmap) {
 			t = __builtin_ctzl(bmap);
 			fprintf(f, "thread ID = %u, count = %" PRIu64 ", lock count = %u\n",
 				id + t,
-				__atomic_load_n(
+				rte_atomic_load_explicit(
 					&v->qsbr_cnt[id + t].cnt,
-					__ATOMIC_RELAXED),
-				__atomic_load_n(
+					rte_memory_order_relaxed),
+				rte_atomic_load_explicit(
 					&v->qsbr_cnt[id + t].lock_cnt,
-					__ATOMIC_RELAXED));
+					rte_memory_order_relaxed));
 			bmap &= ~(1UL << t);
 		}
 	}
diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h
index 87e1b55..9f4aed2 100644
--- a/lib/rcu/rte_rcu_qsbr.h
+++ b/lib/rcu/rte_rcu_qsbr.h
@@ -63,11 +63,11 @@
  * Given thread id needs to be converted to index into the array and
  * the id within the array element.
  */
-#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
+#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(RTE_ATOMIC(uint64_t)) * 8)
 #define __RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
 	RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
 		__RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, RTE_CACHE_LINE_SIZE)
-#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
+#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t __rte_atomic *) \
 	((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)
 #define __RTE_QSBR_THRID_INDEX_SHIFT 6
 #define __RTE_QSBR_THRID_MASK 0x3f
@@ -75,13 +75,13 @@
 
 /* Worker thread counter */
 struct rte_rcu_qsbr_cnt {
-	uint64_t cnt;
+	RTE_ATOMIC(uint64_t) cnt;
 	/**< Quiescent state counter. Value 0 indicates the thread is offline
 	 *   64b counter is used to avoid adding more code to address
 	 *   counter overflow. Changing this to 32b would require additional
 	 *   changes to various APIs.
 	 */
-	uint32_t lock_cnt;
+	RTE_ATOMIC(uint32_t) lock_cnt;
 	/**< Lock counter. Used when RTE_LIBRTE_RCU_DEBUG is enabled */
 } __rte_cache_aligned;
 
@@ -97,16 +97,16 @@ struct rte_rcu_qsbr_cnt {
  * 2) Register thread ID array
  */
 struct rte_rcu_qsbr {
-	uint64_t token __rte_cache_aligned;
+	RTE_ATOMIC(uint64_t) token __rte_cache_aligned;
 	/**< Counter to allow for multiple concurrent quiescent state queries */
-	uint64_t acked_token;
+	RTE_ATOMIC(uint64_t) acked_token;
 	/**< Least token acked by all the threads in the last call to
 	 *   rte_rcu_qsbr_check API.
 	 */
 
 	uint32_t num_elems __rte_cache_aligned;
 	/**< Number of elements in the thread ID array */
-	uint32_t num_threads;
+	RTE_ATOMIC(uint32_t) num_threads;
 	/**< Number of threads currently using this QS variable */
 	uint32_t max_threads;
 	/**< Maximum number of threads using this QS variable */
@@ -311,13 +311,13 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * the following will not move down after the load of any shared
 	 * data structure.
 	 */
-	t = __atomic_load_n(&v->token, __ATOMIC_RELAXED);
+	t = rte_atomic_load_explicit(&v->token, rte_memory_order_relaxed);
 
-	/* __atomic_store_n(cnt, __ATOMIC_RELAXED) is used to ensure
+	/* rte_atomic_store_explicit(cnt, rte_memory_order_relaxed) is used to ensure
 	 * 'cnt' (64b) is accessed atomically.
 	 */
-	__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
-		t, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&v->qsbr_cnt[thread_id].cnt,
+		t, rte_memory_order_relaxed);
 
 	/* The subsequent load of the data structure should not
 	 * move above the store. Hence a store-load barrier
@@ -326,7 +326,7 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * writer might not see that the reader is online, even though
 	 * the reader is referencing the shared data structure.
 	 */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 }
 
 /**
@@ -362,8 +362,8 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * data structure can not move after this store.
 	 */
 
-	__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
-		__RTE_QSBR_CNT_THR_OFFLINE, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&v->qsbr_cnt[thread_id].cnt,
+		__RTE_QSBR_CNT_THR_OFFLINE, rte_memory_order_release);
 }
 
 /**
@@ -394,8 +394,8 @@ struct rte_rcu_qsbr_dq_parameters {
 
 #if defined(RTE_LIBRTE_RCU_DEBUG)
 	/* Increment the lock counter */
-	__atomic_fetch_add(&v->qsbr_cnt[thread_id].lock_cnt,
-				1, __ATOMIC_ACQUIRE);
+	rte_atomic_fetch_add_explicit(&v->qsbr_cnt[thread_id].lock_cnt,
+				1, rte_memory_order_acquire);
 #endif
 }
 
@@ -427,8 +427,8 @@ struct rte_rcu_qsbr_dq_parameters {
 
 #if defined(RTE_LIBRTE_RCU_DEBUG)
 	/* Decrement the lock counter */
-	__atomic_fetch_sub(&v->qsbr_cnt[thread_id].lock_cnt,
-				1, __ATOMIC_RELEASE);
+	rte_atomic_fetch_sub_explicit(&v->qsbr_cnt[thread_id].lock_cnt,
+				1, rte_memory_order_release);
 
 	__RTE_RCU_IS_LOCK_CNT_ZERO(v, thread_id, WARNING,
 				"Lock counter %u. Nested locks?\n",
@@ -461,7 +461,7 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * structure are visible to the workers before the token
 	 * update is visible.
 	 */
-	t = __atomic_fetch_add(&v->token, 1, __ATOMIC_RELEASE) + 1;
+	t = rte_atomic_fetch_add_explicit(&v->token, 1, rte_memory_order_release) + 1;
 
 	return t;
 }
@@ -493,16 +493,16 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * Later loads of the shared data structure should not move
 	 * above this load. Hence, use load-acquire.
 	 */
-	t = __atomic_load_n(&v->token, __ATOMIC_ACQUIRE);
+	t = rte_atomic_load_explicit(&v->token, rte_memory_order_acquire);
 
 	/* Check if there are updates available from the writer.
 	 * Inform the writer that updates are visible to this reader.
 	 * Prior loads of the shared data structure should not move
 	 * beyond this store. Hence use store-release.
 	 */
-	if (t != __atomic_load_n(&v->qsbr_cnt[thread_id].cnt, __ATOMIC_RELAXED))
-		__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
-					 t, __ATOMIC_RELEASE);
+	if (t != rte_atomic_load_explicit(&v->qsbr_cnt[thread_id].cnt, rte_memory_order_relaxed))
+		rte_atomic_store_explicit(&v->qsbr_cnt[thread_id].cnt,
+					 t, rte_memory_order_release);
 
 	__RTE_RCU_DP_LOG(DEBUG, "%s: update: token = %" PRIu64 ", Thread ID = %d",
 		__func__, t, thread_id);
@@ -517,7 +517,7 @@ struct rte_rcu_qsbr_dq_parameters {
 	uint32_t i, j, id;
 	uint64_t bmap;
 	uint64_t c;
-	uint64_t *reg_thread_id;
+	RTE_ATOMIC(uint64_t) *reg_thread_id;
 	uint64_t acked_token = __RTE_QSBR_CNT_MAX;
 
 	for (i = 0, reg_thread_id = __RTE_QSBR_THRID_ARRAY_ELM(v, 0);
@@ -526,7 +526,7 @@ struct rte_rcu_qsbr_dq_parameters {
 		/* Load the current registered thread bit map before
 		 * loading the reader thread quiescent state counters.
 		 */
-		bmap = __atomic_load_n(reg_thread_id, __ATOMIC_ACQUIRE);
+		bmap = rte_atomic_load_explicit(reg_thread_id, rte_memory_order_acquire);
 		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
 
 		while (bmap) {
@@ -534,9 +534,9 @@ struct rte_rcu_qsbr_dq_parameters {
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: check: token = %" PRIu64 ", wait = %d, Bit Map = 0x%" PRIx64 ", Thread ID = %d",
 				__func__, t, wait, bmap, id + j);
-			c = __atomic_load_n(
+			c = rte_atomic_load_explicit(
 					&v->qsbr_cnt[id + j].cnt,
-					__ATOMIC_ACQUIRE);
+					rte_memory_order_acquire);
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: status: token = %" PRIu64 ", wait = %d, Thread QS cnt = %" PRIu64 ", Thread ID = %d",
 				__func__, t, wait, c, id+j);
@@ -554,8 +554,8 @@ struct rte_rcu_qsbr_dq_parameters {
 				/* This thread might have unregistered.
 				 * Re-read the bitmap.
 				 */
-				bmap = __atomic_load_n(reg_thread_id,
-						__ATOMIC_ACQUIRE);
+				bmap = rte_atomic_load_explicit(reg_thread_id,
+						rte_memory_order_acquire);
 
 				continue;
 			}
@@ -576,8 +576,8 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * no need to update this very accurately using compare-and-swap.
 	 */
 	if (acked_token != __RTE_QSBR_CNT_MAX)
-		__atomic_store_n(&v->acked_token, acked_token,
-			__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&v->acked_token, acked_token,
+			rte_memory_order_relaxed);
 
 	return 1;
 }
@@ -598,7 +598,7 @@ struct rte_rcu_qsbr_dq_parameters {
 			"%s: check: token = %" PRIu64 ", wait = %d, Thread ID = %d",
 			__func__, t, wait, i);
 		while (1) {
-			c = __atomic_load_n(&cnt->cnt, __ATOMIC_ACQUIRE);
+			c = rte_atomic_load_explicit(&cnt->cnt, rte_memory_order_acquire);
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: status: token = %" PRIu64 ", wait = %d, Thread QS cnt = %" PRIu64 ", Thread ID = %d",
 				__func__, t, wait, c, i);
@@ -628,8 +628,8 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * no need to update this very accurately using compare-and-swap.
 	 */
 	if (acked_token != __RTE_QSBR_CNT_MAX)
-		__atomic_store_n(&v->acked_token, acked_token,
-			__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&v->acked_token, acked_token,
+			rte_memory_order_relaxed);
 
 	return 1;
 }
-- 
1.8.3.1



* [PATCH 12/21] pdump: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (10 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 11/21] rcu: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 13/21] stack: " Tyler Retzlaff
                   ` (10 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with the
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/pdump/rte_pdump.c | 14 +++++++-------
 lib/pdump/rte_pdump.h |  8 ++++----
 2 files changed, 11 insertions(+), 11 deletions(-)
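
The capture statistics are advisory counters, so both the per-packet
updates and the aggregation read them with relaxed ordering. A reduced
sketch (queue count and names are illustrative):

#include <stdint.h>
#include <rte_stdatomic.h>

#define NB_QUEUES 8

static RTE_ATOMIC(uint64_t) accepted[NB_QUEUES];

static void
count_accepted(uint16_t qid, uint64_t n)
{
	rte_atomic_fetch_add_explicit(&accepted[qid], n, rte_memory_order_relaxed);
}

static uint64_t
total_accepted(void)
{
	uint64_t sum = 0;
	unsigned int q;

	for (q = 0; q < NB_QUEUES; q++)
		sum += rte_atomic_load_explicit(&accepted[q],
				rte_memory_order_relaxed);
	return sum;
}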

diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 53cca10..80b90c6 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -110,8 +110,8 @@ struct pdump_response {
 		 * then packet doesn't match the filter (will be ignored).
 		 */
 		if (cbs->filter && rcs[i] == 0) {
-			__atomic_fetch_add(&stats->filtered,
-					   1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&stats->filtered,
+					   1, rte_memory_order_relaxed);
 			continue;
 		}
 
@@ -127,18 +127,18 @@ struct pdump_response {
 			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
 
 		if (unlikely(p == NULL))
-			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&stats->nombuf, 1, rte_memory_order_relaxed);
 		else
 			dup_bufs[d_pkts++] = p;
 	}
 
-	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&stats->accepted, d_pkts, rte_memory_order_relaxed);
 
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)&dup_bufs[0], d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
+		rte_atomic_fetch_add_explicit(&stats->ringfull, drops, rte_memory_order_relaxed);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
@@ -720,10 +720,10 @@ struct pdump_response {
 	uint16_t qid;
 
 	for (qid = 0; qid < nq; qid++) {
-		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+		const RTE_ATOMIC(uint64_t) *perq = (const uint64_t __rte_atomic *)&stats[port][qid];
 
 		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
-			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			val = rte_atomic_load_explicit(&perq[i], rte_memory_order_relaxed);
 			sum[i] += val;
 		}
 	}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index b1a3918..7feb2b6 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -233,10 +233,10 @@ enum {
  * The statistics are sum of both receive and transmit queues.
  */
 struct rte_pdump_stats {
-	uint64_t accepted; /**< Number of packets accepted by filter. */
-	uint64_t filtered; /**< Number of packets rejected by filter. */
-	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
-	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+	RTE_ATOMIC(uint64_t) accepted; /**< Number of packets accepted by filter. */
+	RTE_ATOMIC(uint64_t) filtered; /**< Number of packets rejected by filter. */
+	RTE_ATOMIC(uint64_t) nombuf;   /**< Number of mbuf allocation failures. */
+	RTE_ATOMIC(uint64_t) ringfull; /**< Number of missed packets due to ring full. */
 
 	uint64_t reserved[4]; /**< Reserved and pad to cache line */
 };
-- 
1.8.3.1



* [PATCH 13/21] stack: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (11 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 12/21] pdump: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 14/21] telemetry: " Tyler Retzlaff
                   ` (9 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with the
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/stack/rte_stack.h            |  2 +-
 lib/stack/rte_stack_lf_c11.h     | 24 ++++++++++++------------
 lib/stack/rte_stack_lf_generic.h | 18 +++++++++---------
 3 files changed, 22 insertions(+), 22 deletions(-)
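
In the C11 flavour the list length is reserved with a weak CAS, since the
call already sits in a retry loop and a spurious failure only costs one
more iteration; the generic flavour keeps a strong CAS. A sketch of the
reservation step (names are illustrative):

#include <stdint.h>
#include <rte_stdatomic.h>

static RTE_ATOMIC(uint64_t) len;

static int
reserve(unsigned int num)
{
	uint64_t cur = rte_atomic_load_explicit(&len, rte_memory_order_relaxed);

	while (1) {
		if (cur < num)
			return -1; /* not enough elements */

		/* cur is refreshed with the current length on failure */
		if (rte_atomic_compare_exchange_weak_explicit(&len, &cur, cur - num,
				rte_memory_order_acquire, rte_memory_order_relaxed))
			return 0;
	}
}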

diff --git a/lib/stack/rte_stack.h b/lib/stack/rte_stack.h
index 921d29a..a379300 100644
--- a/lib/stack/rte_stack.h
+++ b/lib/stack/rte_stack.h
@@ -44,7 +44,7 @@ struct rte_stack_lf_list {
 	/** List head */
 	struct rte_stack_lf_head head __rte_aligned(16);
 	/** List len */
-	uint64_t len;
+	RTE_ATOMIC(uint64_t) len;
 };
 
 /* Structure containing two lock-free LIFO lists: the stack itself and a list
diff --git a/lib/stack/rte_stack_lf_c11.h b/lib/stack/rte_stack_lf_c11.h
index 687a6f6..9cb6998 100644
--- a/lib/stack/rte_stack_lf_c11.h
+++ b/lib/stack/rte_stack_lf_c11.h
@@ -26,8 +26,8 @@
 	 * elements. If the mempool is near-empty to the point that this is a
 	 * concern, the user should consider increasing the mempool size.
 	 */
-	return (unsigned int)__atomic_load_n(&s->stack_lf.used.len,
-					     __ATOMIC_RELAXED);
+	return (unsigned int)rte_atomic_load_explicit(&s->stack_lf.used.len,
+					     rte_memory_order_relaxed);
 }
 
 static __rte_always_inline void
@@ -59,14 +59,14 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				1, __ATOMIC_RELEASE,
-				__ATOMIC_RELAXED);
+				1, rte_memory_order_release,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 
 	/* Ensure the stack modifications are not reordered with respect
 	 * to the LIFO len update.
 	 */
-	__atomic_fetch_add(&list->len, num, __ATOMIC_RELEASE);
+	rte_atomic_fetch_add_explicit(&list->len, num, rte_memory_order_release);
 }
 
 static __rte_always_inline struct rte_stack_lf_elem *
@@ -80,7 +80,7 @@
 	int success;
 
 	/* Reserve num elements, if available */
-	len = __atomic_load_n(&list->len, __ATOMIC_RELAXED);
+	len = rte_atomic_load_explicit(&list->len, rte_memory_order_relaxed);
 
 	while (1) {
 		/* Does the list contain enough elements? */
@@ -88,10 +88,10 @@
 			return NULL;
 
 		/* len is updated on failure */
-		if (__atomic_compare_exchange_n(&list->len,
+		if (rte_atomic_compare_exchange_weak_explicit(&list->len,
 						&len, len - num,
-						1, __ATOMIC_ACQUIRE,
-						__ATOMIC_RELAXED))
+						rte_memory_order_acquire,
+						rte_memory_order_relaxed))
 			break;
 	}
 
@@ -110,7 +110,7 @@
 		 * elements are properly ordered with respect to the head
 		 * pointer read.
 		 */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		rte_atomic_thread_fence(rte_memory_order_acquire);
 
 		rte_prefetch0(old_head.top);
 
@@ -159,8 +159,8 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				0, __ATOMIC_RELAXED,
-				__ATOMIC_RELAXED);
+				0, rte_memory_order_relaxed,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 
 	return old_head.top;
diff --git a/lib/stack/rte_stack_lf_generic.h b/lib/stack/rte_stack_lf_generic.h
index 39f7ff3..cc69e4d 100644
--- a/lib/stack/rte_stack_lf_generic.h
+++ b/lib/stack/rte_stack_lf_generic.h
@@ -27,7 +27,7 @@
 	 * concern, the user should consider increasing the mempool size.
 	 */
 	/* NOTE: review for potential ordering optimization */
-	return __atomic_load_n(&s->stack_lf.used.len, __ATOMIC_SEQ_CST);
+	return rte_atomic_load_explicit(&s->stack_lf.used.len, rte_memory_order_seq_cst);
 }
 
 static __rte_always_inline void
@@ -64,11 +64,11 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				1, __ATOMIC_RELEASE,
-				__ATOMIC_RELAXED);
+				1, rte_memory_order_release,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 	/* NOTE: review for potential ordering optimization */
-	__atomic_fetch_add(&list->len, num, __ATOMIC_SEQ_CST);
+	rte_atomic_fetch_add_explicit(&list->len, num, rte_memory_order_seq_cst);
 }
 
 static __rte_always_inline struct rte_stack_lf_elem *
@@ -83,15 +83,15 @@
 	/* Reserve num elements, if available */
 	while (1) {
 		/* NOTE: review for potential ordering optimization */
-		uint64_t len = __atomic_load_n(&list->len, __ATOMIC_SEQ_CST);
+		uint64_t len = rte_atomic_load_explicit(&list->len, rte_memory_order_seq_cst);
 
 		/* Does the list contain enough elements? */
 		if (unlikely(len < num))
 			return NULL;
 
 		/* NOTE: review for potential ordering optimization */
-		if (__atomic_compare_exchange_n(&list->len, &len, len - num,
-				0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+		if (rte_atomic_compare_exchange_strong_explicit(&list->len, &len, len - num,
+				rte_memory_order_seq_cst, rte_memory_order_seq_cst))
 			break;
 	}
 
@@ -143,8 +143,8 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				1, __ATOMIC_RELEASE,
-				__ATOMIC_RELAXED);
+				1, rte_memory_order_release,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 
 	return old_head.top;
-- 
1.8.3.1



* [PATCH 14/21] telemetry: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (12 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 13/21] stack: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:08 ` [PATCH 15/21] vhost: " Tyler Retzlaff
                   ` (8 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with the
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/telemetry/telemetry.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)
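
The client counter only enforces an approximate connection limit, so
relaxed operations are sufficient; the check and the increment are
separate operations, mirroring the structure of the existing code. A
minimal sketch (the limit value and names are illustrative):

#include <stdint.h>
#include <rte_stdatomic.h>

#define MAX_CONNECTIONS 10

static RTE_ATOMIC(uint16_t) num_clients;

static int
client_connect(void)
{
	if (rte_atomic_load_explicit(&num_clients, rte_memory_order_relaxed) >=
			MAX_CONNECTIONS)
		return -1;
	rte_atomic_fetch_add_explicit(&num_clients, 1, rte_memory_order_relaxed);
	return 0;
}

static void
client_disconnect(void)
{
	rte_atomic_fetch_sub_explicit(&num_clients, 1, rte_memory_order_relaxed);
}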

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index aeb078c..9298284 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -45,7 +45,7 @@ struct socket {
 	int sock;
 	char path[sizeof(((struct sockaddr_un *)0)->sun_path)];
 	handler fn;
-	uint16_t *num_clients;
+	RTE_ATOMIC(uint16_t) *num_clients;
 };
 static struct socket v2_socket; /* socket for v2 telemetry */
 static struct socket v1_socket; /* socket for v1 telemetry */
@@ -64,7 +64,7 @@ struct socket {
 /* Used when accessing or modifying list of command callbacks */
 static rte_spinlock_t callback_sl = RTE_SPINLOCK_INITIALIZER;
 #ifndef RTE_EXEC_ENV_WINDOWS
-static uint16_t v2_clients;
+static RTE_ATOMIC(uint16_t) v2_clients;
 #endif /* !RTE_EXEC_ENV_WINDOWS */
 
 int
@@ -404,7 +404,7 @@ struct socket {
 		bytes = read(s, buffer, sizeof(buffer) - 1);
 	}
 	close(s);
-	__atomic_fetch_sub(&v2_clients, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_sub_explicit(&v2_clients, 1, rte_memory_order_relaxed);
 	return NULL;
 }
 
@@ -421,14 +421,14 @@ struct socket {
 			return NULL;
 		}
 		if (s->num_clients != NULL) {
-			uint16_t conns = __atomic_load_n(s->num_clients,
-					__ATOMIC_RELAXED);
+			uint16_t conns = rte_atomic_load_explicit(s->num_clients,
+					rte_memory_order_relaxed);
 			if (conns >= MAX_CONNECTIONS) {
 				close(s_accepted);
 				continue;
 			}
-			__atomic_fetch_add(s->num_clients, 1,
-					__ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(s->num_clients, 1,
+					rte_memory_order_relaxed);
 		}
 		rc = pthread_create(&th, NULL, s->fn,
 				    (void *)(uintptr_t)s_accepted);
@@ -437,8 +437,8 @@ struct socket {
 				 strerror(rc));
 			close(s_accepted);
 			if (s->num_clients != NULL)
-				__atomic_fetch_sub(s->num_clients, 1,
-						   __ATOMIC_RELAXED);
+				rte_atomic_fetch_sub_explicit(s->num_clients, 1,
+						   rte_memory_order_relaxed);
 			continue;
 		}
 		pthread_detach(th);
-- 
1.8.3.1
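
Beyond the call-site renames, this patch shows the other half of the conversion: the variables themselves gain the RTE_ATOMIC() specifier so the wrappers type-check when DPDK is built with stdatomic enabled. A compile-only sketch of the same relaxed connection counter follows; MAX_CONNECTIONS, client_connect() and client_disconnect() are illustrative names, not the telemetry API.

#include <stdbool.h>
#include <stdint.h>

#include <rte_stdatomic.h>

#define MAX_CONNECTIONS 10

/* Counter shared between the accept thread and per-client threads. */
static RTE_ATOMIC(uint16_t) num_clients;

/* Called from the single accept thread, so the limit check and the
 * increment need no stronger ordering than relaxed (mirroring telemetry). */
bool
client_connect(void)
{
	if (rte_atomic_load_explicit(&num_clients,
			rte_memory_order_relaxed) >= MAX_CONNECTIONS)
		return false;
	rte_atomic_fetch_add_explicit(&num_clients, 1,
			rte_memory_order_relaxed);
	return true;
}

/* Called from a client thread when its connection closes. */
void
client_disconnect(void)
{
	rte_atomic_fetch_sub_explicit(&num_clients, 1,
			rte_memory_order_relaxed);
}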


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 15/21] vhost: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (13 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 14/21] telemetry: " Tyler Retzlaff
@ 2023-10-16 23:08 ` Tyler Retzlaff
  2023-10-16 23:09 ` [PATCH 16/21] cryptodev: " Tyler Retzlaff
                   ` (7 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:08 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
the corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/vhost/vdpa.c            |  3 ++-
 lib/vhost/vhost.c           | 42 ++++++++++++++++----------------
 lib/vhost/vhost.h           | 39 ++++++++++++++++--------------
 lib/vhost/vhost_user.c      |  6 ++---
 lib/vhost/virtio_net.c      | 58 +++++++++++++++++++++++++--------------------
 lib/vhost/virtio_net_ctrl.c |  6 +++--
 6 files changed, 84 insertions(+), 70 deletions(-)

diff --git a/lib/vhost/vdpa.c b/lib/vhost/vdpa.c
index 6284ea2..219eef8 100644
--- a/lib/vhost/vdpa.c
+++ b/lib/vhost/vdpa.c
@@ -235,7 +235,8 @@ struct rte_vdpa_device *
 	}
 
 	/* used idx is the synchronization point for the split vring */
-	__atomic_store_n(&vq->used->idx, idx_m, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit((unsigned short __rte_atomic *)&vq->used->idx,
+		idx_m, rte_memory_order_release);
 
 	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
 		vring_used_event(s_vring) = idx_m;
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 7fde412..bdcf85b 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -128,12 +128,13 @@ struct vhost_vq_stats_name_off {
 {
 #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 70100)
 	/*
-	 * __sync_ built-ins are deprecated, but __atomic_ ones
+	 * __sync_ built-ins are deprecated, but rte_atomic_ ones
 	 * are sub-optimized in older GCC versions.
 	 */
 	__sync_fetch_and_or_1(addr, (1U << nr));
 #else
-	__atomic_fetch_or(addr, (1U << nr), __ATOMIC_RELAXED);
+	rte_atomic_fetch_or_explicit((volatile uint8_t __rte_atomic *)addr, (1U << nr),
+		rte_memory_order_relaxed);
 #endif
 }
 
@@ -155,7 +156,7 @@ struct vhost_vq_stats_name_off {
 		return;
 
 	/* To make sure guest memory updates are committed before logging */
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	page = addr / VHOST_LOG_PAGE;
 	while (page * VHOST_LOG_PAGE < addr + len) {
@@ -197,7 +198,7 @@ struct vhost_vq_stats_name_off {
 	if (unlikely(!vq->log_cache))
 		return;
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	log_base = (unsigned long *)(uintptr_t)dev->log_base;
 
@@ -206,17 +207,18 @@ struct vhost_vq_stats_name_off {
 
 #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 70100)
 		/*
-		 * '__sync' builtins are deprecated, but '__atomic' ones
+		 * '__sync' builtins are deprecated, but 'rte_atomic' ones
 		 * are sub-optimized in older GCC versions.
 		 */
 		__sync_fetch_and_or(log_base + elem->offset, elem->val);
 #else
-		__atomic_fetch_or(log_base + elem->offset, elem->val,
-				__ATOMIC_RELAXED);
+		rte_atomic_fetch_or_explicit(
+			(unsigned long __rte_atomic *)(log_base + elem->offset),
+			elem->val, rte_memory_order_relaxed);
 #endif
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	vq->log_cache_nb_elem = 0;
 }
@@ -231,7 +233,7 @@ struct vhost_vq_stats_name_off {
 
 	if (unlikely(!vq->log_cache)) {
 		/* No logging cache allocated, write dirty log map directly */
-		rte_atomic_thread_fence(__ATOMIC_RELEASE);
+		rte_atomic_thread_fence(rte_memory_order_release);
 		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
 
 		return;
@@ -251,7 +253,7 @@ struct vhost_vq_stats_name_off {
 		 * No more room for a new log cache entry,
 		 * so write the dirty log map directly.
 		 */
-		rte_atomic_thread_fence(__ATOMIC_RELEASE);
+		rte_atomic_thread_fence(rte_memory_order_release);
 		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
 
 		return;
@@ -1184,11 +1186,11 @@ struct vhost_vq_stats_name_off {
 	if (unlikely(idx >= vq->size))
 		return -1;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	vq->inflight_split->desc[idx].inflight = 0;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	vq->inflight_split->used_idx = last_used_idx;
 	return 0;
@@ -1227,11 +1229,11 @@ struct vhost_vq_stats_name_off {
 	if (unlikely(head >= vq->size))
 		return -1;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	inflight_info->desc[head].inflight = 0;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	inflight_info->old_free_head = inflight_info->free_head;
 	inflight_info->old_used_idx = inflight_info->used_idx;
@@ -1454,7 +1456,7 @@ struct vhost_vq_stats_name_off {
 			vq->avail_wrap_counter << 15;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	vq->device_event->flags = flags;
 	return 0;
@@ -1519,16 +1521,16 @@ struct vhost_vq_stats_name_off {
 
 	rte_rwlock_read_lock(&vq->access_lock);
 
-	__atomic_store_n(&vq->irq_pending, false, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&vq->irq_pending, false, rte_memory_order_release);
 
 	if (dev->backend_ops->inject_irq(dev, vq)) {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-			__atomic_fetch_add(&vq->stats.guest_notifications_error,
-					1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications_error,
+					1, rte_memory_order_relaxed);
 	} else {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-			__atomic_fetch_add(&vq->stats.guest_notifications,
-					1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications,
+					1, rte_memory_order_relaxed);
 		if (dev->notify_ops->guest_notified)
 			dev->notify_ops->guest_notified(dev->vid);
 	}
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 5fc9035..f8624fb 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -158,9 +158,9 @@ struct virtqueue_stats {
 	uint64_t inflight_completed;
 	uint64_t guest_notifications_suppressed;
 	/* Counters below are atomic, and should be incremented as such. */
-	uint64_t guest_notifications;
-	uint64_t guest_notifications_offloaded;
-	uint64_t guest_notifications_error;
+	RTE_ATOMIC(uint64_t) guest_notifications;
+	RTE_ATOMIC(uint64_t) guest_notifications_offloaded;
+	RTE_ATOMIC(uint64_t) guest_notifications_error;
 };
 
 /**
@@ -348,7 +348,7 @@ struct vhost_virtqueue {
 	struct vhost_vring_addr ring_addrs;
 	struct virtqueue_stats	stats;
 
-	bool irq_pending;
+	RTE_ATOMIC(bool) irq_pending;
 } __rte_cache_aligned;
 
 /* Virtio device status as per Virtio specification */
@@ -486,7 +486,7 @@ struct virtio_net {
 	uint32_t		flags;
 	uint16_t		vhost_hlen;
 	/* to tell if we need broadcast rarp packet */
-	int16_t			broadcast_rarp;
+	RTE_ATOMIC(int16_t)	broadcast_rarp;
 	uint32_t		nr_vring;
 	int			async_copy;
 
@@ -557,7 +557,8 @@ struct virtio_net {
 static inline bool
 desc_is_avail(struct vring_packed_desc *desc, bool wrap_counter)
 {
-	uint16_t flags = __atomic_load_n(&desc->flags, __ATOMIC_ACQUIRE);
+	uint16_t flags = rte_atomic_load_explicit((unsigned short __rte_atomic *)&desc->flags,
+		rte_memory_order_acquire);
 
 	return wrap_counter == !!(flags & VRING_DESC_F_AVAIL) &&
 		wrap_counter != !!(flags & VRING_DESC_F_USED);
@@ -914,17 +915,19 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	bool expected = false;
 
 	if (dev->notify_ops->guest_notify) {
-		if (__atomic_compare_exchange_n(&vq->irq_pending, &expected, true, 0,
-				  __ATOMIC_RELEASE, __ATOMIC_RELAXED)) {
+		if (rte_atomic_compare_exchange_strong_explicit(&vq->irq_pending, &expected, true,
+				  rte_memory_order_release, rte_memory_order_relaxed)) {
 			if (dev->notify_ops->guest_notify(dev->vid, vq->index)) {
 				if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-					__atomic_fetch_add(&vq->stats.guest_notifications_offloaded,
-						1, __ATOMIC_RELAXED);
+					rte_atomic_fetch_add_explicit(
+						&vq->stats.guest_notifications_offloaded,
+						1, rte_memory_order_relaxed);
 				return;
 			}
 
 			/* Offloading failed, fallback to direct IRQ injection */
-			__atomic_store_n(&vq->irq_pending, false, __ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&vq->irq_pending, false,
+				rte_memory_order_release);
 		} else {
 			vq->stats.guest_notifications_suppressed++;
 			return;
@@ -933,14 +936,14 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	if (dev->backend_ops->inject_irq(dev, vq)) {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-			__atomic_fetch_add(&vq->stats.guest_notifications_error,
-				1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications_error,
+				1, rte_memory_order_relaxed);
 		return;
 	}
 
 	if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-		__atomic_fetch_add(&vq->stats.guest_notifications,
-			1, __ATOMIC_RELAXED);
+		rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications,
+			1, rte_memory_order_relaxed);
 	if (dev->notify_ops->guest_notified)
 		dev->notify_ops->guest_notified(dev->vid);
 }
@@ -949,7 +952,7 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
 	/* Flush used->idx update before we read avail->flags. */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	/* Don't kick guest if we don't reach index specified by guest. */
 	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
@@ -981,7 +984,7 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	bool signalled_used_valid, kick = false;
 
 	/* Flush used desc update. */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	if (!(dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))) {
 		if (vq->driver_event->flags !=
@@ -1007,7 +1010,7 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		goto kick;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+	rte_atomic_thread_fence(rte_memory_order_acquire);
 
 	off_wrap = vq->driver_event->off_wrap;
 	off = off_wrap & ~(1 << 15);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 901a80b..e363121 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -1914,7 +1914,7 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev,
 
 	if (inflight_split->used_idx != used->idx) {
 		inflight_split->desc[last_io].inflight = 0;
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+		rte_atomic_thread_fence(rte_memory_order_seq_cst);
 		inflight_split->used_idx = used->idx;
 	}
 
@@ -2418,10 +2418,10 @@ static int vhost_user_set_log_fd(struct virtio_net **pdev,
 	 * Set the flag to inject a RARP broadcast packet at
 	 * rte_vhost_dequeue_burst().
 	 *
-	 * __ATOMIC_RELEASE ordering is for making sure the mac is
+	 * rte_memory_order_release ordering is for making sure the mac is
 	 * copied before the flag is set.
 	 */
-	__atomic_store_n(&dev->broadcast_rarp, 1, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&dev->broadcast_rarp, 1, rte_memory_order_release);
 	vdpa_dev = dev->vdpa_dev;
 	if (vdpa_dev && vdpa_dev->ops->migration_done)
 		vdpa_dev->ops->migration_done(dev->vid);
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 759a78e..8af20f1 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -298,8 +298,8 @@
 
 	vhost_log_cache_sync(dev, vq);
 
-	__atomic_fetch_add(&vq->used->idx, vq->shadow_used_idx,
-			   __ATOMIC_RELEASE);
+	rte_atomic_fetch_add_explicit((unsigned short __rte_atomic *)&vq->used->idx,
+		vq->shadow_used_idx, rte_memory_order_release);
 	vq->shadow_used_idx = 0;
 	vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
 		sizeof(vq->used->idx));
@@ -335,7 +335,7 @@
 	}
 
 	/* The ordering for storing desc flags needs to be enforced. */
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	for (i = 0; i < vq->shadow_used_idx; i++) {
 		uint16_t flags;
@@ -387,8 +387,9 @@
 
 	vq->desc_packed[vq->shadow_last_used_idx].id = used_elem->id;
 	/* desc flags is the synchronization point for virtio packed vring */
-	__atomic_store_n(&vq->desc_packed[vq->shadow_last_used_idx].flags,
-			 used_elem->flags, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(
+		(unsigned short __rte_atomic *)&vq->desc_packed[vq->shadow_last_used_idx].flags,
+		used_elem->flags, rte_memory_order_release);
 
 	vhost_log_cache_used_vring(dev, vq, vq->shadow_last_used_idx *
 				   sizeof(struct vring_packed_desc),
@@ -418,7 +419,7 @@
 		desc_base[i].len = lens[i];
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 		desc_base[i].flags = flags;
@@ -515,7 +516,7 @@
 		vq->desc_packed[vq->last_used_idx + i].len = 0;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 	vhost_for_each_try_unroll(i, begin, PACKED_BATCH_SIZE)
 		vq->desc_packed[vq->last_used_idx + i].flags = flags;
 
@@ -1415,7 +1416,8 @@
 	 * The ordering between avail index and
 	 * desc reads needs to be enforced.
 	 */
-	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
+	avail_head = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire);
 
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
@@ -1806,7 +1808,8 @@
 	/*
 	 * The ordering between avail index and desc reads need to be enforced.
 	 */
-	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
+	avail_head = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire);
 
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
@@ -2222,7 +2225,7 @@
 	}
 
 	/* The ordering for storing desc flags needs to be enforced. */
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	from = async->last_buffer_idx_packed;
 
@@ -2311,7 +2314,9 @@
 			vhost_vring_call_packed(dev, vq);
 		} else {
 			write_back_completed_descs_split(vq, n_descs);
-			__atomic_fetch_add(&vq->used->idx, n_descs, __ATOMIC_RELEASE);
+			rte_atomic_fetch_add_explicit(
+				(unsigned short __rte_atomic *)&vq->used->idx,
+				n_descs, rte_memory_order_release);
 			vhost_vring_call_split(dev, vq);
 		}
 	} else {
@@ -3085,8 +3090,8 @@
 	 * The ordering between avail index and
 	 * desc reads needs to be enforced.
 	 */
-	avail_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
-			vq->last_avail_idx;
+	avail_entries = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire) - vq->last_avail_idx;
 	if (avail_entries == 0)
 		return 0;
 
@@ -3224,7 +3229,7 @@
 			return -1;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+	rte_atomic_thread_fence(rte_memory_order_acquire);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		lens[i] = descs[avail_idx + i].len;
@@ -3297,7 +3302,7 @@
 			return -1;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+	rte_atomic_thread_fence(rte_memory_order_acquire);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		lens[i] = descs[avail_idx + i].len;
@@ -3590,7 +3595,7 @@
 	 *
 	 * broadcast_rarp shares a cacheline in the virtio_net structure
 	 * with some fields that are accessed during enqueue and
-	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * rte_atomic_compare_exchange_strong_explicit causes a write if performed compare
 	 * and exchange. This could result in false sharing between enqueue
 	 * and dequeue.
 	 *
@@ -3598,9 +3603,9 @@
 	 * and only performing compare and exchange if the read indicates it
 	 * is likely to be set.
 	 */
-	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
-			__atomic_compare_exchange_n(&dev->broadcast_rarp,
-			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+	if (unlikely(rte_atomic_load_explicit(&dev->broadcast_rarp, rte_memory_order_acquire) &&
+			rte_atomic_compare_exchange_strong_explicit(&dev->broadcast_rarp,
+			&success, 0, rte_memory_order_release, rte_memory_order_relaxed))) {
 
 		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
 		if (rarp_mbuf == NULL) {
@@ -3683,7 +3688,8 @@
 		vhost_vring_call_packed(dev, vq);
 	} else {
 		write_back_completed_descs_split(vq, nr_cpl_pkts);
-		__atomic_fetch_add(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+		rte_atomic_fetch_add_explicit((unsigned short __rte_atomic *)&vq->used->idx,
+			nr_cpl_pkts, rte_memory_order_release);
 		vhost_vring_call_split(dev, vq);
 	}
 	vq->async->pkts_inflight_n -= nr_cpl_pkts;
@@ -3714,8 +3720,8 @@
 	 * The ordering between avail index and
 	 * desc reads needs to be enforced.
 	 */
-	avail_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
-			vq->last_avail_idx;
+	avail_entries = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire) - vq->last_avail_idx;
 	if (avail_entries == 0)
 		goto out;
 
@@ -4204,7 +4210,7 @@
 	 *
 	 * broadcast_rarp shares a cacheline in the virtio_net structure
 	 * with some fields that are accessed during enqueue and
-	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * rte_atomic_compare_exchange_strong_explicit causes a write if performed compare
 	 * and exchange. This could result in false sharing between enqueue
 	 * and dequeue.
 	 *
@@ -4212,9 +4218,9 @@
 	 * and only performing compare and exchange if the read indicates it
 	 * is likely to be set.
 	 */
-	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
-			__atomic_compare_exchange_n(&dev->broadcast_rarp,
-			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+	if (unlikely(rte_atomic_load_explicit(&dev->broadcast_rarp, rte_memory_order_acquire) &&
+			rte_atomic_compare_exchange_strong_explicit(&dev->broadcast_rarp,
+			&success, 0, rte_memory_order_release, rte_memory_order_relaxed))) {
 
 		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
 		if (rarp_mbuf == NULL) {
diff --git a/lib/vhost/virtio_net_ctrl.c b/lib/vhost/virtio_net_ctrl.c
index 6b583a0..c4847f8 100644
--- a/lib/vhost/virtio_net_ctrl.c
+++ b/lib/vhost/virtio_net_ctrl.c
@@ -33,7 +33,8 @@ struct virtio_net_ctrl_elem {
 	uint8_t *ctrl_req;
 	struct vring_desc *descs;
 
-	avail_idx = __atomic_load_n(&cvq->avail->idx, __ATOMIC_ACQUIRE);
+	avail_idx = rte_atomic_load_explicit((unsigned short __rte_atomic *)&cvq->avail->idx,
+		rte_memory_order_acquire);
 	if (avail_idx == cvq->last_avail_idx) {
 		VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue empty\n");
 		return 0;
@@ -236,7 +237,8 @@ struct virtio_net_ctrl_elem {
 	if (cvq->last_used_idx >= cvq->size)
 		cvq->last_used_idx -= cvq->size;
 
-	__atomic_store_n(&cvq->used->idx, cvq->last_used_idx, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit((unsigned short __rte_atomic *)&cvq->used->idx,
+		cvq->last_used_idx, rte_memory_order_release);
 
 	vhost_vring_call_split(dev, dev->cvq);
 
-- 
1.8.3.1
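
Several vhost hunks cast plain ring fields (used->idx, avail->idx, desc->flags) to unsigned short __rte_atomic * at the point of use, since those structures follow the virtio ABI and cannot be declared with RTE_ATOMIC(). A minimal sketch of that idiom under a simplified ring type; split_ring, ring_publish_used() and ring_read_avail() are illustrative, not vhost code.

#include <stdint.h>

#include <rte_stdatomic.h>

/* Stand-in for a guest-shared split ring header: plain fields, fixed ABI. */
struct split_ring {
	uint16_t flags;
	uint16_t idx;
};

void
ring_publish_used(struct split_ring *used, uint16_t new_idx)
{
	/* used->idx is the synchronization point for the split ring:
	 * release makes prior descriptor writes visible before the index. */
	rte_atomic_store_explicit((uint16_t __rte_atomic *)&used->idx,
			new_idx, rte_memory_order_release);
}

uint16_t
ring_read_avail(struct split_ring *avail)
{
	/* Acquire pairs with the other side's release store of avail->idx. */
	return rte_atomic_load_explicit((uint16_t __rte_atomic *)&avail->idx,
			rte_memory_order_acquire);
}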


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 16/21] cryptodev: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (14 preceding siblings ...)
  2023-10-16 23:08 ` [PATCH 15/21] vhost: " Tyler Retzlaff
@ 2023-10-16 23:09 ` Tyler Retzlaff
  2023-10-16 23:09 ` [PATCH 17/21] distributor: " Tyler Retzlaff
                   ` (6 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:09 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
the corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/cryptodev/rte_cryptodev.c | 22 ++++++++++++----------
 lib/cryptodev/rte_cryptodev.h | 16 ++++++++--------
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/lib/cryptodev/rte_cryptodev.c b/lib/cryptodev/rte_cryptodev.c
index 314710b..b258827 100644
--- a/lib/cryptodev/rte_cryptodev.c
+++ b/lib/cryptodev/rte_cryptodev.c
@@ -1535,12 +1535,12 @@ struct rte_cryptodev_cb *
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	} else {
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&list->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&list->next, cb, rte_memory_order_release);
 	}
 
 	rte_spinlock_unlock(&rte_cryptodev_callback_lock);
@@ -1555,7 +1555,8 @@ struct rte_cryptodev_cb *
 				  struct rte_cryptodev_cb *cb)
 {
 	struct rte_cryptodev *dev;
-	struct rte_cryptodev_cb **prev_cb, *curr_cb;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) *prev_cb;
+	struct rte_cryptodev_cb *curr_cb;
 	struct rte_cryptodev_cb_rcu *list;
 	int ret;
 
@@ -1601,8 +1602,8 @@ struct rte_cryptodev_cb *
 		curr_cb = *prev_cb;
 		if (curr_cb == cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, curr_cb->next,
-				__ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, curr_cb->next,
+				rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
@@ -1673,12 +1674,12 @@ struct rte_cryptodev_cb *
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	} else {
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&list->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&list->next, cb, rte_memory_order_release);
 	}
 
 	rte_spinlock_unlock(&rte_cryptodev_callback_lock);
@@ -1694,7 +1695,8 @@ struct rte_cryptodev_cb *
 				  struct rte_cryptodev_cb *cb)
 {
 	struct rte_cryptodev *dev;
-	struct rte_cryptodev_cb **prev_cb, *curr_cb;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) *prev_cb;
+	struct rte_cryptodev_cb *curr_cb;
 	struct rte_cryptodev_cb_rcu *list;
 	int ret;
 
@@ -1740,8 +1742,8 @@ struct rte_cryptodev_cb *
 		curr_cb = *prev_cb;
 		if (curr_cb == cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, curr_cb->next,
-				__ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, curr_cb->next,
+				rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index be0698c..9092118 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -979,7 +979,7 @@ struct rte_cryptodev_config {
  * queue pair on enqueue/dequeue.
  */
 struct rte_cryptodev_cb {
-	struct rte_cryptodev_cb *next;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) next;
 	/**< Pointer to next callback */
 	rte_cryptodev_callback_fn fn;
 	/**< Pointer to callback function */
@@ -992,7 +992,7 @@ struct rte_cryptodev_cb {
  * Structure used to hold information about the RCU for a queue pair.
  */
 struct rte_cryptodev_cb_rcu {
-	struct rte_cryptodev_cb *next;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) next;
 	/**< Pointer to next callback */
 	struct rte_rcu_qsbr *qsbr;
 	/**< RCU QSBR variable per queue pair */
@@ -1947,15 +1947,15 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
 		struct rte_cryptodev_cb_rcu *list;
 		struct rte_cryptodev_cb *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
 		list = &fp_ops->qp.deq_cb[qp_id];
 		rte_rcu_qsbr_thread_online(list->qsbr, 0);
-		cb = __atomic_load_n(&list->next, __ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&list->next, rte_memory_order_relaxed);
 
 		while (cb != NULL) {
 			nb_ops = cb->fn(dev_id, qp_id, ops, nb_ops,
@@ -2014,15 +2014,15 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
 		struct rte_cryptodev_cb_rcu *list;
 		struct rte_cryptodev_cb *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
 		list = &fp_ops->qp.enq_cb[qp_id];
 		rte_rcu_qsbr_thread_online(list->qsbr, 0);
-		cb = __atomic_load_n(&list->next, __ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&list->next, rte_memory_order_relaxed);
 
 		while (cb != NULL) {
 			nb_ops = cb->fn(dev_id, qp_id, ops, nb_ops,
-- 
1.8.3.1
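
The cryptodev callback list follows the publish/consume scheme the in-code comments describe: a release store publishes a fully initialized node, and the data plane walks the list with relaxed loads because each step dereferences the pointer it just read. A standalone sketch of that pattern; struct cb, cb_insert_head() and cb_run_all() are illustrative, and the writer-side spinlock plus the rte_rcu_qsbr grace period used before freeing removed nodes are omitted.

#include <stddef.h>

#include <rte_stdatomic.h>

struct cb {
	RTE_ATOMIC(struct cb *) next;	/* next callback in the list */
	void (*fn)(void *arg);		/* user callback */
	void *arg;			/* user argument */
};

/* List head, as kept per queue pair in the real code. */
static RTE_ATOMIC(struct cb *) cb_list;

void
cb_insert_head(struct cb *new_cb)
{
	/* Caller has already set fn/arg; link the node to the current head. */
	rte_atomic_store_explicit(&new_cb->next,
			rte_atomic_load_explicit(&cb_list,
					rte_memory_order_relaxed),
			rte_memory_order_relaxed);
	/* Release: the stores to fn/arg/next complete before the node
	 * becomes visible to readers. */
	rte_atomic_store_explicit(&cb_list, new_cb, rte_memory_order_release);
}

void
cb_run_all(void *arg)
{
	struct cb *cb = rte_atomic_load_explicit(&cb_list,
			rte_memory_order_relaxed);

	while (cb != NULL) {
		cb->fn(arg);
		cb = rte_atomic_load_explicit(&cb->next,
				rte_memory_order_relaxed);
	}
}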


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 17/21] distributor: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (15 preceding siblings ...)
  2023-10-16 23:09 ` [PATCH 16/21] cryptodev: " Tyler Retzlaff
@ 2023-10-16 23:09 ` Tyler Retzlaff
  2023-10-16 23:09 ` [PATCH 18/21] ethdev: " Tyler Retzlaff
                   ` (5 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:09 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
the corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/distributor/distributor_private.h |  4 +--
 lib/distributor/rte_distributor.c     | 54 +++++++++++++++++------------------
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/lib/distributor/distributor_private.h b/lib/distributor/distributor_private.h
index 2f29343..dfeb9b5 100644
--- a/lib/distributor/distributor_private.h
+++ b/lib/distributor/distributor_private.h
@@ -113,12 +113,12 @@ enum rte_distributor_match_function {
  * There is a separate cacheline for returns in the burst API.
  */
 struct rte_distributor_buffer {
-	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+	volatile RTE_ATOMIC(int64_t) bufptr64[RTE_DIST_BURST_SIZE]
 		__rte_cache_aligned; /* <= outgoing to worker */
 
 	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
 
-	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+	volatile RTE_ATOMIC(int64_t) retptr64[RTE_DIST_BURST_SIZE]
 		__rte_cache_aligned; /* <= incoming from worker */
 
 	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
diff --git a/lib/distributor/rte_distributor.c b/lib/distributor/rte_distributor.c
index 5ca80dd..2ecb95c 100644
--- a/lib/distributor/rte_distributor.c
+++ b/lib/distributor/rte_distributor.c
@@ -38,7 +38,7 @@
 	struct rte_distributor_buffer *buf = &(d->bufs[worker_id]);
 	unsigned int i;
 
-	volatile int64_t *retptr64;
+	volatile RTE_ATOMIC(int64_t) *retptr64;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		rte_distributor_request_pkt_single(d->d_single,
@@ -50,7 +50,7 @@
 	/* Spin while handshake bits are set (scheduler clears it).
 	 * Sync with worker on GET_BUF flag.
 	 */
-	while (unlikely(__atomic_load_n(retptr64, __ATOMIC_ACQUIRE)
+	while (unlikely(rte_atomic_load_explicit(retptr64, rte_memory_order_acquire)
 			& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))) {
 		rte_pause();
 		uint64_t t = rte_rdtsc()+100;
@@ -78,8 +78,8 @@
 	 * line is ready for processing
 	 * Sync with distributor to release retptrs
 	 */
-	__atomic_store_n(retptr64, *retptr64 | RTE_DISTRIB_GET_BUF,
-			__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(retptr64, *retptr64 | RTE_DISTRIB_GET_BUF,
+			rte_memory_order_release);
 }
 
 int
@@ -102,7 +102,7 @@
 	 * RETURN_BUF is set when distributor must retrieve in-flight packets
 	 * Sync with distributor to acquire bufptrs
 	 */
-	if (__atomic_load_n(&(buf->bufptr64[0]), __ATOMIC_ACQUIRE)
+	if (rte_atomic_load_explicit(&(buf->bufptr64[0]), rte_memory_order_acquire)
 		& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))
 		return -1;
 
@@ -120,8 +120,8 @@
 	 * on the next cacheline while we're working.
 	 * Sync with distributor on GET_BUF flag. Release bufptrs.
 	 */
-	__atomic_store_n(&(buf->bufptr64[0]),
-		buf->bufptr64[0] | RTE_DISTRIB_GET_BUF, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->bufptr64[0]),
+		buf->bufptr64[0] | RTE_DISTRIB_GET_BUF, rte_memory_order_release);
 
 	return count;
 }
@@ -177,7 +177,7 @@
 	/* Spin while handshake bits are set (scheduler clears it).
 	 * Sync with worker on GET_BUF flag.
 	 */
-	while (unlikely(__atomic_load_n(&(buf->retptr64[0]), __ATOMIC_RELAXED)
+	while (unlikely(rte_atomic_load_explicit(&(buf->retptr64[0]), rte_memory_order_relaxed)
 			& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))) {
 		rte_pause();
 		uint64_t t = rte_rdtsc()+100;
@@ -187,7 +187,7 @@
 	}
 
 	/* Sync with distributor to acquire retptrs */
-	__atomic_thread_fence(__ATOMIC_ACQUIRE);
+	__atomic_thread_fence(rte_memory_order_acquire);
 	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
 		/* Switch off the return bit first */
 		buf->retptr64[i] = 0;
@@ -200,15 +200,15 @@
 	 * we won't read any mbufs from there even if GET_BUF is set.
 	 * This allows distributor to retrieve in-flight already sent packets.
 	 */
-	__atomic_fetch_or(&(buf->bufptr64[0]), RTE_DISTRIB_RETURN_BUF,
-		__ATOMIC_ACQ_REL);
+	rte_atomic_fetch_or_explicit(&(buf->bufptr64[0]), RTE_DISTRIB_RETURN_BUF,
+		rte_memory_order_acq_rel);
 
 	/* set the RETURN_BUF on retptr64 even if we got no returns.
 	 * Sync with distributor on RETURN_BUF flag. Release retptrs.
 	 * Notify distributor that we don't request more packets any more.
 	 */
-	__atomic_store_n(&(buf->retptr64[0]),
-		buf->retptr64[0] | RTE_DISTRIB_RETURN_BUF, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->retptr64[0]),
+		buf->retptr64[0] | RTE_DISTRIB_RETURN_BUF, rte_memory_order_release);
 
 	return 0;
 }
@@ -297,7 +297,7 @@
 	 * to worker which does not require new packets.
 	 * They must be retrieved and assigned to another worker.
 	 */
-	if (!(__atomic_load_n(&(buf->bufptr64[0]), __ATOMIC_ACQUIRE)
+	if (!(rte_atomic_load_explicit(&(buf->bufptr64[0]), rte_memory_order_acquire)
 		& RTE_DISTRIB_GET_BUF))
 		for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
 			if (buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)
@@ -310,8 +310,8 @@
 	 *     with new packets if worker will make a new request.
 	 * - clear RETURN_BUF to unlock reads on worker side.
 	 */
-	__atomic_store_n(&(buf->bufptr64[0]), RTE_DISTRIB_GET_BUF,
-		__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->bufptr64[0]), RTE_DISTRIB_GET_BUF,
+		rte_memory_order_release);
 
 	/* Collect backlog packets from worker */
 	for (i = 0; i < d->backlog[wkr].count; i++)
@@ -348,7 +348,7 @@
 	unsigned int i;
 
 	/* Sync on GET_BUF flag. Acquire retptrs. */
-	if (__atomic_load_n(&(buf->retptr64[0]), __ATOMIC_ACQUIRE)
+	if (rte_atomic_load_explicit(&(buf->retptr64[0]), rte_memory_order_acquire)
 		& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF)) {
 		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
 			if (buf->retptr64[i] & RTE_DISTRIB_VALID_BUF) {
@@ -379,7 +379,7 @@
 		/* Clear for the worker to populate with more returns.
 		 * Sync with distributor on GET_BUF flag. Release retptrs.
 		 */
-		__atomic_store_n(&(buf->retptr64[0]), 0, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&(buf->retptr64[0]), 0, rte_memory_order_release);
 	}
 	return count;
 }
@@ -404,7 +404,7 @@
 		return 0;
 
 	/* Sync with worker on GET_BUF flag */
-	while (!(__atomic_load_n(&(d->bufs[wkr].bufptr64[0]), __ATOMIC_ACQUIRE)
+	while (!(rte_atomic_load_explicit(&(d->bufs[wkr].bufptr64[0]), rte_memory_order_acquire)
 		& RTE_DISTRIB_GET_BUF)) {
 		handle_returns(d, wkr);
 		if (unlikely(!d->active[wkr]))
@@ -430,8 +430,8 @@
 	/* Clear the GET bit.
 	 * Sync with worker on GET_BUF flag. Release bufptrs.
 	 */
-	__atomic_store_n(&(buf->bufptr64[0]),
-		buf->bufptr64[0] & ~RTE_DISTRIB_GET_BUF, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->bufptr64[0]),
+		buf->bufptr64[0] & ~RTE_DISTRIB_GET_BUF, rte_memory_order_release);
 	return  buf->count;
 
 }
@@ -463,8 +463,8 @@
 		/* Flush out all non-full cache-lines to workers. */
 		for (wid = 0 ; wid < d->num_workers; wid++) {
 			/* Sync with worker on GET_BUF flag. */
-			if (__atomic_load_n(&(d->bufs[wid].bufptr64[0]),
-				__ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF) {
+			if (rte_atomic_load_explicit(&(d->bufs[wid].bufptr64[0]),
+				rte_memory_order_acquire) & RTE_DISTRIB_GET_BUF) {
 				d->bufs[wid].count = 0;
 				release(d, wid);
 				handle_returns(d, wid);
@@ -598,8 +598,8 @@
 	/* Flush out all non-full cache-lines to workers. */
 	for (wid = 0 ; wid < d->num_workers; wid++)
 		/* Sync with worker on GET_BUF flag. */
-		if ((__atomic_load_n(&(d->bufs[wid].bufptr64[0]),
-			__ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF)) {
+		if ((rte_atomic_load_explicit(&(d->bufs[wid].bufptr64[0]),
+			rte_memory_order_acquire) & RTE_DISTRIB_GET_BUF)) {
 			d->bufs[wid].count = 0;
 			release(d, wid);
 		}
@@ -700,8 +700,8 @@
 	/* throw away returns, so workers can exit */
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		/* Sync with worker. Release retptrs. */
-		__atomic_store_n(&(d->bufs[wkr].retptr64[0]), 0,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&(d->bufs[wkr].retptr64[0]), 0,
+				rte_memory_order_release);
 
 	d->returns.start = d->returns.count = 0;
 }
-- 
1.8.3.1
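
Every distributor hunk is a variation on one handshake: a flag word is polled with an acquire load and handed back with a release store, so whatever was written into the cache line before the store is visible to whoever observes the flag. A toy sketch of that acquire/release handoff; FLAG_READY and struct handoff are illustrative, not the RTE_DISTRIB_* flag layout.

#include <stdint.h>

#include <rte_pause.h>
#include <rte_stdatomic.h>

#define FLAG_READY (1ULL << 0)

struct handoff {
	volatile RTE_ATOMIC(uint64_t) ctrl;	/* handshake flag word */
	uint64_t payload;			/* data protected by ctrl */
};

void
producer_post(struct handoff *h, uint64_t data)
{
	/* Wait until the consumer has drained the previous item. */
	while (rte_atomic_load_explicit(&h->ctrl,
			rte_memory_order_acquire) & FLAG_READY)
		rte_pause();

	h->payload = data;
	/* Release: the payload store is ordered before the flag. */
	rte_atomic_store_explicit(&h->ctrl, FLAG_READY,
			rte_memory_order_release);
}

uint64_t
consumer_take(struct handoff *h)
{
	uint64_t data;

	while (!(rte_atomic_load_explicit(&h->ctrl,
			rte_memory_order_acquire) & FLAG_READY))
		rte_pause();

	data = h->payload;
	/* Hand the slot back to the producer. */
	rte_atomic_store_explicit(&h->ctrl, 0, rte_memory_order_release);
	return data;
}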


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 18/21] ethdev: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (16 preceding siblings ...)
  2023-10-16 23:09 ` [PATCH 17/21] distributor: " Tyler Retzlaff
@ 2023-10-16 23:09 ` Tyler Retzlaff
  2023-10-16 23:09 ` [PATCH 19/21] hash: " Tyler Retzlaff
                   ` (4 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:09 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
the corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/ethdev/ethdev_driver.h   | 16 ++++++++--------
 lib/ethdev/ethdev_private.c  |  6 +++---
 lib/ethdev/rte_ethdev.c      | 24 ++++++++++++------------
 lib/ethdev/rte_ethdev.h      | 16 ++++++++--------
 lib/ethdev/rte_ethdev_core.h |  2 +-
 5 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index deb23ad..b482cd1 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -30,7 +30,7 @@
  * queue on Rx and Tx.
  */
 struct rte_eth_rxtx_callback {
-	struct rte_eth_rxtx_callback *next;
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) next;
 	union{
 		rte_rx_callback_fn rx;
 		rte_tx_callback_fn tx;
@@ -80,12 +80,12 @@ struct rte_eth_dev {
 	 * User-supplied functions called from rx_burst to post-process
 	 * received packets before passing them to the user
 	 */
-	struct rte_eth_rxtx_callback *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 	/**
 	 * User-supplied functions called from tx_burst to pre-process
 	 * received packets before passing them to the driver for transmission
 	 */
-	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 
 	enum rte_eth_dev_state state; /**< Flag indicating the port state */
 	void *security_ctx; /**< Context for security ops */
@@ -1655,7 +1655,7 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 rte_eth_linkstatus_set(struct rte_eth_dev *dev,
 		       const struct rte_eth_link *new_link)
 {
-	uint64_t *dev_link = (uint64_t *)&(dev->data->dev_link);
+	RTE_ATOMIC(uint64_t) *dev_link = (uint64_t __rte_atomic *)&(dev->data->dev_link);
 	union {
 		uint64_t val64;
 		struct rte_eth_link link;
@@ -1663,8 +1663,8 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 
 	RTE_BUILD_BUG_ON(sizeof(*new_link) != sizeof(uint64_t));
 
-	orig.val64 = __atomic_exchange_n(dev_link, *(const uint64_t *)new_link,
-					__ATOMIC_SEQ_CST);
+	orig.val64 = rte_atomic_exchange_explicit(dev_link, *(const uint64_t *)new_link,
+					rte_memory_order_seq_cst);
 
 	return (orig.link.link_status == new_link->link_status) ? -1 : 0;
 }
@@ -1682,12 +1682,12 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
 		       struct rte_eth_link *link)
 {
-	uint64_t *src = (uint64_t *)&(dev->data->dev_link);
+	RTE_ATOMIC(uint64_t) *src = (uint64_t __rte_atomic *)&(dev->data->dev_link);
 	uint64_t *dst = (uint64_t *)link;
 
 	RTE_BUILD_BUG_ON(sizeof(*link) != sizeof(uint64_t));
 
-	*dst = __atomic_load_n(src, __ATOMIC_SEQ_CST);
+	*dst = rte_atomic_load_explicit(src, rte_memory_order_seq_cst);
 }
 
 /**
diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
index 7cc7f28..82e2568 100644
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -245,7 +245,7 @@ struct dummy_queue {
 void
 eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
 {
-	static void *dummy_data[RTE_MAX_QUEUES_PER_PORT];
+	static RTE_ATOMIC(void *) dummy_data[RTE_MAX_QUEUES_PER_PORT];
 	uintptr_t port_id = fpo - rte_eth_fp_ops;
 
 	per_port_queues[port_id].rx_warn_once = false;
@@ -278,10 +278,10 @@ struct dummy_queue {
 	fpo->recycle_rx_descriptors_refill = dev->recycle_rx_descriptors_refill;
 
 	fpo->rxq.data = dev->data->rx_queues;
-	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
+	fpo->rxq.clbk = (void * __rte_atomic *)(uintptr_t)dev->post_rx_burst_cbs;
 
 	fpo->txq.data = dev->data->tx_queues;
-	fpo->txq.clbk = (void **)(uintptr_t)dev->pre_tx_burst_cbs;
+	fpo->txq.clbk = (void * __rte_atomic *)(uintptr_t)dev->pre_tx_burst_cbs;
 }
 
 uint16_t
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 9dabcb5..af23ac0 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5654,9 +5654,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(
+		rte_atomic_store_explicit(
 			&rte_eth_devices[port_id].post_rx_burst_cbs[queue_id],
-			cb, __ATOMIC_RELEASE);
+			cb, rte_memory_order_release);
 
 	} else {
 		while (tail->next)
@@ -5664,7 +5664,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	}
 	rte_spinlock_unlock(&eth_dev_rx_cb_lock);
 
@@ -5704,9 +5704,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 	/* Stores to cb->fn, cb->param and cb->next should complete before
 	 * cb is visible to data plane threads.
 	 */
-	__atomic_store_n(
+	rte_atomic_store_explicit(
 		&rte_eth_devices[port_id].post_rx_burst_cbs[queue_id],
-		cb, __ATOMIC_RELEASE);
+		cb, rte_memory_order_release);
 	rte_spinlock_unlock(&eth_dev_rx_cb_lock);
 
 	rte_eth_trace_add_first_rx_callback(port_id, queue_id, fn, user_param,
@@ -5757,9 +5757,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(
+		rte_atomic_store_explicit(
 			&rte_eth_devices[port_id].pre_tx_burst_cbs[queue_id],
-			cb, __ATOMIC_RELEASE);
+			cb, rte_memory_order_release);
 
 	} else {
 		while (tail->next)
@@ -5767,7 +5767,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	}
 	rte_spinlock_unlock(&eth_dev_tx_cb_lock);
 
@@ -5791,7 +5791,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 	struct rte_eth_rxtx_callback *cb;
-	struct rte_eth_rxtx_callback **prev_cb;
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) *prev_cb;
 	int ret = -EINVAL;
 
 	rte_spinlock_lock(&eth_dev_rx_cb_lock);
@@ -5800,7 +5800,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		cb = *prev_cb;
 		if (cb == user_cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, cb->next, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, cb->next, rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
@@ -5828,7 +5828,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 	int ret = -EINVAL;
 	struct rte_eth_rxtx_callback *cb;
-	struct rte_eth_rxtx_callback **prev_cb;
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) *prev_cb;
 
 	rte_spinlock_lock(&eth_dev_tx_cb_lock);
 	prev_cb = &dev->pre_tx_burst_cbs[queue_id];
@@ -5836,7 +5836,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		cb = *prev_cb;
 		if (cb == user_cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, cb->next, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, cb->next, rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index f949dfc..ec48b24 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6018,14 +6018,14 @@ uint16_t rte_eth_call_rx_callbacks(uint16_t port_id, uint16_t queue_id,
 	{
 		void *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
-		cb = __atomic_load_n((void **)&p->rxq.clbk[queue_id],
-				__ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&p->rxq.clbk[queue_id],
+				rte_memory_order_relaxed);
 		if (unlikely(cb != NULL))
 			nb_rx = rte_eth_call_rx_callbacks(port_id, queue_id,
 					rx_pkts, nb_rx, nb_pkts, cb);
@@ -6355,14 +6355,14 @@ uint16_t rte_eth_call_tx_callbacks(uint16_t port_id, uint16_t queue_id,
 	{
 		void *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
-		cb = __atomic_load_n((void **)&p->txq.clbk[queue_id],
-				__ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&p->txq.clbk[queue_id],
+				rte_memory_order_relaxed);
 		if (unlikely(cb != NULL))
 			nb_pkts = rte_eth_call_tx_callbacks(port_id, queue_id,
 					tx_pkts, nb_pkts, cb);
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index 32f5f73..4bfaf79 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -71,7 +71,7 @@ struct rte_ethdev_qdata {
 	/** points to array of internal queue data pointers */
 	void **data;
 	/** points to array of queue callback data pointers */
-	void **clbk;
+	RTE_ATOMIC(void *) *clbk;
 };
 
 /**
-- 
1.8.3.1
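
rte_eth_linkstatus_set()/get() keep their existing trick of treating the whole 8-byte rte_eth_link bitfield struct as a single uint64_t; only the spelling of the exchange and load changes. A sketch of that idiom with a stand-in 8-byte struct; link_status, dev_link and link_set() are illustrative, not the ethdev definitions.

#include <stdint.h>

#include <rte_common.h>
#include <rte_stdatomic.h>

/* Stand-in for the 8-byte link descriptor, aligned so it can be
 * exchanged as a single uint64_t. */
struct link_status {
	uint32_t speed;
	uint16_t duplex;
	uint16_t up;
} __rte_aligned(8);

static struct link_status dev_link;	/* lives in shared device data */

/* Return 0 if the link state changed, -1 otherwise (as ethdev does). */
int
link_set(const struct link_status *new_link)
{
	RTE_ATOMIC(uint64_t) *dst = (uint64_t __rte_atomic *)&dev_link;
	union {
		uint64_t val64;
		struct link_status link;
	} orig;

	RTE_BUILD_BUG_ON(sizeof(*new_link) != sizeof(uint64_t));

	orig.val64 = rte_atomic_exchange_explicit(dst,
			*(const uint64_t *)new_link,
			rte_memory_order_seq_cst);

	return (orig.link.up == new_link->up) ? -1 : 0;
}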


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 19/21] hash: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (17 preceding siblings ...)
  2023-10-16 23:09 ` [PATCH 18/21] ethdev: " Tyler Retzlaff
@ 2023-10-16 23:09 ` Tyler Retzlaff
  2023-10-16 23:09 ` [PATCH 20/21] timer: " Tyler Retzlaff
                   ` (3 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:09 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
the corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/hash/rte_cuckoo_hash.c | 116 ++++++++++++++++++++++-----------------------
 lib/hash/rte_cuckoo_hash.h |   6 +--
 2 files changed, 61 insertions(+), 61 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 19b23f2..b2cf60d 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -149,7 +149,7 @@ struct rte_hash *
 	unsigned int writer_takes_lock = 0;
 	unsigned int no_free_on_del = 0;
 	uint32_t *ext_bkt_to_free = NULL;
-	uint32_t *tbl_chng_cnt = NULL;
+	RTE_ATOMIC(uint32_t) *tbl_chng_cnt = NULL;
 	struct lcore_cache *local_free_slots = NULL;
 	unsigned int readwrite_concur_lf_support = 0;
 	uint32_t i;
@@ -713,9 +713,9 @@ struct rte_hash *
 				 * variable. Release the application data
 				 * to the readers.
 				 */
-				__atomic_store_n(&k->pdata,
+				rte_atomic_store_explicit(&k->pdata,
 					data,
-					__ATOMIC_RELEASE);
+					rte_memory_order_release);
 				/*
 				 * Return index where key is stored,
 				 * subtracting the first dummy index
@@ -776,9 +776,9 @@ struct rte_hash *
 			 * key_idx is the guard variable for signature
 			 * and key.
 			 */
-			__atomic_store_n(&prim_bkt->key_idx[i],
+			rte_atomic_store_explicit(&prim_bkt->key_idx[i],
 					 new_idx,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			break;
 		}
 	}
@@ -851,9 +851,9 @@ struct rte_hash *
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
 			/* revert it to empty, otherwise duplicated keys */
-			__atomic_store_n(&curr_bkt->key_idx[curr_slot],
+			rte_atomic_store_explicit(&curr_bkt->key_idx[curr_slot],
 				EMPTY_SLOT,
-				__ATOMIC_RELEASE);
+				rte_memory_order_release);
 			__hash_rw_writer_unlock(h);
 			return -1;
 		}
@@ -865,13 +865,13 @@ struct rte_hash *
 			 * Since there is one writer, load acquires on
 			 * tbl_chng_cnt are not required.
 			 */
-			__atomic_store_n(h->tbl_chng_cnt,
+			rte_atomic_store_explicit(h->tbl_chng_cnt,
 					 *h->tbl_chng_cnt + 1,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			/* The store to sig_current should not
 			 * move above the store to tbl_chng_cnt.
 			 */
-			__atomic_thread_fence(__ATOMIC_RELEASE);
+			__atomic_thread_fence(rte_memory_order_release);
 		}
 
 		/* Need to swap current/alt sig to allow later
@@ -881,9 +881,9 @@ struct rte_hash *
 		curr_bkt->sig_current[curr_slot] =
 			prev_bkt->sig_current[prev_slot];
 		/* Release the updated bucket entry */
-		__atomic_store_n(&curr_bkt->key_idx[curr_slot],
+		rte_atomic_store_explicit(&curr_bkt->key_idx[curr_slot],
 			prev_bkt->key_idx[prev_slot],
-			__ATOMIC_RELEASE);
+			rte_memory_order_release);
 
 		curr_slot = prev_slot;
 		curr_node = prev_node;
@@ -897,20 +897,20 @@ struct rte_hash *
 		 * Since there is one writer, load acquires on
 		 * tbl_chng_cnt are not required.
 		 */
-		__atomic_store_n(h->tbl_chng_cnt,
+		rte_atomic_store_explicit(h->tbl_chng_cnt,
 				 *h->tbl_chng_cnt + 1,
-				 __ATOMIC_RELEASE);
+				 rte_memory_order_release);
 		/* The store to sig_current should not
 		 * move above the store to tbl_chng_cnt.
 		 */
-		__atomic_thread_fence(__ATOMIC_RELEASE);
+		__atomic_thread_fence(rte_memory_order_release);
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
 	/* Release the new bucket entry */
-	__atomic_store_n(&curr_bkt->key_idx[curr_slot],
+	rte_atomic_store_explicit(&curr_bkt->key_idx[curr_slot],
 			 new_idx,
-			 __ATOMIC_RELEASE);
+			 rte_memory_order_release);
 
 	__hash_rw_writer_unlock(h);
 
@@ -1076,9 +1076,9 @@ struct rte_hash *
 	 * not leak after the store of pdata in the key store. i.e. pdata is
 	 * the guard variable. Release the application data to the readers.
 	 */
-	__atomic_store_n(&new_k->pdata,
+	rte_atomic_store_explicit(&new_k->pdata,
 		data,
-		__ATOMIC_RELEASE);
+		rte_memory_order_release);
 	/* Copy key */
 	memcpy(new_k->key, key, h->key_len);
 
@@ -1149,9 +1149,9 @@ struct rte_hash *
 				 * key_idx is the guard variable for signature
 				 * and key.
 				 */
-				__atomic_store_n(&cur_bkt->key_idx[i],
+				rte_atomic_store_explicit(&cur_bkt->key_idx[i],
 						 slot_id,
-						 __ATOMIC_RELEASE);
+						 rte_memory_order_release);
 				__hash_rw_writer_unlock(h);
 				return slot_id - 1;
 			}
@@ -1185,9 +1185,9 @@ struct rte_hash *
 	 * the store to key_idx. i.e. key_idx is the guard variable
 	 * for signature and key.
 	 */
-	__atomic_store_n(&(h->buckets_ext[ext_bkt_id - 1]).key_idx[0],
+	rte_atomic_store_explicit(&(h->buckets_ext[ext_bkt_id - 1]).key_idx[0],
 			 slot_id,
-			 __ATOMIC_RELEASE);
+			 rte_memory_order_release);
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
 	last->next = &h->buckets_ext[ext_bkt_id - 1];
@@ -1290,17 +1290,17 @@ struct rte_hash *
 		 * key comparison will ensure that the lookup fails.
 		 */
 		if (bkt->sig_current[i] == sig) {
-			key_idx = __atomic_load_n(&bkt->key_idx[i],
-					  __ATOMIC_ACQUIRE);
+			key_idx = rte_atomic_load_explicit(&bkt->key_idx[i],
+					  rte_memory_order_acquire);
 			if (key_idx != EMPTY_SLOT) {
 				k = (struct rte_hash_key *) ((char *)keys +
 						key_idx * h->key_entry_size);
 
 				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
 					if (data != NULL) {
-						*data = __atomic_load_n(
+						*data = rte_atomic_load_explicit(
 							&k->pdata,
-							__ATOMIC_ACQUIRE);
+							rte_memory_order_acquire);
 					}
 					/*
 					 * Return index where key is stored,
@@ -1374,8 +1374,8 @@ struct rte_hash *
 		 * starts. Acquire semantics will make sure that
 		 * loads in search_one_bucket are not hoisted.
 		 */
-		cnt_b = __atomic_load_n(h->tbl_chng_cnt,
-				__ATOMIC_ACQUIRE);
+		cnt_b = rte_atomic_load_explicit(h->tbl_chng_cnt,
+				rte_memory_order_acquire);
 
 		/* Check if key is in primary location */
 		bkt = &h->buckets[prim_bucket_idx];
@@ -1396,7 +1396,7 @@ struct rte_hash *
 		/* The loads of sig_current in search_one_bucket
 		 * should not move below the load from tbl_chng_cnt.
 		 */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 		/* Re-read the table change counter to check if the
 		 * table has changed during search. If yes, re-do
 		 * the search.
@@ -1405,8 +1405,8 @@ struct rte_hash *
 		 * and key index in secondary bucket will make sure
 		 * that it does not get hoisted.
 		 */
-		cnt_a = __atomic_load_n(h->tbl_chng_cnt,
-					__ATOMIC_ACQUIRE);
+		cnt_a = rte_atomic_load_explicit(h->tbl_chng_cnt,
+					rte_memory_order_acquire);
 	} while (cnt_b != cnt_a);
 
 	return -ENOENT;
@@ -1611,26 +1611,26 @@ struct rte_hash *
 	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
-			__atomic_store_n(&cur_bkt->key_idx[pos],
+			rte_atomic_store_explicit(&cur_bkt->key_idx[pos],
 					 last_bkt->key_idx[i],
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			if (h->readwrite_concur_lf_support) {
 				/* Inform the readers that the table has changed
 				 * Since there is one writer, load acquire on
 				 * tbl_chng_cnt is not required.
 				 */
-				__atomic_store_n(h->tbl_chng_cnt,
+				rte_atomic_store_explicit(h->tbl_chng_cnt,
 					 *h->tbl_chng_cnt + 1,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 				/* The store to sig_current should
 				 * not move above the store to tbl_chng_cnt.
 				 */
-				__atomic_thread_fence(__ATOMIC_RELEASE);
+				__atomic_thread_fence(rte_memory_order_release);
 			}
 			last_bkt->sig_current[i] = NULL_SIGNATURE;
-			__atomic_store_n(&last_bkt->key_idx[i],
+			rte_atomic_store_explicit(&last_bkt->key_idx[i],
 					 EMPTY_SLOT,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			return;
 		}
 	}
@@ -1650,8 +1650,8 @@ struct rte_hash *
 
 	/* Check if key is in bucket */
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		key_idx = __atomic_load_n(&bkt->key_idx[i],
-					  __ATOMIC_ACQUIRE);
+		key_idx = rte_atomic_load_explicit(&bkt->key_idx[i],
+					  rte_memory_order_acquire);
 		if (bkt->sig_current[i] == sig && key_idx != EMPTY_SLOT) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					key_idx * h->key_entry_size);
@@ -1663,9 +1663,9 @@ struct rte_hash *
 				if (!h->no_free_on_del)
 					remove_entry(h, bkt, i);
 
-				__atomic_store_n(&bkt->key_idx[i],
+				rte_atomic_store_explicit(&bkt->key_idx[i],
 						 EMPTY_SLOT,
-						 __ATOMIC_RELEASE);
+						 rte_memory_order_release);
 
 				*pos = i;
 				/*
@@ -2077,8 +2077,8 @@ struct rte_hash *
 		 * starts. Acquire semantics will make sure that
 		 * loads in compare_signatures are not hoisted.
 		 */
-		cnt_b = __atomic_load_n(h->tbl_chng_cnt,
-					__ATOMIC_ACQUIRE);
+		cnt_b = rte_atomic_load_explicit(h->tbl_chng_cnt,
+					rte_memory_order_acquire);
 
 		/* Compare signatures and prefetch key slot of first hit */
 		for (i = 0; i < num_keys; i++) {
@@ -2121,9 +2121,9 @@ struct rte_hash *
 						__builtin_ctzl(prim_hitmask[i])
 						>> 1;
 				uint32_t key_idx =
-				__atomic_load_n(
+				rte_atomic_load_explicit(
 					&primary_bkt[i]->key_idx[hit_index],
-					__ATOMIC_ACQUIRE);
+					rte_memory_order_acquire);
 				const struct rte_hash_key *key_slot =
 					(const struct rte_hash_key *)(
 					(const char *)h->key_store +
@@ -2137,9 +2137,9 @@ struct rte_hash *
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = __atomic_load_n(
+						data[i] = rte_atomic_load_explicit(
 							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
+							rte_memory_order_acquire);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
@@ -2153,9 +2153,9 @@ struct rte_hash *
 						__builtin_ctzl(sec_hitmask[i])
 						>> 1;
 				uint32_t key_idx =
-				__atomic_load_n(
+				rte_atomic_load_explicit(
 					&secondary_bkt[i]->key_idx[hit_index],
-					__ATOMIC_ACQUIRE);
+					rte_memory_order_acquire);
 				const struct rte_hash_key *key_slot =
 					(const struct rte_hash_key *)(
 					(const char *)h->key_store +
@@ -2170,9 +2170,9 @@ struct rte_hash *
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = __atomic_load_n(
+						data[i] = rte_atomic_load_explicit(
 							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
+							rte_memory_order_acquire);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
@@ -2216,7 +2216,7 @@ struct rte_hash *
 		/* The loads of sig_current in compare_signatures
 		 * should not move below the load from tbl_chng_cnt.
 		 */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 		/* Re-read the table change counter to check if the
 		 * table has changed during search. If yes, re-do
 		 * the search.
@@ -2225,8 +2225,8 @@ struct rte_hash *
 		 * key index will make sure that it does not get
 		 * hoisted.
 		 */
-		cnt_a = __atomic_load_n(h->tbl_chng_cnt,
-					__ATOMIC_ACQUIRE);
+		cnt_a = rte_atomic_load_explicit(h->tbl_chng_cnt,
+					rte_memory_order_acquire);
 	} while (cnt_b != cnt_a);
 
 	if (hit_mask != NULL)
@@ -2498,8 +2498,8 @@ struct rte_hash *
 	idx = *next % RTE_HASH_BUCKET_ENTRIES;
 
 	/* If current position is empty, go to the next one */
-	while ((position = __atomic_load_n(&h->buckets[bucket_idx].key_idx[idx],
-					__ATOMIC_ACQUIRE)) == EMPTY_SLOT) {
+	while ((position = rte_atomic_load_explicit(&h->buckets[bucket_idx].key_idx[idx],
+					rte_memory_order_acquire)) == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
 		if (*next == total_entries_main)
diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
index eb2644f..f7afc4d 100644
--- a/lib/hash/rte_cuckoo_hash.h
+++ b/lib/hash/rte_cuckoo_hash.h
@@ -137,7 +137,7 @@ struct lcore_cache {
 struct rte_hash_key {
 	union {
 		uintptr_t idata;
-		void *pdata;
+		RTE_ATOMIC(void *) pdata;
 	};
 	/* Variable key size */
 	char key[0];
@@ -155,7 +155,7 @@ enum rte_hash_sig_compare_function {
 struct rte_hash_bucket {
 	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
-	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
+	RTE_ATOMIC(uint32_t) key_idx[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
@@ -229,7 +229,7 @@ struct rte_hash {
 	 * is piggy-backed to freeing of the key index.
 	 */
 	uint32_t *ext_bkt_to_free;
-	uint32_t *tbl_chng_cnt;
+	RTE_ATOMIC(uint32_t) *tbl_chng_cnt;
 	/**< Indicates if the hash table changed from last read. */
 } __rte_cache_aligned;
 
-- 
1.8.3.1
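
For reference, a minimal sketch of the guard-variable pattern behind the
RTE_ATOMIC() annotations on key_idx, pdata and tbl_chng_cnt above; the names
(example_slot, example_publish, example_lookup) are illustrative and not part
of the patch:

    #include <stddef.h>
    #include <stdint.h>
    #include <rte_stdatomic.h>

    struct example_slot {
            void *pdata;                    /* payload, guarded by key_idx */
            RTE_ATOMIC(uint32_t) key_idx;   /* 0 means the slot is empty */
    };

    /* Writer: fill the payload first, then publish the index with a
     * release store so a reader that observes the index also observes
     * the payload.
     */
    static inline void
    example_publish(struct example_slot *s, void *pdata, uint32_t idx)
    {
            s->pdata = pdata;
            rte_atomic_store_explicit(&s->key_idx, idx,
                    rte_memory_order_release);
    }

    /* Lock-free reader: acquire-load the index; a non-zero value means
     * the matching release store, and everything before it, is visible.
     */
    static inline void *
    example_lookup(struct example_slot *s)
    {
            uint32_t idx = rte_atomic_load_explicit(&s->key_idx,
                            rte_memory_order_acquire);

            return idx != 0 ? s->pdata : NULL;
    }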


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 20/21] timer: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (18 preceding siblings ...)
  2023-10-16 23:09 ` [PATCH 19/21] hash: " Tyler Retzlaff
@ 2023-10-16 23:09 ` Tyler Retzlaff
  2023-10-16 23:09 ` [PATCH 21/21] ring: " Tyler Retzlaff
                   ` (2 subsequent siblings)
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:09 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
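
For reference, a minimal sketch of the status-word update pattern this patch
converts; example_status and example_acquire_config are illustrative names,
not the real union rte_timer_status. The (uint32_t *)(uintptr_t) cast in the
patch is only there because the local prev_status copy inherits the
RTE_ATOMIC() qualifier on its u32 member while the expected argument wants a
plain pointer; a plain local, as in the sketch, needs no cast:

    #include <stdint.h>
    #include <rte_pause.h>
    #include <rte_stdatomic.h>

    struct example_status {
            RTE_ATOMIC(uint32_t) u32;       /* packed state + owner word */
    };

    /* Spin until the status word is moved to config_val; returns the
     * value that was replaced.  The real code also inspects prev and
     * bails out when the timer is running on another core.
     */
    static inline uint32_t
    example_acquire_config(struct example_status *s, uint32_t config_val)
    {
            uint32_t prev = rte_atomic_load_explicit(&s->u32,
                            rte_memory_order_relaxed);

            /* Acquire on success pairs with the release store that later
             * publishes the new status; relaxed on failure just refreshes
             * prev with the observed value.
             */
            while (!rte_atomic_compare_exchange_strong_explicit(&s->u32,
                            &prev, config_val,
                            rte_memory_order_acquire,
                            rte_memory_order_relaxed))
                    rte_pause();

            return prev;
    }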

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/timer/rte_timer.c | 50 +++++++++++++++++++++++++-------------------------
 lib/timer/rte_timer.h |  6 +++---
 2 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/lib/timer/rte_timer.c b/lib/timer/rte_timer.c
index 85d6757..53ed221 100644
--- a/lib/timer/rte_timer.c
+++ b/lib/timer/rte_timer.c
@@ -210,7 +210,7 @@ struct rte_timer_data {
 
 	status.state = RTE_TIMER_STOP;
 	status.owner = RTE_TIMER_NO_OWNER;
-	__atomic_store_n(&tim->status.u32, status.u32, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&tim->status.u32, status.u32, rte_memory_order_relaxed);
 }
 
 /*
@@ -231,7 +231,7 @@ struct rte_timer_data {
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	prev_status.u32 = __atomic_load_n(&tim->status.u32, __ATOMIC_RELAXED);
+	prev_status.u32 = rte_atomic_load_explicit(&tim->status.u32, rte_memory_order_relaxed);
 
 	while (success == 0) {
 		/* timer is running on another core
@@ -254,11 +254,11 @@ struct rte_timer_data {
 		 * timer is in CONFIG state, the state cannot be changed
 		 * by other threads. So, we should use ACQUIRE here.
 		 */
-		success = __atomic_compare_exchange_n(&tim->status.u32,
-					      &prev_status.u32,
-					      status.u32, 0,
-					      __ATOMIC_ACQUIRE,
-					      __ATOMIC_RELAXED);
+		success = rte_atomic_compare_exchange_strong_explicit(&tim->status.u32,
+					      (uint32_t *)(uintptr_t)&prev_status.u32,
+					      status.u32,
+					      rte_memory_order_acquire,
+					      rte_memory_order_relaxed);
 	}
 
 	ret_prev_status->u32 = prev_status.u32;
@@ -277,7 +277,7 @@ struct rte_timer_data {
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as running */
-	prev_status.u32 = __atomic_load_n(&tim->status.u32, __ATOMIC_RELAXED);
+	prev_status.u32 = rte_atomic_load_explicit(&tim->status.u32, rte_memory_order_relaxed);
 
 	while (success == 0) {
 		/* timer is not pending anymore */
@@ -293,11 +293,11 @@ struct rte_timer_data {
 		 * timer is in RUNNING state, the state cannot be changed
 		 * by other threads. So, we should use ACQUIRE here.
 		 */
-		success = __atomic_compare_exchange_n(&tim->status.u32,
-					      &prev_status.u32,
-					      status.u32, 0,
-					      __ATOMIC_ACQUIRE,
-					      __ATOMIC_RELAXED);
+		success = rte_atomic_compare_exchange_strong_explicit(&tim->status.u32,
+					      (uint32_t *)(uintptr_t)&prev_status.u32,
+					      status.u32,
+					      rte_memory_order_acquire,
+					      rte_memory_order_relaxed);
 	}
 
 	return 0;
@@ -530,7 +530,7 @@ struct rte_timer_data {
 	/* The "RELEASE" ordering guarantees the memory operations above
 	 * the status update are observed before the update by all threads
 	 */
-	__atomic_store_n(&tim->status.u32, status.u32, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->status.u32, status.u32, rte_memory_order_release);
 
 	if (tim_lcore != lcore_id || !local_is_locked)
 		rte_spinlock_unlock(&priv_timer[tim_lcore].list_lock);
@@ -612,7 +612,7 @@ struct rte_timer_data {
 	/* The "RELEASE" ordering guarantees the memory operations above
 	 * the status update are observed before the update by all threads
 	 */
-	__atomic_store_n(&tim->status.u32, status.u32, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->status.u32, status.u32, rte_memory_order_release);
 
 	return 0;
 }
@@ -646,8 +646,8 @@ struct rte_timer_data {
 int
 rte_timer_pending(struct rte_timer *tim)
 {
-	return __atomic_load_n(&tim->status.state,
-				__ATOMIC_RELAXED) == RTE_TIMER_PENDING;
+	return rte_atomic_load_explicit(&tim->status.state,
+				rte_memory_order_relaxed) == RTE_TIMER_PENDING;
 }
 
 /* must be called periodically, run all timer that expired */
@@ -753,8 +753,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 		}
 		else {
 			/* keep it in list and mark timer as pending */
@@ -766,8 +766,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 			__rte_timer_reset(tim, tim->expire + tim->period,
 				tim->period, lcore_id, tim->f, tim->arg, 1,
 				timer_data);
@@ -941,8 +941,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 		} else {
 			/* keep it in list and mark timer as pending */
 			rte_spinlock_lock(
@@ -954,8 +954,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 			__rte_timer_reset(tim, tim->expire + tim->period,
 				tim->period, this_lcore, tim->f, tim->arg, 1,
 				data);
diff --git a/lib/timer/rte_timer.h b/lib/timer/rte_timer.h
index d3927d5..a35bc08 100644
--- a/lib/timer/rte_timer.h
+++ b/lib/timer/rte_timer.h
@@ -65,10 +65,10 @@ enum rte_timer_type {
  */
 union rte_timer_status {
 	struct {
-		uint16_t state;  /**< Stop, pending, running, config. */
-		int16_t owner;   /**< The lcore that owns the timer. */
+		RTE_ATOMIC(uint16_t) state;  /**< Stop, pending, running, config. */
+		RTE_ATOMIC(int16_t) owner;   /**< The lcore that owns the timer. */
 	};
-	uint32_t u32;            /**< To atomic-set status + owner. */
+	RTE_ATOMIC(uint32_t) u32;            /**< To atomic-set status + owner. */
 };
 
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH 21/21] ring: use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (19 preceding siblings ...)
  2023-10-16 23:09 ` [PATCH 20/21] timer: " Tyler Retzlaff
@ 2023-10-16 23:09 ` Tyler Retzlaff
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
  22 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-16 23:09 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
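
For reference, a minimal sketch of the tail publication pattern the ring code
relies on; example_headtail and the helpers are illustrative names, not the
ring structures themselves:

    #include <stdint.h>
    #include <rte_stdatomic.h>

    struct example_headtail {
            uint32_t head;                          /* updated by CAS */
            volatile RTE_ATOMIC(uint32_t) tail;     /* published position */
    };

    /* Producer: copy the elements into the ring first, then release the
     * tail so the copies cannot be observed after the new tail value.
     */
    static inline void
    example_publish_tail(struct example_headtail *ht, uint32_t new_tail)
    {
            rte_atomic_store_explicit(&ht->tail, new_tail,
                    rte_memory_order_release);
    }

    /* Consumer: the acquire load pairs with the store above, so once the
     * new tail is seen the elements behind it are visible too.
     */
    static inline uint32_t
    example_read_tail(struct example_headtail *ht)
    {
            return rte_atomic_load_explicit(&ht->tail,
                    rte_memory_order_acquire);
    }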

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 drivers/net/mlx5/mlx5_hws_cnt.h   |  2 +-
 lib/ring/rte_ring_c11_pvt.h       | 33 +++++++++++++++++----------------
 lib/ring/rte_ring_core.h          | 10 +++++-----
 lib/ring/rte_ring_generic_pvt.h   |  3 ++-
 lib/ring/rte_ring_hts_elem_pvt.h  | 22 ++++++++++++----------
 lib/ring/rte_ring_peek_elem_pvt.h |  6 +++---
 lib/ring/rte_ring_rts_elem_pvt.h  | 27 ++++++++++++++-------------
 7 files changed, 54 insertions(+), 49 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index f462665..cc9ac10 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
 	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
 			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
 	/* Update tail */
-	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&r->prod.tail, revert2head, rte_memory_order_release);
 	return n;
 }
 
diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index f895950..f8be538 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -22,9 +22,10 @@
 	 * we need to wait for them to complete
 	 */
 	if (!single)
-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
+			rte_memory_order_relaxed);
 
-	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
 }
 
 /**
@@ -61,19 +62,19 @@
 	unsigned int max = n;
 	int success;
 
-	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
+	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
 
 		/* Ensure the head is read before tail */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
-		cons_tail = __atomic_load_n(&r->cons.tail,
-					__ATOMIC_ACQUIRE);
+		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
+					rte_memory_order_acquire);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
@@ -95,10 +96,10 @@
 			r->prod.head = *new_head, success = 1;
 		else
 			/* on failure, *old_head is updated */
-			success = __atomic_compare_exchange_n(&r->prod.head,
+			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
 					old_head, *new_head,
-					0, __ATOMIC_RELAXED,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed,
+					rte_memory_order_relaxed);
 	} while (unlikely(success == 0));
 	return n;
 }
@@ -137,19 +138,19 @@
 	int success;
 
 	/* move cons.head atomically */
-	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
+	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
 	do {
 		/* Restore n as it may change every loop */
 		n = max;
 
 		/* Ensure the head is read before tail */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 
 		/* this load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
-		prod_tail = __atomic_load_n(&r->prod.tail,
-					__ATOMIC_ACQUIRE);
+		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
+					rte_memory_order_acquire);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
@@ -170,10 +171,10 @@
 			r->cons.head = *new_head, success = 1;
 		else
 			/* on failure, *old_head will be updated */
-			success = __atomic_compare_exchange_n(&r->cons.head,
+			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
 							old_head, *new_head,
-							0, __ATOMIC_RELAXED,
-							__ATOMIC_RELAXED);
+							rte_memory_order_relaxed,
+							rte_memory_order_relaxed);
 	} while (unlikely(success == 0));
 	return n;
 }
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 327fdcf..7a2b577 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -67,7 +67,7 @@ enum rte_ring_sync_type {
  */
 struct rte_ring_headtail {
 	volatile uint32_t head;      /**< prod/consumer head. */
-	volatile uint32_t tail;      /**< prod/consumer tail. */
+	volatile RTE_ATOMIC(uint32_t) tail;      /**< prod/consumer tail. */
 	union {
 		/** sync type of prod/cons */
 		enum rte_ring_sync_type sync_type;
@@ -78,7 +78,7 @@ struct rte_ring_headtail {
 
 union __rte_ring_rts_poscnt {
 	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
-	uint64_t raw __rte_aligned(8);
+	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
 	struct {
 		uint32_t cnt; /**< head/tail reference counter */
 		uint32_t pos; /**< head/tail position */
@@ -94,10 +94,10 @@ struct rte_ring_rts_headtail {
 
 union __rte_ring_hts_pos {
 	/** raw 8B value to read/write *head* and *tail* as one atomic op */
-	uint64_t raw __rte_aligned(8);
+	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
 	struct {
-		uint32_t head; /**< head position */
-		uint32_t tail; /**< tail position */
+		RTE_ATOMIC(uint32_t) head; /**< head position */
+		RTE_ATOMIC(uint32_t) tail; /**< tail position */
 	} pos;
 };
 
diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
index 5acb6e5..ffb3654 100644
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_generic_pvt.h
@@ -23,7 +23,8 @@
 	 * we need to wait for them to complete
 	 */
 	if (!single)
-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
+			rte_memory_order_relaxed);
 
 	ht->tail = new_val;
 }
diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
index a8678d3..91f5eec 100644
--- a/lib/ring/rte_ring_hts_elem_pvt.h
+++ b/lib/ring/rte_ring_hts_elem_pvt.h
@@ -10,6 +10,8 @@
 #ifndef _RTE_RING_HTS_ELEM_PVT_H_
 #define _RTE_RING_HTS_ELEM_PVT_H_
 
+#include <rte_stdatomic.h>
+
 /**
  * @file rte_ring_hts_elem_pvt.h
  * It is not recommended to include this file directly,
@@ -30,7 +32,7 @@
 	RTE_SET_USED(enqueue);
 
 	tail = old_tail + num;
-	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->ht.pos.tail, tail, rte_memory_order_release);
 }
 
 /**
@@ -44,7 +46,7 @@
 {
 	while (p->pos.head != p->pos.tail) {
 		rte_pause();
-		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+		p->raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_acquire);
 	}
 }
 
@@ -61,7 +63,7 @@
 
 	const uint32_t capacity = r->capacity;
 
-	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
+	op.raw = rte_atomic_load_explicit(&r->hts_prod.ht.raw, rte_memory_order_acquire);
 
 	do {
 		/* Reset n to the initial burst count */
@@ -98,9 +100,9 @@
 	 *  - OOO reads of cons tail value
 	 *  - OOO copy of elems from the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
-			&op.raw, np.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_prod.ht.raw,
+			(uint64_t *)(uintptr_t)&op.raw, np.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = op.pos.head;
 	return n;
@@ -117,7 +119,7 @@
 	uint32_t n;
 	union __rte_ring_hts_pos np, op;
 
-	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
+	op.raw = rte_atomic_load_explicit(&r->hts_cons.ht.raw, rte_memory_order_acquire);
 
 	/* move cons.head atomically */
 	do {
@@ -153,9 +155,9 @@
 	 *  - OOO reads of prod tail value
 	 *  - OOO copy of elems from the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
-			&op.raw, np.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_cons.ht.raw,
+			(uint64_t *)(uintptr_t)&op.raw, np.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = op.pos.head;
 	return n;
diff --git a/lib/ring/rte_ring_peek_elem_pvt.h b/lib/ring/rte_ring_peek_elem_pvt.h
index bb0a7d5..b5f0822 100644
--- a/lib/ring/rte_ring_peek_elem_pvt.h
+++ b/lib/ring/rte_ring_peek_elem_pvt.h
@@ -59,7 +59,7 @@
 
 	pos = tail + num;
 	ht->head = pos;
-	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->tail, pos, rte_memory_order_release);
 }
 
 /**
@@ -78,7 +78,7 @@
 	uint32_t n;
 	union __rte_ring_hts_pos p;
 
-	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
+	p.raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_relaxed);
 	n = p.pos.head - p.pos.tail;
 
 	RTE_ASSERT(n >= num);
@@ -104,7 +104,7 @@
 	p.pos.head = tail + num;
 	p.pos.tail = p.pos.head;
 
-	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->ht.raw, p.raw, rte_memory_order_release);
 }
 
 /**
diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
index 7164213..1226503 100644
--- a/lib/ring/rte_ring_rts_elem_pvt.h
+++ b/lib/ring/rte_ring_rts_elem_pvt.h
@@ -31,18 +31,19 @@
 	 * might preceded us, then don't update tail with new value.
 	 */
 
-	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
+	ot.raw = rte_atomic_load_explicit(&ht->tail.raw, rte_memory_order_acquire);
 
 	do {
 		/* on 32-bit systems we have to do atomic read here */
-		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
+		h.raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_relaxed);
 
 		nt.raw = ot.raw;
 		if (++nt.val.cnt == h.val.cnt)
 			nt.val.pos = h.val.pos;
 
-	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
-			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&ht->tail.raw,
+			(uint64_t *)(uintptr_t)&ot.raw, nt.raw,
+			rte_memory_order_release, rte_memory_order_acquire) == 0);
 }
 
 /**
@@ -59,7 +60,7 @@
 
 	while (h->val.pos - ht->tail.val.pos > max) {
 		rte_pause();
-		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+		h->raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_acquire);
 	}
 }
 
@@ -76,7 +77,7 @@
 
 	const uint32_t capacity = r->capacity;
 
-	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
+	oh.raw = rte_atomic_load_explicit(&r->rts_prod.head.raw, rte_memory_order_acquire);
 
 	do {
 		/* Reset n to the initial burst count */
@@ -113,9 +114,9 @@
 	 *  - OOO reads of cons tail value
 	 *  - OOO copy of elems to the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
-			&oh.raw, nh.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_prod.head.raw,
+			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = oh.val.pos;
 	return n;
@@ -132,7 +133,7 @@
 	uint32_t n;
 	union __rte_ring_rts_poscnt nh, oh;
 
-	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
+	oh.raw = rte_atomic_load_explicit(&r->rts_cons.head.raw, rte_memory_order_acquire);
 
 	/* move cons.head atomically */
 	do {
@@ -168,9 +169,9 @@
 	 *  - OOO reads of prod tail value
 	 *  - OOO copy of elems from the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
-			&oh.raw, nh.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_cons.head.raw,
+			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = oh.val.pos;
 	return n;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 00/19] use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (20 preceding siblings ...)
  2023-10-16 23:09 ` [PATCH 21/21] ring: " Tyler Retzlaff
@ 2023-10-17 20:30 ` Tyler Retzlaff
  2023-10-17 20:30   ` [PATCH v2 01/19] power: " Tyler Retzlaff
                     ` (19 more replies)
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
  22 siblings, 20 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:30 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
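
The conversion is mechanical: variables touched by the builtins gain an
RTE_ATOMIC() type qualifier and each __atomic_* call becomes the matching
rte_atomic_*_explicit call with an rte_memory_order_* argument. A minimal
sketch of the pattern (example_cnt and example_bump are illustrative, not
taken from any patch):

    #include <stdint.h>
    #include <rte_stdatomic.h>

    /* was: static uint32_t example_cnt; */
    static RTE_ATOMIC(uint32_t) example_cnt;

    static inline uint32_t
    example_bump(void)
    {
            /* was: __atomic_fetch_add(&example_cnt, 1, __ATOMIC_RELAXED) */
            return rte_atomic_fetch_add_explicit(&example_cnt, 1,
                            rte_memory_order_relaxed);
    }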

v2:
  * add #include <rte_stdatomic.h> to rte_mbuf_core.h
  * remove first two patches which were fixes that have
    been merged in another series

Tyler Retzlaff (19):
  power: use rte optional stdatomic API
  bbdev: use rte optional stdatomic API
  eal: use rte optional stdatomic API
  eventdev: use rte optional stdatomic API
  gpudev: use rte optional stdatomic API
  ipsec: use rte optional stdatomic API
  mbuf: use rte optional stdatomic API
  mempool: use rte optional stdatomic API
  rcu: use rte optional stdatomic API
  pdump: use rte optional stdatomic API
  stack: use rte optional stdatomic API
  telemetry: use rte optional stdatomic API
  vhost: use rte optional stdatomic API
  cryptodev: use rte optional stdatomic API
  distributor: use rte optional stdatomic API
  ethdev: use rte optional stdatomic API
  hash: use rte optional stdatomic API
  timer: use rte optional stdatomic API
  ring: use rte optional stdatomic API

 drivers/event/cnxk/cnxk_tim_worker.h   |   4 +-
 drivers/net/mlx5/mlx5_hws_cnt.h        |   2 +-
 lib/bbdev/rte_bbdev.c                  |   6 +-
 lib/bbdev/rte_bbdev.h                  |   2 +-
 lib/cryptodev/rte_cryptodev.c          |  22 +++---
 lib/cryptodev/rte_cryptodev.h          |  16 ++---
 lib/distributor/distributor_private.h  |   4 +-
 lib/distributor/rte_distributor.c      |  54 +++++++--------
 lib/eal/common/eal_common_launch.c     |  10 +--
 lib/eal/common/eal_common_mcfg.c       |   2 +-
 lib/eal/common/eal_common_proc.c       |  14 ++--
 lib/eal/common/eal_common_thread.c     |  26 +++----
 lib/eal/common/eal_common_trace.c      |   8 +--
 lib/eal/common/eal_common_trace_ctf.c  |   4 +-
 lib/eal/common/eal_memcfg.h            |   2 +-
 lib/eal/common/eal_private.h           |   4 +-
 lib/eal/common/eal_trace.h             |   4 +-
 lib/eal/common/rte_service.c           | 122 ++++++++++++++++-----------------
 lib/eal/freebsd/eal.c                  |  20 +++---
 lib/eal/include/rte_epoll.h            |   3 +-
 lib/eal/linux/eal.c                    |  26 +++----
 lib/eal/linux/eal_interrupts.c         |  42 ++++++------
 lib/eal/ppc/include/rte_atomic.h       |   6 +-
 lib/eal/windows/rte_thread.c           |   8 ++-
 lib/ethdev/ethdev_driver.h             |  16 ++---
 lib/ethdev/ethdev_private.c            |   6 +-
 lib/ethdev/rte_ethdev.c                |  24 +++----
 lib/ethdev/rte_ethdev.h                |  16 ++---
 lib/ethdev/rte_ethdev_core.h           |   2 +-
 lib/eventdev/rte_event_timer_adapter.c |  66 +++++++++---------
 lib/eventdev/rte_event_timer_adapter.h |   2 +-
 lib/gpudev/gpudev.c                    |   6 +-
 lib/gpudev/gpudev_driver.h             |   2 +-
 lib/hash/rte_cuckoo_hash.c             | 116 +++++++++++++++----------------
 lib/hash/rte_cuckoo_hash.h             |   6 +-
 lib/ipsec/ipsec_sqn.h                  |   2 +-
 lib/ipsec/sa.h                         |   2 +-
 lib/mbuf/rte_mbuf.h                    |  20 +++---
 lib/mbuf/rte_mbuf_core.h               |   5 +-
 lib/mempool/rte_mempool.h              |   4 +-
 lib/pdump/rte_pdump.c                  |  14 ++--
 lib/pdump/rte_pdump.h                  |   8 +--
 lib/power/power_acpi_cpufreq.c         |  33 ++++-----
 lib/power/power_cppc_cpufreq.c         |  25 +++----
 lib/power/power_pstate_cpufreq.c       |  31 +++++----
 lib/rcu/rte_rcu_qsbr.c                 |  48 ++++++-------
 lib/rcu/rte_rcu_qsbr.h                 |  68 +++++++++---------
 lib/ring/rte_ring_c11_pvt.h            |  33 ++++-----
 lib/ring/rte_ring_core.h               |  10 +--
 lib/ring/rte_ring_generic_pvt.h        |   3 +-
 lib/ring/rte_ring_hts_elem_pvt.h       |  22 +++---
 lib/ring/rte_ring_peek_elem_pvt.h      |   6 +-
 lib/ring/rte_ring_rts_elem_pvt.h       |  27 ++++----
 lib/stack/rte_stack.h                  |   2 +-
 lib/stack/rte_stack_lf_c11.h           |  24 +++----
 lib/stack/rte_stack_lf_generic.h       |  18 ++---
 lib/telemetry/telemetry.c              |  18 ++---
 lib/timer/rte_timer.c                  |  50 +++++++-------
 lib/timer/rte_timer.h                  |   6 +-
 lib/vhost/vdpa.c                       |   3 +-
 lib/vhost/vhost.c                      |  42 ++++++------
 lib/vhost/vhost.h                      |  39 ++++++-----
 lib/vhost/vhost_user.c                 |   6 +-
 lib/vhost/virtio_net.c                 |  58 +++++++++-------
 lib/vhost/virtio_net_ctrl.c            |   6 +-
 65 files changed, 667 insertions(+), 639 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 01/19] power: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
@ 2023-10-17 20:30   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 02/19] bbdev: " Tyler Retzlaff
                     ` (18 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:30 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding
rte_atomic_xxx optional stdatomic API.
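
For reference, a minimal sketch of the CAS-as-lock pattern used on the
per-lcore state above; example_state, EX_IDLE and EX_ONGOING are illustrative
names, not the power library's:

    #include <stdint.h>
    #include <rte_stdatomic.h>

    #define EX_IDLE    0u
    #define EX_ONGOING 1u

    static RTE_ATOMIC(uint32_t) example_state;

    /* Acquire on success acts as the "lock": operations in the critical
     * section cannot be reordered before the state change.
     */
    static inline int
    example_try_enter(void)
    {
            uint32_t exp = EX_IDLE;

            if (!rte_atomic_compare_exchange_strong_explicit(&example_state,
                            &exp, EX_ONGOING,
                            rte_memory_order_acquire,
                            rte_memory_order_relaxed))
                    return -1;      /* already in use */
            return 0;
    }

    /* Release publishes the critical-section writes before the new state
     * becomes visible to other lcores.
     */
    static inline void
    example_leave(uint32_t new_state)
    {
            rte_atomic_store_explicit(&example_state, new_state,
                    rte_memory_order_release);
    }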

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/power/power_acpi_cpufreq.c   | 33 +++++++++++++++++----------------
 lib/power/power_cppc_cpufreq.c   | 25 +++++++++++++------------
 lib/power/power_pstate_cpufreq.c | 31 ++++++++++++++++---------------
 3 files changed, 46 insertions(+), 43 deletions(-)

diff --git a/lib/power/power_acpi_cpufreq.c b/lib/power/power_acpi_cpufreq.c
index 6e57aca..8b55f19 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/lib/power/power_acpi_cpufreq.c
@@ -7,6 +7,7 @@
 #include <stdlib.h>
 
 #include <rte_memcpy.h>
+#include <rte_stdatomic.h>
 #include <rte_string_fns.h>
 
 #include "power_acpi_cpufreq.h"
@@ -41,13 +42,13 @@ enum power_state {
  * Power info per lcore.
  */
 struct acpi_power_info {
-	unsigned int lcore_id;                   /**< Logical core id */
+	unsigned int lcore_id;               /**< Logical core id */
 	uint32_t freqs[RTE_MAX_LCORE_FREQS]; /**< Frequency array */
 	uint32_t nb_freqs;                   /**< number of available freqs */
 	FILE *f;                             /**< FD of scaling_setspeed */
 	char governor_ori[32];               /**< Original governor name */
 	uint32_t curr_idx;                   /**< Freq index in freqs array */
-	uint32_t state;                      /**< Power in use state */
+	RTE_ATOMIC(uint32_t) state;          /**< Power in use state */
 	uint16_t turbo_available;            /**< Turbo Boost available */
 	uint16_t turbo_enable;               /**< Turbo Boost enable/disable */
 } __rte_cache_aligned;
@@ -249,9 +250,9 @@ struct acpi_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"in use\n", lcore_id);
 		return -1;
@@ -289,15 +290,15 @@ struct acpi_power_info {
 	RTE_LOG(INFO, POWER, "Initialized successfully for lcore %u "
 			"power management\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_USED,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_USED,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
@@ -321,9 +322,9 @@ struct acpi_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"not used\n", lcore_id);
 		return -1;
@@ -344,15 +345,15 @@ struct acpi_power_info {
 			"'userspace' mode and been set back to the "
 			"original\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_IDLE,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_IDLE,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
diff --git a/lib/power/power_cppc_cpufreq.c b/lib/power/power_cppc_cpufreq.c
index fc9cffe..bb70f6a 100644
--- a/lib/power/power_cppc_cpufreq.c
+++ b/lib/power/power_cppc_cpufreq.c
@@ -6,6 +6,7 @@
 #include <stdlib.h>
 
 #include <rte_memcpy.h>
+#include <rte_stdatomic.h>
 
 #include "power_cppc_cpufreq.h"
 #include "power_common.h"
@@ -49,8 +50,8 @@ enum power_state {
  * Power info per lcore.
  */
 struct cppc_power_info {
-	unsigned int lcore_id;                   /**< Logical core id */
-	uint32_t state;                      /**< Power in use state */
+	unsigned int lcore_id;               /**< Logical core id */
+	RTE_ATOMIC(uint32_t) state;          /**< Power in use state */
 	FILE *f;                             /**< FD of scaling_setspeed */
 	char governor_ori[32];               /**< Original governor name */
 	uint32_t curr_idx;                   /**< Freq index in freqs array */
@@ -353,9 +354,9 @@ struct cppc_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"in use\n", lcore_id);
 		return -1;
@@ -393,12 +394,12 @@ struct cppc_power_info {
 	RTE_LOG(INFO, POWER, "Initialized successfully for lcore %u "
 			"power management\n", lcore_id);
 
-	__atomic_store_n(&(pi->state), POWER_USED, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_USED, rte_memory_order_release);
 
 	return 0;
 
 fail:
-	__atomic_store_n(&(pi->state), POWER_UNKNOWN, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_UNKNOWN, rte_memory_order_release);
 	return -1;
 }
 
@@ -431,9 +432,9 @@ struct cppc_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"not used\n", lcore_id);
 		return -1;
@@ -453,12 +454,12 @@ struct cppc_power_info {
 	RTE_LOG(INFO, POWER, "Power management of lcore %u has exited from "
 			"'userspace' mode and been set back to the "
 			"original\n", lcore_id);
-	__atomic_store_n(&(pi->state), POWER_IDLE, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_IDLE, rte_memory_order_release);
 
 	return 0;
 
 fail:
-	__atomic_store_n(&(pi->state), POWER_UNKNOWN, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_UNKNOWN, rte_memory_order_release);
 
 	return -1;
 }
diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c
index 52aa645..5ca5f60 100644
--- a/lib/power/power_pstate_cpufreq.c
+++ b/lib/power/power_pstate_cpufreq.c
@@ -12,6 +12,7 @@
 #include <inttypes.h>
 
 #include <rte_memcpy.h>
+#include <rte_stdatomic.h>
 
 #include "rte_power_pmd_mgmt.h"
 #include "power_pstate_cpufreq.h"
@@ -59,7 +60,7 @@ struct pstate_power_info {
 	uint32_t non_turbo_max_ratio;        /**< Non Turbo Max ratio  */
 	uint32_t sys_max_freq;               /**< system wide max freq  */
 	uint32_t core_base_freq;             /**< core base freq  */
-	uint32_t state;                      /**< Power in use state */
+	RTE_ATOMIC(uint32_t) state;          /**< Power in use state */
 	uint16_t turbo_available;            /**< Turbo Boost available */
 	uint16_t turbo_enable;               /**< Turbo Boost enable/disable */
 	uint16_t priority_core;              /**< High Performance core */
@@ -555,9 +556,9 @@ struct pstate_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"in use\n", lcore_id);
 		return -1;
@@ -600,15 +601,15 @@ struct pstate_power_info {
 	RTE_LOG(INFO, POWER, "Initialized successfully for lcore %u "
 			"power management\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_USED,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_USED,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
@@ -633,9 +634,9 @@ struct pstate_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are under done the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"not used\n", lcore_id);
 		return -1;
@@ -658,15 +659,15 @@ struct pstate_power_info {
 			"'performance' mode and been set back to the "
 			"original\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_IDLE,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_IDLE,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 02/19] bbdev: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
  2023-10-17 20:30   ` [PATCH v2 01/19] power: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 03/19] eal: " Tyler Retzlaff
                     ` (17 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding
rte_atomic_xxx optional stdatomic API.
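
For reference, a minimal sketch of the relaxed reference count pattern used
for process_cnt; example_refcnt and the helpers are illustrative names:

    #include <stdbool.h>
    #include <stdint.h>
    #include <rte_stdatomic.h>

    static RTE_ATOMIC(uint16_t) example_refcnt;

    static inline void
    example_ref(void)
    {
            rte_atomic_fetch_add_explicit(&example_refcnt, 1,
                    rte_memory_order_relaxed);
    }

    /* fetch_sub returns the previous value, so "old - 1 == 0" is true
     * only for the last reference, matching the test in the patch.
     */
    static inline bool
    example_unref_last(void)
    {
            return rte_atomic_fetch_sub_explicit(&example_refcnt, 1,
                            rte_memory_order_relaxed) - 1 == 0;
    }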

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/bbdev/rte_bbdev.c | 6 +++---
 lib/bbdev/rte_bbdev.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/bbdev/rte_bbdev.c b/lib/bbdev/rte_bbdev.c
index 155323e..cfebea0 100644
--- a/lib/bbdev/rte_bbdev.c
+++ b/lib/bbdev/rte_bbdev.c
@@ -208,7 +208,7 @@ struct rte_bbdev *
 		return NULL;
 	}
 
-	__atomic_fetch_add(&bbdev->data->process_cnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&bbdev->data->process_cnt, 1, rte_memory_order_relaxed);
 	bbdev->data->dev_id = dev_id;
 	bbdev->state = RTE_BBDEV_INITIALIZED;
 
@@ -250,8 +250,8 @@ struct rte_bbdev *
 	}
 
 	/* clear shared BBDev Data if no process is using the device anymore */
-	if (__atomic_fetch_sub(&bbdev->data->process_cnt, 1,
-			      __ATOMIC_RELAXED) - 1 == 0)
+	if (rte_atomic_fetch_sub_explicit(&bbdev->data->process_cnt, 1,
+			      rte_memory_order_relaxed) - 1 == 0)
 		memset(bbdev->data, 0, sizeof(*bbdev->data));
 
 	memset(bbdev, 0, sizeof(*bbdev));
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index d12e2e7..e1aee08 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -482,7 +482,7 @@ struct rte_bbdev_data {
 	uint16_t dev_id;  /**< Device ID */
 	int socket_id;  /**< NUMA socket that device is on */
 	bool started;  /**< Device run-time state */
-	uint16_t process_cnt;  /** Counter of processes using the device */
+	RTE_ATOMIC(uint16_t) process_cnt;  /** Counter of processes using the device */
 };
 
 /* Forward declarations */
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 03/19] eal: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
  2023-10-17 20:30   ` [PATCH v2 01/19] power: " Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 02/19] bbdev: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 04/19] eventdev: " Tyler Retzlaff
                     ` (16 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding
rte_atomic_xxx optional stdatomic API.
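
For reference, a minimal sketch of the guard-variable handshake used between
the main and worker lcores in the launch code below; example_launch,
example_fn_t and the helpers are illustrative, not the lcore_config
definitions:

    #include <stddef.h>
    #include <rte_pause.h>
    #include <rte_stdatomic.h>

    typedef int example_fn_t(void *);

    struct example_launch {
            void *arg;                      /* plain payload, guarded by f */
            RTE_ATOMIC(example_fn_t *) f;   /* guard variable */
    };

    /* Main lcore: write the argument first, then publish the function
     * pointer with release so the worker sees a consistent pair.
     */
    static inline void
    example_launch_fn(struct example_launch *l, example_fn_t *fn, void *arg)
    {
            l->arg = arg;
            rte_atomic_store_explicit(&l->f, fn, rte_memory_order_release);
    }

    /* Worker lcore: acquire-load the guard; once it is non-NULL the
     * argument written before the release store is visible.
     */
    static inline int
    example_wait_and_run(struct example_launch *l)
    {
            example_fn_t *fn;

            while ((fn = rte_atomic_load_explicit(&l->f,
                            rte_memory_order_acquire)) == NULL)
                    rte_pause();

            return fn(l->arg);
    }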

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/common/eal_common_launch.c    |  10 +--
 lib/eal/common/eal_common_mcfg.c      |   2 +-
 lib/eal/common/eal_common_proc.c      |  14 ++--
 lib/eal/common/eal_common_thread.c    |  26 ++++----
 lib/eal/common/eal_common_trace.c     |   8 +--
 lib/eal/common/eal_common_trace_ctf.c |   4 +-
 lib/eal/common/eal_memcfg.h           |   2 +-
 lib/eal/common/eal_private.h          |   4 +-
 lib/eal/common/eal_trace.h            |   4 +-
 lib/eal/common/rte_service.c          | 122 +++++++++++++++++-----------------
 lib/eal/freebsd/eal.c                 |  20 +++---
 lib/eal/include/rte_epoll.h           |   3 +-
 lib/eal/linux/eal.c                   |  26 ++++----
 lib/eal/linux/eal_interrupts.c        |  42 ++++++------
 lib/eal/ppc/include/rte_atomic.h      |   6 +-
 lib/eal/windows/rte_thread.c          |   8 ++-
 16 files changed, 152 insertions(+), 149 deletions(-)

diff --git a/lib/eal/common/eal_common_launch.c b/lib/eal/common/eal_common_launch.c
index 0504598..5320c3b 100644
--- a/lib/eal/common/eal_common_launch.c
+++ b/lib/eal/common/eal_common_launch.c
@@ -18,8 +18,8 @@
 int
 rte_eal_wait_lcore(unsigned worker_id)
 {
-	while (__atomic_load_n(&lcore_config[worker_id].state,
-			__ATOMIC_ACQUIRE) != WAIT)
+	while (rte_atomic_load_explicit(&lcore_config[worker_id].state,
+			rte_memory_order_acquire) != WAIT)
 		rte_pause();
 
 	return lcore_config[worker_id].ret;
@@ -38,8 +38,8 @@
 	/* Check if the worker is in 'WAIT' state. Use acquire order
 	 * since 'state' variable is used as the guard variable.
 	 */
-	if (__atomic_load_n(&lcore_config[worker_id].state,
-			__ATOMIC_ACQUIRE) != WAIT)
+	if (rte_atomic_load_explicit(&lcore_config[worker_id].state,
+			rte_memory_order_acquire) != WAIT)
 		goto finish;
 
 	lcore_config[worker_id].arg = arg;
@@ -47,7 +47,7 @@
 	 * before the worker thread starts running the function.
 	 * Use worker thread function as the guard variable.
 	 */
-	__atomic_store_n(&lcore_config[worker_id].f, f, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&lcore_config[worker_id].f, f, rte_memory_order_release);
 
 	rc = eal_thread_wake_worker(worker_id);
 
diff --git a/lib/eal/common/eal_common_mcfg.c b/lib/eal/common/eal_common_mcfg.c
index 2a785e7..dabb80e 100644
--- a/lib/eal/common/eal_common_mcfg.c
+++ b/lib/eal/common/eal_common_mcfg.c
@@ -30,7 +30,7 @@
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 
 	/* wait until shared mem_config finish initialising */
-	rte_wait_until_equal_32(&mcfg->magic, RTE_MAGIC, __ATOMIC_RELAXED);
+	rte_wait_until_equal_32(&mcfg->magic, RTE_MAGIC, rte_memory_order_relaxed);
 }
 
 int
diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index f20a348..728815c 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -33,7 +33,7 @@
 #include "eal_filesystem.h"
 #include "eal_internal_cfg.h"
 
-static int mp_fd = -1;
+static RTE_ATOMIC(int) mp_fd = -1;
 static rte_thread_t mp_handle_tid;
 static char mp_filter[PATH_MAX];   /* Filter for secondary process sockets */
 static char mp_dir_path[PATH_MAX]; /* The directory path for all mp sockets */
@@ -404,7 +404,7 @@ struct pending_request {
 	struct sockaddr_un sa;
 	int fd;
 
-	while ((fd = __atomic_load_n(&mp_fd, __ATOMIC_RELAXED)) >= 0) {
+	while ((fd = rte_atomic_load_explicit(&mp_fd, rte_memory_order_relaxed)) >= 0) {
 		int ret;
 
 		ret = read_msg(fd, &msg, &sa);
@@ -652,7 +652,7 @@ enum async_action {
 		RTE_LOG(ERR, EAL, "failed to create mp thread: %s\n",
 			strerror(errno));
 		close(dir_fd);
-		close(__atomic_exchange_n(&mp_fd, -1, __ATOMIC_RELAXED));
+		close(rte_atomic_exchange_explicit(&mp_fd, -1, rte_memory_order_relaxed));
 		return -1;
 	}
 
@@ -668,7 +668,7 @@ enum async_action {
 {
 	int fd;
 
-	fd = __atomic_exchange_n(&mp_fd, -1, __ATOMIC_RELAXED);
+	fd = rte_atomic_exchange_explicit(&mp_fd, -1, rte_memory_order_relaxed);
 	if (fd < 0)
 		return;
 
@@ -1282,11 +1282,11 @@ enum mp_status {
 
 	expected = MP_STATUS_UNKNOWN;
 	desired = status;
-	if (__atomic_compare_exchange_n(&mcfg->mp_status, &expected, desired,
-			false, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+	if (rte_atomic_compare_exchange_strong_explicit(&mcfg->mp_status, &expected, desired,
+			rte_memory_order_relaxed, rte_memory_order_relaxed))
 		return true;
 
-	return __atomic_load_n(&mcfg->mp_status, __ATOMIC_RELAXED) == desired;
+	return rte_atomic_load_explicit(&mcfg->mp_status, rte_memory_order_relaxed) == desired;
 }
 
 bool
diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
index 668b1ed..c422ea8 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -191,8 +191,8 @@ unsigned rte_socket_id(void)
 		/* Set the state to 'RUNNING'. Use release order
 		 * since 'state' variable is used as the guard variable.
 		 */
-		__atomic_store_n(&lcore_config[lcore_id].state, RUNNING,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&lcore_config[lcore_id].state, RUNNING,
+			rte_memory_order_release);
 
 		eal_thread_ack_command();
 
@@ -201,8 +201,8 @@ unsigned rte_socket_id(void)
 		 * are accessed only after update to 'f' is visible.
 		 * Wait till the update to 'f' is visible to the worker.
 		 */
-		while ((f = __atomic_load_n(&lcore_config[lcore_id].f,
-				__ATOMIC_ACQUIRE)) == NULL)
+		while ((f = rte_atomic_load_explicit(&lcore_config[lcore_id].f,
+				rte_memory_order_acquire)) == NULL)
 			rte_pause();
 
 		rte_eal_trace_thread_lcore_running(lcore_id, f);
@@ -219,8 +219,8 @@ unsigned rte_socket_id(void)
 		 * are completed before the state is updated.
 		 * Use 'state' as the guard variable.
 		 */
-		__atomic_store_n(&lcore_config[lcore_id].state, WAIT,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&lcore_config[lcore_id].state, WAIT,
+			rte_memory_order_release);
 
 		rte_eal_trace_thread_lcore_stopped(lcore_id);
 	}
@@ -242,7 +242,7 @@ struct control_thread_params {
 	/* Control thread status.
 	 * If the status is CTRL_THREAD_ERROR, 'ret' has the error code.
 	 */
-	enum __rte_ctrl_thread_status status;
+	RTE_ATOMIC(enum __rte_ctrl_thread_status) status;
 };
 
 static int control_thread_init(void *arg)
@@ -259,13 +259,13 @@ static int control_thread_init(void *arg)
 	RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
 	params->ret = rte_thread_set_affinity_by_id(rte_thread_self(), cpuset);
 	if (params->ret != 0) {
-		__atomic_store_n(&params->status,
-			CTRL_THREAD_ERROR, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&params->status,
+			CTRL_THREAD_ERROR, rte_memory_order_release);
 		return 1;
 	}
 
-	__atomic_store_n(&params->status,
-		CTRL_THREAD_RUNNING, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&params->status,
+		CTRL_THREAD_RUNNING, rte_memory_order_release);
 
 	return 0;
 }
@@ -310,8 +310,8 @@ static uint32_t control_thread_start(void *arg)
 
 	/* Wait for the control thread to initialize successfully */
 	while ((ctrl_thread_status =
-			__atomic_load_n(&params->status,
-			__ATOMIC_ACQUIRE)) == CTRL_THREAD_LAUNCHING) {
+			rte_atomic_load_explicit(&params->status,
+			rte_memory_order_acquire)) == CTRL_THREAD_LAUNCHING) {
 		rte_delay_us_sleep(1);
 	}
 
diff --git a/lib/eal/common/eal_common_trace.c b/lib/eal/common/eal_common_trace.c
index d2eac2d..6ad87fc 100644
--- a/lib/eal/common/eal_common_trace.c
+++ b/lib/eal/common/eal_common_trace.c
@@ -97,7 +97,7 @@ struct trace_point_head *
 bool
 rte_trace_is_enabled(void)
 {
-	return __atomic_load_n(&trace.status, __ATOMIC_ACQUIRE) != 0;
+	return rte_atomic_load_explicit(&trace.status, rte_memory_order_acquire) != 0;
 }
 
 static void
@@ -157,7 +157,7 @@ rte_trace_mode rte_trace_mode_get(void)
 	prev = rte_atomic_fetch_or_explicit(t, __RTE_TRACE_FIELD_ENABLE_MASK,
 		rte_memory_order_release);
 	if ((prev & __RTE_TRACE_FIELD_ENABLE_MASK) == 0)
-		__atomic_fetch_add(&trace.status, 1, __ATOMIC_RELEASE);
+		rte_atomic_fetch_add_explicit(&trace.status, 1, rte_memory_order_release);
 	return 0;
 }
 
@@ -172,7 +172,7 @@ rte_trace_mode rte_trace_mode_get(void)
 	prev = rte_atomic_fetch_and_explicit(t, ~__RTE_TRACE_FIELD_ENABLE_MASK,
 		rte_memory_order_release);
 	if ((prev & __RTE_TRACE_FIELD_ENABLE_MASK) != 0)
-		__atomic_fetch_sub(&trace.status, 1, __ATOMIC_RELEASE);
+		rte_atomic_fetch_sub_explicit(&trace.status, 1, rte_memory_order_release);
 	return 0;
 }
 
@@ -526,7 +526,7 @@ rte_trace_mode rte_trace_mode_get(void)
 
 	/* Add the trace point at tail */
 	STAILQ_INSERT_TAIL(&tp_list, tp, next);
-	__atomic_thread_fence(__ATOMIC_RELEASE);
+	__atomic_thread_fence(rte_memory_order_release);
 
 	/* All Good !!! */
 	return 0;
diff --git a/lib/eal/common/eal_common_trace_ctf.c b/lib/eal/common/eal_common_trace_ctf.c
index c6775c3..04c4f71 100644
--- a/lib/eal/common/eal_common_trace_ctf.c
+++ b/lib/eal/common/eal_common_trace_ctf.c
@@ -361,10 +361,10 @@
 	if (ctf_meta == NULL)
 		return -EINVAL;
 
-	if (!__atomic_load_n(&trace->ctf_fixup_done, __ATOMIC_SEQ_CST) &&
+	if (!rte_atomic_load_explicit(&trace->ctf_fixup_done, rte_memory_order_seq_cst) &&
 				rte_get_timer_hz()) {
 		meta_fixup(trace, ctf_meta);
-		__atomic_store_n(&trace->ctf_fixup_done, 1, __ATOMIC_SEQ_CST);
+		rte_atomic_store_explicit(&trace->ctf_fixup_done, 1, rte_memory_order_seq_cst);
 	}
 
 	rc = fprintf(f, "%s", ctf_meta);
diff --git a/lib/eal/common/eal_memcfg.h b/lib/eal/common/eal_memcfg.h
index d5c63e2..60e2089 100644
--- a/lib/eal/common/eal_memcfg.h
+++ b/lib/eal/common/eal_memcfg.h
@@ -42,7 +42,7 @@ struct rte_mem_config {
 	rte_rwlock_t memory_hotplug_lock;
 	/**< Indicates whether memory hotplug request is in progress. */
 
-	uint8_t mp_status; /**< Multiprocess status. */
+	RTE_ATOMIC(uint8_t) mp_status; /**< Multiprocess status. */
 
 	/* memory segments and zones */
 	struct rte_fbarray memzones; /**< Memzone descriptors. */
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index ebd496b..4d2e806 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -24,11 +24,11 @@ struct lcore_config {
 	int pipe_main2worker[2];   /**< communication pipe with main */
 	int pipe_worker2main[2];   /**< communication pipe with main */
 
-	lcore_function_t * volatile f; /**< function to call */
+	RTE_ATOMIC(lcore_function_t *) volatile f; /**< function to call */
 	void * volatile arg;       /**< argument of function */
 	volatile int ret;          /**< return value of function */
 
-	volatile enum rte_lcore_state_t state; /**< lcore state */
+	volatile RTE_ATOMIC(enum rte_lcore_state_t) state; /**< lcore state */
 	unsigned int socket_id;    /**< physical socket id for this lcore */
 	unsigned int core_id;      /**< core number on socket for this lcore */
 	int core_index;            /**< relative index, starting from 0 */
diff --git a/lib/eal/common/eal_trace.h b/lib/eal/common/eal_trace.h
index d66bcfe..ace2ef3 100644
--- a/lib/eal/common/eal_trace.h
+++ b/lib/eal/common/eal_trace.h
@@ -50,7 +50,7 @@ struct trace_arg {
 struct trace {
 	char *dir;
 	int register_errno;
-	uint32_t status;
+	RTE_ATOMIC(uint32_t) status;
 	enum rte_trace_mode mode;
 	rte_uuid_t uuid;
 	uint32_t buff_len;
@@ -65,7 +65,7 @@ struct trace {
 	uint32_t ctf_meta_offset_freq;
 	uint32_t ctf_meta_offset_freq_off_s;
 	uint32_t ctf_meta_offset_freq_off;
-	uint16_t ctf_fixup_done;
+	RTE_ATOMIC(uint16_t) ctf_fixup_done;
 	rte_spinlock_t lock;
 };
 
diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index 9e2aa4a..3fc2b9a 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -43,8 +43,8 @@ struct rte_service_spec_impl {
 	rte_spinlock_t execute_lock;
 
 	/* API set/get-able variables */
-	int8_t app_runstate;
-	int8_t comp_runstate;
+	RTE_ATOMIC(int8_t) app_runstate;
+	RTE_ATOMIC(int8_t) comp_runstate;
 	uint8_t internal_flags;
 
 	/* per service statistics */
@@ -52,24 +52,24 @@ struct rte_service_spec_impl {
 	 * It does not indicate the number of cores the service is running
 	 * on currently.
 	 */
-	uint32_t num_mapped_cores;
+	RTE_ATOMIC(uint32_t) num_mapped_cores;
 } __rte_cache_aligned;
 
 struct service_stats {
-	uint64_t calls;
-	uint64_t cycles;
+	RTE_ATOMIC(uint64_t) calls;
+	RTE_ATOMIC(uint64_t) cycles;
 };
 
 /* the internal values of a service core */
 struct core_state {
 	/* map of services IDs are run on this core */
 	uint64_t service_mask;
-	uint8_t runstate; /* running or stopped */
-	uint8_t thread_active; /* indicates when thread is in service_run() */
+	RTE_ATOMIC(uint8_t) runstate; /* running or stopped */
+	RTE_ATOMIC(uint8_t) thread_active; /* indicates when thread is in service_run() */
 	uint8_t is_service_core; /* set if core is currently a service core */
 	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
-	uint64_t loops;
-	uint64_t cycles;
+	RTE_ATOMIC(uint64_t) loops;
+	RTE_ATOMIC(uint64_t) cycles;
 	struct service_stats service_stats[RTE_SERVICE_NUM_MAX];
 } __rte_cache_aligned;
 
@@ -314,11 +314,11 @@ struct core_state {
 	 * service_run and service_runstate_get function.
 	 */
 	if (runstate)
-		__atomic_store_n(&s->comp_runstate, RUNSTATE_RUNNING,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->comp_runstate, RUNSTATE_RUNNING,
+			rte_memory_order_release);
 	else
-		__atomic_store_n(&s->comp_runstate, RUNSTATE_STOPPED,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->comp_runstate, RUNSTATE_STOPPED,
+			rte_memory_order_release);
 
 	return 0;
 }
@@ -334,11 +334,11 @@ struct core_state {
 	 * service_run runstate_get function.
 	 */
 	if (runstate)
-		__atomic_store_n(&s->app_runstate, RUNSTATE_RUNNING,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->app_runstate, RUNSTATE_RUNNING,
+			rte_memory_order_release);
 	else
-		__atomic_store_n(&s->app_runstate, RUNSTATE_STOPPED,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->app_runstate, RUNSTATE_STOPPED,
+			rte_memory_order_release);
 
 	rte_eal_trace_service_runstate_set(id, runstate);
 	return 0;
@@ -354,14 +354,14 @@ struct core_state {
 	 * Use load-acquire memory order. This synchronizes with
 	 * store-release in service state set functions.
 	 */
-	if (__atomic_load_n(&s->comp_runstate, __ATOMIC_ACQUIRE) ==
+	if (rte_atomic_load_explicit(&s->comp_runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING &&
-	    __atomic_load_n(&s->app_runstate, __ATOMIC_ACQUIRE) ==
+	    rte_atomic_load_explicit(&s->app_runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING) {
 		int check_disabled = !(s->internal_flags &
 			SERVICE_F_START_CHECK);
-		int lcore_mapped = (__atomic_load_n(&s->num_mapped_cores,
-			__ATOMIC_RELAXED) > 0);
+		int lcore_mapped = (rte_atomic_load_explicit(&s->num_mapped_cores,
+			rte_memory_order_relaxed) > 0);
 
 		return (check_disabled | lcore_mapped);
 	} else
@@ -392,15 +392,15 @@ struct core_state {
 			uint64_t end = rte_rdtsc();
 			uint64_t cycles = end - start;
 
-			__atomic_store_n(&cs->cycles, cs->cycles + cycles,
-				__ATOMIC_RELAXED);
-			__atomic_store_n(&service_stats->cycles,
+			rte_atomic_store_explicit(&cs->cycles, cs->cycles + cycles,
+				rte_memory_order_relaxed);
+			rte_atomic_store_explicit(&service_stats->cycles,
 				service_stats->cycles + cycles,
-				__ATOMIC_RELAXED);
+				rte_memory_order_relaxed);
 		}
 
-		__atomic_store_n(&service_stats->calls,
-			service_stats->calls + 1, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&service_stats->calls,
+			service_stats->calls + 1, rte_memory_order_relaxed);
 	} else {
 		s->spec.callback(userdata);
 	}
@@ -420,9 +420,9 @@ struct core_state {
 	 * Use load-acquire memory order. This synchronizes with
 	 * store-release in service state set functions.
 	 */
-	if (__atomic_load_n(&s->comp_runstate, __ATOMIC_ACQUIRE) !=
+	if (rte_atomic_load_explicit(&s->comp_runstate, rte_memory_order_acquire) !=
 			RUNSTATE_RUNNING ||
-	    __atomic_load_n(&s->app_runstate, __ATOMIC_ACQUIRE) !=
+	    rte_atomic_load_explicit(&s->app_runstate, rte_memory_order_acquire) !=
 			RUNSTATE_RUNNING ||
 	    !(service_mask & (UINT64_C(1) << i))) {
 		cs->service_active_on_lcore[i] = 0;
@@ -472,11 +472,11 @@ struct core_state {
 	/* Increment num_mapped_cores to reflect that this core is
 	 * now mapped capable of running the service.
 	 */
-	__atomic_fetch_add(&s->num_mapped_cores, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&s->num_mapped_cores, 1, rte_memory_order_relaxed);
 
 	int ret = service_run(id, cs, UINT64_MAX, s, serialize_mt_unsafe);
 
-	__atomic_fetch_sub(&s->num_mapped_cores, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_sub_explicit(&s->num_mapped_cores, 1, rte_memory_order_relaxed);
 
 	return ret;
 }
@@ -489,13 +489,13 @@ struct core_state {
 	const int lcore = rte_lcore_id();
 	struct core_state *cs = &lcore_states[lcore];
 
-	__atomic_store_n(&cs->thread_active, 1, __ATOMIC_SEQ_CST);
+	rte_atomic_store_explicit(&cs->thread_active, 1, rte_memory_order_seq_cst);
 
 	/* runstate act as the guard variable. Use load-acquire
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	while (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) ==
+	while (rte_atomic_load_explicit(&cs->runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING) {
 
 		const uint64_t service_mask = cs->service_mask;
@@ -513,7 +513,7 @@ struct core_state {
 			service_run(i, cs, service_mask, service_get(i), 1);
 		}
 
-		__atomic_store_n(&cs->loops, cs->loops + 1, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&cs->loops, cs->loops + 1, rte_memory_order_relaxed);
 	}
 
 	/* Switch off this core for all services, to ensure that future
@@ -526,7 +526,7 @@ struct core_state {
 	 * this store, ensuring that once this store is visible, the service
 	 * lcore thread really is done in service cores code.
 	 */
-	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_SEQ_CST);
+	rte_atomic_store_explicit(&cs->thread_active, 0, rte_memory_order_seq_cst);
 	return 0;
 }
 
@@ -539,8 +539,8 @@ struct core_state {
 	/* Load thread_active using ACQUIRE to avoid instructions dependent on
 	 * the result being re-ordered before this load completes.
 	 */
-	return __atomic_load_n(&lcore_states[lcore].thread_active,
-			       __ATOMIC_ACQUIRE);
+	return rte_atomic_load_explicit(&lcore_states[lcore].thread_active,
+			       rte_memory_order_acquire);
 }
 
 int32_t
@@ -646,13 +646,13 @@ struct core_state {
 
 		if (*set && !lcore_mapped) {
 			lcore_states[lcore].service_mask |= sid_mask;
-			__atomic_fetch_add(&rte_services[sid].num_mapped_cores,
-				1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&rte_services[sid].num_mapped_cores,
+				1, rte_memory_order_relaxed);
 		}
 		if (!*set && lcore_mapped) {
 			lcore_states[lcore].service_mask &= ~(sid_mask);
-			__atomic_fetch_sub(&rte_services[sid].num_mapped_cores,
-				1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_sub_explicit(&rte_services[sid].num_mapped_cores,
+				1, rte_memory_order_relaxed);
 		}
 	}
 
@@ -709,13 +709,13 @@ struct core_state {
 			 * store-release memory order here to synchronize
 			 * with load-acquire in runstate read functions.
 			 */
-			__atomic_store_n(&lcore_states[i].runstate,
-				RUNSTATE_STOPPED, __ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&lcore_states[i].runstate,
+				RUNSTATE_STOPPED, rte_memory_order_release);
 		}
 	}
 	for (i = 0; i < RTE_SERVICE_NUM_MAX; i++)
-		__atomic_store_n(&rte_services[i].num_mapped_cores, 0,
-			__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&rte_services[i].num_mapped_cores, 0,
+			rte_memory_order_relaxed);
 
 	return 0;
 }
@@ -735,8 +735,8 @@ struct core_state {
 	/* Use store-release memory order here to synchronize with
 	 * load-acquire in runstate read functions.
 	 */
-	__atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
-		__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
+		rte_memory_order_release);
 
 	return rte_eal_wait_lcore(lcore);
 }
@@ -755,7 +755,7 @@ struct core_state {
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	if (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) !=
+	if (rte_atomic_load_explicit(&cs->runstate, rte_memory_order_acquire) !=
 			RUNSTATE_STOPPED)
 		return -EBUSY;
 
@@ -779,7 +779,7 @@ struct core_state {
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	if (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) ==
+	if (rte_atomic_load_explicit(&cs->runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING)
 		return -EALREADY;
 
@@ -789,7 +789,7 @@ struct core_state {
 	/* Use load-acquire memory order here to synchronize with
 	 * store-release in runstate update functions.
 	 */
-	__atomic_store_n(&cs->runstate, RUNSTATE_RUNNING, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&cs->runstate, RUNSTATE_RUNNING, rte_memory_order_release);
 
 	rte_eal_trace_service_lcore_start(lcore);
 
@@ -808,7 +808,7 @@ struct core_state {
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	if (__atomic_load_n(&lcore_states[lcore].runstate, __ATOMIC_ACQUIRE) ==
+	if (rte_atomic_load_explicit(&lcore_states[lcore].runstate, rte_memory_order_acquire) ==
 			RUNSTATE_STOPPED)
 		return -EALREADY;
 
@@ -820,8 +820,8 @@ struct core_state {
 		int32_t enabled = service_mask & (UINT64_C(1) << i);
 		int32_t service_running = rte_service_runstate_get(i);
 		int32_t only_core = (1 ==
-			__atomic_load_n(&rte_services[i].num_mapped_cores,
-				__ATOMIC_RELAXED));
+			rte_atomic_load_explicit(&rte_services[i].num_mapped_cores,
+				rte_memory_order_relaxed));
 
 		/* if the core is mapped, and the service is running, and this
 		 * is the only core that is mapped, the service would cease to
@@ -834,8 +834,8 @@ struct core_state {
 	/* Use store-release memory order here to synchronize with
 	 * load-acquire in runstate read functions.
 	 */
-	__atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
-		__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
+		rte_memory_order_release);
 
 	rte_eal_trace_service_lcore_stop(lcore);
 
@@ -847,7 +847,7 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->loops, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->loops, rte_memory_order_relaxed);
 }
 
 static uint64_t
@@ -855,7 +855,7 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->cycles, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->cycles, rte_memory_order_relaxed);
 }
 
 static uint64_t
@@ -863,8 +863,8 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->service_stats[service_id].calls,
-		__ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].calls,
+		rte_memory_order_relaxed);
 }
 
 static uint64_t
@@ -872,8 +872,8 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->service_stats[service_id].cycles,
-		__ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].cycles,
+		rte_memory_order_relaxed);
 }
 
 typedef uint64_t (*lcore_attr_get_fun)(uint32_t service_id,
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 39a2868..568e06e 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -597,8 +597,8 @@ static void rte_eal_init_alert(const char *msg)
 		return -1;
 	}
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-					__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+					rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		rte_eal_init_alert("already called initialization.");
 		rte_errno = EALREADY;
 		return -1;
@@ -622,7 +622,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (fctret < 0) {
 		rte_eal_init_alert("Invalid 'command line' arguments.");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -636,20 +636,20 @@ static void rte_eal_init_alert(const char *msg)
 	if (eal_plugins_init() < 0) {
 		rte_eal_init_alert("Cannot init plugins");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
 	if (eal_trace_init() < 0) {
 		rte_eal_init_alert("Cannot init trace");
 		rte_errno = EFAULT;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
 	if (eal_option_device_parse()) {
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -683,7 +683,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (rte_bus_scan()) {
 		rte_eal_init_alert("Cannot scan the buses for devices");
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -736,7 +736,7 @@ static void rte_eal_init_alert(const char *msg)
 		if (ret < 0) {
 			rte_eal_init_alert("Cannot get hugepage information.");
 			rte_errno = EACCES;
-			__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 			return -1;
 		}
 	}
@@ -915,8 +915,8 @@ static void rte_eal_init_alert(const char *msg)
 	static uint32_t run_once;
 	uint32_t has_run = 0;
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+			rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		RTE_LOG(WARNING, EAL, "Already called cleanup\n");
 		rte_errno = EALREADY;
 		return -1;
diff --git a/lib/eal/include/rte_epoll.h b/lib/eal/include/rte_epoll.h
index 01525f5..ae0cf20 100644
--- a/lib/eal/include/rte_epoll.h
+++ b/lib/eal/include/rte_epoll.h
@@ -13,6 +13,7 @@
 
 #include <stdint.h>
 
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -38,7 +39,7 @@ enum {
 
 /** interrupt epoll event obj, taken by epoll_event.ptr */
 struct rte_epoll_event {
-	uint32_t status;           /**< OUT: event status */
+	RTE_ATOMIC(uint32_t) status;           /**< OUT: event status */
 	int fd;                    /**< OUT: event fd */
 	int epfd;       /**< OUT: epoll instance the ev associated with */
 	struct rte_epoll_data epdata;
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 5f4b2fb..57da058 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -967,7 +967,7 @@ static void rte_eal_init_alert(const char *msg)
 rte_eal_init(int argc, char **argv)
 {
 	int i, fctret, ret;
-	static uint32_t run_once;
+	static RTE_ATOMIC(uint32_t) run_once;
 	uint32_t has_run = 0;
 	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
 	char thread_name[RTE_THREAD_NAME_SIZE];
@@ -983,8 +983,8 @@ static void rte_eal_init_alert(const char *msg)
 		return -1;
 	}
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-					__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+					rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		rte_eal_init_alert("already called initialization.");
 		rte_errno = EALREADY;
 		return -1;
@@ -1008,14 +1008,14 @@ static void rte_eal_init_alert(const char *msg)
 	if (fctret < 0) {
 		rte_eal_init_alert("Invalid 'command line' arguments.");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
 	if (eal_plugins_init() < 0) {
 		rte_eal_init_alert("Cannot init plugins");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1027,7 +1027,7 @@ static void rte_eal_init_alert(const char *msg)
 
 	if (eal_option_device_parse()) {
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1061,7 +1061,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (rte_bus_scan()) {
 		rte_eal_init_alert("Cannot scan the buses for devices");
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1125,7 +1125,7 @@ static void rte_eal_init_alert(const char *msg)
 		if (ret < 0) {
 			rte_eal_init_alert("Cannot get hugepage information.");
 			rte_errno = EACCES;
-			__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 			return -1;
 		}
 	}
@@ -1150,7 +1150,7 @@ static void rte_eal_init_alert(const char *msg)
 			 internal_conf->syslog_facility) < 0) {
 		rte_eal_init_alert("Cannot init logging.");
 		rte_errno = ENOMEM;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1158,7 +1158,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (rte_eal_vfio_setup() < 0) {
 		rte_eal_init_alert("Cannot init VFIO");
 		rte_errno = EAGAIN;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 #endif
@@ -1345,11 +1345,11 @@ static void rte_eal_init_alert(const char *msg)
 int
 rte_eal_cleanup(void)
 {
-	static uint32_t run_once;
+	static RTE_ATOMIC(uint32_t) run_once;
 	uint32_t has_run = 0;
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-					__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+					rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		RTE_LOG(WARNING, EAL, "Already called cleanup\n");
 		rte_errno = EALREADY;
 		return -1;
diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index 24fff3d..d4919df 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -1266,9 +1266,9 @@ struct rte_intr_source {
 		 * ordering below acting as a lock to synchronize
 		 * the event data updating.
 		 */
-		if (!rev || !__atomic_compare_exchange_n(&rev->status,
-				    &valid_status, RTE_EPOLL_EXEC, 0,
-				    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+		if (!rev || !rte_atomic_compare_exchange_strong_explicit(&rev->status,
+				    &valid_status, RTE_EPOLL_EXEC,
+				    rte_memory_order_acquire, rte_memory_order_relaxed))
 			continue;
 
 		events[count].status        = RTE_EPOLL_VALID;
@@ -1283,8 +1283,8 @@ struct rte_intr_source {
 		/* the status update should be observed after
 		 * the other fields change.
 		 */
-		__atomic_store_n(&rev->status, RTE_EPOLL_VALID,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&rev->status, RTE_EPOLL_VALID,
+				rte_memory_order_release);
 		count++;
 	}
 	return count;
@@ -1374,10 +1374,10 @@ struct rte_intr_source {
 {
 	uint32_t valid_status = RTE_EPOLL_VALID;
 
-	while (!__atomic_compare_exchange_n(&ev->status, &valid_status,
-		    RTE_EPOLL_INVALID, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
-		while (__atomic_load_n(&ev->status,
-				__ATOMIC_RELAXED) != RTE_EPOLL_VALID)
+	while (!rte_atomic_compare_exchange_strong_explicit(&ev->status, &valid_status,
+		    RTE_EPOLL_INVALID, rte_memory_order_acquire, rte_memory_order_relaxed)) {
+		while (rte_atomic_load_explicit(&ev->status,
+				rte_memory_order_relaxed) != RTE_EPOLL_VALID)
 			rte_pause();
 		valid_status = RTE_EPOLL_VALID;
 	}
@@ -1402,8 +1402,8 @@ struct rte_intr_source {
 		epfd = rte_intr_tls_epfd();
 
 	if (op == EPOLL_CTL_ADD) {
-		__atomic_store_n(&event->status, RTE_EPOLL_VALID,
-				__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&event->status, RTE_EPOLL_VALID,
+				rte_memory_order_relaxed);
 		event->fd = fd;  /* ignore fd in event */
 		event->epfd = epfd;
 		ev.data.ptr = (void *)event;
@@ -1415,13 +1415,13 @@ struct rte_intr_source {
 			op, fd, strerror(errno));
 		if (op == EPOLL_CTL_ADD)
 			/* rollback status when CTL_ADD fail */
-			__atomic_store_n(&event->status, RTE_EPOLL_INVALID,
-					__ATOMIC_RELAXED);
+			rte_atomic_store_explicit(&event->status, RTE_EPOLL_INVALID,
+					rte_memory_order_relaxed);
 		return -1;
 	}
 
-	if (op == EPOLL_CTL_DEL && __atomic_load_n(&event->status,
-			__ATOMIC_RELAXED) != RTE_EPOLL_INVALID)
+	if (op == EPOLL_CTL_DEL && rte_atomic_load_explicit(&event->status,
+			rte_memory_order_relaxed) != RTE_EPOLL_INVALID)
 		eal_epoll_data_safe_free(event);
 
 	return 0;
@@ -1450,8 +1450,8 @@ struct rte_intr_source {
 	case RTE_INTR_EVENT_ADD:
 		epfd_op = EPOLL_CTL_ADD;
 		rev = rte_intr_elist_index_get(intr_handle, efd_idx);
-		if (__atomic_load_n(&rev->status,
-				__ATOMIC_RELAXED) != RTE_EPOLL_INVALID) {
+		if (rte_atomic_load_explicit(&rev->status,
+				rte_memory_order_relaxed) != RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event already been added.\n");
 			return -EEXIST;
 		}
@@ -1474,8 +1474,8 @@ struct rte_intr_source {
 	case RTE_INTR_EVENT_DEL:
 		epfd_op = EPOLL_CTL_DEL;
 		rev = rte_intr_elist_index_get(intr_handle, efd_idx);
-		if (__atomic_load_n(&rev->status,
-				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID) {
+		if (rte_atomic_load_explicit(&rev->status,
+				rte_memory_order_relaxed) == RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event does not exist.\n");
 			return -EPERM;
 		}
@@ -1500,8 +1500,8 @@ struct rte_intr_source {
 
 	for (i = 0; i < (uint32_t)rte_intr_nb_efd_get(intr_handle); i++) {
 		rev = rte_intr_elist_index_get(intr_handle, i);
-		if (__atomic_load_n(&rev->status,
-				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID)
+		if (rte_atomic_load_explicit(&rev->status,
+				rte_memory_order_relaxed) == RTE_EPOLL_INVALID)
 			continue;
 		if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
 			/* force free if the entry valid */
diff --git a/lib/eal/ppc/include/rte_atomic.h b/lib/eal/ppc/include/rte_atomic.h
index 7382412..645c713 100644
--- a/lib/eal/ppc/include/rte_atomic.h
+++ b/lib/eal/ppc/include/rte_atomic.h
@@ -48,7 +48,7 @@
 static inline int
 rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
 {
-	return __atomic_compare_exchange(dst, &exp, &src, 0, rte_memory_order_acquire,
+	return rte_atomic_compare_exchange_strong_explicit(dst, &exp, src, rte_memory_order_acquire,
 		rte_memory_order_acquire) ? 1 : 0;
 }
 
@@ -90,7 +90,7 @@ static inline int rte_atomic16_dec_and_test(rte_atomic16_t *v)
 static inline int
 rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
 {
-	return __atomic_compare_exchange(dst, &exp, &src, 0, rte_memory_order_acquire,
+	return rte_atomic_compare_exchange_strong_explicit(dst, &exp, src, rte_memory_order_acquire,
 		rte_memory_order_acquire) ? 1 : 0;
 }
 
@@ -132,7 +132,7 @@ static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
 static inline int
 rte_atomic64_cmpset(volatile uint64_t *dst, uint64_t exp, uint64_t src)
 {
-	return __atomic_compare_exchange(dst, &exp, &src, 0, rte_memory_order_acquire,
+	return rte_atomic_compare_exchange_strong_explicit(dst, &exp, src, rte_memory_order_acquire,
 		rte_memory_order_acquire) ? 1 : 0;
 }
 
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index acf6484..145ac4b 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -9,6 +9,7 @@
 #include <rte_eal.h>
 #include <rte_common.h>
 #include <rte_errno.h>
+#include <rte_stdatomic.h>
 #include <rte_thread.h>
 
 #include "eal_windows.h"
@@ -19,7 +20,7 @@ struct eal_tls_key {
 
 struct thread_routine_ctx {
 	rte_thread_func thread_func;
-	bool thread_init_failed;
+	RTE_ATOMIC(bool) thread_init_failed;
 	void *routine_args;
 };
 
@@ -168,7 +169,8 @@ struct thread_routine_ctx {
 thread_func_wrapper(void *arg)
 {
 	struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)arg;
-	const bool thread_exit = __atomic_load_n(&ctx.thread_init_failed, __ATOMIC_ACQUIRE);
+	const bool thread_exit = rte_atomic_load_explicit(
+		&ctx.thread_init_failed, rte_memory_order_acquire);
 
 	free(arg);
 
@@ -237,7 +239,7 @@ struct thread_routine_ctx {
 	}
 
 resume_thread:
-	__atomic_store_n(&ctx->thread_init_failed, thread_exit, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ctx->thread_init_failed, thread_exit, rte_memory_order_release);
 
 	if (ResumeThread(thread_handle) == (DWORD)-1) {
 		ret = thread_log_last_error("ResumeThread()");
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread
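
A note on the compare-exchange conversions in this patch: the builtin's
trailing 'weak' flag has no counterpart in the wrapper, which spells
strong vs. weak in its name, while expected/desired keep the C11
convention (expected by pointer, desired by value). Below is a minimal
sketch of the same run-once idiom, using a hypothetical try_claim()
helper rather than the eal code itself:

    #include <stdbool.h>
    #include <stdint.h>

    #include <rte_stdatomic.h>

    static RTE_ATOMIC(uint32_t) run_once;

    /* Succeeds for exactly one caller; later callers find run_once set. */
    static bool
    try_claim(void)
    {
            uint32_t expected = 0;

            /*
             * Previously:
             *   __atomic_compare_exchange_n(&run_once, &expected, 1, 0,
             *                               __ATOMIC_RELAXED, __ATOMIC_RELAXED);
             * The trailing 0 (weak) argument is gone; the strong variant is
             * selected by the function name.
             */
            return rte_atomic_compare_exchange_strong_explicit(&run_once,
                            &expected, 1,
                            rte_memory_order_relaxed, rte_memory_order_relaxed);
    }

As with the builtin, a failed exchange writes the currently stored value
back into 'expected'; the run-once sites above do not rely on that.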

* [PATCH v2 04/19] eventdev: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (2 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 03/19] eal: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 05/19] gpudev: " Tyler Retzlaff
                     ` (15 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 drivers/event/cnxk/cnxk_tim_worker.h   |  4 +--
 lib/eventdev/rte_event_timer_adapter.c | 66 +++++++++++++++++-----------------
 lib/eventdev/rte_event_timer_adapter.h |  2 +-
 3 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/event/cnxk/cnxk_tim_worker.h b/drivers/event/cnxk/cnxk_tim_worker.h
index f0857f2..f530d8c 100644
--- a/drivers/event/cnxk/cnxk_tim_worker.h
+++ b/drivers/event/cnxk/cnxk_tim_worker.h
@@ -314,7 +314,7 @@
 
 	tim->impl_opaque[0] = (uintptr_t)chunk;
 	tim->impl_opaque[1] = (uintptr_t)bkt;
-	__atomic_store_n(&tim->state, RTE_EVENT_TIMER_ARMED, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->state, RTE_EVENT_TIMER_ARMED, rte_memory_order_release);
 	cnxk_tim_bkt_inc_nent(bkt);
 	cnxk_tim_bkt_dec_lock_relaxed(bkt);
 
@@ -425,7 +425,7 @@
 
 	tim->impl_opaque[0] = (uintptr_t)chunk;
 	tim->impl_opaque[1] = (uintptr_t)bkt;
-	__atomic_store_n(&tim->state, RTE_EVENT_TIMER_ARMED, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->state, RTE_EVENT_TIMER_ARMED, rte_memory_order_release);
 	cnxk_tim_bkt_inc_nent(bkt);
 	cnxk_tim_bkt_dec_lock_relaxed(bkt);
 
diff --git a/lib/eventdev/rte_event_timer_adapter.c b/lib/eventdev/rte_event_timer_adapter.c
index 427c4c6..2746670 100644
--- a/lib/eventdev/rte_event_timer_adapter.c
+++ b/lib/eventdev/rte_event_timer_adapter.c
@@ -630,12 +630,12 @@ struct swtim {
 	uint32_t timer_data_id;
 	/* Track which cores have actually armed a timer */
 	struct {
-		uint16_t v;
+		RTE_ATOMIC(uint16_t) v;
 	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
 	/* Track which cores' timer lists should be polled */
-	unsigned int poll_lcores[RTE_MAX_LCORE];
+	RTE_ATOMIC(unsigned int) poll_lcores[RTE_MAX_LCORE];
 	/* The number of lists that should be polled */
-	int n_poll_lcores;
+	RTE_ATOMIC(int) n_poll_lcores;
 	/* Timers which have expired and can be returned to a mempool */
 	struct rte_timer *expired_timers[EXP_TIM_BUF_SZ];
 	/* The number of timers that can be returned to a mempool */
@@ -669,10 +669,10 @@ struct swtim {
 
 	if (unlikely(sw->in_use[lcore].v == 0)) {
 		sw->in_use[lcore].v = 1;
-		n_lcores = __atomic_fetch_add(&sw->n_poll_lcores, 1,
-					     __ATOMIC_RELAXED);
-		__atomic_store_n(&sw->poll_lcores[n_lcores], lcore,
-				__ATOMIC_RELAXED);
+		n_lcores = rte_atomic_fetch_add_explicit(&sw->n_poll_lcores, 1,
+					     rte_memory_order_relaxed);
+		rte_atomic_store_explicit(&sw->poll_lcores[n_lcores], lcore,
+				rte_memory_order_relaxed);
 	}
 
 	ret = event_buffer_add(&sw->buffer, &evtim->ev);
@@ -719,8 +719,8 @@ struct swtim {
 		sw->stats.evtim_exp_count++;
 
 		if (type == SINGLE)
-			__atomic_store_n(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
+				rte_memory_order_release);
 	}
 
 	if (event_buffer_batch_ready(&sw->buffer)) {
@@ -846,7 +846,7 @@ struct swtim {
 
 	if (swtim_did_tick(sw)) {
 		rte_timer_alt_manage(sw->timer_data_id,
-				     sw->poll_lcores,
+				     (unsigned int *)(uintptr_t)sw->poll_lcores,
 				     sw->n_poll_lcores,
 				     swtim_callback);
 
@@ -1027,7 +1027,7 @@ struct swtim {
 
 	/* Free outstanding timers */
 	rte_timer_stop_all(sw->timer_data_id,
-			   sw->poll_lcores,
+			   (unsigned int *)(uintptr_t)sw->poll_lcores,
 			   sw->n_poll_lcores,
 			   swtim_free_tim,
 			   sw);
@@ -1142,7 +1142,7 @@ struct swtim {
 	uint64_t cur_cycles;
 
 	/* Check that timer is armed */
-	n_state = __atomic_load_n(&evtim->state, __ATOMIC_ACQUIRE);
+	n_state = rte_atomic_load_explicit(&evtim->state, rte_memory_order_acquire);
 	if (n_state != RTE_EVENT_TIMER_ARMED)
 		return -EINVAL;
 
@@ -1201,15 +1201,15 @@ struct swtim {
 	 * The atomic compare-and-swap operation can prevent the race condition
 	 * on in_use flag between multiple non-EAL threads.
 	 */
-	if (unlikely(__atomic_compare_exchange_n(&sw->in_use[lcore_id].v,
-			&exp_state, 1, 0,
-			__ATOMIC_RELAXED, __ATOMIC_RELAXED))) {
+	if (unlikely(rte_atomic_compare_exchange_strong_explicit(&sw->in_use[lcore_id].v,
+			&exp_state, 1,
+			rte_memory_order_relaxed, rte_memory_order_relaxed))) {
 		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
 			      lcore_id);
-		n_lcores = __atomic_fetch_add(&sw->n_poll_lcores, 1,
-					     __ATOMIC_RELAXED);
-		__atomic_store_n(&sw->poll_lcores[n_lcores], lcore_id,
-				__ATOMIC_RELAXED);
+		n_lcores = rte_atomic_fetch_add_explicit(&sw->n_poll_lcores, 1,
+					     rte_memory_order_relaxed);
+		rte_atomic_store_explicit(&sw->poll_lcores[n_lcores], lcore_id,
+				rte_memory_order_relaxed);
 	}
 
 	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
@@ -1223,7 +1223,7 @@ struct swtim {
 	type = get_timer_type(adapter);
 
 	for (i = 0; i < nb_evtims; i++) {
-		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		n_state = rte_atomic_load_explicit(&evtims[i]->state, rte_memory_order_acquire);
 		if (n_state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
@@ -1235,9 +1235,9 @@ struct swtim {
 
 		if (unlikely(check_destination_event_queue(evtims[i],
 							   adapter) < 0)) {
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed);
 			rte_errno = EINVAL;
 			break;
 		}
@@ -1250,15 +1250,15 @@ struct swtim {
 
 		ret = get_timeout_cycles(evtims[i], adapter, &cycles);
 		if (unlikely(ret == -1)) {
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR_TOOLATE,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed);
 			rte_errno = EINVAL;
 			break;
 		} else if (unlikely(ret == -2)) {
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR_TOOEARLY,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed);
 			rte_errno = EINVAL;
 			break;
 		}
@@ -1267,9 +1267,9 @@ struct swtim {
 					  type, lcore_id, NULL, evtims[i]);
 		if (ret < 0) {
 			/* tim was in RUNNING or CONFIG state */
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR,
-					__ATOMIC_RELEASE);
+					rte_memory_order_release);
 			break;
 		}
 
@@ -1277,8 +1277,8 @@ struct swtim {
 		/* RELEASE ordering guarantees the adapter specific value
 		 * changes observed before the update of state.
 		 */
-		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
+				rte_memory_order_release);
 	}
 
 	if (i < nb_evtims)
@@ -1320,7 +1320,7 @@ struct swtim {
 		/* ACQUIRE ordering guarantees the access of implementation
 		 * specific opaque data under the correct state.
 		 */
-		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		n_state = rte_atomic_load_explicit(&evtims[i]->state, rte_memory_order_acquire);
 		if (n_state == RTE_EVENT_TIMER_CANCELED) {
 			rte_errno = EALREADY;
 			break;
@@ -1346,8 +1346,8 @@ struct swtim {
 		 * to make sure the state update data observed between
 		 * threads.
 		 */
-		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
+				rte_memory_order_release);
 	}
 
 	return i;
diff --git a/lib/eventdev/rte_event_timer_adapter.h b/lib/eventdev/rte_event_timer_adapter.h
index fbdddf8..49e646a 100644
--- a/lib/eventdev/rte_event_timer_adapter.h
+++ b/lib/eventdev/rte_event_timer_adapter.h
@@ -498,7 +498,7 @@ struct rte_event_timer {
 	 * implementation specific values to share between the arm and cancel
 	 * operations.  The application should not modify this field.
 	 */
-	enum rte_event_timer_state state;
+	RTE_ATOMIC(enum rte_event_timer_state) state;
 	/**< State of the event timer. */
 	uint8_t user_meta[];
 	/**< Memory to store user specific metadata.
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread
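
Besides the one-for-one replacements, this patch adds
(unsigned int *)(uintptr_t) casts where sw->poll_lcores is passed to
rte_timer_alt_manage() and rte_timer_stop_all(): once the array elements
become RTE_ATOMIC(unsigned int), their type no longer matches a plain
unsigned int * parameter when the optional stdatomic mode is enabled.
A reduced sketch of that situation, with a hypothetical legacy_poll()
standing in for the unconverted prototype:

    #include <stdint.h>

    #include <rte_stdatomic.h>

    /* Stand-in for an API that still takes plain pointers. */
    static void
    legacy_poll(unsigned int *ids, int n)
    {
            (void)ids;
            (void)n;
    }

    static RTE_ATOMIC(unsigned int) poll_ids[8];
    static RTE_ATOMIC(int) n_poll_ids;

    static void
    add_and_poll(unsigned int id)
    {
            /* Relaxed append, as in the lcore tracking above: fetch_add
             * returns the old count, which is the slot to store into.
             */
            int idx = rte_atomic_fetch_add_explicit(&n_poll_ids, 1,
                            rte_memory_order_relaxed);
            rte_atomic_store_explicit(&poll_ids[idx], id,
                            rte_memory_order_relaxed);

            /* With stdatomic enabled the elements carry the _Atomic
             * qualifier, so an explicit cast is needed to satisfy the
             * legacy prototype, as done for the timer calls above.
             */
            legacy_poll((unsigned int *)(uintptr_t)poll_ids, n_poll_ids);
    }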

* [PATCH v2 05/19] gpudev: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (3 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 04/19] eventdev: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 06/19] ipsec: " Tyler Retzlaff
                     ` (14 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/gpudev/gpudev.c        | 6 +++---
 lib/gpudev/gpudev_driver.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 8f12abe..6845d18 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -228,7 +228,7 @@ struct rte_gpu *
 	dev->mpshared->info.numa_node = -1;
 	dev->mpshared->info.parent = RTE_GPU_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
-	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&dev->mpshared->process_refcnt, 1, rte_memory_order_relaxed);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -277,7 +277,7 @@ struct rte_gpu *
 
 	TAILQ_INIT(&dev->callbacks);
 	dev->mpshared = shared_dev;
-	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&dev->mpshared->process_refcnt, 1, rte_memory_order_relaxed);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "attached device %s (id %d) of total %d",
@@ -340,7 +340,7 @@ struct rte_gpu *
 
 	gpu_free_callbacks(dev);
 	dev->process_state = RTE_GPU_STATE_UNUSED;
-	__atomic_fetch_sub(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_sub_explicit(&dev->mpshared->process_refcnt, 1, rte_memory_order_relaxed);
 	gpu_count--;
 
 	return 0;
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 42898c7..0b1e7f2 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -69,7 +69,7 @@ struct rte_gpu_mpshared {
 	/* Device info structure. */
 	struct rte_gpu_info info;
 	/* Counter of processes using the device. */
-	uint16_t process_refcnt; /* Updated by this library. */
+	RTE_ATOMIC(uint16_t) process_refcnt; /* Updated by this library. */
 };
 
 struct rte_gpu {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 06/19] ipsec: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (4 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 05/19] gpudev: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-24  8:45     ` Konstantin Ananyev
  2023-10-17 20:31   ` [PATCH v2 07/19] mbuf: " Tyler Retzlaff
                     ` (13 subsequent siblings)
  19 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/ipsec/ipsec_sqn.h | 2 +-
 lib/ipsec/sa.h        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/ipsec/ipsec_sqn.h b/lib/ipsec/ipsec_sqn.h
index 505950e..984a9dd 100644
--- a/lib/ipsec/ipsec_sqn.h
+++ b/lib/ipsec/ipsec_sqn.h
@@ -128,7 +128,7 @@
 
 	n = *num;
 	if (SQN_ATOMIC(sa))
-		sqn = __atomic_fetch_add(&sa->sqn.outb, n, __ATOMIC_RELAXED) + n;
+		sqn = rte_atomic_fetch_add_explicit(&sa->sqn.outb, n, rte_memory_order_relaxed) + n;
 	else {
 		sqn = sa->sqn.outb + n;
 		sa->sqn.outb = sqn;
diff --git a/lib/ipsec/sa.h b/lib/ipsec/sa.h
index ce4af8c..4b30bea 100644
--- a/lib/ipsec/sa.h
+++ b/lib/ipsec/sa.h
@@ -124,7 +124,7 @@ struct rte_ipsec_sa {
 	 * place from other frequently accessed data.
 	 */
 	union {
-		uint64_t outb;
+		RTE_ATOMIC(uint64_t) outb;
 		struct {
 			uint32_t rdidx; /* read index */
 			uint32_t wridx; /* write index */
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread
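
The outbound SQN update above is a block reservation: a single relaxed
fetch-add claims n values at once, and the returned pre-increment value
plus n is the last number of the claimed range. The same idiom in
isolation, with hypothetical names:

    #include <stdint.h>

    #include <rte_stdatomic.h>

    static RTE_ATOMIC(uint64_t) next_sqn;

    /* Claim 'n' consecutive sequence numbers and return the last one of
     * the claimed block, mirroring the "+ n" adjustment in the patched
     * helper.
     */
    static uint64_t
    sqn_claim(uint32_t n)
    {
            return rte_atomic_fetch_add_explicit(&next_sqn, n,
                            rte_memory_order_relaxed) + n;
    }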

* [PATCH v2 07/19] mbuf: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (5 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 06/19] ipsec: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-24  8:46     ` Konstantin Ananyev
  2023-10-17 20:31   ` [PATCH v2 08/19] mempool: " Tyler Retzlaff
                     ` (12 subsequent siblings)
  19 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/mbuf/rte_mbuf.h      | 20 ++++++++++----------
 lib/mbuf/rte_mbuf_core.h |  5 +++--
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 913c459..b8ab477 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -361,7 +361,7 @@ struct rte_pktmbuf_pool_private {
 static inline uint16_t
 rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 {
-	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&m->refcnt, rte_memory_order_relaxed);
 }
 
 /**
@@ -374,15 +374,15 @@ struct rte_pktmbuf_pool_private {
 static inline void
 rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 {
-	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&m->refcnt, new_value, rte_memory_order_relaxed);
 }
 
 /* internal */
 static inline uint16_t
 __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
-	return __atomic_fetch_add(&m->refcnt, value,
-				 __ATOMIC_ACQ_REL) + value;
+	return rte_atomic_fetch_add_explicit(&m->refcnt, value,
+				 rte_memory_order_acq_rel) + value;
 }
 
 /**
@@ -463,7 +463,7 @@ struct rte_pktmbuf_pool_private {
 static inline uint16_t
 rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
 {
-	return __atomic_load_n(&shinfo->refcnt, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&shinfo->refcnt, rte_memory_order_relaxed);
 }
 
 /**
@@ -478,7 +478,7 @@ struct rte_pktmbuf_pool_private {
 rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
 	uint16_t new_value)
 {
-	__atomic_store_n(&shinfo->refcnt, new_value, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&shinfo->refcnt, new_value, rte_memory_order_relaxed);
 }
 
 /**
@@ -502,8 +502,8 @@ struct rte_pktmbuf_pool_private {
 		return (uint16_t)value;
 	}
 
-	return __atomic_fetch_add(&shinfo->refcnt, value,
-				 __ATOMIC_ACQ_REL) + value;
+	return rte_atomic_fetch_add_explicit(&shinfo->refcnt, value,
+				 rte_memory_order_acq_rel) + value;
 }
 
 /** Mbuf prefetch */
@@ -1315,8 +1315,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	 * Direct usage of add primitive to avoid
 	 * duplication of comparing with one.
 	 */
-	if (likely(__atomic_fetch_add(&shinfo->refcnt, -1,
-				     __ATOMIC_ACQ_REL) - 1))
+	if (likely(rte_atomic_fetch_add_explicit(&shinfo->refcnt, -1,
+				     rte_memory_order_acq_rel) - 1))
 		return 1;
 
 	/* Reinitialize counter before mbuf freeing. */
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index e9bc0d1..5688683 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -19,6 +19,7 @@
 #include <stdint.h>
 
 #include <rte_byteorder.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -497,7 +498,7 @@ struct rte_mbuf {
 	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
 	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
 	 */
-	uint16_t refcnt;
+	RTE_ATOMIC(uint16_t) refcnt;
 
 	/**
 	 * Number of segments. Only valid for the first segment of an mbuf
@@ -674,7 +675,7 @@ struct rte_mbuf {
 struct rte_mbuf_ext_shared_info {
 	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
 	void *fcb_opaque;                        /**< Free callback argument */
-	uint16_t refcnt;
+	RTE_ATOMIC(uint16_t) refcnt;
 };
 
 /** Maximum number of nb_segs allowed. */
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread
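
The refcount conversion keeps the existing behaviour: plain reads and
sets stay relaxed, and updates use a single acq_rel fetch-add whose
pre-increment result plus 'value' yields the new count. A standalone
sketch of the pattern on a hypothetical object (not the mbuf code
itself):

    #include <stdint.h>

    #include <rte_stdatomic.h>

    struct obj {
            RTE_ATOMIC(uint16_t) refcnt;
    };

    static inline uint16_t
    obj_refcnt_read(const struct obj *o)
    {
            return rte_atomic_load_explicit(&o->refcnt,
                            rte_memory_order_relaxed);
    }

    static inline uint16_t
    obj_refcnt_update(struct obj *o, int16_t value)
    {
            /* fetch_add returns the old count; add 'value' for the new one. */
            return rte_atomic_fetch_add_explicit(&o->refcnt, value,
                            rte_memory_order_acq_rel) + value;
    }

    static inline int
    obj_put(struct obj *o)
    {
            /* non-zero when this call dropped the last reference */
            return obj_refcnt_update(o, -1) == 0;
    }

The acq_rel ordering on the update and the relaxed reads match the
memory orders the patch carries over from the original builtins.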

* [PATCH v2 08/19] mempool: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (6 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 07/19] mbuf: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-24  8:47     ` Konstantin Ananyev
  2023-10-17 20:31   ` [PATCH v2 09/19] rcu: " Tyler Retzlaff
                     ` (11 subsequent siblings)
  19 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/mempool/rte_mempool.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index f70bf36..df87cd2 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -327,8 +327,8 @@ struct rte_mempool {
 		if (likely(__lcore_id < RTE_MAX_LCORE))                         \
 			(mp)->stats[__lcore_id].name += (n);                    \
 		else                                                            \
-			__atomic_fetch_add(&((mp)->stats[RTE_MAX_LCORE].name),  \
-					   (n), __ATOMIC_RELAXED);              \
+			rte_atomic_fetch_add_explicit(&((mp)->stats[RTE_MAX_LCORE].name),  \
+					   (n), rte_memory_order_relaxed);              \
 	} while (0)
 #else
 #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 09/19] rcu: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (7 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 08/19] mempool: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-25  9:41     ` Ruifeng Wang
  2023-10-17 20:31   ` [PATCH v2 10/19] pdump: " Tyler Retzlaff
                     ` (10 subsequent siblings)
  19 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/rcu/rte_rcu_qsbr.c | 48 +++++++++++++++++------------------
 lib/rcu/rte_rcu_qsbr.h | 68 +++++++++++++++++++++++++-------------------------
 2 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c
index 17be93e..4dc7714 100644
--- a/lib/rcu/rte_rcu_qsbr.c
+++ b/lib/rcu/rte_rcu_qsbr.c
@@ -102,21 +102,21 @@
 	 * go out of sync. Hence, additional checks are required.
 	 */
 	/* Check if the thread is already registered */
-	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_RELAXED);
+	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_relaxed);
 	if (old_bmap & 1UL << id)
 		return 0;
 
 	do {
 		new_bmap = old_bmap | (1UL << id);
-		success = __atomic_compare_exchange(
+		success = rte_atomic_compare_exchange_strong_explicit(
 					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					&old_bmap, &new_bmap, 0,
-					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
+					&old_bmap, new_bmap,
+					rte_memory_order_release, rte_memory_order_relaxed);
 
 		if (success)
-			__atomic_fetch_add(&v->num_threads,
-						1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&v->num_threads,
+						1, rte_memory_order_relaxed);
 		else if (old_bmap & (1UL << id))
 			/* Someone else registered this thread.
 			 * Counter should not be incremented.
@@ -154,8 +154,8 @@
 	 * go out of sync. Hence, additional checks are required.
 	 */
 	/* Check if the thread is already unregistered */
-	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_RELAXED);
+	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_relaxed);
 	if (!(old_bmap & (1UL << id)))
 		return 0;
 
@@ -165,14 +165,14 @@
 		 * completed before removal of the thread from the list of
 		 * reporting threads.
 		 */
-		success = __atomic_compare_exchange(
+		success = rte_atomic_compare_exchange_strong_explicit(
 					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					&old_bmap, &new_bmap, 0,
-					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
+					&old_bmap, new_bmap,
+					rte_memory_order_release, rte_memory_order_relaxed);
 
 		if (success)
-			__atomic_fetch_sub(&v->num_threads,
-						1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_sub_explicit(&v->num_threads,
+						1, rte_memory_order_relaxed);
 		else if (!(old_bmap & (1UL << id)))
 			/* Someone else unregistered this thread.
 			 * Counter should not be incremented.
@@ -227,8 +227,8 @@
 
 	fprintf(f, "  Registered thread IDs = ");
 	for (i = 0; i < v->num_elems; i++) {
-		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_ACQUIRE);
+		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_acquire);
 		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
 		while (bmap) {
 			t = __builtin_ctzl(bmap);
@@ -241,26 +241,26 @@
 	fprintf(f, "\n");
 
 	fprintf(f, "  Token = %" PRIu64 "\n",
-			__atomic_load_n(&v->token, __ATOMIC_ACQUIRE));
+			rte_atomic_load_explicit(&v->token, rte_memory_order_acquire));
 
 	fprintf(f, "  Least Acknowledged Token = %" PRIu64 "\n",
-			__atomic_load_n(&v->acked_token, __ATOMIC_ACQUIRE));
+			rte_atomic_load_explicit(&v->acked_token, rte_memory_order_acquire));
 
 	fprintf(f, "Quiescent State Counts for readers:\n");
 	for (i = 0; i < v->num_elems; i++) {
-		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_ACQUIRE);
+		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_acquire);
 		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
 		while (bmap) {
 			t = __builtin_ctzl(bmap);
 			fprintf(f, "thread ID = %u, count = %" PRIu64 ", lock count = %u\n",
 				id + t,
-				__atomic_load_n(
+				rte_atomic_load_explicit(
 					&v->qsbr_cnt[id + t].cnt,
-					__ATOMIC_RELAXED),
-				__atomic_load_n(
+					rte_memory_order_relaxed),
+				rte_atomic_load_explicit(
 					&v->qsbr_cnt[id + t].lock_cnt,
-					__ATOMIC_RELAXED));
+					rte_memory_order_relaxed));
 			bmap &= ~(1UL << t);
 		}
 	}
diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h
index 87e1b55..9f4aed2 100644
--- a/lib/rcu/rte_rcu_qsbr.h
+++ b/lib/rcu/rte_rcu_qsbr.h
@@ -63,11 +63,11 @@
  * Given thread id needs to be converted to index into the array and
  * the id within the array element.
  */
-#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
+#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(RTE_ATOMIC(uint64_t)) * 8)
 #define __RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
 	RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
 		__RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, RTE_CACHE_LINE_SIZE)
-#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
+#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t __rte_atomic *) \
 	((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)
 #define __RTE_QSBR_THRID_INDEX_SHIFT 6
 #define __RTE_QSBR_THRID_MASK 0x3f
@@ -75,13 +75,13 @@
 
 /* Worker thread counter */
 struct rte_rcu_qsbr_cnt {
-	uint64_t cnt;
+	RTE_ATOMIC(uint64_t) cnt;
 	/**< Quiescent state counter. Value 0 indicates the thread is offline
 	 *   64b counter is used to avoid adding more code to address
 	 *   counter overflow. Changing this to 32b would require additional
 	 *   changes to various APIs.
 	 */
-	uint32_t lock_cnt;
+	RTE_ATOMIC(uint32_t) lock_cnt;
 	/**< Lock counter. Used when RTE_LIBRTE_RCU_DEBUG is enabled */
 } __rte_cache_aligned;
 
@@ -97,16 +97,16 @@ struct rte_rcu_qsbr_cnt {
  * 2) Register thread ID array
  */
 struct rte_rcu_qsbr {
-	uint64_t token __rte_cache_aligned;
+	RTE_ATOMIC(uint64_t) token __rte_cache_aligned;
 	/**< Counter to allow for multiple concurrent quiescent state queries */
-	uint64_t acked_token;
+	RTE_ATOMIC(uint64_t) acked_token;
 	/**< Least token acked by all the threads in the last call to
 	 *   rte_rcu_qsbr_check API.
 	 */
 
 	uint32_t num_elems __rte_cache_aligned;
 	/**< Number of elements in the thread ID array */
-	uint32_t num_threads;
+	RTE_ATOMIC(uint32_t) num_threads;
 	/**< Number of threads currently using this QS variable */
 	uint32_t max_threads;
 	/**< Maximum number of threads using this QS variable */
@@ -311,13 +311,13 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * the following will not move down after the load of any shared
 	 * data structure.
 	 */
-	t = __atomic_load_n(&v->token, __ATOMIC_RELAXED);
+	t = rte_atomic_load_explicit(&v->token, rte_memory_order_relaxed);
 
-	/* __atomic_store_n(cnt, __ATOMIC_RELAXED) is used to ensure
+	/* rte_atomic_store_explicit(cnt, rte_memory_order_relaxed) is used to ensure
 	 * 'cnt' (64b) is accessed atomically.
 	 */
-	__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
-		t, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&v->qsbr_cnt[thread_id].cnt,
+		t, rte_memory_order_relaxed);
 
 	/* The subsequent load of the data structure should not
 	 * move above the store. Hence a store-load barrier
@@ -326,7 +326,7 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * writer might not see that the reader is online, even though
 	 * the reader is referencing the shared data structure.
 	 */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 }
 
 /**
@@ -362,8 +362,8 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * data structure can not move after this store.
 	 */
 
-	__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
-		__RTE_QSBR_CNT_THR_OFFLINE, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&v->qsbr_cnt[thread_id].cnt,
+		__RTE_QSBR_CNT_THR_OFFLINE, rte_memory_order_release);
 }
 
 /**
@@ -394,8 +394,8 @@ struct rte_rcu_qsbr_dq_parameters {
 
 #if defined(RTE_LIBRTE_RCU_DEBUG)
 	/* Increment the lock counter */
-	__atomic_fetch_add(&v->qsbr_cnt[thread_id].lock_cnt,
-				1, __ATOMIC_ACQUIRE);
+	rte_atomic_fetch_add_explicit(&v->qsbr_cnt[thread_id].lock_cnt,
+				1, rte_memory_order_acquire);
 #endif
 }
 
@@ -427,8 +427,8 @@ struct rte_rcu_qsbr_dq_parameters {
 
 #if defined(RTE_LIBRTE_RCU_DEBUG)
 	/* Decrement the lock counter */
-	__atomic_fetch_sub(&v->qsbr_cnt[thread_id].lock_cnt,
-				1, __ATOMIC_RELEASE);
+	rte_atomic_fetch_sub_explicit(&v->qsbr_cnt[thread_id].lock_cnt,
+				1, rte_memory_order_release);
 
 	__RTE_RCU_IS_LOCK_CNT_ZERO(v, thread_id, WARNING,
 				"Lock counter %u. Nested locks?\n",
@@ -461,7 +461,7 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * structure are visible to the workers before the token
 	 * update is visible.
 	 */
-	t = __atomic_fetch_add(&v->token, 1, __ATOMIC_RELEASE) + 1;
+	t = rte_atomic_fetch_add_explicit(&v->token, 1, rte_memory_order_release) + 1;
 
 	return t;
 }
@@ -493,16 +493,16 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * Later loads of the shared data structure should not move
 	 * above this load. Hence, use load-acquire.
 	 */
-	t = __atomic_load_n(&v->token, __ATOMIC_ACQUIRE);
+	t = rte_atomic_load_explicit(&v->token, rte_memory_order_acquire);
 
 	/* Check if there are updates available from the writer.
 	 * Inform the writer that updates are visible to this reader.
 	 * Prior loads of the shared data structure should not move
 	 * beyond this store. Hence use store-release.
 	 */
-	if (t != __atomic_load_n(&v->qsbr_cnt[thread_id].cnt, __ATOMIC_RELAXED))
-		__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
-					 t, __ATOMIC_RELEASE);
+	if (t != rte_atomic_load_explicit(&v->qsbr_cnt[thread_id].cnt, rte_memory_order_relaxed))
+		rte_atomic_store_explicit(&v->qsbr_cnt[thread_id].cnt,
+					 t, rte_memory_order_release);
 
 	__RTE_RCU_DP_LOG(DEBUG, "%s: update: token = %" PRIu64 ", Thread ID = %d",
 		__func__, t, thread_id);
@@ -517,7 +517,7 @@ struct rte_rcu_qsbr_dq_parameters {
 	uint32_t i, j, id;
 	uint64_t bmap;
 	uint64_t c;
-	uint64_t *reg_thread_id;
+	RTE_ATOMIC(uint64_t) *reg_thread_id;
 	uint64_t acked_token = __RTE_QSBR_CNT_MAX;
 
 	for (i = 0, reg_thread_id = __RTE_QSBR_THRID_ARRAY_ELM(v, 0);
@@ -526,7 +526,7 @@ struct rte_rcu_qsbr_dq_parameters {
 		/* Load the current registered thread bit map before
 		 * loading the reader thread quiescent state counters.
 		 */
-		bmap = __atomic_load_n(reg_thread_id, __ATOMIC_ACQUIRE);
+		bmap = rte_atomic_load_explicit(reg_thread_id, rte_memory_order_acquire);
 		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
 
 		while (bmap) {
@@ -534,9 +534,9 @@ struct rte_rcu_qsbr_dq_parameters {
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: check: token = %" PRIu64 ", wait = %d, Bit Map = 0x%" PRIx64 ", Thread ID = %d",
 				__func__, t, wait, bmap, id + j);
-			c = __atomic_load_n(
+			c = rte_atomic_load_explicit(
 					&v->qsbr_cnt[id + j].cnt,
-					__ATOMIC_ACQUIRE);
+					rte_memory_order_acquire);
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: status: token = %" PRIu64 ", wait = %d, Thread QS cnt = %" PRIu64 ", Thread ID = %d",
 				__func__, t, wait, c, id+j);
@@ -554,8 +554,8 @@ struct rte_rcu_qsbr_dq_parameters {
 				/* This thread might have unregistered.
 				 * Re-read the bitmap.
 				 */
-				bmap = __atomic_load_n(reg_thread_id,
-						__ATOMIC_ACQUIRE);
+				bmap = rte_atomic_load_explicit(reg_thread_id,
+						rte_memory_order_acquire);
 
 				continue;
 			}
@@ -576,8 +576,8 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * no need to update this very accurately using compare-and-swap.
 	 */
 	if (acked_token != __RTE_QSBR_CNT_MAX)
-		__atomic_store_n(&v->acked_token, acked_token,
-			__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&v->acked_token, acked_token,
+			rte_memory_order_relaxed);
 
 	return 1;
 }
@@ -598,7 +598,7 @@ struct rte_rcu_qsbr_dq_parameters {
 			"%s: check: token = %" PRIu64 ", wait = %d, Thread ID = %d",
 			__func__, t, wait, i);
 		while (1) {
-			c = __atomic_load_n(&cnt->cnt, __ATOMIC_ACQUIRE);
+			c = rte_atomic_load_explicit(&cnt->cnt, rte_memory_order_acquire);
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: status: token = %" PRIu64 ", wait = %d, Thread QS cnt = %" PRIu64 ", Thread ID = %d",
 				__func__, t, wait, c, i);
@@ -628,8 +628,8 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * no need to update this very accurately using compare-and-swap.
 	 */
 	if (acked_token != __RTE_QSBR_CNT_MAX)
-		__atomic_store_n(&v->acked_token, acked_token,
-			__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&v->acked_token, acked_token,
+			rte_memory_order_relaxed);
 
 	return 1;
 }
-- 
1.8.3.1
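
The online/offline and check loops above lean on two idioms: a relaxed 64-bit counter store paired with a full (seq_cst) fence on the reader side, and acquire loads on the writer side. For illustration, here is a minimal standalone sketch (not the library's actual code) of the reader-online half of that handshake; the variable names (token, reader_cnt) are invented stand-ins for the fields of struct rte_rcu_qsbr.

#include <stdint.h>
#include <rte_atomic.h>
#include <rte_stdatomic.h>

/* Illustrative globals; in the library these live inside struct rte_rcu_qsbr
 * and struct rte_rcu_qsbr_cnt.
 */
static RTE_ATOMIC(uint64_t) token;
static RTE_ATOMIC(uint64_t) reader_cnt;

static inline void
reader_online_sketch(void)
{
	uint64_t t;

	/* Relaxed accesses are enough here: they only guarantee the 64-bit
	 * counter is read and written atomically.
	 */
	t = rte_atomic_load_explicit(&token, rte_memory_order_relaxed);
	rte_atomic_store_explicit(&reader_cnt, t, rte_memory_order_relaxed);

	/* Store-load barrier: later loads of the shared data structure must
	 * not be hoisted above the counter store, otherwise the writer could
	 * miss that this reader is online.
	 */
	rte_atomic_thread_fence(rte_memory_order_seq_cst);
}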


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 10/19] pdump: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (8 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 09/19] rcu: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 11/19] stack: " Tyler Retzlaff
                     ` (9 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
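
For illustration, a minimal sketch (not lib/pdump's actual layout or helpers) of the counter conversion this patch performs: the statistics field is declared with RTE_ATOMIC() and bumped with a relaxed fetch-add. Only the API usage mirrors the patch; the struct and function names are invented.

#include <stdint.h>
#include <rte_stdatomic.h>

struct pkt_stats {			/* hypothetical, not lib/pdump's struct */
	RTE_ATOMIC(uint64_t) accepted;
	RTE_ATOMIC(uint64_t) dropped;
};

static inline void
count_accepted(struct pkt_stats *st, unsigned int n)
{
	/* Counters only need atomicity, not ordering, so relaxed suffices. */
	rte_atomic_fetch_add_explicit(&st->accepted, n, rte_memory_order_relaxed);
}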

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/pdump/rte_pdump.c | 14 +++++++-------
 lib/pdump/rte_pdump.h |  8 ++++----
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 53cca10..80b90c6 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -110,8 +110,8 @@ struct pdump_response {
 		 * then packet doesn't match the filter (will be ignored).
 		 */
 		if (cbs->filter && rcs[i] == 0) {
-			__atomic_fetch_add(&stats->filtered,
-					   1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&stats->filtered,
+					   1, rte_memory_order_relaxed);
 			continue;
 		}
 
@@ -127,18 +127,18 @@ struct pdump_response {
 			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
 
 		if (unlikely(p == NULL))
-			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&stats->nombuf, 1, rte_memory_order_relaxed);
 		else
 			dup_bufs[d_pkts++] = p;
 	}
 
-	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&stats->accepted, d_pkts, rte_memory_order_relaxed);
 
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)&dup_bufs[0], d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
+		rte_atomic_fetch_add_explicit(&stats->ringfull, drops, rte_memory_order_relaxed);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
@@ -720,10 +720,10 @@ struct pdump_response {
 	uint16_t qid;
 
 	for (qid = 0; qid < nq; qid++) {
-		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+		const RTE_ATOMIC(uint64_t) *perq = (const uint64_t __rte_atomic *)&stats[port][qid];
 
 		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
-			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			val = rte_atomic_load_explicit(&perq[i], rte_memory_order_relaxed);
 			sum[i] += val;
 		}
 	}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index b1a3918..7feb2b6 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -233,10 +233,10 @@ enum {
  * The statistics are sum of both receive and transmit queues.
  */
 struct rte_pdump_stats {
-	uint64_t accepted; /**< Number of packets accepted by filter. */
-	uint64_t filtered; /**< Number of packets rejected by filter. */
-	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
-	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+	RTE_ATOMIC(uint64_t) accepted; /**< Number of packets accepted by filter. */
+	RTE_ATOMIC(uint64_t) filtered; /**< Number of packets rejected by filter. */
+	RTE_ATOMIC(uint64_t) nombuf;   /**< Number of mbuf allocation failures. */
+	RTE_ATOMIC(uint64_t) ringfull; /**< Number of missed packets due to ring full. */
 
 	uint64_t reserved[4]; /**< Reserved and pad to cache line */
 };
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 11/19] stack: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (9 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 10/19] pdump: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-24  8:48     ` Konstantin Ananyev
  2023-10-17 20:31   ` [PATCH v2 12/19] telemetry: " Tyler Retzlaff
                     ` (8 subsequent siblings)
  19 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
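
Note the diff maps the old weak=1 compare-exchange in the C11 variant to rte_atomic_compare_exchange_weak_explicit and the weak=0 one in the generic variant to the strong form. As a sketch only (on a standalone counter, not the stack's internal len field), the weak-CAS reservation loop looks like this:

#include <stdbool.h>
#include <stdint.h>
#include <rte_stdatomic.h>

/* Try to reserve 'num' items from an available-item counter. */
static inline bool
reserve_items(RTE_ATOMIC(uint64_t) *len, uint64_t num)
{
	uint64_t cur = rte_atomic_load_explicit(len, rte_memory_order_relaxed);

	while (1) {
		if (cur < num)
			return false;
		/* 'cur' is refreshed with the observed value on failure, so a
		 * spurious weak-CAS failure simply retries the loop.
		 */
		if (rte_atomic_compare_exchange_weak_explicit(len, &cur, cur - num,
				rte_memory_order_acquire, rte_memory_order_relaxed))
			return true;
	}
}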

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/stack/rte_stack.h            |  2 +-
 lib/stack/rte_stack_lf_c11.h     | 24 ++++++++++++------------
 lib/stack/rte_stack_lf_generic.h | 18 +++++++++---------
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/lib/stack/rte_stack.h b/lib/stack/rte_stack.h
index 921d29a..a379300 100644
--- a/lib/stack/rte_stack.h
+++ b/lib/stack/rte_stack.h
@@ -44,7 +44,7 @@ struct rte_stack_lf_list {
 	/** List head */
 	struct rte_stack_lf_head head __rte_aligned(16);
 	/** List len */
-	uint64_t len;
+	RTE_ATOMIC(uint64_t) len;
 };
 
 /* Structure containing two lock-free LIFO lists: the stack itself and a list
diff --git a/lib/stack/rte_stack_lf_c11.h b/lib/stack/rte_stack_lf_c11.h
index 687a6f6..9cb6998 100644
--- a/lib/stack/rte_stack_lf_c11.h
+++ b/lib/stack/rte_stack_lf_c11.h
@@ -26,8 +26,8 @@
 	 * elements. If the mempool is near-empty to the point that this is a
 	 * concern, the user should consider increasing the mempool size.
 	 */
-	return (unsigned int)__atomic_load_n(&s->stack_lf.used.len,
-					     __ATOMIC_RELAXED);
+	return (unsigned int)rte_atomic_load_explicit(&s->stack_lf.used.len,
+					     rte_memory_order_relaxed);
 }
 
 static __rte_always_inline void
@@ -59,14 +59,14 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				1, __ATOMIC_RELEASE,
-				__ATOMIC_RELAXED);
+				1, rte_memory_order_release,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 
 	/* Ensure the stack modifications are not reordered with respect
 	 * to the LIFO len update.
 	 */
-	__atomic_fetch_add(&list->len, num, __ATOMIC_RELEASE);
+	rte_atomic_fetch_add_explicit(&list->len, num, rte_memory_order_release);
 }
 
 static __rte_always_inline struct rte_stack_lf_elem *
@@ -80,7 +80,7 @@
 	int success;
 
 	/* Reserve num elements, if available */
-	len = __atomic_load_n(&list->len, __ATOMIC_RELAXED);
+	len = rte_atomic_load_explicit(&list->len, rte_memory_order_relaxed);
 
 	while (1) {
 		/* Does the list contain enough elements? */
@@ -88,10 +88,10 @@
 			return NULL;
 
 		/* len is updated on failure */
-		if (__atomic_compare_exchange_n(&list->len,
+		if (rte_atomic_compare_exchange_weak_explicit(&list->len,
 						&len, len - num,
-						1, __ATOMIC_ACQUIRE,
-						__ATOMIC_RELAXED))
+						rte_memory_order_acquire,
+						rte_memory_order_relaxed))
 			break;
 	}
 
@@ -110,7 +110,7 @@
 		 * elements are properly ordered with respect to the head
 		 * pointer read.
 		 */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		rte_atomic_thread_fence(rte_memory_order_acquire);
 
 		rte_prefetch0(old_head.top);
 
@@ -159,8 +159,8 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				0, __ATOMIC_RELAXED,
-				__ATOMIC_RELAXED);
+				0, rte_memory_order_relaxed,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 
 	return old_head.top;
diff --git a/lib/stack/rte_stack_lf_generic.h b/lib/stack/rte_stack_lf_generic.h
index 39f7ff3..cc69e4d 100644
--- a/lib/stack/rte_stack_lf_generic.h
+++ b/lib/stack/rte_stack_lf_generic.h
@@ -27,7 +27,7 @@
 	 * concern, the user should consider increasing the mempool size.
 	 */
 	/* NOTE: review for potential ordering optimization */
-	return __atomic_load_n(&s->stack_lf.used.len, __ATOMIC_SEQ_CST);
+	return rte_atomic_load_explicit(&s->stack_lf.used.len, rte_memory_order_seq_cst);
 }
 
 static __rte_always_inline void
@@ -64,11 +64,11 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				1, __ATOMIC_RELEASE,
-				__ATOMIC_RELAXED);
+				1, rte_memory_order_release,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 	/* NOTE: review for potential ordering optimization */
-	__atomic_fetch_add(&list->len, num, __ATOMIC_SEQ_CST);
+	rte_atomic_fetch_add_explicit(&list->len, num, rte_memory_order_seq_cst);
 }
 
 static __rte_always_inline struct rte_stack_lf_elem *
@@ -83,15 +83,15 @@
 	/* Reserve num elements, if available */
 	while (1) {
 		/* NOTE: review for potential ordering optimization */
-		uint64_t len = __atomic_load_n(&list->len, __ATOMIC_SEQ_CST);
+		uint64_t len = rte_atomic_load_explicit(&list->len, rte_memory_order_seq_cst);
 
 		/* Does the list contain enough elements? */
 		if (unlikely(len < num))
 			return NULL;
 
 		/* NOTE: review for potential ordering optimization */
-		if (__atomic_compare_exchange_n(&list->len, &len, len - num,
-				0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+		if (rte_atomic_compare_exchange_strong_explicit(&list->len, &len, len - num,
+				rte_memory_order_seq_cst, rte_memory_order_seq_cst))
 			break;
 	}
 
@@ -143,8 +143,8 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				1, __ATOMIC_RELEASE,
-				__ATOMIC_RELAXED);
+				1, rte_memory_order_release,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 
 	return old_head.top;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 12/19] telemetry: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (10 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 11/19] stack: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 13/19] vhost: " Tyler Retzlaff
                     ` (7 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
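
For illustration, a minimal sketch of the pattern being converted: the number of concurrent clients is bounded by a relaxed atomic counter. MAX_CLIENTS and the function names here are invented, and, as in the telemetry code, the small check-then-increment window is tolerated rather than closed.

#include <stdbool.h>
#include <stdint.h>
#include <rte_stdatomic.h>

#define MAX_CLIENTS 10			/* illustrative limit */

static RTE_ATOMIC(uint16_t) nb_clients;	/* hypothetical connection counter */

static bool
client_connect(void)
{
	/* Relaxed ordering is enough: the counter only bounds connections,
	 * it does not publish any other data.
	 */
	if (rte_atomic_load_explicit(&nb_clients, rte_memory_order_relaxed) >= MAX_CLIENTS)
		return false;
	rte_atomic_fetch_add_explicit(&nb_clients, 1, rte_memory_order_relaxed);
	return true;
}

static void
client_disconnect(void)
{
	rte_atomic_fetch_sub_explicit(&nb_clients, 1, rte_memory_order_relaxed);
}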

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/telemetry/telemetry.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index aeb078c..9298284 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -45,7 +45,7 @@ struct socket {
 	int sock;
 	char path[sizeof(((struct sockaddr_un *)0)->sun_path)];
 	handler fn;
-	uint16_t *num_clients;
+	RTE_ATOMIC(uint16_t) *num_clients;
 };
 static struct socket v2_socket; /* socket for v2 telemetry */
 static struct socket v1_socket; /* socket for v1 telemetry */
@@ -64,7 +64,7 @@ struct socket {
 /* Used when accessing or modifying list of command callbacks */
 static rte_spinlock_t callback_sl = RTE_SPINLOCK_INITIALIZER;
 #ifndef RTE_EXEC_ENV_WINDOWS
-static uint16_t v2_clients;
+static RTE_ATOMIC(uint16_t) v2_clients;
 #endif /* !RTE_EXEC_ENV_WINDOWS */
 
 int
@@ -404,7 +404,7 @@ struct socket {
 		bytes = read(s, buffer, sizeof(buffer) - 1);
 	}
 	close(s);
-	__atomic_fetch_sub(&v2_clients, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_sub_explicit(&v2_clients, 1, rte_memory_order_relaxed);
 	return NULL;
 }
 
@@ -421,14 +421,14 @@ struct socket {
 			return NULL;
 		}
 		if (s->num_clients != NULL) {
-			uint16_t conns = __atomic_load_n(s->num_clients,
-					__ATOMIC_RELAXED);
+			uint16_t conns = rte_atomic_load_explicit(s->num_clients,
+					rte_memory_order_relaxed);
 			if (conns >= MAX_CONNECTIONS) {
 				close(s_accepted);
 				continue;
 			}
-			__atomic_fetch_add(s->num_clients, 1,
-					__ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(s->num_clients, 1,
+					rte_memory_order_relaxed);
 		}
 		rc = pthread_create(&th, NULL, s->fn,
 				    (void *)(uintptr_t)s_accepted);
@@ -437,8 +437,8 @@ struct socket {
 				 strerror(rc));
 			close(s_accepted);
 			if (s->num_clients != NULL)
-				__atomic_fetch_sub(s->num_clients, 1,
-						   __ATOMIC_RELAXED);
+				rte_atomic_fetch_sub_explicit(s->num_clients, 1,
+						   rte_memory_order_relaxed);
 			continue;
 		}
 		pthread_detach(th);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 13/19] vhost: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (11 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 12/19] telemetry: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 14/19] cryptodev: " Tyler Retzlaff
                     ` (6 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
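
A recurring detail in this patch: the vring index fields cannot be declared with RTE_ATOMIC() because their layout is fixed by the virtio spec, so accesses cast to unsigned short __rte_atomic * at the call site. As a sketch only, the release-store/acquire-load pairing on a stand-in structure (not the real vring types):

#include <stdint.h>
#include <rte_stdatomic.h>

/* Stand-in for an index shared with the guest; the real field lives in a
 * spec-defined vring structure and therefore stays a plain uint16_t.
 */
struct ring_shared {
	uint16_t idx;
};

static inline void
publish_used_idx(struct ring_shared *r, uint16_t new_idx)
{
	/* Release: all earlier writes to ring entries are visible before the
	 * consumer observes the new index.
	 */
	rte_atomic_store_explicit((unsigned short __rte_atomic *)&r->idx,
		new_idx, rte_memory_order_release);
}

static inline uint16_t
read_avail_idx(struct ring_shared *r)
{
	/* Acquire: pairs with the producer's release store above. */
	return rte_atomic_load_explicit((unsigned short __rte_atomic *)&r->idx,
		rte_memory_order_acquire);
}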

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/vhost/vdpa.c            |  3 ++-
 lib/vhost/vhost.c           | 42 ++++++++++++++++----------------
 lib/vhost/vhost.h           | 39 ++++++++++++++++--------------
 lib/vhost/vhost_user.c      |  6 ++---
 lib/vhost/virtio_net.c      | 58 +++++++++++++++++++++++++--------------------
 lib/vhost/virtio_net_ctrl.c |  6 +++--
 6 files changed, 84 insertions(+), 70 deletions(-)

diff --git a/lib/vhost/vdpa.c b/lib/vhost/vdpa.c
index 6284ea2..219eef8 100644
--- a/lib/vhost/vdpa.c
+++ b/lib/vhost/vdpa.c
@@ -235,7 +235,8 @@ struct rte_vdpa_device *
 	}
 
 	/* used idx is the synchronization point for the split vring */
-	__atomic_store_n(&vq->used->idx, idx_m, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit((unsigned short __rte_atomic *)&vq->used->idx,
+		idx_m, rte_memory_order_release);
 
 	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
 		vring_used_event(s_vring) = idx_m;
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 7fde412..bdcf85b 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -128,12 +128,13 @@ struct vhost_vq_stats_name_off {
 {
 #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 70100)
 	/*
-	 * __sync_ built-ins are deprecated, but __atomic_ ones
+	 * __sync_ built-ins are deprecated, but rte_atomic_ ones
 	 * are sub-optimized in older GCC versions.
 	 */
 	__sync_fetch_and_or_1(addr, (1U << nr));
 #else
-	__atomic_fetch_or(addr, (1U << nr), __ATOMIC_RELAXED);
+	rte_atomic_fetch_or_explicit((volatile uint8_t __rte_atomic *)addr, (1U << nr),
+		rte_memory_order_relaxed);
 #endif
 }
 
@@ -155,7 +156,7 @@ struct vhost_vq_stats_name_off {
 		return;
 
 	/* To make sure guest memory updates are committed before logging */
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	page = addr / VHOST_LOG_PAGE;
 	while (page * VHOST_LOG_PAGE < addr + len) {
@@ -197,7 +198,7 @@ struct vhost_vq_stats_name_off {
 	if (unlikely(!vq->log_cache))
 		return;
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	log_base = (unsigned long *)(uintptr_t)dev->log_base;
 
@@ -206,17 +207,18 @@ struct vhost_vq_stats_name_off {
 
 #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 70100)
 		/*
-		 * '__sync' builtins are deprecated, but '__atomic' ones
+		 * '__sync' builtins are deprecated, but 'rte_atomic' ones
 		 * are sub-optimized in older GCC versions.
 		 */
 		__sync_fetch_and_or(log_base + elem->offset, elem->val);
 #else
-		__atomic_fetch_or(log_base + elem->offset, elem->val,
-				__ATOMIC_RELAXED);
+		rte_atomic_fetch_or_explicit(
+			(unsigned long __rte_atomic *)(log_base + elem->offset),
+			elem->val, rte_memory_order_relaxed);
 #endif
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	vq->log_cache_nb_elem = 0;
 }
@@ -231,7 +233,7 @@ struct vhost_vq_stats_name_off {
 
 	if (unlikely(!vq->log_cache)) {
 		/* No logging cache allocated, write dirty log map directly */
-		rte_atomic_thread_fence(__ATOMIC_RELEASE);
+		rte_atomic_thread_fence(rte_memory_order_release);
 		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
 
 		return;
@@ -251,7 +253,7 @@ struct vhost_vq_stats_name_off {
 		 * No more room for a new log cache entry,
 		 * so write the dirty log map directly.
 		 */
-		rte_atomic_thread_fence(__ATOMIC_RELEASE);
+		rte_atomic_thread_fence(rte_memory_order_release);
 		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
 
 		return;
@@ -1184,11 +1186,11 @@ struct vhost_vq_stats_name_off {
 	if (unlikely(idx >= vq->size))
 		return -1;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	vq->inflight_split->desc[idx].inflight = 0;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	vq->inflight_split->used_idx = last_used_idx;
 	return 0;
@@ -1227,11 +1229,11 @@ struct vhost_vq_stats_name_off {
 	if (unlikely(head >= vq->size))
 		return -1;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	inflight_info->desc[head].inflight = 0;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	inflight_info->old_free_head = inflight_info->free_head;
 	inflight_info->old_used_idx = inflight_info->used_idx;
@@ -1454,7 +1456,7 @@ struct vhost_vq_stats_name_off {
 			vq->avail_wrap_counter << 15;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	vq->device_event->flags = flags;
 	return 0;
@@ -1519,16 +1521,16 @@ struct vhost_vq_stats_name_off {
 
 	rte_rwlock_read_lock(&vq->access_lock);
 
-	__atomic_store_n(&vq->irq_pending, false, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&vq->irq_pending, false, rte_memory_order_release);
 
 	if (dev->backend_ops->inject_irq(dev, vq)) {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-			__atomic_fetch_add(&vq->stats.guest_notifications_error,
-					1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications_error,
+					1, rte_memory_order_relaxed);
 	} else {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-			__atomic_fetch_add(&vq->stats.guest_notifications,
-					1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications,
+					1, rte_memory_order_relaxed);
 		if (dev->notify_ops->guest_notified)
 			dev->notify_ops->guest_notified(dev->vid);
 	}
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 5fc9035..f8624fb 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -158,9 +158,9 @@ struct virtqueue_stats {
 	uint64_t inflight_completed;
 	uint64_t guest_notifications_suppressed;
 	/* Counters below are atomic, and should be incremented as such. */
-	uint64_t guest_notifications;
-	uint64_t guest_notifications_offloaded;
-	uint64_t guest_notifications_error;
+	RTE_ATOMIC(uint64_t) guest_notifications;
+	RTE_ATOMIC(uint64_t) guest_notifications_offloaded;
+	RTE_ATOMIC(uint64_t) guest_notifications_error;
 };
 
 /**
@@ -348,7 +348,7 @@ struct vhost_virtqueue {
 	struct vhost_vring_addr ring_addrs;
 	struct virtqueue_stats	stats;
 
-	bool irq_pending;
+	RTE_ATOMIC(bool) irq_pending;
 } __rte_cache_aligned;
 
 /* Virtio device status as per Virtio specification */
@@ -486,7 +486,7 @@ struct virtio_net {
 	uint32_t		flags;
 	uint16_t		vhost_hlen;
 	/* to tell if we need broadcast rarp packet */
-	int16_t			broadcast_rarp;
+	RTE_ATOMIC(int16_t)	broadcast_rarp;
 	uint32_t		nr_vring;
 	int			async_copy;
 
@@ -557,7 +557,8 @@ struct virtio_net {
 static inline bool
 desc_is_avail(struct vring_packed_desc *desc, bool wrap_counter)
 {
-	uint16_t flags = __atomic_load_n(&desc->flags, __ATOMIC_ACQUIRE);
+	uint16_t flags = rte_atomic_load_explicit((unsigned short __rte_atomic *)&desc->flags,
+		rte_memory_order_acquire);
 
 	return wrap_counter == !!(flags & VRING_DESC_F_AVAIL) &&
 		wrap_counter != !!(flags & VRING_DESC_F_USED);
@@ -914,17 +915,19 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	bool expected = false;
 
 	if (dev->notify_ops->guest_notify) {
-		if (__atomic_compare_exchange_n(&vq->irq_pending, &expected, true, 0,
-				  __ATOMIC_RELEASE, __ATOMIC_RELAXED)) {
+		if (rte_atomic_compare_exchange_strong_explicit(&vq->irq_pending, &expected, true,
+				  rte_memory_order_release, rte_memory_order_relaxed)) {
 			if (dev->notify_ops->guest_notify(dev->vid, vq->index)) {
 				if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-					__atomic_fetch_add(&vq->stats.guest_notifications_offloaded,
-						1, __ATOMIC_RELAXED);
+					rte_atomic_fetch_add_explicit(
+						&vq->stats.guest_notifications_offloaded,
+						1, rte_memory_order_relaxed);
 				return;
 			}
 
 			/* Offloading failed, fallback to direct IRQ injection */
-			__atomic_store_n(&vq->irq_pending, false, __ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&vq->irq_pending, false,
+				rte_memory_order_release);
 		} else {
 			vq->stats.guest_notifications_suppressed++;
 			return;
@@ -933,14 +936,14 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	if (dev->backend_ops->inject_irq(dev, vq)) {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-			__atomic_fetch_add(&vq->stats.guest_notifications_error,
-				1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications_error,
+				1, rte_memory_order_relaxed);
 		return;
 	}
 
 	if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-		__atomic_fetch_add(&vq->stats.guest_notifications,
-			1, __ATOMIC_RELAXED);
+		rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications,
+			1, rte_memory_order_relaxed);
 	if (dev->notify_ops->guest_notified)
 		dev->notify_ops->guest_notified(dev->vid);
 }
@@ -949,7 +952,7 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
 	/* Flush used->idx update before we read avail->flags. */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	/* Don't kick guest if we don't reach index specified by guest. */
 	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
@@ -981,7 +984,7 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	bool signalled_used_valid, kick = false;
 
 	/* Flush used desc update. */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	if (!(dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))) {
 		if (vq->driver_event->flags !=
@@ -1007,7 +1010,7 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		goto kick;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+	rte_atomic_thread_fence(rte_memory_order_acquire);
 
 	off_wrap = vq->driver_event->off_wrap;
 	off = off_wrap & ~(1 << 15);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 901a80b..e363121 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -1914,7 +1914,7 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev,
 
 	if (inflight_split->used_idx != used->idx) {
 		inflight_split->desc[last_io].inflight = 0;
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+		rte_atomic_thread_fence(rte_memory_order_seq_cst);
 		inflight_split->used_idx = used->idx;
 	}
 
@@ -2418,10 +2418,10 @@ static int vhost_user_set_log_fd(struct virtio_net **pdev,
 	 * Set the flag to inject a RARP broadcast packet at
 	 * rte_vhost_dequeue_burst().
 	 *
-	 * __ATOMIC_RELEASE ordering is for making sure the mac is
+	 * rte_memory_order_release ordering is for making sure the mac is
 	 * copied before the flag is set.
 	 */
-	__atomic_store_n(&dev->broadcast_rarp, 1, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&dev->broadcast_rarp, 1, rte_memory_order_release);
 	vdpa_dev = dev->vdpa_dev;
 	if (vdpa_dev && vdpa_dev->ops->migration_done)
 		vdpa_dev->ops->migration_done(dev->vid);
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 759a78e..8af20f1 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -298,8 +298,8 @@
 
 	vhost_log_cache_sync(dev, vq);
 
-	__atomic_fetch_add(&vq->used->idx, vq->shadow_used_idx,
-			   __ATOMIC_RELEASE);
+	rte_atomic_fetch_add_explicit((unsigned short __rte_atomic *)&vq->used->idx,
+		vq->shadow_used_idx, rte_memory_order_release);
 	vq->shadow_used_idx = 0;
 	vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
 		sizeof(vq->used->idx));
@@ -335,7 +335,7 @@
 	}
 
 	/* The ordering for storing desc flags needs to be enforced. */
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	for (i = 0; i < vq->shadow_used_idx; i++) {
 		uint16_t flags;
@@ -387,8 +387,9 @@
 
 	vq->desc_packed[vq->shadow_last_used_idx].id = used_elem->id;
 	/* desc flags is the synchronization point for virtio packed vring */
-	__atomic_store_n(&vq->desc_packed[vq->shadow_last_used_idx].flags,
-			 used_elem->flags, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(
+		(unsigned short __rte_atomic *)&vq->desc_packed[vq->shadow_last_used_idx].flags,
+		used_elem->flags, rte_memory_order_release);
 
 	vhost_log_cache_used_vring(dev, vq, vq->shadow_last_used_idx *
 				   sizeof(struct vring_packed_desc),
@@ -418,7 +419,7 @@
 		desc_base[i].len = lens[i];
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 		desc_base[i].flags = flags;
@@ -515,7 +516,7 @@
 		vq->desc_packed[vq->last_used_idx + i].len = 0;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 	vhost_for_each_try_unroll(i, begin, PACKED_BATCH_SIZE)
 		vq->desc_packed[vq->last_used_idx + i].flags = flags;
 
@@ -1415,7 +1416,8 @@
 	 * The ordering between avail index and
 	 * desc reads needs to be enforced.
 	 */
-	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
+	avail_head = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire);
 
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
@@ -1806,7 +1808,8 @@
 	/*
 	 * The ordering between avail index and desc reads need to be enforced.
 	 */
-	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
+	avail_head = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire);
 
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
@@ -2222,7 +2225,7 @@
 	}
 
 	/* The ordering for storing desc flags needs to be enforced. */
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	from = async->last_buffer_idx_packed;
 
@@ -2311,7 +2314,9 @@
 			vhost_vring_call_packed(dev, vq);
 		} else {
 			write_back_completed_descs_split(vq, n_descs);
-			__atomic_fetch_add(&vq->used->idx, n_descs, __ATOMIC_RELEASE);
+			rte_atomic_fetch_add_explicit(
+				(unsigned short __rte_atomic *)&vq->used->idx,
+				n_descs, rte_memory_order_release);
 			vhost_vring_call_split(dev, vq);
 		}
 	} else {
@@ -3085,8 +3090,8 @@
 	 * The ordering between avail index and
 	 * desc reads needs to be enforced.
 	 */
-	avail_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
-			vq->last_avail_idx;
+	avail_entries = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire) - vq->last_avail_idx;
 	if (avail_entries == 0)
 		return 0;
 
@@ -3224,7 +3229,7 @@
 			return -1;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+	rte_atomic_thread_fence(rte_memory_order_acquire);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		lens[i] = descs[avail_idx + i].len;
@@ -3297,7 +3302,7 @@
 			return -1;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+	rte_atomic_thread_fence(rte_memory_order_acquire);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		lens[i] = descs[avail_idx + i].len;
@@ -3590,7 +3595,7 @@
 	 *
 	 * broadcast_rarp shares a cacheline in the virtio_net structure
 	 * with some fields that are accessed during enqueue and
-	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * rte_atomic_compare_exchange_strong_explicit causes a write if performed compare
 	 * and exchange. This could result in false sharing between enqueue
 	 * and dequeue.
 	 *
@@ -3598,9 +3603,9 @@
 	 * and only performing compare and exchange if the read indicates it
 	 * is likely to be set.
 	 */
-	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
-			__atomic_compare_exchange_n(&dev->broadcast_rarp,
-			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+	if (unlikely(rte_atomic_load_explicit(&dev->broadcast_rarp, rte_memory_order_acquire) &&
+			rte_atomic_compare_exchange_strong_explicit(&dev->broadcast_rarp,
+			&success, 0, rte_memory_order_release, rte_memory_order_relaxed))) {
 
 		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
 		if (rarp_mbuf == NULL) {
@@ -3683,7 +3688,8 @@
 		vhost_vring_call_packed(dev, vq);
 	} else {
 		write_back_completed_descs_split(vq, nr_cpl_pkts);
-		__atomic_fetch_add(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+		rte_atomic_fetch_add_explicit((unsigned short __rte_atomic *)&vq->used->idx,
+			nr_cpl_pkts, rte_memory_order_release);
 		vhost_vring_call_split(dev, vq);
 	}
 	vq->async->pkts_inflight_n -= nr_cpl_pkts;
@@ -3714,8 +3720,8 @@
 	 * The ordering between avail index and
 	 * desc reads needs to be enforced.
 	 */
-	avail_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
-			vq->last_avail_idx;
+	avail_entries = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire) - vq->last_avail_idx;
 	if (avail_entries == 0)
 		goto out;
 
@@ -4204,7 +4210,7 @@
 	 *
 	 * broadcast_rarp shares a cacheline in the virtio_net structure
 	 * with some fields that are accessed during enqueue and
-	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * rte_atomic_compare_exchange_strong_explicit causes a write if performed compare
 	 * and exchange. This could result in false sharing between enqueue
 	 * and dequeue.
 	 *
@@ -4212,9 +4218,9 @@
 	 * and only performing compare and exchange if the read indicates it
 	 * is likely to be set.
 	 */
-	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
-			__atomic_compare_exchange_n(&dev->broadcast_rarp,
-			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+	if (unlikely(rte_atomic_load_explicit(&dev->broadcast_rarp, rte_memory_order_acquire) &&
+			rte_atomic_compare_exchange_strong_explicit(&dev->broadcast_rarp,
+			&success, 0, rte_memory_order_release, rte_memory_order_relaxed))) {
 
 		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
 		if (rarp_mbuf == NULL) {
diff --git a/lib/vhost/virtio_net_ctrl.c b/lib/vhost/virtio_net_ctrl.c
index 6b583a0..c4847f8 100644
--- a/lib/vhost/virtio_net_ctrl.c
+++ b/lib/vhost/virtio_net_ctrl.c
@@ -33,7 +33,8 @@ struct virtio_net_ctrl_elem {
 	uint8_t *ctrl_req;
 	struct vring_desc *descs;
 
-	avail_idx = __atomic_load_n(&cvq->avail->idx, __ATOMIC_ACQUIRE);
+	avail_idx = rte_atomic_load_explicit((unsigned short __rte_atomic *)&cvq->avail->idx,
+		rte_memory_order_acquire);
 	if (avail_idx == cvq->last_avail_idx) {
 		VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue empty\n");
 		return 0;
@@ -236,7 +237,8 @@ struct virtio_net_ctrl_elem {
 	if (cvq->last_used_idx >= cvq->size)
 		cvq->last_used_idx -= cvq->size;
 
-	__atomic_store_n(&cvq->used->idx, cvq->last_used_idx, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit((unsigned short __rte_atomic *)&cvq->used->idx,
+		cvq->last_used_idx, rte_memory_order_release);
 
 	vhost_vring_call_split(dev, dev->cvq);
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 14/19] cryptodev: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (12 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 13/19] vhost: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 15/19] distributor: " Tyler Retzlaff
                     ` (5 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
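
The callback-list changes follow a publish/consume pattern: a new callback is linked in with a release store, while the data-plane reader (running under the RCU QSBR in the diff) loads the head with relaxed order and relies on the address dependency when dereferencing it. A sketch on an invented node type, not the cryptodev structures themselves:

#include <stddef.h>
#include <rte_stdatomic.h>

/* Hypothetical callback node; only the atomic 'next' pointer and the
 * release-publish / relaxed-read pairing mirror the patch.
 */
struct cb_node {
	RTE_ATOMIC(struct cb_node *) next;
	void (*fn)(void *arg);
	void *arg;
};

static inline void
cb_list_insert_head(RTE_ATOMIC(struct cb_node *) *head, struct cb_node *cb)
{
	rte_atomic_store_explicit(&cb->next,
		rte_atomic_load_explicit(head, rte_memory_order_relaxed),
		rte_memory_order_relaxed);
	/* Release: cb->fn and cb->arg must be initialized before 'cb'
	 * becomes reachable from the list head.
	 */
	rte_atomic_store_explicit(head, cb, rte_memory_order_release);
}

static inline void
cb_list_run(RTE_ATOMIC(struct cb_node *) *head)
{
	/* Relaxed is sufficient on the reader: dereferencing the loaded
	 * pointer creates an address dependency that orders the accesses.
	 */
	struct cb_node *cb = rte_atomic_load_explicit(head, rte_memory_order_relaxed);

	while (cb != NULL) {
		cb->fn(cb->arg);
		cb = rte_atomic_load_explicit(&cb->next, rte_memory_order_relaxed);
	}
}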

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/cryptodev/rte_cryptodev.c | 22 ++++++++++++----------
 lib/cryptodev/rte_cryptodev.h | 16 ++++++++--------
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/lib/cryptodev/rte_cryptodev.c b/lib/cryptodev/rte_cryptodev.c
index 314710b..b258827 100644
--- a/lib/cryptodev/rte_cryptodev.c
+++ b/lib/cryptodev/rte_cryptodev.c
@@ -1535,12 +1535,12 @@ struct rte_cryptodev_cb *
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	} else {
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&list->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&list->next, cb, rte_memory_order_release);
 	}
 
 	rte_spinlock_unlock(&rte_cryptodev_callback_lock);
@@ -1555,7 +1555,8 @@ struct rte_cryptodev_cb *
 				  struct rte_cryptodev_cb *cb)
 {
 	struct rte_cryptodev *dev;
-	struct rte_cryptodev_cb **prev_cb, *curr_cb;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) *prev_cb;
+	struct rte_cryptodev_cb *curr_cb;
 	struct rte_cryptodev_cb_rcu *list;
 	int ret;
 
@@ -1601,8 +1602,8 @@ struct rte_cryptodev_cb *
 		curr_cb = *prev_cb;
 		if (curr_cb == cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, curr_cb->next,
-				__ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, curr_cb->next,
+				rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
@@ -1673,12 +1674,12 @@ struct rte_cryptodev_cb *
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	} else {
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&list->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&list->next, cb, rte_memory_order_release);
 	}
 
 	rte_spinlock_unlock(&rte_cryptodev_callback_lock);
@@ -1694,7 +1695,8 @@ struct rte_cryptodev_cb *
 				  struct rte_cryptodev_cb *cb)
 {
 	struct rte_cryptodev *dev;
-	struct rte_cryptodev_cb **prev_cb, *curr_cb;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) *prev_cb;
+	struct rte_cryptodev_cb *curr_cb;
 	struct rte_cryptodev_cb_rcu *list;
 	int ret;
 
@@ -1740,8 +1742,8 @@ struct rte_cryptodev_cb *
 		curr_cb = *prev_cb;
 		if (curr_cb == cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, curr_cb->next,
-				__ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, curr_cb->next,
+				rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index be0698c..9092118 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -979,7 +979,7 @@ struct rte_cryptodev_config {
  * queue pair on enqueue/dequeue.
  */
 struct rte_cryptodev_cb {
-	struct rte_cryptodev_cb *next;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) next;
 	/**< Pointer to next callback */
 	rte_cryptodev_callback_fn fn;
 	/**< Pointer to callback function */
@@ -992,7 +992,7 @@ struct rte_cryptodev_cb {
  * Structure used to hold information about the RCU for a queue pair.
  */
 struct rte_cryptodev_cb_rcu {
-	struct rte_cryptodev_cb *next;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) next;
 	/**< Pointer to next callback */
 	struct rte_rcu_qsbr *qsbr;
 	/**< RCU QSBR variable per queue pair */
@@ -1947,15 +1947,15 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
 		struct rte_cryptodev_cb_rcu *list;
 		struct rte_cryptodev_cb *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
 		list = &fp_ops->qp.deq_cb[qp_id];
 		rte_rcu_qsbr_thread_online(list->qsbr, 0);
-		cb = __atomic_load_n(&list->next, __ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&list->next, rte_memory_order_relaxed);
 
 		while (cb != NULL) {
 			nb_ops = cb->fn(dev_id, qp_id, ops, nb_ops,
@@ -2014,15 +2014,15 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
 		struct rte_cryptodev_cb_rcu *list;
 		struct rte_cryptodev_cb *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
 		list = &fp_ops->qp.enq_cb[qp_id];
 		rte_rcu_qsbr_thread_online(list->qsbr, 0);
-		cb = __atomic_load_n(&list->next, __ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&list->next, rte_memory_order_relaxed);
 
 		while (cb != NULL) {
 			nb_ops = cb->fn(dev_id, qp_id, ops, nb_ops,
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 15/19] distributor: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (13 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 14/19] cryptodev: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 16/19] ethdev: " Tyler Retzlaff
                     ` (4 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
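
The distributor/worker handshake packs flag bits into the low bits of a 64-bit slot and synchronizes on them with acquire loads and release stores. The sketch below reproduces that flag handshake on a single invented slot; the real code spreads this over per-worker cache lines and several flag bits.

#include <stdint.h>
#include <rte_pause.h>
#include <rte_stdatomic.h>

#define SLOT_READY (1ULL << 0)		/* illustrative handshake bit */

/* Producer: publish a payload and set the handshake bit; the release store
 * makes the payload bits visible together with the flag.
 */
static inline void
slot_publish(RTE_ATOMIC(uint64_t) *slot, uint64_t payload)
{
	rte_atomic_store_explicit(slot, payload | SLOT_READY,
		rte_memory_order_release);
}

/* Consumer: spin with acquire loads until the flag appears, take the payload
 * and clear the slot with a release store to hand it back.
 */
static inline uint64_t
slot_take(RTE_ATOMIC(uint64_t) *slot)
{
	uint64_t v;

	while (!((v = rte_atomic_load_explicit(slot, rte_memory_order_acquire))
			& SLOT_READY))
		rte_pause();

	rte_atomic_store_explicit(slot, 0, rte_memory_order_release);
	return v & ~SLOT_READY;
}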

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/distributor/distributor_private.h |  4 +--
 lib/distributor/rte_distributor.c     | 54 +++++++++++++++++------------------
 2 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/lib/distributor/distributor_private.h b/lib/distributor/distributor_private.h
index 2f29343..dfeb9b5 100644
--- a/lib/distributor/distributor_private.h
+++ b/lib/distributor/distributor_private.h
@@ -113,12 +113,12 @@ enum rte_distributor_match_function {
  * There is a separate cacheline for returns in the burst API.
  */
 struct rte_distributor_buffer {
-	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+	volatile RTE_ATOMIC(int64_t) bufptr64[RTE_DIST_BURST_SIZE]
 		__rte_cache_aligned; /* <= outgoing to worker */
 
 	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
 
-	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+	volatile RTE_ATOMIC(int64_t) retptr64[RTE_DIST_BURST_SIZE]
 		__rte_cache_aligned; /* <= incoming from worker */
 
 	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
diff --git a/lib/distributor/rte_distributor.c b/lib/distributor/rte_distributor.c
index 5ca80dd..2ecb95c 100644
--- a/lib/distributor/rte_distributor.c
+++ b/lib/distributor/rte_distributor.c
@@ -38,7 +38,7 @@
 	struct rte_distributor_buffer *buf = &(d->bufs[worker_id]);
 	unsigned int i;
 
-	volatile int64_t *retptr64;
+	volatile RTE_ATOMIC(int64_t) *retptr64;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		rte_distributor_request_pkt_single(d->d_single,
@@ -50,7 +50,7 @@
 	/* Spin while handshake bits are set (scheduler clears it).
 	 * Sync with worker on GET_BUF flag.
 	 */
-	while (unlikely(__atomic_load_n(retptr64, __ATOMIC_ACQUIRE)
+	while (unlikely(rte_atomic_load_explicit(retptr64, rte_memory_order_acquire)
 			& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))) {
 		rte_pause();
 		uint64_t t = rte_rdtsc()+100;
@@ -78,8 +78,8 @@
 	 * line is ready for processing
 	 * Sync with distributor to release retptrs
 	 */
-	__atomic_store_n(retptr64, *retptr64 | RTE_DISTRIB_GET_BUF,
-			__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(retptr64, *retptr64 | RTE_DISTRIB_GET_BUF,
+			rte_memory_order_release);
 }
 
 int
@@ -102,7 +102,7 @@
 	 * RETURN_BUF is set when distributor must retrieve in-flight packets
 	 * Sync with distributor to acquire bufptrs
 	 */
-	if (__atomic_load_n(&(buf->bufptr64[0]), __ATOMIC_ACQUIRE)
+	if (rte_atomic_load_explicit(&(buf->bufptr64[0]), rte_memory_order_acquire)
 		& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))
 		return -1;
 
@@ -120,8 +120,8 @@
 	 * on the next cacheline while we're working.
 	 * Sync with distributor on GET_BUF flag. Release bufptrs.
 	 */
-	__atomic_store_n(&(buf->bufptr64[0]),
-		buf->bufptr64[0] | RTE_DISTRIB_GET_BUF, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->bufptr64[0]),
+		buf->bufptr64[0] | RTE_DISTRIB_GET_BUF, rte_memory_order_release);
 
 	return count;
 }
@@ -177,7 +177,7 @@
 	/* Spin while handshake bits are set (scheduler clears it).
 	 * Sync with worker on GET_BUF flag.
 	 */
-	while (unlikely(__atomic_load_n(&(buf->retptr64[0]), __ATOMIC_RELAXED)
+	while (unlikely(rte_atomic_load_explicit(&(buf->retptr64[0]), rte_memory_order_relaxed)
 			& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))) {
 		rte_pause();
 		uint64_t t = rte_rdtsc()+100;
@@ -187,7 +187,7 @@
 	}
 
 	/* Sync with distributor to acquire retptrs */
-	__atomic_thread_fence(__ATOMIC_ACQUIRE);
+	rte_atomic_thread_fence(rte_memory_order_acquire);
 	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
 		/* Switch off the return bit first */
 		buf->retptr64[i] = 0;
@@ -200,15 +200,15 @@
 	 * we won't read any mbufs from there even if GET_BUF is set.
 	 * This allows distributor to retrieve in-flight already sent packets.
 	 */
-	__atomic_fetch_or(&(buf->bufptr64[0]), RTE_DISTRIB_RETURN_BUF,
-		__ATOMIC_ACQ_REL);
+	rte_atomic_fetch_or_explicit(&(buf->bufptr64[0]), RTE_DISTRIB_RETURN_BUF,
+		rte_memory_order_acq_rel);
 
 	/* set the RETURN_BUF on retptr64 even if we got no returns.
 	 * Sync with distributor on RETURN_BUF flag. Release retptrs.
 	 * Notify distributor that we don't request more packets any more.
 	 */
-	__atomic_store_n(&(buf->retptr64[0]),
-		buf->retptr64[0] | RTE_DISTRIB_RETURN_BUF, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->retptr64[0]),
+		buf->retptr64[0] | RTE_DISTRIB_RETURN_BUF, rte_memory_order_release);
 
 	return 0;
 }
@@ -297,7 +297,7 @@
 	 * to worker which does not require new packets.
 	 * They must be retrieved and assigned to another worker.
 	 */
-	if (!(__atomic_load_n(&(buf->bufptr64[0]), __ATOMIC_ACQUIRE)
+	if (!(rte_atomic_load_explicit(&(buf->bufptr64[0]), rte_memory_order_acquire)
 		& RTE_DISTRIB_GET_BUF))
 		for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
 			if (buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)
@@ -310,8 +310,8 @@
 	 *     with new packets if worker will make a new request.
 	 * - clear RETURN_BUF to unlock reads on worker side.
 	 */
-	__atomic_store_n(&(buf->bufptr64[0]), RTE_DISTRIB_GET_BUF,
-		__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->bufptr64[0]), RTE_DISTRIB_GET_BUF,
+		rte_memory_order_release);
 
 	/* Collect backlog packets from worker */
 	for (i = 0; i < d->backlog[wkr].count; i++)
@@ -348,7 +348,7 @@
 	unsigned int i;
 
 	/* Sync on GET_BUF flag. Acquire retptrs. */
-	if (__atomic_load_n(&(buf->retptr64[0]), __ATOMIC_ACQUIRE)
+	if (rte_atomic_load_explicit(&(buf->retptr64[0]), rte_memory_order_acquire)
 		& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF)) {
 		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
 			if (buf->retptr64[i] & RTE_DISTRIB_VALID_BUF) {
@@ -379,7 +379,7 @@
 		/* Clear for the worker to populate with more returns.
 		 * Sync with distributor on GET_BUF flag. Release retptrs.
 		 */
-		__atomic_store_n(&(buf->retptr64[0]), 0, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&(buf->retptr64[0]), 0, rte_memory_order_release);
 	}
 	return count;
 }
@@ -404,7 +404,7 @@
 		return 0;
 
 	/* Sync with worker on GET_BUF flag */
-	while (!(__atomic_load_n(&(d->bufs[wkr].bufptr64[0]), __ATOMIC_ACQUIRE)
+	while (!(rte_atomic_load_explicit(&(d->bufs[wkr].bufptr64[0]), rte_memory_order_acquire)
 		& RTE_DISTRIB_GET_BUF)) {
 		handle_returns(d, wkr);
 		if (unlikely(!d->active[wkr]))
@@ -430,8 +430,8 @@
 	/* Clear the GET bit.
 	 * Sync with worker on GET_BUF flag. Release bufptrs.
 	 */
-	__atomic_store_n(&(buf->bufptr64[0]),
-		buf->bufptr64[0] & ~RTE_DISTRIB_GET_BUF, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->bufptr64[0]),
+		buf->bufptr64[0] & ~RTE_DISTRIB_GET_BUF, rte_memory_order_release);
 	return  buf->count;
 
 }
@@ -463,8 +463,8 @@
 		/* Flush out all non-full cache-lines to workers. */
 		for (wid = 0 ; wid < d->num_workers; wid++) {
 			/* Sync with worker on GET_BUF flag. */
-			if (__atomic_load_n(&(d->bufs[wid].bufptr64[0]),
-				__ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF) {
+			if (rte_atomic_load_explicit(&(d->bufs[wid].bufptr64[0]),
+				rte_memory_order_acquire) & RTE_DISTRIB_GET_BUF) {
 				d->bufs[wid].count = 0;
 				release(d, wid);
 				handle_returns(d, wid);
@@ -598,8 +598,8 @@
 	/* Flush out all non-full cache-lines to workers. */
 	for (wid = 0 ; wid < d->num_workers; wid++)
 		/* Sync with worker on GET_BUF flag. */
-		if ((__atomic_load_n(&(d->bufs[wid].bufptr64[0]),
-			__ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF)) {
+		if ((rte_atomic_load_explicit(&(d->bufs[wid].bufptr64[0]),
+			rte_memory_order_acquire) & RTE_DISTRIB_GET_BUF)) {
 			d->bufs[wid].count = 0;
 			release(d, wid);
 		}
@@ -700,8 +700,8 @@
 	/* throw away returns, so workers can exit */
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		/* Sync with worker. Release retptrs. */
-		__atomic_store_n(&(d->bufs[wkr].retptr64[0]), 0,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&(d->bufs[wkr].retptr64[0]), 0,
+				rte_memory_order_release);
 
 	d->returns.start = d->returns.count = 0;
 }
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 16/19] ethdev: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (14 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 15/19] distributor: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 17/19] hash: " Tyler Retzlaff
                     ` (3 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
the corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
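For reference, a minimal sketch of the conversion pattern applied below
(release-store to publish a callback list head, relaxed load on the data
plane); the struct and function names are illustrative, not ethdev's:

    #include <rte_stdatomic.h>

    struct cb_list {
            RTE_ATOMIC(struct cb_list *) next;  /* was: struct cb_list *next */
    };

    /* publish a new head; release orders the stores to *cb before it */
    static inline void
    cb_publish(RTE_ATOMIC(struct cb_list *) *head, struct cb_list *cb)
    {
            /* was: __atomic_store_n(head, cb, __ATOMIC_RELEASE) */
            rte_atomic_store_explicit(head, cb, rte_memory_order_release);
    }

    /* data-plane read; dependent loads through cb make acquire unnecessary */
    static inline struct cb_list *
    cb_peek(RTE_ATOMIC(struct cb_list *) *head)
    {
            /* was: __atomic_load_n(head, __ATOMIC_RELAXED) */
            return rte_atomic_load_explicit(head, rte_memory_order_relaxed);
    }
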
 lib/ethdev/ethdev_driver.h   | 16 ++++++++--------
 lib/ethdev/ethdev_private.c  |  6 +++---
 lib/ethdev/rte_ethdev.c      | 24 ++++++++++++------------
 lib/ethdev/rte_ethdev.h      | 16 ++++++++--------
 lib/ethdev/rte_ethdev_core.h |  2 +-
 5 files changed, 32 insertions(+), 32 deletions(-)

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index deb23ad..b482cd1 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -30,7 +30,7 @@
  * queue on Rx and Tx.
  */
 struct rte_eth_rxtx_callback {
-	struct rte_eth_rxtx_callback *next;
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) next;
 	union{
 		rte_rx_callback_fn rx;
 		rte_tx_callback_fn tx;
@@ -80,12 +80,12 @@ struct rte_eth_dev {
 	 * User-supplied functions called from rx_burst to post-process
 	 * received packets before passing them to the user
 	 */
-	struct rte_eth_rxtx_callback *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 	/**
 	 * User-supplied functions called from tx_burst to pre-process
 	 * received packets before passing them to the driver for transmission
 	 */
-	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 
 	enum rte_eth_dev_state state; /**< Flag indicating the port state */
 	void *security_ctx; /**< Context for security ops */
@@ -1655,7 +1655,7 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 rte_eth_linkstatus_set(struct rte_eth_dev *dev,
 		       const struct rte_eth_link *new_link)
 {
-	uint64_t *dev_link = (uint64_t *)&(dev->data->dev_link);
+	RTE_ATOMIC(uint64_t) *dev_link = (uint64_t __rte_atomic *)&(dev->data->dev_link);
 	union {
 		uint64_t val64;
 		struct rte_eth_link link;
@@ -1663,8 +1663,8 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 
 	RTE_BUILD_BUG_ON(sizeof(*new_link) != sizeof(uint64_t));
 
-	orig.val64 = __atomic_exchange_n(dev_link, *(const uint64_t *)new_link,
-					__ATOMIC_SEQ_CST);
+	orig.val64 = rte_atomic_exchange_explicit(dev_link, *(const uint64_t *)new_link,
+					rte_memory_order_seq_cst);
 
 	return (orig.link.link_status == new_link->link_status) ? -1 : 0;
 }
@@ -1682,12 +1682,12 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
 		       struct rte_eth_link *link)
 {
-	uint64_t *src = (uint64_t *)&(dev->data->dev_link);
+	RTE_ATOMIC(uint64_t) *src = (uint64_t __rte_atomic *)&(dev->data->dev_link);
 	uint64_t *dst = (uint64_t *)link;
 
 	RTE_BUILD_BUG_ON(sizeof(*link) != sizeof(uint64_t));
 
-	*dst = __atomic_load_n(src, __ATOMIC_SEQ_CST);
+	*dst = rte_atomic_load_explicit(src, rte_memory_order_seq_cst);
 }
 
 /**
diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
index 7cc7f28..82e2568 100644
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -245,7 +245,7 @@ struct dummy_queue {
 void
 eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
 {
-	static void *dummy_data[RTE_MAX_QUEUES_PER_PORT];
+	static RTE_ATOMIC(void *) dummy_data[RTE_MAX_QUEUES_PER_PORT];
 	uintptr_t port_id = fpo - rte_eth_fp_ops;
 
 	per_port_queues[port_id].rx_warn_once = false;
@@ -278,10 +278,10 @@ struct dummy_queue {
 	fpo->recycle_rx_descriptors_refill = dev->recycle_rx_descriptors_refill;
 
 	fpo->rxq.data = dev->data->rx_queues;
-	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
+	fpo->rxq.clbk = (void * __rte_atomic *)(uintptr_t)dev->post_rx_burst_cbs;
 
 	fpo->txq.data = dev->data->tx_queues;
-	fpo->txq.clbk = (void **)(uintptr_t)dev->pre_tx_burst_cbs;
+	fpo->txq.clbk = (void * __rte_atomic *)(uintptr_t)dev->pre_tx_burst_cbs;
 }
 
 uint16_t
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 9dabcb5..af23ac0 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5654,9 +5654,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(
+		rte_atomic_store_explicit(
 			&rte_eth_devices[port_id].post_rx_burst_cbs[queue_id],
-			cb, __ATOMIC_RELEASE);
+			cb, rte_memory_order_release);
 
 	} else {
 		while (tail->next)
@@ -5664,7 +5664,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	}
 	rte_spinlock_unlock(&eth_dev_rx_cb_lock);
 
@@ -5704,9 +5704,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 	/* Stores to cb->fn, cb->param and cb->next should complete before
 	 * cb is visible to data plane threads.
 	 */
-	__atomic_store_n(
+	rte_atomic_store_explicit(
 		&rte_eth_devices[port_id].post_rx_burst_cbs[queue_id],
-		cb, __ATOMIC_RELEASE);
+		cb, rte_memory_order_release);
 	rte_spinlock_unlock(&eth_dev_rx_cb_lock);
 
 	rte_eth_trace_add_first_rx_callback(port_id, queue_id, fn, user_param,
@@ -5757,9 +5757,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(
+		rte_atomic_store_explicit(
 			&rte_eth_devices[port_id].pre_tx_burst_cbs[queue_id],
-			cb, __ATOMIC_RELEASE);
+			cb, rte_memory_order_release);
 
 	} else {
 		while (tail->next)
@@ -5767,7 +5767,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	}
 	rte_spinlock_unlock(&eth_dev_tx_cb_lock);
 
@@ -5791,7 +5791,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 	struct rte_eth_rxtx_callback *cb;
-	struct rte_eth_rxtx_callback **prev_cb;
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) *prev_cb;
 	int ret = -EINVAL;
 
 	rte_spinlock_lock(&eth_dev_rx_cb_lock);
@@ -5800,7 +5800,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		cb = *prev_cb;
 		if (cb == user_cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, cb->next, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, cb->next, rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
@@ -5828,7 +5828,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 	int ret = -EINVAL;
 	struct rte_eth_rxtx_callback *cb;
-	struct rte_eth_rxtx_callback **prev_cb;
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) *prev_cb;
 
 	rte_spinlock_lock(&eth_dev_tx_cb_lock);
 	prev_cb = &dev->pre_tx_burst_cbs[queue_id];
@@ -5836,7 +5836,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		cb = *prev_cb;
 		if (cb == user_cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, cb->next, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, cb->next, rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index f949dfc..ec48b24 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6018,14 +6018,14 @@ uint16_t rte_eth_call_rx_callbacks(uint16_t port_id, uint16_t queue_id,
 	{
 		void *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
-		cb = __atomic_load_n((void **)&p->rxq.clbk[queue_id],
-				__ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&p->rxq.clbk[queue_id],
+				rte_memory_order_relaxed);
 		if (unlikely(cb != NULL))
 			nb_rx = rte_eth_call_rx_callbacks(port_id, queue_id,
 					rx_pkts, nb_rx, nb_pkts, cb);
@@ -6355,14 +6355,14 @@ uint16_t rte_eth_call_tx_callbacks(uint16_t port_id, uint16_t queue_id,
 	{
 		void *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
-		cb = __atomic_load_n((void **)&p->txq.clbk[queue_id],
-				__ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&p->txq.clbk[queue_id],
+				rte_memory_order_relaxed);
 		if (unlikely(cb != NULL))
 			nb_pkts = rte_eth_call_tx_callbacks(port_id, queue_id,
 					tx_pkts, nb_pkts, cb);
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index 32f5f73..4bfaf79 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -71,7 +71,7 @@ struct rte_ethdev_qdata {
 	/** points to array of internal queue data pointers */
 	void **data;
 	/** points to array of queue callback data pointers */
-	void **clbk;
+	RTE_ATOMIC(void *) *clbk;
 };
 
 /**
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 17/19] hash: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (15 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 16/19] ethdev: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 18/19] timer: " Tyler Retzlaff
                     ` (2 subsequent siblings)
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
the corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
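For reference, a minimal sketch of the guard-variable publish/lookup
pattern the conversions below keep intact; the slot layout and names are
illustrative, not the cuckoo hash internals:

    #include <stddef.h>
    #include <stdint.h>
    #include <rte_stdatomic.h>

    struct slot {
            RTE_ATOMIC(void *) pdata;       /* application data */
            RTE_ATOMIC(uint32_t) key_idx;   /* guard variable, 0 == empty */
    };

    /* writer: release on key_idx makes pdata visible to readers first */
    static inline void
    slot_publish(struct slot *s, void *data, uint32_t idx)
    {
            rte_atomic_store_explicit(&s->pdata, data, rte_memory_order_release);
            rte_atomic_store_explicit(&s->key_idx, idx, rte_memory_order_release);
    }

    /* reader: acquire on the guard before loading the slot data */
    static inline void *
    slot_lookup(struct slot *s)
    {
            if (rte_atomic_load_explicit(&s->key_idx,
                            rte_memory_order_acquire) == 0)
                    return NULL;
            return rte_atomic_load_explicit(&s->pdata, rte_memory_order_acquire);
    }
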
 lib/hash/rte_cuckoo_hash.c | 116 ++++++++++++++++++++++-----------------------
 lib/hash/rte_cuckoo_hash.h |   6 +--
 2 files changed, 61 insertions(+), 61 deletions(-)

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 19b23f2..b2cf60d 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -149,7 +149,7 @@ struct rte_hash *
 	unsigned int writer_takes_lock = 0;
 	unsigned int no_free_on_del = 0;
 	uint32_t *ext_bkt_to_free = NULL;
-	uint32_t *tbl_chng_cnt = NULL;
+	RTE_ATOMIC(uint32_t) *tbl_chng_cnt = NULL;
 	struct lcore_cache *local_free_slots = NULL;
 	unsigned int readwrite_concur_lf_support = 0;
 	uint32_t i;
@@ -713,9 +713,9 @@ struct rte_hash *
 				 * variable. Release the application data
 				 * to the readers.
 				 */
-				__atomic_store_n(&k->pdata,
+				rte_atomic_store_explicit(&k->pdata,
 					data,
-					__ATOMIC_RELEASE);
+					rte_memory_order_release);
 				/*
 				 * Return index where key is stored,
 				 * subtracting the first dummy index
@@ -776,9 +776,9 @@ struct rte_hash *
 			 * key_idx is the guard variable for signature
 			 * and key.
 			 */
-			__atomic_store_n(&prim_bkt->key_idx[i],
+			rte_atomic_store_explicit(&prim_bkt->key_idx[i],
 					 new_idx,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			break;
 		}
 	}
@@ -851,9 +851,9 @@ struct rte_hash *
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
 			/* revert it to empty, otherwise duplicated keys */
-			__atomic_store_n(&curr_bkt->key_idx[curr_slot],
+			rte_atomic_store_explicit(&curr_bkt->key_idx[curr_slot],
 				EMPTY_SLOT,
-				__ATOMIC_RELEASE);
+				rte_memory_order_release);
 			__hash_rw_writer_unlock(h);
 			return -1;
 		}
@@ -865,13 +865,13 @@ struct rte_hash *
 			 * Since there is one writer, load acquires on
 			 * tbl_chng_cnt are not required.
 			 */
-			__atomic_store_n(h->tbl_chng_cnt,
+			rte_atomic_store_explicit(h->tbl_chng_cnt,
 					 *h->tbl_chng_cnt + 1,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			/* The store to sig_current should not
 			 * move above the store to tbl_chng_cnt.
 			 */
-			__atomic_thread_fence(__ATOMIC_RELEASE);
+			__atomic_thread_fence(rte_memory_order_release);
 		}
 
 		/* Need to swap current/alt sig to allow later
@@ -881,9 +881,9 @@ struct rte_hash *
 		curr_bkt->sig_current[curr_slot] =
 			prev_bkt->sig_current[prev_slot];
 		/* Release the updated bucket entry */
-		__atomic_store_n(&curr_bkt->key_idx[curr_slot],
+		rte_atomic_store_explicit(&curr_bkt->key_idx[curr_slot],
 			prev_bkt->key_idx[prev_slot],
-			__ATOMIC_RELEASE);
+			rte_memory_order_release);
 
 		curr_slot = prev_slot;
 		curr_node = prev_node;
@@ -897,20 +897,20 @@ struct rte_hash *
 		 * Since there is one writer, load acquires on
 		 * tbl_chng_cnt are not required.
 		 */
-		__atomic_store_n(h->tbl_chng_cnt,
+		rte_atomic_store_explicit(h->tbl_chng_cnt,
 				 *h->tbl_chng_cnt + 1,
-				 __ATOMIC_RELEASE);
+				 rte_memory_order_release);
 		/* The store to sig_current should not
 		 * move above the store to tbl_chng_cnt.
 		 */
-		__atomic_thread_fence(__ATOMIC_RELEASE);
+		__atomic_thread_fence(rte_memory_order_release);
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
 	/* Release the new bucket entry */
-	__atomic_store_n(&curr_bkt->key_idx[curr_slot],
+	rte_atomic_store_explicit(&curr_bkt->key_idx[curr_slot],
 			 new_idx,
-			 __ATOMIC_RELEASE);
+			 rte_memory_order_release);
 
 	__hash_rw_writer_unlock(h);
 
@@ -1076,9 +1076,9 @@ struct rte_hash *
 	 * not leak after the store of pdata in the key store. i.e. pdata is
 	 * the guard variable. Release the application data to the readers.
 	 */
-	__atomic_store_n(&new_k->pdata,
+	rte_atomic_store_explicit(&new_k->pdata,
 		data,
-		__ATOMIC_RELEASE);
+		rte_memory_order_release);
 	/* Copy key */
 	memcpy(new_k->key, key, h->key_len);
 
@@ -1149,9 +1149,9 @@ struct rte_hash *
 				 * key_idx is the guard variable for signature
 				 * and key.
 				 */
-				__atomic_store_n(&cur_bkt->key_idx[i],
+				rte_atomic_store_explicit(&cur_bkt->key_idx[i],
 						 slot_id,
-						 __ATOMIC_RELEASE);
+						 rte_memory_order_release);
 				__hash_rw_writer_unlock(h);
 				return slot_id - 1;
 			}
@@ -1185,9 +1185,9 @@ struct rte_hash *
 	 * the store to key_idx. i.e. key_idx is the guard variable
 	 * for signature and key.
 	 */
-	__atomic_store_n(&(h->buckets_ext[ext_bkt_id - 1]).key_idx[0],
+	rte_atomic_store_explicit(&(h->buckets_ext[ext_bkt_id - 1]).key_idx[0],
 			 slot_id,
-			 __ATOMIC_RELEASE);
+			 rte_memory_order_release);
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
 	last->next = &h->buckets_ext[ext_bkt_id - 1];
@@ -1290,17 +1290,17 @@ struct rte_hash *
 		 * key comparison will ensure that the lookup fails.
 		 */
 		if (bkt->sig_current[i] == sig) {
-			key_idx = __atomic_load_n(&bkt->key_idx[i],
-					  __ATOMIC_ACQUIRE);
+			key_idx = rte_atomic_load_explicit(&bkt->key_idx[i],
+					  rte_memory_order_acquire);
 			if (key_idx != EMPTY_SLOT) {
 				k = (struct rte_hash_key *) ((char *)keys +
 						key_idx * h->key_entry_size);
 
 				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
 					if (data != NULL) {
-						*data = __atomic_load_n(
+						*data = rte_atomic_load_explicit(
 							&k->pdata,
-							__ATOMIC_ACQUIRE);
+							rte_memory_order_acquire);
 					}
 					/*
 					 * Return index where key is stored,
@@ -1374,8 +1374,8 @@ struct rte_hash *
 		 * starts. Acquire semantics will make sure that
 		 * loads in search_one_bucket are not hoisted.
 		 */
-		cnt_b = __atomic_load_n(h->tbl_chng_cnt,
-				__ATOMIC_ACQUIRE);
+		cnt_b = rte_atomic_load_explicit(h->tbl_chng_cnt,
+				rte_memory_order_acquire);
 
 		/* Check if key is in primary location */
 		bkt = &h->buckets[prim_bucket_idx];
@@ -1396,7 +1396,7 @@ struct rte_hash *
 		/* The loads of sig_current in search_one_bucket
 		 * should not move below the load from tbl_chng_cnt.
 		 */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 		/* Re-read the table change counter to check if the
 		 * table has changed during search. If yes, re-do
 		 * the search.
@@ -1405,8 +1405,8 @@ struct rte_hash *
 		 * and key index in secondary bucket will make sure
 		 * that it does not get hoisted.
 		 */
-		cnt_a = __atomic_load_n(h->tbl_chng_cnt,
-					__ATOMIC_ACQUIRE);
+		cnt_a = rte_atomic_load_explicit(h->tbl_chng_cnt,
+					rte_memory_order_acquire);
 	} while (cnt_b != cnt_a);
 
 	return -ENOENT;
@@ -1611,26 +1611,26 @@ struct rte_hash *
 	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
-			__atomic_store_n(&cur_bkt->key_idx[pos],
+			rte_atomic_store_explicit(&cur_bkt->key_idx[pos],
 					 last_bkt->key_idx[i],
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			if (h->readwrite_concur_lf_support) {
 				/* Inform the readers that the table has changed
 				 * Since there is one writer, load acquire on
 				 * tbl_chng_cnt is not required.
 				 */
-				__atomic_store_n(h->tbl_chng_cnt,
+				rte_atomic_store_explicit(h->tbl_chng_cnt,
 					 *h->tbl_chng_cnt + 1,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 				/* The store to sig_current should
 				 * not move above the store to tbl_chng_cnt.
 				 */
-				__atomic_thread_fence(__ATOMIC_RELEASE);
+				__atomic_thread_fence(rte_memory_order_release);
 			}
 			last_bkt->sig_current[i] = NULL_SIGNATURE;
-			__atomic_store_n(&last_bkt->key_idx[i],
+			rte_atomic_store_explicit(&last_bkt->key_idx[i],
 					 EMPTY_SLOT,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			return;
 		}
 	}
@@ -1650,8 +1650,8 @@ struct rte_hash *
 
 	/* Check if key is in bucket */
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		key_idx = __atomic_load_n(&bkt->key_idx[i],
-					  __ATOMIC_ACQUIRE);
+		key_idx = rte_atomic_load_explicit(&bkt->key_idx[i],
+					  rte_memory_order_acquire);
 		if (bkt->sig_current[i] == sig && key_idx != EMPTY_SLOT) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					key_idx * h->key_entry_size);
@@ -1663,9 +1663,9 @@ struct rte_hash *
 				if (!h->no_free_on_del)
 					remove_entry(h, bkt, i);
 
-				__atomic_store_n(&bkt->key_idx[i],
+				rte_atomic_store_explicit(&bkt->key_idx[i],
 						 EMPTY_SLOT,
-						 __ATOMIC_RELEASE);
+						 rte_memory_order_release);
 
 				*pos = i;
 				/*
@@ -2077,8 +2077,8 @@ struct rte_hash *
 		 * starts. Acquire semantics will make sure that
 		 * loads in compare_signatures are not hoisted.
 		 */
-		cnt_b = __atomic_load_n(h->tbl_chng_cnt,
-					__ATOMIC_ACQUIRE);
+		cnt_b = rte_atomic_load_explicit(h->tbl_chng_cnt,
+					rte_memory_order_acquire);
 
 		/* Compare signatures and prefetch key slot of first hit */
 		for (i = 0; i < num_keys; i++) {
@@ -2121,9 +2121,9 @@ struct rte_hash *
 						__builtin_ctzl(prim_hitmask[i])
 						>> 1;
 				uint32_t key_idx =
-				__atomic_load_n(
+				rte_atomic_load_explicit(
 					&primary_bkt[i]->key_idx[hit_index],
-					__ATOMIC_ACQUIRE);
+					rte_memory_order_acquire);
 				const struct rte_hash_key *key_slot =
 					(const struct rte_hash_key *)(
 					(const char *)h->key_store +
@@ -2137,9 +2137,9 @@ struct rte_hash *
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = __atomic_load_n(
+						data[i] = rte_atomic_load_explicit(
 							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
+							rte_memory_order_acquire);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
@@ -2153,9 +2153,9 @@ struct rte_hash *
 						__builtin_ctzl(sec_hitmask[i])
 						>> 1;
 				uint32_t key_idx =
-				__atomic_load_n(
+				rte_atomic_load_explicit(
 					&secondary_bkt[i]->key_idx[hit_index],
-					__ATOMIC_ACQUIRE);
+					rte_memory_order_acquire);
 				const struct rte_hash_key *key_slot =
 					(const struct rte_hash_key *)(
 					(const char *)h->key_store +
@@ -2170,9 +2170,9 @@ struct rte_hash *
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = __atomic_load_n(
+						data[i] = rte_atomic_load_explicit(
 							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
+							rte_memory_order_acquire);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
@@ -2216,7 +2216,7 @@ struct rte_hash *
 		/* The loads of sig_current in compare_signatures
 		 * should not move below the load from tbl_chng_cnt.
 		 */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 		/* Re-read the table change counter to check if the
 		 * table has changed during search. If yes, re-do
 		 * the search.
@@ -2225,8 +2225,8 @@ struct rte_hash *
 		 * key index will make sure that it does not get
 		 * hoisted.
 		 */
-		cnt_a = __atomic_load_n(h->tbl_chng_cnt,
-					__ATOMIC_ACQUIRE);
+		cnt_a = rte_atomic_load_explicit(h->tbl_chng_cnt,
+					rte_memory_order_acquire);
 	} while (cnt_b != cnt_a);
 
 	if (hit_mask != NULL)
@@ -2498,8 +2498,8 @@ struct rte_hash *
 	idx = *next % RTE_HASH_BUCKET_ENTRIES;
 
 	/* If current position is empty, go to the next one */
-	while ((position = __atomic_load_n(&h->buckets[bucket_idx].key_idx[idx],
-					__ATOMIC_ACQUIRE)) == EMPTY_SLOT) {
+	while ((position = rte_atomic_load_explicit(&h->buckets[bucket_idx].key_idx[idx],
+					rte_memory_order_acquire)) == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
 		if (*next == total_entries_main)
diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
index eb2644f..f7afc4d 100644
--- a/lib/hash/rte_cuckoo_hash.h
+++ b/lib/hash/rte_cuckoo_hash.h
@@ -137,7 +137,7 @@ struct lcore_cache {
 struct rte_hash_key {
 	union {
 		uintptr_t idata;
-		void *pdata;
+		RTE_ATOMIC(void *) pdata;
 	};
 	/* Variable key size */
 	char key[0];
@@ -155,7 +155,7 @@ enum rte_hash_sig_compare_function {
 struct rte_hash_bucket {
 	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
-	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
+	RTE_ATOMIC(uint32_t) key_idx[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
@@ -229,7 +229,7 @@ struct rte_hash {
 	 * is piggy-backed to freeing of the key index.
 	 */
 	uint32_t *ext_bkt_to_free;
-	uint32_t *tbl_chng_cnt;
+	RTE_ATOMIC(uint32_t) *tbl_chng_cnt;
 	/**< Indicates if the hash table changed from last read. */
 } __rte_cache_aligned;
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 18/19] timer: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (16 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 17/19] hash: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-17 20:31   ` [PATCH v2 19/19] ring: " Tyler Retzlaff
  2023-10-17 23:55   ` [PATCH v2 00/19] " Stephen Hemminger
  19 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
the corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
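For reference, a minimal sketch of the compare-exchange conversion used
below: the 'weak' flag of __atomic_compare_exchange_n has no counterpart
in the strong wrapper and is simply dropped, and in the patch the
expected pointer gains a cast to plain uint32_t * because the local
status copy shares the now-atomic union type. The function below is
illustrative only:

    #include <stdint.h>
    #include <rte_stdatomic.h>

    /* try to move *status from 'from' to 'to'; returns nonzero on success */
    static inline int
    try_transition(RTE_ATOMIC(uint32_t) *status, uint32_t from, uint32_t to)
    {
            uint32_t expected = from;

            /* was: __atomic_compare_exchange_n(status, &expected, to, 0,
             *              __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)
             */
            return rte_atomic_compare_exchange_strong_explicit(status,
                            &expected, to,
                            rte_memory_order_acquire, rte_memory_order_relaxed);
    }
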
 lib/timer/rte_timer.c | 50 +++++++++++++++++++++++++-------------------------
 lib/timer/rte_timer.h |  6 +++---
 2 files changed, 28 insertions(+), 28 deletions(-)

diff --git a/lib/timer/rte_timer.c b/lib/timer/rte_timer.c
index 85d6757..53ed221 100644
--- a/lib/timer/rte_timer.c
+++ b/lib/timer/rte_timer.c
@@ -210,7 +210,7 @@ struct rte_timer_data {
 
 	status.state = RTE_TIMER_STOP;
 	status.owner = RTE_TIMER_NO_OWNER;
-	__atomic_store_n(&tim->status.u32, status.u32, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&tim->status.u32, status.u32, rte_memory_order_relaxed);
 }
 
 /*
@@ -231,7 +231,7 @@ struct rte_timer_data {
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	prev_status.u32 = __atomic_load_n(&tim->status.u32, __ATOMIC_RELAXED);
+	prev_status.u32 = rte_atomic_load_explicit(&tim->status.u32, rte_memory_order_relaxed);
 
 	while (success == 0) {
 		/* timer is running on another core
@@ -254,11 +254,11 @@ struct rte_timer_data {
 		 * timer is in CONFIG state, the state cannot be changed
 		 * by other threads. So, we should use ACQUIRE here.
 		 */
-		success = __atomic_compare_exchange_n(&tim->status.u32,
-					      &prev_status.u32,
-					      status.u32, 0,
-					      __ATOMIC_ACQUIRE,
-					      __ATOMIC_RELAXED);
+		success = rte_atomic_compare_exchange_strong_explicit(&tim->status.u32,
+					      (uint32_t *)(uintptr_t)&prev_status.u32,
+					      status.u32,
+					      rte_memory_order_acquire,
+					      rte_memory_order_relaxed);
 	}
 
 	ret_prev_status->u32 = prev_status.u32;
@@ -277,7 +277,7 @@ struct rte_timer_data {
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as running */
-	prev_status.u32 = __atomic_load_n(&tim->status.u32, __ATOMIC_RELAXED);
+	prev_status.u32 = rte_atomic_load_explicit(&tim->status.u32, rte_memory_order_relaxed);
 
 	while (success == 0) {
 		/* timer is not pending anymore */
@@ -293,11 +293,11 @@ struct rte_timer_data {
 		 * timer is in RUNNING state, the state cannot be changed
 		 * by other threads. So, we should use ACQUIRE here.
 		 */
-		success = __atomic_compare_exchange_n(&tim->status.u32,
-					      &prev_status.u32,
-					      status.u32, 0,
-					      __ATOMIC_ACQUIRE,
-					      __ATOMIC_RELAXED);
+		success = rte_atomic_compare_exchange_strong_explicit(&tim->status.u32,
+					      (uint32_t *)(uintptr_t)&prev_status.u32,
+					      status.u32,
+					      rte_memory_order_acquire,
+					      rte_memory_order_relaxed);
 	}
 
 	return 0;
@@ -530,7 +530,7 @@ struct rte_timer_data {
 	/* The "RELEASE" ordering guarantees the memory operations above
 	 * the status update are observed before the update by all threads
 	 */
-	__atomic_store_n(&tim->status.u32, status.u32, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->status.u32, status.u32, rte_memory_order_release);
 
 	if (tim_lcore != lcore_id || !local_is_locked)
 		rte_spinlock_unlock(&priv_timer[tim_lcore].list_lock);
@@ -612,7 +612,7 @@ struct rte_timer_data {
 	/* The "RELEASE" ordering guarantees the memory operations above
 	 * the status update are observed before the update by all threads
 	 */
-	__atomic_store_n(&tim->status.u32, status.u32, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->status.u32, status.u32, rte_memory_order_release);
 
 	return 0;
 }
@@ -646,8 +646,8 @@ struct rte_timer_data {
 int
 rte_timer_pending(struct rte_timer *tim)
 {
-	return __atomic_load_n(&tim->status.state,
-				__ATOMIC_RELAXED) == RTE_TIMER_PENDING;
+	return rte_atomic_load_explicit(&tim->status.state,
+				rte_memory_order_relaxed) == RTE_TIMER_PENDING;
 }
 
 /* must be called periodically, run all timer that expired */
@@ -753,8 +753,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 		}
 		else {
 			/* keep it in list and mark timer as pending */
@@ -766,8 +766,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 			__rte_timer_reset(tim, tim->expire + tim->period,
 				tim->period, lcore_id, tim->f, tim->arg, 1,
 				timer_data);
@@ -941,8 +941,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 		} else {
 			/* keep it in list and mark timer as pending */
 			rte_spinlock_lock(
@@ -954,8 +954,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 			__rte_timer_reset(tim, tim->expire + tim->period,
 				tim->period, this_lcore, tim->f, tim->arg, 1,
 				data);
diff --git a/lib/timer/rte_timer.h b/lib/timer/rte_timer.h
index d3927d5..a35bc08 100644
--- a/lib/timer/rte_timer.h
+++ b/lib/timer/rte_timer.h
@@ -65,10 +65,10 @@ enum rte_timer_type {
  */
 union rte_timer_status {
 	struct {
-		uint16_t state;  /**< Stop, pending, running, config. */
-		int16_t owner;   /**< The lcore that owns the timer. */
+		RTE_ATOMIC(uint16_t) state;  /**< Stop, pending, running, config. */
+		RTE_ATOMIC(int16_t) owner;   /**< The lcore that owns the timer. */
 	};
-	uint32_t u32;            /**< To atomic-set status + owner. */
+	RTE_ATOMIC(uint32_t) u32;            /**< To atomic-set status + owner. */
 };
 
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v2 19/19] ring: use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (17 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 18/19] timer: " Tyler Retzlaff
@ 2023-10-17 20:31   ` Tyler Retzlaff
  2023-10-24  8:43     ` Konstantin Ananyev
  2023-10-17 23:55   ` [PATCH v2 00/19] " Stephen Hemminger
  19 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-17 20:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
the corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
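For reference, a minimal sketch of the raw-union compare-exchange
pattern the HTS/RTS conversions below follow, including the
(uint64_t *)(uintptr_t) cast that strips the atomic qualifier from the
local expected copy; the union and function names are illustrative, not
the ring internals:

    #include <stdint.h>
    #include <rte_stdatomic.h>

    union pos {
            RTE_ATOMIC(uint64_t) raw;   /* head+tail as one atomic 8B value */
            struct {
                    uint32_t head;
                    uint32_t tail;
            } p;
    };

    /* advance head by n slots; returns the old head position */
    static inline uint32_t
    move_head(union pos *ht, uint32_t n)
    {
            union pos op, np;

            op.raw = rte_atomic_load_explicit(&ht->raw, rte_memory_order_acquire);
            do {
                    np.p.tail = op.p.tail;
                    np.p.head = op.p.head + n;
                    /* on failure op.raw is refreshed with the current value */
            } while (rte_atomic_compare_exchange_strong_explicit(&ht->raw,
                            (uint64_t *)(uintptr_t)&op.raw, np.raw,
                            rte_memory_order_acquire,
                            rte_memory_order_acquire) == 0);

            return op.p.head;
    }
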
 drivers/net/mlx5/mlx5_hws_cnt.h   |  2 +-
 lib/ring/rte_ring_c11_pvt.h       | 33 +++++++++++++++++----------------
 lib/ring/rte_ring_core.h          | 10 +++++-----
 lib/ring/rte_ring_generic_pvt.h   |  3 ++-
 lib/ring/rte_ring_hts_elem_pvt.h  | 22 ++++++++++++----------
 lib/ring/rte_ring_peek_elem_pvt.h |  6 +++---
 lib/ring/rte_ring_rts_elem_pvt.h  | 27 ++++++++++++++-------------
 7 files changed, 54 insertions(+), 49 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index f462665..cc9ac10 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
 	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
 			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
 	/* Update tail */
-	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&r->prod.tail, revert2head, rte_memory_order_release);
 	return n;
 }
 
diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index f895950..f8be538 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -22,9 +22,10 @@
 	 * we need to wait for them to complete
 	 */
 	if (!single)
-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
+			rte_memory_order_relaxed);
 
-	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
 }
 
 /**
@@ -61,19 +62,19 @@
 	unsigned int max = n;
 	int success;
 
-	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
+	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
 
 		/* Ensure the head is read before tail */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
-		cons_tail = __atomic_load_n(&r->cons.tail,
-					__ATOMIC_ACQUIRE);
+		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
+					rte_memory_order_acquire);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
@@ -95,10 +96,10 @@
 			r->prod.head = *new_head, success = 1;
 		else
 			/* on failure, *old_head is updated */
-			success = __atomic_compare_exchange_n(&r->prod.head,
+			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
 					old_head, *new_head,
-					0, __ATOMIC_RELAXED,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed,
+					rte_memory_order_relaxed);
 	} while (unlikely(success == 0));
 	return n;
 }
@@ -137,19 +138,19 @@
 	int success;
 
 	/* move cons.head atomically */
-	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
+	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
 	do {
 		/* Restore n as it may change every loop */
 		n = max;
 
 		/* Ensure the head is read before tail */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 
 		/* this load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
-		prod_tail = __atomic_load_n(&r->prod.tail,
-					__ATOMIC_ACQUIRE);
+		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
+					rte_memory_order_acquire);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
@@ -170,10 +171,10 @@
 			r->cons.head = *new_head, success = 1;
 		else
 			/* on failure, *old_head will be updated */
-			success = __atomic_compare_exchange_n(&r->cons.head,
+			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
 							old_head, *new_head,
-							0, __ATOMIC_RELAXED,
-							__ATOMIC_RELAXED);
+							rte_memory_order_relaxed,
+							rte_memory_order_relaxed);
 	} while (unlikely(success == 0));
 	return n;
 }
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 327fdcf..7a2b577 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -67,7 +67,7 @@ enum rte_ring_sync_type {
  */
 struct rte_ring_headtail {
 	volatile uint32_t head;      /**< prod/consumer head. */
-	volatile uint32_t tail;      /**< prod/consumer tail. */
+	volatile RTE_ATOMIC(uint32_t) tail;      /**< prod/consumer tail. */
 	union {
 		/** sync type of prod/cons */
 		enum rte_ring_sync_type sync_type;
@@ -78,7 +78,7 @@ struct rte_ring_headtail {
 
 union __rte_ring_rts_poscnt {
 	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
-	uint64_t raw __rte_aligned(8);
+	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
 	struct {
 		uint32_t cnt; /**< head/tail reference counter */
 		uint32_t pos; /**< head/tail position */
@@ -94,10 +94,10 @@ struct rte_ring_rts_headtail {
 
 union __rte_ring_hts_pos {
 	/** raw 8B value to read/write *head* and *tail* as one atomic op */
-	uint64_t raw __rte_aligned(8);
+	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
 	struct {
-		uint32_t head; /**< head position */
-		uint32_t tail; /**< tail position */
+		RTE_ATOMIC(uint32_t) head; /**< head position */
+		RTE_ATOMIC(uint32_t) tail; /**< tail position */
 	} pos;
 };
 
diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
index 5acb6e5..ffb3654 100644
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_generic_pvt.h
@@ -23,7 +23,8 @@
 	 * we need to wait for them to complete
 	 */
 	if (!single)
-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
+			rte_memory_order_relaxed);
 
 	ht->tail = new_val;
 }
diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
index a8678d3..91f5eec 100644
--- a/lib/ring/rte_ring_hts_elem_pvt.h
+++ b/lib/ring/rte_ring_hts_elem_pvt.h
@@ -10,6 +10,8 @@
 #ifndef _RTE_RING_HTS_ELEM_PVT_H_
 #define _RTE_RING_HTS_ELEM_PVT_H_
 
+#include <rte_stdatomic.h>
+
 /**
  * @file rte_ring_hts_elem_pvt.h
  * It is not recommended to include this file directly,
@@ -30,7 +32,7 @@
 	RTE_SET_USED(enqueue);
 
 	tail = old_tail + num;
-	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->ht.pos.tail, tail, rte_memory_order_release);
 }
 
 /**
@@ -44,7 +46,7 @@
 {
 	while (p->pos.head != p->pos.tail) {
 		rte_pause();
-		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+		p->raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_acquire);
 	}
 }
 
@@ -61,7 +63,7 @@
 
 	const uint32_t capacity = r->capacity;
 
-	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
+	op.raw = rte_atomic_load_explicit(&r->hts_prod.ht.raw, rte_memory_order_acquire);
 
 	do {
 		/* Reset n to the initial burst count */
@@ -98,9 +100,9 @@
 	 *  - OOO reads of cons tail value
 	 *  - OOO copy of elems from the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
-			&op.raw, np.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_prod.ht.raw,
+			(uint64_t *)(uintptr_t)&op.raw, np.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = op.pos.head;
 	return n;
@@ -117,7 +119,7 @@
 	uint32_t n;
 	union __rte_ring_hts_pos np, op;
 
-	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
+	op.raw = rte_atomic_load_explicit(&r->hts_cons.ht.raw, rte_memory_order_acquire);
 
 	/* move cons.head atomically */
 	do {
@@ -153,9 +155,9 @@
 	 *  - OOO reads of prod tail value
 	 *  - OOO copy of elems from the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
-			&op.raw, np.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_cons.ht.raw,
+			(uint64_t *)(uintptr_t)&op.raw, np.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = op.pos.head;
 	return n;
diff --git a/lib/ring/rte_ring_peek_elem_pvt.h b/lib/ring/rte_ring_peek_elem_pvt.h
index bb0a7d5..b5f0822 100644
--- a/lib/ring/rte_ring_peek_elem_pvt.h
+++ b/lib/ring/rte_ring_peek_elem_pvt.h
@@ -59,7 +59,7 @@
 
 	pos = tail + num;
 	ht->head = pos;
-	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->tail, pos, rte_memory_order_release);
 }
 
 /**
@@ -78,7 +78,7 @@
 	uint32_t n;
 	union __rte_ring_hts_pos p;
 
-	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
+	p.raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_relaxed);
 	n = p.pos.head - p.pos.tail;
 
 	RTE_ASSERT(n >= num);
@@ -104,7 +104,7 @@
 	p.pos.head = tail + num;
 	p.pos.tail = p.pos.head;
 
-	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->ht.raw, p.raw, rte_memory_order_release);
 }
 
 /**
diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
index 7164213..1226503 100644
--- a/lib/ring/rte_ring_rts_elem_pvt.h
+++ b/lib/ring/rte_ring_rts_elem_pvt.h
@@ -31,18 +31,19 @@
 	 * might preceded us, then don't update tail with new value.
 	 */
 
-	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
+	ot.raw = rte_atomic_load_explicit(&ht->tail.raw, rte_memory_order_acquire);
 
 	do {
 		/* on 32-bit systems we have to do atomic read here */
-		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
+		h.raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_relaxed);
 
 		nt.raw = ot.raw;
 		if (++nt.val.cnt == h.val.cnt)
 			nt.val.pos = h.val.pos;
 
-	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
-			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&ht->tail.raw,
+			(uint64_t *)(uintptr_t)&ot.raw, nt.raw,
+			rte_memory_order_release, rte_memory_order_acquire) == 0);
 }
 
 /**
@@ -59,7 +60,7 @@
 
 	while (h->val.pos - ht->tail.val.pos > max) {
 		rte_pause();
-		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+		h->raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_acquire);
 	}
 }
 
@@ -76,7 +77,7 @@
 
 	const uint32_t capacity = r->capacity;
 
-	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
+	oh.raw = rte_atomic_load_explicit(&r->rts_prod.head.raw, rte_memory_order_acquire);
 
 	do {
 		/* Reset n to the initial burst count */
@@ -113,9 +114,9 @@
 	 *  - OOO reads of cons tail value
 	 *  - OOO copy of elems to the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
-			&oh.raw, nh.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_prod.head.raw,
+			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = oh.val.pos;
 	return n;
@@ -132,7 +133,7 @@
 	uint32_t n;
 	union __rte_ring_rts_poscnt nh, oh;
 
-	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
+	oh.raw = rte_atomic_load_explicit(&r->rts_cons.head.raw, rte_memory_order_acquire);
 
 	/* move cons.head atomically */
 	do {
@@ -168,9 +169,9 @@
 	 *  - OOO reads of prod tail value
 	 *  - OOO copy of elems from the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
-			&oh.raw, nh.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_cons.head.raw,
+			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = oh.val.pos;
 	return n;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 00/19] use rte optional stdatomic API
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
                     ` (18 preceding siblings ...)
  2023-10-17 20:31   ` [PATCH v2 19/19] ring: " Tyler Retzlaff
@ 2023-10-17 23:55   ` Stephen Hemminger
  19 siblings, 0 replies; 91+ messages in thread
From: Stephen Hemminger @ 2023-10-17 23:55 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Akhil Goyal, Anatoly Burakov, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Ciara Power, David Christensen,
	David Hunt, Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

On Tue, 17 Oct 2023 13:30:58 -0700
Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:

> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API.
> 
> v2:
>   * add #include <rte_stdatomic.h> to rte_mbuf_core.h
>   * remove first two patches which were fixes that have
>     been merged in another series

Looks good.
It does look like lots of places are doing per-CPU statistics,
and a set of helpers for that might make them simpler.

Linux has percpu_counter_XXX helpers and the NET_INC_STATS() macro.
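
A purely hypothetical sketch of what such a helper could look like on
top of the new wrappers; nothing like this exists in the tree, and every
name below is invented for illustration:

    #include <stdint.h>
    #include <rte_lcore.h>
    #include <rte_stdatomic.h>

    /* real code would pad each entry to a cache line to avoid false sharing */
    struct pcpu_counter {
            RTE_ATOMIC(uint64_t) cnt[RTE_MAX_LCORE];
    };

    /* data-plane increment: relaxed, each lcore touches only its own slot;
     * assumes the caller is an EAL thread with a valid rte_lcore_id()
     */
    static inline void
    pcpu_counter_inc(struct pcpu_counter *c)
    {
            rte_atomic_fetch_add_explicit(&c->cnt[rte_lcore_id()], 1,
                            rte_memory_order_relaxed);
    }

    /* control-plane read: sum over all lcores */
    static inline uint64_t
    pcpu_counter_read(struct pcpu_counter *c)
    {
            uint64_t sum = 0;
            unsigned int i;

            for (i = 0; i < RTE_MAX_LCORE; i++)
                    sum += rte_atomic_load_explicit(&c->cnt[i],
                                    rte_memory_order_relaxed);
            return sum;
    }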


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 19/19] ring: use rte optional stdatomic API
  2023-10-17 20:31   ` [PATCH v2 19/19] ring: " Tyler Retzlaff
@ 2023-10-24  8:43     ` Konstantin Ananyev
  2023-10-24  9:56       ` Morten Brørup
  2023-10-24 16:29       ` Tyler Retzlaff
  0 siblings, 2 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-24  8:43 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

17.10.2023 21:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   drivers/net/mlx5/mlx5_hws_cnt.h   |  2 +-
>   lib/ring/rte_ring_c11_pvt.h       | 33 +++++++++++++++++----------------
>   lib/ring/rte_ring_core.h          | 10 +++++-----
>   lib/ring/rte_ring_generic_pvt.h   |  3 ++-
>   lib/ring/rte_ring_hts_elem_pvt.h  | 22 ++++++++++++----------
>   lib/ring/rte_ring_peek_elem_pvt.h |  6 +++---
>   lib/ring/rte_ring_rts_elem_pvt.h  | 27 ++++++++++++++-------------
>   7 files changed, 54 insertions(+), 49 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
> index f462665..cc9ac10 100644
> --- a/drivers/net/mlx5/mlx5_hws_cnt.h
> +++ b/drivers/net/mlx5/mlx5_hws_cnt.h
> @@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
>   	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
>   			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
>   	/* Update tail */
> -	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
> +	rte_atomic_store_explicit(&r->prod.tail, revert2head, rte_memory_order_release);
>   	return n;
>   }
>   
> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index f895950..f8be538 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -22,9 +22,10 @@
>   	 * we need to wait for them to complete
>   	 */
>   	if (!single)
> -		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> +		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> +			rte_memory_order_relaxed);
>   
> -	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> +	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
>   }
>   
>   /**
> @@ -61,19 +62,19 @@
>   	unsigned int max = n;
>   	int success;
>   
> -	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
> +	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
>   	do {
>   		/* Reset n to the initial burst count */
>   		n = max;
>   
>   		/* Ensure the head is read before tail */
> -		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> +		__atomic_thread_fence(rte_memory_order_acquire);
>   
>   		/* load-acquire synchronize with store-release of ht->tail
>   		 * in update_tail.
>   		 */
> -		cons_tail = __atomic_load_n(&r->cons.tail,
> -					__ATOMIC_ACQUIRE);
> +		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
> +					rte_memory_order_acquire);
>   
>   		/* The subtraction is done between two unsigned 32bits value
>   		 * (the result is always modulo 32 bits even if we have
> @@ -95,10 +96,10 @@
>   			r->prod.head = *new_head, success = 1;
>   		else
>   			/* on failure, *old_head is updated */
> -			success = __atomic_compare_exchange_n(&r->prod.head,
> +			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
>   					old_head, *new_head,
> -					0, __ATOMIC_RELAXED,
> -					__ATOMIC_RELAXED);
> +					rte_memory_order_relaxed,
> +					rte_memory_order_relaxed);
>   	} while (unlikely(success == 0));
>   	return n;
>   }
> @@ -137,19 +138,19 @@
>   	int success;
>   
>   	/* move cons.head atomically */
> -	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
> +	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
>   	do {
>   		/* Restore n as it may change every loop */
>   		n = max;
>   
>   		/* Ensure the head is read before tail */
> -		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> +		__atomic_thread_fence(rte_memory_order_acquire);
>   
>   		/* this load-acquire synchronize with store-release of ht->tail
>   		 * in update_tail.
>   		 */
> -		prod_tail = __atomic_load_n(&r->prod.tail,
> -					__ATOMIC_ACQUIRE);
> +		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
> +					rte_memory_order_acquire);
>   
>   		/* The subtraction is done between two unsigned 32bits value
>   		 * (the result is always modulo 32 bits even if we have
> @@ -170,10 +171,10 @@
>   			r->cons.head = *new_head, success = 1;
>   		else
>   			/* on failure, *old_head will be updated */
> -			success = __atomic_compare_exchange_n(&r->cons.head,
> +			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
>   							old_head, *new_head,
> -							0, __ATOMIC_RELAXED,
> -							__ATOMIC_RELAXED);
> +							rte_memory_order_relaxed,
> +							rte_memory_order_relaxed);
>   	} while (unlikely(success == 0));
>   	return n;
>   }
> diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
> index 327fdcf..7a2b577 100644
> --- a/lib/ring/rte_ring_core.h
> +++ b/lib/ring/rte_ring_core.h
> @@ -67,7 +67,7 @@ enum rte_ring_sync_type {
>    */
>   struct rte_ring_headtail {
>   	volatile uint32_t head;      /**< prod/consumer head. */
> -	volatile uint32_t tail;      /**< prod/consumer tail. */
> +	volatile RTE_ATOMIC(uint32_t) tail;      /**< prod/consumer tail. */

Probably a stupid question:
why do we need RTE_ATOMIC() around tail only?
Why is head not affected?

>   	union {
>   		/** sync type of prod/cons */
>   		enum rte_ring_sync_type sync_type;
> @@ -78,7 +78,7 @@ struct rte_ring_headtail {
>   
>   union __rte_ring_rts_poscnt {
>   	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
> -	uint64_t raw __rte_aligned(8);
> +	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
>   	struct {
>   		uint32_t cnt; /**< head/tail reference counter */
>   		uint32_t pos; /**< head/tail position */
> @@ -94,10 +94,10 @@ struct rte_ring_rts_headtail {
>   
>   union __rte_ring_hts_pos {
>   	/** raw 8B value to read/write *head* and *tail* as one atomic op */
> -	uint64_t raw __rte_aligned(8);
> +	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
>   	struct {
> -		uint32_t head; /**< head position */
> -		uint32_t tail; /**< tail position */
> +		RTE_ATOMIC(uint32_t) head; /**< head position */
> +		RTE_ATOMIC(uint32_t) tail; /**< tail position */
>   	} pos;
>   };
>   
> diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
> index 5acb6e5..ffb3654 100644
> --- a/lib/ring/rte_ring_generic_pvt.h
> +++ b/lib/ring/rte_ring_generic_pvt.h
> @@ -23,7 +23,8 @@
>   	 * we need to wait for them to complete
>   	 */
>   	if (!single)
> -		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> +		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,

I suppose we only need that double type conversion for atomic types,
right?

> +			rte_memory_order_relaxed);
>   
>   	ht->tail = new_val;
>   }
> diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
> index a8678d3..91f5eec 100644
> --- a/lib/ring/rte_ring_hts_elem_pvt.h
> +++ b/lib/ring/rte_ring_hts_elem_pvt.h
> @@ -10,6 +10,8 @@
>   #ifndef _RTE_RING_HTS_ELEM_PVT_H_
>   #define _RTE_RING_HTS_ELEM_PVT_H_
>   
> +#include <rte_stdatomic.h>
> +
>   /**
>    * @file rte_ring_hts_elem_pvt.h
>    * It is not recommended to include this file directly,
> @@ -30,7 +32,7 @@
>   	RTE_SET_USED(enqueue);
>   
>   	tail = old_tail + num;
> -	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
> +	rte_atomic_store_explicit(&ht->ht.pos.tail, tail, rte_memory_order_release);
>   }
>   
>   /**
> @@ -44,7 +46,7 @@
>   {
>   	while (p->pos.head != p->pos.tail) {
>   		rte_pause();
> -		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
> +		p->raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_acquire);
>   	}
>   }
>   
> @@ -61,7 +63,7 @@
>   
>   	const uint32_t capacity = r->capacity;
>   
> -	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
> +	op.raw = rte_atomic_load_explicit(&r->hts_prod.ht.raw, rte_memory_order_acquire);
>   
>   	do {
>   		/* Reset n to the initial burst count */
> @@ -98,9 +100,9 @@
>   	 *  - OOO reads of cons tail value
>   	 *  - OOO copy of elems from the ring
>   	 */
> -	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
> -			&op.raw, np.raw,
> -			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> +	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_prod.ht.raw,
> +			(uint64_t *)(uintptr_t)&op.raw, np.raw,
> +			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
>   
>   	*old_head = op.pos.head;
>   	return n;
> @@ -117,7 +119,7 @@
>   	uint32_t n;
>   	union __rte_ring_hts_pos np, op;
>   
> -	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
> +	op.raw = rte_atomic_load_explicit(&r->hts_cons.ht.raw, rte_memory_order_acquire);
>   
>   	/* move cons.head atomically */
>   	do {
> @@ -153,9 +155,9 @@
>   	 *  - OOO reads of prod tail value
>   	 *  - OOO copy of elems from the ring
>   	 */
> -	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
> -			&op.raw, np.raw,
> -			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> +	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_cons.ht.raw,
> +			(uint64_t *)(uintptr_t)&op.raw, np.raw,
> +			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
>   
>   	*old_head = op.pos.head;
>   	return n;
> diff --git a/lib/ring/rte_ring_peek_elem_pvt.h b/lib/ring/rte_ring_peek_elem_pvt.h
> index bb0a7d5..b5f0822 100644
> --- a/lib/ring/rte_ring_peek_elem_pvt.h
> +++ b/lib/ring/rte_ring_peek_elem_pvt.h
> @@ -59,7 +59,7 @@
>   
>   	pos = tail + num;
>   	ht->head = pos;
> -	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
> +	rte_atomic_store_explicit(&ht->tail, pos, rte_memory_order_release);
>   }
>   
>   /**
> @@ -78,7 +78,7 @@
>   	uint32_t n;
>   	union __rte_ring_hts_pos p;
>   
> -	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
> +	p.raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_relaxed);
>   	n = p.pos.head - p.pos.tail;
>   
>   	RTE_ASSERT(n >= num);
> @@ -104,7 +104,7 @@
>   	p.pos.head = tail + num;
>   	p.pos.tail = p.pos.head;
>   
> -	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
> +	rte_atomic_store_explicit(&ht->ht.raw, p.raw, rte_memory_order_release);
>   }
>   
>   /**
> diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
> index 7164213..1226503 100644
> --- a/lib/ring/rte_ring_rts_elem_pvt.h
> +++ b/lib/ring/rte_ring_rts_elem_pvt.h
> @@ -31,18 +31,19 @@
>   	 * might preceded us, then don't update tail with new value.
>   	 */
>   
> -	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
> +	ot.raw = rte_atomic_load_explicit(&ht->tail.raw, rte_memory_order_acquire);
>   
>   	do {
>   		/* on 32-bit systems we have to do atomic read here */
> -		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
> +		h.raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_relaxed);
>   
>   		nt.raw = ot.raw;
>   		if (++nt.val.cnt == h.val.cnt)
>   			nt.val.pos = h.val.pos;
>   
> -	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
> -			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
> +	} while (rte_atomic_compare_exchange_strong_explicit(&ht->tail.raw,
> +			(uint64_t *)(uintptr_t)&ot.raw, nt.raw,
> +			rte_memory_order_release, rte_memory_order_acquire) == 0);
>   }
>   
>   /**
> @@ -59,7 +60,7 @@
>   
>   	while (h->val.pos - ht->tail.val.pos > max) {
>   		rte_pause();
> -		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
> +		h->raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_acquire);
>   	}
>   }
>   
> @@ -76,7 +77,7 @@
>   
>   	const uint32_t capacity = r->capacity;
>   
> -	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
> +	oh.raw = rte_atomic_load_explicit(&r->rts_prod.head.raw, rte_memory_order_acquire);
>   
>   	do {
>   		/* Reset n to the initial burst count */
> @@ -113,9 +114,9 @@
>   	 *  - OOO reads of cons tail value
>   	 *  - OOO copy of elems to the ring
>   	 */
> -	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
> -			&oh.raw, nh.raw,
> -			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> +	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_prod.head.raw,
> +			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
> +			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
>   
>   	*old_head = oh.val.pos;
>   	return n;
> @@ -132,7 +133,7 @@
>   	uint32_t n;
>   	union __rte_ring_rts_poscnt nh, oh;
>   
> -	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
> +	oh.raw = rte_atomic_load_explicit(&r->rts_cons.head.raw, rte_memory_order_acquire);
>   
>   	/* move cons.head atomically */
>   	do {
> @@ -168,9 +169,9 @@
>   	 *  - OOO reads of prod tail value
>   	 *  - OOO copy of elems from the ring
>   	 */
> -	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
> -			&oh.raw, nh.raw,
> -			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> +	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_cons.head.raw,
> +			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
> +			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
>   
>   	*old_head = oh.val.pos;
>   	return n;


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 06/19] ipsec: use rte optional stdatomic API
  2023-10-17 20:31   ` [PATCH v2 06/19] ipsec: " Tyler Retzlaff
@ 2023-10-24  8:45     ` Konstantin Ananyev
  0 siblings, 0 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-24  8:45 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

17.10.2023 21:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/ipsec/ipsec_sqn.h | 2 +-
>   lib/ipsec/sa.h        | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/ipsec/ipsec_sqn.h b/lib/ipsec/ipsec_sqn.h
> index 505950e..984a9dd 100644
> --- a/lib/ipsec/ipsec_sqn.h
> +++ b/lib/ipsec/ipsec_sqn.h
> @@ -128,7 +128,7 @@
>   
>   	n = *num;
>   	if (SQN_ATOMIC(sa))
> -		sqn = __atomic_fetch_add(&sa->sqn.outb, n, __ATOMIC_RELAXED) + n;
> +		sqn = rte_atomic_fetch_add_explicit(&sa->sqn.outb, n, rte_memory_order_relaxed) + n;
>   	else {
>   		sqn = sa->sqn.outb + n;
>   		sa->sqn.outb = sqn;
> diff --git a/lib/ipsec/sa.h b/lib/ipsec/sa.h
> index ce4af8c..4b30bea 100644
> --- a/lib/ipsec/sa.h
> +++ b/lib/ipsec/sa.h
> @@ -124,7 +124,7 @@ struct rte_ipsec_sa {
>   	 * place from other frequently accessed data.
>   	 */
>   	union {
> -		uint64_t outb;
> +		RTE_ATOMIC(uint64_t) outb;
>   		struct {
>   			uint32_t rdidx; /* read index */
>   			uint32_t wridx; /* write index */


Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 07/19] mbuf: use rte optional stdatomic API
  2023-10-17 20:31   ` [PATCH v2 07/19] mbuf: " Tyler Retzlaff
@ 2023-10-24  8:46     ` Konstantin Ananyev
  0 siblings, 0 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-24  8:46 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

17.10.2023 21:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/mbuf/rte_mbuf.h      | 20 ++++++++++----------
>   lib/mbuf/rte_mbuf_core.h |  5 +++--
>   2 files changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> index 913c459..b8ab477 100644
> --- a/lib/mbuf/rte_mbuf.h
> +++ b/lib/mbuf/rte_mbuf.h
> @@ -361,7 +361,7 @@ struct rte_pktmbuf_pool_private {
>   static inline uint16_t
>   rte_mbuf_refcnt_read(const struct rte_mbuf *m)
>   {
> -	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
> +	return rte_atomic_load_explicit(&m->refcnt, rte_memory_order_relaxed);
>   }
>   
>   /**
> @@ -374,15 +374,15 @@ struct rte_pktmbuf_pool_private {
>   static inline void
>   rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
>   {
> -	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
> +	rte_atomic_store_explicit(&m->refcnt, new_value, rte_memory_order_relaxed);
>   }
>   
>   /* internal */
>   static inline uint16_t
>   __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
>   {
> -	return __atomic_fetch_add(&m->refcnt, value,
> -				 __ATOMIC_ACQ_REL) + value;
> +	return rte_atomic_fetch_add_explicit(&m->refcnt, value,
> +				 rte_memory_order_acq_rel) + value;
>   }
>   
>   /**
> @@ -463,7 +463,7 @@ struct rte_pktmbuf_pool_private {
>   static inline uint16_t
>   rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
>   {
> -	return __atomic_load_n(&shinfo->refcnt, __ATOMIC_RELAXED);
> +	return rte_atomic_load_explicit(&shinfo->refcnt, rte_memory_order_relaxed);
>   }
>   
>   /**
> @@ -478,7 +478,7 @@ struct rte_pktmbuf_pool_private {
>   rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
>   	uint16_t new_value)
>   {
> -	__atomic_store_n(&shinfo->refcnt, new_value, __ATOMIC_RELAXED);
> +	rte_atomic_store_explicit(&shinfo->refcnt, new_value, rte_memory_order_relaxed);
>   }
>   
>   /**
> @@ -502,8 +502,8 @@ struct rte_pktmbuf_pool_private {
>   		return (uint16_t)value;
>   	}
>   
> -	return __atomic_fetch_add(&shinfo->refcnt, value,
> -				 __ATOMIC_ACQ_REL) + value;
> +	return rte_atomic_fetch_add_explicit(&shinfo->refcnt, value,
> +				 rte_memory_order_acq_rel) + value;
>   }
>   
>   /** Mbuf prefetch */
> @@ -1315,8 +1315,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
>   	 * Direct usage of add primitive to avoid
>   	 * duplication of comparing with one.
>   	 */
> -	if (likely(__atomic_fetch_add(&shinfo->refcnt, -1,
> -				     __ATOMIC_ACQ_REL) - 1))
> +	if (likely(rte_atomic_fetch_add_explicit(&shinfo->refcnt, -1,
> +				     rte_memory_order_acq_rel) - 1))
>   		return 1;
>   
>   	/* Reinitialize counter before mbuf freeing. */
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index e9bc0d1..5688683 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -19,6 +19,7 @@
>   #include <stdint.h>
>   
>   #include <rte_byteorder.h>
> +#include <rte_stdatomic.h>
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -497,7 +498,7 @@ struct rte_mbuf {
>   	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
>   	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
>   	 */
> -	uint16_t refcnt;
> +	RTE_ATOMIC(uint16_t) refcnt;
>   
>   	/**
>   	 * Number of segments. Only valid for the first segment of an mbuf
> @@ -674,7 +675,7 @@ struct rte_mbuf {
>   struct rte_mbuf_ext_shared_info {
>   	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
>   	void *fcb_opaque;                        /**< Free callback argument */
> -	uint16_t refcnt;
> +	RTE_ATOMIC(uint16_t) refcnt;
>   };
>   
>   /** Maximum number of nb_segs allowed. */

Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 08/19] mempool: use rte optional stdatomic API
  2023-10-17 20:31   ` [PATCH v2 08/19] mempool: " Tyler Retzlaff
@ 2023-10-24  8:47     ` Konstantin Ananyev
  0 siblings, 0 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-24  8:47 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

17.10.2023 21:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/mempool/rte_mempool.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index f70bf36..df87cd2 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -327,8 +327,8 @@ struct rte_mempool {
>   		if (likely(__lcore_id < RTE_MAX_LCORE))                         \
>   			(mp)->stats[__lcore_id].name += (n);                    \
>   		else                                                            \
> -			__atomic_fetch_add(&((mp)->stats[RTE_MAX_LCORE].name),  \
> -					   (n), __ATOMIC_RELAXED);              \
> +			rte_atomic_fetch_add_explicit(&((mp)->stats[RTE_MAX_LCORE].name),  \
> +					   (n), rte_memory_order_relaxed);              \
>   	} while (0)
>   #else
>   #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)

Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>



^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 11/19] stack: use rte optional stdatomic API
  2023-10-17 20:31   ` [PATCH v2 11/19] stack: " Tyler Retzlaff
@ 2023-10-24  8:48     ` Konstantin Ananyev
  0 siblings, 0 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-24  8:48 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

17.10.2023 21:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/stack/rte_stack.h            |  2 +-
>   lib/stack/rte_stack_lf_c11.h     | 24 ++++++++++++------------
>   lib/stack/rte_stack_lf_generic.h | 18 +++++++++---------
>   3 files changed, 22 insertions(+), 22 deletions(-)
> 
> diff --git a/lib/stack/rte_stack.h b/lib/stack/rte_stack.h
> index 921d29a..a379300 100644
> --- a/lib/stack/rte_stack.h
> +++ b/lib/stack/rte_stack.h
> @@ -44,7 +44,7 @@ struct rte_stack_lf_list {
>   	/** List head */
>   	struct rte_stack_lf_head head __rte_aligned(16);
>   	/** List len */
> -	uint64_t len;
> +	RTE_ATOMIC(uint64_t) len;
>   };
>   
>   /* Structure containing two lock-free LIFO lists: the stack itself and a list
> diff --git a/lib/stack/rte_stack_lf_c11.h b/lib/stack/rte_stack_lf_c11.h
> index 687a6f6..9cb6998 100644
> --- a/lib/stack/rte_stack_lf_c11.h
> +++ b/lib/stack/rte_stack_lf_c11.h
> @@ -26,8 +26,8 @@
>   	 * elements. If the mempool is near-empty to the point that this is a
>   	 * concern, the user should consider increasing the mempool size.
>   	 */
> -	return (unsigned int)__atomic_load_n(&s->stack_lf.used.len,
> -					     __ATOMIC_RELAXED);
> +	return (unsigned int)rte_atomic_load_explicit(&s->stack_lf.used.len,
> +					     rte_memory_order_relaxed);
>   }
>   
>   static __rte_always_inline void
> @@ -59,14 +59,14 @@
>   				(rte_int128_t *)&list->head,
>   				(rte_int128_t *)&old_head,
>   				(rte_int128_t *)&new_head,
> -				1, __ATOMIC_RELEASE,
> -				__ATOMIC_RELAXED);
> +				1, rte_memory_order_release,
> +				rte_memory_order_relaxed);
>   	} while (success == 0);
>   
>   	/* Ensure the stack modifications are not reordered with respect
>   	 * to the LIFO len update.
>   	 */
> -	__atomic_fetch_add(&list->len, num, __ATOMIC_RELEASE);
> +	rte_atomic_fetch_add_explicit(&list->len, num, rte_memory_order_release);
>   }
>   
>   static __rte_always_inline struct rte_stack_lf_elem *
> @@ -80,7 +80,7 @@
>   	int success;
>   
>   	/* Reserve num elements, if available */
> -	len = __atomic_load_n(&list->len, __ATOMIC_RELAXED);
> +	len = rte_atomic_load_explicit(&list->len, rte_memory_order_relaxed);
>   
>   	while (1) {
>   		/* Does the list contain enough elements? */
> @@ -88,10 +88,10 @@
>   			return NULL;
>   
>   		/* len is updated on failure */
> -		if (__atomic_compare_exchange_n(&list->len,
> +		if (rte_atomic_compare_exchange_weak_explicit(&list->len,
>   						&len, len - num,
> -						1, __ATOMIC_ACQUIRE,
> -						__ATOMIC_RELAXED))
> +						rte_memory_order_acquire,
> +						rte_memory_order_relaxed))
>   			break;
>   	}
>   
> @@ -110,7 +110,7 @@
>   		 * elements are properly ordered with respect to the head
>   		 * pointer read.
>   		 */
> -		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> +		__atomic_thread_fence(rte_memory_order_acquire);
>   
>   		rte_prefetch0(old_head.top);
>   
> @@ -159,8 +159,8 @@
>   				(rte_int128_t *)&list->head,
>   				(rte_int128_t *)&old_head,
>   				(rte_int128_t *)&new_head,
> -				0, __ATOMIC_RELAXED,
> -				__ATOMIC_RELAXED);
> +				0, rte_memory_order_relaxed,
> +				rte_memory_order_relaxed);
>   	} while (success == 0);
>   
>   	return old_head.top;
> diff --git a/lib/stack/rte_stack_lf_generic.h b/lib/stack/rte_stack_lf_generic.h
> index 39f7ff3..cc69e4d 100644
> --- a/lib/stack/rte_stack_lf_generic.h
> +++ b/lib/stack/rte_stack_lf_generic.h
> @@ -27,7 +27,7 @@
>   	 * concern, the user should consider increasing the mempool size.
>   	 */
>   	/* NOTE: review for potential ordering optimization */
> -	return __atomic_load_n(&s->stack_lf.used.len, __ATOMIC_SEQ_CST);
> +	return rte_atomic_load_explicit(&s->stack_lf.used.len, rte_memory_order_seq_cst);
>   }
>   
>   static __rte_always_inline void
> @@ -64,11 +64,11 @@
>   				(rte_int128_t *)&list->head,
>   				(rte_int128_t *)&old_head,
>   				(rte_int128_t *)&new_head,
> -				1, __ATOMIC_RELEASE,
> -				__ATOMIC_RELAXED);
> +				1, rte_memory_order_release,
> +				rte_memory_order_relaxed);
>   	} while (success == 0);
>   	/* NOTE: review for potential ordering optimization */
> -	__atomic_fetch_add(&list->len, num, __ATOMIC_SEQ_CST);
> +	rte_atomic_fetch_add_explicit(&list->len, num, rte_memory_order_seq_cst);
>   }
>   
>   static __rte_always_inline struct rte_stack_lf_elem *
> @@ -83,15 +83,15 @@
>   	/* Reserve num elements, if available */
>   	while (1) {
>   		/* NOTE: review for potential ordering optimization */
> -		uint64_t len = __atomic_load_n(&list->len, __ATOMIC_SEQ_CST);
> +		uint64_t len = rte_atomic_load_explicit(&list->len, rte_memory_order_seq_cst);
>   
>   		/* Does the list contain enough elements? */
>   		if (unlikely(len < num))
>   			return NULL;
>   
>   		/* NOTE: review for potential ordering optimization */
> -		if (__atomic_compare_exchange_n(&list->len, &len, len - num,
> -				0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
> +		if (rte_atomic_compare_exchange_strong_explicit(&list->len, &len, len - num,
> +				rte_memory_order_seq_cst, rte_memory_order_seq_cst))
>   			break;
>   	}
>   
> @@ -143,8 +143,8 @@
>   				(rte_int128_t *)&list->head,
>   				(rte_int128_t *)&old_head,
>   				(rte_int128_t *)&new_head,
> -				1, __ATOMIC_RELEASE,
> -				__ATOMIC_RELAXED);
> +				1, rte_memory_order_release,
> +				rte_memory_order_relaxed);
>   	} while (success == 0);
>   
>   	return old_head.top;

Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>


^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [PATCH v2 19/19] ring: use rte optional stdatomic API
  2023-10-24  8:43     ` Konstantin Ananyev
@ 2023-10-24  9:56       ` Morten Brørup
  2023-10-24 15:58         ` Tyler Retzlaff
  2023-10-24 16:29       ` Tyler Retzlaff
  1 sibling, 1 reply; 91+ messages in thread
From: Morten Brørup @ 2023-10-24  9:56 UTC (permalink / raw)
  To: Konstantin Ananyev, Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

> From: Konstantin Ananyev [mailto:konstantin.v.ananyev@yandex.ru]
> Sent: Tuesday, 24 October 2023 10.43
> 
> 17.10.2023 21:31, Tyler Retzlaff wrote:
> > Replace the use of gcc builtin __atomic_xxx intrinsics with
> > corresponding rte_atomic_xxx optional stdatomic API
> >
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---

[...]

> >   	if (!single)
> > -		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> > +		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> 
> I suppose we do need that double type conversion only for atomic types
> right?
> 
> > +			rte_memory_order_relaxed);
> >
> >   	ht->tail = new_val;
> >   }

This got me thinking...

Do we want to cast away the value's atomic attribute like this, or should we introduce new rte_atomic_wait_XX() functions with the parameters being pointers to atomic values, instead of pointers to simple values?

Just a thought.

The initial rte_atomic_wait_XX() implementations could simply cast away the atomic attribute like here.
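
For illustration, such a function could look roughly like this (hypothetical sketch; the name and exact signature are invented here, it is not an existing API):

#include <stdint.h>
#include <rte_stdatomic.h>
#include <rte_pause.h>

/* hypothetical wait helper taking a pointer to an atomic-qualified value,
 * so callers would not need to cast the qualification away.
 */
static inline void
rte_atomic_wait_until_equal_32(RTE_ATOMIC(uint32_t) *addr, uint32_t expected,
		rte_memory_order memorder)
{
	while (rte_atomic_load_explicit(addr, memorder) != expected)
		rte_pause();
}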


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 19/19] ring: use rte optional stdatomic API
  2023-10-24  9:56       ` Morten Brørup
@ 2023-10-24 15:58         ` Tyler Retzlaff
  2023-10-24 16:36           ` Morten Brørup
  0 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-24 15:58 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Konstantin Ananyev, dev, Akhil Goyal, Anatoly Burakov,
	Andrew Rybchenko, Bruce Richardson, Chenbo Xia, Ciara Power,
	David Christensen, David Hunt, Dmitry Kozlyuk, Dmitry Malloy,
	Elena Agostini, Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit,
	Harman Kalra, Harry van Haaren, Honnappa Nagarahalli,
	Jerin Jacob, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang

On Tue, Oct 24, 2023 at 11:56:11AM +0200, Morten Brørup wrote:
> > From: Konstantin Ananyev [mailto:konstantin.v.ananyev@yandex.ru]
> > Sent: Tuesday, 24 October 2023 10.43
> > 
> > 17.10.2023 21:31, Tyler Retzlaff wrote:
> > > Replace the use of gcc builtin __atomic_xxx intrinsics with
> > > corresponding rte_atomic_xxx optional stdatomic API
> > >
> > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > ---
> 
> [...]
> 
> > >   	if (!single)
> > > -		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> > > +		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> > 
> > I suppose we do need that double type conversion only for atomic types
> > right?
> > 
> > > +			rte_memory_order_relaxed);
> > >
> > >   	ht->tail = new_val;
> > >   }
> 
> This got me thinking...
> 
> Do we want to cast away the value's atomic attribute like this, or should we introduce new rte_atomic_wait_XX() functions with the parameters being pointers to atomic values, instead of pointers to simple values?

just some notes here.

so first let me start with it's okay to do this cast but only because we
have knowledge of the internal implementation detail and this series has
to do this in a few places.

basically internally the actual atomic operation is fed back into an
intrinsic/builtin that is either re-qualified as __rte_atomic or doesn't
require qualification. i agree it isn't optimal since we have to take
care should we ever alter the implementation to avoid compatibility
problems but unlikely for it to be changed.
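
for clarity, the shape of the caller-side pattern being discussed is roughly this (just a sketch, the struct and function names here are made up, it is not the actual ring code):

#include <stdint.h>
#include <rte_stdatomic.h>
#include <rte_pause.h>

struct headtail {
	volatile RTE_ATOMIC(uint32_t) tail; /* atomic-qualified after this series */
};

static inline void
wait_for_tail(struct headtail *ht, uint32_t old_val)
{
	/* rte_wait_until_equal_32() still takes a plain volatile uint32_t *,
	 * so the atomic qualification is dropped explicitly, going through
	 * uintptr_t the same way the series does.
	 */
	rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail,
			old_val, rte_memory_order_relaxed);
}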

we could provide new api but i'm not sure we can do that this late in
the release cycle. notably i think it would be nicer if it *could* be
made to be 'generic' as used literally in the atomics documentation
which means it may operate on non-integer and non-pointer types.

> 
> Just a thought.
> 
> The initial rte_atomic_wait_XX() implementations could simply cast a away the atomic attribute like here.
> 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 19/19] ring: use rte optional stdatomic API
  2023-10-24  8:43     ` Konstantin Ananyev
  2023-10-24  9:56       ` Morten Brørup
@ 2023-10-24 16:29       ` Tyler Retzlaff
  2023-10-25 10:06         ` Konstantin Ananyev
  1 sibling, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-24 16:29 UTC (permalink / raw)
  To: Konstantin Ananyev
  Cc: dev, Akhil Goyal, Anatoly Burakov, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Ciara Power, David Christensen,
	David Hunt, Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

On Tue, Oct 24, 2023 at 09:43:13AM +0100, Konstantin Ananyev wrote:
> 17.10.2023 21:31, Tyler Retzlaff wrote:
> >Replace the use of gcc builtin __atomic_xxx intrinsics with
> >corresponding rte_atomic_xxx optional stdatomic API
> >
> >Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >---
> >  drivers/net/mlx5/mlx5_hws_cnt.h   |  2 +-
> >  lib/ring/rte_ring_c11_pvt.h       | 33 +++++++++++++++++----------------
> >  lib/ring/rte_ring_core.h          | 10 +++++-----
> >  lib/ring/rte_ring_generic_pvt.h   |  3 ++-
> >  lib/ring/rte_ring_hts_elem_pvt.h  | 22 ++++++++++++----------
> >  lib/ring/rte_ring_peek_elem_pvt.h |  6 +++---
> >  lib/ring/rte_ring_rts_elem_pvt.h  | 27 ++++++++++++++-------------
> >  7 files changed, 54 insertions(+), 49 deletions(-)
> >
> >diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
> >index f462665..cc9ac10 100644
> >--- a/drivers/net/mlx5/mlx5_hws_cnt.h
> >+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
> >@@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
> >  	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
> >  			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
> >  	/* Update tail */
> >-	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
> >+	rte_atomic_store_explicit(&r->prod.tail, revert2head, rte_memory_order_release);
> >  	return n;
> >  }
> >diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> >index f895950..f8be538 100644
> >--- a/lib/ring/rte_ring_c11_pvt.h
> >+++ b/lib/ring/rte_ring_c11_pvt.h
> >@@ -22,9 +22,10 @@
> >  	 * we need to wait for them to complete
> >  	 */
> >  	if (!single)
> >-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> >+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> >+			rte_memory_order_relaxed);
> >-	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> >+	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
> >  }
> >  /**
> >@@ -61,19 +62,19 @@
> >  	unsigned int max = n;
> >  	int success;
> >-	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
> >+	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
> >  	do {
> >  		/* Reset n to the initial burst count */
> >  		n = max;
> >  		/* Ensure the head is read before tail */
> >-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> >+		__atomic_thread_fence(rte_memory_order_acquire);
> >  		/* load-acquire synchronize with store-release of ht->tail
> >  		 * in update_tail.
> >  		 */
> >-		cons_tail = __atomic_load_n(&r->cons.tail,
> >-					__ATOMIC_ACQUIRE);
> >+		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
> >+					rte_memory_order_acquire);
> >  		/* The subtraction is done between two unsigned 32bits value
> >  		 * (the result is always modulo 32 bits even if we have
> >@@ -95,10 +96,10 @@
> >  			r->prod.head = *new_head, success = 1;
> >  		else
> >  			/* on failure, *old_head is updated */
> >-			success = __atomic_compare_exchange_n(&r->prod.head,
> >+			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
> >  					old_head, *new_head,
> >-					0, __ATOMIC_RELAXED,
> >-					__ATOMIC_RELAXED);
> >+					rte_memory_order_relaxed,
> >+					rte_memory_order_relaxed);
> >  	} while (unlikely(success == 0));
> >  	return n;
> >  }
> >@@ -137,19 +138,19 @@
> >  	int success;
> >  	/* move cons.head atomically */
> >-	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
> >+	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
> >  	do {
> >  		/* Restore n as it may change every loop */
> >  		n = max;
> >  		/* Ensure the head is read before tail */
> >-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> >+		__atomic_thread_fence(rte_memory_order_acquire);
> >  		/* this load-acquire synchronize with store-release of ht->tail
> >  		 * in update_tail.
> >  		 */
> >-		prod_tail = __atomic_load_n(&r->prod.tail,
> >-					__ATOMIC_ACQUIRE);
> >+		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
> >+					rte_memory_order_acquire);
> >  		/* The subtraction is done between two unsigned 32bits value
> >  		 * (the result is always modulo 32 bits even if we have
> >@@ -170,10 +171,10 @@
> >  			r->cons.head = *new_head, success = 1;
> >  		else
> >  			/* on failure, *old_head will be updated */
> >-			success = __atomic_compare_exchange_n(&r->cons.head,
> >+			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
> >  							old_head, *new_head,
> >-							0, __ATOMIC_RELAXED,
> >-							__ATOMIC_RELAXED);
> >+							rte_memory_order_relaxed,
> >+							rte_memory_order_relaxed);
> >  	} while (unlikely(success == 0));
> >  	return n;
> >  }
> >diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
> >index 327fdcf..7a2b577 100644
> >--- a/lib/ring/rte_ring_core.h
> >+++ b/lib/ring/rte_ring_core.h
> >@@ -67,7 +67,7 @@ enum rte_ring_sync_type {
> >   */
> >  struct rte_ring_headtail {
> >  	volatile uint32_t head;      /**< prod/consumer head. */
> >-	volatile uint32_t tail;      /**< prod/consumer tail. */
> >+	volatile RTE_ATOMIC(uint32_t) tail;      /**< prod/consumer tail. */
> 
> Probably a stupid q:
> why we do need RTE_ATOMIC() around tail only?
> Why head is not affected?

you have a good eye and this is a slightly common issue that i've seen
and there appear to be some interesting things showing up.

the field being qualified has atomic operations performed on it in the
implementation; the other field does not. it may be an indication of a bug
in the existing code or it may be intentional.


case 1. atomics should be used but they aren't.

there are fields in structures and variables that were accessed in a
'mixed' manner. that is in some instances __atomic_op_xxx was being used
on them and in other instances not. sometimes it is the initialization
case so it is probably okay, sometimes maybe not...
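
e.g. something along these lines (made-up fragment, not pointing at any particular file):

#include <stdint.h>

static uint32_t counter;

static void
writer(void)
{
	/* atomic update here ... */
	__atomic_fetch_add(&counter, 1, __ATOMIC_RELAXED);
}

static uint32_t
reader(void)
{
	/* ... but a plain, non-atomic read here */
	return counter;
}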

case 2. broader scope atomic operation, or we don't care if narrower
        access is atomic.

e.g.
union {
   struct {
       uint32_t head;
       RTE_ATOMIC(uint32_t) tail;
    };
    RTE_ATOMIC(uint64_t) combined;
};

again, this could be an indication of a missing use of atomics. often the
operations on the `combined' field consistently use atomics but one of
the head/tail fields does not. on purpose? maybe, if we are just doing an
== comparison?

my approach in this series prioritized no functional change. as a result,
if any of the above are real bugs they stay real bugs, but i have not
changed the way the variables are accessed. if i were to change the code
and start qualifying those fields as atomic, there is a risk of performance
regression (for cases where it isn't a bug) because the compiler would then
generate the strongest ordering (seq_cst) for any accesses that don't go
through the atomic generic functions with an explicit ordering.
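
to make the seq_cst point concrete in plain C11 (minimal sketch, not code from the tree):

#include <stdatomic.h>
#include <stdint.h>

struct ht {
	_Atomic uint32_t tail;
};

/* a plain assignment to an _Atomic object is a seq_cst store */
static void
store_plain(struct ht *h, uint32_t v)
{
	h->tail = v;
}

/* the explicit form allows a weaker ordering to be chosen */
static void
store_relaxed(struct ht *h, uint32_t v)
{
	atomic_store_explicit(&h->tail, v, memory_order_relaxed);
}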

there is another case which comes up half a dozen times or so that is
also concerning to me, but i would need the maintainers of the code to
adapt the code to be correct or maybe it is okay...


case 3. qualification discard .. is the existing code really okay?

e.g.

atomic_compare_exchange(*object, *expected, desired, ...)

the issue is with the specification of the memory aliased by expected.
gcc doesn't complain or enforce discarding of qualification when using
builtin intrinsics. the result is that if expected is an atomic type it
may be accessed in a non-atomic manner by the code generated for the
atomic operation.

again, i have chosen to maintain existing behavior by casting away the
qualification if present on the expected argument.
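
expressed with plain C11 for illustration (minimal sketch, simplified from the ring code, names made up):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static bool
cas_raw(_Atomic uint64_t *obj, _Atomic uint64_t *old, uint64_t desired)
{
	/* expected must be a pointer to the non-atomic type, so the
	 * qualification on *old is cast away; on failure the generic
	 * function writes the current value back through that plain pointer,
	 * which is the non-atomic access being discussed.
	 */
	return atomic_compare_exchange_strong_explicit(obj,
			(uint64_t *)(uintptr_t)old, desired,
			memory_order_acquire, memory_order_acquire);
}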

i feel that in terms of mutating the source tree it is best to separate
conversion to atomic specified/qualified types into this series and then
follow up with additional changes that may have functional/performance
impact, if for no other reason than that it narrows where you have to look
if there is a change. certainly the conversion to atomics has made these
cases far easier to spot in the code.

finally, for most toolchains/targets all of this is pretty moot because
most of them default to enable_stdatomics=false, so if there are problems
they will most likely manifest only on windows built with msvc.

thoughts?

> 
> >  	union {
> >  		/** sync type of prod/cons */
> >  		enum rte_ring_sync_type sync_type;
> >@@ -78,7 +78,7 @@ struct rte_ring_headtail {
> >  union __rte_ring_rts_poscnt {
> >  	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
> >-	uint64_t raw __rte_aligned(8);
> >+	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
> >  	struct {
> >  		uint32_t cnt; /**< head/tail reference counter */
> >  		uint32_t pos; /**< head/tail position */
> >@@ -94,10 +94,10 @@ struct rte_ring_rts_headtail {
> >  union __rte_ring_hts_pos {
> >  	/** raw 8B value to read/write *head* and *tail* as one atomic op */
> >-	uint64_t raw __rte_aligned(8);
> >+	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
> >  	struct {
> >-		uint32_t head; /**< head position */
> >-		uint32_t tail; /**< tail position */
> >+		RTE_ATOMIC(uint32_t) head; /**< head position */
> >+		RTE_ATOMIC(uint32_t) tail; /**< tail position */
> >  	} pos;
> >  };
> >diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
> >index 5acb6e5..ffb3654 100644
> >--- a/lib/ring/rte_ring_generic_pvt.h
> >+++ b/lib/ring/rte_ring_generic_pvt.h
> >@@ -23,7 +23,8 @@
> >  	 * we need to wait for them to complete
> >  	 */
> >  	if (!single)
> >-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> >+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> 
> I suppose we do need that double type conversion only for atomic
> types right?
> 
> >+			rte_memory_order_relaxed);
> >  	ht->tail = new_val;
> >  }
> >diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
> >index a8678d3..91f5eec 100644
> >--- a/lib/ring/rte_ring_hts_elem_pvt.h
> >+++ b/lib/ring/rte_ring_hts_elem_pvt.h
> >@@ -10,6 +10,8 @@
> >  #ifndef _RTE_RING_HTS_ELEM_PVT_H_
> >  #define _RTE_RING_HTS_ELEM_PVT_H_
> >+#include <rte_stdatomic.h>
> >+
> >  /**
> >   * @file rte_ring_hts_elem_pvt.h
> >   * It is not recommended to include this file directly,
> >@@ -30,7 +32,7 @@
> >  	RTE_SET_USED(enqueue);
> >  	tail = old_tail + num;
> >-	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
> >+	rte_atomic_store_explicit(&ht->ht.pos.tail, tail, rte_memory_order_release);
> >  }
> >  /**
> >@@ -44,7 +46,7 @@
> >  {
> >  	while (p->pos.head != p->pos.tail) {
> >  		rte_pause();
> >-		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
> >+		p->raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_acquire);
> >  	}
> >  }
> >@@ -61,7 +63,7 @@
> >  	const uint32_t capacity = r->capacity;
> >-	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
> >+	op.raw = rte_atomic_load_explicit(&r->hts_prod.ht.raw, rte_memory_order_acquire);
> >  	do {
> >  		/* Reset n to the initial burst count */
> >@@ -98,9 +100,9 @@
> >  	 *  - OOO reads of cons tail value
> >  	 *  - OOO copy of elems from the ring
> >  	 */
> >-	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
> >-			&op.raw, np.raw,
> >-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> >+	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_prod.ht.raw,
> >+			(uint64_t *)(uintptr_t)&op.raw, np.raw,
> >+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
> >  	*old_head = op.pos.head;
> >  	return n;
> >@@ -117,7 +119,7 @@
> >  	uint32_t n;
> >  	union __rte_ring_hts_pos np, op;
> >-	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
> >+	op.raw = rte_atomic_load_explicit(&r->hts_cons.ht.raw, rte_memory_order_acquire);
> >  	/* move cons.head atomically */
> >  	do {
> >@@ -153,9 +155,9 @@
> >  	 *  - OOO reads of prod tail value
> >  	 *  - OOO copy of elems from the ring
> >  	 */
> >-	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
> >-			&op.raw, np.raw,
> >-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> >+	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_cons.ht.raw,
> >+			(uint64_t *)(uintptr_t)&op.raw, np.raw,
> >+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
> >  	*old_head = op.pos.head;
> >  	return n;
> >diff --git a/lib/ring/rte_ring_peek_elem_pvt.h b/lib/ring/rte_ring_peek_elem_pvt.h
> >index bb0a7d5..b5f0822 100644
> >--- a/lib/ring/rte_ring_peek_elem_pvt.h
> >+++ b/lib/ring/rte_ring_peek_elem_pvt.h
> >@@ -59,7 +59,7 @@
> >  	pos = tail + num;
> >  	ht->head = pos;
> >-	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
> >+	rte_atomic_store_explicit(&ht->tail, pos, rte_memory_order_release);
> >  }
> >  /**
> >@@ -78,7 +78,7 @@
> >  	uint32_t n;
> >  	union __rte_ring_hts_pos p;
> >-	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
> >+	p.raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_relaxed);
> >  	n = p.pos.head - p.pos.tail;
> >  	RTE_ASSERT(n >= num);
> >@@ -104,7 +104,7 @@
> >  	p.pos.head = tail + num;
> >  	p.pos.tail = p.pos.head;
> >-	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
> >+	rte_atomic_store_explicit(&ht->ht.raw, p.raw, rte_memory_order_release);
> >  }
> >  /**
> >diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
> >index 7164213..1226503 100644
> >--- a/lib/ring/rte_ring_rts_elem_pvt.h
> >+++ b/lib/ring/rte_ring_rts_elem_pvt.h
> >@@ -31,18 +31,19 @@
> >  	 * might preceded us, then don't update tail with new value.
> >  	 */
> >-	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
> >+	ot.raw = rte_atomic_load_explicit(&ht->tail.raw, rte_memory_order_acquire);
> >  	do {
> >  		/* on 32-bit systems we have to do atomic read here */
> >-		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
> >+		h.raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_relaxed);
> >  		nt.raw = ot.raw;
> >  		if (++nt.val.cnt == h.val.cnt)
> >  			nt.val.pos = h.val.pos;
> >-	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
> >-			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
> >+	} while (rte_atomic_compare_exchange_strong_explicit(&ht->tail.raw,
> >+			(uint64_t *)(uintptr_t)&ot.raw, nt.raw,
> >+			rte_memory_order_release, rte_memory_order_acquire) == 0);
> >  }
> >  /**
> >@@ -59,7 +60,7 @@
> >  	while (h->val.pos - ht->tail.val.pos > max) {
> >  		rte_pause();
> >-		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
> >+		h->raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_acquire);
> >  	}
> >  }
> >@@ -76,7 +77,7 @@
> >  	const uint32_t capacity = r->capacity;
> >-	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
> >+	oh.raw = rte_atomic_load_explicit(&r->rts_prod.head.raw, rte_memory_order_acquire);
> >  	do {
> >  		/* Reset n to the initial burst count */
> >@@ -113,9 +114,9 @@
> >  	 *  - OOO reads of cons tail value
> >  	 *  - OOO copy of elems to the ring
> >  	 */
> >-	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
> >-			&oh.raw, nh.raw,
> >-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> >+	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_prod.head.raw,
> >+			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
> >+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
> >  	*old_head = oh.val.pos;
> >  	return n;
> >@@ -132,7 +133,7 @@
> >  	uint32_t n;
> >  	union __rte_ring_rts_poscnt nh, oh;
> >-	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
> >+	oh.raw = rte_atomic_load_explicit(&r->rts_cons.head.raw, rte_memory_order_acquire);
> >  	/* move cons.head atomically */
> >  	do {
> >@@ -168,9 +169,9 @@
> >  	 *  - OOO reads of prod tail value
> >  	 *  - OOO copy of elems from the ring
> >  	 */
> >-	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
> >-			&oh.raw, nh.raw,
> >-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> >+	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_cons.head.raw,
> >+			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
> >+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
> >  	*old_head = oh.val.pos;
> >  	return n;

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [PATCH v2 19/19] ring: use rte optional stdatomic API
  2023-10-24 15:58         ` Tyler Retzlaff
@ 2023-10-24 16:36           ` Morten Brørup
  0 siblings, 0 replies; 91+ messages in thread
From: Morten Brørup @ 2023-10-24 16:36 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Konstantin Ananyev, dev, Akhil Goyal, Anatoly Burakov,
	Andrew Rybchenko, Bruce Richardson, Chenbo Xia, Ciara Power,
	David Christensen, David Hunt, Dmitry Kozlyuk, Dmitry Malloy,
	Elena Agostini, Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit,
	Harman Kalra, Harry van Haaren, Honnappa Nagarahalli,
	Jerin Jacob, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Tuesday, 24 October 2023 17.59
> 
> On Tue, Oct 24, 2023 at 11:56:11AM +0200, Morten Brørup wrote:
> > > From: Konstantin Ananyev [mailto:konstantin.v.ananyev@yandex.ru]
> > > Sent: Tuesday, 24 October 2023 10.43
> > >
> > > 17.10.2023 21:31, Tyler Retzlaff wrote:
> > > > Replace the use of gcc builtin __atomic_xxx intrinsics with
> > > > corresponding rte_atomic_xxx optional stdatomic API
> > > >
> > > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > > ---
> >
> > [...]
> >
> > > >   	if (!single)
> > > > -		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> > > > +		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> > >
> > > I suppose we do need that double type conversion only for atomic types
> > > right?
> > >
> > > > +			rte_memory_order_relaxed);
> > > >
> > > >   	ht->tail = new_val;
> > > >   }
> >
> > This got me thinking...
> >
> > Do we want to cast away the value's atomic attribute like this, or
> > should we introduce new rte_atomic_wait_XX() functions with the
> > parameters being pointers to atomic values, instead of pointers to
> > simple values?
> 
> just some notes here.
> 
> so first let me start with it's okay to do this cast but only because we
> have knowledge of the internal implementation detail and this series has
> to do this in a few places.
> 
> basically internally the actual atomic operation is fed back into an
> intrinsic/builtin that is either re-qualified as __rte_atomic or doesn't
> require qualification. i agree it isn't optimal since we have to take
> care should we ever alter the implementation to avoid compatibility
> problems but unlikely for it to be changed.
> 
> we could provide new api but i'm not sure we can do that this late in
> the release cycle. notably i think it would be nicer if it *could* be
> made to be 'generic' as used literally in the atomics documentation
> which means it may operate on non-integer and non-pointer types.

I agree with all of the above, incl. the conclusion:
Future proofing this (for a very distant future) is not worth the effort - and added APIs - at this time.

Thank you for elaborating, Tyler.

> 
> >
> > Just a thought.
> >
> > The initial rte_atomic_wait_XX() implementations could simply cast away
> > the atomic attribute like here.
> >

^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [PATCH v2 09/19] rcu: use rte optional stdatomic API
  2023-10-17 20:31   ` [PATCH v2 09/19] rcu: " Tyler Retzlaff
@ 2023-10-25  9:41     ` Ruifeng Wang
  2023-10-25 22:38       ` Tyler Retzlaff
  0 siblings, 1 reply; 91+ messages in thread
From: Ruifeng Wang @ 2023-10-25  9:41 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, jerinj,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, thomas, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang, nd

> -----Original Message-----
> From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> Sent: Wednesday, October 18, 2023 4:31 AM
> To: dev@dpdk.org
> Cc: Akhil Goyal <gakhil@marvell.com>; Anatoly Burakov <anatoly.burakov@intel.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; Bruce Richardson <bruce.richardson@intel.com>;
> Chenbo Xia <chenbo.xia@intel.com>; Ciara Power <ciara.power@intel.com>; David Christensen
> <drc@linux.vnet.ibm.com>; David Hunt <david.hunt@intel.com>; Dmitry Kozlyuk
> <dmitry.kozliuk@gmail.com>; Dmitry Malloy <dmitrym@microsoft.com>; Elena Agostini
> <eagostini@nvidia.com>; Erik Gabriel Carrillo <erik.g.carrillo@intel.com>; Fan Zhang
> <fanzhang.oss@gmail.com>; Ferruh Yigit <ferruh.yigit@amd.com>; Harman Kalra
> <hkalra@marvell.com>; Harry van Haaren <harry.van.haaren@intel.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; jerinj@marvell.com; Konstantin Ananyev
> <konstantin.v.ananyev@yandex.ru>; Matan Azrad <matan@nvidia.com>; Maxime Coquelin
> <maxime.coquelin@redhat.com>; Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>;
> Nicolas Chautru <nicolas.chautru@intel.com>; Olivier Matz <olivier.matz@6wind.com>; Ori
> Kam <orika@nvidia.com>; Pallavi Kadam <pallavi.kadam@intel.com>; Pavan Nikhilesh
> <pbhagavatula@marvell.com>; Reshma Pattan <reshma.pattan@intel.com>; Sameh Gobriel
> <sameh.gobriel@intel.com>; Shijith Thotton <sthotton@marvell.com>; Sivaprasad Tummala
> <sivaprasad.tummala@amd.com>; Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou
> <suanmingm@nvidia.com>; Sunil Kumar Kori <skori@marvell.com>; thomas@monjalon.net;
> Viacheslav Ovsiienko <viacheslavo@nvidia.com>; Vladimir Medvedkin
> <vladimir.medvedkin@intel.com>; Yipeng Wang <yipeng1.wang@intel.com>; Tyler Retzlaff
> <roretzla@linux.microsoft.com>
> Subject: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> 
> Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding rte_atomic_xxx
> optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>  lib/rcu/rte_rcu_qsbr.c | 48 +++++++++++++++++------------------
>  lib/rcu/rte_rcu_qsbr.h | 68 +++++++++++++++++++++++++-------------------------
>  2 files changed, 58 insertions(+), 58 deletions(-)
> 
> diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c index 17be93e..4dc7714 100644
> --- a/lib/rcu/rte_rcu_qsbr.c
> +++ b/lib/rcu/rte_rcu_qsbr.c
> @@ -102,21 +102,21 @@
>  	 * go out of sync. Hence, additional checks are required.
>  	 */
>  	/* Check if the thread is already registered */
> -	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> -					__ATOMIC_RELAXED);
> +	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> +					rte_memory_order_relaxed);
>  	if (old_bmap & 1UL << id)
>  		return 0;
> 
>  	do {
>  		new_bmap = old_bmap | (1UL << id);
> -		success = __atomic_compare_exchange(
> +		success = rte_atomic_compare_exchange_strong_explicit(
>  					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> -					&old_bmap, &new_bmap, 0,
> -					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
> +					&old_bmap, new_bmap,
> +					rte_memory_order_release, rte_memory_order_relaxed);
> 
>  		if (success)
> -			__atomic_fetch_add(&v->num_threads,
> -						1, __ATOMIC_RELAXED);
> +			rte_atomic_fetch_add_explicit(&v->num_threads,
> +						1, rte_memory_order_relaxed);
>  		else if (old_bmap & (1UL << id))
>  			/* Someone else registered this thread.
>  			 * Counter should not be incremented.
> @@ -154,8 +154,8 @@
>  	 * go out of sync. Hence, additional checks are required.
>  	 */
>  	/* Check if the thread is already unregistered */
> -	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> -					__ATOMIC_RELAXED);
> +	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> +					rte_memory_order_relaxed);
>  	if (!(old_bmap & (1UL << id)))
>  		return 0;
> 
> @@ -165,14 +165,14 @@
>  		 * completed before removal of the thread from the list of
>  		 * reporting threads.
>  		 */
> -		success = __atomic_compare_exchange(
> +		success = rte_atomic_compare_exchange_strong_explicit(
>  					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> -					&old_bmap, &new_bmap, 0,
> -					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
> +					&old_bmap, new_bmap,
> +					rte_memory_order_release, rte_memory_order_relaxed);
> 
>  		if (success)
> -			__atomic_fetch_sub(&v->num_threads,
> -						1, __ATOMIC_RELAXED);
> +			rte_atomic_fetch_sub_explicit(&v->num_threads,
> +						1, rte_memory_order_relaxed);
>  		else if (!(old_bmap & (1UL << id)))
>  			/* Someone else unregistered this thread.
>  			 * Counter should not be incremented.
> @@ -227,8 +227,8 @@
> 
>  	fprintf(f, "  Registered thread IDs = ");
>  	for (i = 0; i < v->num_elems; i++) {
> -		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> -					__ATOMIC_ACQUIRE);
> +		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> +					rte_memory_order_acquire);
>  		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
>  		while (bmap) {
>  			t = __builtin_ctzl(bmap);
> @@ -241,26 +241,26 @@
>  	fprintf(f, "\n");
> 
>  	fprintf(f, "  Token = %" PRIu64 "\n",
> -			__atomic_load_n(&v->token, __ATOMIC_ACQUIRE));
> +			rte_atomic_load_explicit(&v->token, rte_memory_order_acquire));
> 
>  	fprintf(f, "  Least Acknowledged Token = %" PRIu64 "\n",
> -			__atomic_load_n(&v->acked_token, __ATOMIC_ACQUIRE));
> +			rte_atomic_load_explicit(&v->acked_token,
> +rte_memory_order_acquire));
> 
>  	fprintf(f, "Quiescent State Counts for readers:\n");
>  	for (i = 0; i < v->num_elems; i++) {
> -		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> -					__ATOMIC_ACQUIRE);
> +		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> +					rte_memory_order_acquire);
>  		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
>  		while (bmap) {
>  			t = __builtin_ctzl(bmap);
>  			fprintf(f, "thread ID = %u, count = %" PRIu64 ", lock count = %u\n",
>  				id + t,
> -				__atomic_load_n(
> +				rte_atomic_load_explicit(
>  					&v->qsbr_cnt[id + t].cnt,
> -					__ATOMIC_RELAXED),
> -				__atomic_load_n(
> +					rte_memory_order_relaxed),
> +				rte_atomic_load_explicit(
>  					&v->qsbr_cnt[id + t].lock_cnt,
> -					__ATOMIC_RELAXED));
> +					rte_memory_order_relaxed));
>  			bmap &= ~(1UL << t);
>  		}
>  	}
> diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h index 87e1b55..9f4aed2 100644
> --- a/lib/rcu/rte_rcu_qsbr.h
> +++ b/lib/rcu/rte_rcu_qsbr.h
> @@ -63,11 +63,11 @@
>   * Given thread id needs to be converted to index into the array and
>   * the id within the array element.
>   */
> -#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
> +#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(RTE_ATOMIC(uint64_t)) *
> +8)
>  #define __RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
>  	RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
>  		__RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, RTE_CACHE_LINE_SIZE) -#define
> __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
> +#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t __rte_atomic *) \

Is it equivalent to ((RTE_ATOMIC(uint64_t) *)?

>  	((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)  #define
> __RTE_QSBR_THRID_INDEX_SHIFT 6  #define __RTE_QSBR_THRID_MASK 0x3f @@ -75,13 +75,13 @@
> 

<snip>


^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [PATCH v2 19/19] ring: use rte optional stdatomic API
  2023-10-24 16:29       ` Tyler Retzlaff
@ 2023-10-25 10:06         ` Konstantin Ananyev
  2023-10-25 22:49           ` Tyler Retzlaff
  0 siblings, 1 reply; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-25 10:06 UTC (permalink / raw)
  To: Tyler Retzlaff, Konstantin Ananyev
  Cc: dev, Akhil Goyal, Anatoly Burakov, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Ciara Power, David Christensen,
	David Hunt, Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang



> 
> On Tue, Oct 24, 2023 at 09:43:13AM +0100, Konstantin Ananyev wrote:
> > 17.10.2023 21:31, Tyler Retzlaff wrote:
> > >Replace the use of gcc builtin __atomic_xxx intrinsics with
> > >corresponding rte_atomic_xxx optional stdatomic API
> > >
> > >Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > >---
> > >  drivers/net/mlx5/mlx5_hws_cnt.h   |  2 +-
> > >  lib/ring/rte_ring_c11_pvt.h       | 33 +++++++++++++++++----------------
> > >  lib/ring/rte_ring_core.h          | 10 +++++-----
> > >  lib/ring/rte_ring_generic_pvt.h   |  3 ++-
> > >  lib/ring/rte_ring_hts_elem_pvt.h  | 22 ++++++++++++----------
> > >  lib/ring/rte_ring_peek_elem_pvt.h |  6 +++---
> > >  lib/ring/rte_ring_rts_elem_pvt.h  | 27 ++++++++++++++-------------
> > >  7 files changed, 54 insertions(+), 49 deletions(-)
> > >
> > >diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
> > >index f462665..cc9ac10 100644
> > >--- a/drivers/net/mlx5/mlx5_hws_cnt.h
> > >+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
> > >@@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
> > >  	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
> > >  			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
> > >  	/* Update tail */
> > >-	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
> > >+	rte_atomic_store_explicit(&r->prod.tail, revert2head, rte_memory_order_release);
> > >  	return n;
> > >  }
> > >diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> > >index f895950..f8be538 100644
> > >--- a/lib/ring/rte_ring_c11_pvt.h
> > >+++ b/lib/ring/rte_ring_c11_pvt.h
> > >@@ -22,9 +22,10 @@
> > >  	 * we need to wait for them to complete
> > >  	 */
> > >  	if (!single)
> > >-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> > >+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> > >+			rte_memory_order_relaxed);
> > >-	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> > >+	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
> > >  }
> > >  /**
> > >@@ -61,19 +62,19 @@
> > >  	unsigned int max = n;
> > >  	int success;
> > >-	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
> > >+	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
> > >  	do {
> > >  		/* Reset n to the initial burst count */
> > >  		n = max;
> > >  		/* Ensure the head is read before tail */
> > >-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> > >+		__atomic_thread_fence(rte_memory_order_acquire);
> > >  		/* load-acquire synchronize with store-release of ht->tail
> > >  		 * in update_tail.
> > >  		 */
> > >-		cons_tail = __atomic_load_n(&r->cons.tail,
> > >-					__ATOMIC_ACQUIRE);
> > >+		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
> > >+					rte_memory_order_acquire);
> > >  		/* The subtraction is done between two unsigned 32bits value
> > >  		 * (the result is always modulo 32 bits even if we have
> > >@@ -95,10 +96,10 @@
> > >  			r->prod.head = *new_head, success = 1;
> > >  		else
> > >  			/* on failure, *old_head is updated */
> > >-			success = __atomic_compare_exchange_n(&r->prod.head,
> > >+			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
> > >  					old_head, *new_head,
> > >-					0, __ATOMIC_RELAXED,
> > >-					__ATOMIC_RELAXED);
> > >+					rte_memory_order_relaxed,
> > >+					rte_memory_order_relaxed);
> > >  	} while (unlikely(success == 0));
> > >  	return n;
> > >  }
> > >@@ -137,19 +138,19 @@
> > >  	int success;
> > >  	/* move cons.head atomically */
> > >-	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
> > >+	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
> > >  	do {
> > >  		/* Restore n as it may change every loop */
> > >  		n = max;
> > >  		/* Ensure the head is read before tail */
> > >-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> > >+		__atomic_thread_fence(rte_memory_order_acquire);
> > >  		/* this load-acquire synchronize with store-release of ht->tail
> > >  		 * in update_tail.
> > >  		 */
> > >-		prod_tail = __atomic_load_n(&r->prod.tail,
> > >-					__ATOMIC_ACQUIRE);
> > >+		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
> > >+					rte_memory_order_acquire);
> > >  		/* The subtraction is done between two unsigned 32bits value
> > >  		 * (the result is always modulo 32 bits even if we have
> > >@@ -170,10 +171,10 @@
> > >  			r->cons.head = *new_head, success = 1;
> > >  		else
> > >  			/* on failure, *old_head will be updated */
> > >-			success = __atomic_compare_exchange_n(&r->cons.head,
> > >+			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
> > >  							old_head, *new_head,
> > >-							0, __ATOMIC_RELAXED,
> > >-							__ATOMIC_RELAXED);
> > >+							rte_memory_order_relaxed,
> > >+							rte_memory_order_relaxed);
> > >  	} while (unlikely(success == 0));
> > >  	return n;
> > >  }
> > >diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
> > >index 327fdcf..7a2b577 100644
> > >--- a/lib/ring/rte_ring_core.h
> > >+++ b/lib/ring/rte_ring_core.h
> > >@@ -67,7 +67,7 @@ enum rte_ring_sync_type {
> > >   */
> > >  struct rte_ring_headtail {
> > >  	volatile uint32_t head;      /**< prod/consumer head. */
> > >-	volatile uint32_t tail;      /**< prod/consumer tail. */
> > >+	volatile RTE_ATOMIC(uint32_t) tail;      /**< prod/consumer tail. */
> >
> > Probably a stupid q:
> > why we do need RTE_ATOMIC() around tail only?
> > Why head is not affected?
> 
> you have a good eye. this is a fairly common pattern i've seen during the
> conversion, and it surfaces some interesting things.
> 
> the field being qualified has atomic operations performed on it in the
> implementation, while the other field does not. it may be an indication of
> a bug in the existing code, or it may be intentional.

Hmm... but as I can see, we are doing similar operations on both head and tail.
For head it would be: atomic_load(), then either atomic_store() or atomic_cas().
For tail it would be: atomic_load(), then atomic_store().
Or is that because we missed atomic_store(&r->prod.head, ..., RELAXED) here:
static __rte_always_inline unsigned int
__rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
                unsigned int n, enum rte_ring_queue_behavior behavior,
                uint32_t *old_head, uint32_t *new_head,
                uint32_t *free_entries)
{
....
if (is_sp)
                        r->prod.head = *new_head, success = 1;

?

> 
> case 1. atomics should be used but they aren't.
> 
> there are fields in structures and variables that were accessed in a
> 'mixed' manner. that is, in some instances __atomic_xxx builtins were being
> used on them and in other instances not. sometimes it is the initialization
> case, so it is probably okay; sometimes maybe not...
> 
> case 2. broader scope atomic operation, or we don't care if narrower
>         access is atomic.
> 
> e.g.
> union {
>    struct {
>        uint32_t head;
>        RTE_ATOMIC(uint32_t) tail;
>     }
>     RTE_ATOMIC(uint64_t) combined;
> }
> 
> again, this could be an indication of a missing use of atomics: often the
> operations on the `combined' field consistently use atomics but one of
> the head/tail fields does not. on purpose? maybe, if we are just doing an
> == comparison?
> 
> my approach in this series prioritized no functional change. as a result,
> if any of the above are real bugs, they stay real bugs; i have not
> changed the way the variables are accessed. if i were to change the code
> and start specifying them as atomic, it would risk a performance regression
> (for cases where it isn't a bug), because specifying would make the
> compiler generate code with the strongest ordering, seq_cst, for accesses
> that don't go through the atomic generic functions that specify an ordering.
> 
> there is another case, which comes up half a dozen times or so, that also
> concerns me, but i would need the maintainers of the code to either adapt
> the code to be correct or confirm that it is okay as is...
> 
> 
> case 3. qualification discard .. is the existing code really okay?
> 
> e.g.
> 
> atomic_compare_exchange(*object, *expected, desired, ...)
> 
> the issue is with the type specification of the memory aliased by expected.
> gcc's builtin intrinsics neither complain about nor force you to discard the
> atomic qualification when passing it. the result is that if expected is an
> atomic type, it may be accessed in a non-atomic manner by the code generated
> for the atomic operation.
> 
> again, i have chosen to maintain existing behavior by casting away the
> qualification if present on the expected argument.
> 
> i feel that in terms of mutating the source tree it is best to keep the
> conversion to atomic specified/qualified types in this separate series
> and then follow up with additional changes that may have
> functional/performance impact, if for no other reason than that it
> narrows where you have to look if there is a change. certainly the
> conversion to atomics has made these cases far easier to spot in the code.
> 
> finally, for most of the toolchains/targets all of this is pretty
> moot because most of them default to enable_stdatomics=false, so if
> there are problems they will most likely manifest only on windows
> builds with msvc.
> 
> thoughts?
> 
> >
> > >  	union {
> > >  		/** sync type of prod/cons */
> > >  		enum rte_ring_sync_type sync_type;
> > >@@ -78,7 +78,7 @@ struct rte_ring_headtail {
> > >  union __rte_ring_rts_poscnt {
> > >  	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
> > >-	uint64_t raw __rte_aligned(8);
> > >+	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
> > >  	struct {
> > >  		uint32_t cnt; /**< head/tail reference counter */
> > >  		uint32_t pos; /**< head/tail position */
> > >@@ -94,10 +94,10 @@ struct rte_ring_rts_headtail {
> > >  union __rte_ring_hts_pos {
> > >  	/** raw 8B value to read/write *head* and *tail* as one atomic op */
> > >-	uint64_t raw __rte_aligned(8);
> > >+	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
> > >  	struct {
> > >-		uint32_t head; /**< head position */
> > >-		uint32_t tail; /**< tail position */
> > >+		RTE_ATOMIC(uint32_t) head; /**< head position */
> > >+		RTE_ATOMIC(uint32_t) tail; /**< tail position */
> > >  	} pos;
> > >  };
> > >diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
> > >index 5acb6e5..ffb3654 100644
> > >--- a/lib/ring/rte_ring_generic_pvt.h
> > >+++ b/lib/ring/rte_ring_generic_pvt.h
> > >@@ -23,7 +23,8 @@
> > >  	 * we need to wait for them to complete
> > >  	 */
> > >  	if (!single)
> > >-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> > >+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> >
> > I suppose we do need that double type conversion only for atomic
> > types right?
> >
> > >+			rte_memory_order_relaxed);
> > >  	ht->tail = new_val;
> > >  }
> > >diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
> > >index a8678d3..91f5eec 100644
> > >--- a/lib/ring/rte_ring_hts_elem_pvt.h
> > >+++ b/lib/ring/rte_ring_hts_elem_pvt.h
> > >@@ -10,6 +10,8 @@
> > >  #ifndef _RTE_RING_HTS_ELEM_PVT_H_
> > >  #define _RTE_RING_HTS_ELEM_PVT_H_
> > >+#include <rte_stdatomic.h>
> > >+
> > >  /**
> > >   * @file rte_ring_hts_elem_pvt.h
> > >   * It is not recommended to include this file directly,
> > >@@ -30,7 +32,7 @@
> > >  	RTE_SET_USED(enqueue);
> > >  	tail = old_tail + num;
> > >-	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
> > >+	rte_atomic_store_explicit(&ht->ht.pos.tail, tail, rte_memory_order_release);
> > >  }
> > >  /**
> > >@@ -44,7 +46,7 @@
> > >  {
> > >  	while (p->pos.head != p->pos.tail) {
> > >  		rte_pause();
> > >-		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
> > >+		p->raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_acquire);
> > >  	}
> > >  }
> > >@@ -61,7 +63,7 @@
> > >  	const uint32_t capacity = r->capacity;
> > >-	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
> > >+	op.raw = rte_atomic_load_explicit(&r->hts_prod.ht.raw, rte_memory_order_acquire);
> > >  	do {
> > >  		/* Reset n to the initial burst count */
> > >@@ -98,9 +100,9 @@
> > >  	 *  - OOO reads of cons tail value
> > >  	 *  - OOO copy of elems from the ring
> > >  	 */
> > >-	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
> > >-			&op.raw, np.raw,
> > >-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> > >+	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_prod.ht.raw,
> > >+			(uint64_t *)(uintptr_t)&op.raw, np.raw,
> > >+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
> > >  	*old_head = op.pos.head;
> > >  	return n;
> > >@@ -117,7 +119,7 @@
> > >  	uint32_t n;
> > >  	union __rte_ring_hts_pos np, op;
> > >-	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
> > >+	op.raw = rte_atomic_load_explicit(&r->hts_cons.ht.raw, rte_memory_order_acquire);
> > >  	/* move cons.head atomically */
> > >  	do {
> > >@@ -153,9 +155,9 @@
> > >  	 *  - OOO reads of prod tail value
> > >  	 *  - OOO copy of elems from the ring
> > >  	 */
> > >-	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
> > >-			&op.raw, np.raw,
> > >-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> > >+	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_cons.ht.raw,
> > >+			(uint64_t *)(uintptr_t)&op.raw, np.raw,
> > >+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
> > >  	*old_head = op.pos.head;
> > >  	return n;
> > >diff --git a/lib/ring/rte_ring_peek_elem_pvt.h b/lib/ring/rte_ring_peek_elem_pvt.h
> > >index bb0a7d5..b5f0822 100644
> > >--- a/lib/ring/rte_ring_peek_elem_pvt.h
> > >+++ b/lib/ring/rte_ring_peek_elem_pvt.h
> > >@@ -59,7 +59,7 @@
> > >  	pos = tail + num;
> > >  	ht->head = pos;
> > >-	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
> > >+	rte_atomic_store_explicit(&ht->tail, pos, rte_memory_order_release);
> > >  }
> > >  /**
> > >@@ -78,7 +78,7 @@
> > >  	uint32_t n;
> > >  	union __rte_ring_hts_pos p;
> > >-	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
> > >+	p.raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_relaxed);
> > >  	n = p.pos.head - p.pos.tail;
> > >  	RTE_ASSERT(n >= num);
> > >@@ -104,7 +104,7 @@
> > >  	p.pos.head = tail + num;
> > >  	p.pos.tail = p.pos.head;
> > >-	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
> > >+	rte_atomic_store_explicit(&ht->ht.raw, p.raw, rte_memory_order_release);
> > >  }
> > >  /**
> > >diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
> > >index 7164213..1226503 100644
> > >--- a/lib/ring/rte_ring_rts_elem_pvt.h
> > >+++ b/lib/ring/rte_ring_rts_elem_pvt.h
> > >@@ -31,18 +31,19 @@
> > >  	 * might preceded us, then don't update tail with new value.
> > >  	 */
> > >-	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
> > >+	ot.raw = rte_atomic_load_explicit(&ht->tail.raw, rte_memory_order_acquire);
> > >  	do {
> > >  		/* on 32-bit systems we have to do atomic read here */
> > >-		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
> > >+		h.raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_relaxed);
> > >  		nt.raw = ot.raw;
> > >  		if (++nt.val.cnt == h.val.cnt)
> > >  			nt.val.pos = h.val.pos;
> > >-	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
> > >-			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
> > >+	} while (rte_atomic_compare_exchange_strong_explicit(&ht->tail.raw,
> > >+			(uint64_t *)(uintptr_t)&ot.raw, nt.raw,
> > >+			rte_memory_order_release, rte_memory_order_acquire) == 0);
> > >  }
> > >  /**
> > >@@ -59,7 +60,7 @@
> > >  	while (h->val.pos - ht->tail.val.pos > max) {
> > >  		rte_pause();
> > >-		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
> > >+		h->raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_acquire);
> > >  	}
> > >  }
> > >@@ -76,7 +77,7 @@
> > >  	const uint32_t capacity = r->capacity;
> > >-	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
> > >+	oh.raw = rte_atomic_load_explicit(&r->rts_prod.head.raw, rte_memory_order_acquire);
> > >  	do {
> > >  		/* Reset n to the initial burst count */
> > >@@ -113,9 +114,9 @@
> > >  	 *  - OOO reads of cons tail value
> > >  	 *  - OOO copy of elems to the ring
> > >  	 */
> > >-	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
> > >-			&oh.raw, nh.raw,
> > >-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> > >+	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_prod.head.raw,
> > >+			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
> > >+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
> > >  	*old_head = oh.val.pos;
> > >  	return n;
> > >@@ -132,7 +133,7 @@
> > >  	uint32_t n;
> > >  	union __rte_ring_rts_poscnt nh, oh;
> > >-	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
> > >+	oh.raw = rte_atomic_load_explicit(&r->rts_cons.head.raw, rte_memory_order_acquire);
> > >  	/* move cons.head atomically */
> > >  	do {
> > >@@ -168,9 +169,9 @@
> > >  	 *  - OOO reads of prod tail value
> > >  	 *  - OOO copy of elems from the ring
> > >  	 */
> > >-	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
> > >-			&oh.raw, nh.raw,
> > >-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> > >+	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_cons.head.raw,
> > >+			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
> > >+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
> > >  	*old_head = oh.val.pos;
> > >  	return n;

^ permalink raw reply	[flat|nested] 91+ messages in thread
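
To make the "case 3" concern above concrete, here is a minimal standalone C11
sketch, assuming nothing beyond <stdatomic.h>. The union and function names are
illustrative rather than the DPDK definitions; the cast-away situation it
comments on mirrors the (uint64_t *)(uintptr_t)&op.raw casts visible in the
ring hunks quoted in this thread.

#include <stdatomic.h>
#include <stdint.h>

/* Illustrative only: an 8-byte value read/written as one atomic object,
 * shaped like the ring head/tail unions but not the DPDK definition. */
union pos {
	_Atomic uint64_t raw;
	struct {
		uint32_t head;
		uint32_t tail;
	} p;
};

static int
try_update(union pos *ht, uint64_t new_raw)
{
	/* expected is deliberately a plain uint64_t: the C11 generic
	 * compare-exchange takes a pointer to the non-atomic value type
	 * for its second argument and accesses it non-atomically. If the
	 * caller instead held expected in an _Atomic object, passing its
	 * address would require casting the qualifier away, which is the
	 * "qualification discard" being questioned above. */
	uint64_t expected = atomic_load_explicit(&ht->raw, memory_order_acquire);

	return atomic_compare_exchange_strong_explicit(&ht->raw, &expected,
			new_raw, memory_order_release, memory_order_acquire);
}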

* Re: [PATCH v2 09/19] rcu: use rte optional stdatomic API
  2023-10-25  9:41     ` Ruifeng Wang
@ 2023-10-25 22:38       ` Tyler Retzlaff
  2023-10-26  4:24         ` Ruifeng Wang
  0 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-25 22:38 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: dev, Akhil Goyal, Anatoly Burakov, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Ciara Power, David Christensen,
	David Hunt, Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, jerinj,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, thomas, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang, nd

On Wed, Oct 25, 2023 at 09:41:22AM +0000, Ruifeng Wang wrote:
> > -----Original Message-----
> > From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > Sent: Wednesday, October 18, 2023 4:31 AM
> > To: dev@dpdk.org
> > Cc: Akhil Goyal <gakhil@marvell.com>; Anatoly Burakov <anatoly.burakov@intel.com>; Andrew
> > Rybchenko <andrew.rybchenko@oktetlabs.ru>; Bruce Richardson <bruce.richardson@intel.com>;
> > Chenbo Xia <chenbo.xia@intel.com>; Ciara Power <ciara.power@intel.com>; David Christensen
> > <drc@linux.vnet.ibm.com>; David Hunt <david.hunt@intel.com>; Dmitry Kozlyuk
> > <dmitry.kozliuk@gmail.com>; Dmitry Malloy <dmitrym@microsoft.com>; Elena Agostini
> > <eagostini@nvidia.com>; Erik Gabriel Carrillo <erik.g.carrillo@intel.com>; Fan Zhang
> > <fanzhang.oss@gmail.com>; Ferruh Yigit <ferruh.yigit@amd.com>; Harman Kalra
> > <hkalra@marvell.com>; Harry van Haaren <harry.van.haaren@intel.com>; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>; jerinj@marvell.com; Konstantin Ananyev
> > <konstantin.v.ananyev@yandex.ru>; Matan Azrad <matan@nvidia.com>; Maxime Coquelin
> > <maxime.coquelin@redhat.com>; Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>;
> > Nicolas Chautru <nicolas.chautru@intel.com>; Olivier Matz <olivier.matz@6wind.com>; Ori
> > Kam <orika@nvidia.com>; Pallavi Kadam <pallavi.kadam@intel.com>; Pavan Nikhilesh
> > <pbhagavatula@marvell.com>; Reshma Pattan <reshma.pattan@intel.com>; Sameh Gobriel
> > <sameh.gobriel@intel.com>; Shijith Thotton <sthotton@marvell.com>; Sivaprasad Tummala
> > <sivaprasad.tummala@amd.com>; Stephen Hemminger <stephen@networkplumber.org>; Suanming Mou
> > <suanmingm@nvidia.com>; Sunil Kumar Kori <skori@marvell.com>; thomas@monjalon.net;
> > Viacheslav Ovsiienko <viacheslavo@nvidia.com>; Vladimir Medvedkin
> > <vladimir.medvedkin@intel.com>; Yipeng Wang <yipeng1.wang@intel.com>; Tyler Retzlaff
> > <roretzla@linux.microsoft.com>
> > Subject: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> > 
> > Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding rte_atomic_xxx
> > optional stdatomic API
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> >  lib/rcu/rte_rcu_qsbr.c | 48 +++++++++++++++++------------------
> >  lib/rcu/rte_rcu_qsbr.h | 68 +++++++++++++++++++++++++-------------------------
> >  2 files changed, 58 insertions(+), 58 deletions(-)
> > 
> > diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c
> > index 17be93e..4dc7714 100644
> > --- a/lib/rcu/rte_rcu_qsbr.c
> > +++ b/lib/rcu/rte_rcu_qsbr.c
> > @@ -102,21 +102,21 @@
> >  	 * go out of sync. Hence, additional checks are required.
> >  	 */
> >  	/* Check if the thread is already registered */
> > -	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > -					__ATOMIC_RELAXED);
> > +	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > +					rte_memory_order_relaxed);
> >  	if (old_bmap & 1UL << id)
> >  		return 0;
> > 
> >  	do {
> >  		new_bmap = old_bmap | (1UL << id);
> > -		success = __atomic_compare_exchange(
> > +		success = rte_atomic_compare_exchange_strong_explicit(
> >  					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > -					&old_bmap, &new_bmap, 0,
> > -					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > +					&old_bmap, new_bmap,
> > +					rte_memory_order_release, rte_memory_order_relaxed);
> > 
> >  		if (success)
> > -			__atomic_fetch_add(&v->num_threads,
> > -						1, __ATOMIC_RELAXED);
> > +			rte_atomic_fetch_add_explicit(&v->num_threads,
> > +						1, rte_memory_order_relaxed);
> >  		else if (old_bmap & (1UL << id))
> >  			/* Someone else registered this thread.
> >  			 * Counter should not be incremented.
> > @@ -154,8 +154,8 @@
> >  	 * go out of sync. Hence, additional checks are required.
> >  	 */
> >  	/* Check if the thread is already unregistered */
> > -	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > -					__ATOMIC_RELAXED);
> > +	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > +					rte_memory_order_relaxed);
> >  	if (!(old_bmap & (1UL << id)))
> >  		return 0;
> > 
> > @@ -165,14 +165,14 @@
> >  		 * completed before removal of the thread from the list of
> >  		 * reporting threads.
> >  		 */
> > -		success = __atomic_compare_exchange(
> > +		success = rte_atomic_compare_exchange_strong_explicit(
> >  					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > -					&old_bmap, &new_bmap, 0,
> > -					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > +					&old_bmap, new_bmap,
> > +					rte_memory_order_release, rte_memory_order_relaxed);
> > 
> >  		if (success)
> > -			__atomic_fetch_sub(&v->num_threads,
> > -						1, __ATOMIC_RELAXED);
> > +			rte_atomic_fetch_sub_explicit(&v->num_threads,
> > +						1, rte_memory_order_relaxed);
> >  		else if (!(old_bmap & (1UL << id)))
> >  			/* Someone else unregistered this thread.
> >  			 * Counter should not be incremented.
> > @@ -227,8 +227,8 @@
> > 
> >  	fprintf(f, "  Registered thread IDs = ");
> >  	for (i = 0; i < v->num_elems; i++) {
> > -		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > -					__ATOMIC_ACQUIRE);
> > +		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > +					rte_memory_order_acquire);
> >  		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
> >  		while (bmap) {
> >  			t = __builtin_ctzl(bmap);
> > @@ -241,26 +241,26 @@
> >  	fprintf(f, "\n");
> > 
> >  	fprintf(f, "  Token = %" PRIu64 "\n",
> > -			__atomic_load_n(&v->token, __ATOMIC_ACQUIRE));
> > +			rte_atomic_load_explicit(&v->token, rte_memory_order_acquire));
> > 
> >  	fprintf(f, "  Least Acknowledged Token = %" PRIu64 "\n",
> > -			__atomic_load_n(&v->acked_token, __ATOMIC_ACQUIRE));
> > +			rte_atomic_load_explicit(&v->acked_token,
> > +					rte_memory_order_acquire));
> > 
> >  	fprintf(f, "Quiescent State Counts for readers:\n");
> >  	for (i = 0; i < v->num_elems; i++) {
> > -		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > -					__ATOMIC_ACQUIRE);
> > +		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > +					rte_memory_order_acquire);
> >  		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
> >  		while (bmap) {
> >  			t = __builtin_ctzl(bmap);
> >  			fprintf(f, "thread ID = %u, count = %" PRIu64 ", lock count = %u\n",
> >  				id + t,
> > -				__atomic_load_n(
> > +				rte_atomic_load_explicit(
> >  					&v->qsbr_cnt[id + t].cnt,
> > -					__ATOMIC_RELAXED),
> > -				__atomic_load_n(
> > +					rte_memory_order_relaxed),
> > +				rte_atomic_load_explicit(
> >  					&v->qsbr_cnt[id + t].lock_cnt,
> > -					__ATOMIC_RELAXED));
> > +					rte_memory_order_relaxed));
> >  			bmap &= ~(1UL << t);
> >  		}
> >  	}
> > diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h
> > index 87e1b55..9f4aed2 100644
> > --- a/lib/rcu/rte_rcu_qsbr.h
> > +++ b/lib/rcu/rte_rcu_qsbr.h
> > @@ -63,11 +63,11 @@
> >   * Given thread id needs to be converted to index into the array and
> >   * the id within the array element.
> >   */
> > -#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
> > +#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(RTE_ATOMIC(uint64_t)) * 8)
> >  #define __RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
> >  	RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
> >  		__RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, RTE_CACHE_LINE_SIZE)
> > -#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
> > +#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t __rte_atomic *) \
> 
> Is it equivalent to ((RTE_ATOMIC(uint64_t) *)?

i'm not sure if you're asking about the resultant type of the expression
or not?

in this context we aren't specifying an atomic type but rather using a
cast to add the atomic qualifier to what should already be a variable
declared with an atomic specified type, which is why we use __rte_atomic.

> 
> >  	((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)
> >  #define __RTE_QSBR_THRID_INDEX_SHIFT 6
> >  #define __RTE_QSBR_THRID_MASK 0x3f
> > @@ -75,13 +75,13 @@
> > 
> 
> <snip>

^ permalink raw reply	[flat|nested] 91+ messages in thread
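
The specifier/qualifier distinction drawn in the answer above can be shown with
a short plain C11 sketch, assuming only <stdatomic.h>; the array and function
names below are illustrative, not the rte_rcu_qsbr internals.

#include <stdatomic.h>
#include <stdint.h>

/* Declaration uses the atomic *specifier* (the form RTE_ATOMIC() provides
 * when stdatomic is enabled). */
static _Atomic(uint64_t) thread_bitmap[4];

static _Atomic uint64_t *
bitmap_elem(unsigned int i)
{
	/* The cast adds the atomic *qualifier* (the form __rte_atomic
	 * provides) to the pointed-to type. Because the object was already
	 * declared atomic, both spellings name the same atomic type and the
	 * access below stays well formed. */
	return (uint64_t _Atomic *)&thread_bitmap[i];
}

static uint64_t
bitmap_load(unsigned int i)
{
	return atomic_load_explicit(bitmap_elem(i), memory_order_relaxed);
}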

* Re: [PATCH v2 19/19] ring: use rte optional stdatomic API
  2023-10-25 10:06         ` Konstantin Ananyev
@ 2023-10-25 22:49           ` Tyler Retzlaff
  2023-10-25 23:22             ` Tyler Retzlaff
  0 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-25 22:49 UTC (permalink / raw)
  To: Konstantin Ananyev
  Cc: Konstantin Ananyev, dev, Akhil Goyal, Anatoly Burakov,
	Andrew Rybchenko, Bruce Richardson, Chenbo Xia, Ciara Power,
	David Christensen, David Hunt, Dmitry Kozlyuk, Dmitry Malloy,
	Elena Agostini, Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit,
	Harman Kalra, Harry van Haaren, Honnappa Nagarahalli,
	Jerin Jacob, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang

On Wed, Oct 25, 2023 at 10:06:23AM +0000, Konstantin Ananyev wrote:
> 
> 
> > 
> > On Tue, Oct 24, 2023 at 09:43:13AM +0100, Konstantin Ananyev wrote:
> > > 17.10.2023 21:31, Tyler Retzlaff пишет:
> > > >Replace the use of gcc builtin __atomic_xxx intrinsics with
> > > >corresponding rte_atomic_xxx optional stdatomic API
> > > >
> > > >Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > >---
> > > >  drivers/net/mlx5/mlx5_hws_cnt.h   |  2 +-
> > > >  lib/ring/rte_ring_c11_pvt.h       | 33 +++++++++++++++++----------------
> > > >  lib/ring/rte_ring_core.h          | 10 +++++-----
> > > >  lib/ring/rte_ring_generic_pvt.h   |  3 ++-
> > > >  lib/ring/rte_ring_hts_elem_pvt.h  | 22 ++++++++++++----------
> > > >  lib/ring/rte_ring_peek_elem_pvt.h |  6 +++---
> > > >  lib/ring/rte_ring_rts_elem_pvt.h  | 27 ++++++++++++++-------------
> > > >  7 files changed, 54 insertions(+), 49 deletions(-)
> > > >
> > > >diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
> > > >index f462665..cc9ac10 100644
> > > >--- a/drivers/net/mlx5/mlx5_hws_cnt.h
> > > >+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
> > > >@@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
> > > >  	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
> > > >  			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
> > > >  	/* Update tail */
> > > >-	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
> > > >+	rte_atomic_store_explicit(&r->prod.tail, revert2head, rte_memory_order_release);
> > > >  	return n;
> > > >  }
> > > >diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> > > >index f895950..f8be538 100644
> > > >--- a/lib/ring/rte_ring_c11_pvt.h
> > > >+++ b/lib/ring/rte_ring_c11_pvt.h
> > > >@@ -22,9 +22,10 @@
> > > >  	 * we need to wait for them to complete
> > > >  	 */
> > > >  	if (!single)
> > > >-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> > > >+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> > > >+			rte_memory_order_relaxed);
> > > >-	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> > > >+	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
> > > >  }
> > > >  /**
> > > >@@ -61,19 +62,19 @@
> > > >  	unsigned int max = n;
> > > >  	int success;
> > > >-	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
> > > >+	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
> > > >  	do {
> > > >  		/* Reset n to the initial burst count */
> > > >  		n = max;
> > > >  		/* Ensure the head is read before tail */
> > > >-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > >+		__atomic_thread_fence(rte_memory_order_acquire);
> > > >  		/* load-acquire synchronize with store-release of ht->tail
> > > >  		 * in update_tail.
> > > >  		 */
> > > >-		cons_tail = __atomic_load_n(&r->cons.tail,
> > > >-					__ATOMIC_ACQUIRE);
> > > >+		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
> > > >+					rte_memory_order_acquire);
> > > >  		/* The subtraction is done between two unsigned 32bits value
> > > >  		 * (the result is always modulo 32 bits even if we have
> > > >@@ -95,10 +96,10 @@
> > > >  			r->prod.head = *new_head, success = 1;
> > > >  		else
> > > >  			/* on failure, *old_head is updated */
> > > >-			success = __atomic_compare_exchange_n(&r->prod.head,
> > > >+			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
> > > >  					old_head, *new_head,
> > > >-					0, __ATOMIC_RELAXED,
> > > >-					__ATOMIC_RELAXED);
> > > >+					rte_memory_order_relaxed,
> > > >+					rte_memory_order_relaxed);
> > > >  	} while (unlikely(success == 0));
> > > >  	return n;
> > > >  }
> > > >@@ -137,19 +138,19 @@
> > > >  	int success;
> > > >  	/* move cons.head atomically */
> > > >-	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
> > > >+	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
> > > >  	do {
> > > >  		/* Restore n as it may change every loop */
> > > >  		n = max;
> > > >  		/* Ensure the head is read before tail */
> > > >-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > >+		__atomic_thread_fence(rte_memory_order_acquire);
> > > >  		/* this load-acquire synchronize with store-release of ht->tail
> > > >  		 * in update_tail.
> > > >  		 */
> > > >-		prod_tail = __atomic_load_n(&r->prod.tail,
> > > >-					__ATOMIC_ACQUIRE);
> > > >+		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
> > > >+					rte_memory_order_acquire);
> > > >  		/* The subtraction is done between two unsigned 32bits value
> > > >  		 * (the result is always modulo 32 bits even if we have
> > > >@@ -170,10 +171,10 @@
> > > >  			r->cons.head = *new_head, success = 1;
> > > >  		else
> > > >  			/* on failure, *old_head will be updated */
> > > >-			success = __atomic_compare_exchange_n(&r->cons.head,
> > > >+			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
> > > >  							old_head, *new_head,
> > > >-							0, __ATOMIC_RELAXED,
> > > >-							__ATOMIC_RELAXED);
> > > >+							rte_memory_order_relaxed,
> > > >+							rte_memory_order_relaxed);
> > > >  	} while (unlikely(success == 0));
> > > >  	return n;
> > > >  }
> > > >diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
> > > >index 327fdcf..7a2b577 100644
> > > >--- a/lib/ring/rte_ring_core.h
> > > >+++ b/lib/ring/rte_ring_core.h
> > > >@@ -67,7 +67,7 @@ enum rte_ring_sync_type {
> > > >   */
> > > >  struct rte_ring_headtail {
> > > >  	volatile uint32_t head;      /**< prod/consumer head. */
> > > >-	volatile uint32_t tail;      /**< prod/consumer tail. */
> > > >+	volatile RTE_ATOMIC(uint32_t) tail;      /**< prod/consumer tail. */
> > >
> > > Probably a stupid q:
> > > why we do need RTE_ATOMIC() around tail only?
> > > Why head is not affected?
> > 
> > you have a good eye and this is a slightly common issue that i've seen
> > and there appear to be some interesting things showing up.
> > 
> > the field being qualified has atomic operation performed on it the other
> > field does not in the implementation. it may be an indication of a bug in
> > the existing code or it may be intentional.
> 
> Hmm... but as I can see, we are doing similar operations on  both head and tail.
> For head it would be: atomic_load(), then either atomic_store() or atomic_cas().
> For tail it would be: atomic_load(), then atomic_store().
> Or is that because of we missed atomic_store(&r->prod.head, ..., RELAXED) here:
> static __rte_always_inline unsigned int
> __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
>                 unsigned int n, enum rte_ring_queue_behavior behavior,
>                 uint32_t *old_head, uint32_t *new_head,
>                 uint32_t *free_entries)
> {
> ....
> if (is_sp)
>                         r->prod.head = *new_head, success = 1;
> 
> ?

for this instance you are correct. i need to understand why this builds
successfully, because it shouldn't. that it doesn't fail is probably not
harmful, but since this is a public header and the structure is visible,
it's best to have it carry the correct RTE_ATOMIC(T).

i'll reply back with what i find.

thanks

> 
> > 
> > case 1. atomics should be used but they aren't.
> > 
> > there are fields in structures and variables that were accessed in a
> > 'mixed' manner. that is in some instances __atomic_op_xxx was being used
> > on them and in other instances not. sometimes it is the initialization
> > case so it is probably okay, sometimes maybe not...
> > 
> > case 2. broader scope atomic operation, or we don't care if narrower
> >         access is atomic.
> > 
> > e.g.
> > union {
> >    struct {
> >        uint32_t head;
> >        RTE_ATOMIC(uint32_t) tail;
> >     }
> >     RTE_ATOMIC(uint64_t) combined;
> > }
> > 
> > again, could be an indication of missing use of atomic, often the
> > operation on the `combined' field consistently uses atomics but one of
> > the head/tail fields will not be. on purpose? maybe if we are just doing
> > == comparison?
> > 
> > my approach in this series prioritized no functional change. as a result
> > if any of the above are real bugs, they stay real bugs but i have not
> > changed the way the variables are accessed. if i were to change the code
> > and start atomic specifying it has a risk of performance regression (for
> > cases where it isn't a bug) because specifying would result in the
> > compiler code generating for strongest ordering seq_cst for accesses
> > that are not using atomic generic functions that specify ordering.
> > 
> > there is another case which comes up half a dozen times or so that is
> > also concerning to me, but i would need the maintainers of the code to
> > adapt the code to be correct or maybe it is okay...
> > 
> > 
> > case 3. qualification discard .. is the existing code really okay?
> > 
> > e.g.
> > 
> > atomic_compare_exchange(*object, *expected, desired, ...)
> > 
> > the issue is with the specification of the memory aliased by expected.
> > gcc doesn't complain or enforce discarding of qualification when using
> > builtin intrinsics. the result is that if expected is an atomic type it
> > may be accessed in a non-atomic manner by the code generated for the
> > atomic operation.
> > 
> > again, i have chosen to maintain existing behavior by casting away the
> > qualification if present on the expected argument.
> > 
> > i feel that in terms of mutating the source tree it is best to separate
> > conversion to atomic specified/qualified types into this separate series
> > and then follow up with additional changes that may have
> > functional/performance impact if not for any other reason that it
> > narrows where you have to look if there is a change. certainly conversion
> > to atomics has made these cases far easier to spot in the code.
> > 
> > finally in terms of most of the toolchain/targets all of this is pretty
> > moot because most of them are defaulting to enable_stdatomics=false so
> > most likely if there are problems they will manifest on windows built with
> > msvc only.
> > 
> > thoughts?
> > 

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 19/19] ring: use rte optional stdatomic API
  2023-10-25 22:49           ` Tyler Retzlaff
@ 2023-10-25 23:22             ` Tyler Retzlaff
  0 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-25 23:22 UTC (permalink / raw)
  To: Konstantin Ananyev
  Cc: Konstantin Ananyev, dev, Akhil Goyal, Anatoly Burakov,
	Andrew Rybchenko, Bruce Richardson, Chenbo Xia, Ciara Power,
	David Christensen, David Hunt, Dmitry Kozlyuk, Dmitry Malloy,
	Elena Agostini, Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit,
	Harman Kalra, Harry van Haaren, Honnappa Nagarahalli,
	Jerin Jacob, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang

On Wed, Oct 25, 2023 at 03:49:50PM -0700, Tyler Retzlaff wrote:
> On Wed, Oct 25, 2023 at 10:06:23AM +0000, Konstantin Ananyev wrote:
> > 
> > 
> > > 
> > > On Tue, Oct 24, 2023 at 09:43:13AM +0100, Konstantin Ananyev wrote:
> > > > 17.10.2023 21:31, Tyler Retzlaff пишет:
> > > > >Replace the use of gcc builtin __atomic_xxx intrinsics with
> > > > >corresponding rte_atomic_xxx optional stdatomic API
> > > > >
> > > > >Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > > >---
> > > > >  drivers/net/mlx5/mlx5_hws_cnt.h   |  2 +-
> > > > >  lib/ring/rte_ring_c11_pvt.h       | 33 +++++++++++++++++----------------
> > > > >  lib/ring/rte_ring_core.h          | 10 +++++-----
> > > > >  lib/ring/rte_ring_generic_pvt.h   |  3 ++-
> > > > >  lib/ring/rte_ring_hts_elem_pvt.h  | 22 ++++++++++++----------
> > > > >  lib/ring/rte_ring_peek_elem_pvt.h |  6 +++---
> > > > >  lib/ring/rte_ring_rts_elem_pvt.h  | 27 ++++++++++++++-------------
> > > > >  7 files changed, 54 insertions(+), 49 deletions(-)
> > > > >
> > > > >diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
> > > > >index f462665..cc9ac10 100644
> > > > >--- a/drivers/net/mlx5/mlx5_hws_cnt.h
> > > > >+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
> > > > >@@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
> > > > >  	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
> > > > >  			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
> > > > >  	/* Update tail */
> > > > >-	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
> > > > >+	rte_atomic_store_explicit(&r->prod.tail, revert2head, rte_memory_order_release);
> > > > >  	return n;
> > > > >  }
> > > > >diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> > > > >index f895950..f8be538 100644
> > > > >--- a/lib/ring/rte_ring_c11_pvt.h
> > > > >+++ b/lib/ring/rte_ring_c11_pvt.h
> > > > >@@ -22,9 +22,10 @@
> > > > >  	 * we need to wait for them to complete
> > > > >  	 */
> > > > >  	if (!single)
> > > > >-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> > > > >+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> > > > >+			rte_memory_order_relaxed);
> > > > >-	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> > > > >+	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
> > > > >  }
> > > > >  /**
> > > > >@@ -61,19 +62,19 @@
> > > > >  	unsigned int max = n;
> > > > >  	int success;
> > > > >-	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
> > > > >+	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
> > > > >  	do {
> > > > >  		/* Reset n to the initial burst count */
> > > > >  		n = max;
> > > > >  		/* Ensure the head is read before tail */
> > > > >-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > > >+		__atomic_thread_fence(rte_memory_order_acquire);
> > > > >  		/* load-acquire synchronize with store-release of ht->tail
> > > > >  		 * in update_tail.
> > > > >  		 */
> > > > >-		cons_tail = __atomic_load_n(&r->cons.tail,
> > > > >-					__ATOMIC_ACQUIRE);
> > > > >+		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
> > > > >+					rte_memory_order_acquire);
> > > > >  		/* The subtraction is done between two unsigned 32bits value
> > > > >  		 * (the result is always modulo 32 bits even if we have
> > > > >@@ -95,10 +96,10 @@
> > > > >  			r->prod.head = *new_head, success = 1;
> > > > >  		else
> > > > >  			/* on failure, *old_head is updated */
> > > > >-			success = __atomic_compare_exchange_n(&r->prod.head,
> > > > >+			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
> > > > >  					old_head, *new_head,
> > > > >-					0, __ATOMIC_RELAXED,
> > > > >-					__ATOMIC_RELAXED);
> > > > >+					rte_memory_order_relaxed,
> > > > >+					rte_memory_order_relaxed);
> > > > >  	} while (unlikely(success == 0));
> > > > >  	return n;
> > > > >  }
> > > > >@@ -137,19 +138,19 @@
> > > > >  	int success;
> > > > >  	/* move cons.head atomically */
> > > > >-	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
> > > > >+	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
> > > > >  	do {
> > > > >  		/* Restore n as it may change every loop */
> > > > >  		n = max;
> > > > >  		/* Ensure the head is read before tail */
> > > > >-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> > > > >+		__atomic_thread_fence(rte_memory_order_acquire);
> > > > >  		/* this load-acquire synchronize with store-release of ht->tail
> > > > >  		 * in update_tail.
> > > > >  		 */
> > > > >-		prod_tail = __atomic_load_n(&r->prod.tail,
> > > > >-					__ATOMIC_ACQUIRE);
> > > > >+		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
> > > > >+					rte_memory_order_acquire);
> > > > >  		/* The subtraction is done between two unsigned 32bits value
> > > > >  		 * (the result is always modulo 32 bits even if we have
> > > > >@@ -170,10 +171,10 @@
> > > > >  			r->cons.head = *new_head, success = 1;
> > > > >  		else
> > > > >  			/* on failure, *old_head will be updated */
> > > > >-			success = __atomic_compare_exchange_n(&r->cons.head,
> > > > >+			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
> > > > >  							old_head, *new_head,
> > > > >-							0, __ATOMIC_RELAXED,
> > > > >-							__ATOMIC_RELAXED);
> > > > >+							rte_memory_order_relaxed,
> > > > >+							rte_memory_order_relaxed);
> > > > >  	} while (unlikely(success == 0));
> > > > >  	return n;
> > > > >  }
> > > > >diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
> > > > >index 327fdcf..7a2b577 100644
> > > > >--- a/lib/ring/rte_ring_core.h
> > > > >+++ b/lib/ring/rte_ring_core.h
> > > > >@@ -67,7 +67,7 @@ enum rte_ring_sync_type {
> > > > >   */
> > > > >  struct rte_ring_headtail {
> > > > >  	volatile uint32_t head;      /**< prod/consumer head. */
> > > > >-	volatile uint32_t tail;      /**< prod/consumer tail. */
> > > > >+	volatile RTE_ATOMIC(uint32_t) tail;      /**< prod/consumer tail. */
> > > >
> > > > Probably a stupid q:
> > > > why we do need RTE_ATOMIC() around tail only?
> > > > Why head is not affected?
> > > 
> > > you have a good eye and this is a slightly common issue that i've seen
> > > and there appear to be some interesting things showing up.
> > > 
> > > the field being qualified has atomic operation performed on it the other
> > > field does not in the implementation. it may be an indication of a bug in
> > > the existing code or it may be intentional.
> > 
> > Hmm... but as I can see, we are doing similar operations on  both head and tail.
> > For head it would be: atomic_load(), then either atomic_store() or atomic_cas().
> > For tail it would be: atomic_load(), then atomic_store().
> > Or is that because of we missed atomic_store(&r->prod.head, ..., RELAXED) here:
> > static __rte_always_inline unsigned int
> > __rte_ring_move_prod_head(struct rte_ring *r, unsigned int is_sp,
> >                 unsigned int n, enum rte_ring_queue_behavior behavior,
> >                 uint32_t *old_head, uint32_t *new_head,
> >                 uint32_t *free_entries)
> > {
> > ....
> > if (is_sp)
> >                         r->prod.head = *new_head, success = 1;
> > 
> > ?
> 
> for this instance you are correct, i need to get an understanding of why
> this builds successfully because it shouldn't. that it doesn't fail
> probably isn't harmful but since this is a public header the structure
> is visible it's best to have it carry the correct RTE_ATOMIC(T).
> 
> i'll reply back with what i find.

okay, circling back to answer. simply put, we don't seem to have any CI
or a convenient way to configure a build with RTE_USE_C11_MEM_MODEL, so when
rte_ring_c11_pvt.h isn't built that field is never accessed with a
standard atomic generic function.

this explains why it doesn't fail to build and why it didn't stick out as
needing to be properly specified, which it should be.

i'll submit a new version of the series that catches the other places this
series missed when building with RTE_USE_C11_MEM_MODEL.

thanks

> 
> thanks
> 
> > 
> > > 
> > > case 1. atomics should be used but they aren't.
> > > 
> > > there are fields in structures and variables that were accessed in a
> > > 'mixed' manner. that is in some instances __atomic_op_xxx was being used
> > > on them and in other instances not. sometimes it is the initialization
> > > case so it is probably okay, sometimes maybe not...
> > > 
> > > case 2. broader scope atomic operation, or we don't care if narrower
> > >         access is atomic.
> > > 
> > > e.g.
> > > union {
> > >    struct {
> > >        uint32_t head;
> > >        RTE_ATOMIC(uint32_t) tail;
> > >     }
> > >     RTE_ATOMIC(uint64_t) combined;
> > > }
> > > 
> > > again, could be an indication of missing use of atomic, often the
> > > operation on the `combined' field consistently uses atomics but one of
> > > the head/tail fields will not be. on purpose? maybe if we are just doing
> > > == comparison?
> > > 
> > > my approach in this series prioritized no functional change. as a result
> > > if any of the above are real bugs, they stay real bugs but i have not
> > > changed the way the variables are accessed. if i were to change the code
> > > and start atomic specifying it has a risk of performance regression (for
> > > cases where it isn't a bug) because specifying would result in the
> > > compiler code generating for strongest ordering seq_cst for accesses
> > > that are not using atomic generic functions that specify ordering.
> > > 
> > > there is another case which comes up half a dozen times or so that is
> > > also concerning to me, but i would need the maintainers of the code to
> > > adapt the code to be correct or maybe it is okay...
> > > 
> > > 
> > > case 3. qualification discard .. is the existing code really okay?
> > > 
> > > e.g.
> > > 
> > > atomic_compare_exchange(*object, *expected, desired, ...)
> > > 
> > > the issue is with the specification of the memory aliased by expected.
> > > gcc doesn't complain or enforce discarding of qualification when using
> > > builtin intrinsics. the result is that if expected is an atomic type it
> > > may be accessed in a non-atomic manner by the code generated for the
> > > atomic operation.
> > > 
> > > again, i have chosen to maintain existing behavior by casting away the
> > > qualification if present on the expected argument.
> > > 
> > > i feel that in terms of mutating the source tree it is best to separate
> > > conversion to atomic specified/qualified types into this separate series
> > > and then follow up with additional changes that may have
> > > functional/performance impact if not for any other reason that it
> > > narrows where you have to look if there is a change. certainly conversion
> > > to atomics has made these cases far easier to spot in the code.
> > > 
> > > finally in terms of most of the toolchain/targets all of this is pretty
> > > moot because most of them are defaulting to enable_stdatomics=false so
> > > most likely if there are problems they will manifest on windows built with
> > > msvc only.
> > > 
> > > thoughts?
> > > 

^ permalink raw reply	[flat|nested] 91+ messages in thread
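
The build behaviour described above reduces to a small sketch, assuming only
<stdatomic.h>; the struct and function names are simplified stand-ins, not the
rte_ring definitions.

#include <stdatomic.h>
#include <stdint.h>

/* Simplified stand-in for the ring head/tail pair. The plain assignment
 * in store_head_sp() compiles whether or not `head` carries the atomic
 * specifier, so it never forces the declaration to be correct; it is the
 * standard atomic generic in load_head_c11() that would be expected to
 * flag a missing specifier, and that call is only compiled when the C11
 * ring implementation is built -- which is the situation described above. */
struct headtail {
	_Atomic uint32_t head;
	_Atomic uint32_t tail;
};

static uint32_t
load_head_c11(struct headtail *ht)
{
	return atomic_load_explicit(&ht->head, memory_order_relaxed);
}

static void
store_head_sp(struct headtail *ht, uint32_t new_head)
{
	/* single-producer path: a plain assignment, legal either way */
	ht->head = new_head;
}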

* [PATCH v3 00/19] use rte optional stdatomic API
  2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
                   ` (21 preceding siblings ...)
  2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
@ 2023-10-26  0:31 ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 01/19] power: " Tyler Retzlaff
                     ` (20 more replies)
  22 siblings, 21 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

v3:
  * add missing atomic specification on head variable
    in struct rte_ring_headtail
  * adapt to use rte_atomic_xxx stdatomic API in rte_ring_c11_pvt.h
  * split comma operator statement into 2 statements

v2:
  * add #include <rte_stdatomic.h> to rte_mbuf_core.h
  * remove first two patches which were fixes that have
    been merged in another series

Tyler Retzlaff (19):
  power: use rte optional stdatomic API
  bbdev: use rte optional stdatomic API
  eal: use rte optional stdatomic API
  eventdev: use rte optional stdatomic API
  gpudev: use rte optional stdatomic API
  ipsec: use rte optional stdatomic API
  mbuf: use rte optional stdatomic API
  mempool: use rte optional stdatomic API
  rcu: use rte optional stdatomic API
  pdump: use rte optional stdatomic API
  stack: use rte optional stdatomic API
  telemetry: use rte optional stdatomic API
  vhost: use rte optional stdatomic API
  cryptodev: use rte optional stdatomic API
  distributor: use rte optional stdatomic API
  ethdev: use rte optional stdatomic API
  hash: use rte optional stdatomic API
  timer: use rte optional stdatomic API
  ring: use rte optional stdatomic API

 drivers/event/cnxk/cnxk_tim_worker.h   |   4 +-
 drivers/net/mlx5/mlx5_hws_cnt.h        |   4 +-
 lib/bbdev/rte_bbdev.c                  |   6 +-
 lib/bbdev/rte_bbdev.h                  |   2 +-
 lib/cryptodev/rte_cryptodev.c          |  22 +++---
 lib/cryptodev/rte_cryptodev.h          |  16 ++---
 lib/distributor/distributor_private.h  |   4 +-
 lib/distributor/rte_distributor.c      |  54 +++++++--------
 lib/eal/common/eal_common_launch.c     |  10 +--
 lib/eal/common/eal_common_mcfg.c       |   2 +-
 lib/eal/common/eal_common_proc.c       |  14 ++--
 lib/eal/common/eal_common_thread.c     |  26 +++----
 lib/eal/common/eal_common_trace.c      |   8 +--
 lib/eal/common/eal_common_trace_ctf.c  |   4 +-
 lib/eal/common/eal_memcfg.h            |   2 +-
 lib/eal/common/eal_private.h           |   4 +-
 lib/eal/common/eal_trace.h             |   4 +-
 lib/eal/common/rte_service.c           | 122 ++++++++++++++++-----------------
 lib/eal/freebsd/eal.c                  |  20 +++---
 lib/eal/include/rte_epoll.h            |   3 +-
 lib/eal/linux/eal.c                    |  26 +++----
 lib/eal/linux/eal_interrupts.c         |  42 ++++++------
 lib/eal/ppc/include/rte_atomic.h       |   6 +-
 lib/eal/windows/rte_thread.c           |   8 ++-
 lib/ethdev/ethdev_driver.h             |  16 ++---
 lib/ethdev/ethdev_private.c            |   6 +-
 lib/ethdev/rte_ethdev.c                |  24 +++----
 lib/ethdev/rte_ethdev.h                |  16 ++---
 lib/ethdev/rte_ethdev_core.h           |   2 +-
 lib/eventdev/rte_event_timer_adapter.c |  66 +++++++++---------
 lib/eventdev/rte_event_timer_adapter.h |   2 +-
 lib/gpudev/gpudev.c                    |   6 +-
 lib/gpudev/gpudev_driver.h             |   2 +-
 lib/hash/rte_cuckoo_hash.c             | 116 +++++++++++++++----------------
 lib/hash/rte_cuckoo_hash.h             |   6 +-
 lib/ipsec/ipsec_sqn.h                  |   2 +-
 lib/ipsec/sa.h                         |   2 +-
 lib/mbuf/rte_mbuf.h                    |  20 +++---
 lib/mbuf/rte_mbuf_core.h               |   5 +-
 lib/mempool/rte_mempool.h              |   4 +-
 lib/pdump/rte_pdump.c                  |  14 ++--
 lib/pdump/rte_pdump.h                  |   8 +--
 lib/power/power_acpi_cpufreq.c         |  33 ++++-----
 lib/power/power_cppc_cpufreq.c         |  25 +++----
 lib/power/power_pstate_cpufreq.c       |  31 +++++----
 lib/rcu/rte_rcu_qsbr.c                 |  48 ++++++-------
 lib/rcu/rte_rcu_qsbr.h                 |  68 +++++++++---------
 lib/ring/rte_ring_c11_pvt.h            |  47 +++++++------
 lib/ring/rte_ring_core.h               |  12 ++--
 lib/ring/rte_ring_generic_pvt.h        |  16 +++--
 lib/ring/rte_ring_hts_elem_pvt.h       |  22 +++---
 lib/ring/rte_ring_peek_elem_pvt.h      |   6 +-
 lib/ring/rte_ring_rts_elem_pvt.h       |  27 ++++----
 lib/stack/rte_stack.h                  |   2 +-
 lib/stack/rte_stack_lf_c11.h           |  24 +++----
 lib/stack/rte_stack_lf_generic.h       |  18 ++---
 lib/telemetry/telemetry.c              |  18 ++---
 lib/timer/rte_timer.c                  |  50 +++++++-------
 lib/timer/rte_timer.h                  |   6 +-
 lib/vhost/vdpa.c                       |   3 +-
 lib/vhost/vhost.c                      |  42 ++++++------
 lib/vhost/vhost.h                      |  39 ++++++-----
 lib/vhost/vhost_user.c                 |   6 +-
 lib/vhost/virtio_net.c                 |  58 +++++++++-------
 lib/vhost/virtio_net_ctrl.c            |   6 +-
 65 files changed, 684 insertions(+), 653 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread
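
As a rough sketch of what the first and last v3 notes above amount to; the
struct and function here are simplified stand-ins under those notes, not the
actual rte_ring code.

#include <stdint.h>
#include <rte_stdatomic.h>

/* simplified stand-in for struct rte_ring_headtail after v3: both
 * fields now carry the atomic specifier */
struct headtail {
	volatile RTE_ATOMIC(uint32_t) head;
	volatile RTE_ATOMIC(uint32_t) tail;
};

/* the single-producer head update with the comma operator split into
 * two plain statements */
static inline void
move_head_sp(struct headtail *ht, uint32_t new_head, int *success)
{
	ht->head = new_head;
	*success = 1;
}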

* [PATCH v3 01/19] power: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 02/19] bbdev: " Tyler Retzlaff
                     ` (19 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding
rte_atomic_xxx optional stdatomic API

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/power/power_acpi_cpufreq.c   | 33 +++++++++++++++++----------------
 lib/power/power_cppc_cpufreq.c   | 25 +++++++++++++------------
 lib/power/power_pstate_cpufreq.c | 31 ++++++++++++++++---------------
 3 files changed, 46 insertions(+), 43 deletions(-)

diff --git a/lib/power/power_acpi_cpufreq.c b/lib/power/power_acpi_cpufreq.c
index 6e57aca..8b55f19 100644
--- a/lib/power/power_acpi_cpufreq.c
+++ b/lib/power/power_acpi_cpufreq.c
@@ -7,6 +7,7 @@
 #include <stdlib.h>
 
 #include <rte_memcpy.h>
+#include <rte_stdatomic.h>
 #include <rte_string_fns.h>
 
 #include "power_acpi_cpufreq.h"
@@ -41,13 +42,13 @@ enum power_state {
  * Power info per lcore.
  */
 struct acpi_power_info {
-	unsigned int lcore_id;                   /**< Logical core id */
+	unsigned int lcore_id;               /**< Logical core id */
 	uint32_t freqs[RTE_MAX_LCORE_FREQS]; /**< Frequency array */
 	uint32_t nb_freqs;                   /**< number of available freqs */
 	FILE *f;                             /**< FD of scaling_setspeed */
 	char governor_ori[32];               /**< Original governor name */
 	uint32_t curr_idx;                   /**< Freq index in freqs array */
-	uint32_t state;                      /**< Power in use state */
+	RTE_ATOMIC(uint32_t) state;          /**< Power in use state */
 	uint16_t turbo_available;            /**< Turbo Boost available */
 	uint16_t turbo_enable;               /**< Turbo Boost enable/disable */
 } __rte_cache_aligned;
@@ -249,9 +250,9 @@ struct acpi_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"in use\n", lcore_id);
 		return -1;
@@ -289,15 +290,15 @@ struct acpi_power_info {
 	RTE_LOG(INFO, POWER, "Initialized successfully for lcore %u "
 			"power management\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_USED,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_USED,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
@@ -321,9 +322,9 @@ struct acpi_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"not used\n", lcore_id);
 		return -1;
@@ -344,15 +345,15 @@ struct acpi_power_info {
 			"'userspace' mode and been set back to the "
 			"original\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_IDLE,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_IDLE,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
diff --git a/lib/power/power_cppc_cpufreq.c b/lib/power/power_cppc_cpufreq.c
index fc9cffe..bb70f6a 100644
--- a/lib/power/power_cppc_cpufreq.c
+++ b/lib/power/power_cppc_cpufreq.c
@@ -6,6 +6,7 @@
 #include <stdlib.h>
 
 #include <rte_memcpy.h>
+#include <rte_stdatomic.h>
 
 #include "power_cppc_cpufreq.h"
 #include "power_common.h"
@@ -49,8 +50,8 @@ enum power_state {
  * Power info per lcore.
  */
 struct cppc_power_info {
-	unsigned int lcore_id;                   /**< Logical core id */
-	uint32_t state;                      /**< Power in use state */
+	unsigned int lcore_id;               /**< Logical core id */
+	RTE_ATOMIC(uint32_t) state;          /**< Power in use state */
 	FILE *f;                             /**< FD of scaling_setspeed */
 	char governor_ori[32];               /**< Original governor name */
 	uint32_t curr_idx;                   /**< Freq index in freqs array */
@@ -353,9 +354,9 @@ struct cppc_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"in use\n", lcore_id);
 		return -1;
@@ -393,12 +394,12 @@ struct cppc_power_info {
 	RTE_LOG(INFO, POWER, "Initialized successfully for lcore %u "
 			"power management\n", lcore_id);
 
-	__atomic_store_n(&(pi->state), POWER_USED, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_USED, rte_memory_order_release);
 
 	return 0;
 
 fail:
-	__atomic_store_n(&(pi->state), POWER_UNKNOWN, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_UNKNOWN, rte_memory_order_release);
 	return -1;
 }
 
@@ -431,9 +432,9 @@ struct cppc_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"not used\n", lcore_id);
 		return -1;
@@ -453,12 +454,12 @@ struct cppc_power_info {
 	RTE_LOG(INFO, POWER, "Power management of lcore %u has exited from "
 			"'userspace' mode and been set back to the "
 			"original\n", lcore_id);
-	__atomic_store_n(&(pi->state), POWER_IDLE, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_IDLE, rte_memory_order_release);
 
 	return 0;
 
 fail:
-	__atomic_store_n(&(pi->state), POWER_UNKNOWN, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(pi->state), POWER_UNKNOWN, rte_memory_order_release);
 
 	return -1;
 }
diff --git a/lib/power/power_pstate_cpufreq.c b/lib/power/power_pstate_cpufreq.c
index 52aa645..5ca5f60 100644
--- a/lib/power/power_pstate_cpufreq.c
+++ b/lib/power/power_pstate_cpufreq.c
@@ -12,6 +12,7 @@
 #include <inttypes.h>
 
 #include <rte_memcpy.h>
+#include <rte_stdatomic.h>
 
 #include "rte_power_pmd_mgmt.h"
 #include "power_pstate_cpufreq.h"
@@ -59,7 +60,7 @@ struct pstate_power_info {
 	uint32_t non_turbo_max_ratio;        /**< Non Turbo Max ratio  */
 	uint32_t sys_max_freq;               /**< system wide max freq  */
 	uint32_t core_base_freq;             /**< core base freq  */
-	uint32_t state;                      /**< Power in use state */
+	RTE_ATOMIC(uint32_t) state;          /**< Power in use state */
 	uint16_t turbo_available;            /**< Turbo Boost available */
 	uint16_t turbo_enable;               /**< Turbo Boost enable/disable */
 	uint16_t priority_core;              /**< High Performance core */
@@ -555,9 +556,9 @@ struct pstate_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are done under the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"in use\n", lcore_id);
 		return -1;
@@ -600,15 +601,15 @@ struct pstate_power_info {
 	RTE_LOG(INFO, POWER, "Initialized successfully for lcore %u "
 			"power management\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_USED,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_USED,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
@@ -633,9 +634,9 @@ struct pstate_power_info {
 	 * ordering below as lock to make sure the frequency operations
 	 * in the critical section are under done the correct state.
 	 */
-	if (!__atomic_compare_exchange_n(&(pi->state), &exp_state,
-					POWER_ONGOING, 0,
-					__ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state,
+					POWER_ONGOING,
+					rte_memory_order_acquire, rte_memory_order_relaxed)) {
 		RTE_LOG(INFO, POWER, "Power management of lcore %u is "
 				"not used\n", lcore_id);
 		return -1;
@@ -658,15 +659,15 @@ struct pstate_power_info {
 			"'performance' mode and been set back to the "
 			"original\n", lcore_id);
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_IDLE,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_IDLE,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return 0;
 
 fail:
 	exp_state = POWER_ONGOING;
-	__atomic_compare_exchange_n(&(pi->state), &exp_state, POWER_UNKNOWN,
-				    0, __ATOMIC_RELEASE, __ATOMIC_RELAXED);
+	rte_atomic_compare_exchange_strong_explicit(&(pi->state), &exp_state, POWER_UNKNOWN,
+				    rte_memory_order_release, rte_memory_order_relaxed);
 
 	return -1;
 }
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread
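
The conversion applied throughout the power patch above reduces to the sketch below: mark the guard word with RTE_ATOMIC(), take ownership with an acquire compare-exchange, and publish the final state with a release store. The demo_* names are illustrative only; RTE_ATOMIC(), rte_atomic_compare_exchange_strong_explicit() and rte_atomic_store_explicit() are the rte_stdatomic.h wrappers used in the diff (a minimal sketch, not the in-tree code).

#include <stdint.h>
#include <rte_stdatomic.h>

#define DEMO_POWER_IDLE    0
#define DEMO_POWER_ONGOING 1
#define DEMO_POWER_USED    2

struct demo_power_info {
	RTE_ATOMIC(uint32_t) state; /* guard word, like the converted state field */
};

static int
demo_power_init(struct demo_power_info *pi)
{
	uint32_t exp = DEMO_POWER_IDLE;

	/* acquire on success keeps the setup below from being reordered
	 * before the state transition; relaxed on failure.
	 */
	if (!rte_atomic_compare_exchange_strong_explicit(&pi->state, &exp,
			DEMO_POWER_ONGOING,
			rte_memory_order_acquire, rte_memory_order_relaxed))
		return -1; /* another thread owns this lcore's power state */

	/* ... cpufreq setup would go here ... */

	/* release store publishes the setup done above */
	rte_atomic_store_explicit(&pi->state, DEMO_POWER_USED,
			rte_memory_order_release);
	return 0;
}

Note the old fourth argument of __atomic_compare_exchange_n (the weak flag) simply disappears: the strong_explicit wrapper is always a strong CAS.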

* [PATCH v3 02/19] bbdev: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 01/19] power: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26 11:57     ` Maxime Coquelin
  2023-10-26  0:31   ` [PATCH v3 03/19] eal: " Tyler Retzlaff
                     ` (18 subsequent siblings)
  20 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding
rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/bbdev/rte_bbdev.c | 6 +++---
 lib/bbdev/rte_bbdev.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/bbdev/rte_bbdev.c b/lib/bbdev/rte_bbdev.c
index 155323e..cfebea0 100644
--- a/lib/bbdev/rte_bbdev.c
+++ b/lib/bbdev/rte_bbdev.c
@@ -208,7 +208,7 @@ struct rte_bbdev *
 		return NULL;
 	}
 
-	__atomic_fetch_add(&bbdev->data->process_cnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&bbdev->data->process_cnt, 1, rte_memory_order_relaxed);
 	bbdev->data->dev_id = dev_id;
 	bbdev->state = RTE_BBDEV_INITIALIZED;
 
@@ -250,8 +250,8 @@ struct rte_bbdev *
 	}
 
 	/* clear shared BBDev Data if no process is using the device anymore */
-	if (__atomic_fetch_sub(&bbdev->data->process_cnt, 1,
-			      __ATOMIC_RELAXED) - 1 == 0)
+	if (rte_atomic_fetch_sub_explicit(&bbdev->data->process_cnt, 1,
+			      rte_memory_order_relaxed) - 1 == 0)
 		memset(bbdev->data, 0, sizeof(*bbdev->data));
 
 	memset(bbdev, 0, sizeof(*bbdev));
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index d12e2e7..e1aee08 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -482,7 +482,7 @@ struct rte_bbdev_data {
 	uint16_t dev_id;  /**< Device ID */
 	int socket_id;  /**< NUMA socket that device is on */
 	bool started;  /**< Device run-time state */
-	uint16_t process_cnt;  /** Counter of processes using the device */
+	RTE_ATOMIC(uint16_t) process_cnt;  /** Counter of processes using the device */
 };
 
 /* Forward declarations */
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread
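
The bbdev change is the reference-count flavor of the same conversion: process_cnt becomes RTE_ATOMIC(uint16_t) and the relaxed fetch_add/fetch_sub builtins map one-to-one onto the rte_ wrappers. Below is a minimal sketch of that lifecycle; the demo_* names are illustrative, only the rte_atomic_*_explicit() calls are real API.

#include <stdint.h>
#include <string.h>
#include <rte_stdatomic.h>

struct demo_dev_data {
	RTE_ATOMIC(uint16_t) process_cnt; /* processes attached to the device */
	/* ... other shared fields ... */
};

static void
demo_dev_attach(struct demo_dev_data *d)
{
	/* pure counter update: atomicity is enough, so relaxed order */
	rte_atomic_fetch_add_explicit(&d->process_cnt, 1,
			rte_memory_order_relaxed);
}

static void
demo_dev_detach(struct demo_dev_data *d)
{
	/* fetch_sub returns the pre-decrement value, so "- 1 == 0" detects
	 * the last detaching process, matching the test in the patch.
	 */
	if (rte_atomic_fetch_sub_explicit(&d->process_cnt, 1,
			rte_memory_order_relaxed) - 1 == 0)
		memset(d, 0, sizeof(*d));
}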

* [PATCH v3 03/19] eal: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 01/19] power: " Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 02/19] bbdev: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 04/19] eventdev: " Tyler Retzlaff
                     ` (17 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding
rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/common/eal_common_launch.c    |  10 +--
 lib/eal/common/eal_common_mcfg.c      |   2 +-
 lib/eal/common/eal_common_proc.c      |  14 ++--
 lib/eal/common/eal_common_thread.c    |  26 ++++----
 lib/eal/common/eal_common_trace.c     |   8 +--
 lib/eal/common/eal_common_trace_ctf.c |   4 +-
 lib/eal/common/eal_memcfg.h           |   2 +-
 lib/eal/common/eal_private.h          |   4 +-
 lib/eal/common/eal_trace.h            |   4 +-
 lib/eal/common/rte_service.c          | 122 +++++++++++++++++-----------------
 lib/eal/freebsd/eal.c                 |  20 +++---
 lib/eal/include/rte_epoll.h           |   3 +-
 lib/eal/linux/eal.c                   |  26 ++++----
 lib/eal/linux/eal_interrupts.c        |  42 ++++++------
 lib/eal/ppc/include/rte_atomic.h      |   6 +-
 lib/eal/windows/rte_thread.c          |   8 ++-
 16 files changed, 152 insertions(+), 149 deletions(-)

diff --git a/lib/eal/common/eal_common_launch.c b/lib/eal/common/eal_common_launch.c
index 0504598..5320c3b 100644
--- a/lib/eal/common/eal_common_launch.c
+++ b/lib/eal/common/eal_common_launch.c
@@ -18,8 +18,8 @@
 int
 rte_eal_wait_lcore(unsigned worker_id)
 {
-	while (__atomic_load_n(&lcore_config[worker_id].state,
-			__ATOMIC_ACQUIRE) != WAIT)
+	while (rte_atomic_load_explicit(&lcore_config[worker_id].state,
+			rte_memory_order_acquire) != WAIT)
 		rte_pause();
 
 	return lcore_config[worker_id].ret;
@@ -38,8 +38,8 @@
 	/* Check if the worker is in 'WAIT' state. Use acquire order
 	 * since 'state' variable is used as the guard variable.
 	 */
-	if (__atomic_load_n(&lcore_config[worker_id].state,
-			__ATOMIC_ACQUIRE) != WAIT)
+	if (rte_atomic_load_explicit(&lcore_config[worker_id].state,
+			rte_memory_order_acquire) != WAIT)
 		goto finish;
 
 	lcore_config[worker_id].arg = arg;
@@ -47,7 +47,7 @@
 	 * before the worker thread starts running the function.
 	 * Use worker thread function as the guard variable.
 	 */
-	__atomic_store_n(&lcore_config[worker_id].f, f, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&lcore_config[worker_id].f, f, rte_memory_order_release);
 
 	rc = eal_thread_wake_worker(worker_id);
 
diff --git a/lib/eal/common/eal_common_mcfg.c b/lib/eal/common/eal_common_mcfg.c
index 2a785e7..dabb80e 100644
--- a/lib/eal/common/eal_common_mcfg.c
+++ b/lib/eal/common/eal_common_mcfg.c
@@ -30,7 +30,7 @@
 	struct rte_mem_config *mcfg = rte_eal_get_configuration()->mem_config;
 
 	/* wait until shared mem_config finish initialising */
-	rte_wait_until_equal_32(&mcfg->magic, RTE_MAGIC, __ATOMIC_RELAXED);
+	rte_wait_until_equal_32(&mcfg->magic, RTE_MAGIC, rte_memory_order_relaxed);
 }
 
 int
diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index f20a348..728815c 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -33,7 +33,7 @@
 #include "eal_filesystem.h"
 #include "eal_internal_cfg.h"
 
-static int mp_fd = -1;
+static RTE_ATOMIC(int) mp_fd = -1;
 static rte_thread_t mp_handle_tid;
 static char mp_filter[PATH_MAX];   /* Filter for secondary process sockets */
 static char mp_dir_path[PATH_MAX]; /* The directory path for all mp sockets */
@@ -404,7 +404,7 @@ struct pending_request {
 	struct sockaddr_un sa;
 	int fd;
 
-	while ((fd = __atomic_load_n(&mp_fd, __ATOMIC_RELAXED)) >= 0) {
+	while ((fd = rte_atomic_load_explicit(&mp_fd, rte_memory_order_relaxed)) >= 0) {
 		int ret;
 
 		ret = read_msg(fd, &msg, &sa);
@@ -652,7 +652,7 @@ enum async_action {
 		RTE_LOG(ERR, EAL, "failed to create mp thread: %s\n",
 			strerror(errno));
 		close(dir_fd);
-		close(__atomic_exchange_n(&mp_fd, -1, __ATOMIC_RELAXED));
+		close(rte_atomic_exchange_explicit(&mp_fd, -1, rte_memory_order_relaxed));
 		return -1;
 	}
 
@@ -668,7 +668,7 @@ enum async_action {
 {
 	int fd;
 
-	fd = __atomic_exchange_n(&mp_fd, -1, __ATOMIC_RELAXED);
+	fd = rte_atomic_exchange_explicit(&mp_fd, -1, rte_memory_order_relaxed);
 	if (fd < 0)
 		return;
 
@@ -1282,11 +1282,11 @@ enum mp_status {
 
 	expected = MP_STATUS_UNKNOWN;
 	desired = status;
-	if (__atomic_compare_exchange_n(&mcfg->mp_status, &expected, desired,
-			false, __ATOMIC_RELAXED, __ATOMIC_RELAXED))
+	if (rte_atomic_compare_exchange_strong_explicit(&mcfg->mp_status, &expected, desired,
+			rte_memory_order_relaxed, rte_memory_order_relaxed))
 		return true;
 
-	return __atomic_load_n(&mcfg->mp_status, __ATOMIC_RELAXED) == desired;
+	return rte_atomic_load_explicit(&mcfg->mp_status, rte_memory_order_relaxed) == desired;
 }
 
 bool
diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
index 668b1ed..c422ea8 100644
--- a/lib/eal/common/eal_common_thread.c
+++ b/lib/eal/common/eal_common_thread.c
@@ -191,8 +191,8 @@ unsigned rte_socket_id(void)
 		/* Set the state to 'RUNNING'. Use release order
 		 * since 'state' variable is used as the guard variable.
 		 */
-		__atomic_store_n(&lcore_config[lcore_id].state, RUNNING,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&lcore_config[lcore_id].state, RUNNING,
+			rte_memory_order_release);
 
 		eal_thread_ack_command();
 
@@ -201,8 +201,8 @@ unsigned rte_socket_id(void)
 		 * are accessed only after update to 'f' is visible.
 		 * Wait till the update to 'f' is visible to the worker.
 		 */
-		while ((f = __atomic_load_n(&lcore_config[lcore_id].f,
-				__ATOMIC_ACQUIRE)) == NULL)
+		while ((f = rte_atomic_load_explicit(&lcore_config[lcore_id].f,
+				rte_memory_order_acquire)) == NULL)
 			rte_pause();
 
 		rte_eal_trace_thread_lcore_running(lcore_id, f);
@@ -219,8 +219,8 @@ unsigned rte_socket_id(void)
 		 * are completed before the state is updated.
 		 * Use 'state' as the guard variable.
 		 */
-		__atomic_store_n(&lcore_config[lcore_id].state, WAIT,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&lcore_config[lcore_id].state, WAIT,
+			rte_memory_order_release);
 
 		rte_eal_trace_thread_lcore_stopped(lcore_id);
 	}
@@ -242,7 +242,7 @@ struct control_thread_params {
 	/* Control thread status.
 	 * If the status is CTRL_THREAD_ERROR, 'ret' has the error code.
 	 */
-	enum __rte_ctrl_thread_status status;
+	RTE_ATOMIC(enum __rte_ctrl_thread_status) status;
 };
 
 static int control_thread_init(void *arg)
@@ -259,13 +259,13 @@ static int control_thread_init(void *arg)
 	RTE_PER_LCORE(_socket_id) = SOCKET_ID_ANY;
 	params->ret = rte_thread_set_affinity_by_id(rte_thread_self(), cpuset);
 	if (params->ret != 0) {
-		__atomic_store_n(&params->status,
-			CTRL_THREAD_ERROR, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&params->status,
+			CTRL_THREAD_ERROR, rte_memory_order_release);
 		return 1;
 	}
 
-	__atomic_store_n(&params->status,
-		CTRL_THREAD_RUNNING, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&params->status,
+		CTRL_THREAD_RUNNING, rte_memory_order_release);
 
 	return 0;
 }
@@ -310,8 +310,8 @@ static uint32_t control_thread_start(void *arg)
 
 	/* Wait for the control thread to initialize successfully */
 	while ((ctrl_thread_status =
-			__atomic_load_n(&params->status,
-			__ATOMIC_ACQUIRE)) == CTRL_THREAD_LAUNCHING) {
+			rte_atomic_load_explicit(&params->status,
+			rte_memory_order_acquire)) == CTRL_THREAD_LAUNCHING) {
 		rte_delay_us_sleep(1);
 	}
 
diff --git a/lib/eal/common/eal_common_trace.c b/lib/eal/common/eal_common_trace.c
index d2eac2d..6ad87fc 100644
--- a/lib/eal/common/eal_common_trace.c
+++ b/lib/eal/common/eal_common_trace.c
@@ -97,7 +97,7 @@ struct trace_point_head *
 bool
 rte_trace_is_enabled(void)
 {
-	return __atomic_load_n(&trace.status, __ATOMIC_ACQUIRE) != 0;
+	return rte_atomic_load_explicit(&trace.status, rte_memory_order_acquire) != 0;
 }
 
 static void
@@ -157,7 +157,7 @@ rte_trace_mode rte_trace_mode_get(void)
 	prev = rte_atomic_fetch_or_explicit(t, __RTE_TRACE_FIELD_ENABLE_MASK,
 		rte_memory_order_release);
 	if ((prev & __RTE_TRACE_FIELD_ENABLE_MASK) == 0)
-		__atomic_fetch_add(&trace.status, 1, __ATOMIC_RELEASE);
+		rte_atomic_fetch_add_explicit(&trace.status, 1, rte_memory_order_release);
 	return 0;
 }
 
@@ -172,7 +172,7 @@ rte_trace_mode rte_trace_mode_get(void)
 	prev = rte_atomic_fetch_and_explicit(t, ~__RTE_TRACE_FIELD_ENABLE_MASK,
 		rte_memory_order_release);
 	if ((prev & __RTE_TRACE_FIELD_ENABLE_MASK) != 0)
-		__atomic_fetch_sub(&trace.status, 1, __ATOMIC_RELEASE);
+		rte_atomic_fetch_sub_explicit(&trace.status, 1, rte_memory_order_release);
 	return 0;
 }
 
@@ -526,7 +526,7 @@ rte_trace_mode rte_trace_mode_get(void)
 
 	/* Add the trace point at tail */
 	STAILQ_INSERT_TAIL(&tp_list, tp, next);
-	__atomic_thread_fence(__ATOMIC_RELEASE);
+	__atomic_thread_fence(rte_memory_order_release);
 
 	/* All Good !!! */
 	return 0;
diff --git a/lib/eal/common/eal_common_trace_ctf.c b/lib/eal/common/eal_common_trace_ctf.c
index c6775c3..04c4f71 100644
--- a/lib/eal/common/eal_common_trace_ctf.c
+++ b/lib/eal/common/eal_common_trace_ctf.c
@@ -361,10 +361,10 @@
 	if (ctf_meta == NULL)
 		return -EINVAL;
 
-	if (!__atomic_load_n(&trace->ctf_fixup_done, __ATOMIC_SEQ_CST) &&
+	if (!rte_atomic_load_explicit(&trace->ctf_fixup_done, rte_memory_order_seq_cst) &&
 				rte_get_timer_hz()) {
 		meta_fixup(trace, ctf_meta);
-		__atomic_store_n(&trace->ctf_fixup_done, 1, __ATOMIC_SEQ_CST);
+		rte_atomic_store_explicit(&trace->ctf_fixup_done, 1, rte_memory_order_seq_cst);
 	}
 
 	rc = fprintf(f, "%s", ctf_meta);
diff --git a/lib/eal/common/eal_memcfg.h b/lib/eal/common/eal_memcfg.h
index d5c63e2..60e2089 100644
--- a/lib/eal/common/eal_memcfg.h
+++ b/lib/eal/common/eal_memcfg.h
@@ -42,7 +42,7 @@ struct rte_mem_config {
 	rte_rwlock_t memory_hotplug_lock;
 	/**< Indicates whether memory hotplug request is in progress. */
 
-	uint8_t mp_status; /**< Multiprocess status. */
+	RTE_ATOMIC(uint8_t) mp_status; /**< Multiprocess status. */
 
 	/* memory segments and zones */
 	struct rte_fbarray memzones; /**< Memzone descriptors. */
diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
index ebd496b..4d2e806 100644
--- a/lib/eal/common/eal_private.h
+++ b/lib/eal/common/eal_private.h
@@ -24,11 +24,11 @@ struct lcore_config {
 	int pipe_main2worker[2];   /**< communication pipe with main */
 	int pipe_worker2main[2];   /**< communication pipe with main */
 
-	lcore_function_t * volatile f; /**< function to call */
+	RTE_ATOMIC(lcore_function_t *) volatile f; /**< function to call */
 	void * volatile arg;       /**< argument of function */
 	volatile int ret;          /**< return value of function */
 
-	volatile enum rte_lcore_state_t state; /**< lcore state */
+	volatile RTE_ATOMIC(enum rte_lcore_state_t) state; /**< lcore state */
 	unsigned int socket_id;    /**< physical socket id for this lcore */
 	unsigned int core_id;      /**< core number on socket for this lcore */
 	int core_index;            /**< relative index, starting from 0 */
diff --git a/lib/eal/common/eal_trace.h b/lib/eal/common/eal_trace.h
index d66bcfe..ace2ef3 100644
--- a/lib/eal/common/eal_trace.h
+++ b/lib/eal/common/eal_trace.h
@@ -50,7 +50,7 @@ struct trace_arg {
 struct trace {
 	char *dir;
 	int register_errno;
-	uint32_t status;
+	RTE_ATOMIC(uint32_t) status;
 	enum rte_trace_mode mode;
 	rte_uuid_t uuid;
 	uint32_t buff_len;
@@ -65,7 +65,7 @@ struct trace {
 	uint32_t ctf_meta_offset_freq;
 	uint32_t ctf_meta_offset_freq_off_s;
 	uint32_t ctf_meta_offset_freq_off;
-	uint16_t ctf_fixup_done;
+	RTE_ATOMIC(uint16_t) ctf_fixup_done;
 	rte_spinlock_t lock;
 };
 
diff --git a/lib/eal/common/rte_service.c b/lib/eal/common/rte_service.c
index 098a821..e183d2e 100644
--- a/lib/eal/common/rte_service.c
+++ b/lib/eal/common/rte_service.c
@@ -43,8 +43,8 @@ struct rte_service_spec_impl {
 	rte_spinlock_t execute_lock;
 
 	/* API set/get-able variables */
-	int8_t app_runstate;
-	int8_t comp_runstate;
+	RTE_ATOMIC(int8_t) app_runstate;
+	RTE_ATOMIC(int8_t) comp_runstate;
 	uint8_t internal_flags;
 
 	/* per service statistics */
@@ -52,24 +52,24 @@ struct rte_service_spec_impl {
 	 * It does not indicate the number of cores the service is running
 	 * on currently.
 	 */
-	uint32_t num_mapped_cores;
+	RTE_ATOMIC(uint32_t) num_mapped_cores;
 } __rte_cache_aligned;
 
 struct service_stats {
-	uint64_t calls;
-	uint64_t cycles;
+	RTE_ATOMIC(uint64_t) calls;
+	RTE_ATOMIC(uint64_t) cycles;
 };
 
 /* the internal values of a service core */
 struct core_state {
 	/* map of services IDs are run on this core */
 	uint64_t service_mask;
-	uint8_t runstate; /* running or stopped */
-	uint8_t thread_active; /* indicates when thread is in service_run() */
+	RTE_ATOMIC(uint8_t) runstate; /* running or stopped */
+	RTE_ATOMIC(uint8_t) thread_active; /* indicates when thread is in service_run() */
 	uint8_t is_service_core; /* set if core is currently a service core */
 	uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
-	uint64_t loops;
-	uint64_t cycles;
+	RTE_ATOMIC(uint64_t) loops;
+	RTE_ATOMIC(uint64_t) cycles;
 	struct service_stats service_stats[RTE_SERVICE_NUM_MAX];
 } __rte_cache_aligned;
 
@@ -314,11 +314,11 @@ struct core_state {
 	 * service_run and service_runstate_get function.
 	 */
 	if (runstate)
-		__atomic_store_n(&s->comp_runstate, RUNSTATE_RUNNING,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->comp_runstate, RUNSTATE_RUNNING,
+			rte_memory_order_release);
 	else
-		__atomic_store_n(&s->comp_runstate, RUNSTATE_STOPPED,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->comp_runstate, RUNSTATE_STOPPED,
+			rte_memory_order_release);
 
 	return 0;
 }
@@ -334,11 +334,11 @@ struct core_state {
 	 * service_run runstate_get function.
 	 */
 	if (runstate)
-		__atomic_store_n(&s->app_runstate, RUNSTATE_RUNNING,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->app_runstate, RUNSTATE_RUNNING,
+			rte_memory_order_release);
 	else
-		__atomic_store_n(&s->app_runstate, RUNSTATE_STOPPED,
-			__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&s->app_runstate, RUNSTATE_STOPPED,
+			rte_memory_order_release);
 
 	rte_eal_trace_service_runstate_set(id, runstate);
 	return 0;
@@ -354,14 +354,14 @@ struct core_state {
 	 * Use load-acquire memory order. This synchronizes with
 	 * store-release in service state set functions.
 	 */
-	if (__atomic_load_n(&s->comp_runstate, __ATOMIC_ACQUIRE) ==
+	if (rte_atomic_load_explicit(&s->comp_runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING &&
-	    __atomic_load_n(&s->app_runstate, __ATOMIC_ACQUIRE) ==
+	    rte_atomic_load_explicit(&s->app_runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING) {
 		int check_disabled = !(s->internal_flags &
 			SERVICE_F_START_CHECK);
-		int lcore_mapped = (__atomic_load_n(&s->num_mapped_cores,
-			__ATOMIC_RELAXED) > 0);
+		int lcore_mapped = (rte_atomic_load_explicit(&s->num_mapped_cores,
+			rte_memory_order_relaxed) > 0);
 
 		return (check_disabled | lcore_mapped);
 	} else
@@ -392,15 +392,15 @@ struct core_state {
 			uint64_t end = rte_rdtsc();
 			uint64_t cycles = end - start;
 
-			__atomic_store_n(&cs->cycles, cs->cycles + cycles,
-				__ATOMIC_RELAXED);
-			__atomic_store_n(&service_stats->cycles,
+			rte_atomic_store_explicit(&cs->cycles, cs->cycles + cycles,
+				rte_memory_order_relaxed);
+			rte_atomic_store_explicit(&service_stats->cycles,
 				service_stats->cycles + cycles,
-				__ATOMIC_RELAXED);
+				rte_memory_order_relaxed);
 		}
 
-		__atomic_store_n(&service_stats->calls,
-			service_stats->calls + 1, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&service_stats->calls,
+			service_stats->calls + 1, rte_memory_order_relaxed);
 	} else {
 		s->spec.callback(userdata);
 	}
@@ -420,9 +420,9 @@ struct core_state {
 	 * Use load-acquire memory order. This synchronizes with
 	 * store-release in service state set functions.
 	 */
-	if (__atomic_load_n(&s->comp_runstate, __ATOMIC_ACQUIRE) !=
+	if (rte_atomic_load_explicit(&s->comp_runstate, rte_memory_order_acquire) !=
 			RUNSTATE_RUNNING ||
-	    __atomic_load_n(&s->app_runstate, __ATOMIC_ACQUIRE) !=
+	    rte_atomic_load_explicit(&s->app_runstate, rte_memory_order_acquire) !=
 			RUNSTATE_RUNNING ||
 	    !(service_mask & (UINT64_C(1) << i))) {
 		cs->service_active_on_lcore[i] = 0;
@@ -472,11 +472,11 @@ struct core_state {
 	/* Increment num_mapped_cores to reflect that this core is
 	 * now mapped capable of running the service.
 	 */
-	__atomic_fetch_add(&s->num_mapped_cores, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&s->num_mapped_cores, 1, rte_memory_order_relaxed);
 
 	int ret = service_run(id, cs, UINT64_MAX, s, serialize_mt_unsafe);
 
-	__atomic_fetch_sub(&s->num_mapped_cores, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_sub_explicit(&s->num_mapped_cores, 1, rte_memory_order_relaxed);
 
 	return ret;
 }
@@ -489,13 +489,13 @@ struct core_state {
 	const int lcore = rte_lcore_id();
 	struct core_state *cs = &lcore_states[lcore];
 
-	__atomic_store_n(&cs->thread_active, 1, __ATOMIC_SEQ_CST);
+	rte_atomic_store_explicit(&cs->thread_active, 1, rte_memory_order_seq_cst);
 
 	/* runstate act as the guard variable. Use load-acquire
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	while (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) ==
+	while (rte_atomic_load_explicit(&cs->runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING) {
 
 		const uint64_t service_mask = cs->service_mask;
@@ -513,7 +513,7 @@ struct core_state {
 			service_run(i, cs, service_mask, service_get(i), 1);
 		}
 
-		__atomic_store_n(&cs->loops, cs->loops + 1, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&cs->loops, cs->loops + 1, rte_memory_order_relaxed);
 	}
 
 	/* Switch off this core for all services, to ensure that future
@@ -526,7 +526,7 @@ struct core_state {
 	 * this store, ensuring that once this store is visible, the service
 	 * lcore thread really is done in service cores code.
 	 */
-	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_SEQ_CST);
+	rte_atomic_store_explicit(&cs->thread_active, 0, rte_memory_order_seq_cst);
 	return 0;
 }
 
@@ -539,8 +539,8 @@ struct core_state {
 	/* Load thread_active using ACQUIRE to avoid instructions dependent on
 	 * the result being re-ordered before this load completes.
 	 */
-	return __atomic_load_n(&lcore_states[lcore].thread_active,
-			       __ATOMIC_ACQUIRE);
+	return rte_atomic_load_explicit(&lcore_states[lcore].thread_active,
+			       rte_memory_order_acquire);
 }
 
 int32_t
@@ -646,13 +646,13 @@ struct core_state {
 
 		if (*set && !lcore_mapped) {
 			lcore_states[lcore].service_mask |= sid_mask;
-			__atomic_fetch_add(&rte_services[sid].num_mapped_cores,
-				1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&rte_services[sid].num_mapped_cores,
+				1, rte_memory_order_relaxed);
 		}
 		if (!*set && lcore_mapped) {
 			lcore_states[lcore].service_mask &= ~(sid_mask);
-			__atomic_fetch_sub(&rte_services[sid].num_mapped_cores,
-				1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_sub_explicit(&rte_services[sid].num_mapped_cores,
+				1, rte_memory_order_relaxed);
 		}
 	}
 
@@ -709,13 +709,13 @@ struct core_state {
 			 * store-release memory order here to synchronize
 			 * with load-acquire in runstate read functions.
 			 */
-			__atomic_store_n(&lcore_states[i].runstate,
-				RUNSTATE_STOPPED, __ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&lcore_states[i].runstate,
+				RUNSTATE_STOPPED, rte_memory_order_release);
 		}
 	}
 	for (i = 0; i < RTE_SERVICE_NUM_MAX; i++)
-		__atomic_store_n(&rte_services[i].num_mapped_cores, 0,
-			__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&rte_services[i].num_mapped_cores, 0,
+			rte_memory_order_relaxed);
 
 	return 0;
 }
@@ -735,8 +735,8 @@ struct core_state {
 	/* Use store-release memory order here to synchronize with
 	 * load-acquire in runstate read functions.
 	 */
-	__atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
-		__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
+		rte_memory_order_release);
 
 	return rte_eal_wait_lcore(lcore);
 }
@@ -755,7 +755,7 @@ struct core_state {
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	if (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) !=
+	if (rte_atomic_load_explicit(&cs->runstate, rte_memory_order_acquire) !=
 			RUNSTATE_STOPPED)
 		return -EBUSY;
 
@@ -779,7 +779,7 @@ struct core_state {
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	if (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) ==
+	if (rte_atomic_load_explicit(&cs->runstate, rte_memory_order_acquire) ==
 			RUNSTATE_RUNNING)
 		return -EALREADY;
 
@@ -789,7 +789,7 @@ struct core_state {
 	/* Use load-acquire memory order here to synchronize with
 	 * store-release in runstate update functions.
 	 */
-	__atomic_store_n(&cs->runstate, RUNSTATE_RUNNING, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&cs->runstate, RUNSTATE_RUNNING, rte_memory_order_release);
 
 	rte_eal_trace_service_lcore_start(lcore);
 
@@ -808,7 +808,7 @@ struct core_state {
 	 * memory order here to synchronize with store-release
 	 * in runstate update functions.
 	 */
-	if (__atomic_load_n(&lcore_states[lcore].runstate, __ATOMIC_ACQUIRE) ==
+	if (rte_atomic_load_explicit(&lcore_states[lcore].runstate, rte_memory_order_acquire) ==
 			RUNSTATE_STOPPED)
 		return -EALREADY;
 
@@ -820,8 +820,8 @@ struct core_state {
 		int32_t enabled = service_mask & (UINT64_C(1) << i);
 		int32_t service_running = rte_service_runstate_get(i);
 		int32_t only_core = (1 ==
-			__atomic_load_n(&rte_services[i].num_mapped_cores,
-				__ATOMIC_RELAXED));
+			rte_atomic_load_explicit(&rte_services[i].num_mapped_cores,
+				rte_memory_order_relaxed));
 
 		/* if the core is mapped, and the service is running, and this
 		 * is the only core that is mapped, the service would cease to
@@ -834,8 +834,8 @@ struct core_state {
 	/* Use store-release memory order here to synchronize with
 	 * load-acquire in runstate read functions.
 	 */
-	__atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
-		__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
+		rte_memory_order_release);
 
 	rte_eal_trace_service_lcore_stop(lcore);
 
@@ -847,7 +847,7 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->loops, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->loops, rte_memory_order_relaxed);
 }
 
 static uint64_t
@@ -855,7 +855,7 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->cycles, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->cycles, rte_memory_order_relaxed);
 }
 
 static uint64_t
@@ -863,8 +863,8 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->service_stats[service_id].calls,
-		__ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].calls,
+		rte_memory_order_relaxed);
 }
 
 static uint64_t
@@ -872,8 +872,8 @@ struct core_state {
 {
 	struct core_state *cs = &lcore_states[lcore];
 
-	return __atomic_load_n(&cs->service_stats[service_id].cycles,
-		__ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&cs->service_stats[service_id].cycles,
+		rte_memory_order_relaxed);
 }
 
 typedef uint64_t (*lcore_attr_get_fun)(uint32_t service_id,
diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
index 39a2868..568e06e 100644
--- a/lib/eal/freebsd/eal.c
+++ b/lib/eal/freebsd/eal.c
@@ -597,8 +597,8 @@ static void rte_eal_init_alert(const char *msg)
 		return -1;
 	}
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-					__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+					rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		rte_eal_init_alert("already called initialization.");
 		rte_errno = EALREADY;
 		return -1;
@@ -622,7 +622,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (fctret < 0) {
 		rte_eal_init_alert("Invalid 'command line' arguments.");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -636,20 +636,20 @@ static void rte_eal_init_alert(const char *msg)
 	if (eal_plugins_init() < 0) {
 		rte_eal_init_alert("Cannot init plugins");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
 	if (eal_trace_init() < 0) {
 		rte_eal_init_alert("Cannot init trace");
 		rte_errno = EFAULT;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
 	if (eal_option_device_parse()) {
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -683,7 +683,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (rte_bus_scan()) {
 		rte_eal_init_alert("Cannot scan the buses for devices");
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -736,7 +736,7 @@ static void rte_eal_init_alert(const char *msg)
 		if (ret < 0) {
 			rte_eal_init_alert("Cannot get hugepage information.");
 			rte_errno = EACCES;
-			__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 			return -1;
 		}
 	}
@@ -915,8 +915,8 @@ static void rte_eal_init_alert(const char *msg)
 	static uint32_t run_once;
 	uint32_t has_run = 0;
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-			__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+			rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		RTE_LOG(WARNING, EAL, "Already called cleanup\n");
 		rte_errno = EALREADY;
 		return -1;
diff --git a/lib/eal/include/rte_epoll.h b/lib/eal/include/rte_epoll.h
index 01525f5..ae0cf20 100644
--- a/lib/eal/include/rte_epoll.h
+++ b/lib/eal/include/rte_epoll.h
@@ -13,6 +13,7 @@
 
 #include <stdint.h>
 
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -38,7 +39,7 @@ enum {
 
 /** interrupt epoll event obj, taken by epoll_event.ptr */
 struct rte_epoll_event {
-	uint32_t status;           /**< OUT: event status */
+	RTE_ATOMIC(uint32_t) status;           /**< OUT: event status */
 	int fd;                    /**< OUT: event fd */
 	int epfd;       /**< OUT: epoll instance the ev associated with */
 	struct rte_epoll_data epdata;
diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
index 5f4b2fb..57da058 100644
--- a/lib/eal/linux/eal.c
+++ b/lib/eal/linux/eal.c
@@ -967,7 +967,7 @@ static void rte_eal_init_alert(const char *msg)
 rte_eal_init(int argc, char **argv)
 {
 	int i, fctret, ret;
-	static uint32_t run_once;
+	static RTE_ATOMIC(uint32_t) run_once;
 	uint32_t has_run = 0;
 	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
 	char thread_name[RTE_THREAD_NAME_SIZE];
@@ -983,8 +983,8 @@ static void rte_eal_init_alert(const char *msg)
 		return -1;
 	}
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-					__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+					rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		rte_eal_init_alert("already called initialization.");
 		rte_errno = EALREADY;
 		return -1;
@@ -1008,14 +1008,14 @@ static void rte_eal_init_alert(const char *msg)
 	if (fctret < 0) {
 		rte_eal_init_alert("Invalid 'command line' arguments.");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
 	if (eal_plugins_init() < 0) {
 		rte_eal_init_alert("Cannot init plugins");
 		rte_errno = EINVAL;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1027,7 +1027,7 @@ static void rte_eal_init_alert(const char *msg)
 
 	if (eal_option_device_parse()) {
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1061,7 +1061,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (rte_bus_scan()) {
 		rte_eal_init_alert("Cannot scan the buses for devices");
 		rte_errno = ENODEV;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1125,7 +1125,7 @@ static void rte_eal_init_alert(const char *msg)
 		if (ret < 0) {
 			rte_eal_init_alert("Cannot get hugepage information.");
 			rte_errno = EACCES;
-			__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 			return -1;
 		}
 	}
@@ -1150,7 +1150,7 @@ static void rte_eal_init_alert(const char *msg)
 			 internal_conf->syslog_facility) < 0) {
 		rte_eal_init_alert("Cannot init logging.");
 		rte_errno = ENOMEM;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 
@@ -1158,7 +1158,7 @@ static void rte_eal_init_alert(const char *msg)
 	if (rte_eal_vfio_setup() < 0) {
 		rte_eal_init_alert("Cannot init VFIO");
 		rte_errno = EAGAIN;
-		__atomic_store_n(&run_once, 0, __ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&run_once, 0, rte_memory_order_relaxed);
 		return -1;
 	}
 #endif
@@ -1345,11 +1345,11 @@ static void rte_eal_init_alert(const char *msg)
 int
 rte_eal_cleanup(void)
 {
-	static uint32_t run_once;
+	static RTE_ATOMIC(uint32_t) run_once;
 	uint32_t has_run = 0;
 
-	if (!__atomic_compare_exchange_n(&run_once, &has_run, 1, 0,
-					__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
+	if (!rte_atomic_compare_exchange_strong_explicit(&run_once, &has_run, 1,
+					rte_memory_order_relaxed, rte_memory_order_relaxed)) {
 		RTE_LOG(WARNING, EAL, "Already called cleanup\n");
 		rte_errno = EALREADY;
 		return -1;
diff --git a/lib/eal/linux/eal_interrupts.c b/lib/eal/linux/eal_interrupts.c
index 24fff3d..d4919df 100644
--- a/lib/eal/linux/eal_interrupts.c
+++ b/lib/eal/linux/eal_interrupts.c
@@ -1266,9 +1266,9 @@ struct rte_intr_source {
 		 * ordering below acting as a lock to synchronize
 		 * the event data updating.
 		 */
-		if (!rev || !__atomic_compare_exchange_n(&rev->status,
-				    &valid_status, RTE_EPOLL_EXEC, 0,
-				    __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
+		if (!rev || !rte_atomic_compare_exchange_strong_explicit(&rev->status,
+				    &valid_status, RTE_EPOLL_EXEC,
+				    rte_memory_order_acquire, rte_memory_order_relaxed))
 			continue;
 
 		events[count].status        = RTE_EPOLL_VALID;
@@ -1283,8 +1283,8 @@ struct rte_intr_source {
 		/* the status update should be observed after
 		 * the other fields change.
 		 */
-		__atomic_store_n(&rev->status, RTE_EPOLL_VALID,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&rev->status, RTE_EPOLL_VALID,
+				rte_memory_order_release);
 		count++;
 	}
 	return count;
@@ -1374,10 +1374,10 @@ struct rte_intr_source {
 {
 	uint32_t valid_status = RTE_EPOLL_VALID;
 
-	while (!__atomic_compare_exchange_n(&ev->status, &valid_status,
-		    RTE_EPOLL_INVALID, 0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED)) {
-		while (__atomic_load_n(&ev->status,
-				__ATOMIC_RELAXED) != RTE_EPOLL_VALID)
+	while (!rte_atomic_compare_exchange_strong_explicit(&ev->status, &valid_status,
+		    RTE_EPOLL_INVALID, rte_memory_order_acquire, rte_memory_order_relaxed)) {
+		while (rte_atomic_load_explicit(&ev->status,
+				rte_memory_order_relaxed) != RTE_EPOLL_VALID)
 			rte_pause();
 		valid_status = RTE_EPOLL_VALID;
 	}
@@ -1402,8 +1402,8 @@ struct rte_intr_source {
 		epfd = rte_intr_tls_epfd();
 
 	if (op == EPOLL_CTL_ADD) {
-		__atomic_store_n(&event->status, RTE_EPOLL_VALID,
-				__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&event->status, RTE_EPOLL_VALID,
+				rte_memory_order_relaxed);
 		event->fd = fd;  /* ignore fd in event */
 		event->epfd = epfd;
 		ev.data.ptr = (void *)event;
@@ -1415,13 +1415,13 @@ struct rte_intr_source {
 			op, fd, strerror(errno));
 		if (op == EPOLL_CTL_ADD)
 			/* rollback status when CTL_ADD fail */
-			__atomic_store_n(&event->status, RTE_EPOLL_INVALID,
-					__ATOMIC_RELAXED);
+			rte_atomic_store_explicit(&event->status, RTE_EPOLL_INVALID,
+					rte_memory_order_relaxed);
 		return -1;
 	}
 
-	if (op == EPOLL_CTL_DEL && __atomic_load_n(&event->status,
-			__ATOMIC_RELAXED) != RTE_EPOLL_INVALID)
+	if (op == EPOLL_CTL_DEL && rte_atomic_load_explicit(&event->status,
+			rte_memory_order_relaxed) != RTE_EPOLL_INVALID)
 		eal_epoll_data_safe_free(event);
 
 	return 0;
@@ -1450,8 +1450,8 @@ struct rte_intr_source {
 	case RTE_INTR_EVENT_ADD:
 		epfd_op = EPOLL_CTL_ADD;
 		rev = rte_intr_elist_index_get(intr_handle, efd_idx);
-		if (__atomic_load_n(&rev->status,
-				__ATOMIC_RELAXED) != RTE_EPOLL_INVALID) {
+		if (rte_atomic_load_explicit(&rev->status,
+				rte_memory_order_relaxed) != RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event already been added.\n");
 			return -EEXIST;
 		}
@@ -1474,8 +1474,8 @@ struct rte_intr_source {
 	case RTE_INTR_EVENT_DEL:
 		epfd_op = EPOLL_CTL_DEL;
 		rev = rte_intr_elist_index_get(intr_handle, efd_idx);
-		if (__atomic_load_n(&rev->status,
-				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID) {
+		if (rte_atomic_load_explicit(&rev->status,
+				rte_memory_order_relaxed) == RTE_EPOLL_INVALID) {
 			RTE_LOG(INFO, EAL, "Event does not exist.\n");
 			return -EPERM;
 		}
@@ -1500,8 +1500,8 @@ struct rte_intr_source {
 
 	for (i = 0; i < (uint32_t)rte_intr_nb_efd_get(intr_handle); i++) {
 		rev = rte_intr_elist_index_get(intr_handle, i);
-		if (__atomic_load_n(&rev->status,
-				__ATOMIC_RELAXED) == RTE_EPOLL_INVALID)
+		if (rte_atomic_load_explicit(&rev->status,
+				rte_memory_order_relaxed) == RTE_EPOLL_INVALID)
 			continue;
 		if (rte_epoll_ctl(rev->epfd, EPOLL_CTL_DEL, rev->fd, rev)) {
 			/* force free if the entry valid */
diff --git a/lib/eal/ppc/include/rte_atomic.h b/lib/eal/ppc/include/rte_atomic.h
index 7382412..645c713 100644
--- a/lib/eal/ppc/include/rte_atomic.h
+++ b/lib/eal/ppc/include/rte_atomic.h
@@ -48,7 +48,7 @@
 static inline int
 rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
 {
-	return __atomic_compare_exchange(dst, &exp, &src, 0, rte_memory_order_acquire,
+	return rte_atomic_compare_exchange_strong_explicit(dst, &exp, src, rte_memory_order_acquire,
 		rte_memory_order_acquire) ? 1 : 0;
 }
 
@@ -90,7 +90,7 @@ static inline int rte_atomic16_dec_and_test(rte_atomic16_t *v)
 static inline int
 rte_atomic32_cmpset(volatile uint32_t *dst, uint32_t exp, uint32_t src)
 {
-	return __atomic_compare_exchange(dst, &exp, &src, 0, rte_memory_order_acquire,
+	return rte_atomic_compare_exchange_strong_explicit(dst, &exp, src, rte_memory_order_acquire,
 		rte_memory_order_acquire) ? 1 : 0;
 }
 
@@ -132,7 +132,7 @@ static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
 static inline int
 rte_atomic64_cmpset(volatile uint64_t *dst, uint64_t exp, uint64_t src)
 {
-	return __atomic_compare_exchange(dst, &exp, &src, 0, rte_memory_order_acquire,
+	return rte_atomic_compare_exchange_strong_explicit(dst, &exp, src, rte_memory_order_acquire,
 		rte_memory_order_acquire) ? 1 : 0;
 }
 
diff --git a/lib/eal/windows/rte_thread.c b/lib/eal/windows/rte_thread.c
index acf6484..145ac4b 100644
--- a/lib/eal/windows/rte_thread.c
+++ b/lib/eal/windows/rte_thread.c
@@ -9,6 +9,7 @@
 #include <rte_eal.h>
 #include <rte_common.h>
 #include <rte_errno.h>
+#include <rte_stdatomic.h>
 #include <rte_thread.h>
 
 #include "eal_windows.h"
@@ -19,7 +20,7 @@ struct eal_tls_key {
 
 struct thread_routine_ctx {
 	rte_thread_func thread_func;
-	bool thread_init_failed;
+	RTE_ATOMIC(bool) thread_init_failed;
 	void *routine_args;
 };
 
@@ -168,7 +169,8 @@ struct thread_routine_ctx {
 thread_func_wrapper(void *arg)
 {
 	struct thread_routine_ctx ctx = *(struct thread_routine_ctx *)arg;
-	const bool thread_exit = __atomic_load_n(&ctx.thread_init_failed, __ATOMIC_ACQUIRE);
+	const bool thread_exit = rte_atomic_load_explicit(
+		&ctx.thread_init_failed, rte_memory_order_acquire);
 
 	free(arg);
 
@@ -237,7 +239,7 @@ struct thread_routine_ctx {
 	}
 
 resume_thread:
-	__atomic_store_n(&ctx->thread_init_failed, thread_exit, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ctx->thread_init_failed, thread_exit, rte_memory_order_release);
 
 	if (ResumeThread(thread_handle) == (DWORD)-1) {
 		ret = thread_log_last_error("ResumeThread()");
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread
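
Most of the EAL hunks above are the store-release/load-acquire handshake already described by the in-tree comments; only the spelling changes. A condensed sketch of the lcore launch handshake follows, with demo_* names standing in for lcore_config (rte_pause() and the rte_atomic_* wrappers are existing API; this is an illustration, not the EAL code).

#include <stddef.h>
#include <rte_pause.h>
#include <rte_stdatomic.h>

typedef int (*demo_lcore_fn_t)(void *);

struct demo_lcore_cfg {
	RTE_ATOMIC(demo_lcore_fn_t) f; /* guard variable */
	void *arg;
};

/* main lcore: write arg first, then release-store f so a worker that
 * observes a non-NULL f also observes arg.
 */
static void
demo_remote_launch(struct demo_lcore_cfg *cfg, demo_lcore_fn_t fn, void *arg)
{
	cfg->arg = arg;
	rte_atomic_store_explicit(&cfg->f, fn, rte_memory_order_release);
}

/* worker lcore: spin with acquire loads until the function is published */
static int
demo_worker_wait_and_run(struct demo_lcore_cfg *cfg)
{
	demo_lcore_fn_t fn;

	while ((fn = rte_atomic_load_explicit(&cfg->f,
			rte_memory_order_acquire)) == NULL)
		rte_pause();

	return fn(cfg->arg);
}

The relaxed compare-exchange on run_once in rte_eal_init()/rte_eal_cleanup() is the other recurring shape in this patch: there the flag only guards against double initialization, so no acquire/release pairing is needed.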

* [PATCH v3 04/19] eventdev: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (2 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 03/19] eal: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 05/19] gpudev: " Tyler Retzlaff
                     ` (16 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 drivers/event/cnxk/cnxk_tim_worker.h   |  4 +--
 lib/eventdev/rte_event_timer_adapter.c | 66 +++++++++++++++++-----------------
 lib/eventdev/rte_event_timer_adapter.h |  2 +-
 3 files changed, 36 insertions(+), 36 deletions(-)

diff --git a/drivers/event/cnxk/cnxk_tim_worker.h b/drivers/event/cnxk/cnxk_tim_worker.h
index f0857f2..f530d8c 100644
--- a/drivers/event/cnxk/cnxk_tim_worker.h
+++ b/drivers/event/cnxk/cnxk_tim_worker.h
@@ -314,7 +314,7 @@
 
 	tim->impl_opaque[0] = (uintptr_t)chunk;
 	tim->impl_opaque[1] = (uintptr_t)bkt;
-	__atomic_store_n(&tim->state, RTE_EVENT_TIMER_ARMED, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->state, RTE_EVENT_TIMER_ARMED, rte_memory_order_release);
 	cnxk_tim_bkt_inc_nent(bkt);
 	cnxk_tim_bkt_dec_lock_relaxed(bkt);
 
@@ -425,7 +425,7 @@
 
 	tim->impl_opaque[0] = (uintptr_t)chunk;
 	tim->impl_opaque[1] = (uintptr_t)bkt;
-	__atomic_store_n(&tim->state, RTE_EVENT_TIMER_ARMED, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->state, RTE_EVENT_TIMER_ARMED, rte_memory_order_release);
 	cnxk_tim_bkt_inc_nent(bkt);
 	cnxk_tim_bkt_dec_lock_relaxed(bkt);
 
diff --git a/lib/eventdev/rte_event_timer_adapter.c b/lib/eventdev/rte_event_timer_adapter.c
index 427c4c6..2746670 100644
--- a/lib/eventdev/rte_event_timer_adapter.c
+++ b/lib/eventdev/rte_event_timer_adapter.c
@@ -630,12 +630,12 @@ struct swtim {
 	uint32_t timer_data_id;
 	/* Track which cores have actually armed a timer */
 	struct {
-		uint16_t v;
+		RTE_ATOMIC(uint16_t) v;
 	} __rte_cache_aligned in_use[RTE_MAX_LCORE];
 	/* Track which cores' timer lists should be polled */
-	unsigned int poll_lcores[RTE_MAX_LCORE];
+	RTE_ATOMIC(unsigned int) poll_lcores[RTE_MAX_LCORE];
 	/* The number of lists that should be polled */
-	int n_poll_lcores;
+	RTE_ATOMIC(int) n_poll_lcores;
 	/* Timers which have expired and can be returned to a mempool */
 	struct rte_timer *expired_timers[EXP_TIM_BUF_SZ];
 	/* The number of timers that can be returned to a mempool */
@@ -669,10 +669,10 @@ struct swtim {
 
 	if (unlikely(sw->in_use[lcore].v == 0)) {
 		sw->in_use[lcore].v = 1;
-		n_lcores = __atomic_fetch_add(&sw->n_poll_lcores, 1,
-					     __ATOMIC_RELAXED);
-		__atomic_store_n(&sw->poll_lcores[n_lcores], lcore,
-				__ATOMIC_RELAXED);
+		n_lcores = rte_atomic_fetch_add_explicit(&sw->n_poll_lcores, 1,
+					     rte_memory_order_relaxed);
+		rte_atomic_store_explicit(&sw->poll_lcores[n_lcores], lcore,
+				rte_memory_order_relaxed);
 	}
 
 	ret = event_buffer_add(&sw->buffer, &evtim->ev);
@@ -719,8 +719,8 @@ struct swtim {
 		sw->stats.evtim_exp_count++;
 
 		if (type == SINGLE)
-			__atomic_store_n(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&evtim->state, RTE_EVENT_TIMER_NOT_ARMED,
+				rte_memory_order_release);
 	}
 
 	if (event_buffer_batch_ready(&sw->buffer)) {
@@ -846,7 +846,7 @@ struct swtim {
 
 	if (swtim_did_tick(sw)) {
 		rte_timer_alt_manage(sw->timer_data_id,
-				     sw->poll_lcores,
+				     (unsigned int *)(uintptr_t)sw->poll_lcores,
 				     sw->n_poll_lcores,
 				     swtim_callback);
 
@@ -1027,7 +1027,7 @@ struct swtim {
 
 	/* Free outstanding timers */
 	rte_timer_stop_all(sw->timer_data_id,
-			   sw->poll_lcores,
+			   (unsigned int *)(uintptr_t)sw->poll_lcores,
 			   sw->n_poll_lcores,
 			   swtim_free_tim,
 			   sw);
@@ -1142,7 +1142,7 @@ struct swtim {
 	uint64_t cur_cycles;
 
 	/* Check that timer is armed */
-	n_state = __atomic_load_n(&evtim->state, __ATOMIC_ACQUIRE);
+	n_state = rte_atomic_load_explicit(&evtim->state, rte_memory_order_acquire);
 	if (n_state != RTE_EVENT_TIMER_ARMED)
 		return -EINVAL;
 
@@ -1201,15 +1201,15 @@ struct swtim {
 	 * The atomic compare-and-swap operation can prevent the race condition
 	 * on in_use flag between multiple non-EAL threads.
 	 */
-	if (unlikely(__atomic_compare_exchange_n(&sw->in_use[lcore_id].v,
-			&exp_state, 1, 0,
-			__ATOMIC_RELAXED, __ATOMIC_RELAXED))) {
+	if (unlikely(rte_atomic_compare_exchange_strong_explicit(&sw->in_use[lcore_id].v,
+			&exp_state, 1,
+			rte_memory_order_relaxed, rte_memory_order_relaxed))) {
 		EVTIM_LOG_DBG("Adding lcore id = %u to list of lcores to poll",
 			      lcore_id);
-		n_lcores = __atomic_fetch_add(&sw->n_poll_lcores, 1,
-					     __ATOMIC_RELAXED);
-		__atomic_store_n(&sw->poll_lcores[n_lcores], lcore_id,
-				__ATOMIC_RELAXED);
+		n_lcores = rte_atomic_fetch_add_explicit(&sw->n_poll_lcores, 1,
+					     rte_memory_order_relaxed);
+		rte_atomic_store_explicit(&sw->poll_lcores[n_lcores], lcore_id,
+				rte_memory_order_relaxed);
 	}
 
 	ret = rte_mempool_get_bulk(sw->tim_pool, (void **)tims,
@@ -1223,7 +1223,7 @@ struct swtim {
 	type = get_timer_type(adapter);
 
 	for (i = 0; i < nb_evtims; i++) {
-		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		n_state = rte_atomic_load_explicit(&evtims[i]->state, rte_memory_order_acquire);
 		if (n_state == RTE_EVENT_TIMER_ARMED) {
 			rte_errno = EALREADY;
 			break;
@@ -1235,9 +1235,9 @@ struct swtim {
 
 		if (unlikely(check_destination_event_queue(evtims[i],
 							   adapter) < 0)) {
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed);
 			rte_errno = EINVAL;
 			break;
 		}
@@ -1250,15 +1250,15 @@ struct swtim {
 
 		ret = get_timeout_cycles(evtims[i], adapter, &cycles);
 		if (unlikely(ret == -1)) {
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR_TOOLATE,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed);
 			rte_errno = EINVAL;
 			break;
 		} else if (unlikely(ret == -2)) {
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR_TOOEARLY,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed);
 			rte_errno = EINVAL;
 			break;
 		}
@@ -1267,9 +1267,9 @@ struct swtim {
 					  type, lcore_id, NULL, evtims[i]);
 		if (ret < 0) {
 			/* tim was in RUNNING or CONFIG state */
-			__atomic_store_n(&evtims[i]->state,
+			rte_atomic_store_explicit(&evtims[i]->state,
 					RTE_EVENT_TIMER_ERROR,
-					__ATOMIC_RELEASE);
+					rte_memory_order_release);
 			break;
 		}
 
@@ -1277,8 +1277,8 @@ struct swtim {
 		/* RELEASE ordering guarantees the adapter specific value
 		 * changes observed before the update of state.
 		 */
-		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&evtims[i]->state, RTE_EVENT_TIMER_ARMED,
+				rte_memory_order_release);
 	}
 
 	if (i < nb_evtims)
@@ -1320,7 +1320,7 @@ struct swtim {
 		/* ACQUIRE ordering guarantees the access of implementation
 		 * specific opaque data under the correct state.
 		 */
-		n_state = __atomic_load_n(&evtims[i]->state, __ATOMIC_ACQUIRE);
+		n_state = rte_atomic_load_explicit(&evtims[i]->state, rte_memory_order_acquire);
 		if (n_state == RTE_EVENT_TIMER_CANCELED) {
 			rte_errno = EALREADY;
 			break;
@@ -1346,8 +1346,8 @@ struct swtim {
 		 * to make sure the state update data observed between
 		 * threads.
 		 */
-		__atomic_store_n(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&evtims[i]->state, RTE_EVENT_TIMER_CANCELED,
+				rte_memory_order_release);
 	}
 
 	return i;
diff --git a/lib/eventdev/rte_event_timer_adapter.h b/lib/eventdev/rte_event_timer_adapter.h
index fbdddf8..49e646a 100644
--- a/lib/eventdev/rte_event_timer_adapter.h
+++ b/lib/eventdev/rte_event_timer_adapter.h
@@ -498,7 +498,7 @@ struct rte_event_timer {
 	 * implementation specific values to share between the arm and cancel
 	 * operations.  The application should not modify this field.
 	 */
-	enum rte_event_timer_state state;
+	RTE_ATOMIC(enum rte_event_timer_state) state;
 	/**< State of the event timer. */
 	uint8_t user_meta[];
 	/**< Memory to store user specific metadata.
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 05/19] gpudev: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (3 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 04/19] eventdev: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 06/19] ipsec: " Tyler Retzlaff
                     ` (15 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
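
For illustration, a minimal sketch of the converted refcount pattern,
assuming a DPDK tree that provides rte_stdatomic.h; the struct and
helper names below are made up for the example:

#include <stdint.h>
#include <rte_stdatomic.h>

/* Hypothetical shared device descriptor; only the refcount matters here. */
struct shared_dev {
	RTE_ATOMIC(uint16_t) process_refcnt;
};

/* Each process bumps the count on attach; relaxed ordering is enough
 * because the counter does not publish any other data.
 */
static inline void
shared_dev_attach(struct shared_dev *sd)
{
	rte_atomic_fetch_add_explicit(&sd->process_refcnt, 1,
			rte_memory_order_relaxed);
}

static inline uint16_t
shared_dev_detach(struct shared_dev *sd)
{
	/* fetch_sub returns the previous value; subtract 1 for the new count */
	return rte_atomic_fetch_sub_explicit(&sd->process_refcnt, 1,
			rte_memory_order_relaxed) - 1;
}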

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/gpudev/gpudev.c        | 6 +++---
 lib/gpudev/gpudev_driver.h | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/lib/gpudev/gpudev.c b/lib/gpudev/gpudev.c
index 8f12abe..6845d18 100644
--- a/lib/gpudev/gpudev.c
+++ b/lib/gpudev/gpudev.c
@@ -228,7 +228,7 @@ struct rte_gpu *
 	dev->mpshared->info.numa_node = -1;
 	dev->mpshared->info.parent = RTE_GPU_ID_NONE;
 	TAILQ_INIT(&dev->callbacks);
-	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&dev->mpshared->process_refcnt, 1, rte_memory_order_relaxed);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "new device %s (id %d) of total %d",
@@ -277,7 +277,7 @@ struct rte_gpu *
 
 	TAILQ_INIT(&dev->callbacks);
 	dev->mpshared = shared_dev;
-	__atomic_fetch_add(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&dev->mpshared->process_refcnt, 1, rte_memory_order_relaxed);
 
 	gpu_count++;
 	GPU_LOG(DEBUG, "attached device %s (id %d) of total %d",
@@ -340,7 +340,7 @@ struct rte_gpu *
 
 	gpu_free_callbacks(dev);
 	dev->process_state = RTE_GPU_STATE_UNUSED;
-	__atomic_fetch_sub(&dev->mpshared->process_refcnt, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_sub_explicit(&dev->mpshared->process_refcnt, 1, rte_memory_order_relaxed);
 	gpu_count--;
 
 	return 0;
diff --git a/lib/gpudev/gpudev_driver.h b/lib/gpudev/gpudev_driver.h
index 42898c7..0b1e7f2 100644
--- a/lib/gpudev/gpudev_driver.h
+++ b/lib/gpudev/gpudev_driver.h
@@ -69,7 +69,7 @@ struct rte_gpu_mpshared {
 	/* Device info structure. */
 	struct rte_gpu_info info;
 	/* Counter of processes using the device. */
-	uint16_t process_refcnt; /* Updated by this library. */
+	RTE_ATOMIC(uint16_t) process_refcnt; /* Updated by this library. */
 };
 
 struct rte_gpu {
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 06/19] ipsec: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (4 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 05/19] gpudev: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26 15:54     ` [EXT] " Akhil Goyal
  2023-10-27 12:59     ` Konstantin Ananyev
  2023-10-26  0:31   ` [PATCH v3 07/19] mbuf: " Tyler Retzlaff
                     ` (14 subsequent siblings)
  20 siblings, 2 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
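
To make the fetch_add conversion above concrete, a small sketch of the
outbound SQN reservation it implements, assuming rte_stdatomic.h is
available; the counter name is illustrative:

#include <stdint.h>
#include <rte_stdatomic.h>

/* Hypothetical outbound SQN counter, mirroring the shape of sa->sqn.outb. */
static RTE_ATOMIC(uint64_t) outb_sqn;

/* Reserve n sequence numbers and return the last one in the block.
 * fetch_add returns the value before the addition, so adding n gives
 * the highest sequence number just reserved.
 */
static inline uint64_t
sqn_reserve(uint32_t n)
{
	return rte_atomic_fetch_add_explicit(&outb_sqn, n,
			rte_memory_order_relaxed) + n;
}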

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/ipsec/ipsec_sqn.h | 2 +-
 lib/ipsec/sa.h        | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/ipsec/ipsec_sqn.h b/lib/ipsec/ipsec_sqn.h
index 505950e..984a9dd 100644
--- a/lib/ipsec/ipsec_sqn.h
+++ b/lib/ipsec/ipsec_sqn.h
@@ -128,7 +128,7 @@
 
 	n = *num;
 	if (SQN_ATOMIC(sa))
-		sqn = __atomic_fetch_add(&sa->sqn.outb, n, __ATOMIC_RELAXED) + n;
+		sqn = rte_atomic_fetch_add_explicit(&sa->sqn.outb, n, rte_memory_order_relaxed) + n;
 	else {
 		sqn = sa->sqn.outb + n;
 		sa->sqn.outb = sqn;
diff --git a/lib/ipsec/sa.h b/lib/ipsec/sa.h
index ce4af8c..4b30bea 100644
--- a/lib/ipsec/sa.h
+++ b/lib/ipsec/sa.h
@@ -124,7 +124,7 @@ struct rte_ipsec_sa {
 	 * place from other frequently accessed data.
 	 */
 	union {
-		uint64_t outb;
+		RTE_ATOMIC(uint64_t) outb;
 		struct {
 			uint32_t rdidx; /* read index */
 			uint32_t wridx; /* write index */
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 07/19] mbuf: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (5 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 06/19] ipsec: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-27 13:03     ` Konstantin Ananyev
  2023-10-26  0:31   ` [PATCH v3 08/19] mempool: " Tyler Retzlaff
                     ` (13 subsequent siblings)
  20 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
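
As a sketch of the converted refcount helpers, under the assumption that
the field is marked with RTE_ATOMIC() as in this patch; the struct and
function names are invented for the example:

#include <stdint.h>
#include <rte_stdatomic.h>

/* Stand-in for the mbuf refcount field; RTE_ATOMIC() marks it so the
 * rte_atomic_*_explicit() macros accept it when stdatomic is enabled.
 */
struct refcnt_obj {
	RTE_ATOMIC(uint16_t) refcnt;
};

static inline uint16_t
refcnt_read(const struct refcnt_obj *o)
{
	return rte_atomic_load_explicit(&o->refcnt, rte_memory_order_relaxed);
}

static inline uint16_t
refcnt_update(struct refcnt_obj *o, int16_t value)
{
	/* acq_rel keeps the object's payload ordered against the count change */
	return rte_atomic_fetch_add_explicit(&o->refcnt, value,
			rte_memory_order_acq_rel) + value;
}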

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/mbuf/rte_mbuf.h      | 20 ++++++++++----------
 lib/mbuf/rte_mbuf_core.h |  5 +++--
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 913c459..b8ab477 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -361,7 +361,7 @@ struct rte_pktmbuf_pool_private {
 static inline uint16_t
 rte_mbuf_refcnt_read(const struct rte_mbuf *m)
 {
-	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&m->refcnt, rte_memory_order_relaxed);
 }
 
 /**
@@ -374,15 +374,15 @@ struct rte_pktmbuf_pool_private {
 static inline void
 rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
 {
-	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&m->refcnt, new_value, rte_memory_order_relaxed);
 }
 
 /* internal */
 static inline uint16_t
 __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
 {
-	return __atomic_fetch_add(&m->refcnt, value,
-				 __ATOMIC_ACQ_REL) + value;
+	return rte_atomic_fetch_add_explicit(&m->refcnt, value,
+				 rte_memory_order_acq_rel) + value;
 }
 
 /**
@@ -463,7 +463,7 @@ struct rte_pktmbuf_pool_private {
 static inline uint16_t
 rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
 {
-	return __atomic_load_n(&shinfo->refcnt, __ATOMIC_RELAXED);
+	return rte_atomic_load_explicit(&shinfo->refcnt, rte_memory_order_relaxed);
 }
 
 /**
@@ -478,7 +478,7 @@ struct rte_pktmbuf_pool_private {
 rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
 	uint16_t new_value)
 {
-	__atomic_store_n(&shinfo->refcnt, new_value, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&shinfo->refcnt, new_value, rte_memory_order_relaxed);
 }
 
 /**
@@ -502,8 +502,8 @@ struct rte_pktmbuf_pool_private {
 		return (uint16_t)value;
 	}
 
-	return __atomic_fetch_add(&shinfo->refcnt, value,
-				 __ATOMIC_ACQ_REL) + value;
+	return rte_atomic_fetch_add_explicit(&shinfo->refcnt, value,
+				 rte_memory_order_acq_rel) + value;
 }
 
 /** Mbuf prefetch */
@@ -1315,8 +1315,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 	 * Direct usage of add primitive to avoid
 	 * duplication of comparing with one.
 	 */
-	if (likely(__atomic_fetch_add(&shinfo->refcnt, -1,
-				     __ATOMIC_ACQ_REL) - 1))
+	if (likely(rte_atomic_fetch_add_explicit(&shinfo->refcnt, -1,
+				     rte_memory_order_acq_rel) - 1))
 		return 1;
 
 	/* Reinitialize counter before mbuf freeing. */
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index e9bc0d1..5688683 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -19,6 +19,7 @@
 #include <stdint.h>
 
 #include <rte_byteorder.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -497,7 +498,7 @@ struct rte_mbuf {
 	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
 	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
 	 */
-	uint16_t refcnt;
+	RTE_ATOMIC(uint16_t) refcnt;
 
 	/**
 	 * Number of segments. Only valid for the first segment of an mbuf
@@ -674,7 +675,7 @@ struct rte_mbuf {
 struct rte_mbuf_ext_shared_info {
 	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
 	void *fcb_opaque;                        /**< Free callback argument */
-	uint16_t refcnt;
+	RTE_ATOMIC(uint16_t) refcnt;
 };
 
 /** Maximum number of nb_segs allowed. */
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 08/19] mempool: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (6 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 07/19] mbuf: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-27 13:01     ` Konstantin Ananyev
  2023-10-26  0:31   ` [PATCH v3 09/19] rcu: " Tyler Retzlaff
                     ` (12 subsequent siblings)
  20 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
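
A minimal sketch of the stats pattern being converted: per-lcore counters
stay plain, and only the shared slot used by unregistered threads needs
the atomic add. Names and sizes below are illustrative:

#include <stdint.h>
#include <rte_stdatomic.h>

#define MAX_LCORE 128	/* illustrative, stands in for RTE_MAX_LCORE */

struct pool_stats {
	uint64_t put_objs;	/* lcore-private, plain add is fine */
};

struct pool {
	struct pool_stats stats[MAX_LCORE];
	/* shared slot for threads without an lcore id */
	struct { RTE_ATOMIC(uint64_t) put_objs; } shared_stats;
};

static inline void
pool_stat_add(struct pool *p, unsigned int lcore_id, uint64_t n)
{
	if (lcore_id < MAX_LCORE)
		p->stats[lcore_id].put_objs += n;	/* no contention */
	else
		rte_atomic_fetch_add_explicit(&p->shared_stats.put_objs, n,
				rte_memory_order_relaxed);
}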

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/mempool/rte_mempool.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index f70bf36..df87cd2 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -327,8 +327,8 @@ struct rte_mempool {
 		if (likely(__lcore_id < RTE_MAX_LCORE))                         \
 			(mp)->stats[__lcore_id].name += (n);                    \
 		else                                                            \
-			__atomic_fetch_add(&((mp)->stats[RTE_MAX_LCORE].name),  \
-					   (n), __ATOMIC_RELAXED);              \
+			rte_atomic_fetch_add_explicit(&((mp)->stats[RTE_MAX_LCORE].name),  \
+					   (n), rte_memory_order_relaxed);              \
 	} while (0)
 #else
 #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 09/19] rcu: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (7 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 08/19] mempool: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 10/19] pdump: " Tyler Retzlaff
                     ` (11 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
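
The main API difference in this patch is the compare-and-swap:
rte_atomic_compare_exchange_strong_explicit() takes the desired value
directly and has no weak flag. A minimal sketch of the registration
bitmap CAS loop, with illustrative names:

#include <stdint.h>
#include <stdbool.h>
#include <rte_stdatomic.h>

/* Hypothetical registration bitmap, one bit per thread. */
static RTE_ATOMIC(uint64_t) thr_bitmap;

/* Set bit 'id' with release ordering so earlier initialization of the
 * thread's state is visible before it appears registered.
 */
static inline bool
thread_register(unsigned int id)
{
	uint64_t old_bmap, new_bmap;

	old_bmap = rte_atomic_load_explicit(&thr_bitmap,
			rte_memory_order_relaxed);
	do {
		if (old_bmap & (1UL << id))
			return false;	/* someone else registered it */
		new_bmap = old_bmap | (1UL << id);
		/* on failure old_bmap is updated with the current value */
	} while (!rte_atomic_compare_exchange_strong_explicit(&thr_bitmap,
			&old_bmap, new_bmap,
			rte_memory_order_release, rte_memory_order_relaxed));

	return true;
}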

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/rcu/rte_rcu_qsbr.c | 48 +++++++++++++++++------------------
 lib/rcu/rte_rcu_qsbr.h | 68 +++++++++++++++++++++++++-------------------------
 2 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c
index 17be93e..4dc7714 100644
--- a/lib/rcu/rte_rcu_qsbr.c
+++ b/lib/rcu/rte_rcu_qsbr.c
@@ -102,21 +102,21 @@
 	 * go out of sync. Hence, additional checks are required.
 	 */
 	/* Check if the thread is already registered */
-	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_RELAXED);
+	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_relaxed);
 	if (old_bmap & 1UL << id)
 		return 0;
 
 	do {
 		new_bmap = old_bmap | (1UL << id);
-		success = __atomic_compare_exchange(
+		success = rte_atomic_compare_exchange_strong_explicit(
 					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					&old_bmap, &new_bmap, 0,
-					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
+					&old_bmap, new_bmap,
+					rte_memory_order_release, rte_memory_order_relaxed);
 
 		if (success)
-			__atomic_fetch_add(&v->num_threads,
-						1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&v->num_threads,
+						1, rte_memory_order_relaxed);
 		else if (old_bmap & (1UL << id))
 			/* Someone else registered this thread.
 			 * Counter should not be incremented.
@@ -154,8 +154,8 @@
 	 * go out of sync. Hence, additional checks are required.
 	 */
 	/* Check if the thread is already unregistered */
-	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_RELAXED);
+	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_relaxed);
 	if (!(old_bmap & (1UL << id)))
 		return 0;
 
@@ -165,14 +165,14 @@
 		 * completed before removal of the thread from the list of
 		 * reporting threads.
 		 */
-		success = __atomic_compare_exchange(
+		success = rte_atomic_compare_exchange_strong_explicit(
 					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					&old_bmap, &new_bmap, 0,
-					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
+					&old_bmap, new_bmap,
+					rte_memory_order_release, rte_memory_order_relaxed);
 
 		if (success)
-			__atomic_fetch_sub(&v->num_threads,
-						1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_sub_explicit(&v->num_threads,
+						1, rte_memory_order_relaxed);
 		else if (!(old_bmap & (1UL << id)))
 			/* Someone else unregistered this thread.
 			 * Counter should not be incremented.
@@ -227,8 +227,8 @@
 
 	fprintf(f, "  Registered thread IDs = ");
 	for (i = 0; i < v->num_elems; i++) {
-		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_ACQUIRE);
+		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_acquire);
 		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
 		while (bmap) {
 			t = __builtin_ctzl(bmap);
@@ -241,26 +241,26 @@
 	fprintf(f, "\n");
 
 	fprintf(f, "  Token = %" PRIu64 "\n",
-			__atomic_load_n(&v->token, __ATOMIC_ACQUIRE));
+			rte_atomic_load_explicit(&v->token, rte_memory_order_acquire));
 
 	fprintf(f, "  Least Acknowledged Token = %" PRIu64 "\n",
-			__atomic_load_n(&v->acked_token, __ATOMIC_ACQUIRE));
+			rte_atomic_load_explicit(&v->acked_token, rte_memory_order_acquire));
 
 	fprintf(f, "Quiescent State Counts for readers:\n");
 	for (i = 0; i < v->num_elems; i++) {
-		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
-					__ATOMIC_ACQUIRE);
+		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
+					rte_memory_order_acquire);
 		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
 		while (bmap) {
 			t = __builtin_ctzl(bmap);
 			fprintf(f, "thread ID = %u, count = %" PRIu64 ", lock count = %u\n",
 				id + t,
-				__atomic_load_n(
+				rte_atomic_load_explicit(
 					&v->qsbr_cnt[id + t].cnt,
-					__ATOMIC_RELAXED),
-				__atomic_load_n(
+					rte_memory_order_relaxed),
+				rte_atomic_load_explicit(
 					&v->qsbr_cnt[id + t].lock_cnt,
-					__ATOMIC_RELAXED));
+					rte_memory_order_relaxed));
 			bmap &= ~(1UL << t);
 		}
 	}
diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h
index 87e1b55..9f4aed2 100644
--- a/lib/rcu/rte_rcu_qsbr.h
+++ b/lib/rcu/rte_rcu_qsbr.h
@@ -63,11 +63,11 @@
  * Given thread id needs to be converted to index into the array and
  * the id within the array element.
  */
-#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
+#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(RTE_ATOMIC(uint64_t)) * 8)
 #define __RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
 	RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
 		__RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, RTE_CACHE_LINE_SIZE)
-#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
+#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t __rte_atomic *) \
 	((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)
 #define __RTE_QSBR_THRID_INDEX_SHIFT 6
 #define __RTE_QSBR_THRID_MASK 0x3f
@@ -75,13 +75,13 @@
 
 /* Worker thread counter */
 struct rte_rcu_qsbr_cnt {
-	uint64_t cnt;
+	RTE_ATOMIC(uint64_t) cnt;
 	/**< Quiescent state counter. Value 0 indicates the thread is offline
 	 *   64b counter is used to avoid adding more code to address
 	 *   counter overflow. Changing this to 32b would require additional
 	 *   changes to various APIs.
 	 */
-	uint32_t lock_cnt;
+	RTE_ATOMIC(uint32_t) lock_cnt;
 	/**< Lock counter. Used when RTE_LIBRTE_RCU_DEBUG is enabled */
 } __rte_cache_aligned;
 
@@ -97,16 +97,16 @@ struct rte_rcu_qsbr_cnt {
  * 2) Register thread ID array
  */
 struct rte_rcu_qsbr {
-	uint64_t token __rte_cache_aligned;
+	RTE_ATOMIC(uint64_t) token __rte_cache_aligned;
 	/**< Counter to allow for multiple concurrent quiescent state queries */
-	uint64_t acked_token;
+	RTE_ATOMIC(uint64_t) acked_token;
 	/**< Least token acked by all the threads in the last call to
 	 *   rte_rcu_qsbr_check API.
 	 */
 
 	uint32_t num_elems __rte_cache_aligned;
 	/**< Number of elements in the thread ID array */
-	uint32_t num_threads;
+	RTE_ATOMIC(uint32_t) num_threads;
 	/**< Number of threads currently using this QS variable */
 	uint32_t max_threads;
 	/**< Maximum number of threads using this QS variable */
@@ -311,13 +311,13 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * the following will not move down after the load of any shared
 	 * data structure.
 	 */
-	t = __atomic_load_n(&v->token, __ATOMIC_RELAXED);
+	t = rte_atomic_load_explicit(&v->token, rte_memory_order_relaxed);
 
-	/* __atomic_store_n(cnt, __ATOMIC_RELAXED) is used to ensure
+	/* rte_atomic_store_explicit(cnt, rte_memory_order_relaxed) is used to ensure
 	 * 'cnt' (64b) is accessed atomically.
 	 */
-	__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
-		t, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&v->qsbr_cnt[thread_id].cnt,
+		t, rte_memory_order_relaxed);
 
 	/* The subsequent load of the data structure should not
 	 * move above the store. Hence a store-load barrier
@@ -326,7 +326,7 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * writer might not see that the reader is online, even though
 	 * the reader is referencing the shared data structure.
 	 */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 }
 
 /**
@@ -362,8 +362,8 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * data structure can not move after this store.
 	 */
 
-	__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
-		__RTE_QSBR_CNT_THR_OFFLINE, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&v->qsbr_cnt[thread_id].cnt,
+		__RTE_QSBR_CNT_THR_OFFLINE, rte_memory_order_release);
 }
 
 /**
@@ -394,8 +394,8 @@ struct rte_rcu_qsbr_dq_parameters {
 
 #if defined(RTE_LIBRTE_RCU_DEBUG)
 	/* Increment the lock counter */
-	__atomic_fetch_add(&v->qsbr_cnt[thread_id].lock_cnt,
-				1, __ATOMIC_ACQUIRE);
+	rte_atomic_fetch_add_explicit(&v->qsbr_cnt[thread_id].lock_cnt,
+				1, rte_memory_order_acquire);
 #endif
 }
 
@@ -427,8 +427,8 @@ struct rte_rcu_qsbr_dq_parameters {
 
 #if defined(RTE_LIBRTE_RCU_DEBUG)
 	/* Decrement the lock counter */
-	__atomic_fetch_sub(&v->qsbr_cnt[thread_id].lock_cnt,
-				1, __ATOMIC_RELEASE);
+	rte_atomic_fetch_sub_explicit(&v->qsbr_cnt[thread_id].lock_cnt,
+				1, rte_memory_order_release);
 
 	__RTE_RCU_IS_LOCK_CNT_ZERO(v, thread_id, WARNING,
 				"Lock counter %u. Nested locks?\n",
@@ -461,7 +461,7 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * structure are visible to the workers before the token
 	 * update is visible.
 	 */
-	t = __atomic_fetch_add(&v->token, 1, __ATOMIC_RELEASE) + 1;
+	t = rte_atomic_fetch_add_explicit(&v->token, 1, rte_memory_order_release) + 1;
 
 	return t;
 }
@@ -493,16 +493,16 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * Later loads of the shared data structure should not move
 	 * above this load. Hence, use load-acquire.
 	 */
-	t = __atomic_load_n(&v->token, __ATOMIC_ACQUIRE);
+	t = rte_atomic_load_explicit(&v->token, rte_memory_order_acquire);
 
 	/* Check if there are updates available from the writer.
 	 * Inform the writer that updates are visible to this reader.
 	 * Prior loads of the shared data structure should not move
 	 * beyond this store. Hence use store-release.
 	 */
-	if (t != __atomic_load_n(&v->qsbr_cnt[thread_id].cnt, __ATOMIC_RELAXED))
-		__atomic_store_n(&v->qsbr_cnt[thread_id].cnt,
-					 t, __ATOMIC_RELEASE);
+	if (t != rte_atomic_load_explicit(&v->qsbr_cnt[thread_id].cnt, rte_memory_order_relaxed))
+		rte_atomic_store_explicit(&v->qsbr_cnt[thread_id].cnt,
+					 t, rte_memory_order_release);
 
 	__RTE_RCU_DP_LOG(DEBUG, "%s: update: token = %" PRIu64 ", Thread ID = %d",
 		__func__, t, thread_id);
@@ -517,7 +517,7 @@ struct rte_rcu_qsbr_dq_parameters {
 	uint32_t i, j, id;
 	uint64_t bmap;
 	uint64_t c;
-	uint64_t *reg_thread_id;
+	RTE_ATOMIC(uint64_t) *reg_thread_id;
 	uint64_t acked_token = __RTE_QSBR_CNT_MAX;
 
 	for (i = 0, reg_thread_id = __RTE_QSBR_THRID_ARRAY_ELM(v, 0);
@@ -526,7 +526,7 @@ struct rte_rcu_qsbr_dq_parameters {
 		/* Load the current registered thread bit map before
 		 * loading the reader thread quiescent state counters.
 		 */
-		bmap = __atomic_load_n(reg_thread_id, __ATOMIC_ACQUIRE);
+		bmap = rte_atomic_load_explicit(reg_thread_id, rte_memory_order_acquire);
 		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
 
 		while (bmap) {
@@ -534,9 +534,9 @@ struct rte_rcu_qsbr_dq_parameters {
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: check: token = %" PRIu64 ", wait = %d, Bit Map = 0x%" PRIx64 ", Thread ID = %d",
 				__func__, t, wait, bmap, id + j);
-			c = __atomic_load_n(
+			c = rte_atomic_load_explicit(
 					&v->qsbr_cnt[id + j].cnt,
-					__ATOMIC_ACQUIRE);
+					rte_memory_order_acquire);
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: status: token = %" PRIu64 ", wait = %d, Thread QS cnt = %" PRIu64 ", Thread ID = %d",
 				__func__, t, wait, c, id+j);
@@ -554,8 +554,8 @@ struct rte_rcu_qsbr_dq_parameters {
 				/* This thread might have unregistered.
 				 * Re-read the bitmap.
 				 */
-				bmap = __atomic_load_n(reg_thread_id,
-						__ATOMIC_ACQUIRE);
+				bmap = rte_atomic_load_explicit(reg_thread_id,
+						rte_memory_order_acquire);
 
 				continue;
 			}
@@ -576,8 +576,8 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * no need to update this very accurately using compare-and-swap.
 	 */
 	if (acked_token != __RTE_QSBR_CNT_MAX)
-		__atomic_store_n(&v->acked_token, acked_token,
-			__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&v->acked_token, acked_token,
+			rte_memory_order_relaxed);
 
 	return 1;
 }
@@ -598,7 +598,7 @@ struct rte_rcu_qsbr_dq_parameters {
 			"%s: check: token = %" PRIu64 ", wait = %d, Thread ID = %d",
 			__func__, t, wait, i);
 		while (1) {
-			c = __atomic_load_n(&cnt->cnt, __ATOMIC_ACQUIRE);
+			c = rte_atomic_load_explicit(&cnt->cnt, rte_memory_order_acquire);
 			__RTE_RCU_DP_LOG(DEBUG,
 				"%s: status: token = %" PRIu64 ", wait = %d, Thread QS cnt = %" PRIu64 ", Thread ID = %d",
 				__func__, t, wait, c, i);
@@ -628,8 +628,8 @@ struct rte_rcu_qsbr_dq_parameters {
 	 * no need to update this very accurately using compare-and-swap.
 	 */
 	if (acked_token != __RTE_QSBR_CNT_MAX)
-		__atomic_store_n(&v->acked_token, acked_token,
-			__ATOMIC_RELAXED);
+		rte_atomic_store_explicit(&v->acked_token, acked_token,
+			rte_memory_order_relaxed);
 
 	return 1;
 }
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 10/19] pdump: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (8 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 09/19] rcu: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 11/19] stack: " Tyler Retzlaff
                     ` (10 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
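
For reference, a small sketch of the converted statistics pattern:
relaxed counters on the data path and relaxed loads when taking a
snapshot. The struct and function names are made up for the example:

#include <inttypes.h>
#include <stdio.h>
#include <rte_stdatomic.h>

/* Hypothetical capture statistics, one relaxed counter per event type. */
struct cap_stats {
	RTE_ATOMIC(uint64_t) accepted;
	RTE_ATOMIC(uint64_t) filtered;
};

/* Data-path side: counted with relaxed ordering, no synchronization implied */
static inline void
cap_count(struct cap_stats *st, uint64_t accepted, uint64_t filtered)
{
	rte_atomic_fetch_add_explicit(&st->accepted, accepted,
			rte_memory_order_relaxed);
	rte_atomic_fetch_add_explicit(&st->filtered, filtered,
			rte_memory_order_relaxed);
}

/* Control-path side: a relaxed load is enough for a statistics snapshot */
static void
cap_dump(const struct cap_stats *st)
{
	printf("accepted=%" PRIu64 " filtered=%" PRIu64 "\n",
		rte_atomic_load_explicit(&st->accepted, rte_memory_order_relaxed),
		rte_atomic_load_explicit(&st->filtered, rte_memory_order_relaxed));
}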

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/pdump/rte_pdump.c | 14 +++++++-------
 lib/pdump/rte_pdump.h |  8 ++++----
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/lib/pdump/rte_pdump.c b/lib/pdump/rte_pdump.c
index 53cca10..80b90c6 100644
--- a/lib/pdump/rte_pdump.c
+++ b/lib/pdump/rte_pdump.c
@@ -110,8 +110,8 @@ struct pdump_response {
 		 * then packet doesn't match the filter (will be ignored).
 		 */
 		if (cbs->filter && rcs[i] == 0) {
-			__atomic_fetch_add(&stats->filtered,
-					   1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&stats->filtered,
+					   1, rte_memory_order_relaxed);
 			continue;
 		}
 
@@ -127,18 +127,18 @@ struct pdump_response {
 			p = rte_pktmbuf_copy(pkts[i], mp, 0, cbs->snaplen);
 
 		if (unlikely(p == NULL))
-			__atomic_fetch_add(&stats->nombuf, 1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&stats->nombuf, 1, rte_memory_order_relaxed);
 		else
 			dup_bufs[d_pkts++] = p;
 	}
 
-	__atomic_fetch_add(&stats->accepted, d_pkts, __ATOMIC_RELAXED);
+	rte_atomic_fetch_add_explicit(&stats->accepted, d_pkts, rte_memory_order_relaxed);
 
 	ring_enq = rte_ring_enqueue_burst(ring, (void *)&dup_bufs[0], d_pkts, NULL);
 	if (unlikely(ring_enq < d_pkts)) {
 		unsigned int drops = d_pkts - ring_enq;
 
-		__atomic_fetch_add(&stats->ringfull, drops, __ATOMIC_RELAXED);
+		rte_atomic_fetch_add_explicit(&stats->ringfull, drops, rte_memory_order_relaxed);
 		rte_pktmbuf_free_bulk(&dup_bufs[ring_enq], drops);
 	}
 }
@@ -720,10 +720,10 @@ struct pdump_response {
 	uint16_t qid;
 
 	for (qid = 0; qid < nq; qid++) {
-		const uint64_t *perq = (const uint64_t *)&stats[port][qid];
+		const RTE_ATOMIC(uint64_t) *perq = (const uint64_t __rte_atomic *)&stats[port][qid];
 
 		for (i = 0; i < sizeof(*total) / sizeof(uint64_t); i++) {
-			val = __atomic_load_n(&perq[i], __ATOMIC_RELAXED);
+			val = rte_atomic_load_explicit(&perq[i], rte_memory_order_relaxed);
 			sum[i] += val;
 		}
 	}
diff --git a/lib/pdump/rte_pdump.h b/lib/pdump/rte_pdump.h
index b1a3918..7feb2b6 100644
--- a/lib/pdump/rte_pdump.h
+++ b/lib/pdump/rte_pdump.h
@@ -233,10 +233,10 @@ enum {
  * The statistics are sum of both receive and transmit queues.
  */
 struct rte_pdump_stats {
-	uint64_t accepted; /**< Number of packets accepted by filter. */
-	uint64_t filtered; /**< Number of packets rejected by filter. */
-	uint64_t nombuf;   /**< Number of mbuf allocation failures. */
-	uint64_t ringfull; /**< Number of missed packets due to ring full. */
+	RTE_ATOMIC(uint64_t) accepted; /**< Number of packets accepted by filter. */
+	RTE_ATOMIC(uint64_t) filtered; /**< Number of packets rejected by filter. */
+	RTE_ATOMIC(uint64_t) nombuf;   /**< Number of mbuf allocation failures. */
+	RTE_ATOMIC(uint64_t) ringfull; /**< Number of missed packets due to ring full. */
 
 	uint64_t reserved[4]; /**< Reserved and pad to cache line */
 };
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 11/19] stack: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (9 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 10/19] pdump: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 12/19] telemetry: " Tyler Retzlaff
                     ` (9 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
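
The notable mapping here is the weak flag of __atomic_compare_exchange_n():
a weak flag of 1 becomes the _weak_ variant of the new API, which may fail
spuriously and must be retried. A sketch of the length-reservation loop,
with illustrative names:

#include <stdint.h>
#include <stdbool.h>
#include <rte_stdatomic.h>

/* Hypothetical lock-free length counter, as used to reserve elements. */
static RTE_ATOMIC(uint64_t) lf_len;

/* Try to reserve num elements; 'len' is updated with the current value
 * whenever the weak CAS fails.
 */
static inline bool
lf_reserve(unsigned int num)
{
	uint64_t len = rte_atomic_load_explicit(&lf_len,
			rte_memory_order_relaxed);

	while (1) {
		if (len < num)
			return false;	/* not enough elements */
		if (rte_atomic_compare_exchange_weak_explicit(&lf_len,
				&len, len - num,
				rte_memory_order_acquire,
				rte_memory_order_relaxed))
			return true;
	}
}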

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/stack/rte_stack.h            |  2 +-
 lib/stack/rte_stack_lf_c11.h     | 24 ++++++++++++------------
 lib/stack/rte_stack_lf_generic.h | 18 +++++++++---------
 3 files changed, 22 insertions(+), 22 deletions(-)

diff --git a/lib/stack/rte_stack.h b/lib/stack/rte_stack.h
index 921d29a..a379300 100644
--- a/lib/stack/rte_stack.h
+++ b/lib/stack/rte_stack.h
@@ -44,7 +44,7 @@ struct rte_stack_lf_list {
 	/** List head */
 	struct rte_stack_lf_head head __rte_aligned(16);
 	/** List len */
-	uint64_t len;
+	RTE_ATOMIC(uint64_t) len;
 };
 
 /* Structure containing two lock-free LIFO lists: the stack itself and a list
diff --git a/lib/stack/rte_stack_lf_c11.h b/lib/stack/rte_stack_lf_c11.h
index 687a6f6..9cb6998 100644
--- a/lib/stack/rte_stack_lf_c11.h
+++ b/lib/stack/rte_stack_lf_c11.h
@@ -26,8 +26,8 @@
 	 * elements. If the mempool is near-empty to the point that this is a
 	 * concern, the user should consider increasing the mempool size.
 	 */
-	return (unsigned int)__atomic_load_n(&s->stack_lf.used.len,
-					     __ATOMIC_RELAXED);
+	return (unsigned int)rte_atomic_load_explicit(&s->stack_lf.used.len,
+					     rte_memory_order_relaxed);
 }
 
 static __rte_always_inline void
@@ -59,14 +59,14 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				1, __ATOMIC_RELEASE,
-				__ATOMIC_RELAXED);
+				1, rte_memory_order_release,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 
 	/* Ensure the stack modifications are not reordered with respect
 	 * to the LIFO len update.
 	 */
-	__atomic_fetch_add(&list->len, num, __ATOMIC_RELEASE);
+	rte_atomic_fetch_add_explicit(&list->len, num, rte_memory_order_release);
 }
 
 static __rte_always_inline struct rte_stack_lf_elem *
@@ -80,7 +80,7 @@
 	int success;
 
 	/* Reserve num elements, if available */
-	len = __atomic_load_n(&list->len, __ATOMIC_RELAXED);
+	len = rte_atomic_load_explicit(&list->len, rte_memory_order_relaxed);
 
 	while (1) {
 		/* Does the list contain enough elements? */
@@ -88,10 +88,10 @@
 			return NULL;
 
 		/* len is updated on failure */
-		if (__atomic_compare_exchange_n(&list->len,
+		if (rte_atomic_compare_exchange_weak_explicit(&list->len,
 						&len, len - num,
-						1, __ATOMIC_ACQUIRE,
-						__ATOMIC_RELAXED))
+						rte_memory_order_acquire,
+						rte_memory_order_relaxed))
 			break;
 	}
 
@@ -110,7 +110,7 @@
 		 * elements are properly ordered with respect to the head
 		 * pointer read.
 		 */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		rte_atomic_thread_fence(rte_memory_order_acquire);
 
 		rte_prefetch0(old_head.top);
 
@@ -159,8 +159,8 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				0, __ATOMIC_RELAXED,
-				__ATOMIC_RELAXED);
+				0, rte_memory_order_relaxed,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 
 	return old_head.top;
diff --git a/lib/stack/rte_stack_lf_generic.h b/lib/stack/rte_stack_lf_generic.h
index 39f7ff3..cc69e4d 100644
--- a/lib/stack/rte_stack_lf_generic.h
+++ b/lib/stack/rte_stack_lf_generic.h
@@ -27,7 +27,7 @@
 	 * concern, the user should consider increasing the mempool size.
 	 */
 	/* NOTE: review for potential ordering optimization */
-	return __atomic_load_n(&s->stack_lf.used.len, __ATOMIC_SEQ_CST);
+	return rte_atomic_load_explicit(&s->stack_lf.used.len, rte_memory_order_seq_cst);
 }
 
 static __rte_always_inline void
@@ -64,11 +64,11 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				1, __ATOMIC_RELEASE,
-				__ATOMIC_RELAXED);
+				1, rte_memory_order_release,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 	/* NOTE: review for potential ordering optimization */
-	__atomic_fetch_add(&list->len, num, __ATOMIC_SEQ_CST);
+	rte_atomic_fetch_add_explicit(&list->len, num, rte_memory_order_seq_cst);
 }
 
 static __rte_always_inline struct rte_stack_lf_elem *
@@ -83,15 +83,15 @@
 	/* Reserve num elements, if available */
 	while (1) {
 		/* NOTE: review for potential ordering optimization */
-		uint64_t len = __atomic_load_n(&list->len, __ATOMIC_SEQ_CST);
+		uint64_t len = rte_atomic_load_explicit(&list->len, rte_memory_order_seq_cst);
 
 		/* Does the list contain enough elements? */
 		if (unlikely(len < num))
 			return NULL;
 
 		/* NOTE: review for potential ordering optimization */
-		if (__atomic_compare_exchange_n(&list->len, &len, len - num,
-				0, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST))
+		if (rte_atomic_compare_exchange_strong_explicit(&list->len, &len, len - num,
+				rte_memory_order_seq_cst, rte_memory_order_seq_cst))
 			break;
 	}
 
@@ -143,8 +143,8 @@
 				(rte_int128_t *)&list->head,
 				(rte_int128_t *)&old_head,
 				(rte_int128_t *)&new_head,
-				1, __ATOMIC_RELEASE,
-				__ATOMIC_RELAXED);
+				1, rte_memory_order_release,
+				rte_memory_order_relaxed);
 	} while (success == 0);
 
 	return old_head.top;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 12/19] telemetry: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (10 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 11/19] stack: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 13/19] vhost: " Tyler Retzlaff
                     ` (8 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
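
A minimal sketch of the converted client-count pattern, assuming (as in
telemetry) that a single accept loop performs the check so the
check-then-add sequence is not racy; the names and limit are illustrative:

#include <stdint.h>
#include <stdbool.h>
#include <rte_stdatomic.h>

#define MAX_CLIENTS 10	/* illustrative connection limit */

static RTE_ATOMIC(uint16_t) num_clients;

/* Called from the single accept loop; relaxed ordering is fine because
 * the counter only bounds concurrency and publishes no other data.
 */
static inline bool
client_try_add(void)
{
	if (rte_atomic_load_explicit(&num_clients,
			rte_memory_order_relaxed) >= MAX_CLIENTS)
		return false;
	rte_atomic_fetch_add_explicit(&num_clients, 1,
			rte_memory_order_relaxed);
	return true;
}

/* Called when a client thread exits */
static inline void
client_remove(void)
{
	rte_atomic_fetch_sub_explicit(&num_clients, 1,
			rte_memory_order_relaxed);
}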

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/telemetry/telemetry.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/lib/telemetry/telemetry.c b/lib/telemetry/telemetry.c
index aeb078c..9298284 100644
--- a/lib/telemetry/telemetry.c
+++ b/lib/telemetry/telemetry.c
@@ -45,7 +45,7 @@ struct socket {
 	int sock;
 	char path[sizeof(((struct sockaddr_un *)0)->sun_path)];
 	handler fn;
-	uint16_t *num_clients;
+	RTE_ATOMIC(uint16_t) *num_clients;
 };
 static struct socket v2_socket; /* socket for v2 telemetry */
 static struct socket v1_socket; /* socket for v1 telemetry */
@@ -64,7 +64,7 @@ struct socket {
 /* Used when accessing or modifying list of command callbacks */
 static rte_spinlock_t callback_sl = RTE_SPINLOCK_INITIALIZER;
 #ifndef RTE_EXEC_ENV_WINDOWS
-static uint16_t v2_clients;
+static RTE_ATOMIC(uint16_t) v2_clients;
 #endif /* !RTE_EXEC_ENV_WINDOWS */
 
 int
@@ -404,7 +404,7 @@ struct socket {
 		bytes = read(s, buffer, sizeof(buffer) - 1);
 	}
 	close(s);
-	__atomic_fetch_sub(&v2_clients, 1, __ATOMIC_RELAXED);
+	rte_atomic_fetch_sub_explicit(&v2_clients, 1, rte_memory_order_relaxed);
 	return NULL;
 }
 
@@ -421,14 +421,14 @@ struct socket {
 			return NULL;
 		}
 		if (s->num_clients != NULL) {
-			uint16_t conns = __atomic_load_n(s->num_clients,
-					__ATOMIC_RELAXED);
+			uint16_t conns = rte_atomic_load_explicit(s->num_clients,
+					rte_memory_order_relaxed);
 			if (conns >= MAX_CONNECTIONS) {
 				close(s_accepted);
 				continue;
 			}
-			__atomic_fetch_add(s->num_clients, 1,
-					__ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(s->num_clients, 1,
+					rte_memory_order_relaxed);
 		}
 		rc = pthread_create(&th, NULL, s->fn,
 				    (void *)(uintptr_t)s_accepted);
@@ -437,8 +437,8 @@ struct socket {
 				 strerror(rc));
 			close(s_accepted);
 			if (s->num_clients != NULL)
-				__atomic_fetch_sub(s->num_clients, 1,
-						   __ATOMIC_RELAXED);
+				rte_atomic_fetch_sub_explicit(s->num_clients, 1,
+						   rte_memory_order_relaxed);
 			continue;
 		}
 		pthread_detach(th);
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 13/19] vhost: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (11 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 12/19] telemetry: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26 11:57     ` Maxime Coquelin
  2023-10-26  0:31   ` [PATCH v3 14/19] cryptodev: " Tyler Retzlaff
                     ` (7 subsequent siblings)
  20 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.
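
Most of this patch is mechanical, but guest-visible ring fields cannot
change type, so they are cast with __rte_atomic at the access site. A
minimal sketch of that pattern; the struct and function names are
invented for the example:

#include <stdint.h>
#include <rte_stdatomic.h>

/* Guest-visible split-ring used index, declared without RTE_ATOMIC()
 * because its layout is fixed by the virtio spec.
 */
struct used_ring {
	uint16_t idx;
};

/* The field is still accessed through the new API by qualifying the
 * pointer with __rte_atomic at the access site; the release store is
 * the synchronization point that publishes the descriptors written
 * before it.
 */
static inline void
used_idx_publish(struct used_ring *used, uint16_t idx)
{
	rte_atomic_store_explicit((unsigned short __rte_atomic *)&used->idx,
			idx, rte_memory_order_release);
}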

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/vhost/vdpa.c            |  3 ++-
 lib/vhost/vhost.c           | 42 ++++++++++++++++----------------
 lib/vhost/vhost.h           | 39 ++++++++++++++++--------------
 lib/vhost/vhost_user.c      |  6 ++---
 lib/vhost/virtio_net.c      | 58 +++++++++++++++++++++++++--------------------
 lib/vhost/virtio_net_ctrl.c |  6 +++--
 6 files changed, 84 insertions(+), 70 deletions(-)

diff --git a/lib/vhost/vdpa.c b/lib/vhost/vdpa.c
index 6284ea2..219eef8 100644
--- a/lib/vhost/vdpa.c
+++ b/lib/vhost/vdpa.c
@@ -235,7 +235,8 @@ struct rte_vdpa_device *
 	}
 
 	/* used idx is the synchronization point for the split vring */
-	__atomic_store_n(&vq->used->idx, idx_m, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit((unsigned short __rte_atomic *)&vq->used->idx,
+		idx_m, rte_memory_order_release);
 
 	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))
 		vring_used_event(s_vring) = idx_m;
diff --git a/lib/vhost/vhost.c b/lib/vhost/vhost.c
index 7fde412..bdcf85b 100644
--- a/lib/vhost/vhost.c
+++ b/lib/vhost/vhost.c
@@ -128,12 +128,13 @@ struct vhost_vq_stats_name_off {
 {
 #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 70100)
 	/*
-	 * __sync_ built-ins are deprecated, but __atomic_ ones
+	 * __sync_ built-ins are deprecated, but rte_atomic_ ones
 	 * are sub-optimized in older GCC versions.
 	 */
 	__sync_fetch_and_or_1(addr, (1U << nr));
 #else
-	__atomic_fetch_or(addr, (1U << nr), __ATOMIC_RELAXED);
+	rte_atomic_fetch_or_explicit((volatile uint8_t __rte_atomic *)addr, (1U << nr),
+		rte_memory_order_relaxed);
 #endif
 }
 
@@ -155,7 +156,7 @@ struct vhost_vq_stats_name_off {
 		return;
 
 	/* To make sure guest memory updates are committed before logging */
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	page = addr / VHOST_LOG_PAGE;
 	while (page * VHOST_LOG_PAGE < addr + len) {
@@ -197,7 +198,7 @@ struct vhost_vq_stats_name_off {
 	if (unlikely(!vq->log_cache))
 		return;
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	log_base = (unsigned long *)(uintptr_t)dev->log_base;
 
@@ -206,17 +207,18 @@ struct vhost_vq_stats_name_off {
 
 #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 70100)
 		/*
-		 * '__sync' builtins are deprecated, but '__atomic' ones
+		 * '__sync' builtins are deprecated, but 'rte_atomic' ones
 		 * are sub-optimized in older GCC versions.
 		 */
 		__sync_fetch_and_or(log_base + elem->offset, elem->val);
 #else
-		__atomic_fetch_or(log_base + elem->offset, elem->val,
-				__ATOMIC_RELAXED);
+		rte_atomic_fetch_or_explicit(
+			(unsigned long __rte_atomic *)(log_base + elem->offset),
+			elem->val, rte_memory_order_relaxed);
 #endif
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	vq->log_cache_nb_elem = 0;
 }
@@ -231,7 +233,7 @@ struct vhost_vq_stats_name_off {
 
 	if (unlikely(!vq->log_cache)) {
 		/* No logging cache allocated, write dirty log map directly */
-		rte_atomic_thread_fence(__ATOMIC_RELEASE);
+		rte_atomic_thread_fence(rte_memory_order_release);
 		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
 
 		return;
@@ -251,7 +253,7 @@ struct vhost_vq_stats_name_off {
 		 * No more room for a new log cache entry,
 		 * so write the dirty log map directly.
 		 */
-		rte_atomic_thread_fence(__ATOMIC_RELEASE);
+		rte_atomic_thread_fence(rte_memory_order_release);
 		vhost_log_page((uint8_t *)(uintptr_t)dev->log_base, page);
 
 		return;
@@ -1184,11 +1186,11 @@ struct vhost_vq_stats_name_off {
 	if (unlikely(idx >= vq->size))
 		return -1;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	vq->inflight_split->desc[idx].inflight = 0;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	vq->inflight_split->used_idx = last_used_idx;
 	return 0;
@@ -1227,11 +1229,11 @@ struct vhost_vq_stats_name_off {
 	if (unlikely(head >= vq->size))
 		return -1;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	inflight_info->desc[head].inflight = 0;
 
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	inflight_info->old_free_head = inflight_info->free_head;
 	inflight_info->old_used_idx = inflight_info->used_idx;
@@ -1454,7 +1456,7 @@ struct vhost_vq_stats_name_off {
 			vq->avail_wrap_counter << 15;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	vq->device_event->flags = flags;
 	return 0;
@@ -1519,16 +1521,16 @@ struct vhost_vq_stats_name_off {
 
 	rte_rwlock_read_lock(&vq->access_lock);
 
-	__atomic_store_n(&vq->irq_pending, false, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&vq->irq_pending, false, rte_memory_order_release);
 
 	if (dev->backend_ops->inject_irq(dev, vq)) {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-			__atomic_fetch_add(&vq->stats.guest_notifications_error,
-					1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications_error,
+					1, rte_memory_order_relaxed);
 	} else {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-			__atomic_fetch_add(&vq->stats.guest_notifications,
-					1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications,
+					1, rte_memory_order_relaxed);
 		if (dev->notify_ops->guest_notified)
 			dev->notify_ops->guest_notified(dev->vid);
 	}
diff --git a/lib/vhost/vhost.h b/lib/vhost/vhost.h
index 5fc9035..f8624fb 100644
--- a/lib/vhost/vhost.h
+++ b/lib/vhost/vhost.h
@@ -158,9 +158,9 @@ struct virtqueue_stats {
 	uint64_t inflight_completed;
 	uint64_t guest_notifications_suppressed;
 	/* Counters below are atomic, and should be incremented as such. */
-	uint64_t guest_notifications;
-	uint64_t guest_notifications_offloaded;
-	uint64_t guest_notifications_error;
+	RTE_ATOMIC(uint64_t) guest_notifications;
+	RTE_ATOMIC(uint64_t) guest_notifications_offloaded;
+	RTE_ATOMIC(uint64_t) guest_notifications_error;
 };
 
 /**
@@ -348,7 +348,7 @@ struct vhost_virtqueue {
 	struct vhost_vring_addr ring_addrs;
 	struct virtqueue_stats	stats;
 
-	bool irq_pending;
+	RTE_ATOMIC(bool) irq_pending;
 } __rte_cache_aligned;
 
 /* Virtio device status as per Virtio specification */
@@ -486,7 +486,7 @@ struct virtio_net {
 	uint32_t		flags;
 	uint16_t		vhost_hlen;
 	/* to tell if we need broadcast rarp packet */
-	int16_t			broadcast_rarp;
+	RTE_ATOMIC(int16_t)	broadcast_rarp;
 	uint32_t		nr_vring;
 	int			async_copy;
 
@@ -557,7 +557,8 @@ struct virtio_net {
 static inline bool
 desc_is_avail(struct vring_packed_desc *desc, bool wrap_counter)
 {
-	uint16_t flags = __atomic_load_n(&desc->flags, __ATOMIC_ACQUIRE);
+	uint16_t flags = rte_atomic_load_explicit((unsigned short __rte_atomic *)&desc->flags,
+		rte_memory_order_acquire);
 
 	return wrap_counter == !!(flags & VRING_DESC_F_AVAIL) &&
 		wrap_counter != !!(flags & VRING_DESC_F_USED);
@@ -914,17 +915,19 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	bool expected = false;
 
 	if (dev->notify_ops->guest_notify) {
-		if (__atomic_compare_exchange_n(&vq->irq_pending, &expected, true, 0,
-				  __ATOMIC_RELEASE, __ATOMIC_RELAXED)) {
+		if (rte_atomic_compare_exchange_strong_explicit(&vq->irq_pending, &expected, true,
+				  rte_memory_order_release, rte_memory_order_relaxed)) {
 			if (dev->notify_ops->guest_notify(dev->vid, vq->index)) {
 				if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-					__atomic_fetch_add(&vq->stats.guest_notifications_offloaded,
-						1, __ATOMIC_RELAXED);
+					rte_atomic_fetch_add_explicit(
+						&vq->stats.guest_notifications_offloaded,
+						1, rte_memory_order_relaxed);
 				return;
 			}
 
 			/* Offloading failed, fallback to direct IRQ injection */
-			__atomic_store_n(&vq->irq_pending, false, __ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&vq->irq_pending, false,
+				rte_memory_order_release);
 		} else {
 			vq->stats.guest_notifications_suppressed++;
 			return;
@@ -933,14 +936,14 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 
 	if (dev->backend_ops->inject_irq(dev, vq)) {
 		if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-			__atomic_fetch_add(&vq->stats.guest_notifications_error,
-				1, __ATOMIC_RELAXED);
+			rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications_error,
+				1, rte_memory_order_relaxed);
 		return;
 	}
 
 	if (dev->flags & VIRTIO_DEV_STATS_ENABLED)
-		__atomic_fetch_add(&vq->stats.guest_notifications,
-			1, __ATOMIC_RELAXED);
+		rte_atomic_fetch_add_explicit(&vq->stats.guest_notifications,
+			1, rte_memory_order_relaxed);
 	if (dev->notify_ops->guest_notified)
 		dev->notify_ops->guest_notified(dev->vid);
 }
@@ -949,7 +952,7 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 vhost_vring_call_split(struct virtio_net *dev, struct vhost_virtqueue *vq)
 {
 	/* Flush used->idx update before we read avail->flags. */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	/* Don't kick guest if we don't reach index specified by guest. */
 	if (dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX)) {
@@ -981,7 +984,7 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 	bool signalled_used_valid, kick = false;
 
 	/* Flush used desc update. */
-	rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+	rte_atomic_thread_fence(rte_memory_order_seq_cst);
 
 	if (!(dev->features & (1ULL << VIRTIO_RING_F_EVENT_IDX))) {
 		if (vq->driver_event->flags !=
@@ -1007,7 +1010,7 @@ uint64_t translate_log_addr(struct virtio_net *dev, struct vhost_virtqueue *vq,
 		goto kick;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+	rte_atomic_thread_fence(rte_memory_order_acquire);
 
 	off_wrap = vq->driver_event->off_wrap;
 	off = off_wrap & ~(1 << 15);
diff --git a/lib/vhost/vhost_user.c b/lib/vhost/vhost_user.c
index 901a80b..e363121 100644
--- a/lib/vhost/vhost_user.c
+++ b/lib/vhost/vhost_user.c
@@ -1914,7 +1914,7 @@ static int vhost_user_set_vring_err(struct virtio_net **pdev,
 
 	if (inflight_split->used_idx != used->idx) {
 		inflight_split->desc[last_io].inflight = 0;
-		rte_atomic_thread_fence(__ATOMIC_SEQ_CST);
+		rte_atomic_thread_fence(rte_memory_order_seq_cst);
 		inflight_split->used_idx = used->idx;
 	}
 
@@ -2418,10 +2418,10 @@ static int vhost_user_set_log_fd(struct virtio_net **pdev,
 	 * Set the flag to inject a RARP broadcast packet at
 	 * rte_vhost_dequeue_burst().
 	 *
-	 * __ATOMIC_RELEASE ordering is for making sure the mac is
+	 * rte_memory_order_release ordering is for making sure the mac is
 	 * copied before the flag is set.
 	 */
-	__atomic_store_n(&dev->broadcast_rarp, 1, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&dev->broadcast_rarp, 1, rte_memory_order_release);
 	vdpa_dev = dev->vdpa_dev;
 	if (vdpa_dev && vdpa_dev->ops->migration_done)
 		vdpa_dev->ops->migration_done(dev->vid);
diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 759a78e..8af20f1 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -298,8 +298,8 @@
 
 	vhost_log_cache_sync(dev, vq);
 
-	__atomic_fetch_add(&vq->used->idx, vq->shadow_used_idx,
-			   __ATOMIC_RELEASE);
+	rte_atomic_fetch_add_explicit((unsigned short __rte_atomic *)&vq->used->idx,
+		vq->shadow_used_idx, rte_memory_order_release);
 	vq->shadow_used_idx = 0;
 	vhost_log_used_vring(dev, vq, offsetof(struct vring_used, idx),
 		sizeof(vq->used->idx));
@@ -335,7 +335,7 @@
 	}
 
 	/* The ordering for storing desc flags needs to be enforced. */
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	for (i = 0; i < vq->shadow_used_idx; i++) {
 		uint16_t flags;
@@ -387,8 +387,9 @@
 
 	vq->desc_packed[vq->shadow_last_used_idx].id = used_elem->id;
 	/* desc flags is the synchronization point for virtio packed vring */
-	__atomic_store_n(&vq->desc_packed[vq->shadow_last_used_idx].flags,
-			 used_elem->flags, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(
+		(unsigned short __rte_atomic *)&vq->desc_packed[vq->shadow_last_used_idx].flags,
+		used_elem->flags, rte_memory_order_release);
 
 	vhost_log_cache_used_vring(dev, vq, vq->shadow_last_used_idx *
 				   sizeof(struct vring_packed_desc),
@@ -418,7 +419,7 @@
 		desc_base[i].len = lens[i];
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE) {
 		desc_base[i].flags = flags;
@@ -515,7 +516,7 @@
 		vq->desc_packed[vq->last_used_idx + i].len = 0;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 	vhost_for_each_try_unroll(i, begin, PACKED_BATCH_SIZE)
 		vq->desc_packed[vq->last_used_idx + i].flags = flags;
 
@@ -1415,7 +1416,8 @@
 	 * The ordering between avail index and
 	 * desc reads needs to be enforced.
 	 */
-	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
+	avail_head = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire);
 
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
@@ -1806,7 +1808,8 @@
 	/*
 	 * The ordering between avail index and desc reads need to be enforced.
 	 */
-	avail_head = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE);
+	avail_head = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire);
 
 	rte_prefetch0(&vq->avail->ring[vq->last_avail_idx & (vq->size - 1)]);
 
@@ -2222,7 +2225,7 @@
 	}
 
 	/* The ordering for storing desc flags needs to be enforced. */
-	rte_atomic_thread_fence(__ATOMIC_RELEASE);
+	rte_atomic_thread_fence(rte_memory_order_release);
 
 	from = async->last_buffer_idx_packed;
 
@@ -2311,7 +2314,9 @@
 			vhost_vring_call_packed(dev, vq);
 		} else {
 			write_back_completed_descs_split(vq, n_descs);
-			__atomic_fetch_add(&vq->used->idx, n_descs, __ATOMIC_RELEASE);
+			rte_atomic_fetch_add_explicit(
+				(unsigned short __rte_atomic *)&vq->used->idx,
+				n_descs, rte_memory_order_release);
 			vhost_vring_call_split(dev, vq);
 		}
 	} else {
@@ -3085,8 +3090,8 @@
 	 * The ordering between avail index and
 	 * desc reads needs to be enforced.
 	 */
-	avail_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
-			vq->last_avail_idx;
+	avail_entries = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire) - vq->last_avail_idx;
 	if (avail_entries == 0)
 		return 0;
 
@@ -3224,7 +3229,7 @@
 			return -1;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+	rte_atomic_thread_fence(rte_memory_order_acquire);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		lens[i] = descs[avail_idx + i].len;
@@ -3297,7 +3302,7 @@
 			return -1;
 	}
 
-	rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
+	rte_atomic_thread_fence(rte_memory_order_acquire);
 
 	vhost_for_each_try_unroll(i, 0, PACKED_BATCH_SIZE)
 		lens[i] = descs[avail_idx + i].len;
@@ -3590,7 +3595,7 @@
 	 *
 	 * broadcast_rarp shares a cacheline in the virtio_net structure
 	 * with some fields that are accessed during enqueue and
-	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * rte_atomic_compare_exchange_strong_explicit causes a write if performed compare
 	 * and exchange. This could result in false sharing between enqueue
 	 * and dequeue.
 	 *
@@ -3598,9 +3603,9 @@
 	 * and only performing compare and exchange if the read indicates it
 	 * is likely to be set.
 	 */
-	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
-			__atomic_compare_exchange_n(&dev->broadcast_rarp,
-			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+	if (unlikely(rte_atomic_load_explicit(&dev->broadcast_rarp, rte_memory_order_acquire) &&
+			rte_atomic_compare_exchange_strong_explicit(&dev->broadcast_rarp,
+			&success, 0, rte_memory_order_release, rte_memory_order_relaxed))) {
 
 		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
 		if (rarp_mbuf == NULL) {
@@ -3683,7 +3688,8 @@
 		vhost_vring_call_packed(dev, vq);
 	} else {
 		write_back_completed_descs_split(vq, nr_cpl_pkts);
-		__atomic_fetch_add(&vq->used->idx, nr_cpl_pkts, __ATOMIC_RELEASE);
+		rte_atomic_fetch_add_explicit((unsigned short __rte_atomic *)&vq->used->idx,
+			nr_cpl_pkts, rte_memory_order_release);
 		vhost_vring_call_split(dev, vq);
 	}
 	vq->async->pkts_inflight_n -= nr_cpl_pkts;
@@ -3714,8 +3720,8 @@
 	 * The ordering between avail index and
 	 * desc reads needs to be enforced.
 	 */
-	avail_entries = __atomic_load_n(&vq->avail->idx, __ATOMIC_ACQUIRE) -
-			vq->last_avail_idx;
+	avail_entries = rte_atomic_load_explicit((unsigned short __rte_atomic *)&vq->avail->idx,
+		rte_memory_order_acquire) - vq->last_avail_idx;
 	if (avail_entries == 0)
 		goto out;
 
@@ -4204,7 +4210,7 @@
 	 *
 	 * broadcast_rarp shares a cacheline in the virtio_net structure
 	 * with some fields that are accessed during enqueue and
-	 * __atomic_compare_exchange_n causes a write if performed compare
+	 * rte_atomic_compare_exchange_strong_explicit causes a write if performed compare
 	 * and exchange. This could result in false sharing between enqueue
 	 * and dequeue.
 	 *
@@ -4212,9 +4218,9 @@
 	 * and only performing compare and exchange if the read indicates it
 	 * is likely to be set.
 	 */
-	if (unlikely(__atomic_load_n(&dev->broadcast_rarp, __ATOMIC_ACQUIRE) &&
-			__atomic_compare_exchange_n(&dev->broadcast_rarp,
-			&success, 0, 0, __ATOMIC_RELEASE, __ATOMIC_RELAXED))) {
+	if (unlikely(rte_atomic_load_explicit(&dev->broadcast_rarp, rte_memory_order_acquire) &&
+			rte_atomic_compare_exchange_strong_explicit(&dev->broadcast_rarp,
+			&success, 0, rte_memory_order_release, rte_memory_order_relaxed))) {
 
 		rarp_mbuf = rte_net_make_rarp_packet(mbuf_pool, &dev->mac);
 		if (rarp_mbuf == NULL) {
diff --git a/lib/vhost/virtio_net_ctrl.c b/lib/vhost/virtio_net_ctrl.c
index 6b583a0..c4847f8 100644
--- a/lib/vhost/virtio_net_ctrl.c
+++ b/lib/vhost/virtio_net_ctrl.c
@@ -33,7 +33,8 @@ struct virtio_net_ctrl_elem {
 	uint8_t *ctrl_req;
 	struct vring_desc *descs;
 
-	avail_idx = __atomic_load_n(&cvq->avail->idx, __ATOMIC_ACQUIRE);
+	avail_idx = rte_atomic_load_explicit((unsigned short __rte_atomic *)&cvq->avail->idx,
+		rte_memory_order_acquire);
 	if (avail_idx == cvq->last_avail_idx) {
 		VHOST_LOG_CONFIG(dev->ifname, DEBUG, "Control queue empty\n");
 		return 0;
@@ -236,7 +237,8 @@ struct virtio_net_ctrl_elem {
 	if (cvq->last_used_idx >= cvq->size)
 		cvq->last_used_idx -= cvq->size;
 
-	__atomic_store_n(&cvq->used->idx, cvq->last_used_idx, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit((unsigned short __rte_atomic *)&cvq->used->idx,
+		cvq->last_used_idx, rte_memory_order_release);
 
 	vhost_vring_call_split(dev, dev->cvq);
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread
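
A note on the casts used throughout the vhost conversion above: fields that
live in the guest-shared vring layout (for example the split ring's uint16_t
used->idx) cannot have their declared type changed to RTE_ATOMIC(), so the
call sites qualify the pointer instead. A minimal sketch of that pattern,
assuming only the rte_stdatomic.h wrappers this series relies on; the struct
and function below are illustrative, not the vhost definitions:

#include <stdint.h>
#include <rte_stdatomic.h>

/* illustrative stand-in for a guest-shared ring whose layout is fixed */
struct ring_used {
        uint16_t flags;
        uint16_t idx;   /* cannot be re-declared as RTE_ATOMIC(uint16_t) */
};

static inline void
used_idx_add(struct ring_used *used, uint16_t nb_desc)
{
        /* qualify the existing field at the call site instead of
         * changing its declared type, as the vhost hunks above do
         */
        rte_atomic_fetch_add_explicit(
                (unsigned short __rte_atomic *)&used->idx,
                nb_desc, rte_memory_order_release);
}

The intent is that in a non-stdatomic build the __rte_atomic qualifier
expands to nothing and the wrapper maps back to the gcc builtin, so the cast
is expected to be a no-op there.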

* [PATCH v3 14/19] cryptodev: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (12 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 13/19] vhost: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26 15:53     ` [EXT] " Akhil Goyal
  2023-10-27 13:05     ` Konstantin Ananyev
  2023-10-26  0:31   ` [PATCH v3 15/19] distributor: " Tyler Retzlaff
                     ` (6 subsequent siblings)
  20 siblings, 2 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/cryptodev/rte_cryptodev.c | 22 ++++++++++++----------
 lib/cryptodev/rte_cryptodev.h | 16 ++++++++--------
 2 files changed, 20 insertions(+), 18 deletions(-)
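
The hunks below follow the callback-list pattern this series converts in
several libraries: the link field becomes RTE_ATOMIC(), the writer publishes
a node with a release store, and readers traverse with relaxed loads because
the address dependency on the loaded pointer orders the subsequent field
accesses. A minimal sketch of that pattern under the rte_stdatomic.h
wrappers (the types and names are illustrative, not the cryptodev ones):

#include <stddef.h>
#include <rte_stdatomic.h>

struct cb {
        RTE_ATOMIC(struct cb *) next;   /* link re-declared as atomic */
        void (*fn)(void *);
        void *param;
};

/* writer: stores to fn/param must complete before the node is linked */
static void
cb_insert_head(RTE_ATOMIC(struct cb *) *head, struct cb *new_cb)
{
        new_cb->next = NULL;    /* node not yet reachable by readers */
        rte_atomic_store_explicit(head, new_cb, rte_memory_order_release);
}

/* reader: the address dependency on the loaded pointer orders the
 * later fn/param reads, so relaxed loads are sufficient here
 */
static void
cb_walk(RTE_ATOMIC(struct cb *) *head, void *arg)
{
        struct cb *c;

        c = rte_atomic_load_explicit(head, rte_memory_order_relaxed);
        while (c != NULL) {
                c->fn(arg);
                c = rte_atomic_load_explicit(&c->next, rte_memory_order_relaxed);
        }
}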

diff --git a/lib/cryptodev/rte_cryptodev.c b/lib/cryptodev/rte_cryptodev.c
index 314710b..b258827 100644
--- a/lib/cryptodev/rte_cryptodev.c
+++ b/lib/cryptodev/rte_cryptodev.c
@@ -1535,12 +1535,12 @@ struct rte_cryptodev_cb *
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	} else {
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&list->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&list->next, cb, rte_memory_order_release);
 	}
 
 	rte_spinlock_unlock(&rte_cryptodev_callback_lock);
@@ -1555,7 +1555,8 @@ struct rte_cryptodev_cb *
 				  struct rte_cryptodev_cb *cb)
 {
 	struct rte_cryptodev *dev;
-	struct rte_cryptodev_cb **prev_cb, *curr_cb;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) *prev_cb;
+	struct rte_cryptodev_cb *curr_cb;
 	struct rte_cryptodev_cb_rcu *list;
 	int ret;
 
@@ -1601,8 +1602,8 @@ struct rte_cryptodev_cb *
 		curr_cb = *prev_cb;
 		if (curr_cb == cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, curr_cb->next,
-				__ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, curr_cb->next,
+				rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
@@ -1673,12 +1674,12 @@ struct rte_cryptodev_cb *
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	} else {
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&list->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&list->next, cb, rte_memory_order_release);
 	}
 
 	rte_spinlock_unlock(&rte_cryptodev_callback_lock);
@@ -1694,7 +1695,8 @@ struct rte_cryptodev_cb *
 				  struct rte_cryptodev_cb *cb)
 {
 	struct rte_cryptodev *dev;
-	struct rte_cryptodev_cb **prev_cb, *curr_cb;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) *prev_cb;
+	struct rte_cryptodev_cb *curr_cb;
 	struct rte_cryptodev_cb_rcu *list;
 	int ret;
 
@@ -1740,8 +1742,8 @@ struct rte_cryptodev_cb *
 		curr_cb = *prev_cb;
 		if (curr_cb == cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, curr_cb->next,
-				__ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, curr_cb->next,
+				rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index be0698c..9092118 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -979,7 +979,7 @@ struct rte_cryptodev_config {
  * queue pair on enqueue/dequeue.
  */
 struct rte_cryptodev_cb {
-	struct rte_cryptodev_cb *next;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) next;
 	/**< Pointer to next callback */
 	rte_cryptodev_callback_fn fn;
 	/**< Pointer to callback function */
@@ -992,7 +992,7 @@ struct rte_cryptodev_cb {
  * Structure used to hold information about the RCU for a queue pair.
  */
 struct rte_cryptodev_cb_rcu {
-	struct rte_cryptodev_cb *next;
+	RTE_ATOMIC(struct rte_cryptodev_cb *) next;
 	/**< Pointer to next callback */
 	struct rte_rcu_qsbr *qsbr;
 	/**< RCU QSBR variable per queue pair */
@@ -1947,15 +1947,15 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
 		struct rte_cryptodev_cb_rcu *list;
 		struct rte_cryptodev_cb *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
 		list = &fp_ops->qp.deq_cb[qp_id];
 		rte_rcu_qsbr_thread_online(list->qsbr, 0);
-		cb = __atomic_load_n(&list->next, __ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&list->next, rte_memory_order_relaxed);
 
 		while (cb != NULL) {
 			nb_ops = cb->fn(dev_id, qp_id, ops, nb_ops,
@@ -2014,15 +2014,15 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
 		struct rte_cryptodev_cb_rcu *list;
 		struct rte_cryptodev_cb *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
 		list = &fp_ops->qp.enq_cb[qp_id];
 		rte_rcu_qsbr_thread_online(list->qsbr, 0);
-		cb = __atomic_load_n(&list->next, __ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&list->next, rte_memory_order_relaxed);
 
 		while (cb != NULL) {
 			nb_ops = cb->fn(dev_id, qp_id, ops, nb_ops,
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 15/19] distributor: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (13 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 14/19] cryptodev: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 16/19] ethdev: " Tyler Retzlaff
                     ` (5 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/distributor/distributor_private.h |  4 +--
 lib/distributor/rte_distributor.c     | 54 +++++++++++++++++------------------
 2 files changed, 29 insertions(+), 29 deletions(-)
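
The distributor conversion below keeps the existing handshake design: the
per-worker words become volatile RTE_ATOMIC(int64_t) and the flag traffic
moves to the explicit wrappers. A minimal sketch of the worker-side flag
handshake, assuming only rte_stdatomic.h and rte_pause.h (the flag value and
function name are illustrative):

#include <stdint.h>
#include <rte_pause.h>
#include <rte_stdatomic.h>

#define GET_BUF 1       /* illustrative handshake bit */

/* worker side: wait for the scheduler to clear the flag, then set it again */
static void
request_burst(volatile RTE_ATOMIC(int64_t) *bufptr64)
{
        while (rte_atomic_load_explicit(bufptr64, rte_memory_order_acquire)
                        & GET_BUF)
                rte_pause();

        /* release pairs with the scheduler's acquire load of this word */
        rte_atomic_store_explicit(bufptr64, *bufptr64 | GET_BUF,
                rte_memory_order_release);
}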

diff --git a/lib/distributor/distributor_private.h b/lib/distributor/distributor_private.h
index 2f29343..dfeb9b5 100644
--- a/lib/distributor/distributor_private.h
+++ b/lib/distributor/distributor_private.h
@@ -113,12 +113,12 @@ enum rte_distributor_match_function {
  * There is a separate cacheline for returns in the burst API.
  */
 struct rte_distributor_buffer {
-	volatile int64_t bufptr64[RTE_DIST_BURST_SIZE]
+	volatile RTE_ATOMIC(int64_t) bufptr64[RTE_DIST_BURST_SIZE]
 		__rte_cache_aligned; /* <= outgoing to worker */
 
 	int64_t pad1 __rte_cache_aligned;    /* <= one cache line  */
 
-	volatile int64_t retptr64[RTE_DIST_BURST_SIZE]
+	volatile RTE_ATOMIC(int64_t) retptr64[RTE_DIST_BURST_SIZE]
 		__rte_cache_aligned; /* <= incoming from worker */
 
 	int64_t pad2 __rte_cache_aligned;    /* <= one cache line  */
diff --git a/lib/distributor/rte_distributor.c b/lib/distributor/rte_distributor.c
index 5ca80dd..2ecb95c 100644
--- a/lib/distributor/rte_distributor.c
+++ b/lib/distributor/rte_distributor.c
@@ -38,7 +38,7 @@
 	struct rte_distributor_buffer *buf = &(d->bufs[worker_id]);
 	unsigned int i;
 
-	volatile int64_t *retptr64;
+	volatile RTE_ATOMIC(int64_t) *retptr64;
 
 	if (unlikely(d->alg_type == RTE_DIST_ALG_SINGLE)) {
 		rte_distributor_request_pkt_single(d->d_single,
@@ -50,7 +50,7 @@
 	/* Spin while handshake bits are set (scheduler clears it).
 	 * Sync with worker on GET_BUF flag.
 	 */
-	while (unlikely(__atomic_load_n(retptr64, __ATOMIC_ACQUIRE)
+	while (unlikely(rte_atomic_load_explicit(retptr64, rte_memory_order_acquire)
 			& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))) {
 		rte_pause();
 		uint64_t t = rte_rdtsc()+100;
@@ -78,8 +78,8 @@
 	 * line is ready for processing
 	 * Sync with distributor to release retptrs
 	 */
-	__atomic_store_n(retptr64, *retptr64 | RTE_DISTRIB_GET_BUF,
-			__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(retptr64, *retptr64 | RTE_DISTRIB_GET_BUF,
+			rte_memory_order_release);
 }
 
 int
@@ -102,7 +102,7 @@
 	 * RETURN_BUF is set when distributor must retrieve in-flight packets
 	 * Sync with distributor to acquire bufptrs
 	 */
-	if (__atomic_load_n(&(buf->bufptr64[0]), __ATOMIC_ACQUIRE)
+	if (rte_atomic_load_explicit(&(buf->bufptr64[0]), rte_memory_order_acquire)
 		& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))
 		return -1;
 
@@ -120,8 +120,8 @@
 	 * on the next cacheline while we're working.
 	 * Sync with distributor on GET_BUF flag. Release bufptrs.
 	 */
-	__atomic_store_n(&(buf->bufptr64[0]),
-		buf->bufptr64[0] | RTE_DISTRIB_GET_BUF, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->bufptr64[0]),
+		buf->bufptr64[0] | RTE_DISTRIB_GET_BUF, rte_memory_order_release);
 
 	return count;
 }
@@ -177,7 +177,7 @@
 	/* Spin while handshake bits are set (scheduler clears it).
 	 * Sync with worker on GET_BUF flag.
 	 */
-	while (unlikely(__atomic_load_n(&(buf->retptr64[0]), __ATOMIC_RELAXED)
+	while (unlikely(rte_atomic_load_explicit(&(buf->retptr64[0]), rte_memory_order_relaxed)
 			& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF))) {
 		rte_pause();
 		uint64_t t = rte_rdtsc()+100;
@@ -187,7 +187,7 @@
 	}
 
 	/* Sync with distributor to acquire retptrs */
-	__atomic_thread_fence(__ATOMIC_ACQUIRE);
+	__atomic_thread_fence(rte_memory_order_acquire);
 	for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
 		/* Switch off the return bit first */
 		buf->retptr64[i] = 0;
@@ -200,15 +200,15 @@
 	 * we won't read any mbufs from there even if GET_BUF is set.
 	 * This allows distributor to retrieve in-flight already sent packets.
 	 */
-	__atomic_fetch_or(&(buf->bufptr64[0]), RTE_DISTRIB_RETURN_BUF,
-		__ATOMIC_ACQ_REL);
+	rte_atomic_fetch_or_explicit(&(buf->bufptr64[0]), RTE_DISTRIB_RETURN_BUF,
+		rte_memory_order_acq_rel);
 
 	/* set the RETURN_BUF on retptr64 even if we got no returns.
 	 * Sync with distributor on RETURN_BUF flag. Release retptrs.
 	 * Notify distributor that we don't request more packets any more.
 	 */
-	__atomic_store_n(&(buf->retptr64[0]),
-		buf->retptr64[0] | RTE_DISTRIB_RETURN_BUF, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->retptr64[0]),
+		buf->retptr64[0] | RTE_DISTRIB_RETURN_BUF, rte_memory_order_release);
 
 	return 0;
 }
@@ -297,7 +297,7 @@
 	 * to worker which does not require new packets.
 	 * They must be retrieved and assigned to another worker.
 	 */
-	if (!(__atomic_load_n(&(buf->bufptr64[0]), __ATOMIC_ACQUIRE)
+	if (!(rte_atomic_load_explicit(&(buf->bufptr64[0]), rte_memory_order_acquire)
 		& RTE_DISTRIB_GET_BUF))
 		for (i = 0; i < RTE_DIST_BURST_SIZE; i++)
 			if (buf->bufptr64[i] & RTE_DISTRIB_VALID_BUF)
@@ -310,8 +310,8 @@
 	 *     with new packets if worker will make a new request.
 	 * - clear RETURN_BUF to unlock reads on worker side.
 	 */
-	__atomic_store_n(&(buf->bufptr64[0]), RTE_DISTRIB_GET_BUF,
-		__ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->bufptr64[0]), RTE_DISTRIB_GET_BUF,
+		rte_memory_order_release);
 
 	/* Collect backlog packets from worker */
 	for (i = 0; i < d->backlog[wkr].count; i++)
@@ -348,7 +348,7 @@
 	unsigned int i;
 
 	/* Sync on GET_BUF flag. Acquire retptrs. */
-	if (__atomic_load_n(&(buf->retptr64[0]), __ATOMIC_ACQUIRE)
+	if (rte_atomic_load_explicit(&(buf->retptr64[0]), rte_memory_order_acquire)
 		& (RTE_DISTRIB_GET_BUF | RTE_DISTRIB_RETURN_BUF)) {
 		for (i = 0; i < RTE_DIST_BURST_SIZE; i++) {
 			if (buf->retptr64[i] & RTE_DISTRIB_VALID_BUF) {
@@ -379,7 +379,7 @@
 		/* Clear for the worker to populate with more returns.
 		 * Sync with distributor on GET_BUF flag. Release retptrs.
 		 */
-		__atomic_store_n(&(buf->retptr64[0]), 0, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&(buf->retptr64[0]), 0, rte_memory_order_release);
 	}
 	return count;
 }
@@ -404,7 +404,7 @@
 		return 0;
 
 	/* Sync with worker on GET_BUF flag */
-	while (!(__atomic_load_n(&(d->bufs[wkr].bufptr64[0]), __ATOMIC_ACQUIRE)
+	while (!(rte_atomic_load_explicit(&(d->bufs[wkr].bufptr64[0]), rte_memory_order_acquire)
 		& RTE_DISTRIB_GET_BUF)) {
 		handle_returns(d, wkr);
 		if (unlikely(!d->active[wkr]))
@@ -430,8 +430,8 @@
 	/* Clear the GET bit.
 	 * Sync with worker on GET_BUF flag. Release bufptrs.
 	 */
-	__atomic_store_n(&(buf->bufptr64[0]),
-		buf->bufptr64[0] & ~RTE_DISTRIB_GET_BUF, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&(buf->bufptr64[0]),
+		buf->bufptr64[0] & ~RTE_DISTRIB_GET_BUF, rte_memory_order_release);
 	return  buf->count;
 
 }
@@ -463,8 +463,8 @@
 		/* Flush out all non-full cache-lines to workers. */
 		for (wid = 0 ; wid < d->num_workers; wid++) {
 			/* Sync with worker on GET_BUF flag. */
-			if (__atomic_load_n(&(d->bufs[wid].bufptr64[0]),
-				__ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF) {
+			if (rte_atomic_load_explicit(&(d->bufs[wid].bufptr64[0]),
+				rte_memory_order_acquire) & RTE_DISTRIB_GET_BUF) {
 				d->bufs[wid].count = 0;
 				release(d, wid);
 				handle_returns(d, wid);
@@ -598,8 +598,8 @@
 	/* Flush out all non-full cache-lines to workers. */
 	for (wid = 0 ; wid < d->num_workers; wid++)
 		/* Sync with worker on GET_BUF flag. */
-		if ((__atomic_load_n(&(d->bufs[wid].bufptr64[0]),
-			__ATOMIC_ACQUIRE) & RTE_DISTRIB_GET_BUF)) {
+		if ((rte_atomic_load_explicit(&(d->bufs[wid].bufptr64[0]),
+			rte_memory_order_acquire) & RTE_DISTRIB_GET_BUF)) {
 			d->bufs[wid].count = 0;
 			release(d, wid);
 		}
@@ -700,8 +700,8 @@
 	/* throw away returns, so workers can exit */
 	for (wkr = 0; wkr < d->num_workers; wkr++)
 		/* Sync with worker. Release retptrs. */
-		__atomic_store_n(&(d->bufs[wkr].retptr64[0]), 0,
-				__ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&(d->bufs[wkr].retptr64[0]), 0,
+				rte_memory_order_release);
 
 	d->returns.start = d->returns.count = 0;
 }
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 16/19] ethdev: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (14 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 15/19] distributor: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-27 13:04     ` Konstantin Ananyev
  2023-10-26  0:31   ` [PATCH v3 17/19] hash: " Tyler Retzlaff
                     ` (4 subsequent siblings)
  20 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/ethdev/ethdev_driver.h   | 16 ++++++++--------
 lib/ethdev/ethdev_private.c  |  6 +++---
 lib/ethdev/rte_ethdev.c      | 24 ++++++++++++------------
 lib/ethdev/rte_ethdev.h      | 16 ++++++++--------
 lib/ethdev/rte_ethdev_core.h |  2 +-
 5 files changed, 32 insertions(+), 32 deletions(-)
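
rte_eth_linkstatus_set/get below show the other recurring pattern in this
series: a 64-bit struct that must be updated as one unit is punned to
RTE_ATOMIC(uint64_t) at the call site. A minimal sketch of that idea,
assuming the rte_stdatomic.h wrappers and using an illustrative status
struct rather than rte_eth_link:

#include <stdint.h>
#include <rte_stdatomic.h>

/* illustrative 64-bit status word updated atomically as a whole */
struct link_status {
        uint32_t speed;
        uint16_t duplex;
        uint16_t up;
};

static int
link_status_set(struct link_status *dst, const struct link_status *src)
{
        RTE_ATOMIC(uint64_t) *dst64 = (uint64_t __rte_atomic *)dst;
        uint64_t orig;

        _Static_assert(sizeof(*dst) == sizeof(uint64_t),
                "status must fit a single 64-bit word");

        orig = rte_atomic_exchange_explicit(dst64, *(const uint64_t *)src,
                rte_memory_order_seq_cst);

        /* non-zero when the stored value actually changed */
        return orig != *(const uint64_t *)src;
}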

diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index deb23ad..b482cd1 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -30,7 +30,7 @@
  * queue on Rx and Tx.
  */
 struct rte_eth_rxtx_callback {
-	struct rte_eth_rxtx_callback *next;
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) next;
 	union{
 		rte_rx_callback_fn rx;
 		rte_tx_callback_fn tx;
@@ -80,12 +80,12 @@ struct rte_eth_dev {
 	 * User-supplied functions called from rx_burst to post-process
 	 * received packets before passing them to the user
 	 */
-	struct rte_eth_rxtx_callback *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 	/**
 	 * User-supplied functions called from tx_burst to pre-process
 	 * received packets before passing them to the driver for transmission
 	 */
-	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
 
 	enum rte_eth_dev_state state; /**< Flag indicating the port state */
 	void *security_ctx; /**< Context for security ops */
@@ -1655,7 +1655,7 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 rte_eth_linkstatus_set(struct rte_eth_dev *dev,
 		       const struct rte_eth_link *new_link)
 {
-	uint64_t *dev_link = (uint64_t *)&(dev->data->dev_link);
+	RTE_ATOMIC(uint64_t) *dev_link = (uint64_t __rte_atomic *)&(dev->data->dev_link);
 	union {
 		uint64_t val64;
 		struct rte_eth_link link;
@@ -1663,8 +1663,8 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 
 	RTE_BUILD_BUG_ON(sizeof(*new_link) != sizeof(uint64_t));
 
-	orig.val64 = __atomic_exchange_n(dev_link, *(const uint64_t *)new_link,
-					__ATOMIC_SEQ_CST);
+	orig.val64 = rte_atomic_exchange_explicit(dev_link, *(const uint64_t *)new_link,
+					rte_memory_order_seq_cst);
 
 	return (orig.link.link_status == new_link->link_status) ? -1 : 0;
 }
@@ -1682,12 +1682,12 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
 rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
 		       struct rte_eth_link *link)
 {
-	uint64_t *src = (uint64_t *)&(dev->data->dev_link);
+	RTE_ATOMIC(uint64_t) *src = (uint64_t __rte_atomic *)&(dev->data->dev_link);
 	uint64_t *dst = (uint64_t *)link;
 
 	RTE_BUILD_BUG_ON(sizeof(*link) != sizeof(uint64_t));
 
-	*dst = __atomic_load_n(src, __ATOMIC_SEQ_CST);
+	*dst = rte_atomic_load_explicit(src, rte_memory_order_seq_cst);
 }
 
 /**
diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
index 7cc7f28..82e2568 100644
--- a/lib/ethdev/ethdev_private.c
+++ b/lib/ethdev/ethdev_private.c
@@ -245,7 +245,7 @@ struct dummy_queue {
 void
 eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
 {
-	static void *dummy_data[RTE_MAX_QUEUES_PER_PORT];
+	static RTE_ATOMIC(void *) dummy_data[RTE_MAX_QUEUES_PER_PORT];
 	uintptr_t port_id = fpo - rte_eth_fp_ops;
 
 	per_port_queues[port_id].rx_warn_once = false;
@@ -278,10 +278,10 @@ struct dummy_queue {
 	fpo->recycle_rx_descriptors_refill = dev->recycle_rx_descriptors_refill;
 
 	fpo->rxq.data = dev->data->rx_queues;
-	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
+	fpo->rxq.clbk = (void * __rte_atomic *)(uintptr_t)dev->post_rx_burst_cbs;
 
 	fpo->txq.data = dev->data->tx_queues;
-	fpo->txq.clbk = (void **)(uintptr_t)dev->pre_tx_burst_cbs;
+	fpo->txq.clbk = (void * __rte_atomic *)(uintptr_t)dev->pre_tx_burst_cbs;
 }
 
 uint16_t
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 9dabcb5..af23ac0 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5654,9 +5654,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(
+		rte_atomic_store_explicit(
 			&rte_eth_devices[port_id].post_rx_burst_cbs[queue_id],
-			cb, __ATOMIC_RELEASE);
+			cb, rte_memory_order_release);
 
 	} else {
 		while (tail->next)
@@ -5664,7 +5664,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	}
 	rte_spinlock_unlock(&eth_dev_rx_cb_lock);
 
@@ -5704,9 +5704,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 	/* Stores to cb->fn, cb->param and cb->next should complete before
 	 * cb is visible to data plane threads.
 	 */
-	__atomic_store_n(
+	rte_atomic_store_explicit(
 		&rte_eth_devices[port_id].post_rx_burst_cbs[queue_id],
-		cb, __ATOMIC_RELEASE);
+		cb, rte_memory_order_release);
 	rte_spinlock_unlock(&eth_dev_rx_cb_lock);
 
 	rte_eth_trace_add_first_rx_callback(port_id, queue_id, fn, user_param,
@@ -5757,9 +5757,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(
+		rte_atomic_store_explicit(
 			&rte_eth_devices[port_id].pre_tx_burst_cbs[queue_id],
-			cb, __ATOMIC_RELEASE);
+			cb, rte_memory_order_release);
 
 	} else {
 		while (tail->next)
@@ -5767,7 +5767,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		/* Stores to cb->fn and cb->param should complete before
 		 * cb is visible to data plane.
 		 */
-		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
+		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
 	}
 	rte_spinlock_unlock(&eth_dev_tx_cb_lock);
 
@@ -5791,7 +5791,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 	struct rte_eth_rxtx_callback *cb;
-	struct rte_eth_rxtx_callback **prev_cb;
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) *prev_cb;
 	int ret = -EINVAL;
 
 	rte_spinlock_lock(&eth_dev_rx_cb_lock);
@@ -5800,7 +5800,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		cb = *prev_cb;
 		if (cb == user_cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, cb->next, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, cb->next, rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
@@ -5828,7 +5828,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 	int ret = -EINVAL;
 	struct rte_eth_rxtx_callback *cb;
-	struct rte_eth_rxtx_callback **prev_cb;
+	RTE_ATOMIC(struct rte_eth_rxtx_callback *) *prev_cb;
 
 	rte_spinlock_lock(&eth_dev_tx_cb_lock);
 	prev_cb = &dev->pre_tx_burst_cbs[queue_id];
@@ -5836,7 +5836,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
 		cb = *prev_cb;
 		if (cb == user_cb) {
 			/* Remove the user cb from the callback list. */
-			__atomic_store_n(prev_cb, cb->next, __ATOMIC_RELAXED);
+			rte_atomic_store_explicit(prev_cb, cb->next, rte_memory_order_relaxed);
 			ret = 0;
 			break;
 		}
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 85b9af7..d1c10f2 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -6023,14 +6023,14 @@ uint16_t rte_eth_call_rx_callbacks(uint16_t port_id, uint16_t queue_id,
 	{
 		void *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
-		cb = __atomic_load_n((void **)&p->rxq.clbk[queue_id],
-				__ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&p->rxq.clbk[queue_id],
+				rte_memory_order_relaxed);
 		if (unlikely(cb != NULL))
 			nb_rx = rte_eth_call_rx_callbacks(port_id, queue_id,
 					rx_pkts, nb_rx, nb_pkts, cb);
@@ -6360,14 +6360,14 @@ uint16_t rte_eth_call_tx_callbacks(uint16_t port_id, uint16_t queue_id,
 	{
 		void *cb;
 
-		/* __ATOMIC_RELEASE memory order was used when the
+		/* rte_memory_order_release memory order was used when the
 		 * call back was inserted into the list.
 		 * Since there is a clear dependency between loading
-		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
+		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
 		 * not required.
 		 */
-		cb = __atomic_load_n((void **)&p->txq.clbk[queue_id],
-				__ATOMIC_RELAXED);
+		cb = rte_atomic_load_explicit(&p->txq.clbk[queue_id],
+				rte_memory_order_relaxed);
 		if (unlikely(cb != NULL))
 			nb_pkts = rte_eth_call_tx_callbacks(port_id, queue_id,
 					tx_pkts, nb_pkts, cb);
diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
index 32f5f73..4bfaf79 100644
--- a/lib/ethdev/rte_ethdev_core.h
+++ b/lib/ethdev/rte_ethdev_core.h
@@ -71,7 +71,7 @@ struct rte_ethdev_qdata {
 	/** points to array of internal queue data pointers */
 	void **data;
 	/** points to array of queue callback data pointers */
-	void **clbk;
+	RTE_ATOMIC(void *) *clbk;
 };
 
 /**
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 17/19] hash: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (15 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 16/19] ethdev: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 18/19] timer: " Tyler Retzlaff
                     ` (3 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/hash/rte_cuckoo_hash.c | 116 ++++++++++++++++++++++-----------------------
 lib/hash/rte_cuckoo_hash.h |   6 +--
 2 files changed, 61 insertions(+), 61 deletions(-)
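
The cuckoo hash changes below are the guard-variable pattern: key_idx becomes
RTE_ATOMIC(uint32_t), the writer publishes it with a release store after
filling the entry, and lock-free readers load it with acquire before touching
the key data. A minimal sketch of that pairing under the rte_stdatomic.h
wrappers (the slot layout is illustrative, not the rte_hash bucket):

#include <stdint.h>
#include <rte_stdatomic.h>

#define EMPTY_SLOT 0

struct slot {
        RTE_ATOMIC(uint32_t) key_idx;   /* guard variable */
        uint32_t sig;
};

/* writer: make the entry contents visible before the guard */
static void
slot_publish(struct slot *s, uint32_t idx, uint32_t sig)
{
        s->sig = sig;
        rte_atomic_store_explicit(&s->key_idx, idx,
                rte_memory_order_release);
}

/* lock-free reader: the acquire pairs with the writer's release */
static int
slot_lookup(struct slot *s, uint32_t sig, uint32_t *idx)
{
        uint32_t k = rte_atomic_load_explicit(&s->key_idx,
                rte_memory_order_acquire);

        if (k == EMPTY_SLOT || s->sig != sig)
                return -1;
        *idx = k;
        return 0;
}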

diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
index 19b23f2..b2cf60d 100644
--- a/lib/hash/rte_cuckoo_hash.c
+++ b/lib/hash/rte_cuckoo_hash.c
@@ -149,7 +149,7 @@ struct rte_hash *
 	unsigned int writer_takes_lock = 0;
 	unsigned int no_free_on_del = 0;
 	uint32_t *ext_bkt_to_free = NULL;
-	uint32_t *tbl_chng_cnt = NULL;
+	RTE_ATOMIC(uint32_t) *tbl_chng_cnt = NULL;
 	struct lcore_cache *local_free_slots = NULL;
 	unsigned int readwrite_concur_lf_support = 0;
 	uint32_t i;
@@ -713,9 +713,9 @@ struct rte_hash *
 				 * variable. Release the application data
 				 * to the readers.
 				 */
-				__atomic_store_n(&k->pdata,
+				rte_atomic_store_explicit(&k->pdata,
 					data,
-					__ATOMIC_RELEASE);
+					rte_memory_order_release);
 				/*
 				 * Return index where key is stored,
 				 * subtracting the first dummy index
@@ -776,9 +776,9 @@ struct rte_hash *
 			 * key_idx is the guard variable for signature
 			 * and key.
 			 */
-			__atomic_store_n(&prim_bkt->key_idx[i],
+			rte_atomic_store_explicit(&prim_bkt->key_idx[i],
 					 new_idx,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			break;
 		}
 	}
@@ -851,9 +851,9 @@ struct rte_hash *
 		if (unlikely(&h->buckets[prev_alt_bkt_idx]
 				!= curr_bkt)) {
 			/* revert it to empty, otherwise duplicated keys */
-			__atomic_store_n(&curr_bkt->key_idx[curr_slot],
+			rte_atomic_store_explicit(&curr_bkt->key_idx[curr_slot],
 				EMPTY_SLOT,
-				__ATOMIC_RELEASE);
+				rte_memory_order_release);
 			__hash_rw_writer_unlock(h);
 			return -1;
 		}
@@ -865,13 +865,13 @@ struct rte_hash *
 			 * Since there is one writer, load acquires on
 			 * tbl_chng_cnt are not required.
 			 */
-			__atomic_store_n(h->tbl_chng_cnt,
+			rte_atomic_store_explicit(h->tbl_chng_cnt,
 					 *h->tbl_chng_cnt + 1,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			/* The store to sig_current should not
 			 * move above the store to tbl_chng_cnt.
 			 */
-			__atomic_thread_fence(__ATOMIC_RELEASE);
+			__atomic_thread_fence(rte_memory_order_release);
 		}
 
 		/* Need to swap current/alt sig to allow later
@@ -881,9 +881,9 @@ struct rte_hash *
 		curr_bkt->sig_current[curr_slot] =
 			prev_bkt->sig_current[prev_slot];
 		/* Release the updated bucket entry */
-		__atomic_store_n(&curr_bkt->key_idx[curr_slot],
+		rte_atomic_store_explicit(&curr_bkt->key_idx[curr_slot],
 			prev_bkt->key_idx[prev_slot],
-			__ATOMIC_RELEASE);
+			rte_memory_order_release);
 
 		curr_slot = prev_slot;
 		curr_node = prev_node;
@@ -897,20 +897,20 @@ struct rte_hash *
 		 * Since there is one writer, load acquires on
 		 * tbl_chng_cnt are not required.
 		 */
-		__atomic_store_n(h->tbl_chng_cnt,
+		rte_atomic_store_explicit(h->tbl_chng_cnt,
 				 *h->tbl_chng_cnt + 1,
-				 __ATOMIC_RELEASE);
+				 rte_memory_order_release);
 		/* The store to sig_current should not
 		 * move above the store to tbl_chng_cnt.
 		 */
-		__atomic_thread_fence(__ATOMIC_RELEASE);
+		__atomic_thread_fence(rte_memory_order_release);
 	}
 
 	curr_bkt->sig_current[curr_slot] = sig;
 	/* Release the new bucket entry */
-	__atomic_store_n(&curr_bkt->key_idx[curr_slot],
+	rte_atomic_store_explicit(&curr_bkt->key_idx[curr_slot],
 			 new_idx,
-			 __ATOMIC_RELEASE);
+			 rte_memory_order_release);
 
 	__hash_rw_writer_unlock(h);
 
@@ -1076,9 +1076,9 @@ struct rte_hash *
 	 * not leak after the store of pdata in the key store. i.e. pdata is
 	 * the guard variable. Release the application data to the readers.
 	 */
-	__atomic_store_n(&new_k->pdata,
+	rte_atomic_store_explicit(&new_k->pdata,
 		data,
-		__ATOMIC_RELEASE);
+		rte_memory_order_release);
 	/* Copy key */
 	memcpy(new_k->key, key, h->key_len);
 
@@ -1149,9 +1149,9 @@ struct rte_hash *
 				 * key_idx is the guard variable for signature
 				 * and key.
 				 */
-				__atomic_store_n(&cur_bkt->key_idx[i],
+				rte_atomic_store_explicit(&cur_bkt->key_idx[i],
 						 slot_id,
-						 __ATOMIC_RELEASE);
+						 rte_memory_order_release);
 				__hash_rw_writer_unlock(h);
 				return slot_id - 1;
 			}
@@ -1185,9 +1185,9 @@ struct rte_hash *
 	 * the store to key_idx. i.e. key_idx is the guard variable
 	 * for signature and key.
 	 */
-	__atomic_store_n(&(h->buckets_ext[ext_bkt_id - 1]).key_idx[0],
+	rte_atomic_store_explicit(&(h->buckets_ext[ext_bkt_id - 1]).key_idx[0],
 			 slot_id,
-			 __ATOMIC_RELEASE);
+			 rte_memory_order_release);
 	/* Link the new bucket to sec bucket linked list */
 	last = rte_hash_get_last_bkt(sec_bkt);
 	last->next = &h->buckets_ext[ext_bkt_id - 1];
@@ -1290,17 +1290,17 @@ struct rte_hash *
 		 * key comparison will ensure that the lookup fails.
 		 */
 		if (bkt->sig_current[i] == sig) {
-			key_idx = __atomic_load_n(&bkt->key_idx[i],
-					  __ATOMIC_ACQUIRE);
+			key_idx = rte_atomic_load_explicit(&bkt->key_idx[i],
+					  rte_memory_order_acquire);
 			if (key_idx != EMPTY_SLOT) {
 				k = (struct rte_hash_key *) ((char *)keys +
 						key_idx * h->key_entry_size);
 
 				if (rte_hash_cmp_eq(key, k->key, h) == 0) {
 					if (data != NULL) {
-						*data = __atomic_load_n(
+						*data = rte_atomic_load_explicit(
 							&k->pdata,
-							__ATOMIC_ACQUIRE);
+							rte_memory_order_acquire);
 					}
 					/*
 					 * Return index where key is stored,
@@ -1374,8 +1374,8 @@ struct rte_hash *
 		 * starts. Acquire semantics will make sure that
 		 * loads in search_one_bucket are not hoisted.
 		 */
-		cnt_b = __atomic_load_n(h->tbl_chng_cnt,
-				__ATOMIC_ACQUIRE);
+		cnt_b = rte_atomic_load_explicit(h->tbl_chng_cnt,
+				rte_memory_order_acquire);
 
 		/* Check if key is in primary location */
 		bkt = &h->buckets[prim_bucket_idx];
@@ -1396,7 +1396,7 @@ struct rte_hash *
 		/* The loads of sig_current in search_one_bucket
 		 * should not move below the load from tbl_chng_cnt.
 		 */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 		/* Re-read the table change counter to check if the
 		 * table has changed during search. If yes, re-do
 		 * the search.
@@ -1405,8 +1405,8 @@ struct rte_hash *
 		 * and key index in secondary bucket will make sure
 		 * that it does not get hoisted.
 		 */
-		cnt_a = __atomic_load_n(h->tbl_chng_cnt,
-					__ATOMIC_ACQUIRE);
+		cnt_a = rte_atomic_load_explicit(h->tbl_chng_cnt,
+					rte_memory_order_acquire);
 	} while (cnt_b != cnt_a);
 
 	return -ENOENT;
@@ -1611,26 +1611,26 @@ struct rte_hash *
 	for (i = RTE_HASH_BUCKET_ENTRIES - 1; i >= 0; i--) {
 		if (last_bkt->key_idx[i] != EMPTY_SLOT) {
 			cur_bkt->sig_current[pos] = last_bkt->sig_current[i];
-			__atomic_store_n(&cur_bkt->key_idx[pos],
+			rte_atomic_store_explicit(&cur_bkt->key_idx[pos],
 					 last_bkt->key_idx[i],
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			if (h->readwrite_concur_lf_support) {
 				/* Inform the readers that the table has changed
 				 * Since there is one writer, load acquire on
 				 * tbl_chng_cnt is not required.
 				 */
-				__atomic_store_n(h->tbl_chng_cnt,
+				rte_atomic_store_explicit(h->tbl_chng_cnt,
 					 *h->tbl_chng_cnt + 1,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 				/* The store to sig_current should
 				 * not move above the store to tbl_chng_cnt.
 				 */
-				__atomic_thread_fence(__ATOMIC_RELEASE);
+				__atomic_thread_fence(rte_memory_order_release);
 			}
 			last_bkt->sig_current[i] = NULL_SIGNATURE;
-			__atomic_store_n(&last_bkt->key_idx[i],
+			rte_atomic_store_explicit(&last_bkt->key_idx[i],
 					 EMPTY_SLOT,
-					 __ATOMIC_RELEASE);
+					 rte_memory_order_release);
 			return;
 		}
 	}
@@ -1650,8 +1650,8 @@ struct rte_hash *
 
 	/* Check if key is in bucket */
 	for (i = 0; i < RTE_HASH_BUCKET_ENTRIES; i++) {
-		key_idx = __atomic_load_n(&bkt->key_idx[i],
-					  __ATOMIC_ACQUIRE);
+		key_idx = rte_atomic_load_explicit(&bkt->key_idx[i],
+					  rte_memory_order_acquire);
 		if (bkt->sig_current[i] == sig && key_idx != EMPTY_SLOT) {
 			k = (struct rte_hash_key *) ((char *)keys +
 					key_idx * h->key_entry_size);
@@ -1663,9 +1663,9 @@ struct rte_hash *
 				if (!h->no_free_on_del)
 					remove_entry(h, bkt, i);
 
-				__atomic_store_n(&bkt->key_idx[i],
+				rte_atomic_store_explicit(&bkt->key_idx[i],
 						 EMPTY_SLOT,
-						 __ATOMIC_RELEASE);
+						 rte_memory_order_release);
 
 				*pos = i;
 				/*
@@ -2077,8 +2077,8 @@ struct rte_hash *
 		 * starts. Acquire semantics will make sure that
 		 * loads in compare_signatures are not hoisted.
 		 */
-		cnt_b = __atomic_load_n(h->tbl_chng_cnt,
-					__ATOMIC_ACQUIRE);
+		cnt_b = rte_atomic_load_explicit(h->tbl_chng_cnt,
+					rte_memory_order_acquire);
 
 		/* Compare signatures and prefetch key slot of first hit */
 		for (i = 0; i < num_keys; i++) {
@@ -2121,9 +2121,9 @@ struct rte_hash *
 						__builtin_ctzl(prim_hitmask[i])
 						>> 1;
 				uint32_t key_idx =
-				__atomic_load_n(
+				rte_atomic_load_explicit(
 					&primary_bkt[i]->key_idx[hit_index],
-					__ATOMIC_ACQUIRE);
+					rte_memory_order_acquire);
 				const struct rte_hash_key *key_slot =
 					(const struct rte_hash_key *)(
 					(const char *)h->key_store +
@@ -2137,9 +2137,9 @@ struct rte_hash *
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = __atomic_load_n(
+						data[i] = rte_atomic_load_explicit(
 							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
+							rte_memory_order_acquire);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
@@ -2153,9 +2153,9 @@ struct rte_hash *
 						__builtin_ctzl(sec_hitmask[i])
 						>> 1;
 				uint32_t key_idx =
-				__atomic_load_n(
+				rte_atomic_load_explicit(
 					&secondary_bkt[i]->key_idx[hit_index],
-					__ATOMIC_ACQUIRE);
+					rte_memory_order_acquire);
 				const struct rte_hash_key *key_slot =
 					(const struct rte_hash_key *)(
 					(const char *)h->key_store +
@@ -2170,9 +2170,9 @@ struct rte_hash *
 					!rte_hash_cmp_eq(
 						key_slot->key, keys[i], h)) {
 					if (data != NULL)
-						data[i] = __atomic_load_n(
+						data[i] = rte_atomic_load_explicit(
 							&key_slot->pdata,
-							__ATOMIC_ACQUIRE);
+							rte_memory_order_acquire);
 
 					hits |= 1ULL << i;
 					positions[i] = key_idx - 1;
@@ -2216,7 +2216,7 @@ struct rte_hash *
 		/* The loads of sig_current in compare_signatures
 		 * should not move below the load from tbl_chng_cnt.
 		 */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 		/* Re-read the table change counter to check if the
 		 * table has changed during search. If yes, re-do
 		 * the search.
@@ -2225,8 +2225,8 @@ struct rte_hash *
 		 * key index will make sure that it does not get
 		 * hoisted.
 		 */
-		cnt_a = __atomic_load_n(h->tbl_chng_cnt,
-					__ATOMIC_ACQUIRE);
+		cnt_a = rte_atomic_load_explicit(h->tbl_chng_cnt,
+					rte_memory_order_acquire);
 	} while (cnt_b != cnt_a);
 
 	if (hit_mask != NULL)
@@ -2498,8 +2498,8 @@ struct rte_hash *
 	idx = *next % RTE_HASH_BUCKET_ENTRIES;
 
 	/* If current position is empty, go to the next one */
-	while ((position = __atomic_load_n(&h->buckets[bucket_idx].key_idx[idx],
-					__ATOMIC_ACQUIRE)) == EMPTY_SLOT) {
+	while ((position = rte_atomic_load_explicit(&h->buckets[bucket_idx].key_idx[idx],
+					rte_memory_order_acquire)) == EMPTY_SLOT) {
 		(*next)++;
 		/* End of table */
 		if (*next == total_entries_main)
diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
index eb2644f..f7afc4d 100644
--- a/lib/hash/rte_cuckoo_hash.h
+++ b/lib/hash/rte_cuckoo_hash.h
@@ -137,7 +137,7 @@ struct lcore_cache {
 struct rte_hash_key {
 	union {
 		uintptr_t idata;
-		void *pdata;
+		RTE_ATOMIC(void *) pdata;
 	};
 	/* Variable key size */
 	char key[0];
@@ -155,7 +155,7 @@ enum rte_hash_sig_compare_function {
 struct rte_hash_bucket {
 	uint16_t sig_current[RTE_HASH_BUCKET_ENTRIES];
 
-	uint32_t key_idx[RTE_HASH_BUCKET_ENTRIES];
+	RTE_ATOMIC(uint32_t) key_idx[RTE_HASH_BUCKET_ENTRIES];
 
 	uint8_t flag[RTE_HASH_BUCKET_ENTRIES];
 
@@ -229,7 +229,7 @@ struct rte_hash {
 	 * is piggy-backed to freeing of the key index.
 	 */
 	uint32_t *ext_bkt_to_free;
-	uint32_t *tbl_chng_cnt;
+	RTE_ATOMIC(uint32_t) *tbl_chng_cnt;
 	/**< Indicates if the hash table changed from last read. */
 } __rte_cache_aligned;
 
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 18/19] timer: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (16 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 17/19] hash: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-26  0:31   ` [PATCH v3 19/19] ring: " Tyler Retzlaff
                     ` (2 subsequent siblings)
  20 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/timer/rte_timer.c | 50 +++++++++++++++++++++++++-------------------------
 lib/timer/rte_timer.h |  6 +++---
 2 files changed, 28 insertions(+), 28 deletions(-)
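
The timer changes below contain the one signature difference in the mapping:
__atomic_compare_exchange_n takes a weak flag that
rte_atomic_compare_exchange_strong_explicit drops, and the expected value is
passed through a plain (non-atomic) pointer, hence the casts on prev_status.
A minimal sketch of the resulting CAS loop, simplified to a bare uint32_t
status word (names are illustrative; the real code packs state and owner in
a union):

#include <stdint.h>
#include <rte_stdatomic.h>

static uint32_t
timer_set_state(RTE_ATOMIC(uint32_t) *status, uint32_t new_state)
{
        uint32_t prev;
        int success = 0;

        prev = rte_atomic_load_explicit(status, rte_memory_order_relaxed);
        while (success == 0) {
                /* on failure, prev is refreshed with the value seen in memory */
                success = rte_atomic_compare_exchange_strong_explicit(
                        status, &prev, new_state,
                        rte_memory_order_acquire,
                        rte_memory_order_relaxed);
        }

        return prev;    /* status observed just before the update */
}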

diff --git a/lib/timer/rte_timer.c b/lib/timer/rte_timer.c
index 85d6757..53ed221 100644
--- a/lib/timer/rte_timer.c
+++ b/lib/timer/rte_timer.c
@@ -210,7 +210,7 @@ struct rte_timer_data {
 
 	status.state = RTE_TIMER_STOP;
 	status.owner = RTE_TIMER_NO_OWNER;
-	__atomic_store_n(&tim->status.u32, status.u32, __ATOMIC_RELAXED);
+	rte_atomic_store_explicit(&tim->status.u32, status.u32, rte_memory_order_relaxed);
 }
 
 /*
@@ -231,7 +231,7 @@ struct rte_timer_data {
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as being configured */
-	prev_status.u32 = __atomic_load_n(&tim->status.u32, __ATOMIC_RELAXED);
+	prev_status.u32 = rte_atomic_load_explicit(&tim->status.u32, rte_memory_order_relaxed);
 
 	while (success == 0) {
 		/* timer is running on another core
@@ -254,11 +254,11 @@ struct rte_timer_data {
 		 * timer is in CONFIG state, the state cannot be changed
 		 * by other threads. So, we should use ACQUIRE here.
 		 */
-		success = __atomic_compare_exchange_n(&tim->status.u32,
-					      &prev_status.u32,
-					      status.u32, 0,
-					      __ATOMIC_ACQUIRE,
-					      __ATOMIC_RELAXED);
+		success = rte_atomic_compare_exchange_strong_explicit(&tim->status.u32,
+					      (uint32_t *)(uintptr_t)&prev_status.u32,
+					      status.u32,
+					      rte_memory_order_acquire,
+					      rte_memory_order_relaxed);
 	}
 
 	ret_prev_status->u32 = prev_status.u32;
@@ -277,7 +277,7 @@ struct rte_timer_data {
 
 	/* wait that the timer is in correct status before update,
 	 * and mark it as running */
-	prev_status.u32 = __atomic_load_n(&tim->status.u32, __ATOMIC_RELAXED);
+	prev_status.u32 = rte_atomic_load_explicit(&tim->status.u32, rte_memory_order_relaxed);
 
 	while (success == 0) {
 		/* timer is not pending anymore */
@@ -293,11 +293,11 @@ struct rte_timer_data {
 		 * timer is in RUNNING state, the state cannot be changed
 		 * by other threads. So, we should use ACQUIRE here.
 		 */
-		success = __atomic_compare_exchange_n(&tim->status.u32,
-					      &prev_status.u32,
-					      status.u32, 0,
-					      __ATOMIC_ACQUIRE,
-					      __ATOMIC_RELAXED);
+		success = rte_atomic_compare_exchange_strong_explicit(&tim->status.u32,
+					      (uint32_t *)(uintptr_t)&prev_status.u32,
+					      status.u32,
+					      rte_memory_order_acquire,
+					      rte_memory_order_relaxed);
 	}
 
 	return 0;
@@ -530,7 +530,7 @@ struct rte_timer_data {
 	/* The "RELEASE" ordering guarantees the memory operations above
 	 * the status update are observed before the update by all threads
 	 */
-	__atomic_store_n(&tim->status.u32, status.u32, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->status.u32, status.u32, rte_memory_order_release);
 
 	if (tim_lcore != lcore_id || !local_is_locked)
 		rte_spinlock_unlock(&priv_timer[tim_lcore].list_lock);
@@ -612,7 +612,7 @@ struct rte_timer_data {
 	/* The "RELEASE" ordering guarantees the memory operations above
 	 * the status update are observed before the update by all threads
 	 */
-	__atomic_store_n(&tim->status.u32, status.u32, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&tim->status.u32, status.u32, rte_memory_order_release);
 
 	return 0;
 }
@@ -646,8 +646,8 @@ struct rte_timer_data {
 int
 rte_timer_pending(struct rte_timer *tim)
 {
-	return __atomic_load_n(&tim->status.state,
-				__ATOMIC_RELAXED) == RTE_TIMER_PENDING;
+	return rte_atomic_load_explicit(&tim->status.state,
+				rte_memory_order_relaxed) == RTE_TIMER_PENDING;
 }
 
 /* must be called periodically, run all timer that expired */
@@ -753,8 +753,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 		}
 		else {
 			/* keep it in list and mark timer as pending */
@@ -766,8 +766,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 			__rte_timer_reset(tim, tim->expire + tim->period,
 				tim->period, lcore_id, tim->f, tim->arg, 1,
 				timer_data);
@@ -941,8 +941,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 		} else {
 			/* keep it in list and mark timer as pending */
 			rte_spinlock_lock(
@@ -954,8 +954,8 @@ struct rte_timer_data {
 			 * operations above the status update are observed
 			 * before the update by all threads
 			 */
-			__atomic_store_n(&tim->status.u32, status.u32,
-				__ATOMIC_RELEASE);
+			rte_atomic_store_explicit(&tim->status.u32, status.u32,
+				rte_memory_order_release);
 			__rte_timer_reset(tim, tim->expire + tim->period,
 				tim->period, this_lcore, tim->f, tim->arg, 1,
 				data);
diff --git a/lib/timer/rte_timer.h b/lib/timer/rte_timer.h
index d3927d5..a35bc08 100644
--- a/lib/timer/rte_timer.h
+++ b/lib/timer/rte_timer.h
@@ -65,10 +65,10 @@ enum rte_timer_type {
  */
 union rte_timer_status {
 	struct {
-		uint16_t state;  /**< Stop, pending, running, config. */
-		int16_t owner;   /**< The lcore that owns the timer. */
+		RTE_ATOMIC(uint16_t) state;  /**< Stop, pending, running, config. */
+		RTE_ATOMIC(int16_t) owner;   /**< The lcore that owns the timer. */
 	};
-	uint32_t u32;            /**< To atomic-set status + owner. */
+	RTE_ATOMIC(uint32_t) u32;            /**< To atomic-set status + owner. */
 };
 
 #ifdef RTE_LIBRTE_TIMER_DEBUG
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* [PATCH v3 19/19] ring: use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (17 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 18/19] timer: " Tyler Retzlaff
@ 2023-10-26  0:31   ` Tyler Retzlaff
  2023-10-27 12:58     ` Konstantin Ananyev
  2023-10-26 13:47   ` [PATCH v3 00/19] " David Marchand
  2023-10-30 15:34   ` David Marchand
  20 siblings, 1 reply; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26  0:31 UTC (permalink / raw)
  To: dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang,
	Tyler Retzlaff

Replace the use of gcc builtin __atomic_xxx intrinsics with
corresponding rte_atomic_xxx optional stdatomic API.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 drivers/net/mlx5/mlx5_hws_cnt.h   |  4 ++--
 lib/ring/rte_ring_c11_pvt.h       | 47 +++++++++++++++++++++------------------
 lib/ring/rte_ring_core.h          | 12 +++++-----
 lib/ring/rte_ring_generic_pvt.h   | 16 +++++++------
 lib/ring/rte_ring_hts_elem_pvt.h  | 22 +++++++++---------
 lib/ring/rte_ring_peek_elem_pvt.h |  6 ++---
 lib/ring/rte_ring_rts_elem_pvt.h  | 27 +++++++++++-----------
 7 files changed, 71 insertions(+), 63 deletions(-)
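
The ring conversion below keeps the C11 head/tail protocol as-is: relaxed
load and CAS on the mover's own head, acquire load of the opposite tail, and
a release store when publishing the new tail. A minimal sketch of that
pairing under the rte_stdatomic.h wrappers, reduced to a single
producer-side helper (names illustrative, not the rte_ring layout):

#include <stdint.h>
#include <rte_stdatomic.h>

struct headtail {
        volatile RTE_ATOMIC(uint32_t) head;
        volatile RTE_ATOMIC(uint32_t) tail;
};

/* move prod->head forward by n, bounded by the consumer tail */
static uint32_t
move_prod_head(struct headtail *prod, struct headtail *cons,
                uint32_t capacity, uint32_t n, uint32_t *old_head)
{
        uint32_t cons_tail, free_entries;
        int success;

        *old_head = rte_atomic_load_explicit(&prod->head,
                rte_memory_order_relaxed);
        do {
                /* ensure the head is read before the opposite tail */
                rte_atomic_thread_fence(rte_memory_order_acquire);

                /* pairs with the store-release in update_tail() */
                cons_tail = rte_atomic_load_explicit(&cons->tail,
                        rte_memory_order_acquire);

                free_entries = capacity + cons_tail - *old_head;
                if (n > free_entries)
                        return 0;

                /* on failure, *old_head is refreshed with the current head */
                success = rte_atomic_compare_exchange_strong_explicit(
                        &prod->head, old_head, *old_head + n,
                        rte_memory_order_relaxed,
                        rte_memory_order_relaxed);
        } while (success == 0);

        return n;
}

/* publish completed entries to the other side */
static void
update_tail(struct headtail *ht, uint32_t new_val)
{
        rte_atomic_store_explicit(&ht->tail, new_val,
                rte_memory_order_release);
}

The relaxed ordering on the head CAS is enough because it only arbitrates
between movers on the same side; the acquire on the opposite tail and the
release on the own tail carry the data ordering.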

diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
index f462665..dcd5cec 100644
--- a/drivers/net/mlx5/mlx5_hws_cnt.h
+++ b/drivers/net/mlx5/mlx5_hws_cnt.h
@@ -386,7 +386,7 @@ struct mlx5_hws_age_param {
 
 	MLX5_ASSERT(r->prod.sync_type == RTE_RING_SYNC_ST);
 	MLX5_ASSERT(r->cons.sync_type == RTE_RING_SYNC_ST);
-	current_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
+	current_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
 	MLX5_ASSERT(n <= r->capacity);
 	MLX5_ASSERT(n <= rte_ring_count(r));
 	revert2head = current_head - n;
@@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
 	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
 			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
 	/* Update tail */
-	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&r->prod.tail, revert2head, rte_memory_order_release);
 	return n;
 }
 
diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
index f895950..5c10ad8 100644
--- a/lib/ring/rte_ring_c11_pvt.h
+++ b/lib/ring/rte_ring_c11_pvt.h
@@ -22,9 +22,10 @@
 	 * we need to wait for them to complete
 	 */
 	if (!single)
-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
+		rte_wait_until_equal_32((uint32_t *)(uintptr_t)&ht->tail, old_val,
+			rte_memory_order_relaxed);
 
-	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
 }
 
 /**
@@ -61,19 +62,19 @@
 	unsigned int max = n;
 	int success;
 
-	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
+	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
 	do {
 		/* Reset n to the initial burst count */
 		n = max;
 
 		/* Ensure the head is read before tail */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 
 		/* load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
-		cons_tail = __atomic_load_n(&r->cons.tail,
-					__ATOMIC_ACQUIRE);
+		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
+					rte_memory_order_acquire);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
@@ -91,14 +92,15 @@
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sp)
-			r->prod.head = *new_head, success = 1;
-		else
+		if (is_sp) {
+			r->prod.head = *new_head;
+			success = 1;
+		} else
 			/* on failure, *old_head is updated */
-			success = __atomic_compare_exchange_n(&r->prod.head,
+			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
 					old_head, *new_head,
-					0, __ATOMIC_RELAXED,
-					__ATOMIC_RELAXED);
+					rte_memory_order_relaxed,
+					rte_memory_order_relaxed);
 	} while (unlikely(success == 0));
 	return n;
 }
@@ -137,19 +139,19 @@
 	int success;
 
 	/* move cons.head atomically */
-	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
+	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
 	do {
 		/* Restore n as it may change every loop */
 		n = max;
 
 		/* Ensure the head is read before tail */
-		__atomic_thread_fence(__ATOMIC_ACQUIRE);
+		__atomic_thread_fence(rte_memory_order_acquire);
 
 		/* this load-acquire synchronize with store-release of ht->tail
 		 * in update_tail.
 		 */
-		prod_tail = __atomic_load_n(&r->prod.tail,
-					__ATOMIC_ACQUIRE);
+		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
+					rte_memory_order_acquire);
 
 		/* The subtraction is done between two unsigned 32bits value
 		 * (the result is always modulo 32 bits even if we have
@@ -166,14 +168,15 @@
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sc)
-			r->cons.head = *new_head, success = 1;
-		else
+		if (is_sc) {
+			r->cons.head = *new_head;
+			success = 1;
+		} else
 			/* on failure, *old_head will be updated */
-			success = __atomic_compare_exchange_n(&r->cons.head,
+			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
 							old_head, *new_head,
-							0, __ATOMIC_RELAXED,
-							__ATOMIC_RELAXED);
+							rte_memory_order_relaxed,
+							rte_memory_order_relaxed);
 	} while (unlikely(success == 0));
 	return n;
 }
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 327fdcf..b770873 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -66,8 +66,8 @@ enum rte_ring_sync_type {
  * but offset for *sync_type* and *tail* values should remain the same.
  */
 struct rte_ring_headtail {
-	volatile uint32_t head;      /**< prod/consumer head. */
-	volatile uint32_t tail;      /**< prod/consumer tail. */
+	volatile RTE_ATOMIC(uint32_t) head;      /**< prod/consumer head. */
+	volatile RTE_ATOMIC(uint32_t) tail;      /**< prod/consumer tail. */
 	union {
 		/** sync type of prod/cons */
 		enum rte_ring_sync_type sync_type;
@@ -78,7 +78,7 @@ struct rte_ring_headtail {
 
 union __rte_ring_rts_poscnt {
 	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
-	uint64_t raw __rte_aligned(8);
+	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
 	struct {
 		uint32_t cnt; /**< head/tail reference counter */
 		uint32_t pos; /**< head/tail position */
@@ -94,10 +94,10 @@ struct rte_ring_rts_headtail {
 
 union __rte_ring_hts_pos {
 	/** raw 8B value to read/write *head* and *tail* as one atomic op */
-	uint64_t raw __rte_aligned(8);
+	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
 	struct {
-		uint32_t head; /**< head position */
-		uint32_t tail; /**< tail position */
+		RTE_ATOMIC(uint32_t) head; /**< head position */
+		RTE_ATOMIC(uint32_t) tail; /**< tail position */
 	} pos;
 };
 
diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
index 5acb6e5..457f41d 100644
--- a/lib/ring/rte_ring_generic_pvt.h
+++ b/lib/ring/rte_ring_generic_pvt.h
@@ -23,7 +23,8 @@
 	 * we need to wait for them to complete
 	 */
 	if (!single)
-		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
+		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
+			rte_memory_order_relaxed);
 
 	ht->tail = new_val;
 }
@@ -89,10 +90,11 @@
 			return 0;
 
 		*new_head = *old_head + n;
-		if (is_sp)
-			r->prod.head = *new_head, success = 1;
-		else
-			success = rte_atomic32_cmpset(&r->prod.head,
+		if (is_sp) {
+			r->prod.head = *new_head;
+			success = 1;
+		} else
+			success = rte_atomic32_cmpset((uint32_t *)(uintptr_t)&r->prod.head,
 					*old_head, *new_head);
 	} while (unlikely(success == 0));
 	return n;
@@ -162,8 +164,8 @@
 			rte_smp_rmb();
 			success = 1;
 		} else {
-			success = rte_atomic32_cmpset(&r->cons.head, *old_head,
-					*new_head);
+			success = rte_atomic32_cmpset((uint32_t *)(uintptr_t)&r->cons.head,
+					*old_head, *new_head);
 		}
 	} while (unlikely(success == 0));
 	return n;
diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
index a8678d3..91f5eec 100644
--- a/lib/ring/rte_ring_hts_elem_pvt.h
+++ b/lib/ring/rte_ring_hts_elem_pvt.h
@@ -10,6 +10,8 @@
 #ifndef _RTE_RING_HTS_ELEM_PVT_H_
 #define _RTE_RING_HTS_ELEM_PVT_H_
 
+#include <rte_stdatomic.h>
+
 /**
  * @file rte_ring_hts_elem_pvt.h
  * It is not recommended to include this file directly,
@@ -30,7 +32,7 @@
 	RTE_SET_USED(enqueue);
 
 	tail = old_tail + num;
-	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->ht.pos.tail, tail, rte_memory_order_release);
 }
 
 /**
@@ -44,7 +46,7 @@
 {
 	while (p->pos.head != p->pos.tail) {
 		rte_pause();
-		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
+		p->raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_acquire);
 	}
 }
 
@@ -61,7 +63,7 @@
 
 	const uint32_t capacity = r->capacity;
 
-	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
+	op.raw = rte_atomic_load_explicit(&r->hts_prod.ht.raw, rte_memory_order_acquire);
 
 	do {
 		/* Reset n to the initial burst count */
@@ -98,9 +100,9 @@
 	 *  - OOO reads of cons tail value
 	 *  - OOO copy of elems from the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
-			&op.raw, np.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_prod.ht.raw,
+			(uint64_t *)(uintptr_t)&op.raw, np.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = op.pos.head;
 	return n;
@@ -117,7 +119,7 @@
 	uint32_t n;
 	union __rte_ring_hts_pos np, op;
 
-	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
+	op.raw = rte_atomic_load_explicit(&r->hts_cons.ht.raw, rte_memory_order_acquire);
 
 	/* move cons.head atomically */
 	do {
@@ -153,9 +155,9 @@
 	 *  - OOO reads of prod tail value
 	 *  - OOO copy of elems from the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
-			&op.raw, np.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_cons.ht.raw,
+			(uint64_t *)(uintptr_t)&op.raw, np.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = op.pos.head;
 	return n;
diff --git a/lib/ring/rte_ring_peek_elem_pvt.h b/lib/ring/rte_ring_peek_elem_pvt.h
index bb0a7d5..b5f0822 100644
--- a/lib/ring/rte_ring_peek_elem_pvt.h
+++ b/lib/ring/rte_ring_peek_elem_pvt.h
@@ -59,7 +59,7 @@
 
 	pos = tail + num;
 	ht->head = pos;
-	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->tail, pos, rte_memory_order_release);
 }
 
 /**
@@ -78,7 +78,7 @@
 	uint32_t n;
 	union __rte_ring_hts_pos p;
 
-	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
+	p.raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_relaxed);
 	n = p.pos.head - p.pos.tail;
 
 	RTE_ASSERT(n >= num);
@@ -104,7 +104,7 @@
 	p.pos.head = tail + num;
 	p.pos.tail = p.pos.head;
 
-	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
+	rte_atomic_store_explicit(&ht->ht.raw, p.raw, rte_memory_order_release);
 }
 
 /**
diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
index 7164213..1226503 100644
--- a/lib/ring/rte_ring_rts_elem_pvt.h
+++ b/lib/ring/rte_ring_rts_elem_pvt.h
@@ -31,18 +31,19 @@
 	 * might preceded us, then don't update tail with new value.
 	 */
 
-	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
+	ot.raw = rte_atomic_load_explicit(&ht->tail.raw, rte_memory_order_acquire);
 
 	do {
 		/* on 32-bit systems we have to do atomic read here */
-		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
+		h.raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_relaxed);
 
 		nt.raw = ot.raw;
 		if (++nt.val.cnt == h.val.cnt)
 			nt.val.pos = h.val.pos;
 
-	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
-			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&ht->tail.raw,
+			(uint64_t *)(uintptr_t)&ot.raw, nt.raw,
+			rte_memory_order_release, rte_memory_order_acquire) == 0);
 }
 
 /**
@@ -59,7 +60,7 @@
 
 	while (h->val.pos - ht->tail.val.pos > max) {
 		rte_pause();
-		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
+		h->raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_acquire);
 	}
 }
 
@@ -76,7 +77,7 @@
 
 	const uint32_t capacity = r->capacity;
 
-	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
+	oh.raw = rte_atomic_load_explicit(&r->rts_prod.head.raw, rte_memory_order_acquire);
 
 	do {
 		/* Reset n to the initial burst count */
@@ -113,9 +114,9 @@
 	 *  - OOO reads of cons tail value
 	 *  - OOO copy of elems to the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
-			&oh.raw, nh.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_prod.head.raw,
+			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = oh.val.pos;
 	return n;
@@ -132,7 +133,7 @@
 	uint32_t n;
 	union __rte_ring_rts_poscnt nh, oh;
 
-	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
+	oh.raw = rte_atomic_load_explicit(&r->rts_cons.head.raw, rte_memory_order_acquire);
 
 	/* move cons.head atomically */
 	do {
@@ -168,9 +169,9 @@
 	 *  - OOO reads of prod tail value
 	 *  - OOO copy of elems from the ring
 	 */
-	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
-			&oh.raw, nh.raw,
-			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
+	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_cons.head.raw,
+			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
+			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
 
 	*old_head = oh.val.pos;
 	return n;
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [PATCH v2 09/19] rcu: use rte optional stdatomic API
  2023-10-25 22:38       ` Tyler Retzlaff
@ 2023-10-26  4:24         ` Ruifeng Wang
  2023-10-26 16:36           ` Tyler Retzlaff
  0 siblings, 1 reply; 91+ messages in thread
From: Ruifeng Wang @ 2023-10-26  4:24 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Akhil Goyal, Anatoly Burakov, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Ciara Power, David Christensen,
	David Hunt, Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, jerinj,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, thomas, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang, nd, nd

> -----Original Message-----
> From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> Sent: Thursday, October 26, 2023 6:38 AM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> Cc: dev@dpdk.org; Akhil Goyal <gakhil@marvell.com>; Anatoly Burakov
> <anatoly.burakov@intel.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Bruce
> Richardson <bruce.richardson@intel.com>; Chenbo Xia <chenbo.xia@intel.com>; Ciara Power
> <ciara.power@intel.com>; David Christensen <drc@linux.vnet.ibm.com>; David Hunt
> <david.hunt@intel.com>; Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>; Dmitry Malloy
> <dmitrym@microsoft.com>; Elena Agostini <eagostini@nvidia.com>; Erik Gabriel Carrillo
> <erik.g.carrillo@intel.com>; Fan Zhang <fanzhang.oss@gmail.com>; Ferruh Yigit
> <ferruh.yigit@amd.com>; Harman Kalra <hkalra@marvell.com>; Harry van Haaren
> <harry.van.haaren@intel.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> jerinj@marvell.com; Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>; Matan Azrad
> <matan@nvidia.com>; Maxime Coquelin <maxime.coquelin@redhat.com>; Narcisa Ana Maria Vasile
> <navasile@linux.microsoft.com>; Nicolas Chautru <nicolas.chautru@intel.com>; Olivier Matz
> <olivier.matz@6wind.com>; Ori Kam <orika@nvidia.com>; Pallavi Kadam
> <pallavi.kadam@intel.com>; Pavan Nikhilesh <pbhagavatula@marvell.com>; Reshma Pattan
> <reshma.pattan@intel.com>; Sameh Gobriel <sameh.gobriel@intel.com>; Shijith Thotton
> <sthotton@marvell.com>; Sivaprasad Tummala <sivaprasad.tummala@amd.com>; Stephen Hemminger
> <stephen@networkplumber.org>; Suanming Mou <suanmingm@nvidia.com>; Sunil Kumar Kori
> <skori@marvell.com>; thomas@monjalon.net; Viacheslav Ovsiienko <viacheslavo@nvidia.com>;
> Vladimir Medvedkin <vladimir.medvedkin@intel.com>; Yipeng Wang <yipeng1.wang@intel.com>;
> nd <nd@arm.com>
> Subject: Re: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> 
> On Wed, Oct 25, 2023 at 09:41:22AM +0000, Ruifeng Wang wrote:
> > > -----Original Message-----
> > > From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > Sent: Wednesday, October 18, 2023 4:31 AM
> > > To: dev@dpdk.org
> > > Cc: Akhil Goyal <gakhil@marvell.com>; Anatoly Burakov
> > > <anatoly.burakov@intel.com>; Andrew Rybchenko
> > > <andrew.rybchenko@oktetlabs.ru>; Bruce Richardson
> > > <bruce.richardson@intel.com>; Chenbo Xia <chenbo.xia@intel.com>;
> > > Ciara Power <ciara.power@intel.com>; David Christensen
> > > <drc@linux.vnet.ibm.com>; David Hunt <david.hunt@intel.com>; Dmitry
> > > Kozlyuk <dmitry.kozliuk@gmail.com>; Dmitry Malloy
> > > <dmitrym@microsoft.com>; Elena Agostini <eagostini@nvidia.com>; Erik
> > > Gabriel Carrillo <erik.g.carrillo@intel.com>; Fan Zhang
> > > <fanzhang.oss@gmail.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> > > Harman Kalra <hkalra@marvell.com>; Harry van Haaren
> > > <harry.van.haaren@intel.com>; Honnappa Nagarahalli
> > > <Honnappa.Nagarahalli@arm.com>; jerinj@marvell.com; Konstantin
> > > Ananyev <konstantin.v.ananyev@yandex.ru>; Matan Azrad
> > > <matan@nvidia.com>; Maxime Coquelin <maxime.coquelin@redhat.com>;
> > > Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>; Nicolas
> > > Chautru <nicolas.chautru@intel.com>; Olivier Matz
> > > <olivier.matz@6wind.com>; Ori Kam <orika@nvidia.com>; Pallavi Kadam
> > > <pallavi.kadam@intel.com>; Pavan Nikhilesh
> > > <pbhagavatula@marvell.com>; Reshma Pattan <reshma.pattan@intel.com>;
> > > Sameh Gobriel <sameh.gobriel@intel.com>; Shijith Thotton
> > > <sthotton@marvell.com>; Sivaprasad Tummala
> > > <sivaprasad.tummala@amd.com>; Stephen Hemminger
> > > <stephen@networkplumber.org>; Suanming Mou <suanmingm@nvidia.com>;
> > > Sunil Kumar Kori <skori@marvell.com>; thomas@monjalon.net;
> > > Viacheslav Ovsiienko <viacheslavo@nvidia.com>; Vladimir Medvedkin
> > > <vladimir.medvedkin@intel.com>; Yipeng Wang
> > > <yipeng1.wang@intel.com>; Tyler Retzlaff
> > > <roretzla@linux.microsoft.com>
> > > Subject: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> > >
> > > Replace the use of gcc builtin __atomic_xxx intrinsics with
> > > corresponding rte_atomic_xxx optional stdatomic API
> > >
> > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > ---
> > >  lib/rcu/rte_rcu_qsbr.c | 48 +++++++++++++++++------------------
> > >  lib/rcu/rte_rcu_qsbr.h | 68
> > > +++++++++++++++++++++++++-------------------------
> > >  2 files changed, 58 insertions(+), 58 deletions(-)
> > >
> > > diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c index
> > > 17be93e..4dc7714 100644
> > > --- a/lib/rcu/rte_rcu_qsbr.c
> > > +++ b/lib/rcu/rte_rcu_qsbr.c
> > > @@ -102,21 +102,21 @@
> > >  	 * go out of sync. Hence, additional checks are required.
> > >  	 */
> > >  	/* Check if the thread is already registered */
> > > -	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > -					__ATOMIC_RELAXED);
> > > +	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > +					rte_memory_order_relaxed);
> > >  	if (old_bmap & 1UL << id)
> > >  		return 0;
> > >
> > >  	do {
> > >  		new_bmap = old_bmap | (1UL << id);
> > > -		success = __atomic_compare_exchange(
> > > +		success = rte_atomic_compare_exchange_strong_explicit(
> > >  					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > -					&old_bmap, &new_bmap, 0,
> > > -					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > > +					&old_bmap, new_bmap,
> > > +					rte_memory_order_release, rte_memory_order_relaxed);
> > >
> > >  		if (success)
> > > -			__atomic_fetch_add(&v->num_threads,
> > > -						1, __ATOMIC_RELAXED);
> > > +			rte_atomic_fetch_add_explicit(&v->num_threads,
> > > +						1, rte_memory_order_relaxed);
> > >  		else if (old_bmap & (1UL << id))
> > >  			/* Someone else registered this thread.
> > >  			 * Counter should not be incremented.
> > > @@ -154,8 +154,8 @@
> > >  	 * go out of sync. Hence, additional checks are required.
> > >  	 */
> > >  	/* Check if the thread is already unregistered */
> > > -	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > -					__ATOMIC_RELAXED);
> > > +	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > +					rte_memory_order_relaxed);
> > >  	if (!(old_bmap & (1UL << id)))
> > >  		return 0;
> > >
> > > @@ -165,14 +165,14 @@
> > >  		 * completed before removal of the thread from the list of
> > >  		 * reporting threads.
> > >  		 */
> > > -		success = __atomic_compare_exchange(
> > > +		success = rte_atomic_compare_exchange_strong_explicit(
> > >  					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > -					&old_bmap, &new_bmap, 0,
> > > -					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > > +					&old_bmap, new_bmap,
> > > +					rte_memory_order_release, rte_memory_order_relaxed);
> > >
> > >  		if (success)
> > > -			__atomic_fetch_sub(&v->num_threads,
> > > -						1, __ATOMIC_RELAXED);
> > > +			rte_atomic_fetch_sub_explicit(&v->num_threads,
> > > +						1, rte_memory_order_relaxed);
> > >  		else if (!(old_bmap & (1UL << id)))
> > >  			/* Someone else unregistered this thread.
> > >  			 * Counter should not be incremented.
> > > @@ -227,8 +227,8 @@
> > >
> > >  	fprintf(f, "  Registered thread IDs = ");
> > >  	for (i = 0; i < v->num_elems; i++) {
> > > -		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > -					__ATOMIC_ACQUIRE);
> > > +		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > +					rte_memory_order_acquire);
> > >  		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
> > >  		while (bmap) {
> > >  			t = __builtin_ctzl(bmap);
> > > @@ -241,26 +241,26 @@
> > >  	fprintf(f, "\n");
> > >
> > >  	fprintf(f, "  Token = %" PRIu64 "\n",
> > > -			__atomic_load_n(&v->token, __ATOMIC_ACQUIRE));
> > > +			rte_atomic_load_explicit(&v->token, rte_memory_order_acquire));
> > >
> > >  	fprintf(f, "  Least Acknowledged Token = %" PRIu64 "\n",
> > > -			__atomic_load_n(&v->acked_token, __ATOMIC_ACQUIRE));
> > > +			rte_atomic_load_explicit(&v->acked_token,
> > > +rte_memory_order_acquire));
> > >
> > >  	fprintf(f, "Quiescent State Counts for readers:\n");
> > >  	for (i = 0; i < v->num_elems; i++) {
> > > -		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > -					__ATOMIC_ACQUIRE);
> > > +		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > +					rte_memory_order_acquire);
> > >  		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
> > >  		while (bmap) {
> > >  			t = __builtin_ctzl(bmap);
> > >  			fprintf(f, "thread ID = %u, count = %" PRIu64 ", lock count = %u\n",
> > >  				id + t,
> > > -				__atomic_load_n(
> > > +				rte_atomic_load_explicit(
> > >  					&v->qsbr_cnt[id + t].cnt,
> > > -					__ATOMIC_RELAXED),
> > > -				__atomic_load_n(
> > > +					rte_memory_order_relaxed),
> > > +				rte_atomic_load_explicit(
> > >  					&v->qsbr_cnt[id + t].lock_cnt,
> > > -					__ATOMIC_RELAXED));
> > > +					rte_memory_order_relaxed));
> > >  			bmap &= ~(1UL << t);
> > >  		}
> > >  	}
> > > diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h index
> > > 87e1b55..9f4aed2 100644
> > > --- a/lib/rcu/rte_rcu_qsbr.h
> > > +++ b/lib/rcu/rte_rcu_qsbr.h
> > > @@ -63,11 +63,11 @@
> > >   * Given thread id needs to be converted to index into the array and
> > >   * the id within the array element.
> > >   */
> > > -#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
> > > +#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE
> > > +(sizeof(RTE_ATOMIC(uint64_t)) *
> > > +8)
> > >  #define __RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
> > >  	RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
> > >  		__RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, RTE_CACHE_LINE_SIZE)
> > > -#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
> > > +#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t __rte_atomic *)
> > > +\
> >
> > Is it equivalent to ((RTE_ATOMIC(uint64_t) *)?
> 
> i'm not sure if you're asking about the resultant type of the expression or not?

I see other places are using the specifier, hence the question.

> 
> in this context we aren't specifying an atomic type, but rather adding the atomic
> qualifier, via a cast, to what should already be a variable with an atomic-specified
> type, which is why we use __rte_atomic.

I read in document [1] that an atomic-qualified type may have a different size from the original type.
If that is the case, the size difference could cause issues when accessing the bitmap array.
Did I misunderstand?

[1] https://en.cppreference.com/w/c/language/atomic
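
(For reference, a minimal sketch of the two spellings being compared;
illustrative only, not part of the patch, and assuming RTE_ENABLE_STDATOMIC
is defined so that RTE_ATOMIC(T) expands to _Atomic(T) and __rte_atomic to
_Atomic:

	RTE_ATOMIC(uint64_t) bmap_word;		/* atomic type *specifier*: declares the object */
	RTE_ATOMIC(uint64_t) *q = &bmap_word;	/* the specifier-based pointer form */
	/* the macro under discussion only casts, adding the atomic *qualifier*
	 * to a pointer whose target should already be an atomic-specified object
	 */
	uint64_t __rte_atomic *p = (uint64_t __rte_atomic *)&bmap_word;
	uint64_t old = rte_atomic_load_explicit(p, rte_memory_order_relaxed);

Both spellings end up naming an atomic uint64_t; the open question is whether
its size and layout match the plain uint64_t used to dimension the bitmap
array.)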

> 
> >
> > >  	((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)
> > > #define __RTE_QSBR_THRID_INDEX_SHIFT 6  #define
> > > __RTE_QSBR_THRID_MASK 0x3f @@ -75,13 +75,13 @@
> > >
> >
> > <snip>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v3 13/19] vhost: use rte optional stdatomic API
  2023-10-26  0:31   ` [PATCH v3 13/19] vhost: " Tyler Retzlaff
@ 2023-10-26 11:57     ` Maxime Coquelin
  0 siblings, 0 replies; 91+ messages in thread
From: Maxime Coquelin @ 2023-10-26 11:57 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Narcisa Ana Maria Vasile,
	Nicolas Chautru, Olivier Matz, Ori Kam, Pallavi Kadam,
	Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang



On 10/26/23 02:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/vhost/vdpa.c            |  3 ++-
>   lib/vhost/vhost.c           | 42 ++++++++++++++++----------------
>   lib/vhost/vhost.h           | 39 ++++++++++++++++--------------
>   lib/vhost/vhost_user.c      |  6 ++---
>   lib/vhost/virtio_net.c      | 58 +++++++++++++++++++++++++--------------------
>   lib/vhost/virtio_net_ctrl.c |  6 +++--
>   6 files changed, 84 insertions(+), 70 deletions(-)
> 

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v3 02/19] bbdev: use rte optional stdatomic API
  2023-10-26  0:31   ` [PATCH v3 02/19] bbdev: " Tyler Retzlaff
@ 2023-10-26 11:57     ` Maxime Coquelin
  0 siblings, 0 replies; 91+ messages in thread
From: Maxime Coquelin @ 2023-10-26 11:57 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Narcisa Ana Maria Vasile,
	Nicolas Chautru, Olivier Matz, Ori Kam, Pallavi Kadam,
	Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang



On 10/26/23 02:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with corresponding
> rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/bbdev/rte_bbdev.c | 6 +++---
>   lib/bbdev/rte_bbdev.h | 2 +-
>   2 files changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/lib/bbdev/rte_bbdev.c b/lib/bbdev/rte_bbdev.c
> index 155323e..cfebea0 100644
> --- a/lib/bbdev/rte_bbdev.c
> +++ b/lib/bbdev/rte_bbdev.c
> @@ -208,7 +208,7 @@ struct rte_bbdev *
>   		return NULL;
>   	}
>   
> -	__atomic_fetch_add(&bbdev->data->process_cnt, 1, __ATOMIC_RELAXED);
> +	rte_atomic_fetch_add_explicit(&bbdev->data->process_cnt, 1, rte_memory_order_relaxed);
>   	bbdev->data->dev_id = dev_id;
>   	bbdev->state = RTE_BBDEV_INITIALIZED;
>   
> @@ -250,8 +250,8 @@ struct rte_bbdev *
>   	}
>   
>   	/* clear shared BBDev Data if no process is using the device anymore */
> -	if (__atomic_fetch_sub(&bbdev->data->process_cnt, 1,
> -			      __ATOMIC_RELAXED) - 1 == 0)
> +	if (rte_atomic_fetch_sub_explicit(&bbdev->data->process_cnt, 1,
> +			      rte_memory_order_relaxed) - 1 == 0)
>   		memset(bbdev->data, 0, sizeof(*bbdev->data));
>   
>   	memset(bbdev, 0, sizeof(*bbdev));
> diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
> index d12e2e7..e1aee08 100644
> --- a/lib/bbdev/rte_bbdev.h
> +++ b/lib/bbdev/rte_bbdev.h
> @@ -482,7 +482,7 @@ struct rte_bbdev_data {
>   	uint16_t dev_id;  /**< Device ID */
>   	int socket_id;  /**< NUMA socket that device is on */
>   	bool started;  /**< Device run-time state */
> -	uint16_t process_cnt;  /** Counter of processes using the device */
> +	RTE_ATOMIC(uint16_t) process_cnt;  /** Counter of processes using the device */
>   };
>   
>   /* Forward declarations */

Reviewed-by: Maxime Coquelin <maxime.coquelin@redhat.com>

Thanks,
Maxime


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v3 00/19] use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (18 preceding siblings ...)
  2023-10-26  0:31   ` [PATCH v3 19/19] ring: " Tyler Retzlaff
@ 2023-10-26 13:47   ` David Marchand
  2023-10-30 15:34   ` David Marchand
  20 siblings, 0 replies; 91+ messages in thread
From: David Marchand @ 2023-10-26 13:47 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Akhil Goyal, Anatoly Burakov, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Ciara Power, David Christensen,
	David Hunt, Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang

On Thu, Oct 26, 2023 at 2:32 AM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API.
>
> v3:
>   * add missing atomic specification on head variable
>     in struct rte_ring_headtail
>   * adapt to use rte_atomic_xxx stdatomic API in rte_ring_c11_pvt.h
>   * split comma operator statement into 2 statements
>
> v2:
>   * add #include <rte_stdatomic.h> to rte_mbuf_core.h
>   * remove first two patches which were fixes that have
>     been merged in another series
>
> Tyler Retzlaff (19):
>   power: use rte optional stdatomic API
>   bbdev: use rte optional stdatomic API
>   eal: use rte optional stdatomic API
>   eventdev: use rte optional stdatomic API
>   gpudev: use rte optional stdatomic API
>   ipsec: use rte optional stdatomic API
>   mbuf: use rte optional stdatomic API
>   mempool: use rte optional stdatomic API
>   rcu: use rte optional stdatomic API
>   pdump: use rte optional stdatomic API
>   stack: use rte optional stdatomic API
>   telemetry: use rte optional stdatomic API
>   vhost: use rte optional stdatomic API
>   cryptodev: use rte optional stdatomic API
>   distributor: use rte optional stdatomic API
>   ethdev: use rte optional stdatomic API
>   hash: use rte optional stdatomic API
>   timer: use rte optional stdatomic API
>   ring: use rte optional stdatomic API
>
>  drivers/event/cnxk/cnxk_tim_worker.h   |   4 +-
>  drivers/net/mlx5/mlx5_hws_cnt.h        |   4 +-
>  lib/bbdev/rte_bbdev.c                  |   6 +-
>  lib/bbdev/rte_bbdev.h                  |   2 +-
>  lib/cryptodev/rte_cryptodev.c          |  22 +++---
>  lib/cryptodev/rte_cryptodev.h          |  16 ++---
>  lib/distributor/distributor_private.h  |   4 +-
>  lib/distributor/rte_distributor.c      |  54 +++++++--------
>  lib/eal/common/eal_common_launch.c     |  10 +--
>  lib/eal/common/eal_common_mcfg.c       |   2 +-
>  lib/eal/common/eal_common_proc.c       |  14 ++--
>  lib/eal/common/eal_common_thread.c     |  26 +++----
>  lib/eal/common/eal_common_trace.c      |   8 +--
>  lib/eal/common/eal_common_trace_ctf.c  |   4 +-
>  lib/eal/common/eal_memcfg.h            |   2 +-
>  lib/eal/common/eal_private.h           |   4 +-
>  lib/eal/common/eal_trace.h             |   4 +-
>  lib/eal/common/rte_service.c           | 122 ++++++++++++++++-----------------
>  lib/eal/freebsd/eal.c                  |  20 +++---
>  lib/eal/include/rte_epoll.h            |   3 +-
>  lib/eal/linux/eal.c                    |  26 +++----
>  lib/eal/linux/eal_interrupts.c         |  42 ++++++------
>  lib/eal/ppc/include/rte_atomic.h       |   6 +-
>  lib/eal/windows/rte_thread.c           |   8 ++-
>  lib/ethdev/ethdev_driver.h             |  16 ++---
>  lib/ethdev/ethdev_private.c            |   6 +-
>  lib/ethdev/rte_ethdev.c                |  24 +++----
>  lib/ethdev/rte_ethdev.h                |  16 ++---
>  lib/ethdev/rte_ethdev_core.h           |   2 +-
>  lib/eventdev/rte_event_timer_adapter.c |  66 +++++++++---------
>  lib/eventdev/rte_event_timer_adapter.h |   2 +-
>  lib/gpudev/gpudev.c                    |   6 +-
>  lib/gpudev/gpudev_driver.h             |   2 +-
>  lib/hash/rte_cuckoo_hash.c             | 116 +++++++++++++++----------------
>  lib/hash/rte_cuckoo_hash.h             |   6 +-
>  lib/ipsec/ipsec_sqn.h                  |   2 +-
>  lib/ipsec/sa.h                         |   2 +-
>  lib/mbuf/rte_mbuf.h                    |  20 +++---
>  lib/mbuf/rte_mbuf_core.h               |   5 +-
>  lib/mempool/rte_mempool.h              |   4 +-
>  lib/pdump/rte_pdump.c                  |  14 ++--
>  lib/pdump/rte_pdump.h                  |   8 +--
>  lib/power/power_acpi_cpufreq.c         |  33 ++++-----
>  lib/power/power_cppc_cpufreq.c         |  25 +++----
>  lib/power/power_pstate_cpufreq.c       |  31 +++++----
>  lib/rcu/rte_rcu_qsbr.c                 |  48 ++++++-------
>  lib/rcu/rte_rcu_qsbr.h                 |  68 +++++++++---------
>  lib/ring/rte_ring_c11_pvt.h            |  47 +++++++------
>  lib/ring/rte_ring_core.h               |  12 ++--
>  lib/ring/rte_ring_generic_pvt.h        |  16 +++--
>  lib/ring/rte_ring_hts_elem_pvt.h       |  22 +++---
>  lib/ring/rte_ring_peek_elem_pvt.h      |   6 +-
>  lib/ring/rte_ring_rts_elem_pvt.h       |  27 ++++----
>  lib/stack/rte_stack.h                  |   2 +-
>  lib/stack/rte_stack_lf_c11.h           |  24 +++----
>  lib/stack/rte_stack_lf_generic.h       |  18 ++---
>  lib/telemetry/telemetry.c              |  18 ++---
>  lib/timer/rte_timer.c                  |  50 +++++++-------
>  lib/timer/rte_timer.h                  |   6 +-
>  lib/vhost/vdpa.c                       |   3 +-
>  lib/vhost/vhost.c                      |  42 ++++++------
>  lib/vhost/vhost.h                      |  39 ++++++-----
>  lib/vhost/vhost_user.c                 |   6 +-
>  lib/vhost/virtio_net.c                 |  58 +++++++++-------
>  lib/vhost/virtio_net_ctrl.c            |   6 +-
>  65 files changed, 684 insertions(+), 653 deletions(-)

The conversion looks correct to me and the CI looks happy.

There are a few acks (from Konstantin) that are missing in the v3 (but
I can fix this myself, no need for a new revision just for this).
For the series,
Acked-by: David Marchand <david.marchand@redhat.com>


-- 
David Marchand


^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [EXT] [PATCH v3 14/19] cryptodev: use rte optional stdatomic API
  2023-10-26  0:31   ` [PATCH v3 14/19] cryptodev: " Tyler Retzlaff
@ 2023-10-26 15:53     ` Akhil Goyal
  2023-10-27 13:05     ` Konstantin Ananyev
  1 sibling, 0 replies; 91+ messages in thread
From: Akhil Goyal @ 2023-10-26 15:53 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Anatoly Burakov, Andrew Rybchenko, Bruce Richardson, Chenbo Xia,
	Ciara Power, David Christensen, David Hunt, Dmitry Kozlyuk,
	Dmitry Malloy, Elena Agostini, Erik Gabriel Carrillo, Fan Zhang,
	Ferruh Yigit, Harman Kalra, Harry van Haaren,
	Honnappa Nagarahalli, Jerin Jacob Kollanukkaran,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh Bhagavatula, Reshma Pattan,
	Sameh Gobriel, Shijith Thotton, Sivaprasad Tummala,
	Stephen Hemminger, Suanming Mou, Sunil Kumar Kori,
	Thomas Monjalon, Viacheslav Ovsiienko, Vladimir Medvedkin,
	Yipeng Wang

> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Akhil Goyal <gakhil@marvell.com>


^ permalink raw reply	[flat|nested] 91+ messages in thread

* RE: [EXT] [PATCH v3 06/19] ipsec: use rte optional stdatomic API
  2023-10-26  0:31   ` [PATCH v3 06/19] ipsec: " Tyler Retzlaff
@ 2023-10-26 15:54     ` Akhil Goyal
  2023-10-27 12:59     ` Konstantin Ananyev
  1 sibling, 0 replies; 91+ messages in thread
From: Akhil Goyal @ 2023-10-26 15:54 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Anatoly Burakov, Andrew Rybchenko, Bruce Richardson, Chenbo Xia,
	Ciara Power, David Christensen, David Hunt, Dmitry Kozlyuk,
	Dmitry Malloy, Elena Agostini, Erik Gabriel Carrillo, Fan Zhang,
	Ferruh Yigit, Harman Kalra, Harry van Haaren,
	Honnappa Nagarahalli, Jerin Jacob Kollanukkaran,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh Bhagavatula, Reshma Pattan,
	Sameh Gobriel, Shijith Thotton, Sivaprasad Tummala,
	Stephen Hemminger, Suanming Mou, Sunil Kumar Kori,
	Thomas Monjalon, Viacheslav Ovsiienko, Vladimir Medvedkin,
	Yipeng Wang

> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
Acked-by: Akhil Goyal <gakhil@marvell.com>


^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v2 09/19] rcu: use rte optional stdatomic API
  2023-10-26  4:24         ` Ruifeng Wang
@ 2023-10-26 16:36           ` Tyler Retzlaff
  0 siblings, 0 replies; 91+ messages in thread
From: Tyler Retzlaff @ 2023-10-26 16:36 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: dev, Akhil Goyal, Anatoly Burakov, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Ciara Power, David Christensen,
	David Hunt, Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, jerinj,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, thomas, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang, nd

On Thu, Oct 26, 2023 at 04:24:54AM +0000, Ruifeng Wang wrote:
> > -----Original Message-----
> > From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > Sent: Thursday, October 26, 2023 6:38 AM
> > To: Ruifeng Wang <Ruifeng.Wang@arm.com>
> > Cc: dev@dpdk.org; Akhil Goyal <gakhil@marvell.com>; Anatoly Burakov
> > <anatoly.burakov@intel.com>; Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>; Bruce
> > Richardson <bruce.richardson@intel.com>; Chenbo Xia <chenbo.xia@intel.com>; Ciara Power
> > <ciara.power@intel.com>; David Christensen <drc@linux.vnet.ibm.com>; David Hunt
> > <david.hunt@intel.com>; Dmitry Kozlyuk <dmitry.kozliuk@gmail.com>; Dmitry Malloy
> > <dmitrym@microsoft.com>; Elena Agostini <eagostini@nvidia.com>; Erik Gabriel Carrillo
> > <erik.g.carrillo@intel.com>; Fan Zhang <fanzhang.oss@gmail.com>; Ferruh Yigit
> > <ferruh.yigit@amd.com>; Harman Kalra <hkalra@marvell.com>; Harry van Haaren
> > <harry.van.haaren@intel.com>; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> > jerinj@marvell.com; Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>; Matan Azrad
> > <matan@nvidia.com>; Maxime Coquelin <maxime.coquelin@redhat.com>; Narcisa Ana Maria Vasile
> > <navasile@linux.microsoft.com>; Nicolas Chautru <nicolas.chautru@intel.com>; Olivier Matz
> > <olivier.matz@6wind.com>; Ori Kam <orika@nvidia.com>; Pallavi Kadam
> > <pallavi.kadam@intel.com>; Pavan Nikhilesh <pbhagavatula@marvell.com>; Reshma Pattan
> > <reshma.pattan@intel.com>; Sameh Gobriel <sameh.gobriel@intel.com>; Shijith Thotton
> > <sthotton@marvell.com>; Sivaprasad Tummala <sivaprasad.tummala@amd.com>; Stephen Hemminger
> > <stephen@networkplumber.org>; Suanming Mou <suanmingm@nvidia.com>; Sunil Kumar Kori
> > <skori@marvell.com>; thomas@monjalon.net; Viacheslav Ovsiienko <viacheslavo@nvidia.com>;
> > Vladimir Medvedkin <vladimir.medvedkin@intel.com>; Yipeng Wang <yipeng1.wang@intel.com>;
> > nd <nd@arm.com>
> > Subject: Re: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> > 
> > On Wed, Oct 25, 2023 at 09:41:22AM +0000, Ruifeng Wang wrote:
> > > > -----Original Message-----
> > > > From: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > > Sent: Wednesday, October 18, 2023 4:31 AM
> > > > To: dev@dpdk.org
> > > > Cc: Akhil Goyal <gakhil@marvell.com>; Anatoly Burakov
> > > > <anatoly.burakov@intel.com>; Andrew Rybchenko
> > > > <andrew.rybchenko@oktetlabs.ru>; Bruce Richardson
> > > > <bruce.richardson@intel.com>; Chenbo Xia <chenbo.xia@intel.com>;
> > > > Ciara Power <ciara.power@intel.com>; David Christensen
> > > > <drc@linux.vnet.ibm.com>; David Hunt <david.hunt@intel.com>; Dmitry
> > > > Kozlyuk <dmitry.kozliuk@gmail.com>; Dmitry Malloy
> > > > <dmitrym@microsoft.com>; Elena Agostini <eagostini@nvidia.com>; Erik
> > > > Gabriel Carrillo <erik.g.carrillo@intel.com>; Fan Zhang
> > > > <fanzhang.oss@gmail.com>; Ferruh Yigit <ferruh.yigit@amd.com>;
> > > > Harman Kalra <hkalra@marvell.com>; Harry van Haaren
> > > > <harry.van.haaren@intel.com>; Honnappa Nagarahalli
> > > > <Honnappa.Nagarahalli@arm.com>; jerinj@marvell.com; Konstantin
> > > > Ananyev <konstantin.v.ananyev@yandex.ru>; Matan Azrad
> > > > <matan@nvidia.com>; Maxime Coquelin <maxime.coquelin@redhat.com>;
> > > > Narcisa Ana Maria Vasile <navasile@linux.microsoft.com>; Nicolas
> > > > Chautru <nicolas.chautru@intel.com>; Olivier Matz
> > > > <olivier.matz@6wind.com>; Ori Kam <orika@nvidia.com>; Pallavi Kadam
> > > > <pallavi.kadam@intel.com>; Pavan Nikhilesh
> > > > <pbhagavatula@marvell.com>; Reshma Pattan <reshma.pattan@intel.com>;
> > > > Sameh Gobriel <sameh.gobriel@intel.com>; Shijith Thotton
> > > > <sthotton@marvell.com>; Sivaprasad Tummala
> > > > <sivaprasad.tummala@amd.com>; Stephen Hemminger
> > > > <stephen@networkplumber.org>; Suanming Mou <suanmingm@nvidia.com>;
> > > > Sunil Kumar Kori <skori@marvell.com>; thomas@monjalon.net;
> > > > Viacheslav Ovsiienko <viacheslavo@nvidia.com>; Vladimir Medvedkin
> > > > <vladimir.medvedkin@intel.com>; Yipeng Wang
> > > > <yipeng1.wang@intel.com>; Tyler Retzlaff
> > > > <roretzla@linux.microsoft.com>
> > > > Subject: [PATCH v2 09/19] rcu: use rte optional stdatomic API
> > > >
> > > > Replace the use of gcc builtin __atomic_xxx intrinsics with
> > > > corresponding rte_atomic_xxx optional stdatomic API
> > > >
> > > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > > ---
> > > >  lib/rcu/rte_rcu_qsbr.c | 48 +++++++++++++++++------------------
> > > >  lib/rcu/rte_rcu_qsbr.h | 68
> > > > +++++++++++++++++++++++++-------------------------
> > > >  2 files changed, 58 insertions(+), 58 deletions(-)
> > > >
> > > > diff --git a/lib/rcu/rte_rcu_qsbr.c b/lib/rcu/rte_rcu_qsbr.c index
> > > > 17be93e..4dc7714 100644
> > > > --- a/lib/rcu/rte_rcu_qsbr.c
> > > > +++ b/lib/rcu/rte_rcu_qsbr.c
> > > > @@ -102,21 +102,21 @@
> > > >  	 * go out of sync. Hence, additional checks are required.
> > > >  	 */
> > > >  	/* Check if the thread is already registered */
> > > > -	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -					__ATOMIC_RELAXED);
> > > > +	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > +					rte_memory_order_relaxed);
> > > >  	if (old_bmap & 1UL << id)
> > > >  		return 0;
> > > >
> > > >  	do {
> > > >  		new_bmap = old_bmap | (1UL << id);
> > > > -		success = __atomic_compare_exchange(
> > > > +		success = rte_atomic_compare_exchange_strong_explicit(
> > > >  					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -					&old_bmap, &new_bmap, 0,
> > > > -					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > > > +					&old_bmap, new_bmap,
> > > > +					rte_memory_order_release, rte_memory_order_relaxed);
> > > >
> > > >  		if (success)
> > > > -			__atomic_fetch_add(&v->num_threads,
> > > > -						1, __ATOMIC_RELAXED);
> > > > +			rte_atomic_fetch_add_explicit(&v->num_threads,
> > > > +						1, rte_memory_order_relaxed);
> > > >  		else if (old_bmap & (1UL << id))
> > > >  			/* Someone else registered this thread.
> > > >  			 * Counter should not be incremented.
> > > > @@ -154,8 +154,8 @@
> > > >  	 * go out of sync. Hence, additional checks are required.
> > > >  	 */
> > > >  	/* Check if the thread is already unregistered */
> > > > -	old_bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -					__ATOMIC_RELAXED);
> > > > +	old_bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > +					rte_memory_order_relaxed);
> > > >  	if (!(old_bmap & (1UL << id)))
> > > >  		return 0;
> > > >
> > > > @@ -165,14 +165,14 @@
> > > >  		 * completed before removal of the thread from the list of
> > > >  		 * reporting threads.
> > > >  		 */
> > > > -		success = __atomic_compare_exchange(
> > > > +		success = rte_atomic_compare_exchange_strong_explicit(
> > > >  					__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -					&old_bmap, &new_bmap, 0,
> > > > -					__ATOMIC_RELEASE, __ATOMIC_RELAXED);
> > > > +					&old_bmap, new_bmap,
> > > > +					rte_memory_order_release, rte_memory_order_relaxed);
> > > >
> > > >  		if (success)
> > > > -			__atomic_fetch_sub(&v->num_threads,
> > > > -						1, __ATOMIC_RELAXED);
> > > > +			rte_atomic_fetch_sub_explicit(&v->num_threads,
> > > > +						1, rte_memory_order_relaxed);
> > > >  		else if (!(old_bmap & (1UL << id)))
> > > >  			/* Someone else unregistered this thread.
> > > >  			 * Counter should not be incremented.
> > > > @@ -227,8 +227,8 @@
> > > >
> > > >  	fprintf(f, "  Registered thread IDs = ");
> > > >  	for (i = 0; i < v->num_elems; i++) {
> > > > -		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -					__ATOMIC_ACQUIRE);
> > > > +		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > +					rte_memory_order_acquire);
> > > >  		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
> > > >  		while (bmap) {
> > > >  			t = __builtin_ctzl(bmap);
> > > > @@ -241,26 +241,26 @@
> > > >  	fprintf(f, "\n");
> > > >
> > > >  	fprintf(f, "  Token = %" PRIu64 "\n",
> > > > -			__atomic_load_n(&v->token, __ATOMIC_ACQUIRE));
> > > > +			rte_atomic_load_explicit(&v->token, rte_memory_order_acquire));
> > > >
> > > >  	fprintf(f, "  Least Acknowledged Token = %" PRIu64 "\n",
> > > > -			__atomic_load_n(&v->acked_token, __ATOMIC_ACQUIRE));
> > > > +			rte_atomic_load_explicit(&v->acked_token,
> > > > +rte_memory_order_acquire));
> > > >
> > > >  	fprintf(f, "Quiescent State Counts for readers:\n");
> > > >  	for (i = 0; i < v->num_elems; i++) {
> > > > -		bmap = __atomic_load_n(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > -					__ATOMIC_ACQUIRE);
> > > > +		bmap = rte_atomic_load_explicit(__RTE_QSBR_THRID_ARRAY_ELM(v, i),
> > > > +					rte_memory_order_acquire);
> > > >  		id = i << __RTE_QSBR_THRID_INDEX_SHIFT;
> > > >  		while (bmap) {
> > > >  			t = __builtin_ctzl(bmap);
> > > >  			fprintf(f, "thread ID = %u, count = %" PRIu64 ", lock count = %u\n",
> > > >  				id + t,
> > > > -				__atomic_load_n(
> > > > +				rte_atomic_load_explicit(
> > > >  					&v->qsbr_cnt[id + t].cnt,
> > > > -					__ATOMIC_RELAXED),
> > > > -				__atomic_load_n(
> > > > +					rte_memory_order_relaxed),
> > > > +				rte_atomic_load_explicit(
> > > >  					&v->qsbr_cnt[id + t].lock_cnt,
> > > > -					__ATOMIC_RELAXED));
> > > > +					rte_memory_order_relaxed));
> > > >  			bmap &= ~(1UL << t);
> > > >  		}
> > > >  	}
> > > > diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h index
> > > > 87e1b55..9f4aed2 100644
> > > > --- a/lib/rcu/rte_rcu_qsbr.h
> > > > +++ b/lib/rcu/rte_rcu_qsbr.h
> > > > @@ -63,11 +63,11 @@
> > > >   * Given thread id needs to be converted to index into the array and
> > > >   * the id within the array element.
> > > >   */
> > > > -#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE (sizeof(uint64_t) * 8)
> > > > +#define __RTE_QSBR_THRID_ARRAY_ELM_SIZE
> > > > +(sizeof(RTE_ATOMIC(uint64_t)) *
> > > > +8)
> > > >  #define __RTE_QSBR_THRID_ARRAY_SIZE(max_threads) \
> > > >  	RTE_ALIGN(RTE_ALIGN_MUL_CEIL(max_threads, \
> > > >  		__RTE_QSBR_THRID_ARRAY_ELM_SIZE) >> 3, RTE_CACHE_LINE_SIZE)
> > > > -#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t *) \
> > > > +#define __RTE_QSBR_THRID_ARRAY_ELM(v, i) ((uint64_t __rte_atomic *)
> > > > +\
> > >
> > > Is it equivalent to ((RTE_ATOMIC(uint64_t) *)?
> > 
> > i'm not sure if you're asking about the resultant type of the expression or not?
> 
> I see other places are using the specifier, hence the question.
> 
> > 
> > in this context we aren't specifying an atomic type, but rather adding the atomic
> > qualifier, via a cast, to what should already be a variable with an atomic-specified
> > type, which is why we use __rte_atomic.
> 
> I read in document [1] that an atomic-qualified type may have a different size from the original type.
> If that is the case, the size difference could cause issues when accessing the bitmap array.
> Did I misunderstand?
> 
> [1] https://en.cppreference.com/w/c/language/atomic
> 

You do not misunderstand: the standard allows atomic specified type
sizes to differ from their ordinary native type sizes, though I have
some issues with how cppreference words things here as compared with
the actual standard.

One of the reasons is that it allows all standard atomic functions to
be 'generic', which means they can be used on objects of arbitrary size
instead of just integer and pointer types, i.e. you can use them on
struct, union and array types.

How the operations are made atomic is implementation defined and
obviously target-processor dependent, but in cases where the processor
has no intrinsic support to perform the operation atomically the
toolchain may generate code that uses a hidden lock. You can test
whether this is the case for arbitrary objects using the
standard-specified atomic_is_lock_free.
https://en.cppreference.com/w/c/atomic/atomic_is_lock_free

So that's the long-answer form of why they *may* differ in size,
alignment, etc. But the real question is: in this instance, will they?

Probably not.

Mainly because it wouldn't make a lot of sense for clang/gcc to suddenly
decide that sizeof(uint64_t) != sizeof(_Atomic(uint64_t)), or that they
should need to use a lock on an amd64 processor to load/store atomically
(assuming native alignment), etc.

A lot of the above is why we had so much discussion about how and when
we could enable the use of standard C11 atomics in DPDK. As you've
probably noticed, for existing platforms, toolchains and targets it is
actually defaulted off, but it does allow binary packagers or users to
build with it on.

For compatibility, the strictest guarantees can only be made when DPDK
and the application are both built consistently to use, or not use,
standard atomics. Applications are strongly cautioned not to attempt an
unmatched atomic operation on a DPDK atomic object, i.e. if you enabled
standard atomics, don't use __atomic_load_n directly on a field from a
public DPDK structure; instead use rte_atomic_load_explicit and make
sure your application defines RTE_ENABLE_STDATOMIC.
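
As an illustrative sketch only (not from this series), the size, alignment
and lock-freedom assumptions above can be checked directly against C11
<stdatomic.h>:

	#include <assert.h>
	#include <stdatomic.h>
	#include <stdint.h>

	/* build time: the atomic word has the same size and alignment as a
	 * plain uint64_t, so a bitmap array dimensioned in plain uint64_t
	 * words keeps the same layout
	 */
	static_assert(sizeof(_Atomic uint64_t) == sizeof(uint64_t), "size differs");
	static_assert(_Alignof(_Atomic uint64_t) == _Alignof(uint64_t), "alignment differs");

	int main(void)
	{
		/* run time: operations on the word are lock-free on this
		 * target, i.e. the toolchain emits no hidden lock
		 */
		_Atomic uint64_t w = 0;
		assert(atomic_is_lock_free(&w));
		return 0;
	}

The same checks apply whether the object is declared with RTE_ATOMIC(uint64_t)
or cast with __rte_atomic, since both name the same C11 atomic type when
RTE_ENABLE_STDATOMIC is enabled.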

Hope this explanation helps.

> > 
> > >
> > > >  	((struct rte_rcu_qsbr_cnt *)(v + 1) + v->max_threads) + i)
> > > > #define __RTE_QSBR_THRID_INDEX_SHIFT 6  #define
> > > > __RTE_QSBR_THRID_MASK 0x3f @@ -75,13 +75,13 @@
> > > >
> > >
> > > <snip>

^ permalink raw reply	[flat|nested] 91+ messages in thread

* Re: [PATCH v3 19/19] ring: use rte optional stdatomic API
  2023-10-26  0:31   ` [PATCH v3 19/19] ring: " Tyler Retzlaff
@ 2023-10-27 12:58     ` Konstantin Ananyev
  0 siblings, 0 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-27 12:58 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

26.10.2023 01:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   drivers/net/mlx5/mlx5_hws_cnt.h   |  4 ++--
>   lib/ring/rte_ring_c11_pvt.h       | 47 +++++++++++++++++++++------------------
>   lib/ring/rte_ring_core.h          | 12 +++++-----
>   lib/ring/rte_ring_generic_pvt.h   | 16 +++++++------
>   lib/ring/rte_ring_hts_elem_pvt.h  | 22 +++++++++---------
>   lib/ring/rte_ring_peek_elem_pvt.h |  6 ++---
>   lib/ring/rte_ring_rts_elem_pvt.h  | 27 +++++++++++-----------
>   7 files changed, 71 insertions(+), 63 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_hws_cnt.h b/drivers/net/mlx5/mlx5_hws_cnt.h
> index f462665..dcd5cec 100644
> --- a/drivers/net/mlx5/mlx5_hws_cnt.h
> +++ b/drivers/net/mlx5/mlx5_hws_cnt.h
> @@ -386,7 +386,7 @@ struct mlx5_hws_age_param {
>   
>   	MLX5_ASSERT(r->prod.sync_type == RTE_RING_SYNC_ST);
>   	MLX5_ASSERT(r->cons.sync_type == RTE_RING_SYNC_ST);
> -	current_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
> +	current_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
>   	MLX5_ASSERT(n <= r->capacity);
>   	MLX5_ASSERT(n <= rte_ring_count(r));
>   	revert2head = current_head - n;
> @@ -394,7 +394,7 @@ struct mlx5_hws_age_param {
>   	__rte_ring_get_elem_addr(r, revert2head, sizeof(cnt_id_t), n,
>   			&zcd->ptr1, &zcd->n1, &zcd->ptr2);
>   	/* Update tail */
> -	__atomic_store_n(&r->prod.tail, revert2head, __ATOMIC_RELEASE);
> +	rte_atomic_store_explicit(&r->prod.tail, revert2head, rte_memory_order_release);
>   	return n;
>   }
>   
> diff --git a/lib/ring/rte_ring_c11_pvt.h b/lib/ring/rte_ring_c11_pvt.h
> index f895950..5c10ad8 100644
> --- a/lib/ring/rte_ring_c11_pvt.h
> +++ b/lib/ring/rte_ring_c11_pvt.h
> @@ -22,9 +22,10 @@
>   	 * we need to wait for them to complete
>   	 */
>   	if (!single)
> -		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> +		rte_wait_until_equal_32((uint32_t *)(uintptr_t)&ht->tail, old_val,
> +			rte_memory_order_relaxed);
>   
> -	__atomic_store_n(&ht->tail, new_val, __ATOMIC_RELEASE);
> +	rte_atomic_store_explicit(&ht->tail, new_val, rte_memory_order_release);
>   }
>   
>   /**
> @@ -61,19 +62,19 @@
>   	unsigned int max = n;
>   	int success;
>   
> -	*old_head = __atomic_load_n(&r->prod.head, __ATOMIC_RELAXED);
> +	*old_head = rte_atomic_load_explicit(&r->prod.head, rte_memory_order_relaxed);
>   	do {
>   		/* Reset n to the initial burst count */
>   		n = max;
>   
>   		/* Ensure the head is read before tail */
> -		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> +		__atomic_thread_fence(rte_memory_order_acquire);
>   
>   		/* load-acquire synchronize with store-release of ht->tail
>   		 * in update_tail.
>   		 */
> -		cons_tail = __atomic_load_n(&r->cons.tail,
> -					__ATOMIC_ACQUIRE);
> +		cons_tail = rte_atomic_load_explicit(&r->cons.tail,
> +					rte_memory_order_acquire);
>   
>   		/* The subtraction is done between two unsigned 32bits value
>   		 * (the result is always modulo 32 bits even if we have
> @@ -91,14 +92,15 @@
>   			return 0;
>   
>   		*new_head = *old_head + n;
> -		if (is_sp)
> -			r->prod.head = *new_head, success = 1;
> -		else
> +		if (is_sp) {
> +			r->prod.head = *new_head;
> +			success = 1;
> +		} else
>   			/* on failure, *old_head is updated */
> -			success = __atomic_compare_exchange_n(&r->prod.head,
> +			success = rte_atomic_compare_exchange_strong_explicit(&r->prod.head,
>   					old_head, *new_head,
> -					0, __ATOMIC_RELAXED,
> -					__ATOMIC_RELAXED);
> +					rte_memory_order_relaxed,
> +					rte_memory_order_relaxed);
>   	} while (unlikely(success == 0));
>   	return n;
>   }
> @@ -137,19 +139,19 @@
>   	int success;
>   
>   	/* move cons.head atomically */
> -	*old_head = __atomic_load_n(&r->cons.head, __ATOMIC_RELAXED);
> +	*old_head = rte_atomic_load_explicit(&r->cons.head, rte_memory_order_relaxed);
>   	do {
>   		/* Restore n as it may change every loop */
>   		n = max;
>   
>   		/* Ensure the head is read before tail */
> -		__atomic_thread_fence(__ATOMIC_ACQUIRE);
> +		__atomic_thread_fence(rte_memory_order_acquire);
>   
>   		/* this load-acquire synchronize with store-release of ht->tail
>   		 * in update_tail.
>   		 */
> -		prod_tail = __atomic_load_n(&r->prod.tail,
> -					__ATOMIC_ACQUIRE);
> +		prod_tail = rte_atomic_load_explicit(&r->prod.tail,
> +					rte_memory_order_acquire);
>   
>   		/* The subtraction is done between two unsigned 32bits value
>   		 * (the result is always modulo 32 bits even if we have
> @@ -166,14 +168,15 @@
>   			return 0;
>   
>   		*new_head = *old_head + n;
> -		if (is_sc)
> -			r->cons.head = *new_head, success = 1;
> -		else
> +		if (is_sc) {
> +			r->cons.head = *new_head;


I still think we need to use atomic_store() here, but that
should be the subject of another patch/discussion.
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
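
For reference, a minimal sketch of the kind of change being suggested
(illustrative only, not part of this patch; it assumes a relaxed store
is sufficient for the single-consumer path):

    if (is_sc) {
            /* hypothetical follow-up: make the single-consumer head
             * update an explicit atomic store instead of a plain store
             */
            rte_atomic_store_explicit(&r->cons.head, *new_head,
                            rte_memory_order_relaxed);
            success = 1;
    }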

> +			success = 1;
> +		} else
>   			/* on failure, *old_head will be updated */
> -			success = __atomic_compare_exchange_n(&r->cons.head,
> +			success = rte_atomic_compare_exchange_strong_explicit(&r->cons.head,
>   							old_head, *new_head,
> -							0, __ATOMIC_RELAXED,
> -							__ATOMIC_RELAXED);
> +							rte_memory_order_relaxed,
> +							rte_memory_order_relaxed);
>   	} while (unlikely(success == 0));
>   	return n;
>   }
> diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
> index 327fdcf..b770873 100644
> --- a/lib/ring/rte_ring_core.h
> +++ b/lib/ring/rte_ring_core.h
> @@ -66,8 +66,8 @@ enum rte_ring_sync_type {
>    * but offset for *sync_type* and *tail* values should remain the same.
>    */
>   struct rte_ring_headtail {
> -	volatile uint32_t head;      /**< prod/consumer head. */
> -	volatile uint32_t tail;      /**< prod/consumer tail. */
> +	volatile RTE_ATOMIC(uint32_t) head;      /**< prod/consumer head. */
> +	volatile RTE_ATOMIC(uint32_t) tail;      /**< prod/consumer tail. */
>   	union {
>   		/** sync type of prod/cons */
>   		enum rte_ring_sync_type sync_type;
> @@ -78,7 +78,7 @@ struct rte_ring_headtail {
>   
>   union __rte_ring_rts_poscnt {
>   	/** raw 8B value to read/write *cnt* and *pos* as one atomic op */
> -	uint64_t raw __rte_aligned(8);
> +	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
>   	struct {
>   		uint32_t cnt; /**< head/tail reference counter */
>   		uint32_t pos; /**< head/tail position */
> @@ -94,10 +94,10 @@ struct rte_ring_rts_headtail {
>   
>   union __rte_ring_hts_pos {
>   	/** raw 8B value to read/write *head* and *tail* as one atomic op */
> -	uint64_t raw __rte_aligned(8);
> +	RTE_ATOMIC(uint64_t) raw __rte_aligned(8);
>   	struct {
> -		uint32_t head; /**< head position */
> -		uint32_t tail; /**< tail position */
> +		RTE_ATOMIC(uint32_t) head; /**< head position */
> +		RTE_ATOMIC(uint32_t) tail; /**< tail position */
>   	} pos;
>   };
>   
> diff --git a/lib/ring/rte_ring_generic_pvt.h b/lib/ring/rte_ring_generic_pvt.h
> index 5acb6e5..457f41d 100644
> --- a/lib/ring/rte_ring_generic_pvt.h
> +++ b/lib/ring/rte_ring_generic_pvt.h
> @@ -23,7 +23,8 @@
>   	 * we need to wait for them to complete
>   	 */
>   	if (!single)
> -		rte_wait_until_equal_32(&ht->tail, old_val, __ATOMIC_RELAXED);
> +		rte_wait_until_equal_32((volatile uint32_t *)(uintptr_t)&ht->tail, old_val,
> +			rte_memory_order_relaxed);
>   
>   	ht->tail = new_val;
>   }
> @@ -89,10 +90,11 @@
>   			return 0;
>   
>   		*new_head = *old_head + n;
> -		if (is_sp)
> -			r->prod.head = *new_head, success = 1;
> -		else
> -			success = rte_atomic32_cmpset(&r->prod.head,
> +		if (is_sp) {
> +			r->prod.head = *new_head;
> +			success = 1;
> +		} else
> +			success = rte_atomic32_cmpset((uint32_t *)(uintptr_t)&r->prod.head,
>   					*old_head, *new_head);
>   	} while (unlikely(success == 0));
>   	return n;
> @@ -162,8 +164,8 @@
>   			rte_smp_rmb();
>   			success = 1;
>   		} else {
> -			success = rte_atomic32_cmpset(&r->cons.head, *old_head,
> -					*new_head);
> +			success = rte_atomic32_cmpset((uint32_t *)(uintptr_t)&r->cons.head,
> +					*old_head, *new_head);
>   		}
>   	} while (unlikely(success == 0));
>   	return n;
> diff --git a/lib/ring/rte_ring_hts_elem_pvt.h b/lib/ring/rte_ring_hts_elem_pvt.h
> index a8678d3..91f5eec 100644
> --- a/lib/ring/rte_ring_hts_elem_pvt.h
> +++ b/lib/ring/rte_ring_hts_elem_pvt.h
> @@ -10,6 +10,8 @@
>   #ifndef _RTE_RING_HTS_ELEM_PVT_H_
>   #define _RTE_RING_HTS_ELEM_PVT_H_
>   
> +#include <rte_stdatomic.h>
> +
>   /**
>    * @file rte_ring_hts_elem_pvt.h
>    * It is not recommended to include this file directly,
> @@ -30,7 +32,7 @@
>   	RTE_SET_USED(enqueue);
>   
>   	tail = old_tail + num;
> -	__atomic_store_n(&ht->ht.pos.tail, tail, __ATOMIC_RELEASE);
> +	rte_atomic_store_explicit(&ht->ht.pos.tail, tail, rte_memory_order_release);
>   }
>   
>   /**
> @@ -44,7 +46,7 @@
>   {
>   	while (p->pos.head != p->pos.tail) {
>   		rte_pause();
> -		p->raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_ACQUIRE);
> +		p->raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_acquire);
>   	}
>   }
>   
> @@ -61,7 +63,7 @@
>   
>   	const uint32_t capacity = r->capacity;
>   
> -	op.raw = __atomic_load_n(&r->hts_prod.ht.raw, __ATOMIC_ACQUIRE);
> +	op.raw = rte_atomic_load_explicit(&r->hts_prod.ht.raw, rte_memory_order_acquire);
>   
>   	do {
>   		/* Reset n to the initial burst count */
> @@ -98,9 +100,9 @@
>   	 *  - OOO reads of cons tail value
>   	 *  - OOO copy of elems from the ring
>   	 */
> -	} while (__atomic_compare_exchange_n(&r->hts_prod.ht.raw,
> -			&op.raw, np.raw,
> -			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> +	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_prod.ht.raw,
> +			(uint64_t *)(uintptr_t)&op.raw, np.raw,
> +			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
>   
>   	*old_head = op.pos.head;
>   	return n;
> @@ -117,7 +119,7 @@
>   	uint32_t n;
>   	union __rte_ring_hts_pos np, op;
>   
> -	op.raw = __atomic_load_n(&r->hts_cons.ht.raw, __ATOMIC_ACQUIRE);
> +	op.raw = rte_atomic_load_explicit(&r->hts_cons.ht.raw, rte_memory_order_acquire);
>   
>   	/* move cons.head atomically */
>   	do {
> @@ -153,9 +155,9 @@
>   	 *  - OOO reads of prod tail value
>   	 *  - OOO copy of elems from the ring
>   	 */
> -	} while (__atomic_compare_exchange_n(&r->hts_cons.ht.raw,
> -			&op.raw, np.raw,
> -			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> +	} while (rte_atomic_compare_exchange_strong_explicit(&r->hts_cons.ht.raw,
> +			(uint64_t *)(uintptr_t)&op.raw, np.raw,
> +			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
>   
>   	*old_head = op.pos.head;
>   	return n;
> diff --git a/lib/ring/rte_ring_peek_elem_pvt.h b/lib/ring/rte_ring_peek_elem_pvt.h
> index bb0a7d5..b5f0822 100644
> --- a/lib/ring/rte_ring_peek_elem_pvt.h
> +++ b/lib/ring/rte_ring_peek_elem_pvt.h
> @@ -59,7 +59,7 @@
>   
>   	pos = tail + num;
>   	ht->head = pos;
> -	__atomic_store_n(&ht->tail, pos, __ATOMIC_RELEASE);
> +	rte_atomic_store_explicit(&ht->tail, pos, rte_memory_order_release);
>   }
>   
>   /**
> @@ -78,7 +78,7 @@
>   	uint32_t n;
>   	union __rte_ring_hts_pos p;
>   
> -	p.raw = __atomic_load_n(&ht->ht.raw, __ATOMIC_RELAXED);
> +	p.raw = rte_atomic_load_explicit(&ht->ht.raw, rte_memory_order_relaxed);
>   	n = p.pos.head - p.pos.tail;
>   
>   	RTE_ASSERT(n >= num);
> @@ -104,7 +104,7 @@
>   	p.pos.head = tail + num;
>   	p.pos.tail = p.pos.head;
>   
> -	__atomic_store_n(&ht->ht.raw, p.raw, __ATOMIC_RELEASE);
> +	rte_atomic_store_explicit(&ht->ht.raw, p.raw, rte_memory_order_release);
>   }
>   
>   /**
> diff --git a/lib/ring/rte_ring_rts_elem_pvt.h b/lib/ring/rte_ring_rts_elem_pvt.h
> index 7164213..1226503 100644
> --- a/lib/ring/rte_ring_rts_elem_pvt.h
> +++ b/lib/ring/rte_ring_rts_elem_pvt.h
> @@ -31,18 +31,19 @@
>   	 * might preceded us, then don't update tail with new value.
>   	 */
>   
> -	ot.raw = __atomic_load_n(&ht->tail.raw, __ATOMIC_ACQUIRE);
> +	ot.raw = rte_atomic_load_explicit(&ht->tail.raw, rte_memory_order_acquire);
>   
>   	do {
>   		/* on 32-bit systems we have to do atomic read here */
> -		h.raw = __atomic_load_n(&ht->head.raw, __ATOMIC_RELAXED);
> +		h.raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_relaxed);
>   
>   		nt.raw = ot.raw;
>   		if (++nt.val.cnt == h.val.cnt)
>   			nt.val.pos = h.val.pos;
>   
> -	} while (__atomic_compare_exchange_n(&ht->tail.raw, &ot.raw, nt.raw,
> -			0, __ATOMIC_RELEASE, __ATOMIC_ACQUIRE) == 0);
> +	} while (rte_atomic_compare_exchange_strong_explicit(&ht->tail.raw,
> +			(uint64_t *)(uintptr_t)&ot.raw, nt.raw,
> +			rte_memory_order_release, rte_memory_order_acquire) == 0);
>   }
>   
>   /**
> @@ -59,7 +60,7 @@
>   
>   	while (h->val.pos - ht->tail.val.pos > max) {
>   		rte_pause();
> -		h->raw = __atomic_load_n(&ht->head.raw, __ATOMIC_ACQUIRE);
> +		h->raw = rte_atomic_load_explicit(&ht->head.raw, rte_memory_order_acquire);
>   	}
>   }
>   
> @@ -76,7 +77,7 @@
>   
>   	const uint32_t capacity = r->capacity;
>   
> -	oh.raw = __atomic_load_n(&r->rts_prod.head.raw, __ATOMIC_ACQUIRE);
> +	oh.raw = rte_atomic_load_explicit(&r->rts_prod.head.raw, rte_memory_order_acquire);
>   
>   	do {
>   		/* Reset n to the initial burst count */
> @@ -113,9 +114,9 @@
>   	 *  - OOO reads of cons tail value
>   	 *  - OOO copy of elems to the ring
>   	 */
> -	} while (__atomic_compare_exchange_n(&r->rts_prod.head.raw,
> -			&oh.raw, nh.raw,
> -			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> +	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_prod.head.raw,
> +			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
> +			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
>   
>   	*old_head = oh.val.pos;
>   	return n;
> @@ -132,7 +133,7 @@
>   	uint32_t n;
>   	union __rte_ring_rts_poscnt nh, oh;
>   
> -	oh.raw = __atomic_load_n(&r->rts_cons.head.raw, __ATOMIC_ACQUIRE);
> +	oh.raw = rte_atomic_load_explicit(&r->rts_cons.head.raw, rte_memory_order_acquire);
>   
>   	/* move cons.head atomically */
>   	do {
> @@ -168,9 +169,9 @@
>   	 *  - OOO reads of prod tail value
>   	 *  - OOO copy of elems from the ring
>   	 */
> -	} while (__atomic_compare_exchange_n(&r->rts_cons.head.raw,
> -			&oh.raw, nh.raw,
> -			0, __ATOMIC_ACQUIRE, __ATOMIC_ACQUIRE) == 0);
> +	} while (rte_atomic_compare_exchange_strong_explicit(&r->rts_cons.head.raw,
> +			(uint64_t *)(uintptr_t)&oh.raw, nh.raw,
> +			rte_memory_order_acquire, rte_memory_order_acquire) == 0);
>   
>   	*old_head = oh.val.pos;
>   	return n;



* Re: [PATCH v3 06/19] ipsec: use rte optional stdatomic API
  2023-10-26  0:31   ` [PATCH v3 06/19] ipsec: " Tyler Retzlaff
  2023-10-26 15:54     ` [EXT] " Akhil Goyal
@ 2023-10-27 12:59     ` Konstantin Ananyev
  1 sibling, 0 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-27 12:59 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

26.10.2023 01:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/ipsec/ipsec_sqn.h | 2 +-
>   lib/ipsec/sa.h        | 2 +-
>   2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/ipsec/ipsec_sqn.h b/lib/ipsec/ipsec_sqn.h
> index 505950e..984a9dd 100644
> --- a/lib/ipsec/ipsec_sqn.h
> +++ b/lib/ipsec/ipsec_sqn.h
> @@ -128,7 +128,7 @@
>   
>   	n = *num;
>   	if (SQN_ATOMIC(sa))
> -		sqn = __atomic_fetch_add(&sa->sqn.outb, n, __ATOMIC_RELAXED) + n;
> +		sqn = rte_atomic_fetch_add_explicit(&sa->sqn.outb, n, rte_memory_order_relaxed) + n;
>   	else {
>   		sqn = sa->sqn.outb + n;
>   		sa->sqn.outb = sqn;
> diff --git a/lib/ipsec/sa.h b/lib/ipsec/sa.h
> index ce4af8c..4b30bea 100644
> --- a/lib/ipsec/sa.h
> +++ b/lib/ipsec/sa.h
> @@ -124,7 +124,7 @@ struct rte_ipsec_sa {
>   	 * place from other frequently accessed data.
>   	 */
>   	union {
> -		uint64_t outb;
> +		RTE_ATOMIC(uint64_t) outb;
>   		struct {
>   			uint32_t rdidx; /* read index */
>   			uint32_t wridx; /* write index */

Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
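
As a usage illustration only (the 'first'/'last' variables are
hypothetical, not in the patch): the fetch-add returns the previous
counter value, so adding n yields the last sequence number of the
block just reserved.

    /* reserve n outbound sequence numbers atomically */
    uint64_t last = rte_atomic_fetch_add_explicit(&sa->sqn.outb, n,
                    rte_memory_order_relaxed) + n;
    uint64_t first = last - n + 1;  /* first SQN of the reserved block */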



* Re: [PATCH v3 08/19] mempool: use rte optional stdatomic API
  2023-10-26  0:31   ` [PATCH v3 08/19] mempool: " Tyler Retzlaff
@ 2023-10-27 13:01     ` Konstantin Ananyev
  0 siblings, 0 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-27 13:01 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

26.10.2023 01:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/mempool/rte_mempool.h | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index f70bf36..df87cd2 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -327,8 +327,8 @@ struct rte_mempool {
>   		if (likely(__lcore_id < RTE_MAX_LCORE))                         \
>   			(mp)->stats[__lcore_id].name += (n);                    \
>   		else                                                            \
> -			__atomic_fetch_add(&((mp)->stats[RTE_MAX_LCORE].name),  \
> -					   (n), __ATOMIC_RELAXED);              \
> +			rte_atomic_fetch_add_explicit(&((mp)->stats[RTE_MAX_LCORE].name),  \
> +					   (n), rte_memory_order_relaxed);              \
>   	} while (0)
>   #else
>   #define RTE_MEMPOOL_STAT_ADD(mp, name, n) do {} while (0)

Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>


* Re: [PATCH v3 07/19] mbuf: use rte optional stdatomic API
  2023-10-26  0:31   ` [PATCH v3 07/19] mbuf: " Tyler Retzlaff
@ 2023-10-27 13:03     ` Konstantin Ananyev
  0 siblings, 0 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-27 13:03 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

26.10.2023 01:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/mbuf/rte_mbuf.h      | 20 ++++++++++----------
>   lib/mbuf/rte_mbuf_core.h |  5 +++--
>   2 files changed, 13 insertions(+), 12 deletions(-)
> 
> diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> index 913c459..b8ab477 100644
> --- a/lib/mbuf/rte_mbuf.h
> +++ b/lib/mbuf/rte_mbuf.h
> @@ -361,7 +361,7 @@ struct rte_pktmbuf_pool_private {
>   static inline uint16_t
>   rte_mbuf_refcnt_read(const struct rte_mbuf *m)
>   {
> -	return __atomic_load_n(&m->refcnt, __ATOMIC_RELAXED);
> +	return rte_atomic_load_explicit(&m->refcnt, rte_memory_order_relaxed);
>   }
>   
>   /**
> @@ -374,15 +374,15 @@ struct rte_pktmbuf_pool_private {
>   static inline void
>   rte_mbuf_refcnt_set(struct rte_mbuf *m, uint16_t new_value)
>   {
> -	__atomic_store_n(&m->refcnt, new_value, __ATOMIC_RELAXED);
> +	rte_atomic_store_explicit(&m->refcnt, new_value, rte_memory_order_relaxed);
>   }
>   
>   /* internal */
>   static inline uint16_t
>   __rte_mbuf_refcnt_update(struct rte_mbuf *m, int16_t value)
>   {
> -	return __atomic_fetch_add(&m->refcnt, value,
> -				 __ATOMIC_ACQ_REL) + value;
> +	return rte_atomic_fetch_add_explicit(&m->refcnt, value,
> +				 rte_memory_order_acq_rel) + value;
>   }
>   
>   /**
> @@ -463,7 +463,7 @@ struct rte_pktmbuf_pool_private {
>   static inline uint16_t
>   rte_mbuf_ext_refcnt_read(const struct rte_mbuf_ext_shared_info *shinfo)
>   {
> -	return __atomic_load_n(&shinfo->refcnt, __ATOMIC_RELAXED);
> +	return rte_atomic_load_explicit(&shinfo->refcnt, rte_memory_order_relaxed);
>   }
>   
>   /**
> @@ -478,7 +478,7 @@ struct rte_pktmbuf_pool_private {
>   rte_mbuf_ext_refcnt_set(struct rte_mbuf_ext_shared_info *shinfo,
>   	uint16_t new_value)
>   {
> -	__atomic_store_n(&shinfo->refcnt, new_value, __ATOMIC_RELAXED);
> +	rte_atomic_store_explicit(&shinfo->refcnt, new_value, rte_memory_order_relaxed);
>   }
>   
>   /**
> @@ -502,8 +502,8 @@ struct rte_pktmbuf_pool_private {
>   		return (uint16_t)value;
>   	}
>   
> -	return __atomic_fetch_add(&shinfo->refcnt, value,
> -				 __ATOMIC_ACQ_REL) + value;
> +	return rte_atomic_fetch_add_explicit(&shinfo->refcnt, value,
> +				 rte_memory_order_acq_rel) + value;
>   }
>   
>   /** Mbuf prefetch */
> @@ -1315,8 +1315,8 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
>   	 * Direct usage of add primitive to avoid
>   	 * duplication of comparing with one.
>   	 */
> -	if (likely(__atomic_fetch_add(&shinfo->refcnt, -1,
> -				     __ATOMIC_ACQ_REL) - 1))
> +	if (likely(rte_atomic_fetch_add_explicit(&shinfo->refcnt, -1,
> +				     rte_memory_order_acq_rel) - 1))
>   		return 1;
>   
>   	/* Reinitialize counter before mbuf freeing. */
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index e9bc0d1..5688683 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -19,6 +19,7 @@
>   #include <stdint.h>
>   
>   #include <rte_byteorder.h>
> +#include <rte_stdatomic.h>
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -497,7 +498,7 @@ struct rte_mbuf {
>   	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
>   	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
>   	 */
> -	uint16_t refcnt;
> +	RTE_ATOMIC(uint16_t) refcnt;
>   
>   	/**
>   	 * Number of segments. Only valid for the first segment of an mbuf
> @@ -674,7 +675,7 @@ struct rte_mbuf {
>   struct rte_mbuf_ext_shared_info {
>   	rte_mbuf_extbuf_free_callback_t free_cb; /**< Free callback function */
>   	void *fcb_opaque;                        /**< Free callback argument */
> -	uint16_t refcnt;
> +	RTE_ATOMIC(uint16_t) refcnt;


Just hope that stdatomic for uint16_t would never actually increase
the field size.
Otherwise it could cause a lot of trouble with dynamic mbuf fields, etc.
Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
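
A compile-time guard could make that expectation explicit; a minimal
sketch using C11 static_assert (hypothetical, not part of the patch,
placed next to the struct definition):

    /* hypothetical guards: the atomic qualifier must not widen refcnt
     * or grow the mbuf beyond its two cache lines
     */
    static_assert(sizeof(RTE_ATOMIC(uint16_t)) == sizeof(uint16_t),
                    "atomic refcnt must stay 16 bits");
    static_assert(sizeof(struct rte_mbuf) == RTE_CACHE_LINE_MIN_SIZE * 2,
                    "rte_mbuf layout must not grow");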


>   };
>   
>   /** Maximum number of nb_segs allowed. */



* Re: [PATCH v3 16/19] ethdev: use rte optional stdatomic API
  2023-10-26  0:31   ` [PATCH v3 16/19] ethdev: " Tyler Retzlaff
@ 2023-10-27 13:04     ` Konstantin Ananyev
  0 siblings, 0 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-27 13:04 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

26.10.2023 01:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/ethdev/ethdev_driver.h   | 16 ++++++++--------
>   lib/ethdev/ethdev_private.c  |  6 +++---
>   lib/ethdev/rte_ethdev.c      | 24 ++++++++++++------------
>   lib/ethdev/rte_ethdev.h      | 16 ++++++++--------
>   lib/ethdev/rte_ethdev_core.h |  2 +-
>   5 files changed, 32 insertions(+), 32 deletions(-)
> 
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
> index deb23ad..b482cd1 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -30,7 +30,7 @@
>    * queue on Rx and Tx.
>    */
>   struct rte_eth_rxtx_callback {
> -	struct rte_eth_rxtx_callback *next;
> +	RTE_ATOMIC(struct rte_eth_rxtx_callback *) next;
>   	union{
>   		rte_rx_callback_fn rx;
>   		rte_tx_callback_fn tx;
> @@ -80,12 +80,12 @@ struct rte_eth_dev {
>   	 * User-supplied functions called from rx_burst to post-process
>   	 * received packets before passing them to the user
>   	 */
> -	struct rte_eth_rxtx_callback *post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> +	RTE_ATOMIC(struct rte_eth_rxtx_callback *) post_rx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
>   	/**
>   	 * User-supplied functions called from tx_burst to pre-process
>   	 * received packets before passing them to the driver for transmission
>   	 */
> -	struct rte_eth_rxtx_callback *pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
> +	RTE_ATOMIC(struct rte_eth_rxtx_callback *) pre_tx_burst_cbs[RTE_MAX_QUEUES_PER_PORT];
>   
>   	enum rte_eth_dev_state state; /**< Flag indicating the port state */
>   	void *security_ctx; /**< Context for security ops */
> @@ -1655,7 +1655,7 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
>   rte_eth_linkstatus_set(struct rte_eth_dev *dev,
>   		       const struct rte_eth_link *new_link)
>   {
> -	uint64_t *dev_link = (uint64_t *)&(dev->data->dev_link);
> +	RTE_ATOMIC(uint64_t) *dev_link = (uint64_t __rte_atomic *)&(dev->data->dev_link);
>   	union {
>   		uint64_t val64;
>   		struct rte_eth_link link;
> @@ -1663,8 +1663,8 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
>   
>   	RTE_BUILD_BUG_ON(sizeof(*new_link) != sizeof(uint64_t));
>   
> -	orig.val64 = __atomic_exchange_n(dev_link, *(const uint64_t *)new_link,
> -					__ATOMIC_SEQ_CST);
> +	orig.val64 = rte_atomic_exchange_explicit(dev_link, *(const uint64_t *)new_link,
> +					rte_memory_order_seq_cst);
>   
>   	return (orig.link.link_status == new_link->link_status) ? -1 : 0;
>   }
> @@ -1682,12 +1682,12 @@ int rte_eth_dev_callback_process(struct rte_eth_dev *dev,
>   rte_eth_linkstatus_get(const struct rte_eth_dev *dev,
>   		       struct rte_eth_link *link)
>   {
> -	uint64_t *src = (uint64_t *)&(dev->data->dev_link);
> +	RTE_ATOMIC(uint64_t) *src = (uint64_t __rte_atomic *)&(dev->data->dev_link);
>   	uint64_t *dst = (uint64_t *)link;
>   
>   	RTE_BUILD_BUG_ON(sizeof(*link) != sizeof(uint64_t));
>   
> -	*dst = __atomic_load_n(src, __ATOMIC_SEQ_CST);
> +	*dst = rte_atomic_load_explicit(src, rte_memory_order_seq_cst);
>   }
>   
>   /**
> diff --git a/lib/ethdev/ethdev_private.c b/lib/ethdev/ethdev_private.c
> index 7cc7f28..82e2568 100644
> --- a/lib/ethdev/ethdev_private.c
> +++ b/lib/ethdev/ethdev_private.c
> @@ -245,7 +245,7 @@ struct dummy_queue {
>   void
>   eth_dev_fp_ops_reset(struct rte_eth_fp_ops *fpo)
>   {
> -	static void *dummy_data[RTE_MAX_QUEUES_PER_PORT];
> +	static RTE_ATOMIC(void *) dummy_data[RTE_MAX_QUEUES_PER_PORT];
>   	uintptr_t port_id = fpo - rte_eth_fp_ops;
>   
>   	per_port_queues[port_id].rx_warn_once = false;
> @@ -278,10 +278,10 @@ struct dummy_queue {
>   	fpo->recycle_rx_descriptors_refill = dev->recycle_rx_descriptors_refill;
>   
>   	fpo->rxq.data = dev->data->rx_queues;
> -	fpo->rxq.clbk = (void **)(uintptr_t)dev->post_rx_burst_cbs;
> +	fpo->rxq.clbk = (void * __rte_atomic *)(uintptr_t)dev->post_rx_burst_cbs;
>   
>   	fpo->txq.data = dev->data->tx_queues;
> -	fpo->txq.clbk = (void **)(uintptr_t)dev->pre_tx_burst_cbs;
> +	fpo->txq.clbk = (void * __rte_atomic *)(uintptr_t)dev->pre_tx_burst_cbs;
>   }
>   
>   uint16_t
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index 9dabcb5..af23ac0 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -5654,9 +5654,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
>   		/* Stores to cb->fn and cb->param should complete before
>   		 * cb is visible to data plane.
>   		 */
> -		__atomic_store_n(
> +		rte_atomic_store_explicit(
>   			&rte_eth_devices[port_id].post_rx_burst_cbs[queue_id],
> -			cb, __ATOMIC_RELEASE);
> +			cb, rte_memory_order_release);
>   
>   	} else {
>   		while (tail->next)
> @@ -5664,7 +5664,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
>   		/* Stores to cb->fn and cb->param should complete before
>   		 * cb is visible to data plane.
>   		 */
> -		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
> +		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
>   	}
>   	rte_spinlock_unlock(&eth_dev_rx_cb_lock);
>   
> @@ -5704,9 +5704,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
>   	/* Stores to cb->fn, cb->param and cb->next should complete before
>   	 * cb is visible to data plane threads.
>   	 */
> -	__atomic_store_n(
> +	rte_atomic_store_explicit(
>   		&rte_eth_devices[port_id].post_rx_burst_cbs[queue_id],
> -		cb, __ATOMIC_RELEASE);
> +		cb, rte_memory_order_release);
>   	rte_spinlock_unlock(&eth_dev_rx_cb_lock);
>   
>   	rte_eth_trace_add_first_rx_callback(port_id, queue_id, fn, user_param,
> @@ -5757,9 +5757,9 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
>   		/* Stores to cb->fn and cb->param should complete before
>   		 * cb is visible to data plane.
>   		 */
> -		__atomic_store_n(
> +		rte_atomic_store_explicit(
>   			&rte_eth_devices[port_id].pre_tx_burst_cbs[queue_id],
> -			cb, __ATOMIC_RELEASE);
> +			cb, rte_memory_order_release);
>   
>   	} else {
>   		while (tail->next)
> @@ -5767,7 +5767,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
>   		/* Stores to cb->fn and cb->param should complete before
>   		 * cb is visible to data plane.
>   		 */
> -		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
> +		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
>   	}
>   	rte_spinlock_unlock(&eth_dev_tx_cb_lock);
>   
> @@ -5791,7 +5791,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
>   
>   	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>   	struct rte_eth_rxtx_callback *cb;
> -	struct rte_eth_rxtx_callback **prev_cb;
> +	RTE_ATOMIC(struct rte_eth_rxtx_callback *) *prev_cb;
>   	int ret = -EINVAL;
>   
>   	rte_spinlock_lock(&eth_dev_rx_cb_lock);
> @@ -5800,7 +5800,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
>   		cb = *prev_cb;
>   		if (cb == user_cb) {
>   			/* Remove the user cb from the callback list. */
> -			__atomic_store_n(prev_cb, cb->next, __ATOMIC_RELAXED);
> +			rte_atomic_store_explicit(prev_cb, cb->next, rte_memory_order_relaxed);
>   			ret = 0;
>   			break;
>   		}
> @@ -5828,7 +5828,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
>   	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
>   	int ret = -EINVAL;
>   	struct rte_eth_rxtx_callback *cb;
> -	struct rte_eth_rxtx_callback **prev_cb;
> +	RTE_ATOMIC(struct rte_eth_rxtx_callback *) *prev_cb;
>   
>   	rte_spinlock_lock(&eth_dev_tx_cb_lock);
>   	prev_cb = &dev->pre_tx_burst_cbs[queue_id];
> @@ -5836,7 +5836,7 @@ int rte_eth_rx_avail_thresh_query(uint16_t port_id, uint16_t *queue_id,
>   		cb = *prev_cb;
>   		if (cb == user_cb) {
>   			/* Remove the user cb from the callback list. */
> -			__atomic_store_n(prev_cb, cb->next, __ATOMIC_RELAXED);
> +			rte_atomic_store_explicit(prev_cb, cb->next, rte_memory_order_relaxed);
>   			ret = 0;
>   			break;
>   		}
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 85b9af7..d1c10f2 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -6023,14 +6023,14 @@ uint16_t rte_eth_call_rx_callbacks(uint16_t port_id, uint16_t queue_id,
>   	{
>   		void *cb;
>   
> -		/* __ATOMIC_RELEASE memory order was used when the
> +		/* rte_memory_order_release memory order was used when the
>   		 * call back was inserted into the list.
>   		 * Since there is a clear dependency between loading
> -		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
> +		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
>   		 * not required.
>   		 */
> -		cb = __atomic_load_n((void **)&p->rxq.clbk[queue_id],
> -				__ATOMIC_RELAXED);
> +		cb = rte_atomic_load_explicit(&p->rxq.clbk[queue_id],
> +				rte_memory_order_relaxed);
>   		if (unlikely(cb != NULL))
>   			nb_rx = rte_eth_call_rx_callbacks(port_id, queue_id,
>   					rx_pkts, nb_rx, nb_pkts, cb);
> @@ -6360,14 +6360,14 @@ uint16_t rte_eth_call_tx_callbacks(uint16_t port_id, uint16_t queue_id,
>   	{
>   		void *cb;
>   
> -		/* __ATOMIC_RELEASE memory order was used when the
> +		/* rte_memory_order_release memory order was used when the
>   		 * call back was inserted into the list.
>   		 * Since there is a clear dependency between loading
> -		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
> +		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
>   		 * not required.
>   		 */
> -		cb = __atomic_load_n((void **)&p->txq.clbk[queue_id],
> -				__ATOMIC_RELAXED);
> +		cb = rte_atomic_load_explicit(&p->txq.clbk[queue_id],
> +				rte_memory_order_relaxed);
>   		if (unlikely(cb != NULL))
>   			nb_pkts = rte_eth_call_tx_callbacks(port_id, queue_id,
>   					tx_pkts, nb_pkts, cb);
> diff --git a/lib/ethdev/rte_ethdev_core.h b/lib/ethdev/rte_ethdev_core.h
> index 32f5f73..4bfaf79 100644
> --- a/lib/ethdev/rte_ethdev_core.h
> +++ b/lib/ethdev/rte_ethdev_core.h
> @@ -71,7 +71,7 @@ struct rte_ethdev_qdata {
>   	/** points to array of internal queue data pointers */
>   	void **data;
>   	/** points to array of queue callback data pointers */
> -	void **clbk;
> +	RTE_ATOMIC(void *) *clbk;
>   };
>   
>   /**

Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
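
For readers of the archive, a condensed sketch of the publish/consume
pattern these callback-list changes rely on (illustrative only; the
struct and 'cb_list' are simplified stand-ins for the ethdev types):

    struct cb {
            RTE_ATOMIC(struct cb *) next;
            void (*fn)(void *);
            void *param;
    };
    static RTE_ATOMIC(struct cb *) cb_list;  /* simplified list head */

    /* writer (control path, serialized by a lock): initialize first */
    cb->fn = fn;
    cb->param = param;
    cb->next = NULL;
    /* then release-store so the data plane never observes a
     * half-initialized callback
     */
    rte_atomic_store_explicit(&cb_list, cb, rte_memory_order_release);

    /* reader (data plane): a relaxed load is enough because the
     * dereferences of cb->fn/cb->next depend on the loaded pointer
     */
    struct cb *c = rte_atomic_load_explicit(&cb_list,
                    rte_memory_order_relaxed);
    if (c != NULL)
            c->fn(c->param);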


* Re: [PATCH v3 14/19] cryptodev: use rte optional stdatomic API
  2023-10-26  0:31   ` [PATCH v3 14/19] cryptodev: " Tyler Retzlaff
  2023-10-26 15:53     ` [EXT] " Akhil Goyal
@ 2023-10-27 13:05     ` Konstantin Ananyev
  1 sibling, 0 replies; 91+ messages in thread
From: Konstantin Ananyev @ 2023-10-27 13:05 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Akhil Goyal, Anatoly Burakov, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob, Matan Azrad,
	Maxime Coquelin, Narcisa Ana Maria Vasile, Nicolas Chautru,
	Olivier Matz, Ori Kam, Pallavi Kadam, Pavan Nikhilesh,
	Reshma Pattan, Sameh Gobriel, Shijith Thotton,
	Sivaprasad Tummala, Stephen Hemminger, Suanming Mou,
	Sunil Kumar Kori, Thomas Monjalon, Viacheslav Ovsiienko,
	Vladimir Medvedkin, Yipeng Wang

26.10.2023 01:31, Tyler Retzlaff wrote:
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/cryptodev/rte_cryptodev.c | 22 ++++++++++++----------
>   lib/cryptodev/rte_cryptodev.h | 16 ++++++++--------
>   2 files changed, 20 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/cryptodev/rte_cryptodev.c b/lib/cryptodev/rte_cryptodev.c
> index 314710b..b258827 100644
> --- a/lib/cryptodev/rte_cryptodev.c
> +++ b/lib/cryptodev/rte_cryptodev.c
> @@ -1535,12 +1535,12 @@ struct rte_cryptodev_cb *
>   		/* Stores to cb->fn and cb->param should complete before
>   		 * cb is visible to data plane.
>   		 */
> -		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
> +		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
>   	} else {
>   		/* Stores to cb->fn and cb->param should complete before
>   		 * cb is visible to data plane.
>   		 */
> -		__atomic_store_n(&list->next, cb, __ATOMIC_RELEASE);
> +		rte_atomic_store_explicit(&list->next, cb, rte_memory_order_release);
>   	}
>   
>   	rte_spinlock_unlock(&rte_cryptodev_callback_lock);
> @@ -1555,7 +1555,8 @@ struct rte_cryptodev_cb *
>   				  struct rte_cryptodev_cb *cb)
>   {
>   	struct rte_cryptodev *dev;
> -	struct rte_cryptodev_cb **prev_cb, *curr_cb;
> +	RTE_ATOMIC(struct rte_cryptodev_cb *) *prev_cb;
> +	struct rte_cryptodev_cb *curr_cb;
>   	struct rte_cryptodev_cb_rcu *list;
>   	int ret;
>   
> @@ -1601,8 +1602,8 @@ struct rte_cryptodev_cb *
>   		curr_cb = *prev_cb;
>   		if (curr_cb == cb) {
>   			/* Remove the user cb from the callback list. */
> -			__atomic_store_n(prev_cb, curr_cb->next,
> -				__ATOMIC_RELAXED);
> +			rte_atomic_store_explicit(prev_cb, curr_cb->next,
> +				rte_memory_order_relaxed);
>   			ret = 0;
>   			break;
>   		}
> @@ -1673,12 +1674,12 @@ struct rte_cryptodev_cb *
>   		/* Stores to cb->fn and cb->param should complete before
>   		 * cb is visible to data plane.
>   		 */
> -		__atomic_store_n(&tail->next, cb, __ATOMIC_RELEASE);
> +		rte_atomic_store_explicit(&tail->next, cb, rte_memory_order_release);
>   	} else {
>   		/* Stores to cb->fn and cb->param should complete before
>   		 * cb is visible to data plane.
>   		 */
> -		__atomic_store_n(&list->next, cb, __ATOMIC_RELEASE);
> +		rte_atomic_store_explicit(&list->next, cb, rte_memory_order_release);
>   	}
>   
>   	rte_spinlock_unlock(&rte_cryptodev_callback_lock);
> @@ -1694,7 +1695,8 @@ struct rte_cryptodev_cb *
>   				  struct rte_cryptodev_cb *cb)
>   {
>   	struct rte_cryptodev *dev;
> -	struct rte_cryptodev_cb **prev_cb, *curr_cb;
> +	RTE_ATOMIC(struct rte_cryptodev_cb *) *prev_cb;
> +	struct rte_cryptodev_cb *curr_cb;
>   	struct rte_cryptodev_cb_rcu *list;
>   	int ret;
>   
> @@ -1740,8 +1742,8 @@ struct rte_cryptodev_cb *
>   		curr_cb = *prev_cb;
>   		if (curr_cb == cb) {
>   			/* Remove the user cb from the callback list. */
> -			__atomic_store_n(prev_cb, curr_cb->next,
> -				__ATOMIC_RELAXED);
> +			rte_atomic_store_explicit(prev_cb, curr_cb->next,
> +				rte_memory_order_relaxed);
>   			ret = 0;
>   			break;
>   		}
> diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
> index be0698c..9092118 100644
> --- a/lib/cryptodev/rte_cryptodev.h
> +++ b/lib/cryptodev/rte_cryptodev.h
> @@ -979,7 +979,7 @@ struct rte_cryptodev_config {
>    * queue pair on enqueue/dequeue.
>    */
>   struct rte_cryptodev_cb {
> -	struct rte_cryptodev_cb *next;
> +	RTE_ATOMIC(struct rte_cryptodev_cb *) next;
>   	/**< Pointer to next callback */
>   	rte_cryptodev_callback_fn fn;
>   	/**< Pointer to callback function */
> @@ -992,7 +992,7 @@ struct rte_cryptodev_cb {
>    * Structure used to hold information about the RCU for a queue pair.
>    */
>   struct rte_cryptodev_cb_rcu {
> -	struct rte_cryptodev_cb *next;
> +	RTE_ATOMIC(struct rte_cryptodev_cb *) next;
>   	/**< Pointer to next callback */
>   	struct rte_rcu_qsbr *qsbr;
>   	/**< RCU QSBR variable per queue pair */
> @@ -1947,15 +1947,15 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
>   		struct rte_cryptodev_cb_rcu *list;
>   		struct rte_cryptodev_cb *cb;
>   
> -		/* __ATOMIC_RELEASE memory order was used when the
> +		/* rte_memory_order_release memory order was used when the
>   		 * call back was inserted into the list.
>   		 * Since there is a clear dependency between loading
> -		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
> +		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
>   		 * not required.
>   		 */
>   		list = &fp_ops->qp.deq_cb[qp_id];
>   		rte_rcu_qsbr_thread_online(list->qsbr, 0);
> -		cb = __atomic_load_n(&list->next, __ATOMIC_RELAXED);
> +		cb = rte_atomic_load_explicit(&list->next, rte_memory_order_relaxed);
>   
>   		while (cb != NULL) {
>   			nb_ops = cb->fn(dev_id, qp_id, ops, nb_ops,
> @@ -2014,15 +2014,15 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
>   		struct rte_cryptodev_cb_rcu *list;
>   		struct rte_cryptodev_cb *cb;
>   
> -		/* __ATOMIC_RELEASE memory order was used when the
> +		/* rte_memory_order_release memory order was used when the
>   		 * call back was inserted into the list.
>   		 * Since there is a clear dependency between loading
> -		 * cb and cb->fn/cb->next, __ATOMIC_ACQUIRE memory order is
> +		 * cb and cb->fn/cb->next, rte_memory_order_acquire memory order is
>   		 * not required.
>   		 */
>   		list = &fp_ops->qp.enq_cb[qp_id];
>   		rte_rcu_qsbr_thread_online(list->qsbr, 0);
> -		cb = __atomic_load_n(&list->next, __ATOMIC_RELAXED);
> +		cb = rte_atomic_load_explicit(&list->next, rte_memory_order_relaxed);
>   
>   		while (cb != NULL) {
>   			nb_ops = cb->fn(dev_id, qp_id, ops, nb_ops,

Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>


* Re: [PATCH v3 00/19] use rte optional stdatomic API
  2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
                     ` (19 preceding siblings ...)
  2023-10-26 13:47   ` [PATCH v3 00/19] " David Marchand
@ 2023-10-30 15:34   ` David Marchand
  20 siblings, 0 replies; 91+ messages in thread
From: David Marchand @ 2023-10-30 15:34 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Akhil Goyal, Anatoly Burakov, Andrew Rybchenko,
	Bruce Richardson, Ciara Power, David Christensen, David Hunt,
	Dmitry Kozlyuk, Dmitry Malloy, Elena Agostini,
	Erik Gabriel Carrillo, Fan Zhang, Ferruh Yigit, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jerin Jacob,
	Konstantin Ananyev, Matan Azrad, Maxime Coquelin,
	Narcisa Ana Maria Vasile, Nicolas Chautru, Olivier Matz, Ori Kam,
	Pallavi Kadam, Pavan Nikhilesh, Reshma Pattan, Sameh Gobriel,
	Shijith Thotton, Sivaprasad Tummala, Stephen Hemminger,
	Suanming Mou, Sunil Kumar Kori, Thomas Monjalon,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Yipeng Wang

On Thu, Oct 26, 2023 at 2:32 AM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional stdatomic API.
>
> v3:
>   * add missing atomic specification on head variable
>     in struct rte_ring_headtail
>   * adapt to use rte_atomic_xxx stdatomic API in rte_ring_c11_pvt.h
>   * split comma operator statement into 2 statements
>
> v2:
>   * add #include <rte_stdatomic.h> to rte_mbuf_core.h
>   * remove first two patches which were fixes that have
>     been merged in another series
>
> Tyler Retzlaff (19):
>   power: use rte optional stdatomic API
>   bbdev: use rte optional stdatomic API
>   eal: use rte optional stdatomic API
>   eventdev: use rte optional stdatomic API
>   gpudev: use rte optional stdatomic API
>   ipsec: use rte optional stdatomic API
>   mbuf: use rte optional stdatomic API
>   mempool: use rte optional stdatomic API
>   rcu: use rte optional stdatomic API
>   pdump: use rte optional stdatomic API
>   stack: use rte optional stdatomic API
>   telemetry: use rte optional stdatomic API
>   vhost: use rte optional stdatomic API
>   cryptodev: use rte optional stdatomic API
>   distributor: use rte optional stdatomic API
>   ethdev: use rte optional stdatomic API
>   hash: use rte optional stdatomic API
>   timer: use rte optional stdatomic API
>   ring: use rte optional stdatomic API
>
>  drivers/event/cnxk/cnxk_tim_worker.h   |   4 +-
>  drivers/net/mlx5/mlx5_hws_cnt.h        |   4 +-
>  lib/bbdev/rte_bbdev.c                  |   6 +-
>  lib/bbdev/rte_bbdev.h                  |   2 +-
>  lib/cryptodev/rte_cryptodev.c          |  22 +++---
>  lib/cryptodev/rte_cryptodev.h          |  16 ++---
>  lib/distributor/distributor_private.h  |   4 +-
>  lib/distributor/rte_distributor.c      |  54 +++++++--------
>  lib/eal/common/eal_common_launch.c     |  10 +--
>  lib/eal/common/eal_common_mcfg.c       |   2 +-
>  lib/eal/common/eal_common_proc.c       |  14 ++--
>  lib/eal/common/eal_common_thread.c     |  26 +++----
>  lib/eal/common/eal_common_trace.c      |   8 +--
>  lib/eal/common/eal_common_trace_ctf.c  |   4 +-
>  lib/eal/common/eal_memcfg.h            |   2 +-
>  lib/eal/common/eal_private.h           |   4 +-
>  lib/eal/common/eal_trace.h             |   4 +-
>  lib/eal/common/rte_service.c           | 122 ++++++++++++++++-----------------
>  lib/eal/freebsd/eal.c                  |  20 +++---
>  lib/eal/include/rte_epoll.h            |   3 +-
>  lib/eal/linux/eal.c                    |  26 +++----
>  lib/eal/linux/eal_interrupts.c         |  42 ++++++------
>  lib/eal/ppc/include/rte_atomic.h       |   6 +-
>  lib/eal/windows/rte_thread.c           |   8 ++-
>  lib/ethdev/ethdev_driver.h             |  16 ++---
>  lib/ethdev/ethdev_private.c            |   6 +-
>  lib/ethdev/rte_ethdev.c                |  24 +++----
>  lib/ethdev/rte_ethdev.h                |  16 ++---
>  lib/ethdev/rte_ethdev_core.h           |   2 +-
>  lib/eventdev/rte_event_timer_adapter.c |  66 +++++++++---------
>  lib/eventdev/rte_event_timer_adapter.h |   2 +-
>  lib/gpudev/gpudev.c                    |   6 +-
>  lib/gpudev/gpudev_driver.h             |   2 +-
>  lib/hash/rte_cuckoo_hash.c             | 116 +++++++++++++++----------------
>  lib/hash/rte_cuckoo_hash.h             |   6 +-
>  lib/ipsec/ipsec_sqn.h                  |   2 +-
>  lib/ipsec/sa.h                         |   2 +-
>  lib/mbuf/rte_mbuf.h                    |  20 +++---
>  lib/mbuf/rte_mbuf_core.h               |   5 +-
>  lib/mempool/rte_mempool.h              |   4 +-
>  lib/pdump/rte_pdump.c                  |  14 ++--
>  lib/pdump/rte_pdump.h                  |   8 +--
>  lib/power/power_acpi_cpufreq.c         |  33 ++++-----
>  lib/power/power_cppc_cpufreq.c         |  25 +++----
>  lib/power/power_pstate_cpufreq.c       |  31 +++++----
>  lib/rcu/rte_rcu_qsbr.c                 |  48 ++++++-------
>  lib/rcu/rte_rcu_qsbr.h                 |  68 +++++++++---------
>  lib/ring/rte_ring_c11_pvt.h            |  47 +++++++------
>  lib/ring/rte_ring_core.h               |  12 ++--
>  lib/ring/rte_ring_generic_pvt.h        |  16 +++--
>  lib/ring/rte_ring_hts_elem_pvt.h       |  22 +++---
>  lib/ring/rte_ring_peek_elem_pvt.h      |   6 +-
>  lib/ring/rte_ring_rts_elem_pvt.h       |  27 ++++----
>  lib/stack/rte_stack.h                  |   2 +-
>  lib/stack/rte_stack_lf_c11.h           |  24 +++----
>  lib/stack/rte_stack_lf_generic.h       |  18 ++---
>  lib/telemetry/telemetry.c              |  18 ++---
>  lib/timer/rte_timer.c                  |  50 +++++++-------
>  lib/timer/rte_timer.h                  |   6 +-
>  lib/vhost/vdpa.c                       |   3 +-
>  lib/vhost/vhost.c                      |  42 ++++++------
>  lib/vhost/vhost.h                      |  39 ++++++-----
>  lib/vhost/vhost_user.c                 |   6 +-
>  lib/vhost/virtio_net.c                 |  58 +++++++++-------
>  lib/vhost/virtio_net_ctrl.c            |   6 +-
>  65 files changed, 684 insertions(+), 653 deletions(-)
>

Series applied, thanks.


-- 
David Marchand
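
For anyone reading the applied series end to end, the mechanical shape
of the conversion is the same everywhere (illustrative before/after;
'cnt' is a hypothetical counter, not taken from the series):

    /* before: gcc builtins on a plain integer
     *   uint32_t cnt;
     *   __atomic_fetch_add(&cnt, 1, __ATOMIC_RELAXED);
     *   __atomic_store_n(&cnt, 0, __ATOMIC_RELEASE);
     */

    /* after: optional stdatomic API on an RTE_ATOMIC()-qualified type */
    RTE_ATOMIC(uint32_t) cnt;
    rte_atomic_fetch_add_explicit(&cnt, 1, rte_memory_order_relaxed);
    rte_atomic_store_explicit(&cnt, 0, rte_memory_order_release);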



end of thread, newest message: 2023-10-30 15:35 UTC

Thread overview: 91+ messages
2023-10-16 23:08 [PATCH 00/21] use rte optional stdatomic API Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 01/21] power: fix use of rte stdatomic Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 02/21] event/cnxk: remove single " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 03/21] power: use rte optional stdatomic API Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 04/21] bbdev: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 05/21] eal: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 06/21] eventdev: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 07/21] gpudev: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 08/21] ipsec: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 09/21] mbuf: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 10/21] mempool: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 11/21] rcu: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 12/21] pdump: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 13/21] stack: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 14/21] telemetry: " Tyler Retzlaff
2023-10-16 23:08 ` [PATCH 15/21] vhost: " Tyler Retzlaff
2023-10-16 23:09 ` [PATCH 16/21] cryptodev: " Tyler Retzlaff
2023-10-16 23:09 ` [PATCH 17/21] distributor: " Tyler Retzlaff
2023-10-16 23:09 ` [PATCH 18/21] ethdev: " Tyler Retzlaff
2023-10-16 23:09 ` [PATCH 19/21] hash: " Tyler Retzlaff
2023-10-16 23:09 ` [PATCH 20/21] timer: " Tyler Retzlaff
2023-10-16 23:09 ` [PATCH 21/21] ring: " Tyler Retzlaff
2023-10-17 20:30 ` [PATCH v2 00/19] " Tyler Retzlaff
2023-10-17 20:30   ` [PATCH v2 01/19] power: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 02/19] bbdev: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 03/19] eal: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 04/19] eventdev: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 05/19] gpudev: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 06/19] ipsec: " Tyler Retzlaff
2023-10-24  8:45     ` Konstantin Ananyev
2023-10-17 20:31   ` [PATCH v2 07/19] mbuf: " Tyler Retzlaff
2023-10-24  8:46     ` Konstantin Ananyev
2023-10-17 20:31   ` [PATCH v2 08/19] mempool: " Tyler Retzlaff
2023-10-24  8:47     ` Konstantin Ananyev
2023-10-17 20:31   ` [PATCH v2 09/19] rcu: " Tyler Retzlaff
2023-10-25  9:41     ` Ruifeng Wang
2023-10-25 22:38       ` Tyler Retzlaff
2023-10-26  4:24         ` Ruifeng Wang
2023-10-26 16:36           ` Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 10/19] pdump: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 11/19] stack: " Tyler Retzlaff
2023-10-24  8:48     ` Konstantin Ananyev
2023-10-17 20:31   ` [PATCH v2 12/19] telemetry: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 13/19] vhost: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 14/19] cryptodev: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 15/19] distributor: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 16/19] ethdev: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 17/19] hash: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 18/19] timer: " Tyler Retzlaff
2023-10-17 20:31   ` [PATCH v2 19/19] ring: " Tyler Retzlaff
2023-10-24  8:43     ` Konstantin Ananyev
2023-10-24  9:56       ` Morten Brørup
2023-10-24 15:58         ` Tyler Retzlaff
2023-10-24 16:36           ` Morten Brørup
2023-10-24 16:29       ` Tyler Retzlaff
2023-10-25 10:06         ` Konstantin Ananyev
2023-10-25 22:49           ` Tyler Retzlaff
2023-10-25 23:22             ` Tyler Retzlaff
2023-10-17 23:55   ` [PATCH v2 00/19] " Stephen Hemminger
2023-10-26  0:31 ` [PATCH v3 " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 01/19] power: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 02/19] bbdev: " Tyler Retzlaff
2023-10-26 11:57     ` Maxime Coquelin
2023-10-26  0:31   ` [PATCH v3 03/19] eal: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 04/19] eventdev: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 05/19] gpudev: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 06/19] ipsec: " Tyler Retzlaff
2023-10-26 15:54     ` [EXT] " Akhil Goyal
2023-10-27 12:59     ` Konstantin Ananyev
2023-10-26  0:31   ` [PATCH v3 07/19] mbuf: " Tyler Retzlaff
2023-10-27 13:03     ` Konstantin Ananyev
2023-10-26  0:31   ` [PATCH v3 08/19] mempool: " Tyler Retzlaff
2023-10-27 13:01     ` Konstantin Ananyev
2023-10-26  0:31   ` [PATCH v3 09/19] rcu: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 10/19] pdump: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 11/19] stack: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 12/19] telemetry: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 13/19] vhost: " Tyler Retzlaff
2023-10-26 11:57     ` Maxime Coquelin
2023-10-26  0:31   ` [PATCH v3 14/19] cryptodev: " Tyler Retzlaff
2023-10-26 15:53     ` [EXT] " Akhil Goyal
2023-10-27 13:05     ` Konstantin Ananyev
2023-10-26  0:31   ` [PATCH v3 15/19] distributor: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 16/19] ethdev: " Tyler Retzlaff
2023-10-27 13:04     ` Konstantin Ananyev
2023-10-26  0:31   ` [PATCH v3 17/19] hash: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 18/19] timer: " Tyler Retzlaff
2023-10-26  0:31   ` [PATCH v3 19/19] ring: " Tyler Retzlaff
2023-10-27 12:58     ` Konstantin Ananyev
2023-10-26 13:47   ` [PATCH v3 00/19] " David Marchand
2023-10-30 15:34   ` David Marchand
