DPDK patches and discussions
* [RFC 0/7] Improve EAL bit operations API
@ 2024-03-02 13:53 Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
                   ` (6 more replies)
  0 siblings, 7 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

RFCv1 is submitted primarily to 1) receive general feedback on whether
improvements in this area are worth working on, and 2) receive feedback
on the API.

No test cases are included in v1 and the various functions may well
not do what they are intended to.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign][32|64]() which provide no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but do provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK instructions on x86) may be
significant.

rte_bit_once_*() which guarantee that program-level loads and stores
actually occur (i.e., prevent certain optimizations). The primary
use of these functions is in the context of memory mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provide atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).
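
To make the difference between the three families concrete, here is a
minimal standalone sketch. The sketch_*() names are hypothetical
stand-ins and not part of this patch set, and the atomic variant
assumes the GCC/clang __atomic builtins are available.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names exist in <rte_bitops.h>. */

/* Plain family: the compiler may merge or eliminate the accesses. */
static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= (uint32_t)1 << nr;
}

/* "Once" family: the volatile access forces the load and the store. */
static inline void
sketch_bit_once_set32(volatile uint32_t *addr, unsigned int nr)
{
	*addr |= (uint32_t)1 << nr;
}

/* Atomic family: a single atomic read-modify-write, caller-chosen order. */
static inline void
sketch_bit_atomic_set32(uint32_t *addr, unsigned int nr, int memory_order)
{
	__atomic_fetch_or(addr, (uint32_t)1 << nr, memory_order);
}

static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	return (*addr & ((uint32_t)1 << nr)) != 0;
}

static void
sketch_families_demo(void)
{
	uint32_t word = 0;

	sketch_bit_set32(&word, 0);
	sketch_bit_once_set32(&word, 1);
	sketch_bit_atomic_set32(&word, 2, __ATOMIC_RELAXED);

	/* All three produce the same value; they differ in guarantees. */
	assert(word == 0x7);
	assert(sketch_bit_test32(&word, 2));
	assert(!sketch_bit_test32(&word, 3));
}
```

Note that all three operate on the same non-_Atomic word, which is the
flexibility argued for in the following paragraphs.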

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the proposed bitset API (and potentially other
APIs depending on RTE bitops for atomic bit-level ops). Either one
needs two bitset variants, one _Atomic bitset and one non-atomic one,
or the bitset code needs to cast the non-_Atomic pointer to an _Atomic
one. Having a separate _Atomic bitset would be bloat and would also
prevent the user from doing atomic operations against a bit set in
some situations, while in other situations (e.g., at times when MT
safety is not a concern) operating on the same words in a non-atomic
manner. That said, all this is still unclear to the author and depends
much on the future path of DPDK atomics.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign*() are added, to assign a
particular boolean value to a particular bit.

All functions have properly documented semantics.

All functions are available in uint32_t and uint64_t variants.

In addition, for every function there is a generic selection variant
which operates on both 32-bit and 64-bit words (depending on the
pointer type). The use of C11 generic selection is the first in the
DPDK code base.

_Generic allows the user code to be a little more compact. Having a
generic atomic test/set/clear/assign bit API also seems consistent
with the "core" (word-size) atomics API, which is generic (both GCC
builtins and <rte_stdatomic.h> are).

The _Generic versions may also make explicit unsigned long versions
of all functions unnecessary. If you have an unsigned long, it's safe
to use the generic version (e.g., rte_bit_set()) and _Generic will pick
the right function, provided long is either 32 or 64 bit on your
platform (which it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.
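
As an illustration of the generic selection approach, the following
standalone sketch dispatches on the pointer type and checks that the
arguments are evaluated only once. The sketch_*() names are
hypothetical and only mirror the shape of the proposed macros.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the fixed-size functions. */
static inline void
sketch_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= (uint32_t)1 << nr;
}

static inline void
sketch_set64(uint64_t *addr, unsigned int nr)
{
	*addr |= (uint64_t)1 << nr;
}

/*
 * The controlling expression of _Generic is not evaluated, so addr
 * and nr are each evaluated once, as arguments of the selected call.
 */
#define sketch_set(addr, nr)				\
	_Generic((addr),				\
		 uint32_t *: sketch_set32,		\
		 uint64_t *: sketch_set64)(addr, nr)

static void
sketch_generic_demo(void)
{
	uint32_t w32 = 0;
	uint64_t w64 = 0;
	unsigned int nr = 3;

	sketch_set(&w32, nr++);	/* selects sketch_set32(), sets bit 3 */
	sketch_set(&w64, 40);	/* selects sketch_set64() */

	assert(nr == 4);	/* the side effect happened exactly once */
	assert(w32 == 0x8);
	assert(w64 == (uint64_t)1 << 40);
}
```

A pointer type not listed in the association list (e.g., uint16_t *)
results in a compile-time error rather than a silent conversion.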

Things that are left out of this patch set, that may be included
in future versions:

 * Have all functions returning a bit number have the same return type
   (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

ABI-breaking patches should probably go into a separate patch set (?).

Mattias Rönnblom (7):
  eal: extend bit manipulation functions
  eal: add generic bit manipulation macros
  eal: add bit manipulation functions which read or write once
  eal: add generic once-type bit operations macros
  eal: add atomic bit operations
  eal: add generic atomic bit operations
  eal: deprecate relaxed family of bit operations

 lib/eal/include/rte_bitops.h | 1115 +++++++++++++++++++++++++++++++++-
 1 file changed, 1113 insertions(+), 2 deletions(-)

-- 
2.34.1



* [RFC 1/7] eal: extend bit manipulation functions
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 17:05   ` Stephen Hemminger
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 2/7] eal: add generic bit manipulation macros Mattias Rönnblom
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add functionality to test, set, clear, and assign a value to
individual bits in 32-bit or 64-bit words.

These functions have no implications for memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 194 ++++++++++++++++++++++++++++++++++-
 1 file changed, 192 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..9a368724d5 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,8 +12,9 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
@@ -105,6 +107,194 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * Test if a particular bit in a 32-bit word is set.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to query.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+static inline bool
+rte_bit_test32(const uint32_t *addr, unsigned int nr);
+
+/**
+ * Set bit in 32-bit word.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to '1'.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ */
+static inline void
+rte_bit_set32(uint32_t *addr, unsigned int nr);
+
+/**
+ * Clear bit in 32-bit word.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to '0'.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ */
+static inline void
+rte_bit_clear32(uint32_t *addr, unsigned int nr);
+
+/**
+ * Assign a value to bit in a 32-bit word.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to the value indicated by @c value.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+static inline void
+rte_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		rte_bit_set32(addr, nr);
+	else
+		rte_bit_clear32(addr, nr);
+}
+
+/**
+ * Test if a particular bit in a 64-bit word is set.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to query.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+static inline bool
+rte_bit_test64(const uint64_t *addr, unsigned int nr);
+
+/**
+ * Set bit in 64-bit word.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by
+ * @c addr to '1'.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ */
+static inline void
+rte_bit_set64(uint64_t *addr, unsigned int nr);
+
+/**
+ * Clear bit in 64-bit word.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by
+ * @c addr to '0'.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ */
+static inline void
+rte_bit_clear64(uint64_t *addr, unsigned int nr);
+
+/**
+ * Assign a value to bit in a 64-bit word.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by
+ * @c addr to the value indicated by @c value.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+static inline void
+rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		rte_bit_set64(addr, nr);
+	else
+		rte_bit_clear64(addr, nr);
+}
+
+#define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
+	static inline bool						\
+	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+__RTE_GEN_BIT_TEST(rte_bit_test32, 32,)
+__RTE_GEN_BIT_SET(rte_bit_set32, 32,)
+__RTE_GEN_BIT_CLEAR(rte_bit_clear32, 32,)
+
+__RTE_GEN_BIT_TEST(rte_bit_test64, 64,)
+__RTE_GEN_BIT_SET(rte_bit_set64, 64,)
+__RTE_GEN_BIT_CLEAR(rte_bit_clear64, 64,)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1



* [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-04  8:16   ` Heng Wang
  2024-03-04 16:42   ` Tyler Retzlaff
  2024-03-02 13:53 ` [RFC 3/7] eal: add bit manipulation functions which read or write once Mattias Rönnblom
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add bit-level test/set/clear/assign macros operating on both 32-bit
and 64-bit words by means of C11 generic selection.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 9a368724d5..afd0f11033 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -107,6 +107,87 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_test32,		\
+		 uint64_t *: rte_bit_test64)(addr, nr)
+
+/**
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_set32,		\
+		 uint64_t *: rte_bit_set64)(addr, nr)
+
+/**
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)			\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_clear32,		\
+		 uint64_t *: rte_bit_clear64)(addr, nr)
+
+/**
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)			\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_assign32,			\
+		 uint64_t *: rte_bit_assign64)(addr, nr, value)
+
 /**
  * Test if a particular bit in a 32-bit word is set.
  *
-- 
2.34.1



* [RFC 3/7] eal: add bit manipulation functions which read or write once
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 2/7] eal: add generic bit manipulation macros Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 4/7] eal: add generic once-type bit operations macros Mattias Rönnblom
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add bit test/set/clear/assign functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.
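
A typical use is polling a status bit. The following standalone sketch
shows the idea with a volatile-qualified read, bounded so that it can
run against ordinary memory. The sketch_*() names and the bounded-poll
helper are hypothetical and not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: every call performs one real memory load. */
static inline bool
sketch_bit_once_test32(const volatile uint32_t *addr, unsigned int nr)
{
	return (*addr & ((uint32_t)1 << nr)) != 0;
}

/*
 * Poll until bit nr is clear, giving up after max_polls busy
 * observations. Returns the number of times the bit was seen set.
 * With a non-volatile read, the compiler could hoist the load out of
 * the loop and test the bit only once.
 */
static unsigned int
sketch_poll_until_clear(const volatile uint32_t *reg, unsigned int nr,
			unsigned int max_polls)
{
	unsigned int busy_polls = 0;

	while (busy_polls < max_polls && sketch_bit_once_test32(reg, nr))
		busy_polls++;

	return busy_polls;
}

static void
sketch_once_demo(void)
{
	/* Ordinary memory standing in for a device register. */
	uint32_t fake_reg = (uint32_t)1 << 17;

	assert(sketch_poll_until_clear(&fake_reg, 17, 5) == 5);

	fake_reg = 0;
	assert(sketch_poll_until_clear(&fake_reg, 17, 5) == 0);
}
```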

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 229 +++++++++++++++++++++++++++++++++++
 1 file changed, 229 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index afd0f11033..3118c51748 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -338,6 +338,227 @@ rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
 		rte_bit_clear64(addr, nr);
 }
 
+/**
+ * Test exactly once if a particular bit in a 32-bit word is set.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set32(addr, 17);
+ * if (rte_bit_once_test32(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set32() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set32() and
+ * rte_bit_test32() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test32(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test32(addr, 17) was used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test32(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * The regular bit set operations (e.g., rte_bit_test32()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test32()), since the latter may prevent optimizations
+ * crucial for run-time performance.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering (except ordering from the same thread to the same memory
+ * location) or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to query.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+static inline bool
+rte_bit_once_test32(const volatile uint32_t *addr, unsigned int nr);
+
+/**
+ * Set bit in 32-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to '1'.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ */
+static inline void
+rte_bit_once_set32(volatile uint32_t *addr, unsigned int nr);
+
+/**
+ * Clear bit in 32-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by @c addr
+ * to '0'.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ */
+static inline void
+rte_bit_once_clear32(volatile uint32_t *addr, unsigned int nr);
+
+/**
+ * Assign a value to bit in a 32-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to the value indicated by @c value.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set or clear operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+static inline void
+rte_bit_once_assign32(volatile uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		rte_bit_once_set32(addr, nr);
+	else
+		rte_bit_once_clear32(addr, nr);
+}
+
+/**
+ * Test exactly once if a particular bit in a 64-bit word is set.
+ *
+ * This function is guaranteed to result in exactly one memory load.
+ * See rte_bit_once_test32() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * rte_bit_once_test64() does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to query.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+static inline bool
+rte_bit_once_test64(const volatile uint64_t *addr, unsigned int nr);
+
+/**
+ * Set bit in 64-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by
+ * @c addr to '1'.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ */
+static inline void
+rte_bit_once_set64(volatile uint64_t *addr, unsigned int nr);
+
+/**
+ * Clear bit in 64-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by @c addr
+ * to '0'.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ */
+static inline void
+rte_bit_once_clear64(volatile uint64_t *addr, unsigned int nr);
+
+/**
+ * Assign a value to bit in a 64-bit word exactly once.
+ *
+ * Set bit specified by @c nr in the 64-bit word pointed to by
+ * @c addr to the value indicated by @c value.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set or clear operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+static inline void
+rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		rte_bit_once_set64(addr, nr);
+	else
+		rte_bit_once_clear64(addr, nr);
+}
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -376,6 +597,14 @@ __RTE_GEN_BIT_TEST(rte_bit_test64, 64,)
 __RTE_GEN_BIT_SET(rte_bit_set64, 64,)
 __RTE_GEN_BIT_CLEAR(rte_bit_clear64, 64,)
 
+__RTE_GEN_BIT_TEST(rte_bit_once_test32, 32, volatile)
+__RTE_GEN_BIT_SET(rte_bit_once_set32, 32, volatile)
+__RTE_GEN_BIT_CLEAR(rte_bit_once_clear32, 32, volatile)
+
+__RTE_GEN_BIT_TEST(rte_bit_once_test64, 64, volatile)
+__RTE_GEN_BIT_SET(rte_bit_once_set64, 64, volatile)
+__RTE_GEN_BIT_CLEAR(rte_bit_once_clear64, 64, volatile)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1



* [RFC 4/7] eal: add generic once-type bit operations macros
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
                   ` (2 preceding siblings ...)
  2024-03-02 13:53 ` [RFC 3/7] eal: add bit manipulation functions which read or write once Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 5/7] eal: add atomic bit operations Mattias Rönnblom
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add macros for once-type bit operations operating on both 32-bit and
64-bit words by means of C11 generic selection.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 101 +++++++++++++++++++++++++++++++++++
 1 file changed, 101 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3118c51748..450334c751 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -188,6 +188,107 @@ extern "C" {
 		 uint32_t *: rte_bit_assign32,			\
 		 uint64_t *: rte_bit_assign64)(addr, nr, value)
 
+/**
+ * Test exactly once if a particular bit in a word is set.
+ *
+ * Generic selection macro to exactly once test the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This macro is guaranteed to result in exactly one memory load. See
+ * rte_bit_once_test32() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: rte_bit_once_test32,		\
+		 uint64_t *: rte_bit_once_test64)(addr, nr)
+
+/**
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: rte_bit_once_set32,		\
+		 uint64_t *: rte_bit_once_set64)(addr, nr)
+
+/**
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test32() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: rte_bit_once_clear32,		\
+		 uint64_t *: rte_bit_once_clear64)(addr, nr)
+
+/**
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set or clear operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_once_assign32,			\
+		 uint64_t *: rte_bit_once_assign64)(addr, nr, value)
+
 /**
  * Test if a particular bit in a 32-bit word is set.
  *
-- 
2.34.1



* [RFC 5/7] eal: add atomic bit operations
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
                   ` (3 preceding siblings ...)
  2024-03-02 13:53 ` [RFC 4/7] eal: add generic once-type bit operations macros Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 6/7] eal: add generic " Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 7/7] eal: deprecate relaxed family of " Mattias Rönnblom
  6 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add atomic bit test/set/clear/assign and test-and-set/clear functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
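
As an example of what test-and-set/clear with an explicit memory order
is useful for, the standalone sketch below claims and releases a slot
in a bitmap. It assumes the GCC/clang __atomic builtins. The
sketch_*() names are hypothetical, and this is not necessarily how the
patch implements these functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in: returns the previous value of the bit. */
static inline bool
sketch_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
			     int memory_order)
{
	uint32_t mask = (uint32_t)1 << nr;
	uint32_t prev = __atomic_fetch_or(addr, mask, memory_order);

	return (prev & mask) != 0;
}

/* Hypothetical stand-in: returns the previous value of the bit. */
static inline bool
sketch_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
			       int memory_order)
{
	uint32_t mask = (uint32_t)1 << nr;
	uint32_t prev = __atomic_fetch_and(addr, ~mask, memory_order);

	return (prev & mask) != 0;
}

/* Returns true if the caller won the slot (the bit was clear before). */
static bool
sketch_slot_try_claim(uint32_t *slots, unsigned int nr)
{
	return !sketch_atomic_test_and_set32(slots, nr, __ATOMIC_ACQUIRE);
}

/* Returns true if the slot was actually held. */
static bool
sketch_slot_release(uint32_t *slots, unsigned int nr)
{
	return sketch_atomic_test_and_clear32(slots, nr, __ATOMIC_RELEASE);
}

static void
sketch_atomic_demo(void)
{
	uint32_t slots = 0;

	assert(sketch_slot_try_claim(&slots, 5));	/* first claim wins */
	assert(!sketch_slot_try_claim(&slots, 5));	/* already taken */
	assert(sketch_slot_release(&slots, 5));
	assert(slots == 0);
}
```

The acquire/release pairing is just one plausible choice; a caller that
only needs atomicity could pass a relaxed order instead.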

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 337 +++++++++++++++++++++++++++++++++++
 1 file changed, 337 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 450334c751..7eb08bc768 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -20,6 +20,7 @@
 #include <stdint.h>
 
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -706,6 +707,342 @@ __RTE_GEN_BIT_TEST(rte_bit_once_test64, 64, volatile)
 __RTE_GEN_BIT_SET(rte_bit_once_set64, 64, volatile)
 __RTE_GEN_BIT_CLEAR(rte_bit_once_clear64, 64, volatile)
 
+/**
+ * Test if a particular bit in a 32-bit word is set with a particular
+ * memory order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to query.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test32(const uint32_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically set bit in 32-bit word.
+ *
+ * Atomically set bit specified by @c nr in the 32-bit word pointed to by
+ * @c addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_set32(uint32_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically clear bit in 32-bit word.
+ *
+ * Atomically set bit specified by @c nr in the 32-bit word pointed to
+ * by @c addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_clear32(uint32_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically assign a value to bit in a 32-bit word.
+ *
+ * Atomically set bit specified by @c nr in the 32-bit word pointed to
+ * by @c addr to the value indicated by @c value, with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_assign32(uint32_t *addr, unsigned int nr, bool value,
+			int memory_order);
+
+/*
+ * Atomic test-and-assign is not considered useful enough to document
+ * and expose in the public API.
+ */
+static inline bool
+__rte_bit_atomic_test_and_assign32(uint32_t *addr, unsigned int nr, bool value,
+				   int memory_order);
+
+/**
+ * Atomically test and set a bit in a 32-bit word.
+ *
+ * Atomically test and set bit specified by @c nr in the 32-bit word
+ * pointed to by @c addr to '1', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true, memory_order);
+}
+
+/**
+ * Atomically test and clear a bit in a 32-bit word.
+ *
+ * Atomically test and clear bit specified by @c nr in the 32-bit word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 32-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-31).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false, memory_order);
+}
+
+/**
+ * Test if a particular bit in a 64-bit word is set with a particular
+ * memory order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to query.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test64(const uint64_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically set bit in 64-bit word.
+ *
+ * Atomically set bit specified by @c nr in the 64-bit word pointed to
+ * by @c addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_set64(uint64_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically clear bit in 64-bit word.
+ *
+ * Atomically set bit specified by @c nr in the 64-bit word pointed to
+ * by @c addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_clear64(uint64_t *addr, unsigned int nr, int memory_order);
+
+/**
+ * Atomically assign a value to bit in a 64-bit word.
+ *
+ * Atomically set bit specified by @c nr in the 64-bit word pointed to
+ * by @c addr to the value indicated by @c value, with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+static inline void
+rte_bit_atomic_assign64(uint64_t *addr, unsigned int nr, bool value,
+			int memory_order);
+
+/*
+ * Atomic test-and-assign is not considered useful enough to document
+ * and expose in the public API.
+ */
+static inline bool
+__rte_bit_atomic_test_and_assign64(uint64_t *addr, unsigned int nr, bool value,
+				   int memory_order);
+/**
+ * Atomically test and set a bit in a 64-bit word.
+ *
+ * Atomically test and set bit specified by @c nr in the 64-bit word
+ * pointed to by @c addr to '1', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true, memory_order);
+}
+
+/**
+ * Atomically test and clear a bit in a 64-bit word.
+ *
+ * Atomically test and clear bit specified by @c nr in the 64-bit word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the 64-bit word to modify.
+ * @param nr
+ *   The index of the bit (0-63).
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+static inline bool
+rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false, memory_order);
+}
+
+#ifndef RTE_ENABLE_STDATOMIC
+
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	static inline bool						\
+	rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				    unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return __atomic_load_n(addr, memory_order) & mask;	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	static inline void						\
+	rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				   unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		__atomic_fetch_or(addr, mask, memory_order);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	static inline void						\
+	rte_bit_atomic_clear ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		__atomic_fetch_and(addr, ~mask, memory_order);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	static inline void						\
+	rte_bit_atomic_assign ## size(uint ## size ## _t *addr,		\
+				      unsigned int nr, bool value,	\
+				      int memory_order)			\
+	{								\
+		if (value)						\
+			rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			rte_bit_atomic_clear ## size(addr, nr, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t before;				\
+		uint ## size ## _t after;				\
+									\
+		before = __atomic_load_n(addr, __ATOMIC_RELAXED);	\
+		do {							\
+			after = before;					\
+			rte_bit_assign ## size(&after, nr, value);	\
+		} while (!__atomic_compare_exchange_n(addr, &before, after, \
+						      true, memory_order, \
+						      __ATOMIC_RELAXED)); \
+		return rte_bit_test ## size(&before, nr);		\
+	}
+
+#else
+#error "C11 atomics (MSVC) not supported in this RFC version"
+#endif
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1


* [RFC 6/7] eal: add generic atomic bit operations
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
                   ` (4 preceding siblings ...)
  2024-03-02 13:53 ` [RFC 5/7] eal: add atomic bit operations Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 13:53 ` [RFC 7/7] eal: deprecate relaxed family of " Mattias Rönnblom
  6 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Add atomic bit-level test/set/clear/assign macros operating on both
32-bit and 64-bit words by means of C11 generic selection.
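
For readers unfamiliar with C11 generic selection, the dispatch can be
sketched as below. The sketch_*() names are hypothetical and the
function bodies are simplified stand-ins, not the code in this
patch. The const-qualified associations are an addition made for the
sketch: a pointer-to-const argument does not match a plain
"uint32_t *" association, so whether the real macros should list
them is an open question rather than something this patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr, int memory_order)
{
	assert(nr < 32);
	return (__atomic_load_n(addr, memory_order) >> nr) & 1;
}

static inline bool
sketch_bit_test64(const uint64_t *addr, unsigned int nr, int memory_order)
{
	assert(nr < 64);
	return (__atomic_load_n(addr, memory_order) >> nr) & 1;
}

/*
 * The type of addr selects the function at compile time; the selected
 * function is then called with the original arguments. There is no
 * default association, so any other pointer type fails to compile.
 */
#define sketch_bit_test(addr, nr, memory_order)				\
	_Generic((addr),						\
		 uint32_t *: sketch_bit_test32,				\
		 const uint32_t *: sketch_bit_test32,			\
		 uint64_t *: sketch_bit_test64,				\
		 const uint64_t *: sketch_bit_test64)(addr, nr, memory_order)
```

Since the selection happens at compile time, the generic macro should
not cost anything at run time compared to calling the sized function
directly.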

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 125 +++++++++++++++++++++++++++++++++++
 1 file changed, 125 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 7eb08bc768..b5a9df5930 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -290,6 +290,131 @@ extern "C" {
 		 uint32_t *: rte_bit_once_assign32,			\
 		 uint64_t *: rte_bit_once_assign64)(addr, nr, value)
 
+/**
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_test32,			\
+		 uint64_t *: rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_set32,			\
+		 uint64_t *: rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_clear32,			\
+		 uint64_t *: rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_assign32,			\
+		 uint64_t *: rte_bit_atomic_assign64)(addr, nr, value,	\
+						      memory_order)
+
+/**
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: rte_bit_atomic_test_and_set64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: rte_bit_atomic_test_and_clear64)(addr, nr, \
+							      memory_order)
+
 /**
  * Test if a particular bit in a 32-bit word is set.
  *
-- 
2.34.1


* [RFC 7/7] eal: deprecate relaxed family of bit operations
  2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
                   ` (5 preceding siblings ...)
  2024-03-02 13:53 ` [RFC 6/7] eal: add generic " Mattias Rönnblom
@ 2024-03-02 13:53 ` Mattias Rönnblom
  2024-03-02 17:07   ` Stephen Hemminger
  6 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

Informally (by means of documentation) deprecate the
rte_bit_relaxed_*() family of bit-level operations.

rte_bit_relaxed_*() has been replaced by three new families of
bit-level query and manipulation functions.

rte_bit_relaxed_*() failed to deliver the atomicity guarantees their
name suggested. Deprecating them encourages the user to consider
whether the actual, implemented behavior (e.g., non-atomic
test-and-set with read/write-once semantics), the semantics implied
by their names (i.e., atomic), or something else is what's actually
needed.
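
The difference between the two behaviors can be sketched as follows.
Both functions are hypothetical: the first only approximates what the
legacy volatile-based test-and-set does (a load followed by a separate
store), and the second is a stand-in built on the GCC/Clang
__atomic_fetch_or() builtin, not the implementation in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Read/write-once, but not atomic: the load and the store are two
 * separate accesses, so a concurrent update to the same word made in
 * between them is lost. Returns the masked bit, not 0/1.
 */
static inline uint32_t
sketch_relaxed_test_and_set32(unsigned int nr, volatile uint32_t *addr)
{
	assert(nr < 32);

	uint32_t mask = UINT32_C(1) << nr;
	uint32_t val = *addr;

	*addr = val | mask;
	return val & mask;
}

/* Atomic: the read-modify-write is a single indivisible operation. */
static inline bool
sketch_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
			     int memory_order)
{
	assert(nr < 32);

	uint32_t mask = UINT32_C(1) << nr;

	return (__atomic_fetch_or(addr, mask, memory_order) & mask) != 0;
}
```

In a single thread the two give the same end result; they only differ
once another thread (or device) touches the same word concurrently,
which is exactly the case the name fails to warn about.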

Bugzilla ID: 1385

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 48 ++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index b5a9df5930..783dd0e1ee 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -1179,6 +1179,10 @@ __RTE_GEN_BIT_ATOMIC_OPS(64)
  *   The address holding the bit.
  * @return
  *   The target bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_test32(),
+ *   rte_bit_once_test32(), or rte_bit_atomic_test32() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline uint32_t
 rte_bit_relaxed_get32(unsigned int nr, volatile uint32_t *addr)
@@ -1196,6 +1200,10 @@ rte_bit_relaxed_get32(unsigned int nr, volatile uint32_t *addr)
  *   The target bit to set.
  * @param addr
  *   The address holding the bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_set32(),
+ *   rte_bit_once_set32(), or rte_bit_atomic_set32() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline void
 rte_bit_relaxed_set32(unsigned int nr, volatile uint32_t *addr)
@@ -1213,6 +1221,10 @@ rte_bit_relaxed_set32(unsigned int nr, volatile uint32_t *addr)
  *   The target bit to clear.
  * @param addr
  *   The address holding the bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_clear32(),
+ *   rte_bit_once_clear32(), or rte_bit_atomic_clear32() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline void
 rte_bit_relaxed_clear32(unsigned int nr, volatile uint32_t *addr)
@@ -1233,6 +1245,12 @@ rte_bit_relaxed_clear32(unsigned int nr, volatile uint32_t *addr)
  *   The address holding the bit.
  * @return
  *   The original bit.
+ * @note
+ *   This function is deprecated and replaced by
+ *   rte_bit_atomic_test_and_set32(), for use cases where the
+ *   operation needs to be atomic. For non-atomic/non-ordered use
+ *   cases, use rte_bit_test32() + rte_bit_set32() or
+ *   rte_bit_once_test32() + rte_bit_once_set32().
  */
 static inline uint32_t
 rte_bit_relaxed_test_and_set32(unsigned int nr, volatile uint32_t *addr)
@@ -1255,6 +1273,12 @@ rte_bit_relaxed_test_and_set32(unsigned int nr, volatile uint32_t *addr)
  *   The address holding the bit.
  * @return
  *   The original bit.
+ * @note
+ *   This function is deprecated and replaced by
+ *   rte_bit_atomic_test_and_clear32(), for use cases where the
+ *   operation needs to be atomic. For non-atomic/non-ordered use
+ *   cases, use rte_bit_test32() + rte_bit_clear32() or
+ *   rte_bit_once_test32() + rte_bit_once_clear32().
  */
 static inline uint32_t
 rte_bit_relaxed_test_and_clear32(unsigned int nr, volatile uint32_t *addr)
@@ -1278,6 +1302,10 @@ rte_bit_relaxed_test_and_clear32(unsigned int nr, volatile uint32_t *addr)
  *   The address holding the bit.
  * @return
  *   The target bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_test64(),
+ *   rte_bit_once_test64(), or rte_bit_atomic_test64() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline uint64_t
 rte_bit_relaxed_get64(unsigned int nr, volatile uint64_t *addr)
@@ -1295,6 +1323,10 @@ rte_bit_relaxed_get64(unsigned int nr, volatile uint64_t *addr)
  *   The target bit to set.
  * @param addr
  *   The address holding the bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_set64(),
+ *   rte_bit_once_set64(), or rte_bit_atomic_set64() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline void
 rte_bit_relaxed_set64(unsigned int nr, volatile uint64_t *addr)
@@ -1312,6 +1344,10 @@ rte_bit_relaxed_set64(unsigned int nr, volatile uint64_t *addr)
  *   The target bit to clear.
  * @param addr
  *   The address holding the bit.
+ * @note
+ *   This function is deprecated. Use rte_bit_clear64(),
+ *   rte_bit_once_clear64(), or rte_bit_atomic_clear64() instead,
+ *   depending on exactly what guarantees are required.
  */
 static inline void
 rte_bit_relaxed_clear64(unsigned int nr, volatile uint64_t *addr)
@@ -1332,6 +1368,12 @@ rte_bit_relaxed_clear64(unsigned int nr, volatile uint64_t *addr)
  *   The address holding the bit.
  * @return
  *   The original bit.
+ * @note
+ *   This function is deprecated and replaced by
+ *   rte_bit_atomic_test_and_set64(), for use cases where the
+ *   operation needs to be atomic. For non-atomic/non-ordered use
+ *   cases, use rte_bit_test64() + rte_bit_set64() or
+ *   rte_bit_once_test64() + rte_bit_once_set64().
  */
 static inline uint64_t
 rte_bit_relaxed_test_and_set64(unsigned int nr, volatile uint64_t *addr)
@@ -1354,6 +1396,12 @@ rte_bit_relaxed_test_and_set64(unsigned int nr, volatile uint64_t *addr)
  *   The address holding the bit.
  * @return
  *   The original bit.
+ * @note
+ *   This function is deprecated and replaced by
+ *   rte_bit_atomic_test_and_clear64(), for use cases where the
+ *   operation needs to be atomic. For non-atomic/non-ordered use
+ *   cases, use rte_bit_test64() + rte_bit_clear64() or
+ *   rte_bit_once_test64() + rte_bit_once_clear64().
  */
 static inline uint64_t
 rte_bit_relaxed_test_and_clear64(unsigned int nr, volatile uint64_t *addr)
-- 
2.34.1


* Re: [RFC 1/7] eal: extend bit manipulation functions
  2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
@ 2024-03-02 17:05   ` Stephen Hemminger
  2024-03-03  6:26     ` Mattias Rönnblom
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
  1 sibling, 1 reply; 160+ messages in thread
From: Stephen Hemminger @ 2024-03-02 17:05 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, hofors, Heng Wang

On Sat, 2 Mar 2024 14:53:22 +0100
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> index 449565eeae..9a368724d5 100644
> --- a/lib/eal/include/rte_bitops.h
> +++ b/lib/eal/include/rte_bitops.h
> @@ -2,6 +2,7 @@
>   * Copyright(c) 2020 Arm Limited
>   * Copyright(c) 2010-2019 Intel Corporation
>   * Copyright(c) 2023 Microsoft Corporation
> + * Copyright(c) 2024 Ericsson AB
>   */
>  

Unless this is coming from another project code base, the common
practice is not to add copyright for each contributor in later versions.

> +/**
> + * Test if a particular bit in a 32-bit word is set.
> + *
> + * This function does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the 32-bit word to query.
> + * @param nr
> + *   The index of the bit (0-31).
> + * @return
> + *   Returns true if the bit is set, and false otherwise.
> + */
> +static inline bool
> +rte_bit_test32(const uint32_t *addr, unsigned int nr);

Is it possible to reorder these inlines to avoid having
forward declarations?

Also, new functions should be marked __rte_experimental
for a release or two.

* Re: [RFC 7/7] eal: deprecate relaxed family of bit operations
  2024-03-02 13:53 ` [RFC 7/7] eal: deprecate relaxed family of " Mattias Rönnblom
@ 2024-03-02 17:07   ` Stephen Hemminger
  2024-03-03  6:30     ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Stephen Hemminger @ 2024-03-02 17:07 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, hofors, Heng Wang

On Sat, 2 Mar 2024 14:53:28 +0100
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> index b5a9df5930..783dd0e1ee 100644
> --- a/lib/eal/include/rte_bitops.h
> +++ b/lib/eal/include/rte_bitops.h
> @@ -1179,6 +1179,10 @@ __RTE_GEN_BIT_ATOMIC_OPS(64)
>   *   The address holding the bit.
>   * @return
>   *   The target bit.
> + * @note
> + *   This function is deprecated. Use rte_bit_test32(),
> + *   rte_bit_once_test32(), or rte_bit_atomic_test32() instead,
> + *   depending on exactly what guarantees are required.
>   */
>  static inline uint32_t
>  rte_bit_relaxed_get32(unsigned int nr, volatile uint32_t *addr)

The DPDK process is:
	- mark these as deprecated in release notes of release N.
	- mark these as deprecated using __rte_deprecated in next LTS
	- drop these in LTS release after that.

Don't use notes for this.

* Re: [RFC 1/7] eal: extend bit manipulation functions
  2024-03-02 17:05   ` Stephen Hemminger
@ 2024-03-03  6:26     ` Mattias Rönnblom
  2024-03-04 16:34       ` Tyler Retzlaff
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-03  6:26 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom; +Cc: dev, Heng Wang

On 2024-03-02 18:05, Stephen Hemminger wrote:
> On Sat, 2 Mar 2024 14:53:22 +0100
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>> index 449565eeae..9a368724d5 100644
>> --- a/lib/eal/include/rte_bitops.h
>> +++ b/lib/eal/include/rte_bitops.h
>> @@ -2,6 +2,7 @@
>>    * Copyright(c) 2020 Arm Limited
>>    * Copyright(c) 2010-2019 Intel Corporation
>>    * Copyright(c) 2023 Microsoft Corporation
>> + * Copyright(c) 2024 Ericsson AB
>>    */
>>   
> 
> Unless this is coming from another project code base, the common
> practice is not to add copyright for each contributor in later versions.
> 

Unless it's a large contribution (compared to the rest of the file)?

I guess that's why the 916c50d commit adds the Microsoft copyright notice.

>> +/**
>> + * Test if a particular bit in a 32-bit word is set.
>> + *
>> + * This function does not give any guarantees in regards to memory
>> + * ordering or atomicity.
>> + *
>> + * @param addr
>> + *   A pointer to the 32-bit word to query.
>> + * @param nr
>> + *   The index of the bit (0-31).
>> + * @return
>> + *   Returns true if the bit is set, and false otherwise.
>> + */
>> +static inline bool
>> +rte_bit_test32(const uint32_t *addr, unsigned int nr);
> 
> Is it possible to reorder these inlines to avoid having
> forward declarations?
> 

Yes, but I'm not sure it's a net gain.

A statement expression macro seems like a perfect tool for the job, but
then MSVC doesn't support statement expressions. You could also have a
macro that just generates the function body, as opposed to the whole function.

I'll consider if I should just bite the bullet and expand all the 
macros. 4x duplication.
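
To make the body-only alternative concrete, a minimal sketch could
look like the below. All sketch_/SKETCH_ names are hypothetical and
not part of the patch. Each function gets a real, documentable
definition (so no forward declaration is needed), while the logic is
still written once per operation. The price is that the body macros
silently depend on the parameters being named addr and nr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Body-only generators; they expect parameters named addr and nr. */
#define SKETCH_BIT_TEST_BODY(size)					\
	{								\
		assert(nr < size);					\
		return (*addr >> nr) & 1;				\
	}

#define SKETCH_BIT_SET_BODY(size)					\
	{								\
		assert(nr < size);					\
		*addr |= (uint ## size ## _t)1 << nr;			\
	}

/** Test bit nr (0-31) in a 32-bit word. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
SKETCH_BIT_TEST_BODY(32)

/** Test bit nr (0-63) in a 64-bit word. */
static inline bool
sketch_bit_test64(const uint64_t *addr, unsigned int nr)
SKETCH_BIT_TEST_BODY(64)

/** Set bit nr (0-31) in a 32-bit word. */
static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
SKETCH_BIT_SET_BODY(32)

/** Set bit nr (0-63) in a 64-bit word. */
static inline void
sketch_bit_set64(uint64_t *addr, unsigned int nr)
SKETCH_BIT_SET_BODY(64)
```

This is plain standard C, so it should not run into the MSVC
limitation mentioned above, but whether it reads better than
forward declarations is a matter of taste.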

> Also, new functions should be marked __rte_experimental
> for a release or two.

Yes, thanks.

* Re: [RFC 7/7] eal: deprecate relaxed family of bit operations
  2024-03-02 17:07   ` Stephen Hemminger
@ 2024-03-03  6:30     ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-03  6:30 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom; +Cc: dev, Heng Wang

On 2024-03-02 18:07, Stephen Hemminger wrote:
> On Sat, 2 Mar 2024 14:53:28 +0100
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>> index b5a9df5930..783dd0e1ee 100644
>> --- a/lib/eal/include/rte_bitops.h
>> +++ b/lib/eal/include/rte_bitops.h
>> @@ -1179,6 +1179,10 @@ __RTE_GEN_BIT_ATOMIC_OPS(64)
>>    *   The address holding the bit.
>>    * @return
>>    *   The target bit.
>> + * @note
>> + *   This function is deprecated. Use rte_bit_test32(),
>> + *   rte_bit_once_test32(), or rte_bit_atomic_test32() instead,
>> + *   depending on exactly what guarantees are required.
>>    */
>>   static inline uint32_t
>>   rte_bit_relaxed_get32(unsigned int nr, volatile uint32_t *addr)
> 
> The DPDK process is:
> 	- mark these as deprecated in release notes of release N.
> 	- mark these as deprecated using __rte_deprecated in next LTS
> 	- drop these in LTS release after that.
> 
> Don't use notes for this.

Don't use notes to replace the above process, or don't use notes at all?

A note seems useful to me, especially considering there is a choice to 
be made (not just mindlessly replacing one call with another).

Anyway, release notes updates have to wait so I'll just drop this patch 
for now.

* RE: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-02 13:53 ` [RFC 2/7] eal: add generic bit manipulation macros Mattias Rönnblom
@ 2024-03-04  8:16   ` Heng Wang
  2024-03-04 15:41     ` Mattias Rönnblom
  2024-03-04 16:42   ` Tyler Retzlaff
  1 sibling, 1 reply; 160+ messages in thread
From: Heng Wang @ 2024-03-04  8:16 UTC (permalink / raw)
  To: Mattias Rönnblom, dev; +Cc: hofors

Hi Mattias,
  I have a comment about the _Generic. What if the user gives a uint8_t * or uint16_t * as the address? One improvement is that we could add a default branch in _Generic to throw a compiler error or assert false.
  Another question is what if nr >= the number of bits in the type? What if you do, for example, (uint32_t)1 << 35? Maybe we could add an assert in the implementation?

Regards,
Heng

-----Original Message-----
From: Mattias Rönnblom <mattias.ronnblom@ericsson.com> 
Sent: Saturday, March 2, 2024 2:53 PM
To: dev@dpdk.org
Cc: hofors@lysator.liu.se; Heng Wang <heng.wang@ericsson.com>; Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Subject: [RFC 2/7] eal: add generic bit manipulation macros

Add bit-level test/set/clear/assign macros operating on both 32-bit and 64-bit words by means of C11 generic selection.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
 1 file changed, 81 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 9a368724d5..afd0f11033 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -107,6 +107,87 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_test32,		\
+		 uint64_t *: rte_bit_test64)(addr, nr)
+
+/**
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_set32,		\
+		 uint64_t *: rte_bit_set64)(addr, nr)
+
+/**
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)			\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_clear32,		\
+		 uint64_t *: rte_bit_clear64)(addr, nr)
+
+/**
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)			\
+	_Generic((addr),				\
+		 uint32_t *: rte_bit_assign32,			\
+		 uint64_t *: rte_bit_assign64)(addr, nr, value)
+
 /**
  * Test if a particular bit in a 32-bit word is set.
  *
--
2.34.1



* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-04  8:16   ` Heng Wang
@ 2024-03-04 15:41     ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-04 15:41 UTC (permalink / raw)
  To: Heng Wang, Mattias Rönnblom, dev


On 2024-03-04 09:16, Heng Wang wrote:
> Hi Mattias,
>    I have a comment about the _Generic. What if the user gives uint8_t * or uint16_t * as the address. One improvement is that we could add a default branch in _Generic to throw a compiler error or assert false.

If the user passes a pointer of any other type, no association in the
_Generic list matches and the compiler will generate an error.
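
To illustrate the point, here is a minimal sketch. It assumes hypothetical
stand-ins (bit_set32(), bit_set64() and bit_set()) rather than the functions
in the patch. Since the association list has no default branch, only the two
listed pointer types are accepted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_bit_set32()/rte_bit_set64(). */
static inline void
bit_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

static inline void
bit_set64(uint64_t *addr, unsigned int nr)
{
	*addr |= UINT64_C(1) << nr;
}

/* No default association: only the two listed pointer types match. */
#define bit_set(addr, nr)			\
	_Generic((addr),			\
		 uint32_t *: bit_set32,		\
		 uint64_t *: bit_set64)(addr, nr)

static inline void
bit_set_demo(void)
{
	uint32_t w32 = 0;
	uint64_t w64 = 0;

	bit_set(&w32, 3);	/* selects bit_set32() */
	bit_set(&w64, 40);	/* selects bit_set64() */

	assert(w32 == UINT32_C(1) << 3);
	assert(w64 == UINT64_C(1) << 40);

	/*
	 * uint8_t w8 = 0;
	 * bit_set(&w8, 3);
	 *
	 * Uncommenting the two lines above makes the translation unit
	 * fail to compile, since uint8_t * matches no association.
	 */
}
```

One thing to keep in mind with a list like this is that a const uint32_t *
would be rejected as well, since it is not compatible with uint32_t *.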

>    Another question is what if nr >= sizeof(type) ? What if you do, for example, (uint32_t)1 << 35? Maybe we could add an assert in the implementation?
> 

There are already such asserts in the functions the macro delegates to.

That said, DPDK RTE_ASSERT()s are disabled even in debug builds, so I'm 
not sure it's going to help anyone.
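
As a rough sketch of what such a range check might look like: the names
below (WORD32_BITS, bit_index_valid32(), checked_bit_set32()) are
hypothetical, and the standard assert() is used in place of RTE_ASSERT().
Note that the limit is the number of bits in the word, not sizeof(type).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Number of bits in a 32-bit word; sizeof() alone counts bytes. */
#define WORD32_BITS (sizeof(uint32_t) * CHAR_BIT)

/* Hypothetical helper: true if nr is a valid bit index for a 32-bit word. */
static inline bool
bit_index_valid32(unsigned int nr)
{
	return nr < WORD32_BITS;
}

/* Hypothetical stand-in for rte_bit_set32(), guarded by a plain assert(). */
static inline void
checked_bit_set32(uint32_t *addr, unsigned int nr)
{
	/* Shifting a 32-bit value by 32 or more is undefined behavior. */
	assert(bit_index_valid32(nr));

	*addr |= UINT32_C(1) << nr;
}
```

With NDEBUG defined, the assert() compiles away, which is the same
limitation as the one described above for disabled RTE_ASSERT()s.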

> Regards,
> Heng
> 
> -----Original Message-----
> From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Sent: Saturday, March 2, 2024 2:53 PM
> To: dev@dpdk.org
> Cc: hofors@lysator.liu.se; Heng Wang <heng.wang@ericsson.com>; Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Subject: [RFC 2/7] eal: add generic bit manipulation macros
> 
> Add bit-level test/set/clear/assign macros operating on both 32-bit and 64-bit words by means of C11 generic selection.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---
>   lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
>   1 file changed, 81 insertions(+)
> 
> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h index 9a368724d5..afd0f11033 100644
> --- a/lib/eal/include/rte_bitops.h
> +++ b/lib/eal/include/rte_bitops.h
> @@ -107,6 +107,87 @@ extern "C" {
>   #define RTE_FIELD_GET64(mask, reg) \
>   		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
>   
> +/**
> + * Test bit in word.
> + *
> + * Generic selection macro to test the value of a bit in a 32-bit or
> + * 64-bit word. The type of operation depends on the type of the @c
> + * addr parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_test(addr, nr)				\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_test32,		\
> +		 uint64_t *: rte_bit_test64)(addr, nr)
> +
> +/**
> + * Set bit in word.
> + *
> + * Generic selection macro to set a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr
> + * parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_set(addr, nr)				\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_set32,		\
> +		 uint64_t *: rte_bit_set64)(addr, nr)
> +
> +/**
> + * Clear bit in word.
> + *
> + * Generic selection macro to clear a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr
> + * parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_clear(addr, nr)			\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_clear32,		\
> +		 uint64_t *: rte_bit_clear64)(addr, nr)
> +
> +/**
> + * Assign a value to a bit in word.
> + *
> + * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + * @param value
> + *   The new value of the bit - true for '1', or false for '0'.
> + */
> +#define rte_bit_assign(addr, nr, value)			\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_assign32,			\
> +		 uint64_t *: rte_bit_assign64)(addr, nr, value)
> +
>   /**
>    * Test if a particular bit in a 32-bit word is set.
>    *
> --
> 2.34.1
> 


* Re: [RFC 1/7] eal: extend bit manipulation functions
  2024-03-03  6:26     ` Mattias Rönnblom
@ 2024-03-04 16:34       ` Tyler Retzlaff
  2024-03-05 18:01         ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Tyler Retzlaff @ 2024-03-04 16:34 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Stephen Hemminger, Mattias Rönnblom, dev, Heng Wang

On Sun, Mar 03, 2024 at 07:26:36AM +0100, Mattias Rönnblom wrote:
> On 2024-03-02 18:05, Stephen Hemminger wrote:
> >On Sat, 2 Mar 2024 14:53:22 +0100
> >Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> >
> >>diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >>index 449565eeae..9a368724d5 100644
> >>--- a/lib/eal/include/rte_bitops.h
> >>+++ b/lib/eal/include/rte_bitops.h
> >>@@ -2,6 +2,7 @@
> >>   * Copyright(c) 2020 Arm Limited
> >>   * Copyright(c) 2010-2019 Intel Corporation
> >>   * Copyright(c) 2023 Microsoft Corporation
> >>+ * Copyright(c) 2024 Ericsson AB
> >>   */
> >
> >Unless this is coming from another project code base, the common
> >practice is not to add copyright for each contributor in later versions.
> >
> 
> Unless it's a large contribution (compared to the rest of the file)?
> 
> I guess that's why the 916c50d commit adds the Microsoft copyright notice.
> 
> >>+/**
> >>+ * Test if a particular bit in a 32-bit word is set.
> >>+ *
> >>+ * This function does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the 32-bit word to query.
> >>+ * @param nr
> >>+ *   The index of the bit (0-31).
> >>+ * @return
> >>+ *   Returns true if the bit is set, and false otherwise.
> >>+ */
> >>+static inline bool
> >>+rte_bit_test32(const uint32_t *addr, unsigned int nr);
> >
> >Is it possible to reorder these inlines to avoid having
> >forward declarations?
> >
> 
> Yes, but I'm not sure it's a net gain.
> 
> A statement expression macro seems like a perfect tool for the job,
> but then MSVC doesn't support statement expressions. You could also
> have a macro that just generates the function body, as opposed to the
> whole function.

statement expressions can be used even with MSVC when using C. but GCC
documentation discourages their use for C++. since the header is
consumed by C++ in addition to C it's preferable to avoid them.

> 
> I'll consider if I should just bite the bullet and expand all the
> macros. 4x duplication.
> 
> >Also, new functions should be marked __rte_experimental
> >for a release or two.
> 
> Yes, thanks.


* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-02 13:53 ` [RFC 2/7] eal: add generic bit manipulation macros Mattias Rönnblom
  2024-03-04  8:16   ` Heng Wang
@ 2024-03-04 16:42   ` Tyler Retzlaff
  2024-03-05 18:08     ` Mattias Rönnblom
  1 sibling, 1 reply; 160+ messages in thread
From: Tyler Retzlaff @ 2024-03-04 16:42 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, hofors, Heng Wang

On Sat, Mar 02, 2024 at 02:53:23PM +0100, Mattias Rönnblom wrote:
> Add bit-level test/set/clear/assign macros operating on both 32-bit
> and 64-bit words by means of C11 generic selection.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---

_Generic is nice here. should we discourage direct use of the inline
functions in preference of using the macro always? either way lgtm.

Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

>  lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 81 insertions(+)
> 
> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> index 9a368724d5..afd0f11033 100644
> --- a/lib/eal/include/rte_bitops.h
> +++ b/lib/eal/include/rte_bitops.h
> @@ -107,6 +107,87 @@ extern "C" {
>  #define RTE_FIELD_GET64(mask, reg) \
>  		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
>  
> +/**
> + * Test bit in word.
> + *
> + * Generic selection macro to test the value of a bit in a 32-bit or
> + * 64-bit word. The type of operation depends on the type of the @c
> + * addr parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_test(addr, nr)				\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_test32,		\
> +		 uint64_t *: rte_bit_test64)(addr, nr)
> +
> +/**
> + * Set bit in word.
> + *
> + * Generic selection macro to set a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr
> + * parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_set(addr, nr)				\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_set32,		\
> +		 uint64_t *: rte_bit_set64)(addr, nr)
> +
> +/**
> + * Clear bit in word.
> + *
> + * Generic selection macro to clear a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr
> + * parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + */
> +#define rte_bit_clear(addr, nr)			\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_clear32,		\
> +		 uint64_t *: rte_bit_clear64)(addr, nr)
> +
> +/**
> + * Assign a value to a bit in word.
> + *
> + * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
> + * word. The type of operation depends on the type of the @c addr parameter.
> + *
> + * This macro does not give any guarantees in regards to memory
> + * ordering or atomicity.
> + *
> + * @param addr
> + *   A pointer to the word to modify.
> + * @param nr
> + *   The index of the bit.
> + * @param value
> + *   The new value of the bit - true for '1', or false for '0'.
> + */
> +#define rte_bit_assign(addr, nr, value)			\
> +	_Generic((addr),				\
> +		 uint32_t *: rte_bit_assign32,			\
> +		 uint64_t *: rte_bit_assign64)(addr, nr, value)
> +
>  /**
>   * Test if a particular bit in a 32-bit word is set.
>   *
> -- 
> 2.34.1


* Re: [RFC 1/7] eal: extend bit manipulation functions
  2024-03-04 16:34       ` Tyler Retzlaff
@ 2024-03-05 18:01         ` Mattias Rönnblom
  2024-03-05 18:06           ` Tyler Retzlaff
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-05 18:01 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: Stephen Hemminger, Mattias Rönnblom, dev, Heng Wang

On 2024-03-04 17:34, Tyler Retzlaff wrote:
> On Sun, Mar 03, 2024 at 07:26:36AM +0100, Mattias Rönnblom wrote:
>> On 2024-03-02 18:05, Stephen Hemminger wrote:
>>> On Sat, 2 Mar 2024 14:53:22 +0100
>>> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>>>
>>>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>>>> index 449565eeae..9a368724d5 100644
>>>> --- a/lib/eal/include/rte_bitops.h
>>>> +++ b/lib/eal/include/rte_bitops.h
>>>> @@ -2,6 +2,7 @@
>>>>    * Copyright(c) 2020 Arm Limited
>>>>    * Copyright(c) 2010-2019 Intel Corporation
>>>>    * Copyright(c) 2023 Microsoft Corporation
>>>> + * Copyright(c) 2024 Ericsson AB
>>>>    */
>>>
>>> Unless this is coming from another project code base, the common
>>> practice is not to add copyright for each contributor in later versions.
>>>
>>
>> Unless it's a large contribution (compared to the rest of the file)?
>>
>> I guess that's why the 916c50d commit adds the Microsoft copyright notice.
>>
>>>> +/**
>>>> + * Test if a particular bit in a 32-bit word is set.
>>>> + *
>>>> + * This function does not give any guarantees in regards to memory
>>>> + * ordering or atomicity.
>>>> + *
>>>> + * @param addr
>>>> + *   A pointer to the 32-bit word to query.
>>>> + * @param nr
>>>> + *   The index of the bit (0-31).
>>>> + * @return
>>>> + *   Returns true if the bit is set, and false otherwise.
>>>> + */
>>>> +static inline bool
>>>> +rte_bit_test32(const uint32_t *addr, unsigned int nr);
>>>
>>> Is it possible to reorder these inlines to avoid having
>>> forward declarations?
>>>
>>
>> Yes, but I'm not sure it's a net gain.
>>
>> A statement expression macro seems like a perfect tool for the job,
>> but then MSVC doesn't support statement expressions. You could also
>> have a macro that just generates the function body, as opposed to the
>> whole function.
> 
> statement expressions can be used even with MSVC when using C. but GCC
> documentation discourages their use for C++. since the header is

GCC documentation discourages statement expressions *of a particular 
form* from being included in headers to be consumed by C++.

They would be fine to use here, especially considering they wouldn't be 
a part of the public API (i.e., only invoked from the static inline 
functions in the API).
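
A minimal sketch of that arrangement follows. BIT_TEST_IMPL(), bit_test32()
and bit_test64() are hypothetical names, not the ones in the patch, and the
GNU statement expression extension is assumed to be available (GCC or
Clang). The macro stays private; users only ever call the static inline
functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical private helper (not from the patch). It relies on the
 * GNU statement expression extension, so GCC or Clang is assumed.
 */
#define BIT_TEST_IMPL(addr, nr, size)					\
	__extension__ ({						\
		uint ## size ## _t mask_ = (uint ## size ## _t)1 << (nr); \
		(*(addr) & mask_) != 0;					\
	})

/* The public API remains plain static inline functions. */
static inline bool
bit_test32(const uint32_t *addr, unsigned int nr)
{
	return BIT_TEST_IMPL(addr, nr, 32);
}

static inline bool
bit_test64(const uint64_t *addr, unsigned int nr)
{
	return BIT_TEST_IMPL(addr, nr, 64);
}

static inline void
bit_test_demo(void)
{
	const uint32_t w32 = UINT32_C(1) << 17;
	const uint64_t w64 = UINT64_C(1) << 40;

	assert(bit_test32(&w32, 17));
	assert(!bit_test32(&w32, 16));
	assert(bit_test64(&w64, 40));
}
```

The function bodies shrink to one line each, so the duplication across the
32-bit and 64-bit variants is limited to the prototypes.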

> consumed by C++ in addition to C it's preferable to avoid them.
> 
>>
>> I'll consider if I should just bite the bullet and expand all the
>> macros. 4x duplication.
>>
>>> Also, new functions should be marked __rte_experimental
>>> for a release or two.
>>
>> Yes, thanks.


* Re: [RFC 1/7] eal: extend bit manipulation functions
  2024-03-05 18:01         ` Mattias Rönnblom
@ 2024-03-05 18:06           ` Tyler Retzlaff
  0 siblings, 0 replies; 160+ messages in thread
From: Tyler Retzlaff @ 2024-03-05 18:06 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Stephen Hemminger, Mattias Rönnblom, dev, Heng Wang

On Tue, Mar 05, 2024 at 07:01:50PM +0100, Mattias Rönnblom wrote:
> On 2024-03-04 17:34, Tyler Retzlaff wrote:
> >On Sun, Mar 03, 2024 at 07:26:36AM +0100, Mattias Rönnblom wrote:
> >>On 2024-03-02 18:05, Stephen Hemminger wrote:
> >>>On Sat, 2 Mar 2024 14:53:22 +0100
> >>>Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> >>>
> >>>>diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >>>>index 449565eeae..9a368724d5 100644
> >>>>--- a/lib/eal/include/rte_bitops.h
> >>>>+++ b/lib/eal/include/rte_bitops.h
> >>>>@@ -2,6 +2,7 @@
> >>>>   * Copyright(c) 2020 Arm Limited
> >>>>   * Copyright(c) 2010-2019 Intel Corporation
> >>>>   * Copyright(c) 2023 Microsoft Corporation
> >>>>+ * Copyright(c) 2024 Ericsson AB
> >>>>   */
> >>>
> >>>Unless this is coming from another project code base, the common
> >>>practice is not to add copyright for each contributor in later versions.
> >>>
> >>
> >>Unless it's a large contribution (compared to the rest of the file)?
> >>
> >>I guess that's why the 916c50d commit adds the Microsoft copyright notice.
> >>
> >>>>+/**
> >>>>+ * Test if a particular bit in a 32-bit word is set.
> >>>>+ *
> >>>>+ * This function does not give any guarantees in regards to memory
> >>>>+ * ordering or atomicity.
> >>>>+ *
> >>>>+ * @param addr
> >>>>+ *   A pointer to the 32-bit word to query.
> >>>>+ * @param nr
> >>>>+ *   The index of the bit (0-31).
> >>>>+ * @return
> >>>>+ *   Returns true if the bit is set, and false otherwise.
> >>>>+ */
> >>>>+static inline bool
> >>>>+rte_bit_test32(const uint32_t *addr, unsigned int nr);
> >>>
> >>>Is it possible to reorder these inlines to avoid having
> >>>forward declarations?
> >>>
> >>
> >>Yes, but I'm not sure it's a net gain.
> >>
> >>A statement expression macro seems like a perfect tool for the job,
> >>but then MSVC doesn't support statement expressions. You could also
> >>have a macro that just generates the function body, as opposed to the
> >>whole function.
> >
> >statement expressions can be used even with MSVC when using C. but GCC
> >documentation discourages their use for C++. since the header is
> 
> GCC documentation discourages statement expressions *of a particular
> form* from being included in headers to be consumed by C++.
> 
> They would be fine to use here, especially considering they wouldn't
> be a part of the public API (i.e., only invoked from the static
> inline functions in the API).

agreed, there should be no problem.

> 
> >consumed by C++ in addition to C it's preferable to avoid them.
> >
> >>
> >>I'll consider if I should just bite the bullet and expand all the
> >>macros. 4x duplication.
> >>
> >>>Also, new functions should be marked __rte_experimental
> >>>for a release or two.
> >>
> >>Yes, thanks.


* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-04 16:42   ` Tyler Retzlaff
@ 2024-03-05 18:08     ` Mattias Rönnblom
  2024-03-05 18:22       ` Tyler Retzlaff
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-05 18:08 UTC (permalink / raw)
  To: Tyler Retzlaff, Mattias Rönnblom; +Cc: dev, Heng Wang

On 2024-03-04 17:42, Tyler Retzlaff wrote:
> On Sat, Mar 02, 2024 at 02:53:23PM +0100, Mattias Rönnblom wrote:
>> Add bit-level test/set/clear/assign macros operating on both 32-bit
>> and 64-bit words by means of C11 generic selection.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
> 
> _Generic is nice here. should we discourage direct use of the inline
> functions in preference of using the macro always? either way lgtm.
> 

That was something I considered, but decided against it for RFC v1. I 
wasn't even sure people would like _Generic.

The big upside of having only the _Generic macros would be a much
smaller API, though it may be a tiny bit less (type-)safe to use.

Also, _Generic is new for DPDK, so who knows what issues it might cause 
with old compilers.

Thanks.

> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> 
>>   lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
>>   1 file changed, 81 insertions(+)
>>
>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>> index 9a368724d5..afd0f11033 100644
>> --- a/lib/eal/include/rte_bitops.h
>> +++ b/lib/eal/include/rte_bitops.h
>> @@ -107,6 +107,87 @@ extern "C" {
>>   #define RTE_FIELD_GET64(mask, reg) \
>>   		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
>>   
>> +/**
>> + * Test bit in word.
>> + *
>> + * Generic selection macro to test the value of a bit in a 32-bit or
>> + * 64-bit word. The type of operation depends on the type of the @c
>> + * addr parameter.
>> + *
>> + * This macro does not give any guarantees in regards to memory
>> + * ordering or atomicity.
>> + *
>> + * @param addr
>> + *   A pointer to the word to modify.
>> + * @param nr
>> + *   The index of the bit.
>> + */
>> +#define rte_bit_test(addr, nr)				\
>> +	_Generic((addr),				\
>> +		 uint32_t *: rte_bit_test32,		\
>> +		 uint64_t *: rte_bit_test64)(addr, nr)
>> +
>> +/**
>> + * Set bit in word.
>> + *
>> + * Generic selection macro to set a bit in a 32-bit or 64-bit
>> + * word. The type of operation depends on the type of the @c addr
>> + * parameter.
>> + *
>> + * This macro does not give any guarantees in regards to memory
>> + * ordering or atomicity.
>> + *
>> + * @param addr
>> + *   A pointer to the word to modify.
>> + * @param nr
>> + *   The index of the bit.
>> + */
>> +#define rte_bit_set(addr, nr)				\
>> +	_Generic((addr),				\
>> +		 uint32_t *: rte_bit_set32,		\
>> +		 uint64_t *: rte_bit_set64)(addr, nr)
>> +
>> +/**
>> + * Clear bit in word.
>> + *
>> + * Generic selection macro to clear a bit in a 32-bit or 64-bit
>> + * word. The type of operation depends on the type of the @c addr
>> + * parameter.
>> + *
>> + * This macro does not give any guarantees in regards to memory
>> + * ordering or atomicity.
>> + *
>> + * @param addr
>> + *   A pointer to the word to modify.
>> + * @param nr
>> + *   The index of the bit.
>> + */
>> +#define rte_bit_clear(addr, nr)			\
>> +	_Generic((addr),				\
>> +		 uint32_t *: rte_bit_clear32,		\
>> +		 uint64_t *: rte_bit_clear64)(addr, nr)
>> +
>> +/**
>> + * Assign a value to a bit in word.
>> + *
>> + * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
>> + * word. The type of operation depends on the type of the @c addr parameter.
>> + *
>> + * This macro does not give any guarantees in regards to memory
>> + * ordering or atomicity.
>> + *
>> + * @param addr
>> + *   A pointer to the word to modify.
>> + * @param nr
>> + *   The index of the bit.
>> + * @param value
>> + *   The new value of the bit - true for '1', or false for '0'.
>> + */
>> +#define rte_bit_assign(addr, nr, value)			\
>> +	_Generic((addr),				\
>> +		 uint32_t *: rte_bit_assign32,			\
>> +		 uint64_t *: rte_bit_assign64)(addr, nr, value)
>> +
>>   /**
>>    * Test if a particular bit in a 32-bit word is set.
>>    *
>> -- 
>> 2.34.1


* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-05 18:08     ` Mattias Rönnblom
@ 2024-03-05 18:22       ` Tyler Retzlaff
  2024-03-05 20:02         ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Tyler Retzlaff @ 2024-03-05 18:22 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: Mattias Rönnblom, dev, Heng Wang

On Tue, Mar 05, 2024 at 07:08:36PM +0100, Mattias Rönnblom wrote:
> On 2024-03-04 17:42, Tyler Retzlaff wrote:
> >On Sat, Mar 02, 2024 at 02:53:23PM +0100, Mattias Rönnblom wrote:
> >>Add bit-level test/set/clear/assign macros operating on both 32-bit
> >>and 64-bit words by means of C11 generic selection.
> >>
> >>Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>---
> >
> >_Generic is nice here. should we discourage direct use of the inline
> >functions in preference of using the macro always? either way lgtm.
> >
> 
> That was something I considered, but decided against it for RFC v1.
> I wasn't even sure people would like _Generic.
> 
> The big upside of having only the _Generic macros would be a much
> smaller API, but maybe a tiny bit less (type-)safe to use.

i'm curious what misuse pattern you anticipate or have seen that may be
less type-safe? just so i can look out for them.

i (perhaps naively) have liked generic functions for their selection of
the "correct" type and, in the case of _Generic, for the compiler error
when no leg/case exists (as opposed to e.g. silent truncation).

> 
> Also, _Generic is new for DPDK, so who knows what issues it might
> cause with old compilers.

i was thinking about this overnight, it's supposed to be standard C11
and my use on various compilers showed no problem but I can't recall if
i did any evaluation when consuming as a part of a C++ translation unit
so there could be problems.

> 
> Thanks.
> 
> >Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >
> >>  lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
> >>  1 file changed, 81 insertions(+)
> >>
> >>diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >>index 9a368724d5..afd0f11033 100644
> >>--- a/lib/eal/include/rte_bitops.h
> >>+++ b/lib/eal/include/rte_bitops.h
> >>@@ -107,6 +107,87 @@ extern "C" {
> >>  #define RTE_FIELD_GET64(mask, reg) \
> >>  		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
> >>+/**
> >>+ * Test bit in word.
> >>+ *
> >>+ * Generic selection macro to test the value of a bit in a 32-bit or
> >>+ * 64-bit word. The type of operation depends on the type of the @c
> >>+ * addr parameter.
> >>+ *
> >>+ * This macro does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the word to modify.
> >>+ * @param nr
> >>+ *   The index of the bit.
> >>+ */
> >>+#define rte_bit_test(addr, nr)				\
> >>+	_Generic((addr),				\
> >>+		 uint32_t *: rte_bit_test32,		\
> >>+		 uint64_t *: rte_bit_test64)(addr, nr)
> >>+
> >>+/**
> >>+ * Set bit in word.
> >>+ *
> >>+ * Generic selection macro to set a bit in a 32-bit or 64-bit
> >>+ * word. The type of operation depends on the type of the @c addr
> >>+ * parameter.
> >>+ *
> >>+ * This macro does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the word to modify.
> >>+ * @param nr
> >>+ *   The index of the bit.
> >>+ */
> >>+#define rte_bit_set(addr, nr)				\
> >>+	_Generic((addr),				\
> >>+		 uint32_t *: rte_bit_set32,		\
> >>+		 uint64_t *: rte_bit_set64)(addr, nr)
> >>+
> >>+/**
> >>+ * Clear bit in word.
> >>+ *
> >>+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
> >>+ * word. The type of operation depends on the type of the @c addr
> >>+ * parameter.
> >>+ *
> >>+ * This macro does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the word to modify.
> >>+ * @param nr
> >>+ *   The index of the bit.
> >>+ */
> >>+#define rte_bit_clear(addr, nr)			\
> >>+	_Generic((addr),				\
> >>+		 uint32_t *: rte_bit_clear32,		\
> >>+		 uint64_t *: rte_bit_clear64)(addr, nr)
> >>+
> >>+/**
> >>+ * Assign a value to a bit in word.
> >>+ *
> >>+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
> >>+ * word. The type of operation depends on the type of the @c addr parameter.
> >>+ *
> >>+ * This macro does not give any guarantees in regards to memory
> >>+ * ordering or atomicity.
> >>+ *
> >>+ * @param addr
> >>+ *   A pointer to the word to modify.
> >>+ * @param nr
> >>+ *   The index of the bit.
> >>+ * @param value
> >>+ *   The new value of the bit - true for '1', or false for '0'.
> >>+ */
> >>+#define rte_bit_assign(addr, nr, value)			\
> >>+	_Generic((addr),				\
> >>+		 uint32_t *: rte_bit_assign32,			\
> >>+		 uint64_t *: rte_bit_assign64)(addr, nr, value)
> >>+
> >>  /**
> >>   * Test if a particular bit in a 32-bit word is set.
> >>   *
> >>-- 
> >>2.34.1


* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-05 18:22       ` Tyler Retzlaff
@ 2024-03-05 20:02         ` Mattias Rönnblom
  2024-03-05 20:53           ` Tyler Retzlaff
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-03-05 20:02 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: Mattias Rönnblom, dev, Heng Wang

On 2024-03-05 19:22, Tyler Retzlaff wrote:
> On Tue, Mar 05, 2024 at 07:08:36PM +0100, Mattias Rönnblom wrote:
>> On 2024-03-04 17:42, Tyler Retzlaff wrote:
>>> On Sat, Mar 02, 2024 at 02:53:23PM +0100, Mattias Rönnblom wrote:
>>>> Add bit-level test/set/clear/assign macros operating on both 32-bit
>>>> and 64-bit words by means of C11 generic selection.
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> ---
>>>
>>> _Generic is nice here. should we discourage direct use of the inline
>>> functions in preference of using the macro always? either way lgtm.
>>>
>>
>> That was something I considered, but decided against it for RFC v1.
>> I wasn't even sure people would like _Generic.
>>
>> The big upside of having only the _Generic macros would be a much
>> smaller API, but maybe a tiny bit less (type-)safe to use.
> 
> i'm curious what misuse pattern you anticipate or have seen that may be
> less type-safe? just so i can look out for them.
> 

That was just a gut feeling, not to be taken too seriously.

uint32_t *p = some_void_pointer;
/* ... */
rte_bit_set32(p, 17);

A code section like this is redundant in that the type (or at least the
type size) is encoded both in the function name and in the pointer type.
Using rte_bit_set() eliminates this, which is good (DRY) and bad,
because now the type isn't "double-checked".

As you can see, it's a pretty weak argument.
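
To make the "double-check" concrete, here is a small sketch. The names
bit_set32(), bit_set64() and bit_set() are hypothetical stand-ins for the
functions and macro in the patch. It shows the case where the pointer type
and the size in the function name disagree.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for rte_bit_set32()/rte_bit_set64(). */
static inline void
bit_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

static inline void
bit_set64(uint64_t *addr, unsigned int nr)
{
	*addr |= UINT64_C(1) << nr;
}

/* Hypothetical stand-in for the rte_bit_set() generic macro. */
#define bit_set(addr, nr)			\
	_Generic((addr),			\
		 uint32_t *: bit_set32,		\
		 uint64_t *: bit_set64)(addr, nr)

static inline void
bit_set_width_demo(void)
{
	uint64_t flags = 0;
	uint64_t *p = &flags;

	/*
	 * bit_set32(p, 17);
	 *
	 * The explicit-size call above would be diagnosed by the
	 * compiler (incompatible pointer type), since the "32" in the
	 * name disagrees with the type of p.
	 */

	/* The generic macro simply follows the pointer type. */
	bit_set(p, 17);

	assert(flags == UINT64_C(1) << 17);
}
```

Neither variant helps if the pointer was initialized from a void pointer
of the wrong underlying type, as in the snippet above.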

> i (perhaps naively) have liked generic functions for their selection of
> the "correct" type and for _Generic if no leg/case exists compiler
> error (as opposed to e.g. silent truncation).
> 
>>
>> Also, _Generic is new for DPDK, so who knows what issues it might
>> cause with old compilers.
> 
> i was thinking about this overnight, it's supposed to be standard C11
> and my use on various compilers showed no problem but I can't recall if
> i did any evaluation when consuming as a part of a C++ translation unit
> so there could be problems.
> 

It would be unfortunate if DPDK was prohibited from using _Generic.

>>
>> Thanks.
>>
>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>>>
>>>>   lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
>>>>   1 file changed, 81 insertions(+)
>>>>
>>>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>>>> index 9a368724d5..afd0f11033 100644
>>>> --- a/lib/eal/include/rte_bitops.h
>>>> +++ b/lib/eal/include/rte_bitops.h
>>>> @@ -107,6 +107,87 @@ extern "C" {
>>>>   #define RTE_FIELD_GET64(mask, reg) \
>>>>   		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
>>>> +/**
>>>> + * Test bit in word.
>>>> + *
>>>> + * Generic selection macro to test the value of a bit in a 32-bit or
>>>> + * 64-bit word. The type of operation depends on the type of the @c
>>>> + * addr parameter.
>>>> + *
>>>> + * This macro does not give any guarantees in regards to memory
>>>> + * ordering or atomicity.
>>>> + *
>>>> + * @param addr
>>>> + *   A pointer to the word to modify.
>>>> + * @param nr
>>>> + *   The index of the bit.
>>>> + */
>>>> +#define rte_bit_test(addr, nr)				\
>>>> +	_Generic((addr),				\
>>>> +		 uint32_t *: rte_bit_test32,		\
>>>> +		 uint64_t *: rte_bit_test64)(addr, nr)
>>>> +
>>>> +/**
>>>> + * Set bit in word.
>>>> + *
>>>> + * Generic selection macro to set a bit in a 32-bit or 64-bit
>>>> + * word. The type of operation depends on the type of the @c addr
>>>> + * parameter.
>>>> + *
>>>> + * This macro does not give any guarantees in regards to memory
>>>> + * ordering or atomicity.
>>>> + *
>>>> + * @param addr
>>>> + *   A pointer to the word to modify.
>>>> + * @param nr
>>>> + *   The index of the bit.
>>>> + */
>>>> +#define rte_bit_set(addr, nr)				\
>>>> +	_Generic((addr),				\
>>>> +		 uint32_t *: rte_bit_set32,		\
>>>> +		 uint64_t *: rte_bit_set64)(addr, nr)
>>>> +
>>>> +/**
>>>> + * Clear bit in word.
>>>> + *
>>>> + * Generic selection macro to clear a bit in a 32-bit or 64-bit
>>>> + * word. The type of operation depends on the type of the @c addr
>>>> + * parameter.
>>>> + *
>>>> + * This macro does not give any guarantees in regards to memory
>>>> + * ordering or atomicity.
>>>> + *
>>>> + * @param addr
>>>> + *   A pointer to the word to modify.
>>>> + * @param nr
>>>> + *   The index of the bit.
>>>> + */
>>>> +#define rte_bit_clear(addr, nr)			\
>>>> +	_Generic((addr),				\
>>>> +		 uint32_t *: rte_bit_clear32,		\
>>>> +		 uint64_t *: rte_bit_clear64)(addr, nr)
>>>> +
>>>> +/**
>>>> + * Assign a value to a bit in word.
>>>> + *
>>>> + * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
>>>> + * word. The type of operation depends on the type of the @c addr parameter.
>>>> + *
>>>> + * This macro does not give any guarantees in regards to memory
>>>> + * ordering or atomicity.
>>>> + *
>>>> + * @param addr
>>>> + *   A pointer to the word to modify.
>>>> + * @param nr
>>>> + *   The index of the bit.
>>>> + * @param value
>>>> + *   The new value of the bit - true for '1', or false for '0'.
>>>> + */
>>>> +#define rte_bit_assign(addr, nr, value)			\
>>>> +	_Generic((addr),				\
>>>> +		 uint32_t *: rte_bit_assign32,			\
>>>> +		 uint64_t *: rte_bit_assign64)(addr, nr, value)
>>>> +
>>>>   /**
>>>>    * Test if a particular bit in a 32-bit word is set.
>>>>    *
>>>> -- 
>>>> 2.34.1


* Re: [RFC 2/7] eal: add generic bit manipulation macros
  2024-03-05 20:02         ` Mattias Rönnblom
@ 2024-03-05 20:53           ` Tyler Retzlaff
  0 siblings, 0 replies; 160+ messages in thread
From: Tyler Retzlaff @ 2024-03-05 20:53 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: Mattias Rönnblom, dev, Heng Wang

On Tue, Mar 05, 2024 at 09:02:34PM +0100, Mattias Rönnblom wrote:
> On 2024-03-05 19:22, Tyler Retzlaff wrote:
> >On Tue, Mar 05, 2024 at 07:08:36PM +0100, Mattias Rönnblom wrote:
> >>On 2024-03-04 17:42, Tyler Retzlaff wrote:
> >>>On Sat, Mar 02, 2024 at 02:53:23PM +0100, Mattias Rönnblom wrote:
> >>>>Add bit-level test/set/clear/assign macros operating on both 32-bit
> >>>>and 64-bit words by means of C11 generic selection.
> >>>>
> >>>>Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>>---
> >>>
> >>>_Generic is nice here. should we discourage direct use of the inline
> >>>functions in preference of using the macro always? either way lgtm.
> >>>
> >>
> >>That was something I considered, but decided against it for RFC v1.
> >>I wasn't even sure people would like _Generic.
> >>
> >>The big upside of having only the _Generic macros would be a much
> >>smaller API, but maybe a tiny bit less (type-)safe to use.
> >
> >i'm curious what misuse pattern you anticipate or have seen that may be
> >less type-safe? just so i can look out for them.
> >
> 
> That was just a gut feeling, not to be taken too seriously.
> 
> uint32_t *p = some_void_pointer;
> /../
> rte_bit_set32(p, 17);
> 
> A code section like this is redundant in the way the type (or at
> least type size) is coded both into the function name, and the
> pointer type. The use of rte_bit_set() will eliminate this, which is
> good (DRY), and bad, because now the type isn't "double-checked".
> 
> As you can see, it's a pretty weak argument.
> 
> >i (perhaps naively) have liked generic functions for their selection of
> >the "correct" type and for _Generic if no leg/case exists compiler
> >error (as opposed to e.g. silent truncation).
> >
> >>
> >>Also, _Generic is new for DPDK, so who knows what issues it might
> >>cause with old compilers.
> >
> >i was thinking about this overnight, it's supposed to be standard C11
> >and my use on various compilers showed no problem but I can't recall if
> >i did any evaluation when consuming as a part of a C++ translation unit
> >so there could be problems.
> >
> 
> It would be unfortunate if DPDK was prohibited from using _Generic.

I agree, I don't think it should be prohibited. If C++ poses a problem
we can work to find solutions.

> 
> >>
> >>Thanks.
> >>
> >>>Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >>>
> >>>>  lib/eal/include/rte_bitops.h | 81 ++++++++++++++++++++++++++++++++++++
> >>>>  1 file changed, 81 insertions(+)
> >>>>
> >>>>diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >>>>index 9a368724d5..afd0f11033 100644
> >>>>--- a/lib/eal/include/rte_bitops.h
> >>>>+++ b/lib/eal/include/rte_bitops.h
> >>>>@@ -107,6 +107,87 @@ extern "C" {
> >>>>  #define RTE_FIELD_GET64(mask, reg) \
> >>>>  		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
> >>>>+/**
> >>>>+ * Test bit in word.
> >>>>+ *
> >>>>+ * Generic selection macro to test the value of a bit in a 32-bit or
> >>>>+ * 64-bit word. The type of operation depends on the type of the @c
> >>>>+ * addr parameter.
> >>>>+ *
> >>>>+ * This macro does not give any guarantees in regards to memory
> >>>>+ * ordering or atomicity.
> >>>>+ *
> >>>>+ * @param addr
> >>>>+ *   A pointer to the word to modify.
> >>>>+ * @param nr
> >>>>+ *   The index of the bit.
> >>>>+ */
> >>>>+#define rte_bit_test(addr, nr)				\
> >>>>+	_Generic((addr),				\
> >>>>+		 uint32_t *: rte_bit_test32,		\
> >>>>+		 uint64_t *: rte_bit_test64)(addr, nr)
> >>>>+
> >>>>+/**
> >>>>+ * Set bit in word.
> >>>>+ *
> >>>>+ * Generic selection macro to set a bit in a 32-bit or 64-bit
> >>>>+ * word. The type of operation depends on the type of the @c addr
> >>>>+ * parameter.
> >>>>+ *
> >>>>+ * This macro does not give any guarantees in regards to memory
> >>>>+ * ordering or atomicity.
> >>>>+ *
> >>>>+ * @param addr
> >>>>+ *   A pointer to the word to modify.
> >>>>+ * @param nr
> >>>>+ *   The index of the bit.
> >>>>+ */
> >>>>+#define rte_bit_set(addr, nr)				\
> >>>>+	_Generic((addr),				\
> >>>>+		 uint32_t *: rte_bit_set32,		\
> >>>>+		 uint64_t *: rte_bit_set64)(addr, nr)
> >>>>+
> >>>>+/**
> >>>>+ * Clear bit in word.
> >>>>+ *
> >>>>+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
> >>>>+ * word. The type of operation depends on the type of the @c addr
> >>>>+ * parameter.
> >>>>+ *
> >>>>+ * This macro does not give any guarantees in regards to memory
> >>>>+ * ordering or atomicity.
> >>>>+ *
> >>>>+ * @param addr
> >>>>+ *   A pointer to the word to modify.
> >>>>+ * @param nr
> >>>>+ *   The index of the bit.
> >>>>+ */
> >>>>+#define rte_bit_clear(addr, nr)			\
> >>>>+	_Generic((addr),				\
> >>>>+		 uint32_t *: rte_bit_clear32,		\
> >>>>+		 uint64_t *: rte_bit_clear64)(addr, nr)
> >>>>+
> >>>>+/**
> >>>>+ * Assign a value to a bit in word.
> >>>>+ *
> >>>>+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
> >>>>+ * word. The type of operation depends on the type of the @c addr parameter.
> >>>>+ *
> >>>>+ * This macro does not give any guarantees in regards to memory
> >>>>+ * ordering or atomicity.
> >>>>+ *
> >>>>+ * @param addr
> >>>>+ *   A pointer to the word to modify.
> >>>>+ * @param nr
> >>>>+ *   The index of the bit.
> >>>>+ * @param value
> >>>>+ *   The new value of the bit - true for '1', or false for '0'.
> >>>>+ */
> >>>>+#define rte_bit_assign(addr, nr, value)			\
> >>>>+	_Generic((addr),				\
> >>>>+		 uint32_t *: rte_bit_assign32,			\
> >>>>+		 uint64_t *: rte_bit_assign64)(addr, nr, value)
> >>>>+
> >>>>  /**
> >>>>   * Test if a particular bit in a 32-bit word is set.
> >>>>   *
> >>>>-- 
> >>>>2.34.1


* [RFC v2 0/6] Improve EAL bit operations API
  2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
  2024-03-02 17:05   ` Stephen Hemminger
@ 2024-04-25  8:58   ` Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                       ` (7 more replies)
  1 sibling, 8 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign]() which provides no memory ordering or
atomicity guarantees and no read-once or write-once semantics (e.g.,
no use of volatile), but does provide the best performance. The
performance degradation resulting from the use of volatile (e.g.,
forcing loads and stores to actually occur and in the number
specified) and atomic (e.g., LOCK-prefixed instructions on x86) may be
significant.
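
As a rough illustration of what "no volatile, no atomics" buys, here is
a minimal sketch of the plain family. The ex_*() names are hypothetical
stand-ins, not the functions from this patch set, and assert() is used
in place of RTE_ASSERT().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the plain (non-volatile, non-atomic) family. */
static inline bool
ex_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr & (UINT32_C(1) << nr)) != 0;
}

static inline void
ex_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

static inline void
ex_bit_clear32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr &= ~(UINT32_C(1) << nr);
}

/*
 * Three separate read-modify-write operations in the source. Since
 * nothing here is volatile or atomic, the compiler is free to merge
 * them into a single OR with the constant 0x7.
 */
static inline void
ex_set_low_three(uint32_t *word)
{
	ex_bit_set32(word, 0);
	ex_bit_set32(word, 1);
	ex_bit_set32(word, 2);
}
```

Whether a given compiler actually merges the three operations depends
on optimization level and context; the point is only that it is allowed
to.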

rte_bit_once_*() which guarantees that program-level loads and stores
actually occur (i.e., prevents certain optimizations). The primary
use of these functions is in the context of memory-mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.
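
A minimal sketch of the "once" idea, assuming it is implemented with
volatile-qualified accesses. The ex_*() names and the polling helper
are hypothetical and only serve as illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical "once" variants; volatile forces each access to occur. */
static inline bool
ex_bit_once_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr & (UINT32_C(1) << nr)) != 0;
}

static inline void
ex_bit_once_set32(volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr; /* one read and one write of *addr */
}

/*
 * Poll a (hypothetical) status word until bit nr is set, giving up
 * after max_polls reads. Every iteration performs a fresh load; the
 * compiler may not hoist the read out of the loop.
 */
static inline bool
ex_poll_bit(const volatile uint32_t *reg, unsigned int nr,
	    unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++)
		if (ex_bit_once_test32(reg, nr))
			return true;

	return false;
}
```

Note that volatile by itself gives neither atomicity nor ordering
relative to other memory accesses, which matches the limits stated for
this family.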

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat and also prevent the user from both, in
some situations, doing atomic operations against a bit set, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.
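
To illustrate mixing plain and atomic accesses on the same non-_Atomic
word, here is a sketch built directly on the GCC/Clang __atomic
builtins. This is an assumption made for the example only; the ex_*()
names are hypothetical and the code is not portable to MSVC.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper; relies on the GCC/Clang __atomic builtins. */
static inline bool
ex_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
			     int memory_order)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;
	uint32_t prev = __atomic_fetch_or(addr, mask, memory_order);

	return (prev & mask) != 0; /* previous value of the bit */
}

static inline bool
ex_bit_atomic_test32(uint32_t *addr, unsigned int nr, int memory_order)
{
	assert(nr < 32);
	uint32_t word = __atomic_load_n(addr, memory_order);

	return (word & (UINT32_C(1) << nr)) != 0;
}

/*
 * Initialization typically happens before the word is shared between
 * threads, so a plain (non-atomic) store is sufficient here. With an
 * _Atomic-qualified word, this would need a cast or a second API.
 */
static inline void
ex_flags_init(uint32_t *flags)
{
	*flags = 0;
}
```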

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.
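
A small sketch of an assign operation taking a bool, under the
assumption that it is a thin wrapper choosing between set and clear.
ex_bit_assign64() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical assign helper: the bit value is a bool, not a word. */
static inline void
ex_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
{
	assert(nr < 64);
	uint64_t mask = UINT64_C(1) << nr;

	if (value)
		*addr |= mask;
	else
		*addr &= ~mask;
}
```

One side effect of using bool for the parameter is that any nonzero
argument (e.g., 0x100) is converted to true at the call site, rather
than being truncated the way it would be with a narrow integer type.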

All new functions have properly documented semantics.

All new functions (or more correctly, generic selection macros)
operate on both 32 and 64-bit words, with type checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).
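
A sketch of the dispatch mechanism, using hypothetical ex_*() names.
_Generic selects on the static type of the pointer, and a pointer type
without a matching association is a compile-time error. One caveat
worth keeping in mind: a pointer to unsigned long only matches if the
platform defines uint32_t or uint64_t as unsigned long, since _Generic
matches on type compatibility and not on size.

```c
#include <assert.h>
#include <stdint.h>

static inline void
ex_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

static inline void
ex_bit_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= UINT64_C(1) << nr;
}

/*
 * Hypothetical generic front-end. The selection yields a function
 * designator, which is then called with the original arguments.
 */
#define ex_bit_set(addr, nr)				\
	_Generic((addr),				\
		 uint32_t *: ex_bit_set32,		\
		 uint64_t *: ex_bit_set64)(addr, nr)
```

Passing, say, a uint16_t pointer to ex_bit_set() fails to compile
instead of silently operating on the wrong word size.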

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.
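
The single-evaluation property follows from the controlling expression
of _Generic not being evaluated; only its type is inspected. A sketch
with hypothetical ex_*() names, where the arguments carry side effects:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline bool
ex_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr & (UINT32_C(1) << nr)) != 0;
}

static inline bool
ex_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return (*addr & (UINT64_C(1) << nr)) != 0;
}

/*
 * "addr" appears twice in the expansion, but the first occurrence is
 * the unevaluated _Generic controlling expression, so side effects in
 * the arguments happen exactly once.
 */
#define ex_bit_test(addr, nr)				\
	_Generic((addr),				\
		 uint32_t *: ex_bit_test32,		\
		 uint64_t *: ex_bit_test64)(addr, nr)
```

With this, a call such as ex_bit_test(p++, i++) advances p and i by
one each, just as a real function call would.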

Things that are left out of this patch set, that may be included
in future versions:

 * Have all functions returning a bit number have the same return type
   (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       | 319 +++++++++++++++++-
 lib/eal/include/rte_bitops.h | 624 ++++++++++++++++++++++++++++++++++-
 2 files changed, 925 insertions(+), 18 deletions(-)

-- 
2.34.1



* [RFC v2 1/6] eal: extend bit manipulation functionality
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                       ` (6 subsequent siblings)
  7 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Add functionality to test, set, clear, and assign a value to
individual bits in 32-bit or 64-bit words.

These functions have no implications on memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 158 ++++++++++++++++++++++++++++++++++-
 1 file changed, 156 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..75a29fdfe0 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,157 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_test32,			\
+		 uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+#define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
+	static inline bool						\
+	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+__RTE_GEN_BIT_TEST(__rte_bit_test32, 32,)
+__RTE_GEN_BIT_SET(__rte_bit_set32, 32,)
+__RTE_GEN_BIT_CLEAR(__rte_bit_clear32, 32,)
+
+__RTE_GEN_BIT_TEST(__rte_bit_test64, 64,)
+__RTE_GEN_BIT_SET(__rte_bit_set64, 64,)
+__RTE_GEN_BIT_CLEAR(__rte_bit_clear64, 64,)
+
+__rte_experimental
+static inline void
+__rte_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_set32(addr, nr);
+	else
+		__rte_bit_clear32(addr, nr);
+}
+
+__rte_experimental
+static inline void
+__rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_set64(addr, nr);
+	else
+		__rte_bit_clear64(addr, nr);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1



* [RFC v2 2/6] eal: add unit tests for bit operations
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                       ` (5 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Extend bitops tests to cover the rte_bit_[set|clear|assign|test]()
family of functions.

The tests are converted to use the test suite runner framework.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c | 76 +++++++++++++++++++++++++++++++++---------
 1 file changed, 61 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..f788b561a0 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,59 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    test_fun, size)				\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear, \
+		    rte_bit_assign, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear, \
+		    rte_bit_assign, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +163,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access_32),
+		TEST_CASE(test_bit_access_64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1



* [RFC v2 3/6] eal: add exactly-once bit access functions
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 4/6] eal: add unit tests for " Mattias Rönnblom
                       ` (4 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Add bit test/set/clear/assign functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 170 +++++++++++++++++++++++++++++++++++
 1 file changed, 170 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 75a29fdfe0..a2746e657f 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -201,6 +201,147 @@ extern "C" {
 		 uint32_t *: __rte_bit_assign32,			\
 		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set() and
+ * rte_bit_test() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test(addr, 17) was used instead, the resulting
+ * object code could be (and in many cases would be) replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to
+ * memory ordering or atomicity.
+ *
+ * The regular bit set operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_test32,		\
+		 uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit assign operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -239,6 +380,14 @@ __RTE_GEN_BIT_TEST(__rte_bit_test64, 64,)
 __RTE_GEN_BIT_SET(__rte_bit_set64, 64,)
 __RTE_GEN_BIT_CLEAR(__rte_bit_clear64, 64,)
 
+__RTE_GEN_BIT_TEST(__rte_bit_once_test32, 32, volatile)
+__RTE_GEN_BIT_SET(__rte_bit_once_set32, 32, volatile)
+__RTE_GEN_BIT_CLEAR(__rte_bit_once_clear32, 32, volatile)
+
+__RTE_GEN_BIT_TEST(__rte_bit_once_test64, 64, volatile)
+__RTE_GEN_BIT_SET(__rte_bit_once_set64, 64, volatile)
+__RTE_GEN_BIT_CLEAR(__rte_bit_once_clear64, 64, volatile)
+
 __rte_experimental
 static inline void
 __rte_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
@@ -259,6 +408,27 @@ __rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
 		__rte_bit_clear64(addr, nr);
 }
 
+
+__rte_experimental
+static inline void
+__rte_bit_once_assign32(volatile uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_once_set32(addr, nr);
+	else
+		__rte_bit_once_clear32(addr, nr);
+}
+
+__rte_experimental
+static inline void
+__rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_once_set64(addr, nr);
+	else
+		__rte_bit_once_clear64(addr, nr);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1



* [RFC v2 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
                       ` (2 preceding siblings ...)
  2024-04-25  8:58     ` [RFC v2 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-25  8:58     ` [RFC v2 5/6] eal: add atomic bit operations Mattias Rönnblom
                       ` (3 subsequent siblings)
  7 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_once_[set|clear|assign|test]() family of functions.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index f788b561a0..12c1027e36 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -46,12 +46,20 @@
 		return TEST_SUCCESS;					\
 	}
 
-GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear, \
+GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear,	\
 		    rte_bit_assign, rte_bit_test, 32)
 
-GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear, \
+GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear,	\
 		    rte_bit_assign, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access_32, rte_bit_once_set,		\
+		    rte_bit_once_clear, rte_bit_once_assign,		\
+		    rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access_64, rte_bit_once_set,		\
+		    rte_bit_once_clear, rte_bit_once_assign,		\
+		    rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -168,6 +176,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access_32),
 		TEST_CASE(test_bit_access_64),
+		TEST_CASE(test_bit_once_access_32),
+		TEST_CASE(test_bit_once_access_64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1



* [RFC v2 5/6] eal: add atomic bit operations
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
                       ` (3 preceding siblings ...)
  2024-04-25  8:58     ` [RFC v2 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-25 10:25       ` Morten Brørup
  2024-04-25  8:58     ` [RFC v2 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
                       ` (2 subsequent siblings)
  7 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Add atomic bit test/set/clear/assign and test-and-set/clear functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.

RFC v2:
 o Add rte_bit_atomic_test_and_assign() (for consistency).
 o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 o Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 297 +++++++++++++++++++++++++++++++++++
 1 file changed, 297 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index a2746e657f..8c38a1ac03 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -342,6 +343,177 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_assign32,			\
 		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -429,6 +601,131 @@ __rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
 		__rte_bit_once_clear64(addr, nr);
 }
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				memory_order,				\
+				rte_memory_order_relaxed));		\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
-- 
2.34.1



* [RFC v2 6/6] eal: add unit tests for atomic bit access functions
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
                       ` (4 preceding siblings ...)
  2024-04-25  8:58     ` [RFC v2 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-04-25  8:58     ` Mattias Rönnblom
  2024-04-25 18:05     ` [RFC v2 0/6] Improve EAL bit operations API Tyler Retzlaff
  2024-04-26 21:35     ` Patrick Robb
  7 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-25  8:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_atomic_[set|clear|assign|test|test_and_[set|clear|assign]]()
family of functions.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c       | 233 ++++++++++++++++++++++++++++++++++-
 lib/eal/include/rte_bitops.h |   1 -
 2 files changed, 232 insertions(+), 2 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 12c1027e36..a0967260aa 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -60,6 +63,228 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access_64, rte_bit_once_set,		\
 		    rte_bit_once_clear, rte_bit_once_assign,		\
 		    rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access_32, bit_atomic_set,	\
+		    bit_atomic_clear, bit_atomic_assign,	\
+		    bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access_64, bit_atomic_set,	\
+		    bit_atomic_clear, bit_atomic_assign,	\
+		    bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore_ ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign_ ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore_ ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign_ ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore_ ## size main = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore_ ## size worker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		main.bit = rte_rand_max(size);				\
+		do {							\
+			worker.bit = rte_rand_max(size);		\
+		} while (worker.bit == main.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign_ ## size, \
+					       &worker,	worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign_ ## size(&main);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!main.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!worker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore_ ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify_ ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore_ ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify_ ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore_ ## size main = {	\
+			.word = &word,				       \
+			.bit = bit \
+		};							\
+		struct parallel_test_and_set_lcore_ ## size worker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify_ ## size, \
+					       &worker,	worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify_ ## size(&main);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = main.flips + worker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -178,6 +403,12 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access_64),
 		TEST_CASE(test_bit_once_access_32),
 		TEST_CASE(test_bit_once_access_64),
+		TEST_CASE(test_bit_atomic_access_32),
+		TEST_CASE(test_bit_atomic_access_64),
+		TEST_CASE(test_bit_atomic_parallel_assign_32),
+		TEST_CASE(test_bit_atomic_parallel_assign_64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify_32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify_64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 8c38a1ac03..bc6d79086b 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -485,7 +485,6 @@ extern "C" {
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
 		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
 								memory_order)
-
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
-- 
2.34.1



* RE: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-25  8:58     ` [RFC v2 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-04-25 10:25       ` Morten Brørup
  2024-04-25 14:36         ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Morten Brørup @ 2024-04-25 10:25 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff

> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> +	_Generic((addr),						\
> +		 uint32_t *: __rte_bit_atomic_test32,			\
> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)

I wonder if these should have RTE_ATOMIC qualifier:

+		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,			\
+		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)


> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
> +	static inline bool						\
> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\

I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:

+	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\

instead of casting into a_addr.

> +				      unsigned int nr, int memory_order) \
> +	{								\
> +		RTE_ASSERT(nr < size);					\
> +									\
> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
> +	}


Similar considerations regarding volatile qualifier for the "once" operations.



* Re: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-25 10:25       ` Morten Brørup
@ 2024-04-25 14:36         ` Mattias Rönnblom
  2024-04-25 16:18           ` Morten Brørup
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-25 14:36 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

On 2024-04-25 12:25, Morten Brørup wrote:
>> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
>> +	_Generic((addr),						\
>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> 
> I wonder if these should have RTE_ATOMIC qualifier:
> 
> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,			\
> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> 
> 
>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
>> +	static inline bool						\
>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
> 
> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> 
> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\
> 
> instead of casting into a_addr.
> 

Check the cover letter for the rationale for the cast.

Where I'm at now is that I think C11 _Atomic is a rather poor design. The
assumption that an object which allows for atomic access always should
require all operations upon it to be atomic, regardless of where it is
in its lifetime, and which thread is accessing it, does not hold, in the
general case.

The only reason for _Atomic being as it is, as far as I can see, is to
accommodate ISAs which do not have the appropriate atomic machine
instructions, and thus require a lock or some other data associated with
the actual user-data-carrying bits. Neither GCC nor DPDK supports any
such ISAs, to my knowledge. I suspect neither ever will. So the cast
will continue to work.
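
A minimal sketch of the cast under discussion, written against plain C11
<stdatomic.h> rather than <rte_stdatomic.h>. The sketch_ name is
hypothetical and is not part of the patch. It assumes, as argued above,
that _Atomic uint32_t and uint32_t share size and representation on the
target, which the static assertion checks only partially.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Assumption: no hidden lock or padding is attached to the atomic type. */
_Static_assert(sizeof(_Atomic uint32_t) == sizeof(uint32_t),
	       "sketch assumes identical layout");

/* Hypothetical stand-in for __rte_bit_atomic_set32(); not the DPDK code. */
static inline void
sketch_bit_atomic_set32(uint32_t *addr, unsigned int nr, memory_order order)
{
	/* The caller keeps a plain uint32_t; only this access is atomic. */
	_Atomic uint32_t *a_addr = (_Atomic uint32_t *)addr;
	uint32_t mask = (uint32_t)1 << nr;

	atomic_fetch_or_explicit(a_addr, mask, order);
}
```

With this shape, a word can be initialized with an ordinary store while
it is still private to one thread, and be modified atomically once it is
shared, without two differently typed copies of the object.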

>> +				      unsigned int nr, int memory_order) \
>> +	{								\
>> +		RTE_ASSERT(nr < size);					\
>> +									\
>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
>> +	}
> 
> 
> Similar considerations regarding volatile qualifier for the "once" operations.
> 


* RE: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-25 14:36         ` Mattias Rönnblom
@ 2024-04-25 16:18           ` Morten Brørup
  2024-04-26  9:39             ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Morten Brørup @ 2024-04-25 16:18 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev, Tyler Retzlaff
  Cc: Heng Wang, Stephen Hemminger, techboard

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Thursday, 25 April 2024 16.36
> 
> On 2024-04-25 12:25, Morten Brørup wrote:
> >> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> >> +	_Generic((addr),						\
> >> +		 uint32_t *: __rte_bit_atomic_test32,			\
> >> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> >
> > I wonder if these should have RTE_ATOMIC qualifier:
> >
> > +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,			\
> > +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr, memory_order)
> >
> >
> >> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
> >> +	static inline bool						\
> >> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
> >
> > I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> >
> > +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t) *addr,	\
> >
> > instead of casting into a_addr.
> >
> 
> Check the cover letter for the rationale for the cast.

Thanks, that clarifies it. Then...
For the series:
Acked-by: Morten Brørup <mb@smartsharesystems.com>

> 
> Where I'm at now is that I think C11 _Atomic is a rather poor design. The
> assumption that an object which allows for atomic access always should
> require all operations upon it to be atomic, regardless of where it is
> in its lifetime, and which thread is accessing it, does not hold, in the
> general case.

It might be slow, but I suppose the C11 standard prioritizes correctness over performance.

It seems locks are automatically added to _Atomic types larger than what is natively supported by the architecture.
E.g. MSVC adds locks to _Atomic types larger than 8 bytes. [1]

[1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/

> 
> The only reason for _Atomic being as it is, as far as I can see, is to
> accommodate ISAs which do not have the appropriate atomic machine
> instructions, and thus require a lock or some other data associated with
> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> such ISAs, to my knowledge. I suspect neither ever will. So the cast
> will continue to work.

I tend to agree with you on this.

We should officially decide that DPDK treats RTE_ATOMIC types as a union of _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic and non-atomic.

> 
> >> +				      unsigned int nr, int memory_order) \
> >> +	{								\
> >> +		RTE_ASSERT(nr < size);					\
> >> +									\
> >> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
> >> +	}
> >
> >
> > Similar considerations regarding volatile qualifier for the "once" operations.
> >


* Re: [RFC v2 0/6] Improve EAL bit operations API
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
                       ` (5 preceding siblings ...)
  2024-04-25  8:58     ` [RFC v2 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
@ 2024-04-25 18:05     ` Tyler Retzlaff
  2024-04-26 11:17       ` Mattias Rönnblom
  2024-04-26 21:35     ` Patrick Robb
  7 siblings, 1 reply; 160+ messages in thread
From: Tyler Retzlaff @ 2024-04-25 18:05 UTC (permalink / raw)
  To: Mattias Rönnblom; +Cc: dev, hofors, Heng Wang, Stephen Hemminger

On Thu, Apr 25, 2024 at 10:58:47AM +0200, Mattias Rönnblom wrote:
> This patch set represents an attempt to improve and extend the RTE
> bitops API, in particular for functions that operate on individual
> bits.
> 
> All new functionality is exposed to the user as generic selection
> macros, delegating the actual work to private (__-marked) static
> inline functions. Public functions (e.g., rte_bit_set32()) would just
> be bloating the API. Such generic selection macros will here be
> referred to as "functions", although technically they are not.


> 
> The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
> replaced with three families:
> 
> rte_bit_[test|set|clear|assign]() which provides no memory ordering or
> atomicity guarantees and no read-once or write-once semantics (e.g.,
> no use of volatile), but does provide the best performance. The
> performance degradation resulting from the use of volatile (e.g.,
> forcing loads and stores to actually occur and in the number
> specified) and atomic (e.g., LOCK-prefixed instructions on x86) may be
> significant.
> 
> rte_bit_once_*() which guarantees that program-level loads and stores
> actually occur (i.e., prevents certain optimizations). The primary
> use of these functions is in the context of memory-mapped
> I/O. Feedback on the details (semantics, naming) here would be greatly
> appreciated, since the author is not much of a driver developer.
> 
> rte_bit_atomic_*() which provides atomic bit-level operations,
> including the possibility of specifying memory ordering constraints
> (or the lack thereof).
> 
> The atomic functions take non-_Atomic pointers, to be flexible, just
> like the GCC builtins and default <rte_stdatomic.h>. The issue with
> _Atomic APIs is that it may well be the case that the user wants to
> perform both non-atomic and atomic operations on the same word.
> 
> Having _Atomic-marked addresses would complicate supporting atomic
> bit-level operations in the bitset API (proposed in a different RFC
> patchset), and potentially in other APIs depending on RTE bitops for
> atomic bit-level ops. Either one needs two bitset variants, one
> _Atomic bitset and one non-atomic one, or the bitset code needs to
> cast the non-_Atomic pointer to an _Atomic one. Having a separate
> _Atomic bitset would be bloat and would also prevent the user from
> doing atomic operations against a bit set in some situations, while in
> other situations (e.g., at times when MT safety is not a concern)
> operating on the same objects in a non-atomic manner.

understood. i think the only downside is that if you do have an
_Atomic-specified type you'll have to cast the qualification away
to use the function-like macro.

as a suggestion the _Generic legs could include both _Atomic-specified
and non-_Atomic-specified types where an intermediate inline function
could strip the qualification to use your core inline implementations.

_Generic((v), int *: __foo32, RTE_ATOMIC(int) *: __foo32_unqual)(v)

static inline void
__foo32(int *a) { ... }

static inline void
__foo32_unqual(RTE_ATOMIC(int) *a) { __foo32((int *)(uintptr_t)(a)); }

there is some similar prior art in newer ISO C23 with typeof_unqual.

https://en.cppreference.com/w/c/language/typeof
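
The suggestion above could be fleshed out roughly as follows. This is
only a sketch using plain C11 <stdatomic.h>; the sketch_ names are
hypothetical, and relaxed ordering is hard-coded to keep it short. It
assumes the same layout equivalence between _Atomic uint32_t and
uint32_t that the cast in the patch relies on.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Core implementation, taking a non-_Atomic pointer as in the patch. */
static inline void
sketch_atomic_set32(uint32_t *addr, unsigned int nr)
{
	atomic_fetch_or_explicit((_Atomic uint32_t *)addr,
				 (uint32_t)1 << nr, memory_order_relaxed);
}

/* Intermediate function: strips the _Atomic specifier and forwards. */
static inline void
sketch_atomic_set32_unqual(_Atomic uint32_t *addr, unsigned int nr)
{
	sketch_atomic_set32((uint32_t *)(uintptr_t)addr, nr);
}

/* One _Generic leg per pointer type, so both kinds of caller compile. */
#define sketch_atomic_set(addr, nr)					\
	_Generic((addr),						\
		 uint32_t *: sketch_atomic_set32,			\
		 _Atomic uint32_t *: sketch_atomic_set32_unqual)(addr, nr)
```

A caller holding an _Atomic-specified word can then use the same macro
as a caller holding a plain word, without writing the cast at every call
site.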

> 
> Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
> not uint32_t or uint64_t. The author found the use of such large types
> confusing, and also failed to see any performance benefits.
> 
> A set of functions rte_bit_*_assign() are added, to assign a
> particular boolean value to a particular bit.
> 
> All new functions have properly documented semantics.
> 
> All new functions (or more correctly, generic selection macros)
> operate on both 32 and 64-bit words, with type checking.
> 
> _Generic allows the user code to be a little more compact. Having a
> type-generic atomic test/set/clear/assign bit API also seems
> consistent with the "core" (word-size) atomics API, which is generic
> (both GCC builtins and <rte_stdatomic.h> are).

ack, can you confirm _Generic is usable from a C++ TU? i may be making a
mistake locally but using g++ version 11.4.0 -std=c++20 it wasn't
accepting it.

i think using _Generic is ideal, since it eliminates mistakes when handling
the different integer sizes. if it turns out C++ doesn't want to
cooperate, the function-like macro can conditionally expand to a C++
template. this will need to be done for MSVC, since i can confirm
_Generic does not work with MSVC C++.

> 
> The _Generic versions avoids having explicit unsigned long versions of
> all functions. If you have an unsigned long, it's safe to use the
> generic version (e.g., rte_bit_set()) and _Generic will pick the right
> function, provided long is either 32 or 64 bit on your platform (which
> it is on all DPDK-supported ABIs).
> 
> The generic rte_bit_set() is a macro, and not a function, but
> nevertheless has been given a lower-case name. That's how C11 does it
> (for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
> can't be taken, but it does not evaluate its parameters more than
> once.
> 
> Things that are left out of this patch set, that may be included
> in future versions:
> 
>  * Have all functions returning a bit number have the same return type
>    (i.e., unsigned int).
>  * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
>  * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
>    for useful/used bit-level GCC builtins.
>  * Eliminate the MSVC #ifdef-induced documentation duplication.
>  * _Generic versions of things like rte_popcount32(). (?)

it would be nice to see them all converted, at the time i added them we
still hadn't adopted C11 so was limited. but certainly not asking for it
as a part of this series.

> 
> Mattias Rönnblom (6):
>   eal: extend bit manipulation functionality
>   eal: add unit tests for bit operations
>   eal: add exactly-once bit access functions
>   eal: add unit tests for exactly-once bit access functions
>   eal: add atomic bit operations
>   eal: add unit tests for atomic bit access functions
> 
>  app/test/test_bitops.c       | 319 +++++++++++++++++-
>  lib/eal/include/rte_bitops.h | 624 ++++++++++++++++++++++++++++++++++-
>  2 files changed, 925 insertions(+), 18 deletions(-)
> 
> -- 

Series-acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

> 2.34.1

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-25 16:18           ` Morten Brørup
@ 2024-04-26  9:39             ` Mattias Rönnblom
  2024-04-26 12:00               ` Morten Brørup
  2024-04-30 16:52               ` Tyler Retzlaff
  0 siblings, 2 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-26  9:39 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev, Tyler Retzlaff
  Cc: Heng Wang, Stephen Hemminger, techboard

On 2024-04-25 18:18, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Thursday, 25 April 2024 16.36
>>
>> On 2024-04-25 12:25, Morten Brørup wrote:
>>>> +#define rte_bit_atomic_test(addr, nr, memory_order)
>> 	\
>>>> +	_Generic((addr),						\
>>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr,
>> memory_order)
>>>
>>> I wonder if these should have RTE_ATOMIC qualifier:
>>>
>>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,
>> 		\
>>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr,
>> memory_order)
>>>
>>>
>>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)
>> 	\
>>>> +	static inline bool						\
>>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,
>> 	\
>>>
>>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
>>>
>>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t)
>> *addr,	\
>>>
>>> instead of casting into a_addr.
>>>
>>
>> Check the cover letter for the rationale for the cast.
> 
> Thanks, that clarifies it. Then...
> For the series:
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> 
>>
>> Where I'm at now is that I think C11 _Atomic is rather poor design. The
>> assumption that an object which allows for atomic access always should
>> require all operations upon it to be atomic, regardless of where it is
>> in its lifetime, and which thread is accessing it, does not hold, in the
>> general case.
> 
> It might be slow, but I suppose the C11 standard prioritizes correctness over performance.
> 

That's a false dichotomy, in this case. You can have both.

> It seems locks are automatically added to _Atomic types larger than what is natively supported by the architecture.
> E.g. MSVC adds locks to _Atomic types larger than 8 byte. [1]
> 
> [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-2022-version-17-5-preview-2/
> 
>>
>> The only reason for _Atomic being as it is, as far as I can see, is to
>> accommodate for ISAs which does not have the appropriate atomic machine
>> instructions, and thus require a lock or some other data associated with
>> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
>> such ISAs, to my knowledge. I suspect neither never will. So the cast
>> will continue to work.
> 
> I tend to agree with you on this.
> 
> We should officially decide that DPDK treats RTE_ATOMIC types as a union of _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic and non-atomic.
> 

I think this is a subject which needs to be further explored.

Objects that can be accessed both atomically and non-atomically should 
be without _Atomic. With my current understanding of this issue, that 
seems like the best option.

You could turn it around as well, and have such marked _Atomic and have 
explicit casts to their non-_Atomic cousins when operated upon by 
non-atomic functions. Not sure how realistic that is, since 
non-atomicity is the norm. All generic selection-based "functions" must 
take this into account.
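
To make the first option concrete, here is a minimal sketch of a word that
is declared without _Atomic and accessed both ways. The ex_* names are
hypothetical, plain C11 <stdatomic.h> stands in for <rte_stdatomic.h>, and
the pointer cast relies on the same assumption as the patch: that the
_Atomic and non-_Atomic variants of the type share size and representation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Non-atomic set, for phases where only one thread touches the word. */
static inline void
ex_bit_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

/* Atomic set on the same plain word, via a cast to an _Atomic pointer. */
static inline void
ex_bit_atomic_set32(uint32_t *addr, unsigned int nr, memory_order order)
{
	_Atomic uint32_t *a_addr = (_Atomic uint32_t *)addr;

	atomic_fetch_or_explicit(a_addr, UINT32_C(1) << nr, order);
}

/* Atomic test on the same plain word. */
static inline bool
ex_bit_atomic_test32(uint32_t *addr, unsigned int nr, memory_order order)
{
	_Atomic uint32_t *a_addr = (_Atomic uint32_t *)addr;

	return atomic_load_explicit(a_addr, order) & (UINT32_C(1) << nr);
}

static inline void
ex_mixed_usage(void)
{
	uint32_t flags = 0;

	/* Initialization phase: no other thread can see the word yet. */
	ex_bit_set32(&flags, 0);

	/* Shared phase: other threads may be operating on the word. */
	ex_bit_atomic_set32(&flags, 7, memory_order_relaxed);

	assert(flags == 0x81);
	assert(ex_bit_atomic_test32(&flags, 7, memory_order_relaxed));
	assert(!ex_bit_atomic_test32(&flags, 1, memory_order_relaxed));
}
```

The object itself carries no marker saying that it is sometimes accessed
atomically; that information lives only in which functions are called on it.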

>>
>>>> +				      unsigned int nr, int memory_order) \
>>>> +	{								\
>>>> +		RTE_ASSERT(nr < size);					\
>>>> +									\
>>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
>>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
>>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>>>> +		return rte_atomic_load_explicit(a_addr, memory_order) &
>> mask; \
>>>> +	}
>>>
>>>
>>> Similar considerations regarding volatile qualifier for the "once"
>> operations.
>>>

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [RFC v2 0/6] Improve EAL bit operations API
  2024-04-25 18:05     ` [RFC v2 0/6] Improve EAL bit operations API Tyler Retzlaff
@ 2024-04-26 11:17       ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-26 11:17 UTC (permalink / raw)
  To: Tyler Retzlaff, Mattias Rönnblom; +Cc: dev, Heng Wang, Stephen Hemminger

On 2024-04-25 20:05, Tyler Retzlaff wrote:
> On Thu, Apr 25, 2024 at 10:58:47AM +0200, Mattias Rönnblom wrote:
>> This patch set represent an attempt to improve and extend the RTE
>> bitops API, in particular for functions that operate on individual
>> bits.
>>
>> All new functionality is exposed to the user as generic selection
>> macros, delegating the actual work to private (__-marked) static
>> inline functions. Public functions (e.g., rte_bit_set32()) would just
>> be bloating the API. Such generic selection macros will here be
>> referred to as "functions", although technically they are not.
> 
> 
>>
>> The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
>> replaced with three families:
>>
>> rte_bit_[test|set|clear|assign]() which provides no memory ordering or
>> atomicity guarantees and no read-once or write-once semantics (e.g.,
>> no use of volatile), but does provide the best performance. The
>> performance degradation resulting from the use of volatile (e.g.,
>> forcing loads and stores to actually occur and in the number
>> specified) and atomic (e.g., LOCK-prefixed instructions on x86) may be
>> significant.
>>
>> rte_bit_once_*() which guarantees program-level load and stores
>> actually occurring (i.e., prevents certain optimizations). The primary
>> use of these functions are in the context of memory mapped
>> I/O. Feedback on the details (semantics, naming) here would be greatly
>> appreciated, since the author is not much of a driver developer.
>>
>> rte_bit_atomic_*() which provides atomic bit-level operations,
>> including the possibility to specifying memory ordering constraints
>> (or the lack thereof).
>>
>> The atomic functions take non-_Atomic pointers, to be flexible, just
>> like the GCC builtins and default <rte_stdatomic.h>. The issue with
>> _Atomic APIs is that it may well be the case that the user wants to
>> perform both non-atomic and atomic operations on the same word.
>>
>> Having _Atomic-marked addresses would complicate supporting atomic
>> bit-level operations in the bitset API (proposed in a different RFC
>> patchset), and potentially other APIs depending on RTE bitops for
>> atomic bit-level ops). Either one needs two bitset variants, one
>> _Atomic bitset and one non-atomic one, or the bitset code needs to
>> cast the non-_Atomic pointer to an _Atomic one. Having a separate
>> _Atomic bitset would be bloat and also prevent the user from both, in
>> some situations, doing atomic operations against a bit set, while in
>> other situations (e.g., at times when MT safety is not a concern)
>> operating on the same objects in a non-atomic manner.
> 
> understood. i think the only downside is that if you do have an
> _Atomic-specified type you'll have to cast the qualification away
> to use the function like macro.
> 

This is tricky, and I can't say I've really converged on an opinion, but 
it seems to me at this point you shouldn't mark anything _Atomic.

> as a suggestion the _Generic legs could include both _Atomic-specified
> and non-_Atomic-specified types where an intermediate inline function
> could strip the qualification to use your core inline implementations.
> 
> _Generic((v), int *: __foo32, RTE_ATOMIC(int) *: __foo32_unqual)(v))
> 
> static inline void
> __foo32(int *a) { ... }
> 
> static inline void
> __foo32_unqual(RTE_ATOMIC(int) *a) { __foo32((int *)(uintptr_t)(a)); }
> 
> there is some similar prior art in newer ISO C23 with typeof_unqual.
> 
> https://en.cppreference.com/w/c/language/typeof
> 

This is an interesting solution, but I'm not sure it's a problem that 
needs to be solved.

>>
>> Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
>> not uint32_t or uint64_t. The author found the use of such large types
>> confusing, and also failed to see any performance benefits.
>>
>> A set of functions rte_bit_*_assign() are added, to assign a
>> particular boolean value to a particular bit.
>>
>> All new functions have properly documented semantics.
>>
>> All new functions (or more correctly, generic selection macros)
>> operate on both 32 and 64-bit words, with type checking.
>>
>> _Generic allows the user code to be a little more compact. Having a
>> type-generic atomic test/set/clear/assign bit API also seems
>> consistent with the "core" (word-size) atomics API, which is generic
>> (both GCC builtins and <rte_stdatomic.h> are).
> 
> ack, can you confirm _Generic is usable from a C++ TU? i may be making a
> mistake locally but using g++ version 11.4.0 -std=c++20 it wasn't
> accepting it.
> 
> i think using _Generic is ideal, it eliminates mistakes when handling
> the different integer sizes so if it turns out C++ doesn't want to
> cooperate the function like macro can conditionally expand to a C++
> template this will need to be done for MSVC since i can confirm
> _Generic does not work with MSVC C++.
> 

That's unfortunate.

No, I didn't try it with C++. I just assumed _Generic was C++ as well.

The naive solution would be to include two overloaded functions per 
function-like macro.

#ifdef __cplusplus

#undef rte_bit_set

static inline void
rte_bit_set(uint32_t *addr, unsigned int nr)
{
     __rte_bit_set32(addr, nr);
}

static inline void
rte_bit_set(uint64_t *addr, unsigned int nr)
{
     __rte_bit_set64(addr, nr);
}
#endif

Did you have something more clever/less verbose in mind? The best would
be if one could have a completely generic C++-compatible replacement of
_Generic, but it's not obvious how that would work.

What's the minimum C++ version required by DPDK? C++11?

>>
>> The _Generic versions avoids having explicit unsigned long versions of
>> all functions. If you have an unsigned long, it's safe to use the
>> generic version (e.g., rte_bit_set()) and _Generic will pick the right
>> function, provided long is either 32 or 64 bit on your platform (which
>> it is on all DPDK-supported ABIs).
>>
>> The generic rte_bit_set() is a macro, and not a function, but
>> nevertheless has been given a lower-case name. That's how C11 does it
>> (for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
>> can't be taken, but it does not evaluate its parameters more than
>> once.
>>
>> Things that are left out of this patch set, that may be included
>> in future versions:
>>
>>   * Have all functions returning a bit number have the same return type
>>     (i.e., unsigned int).
>>   * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
>>   * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
>>     for useful/used bit-level GCC builtins.
>>   * Eliminate the MSVC #ifdef-induced documentation duplication.
>>   * _Generic versions of things like rte_popcount32(). (?)
> 
> it would be nice to see them all converted, at the time i added them we
> still hadn't adopted C11 so was limited. but certainly not asking for it
> as a part of this series.
> 
>>
>> Mattias Rönnblom (6):
>>    eal: extend bit manipulation functionality
>>    eal: add unit tests for bit operations
>>    eal: add exactly-once bit access functions
>>    eal: add unit tests for exactly-once bit access functions
>>    eal: add atomic bit operations
>>    eal: add unit tests for atomic bit access functions
>>
>>   app/test/test_bitops.c       | 319 +++++++++++++++++-
>>   lib/eal/include/rte_bitops.h | 624 ++++++++++++++++++++++++++++++++++-
>>   2 files changed, 925 insertions(+), 18 deletions(-)
>>
>> -- 
> 
> Series-acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> 
>> 2.34.1

^ permalink raw reply	[flat|nested] 160+ messages in thread

* RE: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-26  9:39             ` Mattias Rönnblom
@ 2024-04-26 12:00               ` Morten Brørup
  2024-04-28 15:37                 ` Mattias Rönnblom
  2024-04-30 16:52               ` Tyler Retzlaff
  1 sibling, 1 reply; 160+ messages in thread
From: Morten Brørup @ 2024-04-26 12:00 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev, Tyler Retzlaff
  Cc: Heng Wang, Stephen Hemminger, techboard

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Friday, 26 April 2024 11.39
> 
> On 2024-04-25 18:18, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Thursday, 25 April 2024 16.36
> >>
> >> On 2024-04-25 12:25, Morten Brørup wrote:
> >>>> +#define rte_bit_atomic_test(addr, nr, memory_order)
> >> 	\
> >>>> +	_Generic((addr),						\
> >>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
> >>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr,
> >> memory_order)
> >>>
> >>> I wonder if these should have RTE_ATOMIC qualifier:
> >>>
> >>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,
> >> 		\
> >>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr,
> >> memory_order)
> >>>
> >>>
> >>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)
> >> 	\
> >>>> +	static inline bool						\
> >>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,
> >> 	\
> >>>
> >>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> >>>
> >>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t)
> >> *addr,	\
> >>>
> >>> instead of casting into a_addr.
> >>>
> >>
> >> Check the cover letter for the rationale for the cast.
> >
> > Thanks, that clarifies it. Then...
> > For the series:
> > Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >
> >>
> >> Where I'm at now is that I think C11 _Atomic is rather poor design. The
> >> assumption that an object which allows for atomic access always should
> >> require all operations upon it to be atomic, regardless of where it is
> >> in its lifetime, and which thread is accessing it, does not hold, in the
> >> general case.
> >
> > It might be slow, but I suppose the C11 standard prioritizes correctness
> over performance.
> >
> 
> That's a false dichotomy, in this case. You can have both.

In theory you shouldn't need non-atomic access to atomic variables.
In reality, we want it anyway, because real CPUs are faster at non-atomic operations.

> 
> > It seems locks are automatically added to _Atomic types larger than what is
> natively supported by the architecture.
> > E.g. MSVC adds locks to _Atomic types larger than 8 byte. [1]
> >
> > [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-
> 2022-version-17-5-preview-2/
> >
> >>
> >> The only reason for _Atomic being as it is, as far as I can see, is to
> >> accommodate for ISAs which does not have the appropriate atomic machine
> >> instructions, and thus require a lock or some other data associated with
> >> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >> such ISAs, to my knowledge. I suspect neither never will. So the cast
> >> will continue to work.
> >
> > I tend to agree with you on this.
> >
> > We should officially decide that DPDK treats RTE_ATOMIC types as a union of
> _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic
> and non-atomic.
> >
> 
> I think this is a subject which needs to be further explored.

Yes. It's easier exploring and deciding now, when our options are open, than after we have locked down the affected APIs.

> 
> Objects that can be accessed both atomically and non-atomically should
> be without _Atomic. With my current understanding of this issue, that
> seems like the best option.

Agree.

The alternative described below is certainly no good!

It would be nice if they were marked as sometimes-atomic by a qualifier or special type, like rte_be32_t marks the network byte order variant of a uint32_t.

Furthermore, large atomic objects need the _Atomic qualifier for the compiler to add (and use) the associated lock.
Alternatively, we could specify that sometimes-atomic objects cannot be larger than 8 bytes, which is what MSVC can handle without locking.

> 
> You could turn it around as well, and have such marked _Atomic and have
> explicit casts to their non-_Atomic cousins when operated upon by
> non-atomic functions. Not sure how realistic that is, since
> non-atomicity is the norm. All generic selection-based "functions" must
> take this into account.
> 
> >>
> >>>> +				      unsigned int nr, int memory_order) \
> >>>> +	{								\
> >>>> +		RTE_ASSERT(nr < size);					\
> >>>> +									\
> >>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >>>> +		return rte_atomic_load_explicit(a_addr, memory_order) &
> >> mask; \
> >>>> +	}
> >>>
> >>>
> >>> Similar considerations regarding volatile qualifier for the "once"
> >> operations.
> >>>

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [RFC v2 0/6] Improve EAL bit operations API
  2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
                       ` (6 preceding siblings ...)
  2024-04-25 18:05     ` [RFC v2 0/6] Improve EAL bit operations API Tyler Retzlaff
@ 2024-04-26 21:35     ` Patrick Robb
  7 siblings, 0 replies; 160+ messages in thread
From: Patrick Robb @ 2024-04-26 21:35 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff

Recheck-request: iol-compile-amd64-testing

The DPDK Community Lab updated to the latest Alpine image yesterday, which
resulted in all Alpine builds failing. The failure is unrelated to your
patch, and this recheck should remove the fail on Patchwork, as we have
disabled Alpine testing for now.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-26 12:00               ` Morten Brørup
@ 2024-04-28 15:37                 ` Mattias Rönnblom
  2024-04-29  7:24                   ` Morten Brørup
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-28 15:37 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev, Tyler Retzlaff
  Cc: Heng Wang, Stephen Hemminger, techboard

On 2024-04-26 14:00, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Friday, 26 April 2024 11.39
>>
>> On 2024-04-25 18:18, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>>>> Sent: Thursday, 25 April 2024 16.36
>>>>
>>>> On 2024-04-25 12:25, Morten Brørup wrote:
>>>>>> +#define rte_bit_atomic_test(addr, nr, memory_order)
>>>> 	\
>>>>>> +	_Generic((addr),						\
>>>>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>>>>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr,
>>>> memory_order)
>>>>>
>>>>> I wonder if these should have RTE_ATOMIC qualifier:
>>>>>
>>>>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,
>>>> 		\
>>>>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr, nr,
>>>> memory_order)
>>>>>
>>>>>
>>>>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)
>>>> 	\
>>>>>> +	static inline bool						\
>>>>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,
>>>> 	\
>>>>>
>>>>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
>>>>>
>>>>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ## _t)
>>>> *addr,	\
>>>>>
>>>>> instead of casting into a_addr.
>>>>>
>>>>
>>>> Check the cover letter for the rationale for the cast.
>>>
>>> Thanks, that clarifies it. Then...
>>> For the series:
>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>>
>>>>
>>>> Where I'm at now is that I think C11 _Atomic is rather poor design. The
>>>> assumption that an object which allows for atomic access always should
>>>> require all operations upon it to be atomic, regardless of where it is
>>>> in its lifetime, and which thread is accessing it, does not hold, in the
>>>> general case.
>>>
>>> It might be slow, but I suppose the C11 standard prioritizes correctness
>> over performance.
>>>
>>
>> That's a false dichotomy, in this case. You can have both.
> 
> In theory you shouldn't need non-atomic access to atomic variables.
> In reality, we want it anyway, because real CPUs are faster at non-atomic operations.
> 
>>
>>> It seems locks are automatically added to _Atomic types larger than what is
>> natively supported by the architecture.
>>> E.g. MSVC adds locks to _Atomic types larger than 8 byte. [1]
>>>
>>> [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-
>> 2022-version-17-5-preview-2/
>>>
>>>>
>>>> The only reason for _Atomic being as it is, as far as I can see, is to
>>>> accommodate for ISAs which does not have the appropriate atomic machine
>>>> instructions, and thus require a lock or some other data associated with
>>>> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
>>>> such ISAs, to my knowledge. I suspect neither never will. So the cast
>>>> will continue to work.
>>>
>>> I tend to agree with you on this.
>>>
>>> We should officially decide that DPDK treats RTE_ATOMIC types as a union of
>> _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic
>> and non-atomic.
>>>
>>
>> I think this is a subject which needs to be further explored.
> 
> Yes. It's easier exploring and deciding now, when our options are open, than after we have locked down the affected APIs.
> 
>>
>> Objects that can be accessed both atomically and non-atomically should
>> be without _Atomic. With my current understanding of this issue, that
>> seems like the best option.
> 
> Agree.
> 
> The alterative described below is certainly no good!
> 
> It would be nice if they were marked as sometimes-atomic by a qualifier or special type, like rte_be32_t marks the network byte order variant of an uint32_t.
> 
> Furthermore, large atomic objects need the _Atomic qualifier for the compiler to add (and use) the associated lock.

If you have objects larger than the ISA can handle atomically, you wouldn't
want to leave the choice of synchronization primitive to the compiler. I
don't see how it could possibly know which one is the most appropriate,
especially in a DPDK context. It would, for example, need to know whether
the contending threads are preemptable.

In some situations a sequence lock may well be your best option. Will 
the compiler generate one for you?

If "lock" means std::mutex, it would be a disaster, performance-wise.
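
As a rough illustration of the sequence lock alternative, a 16-byte pair
could be protected as sketched below. This is a hand-rolled, single-writer
sketch with hypothetical ex_* names, not DPDK's own <rte_seqlock.h>. The
payload fields are accessed with relaxed atomics only to keep the sketch
free of data races under the C11 memory model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

struct ex_seq_pair {
	atomic_uint seq;	/* even: stable, odd: write in progress */
	_Atomic uint64_t lo;
	_Atomic uint64_t hi;
};

/* Single writer assumed; concurrent writers would need their own lock. */
static inline void
ex_seq_pair_write(struct ex_seq_pair *p, uint64_t lo, uint64_t hi)
{
	unsigned int seq = atomic_load_explicit(&p->seq, memory_order_relaxed);

	/* Mark the write as in progress before touching the payload. */
	atomic_store_explicit(&p->seq, seq + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&p->lo, lo, memory_order_relaxed);
	atomic_store_explicit(&p->hi, hi, memory_order_relaxed);

	/* Publish the payload and mark the pair as stable again. */
	atomic_store_explicit(&p->seq, seq + 2, memory_order_release);
}

/* Readers never block the writer; they retry if a write interfered. */
static inline void
ex_seq_pair_read(struct ex_seq_pair *p, uint64_t *lo, uint64_t *hi)
{
	unsigned int before, after;

	do {
		before = atomic_load_explicit(&p->seq, memory_order_acquire);
		*lo = atomic_load_explicit(&p->lo, memory_order_relaxed);
		*hi = atomic_load_explicit(&p->hi, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		after = atomic_load_explicit(&p->seq, memory_order_relaxed);
	} while ((before & 1) != 0 || before != after);
}

static inline void
ex_seq_pair_usage(void)
{
	struct ex_seq_pair p = { 0 };
	uint64_t lo, hi;

	ex_seq_pair_write(&p, 1, 2);
	ex_seq_pair_read(&p, &lo, &hi);

	assert(lo == 1 && hi == 2);
}
```

Whether this beats a spinlock depends on the read/write ratio and on whether
the writer can be preempted, which is information the compiler does not have.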

> Alternatively, we could specify that sometimes-atomic objects cannot be larger than 8 byte, which is what MSVC can handle without locking.
> 
>>
>> You could turn it around as well, and have such marked _Atomic and have
>> explicit casts to their non-_Atomic cousins when operated upon by
>> non-atomic functions. Not sure how realistic that is, since
>> non-atomicity is the norm. All generic selection-based "functions" must
>> take this into account.
>>
>>>>
>>>>>> +				      unsigned int nr, int memory_order) \
>>>>>> +	{								\
>>>>>> +		RTE_ASSERT(nr < size);					\
>>>>>> +									\
>>>>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
>>>>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
>>>>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>>>>>> +		return rte_atomic_load_explicit(a_addr, memory_order) &
>>>> mask; \
>>>>>> +	}
>>>>>
>>>>>
>>>>> Similar considerations regarding volatile qualifier for the "once"
>>>> operations.
>>>>>

^ permalink raw reply	[flat|nested] 160+ messages in thread

* RE: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-28 15:37                 ` Mattias Rönnblom
@ 2024-04-29  7:24                   ` Morten Brørup
  0 siblings, 0 replies; 160+ messages in thread
From: Morten Brørup @ 2024-04-29  7:24 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev, Tyler Retzlaff
  Cc: Heng Wang, Stephen Hemminger, techboard

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Sunday, 28 April 2024 17.38
> 
> On 2024-04-26 14:00, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Friday, 26 April 2024 11.39
> >>
> >> On 2024-04-25 18:18, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >>>> Sent: Thursday, 25 April 2024 16.36
> >>>>
> >>>> On 2024-04-25 12:25, Morten Brørup wrote:
> >>>>>> +#define rte_bit_atomic_test(addr, nr, memory_order)
> >>>> 	\
> >>>>>> +	_Generic((addr),						\
> >>>>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
> >>>>>> +		 uint64_t *: __rte_bit_atomic_test64)(addr, nr,
> >>>> memory_order)
> >>>>>
> >>>>> I wonder if these should have RTE_ATOMIC qualifier:
> >>>>>
> >>>>> +		 RTE_ATOMIC(uint32_t) *: __rte_bit_atomic_test32,
> >>>> 		\
> >>>>> +		 RTE_ATOMIC(uint64_t) *: __rte_bit_atomic_test64)(addr,
> nr,
> >>>> memory_order)
> >>>>>
> >>>>>
> >>>>>> +#define __RTE_GEN_BIT_ATOMIC_TEST(size)
> >>>> 	\
> >>>>>> +	static inline bool						\
> >>>>>> +	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,
> >>>> 	\
> >>>>>
> >>>>> I wonder if the "addr" parameter should have RTE_ATOMIC qualifier:
> >>>>>
> >>>>> +	__rte_bit_atomic_test ## size(const RTE_ATOMIC(uint ## size ##
> _t)
> >>>> *addr,	\
> >>>>>
> >>>>> instead of casting into a_addr.
> >>>>>
> >>>>
> >>>> Check the cover letter for the rationale for the cast.
> >>>
> >>> Thanks, that clarifies it. Then...
> >>> For the series:
> >>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >>>
> >>>>
> >>>> Where I'm at now is that I think C11 _Atomic is rather poor design. The
> >>>> assumption that an object which allows for atomic access always should
> >>>> require all operations upon it to be atomic, regardless of where it is
> >>>> in its lifetime, and which thread is accessing it, does not hold, in the
> >>>> general case.
> >>>
> >>> It might be slow, but I suppose the C11 standard prioritizes correctness
> >> over performance.
> >>>
> >>
> >> That's a false dichotomy, in this case. You can have both.
> >
> > In theory you shouldn't need non-atomic access to atomic variables.
> > In reality, we want it anyway, because real CPUs are faster at non-atomic
> operations.
> >
> >>
> >>> It seems locks are automatically added to _Atomic types larger than what
> is
> >> natively supported by the architecture.
> >>> E.g. MSVC adds locks to _Atomic types larger than 8 byte. [1]
> >>>
> >>> [1]: https://devblogs.microsoft.com/cppblog/c11-atomics-in-visual-studio-
> >> 2022-version-17-5-preview-2/
> >>>
> >>>>
> >>>> The only reason for _Atomic being as it is, as far as I can see, is to
> >>>> accommodate for ISAs which does not have the appropriate atomic machine
> >>>> instructions, and thus require a lock or some other data associated with
> >>>> the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >>>> such ISAs, to my knowledge. I suspect neither never will. So the cast
> >>>> will continue to work.
> >>>
> >>> I tend to agree with you on this.
> >>>
> >>> We should officially decide that DPDK treats RTE_ATOMIC types as a union
> of
> >> _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both
> atomic
> >> and non-atomic.
> >>>
> >>
> >> I think this is a subject which needs to be further explored.
> >
> > Yes. It's easier exploring and deciding now, when our options are open, than
> after we have locked down the affected APIs.
> >
> >>
> >> Objects that can be accessed both atomically and non-atomically should
> >> be without _Atomic. With my current understanding of this issue, that
> >> seems like the best option.
> >
> > Agree.
> >
> > The alterative described below is certainly no good!
> >
> > It would be nice if they were marked as sometimes-atomic by a qualifier or
> special type, like rte_be32_t marks the network byte order variant of an
> uint32_t.
> >
> > Furthermore, large atomic objects need the _Atomic qualifier for the
> compiler to add (and use) the associated lock.
> 
> If you have larger objects than the ISA can handle, you wouldn't want to
> leave the choice of the synchronization primitive to use to the
> compiler. I don't see how it could possibly know, which one is the most
> appropriate, especially in a DPDK context. It would for example need to
> know if the contending threads are non-preemptable or not.
> 
> In some situations a sequence lock may well be your best option. Will
> the compiler generate one for you?
> 
> If "lock" means std::mutex, it would be a disaster, performance-wise.

Considering that the atomic functions without the _explicit(..., memory_order) suffix, e.g. atomic_fetch_add(), imply memory_order_seq_cst, I think it does. This makes atomic types relatively straightforward to use, at the cost of performance.

There's a good description here:
https://en.cppreference.com/w/c/language/atomic

Note that accessing members of an _Atomic struct/union is undefined behavior.
For those, you need to have a non-atomic type, used as "value" to void atomic_store( volatile _Atomic struct mytype * obj, const struct mytype value ), and return value from atomic_load( const volatile _Atomic struct mytype * obj ).

In other words, for structs/unions, _Atomic variables are only accessed through accessor functions taking pointers to them, and thereby transformed from/to values of similar non-atomic type.
I think that this concept also supports your suggestion above: Objects that can be accessed both atomically and non-atomically should be without _Atomic.
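
The accessor pattern described above can be sketched as follows. This is only an illustration under assumptions: struct sketch_pair and the sketch_*() helpers are hypothetical names, not DPDK or C11 API, and the struct is kept small (4 bytes) so that typical targets can handle it without a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical example type; not part of DPDK. */
struct sketch_pair {
	uint16_t a;
	uint16_t b;
};

/* The shared object is _Atomic; its members are never accessed directly. */
static _Atomic struct sketch_pair sketch_shared;

static inline void
sketch_pair_publish(uint16_t a, uint16_t b)
{
	/* Build the new value in a plain (non-atomic) struct ... */
	struct sketch_pair value = { .a = a, .b = b };

	/* ... and store it as a whole (memory_order_seq_cst by default). */
	atomic_store(&sketch_shared, value);
}

static inline struct sketch_pair
sketch_pair_snapshot(void)
{
	/* The load yields a non-atomic copy whose members may be read freely. */
	return atomic_load(&sketch_shared);
}
```

Only the whole object crosses the atomic boundary; every member access happens on the non-atomic copy.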

But I still think it would be a good idea to mark them as sometimes-atomic, for source code readability/review purposes.

E.g. the mbuf's refcnt field is of the type RTE_ATOMIC(uint16_t). If it is not only accessed through atomic_ accessor functions, should it still be marked RTE_ATOMIC()?

In the future, compilers might warn or error when an _Atomic variable (of any type) is being accessed directly.
The extreme solution would be not to mix atomic and non-atomic access to variables. But that seems unrealistic (at this time).

If we truly want to support C11 atomics, we need to understand and follow the concepts in the standard.

> 
> > Alternatively, we could specify that sometimes-atomic objects cannot be
> > larger than 8 bytes, which is what MSVC can handle without locking.
> >
> >>
> >> You could turn it around as well, and have such marked _Atomic and have
> >> explicit casts to their non-_Atomic cousins when operated upon by
> >> non-atomic functions. Not sure how realistic that is, since
> >> non-atomicity is the norm. All generic selection-based "functions" must
> >> take this into account.
> >>
> >>>>
> >>>>>> +				      unsigned int nr, int memory_order) \
> >>>>>> +	{								\
> >>>>>> +		RTE_ASSERT(nr < size);					\
> >>>>>> +									\
> >>>>>> +		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >>>>>> +			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >>>>>> +		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >>>>>> +		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
> >>>>>> +	}
> >>>>>
> >>>>>
> >>>>> Similar considerations regarding volatile qualifier for the "once"
> >>>>> operations.
> >>>>>

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v3 0/6] Improve EAL bit operations API
  2024-04-25  8:58     ` [RFC v2 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-29  9:51       ` Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                           ` (5 more replies)
  0 siblings, 6 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.
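
As a rough illustration of that structure (a sketch only; the sketch_* names are made up for this example and are not the identifiers used in the patches, and plain assert() stands in for RTE_ASSERT()), a generic selection macro dispatching to private per-size static inline functions could look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Private per-size workers (hypothetical names). */
static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

static inline void
sketch_bit_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= UINT64_C(1) << nr;
}

static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}

static inline bool
sketch_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return *addr & (UINT64_C(1) << nr);
}

/* Public "functions": the worker is selected from the pointer type. */
#define sketch_bit_set(addr, nr)			\
	_Generic((addr),				\
		 uint32_t *: sketch_bit_set32,		\
		 uint64_t *: sketch_bit_set64)(addr, nr)

#define sketch_bit_test(addr, nr)			\
	_Generic((addr),				\
		 uint32_t *: sketch_bit_test32,		\
		 uint64_t *: sketch_bit_test64)(addr, nr)
```

The controlling expression of _Generic is not evaluated, so addr and nr are each evaluated exactly once, in the final call.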

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign]() which provide no memory ordering or
atomicity guarantees and no read-once or write-once semantics (e.g.,
no use of volatile), but do provide the best performance. The
performance degradation resulting from the use of volatile (e.g.,
forcing loads and stores to actually occur and in the number
specified) and atomic (e.g., LOCK-prefixed instructions on x86) may be
significant.

rte_bit_once_*() which guarantee that program-level loads and stores
actually occur (i.e., prevent certain optimizations). The primary
use of these functions is in the context of memory mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.
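
A minimal sketch of the "once" idea, assuming volatile-qualified accesses are what provides the guarantee (as in this RFC). The sketch_* names are hypothetical, and an ordinary variable stands in for a device register:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each call performs exactly one load of *addr. */
static inline bool
sketch_bit_once_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}

/* Each call performs one load and one store; it is not atomic. */
static inline void
sketch_bit_once_set32(volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

/*
 * Poll a bit at most max_polls times. Returns the number of failed
 * polls before the bit was seen set, or -1 if it never was. Because
 * of the volatile access, the load cannot be hoisted out of the loop.
 */
static inline int
sketch_poll_bit32(const volatile uint32_t *reg, unsigned int nr,
		  int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++)
		if (sketch_bit_once_test32(reg, nr))
			return i;

	return -1;
}
```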

rte_bit_atomic_*() which provide atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.
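
To illustrate the mixed use, here is a sketch built directly on the GCC/clang __atomic builtins, which accept plain (non-_Atomic) pointers. The sketch_* names are hypothetical; the patches themselves go through <rte_stdatomic.h> rather than calling the builtins directly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically set bit nr in a plain uint32_t word. */
static inline void
sketch_bit_atomic_set32(uint32_t *addr, unsigned int nr, int memory_order)
{
	assert(nr < 32);
	__atomic_fetch_or(addr, UINT32_C(1) << nr, memory_order);
}

/* Atomically load the word and test bit nr. */
static inline bool
sketch_bit_atomic_test32(const uint32_t *addr, unsigned int nr,
			 int memory_order)
{
	assert(nr < 32);
	return __atomic_load_n(addr, memory_order) & (UINT32_C(1) << nr);
}

/* Atomically set bit nr and return its previous value. */
static inline bool
sketch_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
				 int memory_order)
{
	uint32_t mask;

	assert(nr < 32);
	mask = UINT32_C(1) << nr;
	return __atomic_fetch_or(addr, mask, memory_order) & mask;
}
```

The same word can still be modified with ordinary operators (e.g., word |= 1) in phases where no other thread can access it.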

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat and also prevent the user from both, in
some situations, doing atomic operations against a bit set, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions, rte_bit_*_assign(), is added to assign a
particular boolean value to a particular bit.
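
For illustration, the bool-based test/assign semantics can be sketched like this (hypothetical sketch_* names, 32-bit variant only, with assert() standing in for RTE_ASSERT()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Return the value of bit nr as a bool. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}

/* Set bit nr to '1' if value is true, or to '0' if it is false. */
static inline void
sketch_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	uint32_t mask;

	assert(nr < 32);
	mask = UINT32_C(1) << nr;

	if (value)
		*addr |= mask;
	else
		*addr &= ~mask;
}
```

With an assign operation, copying a bit becomes a single call, e.g. sketch_bit_assign32(&dst, 3, sketch_bit_test32(&src, 3)), instead of an if/else around set and clear at every call site.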

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.

C++ doesn't support generic selection, and in C++ translation units the
_Generic macros are replaced with overloaded functions.

Things that are left out of this patch set, that may be included
in future versions:

 * Have all functions returning a bit number have the same return type
   (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       | 319 ++++++++++++++-
 lib/eal/include/rte_bitops.h | 768 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1069 insertions(+), 18 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v3 1/6] eal: extend bit manipulation functionality
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  2024-04-29 11:12           ` Morten Brørup
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                           ` (4 subsequent siblings)
  5 siblings, 2 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test, set, clear, and assign the value of
individual bits in 32-bit or 64-bit words.

These functions have no implications on memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 218 ++++++++++++++++++++++++++++++++++-
 1 file changed, 216 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..fb2e3dae7b 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,157 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_test32,			\
+		 uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+#define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
+	static inline bool						\
+	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(name, size, qualifier)			\
+	static inline void						\
+	name(qualifier uint ## size ## _t *addr, unsigned int nr)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+__RTE_GEN_BIT_TEST(__rte_bit_test32, 32, )
+__RTE_GEN_BIT_SET(__rte_bit_set32, 32, )
+__RTE_GEN_BIT_CLEAR(__rte_bit_clear32, 32, )
+
+__RTE_GEN_BIT_TEST(__rte_bit_test64, 64, )
+__RTE_GEN_BIT_SET(__rte_bit_set64, 64, )
+__RTE_GEN_BIT_CLEAR(__rte_bit_clear64, 64, )
+
+__rte_experimental
+static inline void
+__rte_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_set32(addr, nr);
+	else
+		__rte_bit_clear32(addr, nr);
+}
+
+__rte_experimental
+static inline void
+__rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_set64(addr, nr);
+	else
+		__rte_bit_clear64(addr, nr);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +941,66 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set, , unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear, , unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign, , unsigned int, nr, bool, value)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v3 2/6] eal: add unit tests for bit operations
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_[set|clear|assign|test]()
family of functions.

The tests are converted to use the test suite runner framework.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c | 76 +++++++++++++++++++++++++++++++++---------
 1 file changed, 61 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..f788b561a0 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,59 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    test_fun, size)				\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear, \
+		    rte_bit_assign, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear, \
+		    rte_bit_assign, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +163,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access_32),
+		TEST_CASE(test_bit_access_64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v3 3/6] eal: add exactly-once bit access functions
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 4/6] eal: add unit tests for " Mattias Rönnblom
                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add bit test/set/clear/assign functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.

RFC v3:
    * Work around lack of C++ support for _Generic (Tyler Retzlaff).

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 180 +++++++++++++++++++++++++++++++++++
 1 file changed, 180 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index fb2e3dae7b..eac3f8b86a 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -201,6 +201,147 @@ extern "C" {
 		 uint32_t *: __rte_bit_assign32,			\
 		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set() and
+ * rte_bit_test() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test(addr, 17) was used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent to
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to
+ * memory ordering or atomicity.
+ *
+ * The regular bit set operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_test32,		\
+		 uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set or clear operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -239,6 +380,14 @@ __RTE_GEN_BIT_TEST(__rte_bit_test64, 64, )
 __RTE_GEN_BIT_SET(__rte_bit_set64, 64, )
 __RTE_GEN_BIT_CLEAR(__rte_bit_clear64, 64, )
 
+__RTE_GEN_BIT_TEST(__rte_bit_once_test32, 32, volatile)
+__RTE_GEN_BIT_SET(__rte_bit_once_set32, 32, volatile)
+__RTE_GEN_BIT_CLEAR(__rte_bit_once_clear32, 32, volatile)
+
+__RTE_GEN_BIT_TEST(__rte_bit_once_test64, 64, volatile)
+__RTE_GEN_BIT_SET(__rte_bit_once_set64, 64, volatile)
+__RTE_GEN_BIT_CLEAR(__rte_bit_once_clear64, 64, volatile)
+
 __rte_experimental
 static inline void
 __rte_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
@@ -259,6 +408,27 @@ __rte_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
 		__rte_bit_clear64(addr, nr);
 }
 
+
+__rte_experimental
+static inline void
+__rte_bit_once_assign32(volatile uint32_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_once_set32(addr, nr);
+	else
+		__rte_bit_once_clear32(addr, nr);
+}
+
+__rte_experimental
+static inline void
+__rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
+{
+	if (value)
+		__rte_bit_once_set64(addr, nr);
+	else
+		__rte_bit_once_clear64(addr, nr);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -953,6 +1123,11 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_clear
 #undef rte_bit_assign
 
+#undef rte_bit_once_test
+#undef rte_bit_once_set
+#undef rte_bit_once_clear
+#undef rte_bit_once_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1001,6 +1176,11 @@ __RTE_BIT_OVERLOAD_2(set, , unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear, , unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign, , unsigned int, nr, bool, value)
 
+__RTE_BIT_OVERLOAD_2R(once_test, const volatile, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v3 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
                           ` (2 preceding siblings ...)
  2024-04-29  9:51         ` [RFC v3 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 5/6] eal: add atomic bit operations Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_once_[set|clear|assign|test]() family of functions.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index f788b561a0..12c1027e36 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -46,12 +46,20 @@
 		return TEST_SUCCESS;					\
 	}
 
-GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear, \
+GEN_TEST_BIT_ACCESS(test_bit_access_32, rte_bit_set, rte_bit_clear,	\
 		    rte_bit_assign, rte_bit_test, 32)
 
-GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear, \
+GEN_TEST_BIT_ACCESS(test_bit_access_64, rte_bit_set, rte_bit_clear,	\
 		    rte_bit_assign, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access_32, rte_bit_once_set,		\
+		    rte_bit_once_clear, rte_bit_once_assign,		\
+		    rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access_64, rte_bit_once_set,		\
+		    rte_bit_once_clear, rte_bit_once_assign,		\
+		    rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -168,6 +176,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access_32),
 		TEST_CASE(test_bit_access_64),
+		TEST_CASE(test_bit_once_access_32),
+		TEST_CASE(test_bit_once_access_64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v3 5/6] eal: add atomic bit operations
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
                           ` (3 preceding siblings ...)
  2024-04-29  9:51         ` [RFC v3 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  2024-04-29  9:51         ` [RFC v3 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign and test-and-set/clear functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 o Add rte_bit_atomic_test_and_assign() (for consistency).
 o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 o Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/eal/include/rte_bitops.h | 371 +++++++++++++++++++++++++++++++++++
 1 file changed, 371 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index eac3f8b86a..2af5355a8a 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -342,6 +343,177 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_assign32,			\
 		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(name, size, qualifier)			\
 	static inline bool						\
 	name(const qualifier uint ## size ## _t *addr, unsigned int nr)	\
@@ -429,6 +601,131 @@ __rte_bit_once_assign64(volatile uint64_t *addr, unsigned int nr, bool value)
 		__rte_bit_once_clear64(addr, nr);
 }
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				memory_order,				\
+				rte_memory_order_relaxed));		\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1128,6 +1425,14 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_once_clear
 #undef rte_bit_once_assign
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1171,6 +1476,59 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set, , unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear, , unsigned int, nr)
@@ -1181,6 +1539,19 @@ __RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [RFC v3 6/6] eal: add unit tests for atomic bit access functions
  2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
                           ` (4 preceding siblings ...)
  2024-04-29  9:51         ` [RFC v3 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-04-29  9:51         ` Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-29  9:51 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_atomic_[set|clear|assign|test|test_and_[set|clear|assign]]()
family of functions.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c       | 233 ++++++++++++++++++++++++++++++++++-
 lib/eal/include/rte_bitops.h |   1 -
 2 files changed, 232 insertions(+), 2 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 12c1027e36..d77793dfe8 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -60,6 +63,228 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access_64, rte_bit_once_set,		\
 		    rte_bit_once_clear, rte_bit_once_assign,		\
 		    rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access_32, bit_atomic_set,	\
+		    bit_atomic_clear, bit_atomic_assign,	\
+		    bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access_64, bit_atomic_set,	\
+		    bit_atomic_clear, bit_atomic_assign,	\
+		    bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore_ ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign_ ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore_ ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign_ ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore_ ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore_ ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign_ ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign_ ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore_ ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify_ ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore_ ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify_ ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore_ ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore_ ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify_ ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify_ ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -178,6 +403,12 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access_64),
 		TEST_CASE(test_bit_once_access_32),
 		TEST_CASE(test_bit_once_access_64),
+		TEST_CASE(test_bit_atomic_access_32),
+		TEST_CASE(test_bit_atomic_access_64),
+		TEST_CASE(test_bit_atomic_parallel_assign_32),
+		TEST_CASE(test_bit_atomic_parallel_assign_64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify_32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify_64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 2af5355a8a..5717691e7c 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -485,7 +485,6 @@ extern "C" {
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
 		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
 								memory_order)
-
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
-- 
2.34.1



* RE: [RFC v3 1/6] eal: extend bit manipulation functionality
  2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-29 11:12           ` Morten Brørup
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  1 sibling, 0 replies; 160+ messages in thread
From: Morten Brørup @ 2024-04-29 11:12 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Monday, 29 April 2024 11.52
> 
> Add functionality to test, set, clear, and assign a value to
> individual bits in 32-bit or 64-bit words.
> 
> These functions have no implications for memory ordering or
> atomicity. They do not use volatile and thus do not prevent any
> compiler optimizations.
> 
> RFC v3:
>  * Work around lack of C++ support for _Generic (Tyler Retzlaff).
>  * Fix ','-related checkpatch warnings.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---

For the series,
Acked-by: Morten Brørup <mb@smartsharesystems.com>



* [RFC v4 0/6] Improve EAL bit operations API
  2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-04-29 11:12           ` Morten Brørup
@ 2024-04-30  9:55           ` Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                               ` (5 more replies)
  1 sibling, 6 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.
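
As a rough illustration of the generic selection approach described
above (not the patch's actual code; the sketch_ names are made up for
this example), a type-generic macro can delegate to size-specific
static inline functions like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the private, size-specific helpers. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}

static inline bool
sketch_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return *addr & (UINT64_C(1) << nr);
}

static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

static inline void
sketch_bit_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= UINT64_C(1) << nr;
}

/* The pointer type of addr selects the helper at compile time. */
#define sketch_bit_test(addr, nr)				\
	_Generic((addr),					\
		 uint32_t *: sketch_bit_test32,			\
		 uint64_t *: sketch_bit_test64)(addr, nr)

#define sketch_bit_set(addr, nr)				\
	_Generic((addr),					\
		 uint32_t *: sketch_bit_set32,			\
		 uint64_t *: sketch_bit_set64)(addr, nr)
```

Passing a pointer of any other type (e.g., uint16_t *) fails to
compile, which is where the type checking comes from.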

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign|flip]() which provide no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but do provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic operations (e.g., LOCK-prefixed
instructions on x86) may be significant.

rte_bit_once_*() which guarantee that program-level loads and stores
actually occur (i.e., prevent certain optimizations). The primary
use of these functions is in the context of memory mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.
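
To make the difference between the plain and the "once" family a bit
more concrete, here is a minimal sketch (hypothetical sketch_ names,
32-bit only, not taken from the patches) in which the only difference
between the two variants is the volatile qualifier on the pointer:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain variant: the compiler may merge, reorder or drop accesses. */
static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

/*
 * "Once" variant: the volatile-qualified access asks the compiler to
 * perform exactly one load and one store per call. It is neither
 * atomic nor a memory barrier.
 */
static inline void
sketch_bit_once_set32(volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

static inline bool
sketch_bit_once_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}
```

In a loop polling a status bit, the plain variant allows the compiler
to hoist the load out of the loop, while the volatile-qualified one
requires a load on every iteration.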

rte_bit_atomic_*() which provide atomic bit-level operations,
including the possibility to specify memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat and would also prevent the user from
doing atomic operations against a bit set in some situations, while
in other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.
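
A small sketch of mixing atomic and non-atomic accesses to the same
plain word, here written directly against the GCC/Clang __atomic
builtins instead of <rte_stdatomic.h> (the sketch_ names are
hypothetical, and the builtins are a compiler extension, not ISO C):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomic set on a plain uint32_t, using the GCC/Clang __atomic builtins. */
static inline void
sketch_bit_atomic_set32(uint32_t *addr, unsigned int nr, int memory_order)
{
	assert(nr < 32);
	__atomic_fetch_or(addr, UINT32_C(1) << nr, memory_order);
}

static inline bool
sketch_bit_atomic_test32(uint32_t *addr, unsigned int nr, int memory_order)
{
	assert(nr < 32);
	return __atomic_load_n(addr, memory_order) & (UINT32_C(1) << nr);
}

/* Non-atomic access to the same word type, e.g. during single-threaded init. */
static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}
```

The word itself stays an ordinary uint32_t, so it can be initialized
with the non-atomic helper before other threads exist and updated
atomically afterwards.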

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.
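
A minimal sketch of the bool-based test and assign semantics described
above (64-bit only, with made-up sketch_ names; the actual functions
are generated by macros in the patch):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assign 'value' to bit 'nr': true sets the bit, false clears it. */
static inline void
sketch_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
{
	assert(nr < 64);
	uint64_t mask = UINT64_C(1) << nr;

	if (value)
		*addr |= mask;
	else
		*addr &= ~mask;
}

/* The result is a proper bool, not the masked word. */
static inline bool
sketch_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return *addr & (UINT64_C(1) << nr);
}
```

Since the test result is converted to bool, it is exactly true or
false regardless of the bit position, so it can safely be stored in an
int or compared against another bool.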

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.
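
The single-evaluation property can be sketched as follows (again with
hypothetical sketch_ names). The address argument appears twice in the
macro expansion, but the controlling expression of _Generic is only
inspected for its type:

```c
#include <assert.h>
#include <stdint.h>

static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

static inline void
sketch_bit_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= UINT64_C(1) << nr;
}

/*
 * addr appears twice in the expansion, but the controlling expression
 * of _Generic is only inspected for its type and never evaluated.
 */
#define sketch_bit_set(addr, nr)				\
	_Generic((addr),					\
		 uint32_t *: sketch_bit_set32,			\
		 uint64_t *: sketch_bit_set64)(addr, nr)
```

With this, something like sketch_bit_set(p++, nr++) advances p and nr
exactly once each. Some compilers may still warn about side effects in
an unevaluated context.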

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions.

Things that are left out of this patch set, that may be included
in future versions:

 * Have all functions returning a bit number have the same return type
   (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       |  405 ++++++++++++-
 lib/eal/include/rte_bitops.h | 1070 +++++++++++++++++++++++++++++++++-
 2 files changed, 1457 insertions(+), 18 deletions(-)

-- 
2.34.1



* [RFC v4 1/6] eal: extend bit manipulation functionality
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                               ` (4 subsequent siblings)
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test, set, clear, and assign a value to
individual bits in 32-bit or 64-bit words.

These functions have no implications for memory ordering or
atomicity. They do not use volatile and thus do not prevent any
compiler optimizations.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 257 ++++++++++++++++++++++++++++++++++-
 1 file changed, 255 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..9d426f1602 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,194 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_test32,			\
+		 uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## family ## fun ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(family, fun, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}
+
+#define __RTE_GEN_BIT_CLEAR(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}
+
+#define __RTE_GEN_BIT_ASSIGN(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## family ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## family ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## family ## test ## size(addr, nr);	\
+		__rte_bit_ ## family ## assign ## size(addr, nr, !value); \
+	}
+
+__RTE_GEN_BIT_TEST(, test,, 32)
+__RTE_GEN_BIT_SET(, set,, 32)
+__RTE_GEN_BIT_CLEAR(, clear,, 32)
+__RTE_GEN_BIT_ASSIGN(, assign,, 32)
+__RTE_GEN_BIT_FLIP(, flip,, 32)
+
+__RTE_GEN_BIT_TEST(, test,, 64)
+__RTE_GEN_BIT_SET(, set,, 64)
+__RTE_GEN_BIT_CLEAR(, clear,, 64)
+__RTE_GEN_BIT_ASSIGN(, assign,, 64)
+__RTE_GEN_BIT_FLIP(, flip,, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +978,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name, arg2_type arg2_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v4 2/6] eal: add unit tests for bit operations
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                               ` (3 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_[test|set|clear|assign|flip]() family of functions.

The tests are converted to use the test suite runner framework.
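
The test strategy used by the generated test cases can be sketched as
below. This is a simplified, self-contained illustration only: the
demo_* helpers are hypothetical stand-ins for the rte_bit_*() macros,
and fixed inputs replace rte_rand().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline bool
demo_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr >> nr) & 1;
}

static inline void
demo_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	uint32_t mask = UINT32_C(1) << nr;

	assert(nr < 32);
	*addr = value ? (*addr | mask) : (*addr & ~mask);
}

static inline void
demo_flip32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr ^= UINT32_C(1) << nr;
}

/*
 * Drive 'word' towards 'reference' one bit at a time. After each
 * assignment the bit is read back, flipped, checked and flipped back.
 * Returns 0 on success and -1 on the first mismatch.
 */
static int
demo_check_roundtrip32(uint32_t reference, uint32_t word)
{
	unsigned int nr;

	for (nr = 0; nr < 32; nr++) {
		bool reference_bit = (reference >> nr) & 1;

		demo_assign32(&word, nr, reference_bit);
		if (demo_test32(&word, nr) != reference_bit)
			return -1;

		demo_flip32(&word, nr);
		if (demo_test32(&word, nr) == reference_bit)
			return -1;
		demo_flip32(&word, nr);
	}

	return word == reference ? 0 : -1;
}
```

Calling demo_check_roundtrip32() with any reference and start word
should return 0 if the three helpers are consistent with each other.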

RFC v4:
 * Remove redundant line continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 80 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..111f9b328e 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,63 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %u had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %u had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %u had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +167,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v4 3/6] eal: add exactly-once bit access functions
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 4/6] eal: add unit tests for " Mattias Rönnblom
                               ` (2 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add bit test/set/clear/assign/flip functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.
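
A minimal sketch of what "exactly once" means in practice, assuming
volatile-qualified accesses as in this patch. The demo_* names and the
fake register are hypothetical, and the load and store are written out
separately here only to make them visible.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline bool
demo_bit_once_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr >> nr) & 1; /* one volatile load per call */
}

static inline void
demo_bit_once_set32(volatile uint32_t *addr, unsigned int nr)
{
	uint32_t word;

	assert(nr < 32);
	word = *addr;                       /* one volatile load */
	*addr = word | (UINT32_C(1) << nr); /* one volatile store */
}

/* Stand-in for a device register; a real driver would map device memory. */
static uint32_t demo_fake_reg;

/* Set bit 17, then read it back; neither access may be optimized away. */
static bool
demo_kick_and_check(void)
{
	demo_bit_once_set32(&demo_fake_reg, 17);
	return demo_bit_once_test32(&demo_fake_reg, 17);
}
```

Note that the set operation is still a separate load and store, so
another thread or a device may modify the word in between; volatile
does not make the read-modify-write atomic.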

RFC v3:
    * Work around lack of C++ support for _Generic (Tyler Retzlaff).

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 195 +++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 9d426f1602..f77bd83e97 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -224,6 +224,177 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set() and
+ * rte_bit_test() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * If rte_bit_test(addr, 17) were used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to
+ * memory ordering or atomicity.
+ *
+ * The regular bit operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_test32,		\
+		 uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit assign operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip bit in word, reading and writing exactly once.
+ *
+ * Change the value of a bit to '0' if '1' or '1' if '0' in a 32-bit
+ * or 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit flip operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_flip(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_flip32,		\
+		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -296,6 +467,18 @@ __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
+__RTE_GEN_BIT_TEST(once_, test, volatile, 32)
+__RTE_GEN_BIT_SET(once_, set, volatile, 32)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 32)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 32)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 32)
+
+__RTE_GEN_BIT_TEST(once_, test, volatile, 64)
+__RTE_GEN_BIT_SET(once_, set, volatile, 64)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -991,6 +1174,12 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_once_test
+#undef rte_bit_once_set
+#undef rte_bit_once_clear
+#undef rte_bit_once_assign
+#undef rte_bit_once_flip
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1040,6 +1229,12 @@ __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_2R(once_test, const volatile, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v4 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
                               ` (2 preceding siblings ...)
  2024-04-30  9:55             ` [RFC v4 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  2024-04-30 10:37               ` Morten Brørup
  2024-04-30  9:55             ` [RFC v4 5/6] eal: add atomic bit operations Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_once_[test|set|clear|assign|flip]() family of functions.

RFC v4:
 * Remove redundant continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c       |  10 +
 lib/eal/include/rte_bitops.h | 425 +++++++++++++++++++++++++++++++++++
 2 files changed, 435 insertions(+)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 111f9b328e..615ec6e563 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -56,6 +56,14 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access32, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -172,6 +180,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index f77bd83e97..abfe96d531 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -395,6 +396,199 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_flip32,		\
 		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Atomically negate the value of the bit specified by @c nr in the
+ * word pointed to by @c addr, with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set before the operation, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set before the operation, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set before the operation, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -479,6 +673,162 @@ __RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
 __RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
 __RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				memory_order,				\
+				rte_memory_order_relaxed));		\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_flip ## size(&target, nr);		\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				memory_order,				\
+				rte_memory_order_relaxed));		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1180,6 +1530,14 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_once_assign
 #undef rte_bit_once_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1223,6 +1581,59 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name, arg2_type arg2_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
@@ -1235,6 +1646,20 @@ __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v4 5/6] eal: add atomic bit operations
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
                               ` (3 preceding siblings ...)
  2024-04-30  9:55             ` [RFC v4 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  2024-04-30  9:55             ` [RFC v4 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
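
To illustrate how a caller-supplied memory order can be threaded
through a compare-exchange based test-and-assign, here is a minimal
standalone sketch. It uses plain C11 <stdatomic.h> rather than the DPDK
wrappers, and demo_atomic_bit_test_and_assign() is a hypothetical name,
not part of this patch. The assumption made here is that the caller's
order applies to the successful exchange, while a failed attempt only
needs a relaxed reload.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in; returns the value the bit had before the call. */
static inline bool
demo_atomic_bit_test_and_assign(_Atomic uint32_t *addr, unsigned int nr,
				bool value, memory_order order)
{
	uint32_t mask = UINT32_C(1) << nr;
	uint32_t before = atomic_load_explicit(addr, memory_order_relaxed);
	uint32_t target;

	do {
		target = value ? (before | mask) : (before & ~mask);
		/* On failure, 'before' is refreshed with the current word. */
	} while (!atomic_compare_exchange_weak_explicit(
			 addr, &before, target,
			 order,			/* success order */
			 memory_order_relaxed));	/* failure order */

	return before & mask;
}

/* Usage: the return value reports the old bit, other bits are untouched. */
static void
demo_test_and_assign_usage(void)
{
	_Atomic uint32_t word = 0x1;

	assert(!demo_atomic_bit_test_and_assign(&word, 4, true,
						memory_order_acq_rel));
	assert(demo_atomic_bit_test_and_assign(&word, 4, false,
					       memory_order_release));
	assert(atomic_load(&word) == 0x1);
}
```

Keeping the failure order relaxed also avoids passing a release-type
order where the language does not permit one.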

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 * Add rte_bit_atomic_test_and_assign() (for consistency).
 * Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 * Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 194 +++++++++++++++++++++++++++++++++++
 1 file changed, 194 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index abfe96d531..f014bd913e 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -589,6 +589,199 @@ extern "C" {
 								 value, \
 								 memory_order)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Atomically negate the value of the bit specified by @c nr in the
+ * word pointed to by @c addr, with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -1534,6 +1727,7 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_atomic_set
 #undef rte_bit_atomic_clear
 #undef rte_bit_atomic_assign
+#undef rte_bit_atomic_flip
 #undef rte_bit_atomic_test_and_set
 #undef rte_bit_atomic_test_and_clear
 #undef rte_bit_atomic_test_and_assign
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v4 6/6] eal: add unit tests for atomic bit access functions
  2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
                               ` (4 preceding siblings ...)
  2024-04-30  9:55             ` [RFC v4 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-04-30  9:55             ` Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30  9:55 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_atomic_[set|clear|assign|flip|test|test_and_[set|clear|assign]]()
family of functions.

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c       | 315 ++++++++++++++++++++++++++++++++++-
 lib/eal/include/rte_bitops.h |   1 -
 2 files changed, 314 insertions(+), 2 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 615ec6e563..abc07e8caf 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -64,6 +67,304 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
 		    rte_bit_once_clear, rte_bit_once_assign,
 		    rte_bit_once_flip, rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -182,6 +483,18 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_once_access32),
 		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index f014bd913e..fb771c6dfc 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -560,7 +560,6 @@ extern "C" {
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
 		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
 								memory_order)
-
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* RE: [RFC v4 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-30  9:55             ` [RFC v4 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-04-30 10:37               ` Morten Brørup
  2024-04-30 11:58                 ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Morten Brørup @ 2024-04-30 10:37 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Tuesday, 30 April 2024 11.55
> 
> Extend bitops tests to cover the
> rte_bit_once_[set|clear|assign|test]() family of functions.
> 
> RFC v4:
>  * Remove redundant continuations.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>  app/test/test_bitops.c       |  10 +
>  lib/eal/include/rte_bitops.h | 425 +++++++++++++++++++++++++++++++++++
>  2 files changed, 435 insertions(+)

The rte_bitops.h changes belong in another patch in the series.



^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [RFC v4 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-30 10:37               ` Morten Brørup
@ 2024-04-30 11:58                 ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 11:58 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

On 2024-04-30 12:37, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Tuesday, 30 April 2024 11.55
>>
>> Extend bitops tests to cover the
>> rte_bit_once_[set|clear|assign|test]() family of functions.
>>
>> RFC v4:
>>   * Remove redundant continuations.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>> ---
>>   app/test/test_bitops.c       |  10 +
>>   lib/eal/include/rte_bitops.h | 425 +++++++++++++++++++++++++++++++++++
>>   2 files changed, 435 insertions(+)
> 
> The rte_bitops.h changes belong in another patch in the series.
> 
> 

Thanks. Will send a v5.

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v5 0/6] Improve EAL bit operations API
  2024-04-30  9:55             ` [RFC v4 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-30 12:08               ` Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                                   ` (5 more replies)
  0 siblings, 6 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.
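
As a rough illustration of that pattern, here is a minimal, self-contained
sketch in plain C11. All sketch_* names are hypothetical stand-ins and not
part of the patch set; the sketch only shows how a generic selection macro
can delegate to per-size static inline helpers while evaluating each
argument once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the private per-size helpers. */
static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= (uint32_t)1 << nr;
}

static inline void
sketch_bit_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= (uint64_t)1 << nr;
}

/*
 * The controlling expression of _Generic is not evaluated; it only
 * selects the helper by type. 'addr' and 'nr' are therefore evaluated
 * exactly once, in the resulting function call.
 */
#define sketch_bit_set(addr, nr)			\
	_Generic((addr),				\
		 uint32_t *: sketch_bit_set32,		\
		 uint64_t *: sketch_bit_set64)(addr, nr)

/* Returns 1 if the dispatch and the single evaluation behave as expected. */
static inline int
sketch_generic_demo(void)
{
	uint32_t w32 = 0;
	uint64_t w64 = 0;
	unsigned int nr = 3;

	sketch_bit_set(&w32, nr++);	/* sets bit 3; nr is incremented once */
	sketch_bit_set(&w64, 40);	/* dispatches to the 64-bit helper */

	return w32 == UINT32_C(0x8) && w64 == (UINT64_C(1) << 40) && nr == 4;
}
```

Passing a pointer of any other type (e.g., uint16_t *) to such a macro
fails at compile time, since no association matches.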

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign|flip]() which provide no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but do provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant.

rte_bit_once_*() which guarantee that program-level loads and stores
actually occur (i.e., prevent certain optimizations). The primary
use of these functions is in the context of memory mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provide atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat and also prevent the user from both, in
some situations, doing atomic operations against a bit set, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.
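
To make the point about mixed access concrete, the following is a
minimal sketch assuming a GCC or Clang toolchain with the __atomic
builtins. The sketch_* helpers are hypothetical and are not the
functions added by this series; they only show a plain (non-_Atomic)
word being updated non-atomically in one phase and atomically in
another.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: plain, non-atomic bit set. */
static inline void
sketch_bit_set64(uint64_t *addr, unsigned int nr)
{
	*addr |= UINT64_C(1) << nr;
}

/*
 * Hypothetical helper: atomic bit set on a non-_Atomic word, using
 * the GCC/Clang __atomic builtins (an assumption of this sketch).
 */
static inline void
sketch_bit_atomic_set64(uint64_t *addr, unsigned int nr, int memory_order)
{
	__atomic_fetch_or(addr, UINT64_C(1) << nr, memory_order);
}

static inline uint64_t
sketch_mixed_demo(void)
{
	uint64_t word = 0;

	/* Setup phase, MT safety not a concern: plain access suffices. */
	sketch_bit_set64(&word, 0);

	/* Phase where other threads could be running: atomic update. */
	sketch_bit_atomic_set64(&word, 5, __ATOMIC_RELAXED);

	return word;	/* bits 0 and 5 set */
}
```

With an _Atomic-qualified word, the first call would need either a
cast or a second, _Atomic-aware variant of the helper.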

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.
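
A minimal sketch of the bool-based interface follows. The sketch_*
names are hypothetical, and the bodies are simplified stand-ins rather
than the code in the patch; the intent is only to show a bit being
assigned and queried as a bool instead of as a masked word.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the result is a bool, not a masked word. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	return *addr & ((uint32_t)1 << nr);
}

/* Hypothetical helper: set the bit if 'value' is true, else clear it. */
static inline void
sketch_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	uint32_t mask = (uint32_t)1 << nr;

	if (value)
		*addr |= mask;
	else
		*addr &= ~mask;
}

static inline bool
sketch_assign_demo(void)
{
	uint32_t flags = 0;

	sketch_bit_assign32(&flags, 31, true);
	sketch_bit_assign32(&flags, 4, true);
	sketch_bit_assign32(&flags, 4, false);

	/* Bit 31 reads back as true, bit 4 as false. */
	return sketch_bit_test32(&flags, 31) &&
		!sketch_bit_test32(&flags, 4) &&
		flags == UINT32_C(0x80000000);
}
```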

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions.

Things that are left out of this patch set, that may be included
in future versions:

 * Have all functions returning a bit number have the same return type
   (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       | 405 +++++++++++++++-
 lib/eal/include/rte_bitops.h | 877 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1264 insertions(+), 18 deletions(-)

-- 
2.34.1



* [RFC v5 1/6] eal: extend bit manipulation functionality
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test, set, clear, flip, and assign a value to
individual bits in 32-bit or 64-bit words.

These functions have no implications on memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 257 ++++++++++++++++++++++++++++++++++-
 1 file changed, 255 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..9d426f1602 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,194 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_test32,			\
+		 uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## family ## fun ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(family, fun, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+#define __RTE_GEN_BIT_ASSIGN(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## family ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## family ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## family ## test ## size(addr, nr);	\
+		__rte_bit_ ## family ## assign ## size(addr, nr, !value); \
+	}
+
+__RTE_GEN_BIT_TEST(, test,, 32)
+__RTE_GEN_BIT_SET(, set,, 32)
+__RTE_GEN_BIT_CLEAR(, clear,, 32)
+__RTE_GEN_BIT_ASSIGN(, assign,, 32)
+__RTE_GEN_BIT_FLIP(, flip,, 32)
+
+__RTE_GEN_BIT_TEST(, test,, 64)
+__RTE_GEN_BIT_SET(, set,, 64)
+__RTE_GEN_BIT_CLEAR(, clear,, 64)
+__RTE_GEN_BIT_ASSIGN(, assign,, 64)
+__RTE_GEN_BIT_FLIP(, flip,, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +978,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name, arg2_type arg2_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [RFC v5 2/6] eal: add unit tests for bit operations
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_[set|clear|assign|test]()
family of functions.

The tests are converted to use the test suite runner framework.

RFC v4:
 * Remove redundant line continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 80 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 65 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..111f9b328e 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,63 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %d had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +167,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1



* [RFC v5 3/6] eal: add exactly-once bit access functions
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 4/6] eal: add unit tests for " Mattias Rönnblom
                                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add bit test/set/clear/assign functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.
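
As a hedged illustration of the intended use, the sketch below polls a
simulated status register through volatile-qualified accessors. The
sketch_* names, the bit number, and the bounded poll loop are
assumptions made for this example; a real driver would point at an
actual memory-mapped register.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: the volatile qualifier forces one real load. */
static inline bool
sketch_bit_once_test32(const volatile uint32_t *addr, unsigned int nr)
{
	return *addr & ((uint32_t)1 << nr);
}

/*
 * Hypothetical helper: one load followed by one store. This is not
 * an atomic read-modify-write.
 */
static inline void
sketch_bit_once_set32(volatile uint32_t *addr, unsigned int nr)
{
	*addr = *addr | ((uint32_t)1 << nr);
}

/* Poll until bit 'nr' is set, giving up after 'max_polls' loads. */
static inline bool
sketch_poll_ready(const volatile uint32_t *reg, unsigned int nr,
		  unsigned int max_polls)
{
	while (max_polls-- > 0)
		if (sketch_bit_once_test32(reg, nr))
			return true;

	return false;
}

static inline bool
sketch_once_demo(void)
{
	uint32_t fake_reg = 0;	/* stands in for a device register */
	bool before;
	bool after;

	before = sketch_poll_ready(&fake_reg, 17, 4);
	sketch_bit_once_set32(&fake_reg, 17);
	after = sketch_poll_ready(&fake_reg, 17, 4);

	return !before && after;
}
```

Because every poll goes through a volatile access, the compiler may
not hoist the load out of the loop, which it would be free to do with
a plain pointer.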

RFC v3:
    * Work around lack of C++ support for _Generic (Tyler Retzlaff).

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 195 +++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 9d426f1602..f77bd83e97 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -224,6 +224,177 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set() and
+ * rte_bit_test() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test(addr, 17) was used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to
+ * memory ordering or atomicity.
+ *
+ * The regular bit set operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_test32,		\
+		 uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set or clear operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip bit in word, reading and writing exactly once.
+ *
+ * Change the value of a bit to '0' if '1' or '1' if '0' in a 32-bit
+ * or 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit flip operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_flip(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_flip32,		\
+		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -296,6 +467,18 @@ __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
+__RTE_GEN_BIT_TEST(once_, test, volatile, 32)
+__RTE_GEN_BIT_SET(once_, set, volatile, 32)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 32)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 32)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 32)
+
+__RTE_GEN_BIT_TEST(once_, test, volatile, 64)
+__RTE_GEN_BIT_SET(once_, set, volatile, 64)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -991,6 +1174,12 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_once_test
+#undef rte_bit_once_set
+#undef rte_bit_once_clear
+#undef rte_bit_once_assign
+#undef rte_bit_once_flip
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1040,6 +1229,12 @@ __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_2R(once_test, const volatile, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [RFC v5 4/6] eal: add unit tests for exactly-once bit access functions
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
                                   ` (2 preceding siblings ...)
  2024-04-30 12:08                 ` [RFC v5 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 5/6] eal: add atomic bit operations Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_once_[set|clear|assign|test]() family of functions.

RFC v5:
 * Atomic bit op implementation moved from this patch to the proper
   patch in the series. (Morten Brørup)

RFC v4:
 * Remove redundant line continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 111f9b328e..615ec6e563 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -56,6 +56,14 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access32, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -172,6 +180,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1



* [RFC v5 5/6] eal: add atomic bit operations
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
                                   ` (3 preceding siblings ...)
  2024-04-30 12:08                 ` [RFC v5 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  2024-04-30 12:08                 ` [RFC v5 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign and test-and-set/clear functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
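
The sketch below shows, under stated assumptions, what an atomic
test-and-set with a caller-supplied memory order could look like. It
uses the GCC/Clang __atomic builtins directly instead of
<rte_stdatomic.h>, and the sketch_* names are hypothetical; it is not
the implementation in this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically load the word, then test the bit. */
static inline bool
sketch_bit_atomic_test32(uint32_t *addr, unsigned int nr, int memory_order)
{
	uint32_t mask = (uint32_t)1 << nr;

	return __atomic_load_n(addr, memory_order) & mask;
}

/*
 * Hypothetical helper: atomically set the bit and return its
 * previous value.
 */
static inline bool
sketch_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
				 int memory_order)
{
	uint32_t mask = (uint32_t)1 << nr;
	uint32_t prev = __atomic_fetch_or(addr, mask, memory_order);

	return prev & mask;
}

static inline bool
sketch_atomic_demo(void)
{
	uint32_t word = 0;
	bool first;
	bool second;

	/* The first caller observes the bit as previously clear... */
	first = sketch_bit_atomic_test_and_set32(&word, 9, __ATOMIC_ACQ_REL);
	/* ...while a second attempt observes it as already set. */
	second = sketch_bit_atomic_test_and_set32(&word, 9, __ATOMIC_ACQ_REL);

	return !first && second &&
		sketch_bit_atomic_test32(&word, 9, __ATOMIC_ACQUIRE);
}
```

Requiring the memory order as an explicit argument keeps the cost of
each operation visible at the call site, e.g., relaxed for statistics
flags and acquire/release where the bit guards other data.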

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 o Add rte_bit_atomic_test_and_assign() (for consistency).
 o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 o Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 425 +++++++++++++++++++++++++++++++++++
 1 file changed, 425 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index f77bd83e97..abfe96d531 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -395,6 +396,199 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_flip32,		\
 		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 uint64_t *: __rte_bit_atomic_test64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Atomically negate the value of the bit specified by @c nr in the
+ * word pointed to by @c addr, with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -479,6 +673,162 @@ __RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
 __RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
 __RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				memory_order,				\
+				rte_memory_order_relaxed));		\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_flip ## size(&target, nr);		\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				memory_order,				\
+				rte_memory_order_relaxed));		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1180,6 +1530,14 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_once_assign
 #undef rte_bit_once_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1223,6 +1581,59 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
@@ -1235,6 +1646,20 @@ __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v5 6/6] eal: add unit tests for atomic bit access functions
  2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
                                   ` (4 preceding siblings ...)
  2024-04-30 12:08                 ` [RFC v5 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-04-30 12:08                 ` Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-04-30 12:08 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_atomic_[set|clear|assign|flip|test|test_and_[set|clear|assign]]()
family of functions.

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 315 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 314 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 615ec6e563..abc07e8caf 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -64,6 +67,304 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
 		    rte_bit_once_clear, rte_bit_once_assign,
 		    rte_bit_once_flip, rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -182,6 +483,18 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_once_access32),
 		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [RFC v2 5/6] eal: add atomic bit operations
  2024-04-26  9:39             ` Mattias Rönnblom
  2024-04-26 12:00               ` Morten Brørup
@ 2024-04-30 16:52               ` Tyler Retzlaff
  1 sibling, 0 replies; 160+ messages in thread
From: Tyler Retzlaff @ 2024-04-30 16:52 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Morten Brørup, Mattias Rönnblom, dev, Heng Wang,
	Stephen Hemminger, techboard

On Fri, Apr 26, 2024 at 11:39:17AM +0200, Mattias Rönnblom wrote:

[ ... ]

> >
> >>
> >>The only reason for _Atomic being as it is, as far as I can see, is to
> >>accommodate for ISAs which does not have the appropriate atomic machine
> >>instructions, and thus require a lock or some other data associated with
> >>the actual user-data-carrying bits. Neither GCC nor DPDK supports any
> >>such ISAs, to my knowledge. I suspect neither never will. So the cast
> >>will continue to work.
> >
> >I tend to agree with you on this.
> >
> >We should officially decide that DPDK treats RTE_ATOMIC types as a union of _Atomic and non-atomic, i.e. operations on RTE_ATOMIC types can be both atomic and non-atomic.
> >
> 
> I think this is a subject which needs to be further explored.
> 
> Objects that can be accessed both atomically and non-atomically
> should be without _Atomic. With my current understanding of this
> issue, that seems like the best option.

i've been distracted by other work and while not in the scope of this
series i want to say +1 to having this discussion. utilizing a union for
this atomic vs non-atomic access that appears in practice is a good idea.

> 
> You could turn it around as well, and have such marked _Atomic and
> have explicit casts to their non-_Atomic cousins when operated upon
> by non-atomic functions. Not sure how realistic that is, since
> non-atomicity is the norm. All generic selection-based "functions"
> must take this into account.

the problem with casts is they are actually different types and may have
different size and/or alignment relative to their non-atomic types.
for current non-locking atomics the implementations happen to be the
same (presumably because it was practical) but the union is definitely a
cleaner approach.

> 
> >>
> >>>>+				      unsigned int nr, int memory_order) \
> >>>>+	{								\
> >>>>+		RTE_ASSERT(nr < size);					\
> >>>>+									\
> >>>>+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> >>>>+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> >>>>+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
> >>>>+		return rte_atomic_load_explicit(a_addr, memory_order) &
> >>mask; \
> >>>>+	}
> >>>
> >>>
> >>>Similar considerations regarding volatile qualifier for the "once"
> >>operations.
> >>>

^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v6 0/6] Improve EAL bit operations API
  2024-04-30 12:08                 ` [RFC v5 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-05-02  5:57                   ` Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                                       ` (5 more replies)
  0 siblings, 6 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.
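
As a rough illustration of the pattern described above (not the actual
<rte_bitops.h> code), the sketch below dispatches a generic selection
macro to per-size static inline helpers. All demo_* names are
hypothetical, and plain assert() stands in for RTE_ASSERT().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-size helpers, standing in for the private functions. */
static inline void
demo_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= (uint32_t)1 << nr;
}

static inline void
demo_bit_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= (uint64_t)1 << nr;
}

static inline bool
demo_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & ((uint32_t)1 << nr);
}

static inline bool
demo_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return *addr & ((uint64_t)1 << nr);
}

/* The pointer type of 'addr' selects the helper; the helper is then called. */
#define demo_bit_set(addr, nr) \
	_Generic((addr), \
		 uint32_t *: demo_bit_set32, \
		 uint64_t *: demo_bit_set64)(addr, nr)

/* Read-only access also accepts const-qualified pointers. */
#define demo_bit_test(addr, nr) \
	_Generic((addr), \
		 uint32_t *: demo_bit_test32, \
		 const uint32_t *: demo_bit_test32, \
		 uint64_t *: demo_bit_test64, \
		 const uint64_t *: demo_bit_test64)(addr, nr)
```

Passing a pointer of any other type (e.g., uint16_t *) fails at compile
time, since no association matches; this is the type checking the
generic macros provide.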

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign|flip]() which provides no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but does provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant.

rte_bit_once_*() which guarantees that program-level loads and stores
actually occur (i.e., prevents certain optimizations). The primary
use of these functions is in the context of memory-mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.
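
A minimal sketch of what the "once" semantics amount to, assuming
volatile-qualified accesses as the mechanism. The demo_* names and the
bounded polling helper are made up for illustration and are not part of
the proposed API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Each call performs exactly one load of *addr. */
static inline bool
demo_bit_once_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & ((uint32_t)1 << nr);
}

/* Each call performs exactly one load and one store of *addr. */
static inline void
demo_bit_once_set32(volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= (uint32_t)1 << nr;
}

/*
 * Poll a (hypothetical) status register until bit 'nr' clears, or
 * until 'max_polls' reads have seen it set. Because the access is
 * volatile, the compiler may not hoist the load out of the loop.
 * Returns the number of polls that found the bit still set.
 */
static unsigned int
demo_wait_bit_clear(volatile uint32_t *reg, unsigned int nr,
		    unsigned int max_polls)
{
	unsigned int polls = 0;

	while (polls < max_polls && demo_bit_once_test32(reg, nr))
		polls++;

	return polls;
}
```

With a non-volatile test in its place, the compiler would be free to
read the word once and then spin (or not) on the cached value.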

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat and also prevent the user from both, in
some situations, doing atomic operations against a bit set, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.
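
To make the point about mixing access styles concrete, here is a
sketch that assumes the GCC/Clang __atomic builtins (the patch itself
goes through <rte_stdatomic.h>). The demo_* names are hypothetical. The
same plain uint32_t word is updated atomically in one place and
non-atomically in another.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Atomically set bit 'nr' and report its previous value. The pointer
 * is a plain uint32_t *, not an _Atomic-qualified one.
 * Assumption: GCC/Clang builtin __atomic_fetch_or() is available.
 */
static inline bool
demo_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
			       int memory_order)
{
	assert(nr < 32);

	uint32_t mask = (uint32_t)1 << nr;
	uint32_t prev = __atomic_fetch_or(addr, mask, memory_order);

	return prev & mask;
}

/* Non-atomic clear, for phases where MT safety is not a concern. */
static inline void
demo_bit_clear32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr &= ~((uint32_t)1 << nr);
}
```

An _Atomic-typed API would require either a cast or a second set of
types to support the non-atomic path on the same object.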

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.
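
A small sketch of why an assign operation is convenient, using
hypothetical demo_* helpers (not the proposed API): copying a bit
between words becomes a single call instead of an if/else at every call
site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline bool
demo_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & ((uint32_t)1 << nr);
}

/* Set bit 'nr' if 'value' is true, clear it otherwise. */
static inline void
demo_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	assert(nr < 32);

	uint32_t mask = (uint32_t)1 << nr;

	if (value)
		*addr |= mask;
	else
		*addr &= ~mask;
}

/* Copy bit 'nr' from *src to *dst, leaving the other bits untouched. */
static inline void
demo_bit_copy32(uint32_t *dst, const uint32_t *src, unsigned int nr)
{
	demo_bit_assign32(dst, nr, demo_bit_test32(src, nr));
}
```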

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.
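
The single-evaluation property can be checked with a sketch like the
one below. The demo_* names are hypothetical; the macro has the same
shape as the generic selection macros in this series. The address
argument appears twice in the expansion, but the controlling expression
of _Generic is only inspected for its type and never evaluated.

```c
#include <assert.h>
#include <stdint.h>

static inline void
demo_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= (uint32_t)1 << nr;
}

static inline void
demo_bit_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= (uint64_t)1 << nr;
}

#define demo_bit_set(addr, nr) \
	_Generic((addr), \
		 uint32_t *: demo_bit_set32, \
		 uint64_t *: demo_bit_set64)(addr, nr)

/* Counts how many times the address argument is actually evaluated. */
static unsigned int demo_calls;

/* Return the current word and advance the cursor (a side effect). */
static uint32_t *
demo_next_word(uint32_t **cursor)
{
	demo_calls++;
	return (*cursor)++;
}

/*
 * Set bit 'nr' in each of 'n' words, passing an argument with side
 * effects to the macro. Returns the number of demo_next_word() calls.
 */
static unsigned int
demo_set_bit_in_all(uint32_t *words, unsigned int n, unsigned int nr)
{
	uint32_t *cursor = words;
	unsigned int i;

	demo_calls = 0;
	for (i = 0; i < n; i++)
		demo_bit_set(demo_next_word(&cursor), nr);

	return demo_calls;
}
```

If the macro evaluated its address argument twice, the call count would
be 2 * n and every other word would be skipped.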

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions.

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       | 410 +++++++++++++++-
 lib/eal/include/rte_bitops.h | 884 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1276 insertions(+), 18 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v6 1/6] eal: extend bit manipulation functionality
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                                       ` (4 subsequent siblings)
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test and modify the value of individual bits in
32-bit or 64-bit words.

These functions have no implications for memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

RFC v6:
 * Have rte_bit_test() accept const-marked bitsets.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 259 ++++++++++++++++++++++++++++++++++-
 1 file changed, 257 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..3297133e22 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,196 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		uint32_t *: __rte_bit_test32,			\
+		const uint32_t *: __rte_bit_test32,		\
+		uint64_t *: __rte_bit_test64,			\
+		const uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## family ## fun ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(family, fun, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+#define __RTE_GEN_BIT_ASSIGN(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## family ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## family ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## family ## test ## size(addr, nr);	\
+		__rte_bit_ ## family ## assign ## size(addr, nr, !value); \
+	}
+
+__RTE_GEN_BIT_TEST(, test,, 32)
+__RTE_GEN_BIT_SET(, set,, 32)
+__RTE_GEN_BIT_CLEAR(, clear,, 32)
+__RTE_GEN_BIT_ASSIGN(, assign,, 32)
+__RTE_GEN_BIT_FLIP(, flip,, 32)
+
+__RTE_GEN_BIT_TEST(, test,, 64)
+__RTE_GEN_BIT_SET(, set,, 64)
+__RTE_GEN_BIT_CLEAR(, clear,, 64)
+__RTE_GEN_BIT_ASSIGN(, assign,, 64)
+__RTE_GEN_BIT_FLIP(, flip,, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +980,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name, arg2_type arg2_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v6 2/6] eal: add unit tests for bit operations
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                                       ` (3 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_[test|set|clear|assign|flip]()
functions.

The tests are converted to use the test suite runner framework.

RFC v6:
 * Test rte_bit_*test() usage through const pointers.

RFC v4:
 * Remove redundant line continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 85 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..322f58c066 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,68 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %d had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+									\
+			const uint ## size ## _t *const_ptr = &word;	\
+			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
+				    reference_bit,			\
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +172,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v6 3/6] eal: add exactly-once bit access functions
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 4/6] eal: add unit tests for " Mattias Rönnblom
                                       ` (2 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add test/set/clear/assign/flip functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.

RFC v6:
 * Have rte_bit_once_test() accept const-marked bitsets.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 197 +++++++++++++++++++++++++++++++++++
 1 file changed, 197 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3297133e22..caec4f36bb 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -226,6 +226,179 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set() and
+ * rte_bit_test() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test(addr, 17) was used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to
+ * memory ordering or atomicity.
+ *
+ * The regular bit set operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)					\
+	_Generic((addr),						\
+		uint32_t *: __rte_bit_once_test32,			\
+		const uint32_t *: __rte_bit_once_test32,		\
+		uint64_t *: __rte_bit_once_test64,			\
+		const uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '1'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit set operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to '0'
+ * exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Set bit specified by @c nr in the word pointed to by @c addr to the
+ * value indicated by @c value exactly once.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit assign operation.
+ *
+ * This function does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip bit in word, reading and writing exactly once.
+ *
+ * Change the value of a bit to '0' if '1' or '1' if '0' in a 32-bit
+ * or 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This function is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit flip operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_flip(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_flip32,		\
+		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -298,6 +471,18 @@ __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
+__RTE_GEN_BIT_TEST(once_, test, volatile, 32)
+__RTE_GEN_BIT_SET(once_, set, volatile, 32)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 32)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 32)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 32)
+
+__RTE_GEN_BIT_TEST(once_, test, volatile, 64)
+__RTE_GEN_BIT_SET(once_, set, volatile, 64)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -993,6 +1178,12 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_once_test
+#undef rte_bit_once_set
+#undef rte_bit_once_clear
+#undef rte_bit_once_assign
+#undef rte_bit_once_flip
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1042,6 +1233,12 @@ __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_2R(once_test, const volatile, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v6 4/6] eal: add unit tests for exactly-once bit access functions
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
                                       ` (2 preceding siblings ...)
  2024-05-02  5:57                     ` [RFC v6 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 5/6] eal: add atomic bit operations Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_once_*() family of functions.

RFC v5:
 * Atomic bit op implementation moved from this patch to the proper
   patch in the series. (Morten Brørup)

RFC v4:
 * Remove redundant continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 322f58c066..9bffc4da14 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -61,6 +61,14 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access32, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -177,6 +185,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [RFC v6 5/6] eal: add atomic bit operations
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
                                       ` (3 preceding siblings ...)
  2024-05-02  5:57                     ` [RFC v6 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  2024-05-03  6:41                       ` Mattias Rönnblom
  2024-05-02  5:57                     ` [RFC v6 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign/flip functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 * Add rte_bit_atomic_test_and_assign() (for consistency).
 * Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 * Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 428 +++++++++++++++++++++++++++++++++++
 1 file changed, 428 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index caec4f36bb..9cde982113 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -399,6 +400,202 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_flip32,		\
 		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '1', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to '0', with the memory ordering as specified by @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Atomically set bit specified by @c nr in the word pointed to by @c
+ * addr to the value indicated by @c value, with the memory ordering
+ * as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Atomically negate the value of the bit specified by @c nr in the
+ * word pointed to by @c addr, with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Atomically test and set bit specified by @c nr in the word pointed
+ * to by @c addr to '1', with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Atomically test and clear bit specified by @c nr in the word
+ * pointed to by @c addr to '0', with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Atomically test and assign bit specified by @c nr in the word
+ * pointed to by @c addr to the value specified by @c value, with the
+ * memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use. See <rte_stdatomic.h> for details.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -483,6 +680,162 @@ __RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
 __RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
 __RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_assign ## size(&target, nr, value);	\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				memory_order,				\
+				rte_memory_order_relaxed));		\
+		return __rte_bit_test ## size(&before, nr);		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t before;				\
+		uint ## size ## _t target;				\
+									\
+		before = rte_atomic_load_explicit(a_addr,		\
+						  rte_memory_order_relaxed); \
+									\
+		do {							\
+			target = before;				\
+			__rte_bit_flip ## size(&target, nr);		\
+		} while (!rte_atomic_compare_exchange_weak_explicit(	\
+				a_addr, &before, target,		\
+				memory_order,				\
+				rte_memory_order_relaxed));		\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear32(uint32_t *addr, unsigned int nr,
+				int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign32(addr, nr, false,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, true,
+						  memory_order);
+}
+
+__rte_experimental
+static inline bool
+__rte_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
+			      int memory_order)
+{
+	return __rte_bit_atomic_test_and_assign64(addr, nr, false,
+						  memory_order);
+}
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1184,6 +1537,14 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_once_assign
 #undef rte_bit_once_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1227,6 +1588,59 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
@@ -1239,6 +1653,20 @@ __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [RFC v6 6/6] eal: add unit tests for atomic bit access functions
  2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
                                       ` (4 preceding siblings ...)
  2024-05-02  5:57                     ` [RFC v6 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-05-02  5:57                     ` Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-02  5:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_atomic_*() family of
functions.

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 315 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 314 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 9bffc4da14..c86d7e1f77 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -69,6 +72,304 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
 		    rte_bit_once_clear, rte_bit_once_assign,
 		    rte_bit_once_flip, rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -187,6 +488,18 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_once_access32),
 		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1



* Re: [RFC v6 5/6] eal: add atomic bit operations
  2024-05-02  5:57                     ` [RFC v6 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-05-03  6:41                       ` Mattias Rönnblom
  2024-05-03 23:30                         ` Tyler Retzlaff
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-03  6:41 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff, Morten Brørup

On 2024-05-02 07:57, Mattias Rönnblom wrote:
> Add atomic bit test/set/clear/assign/flip and
> test-and-set/clear/assign/flip functions.
> 
> All atomic bit functions allow (and indeed, require) the caller to
> specify a memory order.
> 
> RFC v6:
>   * Have rte_bit_atomic_test() accept const-marked bitsets.
> 
> RFC v4:
>   * Add atomic bit flip.
>   * Mark macro-generated private functions experimental.
> 
> RFC v3:
>   * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> 
> RFC v2:
>   o Add rte_bit_atomic_test_and_assign() (for consistency).
>   o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
>   o Use <rte_stdatomics.h> to support MSVC.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>   lib/eal/include/rte_bitops.h | 428 +++++++++++++++++++++++++++++++++++
>   1 file changed, 428 insertions(+)
> 
> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> index caec4f36bb..9cde982113 100644
> --- a/lib/eal/include/rte_bitops.h
> +++ b/lib/eal/include/rte_bitops.h
> @@ -21,6 +21,7 @@
>   
>   #include <rte_compat.h>
>   #include <rte_debug.h>
> +#include <rte_stdatomic.h>
>   
>   #ifdef __cplusplus
>   extern "C" {
> @@ -399,6 +400,202 @@ extern "C" {
>   		 uint32_t *: __rte_bit_once_flip32,		\
>   		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
>   
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice.
> + *
> + * Test if a particular bit in a word is set with a particular memory
> + * order.
> + *
> + * Test a bit with the resulting memory load ordered as per the
> + * specified memory order.
> + *
> + * @param addr
> + *   A pointer to the word to query.
> + * @param nr
> + *   The index of the bit.
> + * @param memory_order
> + *   The memory order to use. See <rte_stdatomics.h> for details.
> + * @return
> + *   Returns true if the bit is set, and false otherwise.
> + */
> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
> +	_Generic((addr),						\
> +		 uint32_t *: __rte_bit_atomic_test32,			\
> +		 const uint32_t *: __rte_bit_atomic_test32,		\
> +		 uint64_t *: __rte_bit_atomic_test64,			\
> +		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
> +							    memory_order)

Should __rte_bit_atomic_test32()'s addr parameter be marked volatile,
and two volatile-marked branches added to the above list? Both the
C11-style GCC built-ins and the C11-proper atomic functions have
addresses marked volatile. The Linux kernel and the old __sync GCC
built-ins, on the other hand, don't (although I think you still get
volatile semantics). The only point of "volatile", as far as I can see,
is to avoid warnings in case the user passes a volatile-marked pointer.
The drawback is that *you're asking for volatile semantics*, although
with the current compilers, it seems like that is what you get
regardless of whether you asked for it or not.

Just to be clear: even if these functions were to accept
volatile-marked pointers, non-volatile pointers should be accepted as
well (and should generally be preferred).
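
A rough sketch of what I have in mind, outside of DPDK: the helper
takes a const volatile pointer, and the _Generic list gains
volatile-marked branches, so that plain, const and volatile pointers
are all accepted. The sketch_*() names are made up for the example,
and the GCC/Clang __atomic_load_n() built-in stands in for
rte_atomic_load_explicit(), so treat this as an assumption-laden
illustration rather than a proposed patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; addr is const volatile to accept any qualifier. */
static inline bool
sketch_bit_atomic_test32(const volatile uint32_t *addr, unsigned int nr,
			 int memory_order)
{
	assert(nr < 32);
	/* GCC/Clang built-in, used here instead of the DPDK wrapper. */
	return __atomic_load_n(addr, memory_order) & ((uint32_t)1 << nr);
}

static inline bool
sketch_bit_atomic_test64(const volatile uint64_t *addr, unsigned int nr,
			 int memory_order)
{
	assert(nr < 64);
	return __atomic_load_n(addr, memory_order) & ((uint64_t)1 << nr);
}

/* One branch per qualifier combination; all map to the same helper. */
#define sketch_bit_atomic_test(addr, nr, memory_order)			\
	_Generic((addr),						\
		 uint32_t *: sketch_bit_atomic_test32,			\
		 const uint32_t *: sketch_bit_atomic_test32,		\
		 volatile uint32_t *: sketch_bit_atomic_test32,		\
		 const volatile uint32_t *: sketch_bit_atomic_test32,	\
		 uint64_t *: sketch_bit_atomic_test64,			\
		 const uint64_t *: sketch_bit_atomic_test64,		\
		 volatile uint64_t *: sketch_bit_atomic_test64,		\
		 const volatile uint64_t *: sketch_bit_atomic_test64)(	\
			 addr, nr, memory_order)
```

Without the volatile branches, passing a volatile-marked pointer to
the macro fails to compile, since _Generic finds no matching
association.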

Isn't parallel programming in C lovely.

<snip>


* Re: [RFC v6 5/6] eal: add atomic bit operations
  2024-05-03  6:41                       ` Mattias Rönnblom
@ 2024-05-03 23:30                         ` Tyler Retzlaff
  2024-05-04 15:36                           ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Tyler Retzlaff @ 2024-05-03 23:30 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Mattias Rönnblom, dev, Heng Wang, Stephen Hemminger,
	Morten Brørup

On Fri, May 03, 2024 at 08:41:09AM +0200, Mattias Rönnblom wrote:
> On 2024-05-02 07:57, Mattias Rönnblom wrote:
> >Add atomic bit test/set/clear/assign/flip and
> >test-and-set/clear/assign/flip functions.
> >
> >All atomic bit functions allow (and indeed, require) the caller to
> >specify a memory order.
> >
> >RFC v6:
> >  * Have rte_bit_atomic_test() accept const-marked bitsets.
> >
> >RFC v4:
> >  * Add atomic bit flip.
> >  * Mark macro-generated private functions experimental.
> >
> >RFC v3:
> >  * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> >
> >RFC v2:
> >  o Add rte_bit_atomic_test_and_assign() (for consistency).
> >  o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
> >  o Use <rte_stdatomics.h> to support MSVC.
> >
> >Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >---
> >  lib/eal/include/rte_bitops.h | 428 +++++++++++++++++++++++++++++++++++
> >  1 file changed, 428 insertions(+)
> >
> >diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
> >index caec4f36bb..9cde982113 100644
> >--- a/lib/eal/include/rte_bitops.h
> >+++ b/lib/eal/include/rte_bitops.h
> >@@ -21,6 +21,7 @@
> >  #include <rte_compat.h>
> >  #include <rte_debug.h>
> >+#include <rte_stdatomic.h>
> >  #ifdef __cplusplus
> >  extern "C" {
> >@@ -399,6 +400,202 @@ extern "C" {
> >  		 uint32_t *: __rte_bit_once_flip32,		\
> >  		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
> >+/**
> >+ * @warning
> >+ * @b EXPERIMENTAL: this API may change without prior notice.
> >+ *
> >+ * Test if a particular bit in a word is set with a particular memory
> >+ * order.
> >+ *
> >+ * Test a bit with the resulting memory load ordered as per the
> >+ * specified memory order.
> >+ *
> >+ * @param addr
> >+ *   A pointer to the word to query.
> >+ * @param nr
> >+ *   The index of the bit.
> >+ * @param memory_order
> >+ *   The memory order to use. See <rte_stdatomics.h> for details.
> >+ * @return
> >+ *   Returns true if the bit is set, and false otherwise.
> >+ */
> >+#define rte_bit_atomic_test(addr, nr, memory_order)			\
> >+	_Generic((addr),						\
> >+		 uint32_t *: __rte_bit_atomic_test32,			\
> >+		 const uint32_t *: __rte_bit_atomic_test32,		\
> >+		 uint64_t *: __rte_bit_atomic_test64,			\
> >+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
> >+							    memory_order)
> 
> Should __rte_bit_atomic_test32()'s addr parameter be marked
> volatile, and two volatile-marked branches added to the above list?

off-topic comment relating to the generic type selection list above, i was
reading C17 DR481 recently and i think we may want to avoid providing
qualified and unqualified types in the list.

DR481: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2396.htm#dr_481

"the controlling expression of a generic selection shall have type
compatible with at most one of the types named in its generic
association list."

"the type of the controlling expression is the type of the expression as
if it had undergone an lvalue conversion"

"lvalue conversion drops type qualifiers"

so only the unqualified type of the controlling expression is matched
against the selection list, which i guess means the qualified entries in
the list are never selected.

i suppose the implication here is we couldn't then provide 2 inline
functions, one for volatile qualified and one for not volatile qualified.

as for a single function where the parameter is or isn't volatile
qualified. if we're always forwarding to an intrinsic i've always
assumed (perhaps incorrectly) that the intrinsic itself did what was
semantically correct even without qualification.

as you note i believe there is a convenience element in providing the
volatile qualified version since it means the function like macro /
inline function will accept both volatile qualified and unqualified
whereas if we did not qualify the parameter it would require the
caller/user to strip the volatile qualification if present with casts.

i imagine in most cases we are just forwarding, in which case it seems
not horrible to provide the qualified version.

> Both the C11-style GCC built-ins and the C11-proper atomic functions
> have addresses marked volatile. The Linux kernel and the old __sync
> GCC built-ins on the other hand, doesn't (although I think you still
> get volatile semantics). The only point of "volatile", as far as I
> can see, is to avoid warnings in case the user passed a
> volatile-marked pointer. The drawback is that *you're asking for
> volatile semantics*, although with the current compilers, it seems
> like that is what you get, regardless if you asked for it or not.
> 
> Just to be clear: even these functions would accept volatile-marked
> pointers, non-volatile pointers should be accepted as well (and
> should generally be preferred).
> 
> Isn't parallel programming in C lovely.

it's super!

> 
> <snip>


* Re: [RFC v6 5/6] eal: add atomic bit operations
  2024-05-03 23:30                         ` Tyler Retzlaff
@ 2024-05-04 15:36                           ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-04 15:36 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Mattias Rönnblom, dev, Heng Wang, Stephen Hemminger,
	Morten Brørup

On 2024-05-04 01:30, Tyler Retzlaff wrote:
> On Fri, May 03, 2024 at 08:41:09AM +0200, Mattias Rönnblom wrote:
>> On 2024-05-02 07:57, Mattias Rönnblom wrote:
>>> Add atomic bit test/set/clear/assign/flip and
>>> test-and-set/clear/assign/flip functions.
>>>
>>> All atomic bit functions allow (and indeed, require) the caller to
>>> specify a memory order.
>>>
>>> RFC v6:
>>>   * Have rte_bit_atomic_test() accept const-marked bitsets.
>>>
>>> RFC v4:
>>>   * Add atomic bit flip.
>>>   * Mark macro-generated private functions experimental.
>>>
>>> RFC v3:
>>>   * Work around lack of C++ support for _Generic (Tyler Retzlaff).
>>>
>>> RFC v2:
>>>   o Add rte_bit_atomic_test_and_assign() (for consistency).
>>>   o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
>>>   o Use <rte_stdatomics.h> to support MSVC.
>>>
>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>>> ---
>>>   lib/eal/include/rte_bitops.h | 428 +++++++++++++++++++++++++++++++++++
>>>   1 file changed, 428 insertions(+)
>>>
>>> diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
>>> index caec4f36bb..9cde982113 100644
>>> --- a/lib/eal/include/rte_bitops.h
>>> +++ b/lib/eal/include/rte_bitops.h
>>> @@ -21,6 +21,7 @@
>>>   #include <rte_compat.h>
>>>   #include <rte_debug.h>
>>> +#include <rte_stdatomic.h>
>>>   #ifdef __cplusplus
>>>   extern "C" {
>>> @@ -399,6 +400,202 @@ extern "C" {
>>>   		 uint32_t *: __rte_bit_once_flip32,		\
>>>   		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
>>> +/**
>>> + * @warning
>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>> + *
>>> + * Test if a particular bit in a word is set with a particular memory
>>> + * order.
>>> + *
>>> + * Test a bit with the resulting memory load ordered as per the
>>> + * specified memory order.
>>> + *
>>> + * @param addr
>>> + *   A pointer to the word to query.
>>> + * @param nr
>>> + *   The index of the bit.
>>> + * @param memory_order
>>> + *   The memory order to use. See <rte_stdatomics.h> for details.
>>> + * @return
>>> + *   Returns true if the bit is set, and false otherwise.
>>> + */
>>> +#define rte_bit_atomic_test(addr, nr, memory_order)			\
>>> +	_Generic((addr),						\
>>> +		 uint32_t *: __rte_bit_atomic_test32,			\
>>> +		 const uint32_t *: __rte_bit_atomic_test32,		\
>>> +		 uint64_t *: __rte_bit_atomic_test64,			\
>>> +		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
>>> +							    memory_order)
>>
>> Should __rte_bit_atomic_test32()'s addr parameter be marked
>> volatile, and two volatile-marked branches added to the above list?
> 
> off-topic comment relating to the generic type selection list above, i was
> reading C17 DR481 recently and i think we may want to avoid providing
> qualified and unauqlified types in the list.
> 
> DR481: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n2396.htm#dr_481
> 
> "the controlling expression of a generic selection shall have type
> compatibile with at most one of the types named in its generic
> association list."
> 

Const and unqualified pointers are not compatible. Without the "const 
uint32_t *" element in the association list, passing const-qualified 
pointers to rte_bit_test() will cause a compiler error.

So, if you want to support both passing const-qualified and unqualified 
pointers (which you do, obviously), then you have no other option than 
to treat them separately.

GCC, clang and ICC all seem to agree on this. The standard also is 
pretty clear on this, from what I can tell. "No two generic associations 
in the same generic selection shall specify compatible types." (6.5.1.1, 
note *compatible*). "For two pointer types to be compatible, both shall 
be identically qualified and both shall be pointers to compatible 
types." (6.7.6.1)
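
A small sketch of the point about compatibility, with hypothetical
names (sketch_ptr_kind() is not part of the patch). The qualifier sits
on the pointed-to type, so it is not removed by lvalue conversion of
the pointer itself, and the two associations select different branches.

```c
#include <stdint.h>

/* Hypothetical macro: reports which association a pointer selects. */
#define sketch_ptr_kind(p)			\
	_Generic((p),				\
		 uint32_t *: 1,			\
		 const uint32_t *: 2)

/* Pointer to unqualified word: selects the first association. */
static inline int
sketch_kind_unqualified(void)
{
	uint32_t word = 0;
	uint32_t *ptr = &word;

	return sketch_ptr_kind(ptr);
}

/* Pointer to const-qualified word: selects the second association. */
static inline int
sketch_kind_const(void)
{
	uint32_t word = 0;
	const uint32_t *ptr = &word;

	return sketch_ptr_kind(ptr);
}
```

Dropping the const association from the list would make the second
function fail to compile, since there is no default branch.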

> "the type of the controlling expression is the type of the expression as
> if it had undergone an lvalue conversion"
> 
> "lvalue conversion drops type qualifiers"
> 
> so the unqualified type of the controlling expression is only matched
> selection list which i guess that means the qualified entries in the
> list are never selected.
> 
> i suppose the implication here is we couldn't then provide 2 inline
> functions one for volatile qualified and for not volatile qualified.
> 
> as for a single function where the parameter is or isn't volatile
> qualified. if we're always forwarding to an intrinsic i've always
> assumed (perhaps incorrectly) that the intrinsic itself did what was
> semantically correct even without qualification.
> 
> as you note i believe there is a convenience element in providing the
> volatile qualified version since it means the function like macro /
> inline function will accept both volatile qualified and unqualified
> whereas if we did not qualify the parameter it would require the
> caller/user to strip the volatile qualification if present with casts.
> 
> i imagine in most cases we are just forwarding, in which case it seems
> not horrible to provide the qualified version.
> 
>> Both the C11-style GCC built-ins and the C11-proper atomic functions
>> have addresses marked volatile. The Linux kernel and the old __sync
>> GCC built-ins on the other hand, doesn't (although I think you still
>> get volatile semantics). The only point of "volatile", as far as I
>> can see, is to avoid warnings in case the user passed a
>> volatile-marked pointer. The drawback is that *you're asking for
>> volatile semantics*, although with the current compilers, it seems
>> like that is what you get, regardless if you asked for it or not.
>>
>> Just to be clear: even these functions would accept volatile-marked
>> pointers, non-volatile pointers should be accepted as well (and
>> should generally be preferred).
>>
>> Isn't parallel programming in C lovely.
> 
> it's super!
> 
>>
>> <snip>


* [RFC v7 0/6] Improve EAL bit operations API
  2024-05-02  5:57                     ` [RFC v6 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-05-05  8:37                       ` Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
                                           ` (5 more replies)
  0 siblings, 6 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign|flip]() which provide no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but do provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant.

rte_bit_once_*() which guarantee that program-level loads and stores
actually occur (i.e., prevent certain optimizations). The primary
use of these functions is in the context of memory mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provide atomic bit-level operations,
including the possibility to specify memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat and also prevent the user from doing
atomic operations against a bit set in some situations, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.
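
To illustrate mixing the two kinds of access on one word, here is a
rough sketch. The sketch_* names are hypothetical, and the GCC/clang
__atomic_fetch_or builtin is used directly as an assumption; it is not
the implementation in this patch set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Non-atomic set: fine while only one thread touches the word. */
static inline void
sketch_plain_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

/*
 * Atomic test-and-set on the very same plain uint32_t. Returns the
 * previous value of the bit. Uses a GCC/clang builtin (an assumption).
 */
static inline bool
sketch_atomic_test_and_set32(uint32_t *addr, unsigned int nr)
{
	uint32_t mask = UINT32_C(1) << nr;
	uint32_t prev = __atomic_fetch_or(addr, mask, __ATOMIC_RELAXED);

	return (prev & mask) != 0;
}

/* Single-threaded setup first, atomic updates afterwards. */
static inline uint32_t
sketch_mixed_demo(void)
{
	uint32_t word = 0;

	sketch_plain_set32(&word, 3);			/* setup phase */
	(void)sketch_atomic_test_and_set32(&word, 5);	/* MT-safe phase */

	return word;	/* bits 3 and 5 are now set */
}
```

Since the word is a plain uint32_t, no second "atomic" type and no
casts are needed to switch between the two styles.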

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.
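
As a rough sketch of the bool-based semantics (hypothetical sketch_*
names, 32-bit only, with assert() standing in for RTE_ASSERT()):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Test a bit; the result is a bool rather than a masked word. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);

	return (*addr & (UINT32_C(1) << nr)) != 0;
}

/* Assign 'value' to bit 'nr': set the bit if true, clear it if false. */
static inline void
sketch_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	assert(nr < 32);

	uint32_t mask = UINT32_C(1) << nr;

	if (value)
		*addr |= mask;
	else
		*addr &= ~mask;
}

/* Set bits 0 and 7, then clear bit 0 again. */
static inline uint32_t
sketch_assign_demo(void)
{
	uint32_t word = 0;

	sketch_bit_assign32(&word, 0, true);
	sketch_bit_assign32(&word, 7, true);
	sketch_bit_assign32(&word, 0, false);

	return word;	/* only bit 7 remains set */
}

/* Read back bit 7 of the demo word as a bool. */
static inline bool
sketch_assign_demo_bit7(void)
{
	uint32_t word = sketch_assign_demo();

	return sketch_bit_test32(&word, 7);
}
```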

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.
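
The single-evaluation property can be sketched as follows, using
hypothetical sketch_* names that mirror the shape of the macros in this
series. The controlling expression of _Generic is not evaluated, so
each argument is evaluated only in the final call.

```c
#include <stdint.h>

static inline void
sketch_eval_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

static inline void
sketch_eval_set64(uint64_t *addr, unsigned int nr)
{
	*addr |= UINT64_C(1) << nr;
}

/* _Generic only selects the function; the arguments appear once after it. */
#define sketch_eval_set(addr, nr)			\
	_Generic((addr),				\
		 uint32_t *: sketch_eval_set32,		\
		 uint64_t *: sketch_eval_set64)(addr, nr)

/* Returns the counter after one macro invocation with a side effect. */
static inline unsigned int
sketch_eval_count(void)
{
	uint32_t word = 0;
	unsigned int nr = 0;

	sketch_eval_set(&word, nr++);

	return nr;
}

/* Returns the word after setting bit 0 via a post-incremented index. */
static inline uint32_t
sketch_eval_word(void)
{
	uint32_t word = 0;
	unsigned int nr = 0;

	sketch_eval_set(&word, nr++);

	return word;
}
```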

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions.

Mattias Rönnblom (6):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add exactly-once bit access functions
  eal: add unit tests for exactly-once bit access functions
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions

 app/test/test_bitops.c       | 410 +++++++++++++++-
 lib/eal/include/rte_bitops.h | 873 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1265 insertions(+), 18 deletions(-)

-- 
2.34.1



* [RFC v7 1/6] eal: extend bit manipulation functionality
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  2024-08-09  9:04                           ` [PATCH 0/5] Improve EAL bit operations API Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 2/6] eal: add unit tests for bit operations Mattias Rönnblom
                                           ` (4 subsequent siblings)
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test and modify the value of individual bits in
32-bit or 64-bit words.

These functions have no implications for memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

RFC v6:
 * Have rte_bit_test() accept const-marked bitsets.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 259 ++++++++++++++++++++++++++++++++++-
 1 file changed, 257 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..3297133e22 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,196 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		uint32_t *: __rte_bit_test32,			\
+		const uint32_t *: __rte_bit_test32,		\
+		uint64_t *: __rte_bit_test64,			\
+		const uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## family ## fun ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(family, fun, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+#define __RTE_GEN_BIT_ASSIGN(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## family ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## family ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## family ## test ## size(addr, nr);	\
+		__rte_bit_ ## family ## assign ## size(addr, nr, !value); \
+	}
+
+__RTE_GEN_BIT_TEST(, test,, 32)
+__RTE_GEN_BIT_SET(, set,, 32)
+__RTE_GEN_BIT_CLEAR(, clear,, 32)
+__RTE_GEN_BIT_ASSIGN(, assign,, 32)
+__RTE_GEN_BIT_FLIP(, flip,, 32)
+
+__RTE_GEN_BIT_TEST(, test,, 64)
+__RTE_GEN_BIT_SET(, set,, 64)
+__RTE_GEN_BIT_CLEAR(, clear,, 64)
+__RTE_GEN_BIT_ASSIGN(, assign,, 64)
+__RTE_GEN_BIT_FLIP(, flip,, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +980,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [RFC v7 2/6] eal: add unit tests for bit operations
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
                                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_[test|set|clear|assign|flip]()
functions.

The tests are converted to use the test suite runner framework.

RFC v6:
 * Test rte_bit_*test() usage through const pointers.

RFC v4:
 * Remove redundant line continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 85 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..322f58c066 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,68 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %d had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+									\
+			const uint ## size ## _t *const_ptr = &word;	\
+			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
+				    reference_bit,			\
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +172,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1



* [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 2/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  2024-05-07 19:17                           ` Morten Brørup
  2024-05-05  8:37                         ` [RFC v7 4/6] eal: add unit tests for " Mattias Rönnblom
                                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add test/set/clear/assign/flip functions which prevent certain
compiler optimizations and guarantee that program-level memory loads
and/or stores will actually occur.

These functions are useful when interacting with memory-mapped
hardware devices.

The "once" family of functions does not promise atomicity and provides
no memory ordering guarantees beyond the C11 relaxed memory model.
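
As an illustration only, the following is a minimal, self-contained
sketch of what "exactly once" access means in practice. It assumes
volatile-qualified accesses are the mechanism (mirroring the volatile
qualifier passed to the __RTE_GEN_BIT_*() generators in this patch).
The sketch_*() names and the polled status word are hypothetical and
not part of the API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a 32-bit "once" test function. */
static inline bool
sketch_bit_once_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	/* The volatile access forces one load per call. */
	return (*addr >> nr) & 1;
}

/* Hypothetical stand-in for a 32-bit "once" set function. */
static inline void
sketch_bit_once_set32(volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	/* One load followed by one store; not an atomic update. */
	uint32_t word = *addr;
	*addr = word | (UINT32_C(1) << nr);
}

/*
 * Poll a (hypothetical) status word until bit 'nr' is set, giving up
 * after 'max_polls' loads. Every iteration re-reads the word, since
 * the compiler may not hoist the volatile load out of the loop.
 */
static inline bool
sketch_wait_for_bit(const volatile uint32_t *status, unsigned int nr,
		    unsigned int max_polls)
{
	while (max_polls-- > 0)
		if (sketch_bit_once_test32(status, nr))
			return true;
	return false;
}
```

With a non-volatile pointer, the compiler would be free to load the
word once and never re-read it inside the loop.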

RFC v7:
 * Fix various minor issues in documentation.

RFC v6:
 * Have rte_bit_once_test() accept const-marked bitsets.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
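
For readers unfamiliar with the C11 generic selection used here, the
sketch below shows the dispatch pattern in isolation, including the
const-pointer associations added in RFC v6. It is a simplified
illustration, not the patch's implementation; the sketch_*() names are
hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-size helpers, named for this sketch only. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr >> nr) & 1;
}

static inline bool
sketch_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return (*addr >> nr) & 1;
}

/*
 * Select the helper from the pointer type of 'addr'. Both the const
 * and non-const pointer types need their own association, since
 * _Generic matches on the exact type of the controlling expression.
 */
#define sketch_bit_test(addr, nr)				\
	_Generic((addr),					\
		 uint32_t *: sketch_bit_test32,			\
		 const uint32_t *: sketch_bit_test32,		\
		 uint64_t *: sketch_bit_test64,			\
		 const uint64_t *: sketch_bit_test64)(addr, nr)
```

Passing a pointer of any other type (e.g., uint16_t *) fails at
compile time, because no association matches.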

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 201 +++++++++++++++++++++++++++++++++++
 1 file changed, 201 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3297133e22..3644aa115c 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -226,6 +226,183 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Generic selection macro to test exactly once the value of a bit in
+ * a 32-bit or 64-bit word. The type of operation depends on the type
+ * of the @c addr parameter.
+ *
+ * rte_bit_once_test() is guaranteed to result in exactly one memory
+ * load (e.g., it may not be eliminated or merged by the compiler).
+ *
+ * \code{.c}
+ * rte_bit_once_set(addr, 17);
+ * if (rte_bit_once_test(addr, 17)) {
+ *     ...
+ * }
+ * \endcode
+ *
+ * In the above example, rte_bit_once_set() may not be removed by
+ * the compiler, which would be allowed in case rte_bit_set() and
+ * rte_bit_test() were used.
+ *
+ * \code{.c}
+ * while (rte_bit_once_test(addr, 17))
+ *     ;
+ * \endcode
+ *
+ * In case rte_bit_test(addr, 17) was used instead, the resulting
+ * object code could (and in many cases would) be replaced with
+ * the equivalent of
+ * \code{.c}
+ * if (rte_bit_test(addr, 17)) {
+ *   for (;;) // spin forever
+ *       ;
+ * }
+ * \endcode
+ *
+ * rte_bit_once_test() does not give any guarantees in regards to
+ * memory ordering or atomicity.
+ *
+ * The regular bit set operations (e.g., rte_bit_test()) should be
+ * preferred over the "once" family of operations (e.g.,
+ * rte_bit_once_test()) if possible, since the latter may prevent
+ * optimizations crucial for run-time performance.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+
+#define rte_bit_once_test(addr, nr)					\
+	_Generic((addr),						\
+		uint32_t *: __rte_bit_once_test32,			\
+		const uint32_t *: __rte_bit_once_test32,		\
+		uint64_t *: __rte_bit_once_test64,			\
+		const uint64_t *: __rte_bit_once_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word exactly once.
+ *
+ * Generic selection macro to set bit specified by @c nr in the word
+ * pointed to by @c addr to '1' exactly once.
+ *
+ * rte_bit_once_set() is guaranteed to result in exactly one memory
+ * load and exactly one memory store, *or* an atomic bit set
+ * operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+
+#define rte_bit_once_set(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_set32,		\
+		 uint64_t *: __rte_bit_once_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word exactly once.
+ *
+ * Generic selection macro to set bit specified by @c nr in the word
+ * pointed to by @c addr to '0' exactly once.
+ *
+ * rte_bit_once_clear() is guaranteed to result in exactly one memory load
+ * and exactly one memory store, *or* an atomic bit clear operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for
+ * the "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_clear(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_clear32,		\
+		 uint64_t *: __rte_bit_once_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to bit in a word exactly once.
+ *
+ * Generic selection macro to set bit specified by @c nr in the word
+ * pointed to by @c addr to the value indicated by @c value exactly
+ * once.
+ *
+ * rte_bit_once_assign() is guaranteed to result in exactly one memory
+ * load and exactly one memory store, *or* an atomic bit assign
+ * operation.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_once_assign(addr, nr, value)				\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_once_assign32,			\
+		 uint64_t *: __rte_bit_once_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip bit in word, reading and writing exactly once.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * rte_bit_once_flip() is guaranteed to result in exactly one memory
+ * load and exactly one memory store, *or* an atomic bit flip
+ * operation.
+ *
+ * See rte_bit_once_test() for more information and use cases for the
+ * "once" class of functions.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_once_flip(addr, nr)				\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_once_flip32,		\
+		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -298,6 +475,18 @@ __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
+__RTE_GEN_BIT_TEST(once_, test, volatile, 32)
+__RTE_GEN_BIT_SET(once_, set, volatile, 32)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 32)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 32)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 32)
+
+__RTE_GEN_BIT_TEST(once_, test, volatile, 64)
+__RTE_GEN_BIT_SET(once_, set, volatile, 64)
+__RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
+__RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
+__RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -993,6 +1182,12 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_once_test
+#undef rte_bit_once_set
+#undef rte_bit_once_clear
+#undef rte_bit_once_assign
+#undef rte_bit_once_flip
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1042,6 +1237,12 @@ __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_2R(once_test, const volatile, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_set, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [RFC v7 4/6] eal: add unit tests for exactly-once bit access functions
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
                                           ` (2 preceding siblings ...)
  2024-05-05  8:37                         ` [RFC v7 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 5/6] eal: add atomic bit operations Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_once_*() family of functions.

RFC v5:
 * Atomic bit op implementation moved from this patch to the proper
   patch in the series. (Morten Brørup)

RFC v4:
 * Remove redundant continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 322f58c066..9bffc4da14 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -61,6 +61,14 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+GEN_TEST_BIT_ACCESS(test_bit_once_access32, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
+		    rte_bit_once_clear, rte_bit_once_assign,
+		    rte_bit_once_flip, rte_bit_once_test, 64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -177,6 +185,8 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1



* [RFC v7 5/6] eal: add atomic bit operations
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
                                           ` (3 preceding siblings ...)
  2024-05-05  8:37                         ` [RFC v7 4/6] eal: add unit tests for " Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  2024-05-05  8:37                         ` [RFC v7 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign/flip functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
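
As a rough illustration of the approach, the sketch below implements
test-and-set and test-and-clear on top of an atomic fetch operation,
deriving the result from the previous value the fetch returns. It uses
plain C11 <stdatomic.h> on an _Atomic word for brevity, whereas the
patch casts non-atomic pointers and goes through <rte_stdatomic.h>.
The sketch_*() names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically set bit 'nr'; return whether it was set beforehand. */
static inline bool
sketch_atomic_test_and_set32(_Atomic uint32_t *addr, unsigned int nr,
			     memory_order order)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;
	/* fetch-or returns the word as it was before the update. */
	uint32_t prev = atomic_fetch_or_explicit(addr, mask, order);

	return prev & mask;
}

/* Atomically clear bit 'nr'; return whether it was set beforehand. */
static inline bool
sketch_atomic_test_and_clear32(_Atomic uint32_t *addr, unsigned int nr,
			       memory_order order)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;
	/* fetch-and with the inverted mask clears only bit 'nr'. */
	uint32_t prev = atomic_fetch_and_explicit(addr, ~mask, order);

	return prev & mask;
}
```

Since the fetch operation is a single atomic read-modify-write, no
compare-exchange retry loop is needed to learn the old bit value.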

RFC v7:
 * Replace compare-exchange-based rte_bit_atomic_test_and_*() and
   flip() with implementations that use the previous value as returned
   by the atomic fetch function.
 * Reword documentation to match the non-atomic macro variants.
 * Remove pointer to <rte_stdatomic.h> for memory model documentation,
   since there is no documentation for that API.

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 o Add rte_bit_atomic_test_and_assign() (for consistency).
 o Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 o Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 413 +++++++++++++++++++++++++++++++++++
 1 file changed, 413 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3644aa115c..673b888c1a 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -403,6 +404,204 @@ extern "C" {
 		 uint32_t *: __rte_bit_once_flip32,		\
 		 uint64_t *: __rte_bit_once_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '1', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '0', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in the
+ * word pointed to by @c addr to the value indicated by @c value, with
+ * the memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Generic selection macro to atomically negate the value of the bit
+ * specified by @c nr in the word pointed to by @c addr (i.e., '0'
+ * becomes '1' and vice versa), with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Generic selection macro to atomically test and set bit specified by
+ * @c nr in the word pointed to by @c addr to '1', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Generic selection macro to atomically test and clear bit specified
+ * by @c nr in the word pointed to by @c addr to '0', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Generic selection macro to atomically test and assign bit specified
+ * by @c nr in the word pointed to by @c addr to the value specified
+ * by @c value, with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -487,6 +686,145 @@ __RTE_GEN_BIT_CLEAR(once_, clear, volatile, 64)
 __RTE_GEN_BIT_ASSIGN(once_, assign, volatile, 64)
 __RTE_GEN_BIT_FLIP(once_, flip, volatile, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_set ## size(uint ## size ## _t *addr,	\
+					      unsigned int nr,		\
+					      int memory_order)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_or_explicit(a_addr, mask,	\
+						    memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_clear ## size(uint ## size ## _t *addr,	\
+						unsigned int nr,	\
+						int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+	        prev = rte_atomic_fetch_and_explicit(a_addr, ~mask,	\
+						     memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		if (value)						\
+			return __rte_bit_atomic_test_and_set ## size(addr, nr, \
+								     memory_order); \
+		else							\
+			return __rte_bit_atomic_test_and_clear ## size(addr, nr, \
+								       memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -1188,6 +1526,14 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_once_assign
 #undef rte_bit_once_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1231,6 +1577,59 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
@@ -1243,6 +1642,20 @@ __RTE_BIT_OVERLOAD_2(once_clear, volatile, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(once_assign, volatile, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(once_flip, volatile, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [RFC v7 6/6] eal: add unit tests for atomic bit access functions
  2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
                                           ` (4 preceding siblings ...)
  2024-05-05  8:37                         ` [RFC v7 5/6] eal: add atomic bit operations Mattias Rönnblom
@ 2024-05-05  8:37                         ` Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-05  8:37 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_atomic_*() family of
functions.

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 315 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 314 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 9bffc4da14..c86d7e1f77 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -69,6 +72,304 @@ GEN_TEST_BIT_ACCESS(test_bit_once_access64, rte_bit_once_set,
 		    rte_bit_once_clear, rte_bit_once_assign,
 		    rte_bit_once_flip, rte_bit_once_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -187,6 +488,18 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_once_access32),
 		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_once_access32),
+		TEST_CASE(test_bit_once_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* RE: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-05  8:37                         ` [RFC v7 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
@ 2024-05-07 19:17                           ` Morten Brørup
  2024-05-08  6:47                             ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Morten Brørup @ 2024-05-07 19:17 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Sunday, 5 May 2024 10.38
> 
> Add test/set/clear/assign/flip functions which prevents certain
> compiler optimizations and guarantees that program-level memory loads
> and/or stores will actually occur.
> 
> These functions are useful when interacting with memory-mapped
> hardware devices.
> 
> The "once" family of functions does not promise atomicity and provides
> no memory ordering guarantees beyond the C11 relaxed memory model.

In another thread, Stephen referred to the extended discussion on memory models in Linux kernel documentation:
https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-barriers.html

Unlike the "once" family of functions in this RFC, the "once" family of functions in the kernel also guarantees memory ordering, specifically for memory-mapped hardware devices. The document describes the rationale with examples.

It makes me think that the DPDK "once" family of functions should behave similarly.
Alternatively, if the "once" family of functions cannot be generically implemented with a memory ordering that is optimal for all use cases, drop this family of functions, and instead rely on the "atomic" family of functions for interacting with memory-mapped hardware devices.

> 
> RFC v7:
>  * Fix various minor issues in documentation.
> 
> RFC v6:
>  * Have rte_bit_once_test() accept const-marked bitsets.
> 
> RFC v3:
>  * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---


^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-07 19:17                           ` Morten Brørup
@ 2024-05-08  6:47                             ` Mattias Rönnblom
  2024-05-08  7:33                               ` Morten Brørup
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-08  6:47 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

On 2024-05-07 21:17, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>> Sent: Sunday, 5 May 2024 10.38
>>
>> Add test/set/clear/assign/flip functions which prevents certain
>> compiler optimizations and guarantees that program-level memory loads
>> and/or stores will actually occur.
>>
>> These functions are useful when interacting with memory-mapped
>> hardware devices.
>>
>> The "once" family of functions does not promise atomicity and provides
>> no memory ordering guarantees beyond the C11 relaxed memory model.
> 
> In another thread, Stephen referred to the extended discussion on memory models in Linux kernel documentation:
> https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-barriers.html
> 
> Unlike the "once" family of functions in this RFC, the "once" family of functions in the kernel also guarantee memory ordering, specifically for memory-mapped hardware devices. The document describes the rationale with examples.
> 

What more specifically did you have in mind? READ_ONCE() and 
WRITE_ONCE()? They give almost no guarantees. Very much relaxed.

I've read that document.

What you should keep in mind if you read that document is that DPDK 
doesn't use the kernel's memory model, and doesn't have the kernel's 
barrier and atomics APIs. What we have is an obsolete, miniature 
look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.

My general impression is that DPDK was moving in the C11 direction 
memory model-wise, which is not the model the kernel uses.
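
To make the C11 style concrete: a minimal sketch of a bit-level
test-and-set and flip with a caller-chosen memory order, written against
plain <stdatomic.h> rather than DPDK's wrappers. The function names are
hypothetical, and unlike the RFC's rte_bit_atomic_*() functions this
sketch takes an _Atomic-qualified pointer.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically set bit 'nr' and return its old value. */
static inline bool
bit_atomic_test_and_set32(_Atomic uint32_t *addr, unsigned int nr,
			  memory_order order)
{
	uint32_t mask = UINT32_C(1) << nr;
	/* A single atomic read-modify-write; other bits are untouched. */
	uint32_t old = atomic_fetch_or_explicit(addr, mask, order);

	return (old & mask) != 0;
}

/* Hypothetical helper: atomically invert bit 'nr'. */
static inline void
bit_atomic_flip32(_Atomic uint32_t *addr, unsigned int nr,
		  memory_order order)
{
	atomic_fetch_xor_explicit(addr, UINT32_C(1) << nr, order);
}
```

Passing memory_order_relaxed gives atomicity only; stronger orders such
as memory_order_acquire or memory_order_release add ordering relative to
other memory operations.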

> It makes me think that DPDK "once" family of functions should behave similarly.

I think they do already.

Also, rte_bit_once_set() works as the kernel's __set_bit().
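
As a rough illustration of "exactly once, but not atomic": a minimal
sketch assuming a volatile-based implementation. The names once_set32()
and once_test32() are hypothetical stand-ins, not the RFC's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a "once" bit set on a 32-bit word. */
static inline void
once_set32(volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	/*
	 * One volatile load followed by one volatile store. The compiler
	 * may not elide, merge or duplicate either access, but the
	 * read-modify-write as a whole is not atomic.
	 */
	uint32_t word = *addr;
	*addr = word | (UINT32_C(1) << nr);
}

/* Hypothetical stand-in for a "once" bit test: exactly one load. */
static inline bool
once_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr >> nr) & 1;
}
```

Two threads calling once_set32() on different bits of the same word can
still lose an update, which is the gap the atomic family is meant to
close.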

> Alternatively, if the "once" family of functions cannot be generically implemented with a memory ordering that is optimal for all use cases, drop this family of functions, and instead rely on the "atomic" family of functions for interacting with memory-mapped hardware devices.
> 
>>
>> RFC v7:
>>   * Fix various minor issues in documentation.
>>
>> RFC v6:
>>   * Have rte_bit_once_test() accept const-marked bitsets.
>>
>> RFC v3:
>>   * Work around lack of C++ support for _Generic (Tyler Retzlaff).
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>> ---
> 

^ permalink raw reply	[flat|nested] 160+ messages in thread

* RE: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  6:47                             ` Mattias Rönnblom
@ 2024-05-08  7:33                               ` Morten Brørup
  2024-05-08  8:00                                 ` Mattias Rönnblom
  2024-05-08 15:15                                 ` Stephen Hemminger
  0 siblings, 2 replies; 160+ messages in thread
From: Morten Brørup @ 2024-05-08  7:33 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Wednesday, 8 May 2024 08.47
> 
> On 2024-05-07 21:17, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >> Sent: Sunday, 5 May 2024 10.38
> >>
> >> Add test/set/clear/assign/flip functions which prevents certain
> >> compiler optimizations and guarantees that program-level memory loads
> >> and/or stores will actually occur.
> >>
> >> These functions are useful when interacting with memory-mapped
> >> hardware devices.
> >>
> >> The "once" family of functions does not promise atomicity and provides
> >> no memory ordering guarantees beyond the C11 relaxed memory model.
> >
> > In another thread, Stephen referred to the extended discussion on memory models in Linux kernel documentation:
> > https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-barriers.html
> >
> > Unlike the "once" family of functions in this RFC, the "once" family of functions in the kernel also guarantee memory ordering, specifically for memory-mapped hardware devices. The document describes the rationale with examples.
> >
> 
> What more specifically did you have in mind? READ_ONCE() and
> WRITE_ONCE()? They give almost no guarantees. Very much relaxed.

The way I read it, they do provide memory ordering guarantees.

Ignoring that the kernel's "once" functions operate on words and this RFC operates on bits, the behavior is the same. Either there are memory ordering guarantees, or there are not.

> 
> I've read that document.
> 
> What you should keep in mind if you read that document, is that DPDK
> doesn't use the kernel's memory model, and doesn't have the kernel's
> barrier and atomics APIs. What we have are an obsolete, miniature
> look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.
> 
> My general impression is that DPDK was moving in the C11 direction
> memory model-wise, which is not the model the kernel uses.

I think you and I agree that using legacy methods only because "the kernel does it that way" would not be the optimal roadmap for DPDK.

We should keep moving in the C11 direction memory model-wise.
I consider it more descriptive, and thus expect compilers to eventually produce better optimized code.

> 
> > It makes me think that DPDK "once" family of functions should behave
> similarly.
> 
> I think they do already.

I haven't looked deep into it, but the RFC's documentation says otherwise:
The "once" family of functions does not promise atomicity and provides *no memory ordering* guarantees beyond the C11 relaxed memory model.

> 
> Also, rte_bit_once_set() works as the kernel's __set_bit().
> 
> > Alternatively, if the "once" family of functions cannot be generically
> implemented with a memory ordering that is optimal for all use cases, drop
> this family of functions, and instead rely on the "atomic" family of functions
> for interacting with memory-mapped hardware devices.
> >
> >>
> >> RFC v7:
> >>   * Fix various minor issues in documentation.
> >>
> >> RFC v6:
> >>   * Have rte_bit_once_test() accept const-marked bitsets.
> >>
> >> RFC v3:
> >>   * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> >>
> >> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >> ---
> >

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  7:33                               ` Morten Brørup
@ 2024-05-08  8:00                                 ` Mattias Rönnblom
  2024-05-08  8:11                                   ` Morten Brørup
  2024-05-08 15:15                                 ` Stephen Hemminger
  1 sibling, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-08  8:00 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

On 2024-05-08 09:33, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Wednesday, 8 May 2024 08.47
>>
>> On 2024-05-07 21:17, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>>>> Sent: Sunday, 5 May 2024 10.38
>>>>
>>>> Add test/set/clear/assign/flip functions which prevents certain
>>>> compiler optimizations and guarantees that program-level memory loads
>>>> and/or stores will actually occur.
>>>>
>>>> These functions are useful when interacting with memory-mapped
>>>> hardware devices.
>>>>
>>>> The "once" family of functions does not promise atomicity and provides
>>>> no memory ordering guarantees beyond the C11 relaxed memory model.
>>>
>>> In another thread, Stephen referred to the extended discussion on memory models in Linux kernel documentation:
>>> https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-barriers.html
>>>
>>> Unlike the "once" family of functions in this RFC, the "once" family of functions in the kernel also guarantee memory ordering, specifically for memory-mapped hardware devices. The document describes the rationale with examples.
>>>
>>
>> What more specifically did you have in mind? READ_ONCE() and
>> WRITE_ONCE()? They give almost no guarantees. Very much relaxed.
> 
> The way I read it, they do provide memory ordering guarantees.
> 

Sure. All types of memory operations come with some kind of guarantees. 
A series of non-atomic, non-volatile stores issued by a particular 
thread are guaranteed to happen in program order, from the point of 
view of that thread, for example. It would be hard to write a program 
if that wasn't true.

The documentation currently reads:

"This macro does not give any guarantees in regards to memory ordering /../"

Taken literally, this is not true. I will rephrase to "any *additional* 
guarantees" for both plain and "once" family documentation.

> Ignore that the kernel's "once" functions operates on words and this RFC operates on bits, the behavior is the same. Either there are memory ordering guarantees, or there are not.
> 
>>
>> I've read that document.
>>
>> What you should keep in mind if you read that document, is that DPDK
>> doesn't use the kernel's memory model, and doesn't have the kernel's
>> barrier and atomics APIs. What we have are an obsolete, miniature
>> look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.
>>
>> My general impression is that DPDK was moving in the C11 direction
>> memory model-wise, which is not the model the kernel uses.
> 
> I think you and I agree that using legacy methods only because "the kernel does it that way" would not be the optimal roadmap for DPDK.
> 
> We should keep moving in the C11 direction memory model-wise.
> I consider it more descriptive, and thus expect compilers to eventually produce better optimized code.
> 
>>
>>> It makes me think that DPDK "once" family of functions should behave
>> similarly.
>>
>> I think they do already.
> 
> I haven't looked deep into it, but the RFC's documentation says otherwise:
> The "once" family of functions does not promise atomicity and provides *no memory ordering* guarantees beyond the C11 relaxed memory model.
> 
>>
>> Also, rte_bit_once_set() works as the kernel's __set_bit().
>>
>>> Alternatively, if the "once" family of functions cannot be generically
>> implemented with a memory ordering that is optimal for all use cases, drop
>> this family of functions, and instead rely on the "atomic" family of functions
>> for interacting with memory-mapped hardware devices.
>>>
>>>>
>>>> RFC v7:
>>>>    * Fix various minor issues in documentation.
>>>>
>>>> RFC v6:
>>>>    * Have rte_bit_once_test() accept const-marked bitsets.
>>>>
>>>> RFC v3:
>>>>    * Work around lack of C++ support for _Generic (Tyler Retzlaff).
>>>>
>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>>>> ---
>>>

^ permalink raw reply	[flat|nested] 160+ messages in thread

* RE: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  8:00                                 ` Mattias Rönnblom
@ 2024-05-08  8:11                                   ` Morten Brørup
  2024-05-08  9:27                                     ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Morten Brørup @ 2024-05-08  8:11 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Wednesday, 8 May 2024 10.00
> 
> On 2024-05-08 09:33, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Wednesday, 8 May 2024 08.47
> >>
> >> On 2024-05-07 21:17, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >>>> Sent: Sunday, 5 May 2024 10.38
> >>>>
> >>>> Add test/set/clear/assign/flip functions which prevents certain
> >>>> compiler optimizations and guarantees that program-level memory loads
> >>>> and/or stores will actually occur.
> >>>>
> >>>> These functions are useful when interacting with memory-mapped
> >>>> hardware devices.
> >>>>
> >>>> The "once" family of functions does not promise atomicity and provides
> >>>> no memory ordering guarantees beyond the C11 relaxed memory model.
> >>>
> >>> In another thread, Stephen referred to the extended discussion on memory models in Linux kernel documentation:
> >>> https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-barriers.html
> >>>
> >>> Unlike the "once" family of functions in this RFC, the "once" family of functions in the kernel also guarantee memory ordering, specifically for memory-mapped hardware devices. The document describes the rationale with examples.
> >>>
> >>
> >> What more specifically did you have in mind? READ_ONCE() and
> >> WRITE_ONCE()? They give almost no guarantees. Very much relaxed.
> >
> > The way I read it, they do provide memory ordering guarantees.
> >
> 
> Sure. All types of memory operations come with some kind of guarantees.
> A series of non-atomic, non-volatile stores issued by a particular
> thread are guaranteed to happen in program order, from the point of
> view of that thread, for example. It would be hard to write a program
> if that wasn't true.
> 
> "This macro does not give any guarantees in regards to memory ordering /../"
> 
> This is not true. I will rephrase to "any *additional* guarantees" for
> both plain and "once" family documentation.

Consider code like this:
set_once(HW_START_BIT);
while (!get_once(HW_DONE_BIT)) /*busy wait*/;

If the "once" functions are used for hardware access, they must guarantee that HW_START_BIT has been written before HW_DONE_BIT is read.

The documentation must reflect this ordering guarantee.
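
The polling pattern above can be sketched against a fake register so
that it runs without hardware. Everything here is an assumption made for
illustration: fake_reg, fake_device_step() and the bit positions are
hypothetical, and both bits are placed in the same word.

```c
#include <stdbool.h>
#include <stdint.h>

#define HW_START_BIT 0
#define HW_DONE_BIT  1

/* Fake device register; real code would point into mapped device memory. */
static volatile uint32_t fake_reg;

/* One volatile load plus one volatile store (non-atomic). */
static void
set_once(unsigned int nr)
{
	uint32_t word = fake_reg;
	fake_reg = word | (UINT32_C(1) << nr);
}

/* Exactly one volatile load per call. */
static bool
get_once(unsigned int nr)
{
	return (fake_reg >> nr) & 1;
}

/* Models the device: it reports DONE only after it has seen START. */
static void
fake_device_step(void)
{
	if (get_once(HW_START_BIT))
		set_once(HW_DONE_BIT);
}

/* Kick the device, then busy-wait; returns the number of polls needed. */
static unsigned int
start_and_wait(void)
{
	unsigned int polls = 0;

	set_once(HW_START_BIT);
	while (!get_once(HW_DONE_BIT)) {
		fake_device_step();
		polls++;
	}

	return polls;
}
```

Because the accesses are volatile, the compiler must keep the START
store ahead of the DONE loads in the generated code. Whether the CPU and
the interconnect preserve that order towards a real device is a separate
question, and is the one the documentation needs to answer.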

> 
> > Ignore that the kernel's "once" functions operates on words and this RFC
> operates on bits, the behavior is the same. Either there are memory ordering
> guarantees, or there are not.
> >
> >>
> >> I've read that document.
> >>
> >> What you should keep in mind if you read that document, is that DPDK
> >> doesn't use the kernel's memory model, and doesn't have the kernel's
> >> barrier and atomics APIs. What we have are an obsolete, miniature
> >> look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.
> >>
> >> My general impression is that DPDK was moving in the C11 direction
> >> memory model-wise, which is not the model the kernel uses.
> >
> > I think you and I agree that using legacy methods only because "the kernel
> does it that way" would not be the optimal roadmap for DPDK.
> >
> > We should keep moving in the C11 direction memory model-wise.
> > I consider it more descriptive, and thus expect compilers to eventually
> produce better optimized code.
> >
> >>
> >>> It makes me think that DPDK "once" family of functions should behave
> >> similarly.
> >>
> >> I think they do already.
> >
> > I haven't looked deep into it, but the RFC's documentation says otherwise:
> > The "once" family of functions does not promise atomicity and provides *no
> memory ordering* guarantees beyond the C11 relaxed memory model.
> >
> >>
> >> Also, rte_bit_once_set() works as the kernel's __set_bit().
> >>
> >>> Alternatively, if the "once" family of functions cannot be generically
> >> implemented with a memory ordering that is optimal for all use cases, drop
> >> this family of functions, and instead rely on the "atomic" family of
> functions
> >> for interacting with memory-mapped hardware devices.
> >>>
> >>>>
> >>>> RFC v7:
> >>>>    * Fix various minor issues in documentation.
> >>>>
> >>>> RFC v6:
> >>>>    * Have rte_bit_once_test() accept const-marked bitsets.
> >>>>
> >>>> RFC v3:
> >>>>    * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> >>>>
> >>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >>>> ---
> >>>

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  8:11                                   ` Morten Brørup
@ 2024-05-08  9:27                                     ` Mattias Rönnblom
  2024-05-08 10:08                                       ` Morten Brørup
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-05-08  9:27 UTC (permalink / raw)
  To: Morten Brørup, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

On 2024-05-08 10:11, Morten Brørup wrote:
>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>> Sent: Wednesday, 8 May 2024 10.00
>>
>> On 2024-05-08 09:33, Morten Brørup wrote:
>>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
>>>> Sent: Wednesday, 8 May 2024 08.47
>>>>
>>>> On 2024-05-07 21:17, Morten Brørup wrote:
>>>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
>>>>>> Sent: Sunday, 5 May 2024 10.38
>>>>>>
>>>>>> Add test/set/clear/assign/flip functions which prevents certain
>>>>>> compiler optimizations and guarantees that program-level memory loads
>>>>>> and/or stores will actually occur.
>>>>>>
>>>>>> These functions are useful when interacting with memory-mapped
>>>>>> hardware devices.
>>>>>>
>>>>>> The "once" family of functions does not promise atomicity and provides
>>>>>> no memory ordering guarantees beyond the C11 relaxed memory model.
>>>>>
>>>>> In another thread, Stephen referred to the extended discussion on memory models in Linux kernel documentation:
>>>>> https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-barriers.html
>>>>>
>>>>> Unlike the "once" family of functions in this RFC, the "once" family of functions in the kernel also guarantee memory ordering, specifically for memory-mapped hardware devices. The document describes the rationale with examples.
>>>>>
>>>>
>>>> What more specifically did you have in mind? READ_ONCE() and
>>>> WRITE_ONCE()? They give almost no guarantees. Very much relaxed.
>>>
>>> The way I read it, they do provide memory ordering guarantees.
>>>
>>
>> Sure. All types of memory operations come with some kind of guarantees.
>> A series of non-atomic, non-volatile stores issued by a particular
>> thread are guaranteed to happen in program order, from the point of
>> view of that thread, for example. It would be hard to write a program
>> if that wasn't true.
>>
>> "This macro does not give any guarantees in regards to memory ordering /../"
>>
>> This is not true. I will rephrase to "any *additional* guarantees" for
>> both plain and "once" family documentation.
> 
> Consider code like this:
> set_once(HW_START_BIT);
> while (!get_once(HW_DONE_BIT)) /*busy wait*/;
> 
> If the "once" functions are used for hardware access, they must guarantee that HW_START_BIT has been written before HW_DONE_BIT is read.
> 

Provided the bits reside in the same word, there is (or at least, should 
be) such a guarantee; otherwise, you'll need a barrier.

I'm guessing in most cases the requirements are actually not as strict 
as you pose them: DONE starts as 0, so it may actually be read before 
START is written to, but not all DONE reads can be reordered ahead of 
the single START write. In that case, a compiler barrier between set and 
the get loop should suffice. Otherwise, you need a full barrier, or an 
I/O barrier.
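
A minimal sketch of where such a barrier would sit when START and DONE
live in different words. The registers and fake_device_step() are
hypothetical, and the C11 fences are stand-ins chosen so the example is
self-contained: atomic_signal_fence() acts as a compiler-only barrier,
atomic_thread_fence() as a full barrier between CPU threads. For real
device memory an I/O barrier (in DPDK, something like rte_io_mb()) is
more likely what is needed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical registers residing in two different words. */
static volatile uint32_t start_reg;
static volatile uint32_t done_reg;

/* Stand-in for the device: sets DONE once it has seen START. */
static void
fake_device_step(void)
{
	if (start_reg & 1) {
		uint32_t word = done_reg;
		done_reg = word | 1;
	}
}

/* Write START, issue a barrier, then poll DONE; returns the poll count. */
static unsigned int
start_and_wait(bool full_barrier)
{
	unsigned int polls = 0;
	uint32_t word = start_reg;

	start_reg = word | 1;

	if (full_barrier)
		/* Orders the store against later loads at the CPU level. */
		atomic_thread_fence(memory_order_seq_cst);
	else
		/* Only stops the compiler from moving accesses across. */
		atomic_signal_fence(memory_order_seq_cst);

	while (!(done_reg & 1)) {
		fake_device_step();
		polls++;
	}

	return polls;
}
```

With the compiler-only variant, an early DONE read may still be
satisfied before the START write becomes visible, which is harmless as
long as DONE starts out as 0 and the loop keeps polling.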

Anyway, since the exact purpose of the "once" type bit operations is 
unclear, maybe I should drop them from the patch set.

Now, they are much like the Linux kernel's __set_bit(), but for hardware 
access, maybe they should be more like writel().

> The documentation must reflect this ordering guarantee.
> 
>>
>>> Ignore that the kernel's "once" functions operates on words and this RFC
>> operates on bits, the behavior is the same. Either there are memory ordering
>> guarantees, or there are not.
>>>
>>>>
>>>> I've read that document.
>>>>
>>>> What you should keep in mind if you read that document, is that DPDK
>>>> doesn't use the kernel's memory model, and doesn't have the kernel's
>>>> barrier and atomics APIs. What we have are an obsolete, miniature
>>>> look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.
>>>>
>>>> My general impression is that DPDK was moving in the C11 direction
>>>> memory model-wise, which is not the model the kernel uses.
>>>
>>> I think you and I agree that using legacy methods only because "the kernel
>> does it that way" would not be the optimal roadmap for DPDK.
>>>
>>> We should keep moving in the C11 direction memory model-wise.
>>> I consider it more descriptive, and thus expect compilers to eventually
>> produce better optimized code.
>>>
>>>>
>>>>> It makes me think that DPDK "once" family of functions should behave
>>>> similarly.
>>>>
>>>> I think they do already.
>>>
>>> I haven't looked deep into it, but the RFC's documentation says otherwise:
>>> The "once" family of functions does not promise atomicity and provides *no
>> memory ordering* guarantees beyond the C11 relaxed memory model.
>>>
>>>>
>>>> Also, rte_bit_once_set() works as the kernel's __set_bit().
>>>>
>>>>> Alternatively, if the "once" family of functions cannot be generically
>>>> implemented with a memory ordering that is optimal for all use cases, drop
>>>> this family of functions, and instead rely on the "atomic" family of
>> functions
>>>> for interacting with memory-mapped hardware devices.
>>>>>
>>>>>>
>>>>>> RFC v7:
>>>>>>     * Fix various minor issues in documentation.
>>>>>>
>>>>>> RFC v6:
>>>>>>     * Have rte_bit_once_test() accept const-marked bitsets.
>>>>>>
>>>>>> RFC v3:
>>>>>>     * Work around lack of C++ support for _Generic (Tyler Retzlaff).
>>>>>>
>>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>>>>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>>>>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
>>>>>> ---
>>>>>

^ permalink raw reply	[flat|nested] 160+ messages in thread

* RE: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  9:27                                     ` Mattias Rönnblom
@ 2024-05-08 10:08                                       ` Morten Brørup
  0 siblings, 0 replies; 160+ messages in thread
From: Morten Brørup @ 2024-05-08 10:08 UTC (permalink / raw)
  To: Mattias Rönnblom, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Wednesday, 8 May 2024 11.27
> 
> On 2024-05-08 10:11, Morten Brørup wrote:
> >> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >> Sent: Wednesday, 8 May 2024 10.00
> >>
> >> On 2024-05-08 09:33, Morten Brørup wrote:
> >>>> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> >>>> Sent: Wednesday, 8 May 2024 08.47
> >>>>
> >>>> On 2024-05-07 21:17, Morten Brørup wrote:
> >>>>>> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> >>>>>> Sent: Sunday, 5 May 2024 10.38
> >>>>>>
> >>>>>> Add test/set/clear/assign/flip functions which prevents certain
> >>>>>> compiler optimizations and guarantees that program-level memory loads
> >>>>>> and/or stores will actually occur.
> >>>>>>
> >>>>>> These functions are useful when interacting with memory-mapped
> >>>>>> hardware devices.
> >>>>>>
> >>>>>> The "once" family of functions does not promise atomicity and provides
> >>>>>> no memory ordering guarantees beyond the C11 relaxed memory model.
> >>>>>
> >>>>> In another thread, Stephen referred to the extended discussion on memory
> >>>> models in Linux kernel documentation:
> >>>>> https://www.kernel.org/doc/html/latest/core-api/wrappers/memory-
> >>>> barriers.html
> >>>>>
> >>>>> Unlike the "once" family of functions in this RFC, the "once" family of
> >>>> functions in the kernel also guarantee memory ordering, specifically for
> >>>> memory-mapped hardware devices. The document describes the rationale with
> >>>> examples.
> >>>>>
> >>>>
> >>>> What more specifically did you have in mind? READ_ONCE() and
> >>>> WRITE_ONCE()? They give almost no guarantees. Very much relaxed.
> >>>
> >>> The way I read it, they do provide memory ordering guarantees.
> >>>
> >>
> >> Sure. All types memory operations comes with some kind guarantees. A
> >> series of non-atomic, non-volatile stores issued by a particular thread
> >> are guaranteed to happen in program order, from the point of view of
> >> that thread, for example. Would be hard to write a program if that
> >> wasn't true.
> >>
> >> "This macro does not give any guarantees in regards to memory ordering
> /../"
> >>
> >> This is not true. I will rephrase to "any *additional* guarantees" for
> >> both plain and "once" family documentation.
> >
> > Consider code like this:
> > set_once(HW_START_BIT);
> > while (!get_once(HW_DONE_BIT)) /*busy wait*/;
> >
> > If the "once" functions are used for hardware access, they must guarantee
> that HW_START_BIT has been written before HW_DONE_BIT is read.
> >
> 
> Provided bits reside in the same word, there is (or at least, should be)
> such guarantee, and otherwise, you'll need a barrier.
> 
> I'm guessing in most cases the requirements are actually not as strict
> as you pose them: DONE starts as 0, so it may actually be read before
> START is written to, but not all DONE reads can be reordered ahead of
> the single START write. In that case, a compiler barrier between set and
> the get loop should suffice. Otherwise, you need a full barrier, or an
> I/O barrier.
> 
> Anyway, since the exact purpose of the "once" type bit operations is
> unclear, maybe I should drop them from the patch set.

I agree.

The "once" family, unless designed for accessing hardware registers, somehow seems like a subset of the "atomic" family.

Looking at DPDK drivers, they access hardware registers using e.g. rte_read32(), which looks like this:

static __rte_always_inline uint32_t
rte_read32(const volatile void *addr)
{
	uint32_t val;
	val = rte_read32_relaxed(addr);
	rte_io_rmb();
	return val;
}

If the "once" family of functions is for hardware access, they should do something similar regarding ordering and barriers.
And even if they do, I'm not sure the hardware driver developers are going to use them, unless other environments (e.g. Linux, Windows, BSD) supported by the hardware driver's common low-level code provide similar functions.
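
To make "something similar" concrete, a bit-level test in that style could
look roughly like the sketch below. hw_bit_test32() is a hypothetical name,
not something in DPDK or in this patch set, and the C11 acquire fence is
only a portable stand-in for rte_io_rmb(); whether it is sufficient for
device memory on a given architecture is an assumption of the sketch.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not part of DPDK: test one bit of a 32-bit
 * register, then order the load before subsequent memory accesses.
 */
static inline bool
hw_bit_test32(const volatile uint32_t *addr, unsigned int nr)
{
	uint32_t val;

	/* A single volatile load of the whole word. */
	val = *addr;

	/* Stand-in for rte_io_rmb(); an assumption of this sketch. */
	atomic_thread_fence(memory_order_acquire);

	return (val >> nr) & 1;
}
```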

> 
> Now, they are much like the Linux kernel's __set_bit(), but for hardware
> access, maybe they should be more like writel().
> 
> > The documentation must reflect this ordering guarantee.
> >
> >>
> >>> Ignore that the kernel's "once" functions operates on words and this RFC
> >> operates on bits, the behavior is the same. Either there are memory
> ordering
> >> guarantees, or there are not.
> >>>
> >>>>
> >>>> I've read that document.
> >>>>
> >>>> What you should keep in mind if you read that document, is that DPDK
> >>>> doesn't use the kernel's memory model, and doesn't have the kernel's
> >>>> barrier and atomics APIs. What we have are an obsolete, miniature
> >>>> look-alike in <rte_atomic.h> and something C11-like in <rte_stdatomic.h>.
> >>>>
> >>>> My general impression is that DPDK was moving in the C11 direction
> >>>> memory model-wise, which is not the model the kernel uses.
> >>>
> >>> I think you and I agree that using legacy methods only because "the kernel
> >> does it that way" would not be the optimal roadmap for DPDK.
> >>>
> >>> We should keep moving in the C11 direction memory model-wise.
> >>> I consider it more descriptive, and thus expect compilers to eventually
> >> produce better optimized code.
> >>>
> >>>>
> >>>>> It makes me think that DPDK "once" family of functions should behave
> >>>> similarly.
> >>>>
> >>>> I think they do already.
> >>>
> >>> I haven't looked deep into it, but the RFC's documentation says otherwise:
> >>> The "once" family of functions does not promise atomicity and provides *no
> >> memory ordering* guarantees beyond the C11 relaxed memory model.
> >>>
> >>>>
> >>>> Also, rte_bit_once_set() works as the kernel's __set_bit().
> >>>>
> >>>>> Alternatively, if the "once" family of functions cannot be generically
> >>>> implemented with a memory ordering that is optimal for all use cases,
> drop
> >>>> this family of functions, and instead rely on the "atomic" family of
> >> functions
> >>>> for interacting with memory-mapped hardware devices.
> >>>>>
> >>>>>>
> >>>>>> RFC v7:
> >>>>>>     * Fix various minor issues in documentation.
> >>>>>>
> >>>>>> RFC v6:
> >>>>>>     * Have rte_bit_once_test() accept const-marked bitsets.
> >>>>>>
> >>>>>> RFC v3:
> >>>>>>     * Work around lack of C++ support for _Generic (Tyler Retzlaff).
> >>>>>>
> >>>>>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> >>>>>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> >>>>>> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >>>>>> ---
> >>>>>

^ permalink raw reply	[flat|nested] 160+ messages in thread

* Re: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08  7:33                               ` Morten Brørup
  2024-05-08  8:00                                 ` Mattias Rönnblom
@ 2024-05-08 15:15                                 ` Stephen Hemminger
  2024-05-08 16:16                                   ` Morten Brørup
  1 sibling, 1 reply; 160+ messages in thread
From: Stephen Hemminger @ 2024-05-08 15:15 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Mattias Rönnblom, Mattias Rönnblom, dev, Heng Wang,
	Tyler Retzlaff

On Wed, 8 May 2024 09:33:43 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > What more specifically did you have in mind? READ_ONCE() and
> > WRITE_ONCE()? They give almost no guarantees. Very much relaxed.  
> 
> The way I read it, they do provide memory ordering guarantees.
> 
> Ignore that the kernel's "once" functions operates on words and this RFC operates on bits, the behavior is the same. Either there are memory ordering guarantees, or there are not.

The kernel's READ_ONCE()/WRITE_ONCE() provide compiler-only ordering, i.e., they only apply to a single CPU.
RTFM memory-barriers.txt:

GUARANTEES
----------

There are some minimal guarantees that may be expected of a CPU:

 (*) On any given CPU, dependent memory accesses will be issued in order, with
     respect to itself.  This means that for:

	Q = READ_ONCE(P); D = READ_ONCE(*Q);

     the CPU will issue the following memory operations:

	Q = LOAD P, D = LOAD *Q

     and always in that order.  However, on DEC Alpha, READ_ONCE() also
     emits a memory-barrier instruction, so that a DEC Alpha CPU will
     instead issue the following memory operations:

	Q = LOAD P, MEMORY_BARRIER, D = LOAD *Q, MEMORY_BARRIER

     Whether on DEC Alpha or not, the READ_ONCE() also prevents compiler
     mischief.

 (*) Overlapping loads and stores within a particular CPU will appear to be
     ordered within that CPU.  This means that for:

	a = READ_ONCE(*X); WRITE_ONCE(*X, b);

     the CPU will only issue the following sequence of memory operations:

	a = LOAD *X, STORE *X = b

     And for:

	WRITE_ONCE(*X, c); d = READ_ONCE(*X);

     the CPU will only issue:

	STORE *X = c, d = LOAD *X

     (Loads and stores overlap if they are targeted at overlapping pieces of
     memory).
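
As a rough user-space approximation, the two macros boil down to volatile
accesses. The sketch below is not the kernel's actual definitions (those
are type-generic and handle more cases), and the helper names are made up
for illustration. It mirrors the overlapping load/store example quoted
above.

```c
#include <stdint.h>

/* Simplified stand-in for READ_ONCE() on a 32-bit word. */
static inline uint32_t
read_once_u32(const uint32_t *p)
{
	return *(const volatile uint32_t *)p;
}

/* Simplified stand-in for WRITE_ONCE() on a 32-bit word. */
static inline void
write_once_u32(uint32_t *p, uint32_t v)
{
	*(volatile uint32_t *)p = v;
}

/*
 * Overlapping accesses: a = READ_ONCE(*X); WRITE_ONCE(*X, b);
 * Not atomic as a whole; the load is simply issued before the store.
 */
static inline uint32_t
load_then_store_u32(uint32_t *x, uint32_t b)
{
	uint32_t a = read_once_u32(x);

	write_once_u32(x, b);

	return a;
}
```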

^ permalink raw reply	[flat|nested] 160+ messages in thread

* RE: [RFC v7 3/6] eal: add exactly-once bit access functions
  2024-05-08 15:15                                 ` Stephen Hemminger
@ 2024-05-08 16:16                                   ` Morten Brørup
  0 siblings, 0 replies; 160+ messages in thread
From: Morten Brørup @ 2024-05-08 16:16 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Mattias Rönnblom, Mattias Rönnblom, dev, Heng Wang,
	Tyler Retzlaff

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 8 May 2024 17.16
> 
> On Wed, 8 May 2024 09:33:43 +0200
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > > What more specifically did you have in mind? READ_ONCE() and
> > > WRITE_ONCE()? They give almost no guarantees. Very much relaxed.
> >
> > The way I read it, they do provide memory ordering guarantees.
> >
> > Ignore that the kernel's "once" functions operates on words and this RFC
> operates on bits, the behavior is the same. Either there are memory ordering
> guarantees, or there are not.
> 
> The kernel's READ_ONCE/WRITE_ONCE are compiler only ordering, i.e only apply
> to single CPU.
> RTFM memory-barriers.txt..
> 
> GUARANTEES
> ----------
> 
> There are some minimal guarantees that may be expected of a CPU:
> 
>  (*) On any given CPU, dependent memory accesses will be issued in order, with
>      respect to itself.  This means that for:
> 
> 	Q = READ_ONCE(P); D = READ_ONCE(*Q);
> 
>      the CPU will issue the following memory operations:
> 
> 	Q = LOAD P, D = LOAD *Q
> 
>      and always in that order.
>      However, on DEC Alpha, READ_ONCE() also
>      emits a memory-barrier instruction, so that a DEC Alpha CPU will
>      instead issue the following memory operations:
> 
> 	Q = LOAD P, MEMORY_BARRIER, D = LOAD *Q, MEMORY_BARRIER
> 
>      Whether on DEC Alpha or not, the READ_ONCE() also prevents compiler
>      mischief.
> 
>  (*) Overlapping loads and stores within a particular CPU will appear to be
>      ordered within that CPU.  This means that for:
> 
> 	a = READ_ONCE(*X); WRITE_ONCE(*X, b);
> 
>      the CPU will only issue the following sequence of memory operations:
> 
> 	a = LOAD *X, STORE *X = b
> 
>      And for:
> 
> 	WRITE_ONCE(*X, c); d = READ_ONCE(*X);
> 
>      the CPU will only issue:
> 
> 	STORE *X = c, d = LOAD *X
> 
>      (Loads and stores overlap if they are targeted at overlapping pieces of
>      memory).

It says "*the CPU* will issue the following [sequence of] *memory operations*",
not "*the compiler* will generate the following *CPU instructions*".

To me, that reads like a memory ordering guarantee.


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH 0/5] Improve EAL bit operations API
  2024-05-05  8:37                         ` [RFC v7 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-08-09  9:04                           ` Mattias Rönnblom
  2024-08-09  9:04                             ` [PATCH 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
                                               ` (4 more replies)
  0 siblings, 5 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:04 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() functions are replaced
with two new families:

rte_bit_[test|set|clear|assign|flip]() which provide no memory
ordering or atomicity guarantees, but do provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant. rte_bit_[test|set|clear|assign|flip]() may be
used with volatile word pointers, in which case they guarantee
that the program-level accesses actually occur.

rte_bit_atomic_*() which provide atomic bit-level operations,
including the possibility to specify memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat and also prevent the user from both, in
some situations, doing atomic operations against a bit set, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.
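
A minimal sketch of the idea, assuming GCC or Clang (the __atomic
builtins) and using made-up helper names rather than the functions added
by this patch set: the same plain uint32_t word is updated non-atomically
at one point and atomically at another, with no _Atomic qualifier and no
cast.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically set bit nr and return its old value. */
static inline bool
sketch_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
				 int memory_order)
{
	uint32_t mask = UINT32_C(1) << nr;

	return __atomic_fetch_or(addr, mask, memory_order) & mask;
}

/* Non-atomic set, e.g. for use while MT safety is not a concern. */
static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}
```

Mixing the two on one word is only safe when the non-atomic accesses
cannot race with the atomic ones; that remains the caller's
responsibility.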

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.
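
The single-evaluation property follows from how generic selection works:
the controlling expression is not evaluated, and the arguments appear
once, in the call to the selected function. A stand-alone sketch, using
made-up sketch_* names rather than the real macros:

```c
#include <stdint.h>

static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

static inline void
sketch_bit_set64(uint64_t *addr, unsigned int nr)
{
	*addr |= UINT64_C(1) << nr;
}

/*
 * The controlling expression of _Generic is not evaluated; addr and nr
 * are evaluated once, as arguments of the selected function.
 */
#define sketch_bit_set(addr, nr)			\
	_Generic((addr),				\
		 uint32_t *: sketch_bit_set32,		\
		 uint64_t *: sketch_bit_set64)(addr, nr)
```

With this, sketch_bit_set(p++, 4) advances p exactly once, which would
not hold for a naive macro that expands addr several times.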

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions, implemented by
means of a huge, complicated C macro mess.

Mattias Rönnblom (5):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions
  eal: extend bitops to handle volatile pointers

 app/test/test_bitops.c       | 414 ++++++++++++++++++-
 lib/eal/include/rte_bitops.h | 778 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1174 insertions(+), 18 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH 1/5] eal: extend bit manipulation functionality
  2024-08-09  9:04                           ` [PATCH 0/5] Improve EAL bit operations API Mattias Rönnblom
@ 2024-08-09  9:04                             ` Mattias Rönnblom
  2024-08-09  9:58                               ` [PATCH v2 0/5] Improve EAL bit operations API Mattias Rönnblom
  2024-08-09  9:04                             ` [PATCH 2/5] eal: add unit tests for bit operations Mattias Rönnblom
                                               ` (3 subsequent siblings)
  4 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:04 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test and modify the value of individual bits in
32-bit or 64-bit words.

These functions have no implications on memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

RFC v6:
 * Have rte_bit_test() accept const-marked bitsets.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 259 ++++++++++++++++++++++++++++++++++-
 1 file changed, 257 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..3297133e22 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,196 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		uint32_t *: __rte_bit_test32,			\
+		const uint32_t *: __rte_bit_test32,		\
+		uint64_t *: __rte_bit_test64,			\
+		const uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## family ## fun ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(family, fun, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+#define __RTE_GEN_BIT_ASSIGN(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## family ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## family ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## family ## test ## size(addr, nr);	\
+		__rte_bit_ ## family ## assign ## size(addr, nr, !value); \
+	}
+
+__RTE_GEN_BIT_TEST(, test,, 32)
+__RTE_GEN_BIT_SET(, set,, 32)
+__RTE_GEN_BIT_CLEAR(, clear,, 32)
+__RTE_GEN_BIT_ASSIGN(, assign,, 32)
+__RTE_GEN_BIT_FLIP(, flip,, 32)
+
+__RTE_GEN_BIT_TEST(, test,, 64)
+__RTE_GEN_BIT_SET(, set,, 64)
+__RTE_GEN_BIT_CLEAR(, clear,, 64)
+__RTE_GEN_BIT_ASSIGN(, assign,, 64)
+__RTE_GEN_BIT_FLIP(, flip,, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +980,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name, arg2_type arg2_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH 2/5] eal: add unit tests for bit operations
  2024-08-09  9:04                           ` [PATCH 0/5] Improve EAL bit operations API Mattias Rönnblom
  2024-08-09  9:04                             ` [PATCH 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-08-09  9:04                             ` Mattias Rönnblom
  2024-08-09 15:03                               ` Stephen Hemminger
  2024-08-09  9:04                             ` [PATCH 3/5] eal: add atomic " Mattias Rönnblom
                                               ` (2 subsequent siblings)
  4 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:04 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_[test|set|clear|assign|flip]()
functions.

The tests are converted to use the test suite runner framework.

RFC v6:
 * Test rte_bit_*test() usage through const pointers.

RFC v4:
 * Remove redundant line continuations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 85 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..322f58c066 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,68 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %d had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+									\
+			const uint ## size ## _t *const_ptr = &word;	\
+			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
+				    reference_bit,			\
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +172,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH 3/5] eal: add atomic bit operations
  2024-08-09  9:04                           ` [PATCH 0/5] Improve EAL bit operations API Mattias Rönnblom
  2024-08-09  9:04                             ` [PATCH 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-08-09  9:04                             ` [PATCH 2/5] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-08-09  9:04                             ` Mattias Rönnblom
  2024-08-09  9:04                             ` [PATCH 4/5] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  2024-08-09  9:04                             ` [PATCH 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  4 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:04 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.

RFC v8:
 * Add missing macro #undef for C++ version of atomic bit flip.

RFC v7:
 * Replace compare-exchange-based rte_bit_atomic_test_and_*() and
   flip() with implementations that use the previous value as returned
   by the atomic fetch function.
 * Reword documentation to match the non-atomic macro variants.
 * Remove pointer to <rte_stdatomic.h> for memory model documentation,
   since there is no documentation for that API.

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 * Add rte_bit_atomic_test_and_assign() (for consistency).
 * Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 * Use <rte_stdatomic.h> to support MSVC.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/eal/include/rte_bitops.h | 414 +++++++++++++++++++++++++++++++++++
 1 file changed, 414 insertions(+)
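
Note for readers of the thread: the test-and-modify helpers added below
derive their return value from the word returned by the atomic fetch
operation, rather than from a compare-exchange loop. A minimal standalone
sketch of that idea follows. It assumes plain C11 <stdatomic.h> instead of
<rte_stdatomic.h>, takes _Atomic pointers instead of casting, and every
sketch_*() name is hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 32-bit atomic test-and-set helper. */
static inline bool
sketch_bit_atomic_test_and_set32(_Atomic uint32_t *addr, unsigned int nr,
				 memory_order order)
{
	uint32_t mask = (uint32_t)1 << nr;
	/* fetch-or returns the word as it was before the bit was set. */
	uint32_t prev = atomic_fetch_or_explicit(addr, mask, order);

	return prev & mask;
}

/* Hypothetical stand-in for the 32-bit atomic test-and-clear helper. */
static inline bool
sketch_bit_atomic_test_and_clear32(_Atomic uint32_t *addr, unsigned int nr,
				   memory_order order)
{
	uint32_t mask = (uint32_t)1 << nr;
	/* fetch-and with the inverted mask clears only the chosen bit. */
	uint32_t prev = atomic_fetch_and_explicit(addr, ~mask, order);

	return prev & mask;
}

/* Hypothetical stand-in for the 32-bit atomic flip helper. */
static inline void
sketch_bit_atomic_flip32(_Atomic uint32_t *addr, unsigned int nr,
			 memory_order order)
{
	uint32_t mask = (uint32_t)1 << nr;

	atomic_fetch_xor_explicit(addr, mask, order);
}

/* Small usage demo; the asserts document the expected behavior. */
static inline void
sketch_bit_atomic_demo(void)
{
	_Atomic uint32_t word = 0;

	/* Bit 5 starts cleared, so the first test-and-set reports false. */
	assert(!sketch_bit_atomic_test_and_set32(&word, 5,
						 memory_order_relaxed));
	assert(sketch_bit_atomic_test_and_set32(&word, 5,
						memory_order_relaxed));
	assert(sketch_bit_atomic_test_and_clear32(&word, 5,
						  memory_order_relaxed));
	sketch_bit_atomic_flip32(&word, 0, memory_order_relaxed);
	assert(atomic_load_explicit(&word, memory_order_relaxed) == 1);
}
```

Since each operation is a single atomic read-modify-write, there is no
retry loop, and the caller picks the memory order per call.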

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3297133e22..4d878099ed 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -226,6 +227,204 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '1', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '0', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in the
+ * word pointed to by @c addr to the value indicated by @c value, with
+ * the memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Generic selection macro to atomically negate the value of the bit
+ * specified by @c nr in the word pointed to by @c addr (a set bit is
+ * cleared, and a cleared bit is set), with the memory ordering as
+ * specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Generic selection macro to atomically test and set bit specified by
+ * @c nr in the word pointed to by @c addr to '1', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was previously set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Generic selection macro to atomically test and clear bit specified
+ * by @c nr in the word pointed to by @c addr to '0', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was previously set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Generic selection macro to atomically test and assign bit specified
+ * by @c nr in the word pointed to by @c addr to the value specified
+ * by @c value, with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was previously set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -298,6 +497,145 @@ __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)				\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_set ## size(uint ## size ## _t *addr,	\
+					      unsigned int nr,		\
+					      int memory_order)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_or_explicit(a_addr, mask,	\
+						    memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_clear ## size(uint ## size ## _t *addr, \
+						unsigned int nr,	\
+						int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_and_explicit(a_addr, ~mask,	\
+						     memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		if (value)						\
+			return __rte_bit_atomic_test_and_set ## size(addr, nr, \
+								     memory_order); \
+		else							\
+			return __rte_bit_atomic_test_and_clear ## size(addr, nr, \
+								       memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -993,6 +1331,15 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_flip
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1036,12 +1383,79 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH 4/5] eal: add unit tests for atomic bit access functions
  2024-08-09  9:04                           ` [PATCH 0/5] Improve EAL bit operations API Mattias Rönnblom
                                               ` (2 preceding siblings ...)
  2024-08-09  9:04                             ` [PATCH 3/5] eal: add atomic " Mattias Rönnblom
@ 2024-08-09  9:04                             ` Mattias Rönnblom
  2024-08-09  9:04                             ` [PATCH 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  4 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:04 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_atomic_*() family of
functions.

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 app/test/test_bitops.c | 313 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 312 insertions(+), 1 deletion(-)
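
Note for readers of the thread: the parallel flip and test-and-modify
tests below rely on a parity argument. If every flip is atomic, then
after N flips in total the bit must equal N % 2 and all other bits of
the word must be untouched. The sketch below shows only that check. It
is an assumption-laden illustration: it uses C11 <stdatomic.h>, runs the
two "lcores" one after the other in a single thread (the real test uses
rte_eal_remote_launch() and a deadline), and all sketch_*() names are
hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Per-participant state, loosely modeled on the test's lcore struct. */
struct sketch_flipper {
	_Atomic uint32_t *word;
	unsigned int bit;
	uint64_t flips;
};

/* Flip the participant's bit 'count' times, counting each flip. */
static inline void
sketch_run_flips(struct sketch_flipper *f, uint64_t count)
{
	uint32_t mask = (uint32_t)1 << f->bit;
	uint64_t i;

	for (i = 0; i < count; i++) {
		atomic_fetch_xor_explicit(f->word, mask,
					  memory_order_relaxed);
		f->flips++;
	}
}

/*
 * Parity check: starting from a zeroed word, the bit must equal the
 * parity of the total flip count, and no other bit may be set.
 */
static inline bool
sketch_parity_ok(uint32_t word, unsigned int bit, uint64_t total_flips)
{
	bool expected_bit = total_flips % 2;
	uint32_t expected_word = (uint32_t)expected_bit << bit;

	return word == expected_word;
}

/* Sequential demo: 3 + 4 flips of bit 7 leave the bit set. */
static inline void
sketch_parity_demo(void)
{
	_Atomic uint32_t word = 0;
	struct sketch_flipper a = { .word = &word, .bit = 7 };
	struct sketch_flipper b = { .word = &word, .bit = 7 };

	sketch_run_flips(&a, 3);
	sketch_run_flips(&b, 4);

	assert(sketch_parity_ok(atomic_load(&word), 7, a.flips + b.flips));
}
```

A lost update caused by a non-atomic read-modify-write would break the
parity relation, which is what the real two-lcore test is designed to
catch.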

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 322f58c066..b80216a0a1 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -61,6 +64,304 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -177,6 +478,16 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


* [PATCH 5/5] eal: extend bitops to handle volatile pointers
  2024-08-09  9:04                           ` [PATCH 0/5] Improve EAL bit operations API Mattias Rönnblom
                                               ` (3 preceding siblings ...)
  2024-08-09  9:04                             ` [PATCH 4/5] eal: add unit tests for atomic bit access functions Mattias Rönnblom
@ 2024-08-09  9:04                             ` Mattias Rönnblom
  4 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:04 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Have rte_bit_[test|set|clear|assign|flip]() and rte_bit_atomic_*()
handle volatile-marked pointers.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/test_bitops.c       |  30 ++-
 lib/eal/include/rte_bitops.h | 427 ++++++++++++++++++++++-------------
 2 files changed, 289 insertions(+), 168 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index b80216a0a1..e6e9f7ec44 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -14,13 +14,13 @@
 #include "test.h"
 
 #define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
-			    flip_fun, test_fun, size)			\
+			    flip_fun, test_fun, size, mod)		\
 	static int							\
 	test_name(void)							\
 	{								\
 		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
 		unsigned int bit_nr;					\
-		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+		mod uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
 									\
 		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
 			bool reference_bit = (reference >> bit_nr) & 1;	\
@@ -41,7 +41,7 @@
 				    "Bit %d had unflipped value", bit_nr); \
 			flip_fun(&word, bit_nr);			\
 									\
-			const uint ## size ## _t *const_ptr = &word;	\
+			const mod uint ## size ## _t *const_ptr = &word; \
 			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
 				    reference_bit,			\
 				    "Bit %d had unexpected value", bit_nr); \
@@ -59,10 +59,16 @@
 	}
 
 GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64, volatile)
 
 #define bit_atomic_set(addr, nr)				\
 	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
@@ -81,11 +87,19 @@ GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 32)
+		    bit_atomic_flip, bit_atomic_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 64)
+		    bit_atomic_flip, bit_atomic_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64, volatile)
 
 #define PARALLEL_TEST_RUNTIME 0.25
 
@@ -480,6 +494,8 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_v_access32),
+		TEST_CASE(test_bit_v_access64),
 		TEST_CASE(test_bit_atomic_access32),
 		TEST_CASE(test_bit_atomic_access64),
 		TEST_CASE(test_bit_atomic_parallel_assign32),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 4d878099ed..1355949fb6 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -127,12 +127,16 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_test(addr, nr)					\
-	_Generic((addr),					\
-		uint32_t *: __rte_bit_test32,			\
-		const uint32_t *: __rte_bit_test32,		\
-		uint64_t *: __rte_bit_test64,			\
-		const uint64_t *: __rte_bit_test64)(addr, nr)
+#define rte_bit_test(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_test32,				\
+		 const uint32_t *: __rte_bit_test32,			\
+		 volatile uint32_t *: __rte_bit_v_test32,		\
+		 const volatile uint32_t *: __rte_bit_v_test32,		\
+		 uint64_t *: __rte_bit_test64,				\
+		 const uint64_t *: __rte_bit_test64,			\
+		 volatile uint64_t *: __rte_bit_v_test64,		\
+		 const volatile uint64_t *: __rte_bit_v_test64)(addr, nr)
 
 /**
  * @warning
@@ -152,10 +156,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_set(addr, nr)				\
-	_Generic((addr),				\
-		 uint32_t *: __rte_bit_set32,		\
-		 uint64_t *: __rte_bit_set64)(addr, nr)
+#define rte_bit_set(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_set32,				\
+		 volatile uint32_t *: __rte_bit_v_set32,		\
+		 uint64_t *: __rte_bit_set64,				\
+		 volatile uint64_t *: __rte_bit_v_set64)(addr, nr)
 
 /**
  * @warning
@@ -175,10 +181,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_clear(addr, nr)					\
-	_Generic((addr),					\
-		 uint32_t *: __rte_bit_clear32,			\
-		 uint64_t *: __rte_bit_clear64)(addr, nr)
+#define rte_bit_clear(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_clear32,				\
+		 volatile uint32_t *: __rte_bit_v_clear32,		\
+		 uint64_t *: __rte_bit_clear64,				\
+		 volatile uint64_t *: __rte_bit_v_clear64)(addr, nr)
 
 /**
  * @warning
@@ -202,7 +210,9 @@ extern "C" {
 #define rte_bit_assign(addr, nr, value)					\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_assign32,			\
-		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+		 volatile uint32_t *: __rte_bit_v_assign32,		\
+		 uint64_t *: __rte_bit_assign64,			\
+		 volatile uint64_t *: __rte_bit_v_assign64)(addr, nr, value)
 
 /**
  * @warning
@@ -225,7 +235,9 @@ extern "C" {
 #define rte_bit_flip(addr, nr)						\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_flip32,				\
-		 uint64_t *: __rte_bit_flip64)(addr, nr)
+		 volatile uint32_t *: __rte_bit_v_flip32,		\
+		 uint64_t *: __rte_bit_flip64,				\
+		 volatile uint64_t *: __rte_bit_v_flip64)(addr, nr)
 
 /**
  * @warning
@@ -250,9 +262,13 @@ extern "C" {
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test32,			\
 		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 volatile uint32_t *: __rte_bit_atomic_v_test32,	\
+		 const volatile uint32_t *: __rte_bit_atomic_v_test32,	\
 		 uint64_t *: __rte_bit_atomic_test64,			\
-		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
-							    memory_order)
+		 const uint64_t *: __rte_bit_atomic_test64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test64,	\
+		 const volatile uint64_t *: __rte_bit_atomic_v_test64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -274,7 +290,10 @@ extern "C" {
 #define rte_bit_atomic_set(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_set32,			\
-		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_set32,		\
+		 uint64_t *: __rte_bit_atomic_set64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_set64)(addr, nr, \
+								memory_order)
 
 /**
  * @warning
@@ -296,7 +315,10 @@ extern "C" {
 #define rte_bit_atomic_clear(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_clear32,			\
-		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_clear32,	\
+		 uint64_t *: __rte_bit_atomic_clear64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_clear64)(addr, nr, \
+								  memory_order)
 
 /**
  * @warning
@@ -320,8 +342,11 @@ extern "C" {
 #define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_assign32,			\
-		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
-							memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_assign32,	\
+		 uint64_t *: __rte_bit_atomic_assign64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_assign64)(addr, nr, \
+								   value, \
+								   memory_order)
 
 /**
  * @warning
@@ -344,7 +369,10 @@ extern "C" {
 #define rte_bit_atomic_flip(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_flip32,			\
-		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_flip32,	\
+		 uint64_t *: __rte_bit_atomic_flip64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_flip64)(addr, nr, \
+								 memory_order)
 
 /**
  * @warning
@@ -368,8 +396,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
-							      memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_set32, \
+		 uint64_t *: __rte_bit_atomic_test_and_set64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_set64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -393,8 +423,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
-								memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_clear32, \
+		 uint64_t *: __rte_bit_atomic_test_and_clear64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_clear64) \
+						       (addr, nr, memory_order)
 
 /**
  * @warning
@@ -421,9 +453,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
-		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
-								 value, \
-								 memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_assign32, \
+		 uint64_t *: __rte_bit_atomic_test_and_assign64,	\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_assign64) \
+						(addr, nr, value, memory_order)
 
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
@@ -491,93 +524,105 @@ __RTE_GEN_BIT_CLEAR(, clear,, 32)
 __RTE_GEN_BIT_ASSIGN(, assign,, 32)
 __RTE_GEN_BIT_FLIP(, flip,, 32)
 
+__RTE_GEN_BIT_TEST(v_, test, volatile, 32)
+__RTE_GEN_BIT_SET(v_, set, volatile, 32)
+__RTE_GEN_BIT_CLEAR(v_, clear, volatile, 32)
+__RTE_GEN_BIT_ASSIGN(v_, assign, volatile, 32)
+__RTE_GEN_BIT_FLIP(v_, flip, volatile, 32)
+
 __RTE_GEN_BIT_TEST(, test,, 64)
 __RTE_GEN_BIT_SET(, set,, 64)
 __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
-#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+__RTE_GEN_BIT_TEST(v_, test, volatile, 64)
+__RTE_GEN_BIT_SET(v_, set, volatile, 64)
+__RTE_GEN_BIT_CLEAR(v_, clear, volatile, 64)
+__RTE_GEN_BIT_ASSIGN(v_, assign, volatile, 64)
+__RTE_GEN_BIT_FLIP(v_, flip, volatile, 64)
+
+#define __RTE_GEN_BIT_ATOMIC_TEST(v, qualifier, size)			\
 	__rte_experimental						\
 	static inline bool						\
-	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
-				      unsigned int nr, int memory_order) \
+	__rte_bit_atomic_ ## v ## test ## size(const qualifier uint ## size ## _t *addr, \
+					       unsigned int nr, int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		const qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr = \
+			(const qualifier RTE_ATOMIC(uint ## size ## _t) *)addr;	\
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+#define __RTE_GEN_BIT_ATOMIC_SET(v, qualifier, size)			\
 	__rte_experimental						\
 	static inline void						\
-	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
-				     unsigned int nr, int memory_order)	\
+	__rte_bit_atomic_ ## v ## set ## size(qualifier uint ## size ## _t *addr, \
+					      unsigned int nr, int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(v, qualifier, size)			\
 	__rte_experimental						\
 	static inline void						\
-	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
-				       unsigned int nr, int memory_order) \
+	__rte_bit_atomic_ ## v ## clear ## size(qualifier uint ## size ## _t *addr,	\
+						unsigned int nr, int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+#define __RTE_GEN_BIT_ATOMIC_FLIP(v, qualifier, size)			\
 	__rte_experimental						\
 	static inline void						\
-	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
-				       unsigned int nr, int memory_order) \
+	__rte_bit_atomic_ ## v ## flip ## size(qualifier uint ## size ## _t *addr, \
+					       unsigned int nr, int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(v, qualifier, size)			\
 	__rte_experimental						\
 	static inline void						\
-	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
-					unsigned int nr, bool value,	\
-					int memory_order)		\
+	__rte_bit_atomic_## v ## assign ## size(qualifier uint ## size ## _t *addr, \
+						unsigned int nr, bool value, \
+						int memory_order)	\
 	{								\
 		if (value)						\
-			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+			__rte_bit_atomic_ ## v ## set ## size(addr, nr, memory_order); \
 		else							\
-			__rte_bit_atomic_clear ## size(addr, nr,	\
-						       memory_order);	\
+			__rte_bit_atomic_ ## v ## clear ## size(addr, nr, \
+								     memory_order); \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)				\
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(v, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
-	__rte_bit_atomic_test_and_set ## size(uint ## size ## _t *addr,	\
-					      unsigned int nr,		\
-					      int memory_order)		\
+	__rte_bit_atomic_ ## v ## test_and_set ## size(qualifier uint ## size ## _t *addr, \
+						       unsigned int nr,	\
+						       int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		uint ## size ## _t prev;				\
 									\
@@ -587,17 +632,17 @@ __RTE_GEN_BIT_FLIP(, flip,, 64)
 		return prev & mask;					\
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)			\
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(v, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
-	__rte_bit_atomic_test_and_clear ## size(uint ## size ## _t *addr, \
-						unsigned int nr,	\
-						int memory_order)	\
+	__rte_bit_atomic_ ## v ## test_and_clear ## size(qualifier uint ## size ## _t *addr, \
+							 unsigned int nr, \
+							 int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		uint ## size ## _t prev;				\
 									\
@@ -607,34 +652,36 @@ __RTE_GEN_BIT_FLIP(, flip,, 64)
 		return prev & mask;					\
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(v, qualifier, size)	\
 	__rte_experimental						\
 	static inline bool						\
-	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
-						 unsigned int nr,	\
-						 bool value,		\
-						 int memory_order)	\
+	__rte_bit_atomic_ ## v ## test_and_assign ## size(qualifier uint ## size ## _t *addr, \
+							  unsigned int nr, \
+							  bool value,	\
+							  int memory_order) \
 	{								\
 		if (value)						\
-			return __rte_bit_atomic_test_and_set ## size(addr, nr, \
-								     memory_order); \
+			return __rte_bit_atomic_ ## v ## test_and_set ## size(addr, nr, memory_order); \
 		else							\
-			return __rte_bit_atomic_test_and_clear ## size(addr, nr, \
-								       memory_order); \
+			return __rte_bit_atomic_ ## v ## test_and_clear ## size(addr, nr, memory_order); \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
-	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
-	__RTE_GEN_BIT_ATOMIC_SET(size)			\
-	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
-	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
-	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)		\
-	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)	\
-	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
-	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+#define __RTE_GEN_BIT_ATOMIC_OPS(v, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST(v, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_SET(v, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(v, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(v, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(v, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(v, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(v, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_FLIP(v, qualifier, size)
 
-__RTE_GEN_BIT_ATOMIC_OPS(32)
-__RTE_GEN_BIT_ATOMIC_OPS(64)
+#define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(,, size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(v_, volatile, size)
+
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
 
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
@@ -1340,120 +1387,178 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_atomic_test_and_clear
 #undef rte_bit_atomic_test_and_assign
 
-#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+#define __RTE_BIT_OVERLOAD_V_2(family, v, fun, c, size, arg1_type, arg1_name) \
 	static inline void						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
-			arg1_type arg1_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+#define __RTE_BIT_OVERLOAD_SZ_2(family, fun, c, size, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_V_2(family,, fun, c, size, arg1_type,	\
+			       arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2(family, v_, fun, c volatile, size, \
+			       arg1_type, arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name)				\
+#define __RTE_BIT_OVERLOAD_2(family, fun, c, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_V_2R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
 			arg1_type arg1_name)				\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family, v_, fun, c volatile,		\
+				size, ret_type, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_2R(family, fun, c, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 32, ret_type, arg1_type, \
 				 arg1_name)				\
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 64, ret_type, arg1_type, \
 				 arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name)			\
+#define __RTE_BIT_OVERLOAD_V_3(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3(family, fun, c, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family,, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family, v_, fun, c volatile, size, arg1_type, \
+			       arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_3(family, fun, c, arg1_type, arg1_name, arg2_type, \
 			     arg2_name)					\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 32, arg1_type, arg1_name, \
 				arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)	\
+#define __RTE_BIT_OVERLOAD_V_3R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name)	\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name)	\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)
+	__RTE_BIT_OVERLOAD_V_3R(family,, fun, c, size, ret_type, \
+				arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_V_3R(family, v_, fun, c volatile, size, \
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name) \
+#define __RTE_BIT_OVERLOAD_3R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 64, ret_type, \
+				 arg1_type, arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_V_4(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name, arg3_type,	arg3_name) \
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
-					  arg3_name);		      \
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name,	\
+							 arg3_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
-			     arg2_name, arg3_type, arg3_name)		\
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+#define __RTE_BIT_OVERLOAD_SZ_4(family, fun, c, size, arg1_type, arg1_name, \
 				arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name)
-
-#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family,, fun, c, size, arg1_type,	\
+			       arg1_name, arg2_type, arg2_name, arg3_type, \
+			       arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family, v_, fun, c volatile, size,	\
+			       arg1_type, arg1_name, arg2_type, arg2_name, \
+			       arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4(family, fun, c, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 32, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 64, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)
+
+#define __RTE_BIT_OVERLOAD_V_4R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
-						 arg3_name);		\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name, \
+								arg3_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name, arg3_type, \
 				 arg3_name)				\
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)
-
-__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
-__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
-
-__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+	__RTE_BIT_OVERLOAD_V_4R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4R(family, v_, fun, c volatile, size,	\
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)			\
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 64, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)
+
+__RTE_BIT_OVERLOAD_2R(, test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(, assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(, flip,, unsigned int, nr)
+
+__RTE_BIT_OVERLOAD_3R(atomic_, test, const, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+__RTE_BIT_OVERLOAD_3(atomic_, set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_, clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_, assign,, unsigned int, nr, bool, value,
 		     int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3(atomic_, flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_set,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_clear,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_4R(atomic_, test_and_assign,, bool, unsigned int, nr,
 		      bool, value, int, memory_order)
 
 #endif
-- 
2.34.1


* [PATCH v2 0/5] Improve EAL bit operations API
  2024-08-09  9:04                             ` [PATCH 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-08-09  9:58                               ` Mattias Rönnblom
  2024-08-09  9:58                                 ` [PATCH v2 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
                                                   ` (4 more replies)
  0 siblings, 5 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.
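
For readers unfamiliar with the pattern, here is a minimal standalone
sketch of a generic selection macro delegating to per-size static
inline helpers. It assumes a C11 compiler. The demo_ names are
hypothetical stand-ins, not the identifiers used in the patches, and
assert() stands in for RTE_ASSERT().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical private helpers, one per word size. */
static inline void
demo_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= (uint32_t)1 << nr;
}

static inline void
demo_bit_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= (uint64_t)1 << nr;
}

static inline bool
demo_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & ((uint32_t)1 << nr);
}

static inline bool
demo_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return *addr & ((uint64_t)1 << nr);
}

/*
 * The public "function" is a generic selection macro keyed on the
 * pointer type. A pointer type not listed is a compile-time error.
 */
#define demo_bit_set(addr, nr)				\
	_Generic((addr),				\
		 uint32_t *: demo_bit_set32,		\
		 uint64_t *: demo_bit_set64)(addr, nr)

#define demo_bit_test(addr, nr)				\
	_Generic((addr),				\
		 uint32_t *: demo_bit_test32,		\
		 const uint32_t *: demo_bit_test32,	\
		 uint64_t *: demo_bit_test64,		\
		 const uint64_t *: demo_bit_test64)(addr, nr)
```

With this in place, demo_bit_set(&word, 5) works unchanged whether
word is a uint32_t or a uint64_t, and the test macro also accepts
const-qualified pointers.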

The legacy <rte_bitops.h> rte_bit_relaxed_*() functions are replaced
with two new families:

rte_bit_[test|set|clear|assign|flip]() which provide no memory
ordering or atomicity guarantees, but do provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant. rte_bit_[test|set|clear|assign|flip]() may be
used with volatile word pointers, in which case they guarantee
that the program-level accesses actually occur.

rte_bit_atomic_*() which provide atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).
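
As a rough illustration of the atomic family, the sketch below uses
the GCC/Clang __atomic builtins, which accept plain (non-_Atomic)
pointers. The patches themselves go through <rte_stdatomic.h>; the
demo_ names and the direct use of builtins are assumptions made only
to keep the example standalone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically set bit nr, with the caller-chosen memory order. */
static inline void
demo_bit_atomic_set32(uint32_t *addr, unsigned int nr, int memory_order)
{
	assert(nr < 32);
	uint32_t mask = (uint32_t)1 << nr;
	__atomic_fetch_or(addr, mask, memory_order);
}

/* Atomically clear bit nr, with the caller-chosen memory order. */
static inline void
demo_bit_atomic_clear32(uint32_t *addr, unsigned int nr, int memory_order)
{
	assert(nr < 32);
	uint32_t mask = (uint32_t)1 << nr;
	__atomic_fetch_and(addr, ~mask, memory_order);
}

/* Atomically load the word and report whether bit nr is set. */
static inline bool
demo_bit_atomic_test32(const uint32_t *addr, unsigned int nr,
		       int memory_order)
{
	assert(nr < 32);
	uint32_t mask = (uint32_t)1 << nr;
	return __atomic_load_n(addr, memory_order) & mask;
}
```

A writer might call demo_bit_atomic_set32(&flags, 3, __ATOMIC_RELEASE)
and a reader demo_bit_atomic_test32(&flags, 3, __ATOMIC_ACQUIRE).
Because addr is a plain uint32_t pointer, the same word can still be
accessed non-atomically at times when MT safety is not a concern.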

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat, and would also prevent the user from
doing atomic operations against a bit set in some situations while
operating on the same objects in a non-atomic manner in others (e.g.,
at times when MT safety is not a concern).

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.
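
A minimal sketch of what "assign" means here, for the 64-bit case
only. The name demo_bit_assign64 is hypothetical, and the body is
written out directly rather than delegating to set/clear helpers as
the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Give bit nr the value 'value', leaving all other bits untouched. */
static inline void
demo_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
{
	assert(nr < 64);
	uint64_t mask = (uint64_t)1 << nr;

	if (value)
		*addr |= mask;   /* same effect as a "set" */
	else
		*addr &= ~mask;  /* same effect as a "clear" */
}
```

This saves the caller from writing the if/else around set and clear
whenever the new bit value is only known at run time.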

All new functions have properly documented semantics.

All new functions operate on both 32 and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.
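
The single-evaluation property follows from C11 itself: the
controlling expression of _Generic is never evaluated, only the
arguments of the selected call are. The sketch below, using
hypothetical demo_ names, passes arguments with side effects to show
this.

```c
#include <assert.h>
#include <stdint.h>

static inline void
demo_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= (uint32_t)1 << nr;
}

static inline void
demo_bit_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= (uint64_t)1 << nr;
}

/*
 * (addr) appears twice in the expansion, but the first occurrence is
 * only inspected for its type.
 */
#define demo_bit_set(addr, nr)				\
	_Generic((addr),				\
		 uint32_t *: demo_bit_set32,		\
		 uint64_t *: demo_bit_set64)(addr, nr)

/*
 * Set bit i in words[i] for i in [0, count), advancing both the
 * pointer and the bit index through side effects in the macro
 * arguments. Returns the number of words the pointer advanced.
 */
static inline unsigned int
demo_side_effects(uint32_t *words, unsigned int count)
{
	uint32_t *p = words;
	unsigned int nr = 0;

	while (nr < count)
		demo_bit_set(p++, nr++);

	return (unsigned int)(p - words);
}
```

If the macro evaluated addr twice, the pointer would advance two
words per iteration; here it advances exactly one. Some compilers may
warn about the side effect in the unevaluated controlling expression.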

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions, implemented by
means of a huge, complicated C macro mess.

Mattias Rönnblom (5):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions
  eal: extend bitops to handle volatile pointers

 app/test/test_bitops.c       | 414 ++++++++++++++++++-
 lib/eal/include/rte_bitops.h | 778 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1174 insertions(+), 18 deletions(-)

Mattias Rönnblom (5):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions
  eal: extend bitops to handle volatile pointers

 app/test/test_bitops.c       | 416 ++++++++++++++++++-
 lib/eal/include/rte_bitops.h | 778 ++++++++++++++++++++++++++++++++++-
 2 files changed, 1176 insertions(+), 18 deletions(-)

-- 
2.34.1



* [PATCH v2 1/5] eal: extend bit manipulation functionality
  2024-08-09  9:58                               ` [PATCH v2 0/5] Improve EAL bit operations API Mattias Rönnblom
@ 2024-08-09  9:58                                 ` Mattias Rönnblom
  2024-08-12 11:16                                   ` Jack Bond-Preston
  2024-08-12 12:49                                   ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
  2024-08-09  9:58                                 ` [PATCH v2 2/5] eal: add unit tests for bit operations Mattias Rönnblom
                                                   ` (3 subsequent siblings)
  4 siblings, 2 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add functionality to test and modify the value of individual bits in
32-bit or 64-bit words.

These functions have no implications on memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

--

RFC v6:
 * Have rte_bit_test() accept const-marked bitsets.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.
---
 lib/eal/include/rte_bitops.h | 259 ++++++++++++++++++++++++++++++++++-
 1 file changed, 257 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..3297133e22 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,196 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		uint32_t *: __rte_bit_test32,			\
+		const uint32_t *: __rte_bit_test32,		\
+		uint64_t *: __rte_bit_test64,			\
+		const uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## family ## fun ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(family, fun, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+#define __RTE_GEN_BIT_ASSIGN(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## family ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## family ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(family, fun, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## family ## fun ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## family ## test ## size(addr, nr);	\
+		__rte_bit_ ## family ## assign ## size(addr, nr, !value); \
+	}
+
+__RTE_GEN_BIT_TEST(, test,, 32)
+__RTE_GEN_BIT_SET(, set,, 32)
+__RTE_GEN_BIT_CLEAR(, clear,, 32)
+__RTE_GEN_BIT_ASSIGN(, assign,, 32)
+__RTE_GEN_BIT_FLIP(, flip,, 32)
+
+__RTE_GEN_BIT_TEST(, test,, 64)
+__RTE_GEN_BIT_SET(, set,, 64)
+__RTE_GEN_BIT_CLEAR(, clear,, 64)
+__RTE_GEN_BIT_ASSIGN(, assign,, 64)
+__RTE_GEN_BIT_FLIP(, flip,, 64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +980,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [PATCH v2 2/5] eal: add unit tests for bit operations
  2024-08-09  9:58                               ` [PATCH v2 0/5] Improve EAL bit operations API Mattias Rönnblom
  2024-08-09  9:58                                 ` [PATCH v2 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-08-09  9:58                                 ` Mattias Rönnblom
  2024-08-09  9:58                                 ` [PATCH v2 3/5] eal: add atomic " Mattias Rönnblom
                                                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_[test|set|clear|assign|flip]()
functions.

The tests are converted to use the test suite runner framework.
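
The strategy of the generated test can be sketched in standalone
form. This is an illustration only: it uses rand() in place of
rte_rand(), return codes in place of TEST_ASSERT(), and hypothetical
demo_ helpers in place of the rte_bit_*() macros. The flip helper
here uses XOR, whereas the patch builds flip from test and assign.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

static inline void
demo_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	assert(nr < 32);
	uint32_t mask = (uint32_t)1 << nr;

	if (value)
		*addr |= mask;
	else
		*addr &= ~mask;
}

static inline bool
demo_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & ((uint32_t)1 << nr);
}

static inline void
demo_bit_flip32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr ^= (uint32_t)1 << nr;
}

/*
 * Drive a random word towards a random reference, one bit at a time,
 * checking each bit before and after a double flip.
 * Returns 0 on success, -1 on the first mismatch.
 */
static int
demo_test_bit_access32(unsigned int seed)
{
	srand(seed);
	uint32_t reference = ((uint32_t)rand() << 16) ^ (uint32_t)rand();
	uint32_t word = ((uint32_t)rand() << 16) ^ (uint32_t)rand();

	for (unsigned int nr = 0; nr < 32; nr++) {
		bool ref_bit = (reference >> nr) & 1;

		demo_bit_assign32(&word, nr, ref_bit);
		if (demo_bit_test32(&word, nr) != ref_bit)
			return -1;

		demo_bit_flip32(&word, nr);
		if (demo_bit_test32(&word, nr) == ref_bit)
			return -1;
		demo_bit_flip32(&word, nr);
	}

	/* After all 32 bits are assigned, the words must be identical. */
	return word == reference ? 0 : -1;
}
```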

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

--

RFC v6:
 * Test rte_bit_*test() usage through const pointers.

RFC v4:
 * Remove redundant line continuations.
---
 app/test/test_bitops.c | 85 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..322f58c066 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,68 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %d had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+									\
+			const uint ## size ## _t *const_ptr = &word;	\
+			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
+				    reference_bit,			\
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +172,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1



* [PATCH v2 3/5] eal: add atomic bit operations
  2024-08-09  9:58                               ` [PATCH v2 0/5] Improve EAL bit operations API Mattias Rönnblom
  2024-08-09  9:58                                 ` [PATCH v2 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-08-09  9:58                                 ` [PATCH v2 2/5] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-08-09  9:58                                 ` Mattias Rönnblom
  2024-08-12 11:19                                   ` Jack Bond-Preston
  2024-08-09  9:58                                 ` [PATCH v2 4/5] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  2024-08-09  9:58                                 ` [PATCH v2 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  4 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
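
The test-and-modify variants can be built directly on the previous
value returned by an atomic fetch operation. The sketch below shows
the idea for 64-bit words using the GCC/Clang __atomic builtins. The
demo_ names and the use of builtins instead of <rte_stdatomic.h> are
assumptions made to keep the example standalone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Set bit nr and return the value it had before. */
static inline bool
demo_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
			       int memory_order)
{
	assert(nr < 64);
	uint64_t mask = (uint64_t)1 << nr;
	uint64_t prev = __atomic_fetch_or(addr, mask, memory_order);

	return prev & mask;
}

/* Clear bit nr and return the value it had before. */
static inline bool
demo_bit_atomic_test_and_clear64(uint64_t *addr, unsigned int nr,
				 int memory_order)
{
	assert(nr < 64);
	uint64_t mask = (uint64_t)1 << nr;
	uint64_t prev = __atomic_fetch_and(addr, ~mask, memory_order);

	return prev & mask;
}

/* Toggle bit nr with a single atomic fetch-xor. */
static inline void
demo_bit_atomic_flip64(uint64_t *addr, unsigned int nr, int memory_order)
{
	assert(nr < 64);
	uint64_t mask = (uint64_t)1 << nr;

	__atomic_fetch_xor(addr, mask, memory_order);
}
```

A caller can use the return value to find out whether it was the one
that changed the bit, e.g. a false return from test-and-set means the
bit was previously clear.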

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

--

PATCH:
 * Add missing macro #undef for C++ version of atomic bit flip.

RFC v7:
 * Replace compare-exchange-based rte_bit_atomic_test_and_*() and
   flip() with implementations that use the previous value as returned
   by the atomic fetch function.
 * Reword documentation to match the non-atomic macro variants.
 * Remove pointer to <rte_stdatomic.h> for memory model documentation,
   since there is no documentation for that API.

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 * Add rte_bit_atomic_test_and_assign() (for consistency).
 * Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 * Use <rte_stdatomic.h> to support MSVC.
---
 lib/eal/include/rte_bitops.h | 414 +++++++++++++++++++++++++++++++++++
 1 file changed, 414 insertions(+)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3297133e22..4d878099ed 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -226,6 +227,204 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '1', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '0', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in the
+ * word pointed to by @c addr to the value indicated by @c value, with
+ * the memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Generic selection macro to atomically negate the value of the bit
+ * specified by @c nr in the word pointed to by @c addr (i.e., '0'
+ * becomes '1' and '1' becomes '0'), with the memory ordering as
+ * specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Generic selection macro to atomically test and set bit specified by
+ * @c nr in the word pointed to by @c addr to '1', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Generic selection macro to atomically test and clear bit specified
+ * by @c nr in the word pointed to by @c addr to '0', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Generic selection macro to atomically test and assign bit specified
+ * by @c nr in the word pointed to by @c addr the value specified by
+ * @c value, with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
@@ -298,6 +497,145 @@ __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
+				      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
+				     unsigned int nr, int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
+				       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
+					unsigned int nr, bool value,	\
+					int memory_order)		\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_clear ## size(addr, nr,	\
+						       memory_order);	\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)				\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_set ## size(uint ## size ## _t *addr,	\
+					      unsigned int nr,		\
+					      int memory_order)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_or_explicit(a_addr, mask,	\
+						    memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_clear ## size(uint ## size ## _t *addr, \
+						unsigned int nr,	\
+						int memory_order)	\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
+			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_and_explicit(a_addr, ~mask,	\
+						     memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
+						 unsigned int nr,	\
+						 bool value,		\
+						 int memory_order)	\
+	{								\
+		if (value)						\
+			return __rte_bit_atomic_test_and_set ## size(addr, nr, \
+								     memory_order); \
+		else							\
+			return __rte_bit_atomic_test_and_clear ## size(addr, nr, \
+								       memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
+	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
+	__RTE_GEN_BIT_ATOMIC_SET(size)			\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)		\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
+	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+
+__RTE_GEN_BIT_ATOMIC_OPS(32)
+__RTE_GEN_BIT_ATOMIC_OPS(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -993,6 +1331,15 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_flip
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1036,12 +1383,79 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [PATCH v2 4/5] eal: add unit tests for atomic bit access functions
  2024-08-09  9:58                               ` [PATCH v2 0/5] Improve EAL bit operations API Mattias Rönnblom
                                                   ` (2 preceding siblings ...)
  2024-08-09  9:58                                 ` [PATCH v2 3/5] eal: add atomic " Mattias Rönnblom
@ 2024-08-09  9:58                                 ` Mattias Rönnblom
  2024-08-09  9:58                                 ` [PATCH v2 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  4 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_atomic_*() family of
functions.
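
For readers without a DPDK build at hand, the invariant checked by the
parallel flip test can be sketched in plain C11. This is only an
illustration under stated assumptions, not the code in this patch: it
uses <stdatomic.h> and pthreads instead of <rte_stdatomic.h> and the
lcore launch API, it runs a fixed number of iterations instead of a
deadline, and the names bit_atomic_flip32(), struct flip_worker,
run_flips() and parallel_flip_check() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_bit_atomic_flip() on a 32-bit word. */
static inline void
bit_atomic_flip32(_Atomic uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	atomic_fetch_xor_explicit(addr, UINT32_C(1) << nr,
				  memory_order_relaxed);
}

struct flip_worker {
	_Atomic uint32_t *word;	/* word shared by both threads */
	unsigned int bit;	/* bit index both threads flip */
	uint64_t flips;		/* number of flips this thread performs */
};

static void *
run_flips(void *arg)
{
	struct flip_worker *w = arg;
	uint64_t i;

	for (i = 0; i < w->flips; i++)
		bit_atomic_flip32(w->word, w->bit);

	return NULL;
}

/*
 * Two threads flip the same bit concurrently. If every flip is atomic,
 * the final bit equals the parity of the total flip count and all
 * other bits are still zero.
 */
static bool
parallel_flip_check(unsigned int bit, uint64_t flips_a, uint64_t flips_b)
{
	_Atomic uint32_t word = 0;
	struct flip_worker a = { .word = &word, .bit = bit, .flips = flips_a };
	struct flip_worker b = { .word = &word, .bit = bit, .flips = flips_b };
	pthread_t thread;

	if (pthread_create(&thread, NULL, run_flips, &b) != 0)
		return false;
	run_flips(&a);
	pthread_join(thread, NULL);

	uint32_t expected = (uint32_t)((flips_a + flips_b) % 2) << bit;

	return atomic_load_explicit(&word, memory_order_relaxed) == expected;
}
```

A lost update (for example from a non-atomic read-modify-write) would
leave the bit with the wrong parity, which is what the check is meant
to catch.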

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

--

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.
---
 app/test/test_bitops.c | 313 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 312 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 322f58c066..b80216a0a1 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -61,6 +64,304 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign)				\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint ## size ## _t expected_word = 0;			\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint ## size ## _t expected_word = 0;			\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -177,6 +478,16 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1



* [PATCH v2 5/5] eal: extend bitops to handle volatile pointers
  2024-08-09  9:58                               ` [PATCH v2 0/5] Improve EAL bit operations API Mattias Rönnblom
                                                   ` (3 preceding siblings ...)
  2024-08-09  9:58                                 ` [PATCH v2 4/5] eal: add unit tests for atomic bit access functions Mattias Rönnblom
@ 2024-08-09  9:58                                 ` Mattias Rönnblom
  2024-08-09 11:48                                   ` Morten Brørup
  2024-08-12 11:22                                   ` Jack Bond-Preston
  4 siblings, 2 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09  9:58 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup, Mattias Rönnblom

Have rte_bit_[test|set|clear|assign|flip]() and rte_bit_atomic_*()
handle volatile-marked pointers.
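
The dispatch mechanism can be illustrated with a standalone sketch,
assuming only C11 _Generic and covering just 32-bit test and set. The
names bit_test(), bit_set(), bit_test32(), bit_v_test32(), bit_set32(),
bit_v_set32() and dispatch_demo() are hypothetical and much simpler
than the generated functions in <rte_bitops.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Non-volatile variants: the compiler may merge or elide accesses. */
static inline bool
bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}

static inline void
bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

/* Volatile variants: every load and store in the source is performed. */
static inline bool
bit_v_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}

static inline void
bit_v_set32(volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

/* Select the variant from the qualifiers of the pointed-to type. */
#define bit_test(addr, nr)					\
	_Generic((addr),					\
		 uint32_t *: bit_test32,			\
		 const uint32_t *: bit_test32,			\
		 volatile uint32_t *: bit_v_test32,		\
		 const volatile uint32_t *: bit_v_test32)(addr, nr)

#define bit_set(addr, nr)					\
	_Generic((addr),					\
		 uint32_t *: bit_set32,				\
		 volatile uint32_t *: bit_v_set32)(addr, nr)

/* Returns the OR of both words, or 0 if any bit test is unexpected. */
static uint32_t
dispatch_demo(void)
{
	uint32_t plain = 0;
	volatile uint32_t reg = 0;

	bit_set(&plain, 3);	/* selects bit_set32() */
	bit_set(&reg, 7);	/* selects bit_v_set32() */

	const volatile uint32_t *ro_reg = &reg;
	if (!bit_test(&plain, 3) || !bit_test(ro_reg, 7) ||
	    bit_test(ro_reg, 3))
		return 0;

	return plain | reg;
}
```

With this layout the caller uses one macro name regardless of whether
the word is volatile-qualified, and a pointer type without a matching
association fails at compile time.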

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>

--

PATCH v2:
 * Actually run the test_bit_atomic_v_access*() test functions.
---
 app/test/test_bitops.c       |  32 ++-
 lib/eal/include/rte_bitops.h | 427 ++++++++++++++++++++++-------------
 2 files changed, 291 insertions(+), 168 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index b80216a0a1..10e87f6776 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -14,13 +14,13 @@
 #include "test.h"
 
 #define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
-			    flip_fun, test_fun, size)			\
+			    flip_fun, test_fun, size, mod)		\
 	static int							\
 	test_name(void)							\
 	{								\
 		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
 		unsigned int bit_nr;					\
-		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+		mod uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
 									\
 		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
 			bool reference_bit = (reference >> bit_nr) & 1;	\
@@ -41,7 +41,7 @@
 				    "Bit %d had unflipped value", bit_nr); \
 			flip_fun(&word, bit_nr);			\
 									\
-			const uint ## size ## _t *const_ptr = &word;	\
+			const mod uint ## size ## _t *const_ptr = &word; \
 			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
 				    reference_bit,			\
 				    "Bit %d had unexpected value", bit_nr); \
@@ -59,10 +59,16 @@
 	}
 
 GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64, volatile)
 
 #define bit_atomic_set(addr, nr)				\
 	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
@@ -81,11 +87,19 @@ GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 32)
+		    bit_atomic_flip, bit_atomic_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 64)
+		    bit_atomic_flip, bit_atomic_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64, volatile)
 
 #define PARALLEL_TEST_RUNTIME 0.25
 
@@ -480,8 +494,12 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_v_access32),
+		TEST_CASE(test_bit_v_access64),
 		TEST_CASE(test_bit_atomic_access32),
 		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_v_access32),
+		TEST_CASE(test_bit_atomic_v_access64),
 		TEST_CASE(test_bit_atomic_parallel_assign32),
 		TEST_CASE(test_bit_atomic_parallel_assign64),
 		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 4d878099ed..1355949fb6 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -127,12 +127,16 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_test(addr, nr)					\
-	_Generic((addr),					\
-		uint32_t *: __rte_bit_test32,			\
-		const uint32_t *: __rte_bit_test32,		\
-		uint64_t *: __rte_bit_test64,			\
-		const uint64_t *: __rte_bit_test64)(addr, nr)
+#define rte_bit_test(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_test32,				\
+		 const uint32_t *: __rte_bit_test32,			\
+		 volatile uint32_t *: __rte_bit_v_test32,		\
+		 const volatile uint32_t *: __rte_bit_v_test32,		\
+		 uint64_t *: __rte_bit_test64,				\
+		 const uint64_t *: __rte_bit_test64,			\
+		 volatile uint64_t *: __rte_bit_v_test64,		\
+		 const volatile uint64_t *: __rte_bit_v_test64)(addr, nr)
 
 /**
  * @warning
@@ -152,10 +156,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_set(addr, nr)				\
-	_Generic((addr),				\
-		 uint32_t *: __rte_bit_set32,		\
-		 uint64_t *: __rte_bit_set64)(addr, nr)
+#define rte_bit_set(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_set32,				\
+		 volatile uint32_t *: __rte_bit_v_set32,		\
+		 uint64_t *: __rte_bit_set64,				\
+		 volatile uint64_t *: __rte_bit_v_set64)(addr, nr)
 
 /**
  * @warning
@@ -175,10 +181,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_clear(addr, nr)					\
-	_Generic((addr),					\
-		 uint32_t *: __rte_bit_clear32,			\
-		 uint64_t *: __rte_bit_clear64)(addr, nr)
+#define rte_bit_clear(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_clear32,				\
+		 volatile uint32_t *: __rte_bit_v_clear32,		\
+		 uint64_t *: __rte_bit_clear64,				\
+		 volatile uint64_t *: __rte_bit_v_clear64)(addr, nr)
 
 /**
  * @warning
@@ -202,7 +210,9 @@ extern "C" {
 #define rte_bit_assign(addr, nr, value)					\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_assign32,			\
-		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+		 volatile uint32_t *: __rte_bit_v_assign32,		\
+		 uint64_t *: __rte_bit_assign64,			\
+		 volatile uint64_t *: __rte_bit_v_assign64)(addr, nr, value)
 
 /**
  * @warning
@@ -225,7 +235,9 @@ extern "C" {
 #define rte_bit_flip(addr, nr)						\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_flip32,				\
-		 uint64_t *: __rte_bit_flip64)(addr, nr)
+		 volatile uint32_t *: __rte_bit_v_flip32,		\
+		 uint64_t *: __rte_bit_flip64,				\
+		 volatile uint64_t *: __rte_bit_v_flip64)(addr, nr)
 
 /**
  * @warning
@@ -250,9 +262,13 @@ extern "C" {
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test32,			\
 		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 volatile uint32_t *: __rte_bit_atomic_v_test32,	\
+		 const volatile uint32_t *: __rte_bit_atomic_v_test32,	\
 		 uint64_t *: __rte_bit_atomic_test64,			\
-		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
-							    memory_order)
+		 const uint64_t *: __rte_bit_atomic_test64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test64,	\
+		 const volatile uint64_t *: __rte_bit_atomic_v_test64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -274,7 +290,10 @@ extern "C" {
 #define rte_bit_atomic_set(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_set32,			\
-		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_set32,		\
+		 uint64_t *: __rte_bit_atomic_set64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_set64)(addr, nr, \
+								memory_order)
 
 /**
  * @warning
@@ -296,7 +315,10 @@ extern "C" {
 #define rte_bit_atomic_clear(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_clear32,			\
-		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_clear32,	\
+		 uint64_t *: __rte_bit_atomic_clear64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_clear64)(addr, nr, \
+								  memory_order)
 
 /**
  * @warning
@@ -320,8 +342,11 @@ extern "C" {
 #define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_assign32,			\
-		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
-							memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_assign32,	\
+		 uint64_t *: __rte_bit_atomic_assign64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_assign64)(addr, nr, \
+								   value, \
+								   memory_order)
 
 /**
  * @warning
@@ -344,7 +369,10 @@ extern "C" {
 #define rte_bit_atomic_flip(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_flip32,			\
-		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_flip32,	\
+		 uint64_t *: __rte_bit_atomic_flip64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_flip64)(addr, nr, \
+								 memory_order)
 
 /**
  * @warning
@@ -368,8 +396,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
-							      memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_set32, \
+		 uint64_t *: __rte_bit_atomic_test_and_set64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_set64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -393,8 +423,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
-								memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_clear32, \
+		 uint64_t *: __rte_bit_atomic_test_and_clear64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_clear64) \
+						       (addr, nr, memory_order)
 
 /**
  * @warning
@@ -421,9 +453,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
-		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
-								 value, \
-								 memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_assign32, \
+		 uint64_t *: __rte_bit_atomic_test_and_assign64,	\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_assign64) \
+						(addr, nr, value, memory_order)
 
 #define __RTE_GEN_BIT_TEST(family, fun, qualifier, size)		\
 	__rte_experimental						\
@@ -491,93 +524,105 @@ __RTE_GEN_BIT_CLEAR(, clear,, 32)
 __RTE_GEN_BIT_ASSIGN(, assign,, 32)
 __RTE_GEN_BIT_FLIP(, flip,, 32)
 
+__RTE_GEN_BIT_TEST(v_, test, volatile, 32)
+__RTE_GEN_BIT_SET(v_, set, volatile, 32)
+__RTE_GEN_BIT_CLEAR(v_, clear, volatile, 32)
+__RTE_GEN_BIT_ASSIGN(v_, assign, volatile, 32)
+__RTE_GEN_BIT_FLIP(v_, flip, volatile, 32)
+
 __RTE_GEN_BIT_TEST(, test,, 64)
 __RTE_GEN_BIT_SET(, set,, 64)
 __RTE_GEN_BIT_CLEAR(, clear,, 64)
 __RTE_GEN_BIT_ASSIGN(, assign,, 64)
 __RTE_GEN_BIT_FLIP(, flip,, 64)
 
-#define __RTE_GEN_BIT_ATOMIC_TEST(size)					\
+__RTE_GEN_BIT_TEST(v_, test, volatile, 64)
+__RTE_GEN_BIT_SET(v_, set, volatile, 64)
+__RTE_GEN_BIT_CLEAR(v_, clear, volatile, 64)
+__RTE_GEN_BIT_ASSIGN(v_, assign, volatile, 64)
+__RTE_GEN_BIT_FLIP(v_, flip, volatile, 64)
+
+#define __RTE_GEN_BIT_ATOMIC_TEST(v, qualifier, size)			\
 	__rte_experimental						\
 	static inline bool						\
-	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
-				      unsigned int nr, int memory_order) \
+	__rte_bit_atomic_ ## v ## test ## size(const qualifier uint ## size ## _t *addr, \
+					       unsigned int nr, int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		const qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr = \
+			(const qualifier RTE_ATOMIC(uint ## size ## _t) *)addr;	\
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
+#define __RTE_GEN_BIT_ATOMIC_SET(v, qualifier, size)			\
 	__rte_experimental						\
 	static inline void						\
-	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
-				     unsigned int nr, int memory_order)	\
+	__rte_bit_atomic_ ## v ## set ## size(qualifier uint ## size ## _t *addr, \
+					      unsigned int nr, int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(v, qualifier, size)			\
 	__rte_experimental						\
 	static inline void						\
-	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
-				       unsigned int nr, int memory_order) \
+	__rte_bit_atomic_ ## v ## clear ## size(qualifier uint ## size ## _t *addr,	\
+						unsigned int nr, int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
+#define __RTE_GEN_BIT_ATOMIC_FLIP(v, qualifier, size)			\
 	__rte_experimental						\
 	static inline void						\
-	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
-				       unsigned int nr, int memory_order) \
+	__rte_bit_atomic_ ## v ## flip ## size(qualifier uint ## size ## _t *addr, \
+					       unsigned int nr, int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(v, qualifier, size)			\
 	__rte_experimental						\
 	static inline void						\
-	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
-					unsigned int nr, bool value,	\
-					int memory_order)		\
+	__rte_bit_atomic_## v ## assign ## size(qualifier uint ## size ## _t *addr, \
+						unsigned int nr, bool value, \
+						int memory_order)	\
 	{								\
 		if (value)						\
-			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
+			__rte_bit_atomic_ ## v ## set ## size(addr, nr, memory_order); \
 		else							\
-			__rte_bit_atomic_clear ## size(addr, nr,	\
-						       memory_order);	\
+			__rte_bit_atomic_ ## v ## clear ## size(addr, nr, \
+								     memory_order); \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)				\
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(v, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
-	__rte_bit_atomic_test_and_set ## size(uint ## size ## _t *addr,	\
-					      unsigned int nr,		\
-					      int memory_order)		\
+	__rte_bit_atomic_ ## v ## test_and_set ## size(qualifier uint ## size ## _t *addr, \
+						       unsigned int nr,	\
+						       int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		uint ## size ## _t prev;				\
 									\
@@ -587,17 +632,17 @@ __RTE_GEN_BIT_FLIP(, flip,, 64)
 		return prev & mask;					\
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)			\
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(v, qualifier, size)		\
 	__rte_experimental						\
 	static inline bool						\
-	__rte_bit_atomic_test_and_clear ## size(uint ## size ## _t *addr, \
-						unsigned int nr,	\
-						int memory_order)	\
+	__rte_bit_atomic_ ## v ## test_and_clear ## size(qualifier uint ## size ## _t *addr, \
+							 unsigned int nr, \
+							 int memory_order) \
 	{								\
 		RTE_ASSERT(nr < size);					\
 									\
-		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
-			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
 		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
 		uint ## size ## _t prev;				\
 									\
@@ -607,34 +652,36 @@ __RTE_GEN_BIT_FLIP(, flip,, 64)
 		return prev & mask;					\
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(v, qualifier, size)	\
 	__rte_experimental						\
 	static inline bool						\
-	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
-						 unsigned int nr,	\
-						 bool value,		\
-						 int memory_order)	\
+	__rte_bit_atomic_ ## v ## test_and_assign ## size(qualifier uint ## size ## _t *addr, \
+							  unsigned int nr, \
+							  bool value,	\
+							  int memory_order) \
 	{								\
 		if (value)						\
-			return __rte_bit_atomic_test_and_set ## size(addr, nr, \
-								     memory_order); \
+			return __rte_bit_atomic_ ## v ## test_and_set ## size(addr, nr, memory_order); \
 		else							\
-			return __rte_bit_atomic_test_and_clear ## size(addr, nr, \
-								       memory_order); \
+			return __rte_bit_atomic_ ## v ## test_and_clear ## size(addr, nr, memory_order); \
 	}
 
-#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
-	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
-	__RTE_GEN_BIT_ATOMIC_SET(size)			\
-	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
-	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
-	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)		\
-	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)	\
-	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
-	__RTE_GEN_BIT_ATOMIC_FLIP(size)
+#define __RTE_GEN_BIT_ATOMIC_OPS(v, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST(v, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_SET(v, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(v, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(v, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(v, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(v, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(v, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_FLIP(v, qualifier, size)
 
-__RTE_GEN_BIT_ATOMIC_OPS(32)
-__RTE_GEN_BIT_ATOMIC_OPS(64)
+#define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(,, size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(v_, volatile, size)
+
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
 
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
@@ -1340,120 +1387,178 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_atomic_test_and_clear
 #undef rte_bit_atomic_test_and_assign
 
-#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+#define __RTE_BIT_OVERLOAD_V_2(family, v, fun, c, size, arg1_type, arg1_name) \
 	static inline void						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
-			arg1_type arg1_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+#define __RTE_BIT_OVERLOAD_SZ_2(family, fun, c, size, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_V_2(family,, fun, c, size, arg1_type,	\
+			       arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2(family, v_, fun, c volatile, size, \
+			       arg1_type, arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name)				\
+#define __RTE_BIT_OVERLOAD_2(family, fun, c, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_V_2R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
 			arg1_type arg1_name)				\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family, v_, fun, c volatile,		\
+				size, ret_type, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_2R(family, fun, c, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 32, ret_type, arg1_type, \
 				 arg1_name)				\
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 64, ret_type, arg1_type, \
 				 arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name)			\
+#define __RTE_BIT_OVERLOAD_V_3(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3(family, fun, c, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family,, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family, v_, fun, c volatile, size, arg1_type, \
+			       arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_3(family, fun, c, arg1_type, arg1_name, arg2_type, \
 			     arg2_name)					\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 32, arg1_type, arg1_name, \
 				arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)	\
+#define __RTE_BIT_OVERLOAD_V_3R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name)	\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name)	\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)
+	__RTE_BIT_OVERLOAD_V_3R(family,, fun, c, size, ret_type, \
+				arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_V_3R(family, v_, fun, c volatile, size, \
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name) \
+#define __RTE_BIT_OVERLOAD_3R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 64, ret_type, \
+				 arg1_type, arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_V_4(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name, arg3_type,	arg3_name) \
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
-					  arg3_name);		      \
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name,	\
+							 arg3_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
-			     arg2_name, arg3_type, arg3_name)		\
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+#define __RTE_BIT_OVERLOAD_SZ_4(family, fun, c, size, arg1_type, arg1_name, \
 				arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name)
-
-#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family,, fun, c, size, arg1_type,	\
+			       arg1_name, arg2_type, arg2_name, arg3_type, \
+			       arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family, v_, fun, c volatile, size,	\
+			       arg1_type, arg1_name, arg2_type, arg2_name, \
+			       arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4(family, fun, c, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 32, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 64, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)
+
+#define __RTE_BIT_OVERLOAD_V_4R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
-						 arg3_name);		\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name, \
+								arg3_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name, arg3_type, \
 				 arg3_name)				\
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)
-
-__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
-__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
-
-__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+	__RTE_BIT_OVERLOAD_V_4R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4R(family, v_, fun, c volatile, size,	\
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)			\
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 64, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)
+
+__RTE_BIT_OVERLOAD_2R(, test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(, assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(, flip,, unsigned int, nr)
+
+__RTE_BIT_OVERLOAD_3R(atomic_, test, const, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+__RTE_BIT_OVERLOAD_3(atomic_, set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_, clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_, assign,, unsigned int, nr, bool, value,
 		     int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3(atomic_, flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_set,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_clear,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_4R(atomic_, test_and_assign,, bool, unsigned int, nr,
 		      bool, value, int, memory_order)
 
 #endif
-- 
2.34.1



* RE: [PATCH v2 5/5] eal: extend bitops to handle volatile pointers
  2024-08-09  9:58                                 ` [PATCH v2 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
@ 2024-08-09 11:48                                   ` Morten Brørup
  2024-08-12 11:22                                   ` Jack Bond-Preston
  1 sibling, 0 replies; 160+ messages in thread
From: Morten Brørup @ 2024-08-09 11:48 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff

> +#define rte_bit_test(addr, nr)						\
> +	_Generic((addr),						\
> +		 uint32_t *: __rte_bit_test32,				\
> +		 const uint32_t *: __rte_bit_test32,			\
> +		 volatile uint32_t *: __rte_bit_v_test32,		\
> +		 const volatile uint32_t *: __rte_bit_v_test32,		\

I had to read up on "const volatile *", and it checks out.
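
For anyone else who has to look it up: a pointer to a const volatile
object may not be written through (const), yet every read through it
must actually be performed because the value can change outside the
program's control (volatile). A read-only device status register is the
usual example. Below is a minimal standalone sketch of the dispatch; the
demo_* names are made up for illustration and are not the DPDK helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, named for illustration; not the DPDK functions. */
static inline bool
demo_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr >> nr) & 1;
}

static inline bool
demo_bit_v_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr >> nr) & 1;
}

/* Dispatch on the pointee's qualifiers, as the quoted macro does. */
#define demo_bit_test(addr, nr)						\
	_Generic((addr),						\
		 uint32_t *: demo_bit_test32,				\
		 const uint32_t *: demo_bit_test32,			\
		 volatile uint32_t *: demo_bit_v_test32,		\
		 const volatile uint32_t *: demo_bit_v_test32)(addr, nr)

/* Counts how many of the four pointer flavours see bit 3 as set. */
int
demo_count_set(void)
{
	uint32_t word = UINT32_C(1) << 3;
	const uint32_t *c_ptr = &word;
	volatile uint32_t *v_ptr = &word;
	const volatile uint32_t *cv_ptr = &word;

	return demo_bit_test(&word, 3) + demo_bit_test(c_ptr, 3) +
		demo_bit_test(v_ptr, 3) + demo_bit_test(cv_ptr, 3);
}
```

Without the two const entries, passing c_ptr or cv_ptr would fail to
compile, since _Generic matches on the exact pointer type and has no
default association here.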

Acked-by: Morten Brørup <mb@smartsharesystems.com>



* Re: [PATCH 2/5] eal: add unit tests for bit operations
  2024-08-09  9:04                             ` [PATCH 2/5] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-08-09 15:03                               ` Stephen Hemminger
  2024-08-09 15:37                                 ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Stephen Hemminger @ 2024-08-09 15:03 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, hofors, Heng Wang, Joyce Kong, Tyler Retzlaff, Morten Brørup

On Fri, 9 Aug 2024 11:04:36 +0200
Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:

> -uint32_t val32;
> -uint64_t val64;
> +#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
> +			    flip_fun, test_fun, size)			\
> +	static int							\
> +	test_name(void)							\
> +	{								\
> +		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
> +		unsigned int bit_nr;					\
> +		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
> +									\
> +		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
> +			bool reference_bit = (reference >> bit_nr) & 1;	\
> +			bool assign = rte_rand() & 1;			\
> +			if (assign)					\
> +				assign_fun(&word, bit_nr, reference_bit); \
> +			else {						\
> +				if (reference_bit)			\
> +					set_fun(&word, bit_nr);		\
> +				else					\
> +					clear_fun(&word, bit_nr);	\
> +									\
> +			}						\
> +			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
> +				    "Bit %d had unexpected value", bit_nr); \
> +			flip_fun(&word, bit_nr);			\
> +			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
> +				    "Bit %d had unflipped value", bit_nr); \
> +			flip_fun(&word, bit_nr);			\
> +									\
> +			const uint ## size ## _t *const_ptr = &word;	\
> +			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
> +				    reference_bit,			\
> +				    "Bit %d had unexpected value", bit_nr); \
> +		}							\
> +									\
> +		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
> +			bool reference_bit = (reference >> bit_nr) & 1;	\
> +			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
> +				    "Bit %d had unexpected value", bit_nr); \
> +		}							\
> +									\
> +		TEST_ASSERT(reference == word, "Word had unexpected value"); \
> +									\
> +		return TEST_SUCCESS;					\
> +	}
> +
> +GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
> +		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
> +
> +GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
> +		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)

Having a large macro like this for two cases adds complexity without
additional clarity. Just duplicate the code please.


* Re: [PATCH 2/5] eal: add unit tests for bit operations
  2024-08-09 15:03                               ` Stephen Hemminger
@ 2024-08-09 15:37                                 ` Mattias Rönnblom
  2024-08-09 16:31                                   ` Stephen Hemminger
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09 15:37 UTC (permalink / raw)
  To: Stephen Hemminger, Mattias Rönnblom
  Cc: dev, Heng Wang, Joyce Kong, Tyler Retzlaff, Morten Brørup

On 2024-08-09 17:03, Stephen Hemminger wrote:
> On Fri, 9 Aug 2024 11:04:36 +0200
> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> 
>> -uint32_t val32;
>> -uint64_t val64;
>> +#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
>> +			    flip_fun, test_fun, size)			\
>> +	static int							\
>> +	test_name(void)							\
>> +	{								\
>> +		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
>> +		unsigned int bit_nr;					\
>> +		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
>> +									\
>> +		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
>> +			bool reference_bit = (reference >> bit_nr) & 1;	\
>> +			bool assign = rte_rand() & 1;			\
>> +			if (assign)					\
>> +				assign_fun(&word, bit_nr, reference_bit); \
>> +			else {						\
>> +				if (reference_bit)			\
>> +					set_fun(&word, bit_nr);		\
>> +				else					\
>> +					clear_fun(&word, bit_nr);	\
>> +									\
>> +			}						\
>> +			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
>> +				    "Bit %d had unexpected value", bit_nr); \
>> +			flip_fun(&word, bit_nr);			\
>> +			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
>> +				    "Bit %d had unflipped value", bit_nr); \
>> +			flip_fun(&word, bit_nr);			\
>> +									\
>> +			const uint ## size ## _t *const_ptr = &word;	\
>> +			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
>> +				    reference_bit,			\
>> +				    "Bit %d had unexpected value", bit_nr); \
>> +		}							\
>> +									\
>> +		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
>> +			bool reference_bit = (reference >> bit_nr) & 1;	\
>> +			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
>> +				    "Bit %d had unexpected value", bit_nr); \
>> +		}							\
>> +									\
>> +		TEST_ASSERT(reference == word, "Word had unexpected value"); \
>> +									\
>> +		return TEST_SUCCESS;					\
>> +	}
>> +
>> +GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
>> +		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
>> +
>> +GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
>> +		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
> 
> Having a large macro like this for two cases adds complexity without
> additional clarity. Just duplicate the code please.

GEN_TEST_BIT_ACCESS is being used by six more test cases in later 
patches in the series.


* Re: [PATCH 2/5] eal: add unit tests for bit operations
  2024-08-09 15:37                                 ` Mattias Rönnblom
@ 2024-08-09 16:31                                   ` Stephen Hemminger
  2024-08-09 16:57                                     ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Stephen Hemminger @ 2024-08-09 16:31 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Mattias Rönnblom, dev, Heng Wang, Joyce Kong,
	Tyler Retzlaff, Morten Brørup

On Fri, 9 Aug 2024 17:37:08 +0200
Mattias Rönnblom <hofors@lysator.liu.se> wrote:

> On 2024-08-09 17:03, Stephen Hemminger wrote:
> > On Fri, 9 Aug 2024 11:04:36 +0200
> > Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
> >   
> >> -uint32_t val32;
> >> -uint64_t val64;
> >> +#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
> >> +			    flip_fun, test_fun, size)			\
> >> +	static int							\
> >> +	test_name(void)							\
> >> +	{								\
> >> +		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
> >> +		unsigned int bit_nr;					\
> >> +		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
> >> +									\
> >> +		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
> >> +			bool reference_bit = (reference >> bit_nr) & 1;	\
> >> +			bool assign = rte_rand() & 1;			\
> >> +			if (assign)					\
> >> +				assign_fun(&word, bit_nr, reference_bit); \
> >> +			else {						\
> >> +				if (reference_bit)			\
> >> +					set_fun(&word, bit_nr);		\
> >> +				else					\
> >> +					clear_fun(&word, bit_nr);	\
> >> +									\
> >> +			}						\
> >> +			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
> >> +				    "Bit %d had unexpected value", bit_nr); \
> >> +			flip_fun(&word, bit_nr);			\
> >> +			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
> >> +				    "Bit %d had unflipped value", bit_nr); \
> >> +			flip_fun(&word, bit_nr);			\
> >> +									\
> >> +			const uint ## size ## _t *const_ptr = &word;	\
> >> +			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
> >> +				    reference_bit,			\
> >> +				    "Bit %d had unexpected value", bit_nr); \
> >> +		}							\
> >> +									\
> >> +		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
> >> +			bool reference_bit = (reference >> bit_nr) & 1;	\
> >> +			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
> >> +				    "Bit %d had unexpected value", bit_nr); \
> >> +		}							\
> >> +									\
> >> +		TEST_ASSERT(reference == word, "Word had unexpected value"); \
> >> +									\
> >> +		return TEST_SUCCESS;					\
> >> +	}
> >> +
> >> +GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
> >> +		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
> >> +
> >> +GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
> >> +		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)  
> > 
> > Having a large macro like this for two cases adds complexity without
> > additional clarity. Just duplicate the code please.
> 
> GEN_TEST_BIT_ACCESS is being used by six more test cases in later 
> patches in the series.

Would it be possible to make it a function and pass function pointers with
_Generic?


* Re: [PATCH 2/5] eal: add unit tests for bit operations
  2024-08-09 16:31                                   ` Stephen Hemminger
@ 2024-08-09 16:57                                     ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-09 16:57 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Mattias Rönnblom, dev, Heng Wang, Joyce Kong,
	Tyler Retzlaff, Morten Brørup

On 2024-08-09 18:31, Stephen Hemminger wrote:
> On Fri, 9 Aug 2024 17:37:08 +0200
> Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> 
>> On 2024-08-09 17:03, Stephen Hemminger wrote:
>>> On Fri, 9 Aug 2024 11:04:36 +0200
>>> Mattias Rönnblom <mattias.ronnblom@ericsson.com> wrote:
>>>    
>>>> -uint32_t val32;
>>>> -uint64_t val64;
>>>> +#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
>>>> +			    flip_fun, test_fun, size)			\
>>>> +	static int							\
>>>> +	test_name(void)							\
>>>> +	{								\
>>>> +		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
>>>> +		unsigned int bit_nr;					\
>>>> +		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
>>>> +									\
>>>> +		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
>>>> +			bool reference_bit = (reference >> bit_nr) & 1;	\
>>>> +			bool assign = rte_rand() & 1;			\
>>>> +			if (assign)					\
>>>> +				assign_fun(&word, bit_nr, reference_bit); \
>>>> +			else {						\
>>>> +				if (reference_bit)			\
>>>> +					set_fun(&word, bit_nr);		\
>>>> +				else					\
>>>> +					clear_fun(&word, bit_nr);	\
>>>> +									\
>>>> +			}						\
>>>> +			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
>>>> +				    "Bit %d had unexpected value", bit_nr); \
>>>> +			flip_fun(&word, bit_nr);			\
>>>> +			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
>>>> +				    "Bit %d had unflipped value", bit_nr); \
>>>> +			flip_fun(&word, bit_nr);			\
>>>> +									\
>>>> +			const uint ## size ## _t *const_ptr = &word;	\
>>>> +			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
>>>> +				    reference_bit,			\
>>>> +				    "Bit %d had unexpected value", bit_nr); \
>>>> +		}							\
>>>> +									\
>>>> +		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
>>>> +			bool reference_bit = (reference >> bit_nr) & 1;	\
>>>> +			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
>>>> +				    "Bit %d had unexpected value", bit_nr); \
>>>> +		}							\
>>>> +									\
>>>> +		TEST_ASSERT(reference == word, "Word had unexpected value"); \
>>>> +									\
>>>> +		return TEST_SUCCESS;					\
>>>> +	}
>>>> +
>>>> +GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
>>>> +		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
>>>> +
>>>> +GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
>>>> +		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
>>>
>>> Having a large macro like this for two cases adds complexity without
>>> additional clarity. Just duplicate the code please.
>>
>> GEN_TEST_BIT_ACCESS is being used by six more test cases in later
>> patches in the series.
> 
> Would it be possible to make it a function and pass function pointers with
> _Generic?

I'm not sure exactly what you are suggesting here, but a function can't
do the job of GEN_TEST_BIT_ACCESS. You can't pass macros as parameters
to functions, and the signatures of the _Generic macros under test
(e.g., set_fun) vary across different test cases.

I agree with what underlies your suggestion - prefer functions over 
macros when functions can do the job (reasonably well).
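
To illustrate the point, here is a minimal standalone sketch. It is not
the test code from the patch: all demo_* and DEMO_* names are
hypothetical, and the generated test is much smaller than
GEN_TEST_BIT_ACCESS. The type-generic front ends are macros, so they
have no address to take; the generator macro pastes their names in
textually instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the functions under test. */
static inline void
demo_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

static inline void
demo_set64(uint64_t *addr, unsigned int nr)
{
	*addr |= UINT64_C(1) << nr;
}

static inline bool
demo_test32(const uint32_t *addr, unsigned int nr)
{
	return (*addr >> nr) & 1;
}

static inline bool
demo_test64(const uint64_t *addr, unsigned int nr)
{
	return (*addr >> nr) & 1;
}

/*
 * Type-generic front ends. These are macros, so they have no address
 * and no single signature; they cannot be passed as function pointers.
 */
#define demo_bit_set(addr, nr)						\
	_Generic((addr),						\
		 uint32_t *: demo_set32,				\
		 uint64_t *: demo_set64)(addr, nr)

#define demo_bit_test(addr, nr)						\
	_Generic((addr),						\
		 uint32_t *: demo_test32,				\
		 uint64_t *: demo_test64)(addr, nr)

/* The generator takes the macro names as arguments and expands them. */
#define DEMO_GEN_TEST(test_name, set_fun, test_fun, size)		\
	int								\
	test_name(void)							\
	{								\
		uint ## size ## _t word = 0;				\
		unsigned int bit_nr;					\
									\
		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
			set_fun(&word, bit_nr);				\
			assert(test_fun(&word, bit_nr));		\
		}							\
									\
		return word == UINT ## size ## _MAX ? 0 : -1;		\
	}

DEMO_GEN_TEST(demo_test_access32, demo_bit_set, demo_bit_test, 32)
DEMO_GEN_TEST(demo_test_access64, demo_bit_set, demo_bit_test, 64)
```

A plain function taking, say, void (*set_fun)(uint32_t *, unsigned int)
could accept demo_set32 directly, but then the _Generic dispatch itself
would never be exercised, and a separate function would be needed for
every word size and every signature.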


* Re: [PATCH v2 1/5] eal: extend bit manipulation functionality
  2024-08-09  9:58                                 ` [PATCH v2 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-08-12 11:16                                   ` Jack Bond-Preston
  2024-08-12 11:58                                     ` Mattias Rönnblom
  2024-08-12 12:49                                   ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
  1 sibling, 1 reply; 160+ messages in thread
From: Jack Bond-Preston @ 2024-08-12 11:16 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup

On 09/08/2024 10:58, Mattias Rönnblom wrote:
<snip>
> +
> +__RTE_GEN_BIT_TEST(, test,, 32)
> +__RTE_GEN_BIT_SET(, set,, 32)
> +__RTE_GEN_BIT_CLEAR(, clear,, 32)
> +__RTE_GEN_BIT_ASSIGN(, assign,, 32)
> +__RTE_GEN_BIT_FLIP(, flip,, 32)
> +
> +__RTE_GEN_BIT_TEST(, test,, 64)
> +__RTE_GEN_BIT_SET(, set,, 64)
> +__RTE_GEN_BIT_CLEAR(, clear,, 64)
> +__RTE_GEN_BIT_ASSIGN(, assign,, 64)
> +__RTE_GEN_BIT_FLIP(, flip,, 64)

What is the purpose of the `fun` argument, as opposed to just having
these names written out in the macro definitions? I notice the atomic
equivalents don't have it.

>   /*------------------------ 32-bit relaxed operations ------------------------*/
>   
>   /**
> <snip>


* Re: [PATCH v2 3/5] eal: add atomic bit operations
  2024-08-09  9:58                                 ` [PATCH v2 3/5] eal: add atomic " Mattias Rönnblom
@ 2024-08-12 11:19                                   ` Jack Bond-Preston
  2024-08-12 12:00                                     ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: Jack Bond-Preston @ 2024-08-12 11:19 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup

On 09/08/2024 10:58, Mattias Rönnblom wrote:
> <snip>
> +
> +#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
> +	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
> +	__RTE_GEN_BIT_ATOMIC_SET(size)			\
> +	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
> +	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
> +	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)		\
> +	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)	\
> +	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
> +	__RTE_GEN_BIT_ATOMIC_FLIP(size)
> +
> +__RTE_GEN_BIT_ATOMIC_OPS(32)
> +__RTE_GEN_BIT_ATOMIC_OPS(64)

For the non-atomic operations, the arguments family and qualifier were 
added in the initial commit, and unused until the 
volatile-support-adding commit. Perhaps the atomic equivalents should be 
the same? (i.e., add the family and qualifier arguments in this patch and 
don't use them until patch 5/5.)

> +
>   /*------------------------ 32-bit relaxed operations ------------------------*/
>   
>   /**
> <snip>


* Re: [PATCH v2 5/5] eal: extend bitops to handle volatile pointers
  2024-08-09  9:58                                 ` [PATCH v2 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  2024-08-09 11:48                                   ` Morten Brørup
@ 2024-08-12 11:22                                   ` Jack Bond-Preston
  2024-08-12 12:28                                     ` Mattias Rönnblom
  1 sibling, 1 reply; 160+ messages in thread
From: Jack Bond-Preston @ 2024-08-12 11:22 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup

On 09/08/2024 10:58, Mattias Rönnblom wrote:
> <snip>
> +#define __RTE_GEN_BIT_ATOMIC_TEST(v, qualifier, size)			\
>   	__rte_experimental						\
>   	static inline bool						\
> -	__rte_bit_atomic_test ## size(const uint ## size ## _t *addr,	\
> -				      unsigned int nr, int memory_order) \
> +	__rte_bit_atomic_ ## v ## test ## size(const qualifier uint ## size ## _t *addr, \
> +					       unsigned int nr, int memory_order) \
>   	{								\
>   		RTE_ASSERT(nr < size);					\
>   									\
> -		const RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> -			(const RTE_ATOMIC(uint ## size ## _t) *)addr;	\
> +		const qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr = \
> +			(const qualifier RTE_ATOMIC(uint ## size ## _t) *)addr;	\
>   		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>   		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
>   	}
>   
> -#define __RTE_GEN_BIT_ATOMIC_SET(size)					\
> +#define __RTE_GEN_BIT_ATOMIC_SET(v, qualifier, size)			\
>   	__rte_experimental						\
>   	static inline void						\
> -	__rte_bit_atomic_set ## size(uint ## size ## _t *addr,		\
> -				     unsigned int nr, int memory_order)	\
> +	__rte_bit_atomic_ ## v ## set ## size(qualifier uint ## size ## _t *addr, \
> +					      unsigned int nr, int memory_order) \
>   	{								\
>   		RTE_ASSERT(nr < size);					\
>   									\
> -		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> -			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
> +		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
> +			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
>   		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>   		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
>   	}
>   
> -#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)				\
> +#define __RTE_GEN_BIT_ATOMIC_CLEAR(v, qualifier, size)			\
>   	__rte_experimental						\
>   	static inline void						\
> -	__rte_bit_atomic_clear ## size(uint ## size ## _t *addr,	\
> -				       unsigned int nr, int memory_order) \
> +	__rte_bit_atomic_ ## v ## clear ## size(qualifier uint ## size ## _t *addr,	\
> +						unsigned int nr, int memory_order) \
>   	{								\
>   		RTE_ASSERT(nr < size);					\
>   									\
> -		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> -			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
> +		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
> +			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
>   		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>   		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
>   	}
>   
> -#define __RTE_GEN_BIT_ATOMIC_FLIP(size)					\
> +#define __RTE_GEN_BIT_ATOMIC_FLIP(v, qualifier, size)			\
>   	__rte_experimental						\
>   	static inline void						\
> -	__rte_bit_atomic_flip ## size(uint ## size ## _t *addr,		\
> -				       unsigned int nr, int memory_order) \
> +	__rte_bit_atomic_ ## v ## flip ## size(qualifier uint ## size ## _t *addr, \
> +					       unsigned int nr, int memory_order) \
>   	{								\
>   		RTE_ASSERT(nr < size);					\
>   									\
> -		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> -			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
> +		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
> +			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
>   		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>   		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
>   	}
>   
> -#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)				\
> +#define __RTE_GEN_BIT_ATOMIC_ASSIGN(v, qualifier, size)			\
>   	__rte_experimental						\
>   	static inline void						\
> -	__rte_bit_atomic_assign ## size(uint ## size ## _t *addr,	\
> -					unsigned int nr, bool value,	\
> -					int memory_order)		\
> +	__rte_bit_atomic_## v ## assign ## size(qualifier uint ## size ## _t *addr, \
> +						unsigned int nr, bool value, \
> +						int memory_order)	\
>   	{								\
>   		if (value)						\
> -			__rte_bit_atomic_set ## size(addr, nr, memory_order); \
> +			__rte_bit_atomic_ ## v ## set ## size(addr, nr, memory_order); \
>   		else							\
> -			__rte_bit_atomic_clear ## size(addr, nr,	\
> -						       memory_order);	\
> +			__rte_bit_atomic_ ## v ## clear ## size(addr, nr, \
> +								     memory_order); \
>   	}
>   
> -#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)				\
> +#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(v, qualifier, size)		\
>   	__rte_experimental						\
>   	static inline bool						\
> -	__rte_bit_atomic_test_and_set ## size(uint ## size ## _t *addr,	\
> -					      unsigned int nr,		\
> -					      int memory_order)		\
> +	__rte_bit_atomic_ ## v ## test_and_set ## size(qualifier uint ## size ## _t *addr, \
> +						       unsigned int nr,	\
> +						       int memory_order) \
>   	{								\
>   		RTE_ASSERT(nr < size);					\
>   									\
> -		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> -			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
> +		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
> +			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
>   		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>   		uint ## size ## _t prev;				\
>   									\
> @@ -587,17 +632,17 @@ __RTE_GEN_BIT_FLIP(, flip,, 64)
>   		return prev & mask;					\
>   	}
>   
> -#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)			\
> +#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(v, qualifier, size)		\
>   	__rte_experimental						\
>   	static inline bool						\
> -	__rte_bit_atomic_test_and_clear ## size(uint ## size ## _t *addr, \
> -						unsigned int nr,	\
> -						int memory_order)	\
> +	__rte_bit_atomic_ ## v ## test_and_clear ## size(qualifier uint ## size ## _t *addr, \
> +							 unsigned int nr, \
> +							 int memory_order) \
>   	{								\
>   		RTE_ASSERT(nr < size);					\
>   									\
> -		RTE_ATOMIC(uint ## size ## _t) *a_addr =		\
> -			(RTE_ATOMIC(uint ## size ## _t) *)addr;		\
> +		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
> +			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
>   		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
>   		uint ## size ## _t prev;				\
>   									\
> @@ -607,34 +652,36 @@ __RTE_GEN_BIT_FLIP(, flip,, 64)
>   		return prev & mask;					\
>   	}
>   
> -#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)			\
> +#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(v, qualifier, size)	\
>   	__rte_experimental						\
>   	static inline bool						\
> -	__rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
> -						 unsigned int nr,	\
> -						 bool value,		\
> -						 int memory_order)	\
> +	__rte_bit_atomic_ ## v ## test_and_assign ## size(qualifier uint ## size ## _t *addr, \
> +							  unsigned int nr, \
> +							  bool value,	\
> +							  int memory_order) \
>   	{								\
>   		if (value)						\
> -			return __rte_bit_atomic_test_and_set ## size(addr, nr, \
> -								     memory_order); \
> +			return __rte_bit_atomic_ ## v ## test_and_set ## size(addr, nr, memory_order); \
>   		else							\
> -			return __rte_bit_atomic_test_and_clear ## size(addr, nr, \
> -								       memory_order); \
> +			return __rte_bit_atomic_ ## v ## test_and_clear ## size(addr, nr, memory_order); \
>   	}
>   
> -#define __RTE_GEN_BIT_ATOMIC_OPS(size)			\
> -	__RTE_GEN_BIT_ATOMIC_TEST(size)			\
> -	__RTE_GEN_BIT_ATOMIC_SET(size)			\
> -	__RTE_GEN_BIT_ATOMIC_CLEAR(size)		\
> -	__RTE_GEN_BIT_ATOMIC_ASSIGN(size)		\
> -	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)		\
> -	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)	\
> -	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)	\
> -	__RTE_GEN_BIT_ATOMIC_FLIP(size)
> +#define __RTE_GEN_BIT_ATOMIC_OPS(v, qualifier, size)	\
> +	__RTE_GEN_BIT_ATOMIC_TEST(v, qualifier, size)	\
> +	__RTE_GEN_BIT_ATOMIC_SET(v, qualifier, size)	\
> +	__RTE_GEN_BIT_ATOMIC_CLEAR(v, qualifier, size)	\
> +	__RTE_GEN_BIT_ATOMIC_ASSIGN(v, qualifier, size)	\
> +	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(v, qualifier, size) \
> +	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(v, qualifier, size) \
> +	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(v, qualifier, size) \
> +	__RTE_GEN_BIT_ATOMIC_FLIP(v, qualifier, size)
>   
> -__RTE_GEN_BIT_ATOMIC_OPS(32)
> -__RTE_GEN_BIT_ATOMIC_OPS(64)
> +#define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
> +	__RTE_GEN_BIT_ATOMIC_OPS(,, size) \
> +	__RTE_GEN_BIT_ATOMIC_OPS(v_, volatile, size)
> +
> +__RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
> +__RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)

The first argument for these should probably be called "family", for 
consistency with the non-atomic ops.

>   
>   /*------------------------ 32-bit relaxed operations ------------------------*/
>   <snip>


* Re: [PATCH v2 1/5] eal: extend bit manipulation functionality
  2024-08-12 11:16                                   ` Jack Bond-Preston
@ 2024-08-12 11:58                                     ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-12 11:58 UTC (permalink / raw)
  To: Jack Bond-Preston, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup

On 2024-08-12 13:16, Jack Bond-Preston wrote:
> On 09/08/2024 10:58, Mattias Rönnblom wrote:
> <snip>
>> +
>> +__RTE_GEN_BIT_TEST(, test,, 32)
>> +__RTE_GEN_BIT_SET(, set,, 32)
>> +__RTE_GEN_BIT_CLEAR(, clear,, 32)
>> +__RTE_GEN_BIT_ASSIGN(, assign,, 32)
>> +__RTE_GEN_BIT_FLIP(, flip,, 32)
>> +
>> +__RTE_GEN_BIT_TEST(, test,, 64)
>> +__RTE_GEN_BIT_SET(, set,, 64)
>> +__RTE_GEN_BIT_CLEAR(, clear,, 64)
>> +__RTE_GEN_BIT_ASSIGN(, assign,, 64)
>> +__RTE_GEN_BIT_FLIP(, flip,, 64)
> 
> What is the purpose of the `fun` argument? As opposed to just having 
> these written out in the macro definitions. I notice the atomic 
> equivalents don't have this.
> 

It no longer has any purpose. I failed to clean that up after removing 
the "once" family of functions.

Thanks.

>>   /*------------------------ 32-bit relaxed operations 
>> ------------------------*/
>>   /**
>> <snip>


* Re: [PATCH v2 3/5] eal: add atomic bit operations
  2024-08-12 11:19                                   ` Jack Bond-Preston
@ 2024-08-12 12:00                                     ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-12 12:00 UTC (permalink / raw)
  To: Jack Bond-Preston, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup

On 2024-08-12 13:19, Jack Bond-Preston wrote:
> On 09/08/2024 10:58, Mattias Rönnblom wrote:
>> <snip>
>> +
>> +#define __RTE_GEN_BIT_ATOMIC_OPS(size)            \
>> +    __RTE_GEN_BIT_ATOMIC_TEST(size)            \
>> +    __RTE_GEN_BIT_ATOMIC_SET(size)            \
>> +    __RTE_GEN_BIT_ATOMIC_CLEAR(size)        \
>> +    __RTE_GEN_BIT_ATOMIC_ASSIGN(size)        \
>> +    __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)        \
>> +    __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)    \
>> +    __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)    \
>> +    __RTE_GEN_BIT_ATOMIC_FLIP(size)
>> +
>> +__RTE_GEN_BIT_ATOMIC_OPS(32)
>> +__RTE_GEN_BIT_ATOMIC_OPS(64)
> 
> For the non-atomic operations, the arguments family and qualifier were 
> added in the initial commit, and unused until the 
> volatile-support-adding commit. Perhaps the atomic equivalents should be 
> the same? (ie. add the family and qualifier arguments in this patch and 
> don't use them until patch 5/5.
> 

Sounds like a good idea.

>> +
>>   /*------------------------ 32-bit relaxed operations 
>> ------------------------*/
>>   /**
>> <snip>


* Re: [PATCH v2 5/5] eal: extend bitops to handle volatile pointers
  2024-08-12 11:22                                   ` Jack Bond-Preston
@ 2024-08-12 12:28                                     ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-12 12:28 UTC (permalink / raw)
  To: Jack Bond-Preston, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Joyce Kong, Tyler Retzlaff,
	Morten Brørup

On 2024-08-12 13:22, Jack Bond-Preston wrote:
> On 09/08/2024 10:58, Mattias Rönnblom wrote:
>> <snip>
>> +#define __RTE_GEN_BIT_ATOMIC_TEST(v, qualifier, size)            \
>>       __rte_experimental                        \
>>       static inline bool                        \
>> -    __rte_bit_atomic_test ## size(const uint ## size ## _t *addr,    \
>> -                      unsigned int nr, int memory_order) \
>> +    __rte_bit_atomic_ ## v ## test ## size(const qualifier uint ## 
>> size ## _t *addr, \
>> +                           unsigned int nr, int memory_order) \
>>       {                                \
>>           RTE_ASSERT(nr < size);                    \
>>                                       \
>> -        const RTE_ATOMIC(uint ## size ## _t) *a_addr =        \
>> -            (const RTE_ATOMIC(uint ## size ## _t) *)addr;    \
>> +        const qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr = \
>> +            (const qualifier RTE_ATOMIC(uint ## size ## _t) *)addr;    \
>>           uint ## size ## _t mask = (uint ## size ## _t)1 << nr;    \
>>           return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
>>       }
>> -#define __RTE_GEN_BIT_ATOMIC_SET(size)                    \
>> +#define __RTE_GEN_BIT_ATOMIC_SET(v, qualifier, size)            \
>>       __rte_experimental                        \
>>       static inline void                        \
>> -    __rte_bit_atomic_set ## size(uint ## size ## _t *addr,        \
>> -                     unsigned int nr, int memory_order)    \
>> +    __rte_bit_atomic_ ## v ## set ## size(qualifier uint ## size ## 
>> _t *addr, \
>> +                          unsigned int nr, int memory_order) \
>>       {                                \
>>           RTE_ASSERT(nr < size);                    \
>>                                       \
>> -        RTE_ATOMIC(uint ## size ## _t) *a_addr =        \
>> -            (RTE_ATOMIC(uint ## size ## _t) *)addr;        \
>> +        qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =    \
>> +            (qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
>>           uint ## size ## _t mask = (uint ## size ## _t)1 << nr;    \
>>           rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
>>       }
>> -#define __RTE_GEN_BIT_ATOMIC_CLEAR(size)                \
>> +#define __RTE_GEN_BIT_ATOMIC_CLEAR(v, qualifier, size)            \
>>       __rte_experimental                        \
>>       static inline void                        \
>> -    __rte_bit_atomic_clear ## size(uint ## size ## _t *addr,    \
>> -                       unsigned int nr, int memory_order) \
>> +    __rte_bit_atomic_ ## v ## clear ## size(qualifier uint ## size ## 
>> _t *addr,    \
>> +                        unsigned int nr, int memory_order) \
>>       {                                \
>>           RTE_ASSERT(nr < size);                    \
>>                                       \
>> -        RTE_ATOMIC(uint ## size ## _t) *a_addr =        \
>> -            (RTE_ATOMIC(uint ## size ## _t) *)addr;        \
>> +        qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =    \
>> +            (qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
>>           uint ## size ## _t mask = (uint ## size ## _t)1 << nr;    \
>>           rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
>>       }
>> -#define __RTE_GEN_BIT_ATOMIC_FLIP(size)                    \
>> +#define __RTE_GEN_BIT_ATOMIC_FLIP(v, qualifier, size)            \
>>       __rte_experimental                        \
>>       static inline void                        \
>> -    __rte_bit_atomic_flip ## size(uint ## size ## _t *addr,        \
>> -                       unsigned int nr, int memory_order) \
>> +    __rte_bit_atomic_ ## v ## flip ## size(qualifier uint ## size ## 
>> _t *addr, \
>> +                           unsigned int nr, int memory_order) \
>>       {                                \
>>           RTE_ASSERT(nr < size);                    \
>>                                       \
>> -        RTE_ATOMIC(uint ## size ## _t) *a_addr =        \
>> -            (RTE_ATOMIC(uint ## size ## _t) *)addr;        \
>> +        qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =    \
>> +            (qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
>>           uint ## size ## _t mask = (uint ## size ## _t)1 << nr;    \
>>           rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
>>       }
>> -#define __RTE_GEN_BIT_ATOMIC_ASSIGN(size)                \
>> +#define __RTE_GEN_BIT_ATOMIC_ASSIGN(v, qualifier, size)            \
>>       __rte_experimental                        \
>>       static inline void                        \
>> -    __rte_bit_atomic_assign ## size(uint ## size ## _t *addr,    \
>> -                    unsigned int nr, bool value,    \
>> -                    int memory_order)        \
>> +    __rte_bit_atomic_## v ## assign ## size(qualifier uint ## size ## 
>> _t *addr, \
>> +                        unsigned int nr, bool value, \
>> +                        int memory_order)    \
>>       {                                \
>>           if (value)                        \
>> -            __rte_bit_atomic_set ## size(addr, nr, memory_order); \
>> +            __rte_bit_atomic_ ## v ## set ## size(addr, nr, 
>> memory_order); \
>>           else                            \
>> -            __rte_bit_atomic_clear ## size(addr, nr,    \
>> -                               memory_order);    \
>> +            __rte_bit_atomic_ ## v ## clear ## size(addr, nr, \
>> +                                     memory_order); \
>>       }
>> -#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)                \
>> +#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(v, qualifier, size)        \
>>       __rte_experimental                        \
>>       static inline bool                        \
>> -    __rte_bit_atomic_test_and_set ## size(uint ## size ## _t *addr,    \
>> -                          unsigned int nr,        \
>> -                          int memory_order)        \
>> +    __rte_bit_atomic_ ## v ## test_and_set ## size(qualifier uint ## 
>> size ## _t *addr, \
>> +                               unsigned int nr,    \
>> +                               int memory_order) \
>>       {                                \
>>           RTE_ASSERT(nr < size);                    \
>>                                       \
>> -        RTE_ATOMIC(uint ## size ## _t) *a_addr =        \
>> -            (RTE_ATOMIC(uint ## size ## _t) *)addr;        \
>> +        qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =    \
>> +            (qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
>>           uint ## size ## _t mask = (uint ## size ## _t)1 << nr;    \
>>           uint ## size ## _t prev;                \
>>                                       \
>> @@ -587,17 +632,17 @@ __RTE_GEN_BIT_FLIP(, flip,, 64)
>>           return prev & mask;                    \
>>       }
>> -#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)            \
>> +#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(v, qualifier, size)        \
>>       __rte_experimental                        \
>>       static inline bool                        \
>> -    __rte_bit_atomic_test_and_clear ## size(uint ## size ## _t *addr, \
>> -                        unsigned int nr,    \
>> -                        int memory_order)    \
>> +    __rte_bit_atomic_ ## v ## test_and_clear ## size(qualifier uint 
>> ## size ## _t *addr, \
>> +                             unsigned int nr, \
>> +                             int memory_order) \
>>       {                                \
>>           RTE_ASSERT(nr < size);                    \
>>                                       \
>> -        RTE_ATOMIC(uint ## size ## _t) *a_addr =        \
>> -            (RTE_ATOMIC(uint ## size ## _t) *)addr;        \
>> +        qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =    \
>> +            (qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
>>           uint ## size ## _t mask = (uint ## size ## _t)1 << nr;    \
>>           uint ## size ## _t prev;                \
>>                                       \
>> @@ -607,34 +652,36 @@ __RTE_GEN_BIT_FLIP(, flip,, 64)
>>           return prev & mask;                    \
>>       }
>> -#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)            \
>> +#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(v, qualifier, size)    \
>>       __rte_experimental                        \
>>       static inline bool                        \
>> -    __rte_bit_atomic_test_and_assign ## size(uint ## size ## _t *addr, \
>> -                         unsigned int nr,    \
>> -                         bool value,        \
>> -                         int memory_order)    \
>> +    __rte_bit_atomic_ ## v ## test_and_assign ## size(qualifier uint 
>> ## size ## _t *addr, \
>> +                              unsigned int nr, \
>> +                              bool value,    \
>> +                              int memory_order) \
>>       {                                \
>>           if (value)                        \
>> -            return __rte_bit_atomic_test_and_set ## size(addr, nr, \
>> -                                     memory_order); \
>> +            return __rte_bit_atomic_ ## v ## test_and_set ## 
>> size(addr, nr, memory_order); \
>>           else                            \
>> -            return __rte_bit_atomic_test_and_clear ## size(addr, nr, \
>> -                                       memory_order); \
>> +            return __rte_bit_atomic_ ## v ## test_and_clear ## 
>> size(addr, nr, memory_order); \
>>       }
>> -#define __RTE_GEN_BIT_ATOMIC_OPS(size)            \
>> -    __RTE_GEN_BIT_ATOMIC_TEST(size)            \
>> -    __RTE_GEN_BIT_ATOMIC_SET(size)            \
>> -    __RTE_GEN_BIT_ATOMIC_CLEAR(size)        \
>> -    __RTE_GEN_BIT_ATOMIC_ASSIGN(size)        \
>> -    __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(size)        \
>> -    __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(size)    \
>> -    __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(size)    \
>> -    __RTE_GEN_BIT_ATOMIC_FLIP(size)
>> +#define __RTE_GEN_BIT_ATOMIC_OPS(v, qualifier, size)    \
>> +    __RTE_GEN_BIT_ATOMIC_TEST(v, qualifier, size)    \
>> +    __RTE_GEN_BIT_ATOMIC_SET(v, qualifier, size)    \
>> +    __RTE_GEN_BIT_ATOMIC_CLEAR(v, qualifier, size)    \
>> +    __RTE_GEN_BIT_ATOMIC_ASSIGN(v, qualifier, size)    \
>> +    __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(v, qualifier, size) \
>> +    __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(v, qualifier, size) \
>> +    __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(v, qualifier, size) \
>> +    __RTE_GEN_BIT_ATOMIC_FLIP(v, qualifier, size)
>> -__RTE_GEN_BIT_ATOMIC_OPS(32)
>> -__RTE_GEN_BIT_ATOMIC_OPS(64)
>> +#define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
>> +    __RTE_GEN_BIT_ATOMIC_OPS(,, size) \
>> +    __RTE_GEN_BIT_ATOMIC_OPS(v_, volatile, size)
>> +
>> +__RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
>> +__RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
> 
> The first argument for these should probably be called "family", for 
> consistency with the non-atomic ops.
> 

The family is "atomic" or "" (for the non-atomic version), so it's not a 
good name.

I'll rename the macro parameters in __RTE_GEN_BIT_TEST() instead. 
'qualifier' should be 'c', or maybe const_qualifier or const_qual to be 
more descriptive. The names should be consistent with the overload macros.

>>   /*------------------------ 32-bit relaxed operations 
>> ------------------------*/
>>   <snip>


* [PATCH v3 0/5] Improve EAL bit operations API
  2024-08-09  9:58                                 ` [PATCH v2 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-08-12 11:16                                   ` Jack Bond-Preston
@ 2024-08-12 12:49                                   ` Mattias Rönnblom
  2024-08-12 12:49                                     ` [PATCH v3 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
                                                       ` (5 more replies)
  1 sibling, 6 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-12 12:49 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() functions are replaced
with two new families:

rte_bit_[test|set|clear|assign|flip]() which provides no memory
ordering or atomicity guarantees, but does provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant. rte_bit_[test|set|clear|assign|flip]() may be
used with volatile word pointers, in which case they guarantee
that the program-level accesses actually occur.
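
As a rough sketch of how one generic name can serve both plain and
volatile word pointers (the demo_* names are invented for this
example and are not the identifiers used in the patch), _Generic can
dispatch on the pointer's qualifier:

```c
#include <assert.h>
#include <stdint.h>

/* Plain variant: the compiler is free to merge or elide accesses. */
static inline void
demo_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

/* Volatile variant: exactly one load and one store are performed. */
static inline void
demo_v_set32(volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	uint32_t value = *addr;

	*addr = value | (UINT32_C(1) << nr);
}

/* One user-visible name; the qualifier of addr picks the variant. */
#define demo_bit_set(addr, nr)					\
	_Generic((addr),					\
		 uint32_t *: demo_set32,			\
		 volatile uint32_t *: demo_v_set32)(addr, nr)
```

With this arrangement, only callers that pass a volatile pointer pay
for the forced accesses.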

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.
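
A minimal sketch of that mixed use, assuming GCC or Clang and using
the __atomic builtins directly rather than the RTE wrappers (the
function names are hypothetical, chosen only for this example):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Non-atomic test, usable when MT safety is not a concern. */
static inline bool
plain_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}

/*
 * Atomic test-and-set on the same, non-_Atomic, word type. Returns the
 * previous value of the bit. memory_order is one of the __ATOMIC_*
 * constants.
 */
static inline bool
atomic_test_and_set32(uint32_t *addr, unsigned int nr, int memory_order)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;
	uint32_t prev = __atomic_fetch_or(addr, mask, memory_order);

	return prev & mask;
}
```

Both functions accept the same uint32_t object, so no cast and no
second data type is needed on the caller's side.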

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs that depend on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat and would also prevent the user from
doing atomic operations against a bit set in some situations, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.
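
For illustration only, a non-atomic assign could look roughly like the
sketch below (ex_assign32() is a made-up name, not the function
generated by the patch). The bit value is a bool and the operation is
a set or a clear depending on it:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assign 'value' to bit 'nr' of the word at 'addr' (non-atomic). */
static inline void
ex_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;

	if (value)
		*addr |= mask;
	else
		*addr &= ~mask;
}
```

This saves the caller from writing the if/else at every call site
where the new bit value is only known at run time.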

All new functions have properly documented semantics.

All new functions operate on both 32-bit and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bits on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.
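
The single-evaluation property can be sketched as follows (the gen_*
names are placeholders for this example only). The controlling
expression of _Generic is used for its type and is not evaluated, so
an argument with side effects is evaluated just once, in the call:

```c
#include <assert.h>
#include <stdint.h>

static inline void
gen_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

static inline void
gen_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= UINT64_C(1) << nr;
}

/*
 * 'addr' appears twice in the expansion, but the first occurrence is
 * the unevaluated controlling expression of _Generic.
 */
#define gen_bit_set(addr, nr)				\
	_Generic((addr),				\
		 uint32_t *: gen_set32,			\
		 uint64_t *: gen_set64)(addr, nr)
```

For example, gen_bit_set(p++, 1) advances p by exactly one element.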

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions, implemented by
means of a huge, complicated C macro mess.

Mattias Rönnblom (5):
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions
  eal: extend bitops to handle volatile pointers

 app/test/test_bitops.c                 | 416 +++++++++++++-
 doc/guides/rel_notes/release_24_11.rst |  17 +
 lib/eal/include/rte_bitops.h           | 768 ++++++++++++++++++++++++-
 3 files changed, 1183 insertions(+), 18 deletions(-)

-- 
2.34.1



* [PATCH v3 1/5] eal: extend bit manipulation functionality
  2024-08-12 12:49                                   ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
@ 2024-08-12 12:49                                     ` Mattias Rönnblom
  2024-08-12 13:24                                       ` Jack Bond-Preston
  2024-09-09 14:57                                       ` [PATCH v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-08-12 12:49                                     ` [PATCH v3 2/5] eal: add unit tests for bit operations Mattias Rönnblom
                                                       ` (4 subsequent siblings)
  5 siblings, 2 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-12 12:49 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, Mattias Rönnblom

Add functionality to test and modify the value of individual bits in
32-bit or 64-bit words.

These functions have no implications for memory ordering or atomicity,
do not use volatile, and thus do not prevent any compiler
optimizations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

--

PATCH v3:
 * Remove unnecessary <rte_compat.h> include.
 * Remove redundant 'fun' parameter from the __RTE_GEN_BIT_*() macros
   (Jack Bond-Preston).
 * Introduce __RTE_GEN_BIT_OPS() macro, consistent with how things
   are done when generating the atomic bit operations.
 * Refer to volatile bit op functions as variants instead of families
   (macro parameter naming).

RFC v6:
 * Have rte_bit_test() accept const-marked bitsets.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.
---
 lib/eal/include/rte_bitops.h | 260 ++++++++++++++++++++++++++++++++++-
 1 file changed, 258 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..6915b945ba 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,197 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		uint32_t *: __rte_bit_test32,			\
+		const uint32_t *: __rte_bit_test32,		\
+		uint64_t *: __rte_bit_test64,			\
+		const uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## variant ## test ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## set ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}
+
+#define __RTE_GEN_BIT_CLEAR(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## clear ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}
+
+#define __RTE_GEN_BIT_ASSIGN(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## assign ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## variant ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## variant ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## flip ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## variant ## test ## size(addr, nr); \
+		__rte_bit_ ## variant ## assign ## size(addr, nr, !value); \
+	}
+
+#define __RTE_GEN_BIT_OPS(v, qualifier, size)	\
+	__RTE_GEN_BIT_TEST(v, qualifier, size)	\
+	__RTE_GEN_BIT_SET(v, qualifier, size)	\
+	__RTE_GEN_BIT_CLEAR(v, qualifier, size)	\
+	__RTE_GEN_BIT_ASSIGN(v, qualifier, size)	\
+	__RTE_GEN_BIT_FLIP(v, qualifier, size)
+
+#define __RTE_GEN_BIT_OPS_SIZE(size) \
+	__RTE_GEN_BIT_OPS(,, size)
+
+__RTE_GEN_BIT_OPS_SIZE(32)
+__RTE_GEN_BIT_OPS_SIZE(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +981,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


* [PATCH v3 2/5] eal: add unit tests for bit operations
  2024-08-12 12:49                                   ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
  2024-08-12 12:49                                     ` [PATCH v3 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-08-12 12:49                                     ` Mattias Rönnblom
  2024-08-12 13:25                                       ` Jack Bond-Preston
  2024-08-12 12:49                                     ` [PATCH v3 3/5] eal: add atomic " Mattias Rönnblom
                                                       ` (3 subsequent siblings)
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-12 12:49 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_[test|set|clear|assign|flip]()
functions.

The tests are converted to use the test suite runner framework.
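
The approach taken by the new test cases can be summarized by the
sketch below: a word is rebuilt bit by bit from a reference value, and
each bit is checked after being assigned and after being flipped. The
function name is hypothetical, and fixed inputs replace the rte_rand()
calls used by the real test.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-alone version of the test pattern. Returns true
 * if 'word' could be turned into 'reference' one bit at a time, with
 * every intermediate check passing.
 */
static bool
demo_check_bit_access32(uint32_t reference, uint32_t word)
{
	unsigned int nr;

	for (nr = 0; nr < 32; nr++) {
		uint32_t mask = (uint32_t)1 << nr;
		bool ref_bit = (reference & mask) != 0;

		/* Assign the reference bit value. */
		if (ref_bit)
			word |= mask;
		else
			word &= ~mask;

		if (((word & mask) != 0) != ref_bit)
			return false;

		/* A flipped bit must differ from the reference. */
		word ^= mask;
		if (((word & mask) != 0) == ref_bit)
			return false;

		/* Flip back to restore the reference value. */
		word ^= mask;
	}

	return word == reference;
}
```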

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

--

RFC v6:
 * Test rte_bit_*test() usage through const pointers.

RFC v4:
 * Remove redundant line continuations.
---
 app/test/test_bitops.c | 85 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..322f58c066 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,68 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %u had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %u had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+									\
+			const uint ## size ## _t *const_ptr = &word;	\
+			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
+				    reference_bit,			\
+				    "Bit %u had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %u had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +172,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


* [PATCH v3 3/5] eal: add atomic bit operations
  2024-08-12 12:49                                   ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
  2024-08-12 12:49                                     ` [PATCH v3 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-08-12 12:49                                     ` [PATCH v3 2/5] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-08-12 12:49                                     ` Mattias Rönnblom
  2024-08-12 13:25                                       ` Jack Bond-Preston
  2024-08-12 12:49                                     ` [PATCH v3 4/5] eal: add unit tests for atomic bit access functions Mattias Rönnblom
                                                       ` (2 subsequent siblings)
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-12 12:49 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
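
To illustrate the idea, here is a minimal sketch using standard C11
<stdatomic.h> rather than <rte_stdatomic.h>. The demo_* names are
hypothetical. Unlike the functions in this patch, which accept plain
(non-_Atomic) pointers and cast internally, the sketch takes _Atomic
pointers to stay within standard C.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically set bit 'nr' and report whether it was set before. */
static inline bool
demo_atomic_test_and_set32(_Atomic uint32_t *addr, unsigned int nr,
			   memory_order order)
{
	assert(nr < 32);

	uint32_t mask = (uint32_t)1 << nr;
	uint32_t prev = atomic_fetch_or_explicit(addr, mask, order);

	return (prev & mask) != 0;
}

/* Atomically clear bit 'nr' and report whether it was set before. */
static inline bool
demo_atomic_test_and_clear32(_Atomic uint32_t *addr, unsigned int nr,
			     memory_order order)
{
	assert(nr < 32);

	uint32_t mask = (uint32_t)1 << nr;
	uint32_t prev = atomic_fetch_and_explicit(addr, ~mask, order);

	return (prev & mask) != 0;
}

/* Atomically invert bit 'nr'. */
static inline void
demo_atomic_flip32(_Atomic uint32_t *addr, unsigned int nr,
		   memory_order order)
{
	assert(nr < 32);

	atomic_fetch_xor_explicit(addr, (uint32_t)1 << nr, order);
}
```

The previous value returned by the fetch operation is what makes the
test-and-modify variants a single atomic read-modify-write, with no
need for a compare-exchange loop.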

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

--

PATCH v3:
 * Introduce __RTE_GEN_BIT_ATOMIC_*() 'qualifier' argument already in
   this patch (Jack Bond-Preston).
 * Refer to volatile bit op functions as variants instead of families
   (macro parameter naming).
 * Update release notes.

PATCH:
 * Add missing macro #undef for C++ version of atomic bit flip.

RFC v7:
 * Replace compare-exchange-based rte_bit_atomic_test_and_*() and
   flip() with implementations that use the previous value as returned
   by the atomic fetch function.
 * Reword documentation to match the non-atomic macro variants.
 * Remove pointer to <rte_stdatomic.h> for memory model documentation,
   since there is no documentation for that API.

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 * Add rte_bit_atomic_test_and_assign() (for consistency).
 * Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 * Use <rte_stdatomic.h> to support MSVC.
---
 doc/guides/rel_notes/release_24_11.rst |  17 +
 lib/eal/include/rte_bitops.h           | 415 +++++++++++++++++++++++++
 2 files changed, 432 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..3111b1e4c0 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -56,6 +56,23 @@ New Features
      =======================================================
 
 
+* **Extended bit operations API.**
+
+  The support for bit-level operations on single 32- and 64-bit words
+  in <rte_bitops.h> has been extended with two families of
+  semantically well-defined functions.
+
+  rte_bit_[test|set|clear|assign|flip]() functions provide excellent
+  performance (by avoiding restricting the compiler and CPU), but give
+  no guarantees in regards to memory ordering or atomicity.
+
+  rte_bit_atomic_*() functions provide atomic bit-level operations,
+  including the possibility of specifying memory ordering constraints.
+
+  The new public API elements are polymorphic, using the _Generic-
+  based macros (for C) and function overloading (in C++ translation
+  units).
+
 Removed Items
 -------------
 
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 6915b945ba..3ad6795fd1 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -226,6 +227,204 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '1', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '0', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in the
+ * word pointed to by @c addr to the value indicated by @c value, with
+ * the memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Generic selection macro to atomically negate (i.e., change '0' to
+ * '1' and '1' to '0') the value of the bit specified by @c nr in the
+ * word pointed to by @c addr, with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Generic selection macro to atomically test and set bit specified by
+ * @c nr in the word pointed to by @c addr to '1', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set prior to the operation, false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Generic selection macro to atomically test and clear bit specified
+ * by @c nr in the word pointed to by @c addr to '0', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set prior to the operation, false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Generic selection macro to atomically test the bit specified by
+ * @c nr in the word pointed to by @c addr and assign it the value
+ * specified by @c value, with the memory ordering as specified with
+ * @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set prior to the operation, false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
 	__rte_experimental						\
 	static inline bool						\
@@ -299,6 +498,146 @@ extern "C" {
 __RTE_GEN_BIT_OPS_SIZE(32)
 __RTE_GEN_BIT_OPS_SIZE(64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test ## size(const qualifier uint ## size ## _t *addr, \
+						     unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr = \
+			(const qualifier RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## set ## size(qualifier uint ## size ## _t *addr, \
+					      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## clear ## size(qualifier uint ## size ## _t *addr,	\
+						unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## flip ## size(qualifier uint ## size ## _t *addr, \
+					       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_## variant ## assign ## size(qualifier uint ## size ## _t *addr, \
+						unsigned int nr, bool value, \
+						int memory_order)	\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_ ## variant ## set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_ ## variant ## clear ## size(addr, nr, \
+								     memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_set ## size(qualifier uint ## size ## _t *addr, \
+						       unsigned int nr,	\
+						       int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_or_explicit(a_addr, mask,	\
+						    memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_clear ## size(qualifier uint ## size ## _t *addr, \
+							 unsigned int nr, \
+							 int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+	        prev = rte_atomic_fetch_and_explicit(a_addr, ~mask,	\
+						     memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_assign ## size(qualifier uint ## size ## _t *addr, \
+							  unsigned int nr, \
+							  bool value,	\
+							  int memory_order) \
+	{								\
+		if (value)						\
+			return __rte_bit_atomic_ ## variant ## test_and_set ## size(addr, nr, memory_order); \
+		else							\
+			return __rte_bit_atomic_ ## variant ## test_and_clear ## size(addr, nr, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_SET(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)
+
+#define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(,, size)
+
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -994,6 +1333,15 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_flip
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1037,12 +1385,79 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [PATCH v3 4/5] eal: add unit tests for atomic bit access functions
  2024-08-12 12:49                                   ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
                                                       ` (2 preceding siblings ...)
  2024-08-12 12:49                                     ` [PATCH v3 3/5] eal: add atomic " Mattias Rönnblom
@ 2024-08-12 12:49                                     ` Mattias Rönnblom
  2024-08-12 13:26                                       ` Jack Bond-Preston
  2024-08-12 12:49                                     ` [PATCH v3 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  2024-08-20 17:05                                     ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-12 12:49 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_atomic_*() family of
functions.
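
For readers without the test file at hand, the invariant checked by the
parallel flip test can be sketched roughly as below. This is an
illustration only, not the code in the patch: the sketch_*() names are
hypothetical, the GCC/Clang __atomic builtins stand in for the
rte_bit_atomic_*() functions, and the flips are issued from a single
thread rather than from two lcores.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for rte_bit_atomic_flip() (GCC/Clang builtin). */
static inline void
sketch_bit_atomic_flip32(uint32_t *addr, unsigned int nr)
{
	__atomic_fetch_xor(addr, UINT32_C(1) << nr, __ATOMIC_RELAXED);
}

/* Hypothetical stand-in for rte_bit_atomic_test(). */
static inline bool
sketch_bit_atomic_test32(uint32_t *addr, unsigned int nr)
{
	return __atomic_load_n(addr, __ATOMIC_RELAXED) & (UINT32_C(1) << nr);
}

/*
 * Flip bit 'nr' of a zero-initialized word 'flips' times, then check
 * that the bit equals the parity of the flip count and that no other
 * bit in the word has changed.
 */
static bool
sketch_flip_parity_holds(unsigned int nr, uint64_t flips)
{
	uint32_t word = 0;
	uint64_t i;

	for (i = 0; i < flips; i++)
		sketch_bit_atomic_flip32(&word, nr);

	bool expected_value = flips % 2;
	uint32_t expected_word = expected_value ? UINT32_C(1) << nr : 0;

	return sketch_bit_atomic_test32(&word, nr) == expected_value &&
		word == expected_word;
}
```

In the actual test the flips come from the main and a worker lcore
concurrently, so a lost update would show up as a parity mismatch.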

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

--

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.
---
 app/test/test_bitops.c | 313 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 312 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 322f58c066..b80216a0a1 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -61,6 +64,304 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign)				\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -177,6 +478,16 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1



* [PATCH v3 5/5] eal: extend bitops to handle volatile pointers
  2024-08-12 12:49                                   ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
                                                       ` (3 preceding siblings ...)
  2024-08-12 12:49                                     ` [PATCH v3 4/5] eal: add unit tests for atomic bit access functions Mattias Rönnblom
@ 2024-08-12 12:49                                     ` Mattias Rönnblom
  2024-08-12 13:26                                       ` Jack Bond-Preston
  2024-08-20 17:05                                     ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-12 12:49 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, Mattias Rönnblom

Have rte_bit_[test|set|clear|assign|flip]() and rte_bit_atomic_*()
handle volatile-marked pointers.
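
As a rough illustration of the dispatch this relies on, a reduced
32-bit-only version could look like the sketch below. The sketch_*()
names are hypothetical and only test/set are shown; the real macros
cover both word sizes and the atomic family as well.

```c
#include <stdbool.h>
#include <stdint.h>

/* Non-volatile variants: the compiler may merge or elide accesses. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	return *addr & (UINT32_C(1) << nr);
}

static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

/* Volatile variants: same logic, but through a volatile pointer. */
static inline bool
sketch_bit_v_test32(const volatile uint32_t *addr, unsigned int nr)
{
	return *addr & (UINT32_C(1) << nr);
}

static inline void
sketch_bit_v_set32(volatile uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

/* Select the variant from the pointer type, as rte_bit_test() does. */
#define sketch_bit_test(addr, nr)					\
	_Generic((addr),						\
		 uint32_t *: sketch_bit_test32,				\
		 const uint32_t *: sketch_bit_test32,			\
		 volatile uint32_t *: sketch_bit_v_test32,		\
		 const volatile uint32_t *: sketch_bit_v_test32)(addr, nr)

#define sketch_bit_set(addr, nr)					\
	_Generic((addr),						\
		 uint32_t *: sketch_bit_set32,				\
		 volatile uint32_t *: sketch_bit_v_set32)(addr, nr)
```

With this in place, a caller holding a volatile uint32_t pointer can
use sketch_bit_set() and sketch_bit_test() without casting the
qualifier away.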

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>

--

PATCH v3:
 * Updated to reflect removed 'fun' parameter in __RTE_GEN_BIT_*()
   (Jack Bond-Preston).

PATCH v2:
 * Actually run the test_bit_atomic_v_access*() test functions.
---
 app/test/test_bitops.c       |  32 +++-
 lib/eal/include/rte_bitops.h | 301 +++++++++++++++++++++++------------
 2 files changed, 222 insertions(+), 111 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index b80216a0a1..10e87f6776 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -14,13 +14,13 @@
 #include "test.h"
 
 #define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
-			    flip_fun, test_fun, size)			\
+			    flip_fun, test_fun, size, mod)		\
 	static int							\
 	test_name(void)							\
 	{								\
 		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
 		unsigned int bit_nr;					\
-		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+		mod uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
 									\
 		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
 			bool reference_bit = (reference >> bit_nr) & 1;	\
@@ -41,7 +41,7 @@
 				    "Bit %d had unflipped value", bit_nr); \
 			flip_fun(&word, bit_nr);			\
 									\
-			const uint ## size ## _t *const_ptr = &word;	\
+			const mod uint ## size ## _t *const_ptr = &word; \
 			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
 				    reference_bit,			\
 				    "Bit %d had unexpected value", bit_nr); \
@@ -59,10 +59,16 @@
 	}
 
 GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64, volatile)
 
 #define bit_atomic_set(addr, nr)				\
 	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
@@ -81,11 +87,19 @@ GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 32)
+		    bit_atomic_flip, bit_atomic_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 64)
+		    bit_atomic_flip, bit_atomic_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64, volatile)
 
 #define PARALLEL_TEST_RUNTIME 0.25
 
@@ -480,8 +494,12 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_v_access32),
+		TEST_CASE(test_bit_v_access64),
 		TEST_CASE(test_bit_atomic_access32),
 		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_v_access32),
+		TEST_CASE(test_bit_atomic_v_access64),
 		TEST_CASE(test_bit_atomic_parallel_assign32),
 		TEST_CASE(test_bit_atomic_parallel_assign64),
 		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3ad6795fd1..d7a07c4099 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -127,12 +127,16 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_test(addr, nr)					\
-	_Generic((addr),					\
-		uint32_t *: __rte_bit_test32,			\
-		const uint32_t *: __rte_bit_test32,		\
-		uint64_t *: __rte_bit_test64,			\
-		const uint64_t *: __rte_bit_test64)(addr, nr)
+#define rte_bit_test(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_test32,				\
+		 const uint32_t *: __rte_bit_test32,			\
+		 volatile uint32_t *: __rte_bit_v_test32,		\
+		 const volatile uint32_t *: __rte_bit_v_test32,		\
+		 uint64_t *: __rte_bit_test64,				\
+		 const uint64_t *: __rte_bit_test64,			\
+		 volatile uint64_t *: __rte_bit_v_test64,		\
+		 const volatile uint64_t *: __rte_bit_v_test64)(addr, nr)
 
 /**
  * @warning
@@ -152,10 +156,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_set(addr, nr)				\
-	_Generic((addr),				\
-		 uint32_t *: __rte_bit_set32,		\
-		 uint64_t *: __rte_bit_set64)(addr, nr)
+#define rte_bit_set(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_set32,				\
+		 volatile uint32_t *: __rte_bit_v_set32,		\
+		 uint64_t *: __rte_bit_set64,				\
+		 volatile uint64_t *: __rte_bit_v_set64)(addr, nr)
 
 /**
  * @warning
@@ -175,10 +181,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_clear(addr, nr)					\
-	_Generic((addr),					\
-		 uint32_t *: __rte_bit_clear32,			\
-		 uint64_t *: __rte_bit_clear64)(addr, nr)
+#define rte_bit_clear(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_clear32,				\
+		 volatile uint32_t *: __rte_bit_v_clear32,		\
+		 uint64_t *: __rte_bit_clear64,				\
+		 volatile uint64_t *: __rte_bit_v_clear64)(addr, nr)
 
 /**
  * @warning
@@ -202,7 +210,9 @@ extern "C" {
 #define rte_bit_assign(addr, nr, value)					\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_assign32,			\
-		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+		 volatile uint32_t *: __rte_bit_v_assign32,		\
+		 uint64_t *: __rte_bit_assign64,			\
+		 volatile uint64_t *: __rte_bit_v_assign64)(addr, nr, value)
 
 /**
  * @warning
@@ -225,7 +235,9 @@ extern "C" {
 #define rte_bit_flip(addr, nr)						\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_flip32,				\
-		 uint64_t *: __rte_bit_flip64)(addr, nr)
+		 volatile uint32_t *: __rte_bit_v_flip32,		\
+		 uint64_t *: __rte_bit_flip64,				\
+		 volatile uint64_t *: __rte_bit_v_flip64)(addr, nr)
 
 /**
  * @warning
@@ -250,9 +262,13 @@ extern "C" {
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test32,			\
 		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 volatile uint32_t *: __rte_bit_atomic_v_test32,	\
+		 const volatile uint32_t *: __rte_bit_atomic_v_test32,	\
 		 uint64_t *: __rte_bit_atomic_test64,			\
-		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
-							    memory_order)
+		 const uint64_t *: __rte_bit_atomic_test64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test64,	\
+		 const volatile uint64_t *: __rte_bit_atomic_v_test64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -274,7 +290,10 @@ extern "C" {
 #define rte_bit_atomic_set(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_set32,			\
-		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_set32,		\
+		 uint64_t *: __rte_bit_atomic_set64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_set64)(addr, nr, \
+								memory_order)
 
 /**
  * @warning
@@ -296,7 +315,10 @@ extern "C" {
 #define rte_bit_atomic_clear(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_clear32,			\
-		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_clear32,	\
+		 uint64_t *: __rte_bit_atomic_clear64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_clear64)(addr, nr, \
+								  memory_order)
 
 /**
  * @warning
@@ -320,8 +342,11 @@ extern "C" {
 #define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_assign32,			\
-		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
-							memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_assign32,	\
+		 uint64_t *: __rte_bit_atomic_assign64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_assign64)(addr, nr, \
+								   value, \
+								   memory_order)
 
 /**
  * @warning
@@ -344,7 +369,10 @@ extern "C" {
 #define rte_bit_atomic_flip(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_flip32,			\
-		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_flip32,	\
+		 uint64_t *: __rte_bit_atomic_flip64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_flip64)(addr, nr, \
+								 memory_order)
 
 /**
  * @warning
@@ -368,8 +396,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
-							      memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_set32, \
+		 uint64_t *: __rte_bit_atomic_test_and_set64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_set64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -393,8 +423,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
-								memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_clear32, \
+		 uint64_t *: __rte_bit_atomic_test_and_clear64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_clear64) \
+						       (addr, nr, memory_order)
 
 /**
  * @warning
@@ -421,9 +453,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
-		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
-								 value, \
-								 memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_assign32, \
+		 uint64_t *: __rte_bit_atomic_test_and_assign64,	\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_assign64) \
+						(addr, nr, value, memory_order)
 
 #define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
 	__rte_experimental						\
@@ -493,7 +526,8 @@ extern "C" {
 	__RTE_GEN_BIT_FLIP(v, qualifier, size)
 
 #define __RTE_GEN_BIT_OPS_SIZE(size) \
-	__RTE_GEN_BIT_OPS(,, size)
+	__RTE_GEN_BIT_OPS(,, size) \
+	__RTE_GEN_BIT_OPS(v_, volatile, size)
 
 __RTE_GEN_BIT_OPS_SIZE(32)
 __RTE_GEN_BIT_OPS_SIZE(64)
@@ -633,7 +667,8 @@ __RTE_GEN_BIT_OPS_SIZE(64)
 	__RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)
 
 #define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
-	__RTE_GEN_BIT_ATOMIC_OPS(,, size)
+	__RTE_GEN_BIT_ATOMIC_OPS(,, size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(v_, volatile, size)
 
 __RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
 __RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
@@ -1342,120 +1377,178 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_atomic_test_and_clear
 #undef rte_bit_atomic_test_and_assign
 
-#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+#define __RTE_BIT_OVERLOAD_V_2(family, v, fun, c, size, arg1_type, arg1_name) \
 	static inline void						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
-			arg1_type arg1_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+#define __RTE_BIT_OVERLOAD_SZ_2(family, fun, c, size, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_V_2(family,, fun, c, size, arg1_type,	\
+			       arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2(family, v_, fun, c volatile, size, \
+			       arg1_type, arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name)				\
+#define __RTE_BIT_OVERLOAD_2(family, fun, c, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_V_2R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
 			arg1_type arg1_name)				\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family, v_, fun, c volatile,		\
+				size, ret_type, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_2R(family, fun, c, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 32, ret_type, arg1_type, \
 				 arg1_name)				\
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 64, ret_type, arg1_type, \
 				 arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name)			\
+#define __RTE_BIT_OVERLOAD_V_3(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3(family, fun, c, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family,, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family, v_, fun, c volatile, size, arg1_type, \
+			       arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_3(family, fun, c, arg1_type, arg1_name, arg2_type, \
 			     arg2_name)					\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 32, arg1_type, arg1_name, \
 				arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)	\
+#define __RTE_BIT_OVERLOAD_V_3R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name)	\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name)	\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)
+	__RTE_BIT_OVERLOAD_V_3R(family,, fun, c, size, ret_type, \
+				arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_V_3R(family, v_, fun, c volatile, size, \
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name) \
+#define __RTE_BIT_OVERLOAD_3R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 64, ret_type, \
+				 arg1_type, arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_V_4(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name, arg3_type,	arg3_name) \
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
-					  arg3_name);		      \
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name,	\
+							 arg3_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
-			     arg2_name, arg3_type, arg3_name)		\
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+#define __RTE_BIT_OVERLOAD_SZ_4(family, fun, c, size, arg1_type, arg1_name, \
 				arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name)
-
-#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family,, fun, c, size, arg1_type,	\
+			       arg1_name, arg2_type, arg2_name, arg3_type, \
+			       arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family, v_, fun, c volatile, size,	\
+			       arg1_type, arg1_name, arg2_type, arg2_name, \
+			       arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4(family, fun, c, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 32, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 64, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)
+
+#define __RTE_BIT_OVERLOAD_V_4R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
-						 arg3_name);		\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name, \
+								arg3_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name, arg3_type, \
 				 arg3_name)				\
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)
-
-__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
-__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
-
-__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+	__RTE_BIT_OVERLOAD_V_4R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4R(family, v_, fun, c volatile, size,	\
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)			\
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 64, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)
+
+__RTE_BIT_OVERLOAD_2R(, test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(, assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(, flip,, unsigned int, nr)
+
+__RTE_BIT_OVERLOAD_3R(atomic_, test, const, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+__RTE_BIT_OVERLOAD_3(atomic_, set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_, clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_, assign,, unsigned int, nr, bool, value,
 		     int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3(atomic_, flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_set,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_clear,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_4R(atomic_, test_and_assign,, bool, unsigned int, nr,
 		      bool, value, int, memory_order)
 
 #endif
-- 
2.34.1



* Re: [PATCH v3 1/5] eal: extend bit manipulation functionality
  2024-08-12 12:49                                     ` [PATCH v3 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-08-12 13:24                                       ` Jack Bond-Preston
  2024-09-09 14:57                                       ` [PATCH v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  1 sibling, 0 replies; 160+ messages in thread
From: Jack Bond-Preston @ 2024-08-12 13:24 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff, Morten Brørup

On 12/08/2024 13:49, Mattias Rönnblom wrote:
> Add functionality to test and modify the value of individual bits in
> 32-bit or 64-bit words.
> 
> These functions have no implications for memory ordering or atomicity,
> do not use volatile, and thus do not prevent any compiler
> optimizations.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>



* Re: [PATCH v3 2/5] eal: add unit tests for bit operations
  2024-08-12 12:49                                     ` [PATCH v3 2/5] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-08-12 13:25                                       ` Jack Bond-Preston
  0 siblings, 0 replies; 160+ messages in thread
From: Jack Bond-Preston @ 2024-08-12 13:25 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff, Morten Brørup

On 12/08/2024 13:49, Mattias Rönnblom wrote:
> Extend bitops tests to cover the
> rte_bit_[test|set|clear|assign|flip]()
> functions.
> 
> The tests are converted to use the test suite runner framework.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>




* Re: [PATCH v3 3/5] eal: add atomic bit operations
  2024-08-12 12:49                                     ` [PATCH v3 3/5] eal: add atomic " Mattias Rönnblom
@ 2024-08-12 13:25                                       ` Jack Bond-Preston
  0 siblings, 0 replies; 160+ messages in thread
From: Jack Bond-Preston @ 2024-08-12 13:25 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff, Morten Brørup

On 12/08/2024 13:49, Mattias Rönnblom wrote:
> Add atomic bit test/set/clear/assign/flip and
> test-and-set/clear/assign/flip functions.
> 
> All atomic bit functions allow (and indeed, require) the caller to
> specify a memory order.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>



* Re: [PATCH v3 4/5] eal: add unit tests for atomic bit access functions
  2024-08-12 12:49                                     ` [PATCH v3 4/5] eal: add unit tests for atomic bit access functions Mattias Rönnblom
@ 2024-08-12 13:26                                       ` Jack Bond-Preston
  0 siblings, 0 replies; 160+ messages in thread
From: Jack Bond-Preston @ 2024-08-12 13:26 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff, Morten Brørup

On 12/08/2024 13:49, Mattias Rönnblom wrote:
> Extend bitops tests to cover the rte_bit_atomic_*() family of
> functions.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>




* Re: [PATCH v3 5/5] eal: extend bitops to handle volatile pointers
  2024-08-12 12:49                                     ` [PATCH v3 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
@ 2024-08-12 13:26                                       ` Jack Bond-Preston
  0 siblings, 0 replies; 160+ messages in thread
From: Jack Bond-Preston @ 2024-08-12 13:26 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff, Morten Brørup

On 12/08/2024 13:49, Mattias Rönnblom wrote:
> Have rte_bit_[test|set|clear|assign|flip]() and rte_bit_atomic_*()
> handle volatile-marked pointers.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>




* Re: [PATCH v3 0/5] Improve EAL bit operations API
  2024-08-12 12:49                                   ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
                                                       ` (4 preceding siblings ...)
  2024-08-12 12:49                                     ` [PATCH v3 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
@ 2024-08-20 17:05                                     ` Mattias Rönnblom
  2024-09-05  8:10                                       ` David Marchand
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-08-20 17:05 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff, Morten Brørup,
	Jack Bond-Preston, david.marchand, Thomas Monjalon

On 2024-08-12 14:49, Mattias Rönnblom wrote:
> This patch set represents an attempt to improve and extend the RTE
> bitops API, in particular for functions that operate on individual
> bits.
> 

Is there anyone else that has any opinion on this patch set? Details, or 
big picture.

<snip>


* Re: [PATCH v3 0/5] Improve EAL bit operations API
  2024-08-20 17:05                                     ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
@ 2024-09-05  8:10                                       ` David Marchand
  2024-09-09 12:04                                         ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: David Marchand @ 2024-09-05  8:10 UTC (permalink / raw)
  To: Mattias Rönnblom, Tyler Retzlaff
  Cc: Mattias Rönnblom, dev, Heng Wang, Stephen Hemminger,
	Morten Brørup, Jack Bond-Preston, Thomas Monjalon

Hello,

On Tue, Aug 20, 2024 at 7:05 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>
> On 2024-08-12 14:49, Mattias Rönnblom wrote:
> > This patch set represents an attempt to improve and extend the RTE
> > bitops API, in particular for functions that operate on individual
> > bits.
> >
>
> Is there anyone else that has any opinion on this patch set? Details, or
> big picture.

Tyler, are you ok with this series?

Mattias, there are issues reported by the CI (compilation on Ubuntu
22.04 in GHA, and unit test failure in UNH), please have a look.


-- 
David Marchand



* Re: [PATCH v3 0/5] Improve EAL bit operations API
  2024-09-05  8:10                                       ` David Marchand
@ 2024-09-09 12:04                                         ` Mattias Rönnblom
  2024-09-09 12:24                                           ` Thomas Monjalon
  2024-09-09 12:25                                           ` David Marchand
  0 siblings, 2 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-09 12:04 UTC (permalink / raw)
  To: David Marchand, Tyler Retzlaff
  Cc: Mattias Rönnblom, dev, Heng Wang, Stephen Hemminger,
	Morten Brørup, Jack Bond-Preston, Thomas Monjalon

On 2024-09-05 10:10, David Marchand wrote:
> Hello,
> 
> On Tue, Aug 20, 2024 at 7:05 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>>
>> On 2024-08-12 14:49, Mattias Rönnblom wrote:
>>> This patch set represents an attempt to improve and extend the RTE
>>> bitops API, in particular for functions that operate on individual
>>> bits.
>>>
>>
>> Is there anyone else that has any opinion on this patch set? Details, or
>> big picture.
> 
> Tyler, are you ok with this series?
> 
> Mattias, there are issues reported by the CI (compilation on Ubuntu
> 22.04 in GHA, and unit test failure in UNH), please have a look.
> 
> 

Standard practice in DPDK header files is the following:

--
/* rte_bar.h */
#ifdef __cplusplus
extern "C" {
#endif

#include <rte_foo.h>

void
rte_bar_do(void);

/../
--

That does not seem like best practice to me, since rte_bar.h changes
the linkage of everything declared in the files it includes. In
particular, it prevents replacing _Generic with C++ function
overloading in C++ TUs, since overloaded functions cannot have C
linkage.

What one should do is to have extern "C" linkage only on the functions
that the include file in question (e.g., rte_bar.h) itself declares.

--
/* rte_bar.h */
#include <rte_foo.h>

#ifdef __cplusplus
extern "C" {
#endif

void
rte_bar_do(void);

/../
--

There are 259 header files in the DPDK repo in need of fixing.

Should the fix be 259 patches, or something smaller? One large patch, or 
a patch per library, or something else. Please advise, over.


* Re: [PATCH v3 0/5] Improve EAL bit operations API
  2024-09-09 12:04                                         ` Mattias Rönnblom
@ 2024-09-09 12:24                                           ` Thomas Monjalon
  2024-09-09 12:25                                           ` David Marchand
  1 sibling, 0 replies; 160+ messages in thread
From: Thomas Monjalon @ 2024-09-09 12:24 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: David Marchand, Tyler Retzlaff, Mattias Rönnblom, dev,
	Heng Wang, Stephen Hemminger, Morten Brørup,
	Jack Bond-Preston

09/09/2024 14:04, Mattias Rönnblom:
> What one should do is to have extern "C" linkage only on functions which 
> the include file in question (e.g., rte_foo.h) itself declares.
> 
> --
> /* rte_bar.h */
> #include <rte_foo.h>
> 
> #ifdef __cplusplus
> extern "C" {
> #endif
> 
> void
> rte_foo_do(void);
> 
> /../
> --
> 
> There are 259 header files in the DPDK repo in need of fixing.
> 
> Should the fix be 259 patches, or something smaller? One large patch, or 
> a patch per library, or something else. Please advise, over.

Moving includes in the whole tree can be done in a single patch,
there is nothing specific per library in such a mechanical move.





* Re: [PATCH v3 0/5] Improve EAL bit operations API
  2024-09-09 12:04                                         ` Mattias Rönnblom
  2024-09-09 12:24                                           ` Thomas Monjalon
@ 2024-09-09 12:25                                           ` David Marchand
  2024-09-09 13:09                                             ` Mattias Rönnblom
  1 sibling, 1 reply; 160+ messages in thread
From: David Marchand @ 2024-09-09 12:25 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Tyler Retzlaff, Mattias Rönnblom, dev, Heng Wang,
	Stephen Hemminger, Morten Brørup, Jack Bond-Preston,
	Thomas Monjalon

On Mon, Sep 9, 2024 at 2:05 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> > Mattias, there are issues reported by the CI (compilation on Ubuntu
> > 22.04 in GHA, and unit test failure in UNH), please have a look.
> >
> >
>
> Standard practice in DPDK header files is the following:
>
> --
> /* rte_bar.h */
> #ifdef __cplusplus
> extern "C" {
> #endif
>
> #include <rte_foo.h>
>
> void
> rte_foo_do(void);
>
> /../
> --
>
> That seems not like best practice to me, since rte_bar.h is messing
> around with linkage of constructs of any files included. In particular,
> it prohibits replacing _Generic with C++ function overloading, in C++ TUs.
>
> What one should do is to have extern "C" linkage only on functions which
> the include file in question (e.g., rte_foo.h) itself declares.

This is probably not the best practice, but since you intend to fix
it, it will be perfect afterwards :-).


>
> --
> /* rte_bar.h */
> #include <rte_foo.h>
>
> #ifdef __cplusplus
> extern "C" {
> #endif
>
> void
> rte_foo_do(void);
>
> /../
> --
>
> There are 259 header files in the DPDK repo in need of fixing.
>
> Should the fix be 259 patches, or something smaller? One large patch, or
> a patch per library, or something else. Please advise, over.

The change seems mechanical, so one single change is ok.


-- 
David Marchand



* Re: [PATCH v3 0/5] Improve EAL bit operations API
  2024-09-09 12:25                                           ` David Marchand
@ 2024-09-09 13:09                                             ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-09 13:09 UTC (permalink / raw)
  To: David Marchand
  Cc: Tyler Retzlaff, Mattias Rönnblom, dev, Heng Wang,
	Stephen Hemminger, Morten Brørup, Jack Bond-Preston,
	Thomas Monjalon

On 2024-09-09 14:25, David Marchand wrote:
> On Mon, Sep 9, 2024 at 2:05 PM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>>> Mattias, there are issues reported by the CI (compilation on Ubuntu
>>> 22.04 in GHA, and unit test failure in UNH), please have a look.
>>>
>>>
>>
>> Standard practice in DPDK header files is the following:
>>
>> --
>> /* rte_bar.h */
>> #ifdef __cplusplus
>> extern "C" {
>> #endif
>>
>> #include <rte_foo.h>
>>
>> void
>> rte_foo_do(void);
>>
>> /../
>> --
>>
>> That seems not like best practice to me, since rte_bar.h is messing
>> around with linkage of constructs of any files included. In particular,
>> it prohibits replacing _Generic with C++ function overloading, in C++ TUs.
>>
>> What one should do is to have extern "C" linkage only on functions which
>> the include file in question (e.g., rte_foo.h) itself declares.
> 
> This is probably not the best practice, but since you intend to fix
> it, it will be perfect afterwards :-).
> 

Actually, I intended to opt for a less-than-perfect solution, where you 
just move the 'extern "C"' to cover everything but includes, rather than 
just what is necessary (i.e., functions and global variables).

That change was easily automated, but the perfect solution requires a 
more elaborate script or human intervention.

> 
>>
>> --
>> /* rte_bar.h */
>> #include <rte_foo.h>
>>
>> #ifdef __cplusplus
>> extern "C" {
>> #endif
>>
>> void
>> rte_foo_do(void);
>>
>> /../
>> --
>>
>> There are 259 header files in the DPDK repo in need of fixing.
>>
>> Should the fix be 259 patches, or something smaller? One large patch, or
>> a patch per library, or something else. Please advise, over.
> 
> The change seems mechanical, so one single change is ok.
> 
> 


* [PATCH v4 0/6] Improve EAL bit operations API
  2024-08-12 12:49                                     ` [PATCH v3 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
  2024-08-12 13:24                                       ` Jack Bond-Preston
@ 2024-09-09 14:57                                       ` Mattias Rönnblom
  2024-09-09 14:57                                         ` [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
                                                           ` (5 more replies)
  1 sibling, 6 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-09 14:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
bloat the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() functions are replaced
with two new families:

rte_bit_[test|set|clear|assign|flip]() which provide no memory
ordering or atomicity guarantees, but do provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomics (e.g., LOCK-prefixed instructions on x86)
may be significant. rte_bit_[test|set|clear|assign|flip]() may be
used with volatile word pointers, in which case they guarantee
that the program-level accesses actually occur.
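
For illustration only, and not the code from this series: a minimal
sketch of what the non-atomic, non-volatile 32-bit operations amount
to. The sketch_bit_*32() names are hypothetical and are not part of
<rte_bitops.h>; the assert() bounds checks are an assumption of the
sketch as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; plain loads and stores, no ordering guarantees. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr & (UINT32_C(1) << nr)) != 0;
}

static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

static inline void
sketch_bit_clear32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr &= ~(UINT32_C(1) << nr);
}

/* Set the bit if value is true, clear it otherwise. */
static inline void
sketch_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	if (value)
		sketch_bit_set32(addr, nr);
	else
		sketch_bit_clear32(addr, nr);
}

static inline void
sketch_bit_flip32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr ^= UINT32_C(1) << nr;
}
```

Since nothing here is volatile or atomic, the compiler is free to
merge or eliminate the accesses, which is where the performance
benefit comes from.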

rte_bit_atomic_*() which provide atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs that depend on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat, and would also prevent the user from
doing atomic operations against a bit set in some situations, while in
other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.
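
To illustrate the point about non-_Atomic pointers, here is a rough
sketch (not the implementation in this series) built directly on the
GCC/clang __atomic builtins, which accept plain integer pointers. The
sketch_bit_atomic_*64() names are hypothetical, and a compiler
providing those builtins is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: atomically set bit nr and return its old value. */
static inline bool
sketch_bit_atomic_test_and_set64(uint64_t *addr, unsigned int nr,
				 int memory_order)
{
	assert(nr < 64);
	uint64_t mask = UINT64_C(1) << nr;
	/* Atomic read-modify-write on a word that is not _Atomic. */
	uint64_t prev = __atomic_fetch_or(addr, mask, memory_order);

	return (prev & mask) != 0;
}

/* Hypothetical helper: atomically load the word and test bit nr. */
static inline bool
sketch_bit_atomic_test64(const uint64_t *addr, unsigned int nr,
			 int memory_order)
{
	assert(nr < 64);
	uint64_t mask = UINT64_C(1) << nr;

	return (__atomic_load_n(addr, memory_order) & mask) != 0;
}
```

Because the word is an ordinary uint64_t, the same object can still be
modified with plain stores (e.g., `word = 0;`) at times when no other
thread may be accessing it.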

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

The functions rte_bit_assign() and rte_bit_atomic_assign() are added,
to assign a particular boolean value to a particular bit.

All new functions have properly documented semantics.

All new functions operate on both 32-bit and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bits on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics and other type-generic macros), and how <rte_stdatomic.h>
does it. Its address can't be taken, but it does not evaluate its
parameters more than once.
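
As a rough sketch of the mechanism (hypothetical names, not the macros
from this series): a generic selection macro picks a size-specific
static inline function based on the pointer type. The controlling
expression of _Generic is not evaluated, so the arguments are
evaluated only once, in the resulting function call.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical size-specific helpers. */
static inline void
sketch_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

static inline void
sketch_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= UINT64_C(1) << nr;
}

/*
 * Hypothetical type-generic front end. Selection is by the exact type
 * of addr; any other pointer type is a compile-time error.
 */
#define sketch_bit_set(addr, nr)			\
	_Generic((addr),				\
		 uint32_t *: sketch_set32,		\
		 uint64_t *: sketch_set64)(addr, nr)
```

With this sketch, `sketch_bit_set(&word, 5)` works for both uint32_t
and uint64_t words. Note that selection is by exact type: whether an
`unsigned long *` argument matches one of the associations depends on
how the platform defines uint32_t and uint64_t (on LP64 Linux with
glibc, uint64_t is unsigned long, so the 64-bit variant is selected).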

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions, implemented by
means of a huge, complicated C macro mess.

Mattias Rönnblom (6):
  dpdk: do not force C linkage on include file dependencies
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions
  eal: extend bitops to handle volatile pointers

 app/test/packet_burst_generator.h             |   8 +-
 app/test/test_bitops.c                        | 416 +++++++++-
 app/test/virtual_pmd.h                        |   4 +-
 doc/guides/rel_notes/release_24_11.rst        |  17 +
 drivers/bus/auxiliary/bus_auxiliary_driver.h  |   8 +-
 drivers/bus/cdx/bus_cdx_driver.h              |   8 +-
 drivers/bus/dpaa/include/fsl_qman.h           |   8 +-
 drivers/bus/fslmc/bus_fslmc_driver.h          |   8 +-
 drivers/bus/pci/bus_pci_driver.h              |   8 +-
 drivers/bus/pci/rte_bus_pci.h                 |   8 +-
 drivers/bus/platform/bus_platform_driver.h    |   8 +-
 drivers/bus/vdev/bus_vdev_driver.h            |   8 +-
 drivers/bus/vmbus/bus_vmbus_driver.h          |   8 +-
 drivers/bus/vmbus/rte_bus_vmbus.h             |   8 +-
 drivers/dma/cnxk/cnxk_dma_event_dp.h          |   8 +-
 drivers/dma/ioat/ioat_hw_defs.h               |   4 +-
 drivers/event/dlb2/rte_pmd_dlb2.h             |   8 +-
 drivers/mempool/dpaa2/rte_dpaa2_mempool.h     |   6 +-
 drivers/net/avp/rte_avp_fifo.h                |   8 +-
 drivers/net/bonding/rte_eth_bond.h            |   4 +-
 drivers/net/i40e/rte_pmd_i40e.h               |   8 +-
 drivers/net/mlx5/mlx5_trace.h                 |   8 +-
 drivers/net/ring/rte_eth_ring.h               |   4 +-
 drivers/net/vhost/rte_eth_vhost.h             |   8 +-
 drivers/raw/ifpga/afu_pmd_core.h              |   8 +-
 drivers/raw/ifpga/afu_pmd_he_hssi.h           |   6 +-
 drivers/raw/ifpga/afu_pmd_he_lpbk.h           |   6 +-
 drivers/raw/ifpga/afu_pmd_he_mem.h            |   6 +-
 drivers/raw/ifpga/afu_pmd_n3000.h             |   6 +-
 drivers/raw/ifpga/rte_pmd_afu.h               |   4 +-
 drivers/raw/ifpga/rte_pmd_ifpga.h             |   4 +-
 examples/ethtool/lib/rte_ethtool.h            |   8 +-
 examples/qos_sched/main.h                     |   4 +-
 examples/vm_power_manager/channel_manager.h   |   8 +-
 lib/acl/rte_acl_osdep.h                       |   8 +-
 lib/bbdev/rte_bbdev.h                         |   8 +-
 lib/bbdev/rte_bbdev_op.h                      |   8 +-
 lib/bbdev/rte_bbdev_pmd.h                     |   8 +-
 lib/bpf/bpf_def.h                             |   8 +-
 lib/compressdev/rte_comp.h                    |   4 +-
 lib/compressdev/rte_compressdev.h             |   6 +-
 lib/compressdev/rte_compressdev_internal.h    |   8 +-
 lib/compressdev/rte_compressdev_pmd.h         |   8 +-
 lib/cryptodev/cryptodev_pmd.h                 |   8 +-
 lib/cryptodev/cryptodev_trace.h               |   8 +-
 lib/cryptodev/rte_crypto.h                    |   8 +-
 lib/cryptodev/rte_crypto_asym.h               |   8 +-
 lib/cryptodev/rte_crypto_sym.h                |   8 +-
 lib/cryptodev/rte_cryptodev.h                 |   8 +-
 lib/cryptodev/rte_cryptodev_trace_fp.h        |   4 +-
 lib/dispatcher/rte_dispatcher.h               |   8 +-
 lib/dmadev/rte_dmadev.h                       |   8 +-
 lib/eal/arm/include/rte_atomic_32.h           |   4 +-
 lib/eal/arm/include/rte_atomic_64.h           |   8 +-
 lib/eal/arm/include/rte_byteorder.h           |   8 +-
 lib/eal/arm/include/rte_cpuflags_32.h         |   8 +-
 lib/eal/arm/include/rte_cpuflags_64.h         |   8 +-
 lib/eal/arm/include/rte_cycles_32.h           |   4 +-
 lib/eal/arm/include/rte_cycles_64.h           |   4 +-
 lib/eal/arm/include/rte_io.h                  |   8 +-
 lib/eal/arm/include/rte_io_64.h               |   8 +-
 lib/eal/arm/include/rte_memcpy_32.h           |   8 +-
 lib/eal/arm/include/rte_memcpy_64.h           |   8 +-
 lib/eal/arm/include/rte_pause.h               |   8 +-
 lib/eal/arm/include/rte_pause_32.h            |   6 +-
 lib/eal/arm/include/rte_pause_64.h            |   8 +-
 lib/eal/arm/include/rte_power_intrinsics.h    |   8 +-
 lib/eal/arm/include/rte_prefetch_32.h         |   8 +-
 lib/eal/arm/include/rte_prefetch_64.h         |   8 +-
 lib/eal/arm/include/rte_rwlock.h              |   4 +-
 lib/eal/arm/include/rte_spinlock.h            |   6 +-
 lib/eal/freebsd/include/rte_os.h              |   8 +-
 lib/eal/include/bus_driver.h                  |   8 +-
 lib/eal/include/dev_driver.h                  |   6 +-
 lib/eal/include/eal_trace_internal.h          |   8 +-
 lib/eal/include/generic/rte_cycles.h          |   8 +
 lib/eal/include/generic/rte_memcpy.h          |   8 +
 lib/eal/include/generic/rte_pause.h           |   8 +
 .../include/generic/rte_power_intrinsics.h    |   8 +
 lib/eal/include/generic/rte_prefetch.h        |   8 +
 lib/eal/include/generic/rte_rwlock.h          |   8 +-
 lib/eal/include/generic/rte_spinlock.h        |   8 +
 lib/eal/include/rte_alarm.h                   |   4 +-
 lib/eal/include/rte_bitmap.h                  |   8 +-
 lib/eal/include/rte_bitops.h                  | 768 +++++++++++++++++-
 lib/eal/include/rte_bus.h                     |   8 +-
 lib/eal/include/rte_class.h                   |   4 +-
 lib/eal/include/rte_common.h                  |   8 +-
 lib/eal/include/rte_dev.h                     |   8 +-
 lib/eal/include/rte_devargs.h                 |   8 +-
 lib/eal/include/rte_eal_trace.h               |   4 +-
 lib/eal/include/rte_errno.h                   |   4 +-
 lib/eal/include/rte_fbarray.h                 |   8 +-
 lib/eal/include/rte_keepalive.h               |   6 +-
 lib/eal/include/rte_mcslock.h                 |   8 +-
 lib/eal/include/rte_memory.h                  |   8 +-
 lib/eal/include/rte_pci_dev_features.h        |   4 +-
 lib/eal/include/rte_pflock.h                  |   8 +-
 lib/eal/include/rte_random.h                  |   4 +-
 lib/eal/include/rte_seqcount.h                |   8 +-
 lib/eal/include/rte_seqlock.h                 |   8 +-
 lib/eal/include/rte_service.h                 |   8 +-
 lib/eal/include/rte_service_component.h       |   4 +-
 lib/eal/include/rte_stdatomic.h               |   5 +-
 lib/eal/include/rte_string_fns.h              |  17 +-
 lib/eal/include/rte_tailq.h                   |   6 +-
 lib/eal/include/rte_ticketlock.h              |   8 +-
 lib/eal/include/rte_time.h                    |   6 +-
 lib/eal/include/rte_trace.h                   |   8 +-
 lib/eal/include/rte_trace_point.h             |   8 +-
 lib/eal/include/rte_trace_point_register.h    |   8 +-
 lib/eal/include/rte_uuid.h                    |   8 +-
 lib/eal/include/rte_version.h                 |   6 +-
 lib/eal/include/rte_vfio.h                    |   8 +-
 lib/eal/linux/include/rte_os.h                |   8 +-
 lib/eal/loongarch/include/rte_atomic.h        |   6 +-
 lib/eal/loongarch/include/rte_byteorder.h     |   4 +-
 lib/eal/loongarch/include/rte_cpuflags.h      |   8 +-
 lib/eal/loongarch/include/rte_cycles.h        |   4 +-
 lib/eal/loongarch/include/rte_io.h            |   4 +-
 lib/eal/loongarch/include/rte_memcpy.h        |   4 +-
 lib/eal/loongarch/include/rte_pause.h         |   8 +-
 .../loongarch/include/rte_power_intrinsics.h  |   8 +-
 lib/eal/loongarch/include/rte_prefetch.h      |   8 +-
 lib/eal/loongarch/include/rte_rwlock.h        |   4 +-
 lib/eal/loongarch/include/rte_spinlock.h      |   6 +-
 lib/eal/ppc/include/rte_atomic.h              |   6 +-
 lib/eal/ppc/include/rte_byteorder.h           |   6 +-
 lib/eal/ppc/include/rte_cpuflags.h            |   8 +-
 lib/eal/ppc/include/rte_cycles.h              |   8 +-
 lib/eal/ppc/include/rte_io.h                  |   4 +-
 lib/eal/ppc/include/rte_memcpy.h              |   4 +-
 lib/eal/ppc/include/rte_pause.h               |   8 +-
 lib/eal/ppc/include/rte_power_intrinsics.h    |   8 +-
 lib/eal/ppc/include/rte_prefetch.h            |   8 +-
 lib/eal/ppc/include/rte_rwlock.h              |   4 +-
 lib/eal/ppc/include/rte_spinlock.h            |   8 +-
 lib/eal/riscv/include/rte_atomic.h            |   8 +-
 lib/eal/riscv/include/rte_byteorder.h         |   8 +-
 lib/eal/riscv/include/rte_cpuflags.h          |   8 +-
 lib/eal/riscv/include/rte_cycles.h            |   4 +-
 lib/eal/riscv/include/rte_io.h                |   4 +-
 lib/eal/riscv/include/rte_memcpy.h            |   4 +-
 lib/eal/riscv/include/rte_pause.h             |   8 +-
 lib/eal/riscv/include/rte_power_intrinsics.h  |   8 +-
 lib/eal/riscv/include/rte_prefetch.h          |   8 +-
 lib/eal/riscv/include/rte_rwlock.h            |   4 +-
 lib/eal/riscv/include/rte_spinlock.h          |   6 +-
 lib/eal/windows/include/pthread.h             |   6 +-
 lib/eal/windows/include/regex.h               |   8 +-
 lib/eal/windows/include/rte_windows.h         |   8 +-
 lib/eal/x86/include/rte_atomic.h              |   8 +-
 lib/eal/x86/include/rte_byteorder.h           |   8 +-
 lib/eal/x86/include/rte_cpuflags.h            |   8 +-
 lib/eal/x86/include/rte_cycles.h              |   8 +-
 lib/eal/x86/include/rte_io.h                  |   8 +-
 lib/eal/x86/include/rte_pause.h               |   7 +-
 lib/eal/x86/include/rte_power_intrinsics.h    |   8 +-
 lib/eal/x86/include/rte_prefetch.h            |   8 +-
 lib/eal/x86/include/rte_rwlock.h              |   6 +-
 lib/eal/x86/include/rte_spinlock.h            |   8 +-
 lib/ethdev/ethdev_driver.h                    |   8 +-
 lib/ethdev/ethdev_pci.h                       |   8 +-
 lib/ethdev/ethdev_trace.h                     |   8 +-
 lib/ethdev/ethdev_vdev.h                      |   8 +-
 lib/ethdev/rte_cman.h                         |   4 +-
 lib/ethdev/rte_dev_info.h                     |   4 +-
 lib/ethdev/rte_ethdev.h                       |   8 +-
 lib/ethdev/rte_ethdev_trace_fp.h              |   4 +-
 lib/eventdev/event_timer_adapter_pmd.h        |   4 +-
 lib/eventdev/eventdev_pmd.h                   |   8 +-
 lib/eventdev/eventdev_pmd_pci.h               |   8 +-
 lib/eventdev/eventdev_pmd_vdev.h              |   8 +-
 lib/eventdev/eventdev_trace.h                 |   8 +-
 lib/eventdev/rte_event_crypto_adapter.h       |   8 +-
 lib/eventdev/rte_event_eth_rx_adapter.h       |   8 +-
 lib/eventdev/rte_event_eth_tx_adapter.h       |   8 +-
 lib/eventdev/rte_event_ring.h                 |   8 +-
 lib/eventdev/rte_event_timer_adapter.h        |   8 +-
 lib/eventdev/rte_eventdev.h                   |   8 +-
 lib/eventdev/rte_eventdev_trace_fp.h          |   4 +-
 lib/graph/rte_graph_model_mcore_dispatch.h    |   8 +-
 lib/graph/rte_graph_worker.h                  |   6 +-
 lib/gso/rte_gso.h                             |   6 +-
 lib/hash/rte_fbk_hash.h                       |   8 +-
 lib/hash/rte_hash_crc.h                       |   8 +-
 lib/hash/rte_jhash.h                          |   8 +-
 lib/hash/rte_thash.h                          |   8 +-
 lib/hash/rte_thash_gfni.h                     |   8 +-
 lib/ip_frag/rte_ip_frag.h                     |   8 +-
 lib/ipsec/rte_ipsec.h                         |   8 +-
 lib/log/rte_log.h                             |   8 +-
 lib/lpm/rte_lpm.h                             |   8 +-
 lib/member/rte_member.h                       |   8 +-
 lib/member/rte_member_sketch.h                |   6 +-
 lib/member/rte_member_sketch_avx512.h         |   8 +-
 lib/member/rte_member_x86.h                   |   4 +-
 lib/member/rte_xxh64_avx512.h                 |   6 +-
 lib/mempool/mempool_trace.h                   |   8 +-
 lib/mempool/rte_mempool_trace_fp.h            |   4 +-
 lib/meter/rte_meter.h                         |   8 +-
 lib/mldev/mldev_utils.h                       |   8 +-
 lib/mldev/rte_mldev_core.h                    |   8 +-
 lib/mldev/rte_mldev_pmd.h                     |   8 +-
 lib/net/rte_ether.h                           |   8 +-
 lib/net/rte_net.h                             |   8 +-
 lib/net/rte_sctp.h                            |   8 +-
 lib/node/rte_node_eth_api.h                   |   8 +-
 lib/node/rte_node_ip4_api.h                   |   8 +-
 lib/node/rte_node_ip6_api.h                   |   6 +-
 lib/node/rte_node_udp4_input_api.h            |   8 +-
 lib/pci/rte_pci.h                             |   8 +-
 lib/pdcp/rte_pdcp.h                           |   8 +-
 lib/pipeline/rte_pipeline.h                   |   8 +-
 lib/pipeline/rte_port_in_action.h             |   8 +-
 lib/pipeline/rte_swx_ctl.h                    |   8 +-
 lib/pipeline/rte_swx_extern.h                 |   8 +-
 lib/pipeline/rte_swx_ipsec.h                  |   8 +-
 lib/pipeline/rte_swx_pipeline.h               |   8 +-
 lib/pipeline/rte_swx_pipeline_spec.h          |   8 +-
 lib/pipeline/rte_table_action.h               |   8 +-
 lib/port/rte_port.h                           |   8 +-
 lib/port/rte_port_ethdev.h                    |   8 +-
 lib/port/rte_port_eventdev.h                  |   8 +-
 lib/port/rte_port_fd.h                        |   8 +-
 lib/port/rte_port_frag.h                      |   8 +-
 lib/port/rte_port_ras.h                       |   8 +-
 lib/port/rte_port_ring.h                      |   8 +-
 lib/port/rte_port_sched.h                     |   8 +-
 lib/port/rte_port_source_sink.h               |   8 +-
 lib/port/rte_port_sym_crypto.h                |   8 +-
 lib/port/rte_swx_port.h                       |   8 +-
 lib/port/rte_swx_port_ethdev.h                |   8 +-
 lib/port/rte_swx_port_fd.h                    |   8 +-
 lib/port/rte_swx_port_ring.h                  |   8 +-
 lib/port/rte_swx_port_source_sink.h           |   8 +-
 lib/rawdev/rte_rawdev.h                       |   6 +-
 lib/rawdev/rte_rawdev_pmd.h                   |   8 +-
 lib/rcu/rte_rcu_qsbr.h                        |   8 +-
 lib/regexdev/rte_regexdev.h                   |   8 +-
 lib/ring/rte_ring.h                           |   6 +-
 lib/ring/rte_ring_core.h                      |   8 +-
 lib/ring/rte_ring_elem.h                      |   8 +-
 lib/ring/rte_ring_hts.h                       |   4 +-
 lib/ring/rte_ring_peek.h                      |   4 +-
 lib/ring/rte_ring_peek_zc.h                   |   4 +-
 lib/ring/rte_ring_rts.h                       |   4 +-
 lib/sched/rte_approx.h                        |   8 +-
 lib/sched/rte_pie.h                           |   8 +-
 lib/sched/rte_red.h                           |   8 +-
 lib/sched/rte_sched.h                         |   8 +-
 lib/sched/rte_sched_common.h                  |   6 +-
 lib/security/rte_security.h                   |   8 +-
 lib/security/rte_security_driver.h            |   6 +-
 lib/stack/rte_stack.h                         |   8 +-
 lib/table/rte_lru.h                           |  12 +-
 lib/table/rte_lru_arm64.h                     |   8 +-
 lib/table/rte_lru_x86.h                       |   8 -
 lib/table/rte_swx_hash_func.h                 |   8 +-
 lib/table/rte_swx_keycmp.h                    |   8 +-
 lib/table/rte_swx_table.h                     |   8 +-
 lib/table/rte_swx_table_em.h                  |   8 +-
 lib/table/rte_swx_table_learner.h             |   8 +-
 lib/table/rte_swx_table_selector.h            |   8 +-
 lib/table/rte_swx_table_wm.h                  |   8 +-
 lib/table/rte_table.h                         |   8 +-
 lib/table/rte_table_acl.h                     |   8 +-
 lib/table/rte_table_array.h                   |   8 +-
 lib/table/rte_table_hash.h                    |   8 +-
 lib/table/rte_table_hash_cuckoo.h             |   8 +-
 lib/table/rte_table_hash_func.h               |  12 +-
 lib/table/rte_table_lpm.h                     |   8 +-
 lib/table/rte_table_lpm_ipv6.h                |   8 +-
 lib/table/rte_table_stub.h                    |   8 +-
 lib/telemetry/rte_telemetry.h                 |   8 +-
 lib/vhost/rte_vdpa.h                          |   8 +-
 lib/vhost/rte_vhost.h                         |   8 +-
 lib/vhost/rte_vhost_async.h                   |   8 +-
 lib/vhost/rte_vhost_crypto.h                  |   4 +-
 lib/vhost/vdpa_driver.h                       |   8 +-
 280 files changed, 2203 insertions(+), 993 deletions(-)

-- 
2.34.1



* [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-09 14:57                                       ` [PATCH v4 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-09-09 14:57                                         ` Mattias Rönnblom
  2024-09-09 16:43                                           ` Morten Brørup
                                                             ` (2 more replies)
  2024-09-09 14:57                                         ` [PATCH v4 2/6] eal: extend bit manipulation functionality Mattias Rönnblom
                                                           ` (4 subsequent siblings)
  5 siblings, 3 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-09 14:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Mattias Rönnblom

Ensure that 'extern "C" { /../ }' does not cover files included from a
particular header file, and address minor issues resulting from this
change of order.

Dealing with C++ should be delegated to the individual include file
level, rather than being imposed by the user of that file. For example,
forcing C linkage prevents _Generic macros from being replaced with
overloaded static inline functions in C++ translation units.
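
As a minimal sketch of the intended header layout (not taken from the
patch itself): dependencies are included first, the C++ overloads that
stand in for a _Generic macro are kept outside the linkage block, and
'extern "C"' wraps only the header's own C declarations. All demo_*
names are hypothetical and exist only for illustration.

```cpp
#include <assert.h>
#include <stdint.h>	/* dependencies come before any extern "C" block */

/*
 * Hypothetical C++ overloads standing in for a C11 _Generic macro.
 * They are declared outside extern "C", so ordinary C++ overload
 * resolution picks the variant matching the pointer type.
 */
static inline bool
demo_bit_test(const uint32_t *addr, unsigned int nr)
{
	return ((*addr >> nr) & 1U) != 0;
}

static inline bool
demo_bit_test(const uint64_t *addr, unsigned int nr)
{
	return ((*addr >> nr) & 1U) != 0;
}

extern "C" {
/* Only this header's own C declarations are given C linkage. */
int demo_c_function(int value);
}

/* Usage: the same call name works for both word sizes. */
static inline void
demo_usage(void)
{
	uint32_t w32 = UINT32_C(1) << 5;
	uint64_t w64 = UINT64_C(1) << 40;

	assert(demo_bit_test(&w32, 5));
	assert(demo_bit_test(&w64, 40));
}
```

With this ordering, a C++ user can include the header without wrapping
it in a linkage block of their own, and the header decides which of its
declarations need C linkage.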

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 app/test/packet_burst_generator.h               |  8 ++++----
 app/test/virtual_pmd.h                          |  4 ++--
 drivers/bus/auxiliary/bus_auxiliary_driver.h    |  8 ++++----
 drivers/bus/cdx/bus_cdx_driver.h                |  8 ++++----
 drivers/bus/dpaa/include/fsl_qman.h             |  8 ++++----
 drivers/bus/fslmc/bus_fslmc_driver.h            |  8 ++++----
 drivers/bus/pci/bus_pci_driver.h                |  8 ++++----
 drivers/bus/pci/rte_bus_pci.h                   |  8 ++++----
 drivers/bus/platform/bus_platform_driver.h      |  8 ++++----
 drivers/bus/vdev/bus_vdev_driver.h              |  8 ++++----
 drivers/bus/vmbus/bus_vmbus_driver.h            |  8 ++++----
 drivers/bus/vmbus/rte_bus_vmbus.h               |  8 ++++----
 drivers/dma/cnxk/cnxk_dma_event_dp.h            |  8 ++++----
 drivers/dma/ioat/ioat_hw_defs.h                 |  4 ++--
 drivers/event/dlb2/rte_pmd_dlb2.h               |  8 ++++----
 drivers/mempool/dpaa2/rte_dpaa2_mempool.h       |  6 +++---
 drivers/net/avp/rte_avp_fifo.h                  |  8 ++++----
 drivers/net/bonding/rte_eth_bond.h              |  4 ++--
 drivers/net/i40e/rte_pmd_i40e.h                 |  8 ++++----
 drivers/net/mlx5/mlx5_trace.h                   |  8 ++++----
 drivers/net/ring/rte_eth_ring.h                 |  4 ++--
 drivers/net/vhost/rte_eth_vhost.h               |  8 ++++----
 drivers/raw/ifpga/afu_pmd_core.h                |  8 ++++----
 drivers/raw/ifpga/afu_pmd_he_hssi.h             |  6 +++---
 drivers/raw/ifpga/afu_pmd_he_lpbk.h             |  6 +++---
 drivers/raw/ifpga/afu_pmd_he_mem.h              |  6 +++---
 drivers/raw/ifpga/afu_pmd_n3000.h               |  6 +++---
 drivers/raw/ifpga/rte_pmd_afu.h                 |  4 ++--
 drivers/raw/ifpga/rte_pmd_ifpga.h               |  4 ++--
 examples/ethtool/lib/rte_ethtool.h              |  8 ++++----
 examples/qos_sched/main.h                       |  4 ++--
 examples/vm_power_manager/channel_manager.h     |  8 ++++----
 lib/acl/rte_acl_osdep.h                         |  8 ++++----
 lib/bbdev/rte_bbdev.h                           |  8 ++++----
 lib/bbdev/rte_bbdev_op.h                        |  8 ++++----
 lib/bbdev/rte_bbdev_pmd.h                       |  8 ++++----
 lib/bpf/bpf_def.h                               |  8 ++++----
 lib/compressdev/rte_comp.h                      |  4 ++--
 lib/compressdev/rte_compressdev.h               |  6 +++---
 lib/compressdev/rte_compressdev_internal.h      |  8 ++++----
 lib/compressdev/rte_compressdev_pmd.h           |  8 ++++----
 lib/cryptodev/cryptodev_pmd.h                   |  8 ++++----
 lib/cryptodev/cryptodev_trace.h                 |  8 ++++----
 lib/cryptodev/rte_crypto.h                      |  8 ++++----
 lib/cryptodev/rte_crypto_asym.h                 |  8 ++++----
 lib/cryptodev/rte_crypto_sym.h                  |  8 ++++----
 lib/cryptodev/rte_cryptodev.h                   |  8 ++++----
 lib/cryptodev/rte_cryptodev_trace_fp.h          |  4 ++--
 lib/dispatcher/rte_dispatcher.h                 |  8 ++++----
 lib/dmadev/rte_dmadev.h                         |  8 ++++----
 lib/eal/arm/include/rte_atomic_32.h             |  4 ++--
 lib/eal/arm/include/rte_atomic_64.h             |  8 ++++----
 lib/eal/arm/include/rte_byteorder.h             |  8 ++++----
 lib/eal/arm/include/rte_cpuflags_32.h           |  8 ++++----
 lib/eal/arm/include/rte_cpuflags_64.h           |  8 ++++----
 lib/eal/arm/include/rte_cycles_32.h             |  4 ++--
 lib/eal/arm/include/rte_cycles_64.h             |  4 ++--
 lib/eal/arm/include/rte_io.h                    |  8 ++++----
 lib/eal/arm/include/rte_io_64.h                 |  8 ++++----
 lib/eal/arm/include/rte_memcpy_32.h             |  8 ++++----
 lib/eal/arm/include/rte_memcpy_64.h             |  8 ++++----
 lib/eal/arm/include/rte_pause.h                 |  8 ++++----
 lib/eal/arm/include/rte_pause_32.h              |  6 +++---
 lib/eal/arm/include/rte_pause_64.h              |  8 ++++----
 lib/eal/arm/include/rte_power_intrinsics.h      |  8 ++++----
 lib/eal/arm/include/rte_prefetch_32.h           |  8 ++++----
 lib/eal/arm/include/rte_prefetch_64.h           |  8 ++++----
 lib/eal/arm/include/rte_rwlock.h                |  4 ++--
 lib/eal/arm/include/rte_spinlock.h              |  6 +++---
 lib/eal/freebsd/include/rte_os.h                |  8 ++++----
 lib/eal/include/bus_driver.h                    |  8 ++++----
 lib/eal/include/dev_driver.h                    |  6 +++---
 lib/eal/include/eal_trace_internal.h            |  8 ++++----
 lib/eal/include/generic/rte_cycles.h            |  8 ++++++++
 lib/eal/include/generic/rte_memcpy.h            |  8 ++++++++
 lib/eal/include/generic/rte_pause.h             |  8 ++++++++
 lib/eal/include/generic/rte_power_intrinsics.h  |  8 ++++++++
 lib/eal/include/generic/rte_prefetch.h          |  8 ++++++++
 lib/eal/include/generic/rte_rwlock.h            |  8 ++++----
 lib/eal/include/generic/rte_spinlock.h          |  8 ++++++++
 lib/eal/include/rte_alarm.h                     |  4 ++--
 lib/eal/include/rte_bitmap.h                    |  8 ++++----
 lib/eal/include/rte_bus.h                       |  8 ++++----
 lib/eal/include/rte_class.h                     |  4 ++--
 lib/eal/include/rte_common.h                    |  8 ++++----
 lib/eal/include/rte_dev.h                       |  8 ++++----
 lib/eal/include/rte_devargs.h                   |  8 ++++----
 lib/eal/include/rte_eal_trace.h                 |  4 ++--
 lib/eal/include/rte_errno.h                     |  4 ++--
 lib/eal/include/rte_fbarray.h                   |  8 ++++----
 lib/eal/include/rte_keepalive.h                 |  6 +++---
 lib/eal/include/rte_mcslock.h                   |  8 ++++----
 lib/eal/include/rte_memory.h                    |  8 ++++----
 lib/eal/include/rte_pci_dev_features.h          |  4 ++--
 lib/eal/include/rte_pflock.h                    |  8 ++++----
 lib/eal/include/rte_random.h                    |  4 ++--
 lib/eal/include/rte_seqcount.h                  |  8 ++++----
 lib/eal/include/rte_seqlock.h                   |  8 ++++----
 lib/eal/include/rte_service.h                   |  8 ++++----
 lib/eal/include/rte_service_component.h         |  4 ++--
 lib/eal/include/rte_stdatomic.h                 |  5 +----
 lib/eal/include/rte_string_fns.h                | 17 ++++++++++++-----
 lib/eal/include/rte_tailq.h                     |  6 +++---
 lib/eal/include/rte_ticketlock.h                |  8 ++++----
 lib/eal/include/rte_time.h                      |  6 +++---
 lib/eal/include/rte_trace.h                     |  8 ++++----
 lib/eal/include/rte_trace_point.h               |  8 ++++----
 lib/eal/include/rte_trace_point_register.h      |  8 ++++----
 lib/eal/include/rte_uuid.h                      |  8 ++++----
 lib/eal/include/rte_version.h                   |  6 +++---
 lib/eal/include/rte_vfio.h                      |  8 ++++----
 lib/eal/linux/include/rte_os.h                  |  8 ++++----
 lib/eal/loongarch/include/rte_atomic.h          |  6 +++---
 lib/eal/loongarch/include/rte_byteorder.h       |  4 ++--
 lib/eal/loongarch/include/rte_cpuflags.h        |  8 ++++----
 lib/eal/loongarch/include/rte_cycles.h          |  4 ++--
 lib/eal/loongarch/include/rte_io.h              |  4 ++--
 lib/eal/loongarch/include/rte_memcpy.h          |  4 ++--
 lib/eal/loongarch/include/rte_pause.h           |  8 ++++----
 .../loongarch/include/rte_power_intrinsics.h    |  8 ++++----
 lib/eal/loongarch/include/rte_prefetch.h        |  8 ++++----
 lib/eal/loongarch/include/rte_rwlock.h          |  4 ++--
 lib/eal/loongarch/include/rte_spinlock.h        |  6 +++---
 lib/eal/ppc/include/rte_atomic.h                |  6 +++---
 lib/eal/ppc/include/rte_byteorder.h             |  6 +++---
 lib/eal/ppc/include/rte_cpuflags.h              |  8 ++++----
 lib/eal/ppc/include/rte_cycles.h                |  8 ++++----
 lib/eal/ppc/include/rte_io.h                    |  4 ++--
 lib/eal/ppc/include/rte_memcpy.h                |  4 ++--
 lib/eal/ppc/include/rte_pause.h                 |  8 ++++----
 lib/eal/ppc/include/rte_power_intrinsics.h      |  8 ++++----
 lib/eal/ppc/include/rte_prefetch.h              |  8 ++++----
 lib/eal/ppc/include/rte_rwlock.h                |  4 ++--
 lib/eal/ppc/include/rte_spinlock.h              |  8 ++++----
 lib/eal/riscv/include/rte_atomic.h              |  8 ++++----
 lib/eal/riscv/include/rte_byteorder.h           |  8 ++++----
 lib/eal/riscv/include/rte_cpuflags.h            |  8 ++++----
 lib/eal/riscv/include/rte_cycles.h              |  4 ++--
 lib/eal/riscv/include/rte_io.h                  |  4 ++--
 lib/eal/riscv/include/rte_memcpy.h              |  4 ++--
 lib/eal/riscv/include/rte_pause.h               |  8 ++++----
 lib/eal/riscv/include/rte_power_intrinsics.h    |  8 ++++----
 lib/eal/riscv/include/rte_prefetch.h            |  8 ++++----
 lib/eal/riscv/include/rte_rwlock.h              |  4 ++--
 lib/eal/riscv/include/rte_spinlock.h            |  6 +++---
 lib/eal/windows/include/pthread.h               |  6 +++---
 lib/eal/windows/include/regex.h                 |  8 ++++----
 lib/eal/windows/include/rte_windows.h           |  8 ++++----
 lib/eal/x86/include/rte_atomic.h                |  8 ++++----
 lib/eal/x86/include/rte_byteorder.h             |  8 ++++----
 lib/eal/x86/include/rte_cpuflags.h              |  8 ++++----
 lib/eal/x86/include/rte_cycles.h                |  8 ++++----
 lib/eal/x86/include/rte_io.h                    |  8 ++++----
 lib/eal/x86/include/rte_pause.h                 |  7 ++++---
 lib/eal/x86/include/rte_power_intrinsics.h      |  8 ++++----
 lib/eal/x86/include/rte_prefetch.h              |  8 ++++----
 lib/eal/x86/include/rte_rwlock.h                |  6 +++---
 lib/eal/x86/include/rte_spinlock.h              |  8 ++++----
 lib/ethdev/ethdev_driver.h                      |  8 ++++----
 lib/ethdev/ethdev_pci.h                         |  8 ++++----
 lib/ethdev/ethdev_trace.h                       |  8 ++++----
 lib/ethdev/ethdev_vdev.h                        |  8 ++++----
 lib/ethdev/rte_cman.h                           |  4 ++--
 lib/ethdev/rte_dev_info.h                       |  4 ++--
 lib/ethdev/rte_ethdev.h                         |  8 ++++----
 lib/ethdev/rte_ethdev_trace_fp.h                |  4 ++--
 lib/eventdev/event_timer_adapter_pmd.h          |  4 ++--
 lib/eventdev/eventdev_pmd.h                     |  8 ++++----
 lib/eventdev/eventdev_pmd_pci.h                 |  8 ++++----
 lib/eventdev/eventdev_pmd_vdev.h                |  8 ++++----
 lib/eventdev/eventdev_trace.h                   |  8 ++++----
 lib/eventdev/rte_event_crypto_adapter.h         |  8 ++++----
 lib/eventdev/rte_event_eth_rx_adapter.h         |  8 ++++----
 lib/eventdev/rte_event_eth_tx_adapter.h         |  8 ++++----
 lib/eventdev/rte_event_ring.h                   |  8 ++++----
 lib/eventdev/rte_event_timer_adapter.h          |  8 ++++----
 lib/eventdev/rte_eventdev.h                     |  8 ++++----
 lib/eventdev/rte_eventdev_trace_fp.h            |  4 ++--
 lib/graph/rte_graph_model_mcore_dispatch.h      |  8 ++++----
 lib/graph/rte_graph_worker.h                    |  6 +++---
 lib/gso/rte_gso.h                               |  6 +++---
 lib/hash/rte_fbk_hash.h                         |  8 ++++----
 lib/hash/rte_hash_crc.h                         |  8 ++++----
 lib/hash/rte_jhash.h                            |  8 ++++----
 lib/hash/rte_thash.h                            |  8 ++++----
 lib/hash/rte_thash_gfni.h                       |  8 ++++----
 lib/ip_frag/rte_ip_frag.h                       |  8 ++++----
 lib/ipsec/rte_ipsec.h                           |  8 ++++----
 lib/log/rte_log.h                               |  8 ++++----
 lib/lpm/rte_lpm.h                               |  8 ++++----
 lib/member/rte_member.h                         |  8 ++++----
 lib/member/rte_member_sketch.h                  |  6 +++---
 lib/member/rte_member_sketch_avx512.h           |  8 ++++----
 lib/member/rte_member_x86.h                     |  4 ++--
 lib/member/rte_xxh64_avx512.h                   |  6 +++---
 lib/mempool/mempool_trace.h                     |  8 ++++----
 lib/mempool/rte_mempool_trace_fp.h              |  4 ++--
 lib/meter/rte_meter.h                           |  8 ++++----
 lib/mldev/mldev_utils.h                         |  8 ++++----
 lib/mldev/rte_mldev_core.h                      |  8 ++++----
 lib/mldev/rte_mldev_pmd.h                       |  8 ++++----
 lib/net/rte_ether.h                             |  8 ++++----
 lib/net/rte_net.h                               |  8 ++++----
 lib/net/rte_sctp.h                              |  8 ++++----
 lib/node/rte_node_eth_api.h                     |  8 ++++----
 lib/node/rte_node_ip4_api.h                     |  8 ++++----
 lib/node/rte_node_ip6_api.h                     |  6 +++---
 lib/node/rte_node_udp4_input_api.h              |  8 ++++----
 lib/pci/rte_pci.h                               |  8 ++++----
 lib/pdcp/rte_pdcp.h                             |  8 ++++----
 lib/pipeline/rte_pipeline.h                     |  8 ++++----
 lib/pipeline/rte_port_in_action.h               |  8 ++++----
 lib/pipeline/rte_swx_ctl.h                      |  8 ++++----
 lib/pipeline/rte_swx_extern.h                   |  8 ++++----
 lib/pipeline/rte_swx_ipsec.h                    |  8 ++++----
 lib/pipeline/rte_swx_pipeline.h                 |  8 ++++----
 lib/pipeline/rte_swx_pipeline_spec.h            |  8 ++++----
 lib/pipeline/rte_table_action.h                 |  8 ++++----
 lib/port/rte_port.h                             |  8 ++++----
 lib/port/rte_port_ethdev.h                      |  8 ++++----
 lib/port/rte_port_eventdev.h                    |  8 ++++----
 lib/port/rte_port_fd.h                          |  8 ++++----
 lib/port/rte_port_frag.h                        |  8 ++++----
 lib/port/rte_port_ras.h                         |  8 ++++----
 lib/port/rte_port_ring.h                        |  8 ++++----
 lib/port/rte_port_sched.h                       |  8 ++++----
 lib/port/rte_port_source_sink.h                 |  8 ++++----
 lib/port/rte_port_sym_crypto.h                  |  8 ++++----
 lib/port/rte_swx_port.h                         |  8 ++++----
 lib/port/rte_swx_port_ethdev.h                  |  8 ++++----
 lib/port/rte_swx_port_fd.h                      |  8 ++++----
 lib/port/rte_swx_port_ring.h                    |  8 ++++----
 lib/port/rte_swx_port_source_sink.h             |  8 ++++----
 lib/rawdev/rte_rawdev.h                         |  6 +++---
 lib/rawdev/rte_rawdev_pmd.h                     |  8 ++++----
 lib/rcu/rte_rcu_qsbr.h                          |  8 ++++----
 lib/regexdev/rte_regexdev.h                     |  8 ++++----
 lib/ring/rte_ring.h                             |  6 +++---
 lib/ring/rte_ring_core.h                        |  8 ++++----
 lib/ring/rte_ring_elem.h                        |  8 ++++----
 lib/ring/rte_ring_hts.h                         |  4 ++--
 lib/ring/rte_ring_peek.h                        |  4 ++--
 lib/ring/rte_ring_peek_zc.h                     |  4 ++--
 lib/ring/rte_ring_rts.h                         |  4 ++--
 lib/sched/rte_approx.h                          |  8 ++++----
 lib/sched/rte_pie.h                             |  8 ++++----
 lib/sched/rte_red.h                             |  8 ++++----
 lib/sched/rte_sched.h                           |  8 ++++----
 lib/sched/rte_sched_common.h                    |  6 +++---
 lib/security/rte_security.h                     |  8 ++++----
 lib/security/rte_security_driver.h              |  6 +++---
 lib/stack/rte_stack.h                           |  8 ++++----
 lib/table/rte_lru.h                             | 12 ++++--------
 lib/table/rte_lru_arm64.h                       |  8 ++++----
 lib/table/rte_lru_x86.h                         |  8 --------
 lib/table/rte_swx_hash_func.h                   |  8 ++++----
 lib/table/rte_swx_keycmp.h                      |  8 ++++----
 lib/table/rte_swx_table.h                       |  8 ++++----
 lib/table/rte_swx_table_em.h                    |  8 ++++----
 lib/table/rte_swx_table_learner.h               |  8 ++++----
 lib/table/rte_swx_table_selector.h              |  8 ++++----
 lib/table/rte_swx_table_wm.h                    |  8 ++++----
 lib/table/rte_table.h                           |  8 ++++----
 lib/table/rte_table_acl.h                       |  8 ++++----
 lib/table/rte_table_array.h                     |  8 ++++----
 lib/table/rte_table_hash.h                      |  8 ++++----
 lib/table/rte_table_hash_cuckoo.h               |  8 ++++----
 lib/table/rte_table_hash_func.h                 | 12 ++++++++----
 lib/table/rte_table_lpm.h                       |  8 ++++----
 lib/table/rte_table_lpm_ipv6.h                  |  8 ++++----
 lib/table/rte_table_stub.h                      |  8 ++++----
 lib/telemetry/rte_telemetry.h                   |  8 ++++----
 lib/vhost/rte_vdpa.h                            |  8 ++++----
 lib/vhost/rte_vhost.h                           |  8 ++++----
 lib/vhost/rte_vhost_async.h                     |  8 ++++----
 lib/vhost/rte_vhost_crypto.h                    |  4 ++--
 lib/vhost/vdpa_driver.h                         |  8 ++++----
 277 files changed, 1020 insertions(+), 975 deletions(-)

diff --git a/app/test/packet_burst_generator.h b/app/test/packet_burst_generator.h
index b99286f50e..cce41bcd0f 100644
--- a/app/test/packet_burst_generator.h
+++ b/app/test/packet_burst_generator.h
@@ -5,10 +5,6 @@
 #ifndef PACKET_BURST_GENERATOR_H_
 #define PACKET_BURST_GENERATOR_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_mbuf.h>
 #include <rte_ether.h>
 #include <rte_arp.h>
@@ -17,6 +13,10 @@ extern "C" {
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define IPV4_ADDR(a, b, c, d)(((a & 0xff) << 24) | ((b & 0xff) << 16) | \
 		((c & 0xff) << 8) | (d & 0xff))
 
diff --git a/app/test/virtual_pmd.h b/app/test/virtual_pmd.h
index 120b58b273..a5a71d7cb4 100644
--- a/app/test/virtual_pmd.h
+++ b/app/test/virtual_pmd.h
@@ -5,12 +5,12 @@
 #ifndef __VIRTUAL_ETHDEV_H_
 #define __VIRTUAL_ETHDEV_H_
 
+#include <rte_ether.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ether.h>
-
 int
 virtual_ethdev_init(void);
 
diff --git a/drivers/bus/auxiliary/bus_auxiliary_driver.h b/drivers/bus/auxiliary/bus_auxiliary_driver.h
index 58fb7c7f69..40ab1f0912 100644
--- a/drivers/bus/auxiliary/bus_auxiliary_driver.h
+++ b/drivers/bus/auxiliary/bus_auxiliary_driver.h
@@ -11,10 +11,6 @@
  * Auxiliary Bus Interface.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -28,6 +24,10 @@ extern "C" {
 #include <dev_driver.h>
 #include <rte_kvargs.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_BUS_AUXILIARY_NAME "auxiliary"
 
 /* Forward declarations */
diff --git a/drivers/bus/cdx/bus_cdx_driver.h b/drivers/bus/cdx/bus_cdx_driver.h
index 211f8e406b..d390e7b5a1 100644
--- a/drivers/bus/cdx/bus_cdx_driver.h
+++ b/drivers/bus/cdx/bus_cdx_driver.h
@@ -10,10 +10,6 @@
  * AMD CDX bus interface
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdlib.h>
 #include <inttypes.h>
 #include <linux/types.h>
@@ -22,6 +18,10 @@ extern "C" {
 #include <dev_driver.h>
 #include <rte_interrupts.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_cdx_device;
 struct rte_cdx_driver;
diff --git a/drivers/bus/dpaa/include/fsl_qman.h b/drivers/bus/dpaa/include/fsl_qman.h
index c0677976e8..f39007b84d 100644
--- a/drivers/bus/dpaa/include/fsl_qman.h
+++ b/drivers/bus/dpaa/include/fsl_qman.h
@@ -8,14 +8,14 @@
 #ifndef __FSL_QMAN_H
 #define __FSL_QMAN_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <dpaa_rbtree.h>
 #include <rte_compat.h>
 #include <rte_eventdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* FQ lookups (turn this on for 64bit user-space) */
 #ifdef RTE_ARCH_64
 #define CONFIG_FSL_QMAN_FQ_LOOKUP
diff --git a/drivers/bus/fslmc/bus_fslmc_driver.h b/drivers/bus/fslmc/bus_fslmc_driver.h
index 7ac5fe6ff1..3095458133 100644
--- a/drivers/bus/fslmc/bus_fslmc_driver.h
+++ b/drivers/bus/fslmc/bus_fslmc_driver.h
@@ -13,10 +13,6 @@
  * RTE FSLMC Bus Interface
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -40,6 +36,10 @@ extern "C" {
 #include "portal/dpaa2_hw_pvt.h"
 #include "portal/dpaa2_hw_dpio.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define FSLMC_OBJECT_MAX_LEN 32   /**< Length of each device on bus */
 
 #define DPAA2_INVALID_MBUF_SEQN        0
diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
index be32263a82..2cc1119072 100644
--- a/drivers/bus/pci/bus_pci_driver.h
+++ b/drivers/bus/pci/bus_pci_driver.h
@@ -6,14 +6,14 @@
 #ifndef BUS_PCI_DRIVER_H
 #define BUS_PCI_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus_pci.h>
 #include <dev_driver.h>
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Pathname of PCI devices directory. */
 __rte_internal
 const char *rte_pci_get_sysfs_path(void);
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index a3798cb1cb..19a7b15b99 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -11,10 +11,6 @@
  * PCI device & driver interface
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -27,6 +23,10 @@ extern "C" {
 #include <rte_interrupts.h>
 #include <rte_pci.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_pci_device;
 struct rte_pci_driver;
diff --git a/drivers/bus/platform/bus_platform_driver.h b/drivers/bus/platform/bus_platform_driver.h
index 5ac54fb739..a6f246f7c4 100644
--- a/drivers/bus/platform/bus_platform_driver.h
+++ b/drivers/bus/platform/bus_platform_driver.h
@@ -10,10 +10,6 @@
  * Platform bus interface.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stddef.h>
 #include <stdint.h>
 
@@ -23,6 +19,10 @@ extern "C" {
 #include <rte_os.h>
 #include <rte_vfio.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_platform_bus;
 struct rte_platform_device;
diff --git a/drivers/bus/vdev/bus_vdev_driver.h b/drivers/bus/vdev/bus_vdev_driver.h
index bc7e30d7c6..cba1fb5269 100644
--- a/drivers/bus/vdev/bus_vdev_driver.h
+++ b/drivers/bus/vdev/bus_vdev_driver.h
@@ -5,15 +5,15 @@
 #ifndef BUS_VDEV_DRIVER_H
 #define BUS_VDEV_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus_vdev.h>
 #include <rte_compat.h>
 #include <dev_driver.h>
 #include <rte_devargs.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_vdev_device {
 	RTE_TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
 	struct rte_device device;               /**< Inherit core device */
diff --git a/drivers/bus/vmbus/bus_vmbus_driver.h b/drivers/bus/vmbus/bus_vmbus_driver.h
index e2475a642d..bc394208de 100644
--- a/drivers/bus/vmbus/bus_vmbus_driver.h
+++ b/drivers/bus/vmbus/bus_vmbus_driver.h
@@ -6,14 +6,14 @@
 #ifndef BUS_VMBUS_DRIVER_H
 #define BUS_VMBUS_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus_vmbus.h>
 #include <rte_compat.h>
 #include <dev_driver.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct vmbus_channel;
 struct vmbus_mon_page;
 
diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h b/drivers/bus/vmbus/rte_bus_vmbus.h
index 9467bd8f3d..fd18bca73c 100644
--- a/drivers/bus/vmbus/rte_bus_vmbus.h
+++ b/drivers/bus/vmbus/rte_bus_vmbus.h
@@ -11,10 +11,6 @@
  *
  * VMBUS Interface
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -28,6 +24,10 @@ extern "C" {
 #include <rte_interrupts.h>
 #include <rte_vmbus_reg.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_vmbus_device;
 struct rte_vmbus_driver;
diff --git a/drivers/dma/cnxk/cnxk_dma_event_dp.h b/drivers/dma/cnxk/cnxk_dma_event_dp.h
index 06b5ca8279..8c6cf5dd9a 100644
--- a/drivers/dma/cnxk/cnxk_dma_event_dp.h
+++ b/drivers/dma/cnxk/cnxk_dma_event_dp.h
@@ -5,16 +5,16 @@
 #ifndef _CNXK_DMA_EVENT_DP_H_
 #define _CNXK_DMA_EVENT_DP_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_common.h>
 #include <rte_compat.h>
 #include <rte_eventdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 __rte_internal
 uint16_t cn10k_dma_adapter_enqueue(void *ws, struct rte_event ev[], uint16_t nb_events);
 
diff --git a/drivers/dma/ioat/ioat_hw_defs.h b/drivers/dma/ioat/ioat_hw_defs.h
index dc3493a78f..11893951f2 100644
--- a/drivers/dma/ioat/ioat_hw_defs.h
+++ b/drivers/dma/ioat/ioat_hw_defs.h
@@ -5,12 +5,12 @@
 #ifndef IOAT_HW_DEFS_H
 #define IOAT_HW_DEFS_H
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 #define IOAT_PCI_CHANERR_INT_OFFSET	0x180
 
 #define IOAT_VER_3_0	0x30
diff --git a/drivers/event/dlb2/rte_pmd_dlb2.h b/drivers/event/dlb2/rte_pmd_dlb2.h
index 334c6c356d..dba7fd2f43 100644
--- a/drivers/event/dlb2/rte_pmd_dlb2.h
+++ b/drivers/event/dlb2/rte_pmd_dlb2.h
@@ -11,14 +11,14 @@
 #ifndef _RTE_PMD_DLB2_H_
 #define _RTE_PMD_DLB2_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
diff --git a/drivers/mempool/dpaa2/rte_dpaa2_mempool.h b/drivers/mempool/dpaa2/rte_dpaa2_mempool.h
index 7fe3d93f61..0286090b1b 100644
--- a/drivers/mempool/dpaa2/rte_dpaa2_mempool.h
+++ b/drivers/mempool/dpaa2/rte_dpaa2_mempool.h
@@ -12,13 +12,13 @@
  *
  */
 
+#include <rte_compat.h>
+#include <rte_mempool.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_compat.h>
-#include <rte_mempool.h>
-
 /**
  * Get BPID corresponding to the packet pool
  *
diff --git a/drivers/net/avp/rte_avp_fifo.h b/drivers/net/avp/rte_avp_fifo.h
index c1658da685..879de3b1c0 100644
--- a/drivers/net/avp/rte_avp_fifo.h
+++ b/drivers/net/avp/rte_avp_fifo.h
@@ -8,10 +8,6 @@
 
 #include "rte_avp_common.h"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef __KERNEL__
 /* Write memory barrier for kernel compiles */
 #define AVP_WMB() smp_wmb()
@@ -27,6 +23,10 @@ extern "C" {
 #ifndef __KERNEL__
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Initializes the avp fifo structure
  */
diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
index f10165f2c6..e59ff8793e 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -17,12 +17,12 @@
  * load balancing of network ports
  */
 
+#include <rte_ether.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ether.h>
-
 /* Supported modes of operation of link bonding library  */
 
 #define BONDING_MODE_ROUND_ROBIN		(0)
diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index a802f989e9..5af7e2330f 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -14,14 +14,14 @@
  *
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_ethdev.h>
 #include <rte_ether.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Response sent back to i40e driver from user app after callback
  */
diff --git a/drivers/net/mlx5/mlx5_trace.h b/drivers/net/mlx5/mlx5_trace.h
index 888d96f60b..a8f0b372c8 100644
--- a/drivers/net/mlx5/mlx5_trace.h
+++ b/drivers/net/mlx5/mlx5_trace.h
@@ -11,14 +11,14 @@
  * API for mlx5 PMD trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <mlx5_prm.h>
 #include <rte_mbuf.h>
 #include <rte_trace_point.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* TX burst subroutines trace points. */
 RTE_TRACE_POINT_FP(
 	rte_pmd_mlx5_trace_tx_entry,
diff --git a/drivers/net/ring/rte_eth_ring.h b/drivers/net/ring/rte_eth_ring.h
index 59e074d0ad..98292c7b33 100644
--- a/drivers/net/ring/rte_eth_ring.h
+++ b/drivers/net/ring/rte_eth_ring.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_ETH_RING_H_
 #define _RTE_ETH_RING_H_
 
+#include <rte_ring.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring.h>
-
 /**
  * Create a new ethdev port from a set of rings
  *
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
index 0e68b9f668..6ec59a7adc 100644
--- a/drivers/net/vhost/rte_eth_vhost.h
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -5,15 +5,15 @@
 #ifndef _RTE_ETH_VHOST_H_
 #define _RTE_ETH_VHOST_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdbool.h>
 
 #include <rte_vhost.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Event description.
  */
diff --git a/drivers/raw/ifpga/afu_pmd_core.h b/drivers/raw/ifpga/afu_pmd_core.h
index a8f1afe343..abf9e491f7 100644
--- a/drivers/raw/ifpga/afu_pmd_core.h
+++ b/drivers/raw/ifpga/afu_pmd_core.h
@@ -5,10 +5,6 @@
 #ifndef AFU_PMD_CORE_H
 #define AFU_PMD_CORE_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 #include <unistd.h>
@@ -20,6 +16,10 @@ extern "C" {
 
 #include "ifpga_rawdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define AFU_RAWDEV_MAX_DRVS  32
 
 struct afu_rawdev;
diff --git a/drivers/raw/ifpga/afu_pmd_he_hssi.h b/drivers/raw/ifpga/afu_pmd_he_hssi.h
index aebbe32d54..282289d912 100644
--- a/drivers/raw/ifpga/afu_pmd_he_hssi.h
+++ b/drivers/raw/ifpga/afu_pmd_he_hssi.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_HE_HSSI_H
 #define AFU_PMD_HE_HSSI_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define HE_HSSI_UUID_L    0xbb370242ac130002
 #define HE_HSSI_UUID_H    0x823c334c98bf11ea
 #define NUM_HE_HSSI_PORTS 8
diff --git a/drivers/raw/ifpga/afu_pmd_he_lpbk.h b/drivers/raw/ifpga/afu_pmd_he_lpbk.h
index eab7b55199..67b3653c21 100644
--- a/drivers/raw/ifpga/afu_pmd_he_lpbk.h
+++ b/drivers/raw/ifpga/afu_pmd_he_lpbk.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_HE_LPBK_H
 #define AFU_PMD_HE_LPBK_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define HE_LPBK_UUID_L     0xb94b12284c31e02b
 #define HE_LPBK_UUID_H     0x56e203e9864f49a7
 #define HE_MEM_LPBK_UUID_L 0xbb652a578330a8eb
diff --git a/drivers/raw/ifpga/afu_pmd_he_mem.h b/drivers/raw/ifpga/afu_pmd_he_mem.h
index 998ca92416..41854d8c58 100644
--- a/drivers/raw/ifpga/afu_pmd_he_mem.h
+++ b/drivers/raw/ifpga/afu_pmd_he_mem.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_HE_MEM_H
 #define AFU_PMD_HE_MEM_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define HE_MEM_TG_UUID_L  0xa3dc5b831f5cecbb
 #define HE_MEM_TG_UUID_H  0x4dadea342c7848cb
 
diff --git a/drivers/raw/ifpga/afu_pmd_n3000.h b/drivers/raw/ifpga/afu_pmd_n3000.h
index 403cc64b91..f6b6e07c6b 100644
--- a/drivers/raw/ifpga/afu_pmd_n3000.h
+++ b/drivers/raw/ifpga/afu_pmd_n3000.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_N3000_H
 #define AFU_PMD_N3000_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define N3000_AFU_UUID_L  0xc000c9660d824272
 #define N3000_AFU_UUID_H  0x9aeffe5f84570612
 #define N3000_NLB0_UUID_L 0xf89e433683f9040b
diff --git a/drivers/raw/ifpga/rte_pmd_afu.h b/drivers/raw/ifpga/rte_pmd_afu.h
index 5403ed25f5..0edacc3a9c 100644
--- a/drivers/raw/ifpga/rte_pmd_afu.h
+++ b/drivers/raw/ifpga/rte_pmd_afu.h
@@ -14,12 +14,12 @@
  *
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 #define RTE_PMD_AFU_N3000_NLB   1
 #define RTE_PMD_AFU_N3000_DMA   2
 
diff --git a/drivers/raw/ifpga/rte_pmd_ifpga.h b/drivers/raw/ifpga/rte_pmd_ifpga.h
index 791543f2cd..36b7f9c018 100644
--- a/drivers/raw/ifpga/rte_pmd_ifpga.h
+++ b/drivers/raw/ifpga/rte_pmd_ifpga.h
@@ -14,12 +14,12 @@
  *
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 #define IFPGA_MAX_PORT_NUM   4
 
 /**
diff --git a/examples/ethtool/lib/rte_ethtool.h b/examples/ethtool/lib/rte_ethtool.h
index d27e0102b1..c7dd3d9755 100644
--- a/examples/ethtool/lib/rte_ethtool.h
+++ b/examples/ethtool/lib/rte_ethtool.h
@@ -30,14 +30,14 @@
  * rte_ethtool_net_set_rx_mode      net_device_ops::ndo_set_rx_mode
  *
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_ethdev.h>
 #include <linux/ethtool.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Retrieve the Ethernet device driver information according to
  * attributes described by ethtool data structure, ethtool_drvinfo.
diff --git a/examples/qos_sched/main.h b/examples/qos_sched/main.h
index 04e77a4a10..ea66df0434 100644
--- a/examples/qos_sched/main.h
+++ b/examples/qos_sched/main.h
@@ -5,12 +5,12 @@
 #ifndef _MAIN_H_
 #define _MAIN_H_
 
+#include <rte_sched.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_sched.h>
-
 #define RTE_LOGTYPE_APP RTE_LOGTYPE_USER1
 
 /*
diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h
index eb989b20ad..6f70539815 100644
--- a/examples/vm_power_manager/channel_manager.h
+++ b/examples/vm_power_manager/channel_manager.h
@@ -5,16 +5,16 @@
 #ifndef CHANNEL_MANAGER_H_
 #define CHANNEL_MANAGER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <linux/limits.h>
 #include <linux/un.h>
 #include <stdbool.h>
 
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Maximum name length including '\0' terminator */
 #define CHANNEL_MGR_MAX_NAME_LEN    64
 
diff --git a/lib/acl/rte_acl_osdep.h b/lib/acl/rte_acl_osdep.h
index 3c1dc402ca..e4c7d07c69 100644
--- a/lib/acl/rte_acl_osdep.h
+++ b/lib/acl/rte_acl_osdep.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_ACL_OSDEP_H_
 #define _RTE_ACL_OSDEP_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -49,6 +45,10 @@ extern "C" {
 #include <rte_cpuflags.h>
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 0cbfdd1c95..9e83dd2bb0 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -20,10 +20,6 @@
  * from the same queue.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdbool.h>
 
@@ -32,6 +28,10 @@ extern "C" {
 
 #include "rte_bbdev_op.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_BBDEV_MAX_DEVS
 #define RTE_BBDEV_MAX_DEVS 128  /**< Max number of devices */
 #endif
diff --git a/lib/bbdev/rte_bbdev_op.h b/lib/bbdev/rte_bbdev_op.h
index 459631d0d0..6f4bae7d0f 100644
--- a/lib/bbdev/rte_bbdev_op.h
+++ b/lib/bbdev/rte_bbdev_op.h
@@ -11,10 +11,6 @@
  * Defines wireless base band layer 1 operations and capabilities
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -23,6 +19,10 @@ extern "C" {
 #include <rte_memory.h>
 #include <rte_mempool.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Number of columns in sub-block interleaver (36.212, section 5.1.4.1.1) */
 #define RTE_BBDEV_TURBO_C_SUBBLOCK (32)
 /* Maximum size of Transport Block (36.213, Table, Table 7.1.7.2.5-1) */
diff --git a/lib/bbdev/rte_bbdev_pmd.h b/lib/bbdev/rte_bbdev_pmd.h
index 442b23943d..0a1738fc05 100644
--- a/lib/bbdev/rte_bbdev_pmd.h
+++ b/lib/bbdev/rte_bbdev_pmd.h
@@ -14,15 +14,15 @@
  * bbdev interface. User applications should not use this API.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_log.h>
 
 #include "rte_bbdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Suggested value for SW based devices */
 #define RTE_BBDEV_DEFAULT_MAX_NB_QUEUES RTE_MAX_LCORE
 
diff --git a/lib/bpf/bpf_def.h b/lib/bpf/bpf_def.h
index f08cd9106b..9f2e162914 100644
--- a/lib/bpf/bpf_def.h
+++ b/lib/bpf/bpf_def.h
@@ -7,10 +7,6 @@
 #ifndef _RTE_BPF_DEF_H_
 #define _RTE_BPF_DEF_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -25,6 +21,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 
 /*
  * The instruction encodings.
diff --git a/lib/compressdev/rte_comp.h b/lib/compressdev/rte_comp.h
index 830a240b6b..d66a4b1cb9 100644
--- a/lib/compressdev/rte_comp.h
+++ b/lib/compressdev/rte_comp.h
@@ -11,12 +11,12 @@
  * RTE definitions for Data Compression Service
  */
 
+#include <rte_mbuf.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_mbuf.h>
-
 /**
  * compression service feature flags
  *
diff --git a/lib/compressdev/rte_compressdev.h b/lib/compressdev/rte_compressdev.h
index e0294a18bd..b3392553a6 100644
--- a/lib/compressdev/rte_compressdev.h
+++ b/lib/compressdev/rte_compressdev.h
@@ -13,13 +13,13 @@
  * Defines comp device APIs for the provisioning of compression operations.
  */
 
+
+#include "rte_comp.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-
-#include "rte_comp.h"
-
 /**
  * Parameter log base 2 range description.
  * Final value will be 2^value.
diff --git a/lib/compressdev/rte_compressdev_internal.h b/lib/compressdev/rte_compressdev_internal.h
index 67f8b51a37..a980d74cbf 100644
--- a/lib/compressdev/rte_compressdev_internal.h
+++ b/lib/compressdev/rte_compressdev_internal.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_COMPRESSDEV_INTERNAL_H_
 #define _RTE_COMPRESSDEV_INTERNAL_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* rte_compressdev_internal.h
  * This file holds Compressdev private data structures.
  */
@@ -16,6 +12,10 @@ extern "C" {
 
 #include "rte_comp.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_COMPRESSDEV_NAME_MAX_LEN	(64)
 /**< Max length of name of comp PMD */
 
diff --git a/lib/compressdev/rte_compressdev_pmd.h b/lib/compressdev/rte_compressdev_pmd.h
index 32e29c9d16..ea721f014d 100644
--- a/lib/compressdev/rte_compressdev_pmd.h
+++ b/lib/compressdev/rte_compressdev_pmd.h
@@ -13,10 +13,6 @@
  * them directly.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <dev_driver.h>
@@ -24,6 +20,10 @@ extern "C" {
 #include "rte_compressdev.h"
 #include "rte_compressdev_internal.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_COMPRESSDEV_PMD_NAME_ARG			("name")
 #define RTE_COMPRESSDEV_PMD_SOCKET_ID_ARG		("socket_id")
 
diff --git a/lib/cryptodev/cryptodev_pmd.h b/lib/cryptodev/cryptodev_pmd.h
index 6c114f7181..3e2e2673b8 100644
--- a/lib/cryptodev/cryptodev_pmd.h
+++ b/lib/cryptodev/cryptodev_pmd.h
@@ -5,10 +5,6 @@
 #ifndef _CRYPTODEV_PMD_H_
 #define _CRYPTODEV_PMD_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Crypto PMD APIs
  *
@@ -28,6 +24,10 @@ extern "C" {
 #include "rte_crypto.h"
 #include "rte_cryptodev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 
 #define RTE_CRYPTODEV_PMD_DEFAULT_MAX_NB_QUEUE_PAIRS	8
 
diff --git a/lib/cryptodev/cryptodev_trace.h b/lib/cryptodev/cryptodev_trace.h
index 935f0d564b..e186f0f3c1 100644
--- a/lib/cryptodev/cryptodev_trace.h
+++ b/lib/cryptodev/cryptodev_trace.h
@@ -11,14 +11,14 @@
  * API for cryptodev trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_trace_point.h>
 
 #include "rte_cryptodev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_cryptodev_trace_configure,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id,
diff --git a/lib/cryptodev/rte_crypto.h b/lib/cryptodev/rte_crypto.h
index dbc2700da5..dcf4a36fb2 100644
--- a/lib/cryptodev/rte_crypto.h
+++ b/lib/cryptodev/rte_crypto.h
@@ -11,10 +11,6 @@
  * RTE Cryptography Common Definitions
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 
 #include <rte_mbuf.h>
 #include <rte_memory.h>
@@ -24,6 +20,10 @@ extern "C" {
 #include "rte_crypto_sym.h"
 #include "rte_crypto_asym.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Crypto operation types */
 enum rte_crypto_op_type {
 	RTE_CRYPTO_OP_TYPE_UNDEFINED,
diff --git a/lib/cryptodev/rte_crypto_asym.h b/lib/cryptodev/rte_crypto_asym.h
index 39d3da3952..4b7ea36961 100644
--- a/lib/cryptodev/rte_crypto_asym.h
+++ b/lib/cryptodev/rte_crypto_asym.h
@@ -14,10 +14,6 @@
  * asymmetric crypto operations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 #include <stdint.h>
 
@@ -27,6 +23,10 @@ extern "C" {
 
 #include "rte_crypto_sym.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_cryptodev_asym_session;
 
 /** asym key exchange operation type name strings */
diff --git a/lib/cryptodev/rte_crypto_sym.h b/lib/cryptodev/rte_crypto_sym.h
index 53b18b9412..fb73024010 100644
--- a/lib/cryptodev/rte_crypto_sym.h
+++ b/lib/cryptodev/rte_crypto_sym.h
@@ -14,10 +14,6 @@
  * as supported symmetric crypto operation combinations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <rte_compat.h>
@@ -26,6 +22,10 @@ extern "C" {
 #include <rte_mempool.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Crypto IO Vector (in analogy with struct iovec)
  * Supposed be used to pass input/output data buffers for crypto data-path
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index bec947f6d5..8051c5a6a3 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -14,10 +14,6 @@
  * authentication operations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include "rte_kvargs.h"
 #include "rte_crypto.h"
@@ -1859,6 +1855,10 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
 				      struct rte_cryptodev_cb *cb);
 
 #include <rte_cryptodev_core.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 /**
  *
  * Dequeue a burst of processed crypto operations from a queue on the crypto
diff --git a/lib/cryptodev/rte_cryptodev_trace_fp.h b/lib/cryptodev/rte_cryptodev_trace_fp.h
index dbfbc7b2e5..f23f882804 100644
--- a/lib/cryptodev/rte_cryptodev_trace_fp.h
+++ b/lib/cryptodev/rte_cryptodev_trace_fp.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_CRYPTODEV_TRACE_FP_H_
 #define _RTE_CRYPTODEV_TRACE_FP_H_
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_cryptodev_trace_enqueue_burst,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id, uint16_t qp_id, void **ops,
diff --git a/lib/dispatcher/rte_dispatcher.h b/lib/dispatcher/rte_dispatcher.h
index d8182d5f2c..ba2c353073 100644
--- a/lib/dispatcher/rte_dispatcher.h
+++ b/lib/dispatcher/rte_dispatcher.h
@@ -19,16 +19,16 @@
  * event device.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdint.h>
 
 #include <rte_compat.h>
 #include <rte_eventdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Function prototype for match callbacks.
  *
diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
index 5474a5281d..11b72b0f2d 100644
--- a/lib/dmadev/rte_dmadev.h
+++ b/lib/dmadev/rte_dmadev.h
@@ -149,10 +149,6 @@
 #include <rte_bitops.h>
 #include <rte_common.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** Maximum number of devices if rte_dma_dev_max() is not called. */
 #define RTE_DMADEV_DEFAULT_MAX 64
 
@@ -775,6 +771,10 @@ struct rte_dma_sge {
 #include "rte_dmadev_core.h"
 #include "rte_dmadev_trace_fp.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**@{@name DMA operation flag
  * @see rte_dma_copy()
  * @see rte_dma_copy_sg()
diff --git a/lib/eal/arm/include/rte_atomic_32.h b/lib/eal/arm/include/rte_atomic_32.h
index 62fc33773d..0b9a0dfa30 100644
--- a/lib/eal/arm/include/rte_atomic_32.h
+++ b/lib/eal/arm/include/rte_atomic_32.h
@@ -9,12 +9,12 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include "generic/rte_atomic.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_atomic.h"
-
 #define	rte_mb()  __sync_synchronize()
 
 #define	rte_wmb() do { asm volatile ("dmb st" : : : "memory"); } while (0)
diff --git a/lib/eal/arm/include/rte_atomic_64.h b/lib/eal/arm/include/rte_atomic_64.h
index 7c99fc0a02..181bb60929 100644
--- a/lib/eal/arm/include/rte_atomic_64.h
+++ b/lib/eal/arm/include/rte_atomic_64.h
@@ -10,14 +10,14 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_atomic.h"
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define rte_mb() asm volatile("dmb osh" : : : "memory")
 
 #define rte_wmb() asm volatile("dmb oshst" : : : "memory")
diff --git a/lib/eal/arm/include/rte_byteorder.h b/lib/eal/arm/include/rte_byteorder.h
index ff02052f2e..a0aaff4a28 100644
--- a/lib/eal/arm/include/rte_byteorder.h
+++ b/lib/eal/arm/include/rte_byteorder.h
@@ -9,14 +9,14 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include "generic/rte_byteorder.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* ARM architecture is bi-endian (both big and little). */
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 
diff --git a/lib/eal/arm/include/rte_cpuflags_32.h b/lib/eal/arm/include/rte_cpuflags_32.h
index 770b09b99d..7e33acd9fb 100644
--- a/lib/eal/arm/include/rte_cpuflags_32.h
+++ b/lib/eal/arm/include/rte_cpuflags_32.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_CPUFLAGS_ARM32_H_
 #define _RTE_CPUFLAGS_ARM32_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -46,6 +42,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index afe70209c3..f84633159e 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_CPUFLAGS_ARM64_H_
 #define _RTE_CPUFLAGS_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -40,6 +36,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/arm/include/rte_cycles_32.h b/lib/eal/arm/include/rte_cycles_32.h
index 859cd2e5bb..2b20c8c6f5 100644
--- a/lib/eal/arm/include/rte_cycles_32.h
+++ b/lib/eal/arm/include/rte_cycles_32.h
@@ -15,12 +15,12 @@
 
 #include <time.h>
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 /**
  * Read the time base register.
  *
diff --git a/lib/eal/arm/include/rte_cycles_64.h b/lib/eal/arm/include/rte_cycles_64.h
index 8b05302f47..bb76e4d7e0 100644
--- a/lib/eal/arm/include/rte_cycles_64.h
+++ b/lib/eal/arm/include/rte_cycles_64.h
@@ -6,12 +6,12 @@
 #ifndef _RTE_CYCLES_ARM64_H_
 #define _RTE_CYCLES_ARM64_H_
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 /** Read generic counter frequency */
 static __rte_always_inline uint64_t
 __rte_arm64_cntfrq(void)
diff --git a/lib/eal/arm/include/rte_io.h b/lib/eal/arm/include/rte_io.h
index f4e66e6bad..658768697c 100644
--- a/lib/eal/arm/include/rte_io.h
+++ b/lib/eal/arm/include/rte_io.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_IO_ARM_H_
 #define _RTE_IO_ARM_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_ARCH_64
 #include "rte_io_64.h"
 #else
 #include "generic/rte_io.h"
 #endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 
 #ifdef __cplusplus
diff --git a/lib/eal/arm/include/rte_io_64.h b/lib/eal/arm/include/rte_io_64.h
index 96da7789ce..88db82a7eb 100644
--- a/lib/eal/arm/include/rte_io_64.h
+++ b/lib/eal/arm/include/rte_io_64.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_IO_ARM64_H_
 #define _RTE_IO_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #define RTE_OVERRIDE_IO_H
@@ -17,6 +13,10 @@ extern "C" {
 #include <rte_compat.h>
 #include "rte_atomic_64.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static __rte_always_inline uint8_t
 rte_read8_relaxed(const volatile void *addr)
 {
diff --git a/lib/eal/arm/include/rte_memcpy_32.h b/lib/eal/arm/include/rte_memcpy_32.h
index fb3245b59c..99fd5757ca 100644
--- a/lib/eal/arm/include/rte_memcpy_32.h
+++ b/lib/eal/arm/include/rte_memcpy_32.h
@@ -8,10 +8,6 @@
 #include <stdint.h>
 #include <string.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_memcpy.h"
 
 #ifdef RTE_ARCH_ARM_NEON_MEMCPY
@@ -23,6 +19,10 @@ extern "C" {
 /* ARM NEON Intrinsics are used to copy data */
 #include <arm_neon.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
diff --git a/lib/eal/arm/include/rte_memcpy_64.h b/lib/eal/arm/include/rte_memcpy_64.h
index 85ad587bd3..c7d0c345ad 100644
--- a/lib/eal/arm/include/rte_memcpy_64.h
+++ b/lib/eal/arm/include/rte_memcpy_64.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_MEMCPY_ARM64_H_
 #define _RTE_MEMCPY_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <string.h>
 
@@ -18,6 +14,10 @@ extern "C" {
 #include <rte_common.h>
 #include <rte_branch_prediction.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * The memory copy performance differs on different AArch64 micro-architectures.
  * And the most recent glibc (e.g. 2.23 or later) can provide a better memcpy()
diff --git a/lib/eal/arm/include/rte_pause.h b/lib/eal/arm/include/rte_pause.h
index 6c7002ad98..8f35d60a6e 100644
--- a/lib/eal/arm/include/rte_pause.h
+++ b/lib/eal/arm/include/rte_pause.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PAUSE_ARM_H_
 #define _RTE_PAUSE_ARM_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_ARCH_64
 #include <rte_pause_64.h>
 #else
 #include <rte_pause_32.h>
 #endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 
 #ifdef __cplusplus
diff --git a/lib/eal/arm/include/rte_pause_32.h b/lib/eal/arm/include/rte_pause_32.h
index d4768c7a98..7870fac763 100644
--- a/lib/eal/arm/include/rte_pause_32.h
+++ b/lib/eal/arm/include/rte_pause_32.h
@@ -5,13 +5,13 @@
 #ifndef _RTE_PAUSE_ARM32_H_
 #define _RTE_PAUSE_ARM32_H_
 
+#include <rte_common.h>
+#include "generic/rte_pause.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_pause.h"
-
 static inline void rte_pause(void)
 {
 }
diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
index 9e2dbf3531..1526bf87cc 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -6,10 +6,6 @@
 #ifndef _RTE_PAUSE_ARM64_H_
 #define _RTE_PAUSE_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_stdatomic.h>
 
@@ -19,6 +15,10 @@ extern "C" {
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 	asm volatile("yield" ::: "memory");
diff --git a/lib/eal/arm/include/rte_power_intrinsics.h b/lib/eal/arm/include/rte_power_intrinsics.h
index 9e498e9ebf..5481f45ad3 100644
--- a/lib/eal/arm/include/rte_power_intrinsics.h
+++ b/lib/eal/arm/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_POWER_INTRINSIC_ARM_H_
 #define _RTE_POWER_INTRINSIC_ARM_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/arm/include/rte_prefetch_32.h b/lib/eal/arm/include/rte_prefetch_32.h
index 0e9a140c8a..619bf27c79 100644
--- a/lib/eal/arm/include/rte_prefetch_32.h
+++ b/lib/eal/arm/include/rte_prefetch_32.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PREFETCH_ARM32_H_
 #define _RTE_PREFETCH_ARM32_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	asm volatile ("pld [%0]" : : "r" (p));
diff --git a/lib/eal/arm/include/rte_prefetch_64.h b/lib/eal/arm/include/rte_prefetch_64.h
index 22cba48e29..4f60123b8b 100644
--- a/lib/eal/arm/include/rte_prefetch_64.h
+++ b/lib/eal/arm/include/rte_prefetch_64.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PREFETCH_ARM_64_H_
 #define _RTE_PREFETCH_ARM_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	asm volatile ("PRFM PLDL1KEEP, [%0]" : : "r" (p));
diff --git a/lib/eal/arm/include/rte_rwlock.h b/lib/eal/arm/include/rte_rwlock.h
index 18bb37b036..727cabafec 100644
--- a/lib/eal/arm/include/rte_rwlock.h
+++ b/lib/eal/arm/include/rte_rwlock.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_RWLOCK_ARM_H_
 #define _RTE_RWLOCK_ARM_H_
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/arm/include/rte_spinlock.h b/lib/eal/arm/include/rte_spinlock.h
index a973763c23..a5d01b0d21 100644
--- a/lib/eal/arm/include/rte_spinlock.h
+++ b/lib/eal/arm/include/rte_spinlock.h
@@ -9,13 +9,13 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include <rte_common.h>
+#include "generic/rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_spinlock.h"
-
 static inline int rte_tm_supported(void)
 {
 	return 0;
diff --git a/lib/eal/freebsd/include/rte_os.h b/lib/eal/freebsd/include/rte_os.h
index 003468caff..f31f6af12d 100644
--- a/lib/eal/freebsd/include/rte_os.h
+++ b/lib/eal/freebsd/include/rte_os.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_OS_H_
 #define _RTE_OS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * This header should contain any definition
  * which is not supported natively or named differently in FreeBSD.
@@ -17,6 +13,10 @@ extern "C" {
 #include <pthread_np.h>
 #include <sys/queue.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* These macros are compatible with system's sys/queue.h. */
 #define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
 #define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
diff --git a/lib/eal/include/bus_driver.h b/lib/eal/include/bus_driver.h
index 7b85a17a09..60527b75b6 100644
--- a/lib/eal/include/bus_driver.h
+++ b/lib/eal/include/bus_driver.h
@@ -5,16 +5,16 @@
 #ifndef BUS_DRIVER_H
 #define BUS_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus.h>
 #include <rte_compat.h>
 #include <rte_dev.h>
 #include <rte_eal.h>
 #include <rte_tailq.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_devargs;
 struct rte_device;
 
diff --git a/lib/eal/include/dev_driver.h b/lib/eal/include/dev_driver.h
index 5efa8c437e..f7a9c17dc3 100644
--- a/lib/eal/include/dev_driver.h
+++ b/lib/eal/include/dev_driver.h
@@ -5,13 +5,13 @@
 #ifndef DEV_DRIVER_H
 #define DEV_DRIVER_H
 
+#include <rte_common.h>
+#include <rte_dev.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_dev.h>
-
 /**
  * A structure describing a device driver.
  */
diff --git a/lib/eal/include/eal_trace_internal.h b/lib/eal/include/eal_trace_internal.h
index 09c354717f..50f91d0929 100644
--- a/lib/eal/include/eal_trace_internal.h
+++ b/lib/eal/include/eal_trace_internal.h
@@ -11,16 +11,16 @@
  * API for EAL trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_alarm.h>
 #include <rte_interrupts.h>
 #include <rte_trace_point.h>
 
 #include "eal_interrupts.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Alarm */
 RTE_TRACE_POINT(
 	rte_eal_trace_alarm_set,
diff --git a/lib/eal/include/generic/rte_cycles.h b/lib/eal/include/generic/rte_cycles.h
index 075e899f5a..7cfd51f0eb 100644
--- a/lib/eal/include/generic/rte_cycles.h
+++ b/lib/eal/include/generic/rte_cycles.h
@@ -16,6 +16,10 @@
 #include <rte_debug.h>
 #include <rte_atomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define MS_PER_S 1000
 #define US_PER_S 1000000
 #define NS_PER_S 1000000000
@@ -175,4 +179,8 @@ void rte_delay_us_sleep(unsigned int us);
  */
 void rte_delay_us_callback_register(void(*userfunc)(unsigned int));
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_CYCLES_H_ */
diff --git a/lib/eal/include/generic/rte_memcpy.h b/lib/eal/include/generic/rte_memcpy.h
index e7f0f8eaa9..da53b72ca8 100644
--- a/lib/eal/include/generic/rte_memcpy.h
+++ b/lib/eal/include/generic/rte_memcpy.h
@@ -5,6 +5,10 @@
 #ifndef _RTE_MEMCPY_H_
 #define _RTE_MEMCPY_H_
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @file
  *
@@ -113,4 +117,8 @@ rte_memcpy(void *dst, const void *src, size_t n);
 
 #endif /* __DOXYGEN__ */
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_MEMCPY_H_ */
diff --git a/lib/eal/include/generic/rte_pause.h b/lib/eal/include/generic/rte_pause.h
index f2a1eadcbd..968c0886d3 100644
--- a/lib/eal/include/generic/rte_pause.h
+++ b/lib/eal/include/generic/rte_pause.h
@@ -19,6 +19,10 @@
 #include <rte_atomic.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Pause CPU execution for a short while
  *
@@ -136,4 +140,8 @@ rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
 } while (0)
 #endif /* ! RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED */
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_PAUSE_H_ */
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index ea899f1bfa..86c0559468 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -9,6 +9,10 @@
 
 #include <rte_spinlock.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @file
  * Advanced power management operations.
@@ -147,4 +151,8 @@ int rte_power_pause(const uint64_t tsc_timestamp);
 int rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
 		const uint32_t num, const uint64_t tsc_timestamp);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_POWER_INTRINSIC_H_ */
diff --git a/lib/eal/include/generic/rte_prefetch.h b/lib/eal/include/generic/rte_prefetch.h
index 773b3b8d1e..f7ac4ab48a 100644
--- a/lib/eal/include/generic/rte_prefetch.h
+++ b/lib/eal/include/generic/rte_prefetch.h
@@ -7,6 +7,10 @@
 
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @file
  *
@@ -146,4 +150,8 @@ __rte_experimental
 static inline void
 rte_cldemote(const volatile void *p);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
index 5f939be98c..ac0474466a 100644
--- a/lib/eal/include/generic/rte_rwlock.h
+++ b/lib/eal/include/generic/rte_rwlock.h
@@ -22,10 +22,6 @@
  *  https://locklessinc.com/articles/locks/
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <errno.h>
 
 #include <rte_branch_prediction.h>
@@ -34,6 +30,10 @@ extern "C" {
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_rwlock_t type.
  *
diff --git a/lib/eal/include/generic/rte_spinlock.h b/lib/eal/include/generic/rte_spinlock.h
index 23fb04896f..c2980601b2 100644
--- a/lib/eal/include/generic/rte_spinlock.h
+++ b/lib/eal/include/generic/rte_spinlock.h
@@ -25,6 +25,10 @@
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_spinlock_t type.
  */
@@ -318,4 +322,8 @@ __rte_warn_unused_result
 static inline int rte_spinlock_recursive_trylock_tm(
 	rte_spinlock_recursive_t *slr);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_SPINLOCK_H_ */
diff --git a/lib/eal/include/rte_alarm.h b/lib/eal/include/rte_alarm.h
index 7e4d0b2407..9b4721b77f 100644
--- a/lib/eal/include/rte_alarm.h
+++ b/lib/eal/include/rte_alarm.h
@@ -14,12 +14,12 @@
  * Does not require hpet support.
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /**
  * Signature of callback back function called when an alarm goes off.
  */
diff --git a/lib/eal/include/rte_bitmap.h b/lib/eal/include/rte_bitmap.h
index ebe46000a0..abb102f1d3 100644
--- a/lib/eal/include/rte_bitmap.h
+++ b/lib/eal/include/rte_bitmap.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_BITMAP_H__
 #define __INCLUDE_RTE_BITMAP_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Bitmap
@@ -43,6 +39,10 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_prefetch.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Slab */
 #define RTE_BITMAP_SLAB_BIT_SIZE                 64
 #define RTE_BITMAP_SLAB_BIT_SIZE_LOG2            6
diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h
index dfe756fb11..519f7b35f0 100644
--- a/lib/eal/include/rte_bus.h
+++ b/lib/eal/include/rte_bus.h
@@ -14,14 +14,14 @@
  * over the devices and drivers in EAL.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_eal.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_bus;
 struct rte_device;
 
diff --git a/lib/eal/include/rte_class.h b/lib/eal/include/rte_class.h
index 16e544ec9a..7631e36e82 100644
--- a/lib/eal/include/rte_class.h
+++ b/lib/eal/include/rte_class.h
@@ -18,12 +18,12 @@
  * cryptographic co-processor (crypto), etc.
  */
 
+#include <rte_dev.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_dev.h>
-
 /** Double linked list of classes */
 RTE_TAILQ_HEAD(rte_class_list, rte_class);
 
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index eec0400dad..2486caa471 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -12,10 +12,6 @@
  * for DPDK.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <assert.h>
 #include <limits.h>
 #include <stdint.h>
@@ -26,6 +22,10 @@ extern "C" {
 /* OS specific include */
 #include <rte_os.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_TOOLCHAIN_MSVC
 #ifndef typeof
 #define typeof __typeof__
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index cefa04f905..738400e8d1 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -13,16 +13,16 @@
  * This file manages the list of device drivers.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_config.h>
 #include <rte_common.h>
 #include <rte_log.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_bus;
 struct rte_devargs;
 struct rte_device;
diff --git a/lib/eal/include/rte_devargs.h b/lib/eal/include/rte_devargs.h
index 515e978bbe..ed5a4675d9 100644
--- a/lib/eal/include/rte_devargs.h
+++ b/lib/eal/include/rte_devargs.h
@@ -16,14 +16,14 @@
  * list of rte_devargs structures.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_dev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_bus;
 
 /**
diff --git a/lib/eal/include/rte_eal_trace.h b/lib/eal/include/rte_eal_trace.h
index c3d15bbe5e..9ad2112801 100644
--- a/lib/eal/include/rte_eal_trace.h
+++ b/lib/eal/include/rte_eal_trace.h
@@ -11,12 +11,12 @@
  * API for EAL trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 /* Generic */
 RTE_TRACE_POINT(
 	rte_eal_trace_generic_void,
diff --git a/lib/eal/include/rte_errno.h b/lib/eal/include/rte_errno.h
index ba45591d24..c49818a40e 100644
--- a/lib/eal/include/rte_errno.h
+++ b/lib/eal/include/rte_errno.h
@@ -11,12 +11,12 @@
 #ifndef _RTE_ERRNO_H_
 #define _RTE_ERRNO_H_
 
+#include <rte_per_lcore.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_per_lcore.h>
-
 RTE_DECLARE_PER_LCORE(int, _rte_errno); /**< Per core error number. */
 
 /**
diff --git a/lib/eal/include/rte_fbarray.h b/lib/eal/include/rte_fbarray.h
index e33076778f..27dbfc2d6c 100644
--- a/lib/eal/include/rte_fbarray.h
+++ b/lib/eal/include/rte_fbarray.h
@@ -30,14 +30,14 @@
  * another process is using ``rte_fbarray``.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_rwlock.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_FBARRAY_NAME_LEN 64
 
 struct rte_fbarray {
diff --git a/lib/eal/include/rte_keepalive.h b/lib/eal/include/rte_keepalive.h
index 3ec413da01..9ff870f6b4 100644
--- a/lib/eal/include/rte_keepalive.h
+++ b/lib/eal/include/rte_keepalive.h
@@ -10,13 +10,13 @@
 #ifndef _KEEPALIVE_H_
 #define _KEEPALIVE_H_
 
+#include <rte_config.h>
+#include <rte_memory.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_config.h>
-#include <rte_memory.h>
-
 #ifndef RTE_KEEPALIVE_MAXCORES
 /**
  * Number of cores to track.
diff --git a/lib/eal/include/rte_mcslock.h b/lib/eal/include/rte_mcslock.h
index 0aeb1a09f4..bb218d2e50 100644
--- a/lib/eal/include/rte_mcslock.h
+++ b/lib/eal/include/rte_mcslock.h
@@ -19,16 +19,16 @@
  * they acquired the lock.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_lcore.h>
 #include <rte_common.h>
 #include <rte_pause.h>
 #include <rte_branch_prediction.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_mcslock_t type.
  */
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index 842362d527..dbd0a6bedc 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -15,16 +15,16 @@
 #include <stddef.h>
 #include <stdio.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bitops.h>
 #include <rte_common.h>
 #include <rte_config.h>
 #include <rte_eal_memconfig.h>
 #include <rte_fbarray.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_PGSIZE_4K   (1ULL << 12)
 #define RTE_PGSIZE_64K  (1ULL << 16)
 #define RTE_PGSIZE_256K (1ULL << 18)
diff --git a/lib/eal/include/rte_pci_dev_features.h b/lib/eal/include/rte_pci_dev_features.h
index ee6e10590c..bc6d3d4c1f 100644
--- a/lib/eal/include/rte_pci_dev_features.h
+++ b/lib/eal/include/rte_pci_dev_features.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_PCI_DEV_FEATURES_H
 #define _RTE_PCI_DEV_FEATURES_H
 
+#include <rte_pci_dev_feature_defs.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_pci_dev_feature_defs.h>
-
 #define RTE_INTR_MODE_NONE_NAME "none"
 #define RTE_INTR_MODE_LEGACY_NAME "legacy"
 #define RTE_INTR_MODE_MSI_NAME "msi"
diff --git a/lib/eal/include/rte_pflock.h b/lib/eal/include/rte_pflock.h
index 37aa223ac3..6797ce5920 100644
--- a/lib/eal/include/rte_pflock.h
+++ b/lib/eal/include/rte_pflock.h
@@ -27,14 +27,14 @@
  * All locks must be initialised before use, and only initialised once.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_pflock_t type.
  */
diff --git a/lib/eal/include/rte_random.h b/lib/eal/include/rte_random.h
index 5031c6fe5f..15cbe6215a 100644
--- a/lib/eal/include/rte_random.h
+++ b/lib/eal/include/rte_random.h
@@ -11,12 +11,12 @@
  * Pseudo-random Generators in RTE
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /**
  * Seed the pseudo-random generator.
  *
diff --git a/lib/eal/include/rte_seqcount.h b/lib/eal/include/rte_seqcount.h
index 88a6746900..d71afa6ab7 100644
--- a/lib/eal/include/rte_seqcount.h
+++ b/lib/eal/include/rte_seqcount.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_SEQCOUNT_H_
 #define _RTE_SEQCOUNT_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Seqcount
@@ -27,6 +23,10 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The RTE seqcount type.
  */
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
index 2677bd9440..e0e94900d1 100644
--- a/lib/eal/include/rte_seqlock.h
+++ b/lib/eal/include/rte_seqlock.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_SEQLOCK_H_
 #define _RTE_SEQLOCK_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Seqlock
@@ -95,6 +91,10 @@ extern "C" {
 #include <rte_seqcount.h>
 #include <rte_spinlock.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The RTE seqlock type.
  */
diff --git a/lib/eal/include/rte_service.h b/lib/eal/include/rte_service.h
index e49a7a877e..94919ae584 100644
--- a/lib/eal/include/rte_service.h
+++ b/lib/eal/include/rte_service.h
@@ -23,16 +23,16 @@
  * application has access to the remaining lcores as normal.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include<stdio.h>
 #include <stdint.h>
 
 #include <rte_config.h>
 #include <rte_lcore.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_SERVICE_NAME_MAX 32
 
 /* Capabilities of a service.
diff --git a/lib/eal/include/rte_service_component.h b/lib/eal/include/rte_service_component.h
index a5350c97e5..acdf45cf60 100644
--- a/lib/eal/include/rte_service_component.h
+++ b/lib/eal/include/rte_service_component.h
@@ -10,12 +10,12 @@
  * operate, and you wish to run the component using service cores
  */
 
+#include <rte_service.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_service.h>
-
 /**
  * Signature of callback function to run a service.
  *
diff --git a/lib/eal/include/rte_stdatomic.h b/lib/eal/include/rte_stdatomic.h
index 7a081cb500..0f11a15e4e 100644
--- a/lib/eal/include/rte_stdatomic.h
+++ b/lib/eal/include/rte_stdatomic.h
@@ -7,10 +7,6 @@
 
 #include <assert.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_ENABLE_STDATOMIC
 #ifndef _MSC_VER
 #ifdef __STDC_NO_ATOMICS__
@@ -188,6 +184,7 @@ typedef int rte_memory_order;
 #endif
 
 #ifdef __cplusplus
+extern "C" {
 }
 #endif
 
diff --git a/lib/eal/include/rte_string_fns.h b/lib/eal/include/rte_string_fns.h
index 13badec7b3..702bd81251 100644
--- a/lib/eal/include/rte_string_fns.h
+++ b/lib/eal/include/rte_string_fns.h
@@ -11,10 +11,6 @@
 #ifndef _RTE_STRING_FNS_H_
 #define _RTE_STRING_FNS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <ctype.h>
 #include <stdio.h>
 #include <string.h>
@@ -22,6 +18,10 @@ extern "C" {
 #include <rte_common.h>
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Takes string "string" parameter and splits it at character "delim"
  * up to maxtokens-1 times - to give "maxtokens" resulting tokens. Like
@@ -77,6 +77,10 @@ rte_strlcat(char *dst, const char *src, size_t size)
 	return l + strlen(src);
 }
 
+#ifdef __cplusplus
+}
+#endif
+
 /* pull in a strlcpy function */
 #ifdef RTE_EXEC_ENV_FREEBSD
 #ifndef __BSD_VISIBLE /* non-standard functions are hidden */
@@ -95,6 +99,10 @@ rte_strlcat(char *dst, const char *src, size_t size)
 #endif /* RTE_USE_LIBBSD */
 #endif /* FREEBSD */
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Copy string src to buffer dst of size dsize.
  * At most dsize-1 chars will be copied.
@@ -141,7 +149,6 @@ rte_str_skip_leading_spaces(const char *src)
 	return p;
 }
 
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/include/rte_tailq.h b/lib/eal/include/rte_tailq.h
index 931d549e59..89f7ef2134 100644
--- a/lib/eal/include/rte_tailq.h
+++ b/lib/eal/include/rte_tailq.h
@@ -10,13 +10,13 @@
  *  Here defines rte_tailq APIs for only internal use
  */
 
+#include <stdio.h>
+#include <rte_debug.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <rte_debug.h>
-
 /** dummy structure type used by the rte_tailq APIs */
 struct rte_tailq_entry {
 	RTE_TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
diff --git a/lib/eal/include/rte_ticketlock.h b/lib/eal/include/rte_ticketlock.h
index 73884eb07b..e60f60699c 100644
--- a/lib/eal/include/rte_ticketlock.h
+++ b/lib/eal/include/rte_ticketlock.h
@@ -17,15 +17,15 @@
  * All locks must be initialised before use, and only initialised once.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_lcore.h>
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_ticketlock_t type.
  */
diff --git a/lib/eal/include/rte_time.h b/lib/eal/include/rte_time.h
index ec25f7b93d..c5c3a233e4 100644
--- a/lib/eal/include/rte_time.h
+++ b/lib/eal/include/rte_time.h
@@ -5,13 +5,13 @@
 #ifndef _RTE_TIME_H_
 #define _RTE_TIME_H_
 
+#include <stdint.h>
+#include <time.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include <time.h>
-
 #define NSEC_PER_SEC             1000000000L
 
 /**
diff --git a/lib/eal/include/rte_trace.h b/lib/eal/include/rte_trace.h
index a6e991fad3..1c824b2158 100644
--- a/lib/eal/include/rte_trace.h
+++ b/lib/eal/include/rte_trace.h
@@ -16,16 +16,16 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdio.h>
 
 #include <rte_common.h>
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  *  Test if trace is enabled.
  *
diff --git a/lib/eal/include/rte_trace_point.h b/lib/eal/include/rte_trace_point.h
index 41e2a7f99e..bc737d585e 100644
--- a/lib/eal/include/rte_trace_point.h
+++ b/lib/eal/include/rte_trace_point.h
@@ -16,10 +16,6 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdio.h>
 
@@ -32,6 +28,10 @@ extern "C" {
 #include <rte_string_fns.h>
 #include <rte_uuid.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** The tracepoint object. */
 typedef RTE_ATOMIC(uint64_t) rte_trace_point_t;
 
diff --git a/lib/eal/include/rte_trace_point_register.h b/lib/eal/include/rte_trace_point_register.h
index 41260e5964..8726338fe4 100644
--- a/lib/eal/include/rte_trace_point_register.h
+++ b/lib/eal/include/rte_trace_point_register.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_TRACE_POINT_REGISTER_H_
 #define _RTE_TRACE_POINT_REGISTER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef _RTE_TRACE_POINT_H_
 #error for registration, include this file first before <rte_trace_point.h>
 #endif
@@ -16,6 +12,10 @@ extern "C" {
 #include <rte_per_lcore.h>
 #include <rte_trace_point.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_DECLARE_PER_LCORE(volatile int, trace_point_sz);
 
 #define RTE_TRACE_POINT_REGISTER(trace, name) \
diff --git a/lib/eal/include/rte_uuid.h b/lib/eal/include/rte_uuid.h
index cfefd4308a..def5907a00 100644
--- a/lib/eal/include/rte_uuid.h
+++ b/lib/eal/include/rte_uuid.h
@@ -10,14 +10,14 @@
 #ifndef _RTE_UUID_H_
 #define _RTE_UUID_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stddef.h>
 #include <string.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Struct describing a Universal Unique Identifier
  */
diff --git a/lib/eal/include/rte_version.h b/lib/eal/include/rte_version.h
index 422d00fdff..be3f753617 100644
--- a/lib/eal/include/rte_version.h
+++ b/lib/eal/include/rte_version.h
@@ -10,13 +10,13 @@
 #ifndef _RTE_VERSION_H_
 #define _RTE_VERSION_H_
 
+#include <string.h>
+#include <stdio.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <string.h>
-#include <stdio.h>
-
 /**
  * Macro to compute a version number usable for comparisons
  */
diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index b774625d9f..06b249dca0 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -10,10 +10,6 @@
  * RTE VFIO. This library provides various VFIO related utility functions.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdint.h>
 
@@ -36,6 +32,10 @@ extern "C" {
 
 #include <linux/vfio.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
diff --git a/lib/eal/linux/include/rte_os.h b/lib/eal/linux/include/rte_os.h
index c72bf5b7e6..dba0e29827 100644
--- a/lib/eal/linux/include/rte_os.h
+++ b/lib/eal/linux/include/rte_os.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_OS_H_
 #define _RTE_OS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * This header should contain any definition
  * which is not supported natively or named differently in Linux.
@@ -17,6 +13,10 @@ extern "C" {
 #include <sched.h>
 #include <sys/queue.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* These macros are compatible with system's sys/queue.h. */
 #define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
 #define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
diff --git a/lib/eal/loongarch/include/rte_atomic.h b/lib/eal/loongarch/include/rte_atomic.h
index 0510b8f781..c8066a4612 100644
--- a/lib/eal/loongarch/include/rte_atomic.h
+++ b/lib/eal/loongarch/include/rte_atomic.h
@@ -9,13 +9,13 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include <rte_common.h>
+#include "generic/rte_atomic.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_atomic.h"
-
 #define rte_mb()	do { asm volatile("dbar 0":::"memory"); } while (0)
 
 #define rte_wmb()	rte_mb()
diff --git a/lib/eal/loongarch/include/rte_byteorder.h b/lib/eal/loongarch/include/rte_byteorder.h
index 0da6097a4f..9b092e2a59 100644
--- a/lib/eal/loongarch/include/rte_byteorder.h
+++ b/lib/eal/loongarch/include/rte_byteorder.h
@@ -5,12 +5,12 @@
 #ifndef RTE_BYTEORDER_LOONGARCH_H
 #define RTE_BYTEORDER_LOONGARCH_H
 
+#include "generic/rte_byteorder.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_byteorder.h"
-
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 
 #define rte_cpu_to_le_16(x) (x)
diff --git a/lib/eal/loongarch/include/rte_cpuflags.h b/lib/eal/loongarch/include/rte_cpuflags.h
index 6b592c147c..c1e04ac545 100644
--- a/lib/eal/loongarch/include/rte_cpuflags.h
+++ b/lib/eal/loongarch/include/rte_cpuflags.h
@@ -5,10 +5,6 @@
 #ifndef RTE_CPUFLAGS_LOONGARCH_H
 #define RTE_CPUFLAGS_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -30,6 +26,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/loongarch/include/rte_cycles.h b/lib/eal/loongarch/include/rte_cycles.h
index f612d1ad10..128c8646e9 100644
--- a/lib/eal/loongarch/include/rte_cycles.h
+++ b/lib/eal/loongarch/include/rte_cycles.h
@@ -5,12 +5,12 @@
 #ifndef RTE_CYCLES_LOONGARCH_H
 #define RTE_CYCLES_LOONGARCH_H
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 /**
  * Read the time base register.
  *
diff --git a/lib/eal/loongarch/include/rte_io.h b/lib/eal/loongarch/include/rte_io.h
index 40e40efa86..e32a4737b2 100644
--- a/lib/eal/loongarch/include/rte_io.h
+++ b/lib/eal/loongarch/include/rte_io.h
@@ -5,12 +5,12 @@
 #ifndef RTE_IO_LOONGARCH_H
 #define RTE_IO_LOONGARCH_H
 
+#include "generic/rte_io.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_io.h"
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/loongarch/include/rte_memcpy.h b/lib/eal/loongarch/include/rte_memcpy.h
index 22578d40f4..5412a0fdc1 100644
--- a/lib/eal/loongarch/include/rte_memcpy.h
+++ b/lib/eal/loongarch/include/rte_memcpy.h
@@ -10,12 +10,12 @@
 
 #include "rte_common.h"
 
+#include "generic/rte_memcpy.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_memcpy.h"
-
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
diff --git a/lib/eal/loongarch/include/rte_pause.h b/lib/eal/loongarch/include/rte_pause.h
index 4302e1b9be..cffa2874d6 100644
--- a/lib/eal/loongarch/include/rte_pause.h
+++ b/lib/eal/loongarch/include/rte_pause.h
@@ -5,14 +5,14 @@
 #ifndef RTE_PAUSE_LOONGARCH_H
 #define RTE_PAUSE_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_atomic.h"
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 }
diff --git a/lib/eal/loongarch/include/rte_power_intrinsics.h b/lib/eal/loongarch/include/rte_power_intrinsics.h
index d5dbd94567..9e11478206 100644
--- a/lib/eal/loongarch/include/rte_power_intrinsics.h
+++ b/lib/eal/loongarch/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef RTE_POWER_INTRINSIC_LOONGARCH_H
 #define RTE_POWER_INTRINSIC_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/loongarch/include/rte_prefetch.h b/lib/eal/loongarch/include/rte_prefetch.h
index 64b1fd2c2a..8da08a5566 100644
--- a/lib/eal/loongarch/include/rte_prefetch.h
+++ b/lib/eal/loongarch/include/rte_prefetch.h
@@ -5,14 +5,14 @@
 #ifndef RTE_PREFETCH_LOONGARCH_H
 #define RTE_PREFETCH_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	__builtin_prefetch((const void *)(uintptr_t)p, 0, 3);
diff --git a/lib/eal/loongarch/include/rte_rwlock.h b/lib/eal/loongarch/include/rte_rwlock.h
index aedc6f3349..48924599c5 100644
--- a/lib/eal/loongarch/include/rte_rwlock.h
+++ b/lib/eal/loongarch/include/rte_rwlock.h
@@ -5,12 +5,12 @@
 #ifndef RTE_RWLOCK_LOONGARCH_H
 #define RTE_RWLOCK_LOONGARCH_H
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/loongarch/include/rte_spinlock.h b/lib/eal/loongarch/include/rte_spinlock.h
index e8d34e9728..38f00f631d 100644
--- a/lib/eal/loongarch/include/rte_spinlock.h
+++ b/lib/eal/loongarch/include/rte_spinlock.h
@@ -5,13 +5,13 @@
 #ifndef RTE_SPINLOCK_LOONGARCH_H
 #define RTE_SPINLOCK_LOONGARCH_H
 
+#include <rte_common.h>
+#include "generic/rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_spinlock.h"
-
 #ifndef RTE_FORCE_INTRINSICS
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
diff --git a/lib/eal/ppc/include/rte_atomic.h b/lib/eal/ppc/include/rte_atomic.h
index 645c7132df..6ce2e5188a 100644
--- a/lib/eal/ppc/include/rte_atomic.h
+++ b/lib/eal/ppc/include/rte_atomic.h
@@ -12,13 +12,13 @@
 #ifndef _RTE_ATOMIC_PPC_64_H_
 #define _RTE_ATOMIC_PPC_64_H_
 
+#include <stdint.h>
+#include "generic/rte_atomic.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include "generic/rte_atomic.h"
-
 #define	rte_mb()  asm volatile("sync" : : : "memory")
 
 #define	rte_wmb() asm volatile("sync" : : : "memory")
diff --git a/lib/eal/ppc/include/rte_byteorder.h b/lib/eal/ppc/include/rte_byteorder.h
index de94e2ad32..1d19e96f72 100644
--- a/lib/eal/ppc/include/rte_byteorder.h
+++ b/lib/eal/ppc/include/rte_byteorder.h
@@ -8,13 +8,13 @@
 #ifndef _RTE_BYTEORDER_PPC_64_H_
 #define _RTE_BYTEORDER_PPC_64_H_
 
+#include <stdint.h>
+#include "generic/rte_byteorder.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include "generic/rte_byteorder.h"
-
 /*
  * An architecture-optimized byte swap for a 16-bit value.
  *
diff --git a/lib/eal/ppc/include/rte_cpuflags.h b/lib/eal/ppc/include/rte_cpuflags.h
index dedc1ab469..b7bb8f6872 100644
--- a/lib/eal/ppc/include/rte_cpuflags.h
+++ b/lib/eal/ppc/include/rte_cpuflags.h
@@ -6,10 +6,6 @@
 #ifndef _RTE_CPUFLAGS_PPC_64_H_
 #define _RTE_CPUFLAGS_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -52,6 +48,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/ppc/include/rte_cycles.h b/lib/eal/ppc/include/rte_cycles.h
index 666fc9b0bf..1e6e6cccc8 100644
--- a/lib/eal/ppc/include/rte_cycles.h
+++ b/lib/eal/ppc/include/rte_cycles.h
@@ -6,10 +6,6 @@
 #ifndef _RTE_CYCLES_PPC_64_H_
 #define _RTE_CYCLES_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <features.h>
 #ifdef __GLIBC__
 #include <sys/platform/ppc.h>
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_byteorder.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Read the time base register.
  *
diff --git a/lib/eal/ppc/include/rte_io.h b/lib/eal/ppc/include/rte_io.h
index 01455065e5..c9371b784e 100644
--- a/lib/eal/ppc/include/rte_io.h
+++ b/lib/eal/ppc/include/rte_io.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_IO_PPC_64_H_
 #define _RTE_IO_PPC_64_H_
 
+#include "generic/rte_io.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_io.h"
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/ppc/include/rte_memcpy.h b/lib/eal/ppc/include/rte_memcpy.h
index 6f388c0234..eae73128c4 100644
--- a/lib/eal/ppc/include/rte_memcpy.h
+++ b/lib/eal/ppc/include/rte_memcpy.h
@@ -12,12 +12,12 @@
 #include "rte_altivec.h"
 #include "rte_common.h"
 
+#include "generic/rte_memcpy.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_memcpy.h"
-
 #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 90000)
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Warray-bounds"
diff --git a/lib/eal/ppc/include/rte_pause.h b/lib/eal/ppc/include/rte_pause.h
index 16e47ce22f..78a73aceed 100644
--- a/lib/eal/ppc/include/rte_pause.h
+++ b/lib/eal/ppc/include/rte_pause.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PAUSE_PPC64_H_
 #define _RTE_PAUSE_PPC64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_atomic.h"
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 	/* Set hardware multi-threading low priority */
diff --git a/lib/eal/ppc/include/rte_power_intrinsics.h b/lib/eal/ppc/include/rte_power_intrinsics.h
index c0e9ac279f..6207eeb04d 100644
--- a/lib/eal/ppc/include/rte_power_intrinsics.h
+++ b/lib/eal/ppc/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_POWER_INTRINSIC_PPC_H_
 #define _RTE_POWER_INTRINSIC_PPC_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/ppc/include/rte_prefetch.h b/lib/eal/ppc/include/rte_prefetch.h
index 2e1b5751e0..bae95af7bf 100644
--- a/lib/eal/ppc/include/rte_prefetch.h
+++ b/lib/eal/ppc/include/rte_prefetch.h
@@ -6,14 +6,14 @@
 #ifndef _RTE_PREFETCH_PPC_64_H_
 #define _RTE_PREFETCH_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	asm volatile ("dcbt 0,%[p],0" : : [p] "r" (p));
diff --git a/lib/eal/ppc/include/rte_rwlock.h b/lib/eal/ppc/include/rte_rwlock.h
index 9fadc04076..bee8da4070 100644
--- a/lib/eal/ppc/include/rte_rwlock.h
+++ b/lib/eal/ppc/include/rte_rwlock.h
@@ -3,12 +3,12 @@
 #ifndef _RTE_RWLOCK_PPC_64_H_
 #define _RTE_RWLOCK_PPC_64_H_
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/ppc/include/rte_spinlock.h b/lib/eal/ppc/include/rte_spinlock.h
index 3a4c905b22..77f90f974a 100644
--- a/lib/eal/ppc/include/rte_spinlock.h
+++ b/lib/eal/ppc/include/rte_spinlock.h
@@ -6,14 +6,14 @@
 #ifndef _RTE_SPINLOCK_PPC_64_H_
 #define _RTE_SPINLOCK_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_pause.h>
 #include "generic/rte_spinlock.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Fixme: Use intrinsics to implement the spinlock on Power architecture */
 
 #ifndef RTE_FORCE_INTRINSICS
diff --git a/lib/eal/riscv/include/rte_atomic.h b/lib/eal/riscv/include/rte_atomic.h
index 2603bc90ea..66346ad474 100644
--- a/lib/eal/riscv/include/rte_atomic.h
+++ b/lib/eal/riscv/include/rte_atomic.h
@@ -12,15 +12,15 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include <rte_config.h>
 #include "generic/rte_atomic.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define rte_mb()	asm volatile("fence rw, rw" : : : "memory")
 
 #define rte_wmb()	asm volatile("fence w, w" : : : "memory")
diff --git a/lib/eal/riscv/include/rte_byteorder.h b/lib/eal/riscv/include/rte_byteorder.h
index 25bd0c275d..c9ff5c0dd1 100644
--- a/lib/eal/riscv/include/rte_byteorder.h
+++ b/lib/eal/riscv/include/rte_byteorder.h
@@ -8,14 +8,14 @@
 #ifndef RTE_BYTEORDER_RISCV_H
 #define RTE_BYTEORDER_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include "generic/rte_byteorder.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_BYTE_ORDER
 #define RTE_BYTE_ORDER RTE_LITTLE_ENDIAN
 #endif
diff --git a/lib/eal/riscv/include/rte_cpuflags.h b/lib/eal/riscv/include/rte_cpuflags.h
index d742efc40f..ac2004f02d 100644
--- a/lib/eal/riscv/include/rte_cpuflags.h
+++ b/lib/eal/riscv/include/rte_cpuflags.h
@@ -8,10 +8,6 @@
 #ifndef RTE_CPUFLAGS_RISCV_H
 #define RTE_CPUFLAGS_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -46,6 +42,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/riscv/include/rte_cycles.h b/lib/eal/riscv/include/rte_cycles.h
index 04750ca253..7926809a73 100644
--- a/lib/eal/riscv/include/rte_cycles.h
+++ b/lib/eal/riscv/include/rte_cycles.h
@@ -8,12 +8,12 @@
 #ifndef RTE_CYCLES_RISCV_H
 #define RTE_CYCLES_RISCV_H
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 #ifndef RTE_RISCV_RDTSC_USE_HPM
 #define RTE_RISCV_RDTSC_USE_HPM 0
 #endif
diff --git a/lib/eal/riscv/include/rte_io.h b/lib/eal/riscv/include/rte_io.h
index 29659c9590..911dbb6bd2 100644
--- a/lib/eal/riscv/include/rte_io.h
+++ b/lib/eal/riscv/include/rte_io.h
@@ -8,12 +8,12 @@
 #ifndef RTE_IO_RISCV_H
 #define RTE_IO_RISCV_H
 
+#include "generic/rte_io.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_io.h"
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/riscv/include/rte_memcpy.h b/lib/eal/riscv/include/rte_memcpy.h
index e34f19396e..d8a942c5d2 100644
--- a/lib/eal/riscv/include/rte_memcpy.h
+++ b/lib/eal/riscv/include/rte_memcpy.h
@@ -12,12 +12,12 @@
 
 #include "rte_common.h"
 
+#include "generic/rte_memcpy.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_memcpy.h"
-
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
diff --git a/lib/eal/riscv/include/rte_pause.h b/lib/eal/riscv/include/rte_pause.h
index cb8e9ca52d..3f473cd8db 100644
--- a/lib/eal/riscv/include/rte_pause.h
+++ b/lib/eal/riscv/include/rte_pause.h
@@ -7,14 +7,14 @@
 #ifndef RTE_PAUSE_RISCV_H
 #define RTE_PAUSE_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_atomic.h"
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 	/* Insert pause hint directly to be compatible with old compilers.
diff --git a/lib/eal/riscv/include/rte_power_intrinsics.h b/lib/eal/riscv/include/rte_power_intrinsics.h
index 636e58e71f..3f7dba1640 100644
--- a/lib/eal/riscv/include/rte_power_intrinsics.h
+++ b/lib/eal/riscv/include/rte_power_intrinsics.h
@@ -7,14 +7,14 @@
 #ifndef RTE_POWER_INTRINSIC_RISCV_H
 #define RTE_POWER_INTRINSIC_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/riscv/include/rte_prefetch.h b/lib/eal/riscv/include/rte_prefetch.h
index 748cf1b626..42146491ea 100644
--- a/lib/eal/riscv/include/rte_prefetch.h
+++ b/lib/eal/riscv/include/rte_prefetch.h
@@ -8,14 +8,14 @@
 #ifndef RTE_PREFETCH_RISCV_H
 #define RTE_PREFETCH_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	RTE_SET_USED(p);
diff --git a/lib/eal/riscv/include/rte_rwlock.h b/lib/eal/riscv/include/rte_rwlock.h
index 9cdaf1b0ef..730970eecb 100644
--- a/lib/eal/riscv/include/rte_rwlock.h
+++ b/lib/eal/riscv/include/rte_rwlock.h
@@ -7,12 +7,12 @@
 #ifndef RTE_RWLOCK_RISCV_H
 #define RTE_RWLOCK_RISCV_H
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/riscv/include/rte_spinlock.h b/lib/eal/riscv/include/rte_spinlock.h
index 6af430735c..5fe4980e44 100644
--- a/lib/eal/riscv/include/rte_spinlock.h
+++ b/lib/eal/riscv/include/rte_spinlock.h
@@ -12,13 +12,13 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include <rte_common.h>
+#include "generic/rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_spinlock.h"
-
 static inline int rte_tm_supported(void)
 {
 	return 0;
diff --git a/lib/eal/windows/include/pthread.h b/lib/eal/windows/include/pthread.h
index 051b9311c2..e1c31017d1 100644
--- a/lib/eal/windows/include/pthread.h
+++ b/lib/eal/windows/include/pthread.h
@@ -13,13 +13,13 @@
  * eal_common_thread.c and common\include\rte_per_lcore.h as Microsoft libc
  * does not contain pthread.h. This may be removed in future releases.
  */
+#include <rte_common.h>
+#include <rte_windows.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_windows.h>
-
 #define PTHREAD_BARRIER_SERIAL_THREAD TRUE
 
 /* defining pthread_t type on Windows since there is no in Microsoft libc*/
diff --git a/lib/eal/windows/include/regex.h b/lib/eal/windows/include/regex.h
index 827f938414..a224c0cd29 100644
--- a/lib/eal/windows/include/regex.h
+++ b/lib/eal/windows/include/regex.h
@@ -10,15 +10,15 @@
  * as Microsoft libc does not contain regex.h. This may be removed in
  * future releases.
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #define REG_NOMATCH 1
 #define REG_ESPACE 12
 
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* defining regex_t for Windows */
 typedef void *regex_t;
 /* defining regmatch_t for Windows */
diff --git a/lib/eal/windows/include/rte_windows.h b/lib/eal/windows/include/rte_windows.h
index 567ed7d820..e78f007ffa 100644
--- a/lib/eal/windows/include/rte_windows.h
+++ b/lib/eal/windows/include/rte_windows.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_WINDOWS_H_
 #define _RTE_WINDOWS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file Windows-specific facilities
  *
@@ -44,6 +40,10 @@ extern "C" {
 #include <devguid.h>
 #include <rte_log.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Log GetLastError() with context, usually a Win32 API function and arguments.
  */
diff --git a/lib/eal/x86/include/rte_atomic.h b/lib/eal/x86/include/rte_atomic.h
index 74b1b24b7a..ad571ad132 100644
--- a/lib/eal/x86/include/rte_atomic.h
+++ b/lib/eal/x86/include/rte_atomic.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_ATOMIC_X86_H_
 #define _RTE_ATOMIC_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include <rte_config.h>
@@ -279,6 +275,10 @@ static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
 #include "rte_atomic_32.h"
 #else
 #include "rte_atomic_64.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 #endif
 
 #endif
diff --git a/lib/eal/x86/include/rte_byteorder.h b/lib/eal/x86/include/rte_byteorder.h
index adbec0c157..89f2f65566 100644
--- a/lib/eal/x86/include/rte_byteorder.h
+++ b/lib/eal/x86/include/rte_byteorder.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_BYTEORDER_X86_H_
 #define _RTE_BYTEORDER_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include <rte_config.h>
@@ -64,6 +60,10 @@ static inline uint32_t rte_arch_bswap32(uint32_t _x)
 #include "rte_byteorder_32.h"
 #else
 #include "rte_byteorder_64.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 #endif
 #endif
 
diff --git a/lib/eal/x86/include/rte_cpuflags.h b/lib/eal/x86/include/rte_cpuflags.h
index 1ee00e70fe..e843d1e5f4 100644
--- a/lib/eal/x86/include/rte_cpuflags.h
+++ b/lib/eal/x86/include/rte_cpuflags.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_CPUFLAGS_X86_64_H_
 #define _RTE_CPUFLAGS_X86_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 enum rte_cpu_flag_t {
 	/* (EAX 01h) ECX features*/
 	RTE_CPUFLAG_SSE3 = 0,               /**< SSE3 */
@@ -138,6 +134,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/x86/include/rte_cycles.h b/lib/eal/x86/include/rte_cycles.h
index 2afe85e28c..8de43840da 100644
--- a/lib/eal/x86/include/rte_cycles.h
+++ b/lib/eal/x86/include/rte_cycles.h
@@ -12,10 +12,6 @@
 #include <x86intrin.h>
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_cycles.h"
 
 #ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
@@ -26,6 +22,10 @@ extern int rte_cycles_vmware_tsc_map;
 #include <rte_common.h>
 #include <rte_config.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline uint64_t
 rte_rdtsc(void)
 {
diff --git a/lib/eal/x86/include/rte_io.h b/lib/eal/x86/include/rte_io.h
index 0e1fefdee1..c11cb8cd89 100644
--- a/lib/eal/x86/include/rte_io.h
+++ b/lib/eal/x86/include/rte_io.h
@@ -5,16 +5,16 @@
 #ifndef _RTE_IO_X86_H_
 #define _RTE_IO_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include "rte_cpuflags.h"
 
 #define RTE_NATIVE_WRITE32_WC
 #include "generic/rte_io.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * MOVDIRI wrapper.
diff --git a/lib/eal/x86/include/rte_pause.h b/lib/eal/x86/include/rte_pause.h
index b4cf1df1d0..54f028b295 100644
--- a/lib/eal/x86/include/rte_pause.h
+++ b/lib/eal/x86/include/rte_pause.h
@@ -5,13 +5,14 @@
 #ifndef _RTE_PAUSE_X86_H_
 #define _RTE_PAUSE_X86_H_
 
+#include "generic/rte_pause.h"
+
+#include <emmintrin.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_pause.h"
-
-#include <emmintrin.h>
 static inline void rte_pause(void)
 {
 	_mm_pause();
diff --git a/lib/eal/x86/include/rte_power_intrinsics.h b/lib/eal/x86/include/rte_power_intrinsics.h
index e4c2b87f73..fcb780fc5b 100644
--- a/lib/eal/x86/include/rte_power_intrinsics.h
+++ b/lib/eal/x86/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_POWER_INTRINSIC_X86_H_
 #define _RTE_POWER_INTRINSIC_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/x86/include/rte_prefetch.h b/lib/eal/x86/include/rte_prefetch.h
index 8a9377714f..34a609cc65 100644
--- a/lib/eal/x86/include/rte_prefetch.h
+++ b/lib/eal/x86/include/rte_prefetch.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_PREFETCH_X86_64_H_
 #define _RTE_PREFETCH_X86_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_TOOLCHAIN_MSVC
 #include <emmintrin.h>
 #endif
@@ -17,6 +13,10 @@ extern "C" {
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 #ifdef RTE_TOOLCHAIN_MSVC
diff --git a/lib/eal/x86/include/rte_rwlock.h b/lib/eal/x86/include/rte_rwlock.h
index 1796b69265..281eff33b9 100644
--- a/lib/eal/x86/include/rte_rwlock.h
+++ b/lib/eal/x86/include/rte_rwlock.h
@@ -5,13 +5,13 @@
 #ifndef _RTE_RWLOCK_X86_64_H_
 #define _RTE_RWLOCK_X86_64_H_
 
+#include "generic/rte_rwlock.h"
+#include "rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-#include "rte_spinlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 	__rte_no_thread_safety_analysis
diff --git a/lib/eal/x86/include/rte_spinlock.h b/lib/eal/x86/include/rte_spinlock.h
index a6c23ea1f6..5632dec73e 100644
--- a/lib/eal/x86/include/rte_spinlock.h
+++ b/lib/eal/x86/include/rte_spinlock.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_SPINLOCK_X86_64_H_
 #define _RTE_SPINLOCK_X86_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_spinlock.h"
 #include "rte_rtm.h"
 #include "rte_cpuflags.h"
@@ -17,6 +13,10 @@ extern "C" {
 #include "rte_pause.h"
 #include "rte_cycles.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_RTM_MAX_RETRIES (20)
 #define RTE_XABORT_LOCK_BUSY (0xff)
 
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 883e59a927..ae00ead865 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_ETHDEV_DRIVER_H_
 #define _RTE_ETHDEV_DRIVER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -24,6 +20,10 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_ethdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Structure used to hold information about the callbacks to be called for a
diff --git a/lib/ethdev/ethdev_pci.h b/lib/ethdev/ethdev_pci.h
index ec4f731270..2229ffa252 100644
--- a/lib/ethdev/ethdev_pci.h
+++ b/lib/ethdev/ethdev_pci.h
@@ -6,16 +6,16 @@
 #ifndef _RTE_ETHDEV_PCI_H_
 #define _RTE_ETHDEV_PCI_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_malloc.h>
 #include <rte_pci.h>
 #include <bus_pci_driver.h>
 #include <rte_config.h>
 #include <ethdev_driver.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Copy pci device info to the Ethernet device data.
  * Shared memory (eth_dev->data) only updated by primary process, so it is safe
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 3bec87bfdb..36a38f718a 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -11,10 +11,6 @@
  * API for ethdev trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <dev_driver.h>
 #include <rte_trace_point.h>
 
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_mtr.h"
 #include "rte_tm.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_ethdev_trace_configure,
 	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t nb_rx_q,
diff --git a/lib/ethdev/ethdev_vdev.h b/lib/ethdev/ethdev_vdev.h
index 364f140f91..010ec75a00 100644
--- a/lib/ethdev/ethdev_vdev.h
+++ b/lib/ethdev/ethdev_vdev.h
@@ -6,15 +6,15 @@
 #ifndef _RTE_ETHDEV_VDEV_H_
 #define _RTE_ETHDEV_VDEV_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_config.h>
 #include <rte_malloc.h>
 #include <bus_vdev_driver.h>
 #include <ethdev_driver.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Allocates a new ethdev slot for an Ethernet device and returns the pointer
diff --git a/lib/ethdev/rte_cman.h b/lib/ethdev/rte_cman.h
index 297db8e095..dedd6cb71a 100644
--- a/lib/ethdev/rte_cman.h
+++ b/lib/ethdev/rte_cman.h
@@ -5,12 +5,12 @@
 #ifndef RTE_CMAN_H
 #define RTE_CMAN_H
 
+#include <rte_bitops.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_bitops.h>
-
 /**
  * @file
  * Congestion management related parameters for DPDK.
diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
index 67cf0ae526..4fde2ad408 100644
--- a/lib/ethdev/rte_dev_info.h
+++ b/lib/ethdev/rte_dev_info.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_DEV_INFO_H_
 #define _RTE_DEV_INFO_H_
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /*
  * Placeholder for accessing device registers
  */
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 548fada1c7..a75e26bf07 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -145,10 +145,6 @@
  * a 0 value by the receive function of the driver for a given number of tries.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 /* Use this macro to check if LRO API is supported */
@@ -5966,6 +5962,10 @@ int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config
 
 #include <rte_ethdev_core.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Helper routine for rte_eth_rx_burst().
diff --git a/lib/ethdev/rte_ethdev_trace_fp.h b/lib/ethdev/rte_ethdev_trace_fp.h
index 40b6e4756b..c11b4f18f7 100644
--- a/lib/ethdev/rte_ethdev_trace_fp.h
+++ b/lib/ethdev/rte_ethdev_trace_fp.h
@@ -11,12 +11,12 @@
  * API for ethdev trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_ethdev_trace_rx_burst,
 	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
diff --git a/lib/eventdev/event_timer_adapter_pmd.h b/lib/eventdev/event_timer_adapter_pmd.h
index cd5127f047..fffcd90c8f 100644
--- a/lib/eventdev/event_timer_adapter_pmd.h
+++ b/lib/eventdev/event_timer_adapter_pmd.h
@@ -16,12 +16,12 @@
  * versioning.
  */
 
+#include "rte_event_timer_adapter.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "rte_event_timer_adapter.h"
-
 /*
  * Definitions of functions exported by an event timer adapter implementation
  * through *rte_event_timer_adapter_ops* structure supplied in the
diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
index 7a5699f14b..fd5f7a14f4 100644
--- a/lib/eventdev/eventdev_pmd.h
+++ b/lib/eventdev/eventdev_pmd.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_EVENTDEV_PMD_H_
 #define _RTE_EVENTDEV_PMD_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Event PMD APIs
  *
@@ -31,6 +27,10 @@ extern "C" {
 #include "event_timer_adapter_pmd.h"
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 extern int rte_event_logtype;
 #define RTE_LOGTYPE_EVENTDEV rte_event_logtype
 
diff --git a/lib/eventdev/eventdev_pmd_pci.h b/lib/eventdev/eventdev_pmd_pci.h
index 26aa3a6635..5cb5916a84 100644
--- a/lib/eventdev/eventdev_pmd_pci.h
+++ b/lib/eventdev/eventdev_pmd_pci.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_EVENTDEV_PMD_PCI_H_
 #define _RTE_EVENTDEV_PMD_PCI_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Eventdev PCI PMD APIs
  *
@@ -28,6 +24,10 @@ extern "C" {
 
 #include "eventdev_pmd.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 typedef int (*eventdev_pmd_pci_callback_t)(struct rte_eventdev *dev);
 
 /**
diff --git a/lib/eventdev/eventdev_pmd_vdev.h b/lib/eventdev/eventdev_pmd_vdev.h
index bb433ba955..4eaefa0b0b 100644
--- a/lib/eventdev/eventdev_pmd_vdev.h
+++ b/lib/eventdev/eventdev_pmd_vdev.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_EVENTDEV_PMD_VDEV_H_
 #define _RTE_EVENTDEV_PMD_VDEV_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Eventdev VDEV PMD APIs
  *
@@ -27,6 +23,10 @@ extern "C" {
 
 #include "eventdev_pmd.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Creates a new virtual event device and returns the pointer to that device.
diff --git a/lib/eventdev/eventdev_trace.h b/lib/eventdev/eventdev_trace.h
index 9c2b261c06..8ff8841729 100644
--- a/lib/eventdev/eventdev_trace.h
+++ b/lib/eventdev/eventdev_trace.h
@@ -11,10 +11,6 @@
  * API for ethdev trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_trace_point.h>
 
 #include "rte_eventdev.h"
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_event_eth_rx_adapter.h"
 #include "rte_event_timer_adapter.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_eventdev_trace_configure,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id,
diff --git a/lib/eventdev/rte_event_crypto_adapter.h b/lib/eventdev/rte_event_crypto_adapter.h
index e07f159b77..c9b277c664 100644
--- a/lib/eventdev/rte_event_crypto_adapter.h
+++ b/lib/eventdev/rte_event_crypto_adapter.h
@@ -167,14 +167,14 @@
  * from the start of the rte_crypto_op including initialization vector (IV).
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Crypto event adapter mode
  */
diff --git a/lib/eventdev/rte_event_eth_rx_adapter.h b/lib/eventdev/rte_event_eth_rx_adapter.h
index cf42c69b0d..9237e198a7 100644
--- a/lib/eventdev/rte_event_eth_rx_adapter.h
+++ b/lib/eventdev/rte_event_eth_rx_adapter.h
@@ -87,10 +87,6 @@
  * event based so the callback can also modify the event data if it needs to.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -98,6 +94,10 @@ extern "C" {
 
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_EVENT_ETH_RX_ADAPTER_MAX_INSTANCE 32
 
 /* struct rte_event_eth_rx_adapter_queue_conf flags definitions */
diff --git a/lib/eventdev/rte_event_eth_tx_adapter.h b/lib/eventdev/rte_event_eth_tx_adapter.h
index b38b3fce97..ef01345ac2 100644
--- a/lib/eventdev/rte_event_eth_tx_adapter.h
+++ b/lib/eventdev/rte_event_eth_tx_adapter.h
@@ -76,10 +76,6 @@
  * impact due to a change in how the transmit queue index is specified.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -87,6 +83,10 @@ extern "C" {
 
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Adapter configuration structure
  *
diff --git a/lib/eventdev/rte_event_ring.h b/lib/eventdev/rte_event_ring.h
index f9cf19ae16..5769da269e 100644
--- a/lib/eventdev/rte_event_ring.h
+++ b/lib/eventdev/rte_event_ring.h
@@ -14,10 +14,6 @@
 #ifndef _RTE_EVENT_RING_
 #define _RTE_EVENT_RING_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_common.h>
@@ -25,6 +21,10 @@ extern "C" {
 #include <rte_ring_elem.h>
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_TAILQ_EVENT_RING_NAME "RTE_EVENT_RING"
 
 /**
diff --git a/lib/eventdev/rte_event_timer_adapter.h b/lib/eventdev/rte_event_timer_adapter.h
index 0bd1b30045..256807b3bf 100644
--- a/lib/eventdev/rte_event_timer_adapter.h
+++ b/lib/eventdev/rte_event_timer_adapter.h
@@ -107,14 +107,14 @@
  * All these use cases require high resolution and low time drift.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 
 #include "rte_eventdev.h"
 #include "rte_eventdev_trace_fp.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Timer adapter clock source
  */
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index 08e5f9320b..e5c5b7df64 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -237,10 +237,6 @@
  * \endcode
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_errno.h>
@@ -2469,6 +2465,10 @@ rte_event_vector_pool_create(const char *name, unsigned int n,
 
 #include <rte_eventdev_core.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static __rte_always_inline uint16_t
 __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
 			  const struct rte_event ev[], uint16_t nb_events,
diff --git a/lib/eventdev/rte_eventdev_trace_fp.h b/lib/eventdev/rte_eventdev_trace_fp.h
index 04d510ad00..8656f1e6e4 100644
--- a/lib/eventdev/rte_eventdev_trace_fp.h
+++ b/lib/eventdev/rte_eventdev_trace_fp.h
@@ -11,12 +11,12 @@
  * API for ethdev trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_eventdev_trace_deq_burst,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id, uint8_t port_id, void *ev_table,
diff --git a/lib/graph/rte_graph_model_mcore_dispatch.h b/lib/graph/rte_graph_model_mcore_dispatch.h
index 732b89297f..f9ff3daa88 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.h
+++ b/lib/graph/rte_graph_model_mcore_dispatch.h
@@ -12,10 +12,6 @@
  * dispatch model.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_errno.h>
 #include <rte_mempool.h>
 #include <rte_memzone.h>
@@ -23,6 +19,10 @@ extern "C" {
 
 #include "rte_graph_worker_common.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
 #define RTE_GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
 	((typeof(nb_nodes))((nb_nodes) * RTE_GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 03d0e01b68..b0f952a82c 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -6,13 +6,13 @@
 #ifndef _RTE_GRAPH_WORKER_H_
 #define _RTE_GRAPH_WORKER_H_
 
+#include "rte_graph_model_rtc.h"
+#include "rte_graph_model_mcore_dispatch.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "rte_graph_model_rtc.h"
-#include "rte_graph_model_mcore_dispatch.h"
-
 /**
  * Perform graph walk on the circular buffer and invoke the process function
  * of the nodes and collect the stats.
diff --git a/lib/gso/rte_gso.h b/lib/gso/rte_gso.h
index d60cb65f18..75246989dc 100644
--- a/lib/gso/rte_gso.h
+++ b/lib/gso/rte_gso.h
@@ -10,13 +10,13 @@
  * Interface to GSO library
  */
 
+#include <stdint.h>
+#include <rte_mbuf.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include <rte_mbuf.h>
-
 /* Minimum GSO segment size for TCP based packets. */
 #define RTE_GSO_SEG_SIZE_MIN (sizeof(struct rte_ether_hdr) + \
 		sizeof(struct rte_ipv4_hdr) + sizeof(struct rte_tcp_hdr) + 1)
diff --git a/lib/hash/rte_fbk_hash.h b/lib/hash/rte_fbk_hash.h
index b01126999b..1f0c1d1b6c 100644
--- a/lib/hash/rte_fbk_hash.h
+++ b/lib/hash/rte_fbk_hash.h
@@ -18,15 +18,15 @@
 #include <stdint.h>
 #include <errno.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <rte_hash_crc.h>
 #include <rte_jhash.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_FBK_HASH_INIT_VAL_DEFAULT
 /** Initialising value used when calculating hash. */
 #define RTE_FBK_HASH_INIT_VAL_DEFAULT		0xFFFFFFFF
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 8ad2422ec3..fa07c97685 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -11,10 +11,6 @@
  * RTE CRC Hash
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_branch_prediction.h>
@@ -39,6 +35,10 @@ extern uint8_t rte_hash_crc32_alg;
 #include "rte_crc_generic.h"
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
  * calculation.
diff --git a/lib/hash/rte_jhash.h b/lib/hash/rte_jhash.h
index f2446f081e..b70799d209 100644
--- a/lib/hash/rte_jhash.h
+++ b/lib/hash/rte_jhash.h
@@ -11,10 +11,6 @@
  * jhash functions.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <string.h>
 #include <limits.h>
@@ -23,6 +19,10 @@ extern "C" {
 #include <rte_log.h>
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* jhash.h: Jenkins hash support.
  *
  * Copyright (C) 2006 Bob Jenkins (bob_jenkins@burtleburtle.net)
diff --git a/lib/hash/rte_thash.h b/lib/hash/rte_thash.h
index 30b657e67a..ec9bc57efa 100644
--- a/lib/hash/rte_thash.h
+++ b/lib/hash/rte_thash.h
@@ -15,10 +15,6 @@
  * after GRE header decapsulating)
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_byteorder.h>
@@ -28,6 +24,10 @@ extern "C" {
 
 #if defined(RTE_ARCH_X86) || defined(__ARM_NEON)
 #include <rte_vect.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 #endif
 
 #ifdef RTE_ARCH_X86
diff --git a/lib/hash/rte_thash_gfni.h b/lib/hash/rte_thash_gfni.h
index 132f37506d..5234c1697f 100644
--- a/lib/hash/rte_thash_gfni.h
+++ b/lib/hash/rte_thash_gfni.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_THASH_GFNI_H_
 #define _RTE_THASH_GFNI_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_log.h>
 
@@ -16,6 +12,10 @@ extern "C" {
 
 #include <rte_thash_x86_gfni.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #endif
 
 /**
diff --git a/lib/ip_frag/rte_ip_frag.h b/lib/ip_frag/rte_ip_frag.h
index 2ad318096b..84fd717953 100644
--- a/lib/ip_frag/rte_ip_frag.h
+++ b/lib/ip_frag/rte_ip_frag.h
@@ -12,10 +12,6 @@
  * Implementation of IP packet fragmentation and reassembly.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 
@@ -25,6 +21,10 @@ extern "C" {
 #include <rte_ip.h>
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_mbuf;
 
 /** death row size (in packets) */
diff --git a/lib/ipsec/rte_ipsec.h b/lib/ipsec/rte_ipsec.h
index f15f6f2966..28b7a61aea 100644
--- a/lib/ipsec/rte_ipsec.h
+++ b/lib/ipsec/rte_ipsec.h
@@ -17,10 +17,6 @@
 #include <rte_ipsec_sa.h>
 #include <rte_mbuf.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 struct rte_ipsec_session;
 
 /**
@@ -181,6 +177,10 @@ rte_ipsec_telemetry_sa_del(const struct rte_ipsec_sa *sa);
 
 #include <rte_ipsec_group.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/log/rte_log.h b/lib/log/rte_log.h
index f357c59548..3735137150 100644
--- a/lib/log/rte_log.h
+++ b/lib/log/rte_log.h
@@ -13,10 +13,6 @@
  * This file provides a log API to RTE applications.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <assert.h>
 #include <stdint.h>
 #include <stdio.h>
@@ -26,6 +22,10 @@ extern "C" {
 #include <rte_common.h>
 #include <rte_config.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* SDK log type */
 #define RTE_LOGTYPE_EAL        0 /**< Log related to eal. */
 				 /* was RTE_LOGTYPE_MALLOC */
diff --git a/lib/lpm/rte_lpm.h b/lib/lpm/rte_lpm.h
index 9c6df311cb..329dc1aad4 100644
--- a/lib/lpm/rte_lpm.h
+++ b/lib/lpm/rte_lpm.h
@@ -391,6 +391,10 @@ static inline void
 rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
 	uint32_t defv);
 
+#ifdef __cplusplus
+}
+#endif
+
 #if defined(RTE_ARCH_ARM)
 #ifdef RTE_HAS_SVE_ACLE
 #include "rte_lpm_sve.h"
@@ -407,8 +411,4 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
 #include "rte_lpm_scalar.h"
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* _RTE_LPM_H_ */
diff --git a/lib/member/rte_member.h b/lib/member/rte_member.h
index aec192eba5..109bdd000b 100644
--- a/lib/member/rte_member.h
+++ b/lib/member/rte_member.h
@@ -54,10 +54,6 @@
 #ifndef _RTE_MEMBER_H_
 #define _RTE_MEMBER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdbool.h>
 #include <inttypes.h>
@@ -100,6 +96,10 @@ typedef uint16_t member_set_t;
 #define MEMBER_HASH_FUNC       rte_jhash
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** @internal setsummary structure. */
 struct rte_member_setsum;
 
diff --git a/lib/member/rte_member_sketch.h b/lib/member/rte_member_sketch.h
index 74f24ca223..6a8d5104dd 100644
--- a/lib/member/rte_member_sketch.h
+++ b/lib/member/rte_member_sketch.h
@@ -5,13 +5,13 @@
 #ifndef RTE_MEMBER_SKETCH_H
 #define RTE_MEMBER_SKETCH_H
 
+#include <rte_vect.h>
+#include <rte_ring_elem.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_vect.h>
-#include <rte_ring_elem.h>
-
 #define NUM_ROW_SCALAR 5
 #define INTERVAL (1 << 15)
 
diff --git a/lib/member/rte_member_sketch_avx512.h b/lib/member/rte_member_sketch_avx512.h
index 52666b5b4c..a8ef3b065e 100644
--- a/lib/member/rte_member_sketch_avx512.h
+++ b/lib/member/rte_member_sketch_avx512.h
@@ -5,14 +5,14 @@
 #ifndef RTE_MEMBER_SKETCH_AVX512_H
 #define RTE_MEMBER_SKETCH_AVX512_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_vect.h>
 #include "rte_member.h"
 #include "rte_member_sketch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define NUM_ROW_VEC 8
 
 void
diff --git a/lib/member/rte_member_x86.h b/lib/member/rte_member_x86.h
index d115151f9f..4de453485b 100644
--- a/lib/member/rte_member_x86.h
+++ b/lib/member/rte_member_x86.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_MEMBER_X86_H_
 #define _RTE_MEMBER_X86_H_
 
+#include <x86intrin.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <x86intrin.h>
-
 #if defined(__AVX2__)
 
 static inline int
diff --git a/lib/member/rte_xxh64_avx512.h b/lib/member/rte_xxh64_avx512.h
index ffe6cb79f9..58f896ebb8 100644
--- a/lib/member/rte_xxh64_avx512.h
+++ b/lib/member/rte_xxh64_avx512.h
@@ -5,13 +5,13 @@
 #ifndef RTE_XXH64_AVX512_H
 #define RTE_XXH64_AVX512_H
 
+#include <rte_common.h>
+#include <immintrin.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <immintrin.h>
-
 /* 0b1001111000110111011110011011000110000101111010111100101010000111 */
 static const uint64_t PRIME64_1 = 0x9E3779B185EBCA87ULL;
 /* 0b1100001010110010101011100011110100100111110101001110101101001111 */
diff --git a/lib/mempool/mempool_trace.h b/lib/mempool/mempool_trace.h
index dffef062e4..c595a3116b 100644
--- a/lib/mempool/mempool_trace.h
+++ b/lib/mempool/mempool_trace.h
@@ -11,15 +11,15 @@
  * APIs for mempool trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_mempool.h"
 
 #include <rte_memzone.h>
 #include <rte_trace_point.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_mempool_trace_create,
 	RTE_TRACE_POINT_ARGS(const char *name, uint32_t nb_elts,
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..9c5cdbb291 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -11,12 +11,12 @@
  * Mempool fast path API for trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_mempool_trace_ops_dequeue_bulk,
 	RTE_TRACE_POINT_ARGS(void *mempool, void **obj_table,
diff --git a/lib/meter/rte_meter.h b/lib/meter/rte_meter.h
index bd68cbe389..e72bf93b3e 100644
--- a/lib/meter/rte_meter.h
+++ b/lib/meter/rte_meter.h
@@ -6,10 +6,6 @@
 #ifndef __INCLUDE_RTE_METER_H__
 #define __INCLUDE_RTE_METER_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Traffic Metering
@@ -22,6 +18,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Application Programmer's Interface (API)
  */
diff --git a/lib/mldev/mldev_utils.h b/lib/mldev/mldev_utils.h
index 5e2a180adc..bf21067d38 100644
--- a/lib/mldev/mldev_utils.h
+++ b/lib/mldev/mldev_utils.h
@@ -5,10 +5,6 @@
 #ifndef RTE_MLDEV_UTILS_H
 #define RTE_MLDEV_UTILS_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_mldev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  *
diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h
index b3bd281083..8dccf125fc 100644
--- a/lib/mldev/rte_mldev_core.h
+++ b/lib/mldev/rte_mldev_core.h
@@ -16,10 +16,6 @@
  * These APIs are for MLDEV PMDs and library only.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <dev_driver.h>
@@ -27,6 +23,10 @@ extern "C" {
 #include <rte_log.h>
 #include <rte_mldev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Device state */
 #define ML_DEV_DETACHED (0)
 #define ML_DEV_ATTACHED (1)
diff --git a/lib/mldev/rte_mldev_pmd.h b/lib/mldev/rte_mldev_pmd.h
index fd5bbf4360..47c0f23223 100644
--- a/lib/mldev/rte_mldev_pmd.h
+++ b/lib/mldev/rte_mldev_pmd.h
@@ -14,10 +14,6 @@
  * These APIs are for MLDEV PMDs only and user applications should not call them directly.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_common.h>
@@ -25,6 +21,10 @@ extern "C" {
 #include <rte_mldev.h>
 #include <rte_mldev_core.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  *
diff --git a/lib/net/rte_ether.h b/lib/net/rte_ether.h
index 32ed515aef..403e84f50b 100644
--- a/lib/net/rte_ether.h
+++ b/lib/net/rte_ether.h
@@ -11,10 +11,6 @@
  * Ethernet Helpers in RTE
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 
@@ -22,6 +18,10 @@ extern "C" {
 #include <rte_mbuf.h>
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_ETHER_ADDR_LEN  6 /**< Length of Ethernet address. */
 #define RTE_ETHER_TYPE_LEN  2 /**< Length of Ethernet type field. */
 #define RTE_ETHER_CRC_LEN   4 /**< Length of Ethernet CRC. */
diff --git a/lib/net/rte_net.h b/lib/net/rte_net.h
index cdc6cf956d..40ad6a71a1 100644
--- a/lib/net/rte_net.h
+++ b/lib/net/rte_net.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_NET_PTYPE_H_
 #define _RTE_NET_PTYPE_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_ip.h>
 #include <rte_udp.h>
 #include <rte_tcp.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
diff --git a/lib/net/rte_sctp.h b/lib/net/rte_sctp.h
index 965682dc2b..a8ba9e49d8 100644
--- a/lib/net/rte_sctp.h
+++ b/lib/net/rte_sctp.h
@@ -14,14 +14,14 @@
 #ifndef _RTE_SCTP_H_
 #define _RTE_SCTP_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * SCTP Header
  */
diff --git a/lib/node/rte_node_eth_api.h b/lib/node/rte_node_eth_api.h
index 143cf131b3..2b7019f6bb 100644
--- a/lib/node/rte_node_eth_api.h
+++ b/lib/node/rte_node_eth_api.h
@@ -16,15 +16,15 @@
  * and its queue associations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_graph.h>
 #include <rte_mempool.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Port config for ethdev_rx and ethdev_tx node.
  */
diff --git a/lib/node/rte_node_ip4_api.h b/lib/node/rte_node_ip4_api.h
index 24f8ec843a..950751a525 100644
--- a/lib/node/rte_node_ip4_api.h
+++ b/lib/node/rte_node_ip4_api.h
@@ -15,15 +15,15 @@
  * This API allows to do control path functions of ip4_* nodes
  * like ip4_lookup, ip4_rewrite.
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_compat.h>
 
 #include <rte_graph.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * IP4 lookup next nodes.
  */
diff --git a/lib/node/rte_node_ip6_api.h b/lib/node/rte_node_ip6_api.h
index a538dc2ea7..f467aac7b6 100644
--- a/lib/node/rte_node_ip6_api.h
+++ b/lib/node/rte_node_ip6_api.h
@@ -15,13 +15,13 @@
  * This API allows to do control path functions of ip6_* nodes
  * like ip6_lookup, ip6_rewrite.
  */
+#include <rte_common.h>
+#include <rte_compat.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_compat.h>
-
 /**
  * IP6 lookup next nodes.
  */
diff --git a/lib/node/rte_node_udp4_input_api.h b/lib/node/rte_node_udp4_input_api.h
index c873acbbe0..694660bd6a 100644
--- a/lib/node/rte_node_udp4_input_api.h
+++ b/lib/node/rte_node_udp4_input_api.h
@@ -16,14 +16,14 @@
  * like udp4_input.
  *
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_compat.h>
 
 #include "rte_graph.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 /**
  * UDP4 lookup next nodes.
  */
diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
index c26fc77209..9a50a12142 100644
--- a/lib/pci/rte_pci.h
+++ b/lib/pci/rte_pci.h
@@ -12,14 +12,14 @@
  * RTE PCI Library
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <inttypes.h>
 #include <sys/types.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of
  * configuration space.  PCI-X Mode 2 and PCIe devices have 4096 bytes of
diff --git a/lib/pdcp/rte_pdcp.h b/lib/pdcp/rte_pdcp.h
index f74524f83d..15fcbf9607 100644
--- a/lib/pdcp/rte_pdcp.h
+++ b/lib/pdcp/rte_pdcp.h
@@ -19,10 +19,6 @@
 #include <rte_pdcp_hdr.h>
 #include <rte_security.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* Forward declarations. */
 struct rte_pdcp_entity;
 
@@ -373,6 +369,10 @@ rte_pdcp_t_reordering_expiry_handle(const struct rte_pdcp_entity *entity,
  */
 #include <rte_pdcp_group.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pipeline/rte_pipeline.h b/lib/pipeline/rte_pipeline.h
index 0c7994b4f2..c9e7172453 100644
--- a/lib/pipeline/rte_pipeline.h
+++ b/lib/pipeline/rte_pipeline.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PIPELINE_H__
 #define __INCLUDE_RTE_PIPELINE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Pipeline
@@ -59,6 +55,10 @@ extern "C" {
 #include <rte_table.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_mbuf;
 
 /*
diff --git a/lib/pipeline/rte_port_in_action.h b/lib/pipeline/rte_port_in_action.h
index ec2994599f..9d17bae988 100644
--- a/lib/pipeline/rte_port_in_action.h
+++ b/lib/pipeline/rte_port_in_action.h
@@ -46,10 +46,6 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -57,6 +53,10 @@ extern "C" {
 
 #include "rte_pipeline.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Input port actions. */
 enum rte_port_in_action_type {
 	/** Filter selected input packets. */
diff --git a/lib/pipeline/rte_swx_ctl.h b/lib/pipeline/rte_swx_ctl.h
index 6ef2551ab5..c4e63753f5 100644
--- a/lib/pipeline/rte_swx_ctl.h
+++ b/lib/pipeline/rte_swx_ctl.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_CTL_H__
 #define __INCLUDE_RTE_SWX_CTL_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Pipeline Control
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_swx_port.h"
 #include "rte_swx_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_swx_pipeline;
 
 /** Name size. */
diff --git a/lib/pipeline/rte_swx_extern.h b/lib/pipeline/rte_swx_extern.h
index e10e963d63..1553fa81ec 100644
--- a/lib/pipeline/rte_swx_extern.h
+++ b/lib/pipeline/rte_swx_extern.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_EXTERN_H__
 #define __INCLUDE_RTE_SWX_EXTERN_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Extern objects and functions
@@ -19,6 +15,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Extern type
  */
diff --git a/lib/pipeline/rte_swx_ipsec.h b/lib/pipeline/rte_swx_ipsec.h
index 7c07fdc739..d2e5abef7d 100644
--- a/lib/pipeline/rte_swx_ipsec.h
+++ b/lib/pipeline/rte_swx_ipsec.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_IPSEC_H__
 #define __INCLUDE_RTE_SWX_IPSEC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Internet Protocol Security (IPsec)
@@ -53,6 +49,10 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_crypto_sym.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * IPsec Setup API
  */
diff --git a/lib/pipeline/rte_swx_pipeline.h b/lib/pipeline/rte_swx_pipeline.h
index 25df042d3b..882bd4bf6f 100644
--- a/lib/pipeline/rte_swx_pipeline.h
+++ b/lib/pipeline/rte_swx_pipeline.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PIPELINE_H__
 #define __INCLUDE_RTE_SWX_PIPELINE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Pipeline
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_swx_table.h"
 #include "rte_swx_extern.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Name size. */
 #ifndef RTE_SWX_NAME_SIZE
 #define RTE_SWX_NAME_SIZE 64
diff --git a/lib/pipeline/rte_swx_pipeline_spec.h b/lib/pipeline/rte_swx_pipeline_spec.h
index dd88c0bfab..077b407c0a 100644
--- a/lib/pipeline/rte_swx_pipeline_spec.h
+++ b/lib/pipeline/rte_swx_pipeline_spec.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PIPELINE_SPEC_H__
 #define __INCLUDE_RTE_SWX_PIPELINE_SPEC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 
@@ -15,6 +11,10 @@ extern "C" {
 
 #include <rte_swx_pipeline.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * extobj.
  *
diff --git a/lib/pipeline/rte_table_action.h b/lib/pipeline/rte_table_action.h
index 5dffbeb700..bab4bfd2e2 100644
--- a/lib/pipeline/rte_table_action.h
+++ b/lib/pipeline/rte_table_action.h
@@ -52,10 +52,6 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -65,6 +61,10 @@ extern "C" {
 
 #include "rte_pipeline.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Table actions. */
 enum rte_table_action_type {
 	/** Forward to next pipeline table, output port or drop. */
diff --git a/lib/port/rte_port.h b/lib/port/rte_port.h
index 0e30db371e..4b20872537 100644
--- a/lib/port/rte_port.h
+++ b/lib/port/rte_port.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_H__
 #define __INCLUDE_RTE_PORT_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port
@@ -20,6 +16,10 @@ extern "C" {
 #include <stdint.h>
 #include <rte_mbuf.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**@{
  * Macros to allow accessing metadata stored in the mbuf headroom
  * just beyond the end of the mbuf data structure returned by a port
diff --git a/lib/port/rte_port_ethdev.h b/lib/port/rte_port_ethdev.h
index e07021cb89..7729ff0da3 100644
--- a/lib/port/rte_port_ethdev.h
+++ b/lib/port/rte_port_ethdev.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_ETHDEV_H__
 #define __INCLUDE_RTE_PORT_ETHDEV_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Ethernet Device
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ethdev_reader port parameters */
 struct rte_port_ethdev_reader_params {
 	/** NIC RX port ID */
diff --git a/lib/port/rte_port_eventdev.h b/lib/port/rte_port_eventdev.h
index 0efb8e1021..d9eccf07d4 100644
--- a/lib/port/rte_port_eventdev.h
+++ b/lib/port/rte_port_eventdev.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_EVENTDEV_H__
 #define __INCLUDE_RTE_PORT_EVENTDEV_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Eventdev Interface
@@ -24,6 +20,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Eventdev_reader port parameters */
 struct rte_port_eventdev_reader_params {
 	/** Eventdev Device ID */
diff --git a/lib/port/rte_port_fd.h b/lib/port/rte_port_fd.h
index 885b9ada22..40a5e4a426 100644
--- a/lib/port/rte_port_fd.h
+++ b/lib/port/rte_port_fd.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_FD_H__
 #define __INCLUDE_RTE_PORT_FD_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port FD Device
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** fd_reader port parameters */
 struct rte_port_fd_reader_params {
 	/** File descriptor */
diff --git a/lib/port/rte_port_frag.h b/lib/port/rte_port_frag.h
index 4055872e8d..9a10f10523 100644
--- a/lib/port/rte_port_frag.h
+++ b/lib/port/rte_port_frag.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_IP_FRAG_H__
 #define __INCLUDE_RTE_PORT_IP_FRAG_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port for IPv4 Fragmentation
@@ -31,6 +27,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ring_reader_ipv4_frag port parameters */
 struct rte_port_ring_reader_frag_params {
 	/** Underlying single consumer ring that has to be pre-initialized. */
diff --git a/lib/port/rte_port_ras.h b/lib/port/rte_port_ras.h
index 94cfb3ed92..86e36f5362 100644
--- a/lib/port/rte_port_ras.h
+++ b/lib/port/rte_port_ras.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_RAS_H__
 #define __INCLUDE_RTE_PORT_RAS_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port for IPv4 Reassembly
@@ -31,6 +27,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ring_writer_ipv4_ras port parameters */
 struct rte_port_ring_writer_ras_params {
 	/** Underlying single consumer ring that has to be pre-initialized. */
diff --git a/lib/port/rte_port_ring.h b/lib/port/rte_port_ring.h
index 027928c924..2089d0889b 100644
--- a/lib/port/rte_port_ring.h
+++ b/lib/port/rte_port_ring.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_RING_H__
 #define __INCLUDE_RTE_PORT_RING_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Ring
@@ -27,6 +23,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ring_reader port parameters */
 struct rte_port_ring_reader_params {
 	/** Underlying consumer ring that has to be pre-initialized */
diff --git a/lib/port/rte_port_sched.h b/lib/port/rte_port_sched.h
index 251380ef80..1bf08ae6a9 100644
--- a/lib/port/rte_port_sched.h
+++ b/lib/port/rte_port_sched.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_SCHED_H__
 #define __INCLUDE_RTE_PORT_SCHED_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Hierarchical Scheduler
@@ -23,6 +19,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** sched_reader port parameters */
 struct rte_port_sched_reader_params {
 	/** Underlying pre-initialized rte_sched_port */
diff --git a/lib/port/rte_port_source_sink.h b/lib/port/rte_port_source_sink.h
index bcdbaf1e40..3122dd5038 100644
--- a/lib/port/rte_port_source_sink.h
+++ b/lib/port/rte_port_source_sink.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_SOURCE_SINK_H__
 #define __INCLUDE_RTE_PORT_SOURCE_SINK_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Source/Sink
@@ -19,6 +15,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** source port parameters */
 struct rte_port_source_params {
 	/** Pre-initialized buffer pool */
diff --git a/lib/port/rte_port_sym_crypto.h b/lib/port/rte_port_sym_crypto.h
index 6532b4388a..d03cdc1e8b 100644
--- a/lib/port/rte_port_sym_crypto.h
+++ b/lib/port/rte_port_sym_crypto.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_SYM_CRYPTO_H__
 #define __INCLUDE_RTE_PORT_SYM_CRYPTO_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port sym crypto Interface
@@ -23,6 +19,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Function prototype for reader post action. */
 typedef void (*rte_port_sym_crypto_reader_callback_fn)(struct rte_mbuf **pkts,
 		uint16_t n_pkts, void *arg);
diff --git a/lib/port/rte_swx_port.h b/lib/port/rte_swx_port.h
index 1dbd95ae87..b52b125572 100644
--- a/lib/port/rte_swx_port.h
+++ b/lib/port/rte_swx_port.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_H__
 #define __INCLUDE_RTE_SWX_PORT_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Port
@@ -17,6 +13,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Packet. */
 struct rte_swx_pkt {
 	/** Opaque packet handle. */
diff --git a/lib/port/rte_swx_port_ethdev.h b/lib/port/rte_swx_port_ethdev.h
index cbc2d7b213..1828031e67 100644
--- a/lib/port/rte_swx_port_ethdev.h
+++ b/lib/port/rte_swx_port_ethdev.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_ETHDEV_H__
 #define __INCLUDE_RTE_SWX_PORT_ETHDEV_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Ethernet Device Input and Output Ports
@@ -17,6 +13,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Ethernet device input port (reader) creation parameters. */
 struct rte_swx_port_ethdev_reader_params {
 	/** Name of a valid and fully configured Ethernet device. */
diff --git a/lib/port/rte_swx_port_fd.h b/lib/port/rte_swx_port_fd.h
index e61719c8f6..63529cf0ab 100644
--- a/lib/port/rte_swx_port_fd.h
+++ b/lib/port/rte_swx_port_fd.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_FD_H__
 #define __INCLUDE_RTE_SWX_PORT_FD_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX FD Input and Output Ports
@@ -18,6 +14,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** fd_reader port parameters */
 struct rte_swx_port_fd_reader_params {
 	/** File descriptor. Must be valid and opened in non-blocking mode. */
diff --git a/lib/port/rte_swx_port_ring.h b/lib/port/rte_swx_port_ring.h
index efc485fb08..ef241c3fee 100644
--- a/lib/port/rte_swx_port_ring.h
+++ b/lib/port/rte_swx_port_ring.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_RING_H__
 #define __INCLUDE_RTE_SWX_PORT_RING_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Ring Input and Output Ports
@@ -18,6 +14,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Ring input port (reader) creation parameters. */
 struct rte_swx_port_ring_reader_params {
 	/** Name of valid RTE ring. */
diff --git a/lib/port/rte_swx_port_source_sink.h b/lib/port/rte_swx_port_source_sink.h
index 91bcbf74f4..e3ca7cfbb4 100644
--- a/lib/port/rte_swx_port_source_sink.h
+++ b/lib/port/rte_swx_port_source_sink.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_SOURCE_SINK_H__
 #define __INCLUDE_RTE_SWX_PORT_SOURCE_SINK_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Source and Sink Ports
@@ -15,6 +11,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum number of packets to read from the PCAP file. */
 #ifndef RTE_SWX_PORT_SOURCE_PKTS_MAX
 #define RTE_SWX_PORT_SOURCE_PKTS_MAX 1024
diff --git a/lib/rawdev/rte_rawdev.h b/lib/rawdev/rte_rawdev.h
index 640037b524..3fc471526e 100644
--- a/lib/rawdev/rte_rawdev.h
+++ b/lib/rawdev/rte_rawdev.h
@@ -14,13 +14,13 @@
  * no specific type already available in DPDK.
  */
 
+#include <rte_common.h>
+#include <rte_memory.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_memory.h>
-
 /* Rawdevice object - essentially a void to be typecast by implementation */
 typedef void *rte_rawdev_obj_t;
 
diff --git a/lib/rawdev/rte_rawdev_pmd.h b/lib/rawdev/rte_rawdev_pmd.h
index 22b406444d..408ed461a4 100644
--- a/lib/rawdev/rte_rawdev_pmd.h
+++ b/lib/rawdev/rte_rawdev_pmd.h
@@ -13,10 +13,6 @@
  * any application.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <dev_driver.h>
@@ -26,6 +22,10 @@ extern "C" {
 
 #include "rte_rawdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 extern int librawdev_logtype;
 #define RTE_LOGTYPE_RAWDEV librawdev_logtype
 
diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h
index ed3dd6d3d2..550fadf56a 100644
--- a/lib/rcu/rte_rcu_qsbr.h
+++ b/lib/rcu/rte_rcu_qsbr.h
@@ -21,10 +21,6 @@
  * entered quiescent state.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <inttypes.h>
 #include <stdalign.h>
 #include <stdbool.h>
@@ -36,6 +32,10 @@ extern "C" {
 #include <rte_atomic.h>
 #include <rte_ring.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 extern int rte_rcu_log_type;
 #define RTE_LOGTYPE_RCU rte_rcu_log_type
 
diff --git a/lib/regexdev/rte_regexdev.h b/lib/regexdev/rte_regexdev.h
index a50b841b1e..b18a1d4251 100644
--- a/lib/regexdev/rte_regexdev.h
+++ b/lib/regexdev/rte_regexdev.h
@@ -194,10 +194,6 @@
  * - rte_regexdev_dequeue_burst()
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_dev.h>
@@ -1428,6 +1424,10 @@ struct rte_regex_ops {
 
 #include "rte_regexdev_core.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
diff --git a/lib/ring/rte_ring.h b/lib/ring/rte_ring.h
index c709f30497..11ca69c73d 100644
--- a/lib/ring/rte_ring.h
+++ b/lib/ring/rte_ring.h
@@ -34,13 +34,13 @@
  * for more information.
  */
 
+#include <rte_ring_core.h>
+#include <rte_ring_elem.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_core.h>
-#include <rte_ring_elem.h>
-
 /**
  * Calculate the memory size needed for a ring
  *
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 270869d214..222c5aeb3f 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -19,10 +19,6 @@
  * instead.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdalign.h>
 #include <stdio.h>
 #include <stdint.h>
@@ -38,6 +34,10 @@ extern "C" {
 #include <rte_pause.h>
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_TAILQ_RING_NAME "RTE_RING"
 
 /** enqueue/dequeue behavior types */
diff --git a/lib/ring/rte_ring_elem.h b/lib/ring/rte_ring_elem.h
index 7f7d4951d3..506f686884 100644
--- a/lib/ring/rte_ring_elem.h
+++ b/lib/ring/rte_ring_elem.h
@@ -16,10 +16,6 @@
  * RTE Ring with user defined element size
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_ring_core.h>
 #include <rte_ring_elem_pvt.h>
 
@@ -699,6 +695,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 
 #include <rte_ring.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ring/rte_ring_hts.h b/lib/ring/rte_ring_hts.h
index 9a5938ac58..a41acea740 100644
--- a/lib/ring/rte_ring_hts.h
+++ b/lib/ring/rte_ring_hts.h
@@ -24,12 +24,12 @@
  * To achieve that 64-bit CAS is used by head update routine.
  */
 
+#include <rte_ring_hts_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_hts_elem_pvt.h>
-
 /**
  * Enqueue several objects on the HTS ring (multi-producers safe).
  *
diff --git a/lib/ring/rte_ring_peek.h b/lib/ring/rte_ring_peek.h
index c0621d12e2..2312f52668 100644
--- a/lib/ring/rte_ring_peek.h
+++ b/lib/ring/rte_ring_peek.h
@@ -43,12 +43,12 @@
  * with enqueue(/dequeue) operation till _finish_ completes.
  */
 
+#include <rte_ring_peek_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_peek_elem_pvt.h>
-
 /**
  * Start to enqueue several objects on the ring.
  * Note that no actual objects are put in the queue by this function,
diff --git a/lib/ring/rte_ring_peek_zc.h b/lib/ring/rte_ring_peek_zc.h
index 0b5e34b731..3254fe0481 100644
--- a/lib/ring/rte_ring_peek_zc.h
+++ b/lib/ring/rte_ring_peek_zc.h
@@ -67,12 +67,12 @@
  * with enqueue/dequeue operation till _finish_ completes.
  */
 
+#include <rte_ring_peek_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_peek_elem_pvt.h>
-
 /**
  * Ring zero-copy information structure.
  *
diff --git a/lib/ring/rte_ring_rts.h b/lib/ring/rte_ring_rts.h
index 50fc8f74db..d7a3863c83 100644
--- a/lib/ring/rte_ring_rts.h
+++ b/lib/ring/rte_ring_rts.h
@@ -51,12 +51,12 @@
  * By default HTD_MAX == ring.capacity / 8.
  */
 
+#include <rte_ring_rts_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_rts_elem_pvt.h>
-
 /**
  * Enqueue several objects on the RTS ring (multi-producers safe).
  *
diff --git a/lib/sched/rte_approx.h b/lib/sched/rte_approx.h
index b60086330e..738e33a98b 100644
--- a/lib/sched/rte_approx.h
+++ b/lib/sched/rte_approx.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_APPROX_H__
 #define __INCLUDE_RTE_APPROX_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Rational Approximation
@@ -20,6 +16,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Find best rational approximation
  *
diff --git a/lib/sched/rte_pie.h b/lib/sched/rte_pie.h
index 1477a47700..2a385ffdba 100644
--- a/lib/sched/rte_pie.h
+++ b/lib/sched/rte_pie.h
@@ -5,10 +5,6 @@
 #ifndef __RTE_PIE_H_INCLUDED__
 #define __RTE_PIE_H_INCLUDED__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * Proportional Integral controller Enhanced (PIE)
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_debug.h>
 #include <rte_cycles.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_DQ_THRESHOLD   16384   /**< Queue length threshold (2^14)
 				     * to start measurement cycle (bytes)
 				     */
diff --git a/lib/sched/rte_red.h b/lib/sched/rte_red.h
index afaa35fcd6..e62abb9295 100644
--- a/lib/sched/rte_red.h
+++ b/lib/sched/rte_red.h
@@ -5,10 +5,6 @@
 #ifndef __RTE_RED_H_INCLUDED__
 #define __RTE_RED_H_INCLUDED__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Random Early Detection (RED)
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_cycles.h>
 #include <rte_branch_prediction.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_RED_SCALING                     10         /**< Fraction size for fixed-point */
 #define RTE_RED_S                           (1 << 22)  /**< Packet size multiplied by number of leaf queues */
 #define RTE_RED_MAX_TH_MAX                  1023       /**< Max threshold limit in fixed point format */
diff --git a/lib/sched/rte_sched.h b/lib/sched/rte_sched.h
index b882c4a882..222e6b3583 100644
--- a/lib/sched/rte_sched.h
+++ b/lib/sched/rte_sched.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_SCHED_H__
 #define __INCLUDE_RTE_SCHED_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Hierarchical Scheduler
@@ -62,6 +58,10 @@ extern "C" {
 #include "rte_red.h"
 #include "rte_pie.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum number of queues per pipe.
  * Note that the multiple queues (power of 2) can only be assigned to
  * lowest priority (best-effort) traffic class. Other higher priority traffic
diff --git a/lib/sched/rte_sched_common.h b/lib/sched/rte_sched_common.h
index 573d164569..a5acb9c08a 100644
--- a/lib/sched/rte_sched_common.h
+++ b/lib/sched/rte_sched_common.h
@@ -5,13 +5,13 @@
 #ifndef __INCLUDE_RTE_SCHED_COMMON_H__
 #define __INCLUDE_RTE_SCHED_COMMON_H__
 
+#include <stdint.h>
+#include <sys/types.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include <sys/types.h>
-
 #if 0
 static inline uint32_t
 rte_min_pos_4_u16(uint16_t *x)
diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
index 1c8474b74f..7a9bafa0fa 100644
--- a/lib/security/rte_security.h
+++ b/lib/security/rte_security.h
@@ -12,10 +12,6 @@
  * RTE Security Common Definitions
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <sys/types.h>
 
 #include <rte_compat.h>
@@ -24,6 +20,10 @@ extern "C" {
 #include <rte_ip.h>
 #include <rte_mbuf_dyn.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** IPSec protocol mode */
 enum rte_security_ipsec_sa_mode {
 	RTE_SECURITY_IPSEC_SA_MODE_TRANSPORT = 1,
diff --git a/lib/security/rte_security_driver.h b/lib/security/rte_security_driver.h
index 9bb5052a4c..2ceb145066 100644
--- a/lib/security/rte_security_driver.h
+++ b/lib/security/rte_security_driver.h
@@ -12,13 +12,13 @@
  * RTE Security Common Definitions
  */
 
+#include <rte_compat.h>
+#include "rte_security.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_compat.h>
-#include "rte_security.h"
-
 /**
  * @internal
  * Security session to be used by library for internal usage
diff --git a/lib/stack/rte_stack.h b/lib/stack/rte_stack.h
index 3325757568..4439adfc42 100644
--- a/lib/stack/rte_stack.h
+++ b/lib/stack/rte_stack.h
@@ -15,10 +15,6 @@
 #ifndef _RTE_STACK_H_
 #define _RTE_STACK_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdalign.h>
 
 #include <rte_debug.h>
@@ -95,6 +91,10 @@ struct __rte_cache_aligned rte_stack {
 #include "rte_stack_std.h"
 #include "rte_stack_lf.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Push several objects on the stack (MT-safe).
  *
diff --git a/lib/table/rte_lru.h b/lib/table/rte_lru.h
index 88229d8632..bc1ad36500 100644
--- a/lib/table/rte_lru.h
+++ b/lib/table/rte_lru.h
@@ -5,15 +5,15 @@
 #ifndef __INCLUDE_RTE_LRU_H__
 #define __INCLUDE_RTE_LRU_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_config.h>
 #ifdef RTE_ARCH_X86_64
 #include "rte_lru_x86.h"
 #elif defined(RTE_ARCH_ARM64)
 #include "rte_lru_arm64.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 #else
 #undef RTE_TABLE_HASH_LRU_STRATEGY
 #define RTE_TABLE_HASH_LRU_STRATEGY                        1
@@ -86,8 +86,4 @@ do {									\
 
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif
diff --git a/lib/table/rte_lru_arm64.h b/lib/table/rte_lru_arm64.h
index f19b0bdb4e..f9a4678ee0 100644
--- a/lib/table/rte_lru_arm64.h
+++ b/lib/table/rte_lru_arm64.h
@@ -5,14 +5,14 @@
 #ifndef __RTE_LRU_ARM64_H__
 #define __RTE_LRU_ARM64_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_vect.h>
 #include <rte_bitops.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_TABLE_HASH_LRU_STRATEGY
 #ifdef __ARM_NEON
 #define RTE_TABLE_HASH_LRU_STRATEGY                        3
diff --git a/lib/table/rte_lru_x86.h b/lib/table/rte_lru_x86.h
index ddfb8c1c8c..93f4a136a8 100644
--- a/lib/table/rte_lru_x86.h
+++ b/lib/table/rte_lru_x86.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_LRU_X86_H__
 #define __INCLUDE_RTE_LRU_X86_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_config.h>
@@ -97,8 +93,4 @@ do {									\
 
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif
diff --git a/lib/table/rte_swx_hash_func.h b/lib/table/rte_swx_hash_func.h
index 04f3d543e7..9c65cfa913 100644
--- a/lib/table/rte_swx_hash_func.h
+++ b/lib/table/rte_swx_hash_func.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_HASH_FUNC_H__
 #define __INCLUDE_RTE_SWX_HASH_FUNC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Hash Function
@@ -15,6 +11,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Hash function prototype
  *
diff --git a/lib/table/rte_swx_keycmp.h b/lib/table/rte_swx_keycmp.h
index 09fb1be869..b0ed819307 100644
--- a/lib/table/rte_swx_keycmp.h
+++ b/lib/table/rte_swx_keycmp.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_KEYCMP_H__
 #define __INCLUDE_RTE_SWX_KEYCMP_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Key Comparison Functions
@@ -16,6 +12,10 @@ extern "C" {
 #include <stdint.h>
 #include <string.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Key comparison function prototype
  *
diff --git a/lib/table/rte_swx_table.h b/lib/table/rte_swx_table.h
index ac01e19781..3c53459498 100644
--- a/lib/table/rte_swx_table.h
+++ b/lib/table/rte_swx_table.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_H__
 #define __INCLUDE_RTE_SWX_TABLE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Table
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_swx_hash_func.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Match type. */
 enum rte_swx_table_match_type {
 	/** Wildcard Match (WM). */
diff --git a/lib/table/rte_swx_table_em.h b/lib/table/rte_swx_table_em.h
index b7423dd060..592541f01f 100644
--- a/lib/table/rte_swx_table_em.h
+++ b/lib/table/rte_swx_table_em.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_EM_H__
 #define __INCLUDE_RTE_SWX_TABLE_EM_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Exact Match Table
@@ -16,6 +12,10 @@ extern "C" {
 
 #include <rte_swx_table.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Exact match table operations - unoptimized. */
 extern struct rte_swx_table_ops rte_swx_table_exact_match_unoptimized_ops;
 
diff --git a/lib/table/rte_swx_table_learner.h b/lib/table/rte_swx_table_learner.h
index c5ea015b8d..9a18be083d 100644
--- a/lib/table/rte_swx_table_learner.h
+++ b/lib/table/rte_swx_table_learner.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_LEARNER_H__
 #define __INCLUDE_RTE_SWX_TABLE_LEARNER_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Learner Table
@@ -53,6 +49,10 @@ extern "C" {
 
 #include "rte_swx_hash_func.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum number of key timeout values per learner table. */
 #ifndef RTE_SWX_TABLE_LEARNER_N_KEY_TIMEOUTS_MAX
 #define RTE_SWX_TABLE_LEARNER_N_KEY_TIMEOUTS_MAX 16
diff --git a/lib/table/rte_swx_table_selector.h b/lib/table/rte_swx_table_selector.h
index 05863cc90b..ef29bdb6b0 100644
--- a/lib/table/rte_swx_table_selector.h
+++ b/lib/table/rte_swx_table_selector.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_SELECTOR_H__
 #define __INCLUDE_RTE_SWX_TABLE_SELECTOR_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Selector Table
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_swx_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Selector table creation parameters. */
 struct rte_swx_table_selector_params {
 	/** Group ID offset. */
diff --git a/lib/table/rte_swx_table_wm.h b/lib/table/rte_swx_table_wm.h
index 4fd52c0a17..7eb6f8e2a6 100644
--- a/lib/table/rte_swx_table_wm.h
+++ b/lib/table/rte_swx_table_wm.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_WM_H__
 #define __INCLUDE_RTE_SWX_TABLE_WM_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Wildcard Match Table
@@ -16,6 +12,10 @@ extern "C" {
 
 #include <rte_swx_table.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Wildcard match table operations. */
 extern struct rte_swx_table_ops rte_swx_table_wildcard_match_ops;
 
diff --git a/lib/table/rte_table.h b/lib/table/rte_table.h
index 9a5faf0e32..43a5a1a7b3 100644
--- a/lib/table/rte_table.h
+++ b/lib/table/rte_table.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_H__
 #define __INCLUDE_RTE_TABLE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table
@@ -27,6 +23,10 @@ extern "C" {
 #include <stdint.h>
 #include <rte_port.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_mbuf;
 
 /** Lookup table statistics */
diff --git a/lib/table/rte_table_acl.h b/lib/table/rte_table_acl.h
index 1cb7b9fbbd..61af7b88e4 100644
--- a/lib/table/rte_table_acl.h
+++ b/lib/table/rte_table_acl.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_ACL_H__
 #define __INCLUDE_RTE_TABLE_ACL_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table ACL
@@ -25,6 +21,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ACL table parameters */
 struct rte_table_acl_params {
 	/** Name */
diff --git a/lib/table/rte_table_array.h b/lib/table/rte_table_array.h
index fad83b0588..b2a7b95d68 100644
--- a/lib/table/rte_table_array.h
+++ b/lib/table/rte_table_array.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_ARRAY_H__
 #define __INCLUDE_RTE_TABLE_ARRAY_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Array
@@ -20,6 +16,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Array table parameters */
 struct rte_table_array_params {
 	/** Number of array entries. Has to be a power of two. */
diff --git a/lib/table/rte_table_hash.h b/lib/table/rte_table_hash.h
index 6698621dae..ff8fc9e9ce 100644
--- a/lib/table/rte_table_hash.h
+++ b/lib/table/rte_table_hash.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_HASH_H__
 #define __INCLUDE_RTE_TABLE_HASH_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Hash
@@ -52,6 +48,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Hash function */
 typedef uint64_t (*rte_table_hash_op_hash)(
 	void *key,
diff --git a/lib/table/rte_table_hash_cuckoo.h b/lib/table/rte_table_hash_cuckoo.h
index 3a55d28e9b..55aa12216a 100644
--- a/lib/table/rte_table_hash_cuckoo.h
+++ b/lib/table/rte_table_hash_cuckoo.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_HASH_CUCKOO_H__
 #define __INCLUDE_RTE_TABLE_HASH_CUCKOO_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Hash Cuckoo
@@ -20,6 +16,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Hash table parameters */
 struct rte_table_hash_cuckoo_params {
 	/** Name */
diff --git a/lib/table/rte_table_hash_func.h b/lib/table/rte_table_hash_func.h
index aa779c2182..cba7ec4c20 100644
--- a/lib/table/rte_table_hash_func.h
+++ b/lib/table/rte_table_hash_func.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_HASH_FUNC_H__
 #define __INCLUDE_RTE_TABLE_HASH_FUNC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -18,6 +14,10 @@ extern "C" {
 
 #include <x86intrin.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline uint64_t
 rte_crc32_u64(uint64_t crc, uint64_t v)
 {
@@ -28,6 +28,10 @@ rte_crc32_u64(uint64_t crc, uint64_t v)
 #include "rte_table_hash_func_arm64.h"
 #else
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline uint64_t
 rte_crc32_u64(uint64_t crc, uint64_t v)
 {
diff --git a/lib/table/rte_table_lpm.h b/lib/table/rte_table_lpm.h
index dde32deed9..59b9bdee89 100644
--- a/lib/table/rte_table_lpm.h
+++ b/lib/table/rte_table_lpm.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_LPM_H__
 #define __INCLUDE_RTE_TABLE_LPM_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table LPM for IPv4
@@ -45,6 +41,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** LPM table parameters */
 struct rte_table_lpm_params {
 	/** Table name */
diff --git a/lib/table/rte_table_lpm_ipv6.h b/lib/table/rte_table_lpm_ipv6.h
index 96ddbd32c2..166a5ba9ee 100644
--- a/lib/table/rte_table_lpm_ipv6.h
+++ b/lib/table/rte_table_lpm_ipv6.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_LPM_IPV6_H__
 #define __INCLUDE_RTE_TABLE_LPM_IPV6_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table LPM for IPv6
@@ -45,6 +41,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_LPM_IPV6_ADDR_SIZE 16
 
 /** LPM table parameters */
diff --git a/lib/table/rte_table_stub.h b/lib/table/rte_table_stub.h
index 846526ea99..f7e589df16 100644
--- a/lib/table/rte_table_stub.h
+++ b/lib/table/rte_table_stub.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_STUB_H__
 #define __INCLUDE_RTE_TABLE_STUB_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Stub
@@ -18,6 +14,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Stub table parameters: NONE */
 
 /** Stub table operations */
diff --git a/lib/telemetry/rte_telemetry.h b/lib/telemetry/rte_telemetry.h
index cab9daa6fe..463819e2bf 100644
--- a/lib/telemetry/rte_telemetry.h
+++ b/lib/telemetry/rte_telemetry.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_TELEMETRY_H_
 #define _RTE_TELEMETRY_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_compat.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum length for string used in object. */
 #define RTE_TEL_MAX_STRING_LEN 128
 /** Maximum length of string. */
diff --git a/lib/vhost/rte_vdpa.h b/lib/vhost/rte_vdpa.h
index 6ac85d1bbf..18e273c20f 100644
--- a/lib/vhost/rte_vdpa.h
+++ b/lib/vhost/rte_vdpa.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_VDPA_H_
 #define _RTE_VDPA_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -17,6 +13,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum name length for statistics counters */
 #define RTE_VDPA_STATS_NAME_SIZE 64
 
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index b0434c4b8d..c7a5f56df8 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -18,10 +18,6 @@
 #include <rte_memory.h>
 #include <rte_mempool.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifndef __cplusplus
 /* These are not C++-aware. */
 #include <linux/vhost.h>
@@ -29,6 +25,10 @@ extern "C" {
 #include <linux/virtio_net.h>
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_VHOST_USER_CLIENT		(1ULL << 0)
 #define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
 #define RTE_VHOST_USER_RESERVED_1	(1ULL << 2)
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 8f190dd44b..60995e4e62 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -5,15 +5,15 @@
 #ifndef _RTE_VHOST_ASYNC_H_
 #define _RTE_VHOST_ASYNC_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
 #include <rte_mbuf.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Register an async channel for a vhost queue
  *
diff --git a/lib/vhost/rte_vhost_crypto.h b/lib/vhost/rte_vhost_crypto.h
index f962a53818..af61f0907e 100644
--- a/lib/vhost/rte_vhost_crypto.h
+++ b/lib/vhost/rte_vhost_crypto.h
@@ -5,12 +5,12 @@
 #ifndef _VHOST_CRYPTO_H_
 #define _VHOST_CRYPTO_H_
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /* pre-declare structs to avoid including full headers */
 struct rte_mempool;
 struct rte_crypto_op;
diff --git a/lib/vhost/vdpa_driver.h b/lib/vhost/vdpa_driver.h
index 8db4ab9f4d..42392a0d14 100644
--- a/lib/vhost/vdpa_driver.h
+++ b/lib/vhost/vdpa_driver.h
@@ -5,10 +5,6 @@
 #ifndef _VDPA_DRIVER_H_
 #define _VDPA_DRIVER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 
 #include <rte_compat.h>
@@ -16,6 +12,10 @@ extern "C" {
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_VHOST_QUEUE_ALL UINT16_MAX
 
 /**
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v4 2/6] eal: extend bit manipulation functionality
  2024-09-09 14:57                                       ` [PATCH v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-09-09 14:57                                         ` [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
@ 2024-09-09 14:57                                         ` Mattias Rönnblom
  2024-09-09 14:57                                         ` [PATCH v4 3/6] eal: add unit tests for bit operations Mattias Rönnblom
                                                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-09 14:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Mattias Rönnblom

Add functionality to test and modify the value of individual bits in
32-bit or 64-bit words.

These functions have no implications for memory ordering or atomicity.
They do not use volatile, and thus do not prevent any compiler
optimizations.
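
As a rough illustration of the type-based dispatch described above, the
following self-contained sketch mimics the pattern with C11 _Generic.
All demo_* names are hypothetical stand-ins invented for this example;
they are not the symbols added by the patch, and the real implementation
is generated by macros in <rte_bitops.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-size helpers (assumption: not the DPDK symbols). */
static inline bool
demo_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr & ((uint32_t)1 << nr)) != 0;
}

static inline bool
demo_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return (*addr & ((uint64_t)1 << nr)) != 0;
}

static inline void
demo_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	assert(nr < 32);
	uint32_t mask = (uint32_t)1 << nr;
	*addr = value ? (*addr | mask) : (*addr & ~mask);
}

static inline void
demo_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
{
	assert(nr < 64);
	uint64_t mask = (uint64_t)1 << nr;
	*addr = value ? (*addr | mask) : (*addr & ~mask);
}

/* The pointer type of 'addr' selects the 32-bit or 64-bit helper. */
#define demo_bit_test(addr, nr)					\
	_Generic((addr),					\
		uint32_t *: demo_bit_test32,			\
		const uint32_t *: demo_bit_test32,		\
		uint64_t *: demo_bit_test64,			\
		const uint64_t *: demo_bit_test64)(addr, nr)

#define demo_bit_assign(addr, nr, value)			\
	_Generic((addr),					\
		uint32_t *: demo_bit_assign32,			\
		uint64_t *: demo_bit_assign64)(addr, nr, value)

/* Exercise both word sizes, including a test through a const pointer. */
static inline bool
demo_usage(void)
{
	uint64_t w64 = 0;
	uint32_t w32 = 0;
	const uint64_t *ro = &w64;

	demo_bit_assign(&w64, 40, true);
	demo_bit_assign(&w32, 3, true);
	demo_bit_assign(&w32, 3, false);

	return demo_bit_test(ro, 40) && !demo_bit_test(&w32, 3) && w32 == 0;
}
```

Plain loads and stores are used throughout the sketch, so the compiler
remains free to merge or reorder the accesses, which matches the "no
ordering, no atomicity" contract stated above.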

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

PATCH v3:
 * Remove unnecessary <rte_compat.h> include.
 * Remove redundant 'fun' parameter from the __RTE_GEN_BIT_*() macros
   (Jack Bond-Preston).
 * Introduce __RTE_GEN_BIT_OPS() macro, consistent with how things
   are done when generating the atomic bit operations.
 * Refer to volatile bit op functions as variants instead of families
   (macro parameter naming).

RFC v6:
 * Have rte_bit_test() accept const-marked bitsets.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.
---
 lib/eal/include/rte_bitops.h | 260 ++++++++++++++++++++++++++++++++++-
 1 file changed, 258 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..6915b945ba 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,197 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		uint32_t *: __rte_bit_test32,			\
+		const uint32_t *: __rte_bit_test32,		\
+		uint64_t *: __rte_bit_test64,			\
+		const uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## variant ## test ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## set ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}								\
+
+#define __RTE_GEN_BIT_CLEAR(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## clear ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}								\
+
+#define __RTE_GEN_BIT_ASSIGN(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## assign ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## variant ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## variant ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## flip ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## variant ## test ## size(addr, nr); \
+		__rte_bit_ ## variant ## assign ## size(addr, nr, !value); \
+	}
+
+#define __RTE_GEN_BIT_OPS(v, qualifier, size)	\
+	__RTE_GEN_BIT_TEST(v, qualifier, size)	\
+	__RTE_GEN_BIT_SET(v, qualifier, size)	\
+	__RTE_GEN_BIT_CLEAR(v, qualifier, size)	\
+	__RTE_GEN_BIT_ASSIGN(v, qualifier, size)	\
+	__RTE_GEN_BIT_FLIP(v, qualifier, size)
+
+#define __RTE_GEN_BIT_OPS_SIZE(size) \
+	__RTE_GEN_BIT_OPS(,, size)
+
+__RTE_GEN_BIT_OPS_SIZE(32)
+__RTE_GEN_BIT_OPS_SIZE(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +981,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v4 3/6] eal: add unit tests for bit operations
  2024-09-09 14:57                                       ` [PATCH v4 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-09-09 14:57                                         ` [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
  2024-09-09 14:57                                         ` [PATCH v4 2/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-09-09 14:57                                         ` Mattias Rönnblom
  2024-09-09 14:57                                         ` [PATCH v4 4/6] eal: add atomic " Mattias Rönnblom
                                                           ` (2 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-09 14:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Mattias Rönnblom

Extend bitops tests to cover the rte_bit_[test|set|clear|assign|flip]()
functions.

The tests are converted to use the test suite runner framework.
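
The approach taken by the new test cases can be sketched roughly as
follows: a word is rebuilt bit by bit from a reference value, each bit
is verified, flipped, verified again and flipped back, and finally the
whole word is compared against the reference. The demo_* helpers below
are hypothetical stand-ins for the functions under test, and fixed
inputs replace the random values the actual test draws from rte_rand().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the bit operations under test. */
static inline void
demo_assign64(uint64_t *word, unsigned int nr, bool value)
{
	uint64_t mask = (uint64_t)1 << nr;
	*word = value ? (*word | mask) : (*word & ~mask);
}

static inline bool
demo_test64(const uint64_t *word, unsigned int nr)
{
	return (*word >> nr) & 1;
}

static inline void
demo_flip64(uint64_t *word, unsigned int nr)
{
	*word ^= (uint64_t)1 << nr;
}

/*
 * Rebuild 'reference' in 'word' one bit at a time. Each bit is checked
 * after assignment, checked again after a flip, and then restored.
 * Returns true if every check passes and the final word equals the
 * reference.
 */
static inline bool
demo_check_bit_access64(uint64_t reference, uint64_t word)
{
	unsigned int nr;

	for (nr = 0; nr < 64; nr++) {
		bool ref_bit = (reference >> nr) & 1;

		demo_assign64(&word, nr, ref_bit);
		if (demo_test64(&word, nr) != ref_bit)
			return false;

		demo_flip64(&word, nr);
		if (demo_test64(&word, nr) == ref_bit)
			return false;

		demo_flip64(&word, nr);
	}

	return word == reference;
}
```

Starting from an arbitrary initial word, rather than zero, means both
the set and the clear paths get exercised regardless of the reference
value.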

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

RFC v6:
 * Test rte_bit_*test() usage through const pointers.

RFC v4:
 * Remove redundant line continuations.
---
 app/test/test_bitops.c | 85 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..322f58c066 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,68 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %d had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+									\
+			const uint ## size ## _t *const_ptr = &word;	\
+			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
+				    reference_bit,			\
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %d had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +172,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1



* [PATCH v4 4/6] eal: add atomic bit operations
  2024-09-09 14:57                                       ` [PATCH v4 0/6] Improve EAL bit operations API Mattias Rönnblom
                                                           ` (2 preceding siblings ...)
  2024-09-09 14:57                                         ` [PATCH v4 3/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-09-09 14:57                                         ` Mattias Rönnblom
  2024-09-09 14:57                                         ` [PATCH v4 5/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  2024-09-09 14:57                                         ` [PATCH v4 6/6] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-09 14:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign/flip functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
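
As a rough illustration of the shape of these functions, the sketch
below mirrors the fetch-or based test-and-set from this patch using
plain C11 <stdatomic.h> instead of <rte_stdatomic.h>. The name
sketch_bit_atomic_test_and_set32() is hypothetical, and unlike the
patch (which takes a non-atomic pointer and casts it), the sketch
takes an _Atomic pointer directly to stay within standard C.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the generated 32-bit test-and-set. */
static inline bool
sketch_bit_atomic_test_and_set32(_Atomic uint32_t *addr, unsigned int nr,
				 memory_order order)
{
	assert(nr < 32);

	uint32_t mask = (uint32_t)1 << nr;
	/* A single read-modify-write; the old word is returned. */
	uint32_t prev = atomic_fetch_or_explicit(addr, mask, order);

	/* Non-zero converts to true: the bit was already set. */
	return prev & mask;
}
```

A caller picks the ordering per call site, for example
memory_order_relaxed for a statistics flag or memory_order_acquire
when the bit guards other data.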

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

PATCH v3:
 * Introduce __RTE_GEN_BIT_ATOMIC_*() 'qualifier' argument already in
   this patch (Jack Bond-Preston).
 * Refer to volatile bit op functions as variants instead of families
   (macro parameter naming).
 * Update release notes.

PATCH:
 * Add missing macro #undef for C++ version of atomic bit flip.

RFC v7:
 * Replace compare-exchange-based rte_bit_atomic_test_and_*() and
   flip() with implementations that use the previous value as returned
   by the atomic fetch function.
 * Reword documentation to match the non-atomic macro variants.
 * Remove pointer to <rte_stdatomic.h> for memory model documentation,
   since there is no documentation for that API.
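
To make the first RFC v7 item concrete, here is a sketch of the two
styles for test-and-clear, written against C11 <stdatomic.h>. Both
function names are hypothetical, and the compare-exchange loop is only
a guess at what such a variant typically looks like; the earlier
revision's actual code is not shown in this thread excerpt.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Compare-exchange style: retry until the updated word is stored. */
static inline bool
sketch_test_and_clear_cas(_Atomic uint64_t *addr, unsigned int nr,
			  memory_order order)
{
	assert(nr < 64);

	uint64_t mask = (uint64_t)1 << nr;
	uint64_t prev = atomic_load_explicit(addr, memory_order_relaxed);

	/* On failure, prev is refreshed with the current word. */
	while (!atomic_compare_exchange_weak_explicit(addr, &prev,
						      prev & ~mask, order,
						      memory_order_relaxed))
		;

	return prev & mask;
}

/* Fetch style: one read-modify-write whose result is the old word. */
static inline bool
sketch_test_and_clear_fetch(_Atomic uint64_t *addr, unsigned int nr,
			    memory_order order)
{
	assert(nr < 64);

	uint64_t mask = (uint64_t)1 << nr;
	uint64_t prev = atomic_fetch_and_explicit(addr, ~mask, order);

	return prev & mask;
}
```

Both return the previous state of the bit; the fetch variant needs no
retry loop.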

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
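
For readers unfamiliar with the C side of this, the sketch below shows
the kind of _Generic dispatch the public macros rely on: the pointer
type of the first argument selects a size-specific function. All
sketch_* names are hypothetical and the functions are non-atomic
stand-ins. C++ has no _Generic, which is why the patch falls back to
function overloads there.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & ((uint32_t)1 << nr);
}

static inline bool
sketch_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return *addr & ((uint64_t)1 << nr);
}

/* const and non-const pointers are distinct types, so both are listed. */
#define sketch_bit_test(addr, nr)					\
	_Generic((addr),						\
		 uint32_t *: sketch_bit_test32,				\
		 const uint32_t *: sketch_bit_test32,			\
		 uint64_t *: sketch_bit_test64,				\
		 const uint64_t *: sketch_bit_test64)(addr, nr)
```

Passing any other pointer type (say, uint16_t *) fails at compile
time, since no association matches and there is no default branch.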

RFC v2:
 * Add rte_bit_atomic_test_and_assign() (for consistency).
 * Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 * Use <rte_stdatomic.h> to support MSVC.
---
 doc/guides/rel_notes/release_24_11.rst |  17 +
 lib/eal/include/rte_bitops.h           | 415 +++++++++++++++++++++++++
 2 files changed, 432 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..3111b1e4c0 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -56,6 +56,23 @@ New Features
      =======================================================
 
 
+* **Extended bit operations API.**
+
+  The support for bit-level operations on single 32- and 64-bit words
+  in <rte_bitops.h> has been extended with two families of
+  semantically well-defined functions.
+
+  rte_bit_[test|set|clear|assign|flip]() functions provide excellent
+  performance (by avoiding restricting the compiler and CPU), but give
+  no guarantees with regard to memory ordering or atomicity.
+
+  rte_bit_atomic_*() functions provide atomic bit-level operations,
+  including the possibility of specifying memory ordering constraints.
+
+  The new public API elements are polymorphic, using the _Generic-
+  based macros (for C) and function overloading (in C++ translation
+  units).
+
 Removed Items
 -------------
 
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 6915b945ba..3ad6795fd1 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -226,6 +227,204 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '1', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '0', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in the
+ * word pointed to by @c addr to the value indicated by @c value, with
+ * the memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Generic selection macro to atomically negate the value of the bit
+ * specified by @c nr in the word pointed to by @c addr (changing
+ * '0' to '1' and '1' to '0'), with the memory ordering as specified
+ * with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Generic selection macro to atomically test and set bit specified by
+ * @c nr in the word pointed to by @c addr to '1', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Generic selection macro to atomically test and clear bit specified
+ * by @c nr in the word pointed to by @c addr to '0', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Generic selection macro to atomically test and assign bit specified
+ * by @c nr in the word pointed to by @c addr to the value specified
+ * by @c value, with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
 	__rte_experimental						\
 	static inline bool						\
@@ -299,6 +498,146 @@ extern "C" {
 __RTE_GEN_BIT_OPS_SIZE(32)
 __RTE_GEN_BIT_OPS_SIZE(64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test ## size(const qualifier uint ## size ## _t *addr, \
+						     unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr = \
+			(const qualifier RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## set ## size(qualifier uint ## size ## _t *addr, \
+					      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## clear ## size(qualifier uint ## size ## _t *addr,	\
+						unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## flip ## size(qualifier uint ## size ## _t *addr, \
+					       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_## variant ## assign ## size(qualifier uint ## size ## _t *addr, \
+						unsigned int nr, bool value, \
+						int memory_order)	\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_ ## variant ## set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_ ## variant ## clear ## size(addr, nr, \
+								     memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_set ## size(qualifier uint ## size ## _t *addr, \
+						       unsigned int nr,	\
+						       int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_or_explicit(a_addr, mask,	\
+						    memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_clear ## size(qualifier uint ## size ## _t *addr, \
+							 unsigned int nr, \
+							 int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_and_explicit(a_addr, ~mask,	\
+						     memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_assign ## size(qualifier uint ## size ## _t *addr, \
+							  unsigned int nr, \
+							  bool value,	\
+							  int memory_order) \
+	{								\
+		if (value)						\
+			return __rte_bit_atomic_ ## variant ## test_and_set ## size(addr, nr, memory_order); \
+		else							\
+			return __rte_bit_atomic_ ## variant ## test_and_clear ## size(addr, nr, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_SET(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)
+
+#define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(,, size)
+
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -994,6 +1333,15 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_flip
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1037,12 +1385,79 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [PATCH v4 5/6] eal: add unit tests for atomic bit access functions
  2024-09-09 14:57                                       ` [PATCH v4 0/6] Improve EAL bit operations API Mattias Rönnblom
                                                           ` (3 preceding siblings ...)
  2024-09-09 14:57                                         ` [PATCH v4 4/6] eal: add atomic " Mattias Rönnblom
@ 2024-09-09 14:57                                         ` Mattias Rönnblom
  2024-09-09 14:57                                         ` [PATCH v4 6/6] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-09 14:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Mattias Rönnblom

Extend bitops tests to cover the rte_bit_atomic_*() family of
functions.
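
The parallel flip test in this patch rests on a simple invariant: if
every flip is atomic, the final bit value equals the parity of the
total number of flips, and no other bit changes. The sketch below
shows that check with C11 <stdatomic.h>. All sketch_* names are
hypothetical, and the workers are run one after the other here,
whereas the real test launches them concurrently on two lcores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

struct sketch_flip_worker {
	_Atomic uint32_t *word;	/* shared word */
	unsigned int bit;	/* bit index to flip */
	uint64_t flips;		/* flips performed by this worker */
};

/* Flip the worker's bit 'count' times, counting each flip. */
static void
sketch_run_flips(struct sketch_flip_worker *w, uint64_t count)
{
	assert(w->bit < 32);

	for (uint64_t i = 0; i < count; i++) {
		atomic_fetch_xor_explicit(w->word, (uint32_t)1 << w->bit,
					  memory_order_relaxed);
		w->flips++;
	}
}

/* An odd total leaves the bit set; all other bits must remain zero. */
static bool
sketch_flip_result_ok(uint32_t word, unsigned int bit, uint64_t total_flips)
{
	uint32_t expected = (total_flips % 2) ? (uint32_t)1 << bit : 0;

	return word == expected;
}
```

With a non-atomic read-modify-write, concurrent workers could lose
updates and the parity check would eventually fail, which is what the
test is designed to detect.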

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.
---
 app/test/test_bitops.c | 313 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 312 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 322f58c066..b80216a0a1 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -61,6 +64,304 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign)				\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -177,6 +478,16 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1



* [PATCH v4 6/6] eal: extend bitops to handle volatile pointers
  2024-09-09 14:57                                       ` [PATCH v4 0/6] Improve EAL bit operations API Mattias Rönnblom
                                                           ` (4 preceding siblings ...)
  2024-09-09 14:57                                         ` [PATCH v4 5/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
@ 2024-09-09 14:57                                         ` Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-09 14:57 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Mattias Rönnblom

Have rte_bit_[test|set|clear|assign|flip]() and rte_bit_atomic_*()
handle volatile-marked pointers.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

PATCH v3:
 * Updated to reflect removed 'fun' parameter in __RTE_GEN_BIT_*()
   (Jack Bond-Preston).

PATCH v2:
 * Actually run the test_bit_atomic_v_access*() test functions.
---
 app/test/test_bitops.c       |  32 +++-
 lib/eal/include/rte_bitops.h | 301 +++++++++++++++++++++++------------
 2 files changed, 222 insertions(+), 111 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index b80216a0a1..10e87f6776 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -14,13 +14,13 @@
 #include "test.h"
 
 #define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
-			    flip_fun, test_fun, size)			\
+			    flip_fun, test_fun, size, mod)		\
 	static int							\
 	test_name(void)							\
 	{								\
 		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
 		unsigned int bit_nr;					\
-		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+		mod uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
 									\
 		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
 			bool reference_bit = (reference >> bit_nr) & 1;	\
@@ -41,7 +41,7 @@
 				    "Bit %d had unflipped value", bit_nr); \
 			flip_fun(&word, bit_nr);			\
 									\
-			const uint ## size ## _t *const_ptr = &word;	\
+			const mod uint ## size ## _t *const_ptr = &word; \
 			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
 				    reference_bit,			\
 				    "Bit %d had unexpected value", bit_nr); \
@@ -59,10 +59,16 @@
 	}
 
 GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64, volatile)
 
 #define bit_atomic_set(addr, nr)				\
 	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
@@ -81,11 +87,19 @@ GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 32)
+		    bit_atomic_flip, bit_atomic_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 64)
+		    bit_atomic_flip, bit_atomic_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64, volatile)
 
 #define PARALLEL_TEST_RUNTIME 0.25
 
@@ -480,8 +494,12 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_v_access32),
+		TEST_CASE(test_bit_v_access64),
 		TEST_CASE(test_bit_atomic_access32),
 		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_v_access32),
+		TEST_CASE(test_bit_atomic_v_access64),
 		TEST_CASE(test_bit_atomic_parallel_assign32),
 		TEST_CASE(test_bit_atomic_parallel_assign64),
 		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3ad6795fd1..d7a07c4099 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -127,12 +127,16 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_test(addr, nr)					\
-	_Generic((addr),					\
-		uint32_t *: __rte_bit_test32,			\
-		const uint32_t *: __rte_bit_test32,		\
-		uint64_t *: __rte_bit_test64,			\
-		const uint64_t *: __rte_bit_test64)(addr, nr)
+#define rte_bit_test(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_test32,				\
+		 const uint32_t *: __rte_bit_test32,			\
+		 volatile uint32_t *: __rte_bit_v_test32,		\
+		 const volatile uint32_t *: __rte_bit_v_test32,		\
+		 uint64_t *: __rte_bit_test64,				\
+		 const uint64_t *: __rte_bit_test64,			\
+		 volatile uint64_t *: __rte_bit_v_test64,		\
+		 const volatile uint64_t *: __rte_bit_v_test64)(addr, nr)
 
 /**
  * @warning
@@ -152,10 +156,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_set(addr, nr)				\
-	_Generic((addr),				\
-		 uint32_t *: __rte_bit_set32,		\
-		 uint64_t *: __rte_bit_set64)(addr, nr)
+#define rte_bit_set(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_set32,				\
+		 volatile uint32_t *: __rte_bit_v_set32,		\
+		 uint64_t *: __rte_bit_set64,				\
+		 volatile uint64_t *: __rte_bit_v_set64)(addr, nr)
 
 /**
  * @warning
@@ -175,10 +181,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_clear(addr, nr)					\
-	_Generic((addr),					\
-		 uint32_t *: __rte_bit_clear32,			\
-		 uint64_t *: __rte_bit_clear64)(addr, nr)
+#define rte_bit_clear(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_clear32,				\
+		 volatile uint32_t *: __rte_bit_v_clear32,		\
+		 uint64_t *: __rte_bit_clear64,				\
+		 volatile uint64_t *: __rte_bit_v_clear64)(addr, nr)
 
 /**
  * @warning
@@ -202,7 +210,9 @@ extern "C" {
 #define rte_bit_assign(addr, nr, value)					\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_assign32,			\
-		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+		 volatile uint32_t *: __rte_bit_v_assign32,		\
+		 uint64_t *: __rte_bit_assign64,			\
+		 volatile uint64_t *: __rte_bit_v_assign64)(addr, nr, value)
 
 /**
  * @warning
@@ -225,7 +235,9 @@ extern "C" {
 #define rte_bit_flip(addr, nr)						\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_flip32,				\
-		 uint64_t *: __rte_bit_flip64)(addr, nr)
+		 volatile uint32_t *: __rte_bit_v_flip32,		\
+		 uint64_t *: __rte_bit_flip64,				\
+		 volatile uint64_t *: __rte_bit_v_flip64)(addr, nr)
 
 /**
  * @warning
@@ -250,9 +262,13 @@ extern "C" {
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test32,			\
 		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 volatile uint32_t *: __rte_bit_atomic_v_test32,	\
+		 const volatile uint32_t *: __rte_bit_atomic_v_test32,	\
 		 uint64_t *: __rte_bit_atomic_test64,			\
-		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
-							    memory_order)
+		 const uint64_t *: __rte_bit_atomic_test64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test64,	\
+		 const volatile uint64_t *: __rte_bit_atomic_v_test64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -274,7 +290,10 @@ extern "C" {
 #define rte_bit_atomic_set(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_set32,			\
-		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_set32,		\
+		 uint64_t *: __rte_bit_atomic_set64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_set64)(addr, nr, \
+								memory_order)
 
 /**
  * @warning
@@ -296,7 +315,10 @@ extern "C" {
 #define rte_bit_atomic_clear(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_clear32,			\
-		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_clear32,	\
+		 uint64_t *: __rte_bit_atomic_clear64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_clear64)(addr, nr, \
+								  memory_order)
 
 /**
  * @warning
@@ -320,8 +342,11 @@ extern "C" {
 #define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_assign32,			\
-		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
-							memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_assign32,	\
+		 uint64_t *: __rte_bit_atomic_assign64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_assign64)(addr, nr, \
+								   value, \
+								   memory_order)
 
 /**
  * @warning
@@ -344,7 +369,10 @@ extern "C" {
 #define rte_bit_atomic_flip(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_flip32,			\
-		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_flip32,	\
+		 uint64_t *: __rte_bit_atomic_flip64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_flip64)(addr, nr, \
+								 memory_order)
 
 /**
  * @warning
@@ -368,8 +396,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
-							      memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_set32, \
+		 uint64_t *: __rte_bit_atomic_test_and_set64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_set64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -393,8 +423,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
-								memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_clear32, \
+		 uint64_t *: __rte_bit_atomic_test_and_clear64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_clear64) \
+						       (addr, nr, memory_order)
 
 /**
  * @warning
@@ -421,9 +453,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
-		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
-								 value, \
-								 memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_assign32, \
+		 uint64_t *: __rte_bit_atomic_test_and_assign64,	\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_assign64) \
+						(addr, nr, value, memory_order)
 
 #define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
 	__rte_experimental						\
@@ -493,7 +526,8 @@ extern "C" {
 	__RTE_GEN_BIT_FLIP(v, qualifier, size)
 
 #define __RTE_GEN_BIT_OPS_SIZE(size) \
-	__RTE_GEN_BIT_OPS(,, size)
+	__RTE_GEN_BIT_OPS(,, size) \
+	__RTE_GEN_BIT_OPS(v_, volatile, size)
 
 __RTE_GEN_BIT_OPS_SIZE(32)
 __RTE_GEN_BIT_OPS_SIZE(64)
@@ -633,7 +667,8 @@ __RTE_GEN_BIT_OPS_SIZE(64)
 	__RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)
 
 #define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
-	__RTE_GEN_BIT_ATOMIC_OPS(,, size)
+	__RTE_GEN_BIT_ATOMIC_OPS(,, size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(v_, volatile, size)
 
 __RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
 __RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
@@ -1342,120 +1377,178 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_atomic_test_and_clear
 #undef rte_bit_atomic_test_and_assign
 
-#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+#define __RTE_BIT_OVERLOAD_V_2(family, v, fun, c, size, arg1_type, arg1_name) \
 	static inline void						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
-			arg1_type arg1_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+#define __RTE_BIT_OVERLOAD_SZ_2(family, fun, c, size, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_V_2(family,, fun, c, size, arg1_type,	\
+			       arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2(family, v_, fun, c volatile, size, \
+			       arg1_type, arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name)				\
+#define __RTE_BIT_OVERLOAD_2(family, fun, c, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_V_2R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
 			arg1_type arg1_name)				\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family, v_, fun, c volatile,		\
+				size, ret_type, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_2R(family, fun, c, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 32, ret_type, arg1_type, \
 				 arg1_name)				\
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 64, ret_type, arg1_type, \
 				 arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name)			\
+#define __RTE_BIT_OVERLOAD_V_3(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3(family, fun, c, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family,, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family, v_, fun, c volatile, size, arg1_type, \
+			       arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_3(family, fun, c, arg1_type, arg1_name, arg2_type, \
 			     arg2_name)					\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 32, arg1_type, arg1_name, \
 				arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)	\
+#define __RTE_BIT_OVERLOAD_V_3R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name)	\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name)	\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)
+	__RTE_BIT_OVERLOAD_V_3R(family,, fun, c, size, ret_type, \
+				arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_V_3R(family, v_, fun, c volatile, size, \
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name) \
+#define __RTE_BIT_OVERLOAD_3R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 64, ret_type, \
+				 arg1_type, arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_V_4(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name, arg3_type,	arg3_name) \
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
-					  arg3_name);		      \
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name,	\
+							 arg3_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
-			     arg2_name, arg3_type, arg3_name)		\
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+#define __RTE_BIT_OVERLOAD_SZ_4(family, fun, c, size, arg1_type, arg1_name, \
 				arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name)
-
-#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family,, fun, c, size, arg1_type,	\
+			       arg1_name, arg2_type, arg2_name, arg3_type, \
+			       arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family, v_, fun, c volatile, size,	\
+			       arg1_type, arg1_name, arg2_type, arg2_name, \
+			       arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4(family, fun, c, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 32, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 64, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)
+
+#define __RTE_BIT_OVERLOAD_V_4R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
-						 arg3_name);		\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name, \
+								arg3_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name, arg3_type, \
 				 arg3_name)				\
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)
-
-__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
-__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
-
-__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+	__RTE_BIT_OVERLOAD_V_4R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4R(family, v_, fun, c volatile, size,	\
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)			\
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 64, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)
+
+__RTE_BIT_OVERLOAD_2R(, test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(, assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(, flip,, unsigned int, nr)
+
+__RTE_BIT_OVERLOAD_3R(atomic_, test, const, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+__RTE_BIT_OVERLOAD_3(atomic_, set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_, clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_, assign,, unsigned int, nr, bool, value,
 		     int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3(atomic_, flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_set,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_clear,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_4R(atomic_, test_and_assign,, bool, unsigned int, nr,
 		      bool, value, int, memory_order)
 
 #endif
-- 
2.34.1



* RE: [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-09 14:57                                         ` [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
@ 2024-09-09 16:43                                           ` Morten Brørup
  2024-09-10  0:50                                           ` fengchengwen
  2024-09-10  6:20                                           ` [PATCH v5 0/6] Improve EAL bit operations API Mattias Rönnblom
  2 siblings, 0 replies; 160+ messages in thread
From: Morten Brørup @ 2024-09-09 16:43 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Jack Bond-Preston, David Marchand

> From: Mattias Rönnblom [mailto:mattias.ronnblom@ericsson.com]
> Sent: Monday, 9 September 2024 16.58
> 
> Ensure that 'extern "C" { /../ }' does not cover files included from a
> particular header file, and address minor issues resulting from this
> change of order.
> 
> Dealing with C++ should be delegated to the individual include file
> level, rather than being imposed by the user of that file. For example,
> forcing C linkage prevents _Generic macros being replaced with
> overloaded static inline functions in C++ translation units.
> 
> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
> ---

Acked-by: Morten Brørup <mb@smartsharesystems.com>



* Re: [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-09 14:57                                         ` [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
  2024-09-09 16:43                                           ` Morten Brørup
@ 2024-09-10  0:50                                           ` fengchengwen
  2024-09-10  5:10                                             ` Mattias Rönnblom
  2024-09-10  6:20                                           ` [PATCH v5 0/6] Improve EAL bit operations API Mattias Rönnblom
  2 siblings, 1 reply; 160+ messages in thread
From: fengchengwen @ 2024-09-10  0:50 UTC (permalink / raw)
  To: Mattias Rönnblom, dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand

On 2024/9/9 22:57, Mattias Rönnblom wrote:
> diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
> index 5474a5281d..11b72b0f2d 100644
> --- a/lib/dmadev/rte_dmadev.h
> +++ b/lib/dmadev/rte_dmadev.h
> @@ -149,10 +149,6 @@
>  #include <rte_bitops.h>
>  #include <rte_common.h>
>  
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> -
>  /** Maximum number of devices if rte_dma_dev_max() is not called. */
>  #define RTE_DMADEV_DEFAULT_MAX 64

There are many C function declarations in this region, so we should wrap
them in extern "C" {}. Let's keep the guard, or add something like:

#include "rte_dmadev_core.h"
#ifdef __cplusplus
    }
#endif

#include "rte_dmadev_trace_fp.h"

#ifdef __cplusplus
    extern "C" {
#endif

>  
> @@ -775,6 +771,10 @@ struct rte_dma_sge {
>  #include "rte_dmadev_core.h"
>  #include "rte_dmadev_trace_fp.h"
>  
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>  /**@{@name DMA operation flag
>   * @see rte_dma_copy()
>   * @see rte_dma_copy_sg()


* Re: [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-10  0:50                                           ` fengchengwen
@ 2024-09-10  5:10                                             ` Mattias Rönnblom
  0 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  5:10 UTC (permalink / raw)
  To: fengchengwen, Mattias Rönnblom, dev
  Cc: Heng Wang, Stephen Hemminger, Tyler Retzlaff, Morten Brørup,
	Jack Bond-Preston, David Marchand

On 2024-09-10 02:50, fengchengwen wrote:
> On 2024/9/9 22:57, Mattias Rönnblom wrote:
>> diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
>> index 5474a5281d..11b72b0f2d 100644
>> --- a/lib/dmadev/rte_dmadev.h
>> +++ b/lib/dmadev/rte_dmadev.h
>> @@ -149,10 +149,6 @@
>>   #include <rte_bitops.h>
>>   #include <rte_common.h>
>>   
>> -#ifdef __cplusplus
>> -extern "C" {
>> -#endif
>> -
>>   /** Maximum number of devices if rte_dma_dev_max() is not called. */
>>   #define RTE_DMADEV_DEFAULT_MAX 64
> 
> There are many C function declarations in this region, so we should wrap
> them in extern "C" {}. Let's keep the guard, or add something like:
> 
> #include "rte_dmadev_core.h"
> #ifdef __cplusplus
>      }
> #endif
> 
> #include "rte_dmadev_trace_fp.h"
> 
> #ifdef __cplusplus
>      extern "C" {
> #endif
> 

OK, will do. Thanks!

>>   
>> @@ -775,6 +771,10 @@ struct rte_dma_sge {
>>   #include "rte_dmadev_core.h"
>>   #include "rte_dmadev_trace_fp.h"
>>   
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>>   /**@{@name DMA operation flag
>>    * @see rte_dma_copy()
>>    * @see rte_dma_copy_sg()


* [PATCH v5 0/6] Improve EAL bit operations API
  2024-09-09 14:57                                         ` [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
  2024-09-09 16:43                                           ` Morten Brørup
  2024-09-10  0:50                                           ` fengchengwen
@ 2024-09-10  6:20                                           ` Mattias Rönnblom
  2024-09-10  6:20                                             ` [PATCH v5 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
                                                               ` (5 more replies)
  2 siblings, 6 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  6:20 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
bloat the API. Such generic selection macros will here be referred to
as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() functions are replaced
with two new families:

rte_bit_[test|set|clear|assign|flip]() which provide no memory
ordering or atomicity guarantees, but do provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur, and in the
number specified) and atomics (e.g., LOCK-prefixed instructions on
x86) may be significant. rte_bit_[test|set|clear|assign|flip]() may be
used with volatile word pointers, in which case they guarantee
that the program-level accesses actually occur.
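
As a rough, self-contained illustration of how a generic selection
macro can route plain and volatile-qualified pointers to different
private helpers, consider the sketch below. All sketch_* names are
hypothetical stand-ins; this is not the actual <rte_bitops.h> code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the private helpers. The sketch_ names are
 * made up for this illustration and are not part of <rte_bitops.h>.
 */
#define SKETCH_GEN_BIT_OPS(variant, qualifier, size)			\
	static inline bool						\
	sketch_bit_ ## variant ## test ## size(				\
		const qualifier uint ## size ## _t *addr,		\
		unsigned int nr)					\
	{								\
		return (*addr >> nr) & 1;				\
	}								\
									\
	static inline void						\
	sketch_bit_ ## variant ## set ## size(				\
		qualifier uint ## size ## _t *addr, unsigned int nr)	\
	{								\
		*addr |= (uint ## size ## _t)1 << nr;			\
	}

/* One plain and one volatile variant per word size. */
SKETCH_GEN_BIT_OPS(,, 32)
SKETCH_GEN_BIT_OPS(v_, volatile, 32)
SKETCH_GEN_BIT_OPS(,, 64)
SKETCH_GEN_BIT_OPS(v_, volatile, 64)

/* The pointer type of 'addr' selects the helper at compile time. */
#define sketch_bit_test(addr, nr)					\
	_Generic((addr),						\
		 uint32_t *: sketch_bit_test32,				\
		 const uint32_t *: sketch_bit_test32,			\
		 volatile uint32_t *: sketch_bit_v_test32,		\
		 const volatile uint32_t *: sketch_bit_v_test32,	\
		 uint64_t *: sketch_bit_test64,				\
		 const uint64_t *: sketch_bit_test64,			\
		 volatile uint64_t *: sketch_bit_v_test64,		\
		 const volatile uint64_t *: sketch_bit_v_test64)(addr, nr)

#define sketch_bit_set(addr, nr)					\
	_Generic((addr),						\
		 uint32_t *: sketch_bit_set32,				\
		 volatile uint32_t *: sketch_bit_v_set32,		\
		 uint64_t *: sketch_bit_set64,				\
		 volatile uint64_t *: sketch_bit_v_set64)(addr, nr)

/* Exercises a plain 64-bit word and a volatile 32-bit word. */
static inline bool
sketch_volatile_dispatch_demo(void)
{
	uint64_t plain = 0;
	volatile uint32_t reg = 0;

	assert(!sketch_bit_test(&reg, 5));

	sketch_bit_set(&plain, 40);	/* picks sketch_bit_set64 */
	sketch_bit_set(&reg, 5);	/* picks sketch_bit_v_set32 */

	return sketch_bit_test(&plain, 40) && sketch_bit_test(&reg, 5) &&
		!sketch_bit_test(&reg, 6);
}
```

The caller uses the same name in both cases; only the volatile
variants force the compiler to perform every load and store.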

rte_bit_atomic_*() which provide atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).
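
To give an idea of what an atomic bit operation with a caller-chosen
memory order may look like, here is a minimal sketch. It assumes the
GCC/Clang __atomic builtins are available and uses hypothetical
sketch_* names; the real helpers in the patch go through
<rte_stdatomic.h> instead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers (not DPDK API), built on GCC/Clang builtins. */
static inline bool
sketch_bit_atomic_test32(const uint32_t *addr, unsigned int nr,
			 int memory_order)
{
	uint32_t mask = UINT32_C(1) << nr;

	return __atomic_load_n(addr, memory_order) & mask;
}

static inline bool
sketch_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
				 int memory_order)
{
	uint32_t mask = UINT32_C(1) << nr;

	/* fetch_or returns the word as it was before the bit was set. */
	return __atomic_fetch_or(addr, mask, memory_order) & mask;
}

static inline void
sketch_bit_atomic_flip32(uint32_t *addr, unsigned int nr, int memory_order)
{
	__atomic_fetch_xor(addr, UINT32_C(1) << nr, memory_order);
}

static inline bool
sketch_atomic_demo(void)
{
	uint32_t word = 0;	/* note: a plain, non-_Atomic word */
	bool first, second;

	first = sketch_bit_atomic_test_and_set32(&word, 7, __ATOMIC_ACQ_REL);
	second = sketch_bit_atomic_test_and_set32(&word, 7, __ATOMIC_RELAXED);
	assert(sketch_bit_atomic_test32(&word, 7, __ATOMIC_ACQUIRE));

	sketch_bit_atomic_flip32(&word, 7, __ATOMIC_RELAXED);

	/* The first call saw the bit clear, the second saw it set. */
	return !first && second &&
		!sketch_bit_atomic_test32(&word, 7, __ATOMIC_RELAXED);
}
```

Passing the memory order as an argument lets the caller pay only for
the ordering actually needed (e.g., relaxed for a statistics flag).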

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.
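
As a rough illustration of mixing both kinds of access on one plain
word, the sketch below uses the GCC/Clang __atomic builtins directly on
a non-_Atomic uint32_t. The demo_* helpers are hypothetical stand-ins,
not the rte_bit_atomic_*() implementation, and the sketch assumes a
compiler providing those builtins.

```c
#include <stdbool.h>
#include <stdint.h>

/* Atomically set bit 'nr' in a plain (non-_Atomic) word. */
static inline void
demo_bit_atomic_set32(uint32_t *addr, unsigned int nr, int memory_order)
{
	__atomic_fetch_or(addr, UINT32_C(1) << nr, memory_order);
}

/* Atomically load the word and test bit 'nr'. */
static inline bool
demo_bit_atomic_test32(uint32_t *addr, unsigned int nr, int memory_order)
{
	uint32_t word = __atomic_load_n(addr, memory_order);

	return (word & (UINT32_C(1) << nr)) != 0;
}

/* Returns 1 if the word ends up with bits 0 and 3 set. */
static inline int
demo_mixed_access(void)
{
	uint32_t flags = 0; /* plain object, no _Atomic qualifier */

	/* Non-atomic access, e.g., during single-threaded setup. */
	flags |= UINT32_C(1) << 0;

	/* Atomic access to the very same object later on. */
	demo_bit_atomic_set32(&flags, 3, __ATOMIC_RELEASE);

	return demo_bit_atomic_test32(&flags, 3, __ATOMIC_ACQUIRE) &&
		flags == 0x9;
}
```

With an _Atomic-qualified parameter type, the plain "flags |= ..." line
and the atomic calls could not share one object without a cast.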

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat, and would also prevent the user from
doing atomic operations against a bit set in some situations while,
in other situations (e.g., at times when MT safety is not a concern),
operating on the same objects in a non-atomic manner.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.
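
The assign semantics can be sketched as below for the 32-bit,
non-atomic case. demo_bit_assign32() is a hypothetical name used only
for this example and is not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Set bit 'nr' if 'value' is true, clear it otherwise. */
static inline void
demo_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	const uint32_t mask = UINT32_C(1) << nr;

	if (value)
		*addr |= mask;
	else
		*addr &= ~mask;
}

static inline uint32_t
demo_assign_usage(void)
{
	uint32_t word = 0;

	demo_bit_assign32(&word, 4, true);  /* word is now 0x10 */
	demo_bit_assign32(&word, 1, true);  /* word is now 0x12 */
	demo_bit_assign32(&word, 4, false); /* word is now 0x02 */

	return word;
}
```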

All new functions have properly documented semantics.

All new functions operate on both 32- and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bit on your platform (which
it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.
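
The single-evaluation property can be checked with a small sketch,
again using hypothetical demo_* names rather than the real macros. The
address argument appears twice in the macro expansion, but the
controlling expression of _Generic is never evaluated, so each argument
is evaluated exactly once, in the call to the selected helper.

```c
#include <stdint.h>

static inline void
demo_priv_bit_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

static inline void
demo_priv_bit_set64(uint64_t *addr, unsigned int nr)
{
	*addr |= UINT64_C(1) << nr;
}

#define demo_bit_set(addr, nr) \
	_Generic((addr), \
		uint32_t *: demo_priv_bit_set32, \
		uint64_t *: demo_priv_bit_set64)(addr, nr)

/* Returns 1 if 'nr' was incremented exactly once and bit 3 was set. */
static inline int
demo_single_evaluation(void)
{
	uint32_t word = 0;
	unsigned int nr = 3;

	/* The side effect in the argument happens a single time. */
	demo_bit_set(&word, nr++);

	return nr == 4 && word == (UINT32_C(1) << 3);
}
```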

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions, implemented by
means of a huge, complicated C macro mess.
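
Stripped of the macro machinery, the idea is roughly as in the sketch
below: one header that selects between generic selection (C) and
overloading (C++). The demo_* names are hypothetical, and the actual
patch generates the overloads with macros instead of writing them out
by hand. Note that the overloads must not sit inside an extern "C"
block, since functions with C linkage cannot be overloaded.

```c
#include <stdint.h>

static inline void
demo_priv_bit_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

static inline void
demo_priv_bit_set64(uint64_t *addr, unsigned int nr)
{
	*addr |= UINT64_C(1) << nr;
}

#ifdef __cplusplus

/* C++ has no _Generic: provide one overload per word type. */
static inline void
demo_bit_set(uint32_t *addr, unsigned int nr)
{
	demo_priv_bit_set32(addr, nr);
}

static inline void
demo_bit_set(uint64_t *addr, unsigned int nr)
{
	demo_priv_bit_set64(addr, nr);
}

#else

/* C11: select the helper based on the pointer type. */
#define demo_bit_set(addr, nr) \
	_Generic((addr), \
		uint32_t *: demo_priv_bit_set32, \
		uint64_t *: demo_priv_bit_set64)(addr, nr)

#endif

/* The call site is identical in C and C++ translation units. */
static inline int
demo_same_call_site(void)
{
	uint64_t word = 0;

	demo_bit_set(&word, 33);

	return word == (UINT64_C(1) << 33);
}
```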

Mattias Rönnblom (6):
  dpdk: do not force C linkage on include file dependencies
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions
  eal: extend bitops to handle volatile pointers

 app/test/packet_burst_generator.h             |   8 +-
 app/test/test_bitops.c                        | 416 +++++++++-
 app/test/virtual_pmd.h                        |   4 +-
 doc/guides/rel_notes/release_24_11.rst        |  17 +
 drivers/bus/auxiliary/bus_auxiliary_driver.h  |   8 +-
 drivers/bus/cdx/bus_cdx_driver.h              |   8 +-
 drivers/bus/dpaa/include/fsl_qman.h           |   8 +-
 drivers/bus/fslmc/bus_fslmc_driver.h          |   8 +-
 drivers/bus/pci/bus_pci_driver.h              |   8 +-
 drivers/bus/pci/rte_bus_pci.h                 |   8 +-
 drivers/bus/platform/bus_platform_driver.h    |   8 +-
 drivers/bus/vdev/bus_vdev_driver.h            |   8 +-
 drivers/bus/vmbus/bus_vmbus_driver.h          |   8 +-
 drivers/bus/vmbus/rte_bus_vmbus.h             |   8 +-
 drivers/dma/cnxk/cnxk_dma_event_dp.h          |   8 +-
 drivers/dma/ioat/ioat_hw_defs.h               |   4 +-
 drivers/event/dlb2/rte_pmd_dlb2.h             |   8 +-
 drivers/mempool/dpaa2/rte_dpaa2_mempool.h     |   6 +-
 drivers/net/avp/rte_avp_fifo.h                |   8 +-
 drivers/net/bonding/rte_eth_bond.h            |   4 +-
 drivers/net/i40e/rte_pmd_i40e.h               |   8 +-
 drivers/net/mlx5/mlx5_trace.h                 |   8 +-
 drivers/net/ring/rte_eth_ring.h               |   4 +-
 drivers/net/vhost/rte_eth_vhost.h             |   8 +-
 drivers/raw/ifpga/afu_pmd_core.h              |   8 +-
 drivers/raw/ifpga/afu_pmd_he_hssi.h           |   6 +-
 drivers/raw/ifpga/afu_pmd_he_lpbk.h           |   6 +-
 drivers/raw/ifpga/afu_pmd_he_mem.h            |   6 +-
 drivers/raw/ifpga/afu_pmd_n3000.h             |   6 +-
 drivers/raw/ifpga/rte_pmd_afu.h               |   4 +-
 drivers/raw/ifpga/rte_pmd_ifpga.h             |   4 +-
 examples/ethtool/lib/rte_ethtool.h            |   8 +-
 examples/qos_sched/main.h                     |   4 +-
 examples/vm_power_manager/channel_manager.h   |   8 +-
 lib/acl/rte_acl_osdep.h                       |   8 +-
 lib/bbdev/rte_bbdev.h                         |   8 +-
 lib/bbdev/rte_bbdev_op.h                      |   8 +-
 lib/bbdev/rte_bbdev_pmd.h                     |   8 +-
 lib/bpf/bpf_def.h                             |   8 +-
 lib/compressdev/rte_comp.h                    |   4 +-
 lib/compressdev/rte_compressdev.h             |   6 +-
 lib/compressdev/rte_compressdev_internal.h    |   8 +-
 lib/compressdev/rte_compressdev_pmd.h         |   8 +-
 lib/cryptodev/cryptodev_pmd.h                 |   8 +-
 lib/cryptodev/cryptodev_trace.h               |   8 +-
 lib/cryptodev/rte_crypto.h                    |   8 +-
 lib/cryptodev/rte_crypto_asym.h               |   8 +-
 lib/cryptodev/rte_crypto_sym.h                |   8 +-
 lib/cryptodev/rte_cryptodev.h                 |   8 +-
 lib/cryptodev/rte_cryptodev_trace_fp.h        |   4 +-
 lib/dispatcher/rte_dispatcher.h               |   8 +-
 lib/dmadev/rte_dmadev.h                       |   8 +
 lib/eal/arm/include/rte_atomic_32.h           |   4 +-
 lib/eal/arm/include/rte_atomic_64.h           |   8 +-
 lib/eal/arm/include/rte_byteorder.h           |   8 +-
 lib/eal/arm/include/rte_cpuflags_32.h         |   8 +-
 lib/eal/arm/include/rte_cpuflags_64.h         |   8 +-
 lib/eal/arm/include/rte_cycles_32.h           |   4 +-
 lib/eal/arm/include/rte_cycles_64.h           |   4 +-
 lib/eal/arm/include/rte_io.h                  |   8 +-
 lib/eal/arm/include/rte_io_64.h               |   8 +-
 lib/eal/arm/include/rte_memcpy_32.h           |   8 +-
 lib/eal/arm/include/rte_memcpy_64.h           |   8 +-
 lib/eal/arm/include/rte_pause.h               |   8 +-
 lib/eal/arm/include/rte_pause_32.h            |   6 +-
 lib/eal/arm/include/rte_pause_64.h            |   8 +-
 lib/eal/arm/include/rte_power_intrinsics.h    |   8 +-
 lib/eal/arm/include/rte_prefetch_32.h         |   8 +-
 lib/eal/arm/include/rte_prefetch_64.h         |   8 +-
 lib/eal/arm/include/rte_rwlock.h              |   4 +-
 lib/eal/arm/include/rte_spinlock.h            |   6 +-
 lib/eal/freebsd/include/rte_os.h              |   8 +-
 lib/eal/include/bus_driver.h                  |   8 +-
 lib/eal/include/dev_driver.h                  |   6 +-
 lib/eal/include/eal_trace_internal.h          |   8 +-
 lib/eal/include/generic/rte_byteorder.h       |   8 +
 lib/eal/include/generic/rte_cycles.h          |   8 +
 lib/eal/include/generic/rte_memcpy.h          |   8 +
 lib/eal/include/generic/rte_pause.h           |   8 +
 .../include/generic/rte_power_intrinsics.h    |   8 +
 lib/eal/include/generic/rte_prefetch.h        |   8 +
 lib/eal/include/generic/rte_rwlock.h          |   8 +-
 lib/eal/include/generic/rte_spinlock.h        |   8 +
 lib/eal/include/rte_alarm.h                   |   4 +-
 lib/eal/include/rte_bitmap.h                  |   8 +-
 lib/eal/include/rte_bitops.h                  | 768 +++++++++++++++++-
 lib/eal/include/rte_bus.h                     |   8 +-
 lib/eal/include/rte_class.h                   |   4 +-
 lib/eal/include/rte_common.h                  |   8 +-
 lib/eal/include/rte_dev.h                     |   8 +-
 lib/eal/include/rte_devargs.h                 |   8 +-
 lib/eal/include/rte_eal_trace.h               |   4 +-
 lib/eal/include/rte_errno.h                   |   4 +-
 lib/eal/include/rte_fbarray.h                 |   8 +-
 lib/eal/include/rte_keepalive.h               |   6 +-
 lib/eal/include/rte_mcslock.h                 |   8 +-
 lib/eal/include/rte_memory.h                  |   8 +-
 lib/eal/include/rte_pci_dev_features.h        |   4 +-
 lib/eal/include/rte_pflock.h                  |   8 +-
 lib/eal/include/rte_random.h                  |   4 +-
 lib/eal/include/rte_seqcount.h                |   8 +-
 lib/eal/include/rte_seqlock.h                 |   8 +-
 lib/eal/include/rte_service.h                 |   8 +-
 lib/eal/include/rte_service_component.h       |   4 +-
 lib/eal/include/rte_stdatomic.h               |   5 +-
 lib/eal/include/rte_string_fns.h              |  17 +-
 lib/eal/include/rte_tailq.h                   |   6 +-
 lib/eal/include/rte_ticketlock.h              |   8 +-
 lib/eal/include/rte_time.h                    |   6 +-
 lib/eal/include/rte_trace.h                   |   8 +-
 lib/eal/include/rte_trace_point.h             |   8 +-
 lib/eal/include/rte_trace_point_register.h    |   8 +-
 lib/eal/include/rte_uuid.h                    |   8 +-
 lib/eal/include/rte_version.h                 |   6 +-
 lib/eal/include/rte_vfio.h                    |   8 +-
 lib/eal/linux/include/rte_os.h                |   8 +-
 lib/eal/loongarch/include/rte_atomic.h        |   6 +-
 lib/eal/loongarch/include/rte_byteorder.h     |   4 +-
 lib/eal/loongarch/include/rte_cpuflags.h      |   8 +-
 lib/eal/loongarch/include/rte_cycles.h        |   4 +-
 lib/eal/loongarch/include/rte_io.h            |   4 +-
 lib/eal/loongarch/include/rte_memcpy.h        |   4 +-
 lib/eal/loongarch/include/rte_pause.h         |   8 +-
 .../loongarch/include/rte_power_intrinsics.h  |   8 +-
 lib/eal/loongarch/include/rte_prefetch.h      |   8 +-
 lib/eal/loongarch/include/rte_rwlock.h        |   4 +-
 lib/eal/loongarch/include/rte_spinlock.h      |   6 +-
 lib/eal/ppc/include/rte_atomic.h              |   6 +-
 lib/eal/ppc/include/rte_byteorder.h           |   6 +-
 lib/eal/ppc/include/rte_cpuflags.h            |   8 +-
 lib/eal/ppc/include/rte_cycles.h              |   8 +-
 lib/eal/ppc/include/rte_io.h                  |   4 +-
 lib/eal/ppc/include/rte_memcpy.h              |   4 +-
 lib/eal/ppc/include/rte_pause.h               |   8 +-
 lib/eal/ppc/include/rte_power_intrinsics.h    |   8 +-
 lib/eal/ppc/include/rte_prefetch.h            |   8 +-
 lib/eal/ppc/include/rte_rwlock.h              |   4 +-
 lib/eal/ppc/include/rte_spinlock.h            |   8 +-
 lib/eal/riscv/include/rte_atomic.h            |   8 +-
 lib/eal/riscv/include/rte_byteorder.h         |   8 +-
 lib/eal/riscv/include/rte_cpuflags.h          |   8 +-
 lib/eal/riscv/include/rte_cycles.h            |   4 +-
 lib/eal/riscv/include/rte_io.h                |   4 +-
 lib/eal/riscv/include/rte_memcpy.h            |   4 +-
 lib/eal/riscv/include/rte_pause.h             |   8 +-
 lib/eal/riscv/include/rte_power_intrinsics.h  |   8 +-
 lib/eal/riscv/include/rte_prefetch.h          |   8 +-
 lib/eal/riscv/include/rte_rwlock.h            |   4 +-
 lib/eal/riscv/include/rte_spinlock.h          |   6 +-
 lib/eal/windows/include/pthread.h             |   6 +-
 lib/eal/windows/include/regex.h               |   8 +-
 lib/eal/windows/include/rte_windows.h         |   8 +-
 lib/eal/x86/include/rte_atomic.h              |   8 +-
 lib/eal/x86/include/rte_byteorder.h           |  16 +-
 lib/eal/x86/include/rte_cpuflags.h            |   8 +-
 lib/eal/x86/include/rte_cycles.h              |   8 +-
 lib/eal/x86/include/rte_io.h                  |   8 +-
 lib/eal/x86/include/rte_pause.h               |   7 +-
 lib/eal/x86/include/rte_power_intrinsics.h    |   8 +-
 lib/eal/x86/include/rte_prefetch.h            |   8 +-
 lib/eal/x86/include/rte_rwlock.h              |   6 +-
 lib/eal/x86/include/rte_spinlock.h            |   8 +-
 lib/ethdev/ethdev_driver.h                    |   8 +-
 lib/ethdev/ethdev_pci.h                       |   8 +-
 lib/ethdev/ethdev_trace.h                     |   8 +-
 lib/ethdev/ethdev_vdev.h                      |   8 +-
 lib/ethdev/rte_cman.h                         |   4 +-
 lib/ethdev/rte_dev_info.h                     |   4 +-
 lib/ethdev/rte_ethdev.h                       |   8 +-
 lib/ethdev/rte_ethdev_trace_fp.h              |   4 +-
 lib/eventdev/event_timer_adapter_pmd.h        |   4 +-
 lib/eventdev/eventdev_pmd.h                   |   8 +-
 lib/eventdev/eventdev_pmd_pci.h               |   8 +-
 lib/eventdev/eventdev_pmd_vdev.h              |   8 +-
 lib/eventdev/eventdev_trace.h                 |   8 +-
 lib/eventdev/rte_event_crypto_adapter.h       |   8 +-
 lib/eventdev/rte_event_eth_rx_adapter.h       |   8 +-
 lib/eventdev/rte_event_eth_tx_adapter.h       |   8 +-
 lib/eventdev/rte_event_ring.h                 |   8 +-
 lib/eventdev/rte_event_timer_adapter.h        |   8 +-
 lib/eventdev/rte_eventdev.h                   |   8 +-
 lib/eventdev/rte_eventdev_trace_fp.h          |   4 +-
 lib/graph/rte_graph_model_mcore_dispatch.h    |   8 +-
 lib/graph/rte_graph_worker.h                  |   6 +-
 lib/gso/rte_gso.h                             |   6 +-
 lib/hash/rte_fbk_hash.h                       |   8 +-
 lib/hash/rte_hash_crc.h                       |   8 +-
 lib/hash/rte_jhash.h                          |   8 +-
 lib/hash/rte_thash.h                          |   8 +-
 lib/hash/rte_thash_gfni.h                     |   8 +-
 lib/ip_frag/rte_ip_frag.h                     |   8 +-
 lib/ipsec/rte_ipsec.h                         |   8 +-
 lib/log/rte_log.h                             |   8 +-
 lib/lpm/rte_lpm.h                             |   8 +-
 lib/member/rte_member.h                       |   8 +-
 lib/member/rte_member_sketch.h                |   6 +-
 lib/member/rte_member_sketch_avx512.h         |   8 +-
 lib/member/rte_member_x86.h                   |   4 +-
 lib/member/rte_xxh64_avx512.h                 |   6 +-
 lib/mempool/mempool_trace.h                   |   8 +-
 lib/mempool/rte_mempool_trace_fp.h            |   4 +-
 lib/meter/rte_meter.h                         |   8 +-
 lib/mldev/mldev_utils.h                       |   8 +-
 lib/mldev/rte_mldev_core.h                    |   8 +-
 lib/mldev/rte_mldev_pmd.h                     |   8 +-
 lib/net/rte_ether.h                           |   8 +-
 lib/net/rte_net.h                             |   8 +-
 lib/net/rte_sctp.h                            |   8 +-
 lib/node/rte_node_eth_api.h                   |   8 +-
 lib/node/rte_node_ip4_api.h                   |   8 +-
 lib/node/rte_node_ip6_api.h                   |   6 +-
 lib/node/rte_node_udp4_input_api.h            |   8 +-
 lib/pci/rte_pci.h                             |   8 +-
 lib/pdcp/rte_pdcp.h                           |   8 +-
 lib/pipeline/rte_pipeline.h                   |   8 +-
 lib/pipeline/rte_port_in_action.h             |   8 +-
 lib/pipeline/rte_swx_ctl.h                    |   8 +-
 lib/pipeline/rte_swx_extern.h                 |   8 +-
 lib/pipeline/rte_swx_ipsec.h                  |   8 +-
 lib/pipeline/rte_swx_pipeline.h               |   8 +-
 lib/pipeline/rte_swx_pipeline_spec.h          |   8 +-
 lib/pipeline/rte_table_action.h               |   8 +-
 lib/port/rte_port.h                           |   8 +-
 lib/port/rte_port_ethdev.h                    |   8 +-
 lib/port/rte_port_eventdev.h                  |   8 +-
 lib/port/rte_port_fd.h                        |   8 +-
 lib/port/rte_port_frag.h                      |   8 +-
 lib/port/rte_port_ras.h                       |   8 +-
 lib/port/rte_port_ring.h                      |   8 +-
 lib/port/rte_port_sched.h                     |   8 +-
 lib/port/rte_port_source_sink.h               |   8 +-
 lib/port/rte_port_sym_crypto.h                |   8 +-
 lib/port/rte_swx_port.h                       |   8 +-
 lib/port/rte_swx_port_ethdev.h                |   8 +-
 lib/port/rte_swx_port_fd.h                    |   8 +-
 lib/port/rte_swx_port_ring.h                  |   8 +-
 lib/port/rte_swx_port_source_sink.h           |   8 +-
 lib/rawdev/rte_rawdev.h                       |   6 +-
 lib/rawdev/rte_rawdev_pmd.h                   |   8 +-
 lib/rcu/rte_rcu_qsbr.h                        |   8 +-
 lib/regexdev/rte_regexdev.h                   |   8 +-
 lib/ring/rte_ring.h                           |   6 +-
 lib/ring/rte_ring_core.h                      |   8 +-
 lib/ring/rte_ring_elem.h                      |   8 +-
 lib/ring/rte_ring_hts.h                       |   4 +-
 lib/ring/rte_ring_peek.h                      |   4 +-
 lib/ring/rte_ring_peek_zc.h                   |   4 +-
 lib/ring/rte_ring_rts.h                       |   4 +-
 lib/sched/rte_approx.h                        |   8 +-
 lib/sched/rte_pie.h                           |   8 +-
 lib/sched/rte_red.h                           |   8 +-
 lib/sched/rte_sched.h                         |   8 +-
 lib/sched/rte_sched_common.h                  |   6 +-
 lib/security/rte_security.h                   |   8 +-
 lib/security/rte_security_driver.h            |   6 +-
 lib/stack/rte_stack.h                         |   8 +-
 lib/table/rte_lru.h                           |  12 +-
 lib/table/rte_lru_arm64.h                     |   8 +-
 lib/table/rte_lru_x86.h                       |   8 -
 lib/table/rte_swx_hash_func.h                 |   8 +-
 lib/table/rte_swx_keycmp.h                    |   8 +-
 lib/table/rte_swx_table.h                     |   8 +-
 lib/table/rte_swx_table_em.h                  |   8 +-
 lib/table/rte_swx_table_learner.h             |   8 +-
 lib/table/rte_swx_table_selector.h            |   8 +-
 lib/table/rte_swx_table_wm.h                  |   8 +-
 lib/table/rte_table.h                         |   8 +-
 lib/table/rte_table_acl.h                     |   8 +-
 lib/table/rte_table_array.h                   |   8 +-
 lib/table/rte_table_hash.h                    |   8 +-
 lib/table/rte_table_hash_cuckoo.h             |   8 +-
 lib/table/rte_table_hash_func.h               |  12 +-
 lib/table/rte_table_lpm.h                     |   8 +-
 lib/table/rte_table_lpm_ipv6.h                |   8 +-
 lib/table/rte_table_stub.h                    |   8 +-
 lib/telemetry/rte_telemetry.h                 |   8 +-
 lib/vhost/rte_vdpa.h                          |   8 +-
 lib/vhost/rte_vhost.h                         |   8 +-
 lib/vhost/rte_vhost_async.h                   |   8 +-
 lib/vhost/rte_vhost_crypto.h                  |   4 +-
 lib/vhost/vdpa_driver.h                       |   8 +-
 281 files changed, 2219 insertions(+), 993 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v5 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-10  6:20                                           ` [PATCH v5 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-09-10  6:20                                             ` Mattias Rönnblom
  2024-09-10  8:31                                               ` [PATCH v6 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-09-10  6:20                                             ` [PATCH v5 2/6] eal: extend bit manipulation functionality Mattias Rönnblom
                                                               ` (4 subsequent siblings)
  5 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  6:20 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Ensure that 'extern "C" { /../ }' does not cover files included from a
particular header file, and address minor issues resulting from this
change of order.

Dealing with C++ should be delegated to the individual include file
level, rather than being imposed by the user of that file. For example,
forcing C linkage prevents _Generic macros being replaced with
overloaded static inline functions in C++ translation units.
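
The resulting header layout could look like the following sketch, where
all #include directives come before the extern "C" block is opened. The
header guard and the demo_low_bits() function are hypothetical and not
part of this patch.

```c
#ifndef DEMO_HEADER_H
#define DEMO_HEADER_H

/*
 * Dependencies are included first, outside of extern "C", so each
 * included file decides for itself how to deal with C++.
 */
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Only this header's own declarations get C linkage. */
static inline uint32_t
demo_low_bits(uint32_t word, unsigned int n)
{
	return n >= 32 ? word : word & ((UINT32_C(1) << n) - 1);
}

#ifdef __cplusplus
}
#endif

#endif /* DEMO_HEADER_H */
```

In the previous ordering, with the #include placed after the opening
extern "C" {, everything declared by the included file would have been
given C linkage as well.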

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>

--

PATCH v5:
 * rte_dmadev.h was still including files under extern "C" { /../ }.
   (Chengwen Feng)
 * Fix rte_byteorder.h, broken on 32-bit x86.
---
 app/test/packet_burst_generator.h               |  8 ++++----
 app/test/virtual_pmd.h                          |  4 ++--
 drivers/bus/auxiliary/bus_auxiliary_driver.h    |  8 ++++----
 drivers/bus/cdx/bus_cdx_driver.h                |  8 ++++----
 drivers/bus/dpaa/include/fsl_qman.h             |  8 ++++----
 drivers/bus/fslmc/bus_fslmc_driver.h            |  8 ++++----
 drivers/bus/pci/bus_pci_driver.h                |  8 ++++----
 drivers/bus/pci/rte_bus_pci.h                   |  8 ++++----
 drivers/bus/platform/bus_platform_driver.h      |  8 ++++----
 drivers/bus/vdev/bus_vdev_driver.h              |  8 ++++----
 drivers/bus/vmbus/bus_vmbus_driver.h            |  8 ++++----
 drivers/bus/vmbus/rte_bus_vmbus.h               |  8 ++++----
 drivers/dma/cnxk/cnxk_dma_event_dp.h            |  8 ++++----
 drivers/dma/ioat/ioat_hw_defs.h                 |  4 ++--
 drivers/event/dlb2/rte_pmd_dlb2.h               |  8 ++++----
 drivers/mempool/dpaa2/rte_dpaa2_mempool.h       |  6 +++---
 drivers/net/avp/rte_avp_fifo.h                  |  8 ++++----
 drivers/net/bonding/rte_eth_bond.h              |  4 ++--
 drivers/net/i40e/rte_pmd_i40e.h                 |  8 ++++----
 drivers/net/mlx5/mlx5_trace.h                   |  8 ++++----
 drivers/net/ring/rte_eth_ring.h                 |  4 ++--
 drivers/net/vhost/rte_eth_vhost.h               |  8 ++++----
 drivers/raw/ifpga/afu_pmd_core.h                |  8 ++++----
 drivers/raw/ifpga/afu_pmd_he_hssi.h             |  6 +++---
 drivers/raw/ifpga/afu_pmd_he_lpbk.h             |  6 +++---
 drivers/raw/ifpga/afu_pmd_he_mem.h              |  6 +++---
 drivers/raw/ifpga/afu_pmd_n3000.h               |  6 +++---
 drivers/raw/ifpga/rte_pmd_afu.h                 |  4 ++--
 drivers/raw/ifpga/rte_pmd_ifpga.h               |  4 ++--
 examples/ethtool/lib/rte_ethtool.h              |  8 ++++----
 examples/qos_sched/main.h                       |  4 ++--
 examples/vm_power_manager/channel_manager.h     |  8 ++++----
 lib/acl/rte_acl_osdep.h                         |  8 ++++----
 lib/bbdev/rte_bbdev.h                           |  8 ++++----
 lib/bbdev/rte_bbdev_op.h                        |  8 ++++----
 lib/bbdev/rte_bbdev_pmd.h                       |  8 ++++----
 lib/bpf/bpf_def.h                               |  8 ++++----
 lib/compressdev/rte_comp.h                      |  4 ++--
 lib/compressdev/rte_compressdev.h               |  6 +++---
 lib/compressdev/rte_compressdev_internal.h      |  8 ++++----
 lib/compressdev/rte_compressdev_pmd.h           |  8 ++++----
 lib/cryptodev/cryptodev_pmd.h                   |  8 ++++----
 lib/cryptodev/cryptodev_trace.h                 |  8 ++++----
 lib/cryptodev/rte_crypto.h                      |  8 ++++----
 lib/cryptodev/rte_crypto_asym.h                 |  8 ++++----
 lib/cryptodev/rte_crypto_sym.h                  |  8 ++++----
 lib/cryptodev/rte_cryptodev.h                   |  8 ++++----
 lib/cryptodev/rte_cryptodev_trace_fp.h          |  4 ++--
 lib/dispatcher/rte_dispatcher.h                 |  8 ++++----
 lib/dmadev/rte_dmadev.h                         |  8 ++++++++
 lib/eal/arm/include/rte_atomic_32.h             |  4 ++--
 lib/eal/arm/include/rte_atomic_64.h             |  8 ++++----
 lib/eal/arm/include/rte_byteorder.h             |  8 ++++----
 lib/eal/arm/include/rte_cpuflags_32.h           |  8 ++++----
 lib/eal/arm/include/rte_cpuflags_64.h           |  8 ++++----
 lib/eal/arm/include/rte_cycles_32.h             |  4 ++--
 lib/eal/arm/include/rte_cycles_64.h             |  4 ++--
 lib/eal/arm/include/rte_io.h                    |  8 ++++----
 lib/eal/arm/include/rte_io_64.h                 |  8 ++++----
 lib/eal/arm/include/rte_memcpy_32.h             |  8 ++++----
 lib/eal/arm/include/rte_memcpy_64.h             |  8 ++++----
 lib/eal/arm/include/rte_pause.h                 |  8 ++++----
 lib/eal/arm/include/rte_pause_32.h              |  6 +++---
 lib/eal/arm/include/rte_pause_64.h              |  8 ++++----
 lib/eal/arm/include/rte_power_intrinsics.h      |  8 ++++----
 lib/eal/arm/include/rte_prefetch_32.h           |  8 ++++----
 lib/eal/arm/include/rte_prefetch_64.h           |  8 ++++----
 lib/eal/arm/include/rte_rwlock.h                |  4 ++--
 lib/eal/arm/include/rte_spinlock.h              |  6 +++---
 lib/eal/freebsd/include/rte_os.h                |  8 ++++----
 lib/eal/include/bus_driver.h                    |  8 ++++----
 lib/eal/include/dev_driver.h                    |  6 +++---
 lib/eal/include/eal_trace_internal.h            |  8 ++++----
 lib/eal/include/generic/rte_byteorder.h         |  8 ++++++++
 lib/eal/include/generic/rte_cycles.h            |  8 ++++++++
 lib/eal/include/generic/rte_memcpy.h            |  8 ++++++++
 lib/eal/include/generic/rte_pause.h             |  8 ++++++++
 lib/eal/include/generic/rte_power_intrinsics.h  |  8 ++++++++
 lib/eal/include/generic/rte_prefetch.h          |  8 ++++++++
 lib/eal/include/generic/rte_rwlock.h            |  8 ++++----
 lib/eal/include/generic/rte_spinlock.h          |  8 ++++++++
 lib/eal/include/rte_alarm.h                     |  4 ++--
 lib/eal/include/rte_bitmap.h                    |  8 ++++----
 lib/eal/include/rte_bus.h                       |  8 ++++----
 lib/eal/include/rte_class.h                     |  4 ++--
 lib/eal/include/rte_common.h                    |  8 ++++----
 lib/eal/include/rte_dev.h                       |  8 ++++----
 lib/eal/include/rte_devargs.h                   |  8 ++++----
 lib/eal/include/rte_eal_trace.h                 |  4 ++--
 lib/eal/include/rte_errno.h                     |  4 ++--
 lib/eal/include/rte_fbarray.h                   |  8 ++++----
 lib/eal/include/rte_keepalive.h                 |  6 +++---
 lib/eal/include/rte_mcslock.h                   |  8 ++++----
 lib/eal/include/rte_memory.h                    |  8 ++++----
 lib/eal/include/rte_pci_dev_features.h          |  4 ++--
 lib/eal/include/rte_pflock.h                    |  8 ++++----
 lib/eal/include/rte_random.h                    |  4 ++--
 lib/eal/include/rte_seqcount.h                  |  8 ++++----
 lib/eal/include/rte_seqlock.h                   |  8 ++++----
 lib/eal/include/rte_service.h                   |  8 ++++----
 lib/eal/include/rte_service_component.h         |  4 ++--
 lib/eal/include/rte_stdatomic.h                 |  5 +----
 lib/eal/include/rte_string_fns.h                | 17 ++++++++++++-----
 lib/eal/include/rte_tailq.h                     |  6 +++---
 lib/eal/include/rte_ticketlock.h                |  8 ++++----
 lib/eal/include/rte_time.h                      |  6 +++---
 lib/eal/include/rte_trace.h                     |  8 ++++----
 lib/eal/include/rte_trace_point.h               |  8 ++++----
 lib/eal/include/rte_trace_point_register.h      |  8 ++++----
 lib/eal/include/rte_uuid.h                      |  8 ++++----
 lib/eal/include/rte_version.h                   |  6 +++---
 lib/eal/include/rte_vfio.h                      |  8 ++++----
 lib/eal/linux/include/rte_os.h                  |  8 ++++----
 lib/eal/loongarch/include/rte_atomic.h          |  6 +++---
 lib/eal/loongarch/include/rte_byteorder.h       |  4 ++--
 lib/eal/loongarch/include/rte_cpuflags.h        |  8 ++++----
 lib/eal/loongarch/include/rte_cycles.h          |  4 ++--
 lib/eal/loongarch/include/rte_io.h              |  4 ++--
 lib/eal/loongarch/include/rte_memcpy.h          |  4 ++--
 lib/eal/loongarch/include/rte_pause.h           |  8 ++++----
 .../loongarch/include/rte_power_intrinsics.h    |  8 ++++----
 lib/eal/loongarch/include/rte_prefetch.h        |  8 ++++----
 lib/eal/loongarch/include/rte_rwlock.h          |  4 ++--
 lib/eal/loongarch/include/rte_spinlock.h        |  6 +++---
 lib/eal/ppc/include/rte_atomic.h                |  6 +++---
 lib/eal/ppc/include/rte_byteorder.h             |  6 +++---
 lib/eal/ppc/include/rte_cpuflags.h              |  8 ++++----
 lib/eal/ppc/include/rte_cycles.h                |  8 ++++----
 lib/eal/ppc/include/rte_io.h                    |  4 ++--
 lib/eal/ppc/include/rte_memcpy.h                |  4 ++--
 lib/eal/ppc/include/rte_pause.h                 |  8 ++++----
 lib/eal/ppc/include/rte_power_intrinsics.h      |  8 ++++----
 lib/eal/ppc/include/rte_prefetch.h              |  8 ++++----
 lib/eal/ppc/include/rte_rwlock.h                |  4 ++--
 lib/eal/ppc/include/rte_spinlock.h              |  8 ++++----
 lib/eal/riscv/include/rte_atomic.h              |  8 ++++----
 lib/eal/riscv/include/rte_byteorder.h           |  8 ++++----
 lib/eal/riscv/include/rte_cpuflags.h            |  8 ++++----
 lib/eal/riscv/include/rte_cycles.h              |  4 ++--
 lib/eal/riscv/include/rte_io.h                  |  4 ++--
 lib/eal/riscv/include/rte_memcpy.h              |  4 ++--
 lib/eal/riscv/include/rte_pause.h               |  8 ++++----
 lib/eal/riscv/include/rte_power_intrinsics.h    |  8 ++++----
 lib/eal/riscv/include/rte_prefetch.h            |  8 ++++----
 lib/eal/riscv/include/rte_rwlock.h              |  4 ++--
 lib/eal/riscv/include/rte_spinlock.h            |  6 +++---
 lib/eal/windows/include/pthread.h               |  6 +++---
 lib/eal/windows/include/regex.h                 |  8 ++++----
 lib/eal/windows/include/rte_windows.h           |  8 ++++----
 lib/eal/x86/include/rte_atomic.h                |  8 ++++----
 lib/eal/x86/include/rte_byteorder.h             | 16 ++++++++--------
 lib/eal/x86/include/rte_cpuflags.h              |  8 ++++----
 lib/eal/x86/include/rte_cycles.h                |  8 ++++----
 lib/eal/x86/include/rte_io.h                    |  8 ++++----
 lib/eal/x86/include/rte_pause.h                 |  7 ++++---
 lib/eal/x86/include/rte_power_intrinsics.h      |  8 ++++----
 lib/eal/x86/include/rte_prefetch.h              |  8 ++++----
 lib/eal/x86/include/rte_rwlock.h                |  6 +++---
 lib/eal/x86/include/rte_spinlock.h              |  8 ++++----
 lib/ethdev/ethdev_driver.h                      |  8 ++++----
 lib/ethdev/ethdev_pci.h                         |  8 ++++----
 lib/ethdev/ethdev_trace.h                       |  8 ++++----
 lib/ethdev/ethdev_vdev.h                        |  8 ++++----
 lib/ethdev/rte_cman.h                           |  4 ++--
 lib/ethdev/rte_dev_info.h                       |  4 ++--
 lib/ethdev/rte_ethdev.h                         |  8 ++++----
 lib/ethdev/rte_ethdev_trace_fp.h                |  4 ++--
 lib/eventdev/event_timer_adapter_pmd.h          |  4 ++--
 lib/eventdev/eventdev_pmd.h                     |  8 ++++----
 lib/eventdev/eventdev_pmd_pci.h                 |  8 ++++----
 lib/eventdev/eventdev_pmd_vdev.h                |  8 ++++----
 lib/eventdev/eventdev_trace.h                   |  8 ++++----
 lib/eventdev/rte_event_crypto_adapter.h         |  8 ++++----
 lib/eventdev/rte_event_eth_rx_adapter.h         |  8 ++++----
 lib/eventdev/rte_event_eth_tx_adapter.h         |  8 ++++----
 lib/eventdev/rte_event_ring.h                   |  8 ++++----
 lib/eventdev/rte_event_timer_adapter.h          |  8 ++++----
 lib/eventdev/rte_eventdev.h                     |  8 ++++----
 lib/eventdev/rte_eventdev_trace_fp.h            |  4 ++--
 lib/graph/rte_graph_model_mcore_dispatch.h      |  8 ++++----
 lib/graph/rte_graph_worker.h                    |  6 +++---
 lib/gso/rte_gso.h                               |  6 +++---
 lib/hash/rte_fbk_hash.h                         |  8 ++++----
 lib/hash/rte_hash_crc.h                         |  8 ++++----
 lib/hash/rte_jhash.h                            |  8 ++++----
 lib/hash/rte_thash.h                            |  8 ++++----
 lib/hash/rte_thash_gfni.h                       |  8 ++++----
 lib/ip_frag/rte_ip_frag.h                       |  8 ++++----
 lib/ipsec/rte_ipsec.h                           |  8 ++++----
 lib/log/rte_log.h                               |  8 ++++----
 lib/lpm/rte_lpm.h                               |  8 ++++----
 lib/member/rte_member.h                         |  8 ++++----
 lib/member/rte_member_sketch.h                  |  6 +++---
 lib/member/rte_member_sketch_avx512.h           |  8 ++++----
 lib/member/rte_member_x86.h                     |  4 ++--
 lib/member/rte_xxh64_avx512.h                   |  6 +++---
 lib/mempool/mempool_trace.h                     |  8 ++++----
 lib/mempool/rte_mempool_trace_fp.h              |  4 ++--
 lib/meter/rte_meter.h                           |  8 ++++----
 lib/mldev/mldev_utils.h                         |  8 ++++----
 lib/mldev/rte_mldev_core.h                      |  8 ++++----
 lib/mldev/rte_mldev_pmd.h                       |  8 ++++----
 lib/net/rte_ether.h                             |  8 ++++----
 lib/net/rte_net.h                               |  8 ++++----
 lib/net/rte_sctp.h                              |  8 ++++----
 lib/node/rte_node_eth_api.h                     |  8 ++++----
 lib/node/rte_node_ip4_api.h                     |  8 ++++----
 lib/node/rte_node_ip6_api.h                     |  6 +++---
 lib/node/rte_node_udp4_input_api.h              |  8 ++++----
 lib/pci/rte_pci.h                               |  8 ++++----
 lib/pdcp/rte_pdcp.h                             |  8 ++++----
 lib/pipeline/rte_pipeline.h                     |  8 ++++----
 lib/pipeline/rte_port_in_action.h               |  8 ++++----
 lib/pipeline/rte_swx_ctl.h                      |  8 ++++----
 lib/pipeline/rte_swx_extern.h                   |  8 ++++----
 lib/pipeline/rte_swx_ipsec.h                    |  8 ++++----
 lib/pipeline/rte_swx_pipeline.h                 |  8 ++++----
 lib/pipeline/rte_swx_pipeline_spec.h            |  8 ++++----
 lib/pipeline/rte_table_action.h                 |  8 ++++----
 lib/port/rte_port.h                             |  8 ++++----
 lib/port/rte_port_ethdev.h                      |  8 ++++----
 lib/port/rte_port_eventdev.h                    |  8 ++++----
 lib/port/rte_port_fd.h                          |  8 ++++----
 lib/port/rte_port_frag.h                        |  8 ++++----
 lib/port/rte_port_ras.h                         |  8 ++++----
 lib/port/rte_port_ring.h                        |  8 ++++----
 lib/port/rte_port_sched.h                       |  8 ++++----
 lib/port/rte_port_source_sink.h                 |  8 ++++----
 lib/port/rte_port_sym_crypto.h                  |  8 ++++----
 lib/port/rte_swx_port.h                         |  8 ++++----
 lib/port/rte_swx_port_ethdev.h                  |  8 ++++----
 lib/port/rte_swx_port_fd.h                      |  8 ++++----
 lib/port/rte_swx_port_ring.h                    |  8 ++++----
 lib/port/rte_swx_port_source_sink.h             |  8 ++++----
 lib/rawdev/rte_rawdev.h                         |  6 +++---
 lib/rawdev/rte_rawdev_pmd.h                     |  8 ++++----
 lib/rcu/rte_rcu_qsbr.h                          |  8 ++++----
 lib/regexdev/rte_regexdev.h                     |  8 ++++----
 lib/ring/rte_ring.h                             |  6 +++---
 lib/ring/rte_ring_core.h                        |  8 ++++----
 lib/ring/rte_ring_elem.h                        |  8 ++++----
 lib/ring/rte_ring_hts.h                         |  4 ++--
 lib/ring/rte_ring_peek.h                        |  4 ++--
 lib/ring/rte_ring_peek_zc.h                     |  4 ++--
 lib/ring/rte_ring_rts.h                         |  4 ++--
 lib/sched/rte_approx.h                          |  8 ++++----
 lib/sched/rte_pie.h                             |  8 ++++----
 lib/sched/rte_red.h                             |  8 ++++----
 lib/sched/rte_sched.h                           |  8 ++++----
 lib/sched/rte_sched_common.h                    |  6 +++---
 lib/security/rte_security.h                     |  8 ++++----
 lib/security/rte_security_driver.h              |  6 +++---
 lib/stack/rte_stack.h                           |  8 ++++----
 lib/table/rte_lru.h                             | 12 ++++--------
 lib/table/rte_lru_arm64.h                       |  8 ++++----
 lib/table/rte_lru_x86.h                         |  8 --------
 lib/table/rte_swx_hash_func.h                   |  8 ++++----
 lib/table/rte_swx_keycmp.h                      |  8 ++++----
 lib/table/rte_swx_table.h                       |  8 ++++----
 lib/table/rte_swx_table_em.h                    |  8 ++++----
 lib/table/rte_swx_table_learner.h               |  8 ++++----
 lib/table/rte_swx_table_selector.h              |  8 ++++----
 lib/table/rte_swx_table_wm.h                    |  8 ++++----
 lib/table/rte_table.h                           |  8 ++++----
 lib/table/rte_table_acl.h                       |  8 ++++----
 lib/table/rte_table_array.h                     |  8 ++++----
 lib/table/rte_table_hash.h                      |  8 ++++----
 lib/table/rte_table_hash_cuckoo.h               |  8 ++++----
 lib/table/rte_table_hash_func.h                 | 12 ++++++++----
 lib/table/rte_table_lpm.h                       |  8 ++++----
 lib/table/rte_table_lpm_ipv6.h                  |  8 ++++----
 lib/table/rte_table_stub.h                      |  8 ++++----
 lib/telemetry/rte_telemetry.h                   |  8 ++++----
 lib/vhost/rte_vdpa.h                            |  8 ++++----
 lib/vhost/rte_vhost.h                           |  8 ++++----
 lib/vhost/rte_vhost_async.h                     |  8 ++++----
 lib/vhost/rte_vhost_crypto.h                    |  4 ++--
 lib/vhost/vdpa_driver.h                         |  8 ++++----
 278 files changed, 1036 insertions(+), 975 deletions(-)

diff --git a/app/test/packet_burst_generator.h b/app/test/packet_burst_generator.h
index b99286f50e..cce41bcd0f 100644
--- a/app/test/packet_burst_generator.h
+++ b/app/test/packet_burst_generator.h
@@ -5,10 +5,6 @@
 #ifndef PACKET_BURST_GENERATOR_H_
 #define PACKET_BURST_GENERATOR_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_mbuf.h>
 #include <rte_ether.h>
 #include <rte_arp.h>
@@ -17,6 +13,10 @@ extern "C" {
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define IPV4_ADDR(a, b, c, d)(((a & 0xff) << 24) | ((b & 0xff) << 16) | \
 		((c & 0xff) << 8) | (d & 0xff))
 
diff --git a/app/test/virtual_pmd.h b/app/test/virtual_pmd.h
index 120b58b273..a5a71d7cb4 100644
--- a/app/test/virtual_pmd.h
+++ b/app/test/virtual_pmd.h
@@ -5,12 +5,12 @@
 #ifndef __VIRTUAL_ETHDEV_H_
 #define __VIRTUAL_ETHDEV_H_
 
+#include <rte_ether.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ether.h>
-
 int
 virtual_ethdev_init(void);
 
diff --git a/drivers/bus/auxiliary/bus_auxiliary_driver.h b/drivers/bus/auxiliary/bus_auxiliary_driver.h
index 58fb7c7f69..40ab1f0912 100644
--- a/drivers/bus/auxiliary/bus_auxiliary_driver.h
+++ b/drivers/bus/auxiliary/bus_auxiliary_driver.h
@@ -11,10 +11,6 @@
  * Auxiliary Bus Interface.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -28,6 +24,10 @@ extern "C" {
 #include <dev_driver.h>
 #include <rte_kvargs.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_BUS_AUXILIARY_NAME "auxiliary"
 
 /* Forward declarations */
diff --git a/drivers/bus/cdx/bus_cdx_driver.h b/drivers/bus/cdx/bus_cdx_driver.h
index 211f8e406b..d390e7b5a1 100644
--- a/drivers/bus/cdx/bus_cdx_driver.h
+++ b/drivers/bus/cdx/bus_cdx_driver.h
@@ -10,10 +10,6 @@
  * AMD CDX bus interface
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdlib.h>
 #include <inttypes.h>
 #include <linux/types.h>
@@ -22,6 +18,10 @@ extern "C" {
 #include <dev_driver.h>
 #include <rte_interrupts.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_cdx_device;
 struct rte_cdx_driver;
diff --git a/drivers/bus/dpaa/include/fsl_qman.h b/drivers/bus/dpaa/include/fsl_qman.h
index c0677976e8..f39007b84d 100644
--- a/drivers/bus/dpaa/include/fsl_qman.h
+++ b/drivers/bus/dpaa/include/fsl_qman.h
@@ -8,14 +8,14 @@
 #ifndef __FSL_QMAN_H
 #define __FSL_QMAN_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <dpaa_rbtree.h>
 #include <rte_compat.h>
 #include <rte_eventdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* FQ lookups (turn this on for 64bit user-space) */
 #ifdef RTE_ARCH_64
 #define CONFIG_FSL_QMAN_FQ_LOOKUP
diff --git a/drivers/bus/fslmc/bus_fslmc_driver.h b/drivers/bus/fslmc/bus_fslmc_driver.h
index 7ac5fe6ff1..3095458133 100644
--- a/drivers/bus/fslmc/bus_fslmc_driver.h
+++ b/drivers/bus/fslmc/bus_fslmc_driver.h
@@ -13,10 +13,6 @@
  * RTE FSLMC Bus Interface
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -40,6 +36,10 @@ extern "C" {
 #include "portal/dpaa2_hw_pvt.h"
 #include "portal/dpaa2_hw_dpio.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define FSLMC_OBJECT_MAX_LEN 32   /**< Length of each device on bus */
 
 #define DPAA2_INVALID_MBUF_SEQN        0
diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
index be32263a82..2cc1119072 100644
--- a/drivers/bus/pci/bus_pci_driver.h
+++ b/drivers/bus/pci/bus_pci_driver.h
@@ -6,14 +6,14 @@
 #ifndef BUS_PCI_DRIVER_H
 #define BUS_PCI_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus_pci.h>
 #include <dev_driver.h>
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Pathname of PCI devices directory. */
 __rte_internal
 const char *rte_pci_get_sysfs_path(void);
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index a3798cb1cb..19a7b15b99 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -11,10 +11,6 @@
  * PCI device & driver interface
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -27,6 +23,10 @@ extern "C" {
 #include <rte_interrupts.h>
 #include <rte_pci.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_pci_device;
 struct rte_pci_driver;
diff --git a/drivers/bus/platform/bus_platform_driver.h b/drivers/bus/platform/bus_platform_driver.h
index 5ac54fb739..a6f246f7c4 100644
--- a/drivers/bus/platform/bus_platform_driver.h
+++ b/drivers/bus/platform/bus_platform_driver.h
@@ -10,10 +10,6 @@
  * Platform bus interface.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stddef.h>
 #include <stdint.h>
 
@@ -23,6 +19,10 @@ extern "C" {
 #include <rte_os.h>
 #include <rte_vfio.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_platform_bus;
 struct rte_platform_device;
diff --git a/drivers/bus/vdev/bus_vdev_driver.h b/drivers/bus/vdev/bus_vdev_driver.h
index bc7e30d7c6..cba1fb5269 100644
--- a/drivers/bus/vdev/bus_vdev_driver.h
+++ b/drivers/bus/vdev/bus_vdev_driver.h
@@ -5,15 +5,15 @@
 #ifndef BUS_VDEV_DRIVER_H
 #define BUS_VDEV_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus_vdev.h>
 #include <rte_compat.h>
 #include <dev_driver.h>
 #include <rte_devargs.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_vdev_device {
 	RTE_TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
 	struct rte_device device;               /**< Inherit core device */
diff --git a/drivers/bus/vmbus/bus_vmbus_driver.h b/drivers/bus/vmbus/bus_vmbus_driver.h
index e2475a642d..bc394208de 100644
--- a/drivers/bus/vmbus/bus_vmbus_driver.h
+++ b/drivers/bus/vmbus/bus_vmbus_driver.h
@@ -6,14 +6,14 @@
 #ifndef BUS_VMBUS_DRIVER_H
 #define BUS_VMBUS_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus_vmbus.h>
 #include <rte_compat.h>
 #include <dev_driver.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct vmbus_channel;
 struct vmbus_mon_page;
 
diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h b/drivers/bus/vmbus/rte_bus_vmbus.h
index 9467bd8f3d..fd18bca73c 100644
--- a/drivers/bus/vmbus/rte_bus_vmbus.h
+++ b/drivers/bus/vmbus/rte_bus_vmbus.h
@@ -11,10 +11,6 @@
  *
  * VMBUS Interface
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -28,6 +24,10 @@ extern "C" {
 #include <rte_interrupts.h>
 #include <rte_vmbus_reg.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_vmbus_device;
 struct rte_vmbus_driver;
diff --git a/drivers/dma/cnxk/cnxk_dma_event_dp.h b/drivers/dma/cnxk/cnxk_dma_event_dp.h
index 06b5ca8279..8c6cf5dd9a 100644
--- a/drivers/dma/cnxk/cnxk_dma_event_dp.h
+++ b/drivers/dma/cnxk/cnxk_dma_event_dp.h
@@ -5,16 +5,16 @@
 #ifndef _CNXK_DMA_EVENT_DP_H_
 #define _CNXK_DMA_EVENT_DP_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_common.h>
 #include <rte_compat.h>
 #include <rte_eventdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 __rte_internal
 uint16_t cn10k_dma_adapter_enqueue(void *ws, struct rte_event ev[], uint16_t nb_events);
 
diff --git a/drivers/dma/ioat/ioat_hw_defs.h b/drivers/dma/ioat/ioat_hw_defs.h
index dc3493a78f..11893951f2 100644
--- a/drivers/dma/ioat/ioat_hw_defs.h
+++ b/drivers/dma/ioat/ioat_hw_defs.h
@@ -5,12 +5,12 @@
 #ifndef IOAT_HW_DEFS_H
 #define IOAT_HW_DEFS_H
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 #define IOAT_PCI_CHANERR_INT_OFFSET	0x180
 
 #define IOAT_VER_3_0	0x30
diff --git a/drivers/event/dlb2/rte_pmd_dlb2.h b/drivers/event/dlb2/rte_pmd_dlb2.h
index 334c6c356d..dba7fd2f43 100644
--- a/drivers/event/dlb2/rte_pmd_dlb2.h
+++ b/drivers/event/dlb2/rte_pmd_dlb2.h
@@ -11,14 +11,14 @@
 #ifndef _RTE_PMD_DLB2_H_
 #define _RTE_PMD_DLB2_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
diff --git a/drivers/mempool/dpaa2/rte_dpaa2_mempool.h b/drivers/mempool/dpaa2/rte_dpaa2_mempool.h
index 7fe3d93f61..0286090b1b 100644
--- a/drivers/mempool/dpaa2/rte_dpaa2_mempool.h
+++ b/drivers/mempool/dpaa2/rte_dpaa2_mempool.h
@@ -12,13 +12,13 @@
  *
  */
 
+#include <rte_compat.h>
+#include <rte_mempool.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_compat.h>
-#include <rte_mempool.h>
-
 /**
  * Get BPID corresponding to the packet pool
  *
diff --git a/drivers/net/avp/rte_avp_fifo.h b/drivers/net/avp/rte_avp_fifo.h
index c1658da685..879de3b1c0 100644
--- a/drivers/net/avp/rte_avp_fifo.h
+++ b/drivers/net/avp/rte_avp_fifo.h
@@ -8,10 +8,6 @@
 
 #include "rte_avp_common.h"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef __KERNEL__
 /* Write memory barrier for kernel compiles */
 #define AVP_WMB() smp_wmb()
@@ -27,6 +23,10 @@ extern "C" {
 #ifndef __KERNEL__
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Initializes the avp fifo structure
  */
diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
index f10165f2c6..e59ff8793e 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -17,12 +17,12 @@
  * load balancing of network ports
  */
 
+#include <rte_ether.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ether.h>
-
 /* Supported modes of operation of link bonding library  */
 
 #define BONDING_MODE_ROUND_ROBIN		(0)
diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index a802f989e9..5af7e2330f 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -14,14 +14,14 @@
  *
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_ethdev.h>
 #include <rte_ether.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Response sent back to i40e driver from user app after callback
  */
diff --git a/drivers/net/mlx5/mlx5_trace.h b/drivers/net/mlx5/mlx5_trace.h
index 888d96f60b..a8f0b372c8 100644
--- a/drivers/net/mlx5/mlx5_trace.h
+++ b/drivers/net/mlx5/mlx5_trace.h
@@ -11,14 +11,14 @@
  * API for mlx5 PMD trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <mlx5_prm.h>
 #include <rte_mbuf.h>
 #include <rte_trace_point.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* TX burst subroutines trace points. */
 RTE_TRACE_POINT_FP(
 	rte_pmd_mlx5_trace_tx_entry,
diff --git a/drivers/net/ring/rte_eth_ring.h b/drivers/net/ring/rte_eth_ring.h
index 59e074d0ad..98292c7b33 100644
--- a/drivers/net/ring/rte_eth_ring.h
+++ b/drivers/net/ring/rte_eth_ring.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_ETH_RING_H_
 #define _RTE_ETH_RING_H_
 
+#include <rte_ring.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring.h>
-
 /**
  * Create a new ethdev port from a set of rings
  *
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
index 0e68b9f668..6ec59a7adc 100644
--- a/drivers/net/vhost/rte_eth_vhost.h
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -5,15 +5,15 @@
 #ifndef _RTE_ETH_VHOST_H_
 #define _RTE_ETH_VHOST_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdbool.h>
 
 #include <rte_vhost.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Event description.
  */
diff --git a/drivers/raw/ifpga/afu_pmd_core.h b/drivers/raw/ifpga/afu_pmd_core.h
index a8f1afe343..abf9e491f7 100644
--- a/drivers/raw/ifpga/afu_pmd_core.h
+++ b/drivers/raw/ifpga/afu_pmd_core.h
@@ -5,10 +5,6 @@
 #ifndef AFU_PMD_CORE_H
 #define AFU_PMD_CORE_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 #include <unistd.h>
@@ -20,6 +16,10 @@ extern "C" {
 
 #include "ifpga_rawdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define AFU_RAWDEV_MAX_DRVS  32
 
 struct afu_rawdev;
diff --git a/drivers/raw/ifpga/afu_pmd_he_hssi.h b/drivers/raw/ifpga/afu_pmd_he_hssi.h
index aebbe32d54..282289d912 100644
--- a/drivers/raw/ifpga/afu_pmd_he_hssi.h
+++ b/drivers/raw/ifpga/afu_pmd_he_hssi.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_HE_HSSI_H
 #define AFU_PMD_HE_HSSI_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define HE_HSSI_UUID_L    0xbb370242ac130002
 #define HE_HSSI_UUID_H    0x823c334c98bf11ea
 #define NUM_HE_HSSI_PORTS 8
diff --git a/drivers/raw/ifpga/afu_pmd_he_lpbk.h b/drivers/raw/ifpga/afu_pmd_he_lpbk.h
index eab7b55199..67b3653c21 100644
--- a/drivers/raw/ifpga/afu_pmd_he_lpbk.h
+++ b/drivers/raw/ifpga/afu_pmd_he_lpbk.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_HE_LPBK_H
 #define AFU_PMD_HE_LPBK_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define HE_LPBK_UUID_L     0xb94b12284c31e02b
 #define HE_LPBK_UUID_H     0x56e203e9864f49a7
 #define HE_MEM_LPBK_UUID_L 0xbb652a578330a8eb
diff --git a/drivers/raw/ifpga/afu_pmd_he_mem.h b/drivers/raw/ifpga/afu_pmd_he_mem.h
index 998ca92416..41854d8c58 100644
--- a/drivers/raw/ifpga/afu_pmd_he_mem.h
+++ b/drivers/raw/ifpga/afu_pmd_he_mem.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_HE_MEM_H
 #define AFU_PMD_HE_MEM_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define HE_MEM_TG_UUID_L  0xa3dc5b831f5cecbb
 #define HE_MEM_TG_UUID_H  0x4dadea342c7848cb
 
diff --git a/drivers/raw/ifpga/afu_pmd_n3000.h b/drivers/raw/ifpga/afu_pmd_n3000.h
index 403cc64b91..f6b6e07c6b 100644
--- a/drivers/raw/ifpga/afu_pmd_n3000.h
+++ b/drivers/raw/ifpga/afu_pmd_n3000.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_N3000_H
 #define AFU_PMD_N3000_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define N3000_AFU_UUID_L  0xc000c9660d824272
 #define N3000_AFU_UUID_H  0x9aeffe5f84570612
 #define N3000_NLB0_UUID_L 0xf89e433683f9040b
diff --git a/drivers/raw/ifpga/rte_pmd_afu.h b/drivers/raw/ifpga/rte_pmd_afu.h
index 5403ed25f5..0edacc3a9c 100644
--- a/drivers/raw/ifpga/rte_pmd_afu.h
+++ b/drivers/raw/ifpga/rte_pmd_afu.h
@@ -14,12 +14,12 @@
  *
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 #define RTE_PMD_AFU_N3000_NLB   1
 #define RTE_PMD_AFU_N3000_DMA   2
 
diff --git a/drivers/raw/ifpga/rte_pmd_ifpga.h b/drivers/raw/ifpga/rte_pmd_ifpga.h
index 791543f2cd..36b7f9c018 100644
--- a/drivers/raw/ifpga/rte_pmd_ifpga.h
+++ b/drivers/raw/ifpga/rte_pmd_ifpga.h
@@ -14,12 +14,12 @@
  *
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 #define IFPGA_MAX_PORT_NUM   4
 
 /**
diff --git a/examples/ethtool/lib/rte_ethtool.h b/examples/ethtool/lib/rte_ethtool.h
index d27e0102b1..c7dd3d9755 100644
--- a/examples/ethtool/lib/rte_ethtool.h
+++ b/examples/ethtool/lib/rte_ethtool.h
@@ -30,14 +30,14 @@
  * rte_ethtool_net_set_rx_mode      net_device_ops::ndo_set_rx_mode
  *
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_ethdev.h>
 #include <linux/ethtool.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Retrieve the Ethernet device driver information according to
  * attributes described by ethtool data structure, ethtool_drvinfo.
diff --git a/examples/qos_sched/main.h b/examples/qos_sched/main.h
index 04e77a4a10..ea66df0434 100644
--- a/examples/qos_sched/main.h
+++ b/examples/qos_sched/main.h
@@ -5,12 +5,12 @@
 #ifndef _MAIN_H_
 #define _MAIN_H_
 
+#include <rte_sched.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_sched.h>
-
 #define RTE_LOGTYPE_APP RTE_LOGTYPE_USER1
 
 /*
diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h
index eb989b20ad..6f70539815 100644
--- a/examples/vm_power_manager/channel_manager.h
+++ b/examples/vm_power_manager/channel_manager.h
@@ -5,16 +5,16 @@
 #ifndef CHANNEL_MANAGER_H_
 #define CHANNEL_MANAGER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <linux/limits.h>
 #include <linux/un.h>
 #include <stdbool.h>
 
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Maximum name length including '\0' terminator */
 #define CHANNEL_MGR_MAX_NAME_LEN    64
 
diff --git a/lib/acl/rte_acl_osdep.h b/lib/acl/rte_acl_osdep.h
index 3c1dc402ca..e4c7d07c69 100644
--- a/lib/acl/rte_acl_osdep.h
+++ b/lib/acl/rte_acl_osdep.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_ACL_OSDEP_H_
 #define _RTE_ACL_OSDEP_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -49,6 +45,10 @@ extern "C" {
 #include <rte_cpuflags.h>
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 0cbfdd1c95..9e83dd2bb0 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -20,10 +20,6 @@
  * from the same queue.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdbool.h>
 
@@ -32,6 +28,10 @@ extern "C" {
 
 #include "rte_bbdev_op.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_BBDEV_MAX_DEVS
 #define RTE_BBDEV_MAX_DEVS 128  /**< Max number of devices */
 #endif
diff --git a/lib/bbdev/rte_bbdev_op.h b/lib/bbdev/rte_bbdev_op.h
index 459631d0d0..6f4bae7d0f 100644
--- a/lib/bbdev/rte_bbdev_op.h
+++ b/lib/bbdev/rte_bbdev_op.h
@@ -11,10 +11,6 @@
  * Defines wireless base band layer 1 operations and capabilities
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -23,6 +19,10 @@ extern "C" {
 #include <rte_memory.h>
 #include <rte_mempool.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Number of columns in sub-block interleaver (36.212, section 5.1.4.1.1) */
 #define RTE_BBDEV_TURBO_C_SUBBLOCK (32)
 /* Maximum size of Transport Block (36.213, Table, Table 7.1.7.2.5-1) */
diff --git a/lib/bbdev/rte_bbdev_pmd.h b/lib/bbdev/rte_bbdev_pmd.h
index 442b23943d..0a1738fc05 100644
--- a/lib/bbdev/rte_bbdev_pmd.h
+++ b/lib/bbdev/rte_bbdev_pmd.h
@@ -14,15 +14,15 @@
  * bbdev interface. User applications should not use this API.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_log.h>
 
 #include "rte_bbdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Suggested value for SW based devices */
 #define RTE_BBDEV_DEFAULT_MAX_NB_QUEUES RTE_MAX_LCORE
 
diff --git a/lib/bpf/bpf_def.h b/lib/bpf/bpf_def.h
index f08cd9106b..9f2e162914 100644
--- a/lib/bpf/bpf_def.h
+++ b/lib/bpf/bpf_def.h
@@ -7,10 +7,6 @@
 #ifndef _RTE_BPF_DEF_H_
 #define _RTE_BPF_DEF_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -25,6 +21,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 
 /*
  * The instruction encodings.
diff --git a/lib/compressdev/rte_comp.h b/lib/compressdev/rte_comp.h
index 830a240b6b..d66a4b1cb9 100644
--- a/lib/compressdev/rte_comp.h
+++ b/lib/compressdev/rte_comp.h
@@ -11,12 +11,12 @@
  * RTE definitions for Data Compression Service
  */
 
+#include <rte_mbuf.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_mbuf.h>
-
 /**
  * compression service feature flags
  *
diff --git a/lib/compressdev/rte_compressdev.h b/lib/compressdev/rte_compressdev.h
index e0294a18bd..b3392553a6 100644
--- a/lib/compressdev/rte_compressdev.h
+++ b/lib/compressdev/rte_compressdev.h
@@ -13,13 +13,13 @@
  * Defines comp device APIs for the provisioning of compression operations.
  */
 
+
+#include "rte_comp.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-
-#include "rte_comp.h"
-
 /**
  * Parameter log base 2 range description.
  * Final value will be 2^value.
diff --git a/lib/compressdev/rte_compressdev_internal.h b/lib/compressdev/rte_compressdev_internal.h
index 67f8b51a37..a980d74cbf 100644
--- a/lib/compressdev/rte_compressdev_internal.h
+++ b/lib/compressdev/rte_compressdev_internal.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_COMPRESSDEV_INTERNAL_H_
 #define _RTE_COMPRESSDEV_INTERNAL_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* rte_compressdev_internal.h
  * This file holds Compressdev private data structures.
  */
@@ -16,6 +12,10 @@ extern "C" {
 
 #include "rte_comp.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_COMPRESSDEV_NAME_MAX_LEN	(64)
 /**< Max length of name of comp PMD */
 
diff --git a/lib/compressdev/rte_compressdev_pmd.h b/lib/compressdev/rte_compressdev_pmd.h
index 32e29c9d16..ea721f014d 100644
--- a/lib/compressdev/rte_compressdev_pmd.h
+++ b/lib/compressdev/rte_compressdev_pmd.h
@@ -13,10 +13,6 @@
  * them directly.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <dev_driver.h>
@@ -24,6 +20,10 @@ extern "C" {
 #include "rte_compressdev.h"
 #include "rte_compressdev_internal.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_COMPRESSDEV_PMD_NAME_ARG			("name")
 #define RTE_COMPRESSDEV_PMD_SOCKET_ID_ARG		("socket_id")
 
diff --git a/lib/cryptodev/cryptodev_pmd.h b/lib/cryptodev/cryptodev_pmd.h
index 6c114f7181..3e2e2673b8 100644
--- a/lib/cryptodev/cryptodev_pmd.h
+++ b/lib/cryptodev/cryptodev_pmd.h
@@ -5,10 +5,6 @@
 #ifndef _CRYPTODEV_PMD_H_
 #define _CRYPTODEV_PMD_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Crypto PMD APIs
  *
@@ -28,6 +24,10 @@ extern "C" {
 #include "rte_crypto.h"
 #include "rte_cryptodev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 
 #define RTE_CRYPTODEV_PMD_DEFAULT_MAX_NB_QUEUE_PAIRS	8
 
diff --git a/lib/cryptodev/cryptodev_trace.h b/lib/cryptodev/cryptodev_trace.h
index 935f0d564b..e186f0f3c1 100644
--- a/lib/cryptodev/cryptodev_trace.h
+++ b/lib/cryptodev/cryptodev_trace.h
@@ -11,14 +11,14 @@
  * API for cryptodev trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_trace_point.h>
 
 #include "rte_cryptodev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_cryptodev_trace_configure,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id,
diff --git a/lib/cryptodev/rte_crypto.h b/lib/cryptodev/rte_crypto.h
index dbc2700da5..dcf4a36fb2 100644
--- a/lib/cryptodev/rte_crypto.h
+++ b/lib/cryptodev/rte_crypto.h
@@ -11,10 +11,6 @@
  * RTE Cryptography Common Definitions
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 
 #include <rte_mbuf.h>
 #include <rte_memory.h>
@@ -24,6 +20,10 @@ extern "C" {
 #include "rte_crypto_sym.h"
 #include "rte_crypto_asym.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Crypto operation types */
 enum rte_crypto_op_type {
 	RTE_CRYPTO_OP_TYPE_UNDEFINED,
diff --git a/lib/cryptodev/rte_crypto_asym.h b/lib/cryptodev/rte_crypto_asym.h
index 39d3da3952..4b7ea36961 100644
--- a/lib/cryptodev/rte_crypto_asym.h
+++ b/lib/cryptodev/rte_crypto_asym.h
@@ -14,10 +14,6 @@
  * asymmetric crypto operations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 #include <stdint.h>
 
@@ -27,6 +23,10 @@ extern "C" {
 
 #include "rte_crypto_sym.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_cryptodev_asym_session;
 
 /** asym key exchange operation type name strings */
diff --git a/lib/cryptodev/rte_crypto_sym.h b/lib/cryptodev/rte_crypto_sym.h
index 53b18b9412..fb73024010 100644
--- a/lib/cryptodev/rte_crypto_sym.h
+++ b/lib/cryptodev/rte_crypto_sym.h
@@ -14,10 +14,6 @@
  * as supported symmetric crypto operation combinations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <rte_compat.h>
@@ -26,6 +22,10 @@ extern "C" {
 #include <rte_mempool.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Crypto IO Vector (in analogy with struct iovec)
  * Supposed be used to pass input/output data buffers for crypto data-path
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index bec947f6d5..8051c5a6a3 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -14,10 +14,6 @@
  * authentication operations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include "rte_kvargs.h"
 #include "rte_crypto.h"
@@ -1859,6 +1855,10 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
 				      struct rte_cryptodev_cb *cb);
 
 #include <rte_cryptodev_core.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 /**
  *
  * Dequeue a burst of processed crypto operations from a queue on the crypto
diff --git a/lib/cryptodev/rte_cryptodev_trace_fp.h b/lib/cryptodev/rte_cryptodev_trace_fp.h
index dbfbc7b2e5..f23f882804 100644
--- a/lib/cryptodev/rte_cryptodev_trace_fp.h
+++ b/lib/cryptodev/rte_cryptodev_trace_fp.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_CRYPTODEV_TRACE_FP_H_
 #define _RTE_CRYPTODEV_TRACE_FP_H_
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_cryptodev_trace_enqueue_burst,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id, uint16_t qp_id, void **ops,
diff --git a/lib/dispatcher/rte_dispatcher.h b/lib/dispatcher/rte_dispatcher.h
index d8182d5f2c..ba2c353073 100644
--- a/lib/dispatcher/rte_dispatcher.h
+++ b/lib/dispatcher/rte_dispatcher.h
@@ -19,16 +19,16 @@
  * event device.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdint.h>
 
 #include <rte_compat.h>
 #include <rte_eventdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Function prototype for match callbacks.
  *
diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
index 5474a5281d..d174d325a1 100644
--- a/lib/dmadev/rte_dmadev.h
+++ b/lib/dmadev/rte_dmadev.h
@@ -772,9 +772,17 @@ struct rte_dma_sge {
 	uint32_t length; /**< The DMA operation length. */
 };
 
+#ifdef __cplusplus
+}
+#endif
+
 #include "rte_dmadev_core.h"
 #include "rte_dmadev_trace_fp.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**@{@name DMA operation flag
  * @see rte_dma_copy()
  * @see rte_dma_copy_sg()
diff --git a/lib/eal/arm/include/rte_atomic_32.h b/lib/eal/arm/include/rte_atomic_32.h
index 62fc33773d..0b9a0dfa30 100644
--- a/lib/eal/arm/include/rte_atomic_32.h
+++ b/lib/eal/arm/include/rte_atomic_32.h
@@ -9,12 +9,12 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include "generic/rte_atomic.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_atomic.h"
-
 #define	rte_mb()  __sync_synchronize()
 
 #define	rte_wmb() do { asm volatile ("dmb st" : : : "memory"); } while (0)
diff --git a/lib/eal/arm/include/rte_atomic_64.h b/lib/eal/arm/include/rte_atomic_64.h
index 7c99fc0a02..181bb60929 100644
--- a/lib/eal/arm/include/rte_atomic_64.h
+++ b/lib/eal/arm/include/rte_atomic_64.h
@@ -10,14 +10,14 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_atomic.h"
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define rte_mb() asm volatile("dmb osh" : : : "memory")
 
 #define rte_wmb() asm volatile("dmb oshst" : : : "memory")
diff --git a/lib/eal/arm/include/rte_byteorder.h b/lib/eal/arm/include/rte_byteorder.h
index ff02052f2e..a0aaff4a28 100644
--- a/lib/eal/arm/include/rte_byteorder.h
+++ b/lib/eal/arm/include/rte_byteorder.h
@@ -9,14 +9,14 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include "generic/rte_byteorder.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* ARM architecture is bi-endian (both big and little). */
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 
diff --git a/lib/eal/arm/include/rte_cpuflags_32.h b/lib/eal/arm/include/rte_cpuflags_32.h
index 770b09b99d..7e33acd9fb 100644
--- a/lib/eal/arm/include/rte_cpuflags_32.h
+++ b/lib/eal/arm/include/rte_cpuflags_32.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_CPUFLAGS_ARM32_H_
 #define _RTE_CPUFLAGS_ARM32_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -46,6 +42,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index afe70209c3..f84633159e 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_CPUFLAGS_ARM64_H_
 #define _RTE_CPUFLAGS_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -40,6 +36,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/arm/include/rte_cycles_32.h b/lib/eal/arm/include/rte_cycles_32.h
index 859cd2e5bb..2b20c8c6f5 100644
--- a/lib/eal/arm/include/rte_cycles_32.h
+++ b/lib/eal/arm/include/rte_cycles_32.h
@@ -15,12 +15,12 @@
 
 #include <time.h>
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 /**
  * Read the time base register.
  *
diff --git a/lib/eal/arm/include/rte_cycles_64.h b/lib/eal/arm/include/rte_cycles_64.h
index 8b05302f47..bb76e4d7e0 100644
--- a/lib/eal/arm/include/rte_cycles_64.h
+++ b/lib/eal/arm/include/rte_cycles_64.h
@@ -6,12 +6,12 @@
 #ifndef _RTE_CYCLES_ARM64_H_
 #define _RTE_CYCLES_ARM64_H_
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 /** Read generic counter frequency */
 static __rte_always_inline uint64_t
 __rte_arm64_cntfrq(void)
diff --git a/lib/eal/arm/include/rte_io.h b/lib/eal/arm/include/rte_io.h
index f4e66e6bad..658768697c 100644
--- a/lib/eal/arm/include/rte_io.h
+++ b/lib/eal/arm/include/rte_io.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_IO_ARM_H_
 #define _RTE_IO_ARM_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_ARCH_64
 #include "rte_io_64.h"
 #else
 #include "generic/rte_io.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 #endif
 
 #ifdef __cplusplus
diff --git a/lib/eal/arm/include/rte_io_64.h b/lib/eal/arm/include/rte_io_64.h
index 96da7789ce..88db82a7eb 100644
--- a/lib/eal/arm/include/rte_io_64.h
+++ b/lib/eal/arm/include/rte_io_64.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_IO_ARM64_H_
 #define _RTE_IO_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #define RTE_OVERRIDE_IO_H
@@ -17,6 +13,10 @@ extern "C" {
 #include <rte_compat.h>
 #include "rte_atomic_64.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static __rte_always_inline uint8_t
 rte_read8_relaxed(const volatile void *addr)
 {
diff --git a/lib/eal/arm/include/rte_memcpy_32.h b/lib/eal/arm/include/rte_memcpy_32.h
index fb3245b59c..99fd5757ca 100644
--- a/lib/eal/arm/include/rte_memcpy_32.h
+++ b/lib/eal/arm/include/rte_memcpy_32.h
@@ -8,10 +8,6 @@
 #include <stdint.h>
 #include <string.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_memcpy.h"
 
 #ifdef RTE_ARCH_ARM_NEON_MEMCPY
@@ -23,6 +19,10 @@ extern "C" {
 /* ARM NEON Intrinsics are used to copy data */
 #include <arm_neon.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
diff --git a/lib/eal/arm/include/rte_memcpy_64.h b/lib/eal/arm/include/rte_memcpy_64.h
index 85ad587bd3..c7d0c345ad 100644
--- a/lib/eal/arm/include/rte_memcpy_64.h
+++ b/lib/eal/arm/include/rte_memcpy_64.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_MEMCPY_ARM64_H_
 #define _RTE_MEMCPY_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <string.h>
 
@@ -18,6 +14,10 @@ extern "C" {
 #include <rte_common.h>
 #include <rte_branch_prediction.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * The memory copy performance differs on different AArch64 micro-architectures.
  * And the most recent glibc (e.g. 2.23 or later) can provide a better memcpy()
diff --git a/lib/eal/arm/include/rte_pause.h b/lib/eal/arm/include/rte_pause.h
index 6c7002ad98..8f35d60a6e 100644
--- a/lib/eal/arm/include/rte_pause.h
+++ b/lib/eal/arm/include/rte_pause.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PAUSE_ARM_H_
 #define _RTE_PAUSE_ARM_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_ARCH_64
 #include <rte_pause_64.h>
 #else
 #include <rte_pause_32.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 #endif
 
 #ifdef __cplusplus
diff --git a/lib/eal/arm/include/rte_pause_32.h b/lib/eal/arm/include/rte_pause_32.h
index d4768c7a98..7870fac763 100644
--- a/lib/eal/arm/include/rte_pause_32.h
+++ b/lib/eal/arm/include/rte_pause_32.h
@@ -5,13 +5,13 @@
 #ifndef _RTE_PAUSE_ARM32_H_
 #define _RTE_PAUSE_ARM32_H_
 
+#include <rte_common.h>
+#include "generic/rte_pause.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_pause.h"
-
 static inline void rte_pause(void)
 {
 }
diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
index 9e2dbf3531..1526bf87cc 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -6,10 +6,6 @@
 #ifndef _RTE_PAUSE_ARM64_H_
 #define _RTE_PAUSE_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_stdatomic.h>
 
@@ -19,6 +15,10 @@ extern "C" {
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 	asm volatile("yield" ::: "memory");
diff --git a/lib/eal/arm/include/rte_power_intrinsics.h b/lib/eal/arm/include/rte_power_intrinsics.h
index 9e498e9ebf..5481f45ad3 100644
--- a/lib/eal/arm/include/rte_power_intrinsics.h
+++ b/lib/eal/arm/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_POWER_INTRINSIC_ARM_H_
 #define _RTE_POWER_INTRINSIC_ARM_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/arm/include/rte_prefetch_32.h b/lib/eal/arm/include/rte_prefetch_32.h
index 0e9a140c8a..619bf27c79 100644
--- a/lib/eal/arm/include/rte_prefetch_32.h
+++ b/lib/eal/arm/include/rte_prefetch_32.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PREFETCH_ARM32_H_
 #define _RTE_PREFETCH_ARM32_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	asm volatile ("pld [%0]" : : "r" (p));
diff --git a/lib/eal/arm/include/rte_prefetch_64.h b/lib/eal/arm/include/rte_prefetch_64.h
index 22cba48e29..4f60123b8b 100644
--- a/lib/eal/arm/include/rte_prefetch_64.h
+++ b/lib/eal/arm/include/rte_prefetch_64.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PREFETCH_ARM_64_H_
 #define _RTE_PREFETCH_ARM_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	asm volatile ("PRFM PLDL1KEEP, [%0]" : : "r" (p));
diff --git a/lib/eal/arm/include/rte_rwlock.h b/lib/eal/arm/include/rte_rwlock.h
index 18bb37b036..727cabafec 100644
--- a/lib/eal/arm/include/rte_rwlock.h
+++ b/lib/eal/arm/include/rte_rwlock.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_RWLOCK_ARM_H_
 #define _RTE_RWLOCK_ARM_H_
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/arm/include/rte_spinlock.h b/lib/eal/arm/include/rte_spinlock.h
index a973763c23..a5d01b0d21 100644
--- a/lib/eal/arm/include/rte_spinlock.h
+++ b/lib/eal/arm/include/rte_spinlock.h
@@ -9,13 +9,13 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include <rte_common.h>
+#include "generic/rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_spinlock.h"
-
 static inline int rte_tm_supported(void)
 {
 	return 0;
diff --git a/lib/eal/freebsd/include/rte_os.h b/lib/eal/freebsd/include/rte_os.h
index 003468caff..f31f6af12d 100644
--- a/lib/eal/freebsd/include/rte_os.h
+++ b/lib/eal/freebsd/include/rte_os.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_OS_H_
 #define _RTE_OS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * This header should contain any definition
  * which is not supported natively or named differently in FreeBSD.
@@ -17,6 +13,10 @@ extern "C" {
 #include <pthread_np.h>
 #include <sys/queue.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* These macros are compatible with system's sys/queue.h. */
 #define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
 #define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
diff --git a/lib/eal/include/bus_driver.h b/lib/eal/include/bus_driver.h
index 7b85a17a09..60527b75b6 100644
--- a/lib/eal/include/bus_driver.h
+++ b/lib/eal/include/bus_driver.h
@@ -5,16 +5,16 @@
 #ifndef BUS_DRIVER_H
 #define BUS_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus.h>
 #include <rte_compat.h>
 #include <rte_dev.h>
 #include <rte_eal.h>
 #include <rte_tailq.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_devargs;
 struct rte_device;
 
diff --git a/lib/eal/include/dev_driver.h b/lib/eal/include/dev_driver.h
index 5efa8c437e..f7a9c17dc3 100644
--- a/lib/eal/include/dev_driver.h
+++ b/lib/eal/include/dev_driver.h
@@ -5,13 +5,13 @@
 #ifndef DEV_DRIVER_H
 #define DEV_DRIVER_H
 
+#include <rte_common.h>
+#include <rte_dev.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_dev.h>
-
 /**
  * A structure describing a device driver.
  */
diff --git a/lib/eal/include/eal_trace_internal.h b/lib/eal/include/eal_trace_internal.h
index 09c354717f..50f91d0929 100644
--- a/lib/eal/include/eal_trace_internal.h
+++ b/lib/eal/include/eal_trace_internal.h
@@ -11,16 +11,16 @@
  * API for EAL trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_alarm.h>
 #include <rte_interrupts.h>
 #include <rte_trace_point.h>
 
 #include "eal_interrupts.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Alarm */
 RTE_TRACE_POINT(
 	rte_eal_trace_alarm_set,
diff --git a/lib/eal/include/generic/rte_byteorder.h b/lib/eal/include/generic/rte_byteorder.h
index f1c04ba83e..7973d6326f 100644
--- a/lib/eal/include/generic/rte_byteorder.h
+++ b/lib/eal/include/generic/rte_byteorder.h
@@ -24,6 +24,10 @@
 #include <rte_common.h>
 #include <rte_config.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Compile-time endianness detection
  */
@@ -251,4 +255,8 @@ static uint64_t rte_be_to_cpu_64(rte_be64_t x);
 #endif
 #endif
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_BYTEORDER_H_ */
diff --git a/lib/eal/include/generic/rte_cycles.h b/lib/eal/include/generic/rte_cycles.h
index 075e899f5a..7cfd51f0eb 100644
--- a/lib/eal/include/generic/rte_cycles.h
+++ b/lib/eal/include/generic/rte_cycles.h
@@ -16,6 +16,10 @@
 #include <rte_debug.h>
 #include <rte_atomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define MS_PER_S 1000
 #define US_PER_S 1000000
 #define NS_PER_S 1000000000
@@ -175,4 +179,8 @@ void rte_delay_us_sleep(unsigned int us);
  */
 void rte_delay_us_callback_register(void(*userfunc)(unsigned int));
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_CYCLES_H_ */
diff --git a/lib/eal/include/generic/rte_memcpy.h b/lib/eal/include/generic/rte_memcpy.h
index e7f0f8eaa9..da53b72ca8 100644
--- a/lib/eal/include/generic/rte_memcpy.h
+++ b/lib/eal/include/generic/rte_memcpy.h
@@ -5,6 +5,10 @@
 #ifndef _RTE_MEMCPY_H_
 #define _RTE_MEMCPY_H_
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @file
  *
@@ -113,4 +117,8 @@ rte_memcpy(void *dst, const void *src, size_t n);
 
 #endif /* __DOXYGEN__ */
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_MEMCPY_H_ */
diff --git a/lib/eal/include/generic/rte_pause.h b/lib/eal/include/generic/rte_pause.h
index f2a1eadcbd..968c0886d3 100644
--- a/lib/eal/include/generic/rte_pause.h
+++ b/lib/eal/include/generic/rte_pause.h
@@ -19,6 +19,10 @@
 #include <rte_atomic.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Pause CPU execution for a short while
  *
@@ -136,4 +140,8 @@ rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
 } while (0)
 #endif /* ! RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED */
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_PAUSE_H_ */
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index ea899f1bfa..86c0559468 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -9,6 +9,10 @@
 
 #include <rte_spinlock.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @file
  * Advanced power management operations.
@@ -147,4 +151,8 @@ int rte_power_pause(const uint64_t tsc_timestamp);
 int rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
 		const uint32_t num, const uint64_t tsc_timestamp);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_POWER_INTRINSIC_H_ */
diff --git a/lib/eal/include/generic/rte_prefetch.h b/lib/eal/include/generic/rte_prefetch.h
index 773b3b8d1e..f7ac4ab48a 100644
--- a/lib/eal/include/generic/rte_prefetch.h
+++ b/lib/eal/include/generic/rte_prefetch.h
@@ -7,6 +7,10 @@
 
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @file
  *
@@ -146,4 +150,8 @@ __rte_experimental
 static inline void
 rte_cldemote(const volatile void *p);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
index 5f939be98c..ac0474466a 100644
--- a/lib/eal/include/generic/rte_rwlock.h
+++ b/lib/eal/include/generic/rte_rwlock.h
@@ -22,10 +22,6 @@
  *  https://locklessinc.com/articles/locks/
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <errno.h>
 
 #include <rte_branch_prediction.h>
@@ -34,6 +30,10 @@ extern "C" {
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_rwlock_t type.
  *
diff --git a/lib/eal/include/generic/rte_spinlock.h b/lib/eal/include/generic/rte_spinlock.h
index 23fb04896f..c2980601b2 100644
--- a/lib/eal/include/generic/rte_spinlock.h
+++ b/lib/eal/include/generic/rte_spinlock.h
@@ -25,6 +25,10 @@
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_spinlock_t type.
  */
@@ -318,4 +322,8 @@ __rte_warn_unused_result
 static inline int rte_spinlock_recursive_trylock_tm(
 	rte_spinlock_recursive_t *slr);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_SPINLOCK_H_ */
diff --git a/lib/eal/include/rte_alarm.h b/lib/eal/include/rte_alarm.h
index 7e4d0b2407..9b4721b77f 100644
--- a/lib/eal/include/rte_alarm.h
+++ b/lib/eal/include/rte_alarm.h
@@ -14,12 +14,12 @@
  * Does not require hpet support.
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /**
  * Signature of callback back function called when an alarm goes off.
  */
diff --git a/lib/eal/include/rte_bitmap.h b/lib/eal/include/rte_bitmap.h
index ebe46000a0..abb102f1d3 100644
--- a/lib/eal/include/rte_bitmap.h
+++ b/lib/eal/include/rte_bitmap.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_BITMAP_H__
 #define __INCLUDE_RTE_BITMAP_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Bitmap
@@ -43,6 +39,10 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_prefetch.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Slab */
 #define RTE_BITMAP_SLAB_BIT_SIZE                 64
 #define RTE_BITMAP_SLAB_BIT_SIZE_LOG2            6
diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h
index dfe756fb11..519f7b35f0 100644
--- a/lib/eal/include/rte_bus.h
+++ b/lib/eal/include/rte_bus.h
@@ -14,14 +14,14 @@
  * over the devices and drivers in EAL.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_eal.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_bus;
 struct rte_device;
 
diff --git a/lib/eal/include/rte_class.h b/lib/eal/include/rte_class.h
index 16e544ec9a..7631e36e82 100644
--- a/lib/eal/include/rte_class.h
+++ b/lib/eal/include/rte_class.h
@@ -18,12 +18,12 @@
  * cryptographic co-processor (crypto), etc.
  */
 
+#include <rte_dev.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_dev.h>
-
 /** Double linked list of classes */
 RTE_TAILQ_HEAD(rte_class_list, rte_class);
 
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index eec0400dad..2486caa471 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -12,10 +12,6 @@
  * for DPDK.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <assert.h>
 #include <limits.h>
 #include <stdint.h>
@@ -26,6 +22,10 @@ extern "C" {
 /* OS specific include */
 #include <rte_os.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_TOOLCHAIN_MSVC
 #ifndef typeof
 #define typeof __typeof__
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index cefa04f905..738400e8d1 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -13,16 +13,16 @@
  * This file manages the list of device drivers.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_config.h>
 #include <rte_common.h>
 #include <rte_log.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_bus;
 struct rte_devargs;
 struct rte_device;
diff --git a/lib/eal/include/rte_devargs.h b/lib/eal/include/rte_devargs.h
index 515e978bbe..ed5a4675d9 100644
--- a/lib/eal/include/rte_devargs.h
+++ b/lib/eal/include/rte_devargs.h
@@ -16,14 +16,14 @@
  * list of rte_devargs structures.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_dev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_bus;
 
 /**
diff --git a/lib/eal/include/rte_eal_trace.h b/lib/eal/include/rte_eal_trace.h
index c3d15bbe5e..9ad2112801 100644
--- a/lib/eal/include/rte_eal_trace.h
+++ b/lib/eal/include/rte_eal_trace.h
@@ -11,12 +11,12 @@
  * API for EAL trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 /* Generic */
 RTE_TRACE_POINT(
 	rte_eal_trace_generic_void,
diff --git a/lib/eal/include/rte_errno.h b/lib/eal/include/rte_errno.h
index ba45591d24..c49818a40e 100644
--- a/lib/eal/include/rte_errno.h
+++ b/lib/eal/include/rte_errno.h
@@ -11,12 +11,12 @@
 #ifndef _RTE_ERRNO_H_
 #define _RTE_ERRNO_H_
 
+#include <rte_per_lcore.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_per_lcore.h>
-
 RTE_DECLARE_PER_LCORE(int, _rte_errno); /**< Per core error number. */
 
 /**
diff --git a/lib/eal/include/rte_fbarray.h b/lib/eal/include/rte_fbarray.h
index e33076778f..27dbfc2d6c 100644
--- a/lib/eal/include/rte_fbarray.h
+++ b/lib/eal/include/rte_fbarray.h
@@ -30,14 +30,14 @@
  * another process is using ``rte_fbarray``.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_rwlock.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_FBARRAY_NAME_LEN 64
 
 struct rte_fbarray {
diff --git a/lib/eal/include/rte_keepalive.h b/lib/eal/include/rte_keepalive.h
index 3ec413da01..9ff870f6b4 100644
--- a/lib/eal/include/rte_keepalive.h
+++ b/lib/eal/include/rte_keepalive.h
@@ -10,13 +10,13 @@
 #ifndef _KEEPALIVE_H_
 #define _KEEPALIVE_H_
 
+#include <rte_config.h>
+#include <rte_memory.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_config.h>
-#include <rte_memory.h>
-
 #ifndef RTE_KEEPALIVE_MAXCORES
 /**
  * Number of cores to track.
diff --git a/lib/eal/include/rte_mcslock.h b/lib/eal/include/rte_mcslock.h
index 0aeb1a09f4..bb218d2e50 100644
--- a/lib/eal/include/rte_mcslock.h
+++ b/lib/eal/include/rte_mcslock.h
@@ -19,16 +19,16 @@
  * they acquired the lock.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_lcore.h>
 #include <rte_common.h>
 #include <rte_pause.h>
 #include <rte_branch_prediction.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_mcslock_t type.
  */
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index 842362d527..dbd0a6bedc 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -15,16 +15,16 @@
 #include <stddef.h>
 #include <stdio.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bitops.h>
 #include <rte_common.h>
 #include <rte_config.h>
 #include <rte_eal_memconfig.h>
 #include <rte_fbarray.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_PGSIZE_4K   (1ULL << 12)
 #define RTE_PGSIZE_64K  (1ULL << 16)
 #define RTE_PGSIZE_256K (1ULL << 18)
diff --git a/lib/eal/include/rte_pci_dev_features.h b/lib/eal/include/rte_pci_dev_features.h
index ee6e10590c..bc6d3d4c1f 100644
--- a/lib/eal/include/rte_pci_dev_features.h
+++ b/lib/eal/include/rte_pci_dev_features.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_PCI_DEV_FEATURES_H
 #define _RTE_PCI_DEV_FEATURES_H
 
+#include <rte_pci_dev_feature_defs.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_pci_dev_feature_defs.h>
-
 #define RTE_INTR_MODE_NONE_NAME "none"
 #define RTE_INTR_MODE_LEGACY_NAME "legacy"
 #define RTE_INTR_MODE_MSI_NAME "msi"
diff --git a/lib/eal/include/rte_pflock.h b/lib/eal/include/rte_pflock.h
index 37aa223ac3..6797ce5920 100644
--- a/lib/eal/include/rte_pflock.h
+++ b/lib/eal/include/rte_pflock.h
@@ -27,14 +27,14 @@
  * All locks must be initialised before use, and only initialised once.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_pflock_t type.
  */
diff --git a/lib/eal/include/rte_random.h b/lib/eal/include/rte_random.h
index 5031c6fe5f..15cbe6215a 100644
--- a/lib/eal/include/rte_random.h
+++ b/lib/eal/include/rte_random.h
@@ -11,12 +11,12 @@
  * Pseudo-random Generators in RTE
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /**
  * Seed the pseudo-random generator.
  *
diff --git a/lib/eal/include/rte_seqcount.h b/lib/eal/include/rte_seqcount.h
index 88a6746900..d71afa6ab7 100644
--- a/lib/eal/include/rte_seqcount.h
+++ b/lib/eal/include/rte_seqcount.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_SEQCOUNT_H_
 #define _RTE_SEQCOUNT_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Seqcount
@@ -27,6 +23,10 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The RTE seqcount type.
  */
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
index 2677bd9440..e0e94900d1 100644
--- a/lib/eal/include/rte_seqlock.h
+++ b/lib/eal/include/rte_seqlock.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_SEQLOCK_H_
 #define _RTE_SEQLOCK_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Seqlock
@@ -95,6 +91,10 @@ extern "C" {
 #include <rte_seqcount.h>
 #include <rte_spinlock.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The RTE seqlock type.
  */
diff --git a/lib/eal/include/rte_service.h b/lib/eal/include/rte_service.h
index e49a7a877e..94919ae584 100644
--- a/lib/eal/include/rte_service.h
+++ b/lib/eal/include/rte_service.h
@@ -23,16 +23,16 @@
  * application has access to the remaining lcores as normal.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include<stdio.h>
 #include <stdint.h>
 
 #include <rte_config.h>
 #include <rte_lcore.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_SERVICE_NAME_MAX 32
 
 /* Capabilities of a service.
diff --git a/lib/eal/include/rte_service_component.h b/lib/eal/include/rte_service_component.h
index a5350c97e5..acdf45cf60 100644
--- a/lib/eal/include/rte_service_component.h
+++ b/lib/eal/include/rte_service_component.h
@@ -10,12 +10,12 @@
  * operate, and you wish to run the component using service cores
  */
 
+#include <rte_service.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_service.h>
-
 /**
  * Signature of callback function to run a service.
  *
diff --git a/lib/eal/include/rte_stdatomic.h b/lib/eal/include/rte_stdatomic.h
index 7a081cb500..0f11a15e4e 100644
--- a/lib/eal/include/rte_stdatomic.h
+++ b/lib/eal/include/rte_stdatomic.h
@@ -7,10 +7,6 @@
 
 #include <assert.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_ENABLE_STDATOMIC
 #ifndef _MSC_VER
 #ifdef __STDC_NO_ATOMICS__
@@ -188,6 +184,7 @@ typedef int rte_memory_order;
 #endif
 
 #ifdef __cplusplus
+extern "C" {
 }
 #endif
 
diff --git a/lib/eal/include/rte_string_fns.h b/lib/eal/include/rte_string_fns.h
index 13badec7b3..702bd81251 100644
--- a/lib/eal/include/rte_string_fns.h
+++ b/lib/eal/include/rte_string_fns.h
@@ -11,10 +11,6 @@
 #ifndef _RTE_STRING_FNS_H_
 #define _RTE_STRING_FNS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <ctype.h>
 #include <stdio.h>
 #include <string.h>
@@ -22,6 +18,10 @@ extern "C" {
 #include <rte_common.h>
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Takes string "string" parameter and splits it at character "delim"
  * up to maxtokens-1 times - to give "maxtokens" resulting tokens. Like
@@ -77,6 +77,10 @@ rte_strlcat(char *dst, const char *src, size_t size)
 	return l + strlen(src);
 }
 
+#ifdef __cplusplus
+}
+#endif
+
 /* pull in a strlcpy function */
 #ifdef RTE_EXEC_ENV_FREEBSD
 #ifndef __BSD_VISIBLE /* non-standard functions are hidden */
@@ -95,6 +99,10 @@ rte_strlcat(char *dst, const char *src, size_t size)
 #endif /* RTE_USE_LIBBSD */
 #endif /* FREEBSD */
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Copy string src to buffer dst of size dsize.
  * At most dsize-1 chars will be copied.
@@ -141,7 +149,6 @@ rte_str_skip_leading_spaces(const char *src)
 	return p;
 }
 
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/include/rte_tailq.h b/lib/eal/include/rte_tailq.h
index 931d549e59..89f7ef2134 100644
--- a/lib/eal/include/rte_tailq.h
+++ b/lib/eal/include/rte_tailq.h
@@ -10,13 +10,13 @@
  *  Here defines rte_tailq APIs for only internal use
  */
 
+#include <stdio.h>
+#include <rte_debug.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <rte_debug.h>
-
 /** dummy structure type used by the rte_tailq APIs */
 struct rte_tailq_entry {
 	RTE_TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
diff --git a/lib/eal/include/rte_ticketlock.h b/lib/eal/include/rte_ticketlock.h
index 73884eb07b..e60f60699c 100644
--- a/lib/eal/include/rte_ticketlock.h
+++ b/lib/eal/include/rte_ticketlock.h
@@ -17,15 +17,15 @@
  * All locks must be initialised before use, and only initialised once.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_lcore.h>
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_ticketlock_t type.
  */
diff --git a/lib/eal/include/rte_time.h b/lib/eal/include/rte_time.h
index ec25f7b93d..c5c3a233e4 100644
--- a/lib/eal/include/rte_time.h
+++ b/lib/eal/include/rte_time.h
@@ -5,13 +5,13 @@
 #ifndef _RTE_TIME_H_
 #define _RTE_TIME_H_
 
+#include <stdint.h>
+#include <time.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include <time.h>
-
 #define NSEC_PER_SEC             1000000000L
 
 /**
diff --git a/lib/eal/include/rte_trace.h b/lib/eal/include/rte_trace.h
index a6e991fad3..1c824b2158 100644
--- a/lib/eal/include/rte_trace.h
+++ b/lib/eal/include/rte_trace.h
@@ -16,16 +16,16 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdio.h>
 
 #include <rte_common.h>
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  *  Test if trace is enabled.
  *
diff --git a/lib/eal/include/rte_trace_point.h b/lib/eal/include/rte_trace_point.h
index 41e2a7f99e..bc737d585e 100644
--- a/lib/eal/include/rte_trace_point.h
+++ b/lib/eal/include/rte_trace_point.h
@@ -16,10 +16,6 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdio.h>
 
@@ -32,6 +28,10 @@ extern "C" {
 #include <rte_string_fns.h>
 #include <rte_uuid.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** The tracepoint object. */
 typedef RTE_ATOMIC(uint64_t) rte_trace_point_t;
 
diff --git a/lib/eal/include/rte_trace_point_register.h b/lib/eal/include/rte_trace_point_register.h
index 41260e5964..8726338fe4 100644
--- a/lib/eal/include/rte_trace_point_register.h
+++ b/lib/eal/include/rte_trace_point_register.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_TRACE_POINT_REGISTER_H_
 #define _RTE_TRACE_POINT_REGISTER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef _RTE_TRACE_POINT_H_
 #error for registration, include this file first before <rte_trace_point.h>
 #endif
@@ -16,6 +12,10 @@ extern "C" {
 #include <rte_per_lcore.h>
 #include <rte_trace_point.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_DECLARE_PER_LCORE(volatile int, trace_point_sz);
 
 #define RTE_TRACE_POINT_REGISTER(trace, name) \
diff --git a/lib/eal/include/rte_uuid.h b/lib/eal/include/rte_uuid.h
index cfefd4308a..def5907a00 100644
--- a/lib/eal/include/rte_uuid.h
+++ b/lib/eal/include/rte_uuid.h
@@ -10,14 +10,14 @@
 #ifndef _RTE_UUID_H_
 #define _RTE_UUID_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stddef.h>
 #include <string.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Struct describing a Universal Unique Identifier
  */
diff --git a/lib/eal/include/rte_version.h b/lib/eal/include/rte_version.h
index 422d00fdff..be3f753617 100644
--- a/lib/eal/include/rte_version.h
+++ b/lib/eal/include/rte_version.h
@@ -10,13 +10,13 @@
 #ifndef _RTE_VERSION_H_
 #define _RTE_VERSION_H_
 
+#include <string.h>
+#include <stdio.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <string.h>
-#include <stdio.h>
-
 /**
  * Macro to compute a version number usable for comparisons
  */
diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index b774625d9f..06b249dca0 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -10,10 +10,6 @@
  * RTE VFIO. This library provides various VFIO related utility functions.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdint.h>
 
@@ -36,6 +32,10 @@ extern "C" {
 
 #include <linux/vfio.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
diff --git a/lib/eal/linux/include/rte_os.h b/lib/eal/linux/include/rte_os.h
index c72bf5b7e6..dba0e29827 100644
--- a/lib/eal/linux/include/rte_os.h
+++ b/lib/eal/linux/include/rte_os.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_OS_H_
 #define _RTE_OS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * This header should contain any definition
  * which is not supported natively or named differently in Linux.
@@ -17,6 +13,10 @@ extern "C" {
 #include <sched.h>
 #include <sys/queue.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* These macros are compatible with system's sys/queue.h. */
 #define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
 #define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
diff --git a/lib/eal/loongarch/include/rte_atomic.h b/lib/eal/loongarch/include/rte_atomic.h
index 0510b8f781..c8066a4612 100644
--- a/lib/eal/loongarch/include/rte_atomic.h
+++ b/lib/eal/loongarch/include/rte_atomic.h
@@ -9,13 +9,13 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include <rte_common.h>
+#include "generic/rte_atomic.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_atomic.h"
-
 #define rte_mb()	do { asm volatile("dbar 0":::"memory"); } while (0)
 
 #define rte_wmb()	rte_mb()
diff --git a/lib/eal/loongarch/include/rte_byteorder.h b/lib/eal/loongarch/include/rte_byteorder.h
index 0da6097a4f..9b092e2a59 100644
--- a/lib/eal/loongarch/include/rte_byteorder.h
+++ b/lib/eal/loongarch/include/rte_byteorder.h
@@ -5,12 +5,12 @@
 #ifndef RTE_BYTEORDER_LOONGARCH_H
 #define RTE_BYTEORDER_LOONGARCH_H
 
+#include "generic/rte_byteorder.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_byteorder.h"
-
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 
 #define rte_cpu_to_le_16(x) (x)
diff --git a/lib/eal/loongarch/include/rte_cpuflags.h b/lib/eal/loongarch/include/rte_cpuflags.h
index 6b592c147c..c1e04ac545 100644
--- a/lib/eal/loongarch/include/rte_cpuflags.h
+++ b/lib/eal/loongarch/include/rte_cpuflags.h
@@ -5,10 +5,6 @@
 #ifndef RTE_CPUFLAGS_LOONGARCH_H
 #define RTE_CPUFLAGS_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -30,6 +26,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/loongarch/include/rte_cycles.h b/lib/eal/loongarch/include/rte_cycles.h
index f612d1ad10..128c8646e9 100644
--- a/lib/eal/loongarch/include/rte_cycles.h
+++ b/lib/eal/loongarch/include/rte_cycles.h
@@ -5,12 +5,12 @@
 #ifndef RTE_CYCLES_LOONGARCH_H
 #define RTE_CYCLES_LOONGARCH_H
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 /**
  * Read the time base register.
  *
diff --git a/lib/eal/loongarch/include/rte_io.h b/lib/eal/loongarch/include/rte_io.h
index 40e40efa86..e32a4737b2 100644
--- a/lib/eal/loongarch/include/rte_io.h
+++ b/lib/eal/loongarch/include/rte_io.h
@@ -5,12 +5,12 @@
 #ifndef RTE_IO_LOONGARCH_H
 #define RTE_IO_LOONGARCH_H
 
+#include "generic/rte_io.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_io.h"
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/loongarch/include/rte_memcpy.h b/lib/eal/loongarch/include/rte_memcpy.h
index 22578d40f4..5412a0fdc1 100644
--- a/lib/eal/loongarch/include/rte_memcpy.h
+++ b/lib/eal/loongarch/include/rte_memcpy.h
@@ -10,12 +10,12 @@
 
 #include "rte_common.h"
 
+#include "generic/rte_memcpy.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_memcpy.h"
-
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
diff --git a/lib/eal/loongarch/include/rte_pause.h b/lib/eal/loongarch/include/rte_pause.h
index 4302e1b9be..cffa2874d6 100644
--- a/lib/eal/loongarch/include/rte_pause.h
+++ b/lib/eal/loongarch/include/rte_pause.h
@@ -5,14 +5,14 @@
 #ifndef RTE_PAUSE_LOONGARCH_H
 #define RTE_PAUSE_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_atomic.h"
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 }
diff --git a/lib/eal/loongarch/include/rte_power_intrinsics.h b/lib/eal/loongarch/include/rte_power_intrinsics.h
index d5dbd94567..9e11478206 100644
--- a/lib/eal/loongarch/include/rte_power_intrinsics.h
+++ b/lib/eal/loongarch/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef RTE_POWER_INTRINSIC_LOONGARCH_H
 #define RTE_POWER_INTRINSIC_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/loongarch/include/rte_prefetch.h b/lib/eal/loongarch/include/rte_prefetch.h
index 64b1fd2c2a..8da08a5566 100644
--- a/lib/eal/loongarch/include/rte_prefetch.h
+++ b/lib/eal/loongarch/include/rte_prefetch.h
@@ -5,14 +5,14 @@
 #ifndef RTE_PREFETCH_LOONGARCH_H
 #define RTE_PREFETCH_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	__builtin_prefetch((const void *)(uintptr_t)p, 0, 3);
diff --git a/lib/eal/loongarch/include/rte_rwlock.h b/lib/eal/loongarch/include/rte_rwlock.h
index aedc6f3349..48924599c5 100644
--- a/lib/eal/loongarch/include/rte_rwlock.h
+++ b/lib/eal/loongarch/include/rte_rwlock.h
@@ -5,12 +5,12 @@
 #ifndef RTE_RWLOCK_LOONGARCH_H
 #define RTE_RWLOCK_LOONGARCH_H
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/loongarch/include/rte_spinlock.h b/lib/eal/loongarch/include/rte_spinlock.h
index e8d34e9728..38f00f631d 100644
--- a/lib/eal/loongarch/include/rte_spinlock.h
+++ b/lib/eal/loongarch/include/rte_spinlock.h
@@ -5,13 +5,13 @@
 #ifndef RTE_SPINLOCK_LOONGARCH_H
 #define RTE_SPINLOCK_LOONGARCH_H
 
+#include <rte_common.h>
+#include "generic/rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_spinlock.h"
-
 #ifndef RTE_FORCE_INTRINSICS
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
diff --git a/lib/eal/ppc/include/rte_atomic.h b/lib/eal/ppc/include/rte_atomic.h
index 645c7132df..6ce2e5188a 100644
--- a/lib/eal/ppc/include/rte_atomic.h
+++ b/lib/eal/ppc/include/rte_atomic.h
@@ -12,13 +12,13 @@
 #ifndef _RTE_ATOMIC_PPC_64_H_
 #define _RTE_ATOMIC_PPC_64_H_
 
+#include <stdint.h>
+#include "generic/rte_atomic.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include "generic/rte_atomic.h"
-
 #define	rte_mb()  asm volatile("sync" : : : "memory")
 
 #define	rte_wmb() asm volatile("sync" : : : "memory")
diff --git a/lib/eal/ppc/include/rte_byteorder.h b/lib/eal/ppc/include/rte_byteorder.h
index de94e2ad32..1d19e96f72 100644
--- a/lib/eal/ppc/include/rte_byteorder.h
+++ b/lib/eal/ppc/include/rte_byteorder.h
@@ -8,13 +8,13 @@
 #ifndef _RTE_BYTEORDER_PPC_64_H_
 #define _RTE_BYTEORDER_PPC_64_H_
 
+#include <stdint.h>
+#include "generic/rte_byteorder.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include "generic/rte_byteorder.h"
-
 /*
  * An architecture-optimized byte swap for a 16-bit value.
  *
diff --git a/lib/eal/ppc/include/rte_cpuflags.h b/lib/eal/ppc/include/rte_cpuflags.h
index dedc1ab469..b7bb8f6872 100644
--- a/lib/eal/ppc/include/rte_cpuflags.h
+++ b/lib/eal/ppc/include/rte_cpuflags.h
@@ -6,10 +6,6 @@
 #ifndef _RTE_CPUFLAGS_PPC_64_H_
 #define _RTE_CPUFLAGS_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -52,6 +48,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/ppc/include/rte_cycles.h b/lib/eal/ppc/include/rte_cycles.h
index 666fc9b0bf..1e6e6cccc8 100644
--- a/lib/eal/ppc/include/rte_cycles.h
+++ b/lib/eal/ppc/include/rte_cycles.h
@@ -6,10 +6,6 @@
 #ifndef _RTE_CYCLES_PPC_64_H_
 #define _RTE_CYCLES_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <features.h>
 #ifdef __GLIBC__
 #include <sys/platform/ppc.h>
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_byteorder.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Read the time base register.
  *
diff --git a/lib/eal/ppc/include/rte_io.h b/lib/eal/ppc/include/rte_io.h
index 01455065e5..c9371b784e 100644
--- a/lib/eal/ppc/include/rte_io.h
+++ b/lib/eal/ppc/include/rte_io.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_IO_PPC_64_H_
 #define _RTE_IO_PPC_64_H_
 
+#include "generic/rte_io.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_io.h"
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/ppc/include/rte_memcpy.h b/lib/eal/ppc/include/rte_memcpy.h
index 6f388c0234..eae73128c4 100644
--- a/lib/eal/ppc/include/rte_memcpy.h
+++ b/lib/eal/ppc/include/rte_memcpy.h
@@ -12,12 +12,12 @@
 #include "rte_altivec.h"
 #include "rte_common.h"
 
+#include "generic/rte_memcpy.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_memcpy.h"
-
 #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 90000)
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Warray-bounds"
diff --git a/lib/eal/ppc/include/rte_pause.h b/lib/eal/ppc/include/rte_pause.h
index 16e47ce22f..78a73aceed 100644
--- a/lib/eal/ppc/include/rte_pause.h
+++ b/lib/eal/ppc/include/rte_pause.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PAUSE_PPC64_H_
 #define _RTE_PAUSE_PPC64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_atomic.h"
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 	/* Set hardware multi-threading low priority */
diff --git a/lib/eal/ppc/include/rte_power_intrinsics.h b/lib/eal/ppc/include/rte_power_intrinsics.h
index c0e9ac279f..6207eeb04d 100644
--- a/lib/eal/ppc/include/rte_power_intrinsics.h
+++ b/lib/eal/ppc/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_POWER_INTRINSIC_PPC_H_
 #define _RTE_POWER_INTRINSIC_PPC_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/ppc/include/rte_prefetch.h b/lib/eal/ppc/include/rte_prefetch.h
index 2e1b5751e0..bae95af7bf 100644
--- a/lib/eal/ppc/include/rte_prefetch.h
+++ b/lib/eal/ppc/include/rte_prefetch.h
@@ -6,14 +6,14 @@
 #ifndef _RTE_PREFETCH_PPC_64_H_
 #define _RTE_PREFETCH_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	asm volatile ("dcbt 0,%[p],0" : : [p] "r" (p));
diff --git a/lib/eal/ppc/include/rte_rwlock.h b/lib/eal/ppc/include/rte_rwlock.h
index 9fadc04076..bee8da4070 100644
--- a/lib/eal/ppc/include/rte_rwlock.h
+++ b/lib/eal/ppc/include/rte_rwlock.h
@@ -3,12 +3,12 @@
 #ifndef _RTE_RWLOCK_PPC_64_H_
 #define _RTE_RWLOCK_PPC_64_H_
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/ppc/include/rte_spinlock.h b/lib/eal/ppc/include/rte_spinlock.h
index 3a4c905b22..77f90f974a 100644
--- a/lib/eal/ppc/include/rte_spinlock.h
+++ b/lib/eal/ppc/include/rte_spinlock.h
@@ -6,14 +6,14 @@
 #ifndef _RTE_SPINLOCK_PPC_64_H_
 #define _RTE_SPINLOCK_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_pause.h>
 #include "generic/rte_spinlock.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Fixme: Use intrinsics to implement the spinlock on Power architecture */
 
 #ifndef RTE_FORCE_INTRINSICS
diff --git a/lib/eal/riscv/include/rte_atomic.h b/lib/eal/riscv/include/rte_atomic.h
index 2603bc90ea..66346ad474 100644
--- a/lib/eal/riscv/include/rte_atomic.h
+++ b/lib/eal/riscv/include/rte_atomic.h
@@ -12,15 +12,15 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include <rte_config.h>
 #include "generic/rte_atomic.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define rte_mb()	asm volatile("fence rw, rw" : : : "memory")
 
 #define rte_wmb()	asm volatile("fence w, w" : : : "memory")
diff --git a/lib/eal/riscv/include/rte_byteorder.h b/lib/eal/riscv/include/rte_byteorder.h
index 25bd0c275d..c9ff5c0dd1 100644
--- a/lib/eal/riscv/include/rte_byteorder.h
+++ b/lib/eal/riscv/include/rte_byteorder.h
@@ -8,14 +8,14 @@
 #ifndef RTE_BYTEORDER_RISCV_H
 #define RTE_BYTEORDER_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include "generic/rte_byteorder.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_BYTE_ORDER
 #define RTE_BYTE_ORDER RTE_LITTLE_ENDIAN
 #endif
diff --git a/lib/eal/riscv/include/rte_cpuflags.h b/lib/eal/riscv/include/rte_cpuflags.h
index d742efc40f..ac2004f02d 100644
--- a/lib/eal/riscv/include/rte_cpuflags.h
+++ b/lib/eal/riscv/include/rte_cpuflags.h
@@ -8,10 +8,6 @@
 #ifndef RTE_CPUFLAGS_RISCV_H
 #define RTE_CPUFLAGS_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -46,6 +42,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/riscv/include/rte_cycles.h b/lib/eal/riscv/include/rte_cycles.h
index 04750ca253..7926809a73 100644
--- a/lib/eal/riscv/include/rte_cycles.h
+++ b/lib/eal/riscv/include/rte_cycles.h
@@ -8,12 +8,12 @@
 #ifndef RTE_CYCLES_RISCV_H
 #define RTE_CYCLES_RISCV_H
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 #ifndef RTE_RISCV_RDTSC_USE_HPM
 #define RTE_RISCV_RDTSC_USE_HPM 0
 #endif
diff --git a/lib/eal/riscv/include/rte_io.h b/lib/eal/riscv/include/rte_io.h
index 29659c9590..911dbb6bd2 100644
--- a/lib/eal/riscv/include/rte_io.h
+++ b/lib/eal/riscv/include/rte_io.h
@@ -8,12 +8,12 @@
 #ifndef RTE_IO_RISCV_H
 #define RTE_IO_RISCV_H
 
+#include "generic/rte_io.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_io.h"
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/riscv/include/rte_memcpy.h b/lib/eal/riscv/include/rte_memcpy.h
index e34f19396e..d8a942c5d2 100644
--- a/lib/eal/riscv/include/rte_memcpy.h
+++ b/lib/eal/riscv/include/rte_memcpy.h
@@ -12,12 +12,12 @@
 
 #include "rte_common.h"
 
+#include "generic/rte_memcpy.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_memcpy.h"
-
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
diff --git a/lib/eal/riscv/include/rte_pause.h b/lib/eal/riscv/include/rte_pause.h
index cb8e9ca52d..3f473cd8db 100644
--- a/lib/eal/riscv/include/rte_pause.h
+++ b/lib/eal/riscv/include/rte_pause.h
@@ -7,14 +7,14 @@
 #ifndef RTE_PAUSE_RISCV_H
 #define RTE_PAUSE_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_atomic.h"
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 	/* Insert pause hint directly to be compatible with old compilers.
diff --git a/lib/eal/riscv/include/rte_power_intrinsics.h b/lib/eal/riscv/include/rte_power_intrinsics.h
index 636e58e71f..3f7dba1640 100644
--- a/lib/eal/riscv/include/rte_power_intrinsics.h
+++ b/lib/eal/riscv/include/rte_power_intrinsics.h
@@ -7,14 +7,14 @@
 #ifndef RTE_POWER_INTRINSIC_RISCV_H
 #define RTE_POWER_INTRINSIC_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/riscv/include/rte_prefetch.h b/lib/eal/riscv/include/rte_prefetch.h
index 748cf1b626..42146491ea 100644
--- a/lib/eal/riscv/include/rte_prefetch.h
+++ b/lib/eal/riscv/include/rte_prefetch.h
@@ -8,14 +8,14 @@
 #ifndef RTE_PREFETCH_RISCV_H
 #define RTE_PREFETCH_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	RTE_SET_USED(p);
diff --git a/lib/eal/riscv/include/rte_rwlock.h b/lib/eal/riscv/include/rte_rwlock.h
index 9cdaf1b0ef..730970eecb 100644
--- a/lib/eal/riscv/include/rte_rwlock.h
+++ b/lib/eal/riscv/include/rte_rwlock.h
@@ -7,12 +7,12 @@
 #ifndef RTE_RWLOCK_RISCV_H
 #define RTE_RWLOCK_RISCV_H
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/riscv/include/rte_spinlock.h b/lib/eal/riscv/include/rte_spinlock.h
index 6af430735c..5fe4980e44 100644
--- a/lib/eal/riscv/include/rte_spinlock.h
+++ b/lib/eal/riscv/include/rte_spinlock.h
@@ -12,13 +12,13 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include <rte_common.h>
+#include "generic/rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_spinlock.h"
-
 static inline int rte_tm_supported(void)
 {
 	return 0;
diff --git a/lib/eal/windows/include/pthread.h b/lib/eal/windows/include/pthread.h
index 051b9311c2..e1c31017d1 100644
--- a/lib/eal/windows/include/pthread.h
+++ b/lib/eal/windows/include/pthread.h
@@ -13,13 +13,13 @@
  * eal_common_thread.c and common\include\rte_per_lcore.h as Microsoft libc
  * does not contain pthread.h. This may be removed in future releases.
  */
+#include <rte_common.h>
+#include <rte_windows.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_windows.h>
-
 #define PTHREAD_BARRIER_SERIAL_THREAD TRUE
 
 /* defining pthread_t type on Windows since there is no in Microsoft libc*/
diff --git a/lib/eal/windows/include/regex.h b/lib/eal/windows/include/regex.h
index 827f938414..a224c0cd29 100644
--- a/lib/eal/windows/include/regex.h
+++ b/lib/eal/windows/include/regex.h
@@ -10,15 +10,15 @@
  * as Microsoft libc does not contain regex.h. This may be removed in
  * future releases.
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #define REG_NOMATCH 1
 #define REG_ESPACE 12
 
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* defining regex_t for Windows */
 typedef void *regex_t;
 /* defining regmatch_t for Windows */
diff --git a/lib/eal/windows/include/rte_windows.h b/lib/eal/windows/include/rte_windows.h
index 567ed7d820..e78f007ffa 100644
--- a/lib/eal/windows/include/rte_windows.h
+++ b/lib/eal/windows/include/rte_windows.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_WINDOWS_H_
 #define _RTE_WINDOWS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file Windows-specific facilities
  *
@@ -44,6 +40,10 @@ extern "C" {
 #include <devguid.h>
 #include <rte_log.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Log GetLastError() with context, usually a Win32 API function and arguments.
  */
diff --git a/lib/eal/x86/include/rte_atomic.h b/lib/eal/x86/include/rte_atomic.h
index 74b1b24b7a..ad571ad132 100644
--- a/lib/eal/x86/include/rte_atomic.h
+++ b/lib/eal/x86/include/rte_atomic.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_ATOMIC_X86_H_
 #define _RTE_ATOMIC_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include <rte_config.h>
@@ -279,6 +275,10 @@ static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
 #include "rte_atomic_32.h"
 #else
 #include "rte_atomic_64.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 #endif
 
 #endif
diff --git a/lib/eal/x86/include/rte_byteorder.h b/lib/eal/x86/include/rte_byteorder.h
index adbec0c157..5a49ffcd50 100644
--- a/lib/eal/x86/include/rte_byteorder.h
+++ b/lib/eal/x86/include/rte_byteorder.h
@@ -5,15 +5,15 @@
 #ifndef _RTE_BYTEORDER_X86_H_
 #define _RTE_BYTEORDER_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include <rte_config.h>
 #include "generic/rte_byteorder.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_BYTE_ORDER
 #define RTE_BYTE_ORDER RTE_LITTLE_ENDIAN
 #endif
@@ -48,6 +48,10 @@ static inline uint32_t rte_arch_bswap32(uint32_t _x)
 	return x;
 }
 
+#ifdef __cplusplus
+}
+#endif
+
 #define rte_bswap16(x) ((uint16_t)(__builtin_constant_p(x) ?		\
 				   rte_constant_bswap16(x) :		\
 				   rte_arch_bswap16(x)))
@@ -83,8 +87,4 @@ static inline uint32_t rte_arch_bswap32(uint32_t _x)
 #define rte_be_to_cpu_32(x) rte_bswap32(x)
 #define rte_be_to_cpu_64(x) rte_bswap64(x)
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* _RTE_BYTEORDER_X86_H_ */
diff --git a/lib/eal/x86/include/rte_cpuflags.h b/lib/eal/x86/include/rte_cpuflags.h
index 1ee00e70fe..e843d1e5f4 100644
--- a/lib/eal/x86/include/rte_cpuflags.h
+++ b/lib/eal/x86/include/rte_cpuflags.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_CPUFLAGS_X86_64_H_
 #define _RTE_CPUFLAGS_X86_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 enum rte_cpu_flag_t {
 	/* (EAX 01h) ECX features*/
 	RTE_CPUFLAG_SSE3 = 0,               /**< SSE3 */
@@ -138,6 +134,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/x86/include/rte_cycles.h b/lib/eal/x86/include/rte_cycles.h
index 2afe85e28c..8de43840da 100644
--- a/lib/eal/x86/include/rte_cycles.h
+++ b/lib/eal/x86/include/rte_cycles.h
@@ -12,10 +12,6 @@
 #include <x86intrin.h>
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_cycles.h"
 
 #ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
@@ -26,6 +22,10 @@ extern int rte_cycles_vmware_tsc_map;
 #include <rte_common.h>
 #include <rte_config.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline uint64_t
 rte_rdtsc(void)
 {
diff --git a/lib/eal/x86/include/rte_io.h b/lib/eal/x86/include/rte_io.h
index 0e1fefdee1..c11cb8cd89 100644
--- a/lib/eal/x86/include/rte_io.h
+++ b/lib/eal/x86/include/rte_io.h
@@ -5,16 +5,16 @@
 #ifndef _RTE_IO_X86_H_
 #define _RTE_IO_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include "rte_cpuflags.h"
 
 #define RTE_NATIVE_WRITE32_WC
 #include "generic/rte_io.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * MOVDIRI wrapper.
diff --git a/lib/eal/x86/include/rte_pause.h b/lib/eal/x86/include/rte_pause.h
index b4cf1df1d0..54f028b295 100644
--- a/lib/eal/x86/include/rte_pause.h
+++ b/lib/eal/x86/include/rte_pause.h
@@ -5,13 +5,14 @@
 #ifndef _RTE_PAUSE_X86_H_
 #define _RTE_PAUSE_X86_H_
 
+#include "generic/rte_pause.h"
+
+#include <emmintrin.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_pause.h"
-
-#include <emmintrin.h>
 static inline void rte_pause(void)
 {
 	_mm_pause();
diff --git a/lib/eal/x86/include/rte_power_intrinsics.h b/lib/eal/x86/include/rte_power_intrinsics.h
index e4c2b87f73..fcb780fc5b 100644
--- a/lib/eal/x86/include/rte_power_intrinsics.h
+++ b/lib/eal/x86/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_POWER_INTRINSIC_X86_H_
 #define _RTE_POWER_INTRINSIC_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/x86/include/rte_prefetch.h b/lib/eal/x86/include/rte_prefetch.h
index 8a9377714f..34a609cc65 100644
--- a/lib/eal/x86/include/rte_prefetch.h
+++ b/lib/eal/x86/include/rte_prefetch.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_PREFETCH_X86_64_H_
 #define _RTE_PREFETCH_X86_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_TOOLCHAIN_MSVC
 #include <emmintrin.h>
 #endif
@@ -17,6 +13,10 @@ extern "C" {
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 #ifdef RTE_TOOLCHAIN_MSVC
diff --git a/lib/eal/x86/include/rte_rwlock.h b/lib/eal/x86/include/rte_rwlock.h
index 1796b69265..281eff33b9 100644
--- a/lib/eal/x86/include/rte_rwlock.h
+++ b/lib/eal/x86/include/rte_rwlock.h
@@ -5,13 +5,13 @@
 #ifndef _RTE_RWLOCK_X86_64_H_
 #define _RTE_RWLOCK_X86_64_H_
 
+#include "generic/rte_rwlock.h"
+#include "rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-#include "rte_spinlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 	__rte_no_thread_safety_analysis
diff --git a/lib/eal/x86/include/rte_spinlock.h b/lib/eal/x86/include/rte_spinlock.h
index a6c23ea1f6..5632dec73e 100644
--- a/lib/eal/x86/include/rte_spinlock.h
+++ b/lib/eal/x86/include/rte_spinlock.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_SPINLOCK_X86_64_H_
 #define _RTE_SPINLOCK_X86_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_spinlock.h"
 #include "rte_rtm.h"
 #include "rte_cpuflags.h"
@@ -17,6 +13,10 @@ extern "C" {
 #include "rte_pause.h"
 #include "rte_cycles.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_RTM_MAX_RETRIES (20)
 #define RTE_XABORT_LOCK_BUSY (0xff)
 
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 883e59a927..ae00ead865 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_ETHDEV_DRIVER_H_
 #define _RTE_ETHDEV_DRIVER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -24,6 +20,10 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_ethdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Structure used to hold information about the callbacks to be called for a
diff --git a/lib/ethdev/ethdev_pci.h b/lib/ethdev/ethdev_pci.h
index ec4f731270..2229ffa252 100644
--- a/lib/ethdev/ethdev_pci.h
+++ b/lib/ethdev/ethdev_pci.h
@@ -6,16 +6,16 @@
 #ifndef _RTE_ETHDEV_PCI_H_
 #define _RTE_ETHDEV_PCI_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_malloc.h>
 #include <rte_pci.h>
 #include <bus_pci_driver.h>
 #include <rte_config.h>
 #include <ethdev_driver.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Copy pci device info to the Ethernet device data.
  * Shared memory (eth_dev->data) only updated by primary process, so it is safe
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 3bec87bfdb..36a38f718a 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -11,10 +11,6 @@
  * API for ethdev trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <dev_driver.h>
 #include <rte_trace_point.h>
 
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_mtr.h"
 #include "rte_tm.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_ethdev_trace_configure,
 	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t nb_rx_q,
diff --git a/lib/ethdev/ethdev_vdev.h b/lib/ethdev/ethdev_vdev.h
index 364f140f91..010ec75a00 100644
--- a/lib/ethdev/ethdev_vdev.h
+++ b/lib/ethdev/ethdev_vdev.h
@@ -6,15 +6,15 @@
 #ifndef _RTE_ETHDEV_VDEV_H_
 #define _RTE_ETHDEV_VDEV_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_config.h>
 #include <rte_malloc.h>
 #include <bus_vdev_driver.h>
 #include <ethdev_driver.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Allocates a new ethdev slot for an Ethernet device and returns the pointer
diff --git a/lib/ethdev/rte_cman.h b/lib/ethdev/rte_cman.h
index 297db8e095..dedd6cb71a 100644
--- a/lib/ethdev/rte_cman.h
+++ b/lib/ethdev/rte_cman.h
@@ -5,12 +5,12 @@
 #ifndef RTE_CMAN_H
 #define RTE_CMAN_H
 
+#include <rte_bitops.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_bitops.h>
-
 /**
  * @file
  * Congestion management related parameters for DPDK.
diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
index 67cf0ae526..4fde2ad408 100644
--- a/lib/ethdev/rte_dev_info.h
+++ b/lib/ethdev/rte_dev_info.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_DEV_INFO_H_
 #define _RTE_DEV_INFO_H_
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /*
  * Placeholder for accessing device registers
  */
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 548fada1c7..a75e26bf07 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -145,10 +145,6 @@
  * a 0 value by the receive function of the driver for a given number of tries.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 /* Use this macro to check if LRO API is supported */
@@ -5966,6 +5962,10 @@ int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config
 
 #include <rte_ethdev_core.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Helper routine for rte_eth_rx_burst().
diff --git a/lib/ethdev/rte_ethdev_trace_fp.h b/lib/ethdev/rte_ethdev_trace_fp.h
index 40b6e4756b..c11b4f18f7 100644
--- a/lib/ethdev/rte_ethdev_trace_fp.h
+++ b/lib/ethdev/rte_ethdev_trace_fp.h
@@ -11,12 +11,12 @@
  * API for ethdev trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_ethdev_trace_rx_burst,
 	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
diff --git a/lib/eventdev/event_timer_adapter_pmd.h b/lib/eventdev/event_timer_adapter_pmd.h
index cd5127f047..fffcd90c8f 100644
--- a/lib/eventdev/event_timer_adapter_pmd.h
+++ b/lib/eventdev/event_timer_adapter_pmd.h
@@ -16,12 +16,12 @@
  * versioning.
  */
 
+#include "rte_event_timer_adapter.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "rte_event_timer_adapter.h"
-
 /*
  * Definitions of functions exported by an event timer adapter implementation
  * through *rte_event_timer_adapter_ops* structure supplied in the
diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
index 7a5699f14b..fd5f7a14f4 100644
--- a/lib/eventdev/eventdev_pmd.h
+++ b/lib/eventdev/eventdev_pmd.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_EVENTDEV_PMD_H_
 #define _RTE_EVENTDEV_PMD_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Event PMD APIs
  *
@@ -31,6 +27,10 @@ extern "C" {
 #include "event_timer_adapter_pmd.h"
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 extern int rte_event_logtype;
 #define RTE_LOGTYPE_EVENTDEV rte_event_logtype
 
diff --git a/lib/eventdev/eventdev_pmd_pci.h b/lib/eventdev/eventdev_pmd_pci.h
index 26aa3a6635..5cb5916a84 100644
--- a/lib/eventdev/eventdev_pmd_pci.h
+++ b/lib/eventdev/eventdev_pmd_pci.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_EVENTDEV_PMD_PCI_H_
 #define _RTE_EVENTDEV_PMD_PCI_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Eventdev PCI PMD APIs
  *
@@ -28,6 +24,10 @@ extern "C" {
 
 #include "eventdev_pmd.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 typedef int (*eventdev_pmd_pci_callback_t)(struct rte_eventdev *dev);
 
 /**
diff --git a/lib/eventdev/eventdev_pmd_vdev.h b/lib/eventdev/eventdev_pmd_vdev.h
index bb433ba955..4eaefa0b0b 100644
--- a/lib/eventdev/eventdev_pmd_vdev.h
+++ b/lib/eventdev/eventdev_pmd_vdev.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_EVENTDEV_PMD_VDEV_H_
 #define _RTE_EVENTDEV_PMD_VDEV_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Eventdev VDEV PMD APIs
  *
@@ -27,6 +23,10 @@ extern "C" {
 
 #include "eventdev_pmd.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Creates a new virtual event device and returns the pointer to that device.
diff --git a/lib/eventdev/eventdev_trace.h b/lib/eventdev/eventdev_trace.h
index 9c2b261c06..8ff8841729 100644
--- a/lib/eventdev/eventdev_trace.h
+++ b/lib/eventdev/eventdev_trace.h
@@ -11,10 +11,6 @@
  * API for ethdev trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_trace_point.h>
 
 #include "rte_eventdev.h"
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_event_eth_rx_adapter.h"
 #include "rte_event_timer_adapter.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_eventdev_trace_configure,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id,
diff --git a/lib/eventdev/rte_event_crypto_adapter.h b/lib/eventdev/rte_event_crypto_adapter.h
index e07f159b77..c9b277c664 100644
--- a/lib/eventdev/rte_event_crypto_adapter.h
+++ b/lib/eventdev/rte_event_crypto_adapter.h
@@ -167,14 +167,14 @@
  * from the start of the rte_crypto_op including initialization vector (IV).
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Crypto event adapter mode
  */
diff --git a/lib/eventdev/rte_event_eth_rx_adapter.h b/lib/eventdev/rte_event_eth_rx_adapter.h
index cf42c69b0d..9237e198a7 100644
--- a/lib/eventdev/rte_event_eth_rx_adapter.h
+++ b/lib/eventdev/rte_event_eth_rx_adapter.h
@@ -87,10 +87,6 @@
  * event based so the callback can also modify the event data if it needs to.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -98,6 +94,10 @@ extern "C" {
 
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_EVENT_ETH_RX_ADAPTER_MAX_INSTANCE 32
 
 /* struct rte_event_eth_rx_adapter_queue_conf flags definitions */
diff --git a/lib/eventdev/rte_event_eth_tx_adapter.h b/lib/eventdev/rte_event_eth_tx_adapter.h
index b38b3fce97..ef01345ac2 100644
--- a/lib/eventdev/rte_event_eth_tx_adapter.h
+++ b/lib/eventdev/rte_event_eth_tx_adapter.h
@@ -76,10 +76,6 @@
  * impact due to a change in how the transmit queue index is specified.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -87,6 +83,10 @@ extern "C" {
 
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Adapter configuration structure
  *
diff --git a/lib/eventdev/rte_event_ring.h b/lib/eventdev/rte_event_ring.h
index f9cf19ae16..5769da269e 100644
--- a/lib/eventdev/rte_event_ring.h
+++ b/lib/eventdev/rte_event_ring.h
@@ -14,10 +14,6 @@
 #ifndef _RTE_EVENT_RING_
 #define _RTE_EVENT_RING_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_common.h>
@@ -25,6 +21,10 @@ extern "C" {
 #include <rte_ring_elem.h>
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_TAILQ_EVENT_RING_NAME "RTE_EVENT_RING"
 
 /**
diff --git a/lib/eventdev/rte_event_timer_adapter.h b/lib/eventdev/rte_event_timer_adapter.h
index 0bd1b30045..256807b3bf 100644
--- a/lib/eventdev/rte_event_timer_adapter.h
+++ b/lib/eventdev/rte_event_timer_adapter.h
@@ -107,14 +107,14 @@
  * All these use cases require high resolution and low time drift.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 
 #include "rte_eventdev.h"
 #include "rte_eventdev_trace_fp.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Timer adapter clock source
  */
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index 08e5f9320b..e5c5b7df64 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -237,10 +237,6 @@
  * \endcode
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_errno.h>
@@ -2469,6 +2465,10 @@ rte_event_vector_pool_create(const char *name, unsigned int n,
 
 #include <rte_eventdev_core.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static __rte_always_inline uint16_t
 __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
 			  const struct rte_event ev[], uint16_t nb_events,
diff --git a/lib/eventdev/rte_eventdev_trace_fp.h b/lib/eventdev/rte_eventdev_trace_fp.h
index 04d510ad00..8656f1e6e4 100644
--- a/lib/eventdev/rte_eventdev_trace_fp.h
+++ b/lib/eventdev/rte_eventdev_trace_fp.h
@@ -11,12 +11,12 @@
  * API for ethdev trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_eventdev_trace_deq_burst,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id, uint8_t port_id, void *ev_table,
diff --git a/lib/graph/rte_graph_model_mcore_dispatch.h b/lib/graph/rte_graph_model_mcore_dispatch.h
index 732b89297f..f9ff3daa88 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.h
+++ b/lib/graph/rte_graph_model_mcore_dispatch.h
@@ -12,10 +12,6 @@
  * dispatch model.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_errno.h>
 #include <rte_mempool.h>
 #include <rte_memzone.h>
@@ -23,6 +19,10 @@ extern "C" {
 
 #include "rte_graph_worker_common.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
 #define RTE_GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
 	((typeof(nb_nodes))((nb_nodes) * RTE_GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 03d0e01b68..b0f952a82c 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -6,13 +6,13 @@
 #ifndef _RTE_GRAPH_WORKER_H_
 #define _RTE_GRAPH_WORKER_H_
 
+#include "rte_graph_model_rtc.h"
+#include "rte_graph_model_mcore_dispatch.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "rte_graph_model_rtc.h"
-#include "rte_graph_model_mcore_dispatch.h"
-
 /**
  * Perform graph walk on the circular buffer and invoke the process function
  * of the nodes and collect the stats.
diff --git a/lib/gso/rte_gso.h b/lib/gso/rte_gso.h
index d60cb65f18..75246989dc 100644
--- a/lib/gso/rte_gso.h
+++ b/lib/gso/rte_gso.h
@@ -10,13 +10,13 @@
  * Interface to GSO library
  */
 
+#include <stdint.h>
+#include <rte_mbuf.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include <rte_mbuf.h>
-
 /* Minimum GSO segment size for TCP based packets. */
 #define RTE_GSO_SEG_SIZE_MIN (sizeof(struct rte_ether_hdr) + \
 		sizeof(struct rte_ipv4_hdr) + sizeof(struct rte_tcp_hdr) + 1)
diff --git a/lib/hash/rte_fbk_hash.h b/lib/hash/rte_fbk_hash.h
index b01126999b..1f0c1d1b6c 100644
--- a/lib/hash/rte_fbk_hash.h
+++ b/lib/hash/rte_fbk_hash.h
@@ -18,15 +18,15 @@
 #include <stdint.h>
 #include <errno.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <rte_hash_crc.h>
 #include <rte_jhash.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_FBK_HASH_INIT_VAL_DEFAULT
 /** Initialising value used when calculating hash. */
 #define RTE_FBK_HASH_INIT_VAL_DEFAULT		0xFFFFFFFF
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 8ad2422ec3..fa07c97685 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -11,10 +11,6 @@
  * RTE CRC Hash
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_branch_prediction.h>
@@ -39,6 +35,10 @@ extern uint8_t rte_hash_crc32_alg;
 #include "rte_crc_generic.h"
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
  * calculation.
diff --git a/lib/hash/rte_jhash.h b/lib/hash/rte_jhash.h
index f2446f081e..b70799d209 100644
--- a/lib/hash/rte_jhash.h
+++ b/lib/hash/rte_jhash.h
@@ -11,10 +11,6 @@
  * jhash functions.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <string.h>
 #include <limits.h>
@@ -23,6 +19,10 @@ extern "C" {
 #include <rte_log.h>
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* jhash.h: Jenkins hash support.
  *
  * Copyright (C) 2006 Bob Jenkins (bob_jenkins@burtleburtle.net)
diff --git a/lib/hash/rte_thash.h b/lib/hash/rte_thash.h
index 30b657e67a..ec9bc57efa 100644
--- a/lib/hash/rte_thash.h
+++ b/lib/hash/rte_thash.h
@@ -15,10 +15,6 @@
  * after GRE header decapsulating)
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_byteorder.h>
@@ -28,6 +24,10 @@ extern "C" {
 
 #if defined(RTE_ARCH_X86) || defined(__ARM_NEON)
 #include <rte_vect.h>
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef RTE_ARCH_X86
diff --git a/lib/hash/rte_thash_gfni.h b/lib/hash/rte_thash_gfni.h
index 132f37506d..5234c1697f 100644
--- a/lib/hash/rte_thash_gfni.h
+++ b/lib/hash/rte_thash_gfni.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_THASH_GFNI_H_
 #define _RTE_THASH_GFNI_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_log.h>
 
@@ -16,6 +12,10 @@ extern "C" {
 
 #include <rte_thash_x86_gfni.h>
 
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
diff --git a/lib/ip_frag/rte_ip_frag.h b/lib/ip_frag/rte_ip_frag.h
index 2ad318096b..84fd717953 100644
--- a/lib/ip_frag/rte_ip_frag.h
+++ b/lib/ip_frag/rte_ip_frag.h
@@ -12,10 +12,6 @@
  * Implementation of IP packet fragmentation and reassembly.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 
@@ -25,6 +21,10 @@ extern "C" {
 #include <rte_ip.h>
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_mbuf;
 
 /** death row size (in packets) */
diff --git a/lib/ipsec/rte_ipsec.h b/lib/ipsec/rte_ipsec.h
index f15f6f2966..28b7a61aea 100644
--- a/lib/ipsec/rte_ipsec.h
+++ b/lib/ipsec/rte_ipsec.h
@@ -17,10 +17,6 @@
 #include <rte_ipsec_sa.h>
 #include <rte_mbuf.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 struct rte_ipsec_session;
 
 /**
@@ -181,6 +177,10 @@ rte_ipsec_telemetry_sa_del(const struct rte_ipsec_sa *sa);
 
 #include <rte_ipsec_group.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/log/rte_log.h b/lib/log/rte_log.h
index f357c59548..3735137150 100644
--- a/lib/log/rte_log.h
+++ b/lib/log/rte_log.h
@@ -13,10 +13,6 @@
  * This file provides a log API to RTE applications.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <assert.h>
 #include <stdint.h>
 #include <stdio.h>
@@ -26,6 +22,10 @@ extern "C" {
 #include <rte_common.h>
 #include <rte_config.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* SDK log type */
 #define RTE_LOGTYPE_EAL        0 /**< Log related to eal. */
 				 /* was RTE_LOGTYPE_MALLOC */
diff --git a/lib/lpm/rte_lpm.h b/lib/lpm/rte_lpm.h
index 9c6df311cb..329dc1aad4 100644
--- a/lib/lpm/rte_lpm.h
+++ b/lib/lpm/rte_lpm.h
@@ -391,6 +391,10 @@ static inline void
 rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
 	uint32_t defv);
 
+#ifdef __cplusplus
+}
+#endif
+
 #if defined(RTE_ARCH_ARM)
 #ifdef RTE_HAS_SVE_ACLE
 #include "rte_lpm_sve.h"
@@ -407,8 +411,4 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
 #include "rte_lpm_scalar.h"
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* _RTE_LPM_H_ */
diff --git a/lib/member/rte_member.h b/lib/member/rte_member.h
index aec192eba5..109bdd000b 100644
--- a/lib/member/rte_member.h
+++ b/lib/member/rte_member.h
@@ -54,10 +54,6 @@
 #ifndef _RTE_MEMBER_H_
 #define _RTE_MEMBER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdbool.h>
 #include <inttypes.h>
@@ -100,6 +96,10 @@ typedef uint16_t member_set_t;
 #define MEMBER_HASH_FUNC       rte_jhash
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** @internal setsummary structure. */
 struct rte_member_setsum;
 
diff --git a/lib/member/rte_member_sketch.h b/lib/member/rte_member_sketch.h
index 74f24ca223..6a8d5104dd 100644
--- a/lib/member/rte_member_sketch.h
+++ b/lib/member/rte_member_sketch.h
@@ -5,13 +5,13 @@
 #ifndef RTE_MEMBER_SKETCH_H
 #define RTE_MEMBER_SKETCH_H
 
+#include <rte_vect.h>
+#include <rte_ring_elem.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_vect.h>
-#include <rte_ring_elem.h>
-
 #define NUM_ROW_SCALAR 5
 #define INTERVAL (1 << 15)
 
diff --git a/lib/member/rte_member_sketch_avx512.h b/lib/member/rte_member_sketch_avx512.h
index 52666b5b4c..a8ef3b065e 100644
--- a/lib/member/rte_member_sketch_avx512.h
+++ b/lib/member/rte_member_sketch_avx512.h
@@ -5,14 +5,14 @@
 #ifndef RTE_MEMBER_SKETCH_AVX512_H
 #define RTE_MEMBER_SKETCH_AVX512_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_vect.h>
 #include "rte_member.h"
 #include "rte_member_sketch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define NUM_ROW_VEC 8
 
 void
diff --git a/lib/member/rte_member_x86.h b/lib/member/rte_member_x86.h
index d115151f9f..4de453485b 100644
--- a/lib/member/rte_member_x86.h
+++ b/lib/member/rte_member_x86.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_MEMBER_X86_H_
 #define _RTE_MEMBER_X86_H_
 
+#include <x86intrin.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <x86intrin.h>
-
 #if defined(__AVX2__)
 
 static inline int
diff --git a/lib/member/rte_xxh64_avx512.h b/lib/member/rte_xxh64_avx512.h
index ffe6cb79f9..58f896ebb8 100644
--- a/lib/member/rte_xxh64_avx512.h
+++ b/lib/member/rte_xxh64_avx512.h
@@ -5,13 +5,13 @@
 #ifndef RTE_XXH64_AVX512_H
 #define RTE_XXH64_AVX512_H
 
+#include <rte_common.h>
+#include <immintrin.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <immintrin.h>
-
 /* 0b1001111000110111011110011011000110000101111010111100101010000111 */
 static const uint64_t PRIME64_1 = 0x9E3779B185EBCA87ULL;
 /* 0b1100001010110010101011100011110100100111110101001110101101001111 */
diff --git a/lib/mempool/mempool_trace.h b/lib/mempool/mempool_trace.h
index dffef062e4..c595a3116b 100644
--- a/lib/mempool/mempool_trace.h
+++ b/lib/mempool/mempool_trace.h
@@ -11,15 +11,15 @@
  * APIs for mempool trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_mempool.h"
 
 #include <rte_memzone.h>
 #include <rte_trace_point.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_mempool_trace_create,
 	RTE_TRACE_POINT_ARGS(const char *name, uint32_t nb_elts,
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..9c5cdbb291 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -11,12 +11,12 @@
  * Mempool fast path API for trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_mempool_trace_ops_dequeue_bulk,
 	RTE_TRACE_POINT_ARGS(void *mempool, void **obj_table,
diff --git a/lib/meter/rte_meter.h b/lib/meter/rte_meter.h
index bd68cbe389..e72bf93b3e 100644
--- a/lib/meter/rte_meter.h
+++ b/lib/meter/rte_meter.h
@@ -6,10 +6,6 @@
 #ifndef __INCLUDE_RTE_METER_H__
 #define __INCLUDE_RTE_METER_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Traffic Metering
@@ -22,6 +18,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Application Programmer's Interface (API)
  */
diff --git a/lib/mldev/mldev_utils.h b/lib/mldev/mldev_utils.h
index 5e2a180adc..bf21067d38 100644
--- a/lib/mldev/mldev_utils.h
+++ b/lib/mldev/mldev_utils.h
@@ -5,10 +5,6 @@
 #ifndef RTE_MLDEV_UTILS_H
 #define RTE_MLDEV_UTILS_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_mldev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  *
diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h
index b3bd281083..8dccf125fc 100644
--- a/lib/mldev/rte_mldev_core.h
+++ b/lib/mldev/rte_mldev_core.h
@@ -16,10 +16,6 @@
  * These APIs are for MLDEV PMDs and library only.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <dev_driver.h>
@@ -27,6 +23,10 @@ extern "C" {
 #include <rte_log.h>
 #include <rte_mldev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Device state */
 #define ML_DEV_DETACHED (0)
 #define ML_DEV_ATTACHED (1)
diff --git a/lib/mldev/rte_mldev_pmd.h b/lib/mldev/rte_mldev_pmd.h
index fd5bbf4360..47c0f23223 100644
--- a/lib/mldev/rte_mldev_pmd.h
+++ b/lib/mldev/rte_mldev_pmd.h
@@ -14,10 +14,6 @@
  * These APIs are for MLDEV PMDs only and user applications should not call them directly.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_common.h>
@@ -25,6 +21,10 @@ extern "C" {
 #include <rte_mldev.h>
 #include <rte_mldev_core.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  *
diff --git a/lib/net/rte_ether.h b/lib/net/rte_ether.h
index 32ed515aef..403e84f50b 100644
--- a/lib/net/rte_ether.h
+++ b/lib/net/rte_ether.h
@@ -11,10 +11,6 @@
  * Ethernet Helpers in RTE
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 
@@ -22,6 +18,10 @@ extern "C" {
 #include <rte_mbuf.h>
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_ETHER_ADDR_LEN  6 /**< Length of Ethernet address. */
 #define RTE_ETHER_TYPE_LEN  2 /**< Length of Ethernet type field. */
 #define RTE_ETHER_CRC_LEN   4 /**< Length of Ethernet CRC. */
diff --git a/lib/net/rte_net.h b/lib/net/rte_net.h
index cdc6cf956d..40ad6a71a1 100644
--- a/lib/net/rte_net.h
+++ b/lib/net/rte_net.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_NET_PTYPE_H_
 #define _RTE_NET_PTYPE_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_ip.h>
 #include <rte_udp.h>
 #include <rte_tcp.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
diff --git a/lib/net/rte_sctp.h b/lib/net/rte_sctp.h
index 965682dc2b..a8ba9e49d8 100644
--- a/lib/net/rte_sctp.h
+++ b/lib/net/rte_sctp.h
@@ -14,14 +14,14 @@
 #ifndef _RTE_SCTP_H_
 #define _RTE_SCTP_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * SCTP Header
  */
diff --git a/lib/node/rte_node_eth_api.h b/lib/node/rte_node_eth_api.h
index 143cf131b3..2b7019f6bb 100644
--- a/lib/node/rte_node_eth_api.h
+++ b/lib/node/rte_node_eth_api.h
@@ -16,15 +16,15 @@
  * and its queue associations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_graph.h>
 #include <rte_mempool.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Port config for ethdev_rx and ethdev_tx node.
  */
diff --git a/lib/node/rte_node_ip4_api.h b/lib/node/rte_node_ip4_api.h
index 24f8ec843a..950751a525 100644
--- a/lib/node/rte_node_ip4_api.h
+++ b/lib/node/rte_node_ip4_api.h
@@ -15,15 +15,15 @@
  * This API allows to do control path functions of ip4_* nodes
  * like ip4_lookup, ip4_rewrite.
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_compat.h>
 
 #include <rte_graph.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * IP4 lookup next nodes.
  */
diff --git a/lib/node/rte_node_ip6_api.h b/lib/node/rte_node_ip6_api.h
index a538dc2ea7..f467aac7b6 100644
--- a/lib/node/rte_node_ip6_api.h
+++ b/lib/node/rte_node_ip6_api.h
@@ -15,13 +15,13 @@
  * This API allows to do control path functions of ip6_* nodes
  * like ip6_lookup, ip6_rewrite.
  */
+#include <rte_common.h>
+#include <rte_compat.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_compat.h>
-
 /**
  * IP6 lookup next nodes.
  */
diff --git a/lib/node/rte_node_udp4_input_api.h b/lib/node/rte_node_udp4_input_api.h
index c873acbbe0..694660bd6a 100644
--- a/lib/node/rte_node_udp4_input_api.h
+++ b/lib/node/rte_node_udp4_input_api.h
@@ -16,14 +16,14 @@
  * like udp4_input.
  *
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_compat.h>
 
 #include "rte_graph.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 /**
  * UDP4 lookup next nodes.
  */
diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
index c26fc77209..9a50a12142 100644
--- a/lib/pci/rte_pci.h
+++ b/lib/pci/rte_pci.h
@@ -12,14 +12,14 @@
  * RTE PCI Library
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <inttypes.h>
 #include <sys/types.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of
  * configuration space.  PCI-X Mode 2 and PCIe devices have 4096 bytes of
diff --git a/lib/pdcp/rte_pdcp.h b/lib/pdcp/rte_pdcp.h
index f74524f83d..15fcbf9607 100644
--- a/lib/pdcp/rte_pdcp.h
+++ b/lib/pdcp/rte_pdcp.h
@@ -19,10 +19,6 @@
 #include <rte_pdcp_hdr.h>
 #include <rte_security.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* Forward declarations. */
 struct rte_pdcp_entity;
 
@@ -373,6 +369,10 @@ rte_pdcp_t_reordering_expiry_handle(const struct rte_pdcp_entity *entity,
  */
 #include <rte_pdcp_group.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pipeline/rte_pipeline.h b/lib/pipeline/rte_pipeline.h
index 0c7994b4f2..c9e7172453 100644
--- a/lib/pipeline/rte_pipeline.h
+++ b/lib/pipeline/rte_pipeline.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PIPELINE_H__
 #define __INCLUDE_RTE_PIPELINE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Pipeline
@@ -59,6 +55,10 @@ extern "C" {
 #include <rte_table.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_mbuf;
 
 /*
diff --git a/lib/pipeline/rte_port_in_action.h b/lib/pipeline/rte_port_in_action.h
index ec2994599f..9d17bae988 100644
--- a/lib/pipeline/rte_port_in_action.h
+++ b/lib/pipeline/rte_port_in_action.h
@@ -46,10 +46,6 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -57,6 +53,10 @@ extern "C" {
 
 #include "rte_pipeline.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Input port actions. */
 enum rte_port_in_action_type {
 	/** Filter selected input packets. */
diff --git a/lib/pipeline/rte_swx_ctl.h b/lib/pipeline/rte_swx_ctl.h
index 6ef2551ab5..c4e63753f5 100644
--- a/lib/pipeline/rte_swx_ctl.h
+++ b/lib/pipeline/rte_swx_ctl.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_CTL_H__
 #define __INCLUDE_RTE_SWX_CTL_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Pipeline Control
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_swx_port.h"
 #include "rte_swx_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_swx_pipeline;
 
 /** Name size. */
diff --git a/lib/pipeline/rte_swx_extern.h b/lib/pipeline/rte_swx_extern.h
index e10e963d63..1553fa81ec 100644
--- a/lib/pipeline/rte_swx_extern.h
+++ b/lib/pipeline/rte_swx_extern.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_EXTERN_H__
 #define __INCLUDE_RTE_SWX_EXTERN_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Extern objects and functions
@@ -19,6 +15,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Extern type
  */
diff --git a/lib/pipeline/rte_swx_ipsec.h b/lib/pipeline/rte_swx_ipsec.h
index 7c07fdc739..d2e5abef7d 100644
--- a/lib/pipeline/rte_swx_ipsec.h
+++ b/lib/pipeline/rte_swx_ipsec.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_IPSEC_H__
 #define __INCLUDE_RTE_SWX_IPSEC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Internet Protocol Security (IPsec)
@@ -53,6 +49,10 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_crypto_sym.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * IPsec Setup API
  */
diff --git a/lib/pipeline/rte_swx_pipeline.h b/lib/pipeline/rte_swx_pipeline.h
index 25df042d3b..882bd4bf6f 100644
--- a/lib/pipeline/rte_swx_pipeline.h
+++ b/lib/pipeline/rte_swx_pipeline.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PIPELINE_H__
 #define __INCLUDE_RTE_SWX_PIPELINE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Pipeline
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_swx_table.h"
 #include "rte_swx_extern.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Name size. */
 #ifndef RTE_SWX_NAME_SIZE
 #define RTE_SWX_NAME_SIZE 64
diff --git a/lib/pipeline/rte_swx_pipeline_spec.h b/lib/pipeline/rte_swx_pipeline_spec.h
index dd88c0bfab..077b407c0a 100644
--- a/lib/pipeline/rte_swx_pipeline_spec.h
+++ b/lib/pipeline/rte_swx_pipeline_spec.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PIPELINE_SPEC_H__
 #define __INCLUDE_RTE_SWX_PIPELINE_SPEC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 
@@ -15,6 +11,10 @@ extern "C" {
 
 #include <rte_swx_pipeline.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * extobj.
  *
diff --git a/lib/pipeline/rte_table_action.h b/lib/pipeline/rte_table_action.h
index 5dffbeb700..bab4bfd2e2 100644
--- a/lib/pipeline/rte_table_action.h
+++ b/lib/pipeline/rte_table_action.h
@@ -52,10 +52,6 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -65,6 +61,10 @@ extern "C" {
 
 #include "rte_pipeline.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Table actions. */
 enum rte_table_action_type {
 	/** Forward to next pipeline table, output port or drop. */
diff --git a/lib/port/rte_port.h b/lib/port/rte_port.h
index 0e30db371e..4b20872537 100644
--- a/lib/port/rte_port.h
+++ b/lib/port/rte_port.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_H__
 #define __INCLUDE_RTE_PORT_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port
@@ -20,6 +16,10 @@ extern "C" {
 #include <stdint.h>
 #include <rte_mbuf.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**@{
  * Macros to allow accessing metadata stored in the mbuf headroom
  * just beyond the end of the mbuf data structure returned by a port
diff --git a/lib/port/rte_port_ethdev.h b/lib/port/rte_port_ethdev.h
index e07021cb89..7729ff0da3 100644
--- a/lib/port/rte_port_ethdev.h
+++ b/lib/port/rte_port_ethdev.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_ETHDEV_H__
 #define __INCLUDE_RTE_PORT_ETHDEV_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Ethernet Device
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ethdev_reader port parameters */
 struct rte_port_ethdev_reader_params {
 	/** NIC RX port ID */
diff --git a/lib/port/rte_port_eventdev.h b/lib/port/rte_port_eventdev.h
index 0efb8e1021..d9eccf07d4 100644
--- a/lib/port/rte_port_eventdev.h
+++ b/lib/port/rte_port_eventdev.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_EVENTDEV_H__
 #define __INCLUDE_RTE_PORT_EVENTDEV_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Eventdev Interface
@@ -24,6 +20,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Eventdev_reader port parameters */
 struct rte_port_eventdev_reader_params {
 	/** Eventdev Device ID */
diff --git a/lib/port/rte_port_fd.h b/lib/port/rte_port_fd.h
index 885b9ada22..40a5e4a426 100644
--- a/lib/port/rte_port_fd.h
+++ b/lib/port/rte_port_fd.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_FD_H__
 #define __INCLUDE_RTE_PORT_FD_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port FD Device
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** fd_reader port parameters */
 struct rte_port_fd_reader_params {
 	/** File descriptor */
diff --git a/lib/port/rte_port_frag.h b/lib/port/rte_port_frag.h
index 4055872e8d..9a10f10523 100644
--- a/lib/port/rte_port_frag.h
+++ b/lib/port/rte_port_frag.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_IP_FRAG_H__
 #define __INCLUDE_RTE_PORT_IP_FRAG_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port for IPv4 Fragmentation
@@ -31,6 +27,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ring_reader_ipv4_frag port parameters */
 struct rte_port_ring_reader_frag_params {
 	/** Underlying single consumer ring that has to be pre-initialized. */
diff --git a/lib/port/rte_port_ras.h b/lib/port/rte_port_ras.h
index 94cfb3ed92..86e36f5362 100644
--- a/lib/port/rte_port_ras.h
+++ b/lib/port/rte_port_ras.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_RAS_H__
 #define __INCLUDE_RTE_PORT_RAS_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port for IPv4 Reassembly
@@ -31,6 +27,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ring_writer_ipv4_ras port parameters */
 struct rte_port_ring_writer_ras_params {
 	/** Underlying single consumer ring that has to be pre-initialized. */
diff --git a/lib/port/rte_port_ring.h b/lib/port/rte_port_ring.h
index 027928c924..2089d0889b 100644
--- a/lib/port/rte_port_ring.h
+++ b/lib/port/rte_port_ring.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_RING_H__
 #define __INCLUDE_RTE_PORT_RING_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Ring
@@ -27,6 +23,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ring_reader port parameters */
 struct rte_port_ring_reader_params {
 	/** Underlying consumer ring that has to be pre-initialized */
diff --git a/lib/port/rte_port_sched.h b/lib/port/rte_port_sched.h
index 251380ef80..1bf08ae6a9 100644
--- a/lib/port/rte_port_sched.h
+++ b/lib/port/rte_port_sched.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_SCHED_H__
 #define __INCLUDE_RTE_PORT_SCHED_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Hierarchical Scheduler
@@ -23,6 +19,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** sched_reader port parameters */
 struct rte_port_sched_reader_params {
 	/** Underlying pre-initialized rte_sched_port */
diff --git a/lib/port/rte_port_source_sink.h b/lib/port/rte_port_source_sink.h
index bcdbaf1e40..3122dd5038 100644
--- a/lib/port/rte_port_source_sink.h
+++ b/lib/port/rte_port_source_sink.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_SOURCE_SINK_H__
 #define __INCLUDE_RTE_PORT_SOURCE_SINK_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Source/Sink
@@ -19,6 +15,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** source port parameters */
 struct rte_port_source_params {
 	/** Pre-initialized buffer pool */
diff --git a/lib/port/rte_port_sym_crypto.h b/lib/port/rte_port_sym_crypto.h
index 6532b4388a..d03cdc1e8b 100644
--- a/lib/port/rte_port_sym_crypto.h
+++ b/lib/port/rte_port_sym_crypto.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_SYM_CRYPTO_H__
 #define __INCLUDE_RTE_PORT_SYM_CRYPTO_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port sym crypto Interface
@@ -23,6 +19,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Function prototype for reader post action. */
 typedef void (*rte_port_sym_crypto_reader_callback_fn)(struct rte_mbuf **pkts,
 		uint16_t n_pkts, void *arg);
diff --git a/lib/port/rte_swx_port.h b/lib/port/rte_swx_port.h
index 1dbd95ae87..b52b125572 100644
--- a/lib/port/rte_swx_port.h
+++ b/lib/port/rte_swx_port.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_H__
 #define __INCLUDE_RTE_SWX_PORT_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Port
@@ -17,6 +13,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Packet. */
 struct rte_swx_pkt {
 	/** Opaque packet handle. */
diff --git a/lib/port/rte_swx_port_ethdev.h b/lib/port/rte_swx_port_ethdev.h
index cbc2d7b213..1828031e67 100644
--- a/lib/port/rte_swx_port_ethdev.h
+++ b/lib/port/rte_swx_port_ethdev.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_ETHDEV_H__
 #define __INCLUDE_RTE_SWX_PORT_ETHDEV_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Ethernet Device Input and Output Ports
@@ -17,6 +13,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Ethernet device input port (reader) creation parameters. */
 struct rte_swx_port_ethdev_reader_params {
 	/** Name of a valid and fully configured Ethernet device. */
diff --git a/lib/port/rte_swx_port_fd.h b/lib/port/rte_swx_port_fd.h
index e61719c8f6..63529cf0ab 100644
--- a/lib/port/rte_swx_port_fd.h
+++ b/lib/port/rte_swx_port_fd.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_FD_H__
 #define __INCLUDE_RTE_SWX_PORT_FD_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX FD Input and Output Ports
@@ -18,6 +14,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** fd_reader port parameters */
 struct rte_swx_port_fd_reader_params {
 	/** File descriptor. Must be valid and opened in non-blocking mode. */
diff --git a/lib/port/rte_swx_port_ring.h b/lib/port/rte_swx_port_ring.h
index efc485fb08..ef241c3fee 100644
--- a/lib/port/rte_swx_port_ring.h
+++ b/lib/port/rte_swx_port_ring.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_RING_H__
 #define __INCLUDE_RTE_SWX_PORT_RING_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Ring Input and Output Ports
@@ -18,6 +14,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Ring input port (reader) creation parameters. */
 struct rte_swx_port_ring_reader_params {
 	/** Name of valid RTE ring. */
diff --git a/lib/port/rte_swx_port_source_sink.h b/lib/port/rte_swx_port_source_sink.h
index 91bcbf74f4..e3ca7cfbb4 100644
--- a/lib/port/rte_swx_port_source_sink.h
+++ b/lib/port/rte_swx_port_source_sink.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_SOURCE_SINK_H__
 #define __INCLUDE_RTE_SWX_PORT_SOURCE_SINK_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Source and Sink Ports
@@ -15,6 +11,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum number of packets to read from the PCAP file. */
 #ifndef RTE_SWX_PORT_SOURCE_PKTS_MAX
 #define RTE_SWX_PORT_SOURCE_PKTS_MAX 1024
diff --git a/lib/rawdev/rte_rawdev.h b/lib/rawdev/rte_rawdev.h
index 640037b524..3fc471526e 100644
--- a/lib/rawdev/rte_rawdev.h
+++ b/lib/rawdev/rte_rawdev.h
@@ -14,13 +14,13 @@
  * no specific type already available in DPDK.
  */
 
+#include <rte_common.h>
+#include <rte_memory.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_memory.h>
-
 /* Rawdevice object - essentially a void to be typecast by implementation */
 typedef void *rte_rawdev_obj_t;
 
diff --git a/lib/rawdev/rte_rawdev_pmd.h b/lib/rawdev/rte_rawdev_pmd.h
index 22b406444d..408ed461a4 100644
--- a/lib/rawdev/rte_rawdev_pmd.h
+++ b/lib/rawdev/rte_rawdev_pmd.h
@@ -13,10 +13,6 @@
  * any application.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <dev_driver.h>
@@ -26,6 +22,10 @@ extern "C" {
 
 #include "rte_rawdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 extern int librawdev_logtype;
 #define RTE_LOGTYPE_RAWDEV librawdev_logtype
 
diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h
index ed3dd6d3d2..550fadf56a 100644
--- a/lib/rcu/rte_rcu_qsbr.h
+++ b/lib/rcu/rte_rcu_qsbr.h
@@ -21,10 +21,6 @@
  * entered quiescent state.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <inttypes.h>
 #include <stdalign.h>
 #include <stdbool.h>
@@ -36,6 +32,10 @@ extern "C" {
 #include <rte_atomic.h>
 #include <rte_ring.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 extern int rte_rcu_log_type;
 #define RTE_LOGTYPE_RCU rte_rcu_log_type
 
diff --git a/lib/regexdev/rte_regexdev.h b/lib/regexdev/rte_regexdev.h
index a50b841b1e..b18a1d4251 100644
--- a/lib/regexdev/rte_regexdev.h
+++ b/lib/regexdev/rte_regexdev.h
@@ -194,10 +194,6 @@
  * - rte_regexdev_dequeue_burst()
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_dev.h>
@@ -1428,6 +1424,10 @@ struct rte_regex_ops {
 
 #include "rte_regexdev_core.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
diff --git a/lib/ring/rte_ring.h b/lib/ring/rte_ring.h
index c709f30497..11ca69c73d 100644
--- a/lib/ring/rte_ring.h
+++ b/lib/ring/rte_ring.h
@@ -34,13 +34,13 @@
  * for more information.
  */
 
+#include <rte_ring_core.h>
+#include <rte_ring_elem.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_core.h>
-#include <rte_ring_elem.h>
-
 /**
  * Calculate the memory size needed for a ring
  *
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 270869d214..222c5aeb3f 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -19,10 +19,6 @@
  * instead.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdalign.h>
 #include <stdio.h>
 #include <stdint.h>
@@ -38,6 +34,10 @@ extern "C" {
 #include <rte_pause.h>
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_TAILQ_RING_NAME "RTE_RING"
 
 /** enqueue/dequeue behavior types */
diff --git a/lib/ring/rte_ring_elem.h b/lib/ring/rte_ring_elem.h
index 7f7d4951d3..506f686884 100644
--- a/lib/ring/rte_ring_elem.h
+++ b/lib/ring/rte_ring_elem.h
@@ -16,10 +16,6 @@
  * RTE Ring with user defined element size
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_ring_core.h>
 #include <rte_ring_elem_pvt.h>
 
@@ -699,6 +695,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 
 #include <rte_ring.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ring/rte_ring_hts.h b/lib/ring/rte_ring_hts.h
index 9a5938ac58..a41acea740 100644
--- a/lib/ring/rte_ring_hts.h
+++ b/lib/ring/rte_ring_hts.h
@@ -24,12 +24,12 @@
  * To achieve that 64-bit CAS is used by head update routine.
  */
 
+#include <rte_ring_hts_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_hts_elem_pvt.h>
-
 /**
  * Enqueue several objects on the HTS ring (multi-producers safe).
  *
diff --git a/lib/ring/rte_ring_peek.h b/lib/ring/rte_ring_peek.h
index c0621d12e2..2312f52668 100644
--- a/lib/ring/rte_ring_peek.h
+++ b/lib/ring/rte_ring_peek.h
@@ -43,12 +43,12 @@
  * with enqueue(/dequeue) operation till _finish_ completes.
  */
 
+#include <rte_ring_peek_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_peek_elem_pvt.h>
-
 /**
  * Start to enqueue several objects on the ring.
  * Note that no actual objects are put in the queue by this function,
diff --git a/lib/ring/rte_ring_peek_zc.h b/lib/ring/rte_ring_peek_zc.h
index 0b5e34b731..3254fe0481 100644
--- a/lib/ring/rte_ring_peek_zc.h
+++ b/lib/ring/rte_ring_peek_zc.h
@@ -67,12 +67,12 @@
  * with enqueue/dequeue operation till _finish_ completes.
  */
 
+#include <rte_ring_peek_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_peek_elem_pvt.h>
-
 /**
  * Ring zero-copy information structure.
  *
diff --git a/lib/ring/rte_ring_rts.h b/lib/ring/rte_ring_rts.h
index 50fc8f74db..d7a3863c83 100644
--- a/lib/ring/rte_ring_rts.h
+++ b/lib/ring/rte_ring_rts.h
@@ -51,12 +51,12 @@
  * By default HTD_MAX == ring.capacity / 8.
  */
 
+#include <rte_ring_rts_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_rts_elem_pvt.h>
-
 /**
  * Enqueue several objects on the RTS ring (multi-producers safe).
  *
diff --git a/lib/sched/rte_approx.h b/lib/sched/rte_approx.h
index b60086330e..738e33a98b 100644
--- a/lib/sched/rte_approx.h
+++ b/lib/sched/rte_approx.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_APPROX_H__
 #define __INCLUDE_RTE_APPROX_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Rational Approximation
@@ -20,6 +16,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Find best rational approximation
  *
diff --git a/lib/sched/rte_pie.h b/lib/sched/rte_pie.h
index 1477a47700..2a385ffdba 100644
--- a/lib/sched/rte_pie.h
+++ b/lib/sched/rte_pie.h
@@ -5,10 +5,6 @@
 #ifndef __RTE_PIE_H_INCLUDED__
 #define __RTE_PIE_H_INCLUDED__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * Proportional Integral controller Enhanced (PIE)
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_debug.h>
 #include <rte_cycles.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_DQ_THRESHOLD   16384   /**< Queue length threshold (2^14)
 				     * to start measurement cycle (bytes)
 				     */
diff --git a/lib/sched/rte_red.h b/lib/sched/rte_red.h
index afaa35fcd6..e62abb9295 100644
--- a/lib/sched/rte_red.h
+++ b/lib/sched/rte_red.h
@@ -5,10 +5,6 @@
 #ifndef __RTE_RED_H_INCLUDED__
 #define __RTE_RED_H_INCLUDED__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Random Early Detection (RED)
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_cycles.h>
 #include <rte_branch_prediction.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_RED_SCALING                     10         /**< Fraction size for fixed-point */
 #define RTE_RED_S                           (1 << 22)  /**< Packet size multiplied by number of leaf queues */
 #define RTE_RED_MAX_TH_MAX                  1023       /**< Max threshold limit in fixed point format */
diff --git a/lib/sched/rte_sched.h b/lib/sched/rte_sched.h
index b882c4a882..222e6b3583 100644
--- a/lib/sched/rte_sched.h
+++ b/lib/sched/rte_sched.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_SCHED_H__
 #define __INCLUDE_RTE_SCHED_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Hierarchical Scheduler
@@ -62,6 +58,10 @@ extern "C" {
 #include "rte_red.h"
 #include "rte_pie.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum number of queues per pipe.
  * Note that the multiple queues (power of 2) can only be assigned to
  * lowest priority (best-effort) traffic class. Other higher priority traffic
diff --git a/lib/sched/rte_sched_common.h b/lib/sched/rte_sched_common.h
index 573d164569..a5acb9c08a 100644
--- a/lib/sched/rte_sched_common.h
+++ b/lib/sched/rte_sched_common.h
@@ -5,13 +5,13 @@
 #ifndef __INCLUDE_RTE_SCHED_COMMON_H__
 #define __INCLUDE_RTE_SCHED_COMMON_H__
 
+#include <stdint.h>
+#include <sys/types.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include <sys/types.h>
-
 #if 0
 static inline uint32_t
 rte_min_pos_4_u16(uint16_t *x)
diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
index 1c8474b74f..7a9bafa0fa 100644
--- a/lib/security/rte_security.h
+++ b/lib/security/rte_security.h
@@ -12,10 +12,6 @@
  * RTE Security Common Definitions
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <sys/types.h>
 
 #include <rte_compat.h>
@@ -24,6 +20,10 @@ extern "C" {
 #include <rte_ip.h>
 #include <rte_mbuf_dyn.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** IPSec protocol mode */
 enum rte_security_ipsec_sa_mode {
 	RTE_SECURITY_IPSEC_SA_MODE_TRANSPORT = 1,
diff --git a/lib/security/rte_security_driver.h b/lib/security/rte_security_driver.h
index 9bb5052a4c..2ceb145066 100644
--- a/lib/security/rte_security_driver.h
+++ b/lib/security/rte_security_driver.h
@@ -12,13 +12,13 @@
  * RTE Security Common Definitions
  */
 
+#include <rte_compat.h>
+#include "rte_security.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_compat.h>
-#include "rte_security.h"
-
 /**
  * @internal
  * Security session to be used by library for internal usage
diff --git a/lib/stack/rte_stack.h b/lib/stack/rte_stack.h
index 3325757568..4439adfc42 100644
--- a/lib/stack/rte_stack.h
+++ b/lib/stack/rte_stack.h
@@ -15,10 +15,6 @@
 #ifndef _RTE_STACK_H_
 #define _RTE_STACK_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdalign.h>
 
 #include <rte_debug.h>
@@ -95,6 +91,10 @@ struct __rte_cache_aligned rte_stack {
 #include "rte_stack_std.h"
 #include "rte_stack_lf.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Push several objects on the stack (MT-safe).
  *
diff --git a/lib/table/rte_lru.h b/lib/table/rte_lru.h
index 88229d8632..bc1ad36500 100644
--- a/lib/table/rte_lru.h
+++ b/lib/table/rte_lru.h
@@ -5,15 +5,11 @@
 #ifndef __INCLUDE_RTE_LRU_H__
 #define __INCLUDE_RTE_LRU_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_config.h>
 #ifdef RTE_ARCH_X86_64
 #include "rte_lru_x86.h"
 #elif defined(RTE_ARCH_ARM64)
 #include "rte_lru_arm64.h"
 #else
 #undef RTE_TABLE_HASH_LRU_STRATEGY
 #define RTE_TABLE_HASH_LRU_STRATEGY                        1
@@ -86,8 +82,4 @@ do {									\
 
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif
diff --git a/lib/table/rte_lru_arm64.h b/lib/table/rte_lru_arm64.h
index f19b0bdb4e..f9a4678ee0 100644
--- a/lib/table/rte_lru_arm64.h
+++ b/lib/table/rte_lru_arm64.h
@@ -5,14 +5,14 @@
 #ifndef __RTE_LRU_ARM64_H__
 #define __RTE_LRU_ARM64_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_vect.h>
 #include <rte_bitops.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_TABLE_HASH_LRU_STRATEGY
 #ifdef __ARM_NEON
 #define RTE_TABLE_HASH_LRU_STRATEGY                        3
diff --git a/lib/table/rte_lru_x86.h b/lib/table/rte_lru_x86.h
index ddfb8c1c8c..93f4a136a8 100644
--- a/lib/table/rte_lru_x86.h
+++ b/lib/table/rte_lru_x86.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_LRU_X86_H__
 #define __INCLUDE_RTE_LRU_X86_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_config.h>
@@ -97,8 +93,4 @@ do {									\
 
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif
diff --git a/lib/table/rte_swx_hash_func.h b/lib/table/rte_swx_hash_func.h
index 04f3d543e7..9c65cfa913 100644
--- a/lib/table/rte_swx_hash_func.h
+++ b/lib/table/rte_swx_hash_func.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_HASH_FUNC_H__
 #define __INCLUDE_RTE_SWX_HASH_FUNC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Hash Function
@@ -15,6 +11,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Hash function prototype
  *
diff --git a/lib/table/rte_swx_keycmp.h b/lib/table/rte_swx_keycmp.h
index 09fb1be869..b0ed819307 100644
--- a/lib/table/rte_swx_keycmp.h
+++ b/lib/table/rte_swx_keycmp.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_KEYCMP_H__
 #define __INCLUDE_RTE_SWX_KEYCMP_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Key Comparison Functions
@@ -16,6 +12,10 @@ extern "C" {
 #include <stdint.h>
 #include <string.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Key comparison function prototype
  *
diff --git a/lib/table/rte_swx_table.h b/lib/table/rte_swx_table.h
index ac01e19781..3c53459498 100644
--- a/lib/table/rte_swx_table.h
+++ b/lib/table/rte_swx_table.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_H__
 #define __INCLUDE_RTE_SWX_TABLE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Table
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_swx_hash_func.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Match type. */
 enum rte_swx_table_match_type {
 	/** Wildcard Match (WM). */
diff --git a/lib/table/rte_swx_table_em.h b/lib/table/rte_swx_table_em.h
index b7423dd060..592541f01f 100644
--- a/lib/table/rte_swx_table_em.h
+++ b/lib/table/rte_swx_table_em.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_EM_H__
 #define __INCLUDE_RTE_SWX_TABLE_EM_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Exact Match Table
@@ -16,6 +12,10 @@ extern "C" {
 
 #include <rte_swx_table.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Exact match table operations - unoptimized. */
 extern struct rte_swx_table_ops rte_swx_table_exact_match_unoptimized_ops;
 
diff --git a/lib/table/rte_swx_table_learner.h b/lib/table/rte_swx_table_learner.h
index c5ea015b8d..9a18be083d 100644
--- a/lib/table/rte_swx_table_learner.h
+++ b/lib/table/rte_swx_table_learner.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_LEARNER_H__
 #define __INCLUDE_RTE_SWX_TABLE_LEARNER_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Learner Table
@@ -53,6 +49,10 @@ extern "C" {
 
 #include "rte_swx_hash_func.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum number of key timeout values per learner table. */
 #ifndef RTE_SWX_TABLE_LEARNER_N_KEY_TIMEOUTS_MAX
 #define RTE_SWX_TABLE_LEARNER_N_KEY_TIMEOUTS_MAX 16
diff --git a/lib/table/rte_swx_table_selector.h b/lib/table/rte_swx_table_selector.h
index 05863cc90b..ef29bdb6b0 100644
--- a/lib/table/rte_swx_table_selector.h
+++ b/lib/table/rte_swx_table_selector.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_SELECTOR_H__
 #define __INCLUDE_RTE_SWX_TABLE_SELECTOR_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Selector Table
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_swx_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Selector table creation parameters. */
 struct rte_swx_table_selector_params {
 	/** Group ID offset. */
diff --git a/lib/table/rte_swx_table_wm.h b/lib/table/rte_swx_table_wm.h
index 4fd52c0a17..7eb6f8e2a6 100644
--- a/lib/table/rte_swx_table_wm.h
+++ b/lib/table/rte_swx_table_wm.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_WM_H__
 #define __INCLUDE_RTE_SWX_TABLE_WM_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Wildcard Match Table
@@ -16,6 +12,10 @@ extern "C" {
 
 #include <rte_swx_table.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Wildcard match table operations. */
 extern struct rte_swx_table_ops rte_swx_table_wildcard_match_ops;
 
diff --git a/lib/table/rte_table.h b/lib/table/rte_table.h
index 9a5faf0e32..43a5a1a7b3 100644
--- a/lib/table/rte_table.h
+++ b/lib/table/rte_table.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_H__
 #define __INCLUDE_RTE_TABLE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table
@@ -27,6 +23,10 @@ extern "C" {
 #include <stdint.h>
 #include <rte_port.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_mbuf;
 
 /** Lookup table statistics */
diff --git a/lib/table/rte_table_acl.h b/lib/table/rte_table_acl.h
index 1cb7b9fbbd..61af7b88e4 100644
--- a/lib/table/rte_table_acl.h
+++ b/lib/table/rte_table_acl.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_ACL_H__
 #define __INCLUDE_RTE_TABLE_ACL_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table ACL
@@ -25,6 +21,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ACL table parameters */
 struct rte_table_acl_params {
 	/** Name */
diff --git a/lib/table/rte_table_array.h b/lib/table/rte_table_array.h
index fad83b0588..b2a7b95d68 100644
--- a/lib/table/rte_table_array.h
+++ b/lib/table/rte_table_array.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_ARRAY_H__
 #define __INCLUDE_RTE_TABLE_ARRAY_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Array
@@ -20,6 +16,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Array table parameters */
 struct rte_table_array_params {
 	/** Number of array entries. Has to be a power of two. */
diff --git a/lib/table/rte_table_hash.h b/lib/table/rte_table_hash.h
index 6698621dae..ff8fc9e9ce 100644
--- a/lib/table/rte_table_hash.h
+++ b/lib/table/rte_table_hash.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_HASH_H__
 #define __INCLUDE_RTE_TABLE_HASH_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Hash
@@ -52,6 +48,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Hash function */
 typedef uint64_t (*rte_table_hash_op_hash)(
 	void *key,
diff --git a/lib/table/rte_table_hash_cuckoo.h b/lib/table/rte_table_hash_cuckoo.h
index 3a55d28e9b..55aa12216a 100644
--- a/lib/table/rte_table_hash_cuckoo.h
+++ b/lib/table/rte_table_hash_cuckoo.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_HASH_CUCKOO_H__
 #define __INCLUDE_RTE_TABLE_HASH_CUCKOO_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Hash Cuckoo
@@ -20,6 +16,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Hash table parameters */
 struct rte_table_hash_cuckoo_params {
 	/** Name */
diff --git a/lib/table/rte_table_hash_func.h b/lib/table/rte_table_hash_func.h
index aa779c2182..cba7ec4c20 100644
--- a/lib/table/rte_table_hash_func.h
+++ b/lib/table/rte_table_hash_func.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_HASH_FUNC_H__
 #define __INCLUDE_RTE_TABLE_HASH_FUNC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -18,6 +14,10 @@ extern "C" {
 
 #include <x86intrin.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline uint64_t
 rte_crc32_u64(uint64_t crc, uint64_t v)
 {
@@ -28,6 +28,10 @@ rte_crc32_u64(uint64_t crc, uint64_t v)
 #include "rte_table_hash_func_arm64.h"
 #else
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline uint64_t
 rte_crc32_u64(uint64_t crc, uint64_t v)
 {
diff --git a/lib/table/rte_table_lpm.h b/lib/table/rte_table_lpm.h
index dde32deed9..59b9bdee89 100644
--- a/lib/table/rte_table_lpm.h
+++ b/lib/table/rte_table_lpm.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_LPM_H__
 #define __INCLUDE_RTE_TABLE_LPM_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table LPM for IPv4
@@ -45,6 +41,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** LPM table parameters */
 struct rte_table_lpm_params {
 	/** Table name */
diff --git a/lib/table/rte_table_lpm_ipv6.h b/lib/table/rte_table_lpm_ipv6.h
index 96ddbd32c2..166a5ba9ee 100644
--- a/lib/table/rte_table_lpm_ipv6.h
+++ b/lib/table/rte_table_lpm_ipv6.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_LPM_IPV6_H__
 #define __INCLUDE_RTE_TABLE_LPM_IPV6_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table LPM for IPv6
@@ -45,6 +41,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_LPM_IPV6_ADDR_SIZE 16
 
 /** LPM table parameters */
diff --git a/lib/table/rte_table_stub.h b/lib/table/rte_table_stub.h
index 846526ea99..f7e589df16 100644
--- a/lib/table/rte_table_stub.h
+++ b/lib/table/rte_table_stub.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_STUB_H__
 #define __INCLUDE_RTE_TABLE_STUB_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Stub
@@ -18,6 +14,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Stub table parameters: NONE */
 
 /** Stub table operations */
diff --git a/lib/telemetry/rte_telemetry.h b/lib/telemetry/rte_telemetry.h
index cab9daa6fe..463819e2bf 100644
--- a/lib/telemetry/rte_telemetry.h
+++ b/lib/telemetry/rte_telemetry.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_TELEMETRY_H_
 #define _RTE_TELEMETRY_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_compat.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum length for string used in object. */
 #define RTE_TEL_MAX_STRING_LEN 128
 /** Maximum length of string. */
diff --git a/lib/vhost/rte_vdpa.h b/lib/vhost/rte_vdpa.h
index 6ac85d1bbf..18e273c20f 100644
--- a/lib/vhost/rte_vdpa.h
+++ b/lib/vhost/rte_vdpa.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_VDPA_H_
 #define _RTE_VDPA_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -17,6 +13,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum name length for statistics counters */
 #define RTE_VDPA_STATS_NAME_SIZE 64
 
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index b0434c4b8d..c7a5f56df8 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -18,10 +18,6 @@
 #include <rte_memory.h>
 #include <rte_mempool.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifndef __cplusplus
 /* These are not C++-aware. */
 #include <linux/vhost.h>
@@ -29,6 +25,10 @@ extern "C" {
 #include <linux/virtio_net.h>
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_VHOST_USER_CLIENT		(1ULL << 0)
 #define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
 #define RTE_VHOST_USER_RESERVED_1	(1ULL << 2)
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 8f190dd44b..60995e4e62 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -5,15 +5,15 @@
 #ifndef _RTE_VHOST_ASYNC_H_
 #define _RTE_VHOST_ASYNC_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
 #include <rte_mbuf.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Register an async channel for a vhost queue
  *
diff --git a/lib/vhost/rte_vhost_crypto.h b/lib/vhost/rte_vhost_crypto.h
index f962a53818..af61f0907e 100644
--- a/lib/vhost/rte_vhost_crypto.h
+++ b/lib/vhost/rte_vhost_crypto.h
@@ -5,12 +5,12 @@
 #ifndef _VHOST_CRYPTO_H_
 #define _VHOST_CRYPTO_H_
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /* pre-declare structs to avoid including full headers */
 struct rte_mempool;
 struct rte_crypto_op;
diff --git a/lib/vhost/vdpa_driver.h b/lib/vhost/vdpa_driver.h
index 8db4ab9f4d..42392a0d14 100644
--- a/lib/vhost/vdpa_driver.h
+++ b/lib/vhost/vdpa_driver.h
@@ -5,10 +5,6 @@
 #ifndef _VDPA_DRIVER_H_
 #define _VDPA_DRIVER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 
 #include <rte_compat.h>
@@ -16,6 +12,10 @@ extern "C" {
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_VHOST_QUEUE_ALL UINT16_MAX
 
 /**
-- 
2.34.1



* [PATCH v5 2/6] eal: extend bit manipulation functionality
  2024-09-10  6:20                                           ` [PATCH v5 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-09-10  6:20                                             ` [PATCH v5 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
@ 2024-09-10  6:20                                             ` Mattias Rönnblom
  2024-09-10  6:20                                             ` [PATCH v5 3/6] eal: add unit tests for bit operations Mattias Rönnblom
                                                               ` (3 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  6:20 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Add functionality to test and modify the value of individual bits in
32-bit or 64-bit words.

These functions make no guarantees about memory ordering or atomicity.
They do not use volatile, and thus do not prevent any compiler
optimizations.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

PATCH v3:
 * Remove unnecessary <rte_compat.h> include.
 * Remove redundant 'fun' parameter from the __RTE_GEN_BIT_*() macros
   (Jack Bond-Preston).
 * Introduce __RTE_BIT_BIT_OPS() macro, consistent with how things
   are done when generating the atomic bit operations.
 * Refer to volatile bit op functions as variants instead of families
   (macro parameter naming).

RFC v6:
 * Have rte_bit_test() accept const-marked bitsets.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.
---
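
For readers skimming the diff below, here is a minimal, self-contained
sketch of the _Generic-based dispatch this patch relies on, written in
plain C11 without any DPDK headers. The sketch_* names are hypothetical
stand-ins for the macro-generated __rte_bit_*32()/__rte_bit_*64()
helpers, assert() stands in for RTE_ASSERT(), and the assign and flip
bodies are an assumption about the obvious implementation rather than a
copy of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the generated __rte_bit_test32/64() helpers. */
static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32); /* stands in for RTE_ASSERT() */
	return (*addr & ((uint32_t)1 << nr)) != 0;
}

static inline bool
sketch_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return (*addr & ((uint64_t)1 << nr)) != 0;
}

/* Assumed behavior: set the bit if value is true, otherwise clear it. */
static inline void
sketch_bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	assert(nr < 32);
	uint32_t mask = (uint32_t)1 << nr;
	*addr = value ? (*addr | mask) : (*addr & ~mask);
}

static inline void
sketch_bit_assign64(uint64_t *addr, unsigned int nr, bool value)
{
	assert(nr < 64);
	uint64_t mask = (uint64_t)1 << nr;
	*addr = value ? (*addr | mask) : (*addr & ~mask);
}

/* Assumed behavior: invert the bit with XOR. */
static inline void
sketch_bit_flip32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr ^= (uint32_t)1 << nr;
}

static inline void
sketch_bit_flip64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr ^= (uint64_t)1 << nr;
}

/* The pointer type of addr selects the 32-bit or 64-bit helper at
 * compile time. Only the test macro lists const-qualified pointers,
 * since the other operations write to the word.
 */
#define sketch_bit_test(addr, nr)				\
	_Generic((addr),					\
		uint32_t *: sketch_bit_test32,			\
		const uint32_t *: sketch_bit_test32,		\
		uint64_t *: sketch_bit_test64,			\
		const uint64_t *: sketch_bit_test64)(addr, nr)

#define sketch_bit_assign(addr, nr, value)			\
	_Generic((addr),					\
		uint32_t *: sketch_bit_assign32,		\
		uint64_t *: sketch_bit_assign64)(addr, nr, value)

#define sketch_bit_flip(addr, nr)				\
	_Generic((addr),					\
		uint32_t *: sketch_bit_flip32,			\
		uint64_t *: sketch_bit_flip64)(addr, nr)
```

With these definitions, sketch_bit_assign(&w, 5, true) on a uint32_t w
resolves to the 32-bit helper. Passing a pointer to any other type fails
to compile, since the generic selection has no matching association and
no default.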
 lib/eal/include/rte_bitops.h | 260 ++++++++++++++++++++++++++++++++++-
 1 file changed, 258 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..6915b945ba 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,197 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		uint32_t *: __rte_bit_test32,			\
+		const uint32_t *: __rte_bit_test32,		\
+		uint64_t *: __rte_bit_test64,			\
+		const uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees in regards to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## variant ## test ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## set ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}
+
+#define __RTE_GEN_BIT_CLEAR(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## clear ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}
+
+#define __RTE_GEN_BIT_ASSIGN(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## assign ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## variant ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## variant ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## flip ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## variant ## test ## size(addr, nr); \
+		__rte_bit_ ## variant ## assign ## size(addr, nr, !value); \
+	}
+
+#define __RTE_GEN_BIT_OPS(v, qualifier, size)	\
+	__RTE_GEN_BIT_TEST(v, qualifier, size)	\
+	__RTE_GEN_BIT_SET(v, qualifier, size)	\
+	__RTE_GEN_BIT_CLEAR(v, qualifier, size)	\
+	__RTE_GEN_BIT_ASSIGN(v, qualifier, size)	\
+	__RTE_GEN_BIT_FLIP(v, qualifier, size)
+
+#define __RTE_GEN_BIT_OPS_SIZE(size) \
+	__RTE_GEN_BIT_OPS(,, size)
+
+__RTE_GEN_BIT_OPS_SIZE(32)
+__RTE_GEN_BIT_OPS_SIZE(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +981,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v5 3/6] eal: add unit tests for bit operations
  2024-09-10  6:20                                           ` [PATCH v5 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-09-10  6:20                                             ` [PATCH v5 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
  2024-09-10  6:20                                             ` [PATCH v5 2/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-09-10  6:20                                             ` Mattias Rönnblom
  2024-09-10  6:20                                             ` [PATCH v5 4/6] eal: add atomic " Mattias Rönnblom
                                                               ` (2 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  6:20 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Extend bitops tests to cover the
rte_bit_[test|set|clear|assign|flip]()
functions.

The tests are converted to use the test suite runner framework.
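
For readers skimming the thread, here is a minimal standalone sketch of
the kind of type-dispatched bit access these tests exercise. It is not
the DPDK implementation: the bit_*() names are hypothetical stand-ins
for the rte_bit_*() macros, assert() stands in for RTE_ASSERT(), and
only test/assign/flip are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; DPDK generates its equivalents with macros. */
static inline bool
bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr & (UINT32_C(1) << nr)) != 0;
}

static inline bool
bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return (*addr & (UINT64_C(1) << nr)) != 0;
}

static inline void
bit_assign32(uint32_t *addr, unsigned int nr, bool value)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;
	*addr = value ? (*addr | mask) : (*addr & ~mask);
}

static inline void
bit_assign64(uint64_t *addr, unsigned int nr, bool value)
{
	assert(nr < 64);
	uint64_t mask = UINT64_C(1) << nr;
	*addr = value ? (*addr | mask) : (*addr & ~mask);
}

/* Select the 32- or 64-bit helper from the pointer type (C11). */
#define bit_test(addr, nr)				\
	_Generic((addr),				\
		 uint32_t *: bit_test32,		\
		 const uint32_t *: bit_test32,		\
		 uint64_t *: bit_test64,		\
		 const uint64_t *: bit_test64)(addr, nr)

#define bit_assign(addr, nr, value)			\
	_Generic((addr),				\
		 uint32_t *: bit_assign32,		\
		 uint64_t *: bit_assign64)(addr, nr, value)

/* Non-atomic read-modify-write; note that addr and nr are evaluated twice. */
#define bit_flip(addr, nr)				\
	bit_assign(addr, nr, !bit_test(addr, nr))
```

As in the patch, only the test operation accepts const-qualified
pointers; passing a const pointer to bit_assign() fails to compile
because no _Generic association matches.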

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

RFC v6:
 * Test rte_bit_*test() usage through const pointers.

RFC v4:
 * Remove redundant line continuations.
---
 app/test/test_bitops.c | 85 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..322f58c066 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,68 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %u had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %u had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+									\
+			const uint ## size ## _t *const_ptr = &word;	\
+			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
+				    reference_bit,			\
+				    "Bit %u had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %u had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +172,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v5 4/6] eal: add atomic bit operations
  2024-09-10  6:20                                           ` [PATCH v5 0/6] Improve EAL bit operations API Mattias Rönnblom
                                                               ` (2 preceding siblings ...)
  2024-09-10  6:20                                             ` [PATCH v5 3/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-09-10  6:20                                             ` Mattias Rönnblom
  2024-09-10  6:20                                             ` [PATCH v5 5/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  2024-09-10  6:20                                             ` [PATCH v5 6/6] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  6:20 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
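
To illustrate the fetch-based approach, here is a minimal sketch of
test-and-set, test-and-clear and flip on a 32-bit word. It assumes
plain C11 <stdatomic.h> and an _Atomic-qualified word, whereas the
patch uses the <rte_stdatomic.h> wrappers and casts from non-atomic
pointers. The atomic_bit_*32() names are hypothetical, not part of the
proposed API.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically set bit 'nr'; return the value the bit had before. */
static inline bool
atomic_bit_test_and_set32(_Atomic uint32_t *addr, unsigned int nr,
			  memory_order order)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;
	/* fetch_or returns the word as it was before the OR. */
	uint32_t prev = atomic_fetch_or_explicit(addr, mask, order);

	return (prev & mask) != 0;
}

/* Atomically clear bit 'nr'; return the value the bit had before. */
static inline bool
atomic_bit_test_and_clear32(_Atomic uint32_t *addr, unsigned int nr,
			    memory_order order)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;
	uint32_t prev = atomic_fetch_and_explicit(addr, ~mask, order);

	return (prev & mask) != 0;
}

/* Atomically invert bit 'nr'. */
static inline void
atomic_bit_flip32(_Atomic uint32_t *addr, unsigned int nr,
		  memory_order order)
{
	assert(nr < 32);
	atomic_fetch_xor_explicit(addr, UINT32_C(1) << nr, order);
}
```

Because each operation is a single atomic read-modify-write, no
compare-exchange retry loop is needed, and the caller picks the
ordering (e.g., memory_order_relaxed for a statistics flag, or
memory_order_acq_rel when the bit guards other data).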

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

PATCH v3:
 * Introduce __RTE_GEN_BIT_ATOMIC_*() 'qualifier' argument already in
   this patch (Jack Bond-Preston).
 * Refer to volatile bit op functions as variants instead of families
   (macro parameter naming).
 * Update release notes.

PATCH:
 * Add missing macro #undef for C++ version of atomic bit flip.

RFC v7:
 * Replace compare-exchange-based rte_bit_atomic_test_and_*() and
   flip() with implementations that use the previous value as returned
   by the atomic fetch function.
 * Reword documentation to match the non-atomic macro variants.
 * Remove pointer to <rte_stdatomic.h> for memory model documentation,
   since there is no documentation for that API.

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 * Add rte_bit_atomic_test_and_assign() (for consistency).
 * Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 * Use <rte_stdatomic.h> to support MSVC.
---
 doc/guides/rel_notes/release_24_11.rst |  17 +
 lib/eal/include/rte_bitops.h           | 415 +++++++++++++++++++++++++
 2 files changed, 432 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..3111b1e4c0 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -56,6 +56,23 @@ New Features
      =======================================================
 
 
+* **Extended bit operations API.**
+
+  The support for bit-level operations on single 32- and 64-bit words
+  in <rte_bitops.h> has been extended with two families of
+  semantically well-defined functions.
+
+  rte_bit_[test|set|clear|assign|flip]() functions provide excellent
+  performance (by not restricting the compiler and CPU), but give
+  no guarantees with regard to memory ordering or atomicity.
+
+  rte_bit_atomic_*() provides atomic bit-level operations, including
+  the possibility of specifying memory ordering constraints.
+
+  The new public API elements are polymorphic, using _Generic-based
+  macros (for C) and function overloading (in C++ translation
+  units).
+
 Removed Items
 -------------
 
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 6915b945ba..3ad6795fd1 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -226,6 +227,204 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '1', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '0', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in the
+ * word pointed to by @c addr to the value indicated by @c value, with
+ * the memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Generic selection macro to atomically negate the value of the
+ * bit specified by @c nr in the word pointed to by @c addr
+ * (i.e., '0' becomes '1' and vice versa), with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Generic selection macro to atomically test and set bit specified by
+ * @c nr in the word pointed to by @c addr to '1', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Generic selection macro to atomically test and clear bit specified
+ * by @c nr in the word pointed to by @c addr to '0', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Generic selection macro to atomically test and assign bit specified
+ * by @c nr in the word pointed to by @c addr to the value specified
+ * by @c value, with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
 	__rte_experimental						\
 	static inline bool						\
@@ -299,6 +498,146 @@ extern "C" {
 __RTE_GEN_BIT_OPS_SIZE(32)
 __RTE_GEN_BIT_OPS_SIZE(64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test ## size(const qualifier uint ## size ## _t *addr, \
+						     unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr = \
+			(const qualifier RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## set ## size(qualifier uint ## size ## _t *addr, \
+					      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## clear ## size(qualifier uint ## size ## _t *addr,	\
+						unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## flip ## size(qualifier uint ## size ## _t *addr, \
+					       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## assign ## size(qualifier uint ## size ## _t *addr, \
+						unsigned int nr, bool value, \
+						int memory_order)	\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_ ## variant ## set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_ ## variant ## clear ## size(addr, nr, \
+								     memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_set ## size(qualifier uint ## size ## _t *addr, \
+						       unsigned int nr,	\
+						       int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_or_explicit(a_addr, mask,	\
+						    memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_clear ## size(qualifier uint ## size ## _t *addr, \
+							 unsigned int nr, \
+							 int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_and_explicit(a_addr, ~mask,	\
+						     memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_assign ## size(qualifier uint ## size ## _t *addr, \
+							  unsigned int nr, \
+							  bool value,	\
+							  int memory_order) \
+	{								\
+		if (value)						\
+			return __rte_bit_atomic_ ## variant ## test_and_set ## size(addr, nr, memory_order); \
+		else							\
+			return __rte_bit_atomic_ ## variant ## test_and_clear ## size(addr, nr, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_SET(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)
+
+#define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(,, size)
+
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -994,6 +1333,15 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_flip
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1037,12 +1385,79 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr, arg1_type arg1_name, \
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1


^ permalink raw reply	[flat|nested] 160+ messages in thread

* [PATCH v5 5/6] eal: add unit tests for atomic bit access functions
  2024-09-10  6:20                                           ` [PATCH v5 0/6] Improve EAL bit operations API Mattias Rönnblom
                                                               ` (3 preceding siblings ...)
  2024-09-10  6:20                                             ` [PATCH v5 4/6] eal: add atomic " Mattias Rönnblom
@ 2024-09-10  6:20                                             ` Mattias Rönnblom
  2024-09-10  6:20                                             ` [PATCH v5 6/6] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  6:20 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_atomic_*() family of
functions.
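
The idea behind the atomicity checks can be sketched as follows: two
threads repeatedly flip different bits of the same word, and the
test verifies that neither thread's updates were lost. This is a
simplified standalone sketch only. It assumes POSIX threads and C11
<stdatomic.h> instead of EAL lcores and <rte_stdatomic.h>, and the
names flip_worker() and run_parallel_flip() are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Even, so every bit ends up with the value it started with. */
#define FLIPS_PER_THREAD 100000

struct flipper {
	_Atomic uint64_t *word;
	unsigned int bit;
};

static void *
flip_worker(void *arg)
{
	struct flipper *f = arg;
	uint64_t mask = UINT64_C(1) << f->bit;
	int i;

	for (i = 0; i < FLIPS_PER_THREAD; i++)
		atomic_fetch_xor_explicit(f->word, mask,
					  memory_order_relaxed);

	return NULL;
}

/* Returns the final word; equals 'initial' if no update was lost. */
static uint64_t
run_parallel_flip(uint64_t initial)
{
	_Atomic uint64_t word = initial;
	struct flipper a = { &word, 0 };
	struct flipper b = { &word, 63 };
	pthread_t ta, tb;
	int rc;

	rc = pthread_create(&ta, NULL, flip_worker, &a);
	assert(rc == 0);
	rc = pthread_create(&tb, NULL, flip_worker, &b);
	assert(rc == 0);

	pthread_join(ta, NULL);
	pthread_join(tb, NULL);

	return atomic_load(&word);
}
```

With a non-atomic load/modify/store in place of the atomic XOR, one
thread could overwrite the other's update and the final word could
differ from the initial one.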

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.
---
 app/test/test_bitops.c | 313 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 312 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 322f58c066..b80216a0a1 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -61,6 +64,304 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign) 			\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -177,6 +478,16 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1


* [PATCH v5 6/6] eal: extend bitops to handle volatile pointers
  2024-09-10  6:20                                           ` [PATCH v5 0/6] Improve EAL bit operations API Mattias Rönnblom
                                                               ` (4 preceding siblings ...)
  2024-09-10  6:20                                             ` [PATCH v5 5/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
@ 2024-09-10  6:20                                             ` Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  6:20 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Have rte_bit_[test|set|clear|assign|flip]() and rte_bit_atomic_*()
handle volatile-marked pointers.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

PATCH v3:
 * Updated to reflect removed 'fun' parameter in __RTE_GEN_BIT_*()
   (Jack Bond-Preston).

PATCH v2:
 * Actually run the test_bit_atomic_v_access*() test functions.
---
 app/test/test_bitops.c       |  32 +++-
 lib/eal/include/rte_bitops.h | 301 +++++++++++++++++++++++------------
 2 files changed, 222 insertions(+), 111 deletions(-)
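
As background for the _Generic changes in this patch, the following standalone sketch shows how a type-generic macro can pick a separate implementation for volatile-qualified pointers. It is an illustration only, restricted to 32-bit words, and every demo_* name is hypothetical rather than taken from <rte_bitops.h>. The demo_is_volatile_ptr() macro exists purely so the chosen branch can be observed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Non-volatile variants: the compiler may merge or elide accesses. */
static inline bool
demo_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr >> nr) & 1;
}

static inline void
demo_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

/* Volatile variants: each load and store must actually be performed. */
static inline bool
demo_bit_v_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return (*addr >> nr) & 1;
}

static inline void
demo_bit_v_set32(volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr = *addr | (UINT32_C(1) << nr);
}

/* Select the implementation from the pointer's qualifiers. */
#define demo_bit_test(addr, nr)					\
	_Generic((addr),						\
		 uint32_t *: demo_bit_test32,				\
		 const uint32_t *: demo_bit_test32,			\
		 volatile uint32_t *: demo_bit_v_test32,		\
		 const volatile uint32_t *: demo_bit_v_test32)(addr, nr)

#define demo_bit_set(addr, nr)						\
	_Generic((addr),						\
		 uint32_t *: demo_bit_set32,				\
		 volatile uint32_t *: demo_bit_v_set32)(addr, nr)

/* Reports which branch a given pointer type selects. */
#define demo_is_volatile_ptr(addr)					\
	_Generic((addr),						\
		 uint32_t *: false,					\
		 const uint32_t *: false,				\
		 volatile uint32_t *: true,				\
		 const volatile uint32_t *: true)
```

With a `volatile uint32_t reg`, demo_bit_set(&reg, 5) resolves to demo_bit_v_set32(), while the same call on a plain uint32_t resolves to demo_bit_set32(). Without the volatile associations, passing a volatile pointer would fail to compile because no _Generic branch matches.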

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index b80216a0a1..10e87f6776 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -14,13 +14,13 @@
 #include "test.h"
 
 #define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
-			    flip_fun, test_fun, size)			\
+			    flip_fun, test_fun, size, mod)		\
 	static int							\
 	test_name(void)							\
 	{								\
 		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
 		unsigned int bit_nr;					\
-		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+		mod uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
 									\
 		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
 			bool reference_bit = (reference >> bit_nr) & 1;	\
@@ -41,7 +41,7 @@
 				    "Bit %d had unflipped value", bit_nr); \
 			flip_fun(&word, bit_nr);			\
 									\
-			const uint ## size ## _t *const_ptr = &word;	\
+			const mod uint ## size ## _t *const_ptr = &word; \
 			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
 				    reference_bit,			\
 				    "Bit %d had unexpected value", bit_nr); \
@@ -59,10 +59,16 @@
 	}
 
 GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64, volatile)
 
 #define bit_atomic_set(addr, nr)				\
 	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
@@ -81,11 +87,19 @@ GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 32)
+		    bit_atomic_flip, bit_atomic_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 64)
+		    bit_atomic_flip, bit_atomic_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64, volatile)
 
 #define PARALLEL_TEST_RUNTIME 0.25
 
@@ -480,8 +494,12 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_v_access32),
+		TEST_CASE(test_bit_v_access64),
 		TEST_CASE(test_bit_atomic_access32),
 		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_v_access32),
+		TEST_CASE(test_bit_atomic_v_access64),
 		TEST_CASE(test_bit_atomic_parallel_assign32),
 		TEST_CASE(test_bit_atomic_parallel_assign64),
 		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3ad6795fd1..d7a07c4099 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -127,12 +127,16 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_test(addr, nr)					\
-	_Generic((addr),					\
-		uint32_t *: __rte_bit_test32,			\
-		const uint32_t *: __rte_bit_test32,		\
-		uint64_t *: __rte_bit_test64,			\
-		const uint64_t *: __rte_bit_test64)(addr, nr)
+#define rte_bit_test(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_test32,				\
+		 const uint32_t *: __rte_bit_test32,			\
+		 volatile uint32_t *: __rte_bit_v_test32,		\
+		 const volatile uint32_t *: __rte_bit_v_test32,		\
+		 uint64_t *: __rte_bit_test64,				\
+		 const uint64_t *: __rte_bit_test64,			\
+		 volatile uint64_t *: __rte_bit_v_test64,		\
+		 const volatile uint64_t *: __rte_bit_v_test64)(addr, nr)
 
 /**
  * @warning
@@ -152,10 +156,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_set(addr, nr)				\
-	_Generic((addr),				\
-		 uint32_t *: __rte_bit_set32,		\
-		 uint64_t *: __rte_bit_set64)(addr, nr)
+#define rte_bit_set(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_set32,				\
+		 volatile uint32_t *: __rte_bit_v_set32,		\
+		 uint64_t *: __rte_bit_set64,				\
+		 volatile uint64_t *: __rte_bit_v_set64)(addr, nr)
 
 /**
  * @warning
@@ -175,10 +181,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_clear(addr, nr)					\
-	_Generic((addr),					\
-		 uint32_t *: __rte_bit_clear32,			\
-		 uint64_t *: __rte_bit_clear64)(addr, nr)
+#define rte_bit_clear(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_clear32,				\
+		 volatile uint32_t *: __rte_bit_v_clear32,		\
+		 uint64_t *: __rte_bit_clear64,				\
+		 volatile uint64_t *: __rte_bit_v_clear64)(addr, nr)
 
 /**
  * @warning
@@ -202,7 +210,9 @@ extern "C" {
 #define rte_bit_assign(addr, nr, value)					\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_assign32,			\
-		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+		 volatile uint32_t *: __rte_bit_v_assign32,		\
+		 uint64_t *: __rte_bit_assign64,			\
+		 volatile uint64_t *: __rte_bit_v_assign64)(addr, nr, value)
 
 /**
  * @warning
@@ -225,7 +235,9 @@ extern "C" {
 #define rte_bit_flip(addr, nr)						\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_flip32,				\
-		 uint64_t *: __rte_bit_flip64)(addr, nr)
+		 volatile uint32_t *: __rte_bit_v_flip32,		\
+		 uint64_t *: __rte_bit_flip64,				\
+		 volatile uint64_t *: __rte_bit_v_flip64)(addr, nr)
 
 /**
  * @warning
@@ -250,9 +262,13 @@ extern "C" {
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test32,			\
 		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 volatile uint32_t *: __rte_bit_atomic_v_test32,	\
+		 const volatile uint32_t *: __rte_bit_atomic_v_test32,	\
 		 uint64_t *: __rte_bit_atomic_test64,			\
-		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
-							    memory_order)
+		 const uint64_t *: __rte_bit_atomic_test64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test64,	\
+		 const volatile uint64_t *: __rte_bit_atomic_v_test64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -274,7 +290,10 @@ extern "C" {
 #define rte_bit_atomic_set(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_set32,			\
-		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_set32,		\
+		 uint64_t *: __rte_bit_atomic_set64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_set64)(addr, nr, \
+								memory_order)
 
 /**
  * @warning
@@ -296,7 +315,10 @@ extern "C" {
 #define rte_bit_atomic_clear(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_clear32,			\
-		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_clear32,	\
+		 uint64_t *: __rte_bit_atomic_clear64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_clear64)(addr, nr, \
+								  memory_order)
 
 /**
  * @warning
@@ -320,8 +342,11 @@ extern "C" {
 #define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_assign32,			\
-		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
-							memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_assign32,	\
+		 uint64_t *: __rte_bit_atomic_assign64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_assign64)(addr, nr, \
+								   value, \
+								   memory_order)
 
 /**
  * @warning
@@ -344,7 +369,10 @@ extern "C" {
 #define rte_bit_atomic_flip(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_flip32,			\
-		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_flip32,	\
+		 uint64_t *: __rte_bit_atomic_flip64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_flip64)(addr, nr, \
+								 memory_order)
 
 /**
  * @warning
@@ -368,8 +396,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
-							      memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_set32, \
+		 uint64_t *: __rte_bit_atomic_test_and_set64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_set64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -393,8 +423,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
-								memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_clear32, \
+		 uint64_t *: __rte_bit_atomic_test_and_clear64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_clear64) \
+						       (addr, nr, memory_order)
 
 /**
  * @warning
@@ -421,9 +453,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
-		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
-								 value, \
-								 memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_assign32, \
+		 uint64_t *: __rte_bit_atomic_test_and_assign64,	\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_assign64) \
+						(addr, nr, value, memory_order)
 
 #define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
 	__rte_experimental						\
@@ -493,7 +526,8 @@ extern "C" {
 	__RTE_GEN_BIT_FLIP(v, qualifier, size)
 
 #define __RTE_GEN_BIT_OPS_SIZE(size) \
-	__RTE_GEN_BIT_OPS(,, size)
+	__RTE_GEN_BIT_OPS(,, size) \
+	__RTE_GEN_BIT_OPS(v_, volatile, size)
 
 __RTE_GEN_BIT_OPS_SIZE(32)
 __RTE_GEN_BIT_OPS_SIZE(64)
@@ -633,7 +667,8 @@ __RTE_GEN_BIT_OPS_SIZE(64)
 	__RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)
 
 #define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
-	__RTE_GEN_BIT_ATOMIC_OPS(,, size)
+	__RTE_GEN_BIT_ATOMIC_OPS(,, size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(v_, volatile, size)
 
 __RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
 __RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
@@ -1342,120 +1377,178 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_atomic_test_and_clear
 #undef rte_bit_atomic_test_and_assign
 
-#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+#define __RTE_BIT_OVERLOAD_V_2(family, v, fun, c, size, arg1_type, arg1_name) \
 	static inline void						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
-			arg1_type arg1_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+#define __RTE_BIT_OVERLOAD_SZ_2(family, fun, c, size, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_V_2(family,, fun, c, size, arg1_type,	\
+			       arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2(family, v_, fun, c volatile, size, \
+			       arg1_type, arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name)				\
+#define __RTE_BIT_OVERLOAD_2(family, fun, c, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_V_2R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
 			arg1_type arg1_name)				\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family, v_, fun, c volatile,		\
+				size, ret_type, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_2R(family, fun, c, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 32, ret_type, arg1_type, \
 				 arg1_name)				\
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 64, ret_type, arg1_type, \
 				 arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name)			\
+#define __RTE_BIT_OVERLOAD_V_3(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3(family, fun, c, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family,, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family, v_, fun, c volatile, size, arg1_type, \
+			       arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_3(family, fun, c, arg1_type, arg1_name, arg2_type, \
 			     arg2_name)					\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 32, arg1_type, arg1_name, \
 				arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)	\
+#define __RTE_BIT_OVERLOAD_V_3R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name)	\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name)	\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)
+	__RTE_BIT_OVERLOAD_V_3R(family,, fun, c, size, ret_type, \
+				arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_V_3R(family, v_, fun, c volatile, size, \
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name) \
+#define __RTE_BIT_OVERLOAD_3R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 64, ret_type, \
+				 arg1_type, arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_V_4(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name, arg3_type,	arg3_name) \
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
-					  arg3_name);		      \
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name,	\
+							 arg3_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
-			     arg2_name, arg3_type, arg3_name)		\
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+#define __RTE_BIT_OVERLOAD_SZ_4(family, fun, c, size, arg1_type, arg1_name, \
 				arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name)
-
-#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family,, fun, c, size, arg1_type,	\
+			       arg1_name, arg2_type, arg2_name, arg3_type, \
+			       arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family, v_, fun, c volatile, size,	\
+			       arg1_type, arg1_name, arg2_type, arg2_name, \
+			       arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4(family, fun, c, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 32, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 64, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)
+
+#define __RTE_BIT_OVERLOAD_V_4R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
-						 arg3_name);		\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name, \
+								arg3_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name, arg3_type, \
 				 arg3_name)				\
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)
-
-__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
-__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
-
-__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+	__RTE_BIT_OVERLOAD_V_4R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4R(family, v_, fun, c volatile, size,	\
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)			\
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 64, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)
+
+__RTE_BIT_OVERLOAD_2R(, test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(, assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(, flip,, unsigned int, nr)
+
+__RTE_BIT_OVERLOAD_3R(atomic_, test, const, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+__RTE_BIT_OVERLOAD_3(atomic_, set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_, clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_, assign,, unsigned int, nr, bool, value,
 		     int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3(atomic_, flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_set,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_clear,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_4R(atomic_, test_and_assign,, bool, unsigned int, nr,
 		      bool, value, int, memory_order)
 
 #endif
-- 
2.34.1



* [PATCH v6 0/6] Improve EAL bit operations API
  2024-09-10  6:20                                             ` [PATCH v5 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
@ 2024-09-10  8:31                                               ` Mattias Rönnblom
  2024-09-10  8:31                                                 ` [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
                                                                   ` (5 more replies)
  0 siblings, 6 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  8:31 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

This patch set represents an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

All new functionality is exposed to the user as generic selection
macros, delegating the actual work to private (__-marked) static
inline functions. Public functions (e.g., rte_bit_set32()) would just
be bloating the API. Such generic selection macros will here be
referred to as "functions", although technically they are not.

The legacy <rte_bitops.h> rte_bit_relaxed_*() functions are replaced
with two new families:

rte_bit_[test|set|clear|assign|flip]() which provide no memory
ordering or atomicity guarantees, but do provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomic (e.g., LOCK-prefixed instructions on x86)
may be significant. rte_bit_[test|set|clear|assign|flip]() may be
used with volatile word pointers, in which case they guarantee
that the program-level accesses actually occur.
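
As a rough illustration of the difference between the plain and the
volatile case, consider the sketch below. The sketch_*() names are
hypothetical stand-ins written for this example; they are not the
functions added by the patch set.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for illustration; not the DPDK implementation. */

/* Plain variant: the compiler may merge or elide these accesses. */
static inline void
sketch_bit_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

static inline bool
sketch_bit_test32(const uint32_t *addr, unsigned int nr)
{
	return (*addr & (UINT32_C(1) << nr)) != 0;
}

/* Volatile variant: each call performs the load and the store. */
static inline void
sketch_bit_v_set32(volatile uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}
```

Neither variant is atomic: both are a separate load, OR, and store.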

rte_bit_atomic_*() which provide atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.
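
A minimal sketch of what mixing the two kinds of access on one word
may look like, assuming the GCC/Clang __atomic builtins are available.
The sketch_*() helpers are hypothetical and only mimic the argument
order (address, bit number, memory order) used in this series.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper; relies on the GCC/Clang __atomic builtins. */
static inline bool
sketch_bit_atomic_test_and_set32(uint32_t *addr, unsigned int nr,
				 int memory_order)
{
	uint32_t mask = UINT32_C(1) << nr;
	uint32_t prev = __atomic_fetch_or(addr, mask, memory_order);

	return (prev & mask) != 0;
}

/* Non-atomic access to the same word, e.g. during single-threaded init. */
static inline void
sketch_bit_clear32(uint32_t *addr, unsigned int nr)
{
	*addr &= ~(UINT32_C(1) << nr);
}
```

Both helpers take a plain uint32_t pointer, so no cast is needed to
switch between atomic and non-atomic use of the same word.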

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the bitset API (proposed in a different RFC
patchset), and potentially in other APIs depending on RTE bitops for
atomic bit-level ops. Either one needs two bitset variants, one
_Atomic bitset and one non-atomic one, or the bitset code needs to
cast the non-_Atomic pointer to an _Atomic one. Having a separate
_Atomic bitset would be bloat and would also prevent the user from
doing atomic operations against a bit set in some situations, while
in other situations (e.g., at times when MT safety is not a concern)
operating on the same objects in a non-atomic manner.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions rte_bit_*_assign() are added, to assign a
particular boolean value to a particular bit.

All new functions have properly documented semantics.

All new functions operate on both 32- and 64-bit words, with type
checking.

_Generic allows the user code to be a little more compact. Having a
type-generic atomic test/set/clear/assign bit API also seems
consistent with the "core" (word-size) atomics API, which is generic
(both GCC builtins and <rte_stdatomic.h> are).

The _Generic versions avoid having explicit unsigned long versions of
all functions. If you have an unsigned long, it's safe to use the
generic version (e.g., rte_bit_set()) and _Generic will pick the right
function, provided long is either 32 or 64 bits on your platform
(which it is on all DPDK-supported ABIs).
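
The dispatch described above can be sketched roughly as follows. This
is an illustration only: sketch_bit_set() and the two helpers are
hypothetical names, and the actual macros in <rte_bitops.h> may be
structured differently.

```c
#include <stdint.h>

/* Hypothetical per-size helpers the macro dispatches to. */
static inline void
sketch_set32(uint32_t *addr, unsigned int nr)
{
	*addr |= UINT32_C(1) << nr;
}

static inline void
sketch_set64(uint64_t *addr, unsigned int nr)
{
	*addr |= UINT64_C(1) << nr;
}

/*
 * The controlling expression of _Generic is not evaluated; only its type
 * is inspected, so addr and nr are each evaluated exactly once.
 */
#define sketch_bit_set(addr, nr)			\
	_Generic((addr),				\
		 uint32_t *: sketch_set32,		\
		 uint64_t *: sketch_set64)(addr, nr)
```

On an LP64 Linux target, where uint64_t is typically a typedef for
unsigned long, an unsigned long pointer selects the 64-bit branch. A
pointer to any type not listed is rejected at compile time.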

The generic rte_bit_set() is a macro, and not a function, but
nevertheless has been given a lower-case name. That's how C11 does it
(for atomics, and other _Generic), and <rte_stdatomic.h>. Its address
can't be taken, but it does not evaluate its parameters more than
once.

C++ doesn't support generic selection. In C++ translation units the
_Generic macros are replaced with overloaded functions, implemented by
means of a huge, complicated C macro mess.

Mattias Rönnblom (6):
  dpdk: do not force C linkage on include file dependencies
  eal: extend bit manipulation functionality
  eal: add unit tests for bit operations
  eal: add atomic bit operations
  eal: add unit tests for atomic bit access functions
  eal: extend bitops to handle volatile pointers

 app/test/packet_burst_generator.h             |   8 +-
 app/test/test_bitops.c                        | 416 +++++++++-
 app/test/virtual_pmd.h                        |   4 +-
 doc/guides/rel_notes/release_24_11.rst        |  17 +
 drivers/bus/auxiliary/bus_auxiliary_driver.h  |   8 +-
 drivers/bus/cdx/bus_cdx_driver.h              |   8 +-
 drivers/bus/dpaa/include/fsl_qman.h           |   8 +-
 drivers/bus/fslmc/bus_fslmc_driver.h          |   8 +-
 drivers/bus/pci/bus_pci_driver.h              |   8 +-
 drivers/bus/pci/rte_bus_pci.h                 |   8 +-
 drivers/bus/platform/bus_platform_driver.h    |   8 +-
 drivers/bus/vdev/bus_vdev_driver.h            |   8 +-
 drivers/bus/vmbus/bus_vmbus_driver.h          |   8 +-
 drivers/bus/vmbus/rte_bus_vmbus.h             |   8 +-
 drivers/dma/cnxk/cnxk_dma_event_dp.h          |   8 +-
 drivers/dma/ioat/ioat_hw_defs.h               |   4 +-
 drivers/event/dlb2/rte_pmd_dlb2.h             |   8 +-
 drivers/mempool/dpaa2/rte_dpaa2_mempool.h     |   6 +-
 drivers/net/avp/rte_avp_fifo.h                |   8 +-
 drivers/net/bonding/rte_eth_bond.h            |   4 +-
 drivers/net/i40e/rte_pmd_i40e.h               |   8 +-
 drivers/net/mlx5/mlx5_trace.h                 |   8 +-
 drivers/net/ring/rte_eth_ring.h               |   4 +-
 drivers/net/vhost/rte_eth_vhost.h             |   8 +-
 drivers/raw/ifpga/afu_pmd_core.h              |   8 +-
 drivers/raw/ifpga/afu_pmd_he_hssi.h           |   6 +-
 drivers/raw/ifpga/afu_pmd_he_lpbk.h           |   6 +-
 drivers/raw/ifpga/afu_pmd_he_mem.h            |   6 +-
 drivers/raw/ifpga/afu_pmd_n3000.h             |   6 +-
 drivers/raw/ifpga/rte_pmd_afu.h               |   4 +-
 drivers/raw/ifpga/rte_pmd_ifpga.h             |   4 +-
 examples/ethtool/lib/rte_ethtool.h            |   8 +-
 examples/qos_sched/main.h                     |   4 +-
 examples/vm_power_manager/channel_manager.h   |   8 +-
 lib/acl/rte_acl_osdep.h                       |   8 +-
 lib/bbdev/rte_bbdev.h                         |   8 +-
 lib/bbdev/rte_bbdev_op.h                      |   8 +-
 lib/bbdev/rte_bbdev_pmd.h                     |   8 +-
 lib/bpf/bpf_def.h                             |   8 +-
 lib/compressdev/rte_comp.h                    |   4 +-
 lib/compressdev/rte_compressdev.h             |   6 +-
 lib/compressdev/rte_compressdev_internal.h    |   8 +-
 lib/compressdev/rte_compressdev_pmd.h         |   8 +-
 lib/cryptodev/cryptodev_pmd.h                 |   8 +-
 lib/cryptodev/cryptodev_trace.h               |   8 +-
 lib/cryptodev/rte_crypto.h                    |   8 +-
 lib/cryptodev/rte_crypto_asym.h               |   8 +-
 lib/cryptodev/rte_crypto_sym.h                |   8 +-
 lib/cryptodev/rte_cryptodev.h                 |   8 +-
 lib/cryptodev/rte_cryptodev_trace_fp.h        |   4 +-
 lib/dispatcher/rte_dispatcher.h               |   8 +-
 lib/dmadev/rte_dmadev.h                       |   8 +
 lib/eal/arm/include/rte_atomic_32.h           |   4 +-
 lib/eal/arm/include/rte_atomic_64.h           |   8 +-
 lib/eal/arm/include/rte_byteorder.h           |   8 +-
 lib/eal/arm/include/rte_cpuflags_32.h         |   8 +-
 lib/eal/arm/include/rte_cpuflags_64.h         |   8 +-
 lib/eal/arm/include/rte_cycles_32.h           |   4 +-
 lib/eal/arm/include/rte_cycles_64.h           |   4 +-
 lib/eal/arm/include/rte_io.h                  |   8 +-
 lib/eal/arm/include/rte_io_64.h               |   8 +-
 lib/eal/arm/include/rte_memcpy_32.h           |   8 +-
 lib/eal/arm/include/rte_memcpy_64.h           |   8 +-
 lib/eal/arm/include/rte_pause.h               |   8 +-
 lib/eal/arm/include/rte_pause_32.h            |   6 +-
 lib/eal/arm/include/rte_pause_64.h            |   8 +-
 lib/eal/arm/include/rte_power_intrinsics.h    |   8 +-
 lib/eal/arm/include/rte_prefetch_32.h         |   8 +-
 lib/eal/arm/include/rte_prefetch_64.h         |   8 +-
 lib/eal/arm/include/rte_rwlock.h              |   4 +-
 lib/eal/arm/include/rte_spinlock.h            |   6 +-
 lib/eal/freebsd/include/rte_os.h              |   8 +-
 lib/eal/include/bus_driver.h                  |   8 +-
 lib/eal/include/dev_driver.h                  |   6 +-
 lib/eal/include/eal_trace_internal.h          |   8 +-
 lib/eal/include/generic/rte_atomic.h          |   8 +
 lib/eal/include/generic/rte_byteorder.h       |   8 +
 lib/eal/include/generic/rte_cpuflags.h        |   8 +
 lib/eal/include/generic/rte_cycles.h          |   8 +
 lib/eal/include/generic/rte_io.h              |   8 +
 lib/eal/include/generic/rte_memcpy.h          |   8 +
 lib/eal/include/generic/rte_pause.h           |   8 +
 .../include/generic/rte_power_intrinsics.h    |   8 +
 lib/eal/include/generic/rte_prefetch.h        |   8 +
 lib/eal/include/generic/rte_rwlock.h          |   8 +-
 lib/eal/include/generic/rte_spinlock.h        |   8 +
 lib/eal/include/generic/rte_vect.h            |   8 +
 lib/eal/include/rte_alarm.h                   |   4 +-
 lib/eal/include/rte_bitmap.h                  |   8 +-
 lib/eal/include/rte_bitops.h                  | 768 +++++++++++++++++-
 lib/eal/include/rte_bus.h                     |   8 +-
 lib/eal/include/rte_class.h                   |   4 +-
 lib/eal/include/rte_common.h                  |   8 +-
 lib/eal/include/rte_dev.h                     |   8 +-
 lib/eal/include/rte_devargs.h                 |   8 +-
 lib/eal/include/rte_eal_trace.h               |   4 +-
 lib/eal/include/rte_errno.h                   |   4 +-
 lib/eal/include/rte_fbarray.h                 |   8 +-
 lib/eal/include/rte_keepalive.h               |   6 +-
 lib/eal/include/rte_mcslock.h                 |   8 +-
 lib/eal/include/rte_memory.h                  |   8 +-
 lib/eal/include/rte_pci_dev_features.h        |   4 +-
 lib/eal/include/rte_pflock.h                  |   8 +-
 lib/eal/include/rte_random.h                  |   4 +-
 lib/eal/include/rte_seqcount.h                |   8 +-
 lib/eal/include/rte_seqlock.h                 |   8 +-
 lib/eal/include/rte_service.h                 |   8 +-
 lib/eal/include/rte_service_component.h       |   4 +-
 lib/eal/include/rte_stdatomic.h               |   5 +-
 lib/eal/include/rte_string_fns.h              |  17 +-
 lib/eal/include/rte_tailq.h                   |   6 +-
 lib/eal/include/rte_ticketlock.h              |   8 +-
 lib/eal/include/rte_time.h                    |   6 +-
 lib/eal/include/rte_trace.h                   |   8 +-
 lib/eal/include/rte_trace_point.h             |   8 +-
 lib/eal/include/rte_trace_point_register.h    |   8 +-
 lib/eal/include/rte_uuid.h                    |   8 +-
 lib/eal/include/rte_version.h                 |   6 +-
 lib/eal/include/rte_vfio.h                    |   8 +-
 lib/eal/linux/include/rte_os.h                |   8 +-
 lib/eal/loongarch/include/rte_atomic.h        |   6 +-
 lib/eal/loongarch/include/rte_byteorder.h     |   4 +-
 lib/eal/loongarch/include/rte_cpuflags.h      |   8 +-
 lib/eal/loongarch/include/rte_cycles.h        |   4 +-
 lib/eal/loongarch/include/rte_io.h            |   4 +-
 lib/eal/loongarch/include/rte_memcpy.h        |   4 +-
 lib/eal/loongarch/include/rte_pause.h         |   8 +-
 .../loongarch/include/rte_power_intrinsics.h  |   8 +-
 lib/eal/loongarch/include/rte_prefetch.h      |   8 +-
 lib/eal/loongarch/include/rte_rwlock.h        |   4 +-
 lib/eal/loongarch/include/rte_spinlock.h      |   6 +-
 lib/eal/ppc/include/rte_atomic.h              |   6 +-
 lib/eal/ppc/include/rte_byteorder.h           |   6 +-
 lib/eal/ppc/include/rte_cpuflags.h            |   8 +-
 lib/eal/ppc/include/rte_cycles.h              |   8 +-
 lib/eal/ppc/include/rte_io.h                  |   4 +-
 lib/eal/ppc/include/rte_memcpy.h              |   4 +-
 lib/eal/ppc/include/rte_pause.h               |   8 +-
 lib/eal/ppc/include/rte_power_intrinsics.h    |   8 +-
 lib/eal/ppc/include/rte_prefetch.h            |   8 +-
 lib/eal/ppc/include/rte_rwlock.h              |   4 +-
 lib/eal/ppc/include/rte_spinlock.h            |   8 +-
 lib/eal/riscv/include/rte_atomic.h            |   8 +-
 lib/eal/riscv/include/rte_byteorder.h         |   8 +-
 lib/eal/riscv/include/rte_cpuflags.h          |   8 +-
 lib/eal/riscv/include/rte_cycles.h            |   4 +-
 lib/eal/riscv/include/rte_io.h                |   4 +-
 lib/eal/riscv/include/rte_memcpy.h            |   4 +-
 lib/eal/riscv/include/rte_pause.h             |   8 +-
 lib/eal/riscv/include/rte_power_intrinsics.h  |   8 +-
 lib/eal/riscv/include/rte_prefetch.h          |   8 +-
 lib/eal/riscv/include/rte_rwlock.h            |   4 +-
 lib/eal/riscv/include/rte_spinlock.h          |   6 +-
 lib/eal/windows/include/pthread.h             |   6 +-
 lib/eal/windows/include/regex.h               |   8 +-
 lib/eal/windows/include/rte_windows.h         |   8 +-
 lib/eal/x86/include/rte_atomic.h              |  25 +-
 lib/eal/x86/include/rte_byteorder.h           |  16 +-
 lib/eal/x86/include/rte_cpuflags.h            |   8 +-
 lib/eal/x86/include/rte_cycles.h              |   8 +-
 lib/eal/x86/include/rte_io.h                  |   8 +-
 lib/eal/x86/include/rte_pause.h               |   7 +-
 lib/eal/x86/include/rte_power_intrinsics.h    |   8 +-
 lib/eal/x86/include/rte_prefetch.h            |   8 +-
 lib/eal/x86/include/rte_rwlock.h              |   6 +-
 lib/eal/x86/include/rte_spinlock.h            |   9 +-
 lib/ethdev/ethdev_driver.h                    |   8 +-
 lib/ethdev/ethdev_pci.h                       |   8 +-
 lib/ethdev/ethdev_trace.h                     |   8 +-
 lib/ethdev/ethdev_vdev.h                      |   8 +-
 lib/ethdev/rte_cman.h                         |   4 +-
 lib/ethdev/rte_dev_info.h                     |   4 +-
 lib/ethdev/rte_ethdev.h                       |   8 +-
 lib/ethdev/rte_ethdev_trace_fp.h              |   4 +-
 lib/eventdev/event_timer_adapter_pmd.h        |   4 +-
 lib/eventdev/eventdev_pmd.h                   |   8 +-
 lib/eventdev/eventdev_pmd_pci.h               |   8 +-
 lib/eventdev/eventdev_pmd_vdev.h              |   8 +-
 lib/eventdev/eventdev_trace.h                 |   8 +-
 lib/eventdev/rte_event_crypto_adapter.h       |   8 +-
 lib/eventdev/rte_event_eth_rx_adapter.h       |   8 +-
 lib/eventdev/rte_event_eth_tx_adapter.h       |   8 +-
 lib/eventdev/rte_event_ring.h                 |   8 +-
 lib/eventdev/rte_event_timer_adapter.h        |   8 +-
 lib/eventdev/rte_eventdev.h                   |   8 +-
 lib/eventdev/rte_eventdev_trace_fp.h          |   4 +-
 lib/graph/rte_graph_model_mcore_dispatch.h    |   8 +-
 lib/graph/rte_graph_worker.h                  |   6 +-
 lib/gso/rte_gso.h                             |   6 +-
 lib/hash/rte_fbk_hash.h                       |   8 +-
 lib/hash/rte_hash_crc.h                       |   8 +-
 lib/hash/rte_jhash.h                          |   8 +-
 lib/hash/rte_thash.h                          |   8 +-
 lib/hash/rte_thash_gfni.h                     |   8 +-
 lib/ip_frag/rte_ip_frag.h                     |   8 +-
 lib/ipsec/rte_ipsec.h                         |   8 +-
 lib/log/rte_log.h                             |   8 +-
 lib/lpm/rte_lpm.h                             |   8 +-
 lib/member/rte_member.h                       |   8 +-
 lib/member/rte_member_sketch.h                |   6 +-
 lib/member/rte_member_sketch_avx512.h         |   8 +-
 lib/member/rte_member_x86.h                   |   4 +-
 lib/member/rte_xxh64_avx512.h                 |   6 +-
 lib/mempool/mempool_trace.h                   |   8 +-
 lib/mempool/rte_mempool_trace_fp.h            |   4 +-
 lib/meter/rte_meter.h                         |   8 +-
 lib/mldev/mldev_utils.h                       |   8 +-
 lib/mldev/rte_mldev_core.h                    |   8 +-
 lib/mldev/rte_mldev_pmd.h                     |   8 +-
 lib/net/rte_ether.h                           |   8 +-
 lib/net/rte_net.h                             |   8 +-
 lib/net/rte_sctp.h                            |   8 +-
 lib/node/rte_node_eth_api.h                   |   8 +-
 lib/node/rte_node_ip4_api.h                   |   8 +-
 lib/node/rte_node_ip6_api.h                   |   6 +-
 lib/node/rte_node_udp4_input_api.h            |   8 +-
 lib/pci/rte_pci.h                             |   8 +-
 lib/pdcp/rte_pdcp.h                           |   8 +-
 lib/pipeline/rte_pipeline.h                   |   8 +-
 lib/pipeline/rte_port_in_action.h             |   8 +-
 lib/pipeline/rte_swx_ctl.h                    |   8 +-
 lib/pipeline/rte_swx_extern.h                 |   8 +-
 lib/pipeline/rte_swx_ipsec.h                  |   8 +-
 lib/pipeline/rte_swx_pipeline.h               |   8 +-
 lib/pipeline/rte_swx_pipeline_spec.h          |   8 +-
 lib/pipeline/rte_table_action.h               |   8 +-
 lib/port/rte_port.h                           |   8 +-
 lib/port/rte_port_ethdev.h                    |   8 +-
 lib/port/rte_port_eventdev.h                  |   8 +-
 lib/port/rte_port_fd.h                        |   8 +-
 lib/port/rte_port_frag.h                      |   8 +-
 lib/port/rte_port_ras.h                       |   8 +-
 lib/port/rte_port_ring.h                      |   8 +-
 lib/port/rte_port_sched.h                     |   8 +-
 lib/port/rte_port_source_sink.h               |   8 +-
 lib/port/rte_port_sym_crypto.h                |   8 +-
 lib/port/rte_swx_port.h                       |   8 +-
 lib/port/rte_swx_port_ethdev.h                |   8 +-
 lib/port/rte_swx_port_fd.h                    |   8 +-
 lib/port/rte_swx_port_ring.h                  |   8 +-
 lib/port/rte_swx_port_source_sink.h           |   8 +-
 lib/rawdev/rte_rawdev.h                       |   6 +-
 lib/rawdev/rte_rawdev_pmd.h                   |   8 +-
 lib/rcu/rte_rcu_qsbr.h                        |   8 +-
 lib/regexdev/rte_regexdev.h                   |   8 +-
 lib/ring/rte_ring.h                           |   6 +-
 lib/ring/rte_ring_core.h                      |   8 +-
 lib/ring/rte_ring_elem.h                      |   8 +-
 lib/ring/rte_ring_hts.h                       |   4 +-
 lib/ring/rte_ring_peek.h                      |   4 +-
 lib/ring/rte_ring_peek_zc.h                   |   4 +-
 lib/ring/rte_ring_rts.h                       |   4 +-
 lib/sched/rte_approx.h                        |   8 +-
 lib/sched/rte_pie.h                           |   8 +-
 lib/sched/rte_red.h                           |   8 +-
 lib/sched/rte_sched.h                         |   8 +-
 lib/sched/rte_sched_common.h                  |   6 +-
 lib/security/rte_security.h                   |   8 +-
 lib/security/rte_security_driver.h            |   6 +-
 lib/stack/rte_stack.h                         |   8 +-
 lib/table/rte_lru.h                           |  12 +-
 lib/table/rte_lru_arm64.h                     |   8 +-
 lib/table/rte_lru_x86.h                       |   8 -
 lib/table/rte_swx_hash_func.h                 |   8 +-
 lib/table/rte_swx_keycmp.h                    |   8 +-
 lib/table/rte_swx_table.h                     |   8 +-
 lib/table/rte_swx_table_em.h                  |   8 +-
 lib/table/rte_swx_table_learner.h             |   8 +-
 lib/table/rte_swx_table_selector.h            |   8 +-
 lib/table/rte_swx_table_wm.h                  |   8 +-
 lib/table/rte_table.h                         |   8 +-
 lib/table/rte_table_acl.h                     |   8 +-
 lib/table/rte_table_array.h                   |   8 +-
 lib/table/rte_table_hash.h                    |   8 +-
 lib/table/rte_table_hash_cuckoo.h             |   8 +-
 lib/table/rte_table_hash_func.h               |  12 +-
 lib/table/rte_table_lpm.h                     |   8 +-
 lib/table/rte_table_lpm_ipv6.h                |   8 +-
 lib/table/rte_table_stub.h                    |   8 +-
 lib/telemetry/rte_telemetry.h                 |   8 +-
 lib/vhost/rte_vdpa.h                          |   8 +-
 lib/vhost/rte_vhost.h                         |   8 +-
 lib/vhost/rte_vhost_async.h                   |   8 +-
 lib/vhost/rte_vhost_crypto.h                  |   4 +-
 lib/vhost/vdpa_driver.h                       |   8 +-
 285 files changed, 2264 insertions(+), 998 deletions(-)

-- 
2.34.1



* [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-10  8:31                                               ` [PATCH v6 0/6] Improve EAL bit operations API Mattias Rönnblom
@ 2024-09-10  8:31                                                 ` Mattias Rönnblom
  2024-09-16 12:05                                                   ` David Marchand
  2024-09-16 12:13                                                   ` David Marchand
  2024-09-10  8:31                                                 ` [PATCH v6 2/6] eal: extend bit manipulation functionality Mattias Rönnblom
                                                                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  8:31 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Ensure that 'extern "C" { /../ }' does not cover files included from a
particular header file, and address minor issues resulting from this
change of order.

Dealing with C++ should be delegated to the individual include file
level, rather than being imposed by the user of that file. For
example, forcing C linkage prevents _Generic macros being replaced
with overloaded static inline functions in C++ translation units.
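
The intended ordering can be sketched with a hypothetical header
(sketch_header.h and sketch_bit_mask32() are made up for this example
and are not part of the patch):

```c
/* sketch_header.h: hypothetical header, for illustration only */
#ifndef SKETCH_HEADER_H
#define SKETCH_HEADER_H

/* Dependencies first, outside any extern "C" block, so each included
 * header decides for itself how to handle C++. */
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Only this header's own declarations get C linkage. */
static inline uint32_t
sketch_bit_mask32(unsigned int nr)
{
	return UINT32_C(1) << nr;
}

#ifdef __cplusplus
}
#endif

#endif /* SKETCH_HEADER_H */
```

Before this change, the #include lines would typically sit inside the
extern "C" block, forcing C linkage on everything they pull in.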

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>

--

PATCH v6:
 * Add missing extern "C" in rte_atomic.h, rte_cpuflags.h, rte_io.h,
   rte_vect.h.
 * Fix 32-bit x86 build issues in rte_atomic.h.

PATCH v5:
 * rte_dmadev.h was still including files under extern "C" { /../ }.
   (Chengwen Feng)
 * Fix rte_byteorder.h, broken on 32-bit x86.
---
 app/test/packet_burst_generator.h             |  8 +++---
 app/test/virtual_pmd.h                        |  4 +--
 drivers/bus/auxiliary/bus_auxiliary_driver.h  |  8 +++---
 drivers/bus/cdx/bus_cdx_driver.h              |  8 +++---
 drivers/bus/dpaa/include/fsl_qman.h           |  8 +++---
 drivers/bus/fslmc/bus_fslmc_driver.h          |  8 +++---
 drivers/bus/pci/bus_pci_driver.h              |  8 +++---
 drivers/bus/pci/rte_bus_pci.h                 |  8 +++---
 drivers/bus/platform/bus_platform_driver.h    |  8 +++---
 drivers/bus/vdev/bus_vdev_driver.h            |  8 +++---
 drivers/bus/vmbus/bus_vmbus_driver.h          |  8 +++---
 drivers/bus/vmbus/rte_bus_vmbus.h             |  8 +++---
 drivers/dma/cnxk/cnxk_dma_event_dp.h          |  8 +++---
 drivers/dma/ioat/ioat_hw_defs.h               |  4 +--
 drivers/event/dlb2/rte_pmd_dlb2.h             |  8 +++---
 drivers/mempool/dpaa2/rte_dpaa2_mempool.h     |  6 ++---
 drivers/net/avp/rte_avp_fifo.h                |  8 +++---
 drivers/net/bonding/rte_eth_bond.h            |  4 +--
 drivers/net/i40e/rte_pmd_i40e.h               |  8 +++---
 drivers/net/mlx5/mlx5_trace.h                 |  8 +++---
 drivers/net/ring/rte_eth_ring.h               |  4 +--
 drivers/net/vhost/rte_eth_vhost.h             |  8 +++---
 drivers/raw/ifpga/afu_pmd_core.h              |  8 +++---
 drivers/raw/ifpga/afu_pmd_he_hssi.h           |  6 ++---
 drivers/raw/ifpga/afu_pmd_he_lpbk.h           |  6 ++---
 drivers/raw/ifpga/afu_pmd_he_mem.h            |  6 ++---
 drivers/raw/ifpga/afu_pmd_n3000.h             |  6 ++---
 drivers/raw/ifpga/rte_pmd_afu.h               |  4 +--
 drivers/raw/ifpga/rte_pmd_ifpga.h             |  4 +--
 examples/ethtool/lib/rte_ethtool.h            |  8 +++---
 examples/qos_sched/main.h                     |  4 +--
 examples/vm_power_manager/channel_manager.h   |  8 +++---
 lib/acl/rte_acl_osdep.h                       |  8 +++---
 lib/bbdev/rte_bbdev.h                         |  8 +++---
 lib/bbdev/rte_bbdev_op.h                      |  8 +++---
 lib/bbdev/rte_bbdev_pmd.h                     |  8 +++---
 lib/bpf/bpf_def.h                             |  8 +++---
 lib/compressdev/rte_comp.h                    |  4 +--
 lib/compressdev/rte_compressdev.h             |  6 ++---
 lib/compressdev/rte_compressdev_internal.h    |  8 +++---
 lib/compressdev/rte_compressdev_pmd.h         |  8 +++---
 lib/cryptodev/cryptodev_pmd.h                 |  8 +++---
 lib/cryptodev/cryptodev_trace.h               |  8 +++---
 lib/cryptodev/rte_crypto.h                    |  8 +++---
 lib/cryptodev/rte_crypto_asym.h               |  8 +++---
 lib/cryptodev/rte_crypto_sym.h                |  8 +++---
 lib/cryptodev/rte_cryptodev.h                 |  8 +++---
 lib/cryptodev/rte_cryptodev_trace_fp.h        |  4 +--
 lib/dispatcher/rte_dispatcher.h               |  8 +++---
 lib/dmadev/rte_dmadev.h                       |  8 ++++++
 lib/eal/arm/include/rte_atomic_32.h           |  4 +--
 lib/eal/arm/include/rte_atomic_64.h           |  8 +++---
 lib/eal/arm/include/rte_byteorder.h           |  8 +++---
 lib/eal/arm/include/rte_cpuflags_32.h         |  8 +++---
 lib/eal/arm/include/rte_cpuflags_64.h         |  8 +++---
 lib/eal/arm/include/rte_cycles_32.h           |  4 +--
 lib/eal/arm/include/rte_cycles_64.h           |  4 +--
 lib/eal/arm/include/rte_io.h                  |  8 +++---
 lib/eal/arm/include/rte_io_64.h               |  8 +++---
 lib/eal/arm/include/rte_memcpy_32.h           |  8 +++---
 lib/eal/arm/include/rte_memcpy_64.h           |  8 +++---
 lib/eal/arm/include/rte_pause.h               |  8 +++---
 lib/eal/arm/include/rte_pause_32.h            |  6 ++---
 lib/eal/arm/include/rte_pause_64.h            |  8 +++---
 lib/eal/arm/include/rte_power_intrinsics.h    |  8 +++---
 lib/eal/arm/include/rte_prefetch_32.h         |  8 +++---
 lib/eal/arm/include/rte_prefetch_64.h         |  8 +++---
 lib/eal/arm/include/rte_rwlock.h              |  4 +--
 lib/eal/arm/include/rte_spinlock.h            |  6 ++---
 lib/eal/freebsd/include/rte_os.h              |  8 +++---
 lib/eal/include/bus_driver.h                  |  8 +++---
 lib/eal/include/dev_driver.h                  |  6 ++---
 lib/eal/include/eal_trace_internal.h          |  8 +++---
 lib/eal/include/generic/rte_atomic.h          |  8 ++++++
 lib/eal/include/generic/rte_byteorder.h       |  8 ++++++
 lib/eal/include/generic/rte_cpuflags.h        |  8 ++++++
 lib/eal/include/generic/rte_cycles.h          |  8 ++++++
 lib/eal/include/generic/rte_io.h              |  8 ++++++
 lib/eal/include/generic/rte_memcpy.h          |  8 ++++++
 lib/eal/include/generic/rte_pause.h           |  8 ++++++
 .../include/generic/rte_power_intrinsics.h    |  8 ++++++
 lib/eal/include/generic/rte_prefetch.h        |  8 ++++++
 lib/eal/include/generic/rte_rwlock.h          |  8 +++---
 lib/eal/include/generic/rte_spinlock.h        |  8 ++++++
 lib/eal/include/generic/rte_vect.h            |  8 ++++++
 lib/eal/include/rte_alarm.h                   |  4 +--
 lib/eal/include/rte_bitmap.h                  |  8 +++---
 lib/eal/include/rte_bus.h                     |  8 +++---
 lib/eal/include/rte_class.h                   |  4 +--
 lib/eal/include/rte_common.h                  |  8 +++---
 lib/eal/include/rte_dev.h                     |  8 +++---
 lib/eal/include/rte_devargs.h                 |  8 +++---
 lib/eal/include/rte_eal_trace.h               |  4 +--
 lib/eal/include/rte_errno.h                   |  4 +--
 lib/eal/include/rte_fbarray.h                 |  8 +++---
 lib/eal/include/rte_keepalive.h               |  6 ++---
 lib/eal/include/rte_mcslock.h                 |  8 +++---
 lib/eal/include/rte_memory.h                  |  8 +++---
 lib/eal/include/rte_pci_dev_features.h        |  4 +--
 lib/eal/include/rte_pflock.h                  |  8 +++---
 lib/eal/include/rte_random.h                  |  4 +--
 lib/eal/include/rte_seqcount.h                |  8 +++---
 lib/eal/include/rte_seqlock.h                 |  8 +++---
 lib/eal/include/rte_service.h                 |  8 +++---
 lib/eal/include/rte_service_component.h       |  4 +--
 lib/eal/include/rte_stdatomic.h               |  5 +---
 lib/eal/include/rte_string_fns.h              | 17 +++++++++----
 lib/eal/include/rte_tailq.h                   |  6 ++---
 lib/eal/include/rte_ticketlock.h              |  8 +++---
 lib/eal/include/rte_time.h                    |  6 ++---
 lib/eal/include/rte_trace.h                   |  8 +++---
 lib/eal/include/rte_trace_point.h             |  8 +++---
 lib/eal/include/rte_trace_point_register.h    |  8 +++---
 lib/eal/include/rte_uuid.h                    |  8 +++---
 lib/eal/include/rte_version.h                 |  6 ++---
 lib/eal/include/rte_vfio.h                    |  8 +++---
 lib/eal/linux/include/rte_os.h                |  8 +++---
 lib/eal/loongarch/include/rte_atomic.h        |  6 ++---
 lib/eal/loongarch/include/rte_byteorder.h     |  4 +--
 lib/eal/loongarch/include/rte_cpuflags.h      |  8 +++---
 lib/eal/loongarch/include/rte_cycles.h        |  4 +--
 lib/eal/loongarch/include/rte_io.h            |  4 +--
 lib/eal/loongarch/include/rte_memcpy.h        |  4 +--
 lib/eal/loongarch/include/rte_pause.h         |  8 +++---
 .../loongarch/include/rte_power_intrinsics.h  |  8 +++---
 lib/eal/loongarch/include/rte_prefetch.h      |  8 +++---
 lib/eal/loongarch/include/rte_rwlock.h        |  4 +--
 lib/eal/loongarch/include/rte_spinlock.h      |  6 ++---
 lib/eal/ppc/include/rte_atomic.h              |  6 ++---
 lib/eal/ppc/include/rte_byteorder.h           |  6 ++---
 lib/eal/ppc/include/rte_cpuflags.h            |  8 +++---
 lib/eal/ppc/include/rte_cycles.h              |  8 +++---
 lib/eal/ppc/include/rte_io.h                  |  4 +--
 lib/eal/ppc/include/rte_memcpy.h              |  4 +--
 lib/eal/ppc/include/rte_pause.h               |  8 +++---
 lib/eal/ppc/include/rte_power_intrinsics.h    |  8 +++---
 lib/eal/ppc/include/rte_prefetch.h            |  8 +++---
 lib/eal/ppc/include/rte_rwlock.h              |  4 +--
 lib/eal/ppc/include/rte_spinlock.h            |  8 +++---
 lib/eal/riscv/include/rte_atomic.h            |  8 +++---
 lib/eal/riscv/include/rte_byteorder.h         |  8 +++---
 lib/eal/riscv/include/rte_cpuflags.h          |  8 +++---
 lib/eal/riscv/include/rte_cycles.h            |  4 +--
 lib/eal/riscv/include/rte_io.h                |  4 +--
 lib/eal/riscv/include/rte_memcpy.h            |  4 +--
 lib/eal/riscv/include/rte_pause.h             |  8 +++---
 lib/eal/riscv/include/rte_power_intrinsics.h  |  8 +++---
 lib/eal/riscv/include/rte_prefetch.h          |  8 +++---
 lib/eal/riscv/include/rte_rwlock.h            |  4 +--
 lib/eal/riscv/include/rte_spinlock.h          |  6 ++---
 lib/eal/windows/include/pthread.h             |  6 ++---
 lib/eal/windows/include/regex.h               |  8 +++---
 lib/eal/windows/include/rte_windows.h         |  8 +++---
 lib/eal/x86/include/rte_atomic.h              | 25 +++++++++++++------
 lib/eal/x86/include/rte_byteorder.h           | 16 ++++++------
 lib/eal/x86/include/rte_cpuflags.h            |  8 +++---
 lib/eal/x86/include/rte_cycles.h              |  8 +++---
 lib/eal/x86/include/rte_io.h                  |  8 +++---
 lib/eal/x86/include/rte_pause.h               |  7 +++---
 lib/eal/x86/include/rte_power_intrinsics.h    |  8 +++---
 lib/eal/x86/include/rte_prefetch.h            |  8 +++---
 lib/eal/x86/include/rte_rwlock.h              |  6 ++---
 lib/eal/x86/include/rte_spinlock.h            |  9 +++----
 lib/ethdev/ethdev_driver.h                    |  8 +++---
 lib/ethdev/ethdev_pci.h                       |  8 +++---
 lib/ethdev/ethdev_trace.h                     |  8 +++---
 lib/ethdev/ethdev_vdev.h                      |  8 +++---
 lib/ethdev/rte_cman.h                         |  4 +--
 lib/ethdev/rte_dev_info.h                     |  4 +--
 lib/ethdev/rte_ethdev.h                       |  8 +++---
 lib/ethdev/rte_ethdev_trace_fp.h              |  4 +--
 lib/eventdev/event_timer_adapter_pmd.h        |  4 +--
 lib/eventdev/eventdev_pmd.h                   |  8 +++---
 lib/eventdev/eventdev_pmd_pci.h               |  8 +++---
 lib/eventdev/eventdev_pmd_vdev.h              |  8 +++---
 lib/eventdev/eventdev_trace.h                 |  8 +++---
 lib/eventdev/rte_event_crypto_adapter.h       |  8 +++---
 lib/eventdev/rte_event_eth_rx_adapter.h       |  8 +++---
 lib/eventdev/rte_event_eth_tx_adapter.h       |  8 +++---
 lib/eventdev/rte_event_ring.h                 |  8 +++---
 lib/eventdev/rte_event_timer_adapter.h        |  8 +++---
 lib/eventdev/rte_eventdev.h                   |  8 +++---
 lib/eventdev/rte_eventdev_trace_fp.h          |  4 +--
 lib/graph/rte_graph_model_mcore_dispatch.h    |  8 +++---
 lib/graph/rte_graph_worker.h                  |  6 ++---
 lib/gso/rte_gso.h                             |  6 ++---
 lib/hash/rte_fbk_hash.h                       |  8 +++---
 lib/hash/rte_hash_crc.h                       |  8 +++---
 lib/hash/rte_jhash.h                          |  8 +++---
 lib/hash/rte_thash.h                          |  8 +++---
 lib/hash/rte_thash_gfni.h                     |  8 +++---
 lib/ip_frag/rte_ip_frag.h                     |  8 +++---
 lib/ipsec/rte_ipsec.h                         |  8 +++---
 lib/log/rte_log.h                             |  8 +++---
 lib/lpm/rte_lpm.h                             |  8 +++---
 lib/member/rte_member.h                       |  8 +++---
 lib/member/rte_member_sketch.h                |  6 ++---
 lib/member/rte_member_sketch_avx512.h         |  8 +++---
 lib/member/rte_member_x86.h                   |  4 +--
 lib/member/rte_xxh64_avx512.h                 |  6 ++---
 lib/mempool/mempool_trace.h                   |  8 +++---
 lib/mempool/rte_mempool_trace_fp.h            |  4 +--
 lib/meter/rte_meter.h                         |  8 +++---
 lib/mldev/mldev_utils.h                       |  8 +++---
 lib/mldev/rte_mldev_core.h                    |  8 +++---
 lib/mldev/rte_mldev_pmd.h                     |  8 +++---
 lib/net/rte_ether.h                           |  8 +++---
 lib/net/rte_net.h                             |  8 +++---
 lib/net/rte_sctp.h                            |  8 +++---
 lib/node/rte_node_eth_api.h                   |  8 +++---
 lib/node/rte_node_ip4_api.h                   |  8 +++---
 lib/node/rte_node_ip6_api.h                   |  6 ++---
 lib/node/rte_node_udp4_input_api.h            |  8 +++---
 lib/pci/rte_pci.h                             |  8 +++---
 lib/pdcp/rte_pdcp.h                           |  8 +++---
 lib/pipeline/rte_pipeline.h                   |  8 +++---
 lib/pipeline/rte_port_in_action.h             |  8 +++---
 lib/pipeline/rte_swx_ctl.h                    |  8 +++---
 lib/pipeline/rte_swx_extern.h                 |  8 +++---
 lib/pipeline/rte_swx_ipsec.h                  |  8 +++---
 lib/pipeline/rte_swx_pipeline.h               |  8 +++---
 lib/pipeline/rte_swx_pipeline_spec.h          |  8 +++---
 lib/pipeline/rte_table_action.h               |  8 +++---
 lib/port/rte_port.h                           |  8 +++---
 lib/port/rte_port_ethdev.h                    |  8 +++---
 lib/port/rte_port_eventdev.h                  |  8 +++---
 lib/port/rte_port_fd.h                        |  8 +++---
 lib/port/rte_port_frag.h                      |  8 +++---
 lib/port/rte_port_ras.h                       |  8 +++---
 lib/port/rte_port_ring.h                      |  8 +++---
 lib/port/rte_port_sched.h                     |  8 +++---
 lib/port/rte_port_source_sink.h               |  8 +++---
 lib/port/rte_port_sym_crypto.h                |  8 +++---
 lib/port/rte_swx_port.h                       |  8 +++---
 lib/port/rte_swx_port_ethdev.h                |  8 +++---
 lib/port/rte_swx_port_fd.h                    |  8 +++---
 lib/port/rte_swx_port_ring.h                  |  8 +++---
 lib/port/rte_swx_port_source_sink.h           |  8 +++---
 lib/rawdev/rte_rawdev.h                       |  6 ++---
 lib/rawdev/rte_rawdev_pmd.h                   |  8 +++---
 lib/rcu/rte_rcu_qsbr.h                        |  8 +++---
 lib/regexdev/rte_regexdev.h                   |  8 +++---
 lib/ring/rte_ring.h                           |  6 ++---
 lib/ring/rte_ring_core.h                      |  8 +++---
 lib/ring/rte_ring_elem.h                      |  8 +++---
 lib/ring/rte_ring_hts.h                       |  4 +--
 lib/ring/rte_ring_peek.h                      |  4 +--
 lib/ring/rte_ring_peek_zc.h                   |  4 +--
 lib/ring/rte_ring_rts.h                       |  4 +--
 lib/sched/rte_approx.h                        |  8 +++---
 lib/sched/rte_pie.h                           |  8 +++---
 lib/sched/rte_red.h                           |  8 +++---
 lib/sched/rte_sched.h                         |  8 +++---
 lib/sched/rte_sched_common.h                  |  6 ++---
 lib/security/rte_security.h                   |  8 +++---
 lib/security/rte_security_driver.h            |  6 ++---
 lib/stack/rte_stack.h                         |  8 +++---
 lib/table/rte_lru.h                           | 12 +++------
 lib/table/rte_lru_arm64.h                     |  8 +++---
 lib/table/rte_lru_x86.h                       |  8 ------
 lib/table/rte_swx_hash_func.h                 |  8 +++---
 lib/table/rte_swx_keycmp.h                    |  8 +++---
 lib/table/rte_swx_table.h                     |  8 +++---
 lib/table/rte_swx_table_em.h                  |  8 +++---
 lib/table/rte_swx_table_learner.h             |  8 +++---
 lib/table/rte_swx_table_selector.h            |  8 +++---
 lib/table/rte_swx_table_wm.h                  |  8 +++---
 lib/table/rte_table.h                         |  8 +++---
 lib/table/rte_table_acl.h                     |  8 +++---
 lib/table/rte_table_array.h                   |  8 +++---
 lib/table/rte_table_hash.h                    |  8 +++---
 lib/table/rte_table_hash_cuckoo.h             |  8 +++---
 lib/table/rte_table_hash_func.h               | 12 ++++++---
 lib/table/rte_table_lpm.h                     |  8 +++---
 lib/table/rte_table_lpm_ipv6.h                |  8 +++---
 lib/table/rte_table_stub.h                    |  8 +++---
 lib/telemetry/rte_telemetry.h                 |  8 +++---
 lib/vhost/rte_vdpa.h                          |  8 +++---
 lib/vhost/rte_vhost.h                         |  8 +++---
 lib/vhost/rte_vhost_async.h                   |  8 +++---
 lib/vhost/rte_vhost_crypto.h                  |  4 +--
 lib/vhost/vdpa_driver.h                       |  8 +++---
 282 files changed, 1081 insertions(+), 980 deletions(-)
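
For reference, a minimal sketch of the header layout the hunks below appear to converge on: every #include sits above the extern "C" guard, so only the header's own declarations receive C linkage. The header name, include guard, and function here are hypothetical and not part of DPDK; the motivation stated in the comments (C++ code pulled in by a nested include would otherwise be given C linkage) is an assumption inferred from the direction of the changes, not quoted from the patch.

```c
/* example_header.h: hypothetical header, used only to illustrate the layout. */
#ifndef EXAMPLE_HEADER_H
#define EXAMPLE_HEADER_H

/*
 * Includes come first, outside any extern "C" block. If an included
 * header contains C++-only constructs (templates, overloads), wrapping
 * it in extern "C" would make a C++ compiler reject it.
 */
#include <stdint.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Only this header's own declarations are given C linkage. */
static inline uint32_t
example_add_u32(uint32_t a, uint32_t b)
{
	return a + b;
}

#ifdef __cplusplus
}
#endif

#endif /* EXAMPLE_HEADER_H */
```

A C compiler never sees the extern "C" lines, so the reordering is a no-op for C users; it only changes what a C++ translation unit treats as having C linkage.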

diff --git a/app/test/packet_burst_generator.h b/app/test/packet_burst_generator.h
index b99286f50e..cce41bcd0f 100644
--- a/app/test/packet_burst_generator.h
+++ b/app/test/packet_burst_generator.h
@@ -5,10 +5,6 @@
 #ifndef PACKET_BURST_GENERATOR_H_
 #define PACKET_BURST_GENERATOR_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_mbuf.h>
 #include <rte_ether.h>
 #include <rte_arp.h>
@@ -17,6 +13,10 @@ extern "C" {
 #include <rte_tcp.h>
 #include <rte_sctp.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define IPV4_ADDR(a, b, c, d)(((a & 0xff) << 24) | ((b & 0xff) << 16) | \
 		((c & 0xff) << 8) | (d & 0xff))
 
diff --git a/app/test/virtual_pmd.h b/app/test/virtual_pmd.h
index 120b58b273..a5a71d7cb4 100644
--- a/app/test/virtual_pmd.h
+++ b/app/test/virtual_pmd.h
@@ -5,12 +5,12 @@
 #ifndef __VIRTUAL_ETHDEV_H_
 #define __VIRTUAL_ETHDEV_H_
 
+#include <rte_ether.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ether.h>
-
 int
 virtual_ethdev_init(void);
 
diff --git a/drivers/bus/auxiliary/bus_auxiliary_driver.h b/drivers/bus/auxiliary/bus_auxiliary_driver.h
index 58fb7c7f69..40ab1f0912 100644
--- a/drivers/bus/auxiliary/bus_auxiliary_driver.h
+++ b/drivers/bus/auxiliary/bus_auxiliary_driver.h
@@ -11,10 +11,6 @@
  * Auxiliary Bus Interface.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -28,6 +24,10 @@ extern "C" {
 #include <dev_driver.h>
 #include <rte_kvargs.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_BUS_AUXILIARY_NAME "auxiliary"
 
 /* Forward declarations */
diff --git a/drivers/bus/cdx/bus_cdx_driver.h b/drivers/bus/cdx/bus_cdx_driver.h
index 211f8e406b..d390e7b5a1 100644
--- a/drivers/bus/cdx/bus_cdx_driver.h
+++ b/drivers/bus/cdx/bus_cdx_driver.h
@@ -10,10 +10,6 @@
  * AMD CDX bus interface
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdlib.h>
 #include <inttypes.h>
 #include <linux/types.h>
@@ -22,6 +18,10 @@ extern "C" {
 #include <dev_driver.h>
 #include <rte_interrupts.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_cdx_device;
 struct rte_cdx_driver;
diff --git a/drivers/bus/dpaa/include/fsl_qman.h b/drivers/bus/dpaa/include/fsl_qman.h
index c0677976e8..f39007b84d 100644
--- a/drivers/bus/dpaa/include/fsl_qman.h
+++ b/drivers/bus/dpaa/include/fsl_qman.h
@@ -8,14 +8,14 @@
 #ifndef __FSL_QMAN_H
 #define __FSL_QMAN_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <dpaa_rbtree.h>
 #include <rte_compat.h>
 #include <rte_eventdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* FQ lookups (turn this on for 64bit user-space) */
 #ifdef RTE_ARCH_64
 #define CONFIG_FSL_QMAN_FQ_LOOKUP
diff --git a/drivers/bus/fslmc/bus_fslmc_driver.h b/drivers/bus/fslmc/bus_fslmc_driver.h
index 7ac5fe6ff1..3095458133 100644
--- a/drivers/bus/fslmc/bus_fslmc_driver.h
+++ b/drivers/bus/fslmc/bus_fslmc_driver.h
@@ -13,10 +13,6 @@
  * RTE FSLMC Bus Interface
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -40,6 +36,10 @@ extern "C" {
 #include "portal/dpaa2_hw_pvt.h"
 #include "portal/dpaa2_hw_dpio.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define FSLMC_OBJECT_MAX_LEN 32   /**< Length of each device on bus */
 
 #define DPAA2_INVALID_MBUF_SEQN        0
diff --git a/drivers/bus/pci/bus_pci_driver.h b/drivers/bus/pci/bus_pci_driver.h
index be32263a82..2cc1119072 100644
--- a/drivers/bus/pci/bus_pci_driver.h
+++ b/drivers/bus/pci/bus_pci_driver.h
@@ -6,14 +6,14 @@
 #ifndef BUS_PCI_DRIVER_H
 #define BUS_PCI_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus_pci.h>
 #include <dev_driver.h>
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Pathname of PCI devices directory. */
 __rte_internal
 const char *rte_pci_get_sysfs_path(void);
diff --git a/drivers/bus/pci/rte_bus_pci.h b/drivers/bus/pci/rte_bus_pci.h
index a3798cb1cb..19a7b15b99 100644
--- a/drivers/bus/pci/rte_bus_pci.h
+++ b/drivers/bus/pci/rte_bus_pci.h
@@ -11,10 +11,6 @@
  * PCI device & driver interface
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -27,6 +23,10 @@ extern "C" {
 #include <rte_interrupts.h>
 #include <rte_pci.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_pci_device;
 struct rte_pci_driver;
diff --git a/drivers/bus/platform/bus_platform_driver.h b/drivers/bus/platform/bus_platform_driver.h
index 5ac54fb739..a6f246f7c4 100644
--- a/drivers/bus/platform/bus_platform_driver.h
+++ b/drivers/bus/platform/bus_platform_driver.h
@@ -10,10 +10,6 @@
  * Platform bus interface.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stddef.h>
 #include <stdint.h>
 
@@ -23,6 +19,10 @@ extern "C" {
 #include <rte_os.h>
 #include <rte_vfio.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_platform_bus;
 struct rte_platform_device;
diff --git a/drivers/bus/vdev/bus_vdev_driver.h b/drivers/bus/vdev/bus_vdev_driver.h
index bc7e30d7c6..cba1fb5269 100644
--- a/drivers/bus/vdev/bus_vdev_driver.h
+++ b/drivers/bus/vdev/bus_vdev_driver.h
@@ -5,15 +5,15 @@
 #ifndef BUS_VDEV_DRIVER_H
 #define BUS_VDEV_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus_vdev.h>
 #include <rte_compat.h>
 #include <dev_driver.h>
 #include <rte_devargs.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_vdev_device {
 	RTE_TAILQ_ENTRY(rte_vdev_device) next;      /**< Next attached vdev */
 	struct rte_device device;               /**< Inherit core device */
diff --git a/drivers/bus/vmbus/bus_vmbus_driver.h b/drivers/bus/vmbus/bus_vmbus_driver.h
index e2475a642d..bc394208de 100644
--- a/drivers/bus/vmbus/bus_vmbus_driver.h
+++ b/drivers/bus/vmbus/bus_vmbus_driver.h
@@ -6,14 +6,14 @@
 #ifndef BUS_VMBUS_DRIVER_H
 #define BUS_VMBUS_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus_vmbus.h>
 #include <rte_compat.h>
 #include <dev_driver.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct vmbus_channel;
 struct vmbus_mon_page;
 
diff --git a/drivers/bus/vmbus/rte_bus_vmbus.h b/drivers/bus/vmbus/rte_bus_vmbus.h
index 9467bd8f3d..fd18bca73c 100644
--- a/drivers/bus/vmbus/rte_bus_vmbus.h
+++ b/drivers/bus/vmbus/rte_bus_vmbus.h
@@ -11,10 +11,6 @@
  *
  * VMBUS Interface
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <stdlib.h>
 #include <limits.h>
@@ -28,6 +24,10 @@ extern "C" {
 #include <rte_interrupts.h>
 #include <rte_vmbus_reg.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Forward declarations */
 struct rte_vmbus_device;
 struct rte_vmbus_driver;
diff --git a/drivers/dma/cnxk/cnxk_dma_event_dp.h b/drivers/dma/cnxk/cnxk_dma_event_dp.h
index 06b5ca8279..8c6cf5dd9a 100644
--- a/drivers/dma/cnxk/cnxk_dma_event_dp.h
+++ b/drivers/dma/cnxk/cnxk_dma_event_dp.h
@@ -5,16 +5,16 @@
 #ifndef _CNXK_DMA_EVENT_DP_H_
 #define _CNXK_DMA_EVENT_DP_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_common.h>
 #include <rte_compat.h>
 #include <rte_eventdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 __rte_internal
 uint16_t cn10k_dma_adapter_enqueue(void *ws, struct rte_event ev[], uint16_t nb_events);
 
diff --git a/drivers/dma/ioat/ioat_hw_defs.h b/drivers/dma/ioat/ioat_hw_defs.h
index dc3493a78f..11893951f2 100644
--- a/drivers/dma/ioat/ioat_hw_defs.h
+++ b/drivers/dma/ioat/ioat_hw_defs.h
@@ -5,12 +5,12 @@
 #ifndef IOAT_HW_DEFS_H
 #define IOAT_HW_DEFS_H
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 #define IOAT_PCI_CHANERR_INT_OFFSET	0x180
 
 #define IOAT_VER_3_0	0x30
diff --git a/drivers/event/dlb2/rte_pmd_dlb2.h b/drivers/event/dlb2/rte_pmd_dlb2.h
index 334c6c356d..dba7fd2f43 100644
--- a/drivers/event/dlb2/rte_pmd_dlb2.h
+++ b/drivers/event/dlb2/rte_pmd_dlb2.h
@@ -11,14 +11,14 @@
 #ifndef _RTE_PMD_DLB2_H_
 #define _RTE_PMD_DLB2_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
diff --git a/drivers/mempool/dpaa2/rte_dpaa2_mempool.h b/drivers/mempool/dpaa2/rte_dpaa2_mempool.h
index 7fe3d93f61..0286090b1b 100644
--- a/drivers/mempool/dpaa2/rte_dpaa2_mempool.h
+++ b/drivers/mempool/dpaa2/rte_dpaa2_mempool.h
@@ -12,13 +12,13 @@
  *
  */
 
+#include <rte_compat.h>
+#include <rte_mempool.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_compat.h>
-#include <rte_mempool.h>
-
 /**
  * Get BPID corresponding to the packet pool
  *
diff --git a/drivers/net/avp/rte_avp_fifo.h b/drivers/net/avp/rte_avp_fifo.h
index c1658da685..879de3b1c0 100644
--- a/drivers/net/avp/rte_avp_fifo.h
+++ b/drivers/net/avp/rte_avp_fifo.h
@@ -8,10 +8,6 @@
 
 #include "rte_avp_common.h"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef __KERNEL__
 /* Write memory barrier for kernel compiles */
 #define AVP_WMB() smp_wmb()
@@ -27,6 +23,10 @@ extern "C" {
 #ifndef __KERNEL__
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Initializes the avp fifo structure
  */
diff --git a/drivers/net/bonding/rte_eth_bond.h b/drivers/net/bonding/rte_eth_bond.h
index f10165f2c6..e59ff8793e 100644
--- a/drivers/net/bonding/rte_eth_bond.h
+++ b/drivers/net/bonding/rte_eth_bond.h
@@ -17,12 +17,12 @@
  * load balancing of network ports
  */
 
+#include <rte_ether.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ether.h>
-
 /* Supported modes of operation of link bonding library  */
 
 #define BONDING_MODE_ROUND_ROBIN		(0)
diff --git a/drivers/net/i40e/rte_pmd_i40e.h b/drivers/net/i40e/rte_pmd_i40e.h
index a802f989e9..5af7e2330f 100644
--- a/drivers/net/i40e/rte_pmd_i40e.h
+++ b/drivers/net/i40e/rte_pmd_i40e.h
@@ -14,14 +14,14 @@
  *
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_ethdev.h>
 #include <rte_ether.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Response sent back to i40e driver from user app after callback
  */
diff --git a/drivers/net/mlx5/mlx5_trace.h b/drivers/net/mlx5/mlx5_trace.h
index 888d96f60b..a8f0b372c8 100644
--- a/drivers/net/mlx5/mlx5_trace.h
+++ b/drivers/net/mlx5/mlx5_trace.h
@@ -11,14 +11,14 @@
  * API for mlx5 PMD trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <mlx5_prm.h>
 #include <rte_mbuf.h>
 #include <rte_trace_point.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* TX burst subroutines trace points. */
 RTE_TRACE_POINT_FP(
 	rte_pmd_mlx5_trace_tx_entry,
diff --git a/drivers/net/ring/rte_eth_ring.h b/drivers/net/ring/rte_eth_ring.h
index 59e074d0ad..98292c7b33 100644
--- a/drivers/net/ring/rte_eth_ring.h
+++ b/drivers/net/ring/rte_eth_ring.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_ETH_RING_H_
 #define _RTE_ETH_RING_H_
 
+#include <rte_ring.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring.h>
-
 /**
  * Create a new ethdev port from a set of rings
  *
diff --git a/drivers/net/vhost/rte_eth_vhost.h b/drivers/net/vhost/rte_eth_vhost.h
index 0e68b9f668..6ec59a7adc 100644
--- a/drivers/net/vhost/rte_eth_vhost.h
+++ b/drivers/net/vhost/rte_eth_vhost.h
@@ -5,15 +5,15 @@
 #ifndef _RTE_ETH_VHOST_H_
 #define _RTE_ETH_VHOST_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdbool.h>
 
 #include <rte_vhost.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Event description.
  */
diff --git a/drivers/raw/ifpga/afu_pmd_core.h b/drivers/raw/ifpga/afu_pmd_core.h
index a8f1afe343..abf9e491f7 100644
--- a/drivers/raw/ifpga/afu_pmd_core.h
+++ b/drivers/raw/ifpga/afu_pmd_core.h
@@ -5,10 +5,6 @@
 #ifndef AFU_PMD_CORE_H
 #define AFU_PMD_CORE_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 #include <unistd.h>
@@ -20,6 +16,10 @@ extern "C" {
 
 #include "ifpga_rawdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define AFU_RAWDEV_MAX_DRVS  32
 
 struct afu_rawdev;
diff --git a/drivers/raw/ifpga/afu_pmd_he_hssi.h b/drivers/raw/ifpga/afu_pmd_he_hssi.h
index aebbe32d54..282289d912 100644
--- a/drivers/raw/ifpga/afu_pmd_he_hssi.h
+++ b/drivers/raw/ifpga/afu_pmd_he_hssi.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_HE_HSSI_H
 #define AFU_PMD_HE_HSSI_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define HE_HSSI_UUID_L    0xbb370242ac130002
 #define HE_HSSI_UUID_H    0x823c334c98bf11ea
 #define NUM_HE_HSSI_PORTS 8
diff --git a/drivers/raw/ifpga/afu_pmd_he_lpbk.h b/drivers/raw/ifpga/afu_pmd_he_lpbk.h
index eab7b55199..67b3653c21 100644
--- a/drivers/raw/ifpga/afu_pmd_he_lpbk.h
+++ b/drivers/raw/ifpga/afu_pmd_he_lpbk.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_HE_LPBK_H
 #define AFU_PMD_HE_LPBK_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define HE_LPBK_UUID_L     0xb94b12284c31e02b
 #define HE_LPBK_UUID_H     0x56e203e9864f49a7
 #define HE_MEM_LPBK_UUID_L 0xbb652a578330a8eb
diff --git a/drivers/raw/ifpga/afu_pmd_he_mem.h b/drivers/raw/ifpga/afu_pmd_he_mem.h
index 998ca92416..41854d8c58 100644
--- a/drivers/raw/ifpga/afu_pmd_he_mem.h
+++ b/drivers/raw/ifpga/afu_pmd_he_mem.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_HE_MEM_H
 #define AFU_PMD_HE_MEM_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define HE_MEM_TG_UUID_L  0xa3dc5b831f5cecbb
 #define HE_MEM_TG_UUID_H  0x4dadea342c7848cb
 
diff --git a/drivers/raw/ifpga/afu_pmd_n3000.h b/drivers/raw/ifpga/afu_pmd_n3000.h
index 403cc64b91..f6b6e07c6b 100644
--- a/drivers/raw/ifpga/afu_pmd_n3000.h
+++ b/drivers/raw/ifpga/afu_pmd_n3000.h
@@ -5,13 +5,13 @@
 #ifndef AFU_PMD_N3000_H
 #define AFU_PMD_N3000_H
 
+#include "afu_pmd_core.h"
+#include "rte_pmd_afu.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "afu_pmd_core.h"
-#include "rte_pmd_afu.h"
-
 #define N3000_AFU_UUID_L  0xc000c9660d824272
 #define N3000_AFU_UUID_H  0x9aeffe5f84570612
 #define N3000_NLB0_UUID_L 0xf89e433683f9040b
diff --git a/drivers/raw/ifpga/rte_pmd_afu.h b/drivers/raw/ifpga/rte_pmd_afu.h
index 5403ed25f5..0edacc3a9c 100644
--- a/drivers/raw/ifpga/rte_pmd_afu.h
+++ b/drivers/raw/ifpga/rte_pmd_afu.h
@@ -14,12 +14,12 @@
  *
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 #define RTE_PMD_AFU_N3000_NLB   1
 #define RTE_PMD_AFU_N3000_DMA   2
 
diff --git a/drivers/raw/ifpga/rte_pmd_ifpga.h b/drivers/raw/ifpga/rte_pmd_ifpga.h
index 791543f2cd..36b7f9c018 100644
--- a/drivers/raw/ifpga/rte_pmd_ifpga.h
+++ b/drivers/raw/ifpga/rte_pmd_ifpga.h
@@ -14,12 +14,12 @@
  *
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 #define IFPGA_MAX_PORT_NUM   4
 
 /**
diff --git a/examples/ethtool/lib/rte_ethtool.h b/examples/ethtool/lib/rte_ethtool.h
index d27e0102b1..c7dd3d9755 100644
--- a/examples/ethtool/lib/rte_ethtool.h
+++ b/examples/ethtool/lib/rte_ethtool.h
@@ -30,14 +30,14 @@
  * rte_ethtool_net_set_rx_mode      net_device_ops::ndo_set_rx_mode
  *
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_ethdev.h>
 #include <linux/ethtool.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Retrieve the Ethernet device driver information according to
  * attributes described by ethtool data structure, ethtool_drvinfo.
diff --git a/examples/qos_sched/main.h b/examples/qos_sched/main.h
index 04e77a4a10..ea66df0434 100644
--- a/examples/qos_sched/main.h
+++ b/examples/qos_sched/main.h
@@ -5,12 +5,12 @@
 #ifndef _MAIN_H_
 #define _MAIN_H_
 
+#include <rte_sched.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_sched.h>
-
 #define RTE_LOGTYPE_APP RTE_LOGTYPE_USER1
 
 /*
diff --git a/examples/vm_power_manager/channel_manager.h b/examples/vm_power_manager/channel_manager.h
index eb989b20ad..6f70539815 100644
--- a/examples/vm_power_manager/channel_manager.h
+++ b/examples/vm_power_manager/channel_manager.h
@@ -5,16 +5,16 @@
 #ifndef CHANNEL_MANAGER_H_
 #define CHANNEL_MANAGER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <linux/limits.h>
 #include <linux/un.h>
 #include <stdbool.h>
 
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Maximum name length including '\0' terminator */
 #define CHANNEL_MGR_MAX_NAME_LEN    64
 
diff --git a/lib/acl/rte_acl_osdep.h b/lib/acl/rte_acl_osdep.h
index 3c1dc402ca..e4c7d07c69 100644
--- a/lib/acl/rte_acl_osdep.h
+++ b/lib/acl/rte_acl_osdep.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_ACL_OSDEP_H_
 #define _RTE_ACL_OSDEP_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -49,6 +45,10 @@ extern "C" {
 #include <rte_cpuflags.h>
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
index 0cbfdd1c95..9e83dd2bb0 100644
--- a/lib/bbdev/rte_bbdev.h
+++ b/lib/bbdev/rte_bbdev.h
@@ -20,10 +20,6 @@
  * from the same queue.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdbool.h>
 
@@ -32,6 +28,10 @@ extern "C" {
 
 #include "rte_bbdev_op.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_BBDEV_MAX_DEVS
 #define RTE_BBDEV_MAX_DEVS 128  /**< Max number of devices */
 #endif
diff --git a/lib/bbdev/rte_bbdev_op.h b/lib/bbdev/rte_bbdev_op.h
index 459631d0d0..6f4bae7d0f 100644
--- a/lib/bbdev/rte_bbdev_op.h
+++ b/lib/bbdev/rte_bbdev_op.h
@@ -11,10 +11,6 @@
  * Defines wireless base band layer 1 operations and capabilities
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -23,6 +19,10 @@ extern "C" {
 #include <rte_memory.h>
 #include <rte_mempool.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Number of columns in sub-block interleaver (36.212, section 5.1.4.1.1) */
 #define RTE_BBDEV_TURBO_C_SUBBLOCK (32)
 /* Maximum size of Transport Block (36.213, Table, Table 7.1.7.2.5-1) */
diff --git a/lib/bbdev/rte_bbdev_pmd.h b/lib/bbdev/rte_bbdev_pmd.h
index 442b23943d..0a1738fc05 100644
--- a/lib/bbdev/rte_bbdev_pmd.h
+++ b/lib/bbdev/rte_bbdev_pmd.h
@@ -14,15 +14,15 @@
  * bbdev interface. User applications should not use this API.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_log.h>
 
 #include "rte_bbdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Suggested value for SW based devices */
 #define RTE_BBDEV_DEFAULT_MAX_NB_QUEUES RTE_MAX_LCORE
 
diff --git a/lib/bpf/bpf_def.h b/lib/bpf/bpf_def.h
index f08cd9106b..9f2e162914 100644
--- a/lib/bpf/bpf_def.h
+++ b/lib/bpf/bpf_def.h
@@ -7,10 +7,6 @@
 #ifndef _RTE_BPF_DEF_H_
 #define _RTE_BPF_DEF_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -25,6 +21,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 
 /*
  * The instruction encodings.
diff --git a/lib/compressdev/rte_comp.h b/lib/compressdev/rte_comp.h
index 830a240b6b..d66a4b1cb9 100644
--- a/lib/compressdev/rte_comp.h
+++ b/lib/compressdev/rte_comp.h
@@ -11,12 +11,12 @@
  * RTE definitions for Data Compression Service
  */
 
+#include <rte_mbuf.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_mbuf.h>
-
 /**
  * compression service feature flags
  *
diff --git a/lib/compressdev/rte_compressdev.h b/lib/compressdev/rte_compressdev.h
index e0294a18bd..b3392553a6 100644
--- a/lib/compressdev/rte_compressdev.h
+++ b/lib/compressdev/rte_compressdev.h
@@ -13,13 +13,13 @@
  * Defines comp device APIs for the provisioning of compression operations.
  */
 
+
+#include "rte_comp.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-
-#include "rte_comp.h"
-
 /**
  * Parameter log base 2 range description.
  * Final value will be 2^value.
diff --git a/lib/compressdev/rte_compressdev_internal.h b/lib/compressdev/rte_compressdev_internal.h
index 67f8b51a37..a980d74cbf 100644
--- a/lib/compressdev/rte_compressdev_internal.h
+++ b/lib/compressdev/rte_compressdev_internal.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_COMPRESSDEV_INTERNAL_H_
 #define _RTE_COMPRESSDEV_INTERNAL_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* rte_compressdev_internal.h
  * This file holds Compressdev private data structures.
  */
@@ -16,6 +12,10 @@ extern "C" {
 
 #include "rte_comp.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_COMPRESSDEV_NAME_MAX_LEN	(64)
 /**< Max length of name of comp PMD */
 
diff --git a/lib/compressdev/rte_compressdev_pmd.h b/lib/compressdev/rte_compressdev_pmd.h
index 32e29c9d16..ea721f014d 100644
--- a/lib/compressdev/rte_compressdev_pmd.h
+++ b/lib/compressdev/rte_compressdev_pmd.h
@@ -13,10 +13,6 @@
  * them directly.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <dev_driver.h>
@@ -24,6 +20,10 @@ extern "C" {
 #include "rte_compressdev.h"
 #include "rte_compressdev_internal.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_COMPRESSDEV_PMD_NAME_ARG			("name")
 #define RTE_COMPRESSDEV_PMD_SOCKET_ID_ARG		("socket_id")
 
diff --git a/lib/cryptodev/cryptodev_pmd.h b/lib/cryptodev/cryptodev_pmd.h
index 6c114f7181..3e2e2673b8 100644
--- a/lib/cryptodev/cryptodev_pmd.h
+++ b/lib/cryptodev/cryptodev_pmd.h
@@ -5,10 +5,6 @@
 #ifndef _CRYPTODEV_PMD_H_
 #define _CRYPTODEV_PMD_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Crypto PMD APIs
  *
@@ -28,6 +24,10 @@ extern "C" {
 #include "rte_crypto.h"
 #include "rte_cryptodev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 
 #define RTE_CRYPTODEV_PMD_DEFAULT_MAX_NB_QUEUE_PAIRS	8
 
diff --git a/lib/cryptodev/cryptodev_trace.h b/lib/cryptodev/cryptodev_trace.h
index 935f0d564b..e186f0f3c1 100644
--- a/lib/cryptodev/cryptodev_trace.h
+++ b/lib/cryptodev/cryptodev_trace.h
@@ -11,14 +11,14 @@
  * API for cryptodev trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_trace_point.h>
 
 #include "rte_cryptodev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_cryptodev_trace_configure,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id,
diff --git a/lib/cryptodev/rte_crypto.h b/lib/cryptodev/rte_crypto.h
index dbc2700da5..dcf4a36fb2 100644
--- a/lib/cryptodev/rte_crypto.h
+++ b/lib/cryptodev/rte_crypto.h
@@ -11,10 +11,6 @@
  * RTE Cryptography Common Definitions
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 
 #include <rte_mbuf.h>
 #include <rte_memory.h>
@@ -24,6 +20,10 @@ extern "C" {
 #include "rte_crypto_sym.h"
 #include "rte_crypto_asym.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Crypto operation types */
 enum rte_crypto_op_type {
 	RTE_CRYPTO_OP_TYPE_UNDEFINED,
diff --git a/lib/cryptodev/rte_crypto_asym.h b/lib/cryptodev/rte_crypto_asym.h
index 39d3da3952..4b7ea36961 100644
--- a/lib/cryptodev/rte_crypto_asym.h
+++ b/lib/cryptodev/rte_crypto_asym.h
@@ -14,10 +14,6 @@
  * asymmetric crypto operations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 #include <stdint.h>
 
@@ -27,6 +23,10 @@ extern "C" {
 
 #include "rte_crypto_sym.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_cryptodev_asym_session;
 
 /** asym key exchange operation type name strings */
diff --git a/lib/cryptodev/rte_crypto_sym.h b/lib/cryptodev/rte_crypto_sym.h
index 53b18b9412..fb73024010 100644
--- a/lib/cryptodev/rte_crypto_sym.h
+++ b/lib/cryptodev/rte_crypto_sym.h
@@ -14,10 +14,6 @@
  * as supported symmetric crypto operation combinations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <rte_compat.h>
@@ -26,6 +22,10 @@ extern "C" {
 #include <rte_mempool.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Crypto IO Vector (in analogy with struct iovec)
  * Supposed be used to pass input/output data buffers for crypto data-path
diff --git a/lib/cryptodev/rte_cryptodev.h b/lib/cryptodev/rte_cryptodev.h
index bec947f6d5..8051c5a6a3 100644
--- a/lib/cryptodev/rte_cryptodev.h
+++ b/lib/cryptodev/rte_cryptodev.h
@@ -14,10 +14,6 @@
  * authentication operations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include "rte_kvargs.h"
 #include "rte_crypto.h"
@@ -1859,6 +1855,10 @@ int rte_cryptodev_remove_deq_callback(uint8_t dev_id,
 				      struct rte_cryptodev_cb *cb);
 
 #include <rte_cryptodev_core.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 /**
  *
  * Dequeue a burst of processed crypto operations from a queue on the crypto
diff --git a/lib/cryptodev/rte_cryptodev_trace_fp.h b/lib/cryptodev/rte_cryptodev_trace_fp.h
index dbfbc7b2e5..f23f882804 100644
--- a/lib/cryptodev/rte_cryptodev_trace_fp.h
+++ b/lib/cryptodev/rte_cryptodev_trace_fp.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_CRYPTODEV_TRACE_FP_H_
 #define _RTE_CRYPTODEV_TRACE_FP_H_
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_cryptodev_trace_enqueue_burst,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id, uint16_t qp_id, void **ops,
diff --git a/lib/dispatcher/rte_dispatcher.h b/lib/dispatcher/rte_dispatcher.h
index d8182d5f2c..ba2c353073 100644
--- a/lib/dispatcher/rte_dispatcher.h
+++ b/lib/dispatcher/rte_dispatcher.h
@@ -19,16 +19,16 @@
  * event device.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdint.h>
 
 #include <rte_compat.h>
 #include <rte_eventdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Function prototype for match callbacks.
  *
diff --git a/lib/dmadev/rte_dmadev.h b/lib/dmadev/rte_dmadev.h
index 5474a5281d..d174d325a1 100644
--- a/lib/dmadev/rte_dmadev.h
+++ b/lib/dmadev/rte_dmadev.h
@@ -772,9 +772,17 @@ struct rte_dma_sge {
 	uint32_t length; /**< The DMA operation length. */
 };
 
+#ifdef __cplusplus
+}
+#endif
+
 #include "rte_dmadev_core.h"
 #include "rte_dmadev_trace_fp.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**@{@name DMA operation flag
  * @see rte_dma_copy()
  * @see rte_dma_copy_sg()
diff --git a/lib/eal/arm/include/rte_atomic_32.h b/lib/eal/arm/include/rte_atomic_32.h
index 62fc33773d..0b9a0dfa30 100644
--- a/lib/eal/arm/include/rte_atomic_32.h
+++ b/lib/eal/arm/include/rte_atomic_32.h
@@ -9,12 +9,12 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include "generic/rte_atomic.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_atomic.h"
-
 #define	rte_mb()  __sync_synchronize()
 
 #define	rte_wmb() do { asm volatile ("dmb st" : : : "memory"); } while (0)
diff --git a/lib/eal/arm/include/rte_atomic_64.h b/lib/eal/arm/include/rte_atomic_64.h
index 7c99fc0a02..181bb60929 100644
--- a/lib/eal/arm/include/rte_atomic_64.h
+++ b/lib/eal/arm/include/rte_atomic_64.h
@@ -10,14 +10,14 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_atomic.h"
 #include <rte_branch_prediction.h>
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define rte_mb() asm volatile("dmb osh" : : : "memory")
 
 #define rte_wmb() asm volatile("dmb oshst" : : : "memory")
diff --git a/lib/eal/arm/include/rte_byteorder.h b/lib/eal/arm/include/rte_byteorder.h
index ff02052f2e..a0aaff4a28 100644
--- a/lib/eal/arm/include/rte_byteorder.h
+++ b/lib/eal/arm/include/rte_byteorder.h
@@ -9,14 +9,14 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include "generic/rte_byteorder.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* ARM architecture is bi-endian (both big and little). */
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 
diff --git a/lib/eal/arm/include/rte_cpuflags_32.h b/lib/eal/arm/include/rte_cpuflags_32.h
index 770b09b99d..7e33acd9fb 100644
--- a/lib/eal/arm/include/rte_cpuflags_32.h
+++ b/lib/eal/arm/include/rte_cpuflags_32.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_CPUFLAGS_ARM32_H_
 #define _RTE_CPUFLAGS_ARM32_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -46,6 +42,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index afe70209c3..f84633159e 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_CPUFLAGS_ARM64_H_
 #define _RTE_CPUFLAGS_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -40,6 +36,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/arm/include/rte_cycles_32.h b/lib/eal/arm/include/rte_cycles_32.h
index 859cd2e5bb..2b20c8c6f5 100644
--- a/lib/eal/arm/include/rte_cycles_32.h
+++ b/lib/eal/arm/include/rte_cycles_32.h
@@ -15,12 +15,12 @@
 
 #include <time.h>
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 /**
  * Read the time base register.
  *
diff --git a/lib/eal/arm/include/rte_cycles_64.h b/lib/eal/arm/include/rte_cycles_64.h
index 8b05302f47..bb76e4d7e0 100644
--- a/lib/eal/arm/include/rte_cycles_64.h
+++ b/lib/eal/arm/include/rte_cycles_64.h
@@ -6,12 +6,12 @@
 #ifndef _RTE_CYCLES_ARM64_H_
 #define _RTE_CYCLES_ARM64_H_
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 /** Read generic counter frequency */
 static __rte_always_inline uint64_t
 __rte_arm64_cntfrq(void)
diff --git a/lib/eal/arm/include/rte_io.h b/lib/eal/arm/include/rte_io.h
index f4e66e6bad..658768697c 100644
--- a/lib/eal/arm/include/rte_io.h
+++ b/lib/eal/arm/include/rte_io.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_IO_ARM_H_
 #define _RTE_IO_ARM_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_ARCH_64
 #include "rte_io_64.h"
 #else
 #include "generic/rte_io.h"
 #endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 
 #ifdef __cplusplus
diff --git a/lib/eal/arm/include/rte_io_64.h b/lib/eal/arm/include/rte_io_64.h
index 96da7789ce..88db82a7eb 100644
--- a/lib/eal/arm/include/rte_io_64.h
+++ b/lib/eal/arm/include/rte_io_64.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_IO_ARM64_H_
 #define _RTE_IO_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #define RTE_OVERRIDE_IO_H
@@ -17,6 +13,10 @@ extern "C" {
 #include <rte_compat.h>
 #include "rte_atomic_64.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static __rte_always_inline uint8_t
 rte_read8_relaxed(const volatile void *addr)
 {
diff --git a/lib/eal/arm/include/rte_memcpy_32.h b/lib/eal/arm/include/rte_memcpy_32.h
index fb3245b59c..99fd5757ca 100644
--- a/lib/eal/arm/include/rte_memcpy_32.h
+++ b/lib/eal/arm/include/rte_memcpy_32.h
@@ -8,10 +8,6 @@
 #include <stdint.h>
 #include <string.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_memcpy.h"
 
 #ifdef RTE_ARCH_ARM_NEON_MEMCPY
@@ -23,6 +19,10 @@ extern "C" {
 /* ARM NEON Intrinsics are used to copy data */
 #include <arm_neon.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
diff --git a/lib/eal/arm/include/rte_memcpy_64.h b/lib/eal/arm/include/rte_memcpy_64.h
index 85ad587bd3..c7d0c345ad 100644
--- a/lib/eal/arm/include/rte_memcpy_64.h
+++ b/lib/eal/arm/include/rte_memcpy_64.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_MEMCPY_ARM64_H_
 #define _RTE_MEMCPY_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <string.h>
 
@@ -18,6 +14,10 @@ extern "C" {
 #include <rte_common.h>
 #include <rte_branch_prediction.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * The memory copy performance differs on different AArch64 micro-architectures.
  * And the most recent glibc (e.g. 2.23 or later) can provide a better memcpy()
diff --git a/lib/eal/arm/include/rte_pause.h b/lib/eal/arm/include/rte_pause.h
index 6c7002ad98..8f35d60a6e 100644
--- a/lib/eal/arm/include/rte_pause.h
+++ b/lib/eal/arm/include/rte_pause.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PAUSE_ARM_H_
 #define _RTE_PAUSE_ARM_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_ARCH_64
 #include <rte_pause_64.h>
 #else
 #include <rte_pause_32.h>
 #endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 
 #ifdef __cplusplus
diff --git a/lib/eal/arm/include/rte_pause_32.h b/lib/eal/arm/include/rte_pause_32.h
index d4768c7a98..7870fac763 100644
--- a/lib/eal/arm/include/rte_pause_32.h
+++ b/lib/eal/arm/include/rte_pause_32.h
@@ -5,13 +5,13 @@
 #ifndef _RTE_PAUSE_ARM32_H_
 #define _RTE_PAUSE_ARM32_H_
 
+#include <rte_common.h>
+#include "generic/rte_pause.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_pause.h"
-
 static inline void rte_pause(void)
 {
 }
diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
index 9e2dbf3531..1526bf87cc 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -6,10 +6,6 @@
 #ifndef _RTE_PAUSE_ARM64_H_
 #define _RTE_PAUSE_ARM64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_stdatomic.h>
 
@@ -19,6 +15,10 @@ extern "C" {
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 	asm volatile("yield" ::: "memory");
diff --git a/lib/eal/arm/include/rte_power_intrinsics.h b/lib/eal/arm/include/rte_power_intrinsics.h
index 9e498e9ebf..5481f45ad3 100644
--- a/lib/eal/arm/include/rte_power_intrinsics.h
+++ b/lib/eal/arm/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_POWER_INTRINSIC_ARM_H_
 #define _RTE_POWER_INTRINSIC_ARM_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/arm/include/rte_prefetch_32.h b/lib/eal/arm/include/rte_prefetch_32.h
index 0e9a140c8a..619bf27c79 100644
--- a/lib/eal/arm/include/rte_prefetch_32.h
+++ b/lib/eal/arm/include/rte_prefetch_32.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PREFETCH_ARM32_H_
 #define _RTE_PREFETCH_ARM32_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	asm volatile ("pld [%0]" : : "r" (p));
diff --git a/lib/eal/arm/include/rte_prefetch_64.h b/lib/eal/arm/include/rte_prefetch_64.h
index 22cba48e29..4f60123b8b 100644
--- a/lib/eal/arm/include/rte_prefetch_64.h
+++ b/lib/eal/arm/include/rte_prefetch_64.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PREFETCH_ARM_64_H_
 #define _RTE_PREFETCH_ARM_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	asm volatile ("PRFM PLDL1KEEP, [%0]" : : "r" (p));
diff --git a/lib/eal/arm/include/rte_rwlock.h b/lib/eal/arm/include/rte_rwlock.h
index 18bb37b036..727cabafec 100644
--- a/lib/eal/arm/include/rte_rwlock.h
+++ b/lib/eal/arm/include/rte_rwlock.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_RWLOCK_ARM_H_
 #define _RTE_RWLOCK_ARM_H_
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/arm/include/rte_spinlock.h b/lib/eal/arm/include/rte_spinlock.h
index a973763c23..a5d01b0d21 100644
--- a/lib/eal/arm/include/rte_spinlock.h
+++ b/lib/eal/arm/include/rte_spinlock.h
@@ -9,13 +9,13 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include <rte_common.h>
+#include "generic/rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_spinlock.h"
-
 static inline int rte_tm_supported(void)
 {
 	return 0;
diff --git a/lib/eal/freebsd/include/rte_os.h b/lib/eal/freebsd/include/rte_os.h
index 003468caff..f31f6af12d 100644
--- a/lib/eal/freebsd/include/rte_os.h
+++ b/lib/eal/freebsd/include/rte_os.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_OS_H_
 #define _RTE_OS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * This header should contain any definition
  * which is not supported natively or named differently in FreeBSD.
@@ -17,6 +13,10 @@ extern "C" {
 #include <pthread_np.h>
 #include <sys/queue.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* These macros are compatible with system's sys/queue.h. */
 #define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
 #define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
diff --git a/lib/eal/include/bus_driver.h b/lib/eal/include/bus_driver.h
index 7b85a17a09..60527b75b6 100644
--- a/lib/eal/include/bus_driver.h
+++ b/lib/eal/include/bus_driver.h
@@ -5,16 +5,16 @@
 #ifndef BUS_DRIVER_H
 #define BUS_DRIVER_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bus.h>
 #include <rte_compat.h>
 #include <rte_dev.h>
 #include <rte_eal.h>
 #include <rte_tailq.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_devargs;
 struct rte_device;
 
diff --git a/lib/eal/include/dev_driver.h b/lib/eal/include/dev_driver.h
index 5efa8c437e..f7a9c17dc3 100644
--- a/lib/eal/include/dev_driver.h
+++ b/lib/eal/include/dev_driver.h
@@ -5,13 +5,13 @@
 #ifndef DEV_DRIVER_H
 #define DEV_DRIVER_H
 
+#include <rte_common.h>
+#include <rte_dev.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_dev.h>
-
 /**
  * A structure describing a device driver.
  */
diff --git a/lib/eal/include/eal_trace_internal.h b/lib/eal/include/eal_trace_internal.h
index 09c354717f..50f91d0929 100644
--- a/lib/eal/include/eal_trace_internal.h
+++ b/lib/eal/include/eal_trace_internal.h
@@ -11,16 +11,16 @@
  * API for EAL trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_alarm.h>
 #include <rte_interrupts.h>
 #include <rte_trace_point.h>
 
 #include "eal_interrupts.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Alarm */
 RTE_TRACE_POINT(
 	rte_eal_trace_alarm_set,
diff --git a/lib/eal/include/generic/rte_atomic.h b/lib/eal/include/generic/rte_atomic.h
index f859707744..0a4f3f8528 100644
--- a/lib/eal/include/generic/rte_atomic.h
+++ b/lib/eal/include/generic/rte_atomic.h
@@ -17,6 +17,10 @@
 #include <rte_common.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __DOXYGEN__
 
 /** @name Memory Barrier
@@ -1156,4 +1160,8 @@ rte_atomic128_cmp_exchange(rte_int128_t *dst,
 
 #endif /* __DOXYGEN__ */
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_ATOMIC_H_ */
diff --git a/lib/eal/include/generic/rte_byteorder.h b/lib/eal/include/generic/rte_byteorder.h
index f1c04ba83e..7973d6326f 100644
--- a/lib/eal/include/generic/rte_byteorder.h
+++ b/lib/eal/include/generic/rte_byteorder.h
@@ -24,6 +24,10 @@
 #include <rte_common.h>
 #include <rte_config.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Compile-time endianness detection
  */
@@ -251,4 +255,8 @@ static uint64_t rte_be_to_cpu_64(rte_be64_t x);
 #endif
 #endif
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_BYTEORDER_H_ */
diff --git a/lib/eal/include/generic/rte_cpuflags.h b/lib/eal/include/generic/rte_cpuflags.h
index d35551e931..bfe9df4516 100644
--- a/lib/eal/include/generic/rte_cpuflags.h
+++ b/lib/eal/include/generic/rte_cpuflags.h
@@ -15,6 +15,10 @@
 
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Structure used to describe platform-specific intrinsics that may or may not
  * be supported at runtime.
@@ -104,4 +108,8 @@ rte_cpu_getauxval(unsigned long type);
 int
 rte_cpu_strcmp_auxval(unsigned long type, const char *str);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_CPUFLAGS_H_ */
diff --git a/lib/eal/include/generic/rte_cycles.h b/lib/eal/include/generic/rte_cycles.h
index 075e899f5a..7cfd51f0eb 100644
--- a/lib/eal/include/generic/rte_cycles.h
+++ b/lib/eal/include/generic/rte_cycles.h
@@ -16,6 +16,10 @@
 #include <rte_debug.h>
 #include <rte_atomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define MS_PER_S 1000
 #define US_PER_S 1000000
 #define NS_PER_S 1000000000
@@ -175,4 +179,8 @@ void rte_delay_us_sleep(unsigned int us);
  */
 void rte_delay_us_callback_register(void(*userfunc)(unsigned int));
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_CYCLES_H_ */
diff --git a/lib/eal/include/generic/rte_io.h b/lib/eal/include/generic/rte_io.h
index ebcf8051e1..73b0f7a9f4 100644
--- a/lib/eal/include/generic/rte_io.h
+++ b/lib/eal/include/generic/rte_io.h
@@ -17,6 +17,10 @@
 #include <rte_compat.h>
 #include <rte_atomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __DOXYGEN__
 
 /**
@@ -396,4 +400,8 @@ rte_write32_wc_relaxed(uint32_t value, volatile void *addr)
 
 #endif /* RTE_OVERRIDE_IO_H */
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_IO_H_ */
diff --git a/lib/eal/include/generic/rte_memcpy.h b/lib/eal/include/generic/rte_memcpy.h
index e7f0f8eaa9..da53b72ca8 100644
--- a/lib/eal/include/generic/rte_memcpy.h
+++ b/lib/eal/include/generic/rte_memcpy.h
@@ -5,6 +5,10 @@
 #ifndef _RTE_MEMCPY_H_
 #define _RTE_MEMCPY_H_
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @file
  *
@@ -113,4 +117,8 @@ rte_memcpy(void *dst, const void *src, size_t n);
 
 #endif /* __DOXYGEN__ */
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_MEMCPY_H_ */
diff --git a/lib/eal/include/generic/rte_pause.h b/lib/eal/include/generic/rte_pause.h
index f2a1eadcbd..968c0886d3 100644
--- a/lib/eal/include/generic/rte_pause.h
+++ b/lib/eal/include/generic/rte_pause.h
@@ -19,6 +19,10 @@
 #include <rte_atomic.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Pause CPU execution for a short while
  *
@@ -136,4 +140,8 @@ rte_wait_until_equal_64(volatile uint64_t *addr, uint64_t expected,
 } while (0)
 #endif /* ! RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED */
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_PAUSE_H_ */
diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h
index ea899f1bfa..86c0559468 100644
--- a/lib/eal/include/generic/rte_power_intrinsics.h
+++ b/lib/eal/include/generic/rte_power_intrinsics.h
@@ -9,6 +9,10 @@
 
 #include <rte_spinlock.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @file
  * Advanced power management operations.
@@ -147,4 +151,8 @@ int rte_power_pause(const uint64_t tsc_timestamp);
 int rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[],
 		const uint32_t num, const uint64_t tsc_timestamp);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_POWER_INTRINSIC_H_ */
diff --git a/lib/eal/include/generic/rte_prefetch.h b/lib/eal/include/generic/rte_prefetch.h
index 773b3b8d1e..f7ac4ab48a 100644
--- a/lib/eal/include/generic/rte_prefetch.h
+++ b/lib/eal/include/generic/rte_prefetch.h
@@ -7,6 +7,10 @@
 
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @file
  *
@@ -146,4 +150,8 @@ __rte_experimental
 static inline void
 rte_cldemote(const volatile void *p);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_PREFETCH_H_ */
diff --git a/lib/eal/include/generic/rte_rwlock.h b/lib/eal/include/generic/rte_rwlock.h
index 5f939be98c..ac0474466a 100644
--- a/lib/eal/include/generic/rte_rwlock.h
+++ b/lib/eal/include/generic/rte_rwlock.h
@@ -22,10 +22,6 @@
  *  https://locklessinc.com/articles/locks/
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <errno.h>
 
 #include <rte_branch_prediction.h>
@@ -34,6 +30,10 @@ extern "C" {
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_rwlock_t type.
  *
diff --git a/lib/eal/include/generic/rte_spinlock.h b/lib/eal/include/generic/rte_spinlock.h
index 23fb04896f..c2980601b2 100644
--- a/lib/eal/include/generic/rte_spinlock.h
+++ b/lib/eal/include/generic/rte_spinlock.h
@@ -25,6 +25,10 @@
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_spinlock_t type.
  */
@@ -318,4 +322,8 @@ __rte_warn_unused_result
 static inline int rte_spinlock_recursive_trylock_tm(
 	rte_spinlock_recursive_t *slr);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_SPINLOCK_H_ */
diff --git a/lib/eal/include/generic/rte_vect.h b/lib/eal/include/generic/rte_vect.h
index 1f84292a41..b87520a4d9 100644
--- a/lib/eal/include/generic/rte_vect.h
+++ b/lib/eal/include/generic/rte_vect.h
@@ -209,6 +209,10 @@ enum rte_vect_max_simd {
 	 */
 };
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Get the supported SIMD bitwidth.
  *
@@ -230,4 +234,8 @@ uint16_t rte_vect_get_max_simd_bitwidth(void);
  */
 int rte_vect_set_max_simd_bitwidth(uint16_t bitwidth);
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif /* _RTE_VECT_H_ */
diff --git a/lib/eal/include/rte_alarm.h b/lib/eal/include/rte_alarm.h
index 7e4d0b2407..9b4721b77f 100644
--- a/lib/eal/include/rte_alarm.h
+++ b/lib/eal/include/rte_alarm.h
@@ -14,12 +14,12 @@
  * Does not require hpet support.
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /**
  * Signature of callback back function called when an alarm goes off.
  */
diff --git a/lib/eal/include/rte_bitmap.h b/lib/eal/include/rte_bitmap.h
index ebe46000a0..abb102f1d3 100644
--- a/lib/eal/include/rte_bitmap.h
+++ b/lib/eal/include/rte_bitmap.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_BITMAP_H__
 #define __INCLUDE_RTE_BITMAP_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Bitmap
@@ -43,6 +39,10 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_prefetch.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Slab */
 #define RTE_BITMAP_SLAB_BIT_SIZE                 64
 #define RTE_BITMAP_SLAB_BIT_SIZE_LOG2            6
diff --git a/lib/eal/include/rte_bus.h b/lib/eal/include/rte_bus.h
index dfe756fb11..519f7b35f0 100644
--- a/lib/eal/include/rte_bus.h
+++ b/lib/eal/include/rte_bus.h
@@ -14,14 +14,14 @@
  * over the devices and drivers in EAL.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_eal.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_bus;
 struct rte_device;
 
diff --git a/lib/eal/include/rte_class.h b/lib/eal/include/rte_class.h
index 16e544ec9a..7631e36e82 100644
--- a/lib/eal/include/rte_class.h
+++ b/lib/eal/include/rte_class.h
@@ -18,12 +18,12 @@
  * cryptographic co-processor (crypto), etc.
  */
 
+#include <rte_dev.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_dev.h>
-
 /** Double linked list of classes */
 RTE_TAILQ_HEAD(rte_class_list, rte_class);
 
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index eec0400dad..2486caa471 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -12,10 +12,6 @@
  * for DPDK.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <assert.h>
 #include <limits.h>
 #include <stdint.h>
@@ -26,6 +22,10 @@ extern "C" {
 /* OS specific include */
 #include <rte_os.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_TOOLCHAIN_MSVC
 #ifndef typeof
 #define typeof __typeof__
diff --git a/lib/eal/include/rte_dev.h b/lib/eal/include/rte_dev.h
index cefa04f905..738400e8d1 100644
--- a/lib/eal/include/rte_dev.h
+++ b/lib/eal/include/rte_dev.h
@@ -13,16 +13,16 @@
  * This file manages the list of device drivers.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_config.h>
 #include <rte_common.h>
 #include <rte_log.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_bus;
 struct rte_devargs;
 struct rte_device;
diff --git a/lib/eal/include/rte_devargs.h b/lib/eal/include/rte_devargs.h
index 515e978bbe..ed5a4675d9 100644
--- a/lib/eal/include/rte_devargs.h
+++ b/lib/eal/include/rte_devargs.h
@@ -16,14 +16,14 @@
  * list of rte_devargs structures.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_dev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_bus;
 
 /**
diff --git a/lib/eal/include/rte_eal_trace.h b/lib/eal/include/rte_eal_trace.h
index c3d15bbe5e..9ad2112801 100644
--- a/lib/eal/include/rte_eal_trace.h
+++ b/lib/eal/include/rte_eal_trace.h
@@ -11,12 +11,12 @@
  * API for EAL trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 /* Generic */
 RTE_TRACE_POINT(
 	rte_eal_trace_generic_void,
diff --git a/lib/eal/include/rte_errno.h b/lib/eal/include/rte_errno.h
index ba45591d24..c49818a40e 100644
--- a/lib/eal/include/rte_errno.h
+++ b/lib/eal/include/rte_errno.h
@@ -11,12 +11,12 @@
 #ifndef _RTE_ERRNO_H_
 #define _RTE_ERRNO_H_
 
+#include <rte_per_lcore.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_per_lcore.h>
-
 RTE_DECLARE_PER_LCORE(int, _rte_errno); /**< Per core error number. */
 
 /**
diff --git a/lib/eal/include/rte_fbarray.h b/lib/eal/include/rte_fbarray.h
index e33076778f..27dbfc2d6c 100644
--- a/lib/eal/include/rte_fbarray.h
+++ b/lib/eal/include/rte_fbarray.h
@@ -30,14 +30,14 @@
  * another process is using ``rte_fbarray``.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 
 #include <rte_rwlock.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_FBARRAY_NAME_LEN 64
 
 struct rte_fbarray {
diff --git a/lib/eal/include/rte_keepalive.h b/lib/eal/include/rte_keepalive.h
index 3ec413da01..9ff870f6b4 100644
--- a/lib/eal/include/rte_keepalive.h
+++ b/lib/eal/include/rte_keepalive.h
@@ -10,13 +10,13 @@
 #ifndef _KEEPALIVE_H_
 #define _KEEPALIVE_H_
 
+#include <rte_config.h>
+#include <rte_memory.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_config.h>
-#include <rte_memory.h>
-
 #ifndef RTE_KEEPALIVE_MAXCORES
 /**
  * Number of cores to track.
diff --git a/lib/eal/include/rte_mcslock.h b/lib/eal/include/rte_mcslock.h
index 0aeb1a09f4..bb218d2e50 100644
--- a/lib/eal/include/rte_mcslock.h
+++ b/lib/eal/include/rte_mcslock.h
@@ -19,16 +19,16 @@
  * they acquired the lock.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_lcore.h>
 #include <rte_common.h>
 #include <rte_pause.h>
 #include <rte_branch_prediction.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_mcslock_t type.
  */
diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
index 842362d527..dbd0a6bedc 100644
--- a/lib/eal/include/rte_memory.h
+++ b/lib/eal/include/rte_memory.h
@@ -15,16 +15,16 @@
 #include <stddef.h>
 #include <stdio.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_bitops.h>
 #include <rte_common.h>
 #include <rte_config.h>
 #include <rte_eal_memconfig.h>
 #include <rte_fbarray.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_PGSIZE_4K   (1ULL << 12)
 #define RTE_PGSIZE_64K  (1ULL << 16)
 #define RTE_PGSIZE_256K (1ULL << 18)
diff --git a/lib/eal/include/rte_pci_dev_features.h b/lib/eal/include/rte_pci_dev_features.h
index ee6e10590c..bc6d3d4c1f 100644
--- a/lib/eal/include/rte_pci_dev_features.h
+++ b/lib/eal/include/rte_pci_dev_features.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_PCI_DEV_FEATURES_H
 #define _RTE_PCI_DEV_FEATURES_H
 
+#include <rte_pci_dev_feature_defs.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_pci_dev_feature_defs.h>
-
 #define RTE_INTR_MODE_NONE_NAME "none"
 #define RTE_INTR_MODE_LEGACY_NAME "legacy"
 #define RTE_INTR_MODE_MSI_NAME "msi"
diff --git a/lib/eal/include/rte_pflock.h b/lib/eal/include/rte_pflock.h
index 37aa223ac3..6797ce5920 100644
--- a/lib/eal/include/rte_pflock.h
+++ b/lib/eal/include/rte_pflock.h
@@ -27,14 +27,14 @@
  * All locks must be initialised before use, and only initialised once.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_pflock_t type.
  */
diff --git a/lib/eal/include/rte_random.h b/lib/eal/include/rte_random.h
index 5031c6fe5f..15cbe6215a 100644
--- a/lib/eal/include/rte_random.h
+++ b/lib/eal/include/rte_random.h
@@ -11,12 +11,12 @@
  * Pseudo-random Generators in RTE
  */
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /**
  * Seed the pseudo-random generator.
  *
diff --git a/lib/eal/include/rte_seqcount.h b/lib/eal/include/rte_seqcount.h
index 88a6746900..d71afa6ab7 100644
--- a/lib/eal/include/rte_seqcount.h
+++ b/lib/eal/include/rte_seqcount.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_SEQCOUNT_H_
 #define _RTE_SEQCOUNT_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Seqcount
@@ -27,6 +23,10 @@ extern "C" {
 #include <rte_branch_prediction.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The RTE seqcount type.
  */
diff --git a/lib/eal/include/rte_seqlock.h b/lib/eal/include/rte_seqlock.h
index 2677bd9440..e0e94900d1 100644
--- a/lib/eal/include/rte_seqlock.h
+++ b/lib/eal/include/rte_seqlock.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_SEQLOCK_H_
 #define _RTE_SEQLOCK_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Seqlock
@@ -95,6 +91,10 @@ extern "C" {
 #include <rte_seqcount.h>
 #include <rte_spinlock.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The RTE seqlock type.
  */
diff --git a/lib/eal/include/rte_service.h b/lib/eal/include/rte_service.h
index e49a7a877e..94919ae584 100644
--- a/lib/eal/include/rte_service.h
+++ b/lib/eal/include/rte_service.h
@@ -23,16 +23,16 @@
  * application has access to the remaining lcores as normal.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include<stdio.h>
 #include <stdint.h>
 
 #include <rte_config.h>
 #include <rte_lcore.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_SERVICE_NAME_MAX 32
 
 /* Capabilities of a service.
diff --git a/lib/eal/include/rte_service_component.h b/lib/eal/include/rte_service_component.h
index a5350c97e5..acdf45cf60 100644
--- a/lib/eal/include/rte_service_component.h
+++ b/lib/eal/include/rte_service_component.h
@@ -10,12 +10,12 @@
  * operate, and you wish to run the component using service cores
  */
 
+#include <rte_service.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_service.h>
-
 /**
  * Signature of callback function to run a service.
  *
diff --git a/lib/eal/include/rte_stdatomic.h b/lib/eal/include/rte_stdatomic.h
index 7a081cb500..0f11a15e4e 100644
--- a/lib/eal/include/rte_stdatomic.h
+++ b/lib/eal/include/rte_stdatomic.h
@@ -7,10 +7,6 @@
 
 #include <assert.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_ENABLE_STDATOMIC
 #ifndef _MSC_VER
 #ifdef __STDC_NO_ATOMICS__
@@ -188,6 +184,7 @@ typedef int rte_memory_order;
 #endif
 
 #ifdef __cplusplus
+extern "C" {
 }
 #endif
 
diff --git a/lib/eal/include/rte_string_fns.h b/lib/eal/include/rte_string_fns.h
index 13badec7b3..702bd81251 100644
--- a/lib/eal/include/rte_string_fns.h
+++ b/lib/eal/include/rte_string_fns.h
@@ -11,10 +11,6 @@
 #ifndef _RTE_STRING_FNS_H_
 #define _RTE_STRING_FNS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <ctype.h>
 #include <stdio.h>
 #include <string.h>
@@ -22,6 +18,10 @@ extern "C" {
 #include <rte_common.h>
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Takes string "string" parameter and splits it at character "delim"
  * up to maxtokens-1 times - to give "maxtokens" resulting tokens. Like
@@ -77,6 +77,10 @@ rte_strlcat(char *dst, const char *src, size_t size)
 	return l + strlen(src);
 }
 
+#ifdef __cplusplus
+}
+#endif
+
 /* pull in a strlcpy function */
 #ifdef RTE_EXEC_ENV_FREEBSD
 #ifndef __BSD_VISIBLE /* non-standard functions are hidden */
@@ -95,6 +99,10 @@ rte_strlcat(char *dst, const char *src, size_t size)
 #endif /* RTE_USE_LIBBSD */
 #endif /* FREEBSD */
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Copy string src to buffer dst of size dsize.
  * At most dsize-1 chars will be copied.
@@ -141,7 +149,6 @@ rte_str_skip_leading_spaces(const char *src)
 	return p;
 }
 
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/include/rte_tailq.h b/lib/eal/include/rte_tailq.h
index 931d549e59..89f7ef2134 100644
--- a/lib/eal/include/rte_tailq.h
+++ b/lib/eal/include/rte_tailq.h
@@ -10,13 +10,13 @@
  *  Here defines rte_tailq APIs for only internal use
  */
 
+#include <stdio.h>
+#include <rte_debug.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdio.h>
-#include <rte_debug.h>
-
 /** dummy structure type used by the rte_tailq APIs */
 struct rte_tailq_entry {
 	RTE_TAILQ_ENTRY(rte_tailq_entry) next; /**< Pointer entries for a tailq list */
diff --git a/lib/eal/include/rte_ticketlock.h b/lib/eal/include/rte_ticketlock.h
index 73884eb07b..e60f60699c 100644
--- a/lib/eal/include/rte_ticketlock.h
+++ b/lib/eal/include/rte_ticketlock.h
@@ -17,15 +17,15 @@
  * All locks must be initialised before use, and only initialised once.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_lcore.h>
 #include <rte_pause.h>
 #include <rte_stdatomic.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * The rte_ticketlock_t type.
  */
diff --git a/lib/eal/include/rte_time.h b/lib/eal/include/rte_time.h
index ec25f7b93d..c5c3a233e4 100644
--- a/lib/eal/include/rte_time.h
+++ b/lib/eal/include/rte_time.h
@@ -5,13 +5,13 @@
 #ifndef _RTE_TIME_H_
 #define _RTE_TIME_H_
 
+#include <stdint.h>
+#include <time.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include <time.h>
-
 #define NSEC_PER_SEC             1000000000L
 
 /**
diff --git a/lib/eal/include/rte_trace.h b/lib/eal/include/rte_trace.h
index a6e991fad3..1c824b2158 100644
--- a/lib/eal/include/rte_trace.h
+++ b/lib/eal/include/rte_trace.h
@@ -16,16 +16,16 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdio.h>
 
 #include <rte_common.h>
 #include <rte_compat.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  *  Test if trace is enabled.
  *
diff --git a/lib/eal/include/rte_trace_point.h b/lib/eal/include/rte_trace_point.h
index 41e2a7f99e..bc737d585e 100644
--- a/lib/eal/include/rte_trace_point.h
+++ b/lib/eal/include/rte_trace_point.h
@@ -16,10 +16,6 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdio.h>
 
@@ -32,6 +28,10 @@ extern "C" {
 #include <rte_string_fns.h>
 #include <rte_uuid.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** The tracepoint object. */
 typedef RTE_ATOMIC(uint64_t) rte_trace_point_t;
 
diff --git a/lib/eal/include/rte_trace_point_register.h b/lib/eal/include/rte_trace_point_register.h
index 41260e5964..8726338fe4 100644
--- a/lib/eal/include/rte_trace_point_register.h
+++ b/lib/eal/include/rte_trace_point_register.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_TRACE_POINT_REGISTER_H_
 #define _RTE_TRACE_POINT_REGISTER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef _RTE_TRACE_POINT_H_
 #error for registration, include this file first before <rte_trace_point.h>
 #endif
@@ -16,6 +12,10 @@ extern "C" {
 #include <rte_per_lcore.h>
 #include <rte_trace_point.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_DECLARE_PER_LCORE(volatile int, trace_point_sz);
 
 #define RTE_TRACE_POINT_REGISTER(trace, name) \
diff --git a/lib/eal/include/rte_uuid.h b/lib/eal/include/rte_uuid.h
index cfefd4308a..def5907a00 100644
--- a/lib/eal/include/rte_uuid.h
+++ b/lib/eal/include/rte_uuid.h
@@ -10,14 +10,14 @@
 #ifndef _RTE_UUID_H_
 #define _RTE_UUID_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stddef.h>
 #include <string.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Struct describing a Universal Unique Identifier
  */
diff --git a/lib/eal/include/rte_version.h b/lib/eal/include/rte_version.h
index 422d00fdff..be3f753617 100644
--- a/lib/eal/include/rte_version.h
+++ b/lib/eal/include/rte_version.h
@@ -10,13 +10,13 @@
 #ifndef _RTE_VERSION_H_
 #define _RTE_VERSION_H_
 
+#include <string.h>
+#include <stdio.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <string.h>
-#include <stdio.h>
-
 /**
  * Macro to compute a version number usable for comparisons
  */
diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
index b774625d9f..06b249dca0 100644
--- a/lib/eal/include/rte_vfio.h
+++ b/lib/eal/include/rte_vfio.h
@@ -10,10 +10,6 @@
  * RTE VFIO. This library provides various VFIO related utility functions.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 #include <stdint.h>
 
@@ -36,6 +32,10 @@ extern "C" {
 
 #include <linux/vfio.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define VFIO_DIR "/dev/vfio"
 #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
 #define VFIO_GROUP_FMT "/dev/vfio/%u"
diff --git a/lib/eal/linux/include/rte_os.h b/lib/eal/linux/include/rte_os.h
index c72bf5b7e6..dba0e29827 100644
--- a/lib/eal/linux/include/rte_os.h
+++ b/lib/eal/linux/include/rte_os.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_OS_H_
 #define _RTE_OS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * This header should contain any definition
  * which is not supported natively or named differently in Linux.
@@ -17,6 +13,10 @@ extern "C" {
 #include <sched.h>
 #include <sys/queue.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* These macros are compatible with system's sys/queue.h. */
 #define RTE_TAILQ_HEAD(name, type) TAILQ_HEAD(name, type)
 #define RTE_TAILQ_ENTRY(type) TAILQ_ENTRY(type)
diff --git a/lib/eal/loongarch/include/rte_atomic.h b/lib/eal/loongarch/include/rte_atomic.h
index 0510b8f781..c8066a4612 100644
--- a/lib/eal/loongarch/include/rte_atomic.h
+++ b/lib/eal/loongarch/include/rte_atomic.h
@@ -9,13 +9,13 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include <rte_common.h>
+#include "generic/rte_atomic.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_atomic.h"
-
 #define rte_mb()	do { asm volatile("dbar 0":::"memory"); } while (0)
 
 #define rte_wmb()	rte_mb()
diff --git a/lib/eal/loongarch/include/rte_byteorder.h b/lib/eal/loongarch/include/rte_byteorder.h
index 0da6097a4f..9b092e2a59 100644
--- a/lib/eal/loongarch/include/rte_byteorder.h
+++ b/lib/eal/loongarch/include/rte_byteorder.h
@@ -5,12 +5,12 @@
 #ifndef RTE_BYTEORDER_LOONGARCH_H
 #define RTE_BYTEORDER_LOONGARCH_H
 
+#include "generic/rte_byteorder.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_byteorder.h"
-
 #if RTE_BYTE_ORDER == RTE_LITTLE_ENDIAN
 
 #define rte_cpu_to_le_16(x) (x)
diff --git a/lib/eal/loongarch/include/rte_cpuflags.h b/lib/eal/loongarch/include/rte_cpuflags.h
index 6b592c147c..c1e04ac545 100644
--- a/lib/eal/loongarch/include/rte_cpuflags.h
+++ b/lib/eal/loongarch/include/rte_cpuflags.h
@@ -5,10 +5,6 @@
 #ifndef RTE_CPUFLAGS_LOONGARCH_H
 #define RTE_CPUFLAGS_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -30,6 +26,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/loongarch/include/rte_cycles.h b/lib/eal/loongarch/include/rte_cycles.h
index f612d1ad10..128c8646e9 100644
--- a/lib/eal/loongarch/include/rte_cycles.h
+++ b/lib/eal/loongarch/include/rte_cycles.h
@@ -5,12 +5,12 @@
 #ifndef RTE_CYCLES_LOONGARCH_H
 #define RTE_CYCLES_LOONGARCH_H
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 /**
  * Read the time base register.
  *
diff --git a/lib/eal/loongarch/include/rte_io.h b/lib/eal/loongarch/include/rte_io.h
index 40e40efa86..e32a4737b2 100644
--- a/lib/eal/loongarch/include/rte_io.h
+++ b/lib/eal/loongarch/include/rte_io.h
@@ -5,12 +5,12 @@
 #ifndef RTE_IO_LOONGARCH_H
 #define RTE_IO_LOONGARCH_H
 
+#include "generic/rte_io.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_io.h"
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/loongarch/include/rte_memcpy.h b/lib/eal/loongarch/include/rte_memcpy.h
index 22578d40f4..5412a0fdc1 100644
--- a/lib/eal/loongarch/include/rte_memcpy.h
+++ b/lib/eal/loongarch/include/rte_memcpy.h
@@ -10,12 +10,12 @@
 
 #include "rte_common.h"
 
+#include "generic/rte_memcpy.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_memcpy.h"
-
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
diff --git a/lib/eal/loongarch/include/rte_pause.h b/lib/eal/loongarch/include/rte_pause.h
index 4302e1b9be..cffa2874d6 100644
--- a/lib/eal/loongarch/include/rte_pause.h
+++ b/lib/eal/loongarch/include/rte_pause.h
@@ -5,14 +5,14 @@
 #ifndef RTE_PAUSE_LOONGARCH_H
 #define RTE_PAUSE_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_atomic.h"
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 }
diff --git a/lib/eal/loongarch/include/rte_power_intrinsics.h b/lib/eal/loongarch/include/rte_power_intrinsics.h
index d5dbd94567..9e11478206 100644
--- a/lib/eal/loongarch/include/rte_power_intrinsics.h
+++ b/lib/eal/loongarch/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef RTE_POWER_INTRINSIC_LOONGARCH_H
 #define RTE_POWER_INTRINSIC_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/loongarch/include/rte_prefetch.h b/lib/eal/loongarch/include/rte_prefetch.h
index 64b1fd2c2a..8da08a5566 100644
--- a/lib/eal/loongarch/include/rte_prefetch.h
+++ b/lib/eal/loongarch/include/rte_prefetch.h
@@ -5,14 +5,14 @@
 #ifndef RTE_PREFETCH_LOONGARCH_H
 #define RTE_PREFETCH_LOONGARCH_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	__builtin_prefetch((const void *)(uintptr_t)p, 0, 3);
diff --git a/lib/eal/loongarch/include/rte_rwlock.h b/lib/eal/loongarch/include/rte_rwlock.h
index aedc6f3349..48924599c5 100644
--- a/lib/eal/loongarch/include/rte_rwlock.h
+++ b/lib/eal/loongarch/include/rte_rwlock.h
@@ -5,12 +5,12 @@
 #ifndef RTE_RWLOCK_LOONGARCH_H
 #define RTE_RWLOCK_LOONGARCH_H
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/loongarch/include/rte_spinlock.h b/lib/eal/loongarch/include/rte_spinlock.h
index e8d34e9728..38f00f631d 100644
--- a/lib/eal/loongarch/include/rte_spinlock.h
+++ b/lib/eal/loongarch/include/rte_spinlock.h
@@ -5,13 +5,13 @@
 #ifndef RTE_SPINLOCK_LOONGARCH_H
 #define RTE_SPINLOCK_LOONGARCH_H
 
+#include <rte_common.h>
+#include "generic/rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_spinlock.h"
-
 #ifndef RTE_FORCE_INTRINSICS
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
diff --git a/lib/eal/ppc/include/rte_atomic.h b/lib/eal/ppc/include/rte_atomic.h
index 645c7132df..6ce2e5188a 100644
--- a/lib/eal/ppc/include/rte_atomic.h
+++ b/lib/eal/ppc/include/rte_atomic.h
@@ -12,13 +12,13 @@
 #ifndef _RTE_ATOMIC_PPC_64_H_
 #define _RTE_ATOMIC_PPC_64_H_
 
+#include <stdint.h>
+#include "generic/rte_atomic.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include "generic/rte_atomic.h"
-
 #define	rte_mb()  asm volatile("sync" : : : "memory")
 
 #define	rte_wmb() asm volatile("sync" : : : "memory")
diff --git a/lib/eal/ppc/include/rte_byteorder.h b/lib/eal/ppc/include/rte_byteorder.h
index de94e2ad32..1d19e96f72 100644
--- a/lib/eal/ppc/include/rte_byteorder.h
+++ b/lib/eal/ppc/include/rte_byteorder.h
@@ -8,13 +8,13 @@
 #ifndef _RTE_BYTEORDER_PPC_64_H_
 #define _RTE_BYTEORDER_PPC_64_H_
 
+#include <stdint.h>
+#include "generic/rte_byteorder.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include "generic/rte_byteorder.h"
-
 /*
  * An architecture-optimized byte swap for a 16-bit value.
  *
diff --git a/lib/eal/ppc/include/rte_cpuflags.h b/lib/eal/ppc/include/rte_cpuflags.h
index dedc1ab469..b7bb8f6872 100644
--- a/lib/eal/ppc/include/rte_cpuflags.h
+++ b/lib/eal/ppc/include/rte_cpuflags.h
@@ -6,10 +6,6 @@
 #ifndef _RTE_CPUFLAGS_PPC_64_H_
 #define _RTE_CPUFLAGS_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -52,6 +48,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/ppc/include/rte_cycles.h b/lib/eal/ppc/include/rte_cycles.h
index 666fc9b0bf..1e6e6cccc8 100644
--- a/lib/eal/ppc/include/rte_cycles.h
+++ b/lib/eal/ppc/include/rte_cycles.h
@@ -6,10 +6,6 @@
 #ifndef _RTE_CYCLES_PPC_64_H_
 #define _RTE_CYCLES_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <features.h>
 #ifdef __GLIBC__
 #include <sys/platform/ppc.h>
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_byteorder.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Read the time base register.
  *
diff --git a/lib/eal/ppc/include/rte_io.h b/lib/eal/ppc/include/rte_io.h
index 01455065e5..c9371b784e 100644
--- a/lib/eal/ppc/include/rte_io.h
+++ b/lib/eal/ppc/include/rte_io.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_IO_PPC_64_H_
 #define _RTE_IO_PPC_64_H_
 
+#include "generic/rte_io.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_io.h"
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/ppc/include/rte_memcpy.h b/lib/eal/ppc/include/rte_memcpy.h
index 6f388c0234..eae73128c4 100644
--- a/lib/eal/ppc/include/rte_memcpy.h
+++ b/lib/eal/ppc/include/rte_memcpy.h
@@ -12,12 +12,12 @@
 #include "rte_altivec.h"
 #include "rte_common.h"
 
+#include "generic/rte_memcpy.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_memcpy.h"
-
 #if defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION >= 90000)
 #pragma GCC diagnostic push
 #pragma GCC diagnostic ignored "-Warray-bounds"
diff --git a/lib/eal/ppc/include/rte_pause.h b/lib/eal/ppc/include/rte_pause.h
index 16e47ce22f..78a73aceed 100644
--- a/lib/eal/ppc/include/rte_pause.h
+++ b/lib/eal/ppc/include/rte_pause.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_PAUSE_PPC64_H_
 #define _RTE_PAUSE_PPC64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_atomic.h"
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 	/* Set hardware multi-threading low priority */
diff --git a/lib/eal/ppc/include/rte_power_intrinsics.h b/lib/eal/ppc/include/rte_power_intrinsics.h
index c0e9ac279f..6207eeb04d 100644
--- a/lib/eal/ppc/include/rte_power_intrinsics.h
+++ b/lib/eal/ppc/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_POWER_INTRINSIC_PPC_H_
 #define _RTE_POWER_INTRINSIC_PPC_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/ppc/include/rte_prefetch.h b/lib/eal/ppc/include/rte_prefetch.h
index 2e1b5751e0..bae95af7bf 100644
--- a/lib/eal/ppc/include/rte_prefetch.h
+++ b/lib/eal/ppc/include/rte_prefetch.h
@@ -6,14 +6,14 @@
 #ifndef _RTE_PREFETCH_PPC_64_H_
 #define _RTE_PREFETCH_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	asm volatile ("dcbt 0,%[p],0" : : [p] "r" (p));
diff --git a/lib/eal/ppc/include/rte_rwlock.h b/lib/eal/ppc/include/rte_rwlock.h
index 9fadc04076..bee8da4070 100644
--- a/lib/eal/ppc/include/rte_rwlock.h
+++ b/lib/eal/ppc/include/rte_rwlock.h
@@ -3,12 +3,12 @@
 #ifndef _RTE_RWLOCK_PPC_64_H_
 #define _RTE_RWLOCK_PPC_64_H_
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/ppc/include/rte_spinlock.h b/lib/eal/ppc/include/rte_spinlock.h
index 3a4c905b22..77f90f974a 100644
--- a/lib/eal/ppc/include/rte_spinlock.h
+++ b/lib/eal/ppc/include/rte_spinlock.h
@@ -6,14 +6,14 @@
 #ifndef _RTE_SPINLOCK_PPC_64_H_
 #define _RTE_SPINLOCK_PPC_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_pause.h>
 #include "generic/rte_spinlock.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Fixme: Use intrinsics to implement the spinlock on Power architecture */
 
 #ifndef RTE_FORCE_INTRINSICS
diff --git a/lib/eal/riscv/include/rte_atomic.h b/lib/eal/riscv/include/rte_atomic.h
index 2603bc90ea..66346ad474 100644
--- a/lib/eal/riscv/include/rte_atomic.h
+++ b/lib/eal/riscv/include/rte_atomic.h
@@ -12,15 +12,15 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include <rte_config.h>
 #include "generic/rte_atomic.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define rte_mb()	asm volatile("fence rw, rw" : : : "memory")
 
 #define rte_wmb()	asm volatile("fence w, w" : : : "memory")
diff --git a/lib/eal/riscv/include/rte_byteorder.h b/lib/eal/riscv/include/rte_byteorder.h
index 25bd0c275d..c9ff5c0dd1 100644
--- a/lib/eal/riscv/include/rte_byteorder.h
+++ b/lib/eal/riscv/include/rte_byteorder.h
@@ -8,14 +8,14 @@
 #ifndef RTE_BYTEORDER_RISCV_H
 #define RTE_BYTEORDER_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include "generic/rte_byteorder.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_BYTE_ORDER
 #define RTE_BYTE_ORDER RTE_LITTLE_ENDIAN
 #endif
diff --git a/lib/eal/riscv/include/rte_cpuflags.h b/lib/eal/riscv/include/rte_cpuflags.h
index d742efc40f..ac2004f02d 100644
--- a/lib/eal/riscv/include/rte_cpuflags.h
+++ b/lib/eal/riscv/include/rte_cpuflags.h
@@ -8,10 +8,6 @@
 #ifndef RTE_CPUFLAGS_RISCV_H
 #define RTE_CPUFLAGS_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * Enumeration of all CPU features supported
  */
@@ -46,6 +42,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/riscv/include/rte_cycles.h b/lib/eal/riscv/include/rte_cycles.h
index 04750ca253..7926809a73 100644
--- a/lib/eal/riscv/include/rte_cycles.h
+++ b/lib/eal/riscv/include/rte_cycles.h
@@ -8,12 +8,12 @@
 #ifndef RTE_CYCLES_RISCV_H
 #define RTE_CYCLES_RISCV_H
 
+#include "generic/rte_cycles.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_cycles.h"
-
 #ifndef RTE_RISCV_RDTSC_USE_HPM
 #define RTE_RISCV_RDTSC_USE_HPM 0
 #endif
diff --git a/lib/eal/riscv/include/rte_io.h b/lib/eal/riscv/include/rte_io.h
index 29659c9590..911dbb6bd2 100644
--- a/lib/eal/riscv/include/rte_io.h
+++ b/lib/eal/riscv/include/rte_io.h
@@ -8,12 +8,12 @@
 #ifndef RTE_IO_RISCV_H
 #define RTE_IO_RISCV_H
 
+#include "generic/rte_io.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_io.h"
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/riscv/include/rte_memcpy.h b/lib/eal/riscv/include/rte_memcpy.h
index e34f19396e..d8a942c5d2 100644
--- a/lib/eal/riscv/include/rte_memcpy.h
+++ b/lib/eal/riscv/include/rte_memcpy.h
@@ -12,12 +12,12 @@
 
 #include "rte_common.h"
 
+#include "generic/rte_memcpy.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_memcpy.h"
-
 static inline void
 rte_mov16(uint8_t *dst, const uint8_t *src)
 {
diff --git a/lib/eal/riscv/include/rte_pause.h b/lib/eal/riscv/include/rte_pause.h
index cb8e9ca52d..3f473cd8db 100644
--- a/lib/eal/riscv/include/rte_pause.h
+++ b/lib/eal/riscv/include/rte_pause.h
@@ -7,14 +7,14 @@
 #ifndef RTE_PAUSE_RISCV_H
 #define RTE_PAUSE_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_atomic.h"
 
 #include "generic/rte_pause.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_pause(void)
 {
 	/* Insert pause hint directly to be compatible with old compilers.
diff --git a/lib/eal/riscv/include/rte_power_intrinsics.h b/lib/eal/riscv/include/rte_power_intrinsics.h
index 636e58e71f..3f7dba1640 100644
--- a/lib/eal/riscv/include/rte_power_intrinsics.h
+++ b/lib/eal/riscv/include/rte_power_intrinsics.h
@@ -7,14 +7,14 @@
 #ifndef RTE_POWER_INTRINSIC_RISCV_H
 #define RTE_POWER_INTRINSIC_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/riscv/include/rte_prefetch.h b/lib/eal/riscv/include/rte_prefetch.h
index 748cf1b626..42146491ea 100644
--- a/lib/eal/riscv/include/rte_prefetch.h
+++ b/lib/eal/riscv/include/rte_prefetch.h
@@ -8,14 +8,14 @@
 #ifndef RTE_PREFETCH_RISCV_H
 #define RTE_PREFETCH_RISCV_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 	RTE_SET_USED(p);
diff --git a/lib/eal/riscv/include/rte_rwlock.h b/lib/eal/riscv/include/rte_rwlock.h
index 9cdaf1b0ef..730970eecb 100644
--- a/lib/eal/riscv/include/rte_rwlock.h
+++ b/lib/eal/riscv/include/rte_rwlock.h
@@ -7,12 +7,12 @@
 #ifndef RTE_RWLOCK_RISCV_H
 #define RTE_RWLOCK_RISCV_H
 
+#include "generic/rte_rwlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 {
diff --git a/lib/eal/riscv/include/rte_spinlock.h b/lib/eal/riscv/include/rte_spinlock.h
index 6af430735c..5fe4980e44 100644
--- a/lib/eal/riscv/include/rte_spinlock.h
+++ b/lib/eal/riscv/include/rte_spinlock.h
@@ -12,13 +12,13 @@
 #  error Platform must be built with RTE_FORCE_INTRINSICS
 #endif
 
+#include <rte_common.h>
+#include "generic/rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include "generic/rte_spinlock.h"
-
 static inline int rte_tm_supported(void)
 {
 	return 0;
diff --git a/lib/eal/windows/include/pthread.h b/lib/eal/windows/include/pthread.h
index 051b9311c2..e1c31017d1 100644
--- a/lib/eal/windows/include/pthread.h
+++ b/lib/eal/windows/include/pthread.h
@@ -13,13 +13,13 @@
  * eal_common_thread.c and common\include\rte_per_lcore.h as Microsoft libc
  * does not contain pthread.h. This may be removed in future releases.
  */
+#include <rte_common.h>
+#include <rte_windows.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_windows.h>
-
 #define PTHREAD_BARRIER_SERIAL_THREAD TRUE
 
 /* defining pthread_t type on Windows since there is no in Microsoft libc*/
diff --git a/lib/eal/windows/include/regex.h b/lib/eal/windows/include/regex.h
index 827f938414..a224c0cd29 100644
--- a/lib/eal/windows/include/regex.h
+++ b/lib/eal/windows/include/regex.h
@@ -10,15 +10,15 @@
  * as Microsoft libc does not contain regex.h. This may be removed in
  * future releases.
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #define REG_NOMATCH 1
 #define REG_ESPACE 12
 
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* defining regex_t for Windows */
 typedef void *regex_t;
 /* defining regmatch_t for Windows */
diff --git a/lib/eal/windows/include/rte_windows.h b/lib/eal/windows/include/rte_windows.h
index 567ed7d820..e78f007ffa 100644
--- a/lib/eal/windows/include/rte_windows.h
+++ b/lib/eal/windows/include/rte_windows.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_WINDOWS_H_
 #define _RTE_WINDOWS_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file Windows-specific facilities
  *
@@ -44,6 +40,10 @@ extern "C" {
 #include <devguid.h>
 #include <rte_log.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Log GetLastError() with context, usually a Win32 API function and arguments.
  */
diff --git a/lib/eal/x86/include/rte_atomic.h b/lib/eal/x86/include/rte_atomic.h
index 74b1b24b7a..c72c47c83e 100644
--- a/lib/eal/x86/include/rte_atomic.h
+++ b/lib/eal/x86/include/rte_atomic.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_ATOMIC_X86_H_
 #define _RTE_ATOMIC_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include <rte_config.h>
@@ -31,6 +27,10 @@ extern "C" {
 
 #define rte_smp_rmb() rte_compiler_barrier()
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * From Intel Software Development Manual; Vol 3;
  * 8.2.2 Memory Ordering in P6 and More Recent Processor Families:
@@ -99,10 +99,18 @@ rte_atomic_thread_fence(rte_memory_order memorder)
 		__rte_atomic_thread_fence(memorder);
 }
 
+#ifdef __cplusplus
+}
+#endif
+
 #ifndef RTE_TOOLCHAIN_MSVC
 
 /*------------------------- 16 bit atomic operations -------------------------*/
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_FORCE_INTRINSICS
 static inline int
 rte_atomic16_cmpset(volatile uint16_t *dst, uint16_t exp, uint16_t src)
@@ -273,6 +281,11 @@ static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
 			);
 	return ret != 0;
 }
+
+#ifdef __cplusplus
+}
+#endif
+
 #endif
 
 #ifdef RTE_ARCH_I686
@@ -283,8 +296,4 @@ static inline int rte_atomic32_dec_and_test(rte_atomic32_t *v)
 
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* _RTE_ATOMIC_X86_H_ */
diff --git a/lib/eal/x86/include/rte_byteorder.h b/lib/eal/x86/include/rte_byteorder.h
index adbec0c157..5a49ffcd50 100644
--- a/lib/eal/x86/include/rte_byteorder.h
+++ b/lib/eal/x86/include/rte_byteorder.h
@@ -5,15 +5,15 @@
 #ifndef _RTE_BYTEORDER_X86_H_
 #define _RTE_BYTEORDER_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_common.h>
 #include <rte_config.h>
 #include "generic/rte_byteorder.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_BYTE_ORDER
 #define RTE_BYTE_ORDER RTE_LITTLE_ENDIAN
 #endif
@@ -48,6 +48,10 @@ static inline uint32_t rte_arch_bswap32(uint32_t _x)
 	return x;
 }
 
+#ifdef __cplusplus
+}
+#endif
+
 #define rte_bswap16(x) ((uint16_t)(__builtin_constant_p(x) ?		\
 				   rte_constant_bswap16(x) :		\
 				   rte_arch_bswap16(x)))
@@ -83,8 +87,4 @@ static inline uint32_t rte_arch_bswap32(uint32_t _x)
 #define rte_be_to_cpu_32(x) rte_bswap32(x)
 #define rte_be_to_cpu_64(x) rte_bswap64(x)
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* _RTE_BYTEORDER_X86_H_ */
diff --git a/lib/eal/x86/include/rte_cpuflags.h b/lib/eal/x86/include/rte_cpuflags.h
index 1ee00e70fe..e843d1e5f4 100644
--- a/lib/eal/x86/include/rte_cpuflags.h
+++ b/lib/eal/x86/include/rte_cpuflags.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_CPUFLAGS_X86_64_H_
 #define _RTE_CPUFLAGS_X86_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 enum rte_cpu_flag_t {
 	/* (EAX 01h) ECX features*/
 	RTE_CPUFLAG_SSE3 = 0,               /**< SSE3 */
@@ -138,6 +134,10 @@ enum rte_cpu_flag_t {
 
 #include "generic/rte_cpuflags.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/x86/include/rte_cycles.h b/lib/eal/x86/include/rte_cycles.h
index 2afe85e28c..8de43840da 100644
--- a/lib/eal/x86/include/rte_cycles.h
+++ b/lib/eal/x86/include/rte_cycles.h
@@ -12,10 +12,6 @@
 #include <x86intrin.h>
 #endif
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_cycles.h"
 
 #ifdef RTE_LIBRTE_EAL_VMWARE_TSC_MAP_SUPPORT
@@ -26,6 +22,10 @@ extern int rte_cycles_vmware_tsc_map;
 #include <rte_common.h>
 #include <rte_config.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline uint64_t
 rte_rdtsc(void)
 {
diff --git a/lib/eal/x86/include/rte_io.h b/lib/eal/x86/include/rte_io.h
index 0e1fefdee1..c11cb8cd89 100644
--- a/lib/eal/x86/include/rte_io.h
+++ b/lib/eal/x86/include/rte_io.h
@@ -5,16 +5,16 @@
 #ifndef _RTE_IO_X86_H_
 #define _RTE_IO_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include "rte_cpuflags.h"
 
 #define RTE_NATIVE_WRITE32_WC
 #include "generic/rte_io.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * MOVDIRI wrapper.
diff --git a/lib/eal/x86/include/rte_pause.h b/lib/eal/x86/include/rte_pause.h
index b4cf1df1d0..54f028b295 100644
--- a/lib/eal/x86/include/rte_pause.h
+++ b/lib/eal/x86/include/rte_pause.h
@@ -5,13 +5,14 @@
 #ifndef _RTE_PAUSE_X86_H_
 #define _RTE_PAUSE_X86_H_
 
+#include "generic/rte_pause.h"
+
+#include <emmintrin.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_pause.h"
-
-#include <emmintrin.h>
 static inline void rte_pause(void)
 {
 	_mm_pause();
diff --git a/lib/eal/x86/include/rte_power_intrinsics.h b/lib/eal/x86/include/rte_power_intrinsics.h
index e4c2b87f73..fcb780fc5b 100644
--- a/lib/eal/x86/include/rte_power_intrinsics.h
+++ b/lib/eal/x86/include/rte_power_intrinsics.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_POWER_INTRINSIC_X86_H_
 #define _RTE_POWER_INTRINSIC_X86_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 
 #include "generic/rte_power_intrinsics.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/x86/include/rte_prefetch.h b/lib/eal/x86/include/rte_prefetch.h
index 8a9377714f..34a609cc65 100644
--- a/lib/eal/x86/include/rte_prefetch.h
+++ b/lib/eal/x86/include/rte_prefetch.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_PREFETCH_X86_64_H_
 #define _RTE_PREFETCH_X86_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifdef RTE_TOOLCHAIN_MSVC
 #include <emmintrin.h>
 #endif
@@ -17,6 +13,10 @@ extern "C" {
 #include <rte_common.h>
 #include "generic/rte_prefetch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline void rte_prefetch0(const volatile void *p)
 {
 #ifdef RTE_TOOLCHAIN_MSVC
diff --git a/lib/eal/x86/include/rte_rwlock.h b/lib/eal/x86/include/rte_rwlock.h
index 1796b69265..281eff33b9 100644
--- a/lib/eal/x86/include/rte_rwlock.h
+++ b/lib/eal/x86/include/rte_rwlock.h
@@ -5,13 +5,13 @@
 #ifndef _RTE_RWLOCK_X86_64_H_
 #define _RTE_RWLOCK_X86_64_H_
 
+#include "generic/rte_rwlock.h"
+#include "rte_spinlock.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "generic/rte_rwlock.h"
-#include "rte_spinlock.h"
-
 static inline void
 rte_rwlock_read_lock_tm(rte_rwlock_t *rwl)
 	__rte_no_thread_safety_analysis
diff --git a/lib/eal/x86/include/rte_spinlock.h b/lib/eal/x86/include/rte_spinlock.h
index a6c23ea1f6..a14da41964 100644
--- a/lib/eal/x86/include/rte_spinlock.h
+++ b/lib/eal/x86/include/rte_spinlock.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_SPINLOCK_X86_64_H_
 #define _RTE_SPINLOCK_X86_64_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "generic/rte_spinlock.h"
 #include "rte_rtm.h"
 #include "rte_cpuflags.h"
@@ -17,6 +13,10 @@ extern "C" {
 #include "rte_pause.h"
 #include "rte_cycles.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_RTM_MAX_RETRIES (20)
 #define RTE_XABORT_LOCK_BUSY (0xff)
 
@@ -182,7 +182,6 @@ rte_spinlock_recursive_trylock_tm(rte_spinlock_recursive_t *slr)
 	return rte_spinlock_recursive_trylock(slr);
 }
 
-
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 883e59a927..ae00ead865 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_ETHDEV_DRIVER_H_
 #define _RTE_ETHDEV_DRIVER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -24,6 +20,10 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_ethdev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Structure used to hold information about the callbacks to be called for a
diff --git a/lib/ethdev/ethdev_pci.h b/lib/ethdev/ethdev_pci.h
index ec4f731270..2229ffa252 100644
--- a/lib/ethdev/ethdev_pci.h
+++ b/lib/ethdev/ethdev_pci.h
@@ -6,16 +6,16 @@
 #ifndef _RTE_ETHDEV_PCI_H_
 #define _RTE_ETHDEV_PCI_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_malloc.h>
 #include <rte_pci.h>
 #include <bus_pci_driver.h>
 #include <rte_config.h>
 #include <ethdev_driver.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Copy pci device info to the Ethernet device data.
  * Shared memory (eth_dev->data) only updated by primary process, so it is safe
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 3bec87bfdb..36a38f718a 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -11,10 +11,6 @@
  * API for ethdev trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <dev_driver.h>
 #include <rte_trace_point.h>
 
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_mtr.h"
 #include "rte_tm.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_ethdev_trace_configure,
 	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t nb_rx_q,
diff --git a/lib/ethdev/ethdev_vdev.h b/lib/ethdev/ethdev_vdev.h
index 364f140f91..010ec75a00 100644
--- a/lib/ethdev/ethdev_vdev.h
+++ b/lib/ethdev/ethdev_vdev.h
@@ -6,15 +6,15 @@
 #ifndef _RTE_ETHDEV_VDEV_H_
 #define _RTE_ETHDEV_VDEV_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_config.h>
 #include <rte_malloc.h>
 #include <bus_vdev_driver.h>
 #include <ethdev_driver.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Allocates a new ethdev slot for an Ethernet device and returns the pointer
diff --git a/lib/ethdev/rte_cman.h b/lib/ethdev/rte_cman.h
index 297db8e095..dedd6cb71a 100644
--- a/lib/ethdev/rte_cman.h
+++ b/lib/ethdev/rte_cman.h
@@ -5,12 +5,12 @@
 #ifndef RTE_CMAN_H
 #define RTE_CMAN_H
 
+#include <rte_bitops.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_bitops.h>
-
 /**
  * @file
  * Congestion management related parameters for DPDK.
diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
index 67cf0ae526..4fde2ad408 100644
--- a/lib/ethdev/rte_dev_info.h
+++ b/lib/ethdev/rte_dev_info.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_DEV_INFO_H_
 #define _RTE_DEV_INFO_H_
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /*
  * Placeholder for accessing device registers
  */
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 548fada1c7..a75e26bf07 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -145,10 +145,6 @@
  * a 0 value by the receive function of the driver for a given number of tries.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 /* Use this macro to check if LRO API is supported */
@@ -5966,6 +5962,10 @@ int rte_eth_cman_config_get(uint16_t port_id, struct rte_eth_cman_config *config
 
 #include <rte_ethdev_core.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Helper routine for rte_eth_rx_burst().
diff --git a/lib/ethdev/rte_ethdev_trace_fp.h b/lib/ethdev/rte_ethdev_trace_fp.h
index 40b6e4756b..c11b4f18f7 100644
--- a/lib/ethdev/rte_ethdev_trace_fp.h
+++ b/lib/ethdev/rte_ethdev_trace_fp.h
@@ -11,12 +11,12 @@
  * API for ethdev trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_ethdev_trace_rx_burst,
 	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
diff --git a/lib/eventdev/event_timer_adapter_pmd.h b/lib/eventdev/event_timer_adapter_pmd.h
index cd5127f047..fffcd90c8f 100644
--- a/lib/eventdev/event_timer_adapter_pmd.h
+++ b/lib/eventdev/event_timer_adapter_pmd.h
@@ -16,12 +16,12 @@
  * versioning.
  */
 
+#include "rte_event_timer_adapter.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "rte_event_timer_adapter.h"
-
 /*
  * Definitions of functions exported by an event timer adapter implementation
  * through *rte_event_timer_adapter_ops* structure supplied in the
diff --git a/lib/eventdev/eventdev_pmd.h b/lib/eventdev/eventdev_pmd.h
index 7a5699f14b..fd5f7a14f4 100644
--- a/lib/eventdev/eventdev_pmd.h
+++ b/lib/eventdev/eventdev_pmd.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_EVENTDEV_PMD_H_
 #define _RTE_EVENTDEV_PMD_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Event PMD APIs
  *
@@ -31,6 +27,10 @@ extern "C" {
 #include "event_timer_adapter_pmd.h"
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 extern int rte_event_logtype;
 #define RTE_LOGTYPE_EVENTDEV rte_event_logtype
 
diff --git a/lib/eventdev/eventdev_pmd_pci.h b/lib/eventdev/eventdev_pmd_pci.h
index 26aa3a6635..5cb5916a84 100644
--- a/lib/eventdev/eventdev_pmd_pci.h
+++ b/lib/eventdev/eventdev_pmd_pci.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_EVENTDEV_PMD_PCI_H_
 #define _RTE_EVENTDEV_PMD_PCI_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Eventdev PCI PMD APIs
  *
@@ -28,6 +24,10 @@ extern "C" {
 
 #include "eventdev_pmd.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 typedef int (*eventdev_pmd_pci_callback_t)(struct rte_eventdev *dev);
 
 /**
diff --git a/lib/eventdev/eventdev_pmd_vdev.h b/lib/eventdev/eventdev_pmd_vdev.h
index bb433ba955..4eaefa0b0b 100644
--- a/lib/eventdev/eventdev_pmd_vdev.h
+++ b/lib/eventdev/eventdev_pmd_vdev.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_EVENTDEV_PMD_VDEV_H_
 #define _RTE_EVENTDEV_PMD_VDEV_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /** @file
  * RTE Eventdev VDEV PMD APIs
  *
@@ -27,6 +23,10 @@ extern "C" {
 
 #include "eventdev_pmd.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  * Creates a new virtual event device and returns the pointer to that device.
diff --git a/lib/eventdev/eventdev_trace.h b/lib/eventdev/eventdev_trace.h
index 9c2b261c06..8ff8841729 100644
--- a/lib/eventdev/eventdev_trace.h
+++ b/lib/eventdev/eventdev_trace.h
@@ -11,10 +11,6 @@
  * API for ethdev trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_trace_point.h>
 
 #include "rte_eventdev.h"
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_event_eth_rx_adapter.h"
 #include "rte_event_timer_adapter.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_eventdev_trace_configure,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id,
diff --git a/lib/eventdev/rte_event_crypto_adapter.h b/lib/eventdev/rte_event_crypto_adapter.h
index e07f159b77..c9b277c664 100644
--- a/lib/eventdev/rte_event_crypto_adapter.h
+++ b/lib/eventdev/rte_event_crypto_adapter.h
@@ -167,14 +167,14 @@
  * from the start of the rte_crypto_op including initialization vector (IV).
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Crypto event adapter mode
  */
diff --git a/lib/eventdev/rte_event_eth_rx_adapter.h b/lib/eventdev/rte_event_eth_rx_adapter.h
index cf42c69b0d..9237e198a7 100644
--- a/lib/eventdev/rte_event_eth_rx_adapter.h
+++ b/lib/eventdev/rte_event_eth_rx_adapter.h
@@ -87,10 +87,6 @@
  * event based so the callback can also modify the event data if it needs to.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -98,6 +94,10 @@ extern "C" {
 
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_EVENT_ETH_RX_ADAPTER_MAX_INSTANCE 32
 
 /* struct rte_event_eth_rx_adapter_queue_conf flags definitions */
diff --git a/lib/eventdev/rte_event_eth_tx_adapter.h b/lib/eventdev/rte_event_eth_tx_adapter.h
index b38b3fce97..ef01345ac2 100644
--- a/lib/eventdev/rte_event_eth_tx_adapter.h
+++ b/lib/eventdev/rte_event_eth_tx_adapter.h
@@ -76,10 +76,6 @@
  * impact due to a change in how the transmit queue index is specified.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -87,6 +83,10 @@ extern "C" {
 
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Adapter configuration structure
  *
diff --git a/lib/eventdev/rte_event_ring.h b/lib/eventdev/rte_event_ring.h
index f9cf19ae16..5769da269e 100644
--- a/lib/eventdev/rte_event_ring.h
+++ b/lib/eventdev/rte_event_ring.h
@@ -14,10 +14,6 @@
 #ifndef _RTE_EVENT_RING_
 #define _RTE_EVENT_RING_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_common.h>
@@ -25,6 +21,10 @@ extern "C" {
 #include <rte_ring_elem.h>
 #include "rte_eventdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_TAILQ_EVENT_RING_NAME "RTE_EVENT_RING"
 
 /**
diff --git a/lib/eventdev/rte_event_timer_adapter.h b/lib/eventdev/rte_event_timer_adapter.h
index 0bd1b30045..256807b3bf 100644
--- a/lib/eventdev/rte_event_timer_adapter.h
+++ b/lib/eventdev/rte_event_timer_adapter.h
@@ -107,14 +107,14 @@
  * All these use cases require high resolution and low time drift.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 
 #include "rte_eventdev.h"
 #include "rte_eventdev_trace_fp.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Timer adapter clock source
  */
diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
index 08e5f9320b..e5c5b7df64 100644
--- a/lib/eventdev/rte_eventdev.h
+++ b/lib/eventdev/rte_eventdev.h
@@ -237,10 +237,6 @@
  * \endcode
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_errno.h>
@@ -2469,6 +2465,10 @@ rte_event_vector_pool_create(const char *name, unsigned int n,
 
 #include <rte_eventdev_core.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static __rte_always_inline uint16_t
 __rte_event_enqueue_burst(uint8_t dev_id, uint8_t port_id,
 			  const struct rte_event ev[], uint16_t nb_events,
diff --git a/lib/eventdev/rte_eventdev_trace_fp.h b/lib/eventdev/rte_eventdev_trace_fp.h
index 04d510ad00..8656f1e6e4 100644
--- a/lib/eventdev/rte_eventdev_trace_fp.h
+++ b/lib/eventdev/rte_eventdev_trace_fp.h
@@ -11,12 +11,12 @@
  * API for ethdev trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_eventdev_trace_deq_burst,
 	RTE_TRACE_POINT_ARGS(uint8_t dev_id, uint8_t port_id, void *ev_table,
diff --git a/lib/graph/rte_graph_model_mcore_dispatch.h b/lib/graph/rte_graph_model_mcore_dispatch.h
index 732b89297f..f9ff3daa88 100644
--- a/lib/graph/rte_graph_model_mcore_dispatch.h
+++ b/lib/graph/rte_graph_model_mcore_dispatch.h
@@ -12,10 +12,6 @@
  * dispatch model.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_errno.h>
 #include <rte_mempool.h>
 #include <rte_memzone.h>
@@ -23,6 +19,10 @@ extern "C" {
 
 #include "rte_graph_worker_common.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_GRAPH_SCHED_WQ_SIZE_MULTIPLIER  8
 #define RTE_GRAPH_SCHED_WQ_SIZE(nb_nodes)   \
 	((typeof(nb_nodes))((nb_nodes) * RTE_GRAPH_SCHED_WQ_SIZE_MULTIPLIER))
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 03d0e01b68..b0f952a82c 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -6,13 +6,13 @@
 #ifndef _RTE_GRAPH_WORKER_H_
 #define _RTE_GRAPH_WORKER_H_
 
+#include "rte_graph_model_rtc.h"
+#include "rte_graph_model_mcore_dispatch.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include "rte_graph_model_rtc.h"
-#include "rte_graph_model_mcore_dispatch.h"
-
 /**
  * Perform graph walk on the circular buffer and invoke the process function
  * of the nodes and collect the stats.
diff --git a/lib/gso/rte_gso.h b/lib/gso/rte_gso.h
index d60cb65f18..75246989dc 100644
--- a/lib/gso/rte_gso.h
+++ b/lib/gso/rte_gso.h
@@ -10,13 +10,13 @@
  * Interface to GSO library
  */
 
+#include <stdint.h>
+#include <rte_mbuf.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include <rte_mbuf.h>
-
 /* Minimum GSO segment size for TCP based packets. */
 #define RTE_GSO_SEG_SIZE_MIN (sizeof(struct rte_ether_hdr) + \
 		sizeof(struct rte_ipv4_hdr) + sizeof(struct rte_tcp_hdr) + 1)
diff --git a/lib/hash/rte_fbk_hash.h b/lib/hash/rte_fbk_hash.h
index b01126999b..1f0c1d1b6c 100644
--- a/lib/hash/rte_fbk_hash.h
+++ b/lib/hash/rte_fbk_hash.h
@@ -18,15 +18,15 @@
 #include <stdint.h>
 #include <errno.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <rte_hash_crc.h>
 #include <rte_jhash.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_FBK_HASH_INIT_VAL_DEFAULT
 /** Initialising value used when calculating hash. */
 #define RTE_FBK_HASH_INIT_VAL_DEFAULT		0xFFFFFFFF
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 8ad2422ec3..fa07c97685 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -11,10 +11,6 @@
  * RTE CRC Hash
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_branch_prediction.h>
@@ -39,6 +35,10 @@ extern uint8_t rte_hash_crc32_alg;
 #include "rte_crc_generic.h"
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
  * calculation.
diff --git a/lib/hash/rte_jhash.h b/lib/hash/rte_jhash.h
index f2446f081e..b70799d209 100644
--- a/lib/hash/rte_jhash.h
+++ b/lib/hash/rte_jhash.h
@@ -11,10 +11,6 @@
  * jhash functions.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <string.h>
 #include <limits.h>
@@ -23,6 +19,10 @@ extern "C" {
 #include <rte_log.h>
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* jhash.h: Jenkins hash support.
  *
  * Copyright (C) 2006 Bob Jenkins (bob_jenkins@burtleburtle.net)
diff --git a/lib/hash/rte_thash.h b/lib/hash/rte_thash.h
index 30b657e67a..ec9bc57efa 100644
--- a/lib/hash/rte_thash.h
+++ b/lib/hash/rte_thash.h
@@ -15,10 +15,6 @@
  * after GRE header decapsulating)
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_byteorder.h>
@@ -28,6 +24,10 @@ extern "C" {
 
 #if defined(RTE_ARCH_X86) || defined(__ARM_NEON)
 #include <rte_vect.h>
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef RTE_ARCH_X86
diff --git a/lib/hash/rte_thash_gfni.h b/lib/hash/rte_thash_gfni.h
index 132f37506d..5234c1697f 100644
--- a/lib/hash/rte_thash_gfni.h
+++ b/lib/hash/rte_thash_gfni.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_THASH_GFNI_H_
 #define _RTE_THASH_GFNI_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_log.h>
 
@@ -16,6 +12,10 @@ extern "C" {
 
 #include <rte_thash_x86_gfni.h>
 
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
diff --git a/lib/ip_frag/rte_ip_frag.h b/lib/ip_frag/rte_ip_frag.h
index 2ad318096b..84fd717953 100644
--- a/lib/ip_frag/rte_ip_frag.h
+++ b/lib/ip_frag/rte_ip_frag.h
@@ -12,10 +12,6 @@
  * Implementation of IP packet fragmentation and reassembly.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 
@@ -25,6 +21,10 @@ extern "C" {
 #include <rte_ip.h>
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_mbuf;
 
 /** death row size (in packets) */
diff --git a/lib/ipsec/rte_ipsec.h b/lib/ipsec/rte_ipsec.h
index f15f6f2966..28b7a61aea 100644
--- a/lib/ipsec/rte_ipsec.h
+++ b/lib/ipsec/rte_ipsec.h
@@ -17,10 +17,6 @@
 #include <rte_ipsec_sa.h>
 #include <rte_mbuf.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 struct rte_ipsec_session;
 
 /**
@@ -181,6 +177,10 @@ rte_ipsec_telemetry_sa_del(const struct rte_ipsec_sa *sa);
 
 #include <rte_ipsec_group.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/log/rte_log.h b/lib/log/rte_log.h
index f357c59548..3735137150 100644
--- a/lib/log/rte_log.h
+++ b/lib/log/rte_log.h
@@ -13,10 +13,6 @@
  * This file provides a log API to RTE applications.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <assert.h>
 #include <stdint.h>
 #include <stdio.h>
@@ -26,6 +22,10 @@ extern "C" {
 #include <rte_common.h>
 #include <rte_config.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* SDK log type */
 #define RTE_LOGTYPE_EAL        0 /**< Log related to eal. */
 				 /* was RTE_LOGTYPE_MALLOC */
diff --git a/lib/lpm/rte_lpm.h b/lib/lpm/rte_lpm.h
index 9c6df311cb..329dc1aad4 100644
--- a/lib/lpm/rte_lpm.h
+++ b/lib/lpm/rte_lpm.h
@@ -391,6 +391,10 @@ static inline void
 rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
 	uint32_t defv);
 
+#ifdef __cplusplus
+}
+#endif
+
 #if defined(RTE_ARCH_ARM)
 #ifdef RTE_HAS_SVE_ACLE
 #include "rte_lpm_sve.h"
@@ -407,8 +411,4 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4],
 #include "rte_lpm_scalar.h"
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif /* _RTE_LPM_H_ */
diff --git a/lib/member/rte_member.h b/lib/member/rte_member.h
index aec192eba5..109bdd000b 100644
--- a/lib/member/rte_member.h
+++ b/lib/member/rte_member.h
@@ -54,10 +54,6 @@
 #ifndef _RTE_MEMBER_H_
 #define _RTE_MEMBER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdbool.h>
 #include <inttypes.h>
@@ -100,6 +96,10 @@ typedef uint16_t member_set_t;
 #define MEMBER_HASH_FUNC       rte_jhash
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** @internal setsummary structure. */
 struct rte_member_setsum;
 
diff --git a/lib/member/rte_member_sketch.h b/lib/member/rte_member_sketch.h
index 74f24ca223..6a8d5104dd 100644
--- a/lib/member/rte_member_sketch.h
+++ b/lib/member/rte_member_sketch.h
@@ -5,13 +5,13 @@
 #ifndef RTE_MEMBER_SKETCH_H
 #define RTE_MEMBER_SKETCH_H
 
+#include <rte_vect.h>
+#include <rte_ring_elem.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_vect.h>
-#include <rte_ring_elem.h>
-
 #define NUM_ROW_SCALAR 5
 #define INTERVAL (1 << 15)
 
diff --git a/lib/member/rte_member_sketch_avx512.h b/lib/member/rte_member_sketch_avx512.h
index 52666b5b4c..a8ef3b065e 100644
--- a/lib/member/rte_member_sketch_avx512.h
+++ b/lib/member/rte_member_sketch_avx512.h
@@ -5,14 +5,14 @@
 #ifndef RTE_MEMBER_SKETCH_AVX512_H
 #define RTE_MEMBER_SKETCH_AVX512_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_vect.h>
 #include "rte_member.h"
 #include "rte_member_sketch.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define NUM_ROW_VEC 8
 
 void
diff --git a/lib/member/rte_member_x86.h b/lib/member/rte_member_x86.h
index d115151f9f..4de453485b 100644
--- a/lib/member/rte_member_x86.h
+++ b/lib/member/rte_member_x86.h
@@ -5,12 +5,12 @@
 #ifndef _RTE_MEMBER_X86_H_
 #define _RTE_MEMBER_X86_H_
 
+#include <x86intrin.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <x86intrin.h>
-
 #if defined(__AVX2__)
 
 static inline int
diff --git a/lib/member/rte_xxh64_avx512.h b/lib/member/rte_xxh64_avx512.h
index ffe6cb79f9..58f896ebb8 100644
--- a/lib/member/rte_xxh64_avx512.h
+++ b/lib/member/rte_xxh64_avx512.h
@@ -5,13 +5,13 @@
 #ifndef RTE_XXH64_AVX512_H
 #define RTE_XXH64_AVX512_H
 
+#include <rte_common.h>
+#include <immintrin.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <immintrin.h>
-
 /* 0b1001111000110111011110011011000110000101111010111100101010000111 */
 static const uint64_t PRIME64_1 = 0x9E3779B185EBCA87ULL;
 /* 0b1100001010110010101011100011110100100111110101001110101101001111 */
diff --git a/lib/mempool/mempool_trace.h b/lib/mempool/mempool_trace.h
index dffef062e4..c595a3116b 100644
--- a/lib/mempool/mempool_trace.h
+++ b/lib/mempool/mempool_trace.h
@@ -11,15 +11,15 @@
  * APIs for mempool trace support
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include "rte_mempool.h"
 
 #include <rte_memzone.h>
 #include <rte_trace_point.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 RTE_TRACE_POINT(
 	rte_mempool_trace_create,
 	RTE_TRACE_POINT_ARGS(const char *name, uint32_t nb_elts,
diff --git a/lib/mempool/rte_mempool_trace_fp.h b/lib/mempool/rte_mempool_trace_fp.h
index ed060e887c..9c5cdbb291 100644
--- a/lib/mempool/rte_mempool_trace_fp.h
+++ b/lib/mempool/rte_mempool_trace_fp.h
@@ -11,12 +11,12 @@
  * Mempool fast path API for trace support
  */
 
+#include <rte_trace_point.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_trace_point.h>
-
 RTE_TRACE_POINT_FP(
 	rte_mempool_trace_ops_dequeue_bulk,
 	RTE_TRACE_POINT_ARGS(void *mempool, void **obj_table,
diff --git a/lib/meter/rte_meter.h b/lib/meter/rte_meter.h
index bd68cbe389..e72bf93b3e 100644
--- a/lib/meter/rte_meter.h
+++ b/lib/meter/rte_meter.h
@@ -6,10 +6,6 @@
 #ifndef __INCLUDE_RTE_METER_H__
 #define __INCLUDE_RTE_METER_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Traffic Metering
@@ -22,6 +18,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Application Programmer's Interface (API)
  */
diff --git a/lib/mldev/mldev_utils.h b/lib/mldev/mldev_utils.h
index 5e2a180adc..bf21067d38 100644
--- a/lib/mldev/mldev_utils.h
+++ b/lib/mldev/mldev_utils.h
@@ -5,10 +5,6 @@
 #ifndef RTE_MLDEV_UTILS_H
 #define RTE_MLDEV_UTILS_H
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_mldev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  *
diff --git a/lib/mldev/rte_mldev_core.h b/lib/mldev/rte_mldev_core.h
index b3bd281083..8dccf125fc 100644
--- a/lib/mldev/rte_mldev_core.h
+++ b/lib/mldev/rte_mldev_core.h
@@ -16,10 +16,6 @@
  * These APIs are for MLDEV PMDs and library only.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <dev_driver.h>
@@ -27,6 +23,10 @@ extern "C" {
 #include <rte_log.h>
 #include <rte_mldev.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /* Device state */
 #define ML_DEV_DETACHED (0)
 #define ML_DEV_ATTACHED (1)
diff --git a/lib/mldev/rte_mldev_pmd.h b/lib/mldev/rte_mldev_pmd.h
index fd5bbf4360..47c0f23223 100644
--- a/lib/mldev/rte_mldev_pmd.h
+++ b/lib/mldev/rte_mldev_pmd.h
@@ -14,10 +14,6 @@
  * These APIs are for MLDEV PMDs only and user applications should not call them directly.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_common.h>
@@ -25,6 +21,10 @@ extern "C" {
 #include <rte_mldev.h>
 #include <rte_mldev_core.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @internal
  *
diff --git a/lib/net/rte_ether.h b/lib/net/rte_ether.h
index 32ed515aef..403e84f50b 100644
--- a/lib/net/rte_ether.h
+++ b/lib/net/rte_ether.h
@@ -11,10 +11,6 @@
  * Ethernet Helpers in RTE
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 
@@ -22,6 +18,10 @@ extern "C" {
 #include <rte_mbuf.h>
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_ETHER_ADDR_LEN  6 /**< Length of Ethernet address. */
 #define RTE_ETHER_TYPE_LEN  2 /**< Length of Ethernet type field. */
 #define RTE_ETHER_CRC_LEN   4 /**< Length of Ethernet CRC. */
diff --git a/lib/net/rte_net.h b/lib/net/rte_net.h
index cdc6cf956d..40ad6a71a1 100644
--- a/lib/net/rte_net.h
+++ b/lib/net/rte_net.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_NET_PTYPE_H_
 #define _RTE_NET_PTYPE_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_ip.h>
 #include <rte_udp.h>
 #include <rte_tcp.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Structure containing header lengths associated to a packet, filled
  * by rte_net_get_ptype().
diff --git a/lib/net/rte_sctp.h b/lib/net/rte_sctp.h
index 965682dc2b..a8ba9e49d8 100644
--- a/lib/net/rte_sctp.h
+++ b/lib/net/rte_sctp.h
@@ -14,14 +14,14 @@
 #ifndef _RTE_SCTP_H_
 #define _RTE_SCTP_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_byteorder.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * SCTP Header
  */
diff --git a/lib/node/rte_node_eth_api.h b/lib/node/rte_node_eth_api.h
index 143cf131b3..2b7019f6bb 100644
--- a/lib/node/rte_node_eth_api.h
+++ b/lib/node/rte_node_eth_api.h
@@ -16,15 +16,15 @@
  * and its queue associations.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_graph.h>
 #include <rte_mempool.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Port config for ethdev_rx and ethdev_tx node.
  */
diff --git a/lib/node/rte_node_ip4_api.h b/lib/node/rte_node_ip4_api.h
index 24f8ec843a..950751a525 100644
--- a/lib/node/rte_node_ip4_api.h
+++ b/lib/node/rte_node_ip4_api.h
@@ -15,15 +15,15 @@
  * This API allows to do control path functions of ip4_* nodes
  * like ip4_lookup, ip4_rewrite.
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_compat.h>
 
 #include <rte_graph.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * IP4 lookup next nodes.
  */
diff --git a/lib/node/rte_node_ip6_api.h b/lib/node/rte_node_ip6_api.h
index a538dc2ea7..f467aac7b6 100644
--- a/lib/node/rte_node_ip6_api.h
+++ b/lib/node/rte_node_ip6_api.h
@@ -15,13 +15,13 @@
  * This API allows to do control path functions of ip6_* nodes
  * like ip6_lookup, ip6_rewrite.
  */
+#include <rte_common.h>
+#include <rte_compat.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_compat.h>
-
 /**
  * IP6 lookup next nodes.
  */
diff --git a/lib/node/rte_node_udp4_input_api.h b/lib/node/rte_node_udp4_input_api.h
index c873acbbe0..694660bd6a 100644
--- a/lib/node/rte_node_udp4_input_api.h
+++ b/lib/node/rte_node_udp4_input_api.h
@@ -16,14 +16,14 @@
  * like udp4_input.
  *
  */
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_common.h>
 #include <rte_compat.h>
 
 #include "rte_graph.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 /**
  * UDP4 lookup next nodes.
  */
diff --git a/lib/pci/rte_pci.h b/lib/pci/rte_pci.h
index c26fc77209..9a50a12142 100644
--- a/lib/pci/rte_pci.h
+++ b/lib/pci/rte_pci.h
@@ -12,14 +12,14 @@
  * RTE PCI Library
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdio.h>
 #include <inttypes.h>
 #include <sys/types.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Conventional PCI and PCI-X Mode 1 devices have 256 bytes of
  * configuration space.  PCI-X Mode 2 and PCIe devices have 4096 bytes of
diff --git a/lib/pdcp/rte_pdcp.h b/lib/pdcp/rte_pdcp.h
index f74524f83d..15fcbf9607 100644
--- a/lib/pdcp/rte_pdcp.h
+++ b/lib/pdcp/rte_pdcp.h
@@ -19,10 +19,6 @@
 #include <rte_pdcp_hdr.h>
 #include <rte_security.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /* Forward declarations. */
 struct rte_pdcp_entity;
 
@@ -373,6 +369,10 @@ rte_pdcp_t_reordering_expiry_handle(const struct rte_pdcp_entity *entity,
  */
 #include <rte_pdcp_group.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/pipeline/rte_pipeline.h b/lib/pipeline/rte_pipeline.h
index 0c7994b4f2..c9e7172453 100644
--- a/lib/pipeline/rte_pipeline.h
+++ b/lib/pipeline/rte_pipeline.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PIPELINE_H__
 #define __INCLUDE_RTE_PIPELINE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Pipeline
@@ -59,6 +55,10 @@ extern "C" {
 #include <rte_table.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_mbuf;
 
 /*
diff --git a/lib/pipeline/rte_port_in_action.h b/lib/pipeline/rte_port_in_action.h
index ec2994599f..9d17bae988 100644
--- a/lib/pipeline/rte_port_in_action.h
+++ b/lib/pipeline/rte_port_in_action.h
@@ -46,10 +46,6 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -57,6 +53,10 @@ extern "C" {
 
 #include "rte_pipeline.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Input port actions. */
 enum rte_port_in_action_type {
 	/** Filter selected input packets. */
diff --git a/lib/pipeline/rte_swx_ctl.h b/lib/pipeline/rte_swx_ctl.h
index 6ef2551ab5..c4e63753f5 100644
--- a/lib/pipeline/rte_swx_ctl.h
+++ b/lib/pipeline/rte_swx_ctl.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_CTL_H__
 #define __INCLUDE_RTE_SWX_CTL_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Pipeline Control
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_swx_port.h"
 #include "rte_swx_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_swx_pipeline;
 
 /** Name size. */
diff --git a/lib/pipeline/rte_swx_extern.h b/lib/pipeline/rte_swx_extern.h
index e10e963d63..1553fa81ec 100644
--- a/lib/pipeline/rte_swx_extern.h
+++ b/lib/pipeline/rte_swx_extern.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_EXTERN_H__
 #define __INCLUDE_RTE_SWX_EXTERN_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Extern objects and functions
@@ -19,6 +15,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * Extern type
  */
diff --git a/lib/pipeline/rte_swx_ipsec.h b/lib/pipeline/rte_swx_ipsec.h
index 7c07fdc739..d2e5abef7d 100644
--- a/lib/pipeline/rte_swx_ipsec.h
+++ b/lib/pipeline/rte_swx_ipsec.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_IPSEC_H__
 #define __INCLUDE_RTE_SWX_IPSEC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Internet Protocol Security (IPsec)
@@ -53,6 +49,10 @@ extern "C" {
 #include <rte_compat.h>
 #include <rte_crypto_sym.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * IPsec Setup API
  */
diff --git a/lib/pipeline/rte_swx_pipeline.h b/lib/pipeline/rte_swx_pipeline.h
index 25df042d3b..882bd4bf6f 100644
--- a/lib/pipeline/rte_swx_pipeline.h
+++ b/lib/pipeline/rte_swx_pipeline.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PIPELINE_H__
 #define __INCLUDE_RTE_SWX_PIPELINE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Pipeline
@@ -22,6 +18,10 @@ extern "C" {
 #include "rte_swx_table.h"
 #include "rte_swx_extern.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Name size. */
 #ifndef RTE_SWX_NAME_SIZE
 #define RTE_SWX_NAME_SIZE 64
diff --git a/lib/pipeline/rte_swx_pipeline_spec.h b/lib/pipeline/rte_swx_pipeline_spec.h
index dd88c0bfab..077b407c0a 100644
--- a/lib/pipeline/rte_swx_pipeline_spec.h
+++ b/lib/pipeline/rte_swx_pipeline_spec.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PIPELINE_SPEC_H__
 #define __INCLUDE_RTE_SWX_PIPELINE_SPEC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <stdio.h>
 
@@ -15,6 +11,10 @@ extern "C" {
 
 #include <rte_swx_pipeline.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /*
  * extobj.
  *
diff --git a/lib/pipeline/rte_table_action.h b/lib/pipeline/rte_table_action.h
index 5dffbeb700..bab4bfd2e2 100644
--- a/lib/pipeline/rte_table_action.h
+++ b/lib/pipeline/rte_table_action.h
@@ -52,10 +52,6 @@
  * @b EXPERIMENTAL: this API may change without prior notice
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -65,6 +61,10 @@ extern "C" {
 
 #include "rte_pipeline.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Table actions. */
 enum rte_table_action_type {
 	/** Forward to next pipeline table, output port or drop. */
diff --git a/lib/port/rte_port.h b/lib/port/rte_port.h
index 0e30db371e..4b20872537 100644
--- a/lib/port/rte_port.h
+++ b/lib/port/rte_port.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_H__
 #define __INCLUDE_RTE_PORT_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port
@@ -20,6 +16,10 @@ extern "C" {
 #include <stdint.h>
 #include <rte_mbuf.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**@{
  * Macros to allow accessing metadata stored in the mbuf headroom
  * just beyond the end of the mbuf data structure returned by a port
diff --git a/lib/port/rte_port_ethdev.h b/lib/port/rte_port_ethdev.h
index e07021cb89..7729ff0da3 100644
--- a/lib/port/rte_port_ethdev.h
+++ b/lib/port/rte_port_ethdev.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_ETHDEV_H__
 #define __INCLUDE_RTE_PORT_ETHDEV_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Ethernet Device
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ethdev_reader port parameters */
 struct rte_port_ethdev_reader_params {
 	/** NIC RX port ID */
diff --git a/lib/port/rte_port_eventdev.h b/lib/port/rte_port_eventdev.h
index 0efb8e1021..d9eccf07d4 100644
--- a/lib/port/rte_port_eventdev.h
+++ b/lib/port/rte_port_eventdev.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_EVENTDEV_H__
 #define __INCLUDE_RTE_PORT_EVENTDEV_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Eventdev Interface
@@ -24,6 +20,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Eventdev_reader port parameters */
 struct rte_port_eventdev_reader_params {
 	/** Eventdev Device ID */
diff --git a/lib/port/rte_port_fd.h b/lib/port/rte_port_fd.h
index 885b9ada22..40a5e4a426 100644
--- a/lib/port/rte_port_fd.h
+++ b/lib/port/rte_port_fd.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_FD_H__
 #define __INCLUDE_RTE_PORT_FD_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port FD Device
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** fd_reader port parameters */
 struct rte_port_fd_reader_params {
 	/** File descriptor */
diff --git a/lib/port/rte_port_frag.h b/lib/port/rte_port_frag.h
index 4055872e8d..9a10f10523 100644
--- a/lib/port/rte_port_frag.h
+++ b/lib/port/rte_port_frag.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_IP_FRAG_H__
 #define __INCLUDE_RTE_PORT_IP_FRAG_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port for IPv4 Fragmentation
@@ -31,6 +27,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ring_reader_ipv4_frag port parameters */
 struct rte_port_ring_reader_frag_params {
 	/** Underlying single consumer ring that has to be pre-initialized. */
diff --git a/lib/port/rte_port_ras.h b/lib/port/rte_port_ras.h
index 94cfb3ed92..86e36f5362 100644
--- a/lib/port/rte_port_ras.h
+++ b/lib/port/rte_port_ras.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_RAS_H__
 #define __INCLUDE_RTE_PORT_RAS_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port for IPv4 Reassembly
@@ -31,6 +27,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ring_writer_ipv4_ras port parameters */
 struct rte_port_ring_writer_ras_params {
 	/** Underlying single consumer ring that has to be pre-initialized. */
diff --git a/lib/port/rte_port_ring.h b/lib/port/rte_port_ring.h
index 027928c924..2089d0889b 100644
--- a/lib/port/rte_port_ring.h
+++ b/lib/port/rte_port_ring.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_RING_H__
 #define __INCLUDE_RTE_PORT_RING_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Ring
@@ -27,6 +23,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ring_reader port parameters */
 struct rte_port_ring_reader_params {
 	/** Underlying consumer ring that has to be pre-initialized */
diff --git a/lib/port/rte_port_sched.h b/lib/port/rte_port_sched.h
index 251380ef80..1bf08ae6a9 100644
--- a/lib/port/rte_port_sched.h
+++ b/lib/port/rte_port_sched.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_SCHED_H__
 #define __INCLUDE_RTE_PORT_SCHED_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Hierarchical Scheduler
@@ -23,6 +19,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** sched_reader port parameters */
 struct rte_port_sched_reader_params {
 	/** Underlying pre-initialized rte_sched_port */
diff --git a/lib/port/rte_port_source_sink.h b/lib/port/rte_port_source_sink.h
index bcdbaf1e40..3122dd5038 100644
--- a/lib/port/rte_port_source_sink.h
+++ b/lib/port/rte_port_source_sink.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_SOURCE_SINK_H__
 #define __INCLUDE_RTE_PORT_SOURCE_SINK_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port Source/Sink
@@ -19,6 +15,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** source port parameters */
 struct rte_port_source_params {
 	/** Pre-initialized buffer pool */
diff --git a/lib/port/rte_port_sym_crypto.h b/lib/port/rte_port_sym_crypto.h
index 6532b4388a..d03cdc1e8b 100644
--- a/lib/port/rte_port_sym_crypto.h
+++ b/lib/port/rte_port_sym_crypto.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_PORT_SYM_CRYPTO_H__
 #define __INCLUDE_RTE_PORT_SYM_CRYPTO_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Port sym crypto Interface
@@ -23,6 +19,10 @@ extern "C" {
 
 #include "rte_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Function prototype for reader post action. */
 typedef void (*rte_port_sym_crypto_reader_callback_fn)(struct rte_mbuf **pkts,
 		uint16_t n_pkts, void *arg);
diff --git a/lib/port/rte_swx_port.h b/lib/port/rte_swx_port.h
index 1dbd95ae87..b52b125572 100644
--- a/lib/port/rte_swx_port.h
+++ b/lib/port/rte_swx_port.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_H__
 #define __INCLUDE_RTE_SWX_PORT_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Port
@@ -17,6 +13,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Packet. */
 struct rte_swx_pkt {
 	/** Opaque packet handle. */
diff --git a/lib/port/rte_swx_port_ethdev.h b/lib/port/rte_swx_port_ethdev.h
index cbc2d7b213..1828031e67 100644
--- a/lib/port/rte_swx_port_ethdev.h
+++ b/lib/port/rte_swx_port_ethdev.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_ETHDEV_H__
 #define __INCLUDE_RTE_SWX_PORT_ETHDEV_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Ethernet Device Input and Output Ports
@@ -17,6 +13,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Ethernet device input port (reader) creation parameters. */
 struct rte_swx_port_ethdev_reader_params {
 	/** Name of a valid and fully configured Ethernet device. */
diff --git a/lib/port/rte_swx_port_fd.h b/lib/port/rte_swx_port_fd.h
index e61719c8f6..63529cf0ab 100644
--- a/lib/port/rte_swx_port_fd.h
+++ b/lib/port/rte_swx_port_fd.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_FD_H__
 #define __INCLUDE_RTE_SWX_PORT_FD_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX FD Input and Output Ports
@@ -18,6 +14,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** fd_reader port parameters */
 struct rte_swx_port_fd_reader_params {
 	/** File descriptor. Must be valid and opened in non-blocking mode. */
diff --git a/lib/port/rte_swx_port_ring.h b/lib/port/rte_swx_port_ring.h
index efc485fb08..ef241c3fee 100644
--- a/lib/port/rte_swx_port_ring.h
+++ b/lib/port/rte_swx_port_ring.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_RING_H__
 #define __INCLUDE_RTE_SWX_PORT_RING_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Ring Input and Output Ports
@@ -18,6 +14,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Ring input port (reader) creation parameters. */
 struct rte_swx_port_ring_reader_params {
 	/** Name of valid RTE ring. */
diff --git a/lib/port/rte_swx_port_source_sink.h b/lib/port/rte_swx_port_source_sink.h
index 91bcbf74f4..e3ca7cfbb4 100644
--- a/lib/port/rte_swx_port_source_sink.h
+++ b/lib/port/rte_swx_port_source_sink.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_PORT_SOURCE_SINK_H__
 #define __INCLUDE_RTE_SWX_PORT_SOURCE_SINK_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Source and Sink Ports
@@ -15,6 +11,10 @@ extern "C" {
 
 #include "rte_swx_port.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum number of packets to read from the PCAP file. */
 #ifndef RTE_SWX_PORT_SOURCE_PKTS_MAX
 #define RTE_SWX_PORT_SOURCE_PKTS_MAX 1024
diff --git a/lib/rawdev/rte_rawdev.h b/lib/rawdev/rte_rawdev.h
index 640037b524..3fc471526e 100644
--- a/lib/rawdev/rte_rawdev.h
+++ b/lib/rawdev/rte_rawdev.h
@@ -14,13 +14,13 @@
  * no specific type already available in DPDK.
  */
 
+#include <rte_common.h>
+#include <rte_memory.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_common.h>
-#include <rte_memory.h>
-
 /* Rawdevice object - essentially a void to be typecast by implementation */
 typedef void *rte_rawdev_obj_t;
 
diff --git a/lib/rawdev/rte_rawdev_pmd.h b/lib/rawdev/rte_rawdev_pmd.h
index 22b406444d..408ed461a4 100644
--- a/lib/rawdev/rte_rawdev_pmd.h
+++ b/lib/rawdev/rte_rawdev_pmd.h
@@ -13,10 +13,6 @@
  * any application.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <string.h>
 
 #include <dev_driver.h>
@@ -26,6 +22,10 @@ extern "C" {
 
 #include "rte_rawdev.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 extern int librawdev_logtype;
 #define RTE_LOGTYPE_RAWDEV librawdev_logtype
 
diff --git a/lib/rcu/rte_rcu_qsbr.h b/lib/rcu/rte_rcu_qsbr.h
index ed3dd6d3d2..550fadf56a 100644
--- a/lib/rcu/rte_rcu_qsbr.h
+++ b/lib/rcu/rte_rcu_qsbr.h
@@ -21,10 +21,6 @@
  * entered quiescent state.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <inttypes.h>
 #include <stdalign.h>
 #include <stdbool.h>
@@ -36,6 +32,10 @@ extern "C" {
 #include <rte_atomic.h>
 #include <rte_ring.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 extern int rte_rcu_log_type;
 #define RTE_LOGTYPE_RCU rte_rcu_log_type
 
diff --git a/lib/regexdev/rte_regexdev.h b/lib/regexdev/rte_regexdev.h
index a50b841b1e..b18a1d4251 100644
--- a/lib/regexdev/rte_regexdev.h
+++ b/lib/regexdev/rte_regexdev.h
@@ -194,10 +194,6 @@
  * - rte_regexdev_dequeue_burst()
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_compat.h>
 #include <rte_common.h>
 #include <rte_dev.h>
@@ -1428,6 +1424,10 @@ struct rte_regex_ops {
 
 #include "rte_regexdev_core.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * @warning
  * @b EXPERIMENTAL: this API may change without prior notice.
diff --git a/lib/ring/rte_ring.h b/lib/ring/rte_ring.h
index c709f30497..11ca69c73d 100644
--- a/lib/ring/rte_ring.h
+++ b/lib/ring/rte_ring.h
@@ -34,13 +34,13 @@
  * for more information.
  */
 
+#include <rte_ring_core.h>
+#include <rte_ring_elem.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_core.h>
-#include <rte_ring_elem.h>
-
 /**
  * Calculate the memory size needed for a ring
  *
diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
index 270869d214..222c5aeb3f 100644
--- a/lib/ring/rte_ring_core.h
+++ b/lib/ring/rte_ring_core.h
@@ -19,10 +19,6 @@
  * instead.
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdalign.h>
 #include <stdio.h>
 #include <stdint.h>
@@ -38,6 +34,10 @@ extern "C" {
 #include <rte_pause.h>
 #include <rte_debug.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_TAILQ_RING_NAME "RTE_RING"
 
 /** enqueue/dequeue behavior types */
diff --git a/lib/ring/rte_ring_elem.h b/lib/ring/rte_ring_elem.h
index 7f7d4951d3..506f686884 100644
--- a/lib/ring/rte_ring_elem.h
+++ b/lib/ring/rte_ring_elem.h
@@ -16,10 +16,6 @@
  * RTE Ring with user defined element size
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_ring_core.h>
 #include <rte_ring_elem_pvt.h>
 
@@ -699,6 +695,10 @@ rte_ring_dequeue_burst_elem(struct rte_ring *r, void *obj_table,
 
 #include <rte_ring.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ring/rte_ring_hts.h b/lib/ring/rte_ring_hts.h
index 9a5938ac58..a41acea740 100644
--- a/lib/ring/rte_ring_hts.h
+++ b/lib/ring/rte_ring_hts.h
@@ -24,12 +24,12 @@
  * To achieve that 64-bit CAS is used by head update routine.
  */
 
+#include <rte_ring_hts_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_hts_elem_pvt.h>
-
 /**
  * Enqueue several objects on the HTS ring (multi-producers safe).
  *
diff --git a/lib/ring/rte_ring_peek.h b/lib/ring/rte_ring_peek.h
index c0621d12e2..2312f52668 100644
--- a/lib/ring/rte_ring_peek.h
+++ b/lib/ring/rte_ring_peek.h
@@ -43,12 +43,12 @@
  * with enqueue(/dequeue) operation till _finish_ completes.
  */
 
+#include <rte_ring_peek_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_peek_elem_pvt.h>
-
 /**
  * Start to enqueue several objects on the ring.
  * Note that no actual objects are put in the queue by this function,
diff --git a/lib/ring/rte_ring_peek_zc.h b/lib/ring/rte_ring_peek_zc.h
index 0b5e34b731..3254fe0481 100644
--- a/lib/ring/rte_ring_peek_zc.h
+++ b/lib/ring/rte_ring_peek_zc.h
@@ -67,12 +67,12 @@
  * with enqueue/dequeue operation till _finish_ completes.
  */
 
+#include <rte_ring_peek_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_peek_elem_pvt.h>
-
 /**
  * Ring zero-copy information structure.
  *
diff --git a/lib/ring/rte_ring_rts.h b/lib/ring/rte_ring_rts.h
index 50fc8f74db..d7a3863c83 100644
--- a/lib/ring/rte_ring_rts.h
+++ b/lib/ring/rte_ring_rts.h
@@ -51,12 +51,12 @@
  * By default HTD_MAX == ring.capacity / 8.
  */
 
+#include <rte_ring_rts_elem_pvt.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_ring_rts_elem_pvt.h>
-
 /**
  * Enqueue several objects on the RTS ring (multi-producers safe).
  *
diff --git a/lib/sched/rte_approx.h b/lib/sched/rte_approx.h
index b60086330e..738e33a98b 100644
--- a/lib/sched/rte_approx.h
+++ b/lib/sched/rte_approx.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_APPROX_H__
 #define __INCLUDE_RTE_APPROX_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Rational Approximation
@@ -20,6 +16,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Find best rational approximation
  *
diff --git a/lib/sched/rte_pie.h b/lib/sched/rte_pie.h
index 1477a47700..2a385ffdba 100644
--- a/lib/sched/rte_pie.h
+++ b/lib/sched/rte_pie.h
@@ -5,10 +5,6 @@
 #ifndef __RTE_PIE_H_INCLUDED__
 #define __RTE_PIE_H_INCLUDED__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * Proportional Integral controller Enhanced (PIE)
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_debug.h>
 #include <rte_cycles.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_DQ_THRESHOLD   16384   /**< Queue length threshold (2^14)
 				     * to start measurement cycle (bytes)
 				     */
diff --git a/lib/sched/rte_red.h b/lib/sched/rte_red.h
index afaa35fcd6..e62abb9295 100644
--- a/lib/sched/rte_red.h
+++ b/lib/sched/rte_red.h
@@ -5,10 +5,6 @@
 #ifndef __RTE_RED_H_INCLUDED__
 #define __RTE_RED_H_INCLUDED__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Random Early Detection (RED)
@@ -20,6 +16,10 @@ extern "C" {
 #include <rte_cycles.h>
 #include <rte_branch_prediction.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_RED_SCALING                     10         /**< Fraction size for fixed-point */
 #define RTE_RED_S                           (1 << 22)  /**< Packet size multiplied by number of leaf queues */
 #define RTE_RED_MAX_TH_MAX                  1023       /**< Max threshold limit in fixed point format */
diff --git a/lib/sched/rte_sched.h b/lib/sched/rte_sched.h
index b882c4a882..222e6b3583 100644
--- a/lib/sched/rte_sched.h
+++ b/lib/sched/rte_sched.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_SCHED_H__
 #define __INCLUDE_RTE_SCHED_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Hierarchical Scheduler
@@ -62,6 +58,10 @@ extern "C" {
 #include "rte_red.h"
 #include "rte_pie.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum number of queues per pipe.
  * Note that the multiple queues (power of 2) can only be assigned to
  * lowest priority (best-effort) traffic class. Other higher priority traffic
diff --git a/lib/sched/rte_sched_common.h b/lib/sched/rte_sched_common.h
index 573d164569..a5acb9c08a 100644
--- a/lib/sched/rte_sched_common.h
+++ b/lib/sched/rte_sched_common.h
@@ -5,13 +5,13 @@
 #ifndef __INCLUDE_RTE_SCHED_COMMON_H__
 #define __INCLUDE_RTE_SCHED_COMMON_H__
 
+#include <stdint.h>
+#include <sys/types.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-#include <sys/types.h>
-
 #if 0
 static inline uint32_t
 rte_min_pos_4_u16(uint16_t *x)
diff --git a/lib/security/rte_security.h b/lib/security/rte_security.h
index 1c8474b74f..7a9bafa0fa 100644
--- a/lib/security/rte_security.h
+++ b/lib/security/rte_security.h
@@ -12,10 +12,6 @@
  * RTE Security Common Definitions
  */
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <sys/types.h>
 
 #include <rte_compat.h>
@@ -24,6 +20,10 @@ extern "C" {
 #include <rte_ip.h>
 #include <rte_mbuf_dyn.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** IPSec protocol mode */
 enum rte_security_ipsec_sa_mode {
 	RTE_SECURITY_IPSEC_SA_MODE_TRANSPORT = 1,
diff --git a/lib/security/rte_security_driver.h b/lib/security/rte_security_driver.h
index 9bb5052a4c..2ceb145066 100644
--- a/lib/security/rte_security_driver.h
+++ b/lib/security/rte_security_driver.h
@@ -12,13 +12,13 @@
  * RTE Security Common Definitions
  */
 
+#include <rte_compat.h>
+#include "rte_security.h"
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <rte_compat.h>
-#include "rte_security.h"
-
 /**
  * @internal
  * Security session to be used by library for internal usage
diff --git a/lib/stack/rte_stack.h b/lib/stack/rte_stack.h
index 3325757568..4439adfc42 100644
--- a/lib/stack/rte_stack.h
+++ b/lib/stack/rte_stack.h
@@ -15,10 +15,6 @@
 #ifndef _RTE_STACK_H_
 #define _RTE_STACK_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdalign.h>
 
 #include <rte_debug.h>
@@ -95,6 +91,10 @@ struct __rte_cache_aligned rte_stack {
 #include "rte_stack_std.h"
 #include "rte_stack_lf.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Push several objects on the stack (MT-safe).
  *
diff --git a/lib/table/rte_lru.h b/lib/table/rte_lru.h
index 88229d8632..bc1ad36500 100644
--- a/lib/table/rte_lru.h
+++ b/lib/table/rte_lru.h
@@ -5,15 +5,15 @@
 #ifndef __INCLUDE_RTE_LRU_H__
 #define __INCLUDE_RTE_LRU_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <rte_config.h>
 #ifdef RTE_ARCH_X86_64
 #include "rte_lru_x86.h"
 #elif defined(RTE_ARCH_ARM64)
 #include "rte_lru_arm64.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
 #else
 #undef RTE_TABLE_HASH_LRU_STRATEGY
 #define RTE_TABLE_HASH_LRU_STRATEGY                        1
@@ -86,8 +86,4 @@ do {									\
 
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif
diff --git a/lib/table/rte_lru_arm64.h b/lib/table/rte_lru_arm64.h
index f19b0bdb4e..f9a4678ee0 100644
--- a/lib/table/rte_lru_arm64.h
+++ b/lib/table/rte_lru_arm64.h
@@ -5,14 +5,14 @@
 #ifndef __RTE_LRU_ARM64_H__
 #define __RTE_LRU_ARM64_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_vect.h>
 #include <rte_bitops.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #ifndef RTE_TABLE_HASH_LRU_STRATEGY
 #ifdef __ARM_NEON
 #define RTE_TABLE_HASH_LRU_STRATEGY                        3
diff --git a/lib/table/rte_lru_x86.h b/lib/table/rte_lru_x86.h
index ddfb8c1c8c..93f4a136a8 100644
--- a/lib/table/rte_lru_x86.h
+++ b/lib/table/rte_lru_x86.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_LRU_X86_H__
 #define __INCLUDE_RTE_LRU_X86_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_config.h>
@@ -97,8 +93,4 @@ do {									\
 
 #endif
 
-#ifdef __cplusplus
-}
-#endif
-
 #endif
diff --git a/lib/table/rte_swx_hash_func.h b/lib/table/rte_swx_hash_func.h
index 04f3d543e7..9c65cfa913 100644
--- a/lib/table/rte_swx_hash_func.h
+++ b/lib/table/rte_swx_hash_func.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_HASH_FUNC_H__
 #define __INCLUDE_RTE_SWX_HASH_FUNC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Hash Function
@@ -15,6 +11,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Hash function prototype
  *
diff --git a/lib/table/rte_swx_keycmp.h b/lib/table/rte_swx_keycmp.h
index 09fb1be869..b0ed819307 100644
--- a/lib/table/rte_swx_keycmp.h
+++ b/lib/table/rte_swx_keycmp.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_KEYCMP_H__
 #define __INCLUDE_RTE_SWX_KEYCMP_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Key Comparison Functions
@@ -16,6 +12,10 @@ extern "C" {
 #include <stdint.h>
 #include <string.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Key comparison function prototype
  *
diff --git a/lib/table/rte_swx_table.h b/lib/table/rte_swx_table.h
index ac01e19781..3c53459498 100644
--- a/lib/table/rte_swx_table.h
+++ b/lib/table/rte_swx_table.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_H__
 #define __INCLUDE_RTE_SWX_TABLE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Table
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_swx_hash_func.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Match type. */
 enum rte_swx_table_match_type {
 	/** Wildcard Match (WM). */
diff --git a/lib/table/rte_swx_table_em.h b/lib/table/rte_swx_table_em.h
index b7423dd060..592541f01f 100644
--- a/lib/table/rte_swx_table_em.h
+++ b/lib/table/rte_swx_table_em.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_EM_H__
 #define __INCLUDE_RTE_SWX_TABLE_EM_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Exact Match Table
@@ -16,6 +12,10 @@ extern "C" {
 
 #include <rte_swx_table.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Exact match table operations - unoptimized. */
 extern struct rte_swx_table_ops rte_swx_table_exact_match_unoptimized_ops;
 
diff --git a/lib/table/rte_swx_table_learner.h b/lib/table/rte_swx_table_learner.h
index c5ea015b8d..9a18be083d 100644
--- a/lib/table/rte_swx_table_learner.h
+++ b/lib/table/rte_swx_table_learner.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_LEARNER_H__
 #define __INCLUDE_RTE_SWX_TABLE_LEARNER_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Learner Table
@@ -53,6 +49,10 @@ extern "C" {
 
 #include "rte_swx_hash_func.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum number of key timeout values per learner table. */
 #ifndef RTE_SWX_TABLE_LEARNER_N_KEY_TIMEOUTS_MAX
 #define RTE_SWX_TABLE_LEARNER_N_KEY_TIMEOUTS_MAX 16
diff --git a/lib/table/rte_swx_table_selector.h b/lib/table/rte_swx_table_selector.h
index 05863cc90b..ef29bdb6b0 100644
--- a/lib/table/rte_swx_table_selector.h
+++ b/lib/table/rte_swx_table_selector.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_SELECTOR_H__
 #define __INCLUDE_RTE_SWX_TABLE_SELECTOR_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Selector Table
@@ -21,6 +17,10 @@ extern "C" {
 
 #include "rte_swx_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Selector table creation parameters. */
 struct rte_swx_table_selector_params {
 	/** Group ID offset. */
diff --git a/lib/table/rte_swx_table_wm.h b/lib/table/rte_swx_table_wm.h
index 4fd52c0a17..7eb6f8e2a6 100644
--- a/lib/table/rte_swx_table_wm.h
+++ b/lib/table/rte_swx_table_wm.h
@@ -4,10 +4,6 @@
 #ifndef __INCLUDE_RTE_SWX_TABLE_WM_H__
 #define __INCLUDE_RTE_SWX_TABLE_WM_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE SWX Wildcard Match Table
@@ -16,6 +12,10 @@ extern "C" {
 
 #include <rte_swx_table.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Wildcard match table operations. */
 extern struct rte_swx_table_ops rte_swx_table_wildcard_match_ops;
 
diff --git a/lib/table/rte_table.h b/lib/table/rte_table.h
index 9a5faf0e32..43a5a1a7b3 100644
--- a/lib/table/rte_table.h
+++ b/lib/table/rte_table.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_H__
 #define __INCLUDE_RTE_TABLE_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table
@@ -27,6 +23,10 @@ extern "C" {
 #include <stdint.h>
 #include <rte_port.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 struct rte_mbuf;
 
 /** Lookup table statistics */
diff --git a/lib/table/rte_table_acl.h b/lib/table/rte_table_acl.h
index 1cb7b9fbbd..61af7b88e4 100644
--- a/lib/table/rte_table_acl.h
+++ b/lib/table/rte_table_acl.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_ACL_H__
 #define __INCLUDE_RTE_TABLE_ACL_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table ACL
@@ -25,6 +21,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** ACL table parameters */
 struct rte_table_acl_params {
 	/** Name */
diff --git a/lib/table/rte_table_array.h b/lib/table/rte_table_array.h
index fad83b0588..b2a7b95d68 100644
--- a/lib/table/rte_table_array.h
+++ b/lib/table/rte_table_array.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_ARRAY_H__
 #define __INCLUDE_RTE_TABLE_ARRAY_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Array
@@ -20,6 +16,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Array table parameters */
 struct rte_table_array_params {
 	/** Number of array entries. Has to be a power of two. */
diff --git a/lib/table/rte_table_hash.h b/lib/table/rte_table_hash.h
index 6698621dae..ff8fc9e9ce 100644
--- a/lib/table/rte_table_hash.h
+++ b/lib/table/rte_table_hash.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_HASH_H__
 #define __INCLUDE_RTE_TABLE_HASH_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Hash
@@ -52,6 +48,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Hash function */
 typedef uint64_t (*rte_table_hash_op_hash)(
 	void *key,
diff --git a/lib/table/rte_table_hash_cuckoo.h b/lib/table/rte_table_hash_cuckoo.h
index 3a55d28e9b..55aa12216a 100644
--- a/lib/table/rte_table_hash_cuckoo.h
+++ b/lib/table/rte_table_hash_cuckoo.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_HASH_CUCKOO_H__
 #define __INCLUDE_RTE_TABLE_HASH_CUCKOO_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Hash Cuckoo
@@ -20,6 +16,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Hash table parameters */
 struct rte_table_hash_cuckoo_params {
 	/** Name */
diff --git a/lib/table/rte_table_hash_func.h b/lib/table/rte_table_hash_func.h
index aa779c2182..cba7ec4c20 100644
--- a/lib/table/rte_table_hash_func.h
+++ b/lib/table/rte_table_hash_func.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_HASH_FUNC_H__
 #define __INCLUDE_RTE_TABLE_HASH_FUNC_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
@@ -18,6 +14,10 @@ extern "C" {
 
 #include <x86intrin.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline uint64_t
 rte_crc32_u64(uint64_t crc, uint64_t v)
 {
@@ -28,6 +28,10 @@ rte_crc32_u64(uint64_t crc, uint64_t v)
 #include "rte_table_hash_func_arm64.h"
 #else
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 static inline uint64_t
 rte_crc32_u64(uint64_t crc, uint64_t v)
 {
diff --git a/lib/table/rte_table_lpm.h b/lib/table/rte_table_lpm.h
index dde32deed9..59b9bdee89 100644
--- a/lib/table/rte_table_lpm.h
+++ b/lib/table/rte_table_lpm.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_LPM_H__
 #define __INCLUDE_RTE_TABLE_LPM_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table LPM for IPv4
@@ -45,6 +41,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** LPM table parameters */
 struct rte_table_lpm_params {
 	/** Table name */
diff --git a/lib/table/rte_table_lpm_ipv6.h b/lib/table/rte_table_lpm_ipv6.h
index 96ddbd32c2..166a5ba9ee 100644
--- a/lib/table/rte_table_lpm_ipv6.h
+++ b/lib/table/rte_table_lpm_ipv6.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_LPM_IPV6_H__
 #define __INCLUDE_RTE_TABLE_LPM_IPV6_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table LPM for IPv6
@@ -45,6 +41,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_LPM_IPV6_ADDR_SIZE 16
 
 /** LPM table parameters */
diff --git a/lib/table/rte_table_stub.h b/lib/table/rte_table_stub.h
index 846526ea99..f7e589df16 100644
--- a/lib/table/rte_table_stub.h
+++ b/lib/table/rte_table_stub.h
@@ -5,10 +5,6 @@
 #ifndef __INCLUDE_RTE_TABLE_STUB_H__
 #define __INCLUDE_RTE_TABLE_STUB_H__
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  * RTE Table Stub
@@ -18,6 +14,10 @@ extern "C" {
 
 #include "rte_table.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Stub table parameters: NONE */
 
 /** Stub table operations */
diff --git a/lib/telemetry/rte_telemetry.h b/lib/telemetry/rte_telemetry.h
index cab9daa6fe..463819e2bf 100644
--- a/lib/telemetry/rte_telemetry.h
+++ b/lib/telemetry/rte_telemetry.h
@@ -5,14 +5,14 @@
 #ifndef _RTE_TELEMETRY_H_
 #define _RTE_TELEMETRY_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 #include <rte_compat.h>
 #include <rte_common.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum length for string used in object. */
 #define RTE_TEL_MAX_STRING_LEN 128
 /** Maximum length of string. */
diff --git a/lib/vhost/rte_vdpa.h b/lib/vhost/rte_vdpa.h
index 6ac85d1bbf..18e273c20f 100644
--- a/lib/vhost/rte_vdpa.h
+++ b/lib/vhost/rte_vdpa.h
@@ -5,10 +5,6 @@
 #ifndef _RTE_VDPA_H_
 #define _RTE_VDPA_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 /**
  * @file
  *
@@ -17,6 +13,10 @@ extern "C" {
 
 #include <stdint.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /** Maximum name length for statistics counters */
 #define RTE_VDPA_STATS_NAME_SIZE 64
 
diff --git a/lib/vhost/rte_vhost.h b/lib/vhost/rte_vhost.h
index b0434c4b8d..c7a5f56df8 100644
--- a/lib/vhost/rte_vhost.h
+++ b/lib/vhost/rte_vhost.h
@@ -18,10 +18,6 @@
 #include <rte_memory.h>
 #include <rte_mempool.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #ifndef __cplusplus
 /* These are not C++-aware. */
 #include <linux/vhost.h>
@@ -29,6 +25,10 @@ extern "C" {
 #include <linux/virtio_net.h>
 #endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_VHOST_USER_CLIENT		(1ULL << 0)
 #define RTE_VHOST_USER_NO_RECONNECT	(1ULL << 1)
 #define RTE_VHOST_USER_RESERVED_1	(1ULL << 2)
diff --git a/lib/vhost/rte_vhost_async.h b/lib/vhost/rte_vhost_async.h
index 8f190dd44b..60995e4e62 100644
--- a/lib/vhost/rte_vhost_async.h
+++ b/lib/vhost/rte_vhost_async.h
@@ -5,15 +5,15 @@
 #ifndef _RTE_VHOST_ASYNC_H_
 #define _RTE_VHOST_ASYNC_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdint.h>
 
 #include <rte_compat.h>
 #include <rte_mbuf.h>
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 /**
  * Register an async channel for a vhost queue
  *
diff --git a/lib/vhost/rte_vhost_crypto.h b/lib/vhost/rte_vhost_crypto.h
index f962a53818..af61f0907e 100644
--- a/lib/vhost/rte_vhost_crypto.h
+++ b/lib/vhost/rte_vhost_crypto.h
@@ -5,12 +5,12 @@
 #ifndef _VHOST_CRYPTO_H_
 #define _VHOST_CRYPTO_H_
 
+#include <stdint.h>
+
 #ifdef __cplusplus
 extern "C" {
 #endif
 
-#include <stdint.h>
-
 /* pre-declare structs to avoid including full headers */
 struct rte_mempool;
 struct rte_crypto_op;
diff --git a/lib/vhost/vdpa_driver.h b/lib/vhost/vdpa_driver.h
index 8db4ab9f4d..42392a0d14 100644
--- a/lib/vhost/vdpa_driver.h
+++ b/lib/vhost/vdpa_driver.h
@@ -5,10 +5,6 @@
 #ifndef _VDPA_DRIVER_H_
 #define _VDPA_DRIVER_H_
 
-#ifdef __cplusplus
-extern "C" {
-#endif
-
 #include <stdbool.h>
 
 #include <rte_compat.h>
@@ -16,6 +12,10 @@ extern "C" {
 #include "rte_vhost.h"
 #include "rte_vdpa.h"
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #define RTE_VHOST_QUEUE_ALL UINT16_MAX
 
 /**
-- 
2.34.1



* [PATCH v6 2/6] eal: extend bit manipulation functionality
  2024-09-10  8:31                                               ` [PATCH v6 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-09-10  8:31                                                 ` [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
@ 2024-09-10  8:31                                                 ` Mattias Rönnblom
  2024-09-10  8:31                                                 ` [PATCH v6 3/6] eal: add unit tests for bit operations Mattias Rönnblom
                                                                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  8:31 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Add functionality to test and modify the value of individual bits in
32-bit or 64-bit words.

These functions give no guarantees regarding memory ordering or
atomicity. They do not use volatile, and thus do not prevent any
compiler optimizations.
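
For illustration only (not part of the patch), the type-based dispatch
can be sketched in standalone C11. The demo_* names are hypothetical
stand-ins for the macro-generated helpers, chosen so that the snippet
compiles without DPDK headers; only test and set are shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the macro-generated 32/64-bit helpers. */
static inline bool
demo_bit_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}

static inline bool
demo_bit_test64(const uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	return *addr & (UINT64_C(1) << nr);
}

static inline void
demo_bit_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

static inline void
demo_bit_set64(uint64_t *addr, unsigned int nr)
{
	assert(nr < 64);
	*addr |= UINT64_C(1) << nr;
}

/* Select the helper from the pointer type; const is listed explicitly. */
#define demo_bit_test(addr, nr)					\
	_Generic((addr),					\
		 uint32_t *: demo_bit_test32,			\
		 const uint32_t *: demo_bit_test32,		\
		 uint64_t *: demo_bit_test64,			\
		 const uint64_t *: demo_bit_test64)(addr, nr)

#define demo_bit_set(addr, nr)					\
	_Generic((addr),					\
		 uint32_t *: demo_bit_set32,			\
		 uint64_t *: demo_bit_set64)(addr, nr)

/* Returns 0 if both word sizes dispatch and behave as expected. */
int
demo_bit_usage(void)
{
	uint32_t flags = 0;
	uint64_t wide = 0;
	const uint64_t *ro = &wide;

	demo_bit_set(&flags, 3);
	demo_bit_set(&wide, 40);

	if (!demo_bit_test(&flags, 3) || demo_bit_test(&flags, 4))
		return -1;
	if (!demo_bit_test(ro, 40))
		return -1;
	return 0;
}
```

Passing a pointer to any other type (say, uint16_t *) fails at compile
time in this sketch, since _Generic has no matching association.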

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

PATCH v3:
 * Remove unnecessary <rte_compat.h> include.
 * Remove redundant 'fun' parameter from the __RTE_GEN_BIT_*() macros
   (Jack Bond-Preston).
 * Introduce __RTE_GEN_BIT_OPS() macro, consistent with how things
   are done when generating the atomic bit operations.
 * Refer to volatile bit op functions as variants instead of families
   (macro parameter naming).

RFC v6:
 * Have rte_bit_test() accept const-marked bitsets.

RFC v4:
 * Add rte_bit_flip() which, believe it or not, flips the value of a bit.
 * Mark macro-generated private functions as experimental.
 * Use macros to generate *assign*() functions.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).
 * Fix ','-related checkpatch warnings.
---
 lib/eal/include/rte_bitops.h | 260 ++++++++++++++++++++++++++++++++++-
 1 file changed, 258 insertions(+), 2 deletions(-)

diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 449565eeae..6915b945ba 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -2,6 +2,7 @@
  * Copyright(c) 2020 Arm Limited
  * Copyright(c) 2010-2019 Intel Corporation
  * Copyright(c) 2023 Microsoft Corporation
+ * Copyright(c) 2024 Ericsson AB
  */
 
 #ifndef _RTE_BITOPS_H_
@@ -11,12 +12,14 @@
  * @file
  * Bit Operations
  *
- * This file defines a family of APIs for bit operations
- * without enforcing memory ordering.
+ * This file provides functionality for low-level, single-word
+ * arithmetic and bit-level operations, such as counting or
+ * setting individual bits.
  */
 
 #include <stdint.h>
 
+#include <rte_compat.h>
 #include <rte_debug.h>
 
 #ifdef __cplusplus
@@ -105,6 +108,197 @@ extern "C" {
 #define RTE_FIELD_GET64(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> rte_ctz64(mask)))
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test bit in word.
+ *
+ * Generic selection macro to test the value of a bit in a 32-bit or
+ * 64-bit word. The type of operation depends on the type of the @c
+ * addr parameter.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_test(addr, nr)					\
+	_Generic((addr),					\
+		uint32_t *: __rte_bit_test32,			\
+		const uint32_t *: __rte_bit_test32,		\
+		uint64_t *: __rte_bit_test64,			\
+		const uint64_t *: __rte_bit_test64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Set bit in word.
+ *
+ * Generic selection macro to set a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_set(addr, nr)				\
+	_Generic((addr),				\
+		 uint32_t *: __rte_bit_set32,		\
+		 uint64_t *: __rte_bit_set64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Clear bit in word.
+ *
+ * Generic selection macro to clear a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr
+ * parameter.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_clear(addr, nr)					\
+	_Generic((addr),					\
+		 uint32_t *: __rte_bit_clear32,			\
+		 uint64_t *: __rte_bit_clear64)(addr, nr)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Assign a value to a bit in word.
+ *
+ * Generic selection macro to assign a value to a bit in a 32-bit or 64-bit
+ * word. The type of operation depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ */
+#define rte_bit_assign(addr, nr, value)					\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_assign32,			\
+		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Flip a bit in word.
+ *
+ * Generic selection macro to change the value of a bit to '0' if '1'
+ * or '1' if '0' in a 32-bit or 64-bit word. The type of operation
+ * depends on the type of the @c addr parameter.
+ *
+ * This macro does not give any guarantees with regard to memory
+ * ordering or atomicity.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ */
+#define rte_bit_flip(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_flip32,				\
+		 uint64_t *: __rte_bit_flip64)(addr, nr)
+
+#define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_ ## variant ## test ## size(const qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return *addr & mask;					\
+	}
+
+#define __RTE_GEN_BIT_SET(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## set ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		*addr |= mask;						\
+	}
+
+#define __RTE_GEN_BIT_CLEAR(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## clear ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		uint ## size ## _t mask = ~((uint ## size ## _t)1 << nr); \
+		(*addr) &= mask;					\
+	}
+
+#define __RTE_GEN_BIT_ASSIGN(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## assign ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr, bool value) \
+	{								\
+		if (value)						\
+			__rte_bit_ ## variant ## set ## size(addr, nr);	\
+		else							\
+			__rte_bit_ ## variant ## clear ## size(addr, nr); \
+	}
+
+#define __RTE_GEN_BIT_FLIP(variant, qualifier, size)			\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_ ## variant ## flip ## size(qualifier uint ## size ## _t *addr, \
+					    unsigned int nr)		\
+	{								\
+		bool value;						\
+									\
+		value = __rte_bit_ ## variant ## test ## size(addr, nr); \
+		__rte_bit_ ## variant ## assign ## size(addr, nr, !value); \
+	}
+
+#define __RTE_GEN_BIT_OPS(v, qualifier, size)	\
+	__RTE_GEN_BIT_TEST(v, qualifier, size)	\
+	__RTE_GEN_BIT_SET(v, qualifier, size)	\
+	__RTE_GEN_BIT_CLEAR(v, qualifier, size)	\
+	__RTE_GEN_BIT_ASSIGN(v, qualifier, size)	\
+	__RTE_GEN_BIT_FLIP(v, qualifier, size)
+
+#define __RTE_GEN_BIT_OPS_SIZE(size) \
+	__RTE_GEN_BIT_OPS(,, size)
+
+__RTE_GEN_BIT_OPS_SIZE(32)
+__RTE_GEN_BIT_OPS_SIZE(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -787,6 +981,68 @@ rte_log2_u64(uint64_t v)
 
 #ifdef __cplusplus
 }
+
+/*
+ * Since C++ doesn't support generic selection (i.e., _Generic),
+ * function overloading is used instead. Such functions must be
+ * defined outside 'extern "C"' to be accepted by the compiler.
+ */
+
+#undef rte_bit_test
+#undef rte_bit_set
+#undef rte_bit_clear
+#undef rte_bit_assign
+#undef rte_bit_flip
+
+#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name)				\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	static inline void						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name, arg2_type arg2_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+	}
+
+#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name)					\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name)
+
+__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [PATCH v6 3/6] eal: add unit tests for bit operations
  2024-09-10  8:31                                               ` [PATCH v6 0/6] Improve EAL bit operations API Mattias Rönnblom
  2024-09-10  8:31                                                 ` [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
  2024-09-10  8:31                                                 ` [PATCH v6 2/6] eal: extend bit manipulation functionality Mattias Rönnblom
@ 2024-09-10  8:31                                                 ` Mattias Rönnblom
  2024-09-10  8:31                                                 ` [PATCH v6 4/6] eal: add atomic " Mattias Rönnblom
                                                                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  8:31 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_[test|set|clear|assign|flip]()
functions.

The tests are converted to use the test suite runner framework.
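
The checking strategy can be sketched without the test framework. This
is an illustration under assumptions, not the patch's code: fixed
words replace rte_rand(), the demo_* helpers are hypothetical
stand-ins for rte_bit_*(), and only the 32-bit case is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool
demo_test32(const uint32_t *word, unsigned int nr)
{
	assert(nr < 32);
	return (*word >> nr) & 1;
}

static void
demo_assign32(uint32_t *word, unsigned int nr, bool value)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;

	if (value)
		*word |= mask;
	else
		*word &= ~mask;
}

static void
demo_flip32(uint32_t *word, unsigned int nr)
{
	demo_assign32(word, nr, !demo_test32(word, nr));
}

/* Returns 0 if 'word' could be driven to match 'reference', else -1. */
int
demo_check_bit_access32(uint32_t reference, uint32_t word)
{
	unsigned int nr;

	for (nr = 0; nr < 32; nr++) {
		bool ref_bit = (reference >> nr) & 1;

		demo_assign32(&word, nr, ref_bit);
		if (demo_test32(&word, nr) != ref_bit)
			return -1;

		/* A flip must invert the bit; a second flip restores it. */
		demo_flip32(&word, nr);
		if (demo_test32(&word, nr) == ref_bit)
			return -1;
		demo_flip32(&word, nr);
	}

	return word == reference ? 0 : -1;
}
```

Once every bit has been driven to the reference value, comparing the
whole word catches helpers that disturb neighbouring bits.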

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

RFC v6:
 * Test rte_bit_*test() usage through const pointers.

RFC v4:
 * Remove redundant line continuations.
---
 app/test/test_bitops.c | 85 ++++++++++++++++++++++++++++++++++--------
 1 file changed, 70 insertions(+), 15 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 0d4ccfb468..322f58c066 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -1,13 +1,68 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Ericsson AB
  */
 
+#include <stdbool.h>
+
 #include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_random.h>
 #include "test.h"
 
-uint32_t val32;
-uint64_t val64;
+#define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
+			    flip_fun, test_fun, size)			\
+	static int							\
+	test_name(void)							\
+	{								\
+		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
+		unsigned int bit_nr;					\
+		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			bool assign = rte_rand() & 1;			\
+			if (assign)					\
+				assign_fun(&word, bit_nr, reference_bit); \
+			else {						\
+				if (reference_bit)			\
+					set_fun(&word, bit_nr);		\
+				else					\
+					clear_fun(&word, bit_nr);	\
+									\
+			}						\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %u had unexpected value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+			TEST_ASSERT(test_fun(&word, bit_nr) != reference_bit, \
+				    "Bit %u had unflipped value", bit_nr); \
+			flip_fun(&word, bit_nr);			\
+									\
+			const uint ## size ## _t *const_ptr = &word;	\
+			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
+				    reference_bit,			\
+				    "Bit %u had unexpected value", bit_nr); \
+		}							\
+									\
+		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
+			bool reference_bit = (reference >> bit_nr) & 1;	\
+			TEST_ASSERT(test_fun(&word, bit_nr) == reference_bit, \
+				    "Bit %u had unexpected value", bit_nr); \
+		}							\
+									\
+		TEST_ASSERT(reference == word, "Word had unexpected value"); \
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+
+static uint32_t val32;
+static uint64_t val64;
 
 #define MAX_BITS_32 32
 #define MAX_BITS_64 64
@@ -117,22 +172,22 @@ test_bit_relaxed_test_set_clear(void)
 	return TEST_SUCCESS;
 }
 
+static struct unit_test_suite test_suite = {
+	.suite_name = "Bitops test suite",
+	.unit_test_cases = {
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_relaxed_set),
+		TEST_CASE(test_bit_relaxed_clear),
+		TEST_CASE(test_bit_relaxed_test_set_clear),
+		TEST_CASES_END()
+	}
+};
+
 static int
 test_bitops(void)
 {
-	val32 = 0;
-	val64 = 0;
-
-	if (test_bit_relaxed_set() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_clear() < 0)
-		return TEST_FAILED;
-
-	if (test_bit_relaxed_test_set_clear() < 0)
-		return TEST_FAILED;
-
-	return TEST_SUCCESS;
+	return unit_test_suite_runner(&test_suite);
 }
 
 REGISTER_FAST_TEST(bitops_autotest, true, true, test_bitops);
-- 
2.34.1



* [PATCH v6 4/6] eal: add atomic bit operations
  2024-09-10  8:31                                               ` [PATCH v6 0/6] Improve EAL bit operations API Mattias Rönnblom
                                                                   ` (2 preceding siblings ...)
  2024-09-10  8:31                                                 ` [PATCH v6 3/6] eal: add unit tests for bit operations Mattias Rönnblom
@ 2024-09-10  8:31                                                 ` Mattias Rönnblom
  2024-09-10  8:31                                                 ` [PATCH v6 5/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
  2024-09-10  8:31                                                 ` [PATCH v6 6/6] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  8:31 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Add atomic bit test/set/clear/assign/flip and
test-and-set/clear/assign functions.

All atomic bit functions allow (and indeed, require) the caller to
specify a memory order.
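
As a rough sketch of the approach (not the patch's code),
test-and-set can be built on an atomic fetch-or, using the previous
word value to report the old bit. The example uses plain C11
<stdatomic.h> and _Atomic-qualified words, whereas the patch goes
through <rte_stdatomic.h> and takes non-_Atomic pointers; the demo_*
names and the slot-claiming use case are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically set bit 'nr'; return whether it was already set. */
static inline bool
demo_atomic_test_and_set32(_Atomic uint32_t *addr, unsigned int nr,
			   memory_order order)
{
	assert(nr < 32);

	uint32_t mask = UINT32_C(1) << nr;
	uint32_t prev = atomic_fetch_or_explicit(addr, mask, order);

	return prev & mask;
}

/* Atomically clear bit 'nr'; return whether it was set before. */
static inline bool
demo_atomic_test_and_clear32(_Atomic uint32_t *addr, unsigned int nr,
			     memory_order order)
{
	assert(nr < 32);

	uint32_t mask = UINT32_C(1) << nr;
	uint32_t prev = atomic_fetch_and_explicit(addr, ~mask, order);

	return prev & mask;
}

/* Claim the lowest free slot in a 32-slot mask; -1 if all are busy. */
int
demo_claim_slot(_Atomic uint32_t *busy)
{
	unsigned int nr;

	for (nr = 0; nr < 32; nr++)
		if (!demo_atomic_test_and_set32(busy, nr,
						memory_order_acquire))
			return (int)nr;

	return -1;
}

/* Release a slot previously returned by demo_claim_slot(). */
void
demo_release_slot(_Atomic uint32_t *busy, unsigned int nr)
{
	demo_atomic_test_and_clear32(busy, nr, memory_order_release);
}
```

Acquire ordering on the claim and release ordering on the release is
one plausible pairing; the API leaves that choice to the caller.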

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

PATCH v3:
 * Introduce __RTE_GEN_BIT_ATOMIC_*() 'qualifier' argument already in
   this patch (Jack Bond-Preston).
 * Refer to volatile bit op functions as variants instead of families
   (macro parameter naming).
 * Update release notes.

PATCH:
 * Add missing macro #undef for C++ version of atomic bit flip.

RFC v7:
 * Replace compare-exchange-based rte_bit_atomic_test_and_*() and
   flip() with implementations that use the previous value as returned
   by the atomic fetch function.
 * Reword documentation to match the non-atomic macro variants.
 * Remove pointer to <rte_stdatomic.h> for memory model documentation,
   since there is no documentation for that API.

RFC v6:
 * Have rte_bit_atomic_test() accept const-marked bitsets.

RFC v4:
 * Add atomic bit flip.
 * Mark macro-generated private functions experimental.

RFC v3:
 * Work around lack of C++ support for _Generic (Tyler Retzlaff).

RFC v2:
 * Add rte_bit_atomic_test_and_assign() (for consistency).
 * Fix bugs in rte_bit_atomic_test_and_[set|clear]().
 * Use <rte_stdatomic.h> to support MSVC.
---
 doc/guides/rel_notes/release_24_11.rst |  17 +
 lib/eal/include/rte_bitops.h           | 415 +++++++++++++++++++++++++
 2 files changed, 432 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
index 0ff70d9057..3111b1e4c0 100644
--- a/doc/guides/rel_notes/release_24_11.rst
+++ b/doc/guides/rel_notes/release_24_11.rst
@@ -56,6 +56,23 @@ New Features
      =======================================================
 
 
+* **Extended bit operations API.**
+
+  The support for bit-level operations on single 32- and 64-bit words
+  in <rte_bitops.h> has been extended with two families of
+  semantically well-defined functions.
+
+  rte_bit_[test|set|clear|assign|flip]() functions provide excellent
+  performance (by avoiding restricting the compiler and CPU), but give
+  no guarantees with regard to memory ordering or atomicity.
+
+  rte_bit_atomic_*() functions provide atomic bit-level operations,
+  including the possibility of specifying memory ordering constraints.
+
+  The new public API elements are polymorphic, using _Generic-based
+  macros (for C) and function overloading (in C++ translation
+  units).
+
 Removed Items
 -------------
 
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 6915b945ba..3ad6795fd1 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -21,6 +21,7 @@
 
 #include <rte_compat.h>
 #include <rte_debug.h>
+#include <rte_stdatomic.h>
 
 #ifdef __cplusplus
 extern "C" {
@@ -226,6 +227,204 @@ extern "C" {
 		 uint32_t *: __rte_bit_flip32,				\
 		 uint64_t *: __rte_bit_flip64)(addr, nr)
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Test if a particular bit in a word is set with a particular memory
+ * order.
+ *
+ * Test a bit with the resulting memory load ordered as per the
+ * specified memory order.
+ *
+ * @param addr
+ *   A pointer to the word to query.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit is set, and false otherwise.
+ */
+#define rte_bit_atomic_test(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test32,			\
+		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 uint64_t *: __rte_bit_atomic_test64,			\
+		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
+							    memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically set bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '1', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_set(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_set32,			\
+		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically clear bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in
+ * the word pointed to by @c addr to '0', with the memory ordering as
+ * specified by @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_clear(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_clear32,			\
+		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically assign a value to bit in word.
+ *
+ * Generic selection macro to atomically set bit specified by @c nr in the
+ * word pointed to by @c addr to the value indicated by @c value, with
+ * the memory ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_assign32,			\
+		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
+							memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically flip bit in word.
+ *
+ * Generic selection macro to atomically negate the value of the bit
+ * specified by @c nr in the word pointed to by @c addr (i.e., '0'
+ * becomes '1' and '1' becomes '0'), with the memory ordering as
+ * specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ */
+#define rte_bit_atomic_flip(addr, nr, memory_order)			\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_flip32,			\
+		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and set a bit in word.
+ *
+ * Generic selection macro to atomically test and set bit specified by
+ * @c nr in the word pointed to by @c addr to '1', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
+							      memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and clear a bit in word.
+ *
+ * Generic selection macro to atomically test and clear bit specified
+ * by @c nr in the word pointed to by @c addr to '0', with the memory
+ * ordering as specified with @c memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
+		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
+								memory_order)
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change without prior notice.
+ *
+ * Atomically test and assign a bit in word.
+ *
+ * Generic selection macro to atomically test and assign bit specified
+ * by @c nr in the word pointed to by @c addr the value specified by
+ * @c value, with the memory ordering as specified with @c
+ * memory_order.
+ *
+ * @param addr
+ *   A pointer to the word to modify.
+ * @param nr
+ *   The index of the bit.
+ * @param value
+ *   The new value of the bit - true for '1', or false for '0'.
+ * @param memory_order
+ *   The memory order to use.
+ * @return
+ *   Returns true if the bit was set, and false otherwise.
+ */
+#define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
+		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
+								 value, \
+								 memory_order)
+
 #define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
 	__rte_experimental						\
 	static inline bool						\
@@ -299,6 +498,146 @@ extern "C" {
 __RTE_GEN_BIT_OPS_SIZE(32)
 __RTE_GEN_BIT_OPS_SIZE(64)
 
+#define __RTE_GEN_BIT_ATOMIC_TEST(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test ## size(const qualifier uint ## size ## _t *addr, \
+						     unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		const qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr = \
+			(const qualifier RTE_ATOMIC(uint ## size ## _t) *)addr;	\
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		return rte_atomic_load_explicit(a_addr, memory_order) & mask; \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_SET(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## set ## size(qualifier uint ## size ## _t *addr, \
+					      unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_or_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_CLEAR(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## clear ## size(qualifier uint ## size ## _t *addr,	\
+						unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_and_explicit(a_addr, ~mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## flip ## size(qualifier uint ## size ## _t *addr, \
+					       unsigned int nr, int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		rte_atomic_fetch_xor_explicit(a_addr, mask, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_ASSIGN(variant, qualifier, size)		\
+	__rte_experimental						\
+	static inline void						\
+	__rte_bit_atomic_ ## variant ## assign ## size(qualifier uint ## size ## _t *addr, \
+						unsigned int nr, bool value, \
+						int memory_order)	\
+	{								\
+		if (value)						\
+			__rte_bit_atomic_ ## variant ## set ## size(addr, nr, memory_order); \
+		else							\
+			__rte_bit_atomic_ ## variant ## clear ## size(addr, nr, \
+								     memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_SET(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_set ## size(qualifier uint ## size ## _t *addr, \
+						       unsigned int nr,	\
+						       int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_or_explicit(a_addr, mask,	\
+						    memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_clear ## size(qualifier uint ## size ## _t *addr, \
+							 unsigned int nr, \
+							 int memory_order) \
+	{								\
+		RTE_ASSERT(nr < size);					\
+									\
+		qualifier RTE_ATOMIC(uint ## size ## _t) *a_addr =	\
+			(qualifier RTE_ATOMIC(uint ## size ## _t) *)addr; \
+		uint ## size ## _t mask = (uint ## size ## _t)1 << nr;	\
+		uint ## size ## _t prev;				\
+									\
+		prev = rte_atomic_fetch_and_explicit(a_addr, ~mask,	\
+						     memory_order);	\
+									\
+		return prev & mask;					\
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(variant, qualifier, size)	\
+	__rte_experimental						\
+	static inline bool						\
+	__rte_bit_atomic_ ## variant ## test_and_assign ## size(qualifier uint ## size ## _t *addr, \
+							  unsigned int nr, \
+							  bool value,	\
+							  int memory_order) \
+	{								\
+		if (value)						\
+			return __rte_bit_atomic_ ## variant ## test_and_set ## size(addr, nr, memory_order); \
+		else							\
+			return __rte_bit_atomic_ ## variant ## test_and_clear ## size(addr, nr, memory_order); \
+	}
+
+#define __RTE_GEN_BIT_ATOMIC_OPS(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_SET(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_CLEAR(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_ASSIGN(variant, qualifier, size)	\
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_SET(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_CLEAR(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_TEST_AND_ASSIGN(variant, qualifier, size) \
+	__RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)
+
+#define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(,, size)
+
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
+__RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
+
 /*------------------------ 32-bit relaxed operations ------------------------*/
 
 /**
@@ -994,6 +1333,15 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_assign
 #undef rte_bit_flip
 
+#undef rte_bit_atomic_test
+#undef rte_bit_atomic_set
+#undef rte_bit_atomic_clear
+#undef rte_bit_atomic_assign
+#undef rte_bit_atomic_flip
+#undef rte_bit_atomic_test_and_set
+#undef rte_bit_atomic_test_and_clear
+#undef rte_bit_atomic_test_and_assign
+
 #define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
 	static inline void						\
 	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
@@ -1037,12 +1385,79 @@ rte_log2_u64(uint64_t v)
 	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
+#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	static inline ret_type						\
+	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+			arg1_type arg1_name, arg2_type arg2_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+	}
+
+#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)	\
+	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	static inline void						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
+					  arg3_name);		      \
+	}
+
+#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
+				arg2_type, arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	static inline ret_type						\
+	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
+			arg2_type arg2_name, arg3_type arg3_name)	\
+	{								\
+		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
+						 arg3_name);		\
+	}
+
+#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
+				 arg1_name, arg2_type, arg2_name, arg3_type, \
+				 arg3_name)
+
 __RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
 __RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
 __RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
 
+__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+		     int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+		      int, memory_order)
+__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+		      bool, value, int, memory_order)
+
 #endif
 
 #endif /* _RTE_BITOPS_H_ */
-- 
2.34.1



* [PATCH v6 5/6] eal: add unit tests for atomic bit access functions
  2024-09-10  8:31                                               ` [PATCH v6 0/6] Improve EAL bit operations API Mattias Rönnblom
                                                                   ` (3 preceding siblings ...)
  2024-09-10  8:31                                                 ` [PATCH v6 4/6] eal: add atomic " Mattias Rönnblom
@ 2024-09-10  8:31                                                 ` Mattias Rönnblom
  2024-09-10  8:31                                                 ` [PATCH v6 6/6] eal: extend bitops to handle volatile pointers Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  8:31 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Extend bitops tests to cover the rte_bit_atomic_*() family of
functions.
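
For readers who have not looked at the functions under test, the sketch
below illustrates the semantics the tests check. It is not the DPDK
implementation: it assumes plain C11 <stdatomic.h> in place of
<rte_stdatomic.h>, covers only the 32-bit case, and every sketch_*()
name is hypothetical. The last helper is a single-threaded illustration
of the parity invariant used by the parallel flip test (after n flips
the bit equals n modulo 2).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the 32-bit rte_bit_atomic_*() functions,
 * written against C11 atomics (an assumption made for illustration). */

static inline bool
sketch_test_and_set32(_Atomic uint32_t *addr, unsigned int nr,
		      memory_order mo)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;

	/* fetch-or returns the word as it was before the bit was set */
	return atomic_fetch_or_explicit(addr, mask, mo) & mask;
}

static inline bool
sketch_test_and_clear32(_Atomic uint32_t *addr, unsigned int nr,
			memory_order mo)
{
	assert(nr < 32);
	uint32_t mask = UINT32_C(1) << nr;

	/* fetch-and with the inverted mask clears only the target bit */
	return atomic_fetch_and_explicit(addr, ~mask, mo) & mask;
}

static inline void
sketch_flip32(_Atomic uint32_t *addr, unsigned int nr, memory_order mo)
{
	assert(nr < 32);

	/* fetch-xor toggles the target bit and leaves the others alone */
	atomic_fetch_xor_explicit(addr, UINT32_C(1) << nr, mo);
}

/* Single-threaded illustration of the flip-parity invariant. */
static inline bool
sketch_bit_after_flips32(unsigned int nr, uint64_t flips)
{
	_Atomic uint32_t word = 0;
	uint64_t i;

	for (i = 0; i < flips; i++)
		sketch_flip32(&word, nr, memory_order_relaxed);

	return atomic_load_explicit(&word, memory_order_relaxed) &
		(UINT32_C(1) << nr);
}
```

The parallel tests in this patch check the same properties with two
lcores operating on a shared word, which is what actually exercises
atomicity; the sketch only shows the expected single-threaded results.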

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

RFC v4:
 * Add atomicity test for atomic bit flip.

RFC v3:
 * Rename variable 'main' to make ICC happy.
---
 app/test/test_bitops.c | 313 ++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 312 insertions(+), 1 deletion(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index 322f58c066..b80216a0a1 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -3,10 +3,13 @@
  * Copyright(c) 2024 Ericsson AB
  */
 
+#include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_launch.h>
 #include <rte_bitops.h>
+#include <rte_cycles.h>
+#include <rte_launch.h>
+#include <rte_lcore.h>
 #include <rte_random.h>
 #include "test.h"
 
@@ -61,6 +64,304 @@ GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
 
+#define bit_atomic_set(addr, nr)				\
+	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_clear(addr, nr)					\
+	rte_bit_atomic_clear(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_assign(addr, nr, value)				\
+	rte_bit_atomic_assign(addr, nr, value, rte_memory_order_relaxed)
+
+#define bit_atomic_flip(addr, nr)					\
+	rte_bit_atomic_flip(addr, nr, rte_memory_order_relaxed)
+
+#define bit_atomic_test(addr, nr)				\
+	rte_bit_atomic_test(addr, nr, rte_memory_order_relaxed)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64)
+
+#define PARALLEL_TEST_RUNTIME 0.25
+
+#define GEN_TEST_BIT_PARALLEL_ASSIGN(size)				\
+									\
+	struct parallel_access_lcore ## size				\
+	{								\
+		unsigned int bit;					\
+		uint ## size ##_t *word;				\
+		bool failed;						\
+	};								\
+									\
+	static int							\
+	run_parallel_assign ## size(void *arg)				\
+	{								\
+		struct parallel_access_lcore ## size *lcore = arg;	\
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		bool value = false;					\
+									\
+		do {							\
+			bool new_value = rte_rand() & 1;		\
+			bool use_test_and_modify = rte_rand() & 1;	\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (rte_bit_atomic_test(lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) != value) { \
+				lcore->failed = true;			\
+				break;					\
+			}						\
+									\
+			if (use_test_and_modify) {			\
+				bool old_value;				\
+				if (use_assign)				\
+					old_value = rte_bit_atomic_test_and_assign( \
+						lcore->word, lcore->bit, new_value, \
+						rte_memory_order_relaxed); \
+				else {					\
+					old_value = new_value ?		\
+						rte_bit_atomic_test_and_set( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed) : \
+						rte_bit_atomic_test_and_clear( \
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+				if (old_value != value) {		\
+					lcore->failed = true;		\
+					break;				\
+				}					\
+			} else {					\
+				if (use_assign)				\
+					rte_bit_atomic_assign(lcore->word, lcore->bit, \
+							      new_value, \
+							      rte_memory_order_relaxed); \
+				else {					\
+					if (new_value)			\
+						rte_bit_atomic_set(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+					else				\
+						rte_bit_atomic_clear(	\
+							lcore->word, lcore->bit, \
+							rte_memory_order_relaxed); \
+				}					\
+			}						\
+									\
+			value = new_value;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_assign ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		struct parallel_access_lcore ## size lmain = {		\
+			.word = &word					\
+		};							\
+		struct parallel_access_lcore ## size lworker = {	\
+			.word = &word					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		lmain.bit = rte_rand_max(size);				\
+		do {							\
+			lworker.bit = rte_rand_max(size);		\
+		} while (lworker.bit == lmain.bit);			\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_assign ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_assign ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		TEST_ASSERT(!lmain.failed, "Main lcore atomic access failed"); \
+		TEST_ASSERT(!lworker.failed, "Worker lcore atomic access " \
+			    "failed");					\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_ASSIGN(32)
+GEN_TEST_BIT_PARALLEL_ASSIGN(64)
+
+#define GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(size)			\
+									\
+	struct parallel_test_and_set_lcore ## size			\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_test_and_modify ## size(void *arg)		\
+	{								\
+		struct parallel_test_and_set_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			bool old_value;					\
+			bool new_value = rte_rand() & 1;		\
+			bool use_assign = rte_rand() & 1;		\
+									\
+			if (use_assign)					\
+				old_value = rte_bit_atomic_test_and_assign( \
+					lcore->word, lcore->bit, new_value, \
+					rte_memory_order_relaxed);	\
+			else						\
+				old_value = new_value ?			\
+					rte_bit_atomic_test_and_set(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed) : \
+					rte_bit_atomic_test_and_clear(	\
+						lcore->word, lcore->bit, \
+						rte_memory_order_relaxed); \
+			if (old_value != new_value)			\
+				lcore->flips++;				\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_test_and_modify ## size(void)		\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_test_and_set_lcore ## size lmain = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_test_and_set_lcore ## size lworker = {	\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_test_and_modify ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_test_and_modify ## size(&lmain);		\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(32)
+GEN_TEST_BIT_PARALLEL_TEST_AND_MODIFY(64)
+
+#define GEN_TEST_BIT_PARALLEL_FLIP(size)				\
+									\
+	struct parallel_flip_lcore ## size				\
+	{								\
+		uint ## size ##_t *word;				\
+		unsigned int bit;					\
+		uint64_t flips;						\
+	};								\
+									\
+	static int							\
+	run_parallel_flip ## size(void *arg)				\
+	{								\
+		struct parallel_flip_lcore ## size *lcore = arg; \
+		uint64_t deadline = rte_get_timer_cycles() +		\
+			PARALLEL_TEST_RUNTIME * rte_get_timer_hz();	\
+		do {							\
+			rte_bit_atomic_flip(lcore->word, lcore->bit,	\
+					    rte_memory_order_relaxed);	\
+			lcore->flips++;					\
+		} while (rte_get_timer_cycles() < deadline);		\
+									\
+		return 0;						\
+	}								\
+									\
+	static int							\
+	test_bit_atomic_parallel_flip ## size(void)			\
+	{								\
+		unsigned int worker_lcore_id;				\
+		uint ## size ## _t word = 0;				\
+		unsigned int bit = rte_rand_max(size);			\
+		struct parallel_flip_lcore ## size lmain = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+		struct parallel_flip_lcore ## size lworker = {		\
+			.word = &word,					\
+			.bit = bit					\
+		};							\
+									\
+		if (rte_lcore_count() < 2) {				\
+			printf("Need multiple cores to run parallel test.\n"); \
+			return TEST_SKIPPED;				\
+		}							\
+									\
+		worker_lcore_id = rte_get_next_lcore(-1, 1, 0);		\
+									\
+		int rc = rte_eal_remote_launch(run_parallel_flip ## size, \
+					       &lworker, worker_lcore_id); \
+		TEST_ASSERT(rc == 0, "Worker thread launch failed");	\
+									\
+		run_parallel_flip ## size(&lmain);			\
+									\
+		rte_eal_mp_wait_lcore();				\
+									\
+		uint64_t total_flips = lmain.flips + lworker.flips;	\
+		bool expected_value = total_flips % 2;			\
+									\
+		TEST_ASSERT(expected_value == rte_bit_test(&word, bit), \
+			    "After %"PRIu64" flips, the bit value "	\
+			    "should be %d", total_flips, expected_value); \
+									\
+		uint64_t expected_word = 0;				\
+		rte_bit_assign(&expected_word, bit, expected_value);	\
+									\
+		TEST_ASSERT(expected_word == word, "Untouched bits have " \
+			    "changed value");				\
+									\
+		return TEST_SUCCESS;					\
+	}
+
+GEN_TEST_BIT_PARALLEL_FLIP(32)
+GEN_TEST_BIT_PARALLEL_FLIP(64)
+
 static uint32_t val32;
 static uint64_t val64;
 
@@ -177,6 +478,16 @@ static struct unit_test_suite test_suite = {
 	.unit_test_cases = {
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_access32),
+		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_atomic_access32),
+		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_parallel_assign32),
+		TEST_CASE(test_bit_atomic_parallel_assign64),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
+		TEST_CASE(test_bit_atomic_parallel_test_and_modify64),
+		TEST_CASE(test_bit_atomic_parallel_flip32),
+		TEST_CASE(test_bit_atomic_parallel_flip64),
 		TEST_CASE(test_bit_relaxed_set),
 		TEST_CASE(test_bit_relaxed_clear),
 		TEST_CASE(test_bit_relaxed_test_set_clear),
-- 
2.34.1



* [PATCH v6 6/6] eal: extend bitops to handle volatile pointers
  2024-09-10  8:31                                               ` [PATCH v6 0/6] Improve EAL bit operations API Mattias Rönnblom
                                                                   ` (4 preceding siblings ...)
  2024-09-10  8:31                                                 ` [PATCH v6 5/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
@ 2024-09-10  8:31                                                 ` Mattias Rönnblom
  5 siblings, 0 replies; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-10  8:31 UTC (permalink / raw)
  To: dev
  Cc: hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, David Marchand,
	Chengwen Feng, Mattias Rönnblom

Have rte_bit_[test|set|clear|assign|flip]() and rte_bit_atomic_*()
handle volatile-marked pointers.
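
As a rough illustration of the dispatch technique, the sketch below
shows how a C11 _Generic selection can route volatile-qualified
pointers to a separate set of functions. It is a simplified,
32-bit-only approximation and not the code in this patch; all
sketch_*() names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Non-volatile variants: the compiler may merge or elide accesses. */
static inline bool
sketch_test32(const uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}

static inline void
sketch_set32(uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

/* Volatile variants: same logic, but each access goes through a
 * volatile-qualified lvalue. */
static inline bool
sketch_v_test32(const volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	return *addr & (UINT32_C(1) << nr);
}

static inline void
sketch_v_set32(volatile uint32_t *addr, unsigned int nr)
{
	assert(nr < 32);
	*addr |= UINT32_C(1) << nr;
}

/* Select the variant from the pointee's qualifiers. Read-only
 * operations also list the const-qualified pointer types. */
#define sketch_bit_test(addr, nr)					\
	_Generic((addr),						\
		 uint32_t *: sketch_test32,				\
		 const uint32_t *: sketch_test32,			\
		 volatile uint32_t *: sketch_v_test32,			\
		 const volatile uint32_t *: sketch_v_test32)(addr, nr)

#define sketch_bit_set(addr, nr)					\
	_Generic((addr),						\
		 uint32_t *: sketch_set32,				\
		 volatile uint32_t *: sketch_v_set32)(addr, nr)
```

With these definitions a caller can pass either &word or a pointer to a
volatile word to sketch_bit_set() without a cast, and passing a
const-qualified pointer to sketch_bit_set() fails to compile because no
association matches.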

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>

--

PATCH v3:
 * Updated to reflect removed 'fun' parameter in __RTE_GEN_BIT_*()
   (Jack Bond-Preston).

PATCH v2:
 * Actually run the test_bit_atomic_v_access*() test functions.
---
 app/test/test_bitops.c       |  32 +++-
 lib/eal/include/rte_bitops.h | 301 +++++++++++++++++++++++------------
 2 files changed, 222 insertions(+), 111 deletions(-)

diff --git a/app/test/test_bitops.c b/app/test/test_bitops.c
index b80216a0a1..10e87f6776 100644
--- a/app/test/test_bitops.c
+++ b/app/test/test_bitops.c
@@ -14,13 +14,13 @@
 #include "test.h"
 
 #define GEN_TEST_BIT_ACCESS(test_name, set_fun, clear_fun, assign_fun,	\
-			    flip_fun, test_fun, size)			\
+			    flip_fun, test_fun, size, mod)		\
 	static int							\
 	test_name(void)							\
 	{								\
 		uint ## size ## _t reference = (uint ## size ## _t)rte_rand(); \
 		unsigned int bit_nr;					\
-		uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
+		mod uint ## size ## _t word = (uint ## size ## _t)rte_rand(); \
 									\
 		for (bit_nr = 0; bit_nr < size; bit_nr++) {		\
 			bool reference_bit = (reference >> bit_nr) & 1;	\
@@ -41,7 +41,7 @@
 				    "Bit %d had unflipped value", bit_nr); \
 			flip_fun(&word, bit_nr);			\
 									\
-			const uint ## size ## _t *const_ptr = &word;	\
+			const mod uint ## size ## _t *const_ptr = &word; \
 			TEST_ASSERT(test_fun(const_ptr, bit_nr) ==	\
 				    reference_bit,			\
 				    "Bit %d had unexpected value", bit_nr); \
@@ -59,10 +59,16 @@
 	}
 
 GEN_TEST_BIT_ACCESS(test_bit_access32, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
-		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64)
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access32, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_v_access64, rte_bit_set, rte_bit_clear,
+		    rte_bit_assign, rte_bit_flip, rte_bit_test, 64, volatile)
 
 #define bit_atomic_set(addr, nr)				\
 	rte_bit_atomic_set(addr, nr, rte_memory_order_relaxed)
@@ -81,11 +87,19 @@ GEN_TEST_BIT_ACCESS(test_bit_access64, rte_bit_set, rte_bit_clear,
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access32, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 32)
+		    bit_atomic_flip, bit_atomic_test, 32,)
 
 GEN_TEST_BIT_ACCESS(test_bit_atomic_access64, bit_atomic_set,
 		    bit_atomic_clear, bit_atomic_assign,
-		    bit_atomic_flip, bit_atomic_test, 64)
+		    bit_atomic_flip, bit_atomic_test, 64,)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access32, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 32, volatile)
+
+GEN_TEST_BIT_ACCESS(test_bit_atomic_v_access64, bit_atomic_set,
+		    bit_atomic_clear, bit_atomic_assign,
+		    bit_atomic_flip, bit_atomic_test, 64, volatile)
 
 #define PARALLEL_TEST_RUNTIME 0.25
 
@@ -480,8 +494,12 @@ static struct unit_test_suite test_suite = {
 		TEST_CASE(test_bit_access64),
 		TEST_CASE(test_bit_access32),
 		TEST_CASE(test_bit_access64),
+		TEST_CASE(test_bit_v_access32),
+		TEST_CASE(test_bit_v_access64),
 		TEST_CASE(test_bit_atomic_access32),
 		TEST_CASE(test_bit_atomic_access64),
+		TEST_CASE(test_bit_atomic_v_access32),
+		TEST_CASE(test_bit_atomic_v_access64),
 		TEST_CASE(test_bit_atomic_parallel_assign32),
 		TEST_CASE(test_bit_atomic_parallel_assign64),
 		TEST_CASE(test_bit_atomic_parallel_test_and_modify32),
diff --git a/lib/eal/include/rte_bitops.h b/lib/eal/include/rte_bitops.h
index 3ad6795fd1..d7a07c4099 100644
--- a/lib/eal/include/rte_bitops.h
+++ b/lib/eal/include/rte_bitops.h
@@ -127,12 +127,16 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_test(addr, nr)					\
-	_Generic((addr),					\
-		uint32_t *: __rte_bit_test32,			\
-		const uint32_t *: __rte_bit_test32,		\
-		uint64_t *: __rte_bit_test64,			\
-		const uint64_t *: __rte_bit_test64)(addr, nr)
+#define rte_bit_test(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_test32,				\
+		 const uint32_t *: __rte_bit_test32,			\
+		 volatile uint32_t *: __rte_bit_v_test32,		\
+		 const volatile uint32_t *: __rte_bit_v_test32,		\
+		 uint64_t *: __rte_bit_test64,				\
+		 const uint64_t *: __rte_bit_test64,			\
+		 volatile uint64_t *: __rte_bit_v_test64,		\
+		 const volatile uint64_t *: __rte_bit_v_test64)(addr, nr)
 
 /**
  * @warning
@@ -152,10 +156,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_set(addr, nr)				\
-	_Generic((addr),				\
-		 uint32_t *: __rte_bit_set32,		\
-		 uint64_t *: __rte_bit_set64)(addr, nr)
+#define rte_bit_set(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_set32,				\
+		 volatile uint32_t *: __rte_bit_v_set32,		\
+		 uint64_t *: __rte_bit_set64,				\
+		 volatile uint64_t *: __rte_bit_v_set64)(addr, nr)
 
 /**
  * @warning
@@ -175,10 +181,12 @@ extern "C" {
  * @param nr
  *   The index of the bit.
  */
-#define rte_bit_clear(addr, nr)					\
-	_Generic((addr),					\
-		 uint32_t *: __rte_bit_clear32,			\
-		 uint64_t *: __rte_bit_clear64)(addr, nr)
+#define rte_bit_clear(addr, nr)						\
+	_Generic((addr),						\
+		 uint32_t *: __rte_bit_clear32,				\
+		 volatile uint32_t *: __rte_bit_v_clear32,		\
+		 uint64_t *: __rte_bit_clear64,				\
+		 volatile uint64_t *: __rte_bit_v_clear64)(addr, nr)
 
 /**
  * @warning
@@ -202,7 +210,9 @@ extern "C" {
 #define rte_bit_assign(addr, nr, value)					\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_assign32,			\
-		 uint64_t *: __rte_bit_assign64)(addr, nr, value)
+		 volatile uint32_t *: __rte_bit_v_assign32,		\
+		 uint64_t *: __rte_bit_assign64,			\
+		 volatile uint64_t *: __rte_bit_v_assign64)(addr, nr, value)
 
 /**
  * @warning
@@ -225,7 +235,9 @@ extern "C" {
 #define rte_bit_flip(addr, nr)						\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_flip32,				\
-		 uint64_t *: __rte_bit_flip64)(addr, nr)
+		 volatile uint32_t *: __rte_bit_v_flip32,		\
+		 uint64_t *: __rte_bit_flip64,				\
+		 volatile uint64_t *: __rte_bit_v_flip64)(addr, nr)
 
 /**
  * @warning
@@ -250,9 +262,13 @@ extern "C" {
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test32,			\
 		 const uint32_t *: __rte_bit_atomic_test32,		\
+		 volatile uint32_t *: __rte_bit_atomic_v_test32,	\
+		 const volatile uint32_t *: __rte_bit_atomic_v_test32,	\
 		 uint64_t *: __rte_bit_atomic_test64,			\
-		 const uint64_t *: __rte_bit_atomic_test64)(addr, nr,	\
-							    memory_order)
+		 const uint64_t *: __rte_bit_atomic_test64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test64,	\
+		 const volatile uint64_t *: __rte_bit_atomic_v_test64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -274,7 +290,10 @@ extern "C" {
 #define rte_bit_atomic_set(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_set32,			\
-		 uint64_t *: __rte_bit_atomic_set64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_set32,		\
+		 uint64_t *: __rte_bit_atomic_set64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_set64)(addr, nr, \
+								memory_order)
 
 /**
  * @warning
@@ -296,7 +315,10 @@ extern "C" {
 #define rte_bit_atomic_clear(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_clear32,			\
-		 uint64_t *: __rte_bit_atomic_clear64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_clear32,	\
+		 uint64_t *: __rte_bit_atomic_clear64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_clear64)(addr, nr, \
+								  memory_order)
 
 /**
  * @warning
@@ -320,8 +342,11 @@ extern "C" {
 #define rte_bit_atomic_assign(addr, nr, value, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_assign32,			\
-		 uint64_t *: __rte_bit_atomic_assign64)(addr, nr, value, \
-							memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_assign32,	\
+		 uint64_t *: __rte_bit_atomic_assign64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_assign64)(addr, nr, \
+								   value, \
+								   memory_order)
 
 /**
  * @warning
@@ -344,7 +369,10 @@ extern "C" {
 #define rte_bit_atomic_flip(addr, nr, memory_order)			\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_flip32,			\
-		 uint64_t *: __rte_bit_atomic_flip64)(addr, nr, memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_flip32,	\
+		 uint64_t *: __rte_bit_atomic_flip64,			\
+		 volatile uint64_t *: __rte_bit_atomic_v_flip64)(addr, nr, \
+								 memory_order)
 
 /**
  * @warning
@@ -368,8 +396,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_set(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_set32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_set64)(addr, nr,	\
-							      memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_set32, \
+		 uint64_t *: __rte_bit_atomic_test_and_set64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_set64) \
+						    (addr, nr, memory_order)
 
 /**
  * @warning
@@ -393,8 +423,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_clear(addr, nr, memory_order)		\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_clear32,		\
-		 uint64_t *: __rte_bit_atomic_test_and_clear64)(addr, nr, \
-								memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_clear32, \
+		 uint64_t *: __rte_bit_atomic_test_and_clear64,		\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_clear64) \
+						       (addr, nr, memory_order)
 
 /**
  * @warning
@@ -421,9 +453,10 @@ extern "C" {
 #define rte_bit_atomic_test_and_assign(addr, nr, value, memory_order)	\
 	_Generic((addr),						\
 		 uint32_t *: __rte_bit_atomic_test_and_assign32,	\
-		 uint64_t *: __rte_bit_atomic_test_and_assign64)(addr, nr, \
-								 value, \
-								 memory_order)
+		 volatile uint32_t *: __rte_bit_atomic_v_test_and_assign32, \
+		 uint64_t *: __rte_bit_atomic_test_and_assign64,	\
+		 volatile uint64_t *: __rte_bit_atomic_v_test_and_assign64) \
+						(addr, nr, value, memory_order)
 
 #define __RTE_GEN_BIT_TEST(variant, qualifier, size)			\
 	__rte_experimental						\
@@ -493,7 +526,8 @@ extern "C" {
 	__RTE_GEN_BIT_FLIP(v, qualifier, size)
 
 #define __RTE_GEN_BIT_OPS_SIZE(size) \
-	__RTE_GEN_BIT_OPS(,, size)
+	__RTE_GEN_BIT_OPS(,, size) \
+	__RTE_GEN_BIT_OPS(v_, volatile, size)
 
 __RTE_GEN_BIT_OPS_SIZE(32)
 __RTE_GEN_BIT_OPS_SIZE(64)
@@ -633,7 +667,8 @@ __RTE_GEN_BIT_OPS_SIZE(64)
 	__RTE_GEN_BIT_ATOMIC_FLIP(variant, qualifier, size)
 
 #define __RTE_GEN_BIT_ATOMIC_OPS_SIZE(size) \
-	__RTE_GEN_BIT_ATOMIC_OPS(,, size)
+	__RTE_GEN_BIT_ATOMIC_OPS(,, size) \
+	__RTE_GEN_BIT_ATOMIC_OPS(v_, volatile, size)
 
 __RTE_GEN_BIT_ATOMIC_OPS_SIZE(32)
 __RTE_GEN_BIT_ATOMIC_OPS_SIZE(64)
@@ -1342,120 +1377,178 @@ rte_log2_u64(uint64_t v)
 #undef rte_bit_atomic_test_and_clear
 #undef rte_bit_atomic_test_and_assign
 
-#define __RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, size, arg1_type, arg1_name) \
+#define __RTE_BIT_OVERLOAD_V_2(family, v, fun, c, size, arg1_type, arg1_name) \
 	static inline void						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
-			arg1_type arg1_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name);		\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2(fun, qualifier, arg1_type, arg1_name)	\
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 32, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2(fun, qualifier, 64, arg1_type, arg1_name)
+#define __RTE_BIT_OVERLOAD_SZ_2(family, fun, c, size, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_V_2(family,, fun, c, size, arg1_type,	\
+			       arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2(family, v_, fun, c volatile, size, \
+			       arg1_type, arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name)				\
+#define __RTE_BIT_OVERLOAD_2(family, fun, c, arg1_type, arg1_name)	\
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 32, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2(family, fun, c, 64, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_V_2R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(qualifier uint ## size ## _t *addr,		\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
 			arg1_type arg1_name)				\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name);	\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_2R(fun, qualifier, ret_type, arg1_type, arg1_name) \
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, size, ret_type, arg1_type, \
+				 arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name)				\
+	__RTE_BIT_OVERLOAD_V_2R(family, v_, fun, c volatile,		\
+				size, ret_type, arg1_type, arg1_name)
+
+#define __RTE_BIT_OVERLOAD_2R(family, fun, c, ret_type, arg1_type, arg1_name) \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 32, ret_type, arg1_type, \
 				 arg1_name)				\
-	__RTE_BIT_OVERLOAD_SZ_2R(fun, qualifier, 64, ret_type, arg1_type, \
+	__RTE_BIT_OVERLOAD_SZ_2R(family, fun, c, 64, ret_type, arg1_type, \
 				 arg1_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name)			\
+#define __RTE_BIT_OVERLOAD_V_3(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name);	\
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_3(fun, qualifier, arg1_type, arg1_name, arg2_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3(family, fun, c, size, arg1_type, arg1_name, \
+				arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family,, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_V_3(family, v_, fun, c volatile, size, arg1_type, \
+			       arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_3(family, fun, c, arg1_type, arg1_name, arg2_type, \
 			     arg2_name)					\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 32, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 32, arg1_type, arg1_name, \
 				arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3(fun, qualifier, 64, arg1_type, arg1_name, \
+	__RTE_BIT_OVERLOAD_SZ_3(family, fun, c, 64, arg1_type, arg1_name, \
 				arg2_type, arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)	\
+#define __RTE_BIT_OVERLOAD_V_3R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name)	\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name)				\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name) \
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name); \
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_3R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name)			\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name)	\
-	__RTE_BIT_OVERLOAD_SZ_3R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name)
+	__RTE_BIT_OVERLOAD_V_3R(family,, fun, c, size, ret_type, \
+				arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_V_3R(family, v_, fun, c volatile, size, \
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name)
 
-#define __RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, size, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name) \
+#define __RTE_BIT_OVERLOAD_3R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name)			\
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name) \
+	__RTE_BIT_OVERLOAD_SZ_3R(family, fun, c, 64, ret_type, \
+				 arg1_type, arg1_name, arg2_type, arg2_name)
+
+#define __RTE_BIT_OVERLOAD_V_4(family, v, fun, c, size, arg1_type, arg1_name, \
+			       arg2_type, arg2_name, arg3_type,	arg3_name) \
 	static inline void						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		__rte_bit_ ## fun ## size(addr, arg1_name, arg2_name,	\
-					  arg3_name);		      \
+		__rte_bit_ ## family ## v ## fun ## size(addr, arg1_name, \
+							 arg2_name,	\
+							 arg3_name);	\
 	}
 
-#define __RTE_BIT_OVERLOAD_4(fun, qualifier, arg1_type, arg1_name, arg2_type, \
-			     arg2_name, arg3_type, arg3_name)		\
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 32, arg1_type, arg1_name, \
+#define __RTE_BIT_OVERLOAD_SZ_4(family, fun, c, size, arg1_type, arg1_name, \
 				arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4(fun, qualifier, 64, arg1_type, arg1_name, \
-				arg2_type, arg2_name, arg3_type, arg3_name)
-
-#define __RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, size, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family,, fun, c, size, arg1_type,	\
+			       arg1_name, arg2_type, arg2_name, arg3_type, \
+			       arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4(family, v_, fun, c volatile, size,	\
+			       arg1_type, arg1_name, arg2_type, arg2_name, \
+			       arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4(family, fun, c, arg1_type, arg1_name, arg2_type, \
+			     arg2_name, arg3_type, arg3_name)		\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 32, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_SZ_4(family, fun, c, 64, arg1_type,		\
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)
+
+#define __RTE_BIT_OVERLOAD_V_4R(family, v, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
 	static inline ret_type						\
-	rte_bit_ ## fun(uint ## size ## _t *addr, arg1_type arg1_name,	\
-			arg2_type arg2_name, arg3_type arg3_name)	\
+	rte_bit_ ## family ## fun(c uint ## size ## _t *addr,		\
+				  arg1_type arg1_name, arg2_type arg2_name, \
+				  arg3_type arg3_name)			\
 	{								\
-		return __rte_bit_ ## fun ## size(addr, arg1_name, arg2_name, \
-						 arg3_name);		\
+		return __rte_bit_ ## family ## v ## fun ## size(addr,	\
+								arg1_name, \
+								arg2_name, \
+								arg3_name); \
 	}
 
-#define __RTE_BIT_OVERLOAD_4R(fun, qualifier, ret_type, arg1_type, arg1_name, \
-			      arg2_type, arg2_name, arg3_type, arg3_name) \
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 32, ret_type, arg1_type, \
+#define __RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, size, ret_type, arg1_type, \
 				 arg1_name, arg2_type, arg2_name, arg3_type, \
 				 arg3_name)				\
-	__RTE_BIT_OVERLOAD_SZ_4R(fun, qualifier, 64, ret_type, arg1_type, \
-				 arg1_name, arg2_type, arg2_name, arg3_type, \
-				 arg3_name)
-
-__RTE_BIT_OVERLOAD_2R(test, const, bool, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(set,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_2(clear,, unsigned int, nr)
-__RTE_BIT_OVERLOAD_3(assign,, unsigned int, nr, bool, value)
-__RTE_BIT_OVERLOAD_2(flip,, unsigned int, nr)
-
-__RTE_BIT_OVERLOAD_3R(atomic_test, const, bool, unsigned int, nr,
+	__RTE_BIT_OVERLOAD_V_4R(family,, fun, c, size, ret_type, arg1_type, \
+				arg1_name, arg2_type, arg2_name, arg3_type, \
+				arg3_name)				\
+	__RTE_BIT_OVERLOAD_V_4R(family, v_, fun, c volatile, size,	\
+				ret_type, arg1_type, arg1_name, arg2_type, \
+				arg2_name, arg3_type, arg3_name)
+
+#define __RTE_BIT_OVERLOAD_4R(family, fun, c, ret_type, arg1_type, arg1_name, \
+			      arg2_type, arg2_name, arg3_type, arg3_name) \
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 32, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)			\
+	__RTE_BIT_OVERLOAD_SZ_4R(family, fun, c, 64, ret_type,		\
+				 arg1_type, arg1_name, arg2_type, arg2_name, \
+				 arg3_type, arg3_name)
+
+__RTE_BIT_OVERLOAD_2R(, test, const, bool, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, set,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_2(, clear,, unsigned int, nr)
+__RTE_BIT_OVERLOAD_3(, assign,, unsigned int, nr, bool, value)
+__RTE_BIT_OVERLOAD_2(, flip,, unsigned int, nr)
+
+__RTE_BIT_OVERLOAD_3R(atomic_, test, const, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_set,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_clear,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_4(atomic_assign,, unsigned int, nr, bool, value,
+__RTE_BIT_OVERLOAD_3(atomic_, set,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3(atomic_, clear,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_4(atomic_, assign,, unsigned int, nr, bool, value,
 		     int, memory_order)
-__RTE_BIT_OVERLOAD_3(atomic_flip,, unsigned int, nr, int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_set,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3(atomic_, flip,, unsigned int, nr, int, memory_order)
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_set,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_3R(atomic_test_and_clear,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_3R(atomic_, test_and_clear,, bool, unsigned int, nr,
 		      int, memory_order)
-__RTE_BIT_OVERLOAD_4R(atomic_test_and_assign,, bool, unsigned int, nr,
+__RTE_BIT_OVERLOAD_4R(atomic_, test_and_assign,, bool, unsigned int, nr,
 		      bool, value, int, memory_order)
 
 #endif
-- 
2.34.1
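
For readers following the macro rework in this patch: each top-level
__RTE_BIT_OVERLOAD_*() invocation now yields four C++ overloads per
function, one per word size (32 or 64 bits) and per pointer kind (plain
or volatile). The following is only a minimal sketch of that expansion.
All demo_* names are hypothetical, and the 'family' parameter is left
out for brevity.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the size- and volatile-specific helpers
// (the patch dispatches to __rte_bit_<family><v><fun><size>()).
static inline void demo_bit_set32(std::uint32_t *addr, unsigned int nr)
{
	*addr |= std::uint32_t{1} << nr;
}

static inline void demo_bit_v_set32(volatile std::uint32_t *addr, unsigned int nr)
{
	*addr = *addr | (std::uint32_t{1} << nr);
}

static inline void demo_bit_set64(std::uint64_t *addr, unsigned int nr)
{
	*addr |= std::uint64_t{1} << nr;
}

static inline void demo_bit_v_set64(volatile std::uint64_t *addr, unsigned int nr)
{
	*addr = *addr | (std::uint64_t{1} << nr);
}

// One overload: 'v' selects the helper variant, 'c' carries qualifiers.
#define DEMO_OVERLOAD_V_2(v, fun, c, size, arg1_type, arg1_name) \
	static inline void \
	demo_bit_ ## fun(c std::uint ## size ## _t *addr, arg1_type arg1_name) \
	{ \
		demo_bit_ ## v ## fun ## size(addr, arg1_name); \
	}

// Per size: one plain and one volatile overload.
#define DEMO_OVERLOAD_SZ_2(fun, c, size, arg1_type, arg1_name) \
	DEMO_OVERLOAD_V_2(, fun, c, size, arg1_type, arg1_name) \
	DEMO_OVERLOAD_V_2(v_, fun, c volatile, size, arg1_type, arg1_name)

// Per function: 32-bit and 64-bit variants.
#define DEMO_OVERLOAD_2(fun, c, arg1_type, arg1_name) \
	DEMO_OVERLOAD_SZ_2(fun, c, 32, arg1_type, arg1_name) \
	DEMO_OVERLOAD_SZ_2(fun, c, 64, arg1_type, arg1_name)

// Generates four overloads of demo_bit_set().
DEMO_OVERLOAD_2(set,, unsigned int, nr)

// Exercises two of the four generated overloads.
static inline void demo_usage()
{
	std::uint32_t w = 0;
	volatile std::uint64_t v = 0;

	demo_bit_set(&w, 3);  // resolves to the plain 32-bit overload
	demo_bit_set(&v, 40); // resolves to the volatile 64-bit overload

	assert(w == 8u);
	assert(v == (std::uint64_t{1} << 40));
}
```

Overload resolution picks the variant from the pointer type alone, so
callers never spell out the size or the volatile qualifier.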



* Re: [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-10  8:31                                                 ` [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
@ 2024-09-16 12:05                                                   ` David Marchand
  2024-09-17  9:30                                                     ` Mattias Rönnblom
  2024-09-16 12:13                                                   ` David Marchand
  1 sibling, 1 reply; 160+ messages in thread
From: David Marchand @ 2024-09-16 12:05 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, Chengwen Feng

Hello,

On Tue, Sep 10, 2024 at 10:41 AM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
> diff --git a/lib/acl/rte_acl_osdep.h b/lib/acl/rte_acl_osdep.h
> index 3c1dc402ca..e4c7d07c69 100644
> --- a/lib/acl/rte_acl_osdep.h
> +++ b/lib/acl/rte_acl_osdep.h
> @@ -5,10 +5,6 @@
>  #ifndef _RTE_ACL_OSDEP_H_
>  #define _RTE_ACL_OSDEP_H_
>
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> -
>  /**
>   * @file
>   *
> @@ -49,6 +45,10 @@ extern "C" {
>  #include <rte_cpuflags.h>
>  #include <rte_debug.h>
>
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>  #ifdef __cplusplus
>  }
>  #endif

This part is a NOOP, so we can just drop it.

I found this occurrence in other files of the patch.

$ git show lib/ | grep -E '^ .*__cplusplus|diff' | grep -B1
__cplusplus | sed -ne 's/^diff --git a\/\(.*\) b\/.*$/\1/p' | while
read file; do git show -- $file | tr '\n' ' ' | grep -q ' +#ifdef
__cplusplus +extern "C" { +#endif +  #ifdef __cplusplus' && echo
$file; done
lib/acl/rte_acl_osdep.h
lib/eal/arm/include/rte_cpuflags_32.h
lib/eal/arm/include/rte_cpuflags_64.h
lib/eal/arm/include/rte_power_intrinsics.h
lib/eal/loongarch/include/rte_cpuflags.h
lib/eal/loongarch/include/rte_power_intrinsics.h
lib/eal/ppc/include/rte_cpuflags.h
lib/eal/ppc/include/rte_power_intrinsics.h
lib/eal/riscv/include/rte_cpuflags.h
lib/eal/riscv/include/rte_power_intrinsics.h
lib/eal/x86/include/rte_cpuflags.h
lib/eal/x86/include/rte_power_intrinsics.h
lib/ipsec/rte_ipsec.h
lib/pdcp/rte_pdcp.h
lib/ring/rte_ring_elem.h


> diff --git a/lib/eal/arm/include/rte_io.h b/lib/eal/arm/include/rte_io.h
> index f4e66e6bad..658768697c 100644
> --- a/lib/eal/arm/include/rte_io.h
> +++ b/lib/eal/arm/include/rte_io.h
> @@ -5,14 +5,14 @@
>  #ifndef _RTE_IO_ARM_H_
>  #define _RTE_IO_ARM_H_
>
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> -
>  #ifdef RTE_ARCH_64
>  #include "rte_io_64.h"
>  #else
>  #include "generic/rte_io.h"
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
>  #endif

I suspect it is the reason for the CI build error on ARM.
This block should be out of the #endif, but then with the next lines,
it ends up as a noop.
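
A minimal sketch of the placement described above, assuming the intent
is that the opening guard is seen regardless of which include branch is
taken. DEMO_ARCH_64 and demo_read32() are hypothetical stand-ins and are
not taken from DPDK.

```cpp
#include <cstdint>

// Hypothetical stand-in for RTE_ARCH_64.
#define DEMO_ARCH_64 1

#if DEMO_ARCH_64
// stand-in for: #include "rte_io_64.h"
#else
// stand-in for: #include "generic/rte_io.h"
#endif

// The opening guard sits after the whole #if/#else/#endif, so it is
// seen no matter which branch was taken and always pairs with the
// closing brace below.
#ifdef __cplusplus
extern "C" {
#endif

// Hypothetical declaration, so the guarded region is not empty.
static inline std::uint32_t
demo_read32(const volatile std::uint32_t *addr)
{
	return *addr;
}

#ifdef __cplusplus
}
#endif
```

With the opening guard inside the #else branch instead, a build that
takes the first branch would see only the closing brace.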

>
>  #ifdef __cplusplus


> diff --git a/lib/eal/arm/include/rte_pause.h b/lib/eal/arm/include/rte_pause.h
> index 6c7002ad98..8f35d60a6e 100644
> --- a/lib/eal/arm/include/rte_pause.h
> +++ b/lib/eal/arm/include/rte_pause.h
> @@ -5,14 +5,14 @@
>  #ifndef _RTE_PAUSE_ARM_H_
>  #define _RTE_PAUSE_ARM_H_
>
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> -
>  #ifdef RTE_ARCH_64
>  #include <rte_pause_64.h>
>  #else
>  #include <rte_pause_32.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
>  #endif

Idem, probably breaking build for ARM.


> diff --git a/lib/eal/include/generic/rte_atomic.h b/lib/eal/include/generic/rte_atomic.h
> index f859707744..0a4f3f8528 100644
> --- a/lib/eal/include/generic/rte_atomic.h
> +++ b/lib/eal/include/generic/rte_atomic.h
> @@ -17,6 +17,10 @@
>  #include <rte_common.h>
>  #include <rte_stdatomic.h>
>
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>  #ifdef __DOXYGEN__
>
>  /** @name Memory Barrier
> @@ -1156,4 +1160,8 @@ rte_atomic128_cmp_exchange(rte_int128_t *dst,
>
>  #endif /* __DOXYGEN__ */
>
> +#ifdef __cplusplus
> +}
> +#endif
> +

I would move this under the #ifdef __DOXYGEN__.

The rest looks good to me.


-- 
David Marchand



* Re: [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-10  8:31                                                 ` [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
  2024-09-16 12:05                                                   ` David Marchand
@ 2024-09-16 12:13                                                   ` David Marchand
  1 sibling, 0 replies; 160+ messages in thread
From: David Marchand @ 2024-09-16 12:13 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, hofors, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, Chengwen Feng

Addendum to previous mail.
I missed some issues, but the CI did catch them.

On Tue, Sep 10, 2024 at 10:41 AM Mattias Rönnblom
<mattias.ronnblom@ericsson.com> wrote:
> diff --git a/lib/eal/include/rte_vfio.h b/lib/eal/include/rte_vfio.h
> index b774625d9f..06b249dca0 100644
> --- a/lib/eal/include/rte_vfio.h
> +++ b/lib/eal/include/rte_vfio.h
> @@ -10,10 +10,6 @@
>   * RTE VFIO. This library provides various VFIO related utility functions.
>   */
>
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> -
>  #include <stdbool.h>
>  #include <stdint.h>
>
> @@ -36,6 +32,10 @@ extern "C" {
>
>  #include <linux/vfio.h>
>
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>  #define VFIO_DIR "/dev/vfio"
>  #define VFIO_CONTAINER_PATH "/dev/vfio/vfio"
>  #define VFIO_GROUP_FMT "/dev/vfio/%u"

This hunk above should be out of the #ifdef VFIO_PRESENT.


> diff --git a/lib/hash/rte_thash_gfni.h b/lib/hash/rte_thash_gfni.h
> index 132f37506d..5234c1697f 100644
> --- a/lib/hash/rte_thash_gfni.h
> +++ b/lib/hash/rte_thash_gfni.h
> @@ -5,10 +5,6 @@
>  #ifndef _RTE_THASH_GFNI_H_
>  #define _RTE_THASH_GFNI_H_
>
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> -
>  #include <rte_compat.h>
>  #include <rte_log.h>
>
> @@ -16,6 +12,10 @@ extern "C" {
>
>  #include <rte_thash_x86_gfni.h>
>
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
>  #endif
>
>  /**

This hunk above should be out of the #ifdef RTE_ARCH_X86.



-- 
David Marchand



* Re: [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-16 12:05                                                   ` David Marchand
@ 2024-09-17  9:30                                                     ` Mattias Rönnblom
  2024-09-18 11:15                                                       ` David Marchand
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-17  9:30 UTC (permalink / raw)
  To: David Marchand, Mattias Rönnblom
  Cc: dev, Heng Wang, Stephen Hemminger, Tyler Retzlaff,
	Morten Brørup, Jack Bond-Preston, Chengwen Feng

On 2024-09-16 14:05, David Marchand wrote:
> Hello,
> 
> On Tue, Sep 10, 2024 at 10:41 AM Mattias Rönnblom
> <mattias.ronnblom@ericsson.com> wrote:
>> diff --git a/lib/acl/rte_acl_osdep.h b/lib/acl/rte_acl_osdep.h
>> index 3c1dc402ca..e4c7d07c69 100644
>> --- a/lib/acl/rte_acl_osdep.h
>> +++ b/lib/acl/rte_acl_osdep.h
>> @@ -5,10 +5,6 @@
>>   #ifndef _RTE_ACL_OSDEP_H_
>>   #define _RTE_ACL_OSDEP_H_
>>
>> -#ifdef __cplusplus
>> -extern "C" {
>> -#endif
>> -
>>   /**
>>    * @file
>>    *
>> @@ -49,6 +45,10 @@ extern "C" {
>>   #include <rte_cpuflags.h>
>>   #include <rte_debug.h>
>>
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>>   #ifdef __cplusplus
>>   }
>>   #endif
> 
> This part is a NOOP, so we can just drop it.
> 

I did try to drop such NOOPs, but then something called 
sanitycheckcpp.exe failed the build because it required 'extern "C"' in 
those header files.

Isn't that check superfluous? A missing 'extern "C"' would be detected 
at a later stage, when the dummy C++ programs are compiled against the 
public header files.

If we agree sanitycheckcpp.exe should be fixed, should that be a separate
patch, or does it need to be part of this patch set?

> I found this occurrence in other files of the patch.
> 
> $ git show lib/ | grep -E '^ .*__cplusplus|diff' | grep -B1
> __cplusplus | sed -ne 's/^diff --git a\/\(.*\) b\/.*$/\1/p' | while
> read file; do git show -- $file | tr '\n' ' ' | grep -q ' +#ifdef
> __cplusplus +extern "C" { +#endif +  #ifdef __cplusplus' && echo
> $file; done
> lib/acl/rte_acl_osdep.h
> lib/eal/arm/include/rte_cpuflags_32.h
> lib/eal/arm/include/rte_cpuflags_64.h
> lib/eal/arm/include/rte_power_intrinsics.h
> lib/eal/loongarch/include/rte_cpuflags.h
> lib/eal/loongarch/include/rte_power_intrinsics.h
> lib/eal/ppc/include/rte_cpuflags.h
> lib/eal/ppc/include/rte_power_intrinsics.h
> lib/eal/riscv/include/rte_cpuflags.h
> lib/eal/riscv/include/rte_power_intrinsics.h
> lib/eal/x86/include/rte_cpuflags.h
> lib/eal/x86/include/rte_power_intrinsics.h
> lib/ipsec/rte_ipsec.h
> lib/pdcp/rte_pdcp.h
> lib/ring/rte_ring_elem.h
> 
> 
>> diff --git a/lib/eal/arm/include/rte_io.h b/lib/eal/arm/include/rte_io.h
>> index f4e66e6bad..658768697c 100644
>> --- a/lib/eal/arm/include/rte_io.h
>> +++ b/lib/eal/arm/include/rte_io.h
>> @@ -5,14 +5,14 @@
>>   #ifndef _RTE_IO_ARM_H_
>>   #define _RTE_IO_ARM_H_
>>
>> -#ifdef __cplusplus
>> -extern "C" {
>> -#endif
>> -
>>   #ifdef RTE_ARCH_64
>>   #include "rte_io_64.h"
>>   #else
>>   #include "generic/rte_io.h"
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>>   #endif
> 
> I suspect it is the reason for the CI build error on ARM.
> This block should be out of the #endif, but then with the next lines,
> it ends up as a noop.
> 
>>
>>   #ifdef __cplusplus
> 
> 
>> diff --git a/lib/eal/arm/include/rte_pause.h b/lib/eal/arm/include/rte_pause.h
>> index 6c7002ad98..8f35d60a6e 100644
>> --- a/lib/eal/arm/include/rte_pause.h
>> +++ b/lib/eal/arm/include/rte_pause.h
>> @@ -5,14 +5,14 @@
>>   #ifndef _RTE_PAUSE_ARM_H_
>>   #define _RTE_PAUSE_ARM_H_
>>
>> -#ifdef __cplusplus
>> -extern "C" {
>> -#endif
>> -
>>   #ifdef RTE_ARCH_64
>>   #include <rte_pause_64.h>
>>   #else
>>   #include <rte_pause_32.h>
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>>   #endif
> 
> Idem, probably breaking build for ARM.
> 
> 
>> diff --git a/lib/eal/include/generic/rte_atomic.h b/lib/eal/include/generic/rte_atomic.h
>> index f859707744..0a4f3f8528 100644
>> --- a/lib/eal/include/generic/rte_atomic.h
>> +++ b/lib/eal/include/generic/rte_atomic.h
>> @@ -17,6 +17,10 @@
>>   #include <rte_common.h>
>>   #include <rte_stdatomic.h>
>>
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>>   #ifdef __DOXYGEN__
>>
>>   /** @name Memory Barrier
>> @@ -1156,4 +1160,8 @@ rte_atomic128_cmp_exchange(rte_int128_t *dst,
>>
>>   #endif /* __DOXYGEN__ */
>>
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
> 
> I would move under #ifdef DOXYGEN.
> 

Why? The pattern now is "almost always directly after the #includes". 
That is better than before, but not ideal. C linkage should only be 
covering functions and global variables declared, I think.

> The rest looks good to me.
> 
> 

Thanks for the help!



* Re: [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-17  9:30                                                     ` Mattias Rönnblom
@ 2024-09-18 11:15                                                       ` David Marchand
  2024-09-18 12:09                                                         ` Mattias Rönnblom
  0 siblings, 1 reply; 160+ messages in thread
From: David Marchand @ 2024-09-18 11:15 UTC (permalink / raw)
  To: Mattias Rönnblom, Bruce Richardson, Tyler Retzlaff
  Cc: Mattias Rönnblom, dev, Heng Wang, Stephen Hemminger,
	Morten Brørup, Jack Bond-Preston, Chengwen Feng

On Tue, Sep 17, 2024 at 11:30 AM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>
> On 2024-09-16 14:05, David Marchand wrote:
> > Hello,
> >
> > On Tue, Sep 10, 2024 at 10:41 AM Mattias Rönnblom
> > <mattias.ronnblom@ericsson.com> wrote:
> >> diff --git a/lib/acl/rte_acl_osdep.h b/lib/acl/rte_acl_osdep.h
> >> index 3c1dc402ca..e4c7d07c69 100644
> >> --- a/lib/acl/rte_acl_osdep.h
> >> +++ b/lib/acl/rte_acl_osdep.h
> >> @@ -5,10 +5,6 @@
> >>   #ifndef _RTE_ACL_OSDEP_H_
> >>   #define _RTE_ACL_OSDEP_H_
> >>
> >> -#ifdef __cplusplus
> >> -extern "C" {
> >> -#endif
> >> -
> >>   /**
> >>    * @file
> >>    *
> >> @@ -49,6 +45,10 @@ extern "C" {
> >>   #include <rte_cpuflags.h>
> >>   #include <rte_debug.h>
> >>
> >> +#ifdef __cplusplus
> >> +extern "C" {
> >> +#endif
> >> +
> >>   #ifdef __cplusplus
> >>   }
> >>   #endif
> >
> > This part is a NOOP, so we can just drop it.
> >
>
> I did try to drop such NOOPs, but then something called
> sanitycheckcpp.exe failed the build because it required 'extern "C"' in
> those header files.
>
> Isn't that check superfluous? A missing 'extern "C"' would be detected
> at a later stage, when the dummy C++ programs are compiled against the
> public header files.
>
> If we agree sanitycheckcpp.exe should be fixed, should that be a separate
> patch, or does it need to be part of this patch set?

This check was added with 1ee492bdc4ff ("buildtools/chkincs: check
missing C++ guards").
The check is too naive, and I am not sure we can actually make a better one...

I would remove this check, if no better option.


> >> diff --git a/lib/eal/include/generic/rte_atomic.h b/lib/eal/include/generic/rte_atomic.h
> >> index f859707744..0a4f3f8528 100644
> >> --- a/lib/eal/include/generic/rte_atomic.h
> >> +++ b/lib/eal/include/generic/rte_atomic.h
> >> @@ -17,6 +17,10 @@
> >>   #include <rte_common.h>
> >>   #include <rte_stdatomic.h>
> >>
> >> +#ifdef __cplusplus
> >> +extern "C" {
> >> +#endif
> >> +
> >>   #ifdef __DOXYGEN__
> >>
> >>   /** @name Memory Barrier
> >> @@ -1156,4 +1160,8 @@ rte_atomic128_cmp_exchange(rte_int128_t *dst,
> >>
> >>   #endif /* __DOXYGEN__ */
> >>
> >> +#ifdef __cplusplus
> >> +}
> >> +#endif
> >> +
> >
> > I would move under #ifdef DOXYGEN.
> >
>
> Why? The pattern now is "almost always directly after the #includes".
> That is better than before, but not ideal. C linkage should only be
> covering functions and global variables declared, I think.

I hear you about how the marking was done but it already has some
manual edits (seeing how some fixes were needed).


-- 
David Marchand



* Re: [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-18 11:15                                                       ` David Marchand
@ 2024-09-18 12:09                                                         ` Mattias Rönnblom
  2024-09-18 12:46                                                           ` Bruce Richardson
  0 siblings, 1 reply; 160+ messages in thread
From: Mattias Rönnblom @ 2024-09-18 12:09 UTC (permalink / raw)
  To: David Marchand, Bruce Richardson, Tyler Retzlaff
  Cc: Mattias Rönnblom, dev, Heng Wang, Stephen Hemminger,
	Morten Brørup, Jack Bond-Preston, Chengwen Feng

On 2024-09-18 13:15, David Marchand wrote:
> On Tue, Sep 17, 2024 at 11:30 AM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
>>
>> On 2024-09-16 14:05, David Marchand wrote:
>>> Hello,
>>>
>>> On Tue, Sep 10, 2024 at 10:41 AM Mattias Rönnblom
>>> <mattias.ronnblom@ericsson.com> wrote:
>>>> diff --git a/lib/acl/rte_acl_osdep.h b/lib/acl/rte_acl_osdep.h
>>>> index 3c1dc402ca..e4c7d07c69 100644
>>>> --- a/lib/acl/rte_acl_osdep.h
>>>> +++ b/lib/acl/rte_acl_osdep.h
>>>> @@ -5,10 +5,6 @@
>>>>    #ifndef _RTE_ACL_OSDEP_H_
>>>>    #define _RTE_ACL_OSDEP_H_
>>>>
>>>> -#ifdef __cplusplus
>>>> -extern "C" {
>>>> -#endif
>>>> -
>>>>    /**
>>>>     * @file
>>>>     *
>>>> @@ -49,6 +45,10 @@ extern "C" {
>>>>    #include <rte_cpuflags.h>
>>>>    #include <rte_debug.h>
>>>>
>>>> +#ifdef __cplusplus
>>>> +extern "C" {
>>>> +#endif
>>>> +
>>>>    #ifdef __cplusplus
>>>>    }
>>>>    #endif
>>>
>>> This part is a NOOP, so we can just drop it.
>>>
>>
>> I did try to drop such NOOPs, but then something called
>> sanitycheckcpp.exe failed the build because it required 'extern "C"' in
>> those header files.
>>
>> Isn't that check superfluous? A missing 'extern "C"' would be detected
>> at a later stage, when the dummy C++ programs are compiled against the
>> public header files.
>>
>> If we agree sanitycheckcpp.exe should be fixed, should that be a separate
>> patch, or does it need to be part of this patch set?
> 
> This check was added with 1ee492bdc4ff ("buildtools/chkincs: check
> missing C++ guards").
> The check is too naive, and I am not sure we can actually make a better one...
> 
> I would remove this check, if no better option.
> 

Just to be clear: what you are suggesting is removing the check as a 
part of this patch set?

I think I was wrong saying the dummy C++ programs already detect 
omissions of C linkage.

I'll leave it to Bruce to comment on this before I do anything.

> 
>>>> diff --git a/lib/eal/include/generic/rte_atomic.h b/lib/eal/include/generic/rte_atomic.h
>>>> index f859707744..0a4f3f8528 100644
>>>> --- a/lib/eal/include/generic/rte_atomic.h
>>>> +++ b/lib/eal/include/generic/rte_atomic.h
>>>> @@ -17,6 +17,10 @@
>>>>    #include <rte_common.h>
>>>>    #include <rte_stdatomic.h>
>>>>
>>>> +#ifdef __cplusplus
>>>> +extern "C" {
>>>> +#endif
>>>> +
>>>>    #ifdef __DOXYGEN__
>>>>
>>>>    /** @name Memory Barrier
>>>> @@ -1156,4 +1160,8 @@ rte_atomic128_cmp_exchange(rte_int128_t *dst,
>>>>
>>>>    #endif /* __DOXYGEN__ */
>>>>
>>>> +#ifdef __cplusplus
>>>> +}
>>>> +#endif
>>>> +
>>>
>>> I would move under #ifdef DOXYGEN.
>>>
>>
>> Why? The pattern now is "almost always directly after the #includes".
>> That is better than before, but not ideal. C linkage should only be
>> covering functions and global variables declared, I think.
> 
> I hear you about how the marking was done but it already has some
> manual edits (seeing how some fixes were needed).
> 
> 

I was not arguing against manual edits. I was arguing against 
inconsistent placement of the #ifdefs.

That said, I don't know what the purpose of the ifdef DOXYGEN is.


* Re: [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies
  2024-09-18 12:09                                                         ` Mattias Rönnblom
@ 2024-09-18 12:46                                                           ` Bruce Richardson
  0 siblings, 0 replies; 160+ messages in thread
From: Bruce Richardson @ 2024-09-18 12:46 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: David Marchand, Tyler Retzlaff, Mattias Rönnblom, dev,
	Heng Wang, Stephen Hemminger, Morten Brørup,
	Jack Bond-Preston, Chengwen Feng

On Wed, Sep 18, 2024 at 02:09:26PM +0200, Mattias Rönnblom wrote:
> On 2024-09-18 13:15, David Marchand wrote:
> > On Tue, Sep 17, 2024 at 11:30 AM Mattias Rönnblom <hofors@lysator.liu.se> wrote:
> > > 
> > > On 2024-09-16 14:05, David Marchand wrote:
> > > > Hello,
> > > > 
> > > > On Tue, Sep 10, 2024 at 10:41 AM Mattias Rönnblom
> > > > <mattias.ronnblom@ericsson.com> wrote:
> > > > > diff --git a/lib/acl/rte_acl_osdep.h b/lib/acl/rte_acl_osdep.h
> > > > > index 3c1dc402ca..e4c7d07c69 100644
> > > > > --- a/lib/acl/rte_acl_osdep.h
> > > > > +++ b/lib/acl/rte_acl_osdep.h
> > > > > @@ -5,10 +5,6 @@
> > > > >    #ifndef _RTE_ACL_OSDEP_H_
> > > > >    #define _RTE_ACL_OSDEP_H_
> > > > > 
> > > > > -#ifdef __cplusplus
> > > > > -extern "C" {
> > > > > -#endif
> > > > > -
> > > > >    /**
> > > > >     * @file
> > > > >     *
> > > > > @@ -49,6 +45,10 @@ extern "C" {
> > > > >    #include <rte_cpuflags.h>
> > > > >    #include <rte_debug.h>
> > > > > 
> > > > > +#ifdef __cplusplus
> > > > > +extern "C" {
> > > > > +#endif
> > > > > +
> > > > >    #ifdef __cplusplus
> > > > >    }
> > > > >    #endif
> > > > 
> > > > This part is a NOOP, so we can just drop it.
> > > > 
> > > 
> > > I did try to drop such NOOPs, but then something called
> > > sanitycheckcpp.exe failed the build because it required 'extern "C"' in
> > > those header files.
> > > 
> > > Isn't that check superfluous? A missing 'extern "C"' would be detected
> > > at a later stage, when the dummy C++ programs are compiled against the
> > > public header files.
> > > 
> > > If we agree sanitycheckcpp.exe should be fixed, should that be a separate
> > > patch, or does it need to be part of this patch set?
> > 
> > This check was added with 1ee492bdc4ff ("buildtools/chkincs: check
> > missing C++ guards").
> > The check is too naive, and I am not sure we can actually make a better one...
> > 
> > I would remove this check, if no better option.
> > 
> 
> Just to be clear: what you are suggesting is removing the check as a part of
> this patch set?
> 
> I think I was wrong saying the dummy C++ programs already detect omissions
> of C linkage.
> 
> I'll leave it to Bruce to comment on this before I do anything.
> 

I agree that the existing check is very naive. Maybe we can go with a
simple fix like adding an allowlist of files which we ignore for 'extern C'
checking?
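
A rough sketch of what such an allowlist could look like, assuming the
check simply looks for the string 'extern "C"' in each header. This is
not the actual buildtools/chkincs implementation, and
demo_guard_check() is a hypothetical name.

```cpp
#include <set>
#include <string>

// Returns true if the header passes the (still naive) guard check.
// 'allowlist' holds the paths exempt from the check, for example
// headers that declare nothing needing C linkage.
bool
demo_guard_check(const std::string &path, const std::string &contents,
		 const std::set<std::string> &allowlist)
{
	if (allowlist.count(path) != 0)
		return true;

	return contents.find("extern \"C\"") != std::string::npos;
}
```

The check stays a plain text match; the allowlist only silences it for
headers where an empty guard block would otherwise have to be kept.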

I don't remember the details of the original patch unfortunately, but from the
commit log I think I found that just compiling C++ against the C headers
didn't throw any errors for a missing extern "C". I think the functions need
to actually be called, and a link attempted, for us to see the errors,
and that is not something that is easily implemented.
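
To illustrate why a compile-only test is not enough, here is a small
sketch with hypothetical demo_* names, assuming a header that declares
one function without guards and one with them.

```cpp
// Declared the way an unguarded C header would present it to C++:
// C++ linkage, i.e. a mangled symbol name. No definition exists.
int demo_unguarded(int x);

// Declared with C linkage, as a guarded header would present it.
extern "C" int demo_guarded(int x);

// Stand-in for the C library's definition of the guarded function.
extern "C" int demo_guarded(int x)
{
	return x + 1;
}

// Only the guarded function is called. demo_unguarded() is never
// referenced, so neither the compiler nor the linker complains.
int demo_use(void)
{
	return demo_guarded(41);
}
```

Only a call to demo_unguarded(), followed by a link against the C
object, would surface the missing guard as an undefined reference.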

/Bruce


end of thread, 2024-09-18 12:47 UTC

Thread overview: 160+ messages
2024-03-02 13:53 [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
2024-03-02 13:53 ` [RFC 1/7] eal: extend bit manipulation functions Mattias Rönnblom
2024-03-02 17:05   ` Stephen Hemminger
2024-03-03  6:26     ` Mattias Rönnblom
2024-03-04 16:34       ` Tyler Retzlaff
2024-03-05 18:01         ` Mattias Rönnblom
2024-03-05 18:06           ` Tyler Retzlaff
2024-04-25  8:58   ` [RFC v2 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-04-25  8:58     ` [RFC v2 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-04-29  9:51       ` [RFC v3 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-04-29 11:12           ` Morten Brørup
2024-04-30  9:55           ` [RFC v4 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-04-30 12:08               ` [RFC v5 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-05-02  5:57                   ` [RFC v6 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-05-05  8:37                       ` [RFC v7 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-05-05  8:37                         ` [RFC v7 1/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-08-09  9:04                           ` [PATCH 0/5] Improve EAL bit operations API Mattias Rönnblom
2024-08-09  9:04                             ` [PATCH 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
2024-08-09  9:58                               ` [PATCH v2 0/5] Improve EAL bit operations API Mattias Rönnblom
2024-08-09  9:58                                 ` [PATCH v2 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
2024-08-12 11:16                                   ` Jack Bond-Preston
2024-08-12 11:58                                     ` Mattias Rönnblom
2024-08-12 12:49                                   ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
2024-08-12 12:49                                     ` [PATCH v3 1/5] eal: extend bit manipulation functionality Mattias Rönnblom
2024-08-12 13:24                                       ` Jack Bond-Preston
2024-09-09 14:57                                       ` [PATCH v4 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-09-09 14:57                                         ` [PATCH v4 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
2024-09-09 16:43                                           ` Morten Brørup
2024-09-10  0:50                                           ` fengchengwen
2024-09-10  5:10                                             ` Mattias Rönnblom
2024-09-10  6:20                                           ` [PATCH v5 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-09-10  6:20                                             ` [PATCH v5 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
2024-09-10  8:31                                               ` [PATCH v6 0/6] Improve EAL bit operations API Mattias Rönnblom
2024-09-10  8:31                                                 ` [PATCH v6 1/6] dpdk: do not force C linkage on include file dependencies Mattias Rönnblom
2024-09-16 12:05                                                   ` David Marchand
2024-09-17  9:30                                                     ` Mattias Rönnblom
2024-09-18 11:15                                                       ` David Marchand
2024-09-18 12:09                                                         ` Mattias Rönnblom
2024-09-18 12:46                                                           ` Bruce Richardson
2024-09-16 12:13                                                   ` David Marchand
2024-09-10  8:31                                                 ` [PATCH v6 2/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-09-10  8:31                                                 ` [PATCH v6 3/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-09-10  8:31                                                 ` [PATCH v6 4/6] eal: add atomic " Mattias Rönnblom
2024-09-10  8:31                                                 ` [PATCH v6 5/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-09-10  8:31                                                 ` [PATCH v6 6/6] eal: extend bitops to handle volatile pointers Mattias Rönnblom
2024-09-10  6:20                                             ` [PATCH v5 2/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-09-10  6:20                                             ` [PATCH v5 3/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-09-10  6:20                                             ` [PATCH v5 4/6] eal: add atomic " Mattias Rönnblom
2024-09-10  6:20                                             ` [PATCH v5 5/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-09-10  6:20                                             ` [PATCH v5 6/6] eal: extend bitops to handle volatile pointers Mattias Rönnblom
2024-09-09 14:57                                         ` [PATCH v4 2/6] eal: extend bit manipulation functionality Mattias Rönnblom
2024-09-09 14:57                                         ` [PATCH v4 3/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-09-09 14:57                                         ` [PATCH v4 4/6] eal: add atomic " Mattias Rönnblom
2024-09-09 14:57                                         ` [PATCH v4 5/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-09-09 14:57                                         ` [PATCH v4 6/6] eal: extend bitops to handle volatile pointers Mattias Rönnblom
2024-08-12 12:49                                     ` [PATCH v3 2/5] eal: add unit tests for bit operations Mattias Rönnblom
2024-08-12 13:25                                       ` Jack Bond-Preston
2024-08-12 12:49                                     ` [PATCH v3 3/5] eal: add atomic " Mattias Rönnblom
2024-08-12 13:25                                       ` Jack Bond-Preston
2024-08-12 12:49                                     ` [PATCH v3 4/5] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-08-12 13:26                                       ` Jack Bond-Preston
2024-08-12 12:49                                     ` [PATCH v3 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
2024-08-12 13:26                                       ` Jack Bond-Preston
2024-08-20 17:05                                     ` [PATCH v3 0/5] Improve EAL bit operations API Mattias Rönnblom
2024-09-05  8:10                                       ` David Marchand
2024-09-09 12:04                                         ` Mattias Rönnblom
2024-09-09 12:24                                           ` Thomas Monjalon
2024-09-09 12:25                                           ` David Marchand
2024-09-09 13:09                                             ` Mattias Rönnblom
2024-08-09  9:58                                 ` [PATCH v2 2/5] eal: add unit tests for bit operations Mattias Rönnblom
2024-08-09  9:58                                 ` [PATCH v2 3/5] eal: add atomic " Mattias Rönnblom
2024-08-12 11:19                                   ` Jack Bond-Preston
2024-08-12 12:00                                     ` Mattias Rönnblom
2024-08-09  9:58                                 ` [PATCH v2 4/5] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-08-09  9:58                                 ` [PATCH v2 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
2024-08-09 11:48                                   ` Morten Brørup
2024-08-12 11:22                                   ` Jack Bond-Preston
2024-08-12 12:28                                     ` Mattias Rönnblom
2024-08-09  9:04                             ` [PATCH 2/5] eal: add unit tests for bit operations Mattias Rönnblom
2024-08-09 15:03                               ` Stephen Hemminger
2024-08-09 15:37                                 ` Mattias Rönnblom
2024-08-09 16:31                                   ` Stephen Hemminger
2024-08-09 16:57                                     ` Mattias Rönnblom
2024-08-09  9:04                             ` [PATCH 3/5] eal: add atomic " Mattias Rönnblom
2024-08-09  9:04                             ` [PATCH 4/5] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-08-09  9:04                             ` [PATCH 5/5] eal: extend bitops to handle volatile pointers Mattias Rönnblom
2024-05-05  8:37                         ` [RFC v7 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-05-05  8:37                         ` [RFC v7 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-05-07 19:17                           ` Morten Brørup
2024-05-08  6:47                             ` Mattias Rönnblom
2024-05-08  7:33                               ` Morten Brørup
2024-05-08  8:00                                 ` Mattias Rönnblom
2024-05-08  8:11                                   ` Morten Brørup
2024-05-08  9:27                                     ` Mattias Rönnblom
2024-05-08 10:08                                       ` Morten Brørup
2024-05-08 15:15                                 ` Stephen Hemminger
2024-05-08 16:16                                   ` Morten Brørup
2024-05-05  8:37                         ` [RFC v7 4/6] eal: add unit tests for " Mattias Rönnblom
2024-05-05  8:37                         ` [RFC v7 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-05-05  8:37                         ` [RFC v7 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 4/6] eal: add unit tests for " Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-05-03  6:41                       ` Mattias Rönnblom
2024-05-03 23:30                         ` Tyler Retzlaff
2024-05-04 15:36                           ` Mattias Rönnblom
2024-05-02  5:57                     ` [RFC v6 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 4/6] eal: add unit tests for " Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-04-30 12:08                 ` [RFC v5 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 4/6] eal: add unit tests for " Mattias Rönnblom
2024-04-30 10:37               ` Morten Brørup
2024-04-30 11:58                 ` Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-04-30  9:55             ` [RFC v4 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 4/6] eal: add unit tests for " Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-04-29  9:51         ` [RFC v3 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-04-25  8:58     ` [RFC v2 2/6] eal: add unit tests for bit operations Mattias Rönnblom
2024-04-25  8:58     ` [RFC v2 3/6] eal: add exactly-once bit access functions Mattias Rönnblom
2024-04-25  8:58     ` [RFC v2 4/6] eal: add unit tests for " Mattias Rönnblom
2024-04-25  8:58     ` [RFC v2 5/6] eal: add atomic bit operations Mattias Rönnblom
2024-04-25 10:25       ` Morten Brørup
2024-04-25 14:36         ` Mattias Rönnblom
2024-04-25 16:18           ` Morten Brørup
2024-04-26  9:39             ` Mattias Rönnblom
2024-04-26 12:00               ` Morten Brørup
2024-04-28 15:37                 ` Mattias Rönnblom
2024-04-29  7:24                   ` Morten Brørup
2024-04-30 16:52               ` Tyler Retzlaff
2024-04-25  8:58     ` [RFC v2 6/6] eal: add unit tests for atomic bit access functions Mattias Rönnblom
2024-04-25 18:05     ` [RFC v2 0/6] Improve EAL bit operations API Tyler Retzlaff
2024-04-26 11:17       ` Mattias Rönnblom
2024-04-26 21:35     ` Patrick Robb
2024-03-02 13:53 ` [RFC 2/7] eal: add generic bit manipulation macros Mattias Rönnblom
2024-03-04  8:16   ` Heng Wang
2024-03-04 15:41     ` Mattias Rönnblom
2024-03-04 16:42   ` Tyler Retzlaff
2024-03-05 18:08     ` Mattias Rönnblom
2024-03-05 18:22       ` Tyler Retzlaff
2024-03-05 20:02         ` Mattias Rönnblom
2024-03-05 20:53           ` Tyler Retzlaff
2024-03-02 13:53 ` [RFC 3/7] eal: add bit manipulation functions which read or write once Mattias Rönnblom
2024-03-02 13:53 ` [RFC 4/7] eal: add generic once-type bit operations macros Mattias Rönnblom
2024-03-02 13:53 ` [RFC 5/7] eal: add atomic bit operations Mattias Rönnblom
2024-03-02 13:53 ` [RFC 6/7] eal: add generic " Mattias Rönnblom
2024-03-02 13:53 ` [RFC 7/7] eal: deprecate relaxed family of " Mattias Rönnblom
2024-03-02 17:07   ` Stephen Hemminger
2024-03-03  6:30     ` Mattias Rönnblom
