* [PATCH 0/3] add diagnostics macros to make code portable
@ 2024-12-27 1:33 Andre Muezerie
  2024-12-27 1:33 ` [PATCH 1/3] lib/eal: " Andre Muezerie
                  ` (17 more replies)
  0 siblings, 18 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-27 1:33 UTC (permalink / raw)
  Cc: dev, Andre Muezerie

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 4 +--
 drivers/net/axgbe/axgbe_rxtx.h | 12 +++----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 +--
 drivers/net/dpaa2/dpaa2_rxtx.c | 15 ++------
 drivers/net/fm10k/fm10k_rxtx_vec.c | 4 +--
 drivers/net/hns3/hns3_rxtx_vec_neon.h | 2 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 +-
 drivers/net/i40e/i40e_rxtx_common_avx.h | 4 +--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c | 2 +-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c | 4 +--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c | 4 +--
 drivers/net/i40e/i40e_rxtx_vec_common.h | 4 +--
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 3 +-
 drivers/net/i40e/i40e_rxtx_vec_sse.c | 4 +--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c | 4 +--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 4 +--
 drivers/net/iavf/iavf_rxtx_vec_common.h | 4 +--
 drivers/net/iavf/iavf_rxtx_vec_sse.c | 4 +--
 drivers/net/ice/ice_rxtx_common_avx.h | 4 +--
 drivers/net/ice/ice_rxtx_vec_avx2.c | 4 +--
 drivers/net/ice/ice_rxtx_vec_avx512.c | 4 +--
 drivers/net/ice/ice_rxtx_vec_common.h | 4 +--
 drivers/net/ice/ice_rxtx_vec_sse.c | 4 +--
 drivers/net/idpf/idpf_rxtx_vec_common.h | 4 +--
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 4 +--
 drivers/net/mlx5/mlx5_flow.c | 6 ++--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 6 ++--
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 +-
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 4 +--
 drivers/net/tap/tap_flow.c | 6 ++--
 drivers/net/virtio/virtio_rxtx_simple.c | 4 +--
 lib/eal/include/rte_common.h | 34 +++++++++++++++++++
 34 files changed, 77 insertions(+), 101 deletions(-)

--
2.47.0.vfs.0.3

^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH 1/3] lib/eal: add diagnostics macros to make code portable
  2024-12-27 1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
@ 2024-12-27 1:33 ` Andre Muezerie
  2024-12-27 1:33 ` [PATCH 2/3] drivers/common: " Andre Muezerie
                  ` (16 subsequent siblings)
  17 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-27 1:33 UTC (permalink / raw)
  To: Tyler Retzlaff; +Cc: dev, Andre Muezerie

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC’s pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..c5f91730ef 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,40 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macro to ignore code that might break the strict aliasing rules that
+ * the compiler is using for optimization.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wstrict_aliasing \
+	_Pragma("GCC diagnostic ignored \"-Wstrict-aliasing\"")
+#else
+#define __rte_diagnostic_ignored_wstrict_aliasing
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
--
2.47.0.vfs.0.3

^ permalink raw reply [flat|nested] 87+ messages in thread
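
For readers who want to try the idea outside the DPDK tree, here is a
minimal standalone sketch of the same pattern. It is not the patch itself:
the demo_* names are hypothetical, and since RTE_TOOLCHAIN_MSVC is only
defined by the DPDK build system, the sketch assumes detection through the
compiler's own __GNUC__ / __clang__ macros instead. It wraps one function
that casts away qualifiers in push / ignore / pop.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the __rte_diagnostic_* macros.
 * Assumption: __GNUC__ / __clang__ are used for detection here because
 * RTE_TOOLCHAIN_MSVC only exists inside a DPDK build.
 */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER)
#define demo_diagnostic_push _Pragma("GCC diagnostic push")
#define demo_diagnostic_pop _Pragma("GCC diagnostic pop")
#define demo_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
/* Compilers without the GCC pragma (e.g. MSVC): expand to nothing. */
#define demo_diagnostic_push
#define demo_diagnostic_pop
#define demo_diagnostic_ignored_wcast_qual
#endif

/* Hypothetical descriptor type, only for illustration. */
struct demo_desc {
	unsigned int len;
};

/* Suppress -Wcast-qual for this one function, then restore the state. */
demo_diagnostic_push
demo_diagnostic_ignored_wcast_qual
static inline unsigned int
demo_desc_len(const volatile struct demo_desc *d)
{
	/* This cast drops both qualifiers; it is what -Wcast-qual flags. */
	struct demo_desc *p = (struct demo_desc *)d;

	assert(p != NULL);
	return p->len;
}
demo_diagnostic_pop
```

Because the fallback branch expands to nothing, the call sites stay
identical on every compiler, which is the portability point made in the
commit message.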
* [PATCH 2/3] drivers/common: add diagnostics macros to make code portable
  2024-12-27 1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
  2024-12-27 1:33 ` [PATCH 1/3] lib/eal: " Andre Muezerie
@ 2024-12-27 1:33 ` Andre Muezerie
  2024-12-27 17:57   ` Stephen Hemminger
  2024-12-27 1:33 ` [PATCH 3/3] drivers/net: " Andre Muezerie
                  ` (15 subsequent siblings)
  17 siblings, 1 reply; 87+ messages in thread
From: Andre Muezerie @ 2024-12-27 1:33 UTC (permalink / raw)
  To: Bruce Richardson, Konstantin Ananyev, Jingjing Wu, Praveen Shetty
  Cc: dev, Andre Muezerie

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC’s pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 drivers/common/idpf/idpf_common_rxtx_avx512.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c
index b8450b03ae..d298a5ca36 100644
--- a/drivers/common/idpf/idpf_common_rxtx_avx512.c
+++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c
@@ -6,9 +6,7 @@
 #include "idpf_common_device.h"
 #include "idpf_common_rxtx.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
+__rte_diagnostic_ignored_wcast_qual
 
 #define IDPF_DESCS_PER_LOOP_AVX 8
 #define PKTLEN_SHIFT 10
--
2.47.0.vfs.0.3

^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 2/3] drivers/common: add diagnostics macros to make code portable
  2024-12-27 1:33 ` [PATCH 2/3] drivers/common: " Andre Muezerie
@ 2024-12-27 17:57 ` Stephen Hemminger
  2024-12-27 19:43   ` Andre Muezerie
  0 siblings, 1 reply; 87+ messages in thread
From: Stephen Hemminger @ 2024-12-27 17:57 UTC (permalink / raw)
  To: Andre Muezerie
  Cc: Bruce Richardson, Konstantin Ananyev, Jingjing Wu, Praveen Shetty, dev

On Thu, 26 Dec 2024 17:33:15 -0800
Andre Muezerie <andremue@linux.microsoft.com> wrote:

> From: Andre Muezerie <andremue@linux.microsoft.com>
> To: Bruce Richardson <bruce.richardson@intel.com>, Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>, Jingjing Wu <jingjing.wu@intel.com>, Praveen Shetty <praveen.shetty@intel.com>
> Cc: dev@dpdk.org, Andre Muezerie <andremue@linux.microsoft.com>
> Subject: [PATCH 2/3] drivers/common: add diagnostics macros to make code portable
> Date: Thu, 26 Dec 2024 17:33:15 -0800
> X-Mailer: git-send-email 1.8.3.1
>
> It was a common pattern to have "GCC diagnostic ignored" pragmas
> sprinkled over the code and only activate these pragmas for certain
> compilers (gcc and clang). Clang supports GCC’s pragma for
> compatibility with existing source code, so #pragma GCC diagnostic
> and #pragma clang diagnostic are synonyms for Clang
> (https://clang.llvm.org/docs/UsersManual.html).

As much as possible, these should be fixed. Disabling warnings hides too many
pre-existing bugs.

^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH 2/3] drivers/common: add diagnostics macros to make code portable
  2024-12-27 17:57 ` Stephen Hemminger
@ 2024-12-27 19:43 ` Andre Muezerie
  0 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-27 19:43 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Bruce Richardson, Konstantin Ananyev, Jingjing Wu, Praveen Shetty, dev

On Fri, Dec 27, 2024 at 09:57:03AM -0800, Stephen Hemminger wrote:
> On Thu, 26 Dec 2024 17:33:15 -0800
> Andre Muezerie <andremue@linux.microsoft.com> wrote:
>
> > From: Andre Muezerie <andremue@linux.microsoft.com>
> > To: Bruce Richardson <bruce.richardson@intel.com>, Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>, Jingjing Wu <jingjing.wu@intel.com>, Praveen Shetty <praveen.shetty@intel.com>
> > Cc: dev@dpdk.org, Andre Muezerie <andremue@linux.microsoft.com>
> > Subject: [PATCH 2/3] drivers/common: add diagnostics macros to make code portable
> > Date: Thu, 26 Dec 2024 17:33:15 -0800
> > X-Mailer: git-send-email 1.8.3.1
> >
> > It was a common pattern to have "GCC diagnostic ignored" pragmas
> > sprinkled over the code and only activate these pragmas for certain
> > compilers (gcc and clang). Clang supports GCC’s pragma for
> > compatibility with existing source code, so #pragma GCC diagnostic
> > and #pragma clang diagnostic are synonyms for Clang
> > (https://clang.llvm.org/docs/UsersManual.html).
>
> As much as possible, these should be fixed. Disabling warnings hides too many
> pre-existing bugs.

I'll take a closer look at these warnings and see what I can get fixed. My main
goal here was to get the code to compile with MSVC (without disabling additional
warnings), but I agree that it's not great to see these warnings getting
disabled in so many places.

^ permalink raw reply [flat|nested] 87+ messages in thread
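
As an illustration of what fixing rather than suppressing can look like,
here is a small hedged sketch. It is not taken from any DPDK driver; the
helper demo_sum_bytes is hypothetical. When the code only reads through
the pointer, keeping const in the target type of the cast makes the
-Wcast-qual warning go away without any pragma. Cases where the qualifier
really has to be dropped would still need a suppression or a change to
the surrounding types.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: sums the bytes of a read-only buffer. */
static inline uint32_t
demo_sum_bytes(const void *buf, size_t len)
{
	/*
	 * Writing (uint8_t *)buf here would discard const and trigger
	 * -Wcast-qual. Keeping const in the target type avoids the
	 * warning, so no diagnostic pragma is needed.
	 */
	const uint8_t *p = (const uint8_t *)buf;
	uint32_t sum = 0;
	size_t i;

	assert(buf != NULL || len == 0);
	for (i = 0; i < len; i++)
		sum += p[i];
	return sum;
}
```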
* [PATCH 3/3] drivers/net: add diagnostics macros to make code portable 2024-12-27 1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie 2024-12-27 1:33 ` [PATCH 1/3] lib/eal: " Andre Muezerie 2024-12-27 1:33 ` [PATCH 2/3] drivers/common: " Andre Muezerie @ 2024-12-27 1:33 ` Andre Muezerie 2024-12-28 0:45 ` [PATCH v2 0/3] " Andre Muezerie ` (14 subsequent siblings) 17 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-27 1:33 UTC (permalink / raw) To: Selwin Sebastian, Praveen Shetty, Hemant Agrawal, Sachin Saxena, Wathsala Vithanage, Jie Hai, Ian Stokes, Bruce Richardson, Konstantin Ananyev, David Christensen, Vladimir Medvedkin, Anatoly Burakov, Jingjing Wu, Dariusz Sosnowski, Viacheslav Ovsiienko, Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad, Stephen Hemminger, Maxime Coquelin, Chenbo Xia Cc: dev, Andre Muezerie It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC’s pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 12 ++++-------- drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 +--- drivers/net/dpaa2/dpaa2_rxtx.c | 15 +++------------ drivers/net/fm10k/fm10k_rxtx_vec.c | 4 +--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 2 +- drivers/net/i40e/i40e_recycle_mbufs_vec_common.c | 2 +- drivers/net/i40e/i40e_rxtx_common_avx.h | 4 +--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 2 +- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 4 +--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 4 +--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 +--- drivers/net/i40e/i40e_rxtx_vec_neon.c | 3 +-- drivers/net/i40e/i40e_rxtx_vec_sse.c | 4 +--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 4 +--- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 4 +--- drivers/net/iavf/iavf_rxtx_vec_common.h | 4 +--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 4 +--- drivers/net/ice/ice_rxtx_common_avx.h | 4 +--- drivers/net/ice/ice_rxtx_vec_avx2.c | 4 +--- drivers/net/ice/ice_rxtx_vec_avx512.c | 4 +--- drivers/net/ice/ice_rxtx_vec_common.h | 4 +--- drivers/net/ice/ice_rxtx_vec_sse.c | 4 +--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 +--- .../net/ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 +- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 2 +- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 4 +--- drivers/net/mlx5/mlx5_flow.c | 6 +++--- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 6 ++---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 +- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 4 +--- drivers/net/tap/tap_flow.c | 6 +++--- drivers/net/virtio/virtio_rxtx_simple.c | 4 +--- 32 files changed, 42 insertions(+), 98 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..59583477ac 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,14 +6,10 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic 
ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +#include <rte_common.h> + +/* to suppress compiler warnings related to descriptor casting */ +__rte_diagnostic_ignored_wcast_qual /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5cc841022c 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,9 +11,7 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..f2aba62588 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,13 +1962,8 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual /* This function loopbacks all the received packets.*/ uint16_t @@ -2118,8 +2113,4 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif +__rte_diagnostic_pop diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..d8c8eba9b5 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,9 +11,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" 
-#endif +__rte_diagnostic_ignored_wcast_qual static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..429f37b8f7 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,7 +9,7 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_ignored_wcast_qual static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..c399bfd95d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,7 +10,7 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_ignored_wcast_qual void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..2a2635ce43 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,9 +11,7 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual #ifdef __AVX2__ static __rte_always_inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..e3d4ec0459 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,7 +15,7 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_ignored_wcast_qual static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 
19cf0ac718..49a9866ea9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,9 +15,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..c2148b65e4 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,9 +15,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual #define RTE_I40E_DESCS_PER_LOOP_AVX 8 diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..94eaf6109d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,9 +11,7 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..fc82189e84 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,8 +16,7 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_ignored_wcast_qual static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..89a07f74f5 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,9 +14,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC 
diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..d80b06c4a6 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,9 +6,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..876935d199 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,9 +6,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..fc886b0ab6 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,9 +11,7 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..88556e1bf3 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,9 +12,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) diff --git 
a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..7d2acc622f 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,9 +7,7 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual #ifdef __AVX2__ static __rte_always_inline void diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..1d53404af8 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,9 +7,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..e1f41312b5 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,9 +7,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual #define ICE_DESCS_PER_LOOP_AVX 8 diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..537d1d086a 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,9 +7,7 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..77ede76632 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,9 +6,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC 
diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..ba29901e67 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,9 +11,7 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..b39661b3e3 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,7 +8,7 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_ignored_wcast_qual void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..25e820bef8 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,7 +11,7 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_ignored_wcast_qual static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..5a5e8242ef 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,9 +12,7 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) 
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..8fa91d1269 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,10 +25,8 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif +__rte_diagnostic_ignored_wcast_qual +__rte_diagnostic_ignored_wstrict_aliasing /** * Store free buffers to RX SW ring. diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..d097bb443b 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,7 +25,7 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_ignored_wcast_qual /** * Store free buffers to RX SW ring. diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..f786a91032 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,9 +24,7 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual /** * Store free buffers to RX SW ring. 
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..4037c212c2 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,9 +23,7 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif +__rte_diagnostic_ignored_wcast_qual int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v2 0/3] add diagnostics macros to make code portable
  2024-12-27 1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
                  ` (2 preceding siblings ...)
  2024-12-27 1:33 ` [PATCH 3/3] drivers/net: " Andre Muezerie
@ 2024-12-28 0:45 ` Andre Muezerie
  2024-12-28 0:45   ` [PATCH v2 1/3] lib/eal: " Andre Muezerie
                    ` (2 more replies)
  2024-12-28 3:18 ` [PATCH v3 0/3] " Andre Muezerie
                  ` (13 subsequent siblings)
  17 siblings, 3 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-28 0:45 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the
   scope during which they are active, reducing the chance that
   unforeseen issues are hidden due to warning suppression.

Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++--
 drivers/net/axgbe/axgbe_rxtx.h | 9 ----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --
 drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-----
 drivers/net/fm10k/fm10k_rxtx_vec.c | 19 ++++++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h | 2 -
 .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h | 10 ++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c | 2 -
 drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 3 --
 drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++--
 drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++--
 drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 ++++++++++--
 drivers/net/ice/ice_rxtx_common_avx.h | 10 ++--
 drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_common.h | 4 --
 drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++--
 drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 ++++++--
 drivers/net/mlx5/mlx5_flow.c | 6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 -
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 4 --
 drivers/net/tap/tap_flow.c | 6 +--
 drivers/net/virtio/virtio_rxtx_simple.c | 4 --
 lib/eal/include/rte_common.h | 23 ++++++++++
 34 files changed, 269 insertions(+), 135 deletions(-)

--
2.47.0.vfs.0.3

^ permalink raw reply [flat|nested] 87+ messages in thread
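
The third v2 item (reducing the scope during which a suppression is
active) can be sketched as follows. This is only an illustration under
assumptions: the demo_* macros and types are hypothetical stand-ins, the
compiler detection uses __GNUC__ / __clang__ rather than the DPDK
toolchain defines, and the shape loosely mirrors the push / ignore / pop
sequence visible around a single assignment in the mlx5_flow.c hunk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the __rte_diagnostic_* macros. */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER)
#define demo_diagnostic_push _Pragma("GCC diagnostic push")
#define demo_diagnostic_pop _Pragma("GCC diagnostic pop")
#define demo_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define demo_diagnostic_push
#define demo_diagnostic_pop
#define demo_diagnostic_ignored_wcast_qual
#endif

/* Hypothetical types, only for illustration. */
struct demo_tunnel {
	int id;
};

struct demo_flow {
	const struct demo_tunnel *tunnel;
};

static inline struct demo_tunnel *
demo_tunnel_from_flow(const struct demo_flow *flow)
{
	struct demo_tunnel *tunnel;

	assert(flow != NULL);
	/* Only the one statement that drops const is covered. */
demo_diagnostic_push
demo_diagnostic_ignored_wcast_qual
	tunnel = (struct demo_tunnel *)flow->tunnel;
demo_diagnostic_pop
	/* From here on -Wcast-qual is back in whatever state it had before. */
	return tunnel;
}
```

Compared with a file-wide "ignored" at the top of a source file, any new
qualifier-dropping cast added elsewhere in the file would still be
reported, which is the benefit the v2 notes describe.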
* [PATCH v2 1/3] lib/eal: add diagnostics macros to make code portable
  2024-12-28  0:45 ` [PATCH v2 0/3] " Andre Muezerie
@ 2024-12-28  0:45   ` Andre Muezerie
  2024-12-28  0:45   ` [PATCH v2 2/3] drivers/common: " Andre Muezerie
  2024-12-28  0:45   ` [PATCH v2 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-28  0:45 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..13b7b92f46 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,29 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
--
2.47.0.vfs.0.3
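
For illustration only, here is a minimal standalone sketch of how the three macros from this patch are meant to be combined around a single qualifier-dropping cast. The macro definitions are copied from the rte_common.h hunk so the snippet builds outside of DPDK; struct demo_desc and demo_write_desc() are hypothetical names made up for this example and do not exist in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same definitions as in the rte_common.h hunk, repeated for a standalone build. */
#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
#define __rte_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
#else
#define __rte_diagnostic_ignored_wcast_qual
#define __rte_diagnostic_push
#define __rte_diagnostic_pop
#endif

/* Hypothetical descriptor layout, not taken from any driver. */
struct demo_desc {
	uint64_t addr;
	uint64_t len;
};

/*
 * Hypothetical helper: fill a descriptor that the caller holds through a
 * volatile pointer. Only the cast that drops "volatile" sits between
 * push and pop, so -Wcast-qual stays active for the rest of the file.
 */
static inline void
demo_write_desc(volatile struct demo_desc *d, uint64_t addr, uint64_t len)
{
	struct demo_desc *plain;

	assert(d != NULL);

__rte_diagnostic_push
__rte_diagnostic_ignored_wcast_qual
	plain = (struct demo_desc *)d;
__rte_diagnostic_pop

	plain->addr = addr;
	plain->len = len;
}
```

Built with gcc or clang and -Wcast-qual, the cast inside the push/pop pair should not warn, while an equivalent cast placed after __rte_diagnostic_pop would. On toolchains where the macros expand to nothing the code compiles unchanged and no diagnostics are altered.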
* [PATCH v2 2/3] drivers/common: add diagnostics macros to make code portable 2024-12-28 0:45 ` [PATCH v2 0/3] " Andre Muezerie 2024-12-28 0:45 ` [PATCH v2 1/3] lib/eal: " Andre Muezerie @ 2024-12-28 0:45 ` Andre Muezerie 2024-12-28 0:45 ` [PATCH v2 3/3] drivers/net: " Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-28 0:45 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++-- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..37cd0a43e2 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,8 +30,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -108,8 +107,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +166,11 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +221,13 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)rxdp, desc0_1); _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); +__rte_diagnostic_pop rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -336,6 +344,8 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -359,6 +369,7 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,8 +571,11 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -634,8 +648,11 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -797,6 +814,8 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -820,6 +839,7 @@ _idpf_splitq_recv_raw_pkts_avx512(struct 
idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1151,10 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1201,10 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1435,7 +1461,10 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1480,7 +1509,10 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1521,11 +1553,14 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); +__rte_diagnostic_pop nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1575,10 @@ 
idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); +__rte_diagnostic_pop tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v2 3/3] drivers/net: add diagnostics macros to make code portable 2024-12-28 0:45 ` [PATCH v2 0/3] " Andre Muezerie 2024-12-28 0:45 ` [PATCH v2 1/3] lib/eal: " Andre Muezerie 2024-12-28 0:45 ` [PATCH v2 2/3] drivers/common: " Andre Muezerie @ 2024-12-28 0:45 ` Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-28 0:45 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 ------ drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --- drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-------- drivers/net/fm10k/fm10k_rxtx_vec.c | 19 +++++++++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 2 -- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -- drivers/net/i40e/i40e_rxtx_common_avx.h | 10 ++++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 2 -- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 3 -- drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++++---- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++++++---- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++++--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 +++++++++++++++---- drivers/net/ice/ice_rxtx_common_avx.h | 10 ++++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_common.h | 4 --- drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --- .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 +++++++++--- drivers/net/mlx5/mlx5_flow.c | 6 ++-- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 ---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 -- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 4 --- drivers/net/tap/tap_flow.c | 6 ++-- drivers/net/virtio/virtio_rxtx_simple.c | 4 --- 32 files changed, 204 insertions(+), 131 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc 
warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..21c306fd94 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,10 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); +__rte_diagnostic_pop dq_storage++; num_rx++; @@ -2118,8 +2113,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c 
b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..3d534a91ac 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].q, dma_addr0); +__rte_diagnostic_pop } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +315,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); +__rte_diagnostic_pop /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +467,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +482,14 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) 
/* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +744,10 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..5297087085 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..97e3ab6845 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,8 +32,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual 
_mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..02d930a7f2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..90a1d4661a 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,6 +275,8 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -291,6 +292,7 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +697,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -728,8 +733,11 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..b77989074f 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = 
(void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -263,6 +262,8 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -286,6 +287,7 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +877,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -909,7 +914,10 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git 
a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..f056f40dee 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..8a5537bcf2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,8 +37,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +99,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); desc2_qw23 = 
_mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i v_unpack_01, v_unpack_23; @@ -462,7 +467,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +482,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +692,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..f5503d9dae 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,6 +189,8 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -208,6 +206,7 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +508,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -742,6 +741,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -765,6 +766,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -959,6 +961,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -990,6 +994,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i 
raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1669,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1719,8 +1727,11 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..04894debc8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -164,6 +160,8 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -187,6 +185,7 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +599,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; 
rte_prefetch0(rxdp); @@ -733,6 +732,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -756,6 +757,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1111,6 +1113,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -1142,6 +1146,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1988,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2045,10 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -2225,7 +2236,10 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, 
high_ctx_qw, low_ctx_qw); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); +__rte_diagnostic_pop } static __rte_always_inline void @@ -2300,7 +2314,10 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..690e0749e4 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,8 +418,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -458,8 +457,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..373df0c935 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ 
b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,8 +34,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -69,8 +68,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +580,10 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +595,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +791,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into 
the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +872,10 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +887,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -927,6 +941,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs_bh[3] = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); @@ -938,6 +954,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1366,10 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, 
descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..b9cb1f5279 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,8 +29,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -77,8 +76,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..6c6a810a15 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,6 +250,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); 
rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -269,6 +267,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,6 +443,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -475,6 +476,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +792,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -841,8 +846,11 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x (hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..5a774550dc 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ 
-7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -243,6 +239,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -266,6 +264,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -474,6 +473,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -505,6 +506,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +989,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1029,7 +1034,10 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff 
--git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..f2991cee1a 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,8 +48,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -91,8 +90,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +427,10 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop 
rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +442,14 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -489,6 +497,8 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh3 = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); @@ -504,6 +514,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop /** * to shift the 32b RSS hash value to the @@ -680,7 +691,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define 
IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..4e7a64b39f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..8e4048a32f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,8 +37,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -76,8 +75,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); 
+__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +468,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +483,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +684,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h 
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..e7deacc1fb 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..55c23dac80 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. 
* diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
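Several hunks in this patch do not add a push/pop pair at all; they change a cast such as `(union iavf_rx_flex_desc *)` to `(volatile union iavf_rx_flex_desc *)`. A minimal sketch of why keeping the qualifier is enough to avoid -Wcast-qual follows. The `demo_` types and the helper are hypothetical stand-ins, not the driver's real definitions, and the ring layout is an assumption made only for illustration.

```c
#include <stdint.h>

/* Hypothetical descriptor type standing in for union iavf_rx_flex_desc. */
union demo_rx_desc {
	uint64_t qword[4];
};

/* Hypothetical queue: the ring is written by hardware, hence volatile. */
struct demo_rx_queue {
	volatile union demo_rx_desc *rx_ring;
	uint16_t rx_tail;
};

/* Return a pointer to the descriptor at the current tail position. */
static inline volatile union demo_rx_desc *
demo_next_desc(struct demo_rx_queue *rxq)
{
	/*
	 * A cast to plain "union demo_rx_desc *" would drop volatile and
	 * be reported by -Wcast-qual. Keeping volatile in the cast leaves
	 * the qualifier intact, so no diagnostic suppression is needed.
	 */
	return (volatile union demo_rx_desc *)rxq->rx_ring + rxq->rx_tail;
}
```

Where a qualifier can be preserved like this, the warning disappears without any pragma; the push/ignored/pop macros are then only needed where an intrinsic requires an unqualified pointer.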
* [PATCH v3 0/3] add diagnostics macros to make code portable
  2024-12-27 1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
  ` (3 preceding siblings ...)
  2024-12-28 0:45 ` [PATCH v2 0/3] " Andre Muezerie
@ 2024-12-28 3:18 ` Andre Muezerie
  2024-12-28 3:18 ` [PATCH v3 1/3] lib/eal: " Andre Muezerie
  ` (2 more replies)
  2024-12-30 15:59 ` [PATCH v4 0/3] " Andre Muezerie
  ` (12 subsequent siblings)
  17 siblings, 3 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-28 3:18 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the
   scope during which they are active, reducing the chance that
   unforeseen issues are hidden due to warning suppression.

Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++--
 drivers/net/axgbe/axgbe_rxtx.h                |  9 ----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 --
 drivers/net/dpaa2/dpaa2_rxtx.c                | 16 ++-----
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 19 ++++++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  5 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 11 ++++-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c         |  9 ++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 22 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 21 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 27 +++++++++--
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 10 ++--
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 30 ++++++++++--
 drivers/net/ice/ice_rxtx_common_avx.h         | 10 ++--
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 --
 drivers/net/ice/ice_rxtx_vec_sse.c            | 22 +++++++--
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 --
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 19 ++++++--
 drivers/net/mlx5/mlx5_flow.c                  |  6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         |  2 -
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 26 ++++++++---
 drivers/net/tap/tap_flow.c                    |  6 +--
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 --
 lib/eal/include/rte_common.h                  | 23 ++++++++++
 34 files changed, 312 insertions(+), 138 deletions(-)

--
2.47.0.vfs.0.3

^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v3 1/3] lib/eal: add diagnostics macros to make code portable
  2024-12-28 3:18 ` [PATCH v3 0/3] " Andre Muezerie
@ 2024-12-28 3:18 ` Andre Muezerie
  2024-12-28 3:18 ` [PATCH v3 2/3] drivers/common: " Andre Muezerie
  2024-12-28 3:18 ` [PATCH v3 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-28 3:18 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..13b7b92f46 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,29 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
--
2.47.0.vfs.0.3

^ permalink raw reply [flat|nested] 87+ messages in thread
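For readers who want to try the pattern outside of DPDK, here is a rough, self-contained sketch of how the three macros from this patch are meant to be used around a single qualifier-dropping cast. The `demo_` names are stand-ins chosen so the sketch does not clash with rte_common.h, the compiler check based on `__GNUC__`/`__clang__` is an assumption of this example (the patch itself tests `__INTEL_COMPILER` and `RTE_TOOLCHAIN_MSVC`), and `demo_store_u32` is a hypothetical helper, not a DPDK function.

```c
#include <stdint.h>

/*
 * Stand-in macros mirroring the ones added to rte_common.h.
 * Assumption: GCC-style pragmas are emitted only when the compiler
 * identifies itself as GCC or Clang; otherwise they expand to nothing.
 */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER)
#define demo_diagnostic_push _Pragma("GCC diagnostic push")
#define demo_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#define demo_diagnostic_pop _Pragma("GCC diagnostic pop")
#else
#define demo_diagnostic_push
#define demo_diagnostic_ignored_wcast_qual
#define demo_diagnostic_pop
#endif

/*
 * Hypothetical helper: store a value through a pointer declared const.
 * The caller must pass the address of an object that is not itself const.
 */
static inline void
demo_store_u32(const uint32_t *slot, uint32_t value)
{
demo_diagnostic_push
demo_diagnostic_ignored_wcast_qual
	/* The cast drops "const"; -Wcast-qual would warn on this line. */
	*(uint32_t *)slot = value;
demo_diagnostic_pop
	/* From here on, -Wcast-qual is back to its previous state. */
}
```

Keeping the push and pop immediately around the offending statement matches the intent stated in the v2 changelog: the warning stays active for the rest of the file, so unrelated qualifier-dropping casts are still reported.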
* [PATCH v3 2/3] drivers/common: add diagnostics macros to make code portable 2024-12-28 3:18 ` [PATCH v3 0/3] " Andre Muezerie 2024-12-28 3:18 ` [PATCH v3 1/3] lib/eal: " Andre Muezerie @ 2024-12-28 3:18 ` Andre Muezerie 2024-12-28 3:18 ` [PATCH v3 3/3] drivers/net: " Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-28 3:18 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++-- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..37cd0a43e2 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,8 +30,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -108,8 +107,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +166,11 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +221,13 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)rxdp, desc0_1); _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); +__rte_diagnostic_pop rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -336,6 +344,8 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -359,6 +369,7 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,8 +571,11 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -634,8 +648,11 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -797,6 +814,8 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -820,6 +839,7 @@ _idpf_splitq_recv_raw_pkts_avx512(struct 
idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1151,10 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1201,10 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1435,7 +1461,10 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1480,7 +1509,10 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1521,11 +1553,14 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); +__rte_diagnostic_pop nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1575,10 @@ 
idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); +__rte_diagnostic_pop tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3
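For readers following the thread without patch 1/3 at hand, the push / ignore / pop pattern used throughout the diff above can be sketched as below. This is only an illustration under stated assumptions: the EX_DIAG_* macro names and the ex_write_desc() helper are hypothetical and are not the definitions added to rte_common.h. The GCC/Clang branch relies on the standard _Pragma operator; the MSVC branch assumes that no warning equivalent to -Wcast-qual needs to be disabled there, so the "ignore" macro expands to nothing.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the __rte_diagnostic_* macros.
 * The real definitions are in lib/eal/include/rte_common.h (patch 1/3)
 * and may differ from this sketch.
 */
#if defined(__GNUC__) || defined(__clang__)
/* Clang accepts the GCC spelling, so one branch covers both compilers. */
#define EX_DIAG_PUSH _Pragma("GCC diagnostic push")
#define EX_DIAG_IGNORE_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#define EX_DIAG_POP _Pragma("GCC diagnostic pop")
#elif defined(_MSC_VER)
/* Assumption: nothing to disable on MSVC, only keep push/pop balanced. */
#define EX_DIAG_PUSH __pragma(warning(push))
#define EX_DIAG_IGNORE_WCAST_QUAL
#define EX_DIAG_POP __pragma(warning(pop))
#else
#define EX_DIAG_PUSH
#define EX_DIAG_IGNORE_WCAST_QUAL
#define EX_DIAG_POP
#endif

/*
 * Hypothetical helper: store a value through a pointer whose volatile
 * qualifier is cast away, as the vector store intrinsics in the diff do.
 * The warning is suppressed only around the cast itself.
 */
static inline void
ex_write_desc(volatile uint64_t *desc, uint64_t value)
{
EX_DIAG_PUSH
EX_DIAG_IGNORE_WCAST_QUAL
	uint64_t *plain = (uint64_t *)desc; /* cast drops the qualifier */
EX_DIAG_POP
	*plain = value;
}
```

Scoping the suppression to the few statements that need it, rather than to a whole file as the removed file-level pragmas did, means a new qualifier-dropping cast elsewhere in the same file is still reported.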
* [PATCH v3 3/3] drivers/net: add diagnostics macros to make code portable 2024-12-28 3:18 ` [PATCH v3 0/3] " Andre Muezerie 2024-12-28 3:18 ` [PATCH v3 1/3] lib/eal: " Andre Muezerie 2024-12-28 3:18 ` [PATCH v3 2/3] drivers/common: " Andre Muezerie @ 2024-12-28 3:18 ` Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-28 3:18 UTC To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code, activated only for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that an effort is being made to make the code compatible with MSVC, these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier, as the macros are defined in a single place. As a bonus, the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 ------ drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --- drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-------- drivers/net/fm10k/fm10k_rxtx_vec.c | 19 +++++++++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 5 ++-- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -- drivers/net/i40e/i40e_rxtx_common_avx.h | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 11 +++++-- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 9 ++++-- drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++++---- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++++++---- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++++--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 +++++++++++++++---- drivers/net/ice/ice_rxtx_common_avx.h | 10 ++++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_common.h | 4 --- drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --- .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 +++++++++--- drivers/net/mlx5/mlx5_flow.c | 6 ++-- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 ---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 -- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +++++++++++----- drivers/net/tap/tap_flow.c | 6 ++-- drivers/net/virtio/virtio_rxtx_simple.c | 4 --- 32 files changed, 247 insertions(+), 134 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define 
_AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..21c306fd94 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,10 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); +__rte_diagnostic_pop dq_storage++; num_rx++; @@ -2118,8 +2113,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git 
a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..3d534a91ac 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].q, dma_addr0); +__rte_diagnostic_pop } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +315,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); +__rte_diagnostic_pop /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +467,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +482,14 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); 
+__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +744,10 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..99f080f3e8 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,11 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&desc->addr, val1); vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); +__rte_diagnostic_pop } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..bcabacf689 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - 
#ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,8 +32,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -144,8 +146,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__*/ @@ -190,8 +195,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..6771cc7928 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include 
<rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -286,7 +284,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = *(__vector unsigned long *)(rxdp + 3); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +297,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = *(__vector unsigned long *)(rxdp + 2); rte_compiler_barrier(); descs[1] = *(__vector unsigned long *)(rxdp + 1); rte_compiler_barrier(); descs[0] = *(__vector unsigned long *)(rxdp); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +538,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual *(__vector unsigned long *)txdp = descriptor; +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..90a1d4661a 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,6 +275,8 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -291,6 +292,7 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +697,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -728,8 +733,11 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..b77989074f 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> 
-#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -263,6 +262,8 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -286,6 +287,7 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +877,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -909,7 +914,10 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h 
b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..f35f3e1e20 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -421,6 +418,8 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -433,6 +432,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 +662,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index 
ad560d2b6b..8a5537bcf2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,8 +37,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +99,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i v_unpack_01, v_unpack_23; @@ -462,7 +467,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +482,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +692,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..f5503d9dae 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,6 +189,8 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -208,6 +206,7 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = 
_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +508,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -742,6 +741,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -765,6 +766,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -959,6 +961,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -990,6 +994,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1669,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1719,8 +1727,11 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + 
pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..04894debc8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -164,6 +160,8 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -187,6 +185,7 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +599,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -733,6 +732,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -756,6 +757,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop 
raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1111,6 +1113,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -1142,6 +1146,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1988,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2045,10 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -2225,7 +2236,10 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); +__rte_diagnostic_pop } static __rte_always_inline void @@ -2300,7 +2314,10 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop 
} if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..690e0749e4 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,8 +418,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -458,8 +457,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..373df0c935 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,8 +34,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i 
*)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -69,8 +68,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +580,10 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +595,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +791,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +872,10 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +887,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -927,6 +941,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs_bh[3] = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); @@ -938,6 +954,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1366,10 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..b9cb1f5279 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct 
ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,8 +29,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -77,8 +76,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..6c6a810a15 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,6 +250,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -269,6 +267,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,6 +443,8 @@ 
_ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -475,6 +476,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +792,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -841,8 +846,11 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x (hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..5a774550dc 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -243,6 +239,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 
7)); rte_compiler_barrier(); @@ -266,6 +264,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -474,6 +473,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -505,6 +506,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +989,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1029,7 +1034,10 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff 
--git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..f2991cee1a 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,8 +48,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -91,8 +90,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +427,10 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +442,14 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop 
#if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -489,6 +497,8 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh3 = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); @@ -504,6 +514,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop /** * to shift the 32b RSS hash value to the @@ -680,7 +691,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git 
a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..4e7a64b39f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..8e4048a32f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,8 +37,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -76,8 +75,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +468,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +483,14 @@ 
_recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +684,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. 
* diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..e7deacc1fb 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..f4d08d5b30 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -129,8 +126,11 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). 
*/ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); +__rte_diagnostic_pop for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -138,6 +138,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, __m128i byte_cnt, invalid_mask; #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) @@ -145,6 +147,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* A.1 load mCQEs into a 128bit register. */ mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); +__rte_diagnostic_pop /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -354,9 +357,12 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* Move to next CQE and invalidate consumed CQEs. */ if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (pos + 8 < mcqe_n) rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); +__rte_diagnostic_pop + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +377,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,6 +657,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. 
*/ p3 = _mm_extract_epi16(p, 3); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqes[3] = _mm_loadl_epi64((__m128i *) &cq[pos + p3].sop_drop_qpn); rte_compiler_barrier(); @@ -683,6 +691,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); +__rte_diagnostic_pop cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,6 +709,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. */ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); @@ -710,6 +721,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); +__rte_diagnostic_pop cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. 
*/ diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
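Not every cast in this patch is wrapped in a suppression. In the mlx5_rxtx_vec_sse.h hunks, `(void *)` becomes `(volatile void *)` where the destination pointer is itself volatile-qualified, so no qualifier is dropped and -Wcast-qual has nothing to report. The sketch below illustrates that idea only. `struct demo_desc`, `demo_first_byte` and `demo_peek` are hypothetical names invented for the example, not DPDK or mlx5 code.

```c
#include <stdint.h>

/* Hypothetical descriptor type, made up for this illustration. */
struct demo_desc {
	uint64_t addr;
	uint64_t flags;
};

/* Reads one byte through a pointer that still carries `volatile`. */
static uint8_t
demo_first_byte(volatile void *p)
{
	return *(volatile uint8_t *)p;
}

static uint8_t
demo_peek(volatile struct demo_desc *ring, unsigned int idx)
{
	/*
	 * Casting to `volatile void *` keeps the qualifier, so there is
	 * nothing to suppress. A cast to plain `void *` here would discard
	 * `volatile`, which is what -Wcast-qual reports. The explicit cast
	 * mirrors the patch; in C an implicit conversion would also work.
	 */
	return demo_first_byte((volatile void *)(ring + idx));
}
```

Where the callee really does require an unqualified pointer, as the x86 load and store intrinsics in this patch do, this option is not available and the scoped suppression is used instead.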
* [PATCH v4 0/3] add diagnostics macros to make code portable
  2024-12-27  1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
                   ` (4 preceding siblings ...)
  2024-12-28  3:18 ` [PATCH v3 0/3] " Andre Muezerie
@ 2024-12-30 15:59 ` Andre Muezerie
  2024-12-30 15:59   ` [PATCH v4 1/3] lib/eal: " Andre Muezerie
                     ` (3 more replies)
  2024-12-31 18:55 ` [PATCH v5 " Andre Muezerie
                   ` (11 subsequent siblings)
  17 siblings, 4 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-30 15:59 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v4:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the
   scope during which they are active, reducing the chance that
   unforeseen issues are hidden due to warning suppression.
Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++--
 drivers/net/axgbe/axgbe_rxtx.h                |  9 ----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 --
 drivers/net/dpaa2/dpaa2_rxtx.c                | 16 ++-----
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 19 ++++++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  5 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 17 ++++++-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 15 ++++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 22 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 21 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 27 +++++++++--
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 10 ++--
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 30 ++++++++++--
 drivers/net/ice/ice_rxtx_common_avx.h         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 --
 drivers/net/ice/ice_rxtx_vec_sse.c            | 22 +++++++--
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 --
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 19 ++++++--
 drivers/net/mlx5/mlx5_flow.c                  |  6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         |  2 -
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 26 ++++++++---
 drivers/net/tap/tap_flow.c                    |  6 +--
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 --
 lib/eal/include/rte_common.h                  | 23 ++++++++++
 34 files changed, 330 insertions(+), 138 deletions(-)

-- 
2.47.0.vfs.0.3
* [PATCH v4 1/3] lib/eal: add diagnostics macros to make code portable
  2024-12-30 15:59 ` [PATCH v4 0/3] " Andre Muezerie
@ 2024-12-30 15:59   ` Andre Muezerie
  2024-12-30 15:59   ` [PATCH v4 2/3] drivers/common: " Andre Muezerie
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-30 15:59 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..13b7b92f46 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,29 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
-- 
2.47.0.vfs.0.3
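The macros in this patch rely on the C99 `_Pragma` operator, which, unlike a `#pragma` directive, may appear in a macro expansion. The sketch below shows the same push / ignore / pop shape around a single cast. It is an illustration under assumptions, not the DPDK header: the `demo_*` names are hypothetical, and the compiler check uses `__GNUC__` / `__clang__` because `RTE_TOOLCHAIN_MSVC` is only defined inside a DPDK build.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical demo_* macros that follow the shape of the ones in the patch. */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER)
#define demo_diag_push _Pragma("GCC diagnostic push")
#define demo_diag_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#define demo_diag_pop _Pragma("GCC diagnostic pop")
#else
/* Other compilers: expand to nothing, as the patch does for MSVC and ICC. */
#define demo_diag_push
#define demo_diag_ignored_wcast_qual
#define demo_diag_pop
#endif

/* Hypothetical descriptor type, made up for this illustration. */
struct demo_desc {
	uint64_t lo;
	uint64_t hi;
};

/* Stand-in for an intrinsic that only accepts an unqualified pointer. */
static void
demo_store(void *dst, const struct demo_desc *src)
{
	memcpy(dst, src, sizeof(*src));
}

static void
demo_write_desc(volatile struct demo_desc *ring, unsigned int idx,
		uint64_t lo, uint64_t hi)
{
	const struct demo_desc d = { lo, hi };

	/* Only the one cast that drops `volatile` is covered. */
demo_diag_push
demo_diag_ignored_wcast_qual
	demo_store((void *)&ring[idx], &d);
demo_diag_pop
}
```

Built with GCC or Clang and `-Wcast-qual`, the cast inside the push/pop pair should compile quietly, while the same cast placed after `demo_diag_pop` would be reported again. Keeping the suppressed region this small is the scope reduction described in the v2 notes of the cover letter.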
* [PATCH v4 2/3] drivers/common: add diagnostics macros to make code portable
  2024-12-30 15:59 ` [PATCH v4 0/3] " Andre Muezerie
  2024-12-30 15:59   ` [PATCH v4 1/3] lib/eal: " Andre Muezerie
@ 2024-12-30 15:59   ` Andre Muezerie
  2024-12-30 15:59   ` [PATCH v4 3/3] drivers/net: " Andre Muezerie
  2024-12-30 17:44   ` [PATCH v4 0/3] " Stephen Hemminger
  3 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-30 15:59 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++-- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..37cd0a43e2 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,8 +30,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -108,8 +107,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +166,11 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +221,13 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)rxdp, desc0_1); _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); +__rte_diagnostic_pop rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -336,6 +344,8 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -359,6 +369,7 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,8 +571,11 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -634,8 +648,11 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -797,6 +814,8 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -820,6 +839,7 @@ _idpf_splitq_recv_raw_pkts_avx512(struct 
idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1151,10 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1201,10 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1435,7 +1461,10 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1480,7 +1509,10 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1521,11 +1553,14 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); +__rte_diagnostic_pop nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1575,10 @@ 
idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); +__rte_diagnostic_pop tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
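One detail of the hunks above is easy to miss: the pragmas do not open a C scope. Variables declared between the push and the pop (raw_desc7, raw_desc0 and so on) remain usable after the pop; only the warning state is restored. The sketch below illustrates this with hypothetical demo_* names and a made-up 16-byte descriptor type. None of it is DPDK code, and the compiler test is an assumption standing in for the patch's RTE_TOOLCHAIN_MSVC check.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK macros; assumptions for this sketch. */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER)
#define demo_diag_push _Pragma("GCC diagnostic push")
#define demo_diag_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#define demo_diag_pop _Pragma("GCC diagnostic pop")
#else
#define demo_diag_push
#define demo_diag_ignored_wcast_qual
#define demo_diag_pop
#endif

/* Made-up descriptor layout, used only to have something to cast. */
struct demo_desc {
	uint64_t lo;
	uint64_t hi;
};

/* Add the low words of the first two descriptors of a volatile ring. */
static inline uint64_t
demo_sum_two_descs(volatile struct demo_desc *ring)
{
demo_diag_push
demo_diag_ignored_wcast_qual
	/* One push/pop pair covers several qualifier-dropping casts. */
	const struct demo_desc *d1 = (const struct demo_desc *)(ring + 1);
	const struct demo_desc *d0 = (const struct demo_desc *)(ring + 0);
demo_diag_pop
	/* d0 and d1 are still in scope; only the diagnostic state changed. */
	return d0->lo + d1->lo;
}
```

Keeping the suppressed region as small as the casts themselves is the design choice the series appears to follow: any new qualifier-dropping cast added elsewhere in the file would still be flagged.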
* [PATCH v4 3/3] drivers/net: add diagnostics macros to make code portable
  2024-12-30 15:59 ` [PATCH v4 0/3] " Andre Muezerie
  2024-12-30 15:59   ` [PATCH v4 1/3] lib/eal: " Andre Muezerie
  2024-12-30 15:59   ` [PATCH v4 2/3] drivers/common: " Andre Muezerie
@ 2024-12-30 15:59   ` Andre Muezerie
  2024-12-30 17:44   ` [PATCH v4 0/3] " Stephen Hemminger
  3 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-30 15:59 UTC
To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and to activate these pragmas only for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that an effort is being made to make the code compatible with MSVC,
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier, as the
macros are defined in a single place. As a bonus, the code becomes
more readable.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 ------ drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --- drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-------- drivers/net/fm10k/fm10k_rxtx_vec.c | 19 +++++++++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 5 ++-- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -- drivers/net/i40e/i40e_rxtx_common_avx.h | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 17 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 15 ++++++++-- drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++++---- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++++++---- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++++--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 +++++++++++++++---- drivers/net/ice/ice_rxtx_common_avx.h | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_common.h | 4 --- drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --- .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 +++++++++--- drivers/net/mlx5/mlx5_flow.c | 6 ++-- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 ---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 -- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +++++++++++----- drivers/net/tap/tap_flow.c | 6 ++-- drivers/net/virtio/virtio_rxtx_simple.c | 4 --- 32 files changed, 265 insertions(+), 134 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define 
_AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..21c306fd94 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,10 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); +__rte_diagnostic_pop dq_storage++; num_rx++; @@ -2118,8 +2113,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git 
a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..3d534a91ac 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].q, dma_addr0); +__rte_diagnostic_pop } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +315,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); +__rte_diagnostic_pop /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +467,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +482,14 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); 
+__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +744,10 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..99f080f3e8 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,11 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&desc->addr, val1); vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); +__rte_diagnostic_pop } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..bcabacf689 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - 
#ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,8 +32,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -144,8 +146,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__*/ @@ -190,8 +195,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..c5546132e9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include 
<rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -43,8 +41,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = (__vector unsigned long){}; for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp[i].read); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +85,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +290,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = *(__vector unsigned long *)(rxdp + 3); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +303,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = *(__vector unsigned long *)(rxdp + 2); rte_compiler_barrier(); descs[1] = *(__vector unsigned long *)(rxdp + 1); rte_compiler_barrier(); descs[0] = *(__vector unsigned long *)(rxdp); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +544,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned 
long){ pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual *(__vector unsigned long *)txdp = descriptor; +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..90a1d4661a 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,6 +275,8 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -291,6 +292,7 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +697,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -728,8 +733,11 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..b77989074f 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -263,6 +262,8 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -286,6 +287,7 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); 
+__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +877,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -909,7 +914,10 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..9928ab7ba8 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -58,11 +55,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, 
dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +87,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,6 +424,8 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -433,6 +438,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 +668,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..8a5537bcf2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ 
-14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,8 +37,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +99,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i v_unpack_01, v_unpack_23; @@ -462,7 +467,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 
2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +482,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +692,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..f5503d9dae 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,6 +189,8 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -208,6 +206,7 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +508,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, 
rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -742,6 +741,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -765,6 +766,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -959,6 +961,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -990,6 +994,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1669,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1719,8 +1727,11 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, 
desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..04894debc8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -164,6 +160,8 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -187,6 +185,7 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +599,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -733,6 +732,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -756,6 +757,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1111,6 +1113,8 @@ 
_iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -1142,6 +1146,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1988,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2045,10 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -2225,7 +2236,10 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); +__rte_diagnostic_pop } static __rte_always_inline void @@ -2300,7 +2314,10 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 
5c5220048d..690e0749e4 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,8 +418,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -458,8 +457,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..373df0c935 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,8 +34,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -69,8 +68,11 
@@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +580,10 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +595,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +791,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +872,10 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 
64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +887,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -927,6 +941,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs_bh[3] = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); @@ -938,6 +954,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1366,10 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..b0d5232510 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,8 +29,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = 
_mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -77,8 +76,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -157,8 +159,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__ */ @@ -213,8 +218,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..6c6a810a15 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,6 +250,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, 
_mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -269,6 +267,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,6 +443,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -475,6 +476,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +792,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -841,8 +846,11 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x (hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git 
a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..5a774550dc 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -243,6 +239,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -266,6 +264,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -474,6 +473,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -505,6 +506,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +989,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1029,7 +1034,10 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, 
rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..f2991cee1a 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,8 +48,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -91,8 +90,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +427,10 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc 
statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +442,14 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -489,6 +497,8 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh3 = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); @@ -504,6 +514,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop /** * to shift the 32b RSS hash value to the @@ -680,7 +691,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include 
"idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..4e7a64b39f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..8e4048a32f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,8 +37,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -76,8 +75,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with 
pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +468,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +483,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +684,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return 
tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..e7deacc1fb 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..f4d08d5b30 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -129,8 +126,11 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). 
*/ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); +__rte_diagnostic_pop for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -138,6 +138,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, __m128i byte_cnt, invalid_mask; #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) @@ -145,6 +147,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* A.1 load mCQEs into a 128bit register. */ mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); +__rte_diagnostic_pop /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -354,9 +357,12 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* Move to next CQE and invalidate consumed CQEs. */ if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (pos + 8 < mcqe_n) rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); +__rte_diagnostic_pop + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +377,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,6 +657,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. 
*/ p3 = _mm_extract_epi16(p, 3); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqes[3] = _mm_loadl_epi64((__m128i *) &cq[pos + p3].sop_drop_qpn); rte_compiler_barrier(); @@ -683,6 +691,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); +__rte_diagnostic_pop cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,6 +709,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. */ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); @@ -710,6 +721,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); +__rte_diagnostic_pop cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. 
*/
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index c0e44bb1a7..373b773e2d 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -23,10 +23,10 @@
 
 #ifdef HAVE_BPF_RSS
 /* Workaround for warning in bpftool generated skeleton code */
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wcast-qual"
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 #include "tap_rss.skel.h"
-#pragma GCC diagnostic pop
+__rte_diagnostic_pop
 #endif
 
 #define ISOLATE_HANDLE 1
diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c
index 438256970d..439e00a7e1 100644
--- a/drivers/net/virtio/virtio_rxtx_simple.c
+++ b/drivers/net/virtio/virtio_rxtx_simple.c
@@ -23,10 +23,6 @@
 
 #include "virtio_rxtx_simple.h"
 
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 int __rte_cold
 virtio_rxq_vec_setup(struct virtnet_rx *rxq)
 {
--
2.47.0.vfs.0.3
* Re: [PATCH v4 0/3] add diagnostics macros to make code portable
From: Stephen Hemminger @ 2024-12-30 17:44 UTC
To: Andre Muezerie; +Cc: dev

On Mon, 30 Dec 2024 07:59:27 -0800
Andre Muezerie <andremue@linux.microsoft.com> wrote:

> From: Andre Muezerie <andremue@linux.microsoft.com>
> To: andremue@linux.microsoft.com
> Cc: dev@dpdk.org, stephen@networkplumber.org
> Subject: [PATCH v4 0/3] add diagnostics macros to make code portable
> Date: Mon, 30 Dec 2024 07:59:27 -0800
> X-Mailer: git-send-email 1.8.3.1
>
> It was a common pattern to have "GCC diagnostic ignored" pragmas
> sprinkled over the code and only activate these pragmas for certain
> compilers (gcc and clang). Clang supports GCC's pragma for
> compatibility with existing source code, so #pragma GCC diagnostic
> and #pragma clang diagnostic are synonyms for Clang
> (https://clang.llvm.org/docs/UsersManual.html).
>
> Now that effort is being made to make the code compatible with MSVC
> these expressions would become more complex. It makes sense to hide
> this complexity behind macros. This makes maintenance easier as these
> macros are defined in a single place. As a plus the code becomes
> more readable as well.

Does anyone still use the Intel C compiler with DPDK?
I really wonder if it is only maintained for legacy/sponsorship reasons.
* [PATCH v5 0/3] add diagnostics macros to make code portable
  2024-12-27  1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
                   ` (5 preceding siblings ...)
  2024-12-30 15:59 ` [PATCH v4 0/3] " Andre Muezerie
@ 2024-12-31 18:55 ` Andre Muezerie
  2024-12-31 18:55   ` [PATCH v5 1/3] lib/eal: " Andre Muezerie
                     ` (2 more replies)
  2024-12-31 20:15 ` [PATCH v6 0/3] " Andre Muezerie
                   ` (10 subsequent siblings)
  17 siblings, 3 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-31 18:55 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v5:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v4:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the
   scope during which they are active, reducing the chance that
   unforeseen issues are hidden due to warning suppression.

Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++--
 drivers/net/axgbe/axgbe_rxtx.h                |  9 ----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 --
 drivers/net/dpaa2/dpaa2_rxtx.c                | 16 ++-----
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 19 ++++++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  5 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 17 ++++++-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 18 ++++++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 22 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 21 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 27 +++++++++--
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 10 ++--
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 30 ++++++++++--
 drivers/net/ice/ice_rxtx_common_avx.h         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 --
 drivers/net/ice/ice_rxtx_vec_sse.c            | 22 +++++++--
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 --
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 19 ++++++--
 drivers/net/mlx5/mlx5_flow.c                  |  6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         |  2 -
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 26 ++++++++---
 drivers/net/tap/tap_flow.c                    |  6 +--
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 --
 lib/eal/include/rte_common.h                  | 23 ++++++++++
 34 files changed, 333 insertions(+), 138 deletions(-)

--
2.47.0.vfs.0.3

^ permalink raw reply	[flat|nested] 87+ messages in thread
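The v2 changelog entry about reducing the scope of the suppression can be illustrated with a minimal, self-contained sketch. The demo_* macro and function names are hypothetical stand-ins, not the macros from the series, and the compiler check uses __GNUC__/__clang__ because DPDK's RTE_TOOLCHAIN_* defines are not available outside its build system. Only the single cast that drops 'volatile' sits between push and pop, so any other qualifier-dropping cast in the same file would still be reported when -Wcast-qual is enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the macros this series adds to rte_common.h. */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER)
#define demo_diagnostic_push _Pragma("GCC diagnostic push")
#define demo_diagnostic_pop _Pragma("GCC diagnostic pop")
#define demo_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
/* Other compilers: the macros expand to nothing. */
#define demo_diagnostic_push
#define demo_diagnostic_pop
#define demo_diagnostic_ignored_wcast_qual
#endif

/*
 * Writes one 64-bit word through a volatile-qualified descriptor pointer.
 * The suppression covers only the cast that discards 'volatile'.
 */
static inline void
demo_write_desc(volatile uint64_t *desc, uint64_t value)
{
	uint64_t *raw;

	assert(desc != NULL);
demo_diagnostic_push
demo_diagnostic_ignored_wcast_qual
	raw = (uint64_t *)desc;	/* -Wcast-qual would fire here */
demo_diagnostic_pop
	*raw = value;
}
```

Compared with a file-wide "#pragma GCC diagnostic ignored", the push/pop pair restores the previous diagnostic state right after the cast, which is the property the changelog describes.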
* [PATCH v5 1/3] lib/eal: add diagnostics macros to make code portable
  2024-12-31 18:55 ` [PATCH v5 " Andre Muezerie
@ 2024-12-31 18:55   ` Andre Muezerie
  2024-12-31 18:55   ` [PATCH v5 2/3] drivers/common: " Andre Muezerie
  2024-12-31 18:55   ` [PATCH v5 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-31 18:55 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..13b7b92f46 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,29 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+		_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
--
2.47.0.vfs.0.3

^ permalink raw reply	[flat|nested] 87+ messages in thread
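In this version the MSVC and Intel compiler branches expand to nothing. As a rough sketch of why a single definition point helps, a compiler-specific branch could later be filled in without touching any call site. Everything here is an assumption for illustration: the demo_* names are hypothetical, _MSC_VER is used in place of RTE_TOOLCHAIN_MSVC, and the MSVC branch relies on the __pragma(warning(push)) and __pragma(warning(pop)) forms; no MSVC counterpart to -Wcast-qual is assumed, so that macro stays empty there.

```c
#include <assert.h>
#include <stddef.h>

#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER)
/* GCC and Clang: the same pragmas the patch uses. */
#define demo_diag_push _Pragma("GCC diagnostic push")
#define demo_diag_pop _Pragma("GCC diagnostic pop")
#define demo_diag_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#elif defined(_MSC_VER)
/* Assumption: MSVC saves and restores warning state with __pragma. */
#define demo_diag_push __pragma(warning(push))
#define demo_diag_pop __pragma(warning(pop))
/* No equivalent warning is assumed for MSVC, so nothing is disabled. */
#define demo_diag_ignored_wcast_qual
#else
/* Any other compiler: all three macros are no-ops. */
#define demo_diag_push
#define demo_diag_pop
#define demo_diag_ignored_wcast_qual
#endif

/* Returns the same address with 'const' removed; call sites stay identical
 * regardless of which branch above was selected. */
static inline char *
demo_strip_const(const char *s)
{
	char *p;

	assert(s != NULL);
demo_diag_push
demo_diag_ignored_wcast_qual
	p = (char *)s;
demo_diag_pop
	return p;
}
```

The point of the sketch is only the layering: callers use three macro names, and the per-compiler details live in one #if ladder.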
* [PATCH v5 2/3] drivers/common: add diagnostics macros to make code portable 2024-12-31 18:55 ` [PATCH v5 " Andre Muezerie 2024-12-31 18:55 ` [PATCH v5 1/3] lib/eal: " Andre Muezerie @ 2024-12-31 18:55 ` Andre Muezerie 2024-12-31 18:55 ` [PATCH v5 3/3] drivers/net: " Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-31 18:55 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++-- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..37cd0a43e2 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,8 +30,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -108,8 +107,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +166,11 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +221,13 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)rxdp, desc0_1); _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); +__rte_diagnostic_pop rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -336,6 +344,8 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -359,6 +369,7 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,8 +571,11 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -634,8 +648,11 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -797,6 +814,8 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -820,6 +839,7 @@ _idpf_splitq_recv_raw_pkts_avx512(struct 
idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1151,10 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1201,10 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1435,7 +1461,10 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1480,7 +1509,10 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1521,11 +1553,14 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); +__rte_diagnostic_pop nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1575,10 @@ 
idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); +__rte_diagnostic_pop tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
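The patch above repeats the push/ignored/pop triple around every store or load that casts a volatile descriptor pointer. One possible variation, which is not what the patch does, is to strip the qualifier in a single small helper so that the suppression appears once. The sketch below uses only hypothetical names (demo_desc, demo_desc_raw, demo_store_desc) and a plain memcpy instead of vector intrinsics so that it builds on any architecture.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins for the macros added in patch 1/3. */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER)
#define demo_diag_push _Pragma("GCC diagnostic push")
#define demo_diag_pop _Pragma("GCC diagnostic pop")
#define demo_diag_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define demo_diag_push
#define demo_diag_pop
#define demo_diag_ignored_wcast_qual
#endif

/* Hypothetical 16-byte descriptor; not a DPDK type. */
struct demo_desc {
	uint64_t addr;
	uint64_t flags;
};

/* Drops 'volatile' in one place; this is the only suppressed cast. */
static inline void *
demo_desc_raw(volatile struct demo_desc *d)
{
	void *raw;

demo_diag_push
demo_diag_ignored_wcast_qual
	raw = (void *)d;
demo_diag_pop
	return raw;
}

/* Stores a whole descriptor in one copy, standing in for a vector store. */
static inline void
demo_store_desc(volatile struct demo_desc *d, uint64_t addr, uint64_t flags)
{
	const struct demo_desc v = { addr, flags };

	assert(d != NULL);
	memcpy(demo_desc_raw(d), &v, sizeof(v));
}
```

The trade-off is that a helper hides the cast from readers of the call site, whereas the inline triple used in the patch keeps each suppressed cast visible where it happens.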
* [PATCH v5 3/3] drivers/net: add diagnostics macros to make code portable 2024-12-31 18:55 ` [PATCH v5 " Andre Muezerie 2024-12-31 18:55 ` [PATCH v5 1/3] lib/eal: " Andre Muezerie 2024-12-31 18:55 ` [PATCH v5 2/3] drivers/common: " Andre Muezerie @ 2024-12-31 18:55 ` Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-31 18:55 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 ------ drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --- drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-------- drivers/net/fm10k/fm10k_rxtx_vec.c | 19 +++++++++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 5 ++-- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -- drivers/net/i40e/i40e_rxtx_common_avx.h | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 17 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 18 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++++---- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++++++---- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++++--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 +++++++++++++++---- drivers/net/ice/ice_rxtx_common_avx.h | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_common.h | 4 --- drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --- .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 +++++++++--- drivers/net/mlx5/mlx5_flow.c | 6 ++-- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 ---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 -- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +++++++++++----- drivers/net/tap/tap_flow.c | 6 ++-- drivers/net/virtio/virtio_rxtx_simple.c | 4 --- 32 files changed, 268 insertions(+), 134 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ 
#define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..21c306fd94 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,10 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); +__rte_diagnostic_pop dq_storage++; num_rx++; @@ -2118,8 +2113,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff 
--git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..3d534a91ac 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].q, dma_addr0); +__rte_diagnostic_pop } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +315,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); +__rte_diagnostic_pop /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +467,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +482,14 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); 
+__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +744,10 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..99f080f3e8 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,11 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&desc->addr, val1); vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); +__rte_diagnostic_pop } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..bcabacf689 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - 
#ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,8 +32,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -144,8 +146,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__*/ @@ -190,8 +195,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..c5546132e9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include 
<rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -43,8 +41,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = (__vector unsigned long){}; for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp[i].read); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +85,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +290,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = *(__vector unsigned long *)(rxdp + 3); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +303,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = *(__vector unsigned long *)(rxdp + 2); rte_compiler_barrier(); descs[1] = *(__vector unsigned long *)(rxdp + 1); rte_compiler_barrier(); descs[0] = *(__vector unsigned long *)(rxdp); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +544,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned 
long){ pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual *(__vector unsigned long *)txdp = descriptor; +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..90a1d4661a 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,6 +275,8 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -291,6 +292,7 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +697,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -728,8 +733,11 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..b77989074f 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -263,6 +262,8 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -286,6 +287,7 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); 
+__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +877,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -909,7 +914,10 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..cfc6d9ead3 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +38,10 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ 
-58,11 +58,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +90,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,6 +427,8 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -433,6 +441,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 +671,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..8a5537bcf2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,8 +37,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +99,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i v_unpack_01, v_unpack_23; @@ -462,7 +467,10 @@ 
_recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +482,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +692,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..f5503d9dae 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,6 +189,8 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -208,6 +206,7 @@ _iavf_recv_raw_pkts_vec_avx2(struct 
iavf_rx_queue *rxq, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +508,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -742,6 +741,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -765,6 +766,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -959,6 +961,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -990,6 +994,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1669,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual 
_mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1719,8 +1727,11 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..04894debc8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -164,6 +160,8 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -187,6 +185,7 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +599,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -733,6 +732,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void 
*)(rxdp + 7)); rte_compiler_barrier(); @@ -756,6 +757,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1111,6 +1113,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -1142,6 +1146,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1988,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2045,10 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -2225,7 +2236,10 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); +__rte_diagnostic_pop } static __rte_always_inline void @@ -2300,7 +2314,10 @@ ctx_vtx(volatile struct iavf_tx_desc 
*txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..690e0749e4 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,8 +418,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -458,8 +457,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..373df0c935 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,8 +34,11 @@ 
iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -69,8 +68,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +580,10 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +595,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +791,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 
+872,10 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +887,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -927,6 +941,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs_bh[3] = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); @@ -938,6 +954,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1366,10 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..b0d5232510 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ 
b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,8 +29,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -77,8 +76,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -157,8 +159,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__ */ @@ -213,8 +218,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..6c6a810a15 
100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,6 +250,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -269,6 +267,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,6 +443,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -475,6 +476,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +792,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -841,8 
+846,11 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x (hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..5a774550dc 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -243,6 +239,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -266,6 +264,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -474,6 +473,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -505,6 +506,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +989,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = 
_mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1029,7 +1034,10 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..f2991cee1a 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,8 +48,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -91,8 +90,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +427,10 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +442,14 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -489,6 +497,8 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh3 = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); @@ -504,6 +514,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop /** * to shift the 32b RSS hash value to the @@ -680,7 +691,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..4e7a64b39f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..8e4048a32f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,8 +37,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < 
RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -76,8 +75,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +468,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +483,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +684,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- 
a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..e7deacc1fb 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..f4d08d5b30 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. 
* @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -129,8 +126,11 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). */ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); +__rte_diagnostic_pop for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -138,6 +138,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, __m128i byte_cnt, invalid_mask; #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) @@ -145,6 +147,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* A.1 load mCQEs into a 128bit register. */ mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); +__rte_diagnostic_pop /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -354,9 +357,12 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* Move to next CQE and invalidate consumed CQEs. 
*/ if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (pos + 8 < mcqe_n) rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); +__rte_diagnostic_pop + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +377,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,6 +657,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. */ p3 = _mm_extract_epi16(p, 3); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqes[3] = _mm_loadl_epi64((__m128i *) &cq[pos + p3].sop_drop_qpn); rte_compiler_barrier(); @@ -683,6 +691,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); +__rte_diagnostic_pop cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,6 +709,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. */ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. 
*/ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); @@ -710,6 +721,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); +__rte_diagnostic_pop cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v6 0/3] add diagnostics macros to make code portable
  2024-12-27  1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
                   ` (6 preceding siblings ...)
  2024-12-31 18:55 ` [PATCH v5 " Andre Muezerie
@ 2024-12-31 20:15 ` Andre Muezerie
  2024-12-31 20:15   ` [PATCH v6 1/3] lib/eal: " Andre Muezerie
                     ` (2 more replies)
  2024-12-31 22:30 ` [PATCH v7 0/3] " Andre Muezerie
                   ` (9 subsequent siblings)
  17 siblings, 3 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-31 20:15 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v6:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v5:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v4:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the
   scope during which they are active, reducing the chance that
   unforeseen issues are hidden due to warning suppression.

Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++--
 drivers/net/axgbe/axgbe_rxtx.h                |  9 ----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 --
 drivers/net/dpaa2/dpaa2_rxtx.c                | 16 ++-----
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 19 ++++++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  5 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 17 ++++++-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 18 ++++++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 22 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 21 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 27 +++++++++--
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 10 ++--
 drivers/net/iavf/iavf_rxtx_vec_neon.c         |  3 ++
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 30 ++++++++++--
 drivers/net/ice/ice_rxtx_common_avx.h         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 --
 drivers/net/ice/ice_rxtx_vec_sse.c            | 22 +++++++--
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 --
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 19 ++++++--
 drivers/net/mlx5/mlx5_flow.c                  |  6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         |  2 -
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 26 ++++++++---
 drivers/net/tap/tap_flow.c                    |  6 +--
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 --
 lib/eal/include/rte_common.h                  | 23 ++++++++++
 35 files changed, 336 insertions(+), 138 deletions(-)

-- 
2.47.0.vfs.0.3
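The v2 note about reducing the scope of the suppression can be illustrated with a small sketch. It is not taken from the series: `demo_tunnel`, `demo_flow` and `demo_tunnel_from_flow` are hypothetical names, loosely modelled on the flow_tunnel_from_rule() hunk in mlx5_flow.c. It uses the raw pragmas rather than the new macros so that it builds without DPDK headers, and it assumes GCC or Clang.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types, for illustration only (not DPDK structures). */
struct demo_tunnel {
	int id;
};

struct demo_flow {
	const struct demo_tunnel *tunnel;
};

/*
 * Return a writable pointer to the tunnel referenced by a read-only flow.
 * -Wcast-qual is silenced only around the single cast that drops "const";
 * the pop restores the previous diagnostic state, so the rest of the file
 * is still checked.
 */
static struct demo_tunnel *
demo_tunnel_from_flow(const struct demo_flow *flow)
{
	struct demo_tunnel *tunnel;

	assert(flow != NULL);
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-qual"
	tunnel = (struct demo_tunnel *)flow->tunnel;
#pragma GCC diagnostic pop
	return tunnel;
}
```

A file-wide `#pragma GCC diagnostic ignored "-Wcast-qual"` with no push/pop would also hide any other qualifier-dropping cast added to the same file later, which is the risk the v2 change is meant to reduce.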
* [PATCH v6 1/3] lib/eal: add diagnostics macros to make code portable
  2024-12-31 20:15 ` [PATCH v6 0/3] " Andre Muezerie
@ 2024-12-31 20:15   ` Andre Muezerie
  2024-12-31 20:15   ` [PATCH v6 2/3] drivers/common: " Andre Muezerie
  2024-12-31 20:15   ` [PATCH v6 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-31 20:15 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..13b7b92f46 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,29 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+		_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
-- 
2.47.0.vfs.0.3
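For readers unfamiliar with the `_Pragma` operator used above: a `#pragma` directive cannot be produced by macro expansion, so wrapping a pragma in a macro requires the C99 `_Pragma("...")` form. The following is a minimal standalone sketch of the same idea, not the patch itself. The `demo_` macro names, `demo_desc` and `demo_read_addr` are hypothetical, and the compiler check uses `__GNUC__` as a stand-in for DPDK's build-time `RTE_TOOLCHAIN_MSVC` define, which is an assumption made only so the sketch compiles outside the DPDK build system.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the rte_common.h macros. GCC and Clang both
 * define __GNUC__; other compilers get empty expansions, as in the patch.
 */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#define demo_diagnostic_push _Pragma("GCC diagnostic push")
#define demo_diagnostic_pop _Pragma("GCC diagnostic pop")
#define demo_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define demo_diagnostic_push
#define demo_diagnostic_pop
#define demo_diagnostic_ignored_wcast_qual
#endif

/* Hypothetical descriptor layout, for illustration only. */
struct demo_desc {
	uint64_t addr;
	uint64_t len;
};

/*
 * Read the address field through a pointer that drops "volatile",
 * mirroring the descriptor casts that the driver patches wrap.
 */
static inline uint64_t
demo_read_addr(volatile struct demo_desc *d)
{
	uint64_t *p;

	assert(d != NULL);
demo_diagnostic_push
demo_diagnostic_ignored_wcast_qual
	p = (uint64_t *)&d->addr;
demo_diagnostic_pop
	return *p;
}
```

The macros are written without a trailing semicolon at the call site, matching the way the series places `__rte_diagnostic_push` and `__rte_diagnostic_pop` on lines of their own.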
* [PATCH v6 2/3] drivers/common: add diagnostics macros to make code portable 2024-12-31 20:15 ` [PATCH v6 0/3] " Andre Muezerie 2024-12-31 20:15 ` [PATCH v6 1/3] lib/eal: " Andre Muezerie @ 2024-12-31 20:15 ` Andre Muezerie 2024-12-31 20:15 ` [PATCH v6 3/3] drivers/net: " Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-31 20:15 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++-- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..37cd0a43e2 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,8 +30,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -108,8 +107,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +166,11 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +221,13 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)rxdp, desc0_1); _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); +__rte_diagnostic_pop rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -336,6 +344,8 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -359,6 +369,7 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,8 +571,11 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -634,8 +648,11 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -797,6 +814,8 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -820,6 +839,7 @@ _idpf_splitq_recv_raw_pkts_avx512(struct 
idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1151,10 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1201,10 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1435,7 +1461,10 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1480,7 +1509,10 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1521,11 +1553,14 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); +__rte_diagnostic_pop nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1575,10 @@ 
idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); +__rte_diagnostic_pop tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v6 3/3] drivers/net: add diagnostics macros to make code portable 2024-12-31 20:15 ` [PATCH v6 0/3] " Andre Muezerie 2024-12-31 20:15 ` [PATCH v6 1/3] lib/eal: " Andre Muezerie 2024-12-31 20:15 ` [PATCH v6 2/3] drivers/common: " Andre Muezerie @ 2024-12-31 20:15 ` Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-31 20:15 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 ------ drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --- drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-------- drivers/net/fm10k/fm10k_rxtx_vec.c | 19 +++++++++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 5 ++-- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -- drivers/net/i40e/i40e_rxtx_common_avx.h | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 17 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 18 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++++---- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++++++---- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++++--- drivers/net/iavf/iavf_rxtx_vec_neon.c | 3 ++ drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 +++++++++++++++---- drivers/net/ice/ice_rxtx_common_avx.h | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_common.h | 4 --- drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --- .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 +++++++++--- drivers/net/mlx5/mlx5_flow.c | 6 ++-- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 ---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 -- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +++++++++++----- drivers/net/tap/tap_flow.c | 6 ++-- drivers/net/virtio/virtio_rxtx_simple.c | 4 --- 33 files changed, 271 insertions(+), 134 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ b/drivers/net/axgbe/axgbe_rxtx.h 
@@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..21c306fd94 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,10 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); +__rte_diagnostic_pop dq_storage++; num_rx++; @@ -2118,8 +2113,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif defined(RTE_TOOLCHAIN_CLANG) 
-#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..3d534a91ac 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].q, dma_addr0); +__rte_diagnostic_pop } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +315,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); +__rte_diagnostic_pop /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +467,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +482,14 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs0[0] = 
_mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +744,10 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..99f080f3e8 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,11 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&desc->addr, val1); vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); +__rte_diagnostic_pop } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..bcabacf689 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC 
diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,8 +32,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -144,8 +146,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__*/ @@ -190,8 +195,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..c5546132e9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ 
b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -43,8 +41,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = (__vector unsigned long){}; for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp[i].read); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +85,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +290,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = *(__vector unsigned long *)(rxdp + 3); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +303,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = *(__vector unsigned long *)(rxdp + 2); rte_compiler_barrier(); descs[1] = *(__vector unsigned long *)(rxdp + 1); rte_compiler_barrier(); descs[0] = *(__vector unsigned long *)(rxdp); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +544,10 @@ vtx1(volatile struct 
i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual *(__vector unsigned long *)txdp = descriptor; +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..90a1d4661a 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,6 +275,8 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -291,6 +292,7 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +697,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = 
_mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	_mm_store_si128((__m128i *)txdp, descriptor);
+__rte_diagnostic_pop
 }

 static inline void
@@ -728,8 +733,11 @@ vtx(volatile struct i40e_tx_desc *txdp,
 		__m256i desc0_1 = _mm256_set_epi64x(
 				hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off,
 				hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm256_store_si256((void *)(txdp + 2), desc2_3);
 		_mm256_store_si256((void *)txdp, desc0_1);
+__rte_diagnostic_pop
 	}

 	/* do any last ones */
diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
index 3b2750221b..b77989074f 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
@@ -15,10 +15,6 @@

 #include <rte_vect.h>

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define RTE_I40E_DESCS_PER_LOOP_AVX 8

 static __rte_always_inline void
@@ -41,8 +37,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp,
 		const uint32_t desc_idx)
 {
 	/* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	__m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2);
 	__m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2);
+__rte_diagnostic_pop

 	const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0);
 	const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1);
@@ -263,6 +262,8 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;

 		/* load in descriptors, in reverse order */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		const __m128i raw_desc7 =
 			_mm_load_si128((void *)(rxdp + 7));
 		rte_compiler_barrier();
@@ -286,6 +287,7 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		rte_compiler_barrier();
 		const __m128i
raw_desc0 =
 			_mm_load_si128((void *)(rxdp + 0));
+__rte_diagnostic_pop

 		raw_desc6_7 =
 			_mm256_inserti128_si256
@@ -875,7 +877,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)

 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	_mm_store_si128((__m128i *)txdp, descriptor);
+__rte_diagnostic_pop
 }

 static inline void
@@ -909,7 +914,10 @@ vtx(volatile struct i40e_tx_desc *txdp,
 				hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off,
 				hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off,
 				hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm512_storeu_si512((void *)txdp, desc0_3);
+__rte_diagnostic_pop
 	}

 	/* do any last ones */
diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 8b745630e4..ec59a68f9d 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -11,10 +11,6 @@
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline uint16_t
 reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 		   uint16_t nb_bufs, uint8_t *split_flags)
diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index e1c5c7041b..cfc6d9ead3 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -16,9 +16,6 @@
 #include "i40e_rxtx.h"
 #include "i40e_rxtx_vec_common.h"

-
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 static inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -41,7 +38,10 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 		    rxq->nb_rx_desc) {
 			for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 				vst1q_u64((uint64_t *)&rxdp[i].read, zero);
+__rte_diagnostic_pop
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -58,11 +58,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 		dma_addr0 = vdupq_n_u64(paddr);

 		/* flush desc with pa dma_addr */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);

 		paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM;
 		dma_addr1 = vdupq_n_u64(paddr);
 		vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1);
+__rte_diagnostic_pop
 	}

 	rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
@@ -87,10 +90,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt)
 {
 	/* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */
 	uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23;
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2);
 	desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2);
 	desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2);
 	desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2);
+__rte_diagnostic_pop

 	/* FDIR ID data: move last u32 of each desc to 4 u32 lanes */
 	uint32x4_t v_unpack_02, v_unpack_13;
@@ -421,6 +427,8 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
 		int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT};

 		/* A.1 load desc[3-0] */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		descs[3] = vld1q_u64((uint64_t *)(rxdp + 3));
 		descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
 		descs[1] = vld1q_u64((uint64_t *)(rxdp + 1));
@@ -433,6 +441,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
 		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
 		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
 		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+__rte_diagnostic_pop

 		/* B.1 load 4 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
@@ -662,7 +671,10 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));

 	uint64x2_t
descriptor = {pkt->buf_iova + pkt->data_off, high_qw};
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	vst1q_u64((uint64_t *)txdp, descriptor);
+__rte_diagnostic_pop
 }

 static inline void
diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c
index ad560d2b6b..8a5537bcf2 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c
@@ -14,10 +14,6 @@

 #include <rte_vect.h>

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -41,8 +37,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 				_mm_store_si128((__m128i *)&rxdp[i].read,
 						dma_addr0);
+__rte_diagnostic_pop
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -72,8 +71,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);

 		/* flush desc with pa dma_addr */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
 		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+__rte_diagnostic_pop
 	}

 	rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
@@ -97,10 +99,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt)
 {
 	/* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */
 	__m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23;
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2);
 	desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2);
 	desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2);
 	desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2);
+__rte_diagnostic_pop

 	/* FDIR ID data: move last u32 of each desc to 4 u32 lanes */
 	__m128i
v_unpack_01, v_unpack_23;
@@ -462,7 +467,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+__rte_diagnostic_pop
 		rte_compiler_barrier();

 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -474,11 +482,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif

 		/* A.1 load desc[2-0] */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
 		rte_compiler_barrier();
 		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
 		rte_compiler_barrier();
 		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+__rte_diagnostic_pop

 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts */
@@ -681,7 +692,10 @@ vtx1(volatile struct i40e_tx_desc *txdp,

 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	_mm_store_si128((__m128i *)txdp, descriptor);
+__rte_diagnostic_pop
 }

 static inline void
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
index 49d41af953..f5503d9dae 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
@@ -6,10 +6,6 @@

 #include <rte_vect.h>

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline void
 iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 {
@@ -193,6 +189,8 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq,
 				_mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif

+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
@@ -208,6 +206,7
@@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq,
 		const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
 		rte_compiler_barrier();
 		const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+__rte_diagnostic_pop

 		const __m256i raw_desc6_7 =
 			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
@@ -509,7 +508,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 			0, rxq->mbuf_initializer);
 	struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail];
 	volatile union iavf_rx_flex_desc *rxdp =
-		(union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+		(volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;

 	rte_prefetch0(rxdp);

@@ -742,6 +741,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,

 		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;

+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		const __m128i raw_desc7 =
 			_mm_load_si128((void *)(rxdp + 7));
 		rte_compiler_barrier();
@@ -765,6 +766,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
 			_mm_load_si128((void *)(rxdp + 0));
+__rte_diagnostic_pop

 		raw_desc6_7 =
 			_mm256_inserti128_si256
@@ -959,6 +961,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 		    offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP ||
 		    rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
 			/* load bottom half of every 32B desc */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 			const __m128i raw_desc_bh7 =
 				_mm_load_si128
 					((void *)(&rxdp[7].wb.status_error1));
@@ -990,6 +994,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 			const __m128i raw_desc_bh0 =
 				_mm_load_si128
 					((void *)(&rxdp[0].wb.status_error1));
+__rte_diagnostic_pop

 			__m256i raw_desc_bh6_7 =
 				_mm256_inserti128_si256
@@ -1664,7 +1669,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,

 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	_mm_store_si128((__m128i *)txdp, descriptor);
+__rte_diagnostic_pop
 }

 static __rte_always_inline void
@@ -1719,8 +1727,11 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp,
 				 pkt[1]->buf_iova + pkt[1]->data_off,
 				 hi_qw0,
 				 pkt[0]->buf_iova + pkt[0]->data_off);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm256_store_si256((void *)(txdp + 2), desc2_3);
 		_mm256_store_si256((void *)txdp, desc0_1);
+__rte_diagnostic_pop
 	}

 	/* do any last ones */
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
index d6a861bf80..04894debc8 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
@@ -6,10 +6,6 @@

 #include <rte_vect.h>

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define IAVF_DESCS_PER_LOOP_AVX 8
 #define PKTLEN_SHIFT 10

@@ -164,6 +160,8 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq,
 #endif
 		__m512i raw_desc0_3, raw_desc4_7;

+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		const __m128i raw_desc7 =
 			_mm_load_si128((void *)(rxdp + 7));
 		rte_compiler_barrier();
@@ -187,6 +185,7 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq,
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
 			_mm_load_si128((void *)(rxdp + 0));
+__rte_diagnostic_pop

 		raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4);
 		raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1);
@@ -600,7 +599,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 			rxq->mbuf_initializer);
 	struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail];
 	volatile union iavf_rx_flex_desc *rxdp =
-		(union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+		(volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;

 	rte_prefetch0(rxdp);

@@ -733,6 +732,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,

 		__m512i raw_desc0_3, raw_desc4_7;

+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		const
__m128i raw_desc7 =
 			_mm_load_si128((void *)(rxdp + 7));
 		rte_compiler_barrier();
@@ -756,6 +757,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
 			_mm_load_si128((void *)(rxdp + 0));
+__rte_diagnostic_pop

 		raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4);
 		raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1);
@@ -1111,6 +1113,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 		    offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP ||
 		    rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
 			/* load bottom half of every 32B desc */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 			const __m128i raw_desc_bh7 =
 				_mm_load_si128
 					((void *)(&rxdp[7].wb.status_error1));
@@ -1142,6 +1146,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 			const __m128i raw_desc_bh0 =
 				_mm_load_si128
 					((void *)(&rxdp[0].wb.status_error1));
+__rte_diagnostic_pop

 			__m256i raw_desc_bh6_7 =
 				_mm256_inserti128_si256
@@ -1983,7 +1988,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,

 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	_mm_storeu_si128((__m128i *)txdp, descriptor);
+__rte_diagnostic_pop
 }

 #define IAVF_TX_LEN_MASK 0xAA
@@ -2037,7 +2045,10 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp,
 				 pkt[1]->buf_iova + pkt[1]->data_off,
 				 hi_qw0,
 				 pkt[0]->buf_iova + pkt[0]->data_off);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm512_storeu_si512((void *)txdp, desc0_3);
+__rte_diagnostic_pop
 	}

 	/* do any last ones */
@@ -2225,7 +2236,10 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,

 	__m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off,
 							high_ctx_qw, low_ctx_qw);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	_mm256_storeu_si256((__m256i *)txdp, ctx_data_desc);
+__rte_diagnostic_pop
 }

 static __rte_always_inline void
@@ -2300,7 +2314,10
@@ ctx_vtx(volatile struct iavf_tx_desc *txdp,
 				hi_ctx_qw1, low_ctx_qw1,
 				hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off,
 				hi_ctx_qw0, low_ctx_qw0);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm512_storeu_si512((void *)txdp, desc0_3);
+__rte_diagnostic_pop
 	}

 	if (nb_pkts)
diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h
index 5c5220048d..690e0749e4 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/iavf/iavf_rxtx_vec_common.h
@@ -11,10 +11,6 @@
 #include "iavf.h"
 #include "iavf_rxtx.h"

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline uint16_t
 reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 		   uint16_t nb_bufs, uint8_t *split_flags)
@@ -422,8 +418,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
 				rxp[i] = &rxq->fake_mbuf;
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 				_mm_store_si128((__m128i *)&rxdp[i].read,
 						dma_addr0);
+__rte_diagnostic_pop
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -458,8 +457,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);

 		/* flush desc with pa dma_addr */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
 		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+__rte_diagnostic_pop
 	}
 #else
 #ifdef CC_AVX512_SUPPORT
diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c
index 04be574683..ab347938d2 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_neon.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c
@@ -269,6 +269,8 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq,
 		int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT};

 		/* A.1 load desc[3-0] */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		descs[3] = vld1q_u64((uint64_t *)(rxdp + 3));
 		descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
 		descs[1] = vld1q_u64((uint64_t *)(rxdp + 1));
@@ -281,6 +283,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq,
 		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
 		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
 		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+__rte_diagnostic_pop

 		/* B.1 load 4 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c
index 0db6fa8bd4..373df0c935 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_sse.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c
@@ -12,10 +12,6 @@

 #include <rte_vect.h>

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline void
 iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 {
@@ -38,8 +34,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
 				rxp[i] = &rxq->fake_mbuf;
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 				_mm_store_si128((__m128i *)&rxdp[i].read,
 						dma_addr0);
+__rte_diagnostic_pop
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -69,8 +68,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);

 		/* flush desc with pa dma_addr */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
 		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+__rte_diagnostic_pop
 	}

 	rxq->rxrearm_start += rxq->rx_free_thresh;
@@ -578,7 +580,10 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		descs[3] =
_mm_loadu_si128((__m128i *)(rxdp + 3));
+__rte_diagnostic_pop
 		rte_compiler_barrier();

 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -590,11 +595,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif

 		/* A.1 load desc[2-0] */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
 		rte_compiler_barrier();
 		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
 		rte_compiler_barrier();
 		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+__rte_diagnostic_pop

 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts */
@@ -783,7 +791,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 	/* Just the act of getting into the function from the application is
 	 * going to cost about 7 cycles
 	 */
-	rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+	rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;

 	rte_prefetch0(rxdp);

@@ -864,7 +872,10 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+__rte_diagnostic_pop
 		rte_compiler_barrier();

 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -876,11 +887,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 #endif

 		/* A.1 load desc[2-0] */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
 		rte_compiler_barrier();
 		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
 		rte_compiler_barrier();
 		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+__rte_diagnostic_pop

 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts */
@@ -927,6 +941,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 		    offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP ||
 		    rxq->rx_flags &
IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
 			/* load bottom half of every 32B desc */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 			descs_bh[3] = _mm_load_si128
 				((void *)(&rxdp[3].wb.status_error1));
 			rte_compiler_barrier();
@@ -938,6 +954,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 			rte_compiler_barrier();
 			descs_bh[0] = _mm_load_si128
 				((void *)(&rxdp[0].wb.status_error1));
+__rte_diagnostic_pop
 		}

 		if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) {
@@ -1349,7 +1366,10 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)

 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	_mm_store_si128((__m128i *)txdp, descriptor);
+__rte_diagnostic_pop
 }

 static inline void
diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h
index dacb87dcb0..b0d5232510 100644
--- a/drivers/net/ice/ice_rxtx_common_avx.h
+++ b/drivers/net/ice/ice_rxtx_common_avx.h
@@ -7,10 +7,6 @@

 #include "ice_rxtx.h"

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #ifdef __AVX2__
 static __rte_always_inline void
 ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
@@ -33,8 +29,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < ICE_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 				_mm_store_si128((__m128i *)&rxdp[i].read,
 						dma_addr0);
+__rte_diagnostic_pop
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -77,8 +76,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);

 		/* flush desc with pa dma_addr */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
 		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+__rte_diagnostic_pop
 	}
 #else
 #ifdef __AVX512VL__
@@ -157,8 +159,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room);

 			/* flush desc with pa dma_addr */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 			_mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3);
 			_mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7);
+__rte_diagnostic_pop
 		}
 	} else
 #endif /* __AVX512VL__ */
@@ -213,8 +218,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room);

 			/* flush desc with pa dma_addr */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 			_mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1);
 			_mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3);
+__rte_diagnostic_pop
 		}
 	}

diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index d6e88dbb29..6c6a810a15 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -7,10 +7,6 @@

 #include <rte_vect.h>

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline void
 ice_rxq_rearm(struct ice_rx_queue *rxq)
 {
@@ -254,6 +250,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 				_mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif

+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
@@ -269,6 +267,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
 		rte_compiler_barrier();
 		const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+__rte_diagnostic_pop

 		const __m256i raw_desc6_7 =
 			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
@@
-444,6 +443,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads &
 				RTE_ETH_RX_OFFLOAD_RSS_HASH) {
 			/* load bottom half of every 32B desc */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 			const __m128i raw_desc_bh7 =
 				_mm_load_si128
 					((void *)(&rxdp[7].wb.status_error1));
@@ -475,6 +476,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 			const __m128i raw_desc_bh0 =
 				_mm_load_si128
 					((void *)(&rxdp[0].wb.status_error1));
+__rte_diagnostic_pop

 			__m256i raw_desc_bh6_7 =
 				_mm256_inserti128_si256
@@ -790,7 +792,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 		ice_txd_enable_offload(pkt, &high_qw);

 	__m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt));
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	_mm_store_si128((__m128i *)txdp, descriptor);
+__rte_diagnostic_pop
 }

 static __rte_always_inline void
@@ -841,8 +846,11 @@ ice_vtx(volatile struct ice_tx_desc *txdp,
 			_mm256_set_epi64x
 				(hi_qw1, rte_pktmbuf_iova(pkt[1]),
 				 hi_qw0, rte_pktmbuf_iova(pkt[0]));
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm256_store_si256((void *)(txdp + 2), desc2_3);
 		_mm256_store_si256((void *)txdp, desc0_1);
+__rte_diagnostic_pop
 	}

 	/* do any last ones */
diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c
index add095ef06..5a774550dc 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -7,10 +7,6 @@

 #include <rte_vect.h>

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define ICE_DESCS_PER_LOOP_AVX 8

 static __rte_always_inline void
@@ -243,6 +239,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;

 		/* load in descriptors, in reverse order */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		const __m128i raw_desc7 =
 			_mm_load_si128((void *)(rxdp + 7));
 		rte_compiler_barrier();
@@ -266,6 +264,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
 			_mm_load_si128((void *)(rxdp + 0));
+__rte_diagnostic_pop

 		raw_desc6_7 =
 			_mm256_inserti128_si256
@@ -474,6 +473,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 		if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads &
 				RTE_ETH_RX_OFFLOAD_RSS_HASH) {
 			/* load bottom half of every 32B desc */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 			const __m128i raw_desc_bh7 =
 				_mm_load_si128
 					((void *)(&rxdp[7].wb.status_error1));
@@ -505,6 +506,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 			const __m128i raw_desc_bh0 =
 				_mm_load_si128
 					((void *)(&rxdp[0].wb.status_error1));
+__rte_diagnostic_pop

 			__m256i raw_desc_bh6_7 =
 				_mm256_inserti128_si256
@@ -987,7 +989,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 		ice_txd_enable_offload(pkt, &high_qw);

 	__m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt));
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	_mm_store_si128((__m128i *)txdp, descriptor);
+__rte_diagnostic_pop
 }

 static __rte_always_inline void
@@ -1029,7 +1034,10 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt,
 				hi_qw2, rte_pktmbuf_iova(pkt[2]),
 				hi_qw1, rte_pktmbuf_iova(pkt[1]),
 				hi_qw0, rte_pktmbuf_iova(pkt[0]));
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm512_storeu_si512((void *)txdp, desc0_3);
+__rte_diagnostic_pop
 	}

 	/* do any last ones */
diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h
index 4b73465af5..45147decff 100644
--- a/drivers/net/ice/ice_rxtx_vec_common.h
+++ b/drivers/net/ice/ice_rxtx_vec_common.h
@@ -7,10 +7,6 @@

 #include "ice_rxtx.h"

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline uint16_t
 ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 			  uint16_t nb_bufs,
uint8_t *split_flags)
diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c
index c01d8ede29..f2991cee1a 100644
--- a/drivers/net/ice/ice_rxtx_vec_sse.c
+++ b/drivers/net/ice/ice_rxtx_vec_sse.c
@@ -6,10 +6,6 @@

 #include <rte_vect.h>

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline __m128i
 ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3)
 {
@@ -52,8 +48,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < ICE_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 				_mm_store_si128((__m128i *)&rxdp[i].read,
 						dma_addr0);
+__rte_diagnostic_pop
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -91,8 +90,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);

 		/* flush desc with pa dma_addr */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
 		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+__rte_diagnostic_pop
 	}

 	rxq->rxrearm_start += ICE_RXQ_REARM_THRESH;
@@ -425,7 +427,10 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+__rte_diagnostic_pop
 		rte_compiler_barrier();

 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -437,11 +442,14 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif

 		/* A.1 load desc[2-0] */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
 		rte_compiler_barrier();
 		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
 		rte_compiler_barrier();
 		descs[0] = _mm_loadu_si128((__m128i
*)(rxdp));
+__rte_diagnostic_pop

 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts */
@@ -489,6 +497,8 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads &
 				RTE_ETH_RX_OFFLOAD_RSS_HASH) {
 			/* load bottom half of every 32B desc */
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 			const __m128i raw_desc_bh3 =
 				_mm_load_si128
 					((void *)(&rxdp[3].wb.status_error1));
@@ -504,6 +514,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 			const __m128i raw_desc_bh0 =
 				_mm_load_si128
 					((void *)(&rxdp[0].wb.status_error1));
+__rte_diagnostic_pop

 			/**
 			 * to shift the 32b RSS hash value to the
@@ -680,7 +691,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt,
 		((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S));

 	__m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt));
+__rte_diagnostic_push
+__rte_diagnostic_ignored_wcast_qual
 	_mm_store_si128((__m128i *)txdp, descriptor);
+__rte_diagnostic_pop
 }

 static inline void
diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h
index 2787d27616..002c1e6948 100644
--- a/drivers/net/idpf/idpf_rxtx_vec_common.h
+++ b/drivers/net/idpf/idpf_rxtx_vec_common.h
@@ -11,10 +11,6 @@
 #include "idpf_ethdev.h"
 #include "idpf_rxtx.h"

-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define IDPF_SCALAR_PATH		0
 #define IDPF_VECTOR_PATH		1
 #define IDPF_RX_NO_VECTOR_FLAGS (		\
diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c
index d451562269..92a89f8def 100644
--- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c
+++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c
@@ -8,8 +8,6 @@
 #include "ixgbe_ethdev.h"
 #include "ixgbe_rxtx.h"

-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 void
 ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t
nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..4e7a64b39f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..8e4048a32f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,8 +37,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -76,8 +75,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +468,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 
+483,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +684,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. 
* diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..e7deacc1fb 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..f4d08d5b30 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -129,8 +126,11 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). 
*/ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); +__rte_diagnostic_pop for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -138,6 +138,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, __m128i byte_cnt, invalid_mask; #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) @@ -145,6 +147,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* A.1 load mCQEs into a 128bit register. */ mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); +__rte_diagnostic_pop /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -354,9 +357,12 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* Move to next CQE and invalidate consumed CQEs. */ if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (pos + 8 < mcqe_n) rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); +__rte_diagnostic_pop + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +377,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,6 +657,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. 
*/ p3 = _mm_extract_epi16(p, 3); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqes[3] = _mm_loadl_epi64((__m128i *) &cq[pos + p3].sop_drop_qpn); rte_compiler_barrier(); @@ -683,6 +691,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); +__rte_diagnostic_pop cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,6 +709,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. */ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); @@ -710,6 +721,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); +__rte_diagnostic_pop cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. 
*/ diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
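The mlx5 SSE hunk above swaps `(void *)` casts for `(volatile void *)` instead of wrapping them in the new macros. As far as I can tell, the point is that a cast which keeps the qualifier gives -Wcast-qual nothing to report, so no suppression is needed at all. A minimal sketch of that idea follows; `struct demo_desc` and the `demo_*` helpers are hypothetical names made up for illustration and are not part of the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor layout, used only for illustration. */
struct demo_desc {
	uint8_t op_own;
	uint8_t rsvd[3];
	uint32_t byte_cnt;
};

/*
 * Qualifier-preserving cast: the pointee stays volatile, so -Wcast-qual
 * has nothing to report. Writing (void *)d here instead would discard
 * volatile and trigger the warning.
 */
static inline volatile unsigned char *
demo_desc_bytes(volatile struct demo_desc *d)
{
	return (volatile void *)d;
}

/* Returns the ownership byte by reading it through the byte view. */
static inline uint8_t
demo_desc_owner(volatile struct demo_desc *d)
{
	assert(d != NULL);
	return demo_desc_bytes(d)[offsetof(struct demo_desc, op_own)];
}
```

This only helps where the callee accepts a qualified pointer. The SSE load/store intrinsics take unqualified `__m128i *` arguments, which would explain why those call sites still need the push/ignore/pop wrapping.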
* [PATCH v7 0/3] add diagnostics macros to make code portable
  2024-12-27  1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
                   ` (7 preceding siblings ...)
  2024-12-31 20:15 ` [PATCH v6 0/3] " Andre Muezerie
@ 2024-12-31 22:30 ` Andre Muezerie
  2024-12-31 22:30   ` [PATCH v7 1/3] lib/eal: " Andre Muezerie
                     ` (2 more replies)
  2025-01-01  0:48 ` [PATCH v8 0/3] " Andre Muezerie
                   ` (8 subsequent siblings)
  17 siblings, 3 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-31 22:30 UTC
To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v7:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v6:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v5:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v4:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the
   scope during which they are active, reducing the chance that
   unforeseen issues are hidden due to warning suppression.

Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++--
 drivers/net/axgbe/axgbe_rxtx.h                |  9 ----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 --
 drivers/net/dpaa2/dpaa2_rxtx.c                | 16 ++-----
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 19 ++++++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  5 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 17 ++++++-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 18 ++++++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 22 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 21 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 27 +++++++++--
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 10 ++--
 drivers/net/iavf/iavf_rxtx_vec_neon.c         |  9 ++++
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 30 ++++++++++--
 drivers/net/ice/ice_rxtx_common_avx.h         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 --
 drivers/net/ice/ice_rxtx_vec_sse.c            | 22 +++++++--
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 --
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 19 ++++++--
 drivers/net/mlx5/mlx5_flow.c                  |  6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         |  2 -
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 26 ++++++++---
 drivers/net/tap/tap_flow.c                    |  6 +--
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 --
 lib/eal/include/rte_common.h                  | 23 ++++++++++
 35 files changed, 342 insertions(+), 138 deletions(-)

-- 
2.47.0.vfs.0.3

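The v2 note about reducing the scope of the suppression can be pictured with plain pragmas, before any macro is involved. This is only a sketch under stated assumptions: `demo_sum` is a hypothetical function, and the `__GNUC__` guard (which Clang also defines) stands in for whatever toolchain check a real build would use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: sums n entries of a read-only table through a
 * pointer that has had const cast away, as some legacy APIs require.
 */
static inline uint32_t
demo_sum(const uint32_t *table, size_t n)
{
	uint32_t *p;
	uint32_t sum = 0;
	size_t i;

	assert(table != NULL);

	/* The suppression covers only the one cast that drops const. */
#ifdef __GNUC__
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-qual"
#endif
	p = (uint32_t *)table;
#ifdef __GNUC__
#pragma GCC diagnostic pop
#endif

	/* From here on -Wcast-qual is back to whatever the build selected. */
	for (i = 0; i < n; i++)
		sum += p[i];
	return sum;
}
```

With a file-scope "ignored" pragma and no push/pop, any other qualifier-dropping cast later in the same file would also go unreported, which is the risk the v2 changelog entry describes.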
* [PATCH v7 1/3] lib/eal: add diagnostics macros to make code portable
  2024-12-31 22:30 ` [PATCH v7 0/3] " Andre Muezerie
@ 2024-12-31 22:30   ` Andre Muezerie
  2024-12-31 22:30   ` [PATCH v7 2/3] drivers/common: " Andre Muezerie
  2024-12-31 22:30   ` [PATCH v7 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2024-12-31 22:30 UTC
To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..13b7b92f46 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,29 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+		_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop  _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
-- 
2.47.0.vfs.0.3

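For readers who want to try the pattern outside a DPDK tree, here is a minimal sketch of the same idea. The `demo_*` macro and function names are hypothetical stand-ins, and the guard uses `__GNUC__` (an assumption made so the file builds on its own) rather than the `RTE_TOOLCHAIN_MSVC` check in the patch. It relies on the C99 `_Pragma` operator, which, unlike `#pragma`, can appear in a macro body.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-ins for the macros added by the patch. On compilers that do not
 * understand the GCC pragmas they expand to nothing, so call sites stay
 * identical everywhere.
 */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#define demo_diagnostic_push _Pragma("GCC diagnostic push")
#define demo_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#define demo_diagnostic_pop _Pragma("GCC diagnostic pop")
#else
#define demo_diagnostic_push
#define demo_diagnostic_ignored_wcast_qual
#define demo_diagnostic_pop
#endif

/*
 * Copies n bytes into a buffer that is only reachable through a volatile
 * pointer. memcpy() takes an unqualified void *, so the cast drops
 * volatile; the suppression is limited to that single statement.
 */
static inline void
demo_store(volatile unsigned char *dst, const unsigned char *src, size_t n)
{
	assert(dst != NULL && src != NULL);
demo_diagnostic_push
demo_diagnostic_ignored_wcast_qual
	memcpy((void *)dst, src, n);
demo_diagnostic_pop
}
```

The macros sit at column zero at the call site, mirroring the placement used in the driver hunks of this series. Whether an MSVC build would later map them to its own warning pragmas is not something this patch states; here, as in the patch, they are simply empty on other compilers.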
* [PATCH v7 2/3] drivers/common: add diagnostics macros to make code portable 2024-12-31 22:30 ` [PATCH v7 0/3] " Andre Muezerie 2024-12-31 22:30 ` [PATCH v7 1/3] lib/eal: " Andre Muezerie @ 2024-12-31 22:30 ` Andre Muezerie 2024-12-31 22:30 ` [PATCH v7 3/3] drivers/net: " Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-31 22:30 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++-- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..37cd0a43e2 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,8 +30,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -108,8 +107,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +166,11 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +221,13 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)rxdp, desc0_1); _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); +__rte_diagnostic_pop rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -336,6 +344,8 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -359,6 +369,7 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,8 +571,11 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -634,8 +648,11 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -797,6 +814,8 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -820,6 +839,7 @@ _idpf_splitq_recv_raw_pkts_avx512(struct 
idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1151,10 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1201,10 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1435,7 +1461,10 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1480,7 +1509,10 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1521,11 +1553,14 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); +__rte_diagnostic_pop nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1575,10 @@ 
idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); +__rte_diagnostic_pop tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v7 3/3] drivers/net: add diagnostics macros to make code portable 2024-12-31 22:30 ` [PATCH v7 0/3] " Andre Muezerie 2024-12-31 22:30 ` [PATCH v7 1/3] lib/eal: " Andre Muezerie 2024-12-31 22:30 ` [PATCH v7 2/3] drivers/common: " Andre Muezerie @ 2024-12-31 22:30 ` Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2024-12-31 22:30 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 ------ drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --- drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-------- drivers/net/fm10k/fm10k_rxtx_vec.c | 19 +++++++++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 5 ++-- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -- drivers/net/i40e/i40e_rxtx_common_avx.h | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 17 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 18 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++++---- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++++++---- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++++--- drivers/net/iavf/iavf_rxtx_vec_neon.c | 9 ++++++ drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 +++++++++++++++---- drivers/net/ice/ice_rxtx_common_avx.h | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_common.h | 4 --- drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --- .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 +++++++++--- drivers/net/mlx5/mlx5_flow.c | 6 ++-- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 ---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 -- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +++++++++++----- drivers/net/tap/tap_flow.c | 6 ++-- drivers/net/virtio/virtio_rxtx_simple.c | 4 --- 33 files changed, 277 insertions(+), 134 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ 
b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..21c306fd94 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,10 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); +__rte_diagnostic_pop dq_storage++; num_rx++; @@ -2118,8 +2113,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif 
defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..3d534a91ac 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].q, dma_addr0); +__rte_diagnostic_pop } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +315,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); +__rte_diagnostic_pop /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +467,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +482,14 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); 
rte_compiler_barrier(); descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +744,10 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..99f080f3e8 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,11 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&desc->addr, val1); vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); +__rte_diagnostic_pop } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..bcabacf689 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef 
__INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,8 +32,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -144,8 +146,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__*/ @@ -190,8 +195,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..c5546132e9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ 
b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -43,8 +41,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = (__vector unsigned long){}; for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp[i].read); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +85,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +290,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = *(__vector unsigned long *)(rxdp + 3); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +303,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = *(__vector unsigned long *)(rxdp + 2); rte_compiler_barrier(); descs[1] = *(__vector unsigned long *)(rxdp + 1); rte_compiler_barrier(); descs[0] = *(__vector unsigned long *)(rxdp); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +544,10 @@ vtx1(volatile struct 
i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual *(__vector unsigned long *)txdp = descriptor; +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..90a1d4661a 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,6 +275,8 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -291,6 +292,7 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +697,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = 
_mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -728,8 +733,11 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..b77989074f 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -263,6 +262,8 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -286,6 +287,7 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, rte_compiler_barrier(); const __m128i 
raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +877,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -909,7 +914,10 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..cfc6d9ead3 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +38,10 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } 
rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,11 +58,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +90,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,6 +427,8 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -433,6 +441,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 +671,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t 
descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..8a5537bcf2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,8 +37,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +99,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i 
v_unpack_01, v_unpack_23; @@ -462,7 +467,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +482,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +692,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..f5503d9dae 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,6 +189,8 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -208,6 +206,7 
@@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +508,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -742,6 +741,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -765,6 +766,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -959,6 +961,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -990,6 +994,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1669,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1719,8 +1727,11 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..04894debc8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -164,6 +160,8 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -187,6 +185,7 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +599,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -733,6 +732,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const 
__m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -756,6 +757,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1111,6 +1113,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -1142,6 +1146,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1988,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2045,10 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -2225,7 +2236,10 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); +__rte_diagnostic_pop } static __rte_always_inline void @@ -2300,7 +2314,10 
@@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..690e0749e4 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,8 +418,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -458,8 +457,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c index 04be574683..5e723e1193 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_neon.c +++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c @@ -36,7 +36,10 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxep[i] = &rxq->fake_mbuf; 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -53,11 +56,14 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH; @@ -269,6 +275,8 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -281,6 +289,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..373df0c935 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,8 +34,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i 
*)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -69,8 +68,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +580,10 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +595,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +791,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +872,10 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +887,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -927,6 +941,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs_bh[3] = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); @@ -938,6 +954,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1366,10 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..b0d5232510 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct 
ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,8 +29,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -77,8 +76,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -157,8 +159,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__ */ @@ -213,8 +218,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..6c6a810a15 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static 
__rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,6 +250,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -269,6 +267,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,6 +443,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -475,6 +476,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +792,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -841,8 +846,11 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x (hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual 
_mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..5a774550dc 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -243,6 +239,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -266,6 +264,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -474,6 +473,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -505,6 +506,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +989,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static 
__rte_always_inline void @@ -1029,7 +1034,10 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..f2991cee1a 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,8 +48,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -91,8 +90,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ 
-425,7 +427,10 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +442,14 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -489,6 +497,8 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh3 = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); @@ -504,6 +514,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop /** * to shift the 32b RSS hash value to the @@ -680,7 +691,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h 
index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..4e7a64b39f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..8e4048a32f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,8 +37,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } 
rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -76,8 +75,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +468,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +483,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +684,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC 
diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..e7deacc1fb 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..f4d08d5b30 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? 
&rxq->title_pkt : elts[0]; unsigned int pos; @@ -129,8 +126,11 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). */ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); +__rte_diagnostic_pop for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -138,6 +138,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, __m128i byte_cnt, invalid_mask; #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) @@ -145,6 +147,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* A.1 load mCQEs into a 128bit register. */ mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); +__rte_diagnostic_pop /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -354,9 +357,12 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* Move to next CQE and invalidate consumed CQEs. 
*/ if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (pos + 8 < mcqe_n) rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); +__rte_diagnostic_pop + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +377,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,6 +657,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. */ p3 = _mm_extract_epi16(p, 3); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqes[3] = _mm_loadl_epi64((__m128i *) &cq[pos + p3].sop_drop_qpn); rte_compiler_barrier(); @@ -683,6 +691,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); +__rte_diagnostic_pop cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,6 +709,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. */ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. 
*/ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); @@ -710,6 +721,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); +__rte_diagnostic_pop cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v8 0/3] add diagnostics macros to make code portable
  2024-12-27 1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
                  ` (8 preceding siblings ...)
  2024-12-31 22:30 ` [PATCH v7 0/3] " Andre Muezerie
@ 2025-01-01 0:48 ` Andre Muezerie
  2025-01-01 0:48   ` [PATCH v8 1/3] lib/eal: " Andre Muezerie
                    ` (2 more replies)
  2025-01-01 3:36 ` [PATCH v9 0/3] " Andre Muezerie
                  ` (7 subsequent siblings)
  17 siblings, 3 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-01 0:48 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v8:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v7:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v6:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v5:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v4:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the scope
   during which they are active, reducing the chance that unforeseen
   issues are hidden due to warning suppression.

Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++--
 drivers/net/axgbe/axgbe_rxtx.h                |  9 ----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 --
 drivers/net/dpaa2/dpaa2_rxtx.c                | 16 ++-----
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 19 ++++++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  5 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 17 ++++++-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 18 ++++++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 22 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 21 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 27 +++++++++--
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 10 ++--
 drivers/net/iavf/iavf_rxtx_vec_neon.c         |  9 ++++
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 30 ++++++++++--
 drivers/net/ice/ice_rxtx_common_avx.h         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 --
 drivers/net/ice/ice_rxtx_vec_sse.c            | 22 +++++++--
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 --
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       |  8 +++-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 19 ++++++--
 drivers/net/mlx5/mlx5_flow.c                  |  6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         |  2 -
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 26 ++++++++---
 drivers/net/tap/tap_flow.c                    |  6 +--
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 --
 lib/eal/include/rte_common.h                  | 23 ++++++++++
 35 files changed, 348 insertions(+), 138 deletions(-)

--
2.47.0.vfs.0.3

^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v8 1/3] lib/eal: add diagnostics macros to make code portable
  2025-01-01 0:48 ` [PATCH v8 0/3] " Andre Muezerie
@ 2025-01-01 0:48   ` Andre Muezerie
  2025-01-01 0:48   ` [PATCH v8 2/3] drivers/common: " Andre Muezerie
  2025-01-01 0:48   ` [PATCH v8 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-01 0:48 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..13b7b92f46 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,29 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
--
2.47.0.vfs.0.3

^ permalink raw reply [flat|nested] 87+ messages in thread
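The push/ignore/pop pattern that this patch wraps in macros can be tried
outside of DPDK with a small standalone sketch. It does not include
rte_common.h: the DIAG_* macro names, struct demo_desc and demo_read_addr()
are hypothetical stand-ins, and the __GNUC__/__clang__ check is an assumption
that takes the place of the RTE_TOOLCHAIN_MSVC test used in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the macros added to rte_common.h.
 * Assumption: GCC and Clang both define __GNUC__; every other
 * compiler gets empty definitions, as MSVC does in the patch.
 */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER)
#define DIAG_PUSH _Pragma("GCC diagnostic push")
#define DIAG_POP _Pragma("GCC diagnostic pop")
#define DIAG_IGNORED_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define DIAG_PUSH
#define DIAG_POP
#define DIAG_IGNORED_WCAST_QUAL
#endif

/* Hypothetical descriptor type, loosely modelled on a NIC ring entry. */
struct demo_desc {
	uint64_t addr;
};

/* Read a descriptor field through a pointer whose qualifiers are dropped. */
static uint64_t
demo_read_addr(const volatile struct demo_desc *d)
{
	struct demo_desc *plain;

	assert(d != NULL);
	/* Suppress -Wcast-qual for this single cast only. */
DIAG_PUSH
DIAG_IGNORED_WCAST_QUAL
	plain = (struct demo_desc *)d;
DIAG_POP
	return plain->addr;
}
```

Built with gcc or clang and -Wcast-qual, the cast should compile without a
warning, while removing the three DIAG_* lines should make the compiler
report the discarded qualifiers. Keeping the suppressed region to one
statement mirrors the v2 changelog goal of reducing the scope in which
warnings are hidden.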
* [PATCH v8 2/3] drivers/common: add diagnostics macros to make code portable
  2025-01-01 0:48 ` [PATCH v8 0/3] " Andre Muezerie
  2025-01-01 0:48   ` [PATCH v8 1/3] lib/eal: " Andre Muezerie
@ 2025-01-01 0:48   ` Andre Muezerie
  2025-01-01 0:48   ` [PATCH v8 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-01 0:48 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++-- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..37cd0a43e2 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,8 +30,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -108,8 +107,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +166,11 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +221,13 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)rxdp, desc0_1); _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); +__rte_diagnostic_pop rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -336,6 +344,8 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -359,6 +369,7 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,8 +571,11 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -634,8 +648,11 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -797,6 +814,8 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -820,6 +839,7 @@ _idpf_splitq_recv_raw_pkts_avx512(struct 
idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1151,10 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1201,10 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1435,7 +1461,10 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1480,7 +1509,10 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1521,11 +1553,14 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); +__rte_diagnostic_pop nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1575,10 @@ 
idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); +__rte_diagnostic_pop tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v8 3/3] drivers/net: add diagnostics macros to make code portable
  2025-01-01 0:48 ` [PATCH v8 0/3] " Andre Muezerie
  2025-01-01 0:48   ` [PATCH v8 1/3] lib/eal: " Andre Muezerie
  2025-01-01 0:48   ` [PATCH v8 2/3] drivers/common: " Andre Muezerie
@ 2025-01-01 0:48   ` Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-01 0:48 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 ------ drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --- drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-------- drivers/net/fm10k/fm10k_rxtx_vec.c | 19 +++++++++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 5 ++-- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -- drivers/net/i40e/i40e_rxtx_common_avx.h | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 17 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 18 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++++---- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++++++---- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++++--- drivers/net/iavf/iavf_rxtx_vec_neon.c | 9 ++++++ drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 +++++++++++++++---- drivers/net/ice/ice_rxtx_common_avx.h | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_common.h | 4 --- drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --- .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 8 +++-- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 +++++++++--- drivers/net/mlx5/mlx5_flow.c | 6 ++-- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 ---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 -- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +++++++++++----- drivers/net/tap/tap_flow.c | 6 ++-- drivers/net/virtio/virtio_rxtx_simple.c | 4 --- 33 files changed, 283 insertions(+), 134 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ 
b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..21c306fd94 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,10 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); +__rte_diagnostic_pop dq_storage++; num_rx++; @@ -2118,8 +2113,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif 
defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..3d534a91ac 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].q, dma_addr0); +__rte_diagnostic_pop } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +315,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); +__rte_diagnostic_pop /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +467,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +482,14 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); 
rte_compiler_barrier(); descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +744,10 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..99f080f3e8 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,11 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&desc->addr, val1); vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); +__rte_diagnostic_pop } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..bcabacf689 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef 
__INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,8 +32,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -144,8 +146,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__*/ @@ -190,8 +195,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..c5546132e9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ 
b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -43,8 +41,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = (__vector unsigned long){}; for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp[i].read); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +85,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +290,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = *(__vector unsigned long *)(rxdp + 3); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +303,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = *(__vector unsigned long *)(rxdp + 2); rte_compiler_barrier(); descs[1] = *(__vector unsigned long *)(rxdp + 1); rte_compiler_barrier(); descs[0] = *(__vector unsigned long *)(rxdp); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +544,10 @@ vtx1(volatile struct 
i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual *(__vector unsigned long *)txdp = descriptor; +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..90a1d4661a 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,6 +275,8 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -291,6 +292,7 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +697,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = 
_mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -728,8 +733,11 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..b77989074f 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -263,6 +262,8 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -286,6 +287,7 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, rte_compiler_barrier(); const __m128i 
raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +877,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -909,7 +914,10 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..cfc6d9ead3 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +38,10 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } 
rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,11 +58,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +90,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,6 +427,8 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -433,6 +441,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 +671,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t 
descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..8a5537bcf2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,8 +37,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +99,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i 
v_unpack_01, v_unpack_23; @@ -462,7 +467,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +482,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +692,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..f5503d9dae 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,6 +189,8 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -208,6 +206,7 
@@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +508,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -742,6 +741,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -765,6 +766,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -959,6 +961,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -990,6 +994,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1669,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1719,8 +1727,11 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..04894debc8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -164,6 +160,8 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -187,6 +185,7 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +599,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -733,6 +732,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const 
__m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -756,6 +757,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1111,6 +1113,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -1142,6 +1146,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1988,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2045,10 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -2225,7 +2236,10 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); +__rte_diagnostic_pop } static __rte_always_inline void @@ -2300,7 +2314,10 
@@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..690e0749e4 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,8 +418,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -458,8 +457,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c index 04be574683..5e723e1193 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_neon.c +++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c @@ -36,7 +36,10 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxep[i] = &rxq->fake_mbuf; 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -53,11 +56,14 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH; @@ -269,6 +275,8 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -281,6 +289,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..373df0c935 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,8 +34,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i 
*)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -69,8 +68,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +580,10 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +595,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +791,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +872,10 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +887,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -927,6 +941,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs_bh[3] = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); @@ -938,6 +954,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1366,10 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..b0d5232510 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct 
ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,8 +29,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -77,8 +76,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -157,8 +159,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__ */ @@ -213,8 +218,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..6c6a810a15 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static 
__rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,6 +250,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -269,6 +267,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,6 +443,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -475,6 +476,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +792,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -841,8 +846,11 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x (hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual 
_mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..5a774550dc 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -243,6 +239,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -266,6 +264,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -474,6 +473,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -505,6 +506,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +989,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static 
__rte_always_inline void @@ -1029,7 +1034,10 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..f2991cee1a 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,8 +48,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -91,8 +90,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ 
-425,7 +427,10 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +442,14 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -489,6 +497,8 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh3 = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); @@ -504,6 +514,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop /** * to shift the 32b RSS hash value to the @@ -680,7 +691,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h 
index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..07367a63ea 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -367,10 +365,13 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); /* A. 
load 4 pkts descs */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[0] = vld1q_u64((uint64_t *)(rxdp)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); @@ -554,7 +555,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, pkt->buf_iova + pkt->data_off, (uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..8e4048a32f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,8 +37,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -76,8 +75,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +468,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ 
/* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +483,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +684,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored 
"-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..e7deacc1fb 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..f4d08d5b30 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -129,8 +126,11 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). 
*/ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); +__rte_diagnostic_pop for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -138,6 +138,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, __m128i byte_cnt, invalid_mask; #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) @@ -145,6 +147,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* A.1 load mCQEs into a 128bit register. */ mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); +__rte_diagnostic_pop /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -354,9 +357,12 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* Move to next CQE and invalidate consumed CQEs. */ if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (pos + 8 < mcqe_n) rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); +__rte_diagnostic_pop + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +377,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,6 +657,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. 
*/ p3 = _mm_extract_epi16(p, 3); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqes[3] = _mm_loadl_epi64((__m128i *) &cq[pos + p3].sop_drop_qpn); rte_compiler_barrier(); @@ -683,6 +691,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); +__rte_diagnostic_pop cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,6 +709,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. */ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); @@ -710,6 +721,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); +__rte_diagnostic_pop cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. 
*/ diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
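A note on the mlx5_rxtx_vec_sse.h hunks in the patch above: where the
destination pointer is itself volatile-qualified, the patch changes the cast
from (void *) to (volatile void *) rather than wrapping the statement in the
push/ignored/pop macros. The following is a minimal sketch of that idea, not
the driver code. The demo_cqe and demo_mini_cqe types and the
demo_mini_cqe_at() helper are hypothetical stand-ins for the mlx5 structures.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the mlx5 CQE types (assumption of this sketch). */
struct demo_cqe {
	uint8_t raw[64];
};

struct demo_mini_cqe {
	uint32_t byte_cnt;
	uint32_t rx_hash;
};

/*
 * Reinterpret a slot of a volatile CQE ring as a mini-CQE pointer.
 * Writing (void *)(cq + pos) here would discard the volatile qualifier and
 * be reported by -Wcast-qual; casting to volatile void * keeps the qualifier,
 * so no diagnostic suppression is needed for this statement.
 */
static volatile struct demo_mini_cqe *
demo_mini_cqe_at(volatile struct demo_cqe *cq, unsigned int pos)
{
	volatile struct demo_mini_cqe *mcq = (volatile void *)(cq + pos);

	return mcq;
}
```

This only works when the target of the assignment is volatile-qualified as
well. Where an intrinsic takes an unqualified pointer, the qualifier has to be
dropped, which is where the scoped macros are used instead.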
* [PATCH v9 0/3] add diagnostics macros to make code portable
  2024-12-27  1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
                   ` (9 preceding siblings ...)
  2025-01-01  0:48 ` [PATCH v8 0/3] " Andre Muezerie
@ 2025-01-01  3:36 ` Andre Muezerie
  2025-01-01  3:36   ` [PATCH v9 1/3] lib/eal: " Andre Muezerie
                     ` (2 more replies)
  2025-01-03  0:12 ` [PATCH v10 0/3] " Andre Muezerie
                   ` (6 subsequent siblings)
  17 siblings, 3 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-01  3:36 UTC
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v9:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v8:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v7:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v6:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v5:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v4:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the
   scope during which they are active, reducing the chance that
   unforeseen issues are hidden due to warning suppression.

Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++--
 drivers/net/axgbe/axgbe_rxtx.h                |  9 ----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 --
 drivers/net/dpaa2/dpaa2_rxtx.c                | 16 ++-----
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 19 ++++++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  5 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 17 ++++++-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 18 ++++++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 22 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 21 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 27 +++++++++--
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 10 ++--
 drivers/net/iavf/iavf_rxtx_vec_neon.c         |  9 ++++
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 30 ++++++++++--
 drivers/net/ice/ice_rxtx_common_avx.h         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 --
 drivers/net/ice/ice_rxtx_vec_sse.c            | 22 +++++++--
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 --
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       | 14 +++++-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 19 ++++++--
 drivers/net/mlx5/mlx5_flow.c                  |  6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         |  2 -
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 26 ++++++++---
 drivers/net/ngbe/ngbe_rxtx_vec_neon.c         |  9 ++++
 drivers/net/tap/tap_flow.c                    |  6 +--
 drivers/net/txgbe/txgbe_rxtx_vec_neon.c       |  9 ++++
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 --
 lib/eal/include/rte_common.h                  | 23 ++++++++++
 37 files changed, 372 insertions(+), 138 deletions(-)

-- 
2.47.0.vfs.0.3
* [PATCH v9 1/3] lib/eal: add diagnostics macros to make code portable
  2025-01-01  3:36 ` [PATCH v9 0/3] " Andre Muezerie
@ 2025-01-01  3:36   ` Andre Muezerie
  2025-01-01  3:36   ` [PATCH v9 2/3] drivers/common: " Andre Muezerie
  2025-01-01  3:36   ` [PATCH v9 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-01  3:36 UTC
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..13b7b92f46 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,29 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif

+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
-- 
2.47.0.vfs.0.3
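To illustrate how the macros in this patch are meant to be used, here is a
small standalone sketch that wraps one qualifier-dropping cast in a
push/ignored/pop triple. It is not DPDK code. The demo_ macros are local
copies so the file builds without rte_common.h, and they test __GNUC__ (an
assumption of this sketch) where the patch tests __INTEL_COMPILER and
RTE_TOOLCHAIN_MSVC. The demo_desc type and both functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for the rte_common.h macros (assumption: GCC or Clang). */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#define demo_diagnostic_push _Pragma("GCC diagnostic push")
#define demo_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#define demo_diagnostic_pop _Pragma("GCC diagnostic pop")
#else
/* Other compilers: the macros expand to nothing. */
#define demo_diagnostic_push
#define demo_diagnostic_ignored_wcast_qual
#define demo_diagnostic_pop
#endif

/* Hypothetical two-quadword descriptor; real drivers define their own. */
struct demo_desc {
	uint64_t qw[2];
};

/* Stands in for a store intrinsic that takes an unqualified pointer. */
static void
demo_store(uint64_t *dst, uint64_t addr, uint64_t len)
{
	dst[0] = addr;
	dst[1] = len;
}

static void
demo_write_desc(volatile struct demo_desc *d, uint64_t addr, uint64_t len)
{
	assert(d != NULL);
	/* The cast below drops volatile; suppress -Wcast-qual for it only. */
demo_diagnostic_push
demo_diagnostic_ignored_wcast_qual
	demo_store((uint64_t *)d->qw, addr, len);
demo_diagnostic_pop
	/* From here on the previous diagnostic state is active again. */
}
```

Because the pop restores the state saved by the push, any other
qualifier-dropping cast later in the same file is still reported when
-Wcast-qual is enabled. The file-wide pragmas removed by this series did not
restore the state.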
* [PATCH v9 2/3] drivers/common: add diagnostics macros to make code portable
  2025-01-01  3:36 ` [PATCH v9 0/3] " Andre Muezerie
  2025-01-01  3:36   ` [PATCH v9 1/3] lib/eal: " Andre Muezerie
@ 2025-01-01  3:36   ` Andre Muezerie
  2025-01-01  3:36   ` [PATCH v9 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-01  3:36 UTC
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++-- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..37cd0a43e2 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,8 +30,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -108,8 +107,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +166,11 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +221,13 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)rxdp, desc0_1); _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); +__rte_diagnostic_pop rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -336,6 +344,8 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -359,6 +369,7 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,8 +571,11 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -634,8 +648,11 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -797,6 +814,8 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -820,6 +839,7 @@ _idpf_splitq_recv_raw_pkts_avx512(struct 
idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1151,10 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1201,10 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1435,7 +1461,10 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1480,7 +1509,10 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1521,11 +1553,14 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); +__rte_diagnostic_pop nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1575,10 @@ 
idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); +__rte_diagnostic_pop tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3
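
For readers following the thread, the push / ignore / pop pattern that the patch above applies around each qualifier-dropping cast can be sketched in isolation. This is only an illustration under stated assumptions: the DEMO_* macros, struct demo_desc, demo_store() and demo_write_desc() are hypothetical names made up for this note. The real __rte_diagnostic_* macros are defined in lib/eal/include/rte_common.h by patch 1/3, which is not reproduced here, and their definitions may well differ from the stand-ins below.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the __rte_diagnostic_* macros used in the
 * patch. The bodies are an assumption for illustration only; the real
 * definitions are in rte_common.h (patch 1/3) and may differ.
 */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_DIAGNOSTIC_PUSH _Pragma("GCC diagnostic push")
#define DEMO_DIAGNOSTIC_POP _Pragma("GCC diagnostic pop")
#define DEMO_DIAGNOSTIC_IGNORED_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
/* Other compilers (for example MSVC): expand to nothing in this sketch. */
#define DEMO_DIAGNOSTIC_PUSH
#define DEMO_DIAGNOSTIC_POP
#define DEMO_DIAGNOSTIC_IGNORED_WCAST_QUAL
#endif

/* Hypothetical descriptor, loosely modelled on a two-quadword TX descriptor. */
struct demo_desc {
	uint64_t addr;
	uint64_t flags;
};

/* Takes a non-volatile pointer, standing in for a vector store intrinsic. */
static void
demo_store(struct demo_desc *dst, uint64_t addr, uint64_t flags)
{
	dst->addr = addr;
	dst->flags = flags;
}

/*
 * The cast below drops the volatile qualifier, which -Wcast-qual reports.
 * The warning is disabled only between push and pop, so the rest of the
 * translation unit keeps the diagnostic enabled.
 */
static void
demo_write_desc(volatile struct demo_desc *txdp, uint64_t addr, uint64_t flags)
{
	assert(txdp != NULL);
DEMO_DIAGNOSTIC_PUSH
DEMO_DIAGNOSTIC_IGNORED_WCAST_QUAL
	demo_store((struct demo_desc *)txdp, addr, flags);
DEMO_DIAGNOSTIC_POP
}
```

Calling demo_write_desc(&d, 0x1000, 0x7) on a local struct demo_desc d fills both fields, and building with -Wcast-qual should not warn about the cast inside the bracketed region. Compared with the file-wide "#pragma GCC diagnostic ignored" lines that the patch removes, the scoped form keeps the warning active for any other cast in the same file, at the cost of a few extra lines per call site.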
* [PATCH v9 3/3] drivers/net: add diagnostics macros to make code portable 2025-01-01 3:36 ` [PATCH v9 0/3] " Andre Muezerie 2025-01-01 3:36 ` [PATCH v9 1/3] lib/eal: " Andre Muezerie 2025-01-01 3:36 ` [PATCH v9 2/3] drivers/common: " Andre Muezerie @ 2025-01-01 3:36 ` Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-01 3:36 UTC
To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and to activate them only for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html).

Now that an effort is being made to make the code compatible with MSVC, these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier, as the macros are defined in a single place. As a bonus, the code becomes more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 ------ drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --- drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-------- drivers/net/fm10k/fm10k_rxtx_vec.c | 19 +++++++++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 5 ++-- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -- drivers/net/i40e/i40e_rxtx_common_avx.h | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 17 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 18 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++++---- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++++++---- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++++--- drivers/net/iavf/iavf_rxtx_vec_neon.c | 9 ++++++ drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 +++++++++++++++---- drivers/net/ice/ice_rxtx_common_avx.h | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_common.h | 4 --- drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --- .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 14 +++++++-- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 +++++++++--- drivers/net/mlx5/mlx5_flow.c | 6 ++-- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 ---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 2 -- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +++++++++++----- drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 9 ++++++ drivers/net/tap/tap_flow.c | 6 ++-- drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 9 ++++++ drivers/net/virtio/virtio_rxtx_simple.c | 4 --- 35 files changed, 307 insertions(+), 134 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h 
index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..21c306fd94 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,10 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); +__rte_diagnostic_pop dq_storage++; num_rx++; @@ -2118,8 +2113,3 @@ dpaa2_dev_loopback_rx(void *queue, 
return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..3d534a91ac 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].q, dma_addr0); +__rte_diagnostic_pop } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +315,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); +__rte_diagnostic_pop /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +467,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +482,14 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); 
rte_compiler_barrier(); descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +744,10 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..99f080f3e8 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,11 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&desc->addr, val1); vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); +__rte_diagnostic_pop } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..bcabacf689 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ 
-11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,8 +32,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -144,8 +146,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__*/ @@ -190,8 +195,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index 
b6b0d38ec1..c5546132e9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -43,8 +41,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = (__vector unsigned long){}; for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp[i].read); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +85,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +290,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = *(__vector unsigned long *)(rxdp + 3); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +303,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = *(__vector unsigned long *)(rxdp + 2); rte_compiler_barrier(); descs[1] = *(__vector unsigned long *)(rxdp + 1); rte_compiler_barrier(); descs[0] = *(__vector unsigned long *)(rxdp); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned 
long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +544,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual *(__vector unsigned long *)txdp = descriptor; +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..90a1d4661a 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,6 +275,8 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -291,6 +292,7 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ 
-695,7 +697,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -728,8 +733,11 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..b77989074f 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -263,6 +262,8 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -286,6 +287,7 @@ _recv_raw_pkts_vec_avx512(struct 
i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +877,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -909,7 +914,10 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..cfc6d9ead3 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +38,10 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual 
vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,11 +58,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +90,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,6 +427,8 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -433,6 +441,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 +671,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, 
((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..8a5537bcf2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,8 +37,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +99,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* 
FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i v_unpack_01, v_unpack_23; @@ -462,7 +467,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +482,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +692,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..f5503d9dae 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,6 +189,8 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const 
__m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -208,6 +206,7 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +508,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -742,6 +741,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -765,6 +766,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -959,6 +961,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -990,6 +994,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1669,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = 
_mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1719,8 +1727,11 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..04894debc8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -164,6 +160,8 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -187,6 +185,7 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +599,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -733,6 +732,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, 
raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -756,6 +757,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1111,6 +1113,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -1142,6 +1146,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1988,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2045,10 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -2225,7 +2236,10 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_storeu_si256((__m256i *)txdp, 
ctx_data_desc); +__rte_diagnostic_pop } static __rte_always_inline void @@ -2300,7 +2314,10 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..690e0749e4 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,8 +418,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -458,8 +457,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c index 04be574683..5e723e1193 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_neon.c +++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c @@ -36,7 +36,10 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) rxq->nb_rx_desc) { for 
(i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxep[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -53,11 +56,14 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH; @@ -269,6 +275,8 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -281,6 +289,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..373df0c935 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,8 +34,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -69,8 +68,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +580,10 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +595,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +791,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +872,10 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* 
A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +887,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -927,6 +941,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs_bh[3] = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); @@ -938,6 +954,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1366,10 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..b0d5232510 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static 
__rte_always_inline void ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,8 +29,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -77,8 +76,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -157,8 +159,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__ */ @@ -213,8 +218,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..6c6a810a15 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic 
ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,6 +250,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -269,6 +267,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,6 +443,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -475,6 +476,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +792,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -841,8 +846,11 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x (hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..5a774550dc 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -243,6 +239,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -266,6 +264,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -474,6 +473,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -505,6 +506,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +989,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); 
+__rte_diagnostic_pop } static __rte_always_inline void @@ -1029,7 +1034,10 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..f2991cee1a 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,8 +48,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -91,8 +90,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } 
rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +427,10 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +442,14 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -489,6 +497,8 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh3 = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); @@ -504,6 +514,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop /** * to shift the 32b RSS hash value to the @@ -680,7 +691,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git 
a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..491d1aa546 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -36,8 +34,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -60,12 +61,15 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); vst1_u8((uint8_t 
*)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -367,10 +371,13 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); /* A. load 4 pkts descs */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[0] = vld1q_u64((uint64_t *)(rxdp)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); @@ -554,7 +561,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, pkt->buf_iova + pkt->data_off, (uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..8e4048a32f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,8 +37,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -76,8 +75,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +468,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +483,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +684,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return tunnel; } diff --git 
a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..e7deacc1fb 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..f4d08d5b30 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -129,8 +126,11 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). 
*/ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); +__rte_diagnostic_pop for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -138,6 +138,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, __m128i byte_cnt, invalid_mask; #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) @@ -145,6 +147,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* A.1 load mCQEs into a 128bit register. */ mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); +__rte_diagnostic_pop /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -354,9 +357,12 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* Move to next CQE and invalidate consumed CQEs. */ if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (pos + 8 < mcqe_n) rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); +__rte_diagnostic_pop + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +377,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,6 +657,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. 
*/ p3 = _mm_extract_epi16(p, 3); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqes[3] = _mm_loadl_epi64((__m128i *) &cq[pos + p3].sop_drop_qpn); rte_compiler_barrier(); @@ -683,6 +691,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); +__rte_diagnostic_pop cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,6 +709,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. */ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); @@ -710,6 +721,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); +__rte_diagnostic_pop cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. 
*/ diff --git a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c index 37075ea5e7..29e928472d 100644 --- a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c +++ b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c @@ -35,7 +35,10 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_NGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,12 +61,15 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_NGBE_RXQ_REARM_THRESH; @@ -484,7 +490,10 @@ vtx1(volatile struct ngbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git 
a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c index d4d647fab5..a634cba97c 100644 --- a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c +++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c @@ -34,7 +34,10 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_TXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -57,12 +60,15 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_TXGBE_RXQ_REARM_THRESH; @@ -484,7 +490,10 @@ vtx1(volatile struct txgbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.0.vfs.0.3
* [PATCH v10 0/3] add diagnostics macros to make code portable
From: Andre Muezerie @ 2025-01-03 0:12 UTC
To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v10:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v9:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v8:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v7:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v6:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v5:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v4:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the scope
   during which they are active, reducing the chance that unforeseen
   issues are hidden due to warning suppression.

Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++--
 drivers/net/axgbe/axgbe_rxtx.h | 9 ----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --
 drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-----
 drivers/net/fm10k/fm10k_rxtx_vec.c | 19 ++++++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h | 5 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c | 17 ++++++-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c | 18 ++++++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++--
 drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++--
 drivers/net/iavf/iavf_rxtx_vec_neon.c | 9 ++++
 drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 ++++++++++--
 drivers/net/ice/ice_rxtx_common_avx.h | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_common.h | 4 --
 drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++--
 drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 14 +++++-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 ++++++--
 drivers/net/mlx5/mlx5_flow.c | 6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 15 +++---
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 ++++++++---
 drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 9 ++++
 drivers/net/tap/tap_flow.c | 6 +--
 drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 9 ++++
 drivers/net/virtio/virtio_rxtx_simple.c | 4 --
 lib/eal/include/rte_common.h | 23 ++++++++++
 37 files changed, 380 insertions(+), 143 deletions(-)

--
2.47.0.vfs.0.3
* [PATCH v10 1/3] lib/eal: add diagnostics macros to make code portable
From: Andre Muezerie @ 2025-01-03 0:12 UTC
To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..13b7b92f46 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,29 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
--
2.47.0.vfs.0.3
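
For readers who want to try the pattern outside the DPDK tree, the macros above can be sketched in a standalone file roughly as follows. This is only an illustration under stated assumptions, not the code from the patch: the demo_ prefixed names, struct demo_desc and demo_write_desc() are hypothetical, and the compiler check uses __GNUC__ (which both gcc and clang define) in place of RTE_TOOLCHAIN_MSVC, because that symbol comes from the DPDK build system.

```c
#include <stdint.h>

/*
 * Hypothetical stand-ins for the rte_common.h macros. On gcc and clang
 * they expand to diagnostic pragmas; elsewhere they expand to nothing.
 */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#define demo_diagnostic_push _Pragma("GCC diagnostic push")
#define demo_diagnostic_pop _Pragma("GCC diagnostic pop")
#define demo_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define demo_diagnostic_push
#define demo_diagnostic_pop
#define demo_diagnostic_ignored_wcast_qual
#endif

/* Hypothetical descriptor layout, used only for this illustration. */
struct demo_desc {
	uint64_t addr;
	uint64_t flags;
};

static inline void
demo_write_desc(volatile struct demo_desc *d, uint64_t addr, uint64_t flags)
{
	struct demo_desc *raw;

	/* Suppress -Wcast-qual only around the cast that drops 'volatile'. */
demo_diagnostic_push
demo_diagnostic_ignored_wcast_qual
	raw = (struct demo_desc *)d;
demo_diagnostic_pop

	/* From here on the previous diagnostic state applies again. */
	raw->addr = addr;
	raw->flags = flags;
}
```

Keeping the push/pop pair tight around the single cast mirrors what the v2 changelog describes: the warning stays active for the rest of the file, so an unintended qualifier-dropping cast elsewhere would still be reported when building with -Wcast-qual.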
* [PATCH v10 2/3] drivers/common: add diagnostics macros to make code portable 2025-01-03 0:12 ` [PATCH v10 0/3] " Andre Muezerie 2025-01-03 0:12 ` [PATCH v10 1/3] lib/eal: " Andre Muezerie @ 2025-01-03 0:12 ` Andre Muezerie 2025-01-03 0:12 ` [PATCH v10 3/3] drivers/net: " Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-03 0:12 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++-- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..37cd0a43e2 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,8 +30,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -108,8 +107,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +166,11 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +221,13 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)rxdp, desc0_1); _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); +__rte_diagnostic_pop rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -336,6 +344,8 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -359,6 +369,7 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,8 +571,11 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -634,8 +648,11 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -797,6 +814,8 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -820,6 +839,7 @@ _idpf_splitq_recv_raw_pkts_avx512(struct 
idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1151,10 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1201,10 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1435,7 +1461,10 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1480,7 +1509,10 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1521,11 +1553,14 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); +__rte_diagnostic_pop nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1575,10 @@ 
idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); +__rte_diagnostic_pop tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3
* [PATCH v10 3/3] drivers/net: add diagnostics macros to make code portable 2025-01-03 0:12 ` [PATCH v10 0/3] " Andre Muezerie 2025-01-03 0:12 ` [PATCH v10 1/3] lib/eal: " Andre Muezerie 2025-01-03 0:12 ` [PATCH v10 2/3] drivers/common: " Andre Muezerie @ 2025-01-03 0:12 ` Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-03 0:12 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 ------ drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --- drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-------- drivers/net/fm10k/fm10k_rxtx_vec.c | 19 +++++++++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 5 ++-- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -- drivers/net/i40e/i40e_rxtx_common_avx.h | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 17 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 18 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++++---- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++++++---- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++++--- drivers/net/iavf/iavf_rxtx_vec_neon.c | 9 ++++++ drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 +++++++++++++++---- drivers/net/ice/ice_rxtx_common_avx.h | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_common.h | 4 --- drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --- .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 14 +++++++-- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 +++++++++--- drivers/net/mlx5/mlx5_flow.c | 6 ++-- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 ---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 15 +++++----- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +++++++++++----- drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 9 ++++++ drivers/net/tap/tap_flow.c | 6 ++-- drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 9 ++++++ drivers/net/virtio/virtio_rxtx_simple.c | 4 --- 35 files changed, 315 insertions(+), 139 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h 
b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..21c306fd94 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,10 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); +__rte_diagnostic_pop dq_storage++; num_rx++; @@ -2118,8 +2113,3 @@ 
dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..3d534a91ac 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].q, dma_addr0); +__rte_diagnostic_pop } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +315,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); +__rte_diagnostic_pop /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +467,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +482,14 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[2] = 
_mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +744,10 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..99f080f3e8 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,11 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&desc->addr, val1); vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); +__rte_diagnostic_pop } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..bcabacf689 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ 
b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,8 +32,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -144,8 +146,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__*/ @@ -190,8 +195,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c 
b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..c5546132e9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -43,8 +41,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = (__vector unsigned long){}; for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp[i].read); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +85,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +290,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = *(__vector unsigned long *)(rxdp + 3); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +303,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = *(__vector unsigned long *)(rxdp + 2); rte_compiler_barrier(); descs[1] = *(__vector unsigned long *)(rxdp + 1); rte_compiler_barrier(); descs[0] = *(__vector unsigned long *)(rxdp); +__rte_diagnostic_pop /* B.2 copy 2 
mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +544,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual *(__vector unsigned long *)txdp = descriptor; +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..90a1d4661a 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,6 +275,8 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -291,6 +292,7 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256( 
_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +697,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -728,8 +733,11 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..b77989074f 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -263,6 +262,8 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ 
-286,6 +287,7 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +877,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -909,7 +914,10 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..cfc6d9ead3 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +38,10 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,11 +58,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +90,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,6 +427,8 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -433,6 +441,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 
+671,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..8a5537bcf2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,8 +37,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +99,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); desc3_qw23 = _mm_loadu_si128((__m128i 
*)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i v_unpack_01, v_unpack_23; @@ -462,7 +467,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +482,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +692,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..f5503d9dae 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,6 +189,8 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void 
*)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -208,6 +206,7 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +508,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -742,6 +741,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -765,6 +766,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -959,6 +961,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -990,6 +994,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1669,10 @@ iavf_vtx1(volatile struct iavf_tx_desc 
*txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1719,8 +1727,11 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..04894debc8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -164,6 +160,8 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -187,6 +185,7 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +599,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -733,6 +732,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue 
*rxq, __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -756,6 +757,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1111,6 +1113,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -1142,6 +1146,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1988,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2045,10 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -2225,7 +2236,10 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual 
_mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); +__rte_diagnostic_pop } static __rte_always_inline void @@ -2300,7 +2314,10 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..690e0749e4 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,8 +418,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -458,8 +457,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c index 04be574683..5e723e1193 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_neon.c +++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c @@ -36,7 +36,10 @@ iavf_rxq_rearm(struct 
iavf_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxep[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -53,11 +56,14 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH; @@ -269,6 +275,8 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -281,6 +289,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..373df0c935 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,8 +34,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = 
&rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -69,8 +68,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +580,10 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +595,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +791,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +872,10 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses 
backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +887,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -927,6 +941,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs_bh[3] = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); @@ -938,6 +954,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1366,10 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..b0d5232510 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif 
- #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,8 +29,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -77,8 +76,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -157,8 +159,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__ */ @@ -213,8 +218,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..6c6a810a15 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER 
-#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,6 +250,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -269,6 +267,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,6 +443,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -475,6 +476,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +792,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -841,8 +846,11 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x (hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..5a774550dc 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -243,6 +239,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -266,6 +264,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -474,6 +473,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -505,6 +506,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +989,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, 
descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1029,7 +1034,10 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..f2991cee1a 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,8 +48,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -91,8 +90,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } 
rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +427,10 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +442,14 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -489,6 +497,8 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh3 = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); @@ -504,6 +514,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop /** * to shift the 32b RSS hash value to the @@ -680,7 +691,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git 
a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..491d1aa546 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -36,8 +34,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -60,12 +61,15 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); vst1_u8((uint8_t 
*)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -367,10 +371,13 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); /* A. load 4 pkts descs */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[0] = vld1q_u64((uint64_t *)(rxdp)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); @@ -554,7 +561,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, pkt->buf_iova + pkt->data_off, (uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..8e4048a32f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,8 +37,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -76,8 +75,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +468,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +483,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +684,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return tunnel; } diff --git 
a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..191df684b0 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * @@ -138,10 +136,13 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). */ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { uint8_t *p = (void *)&mcq[pos % 8]; +__rte_diagnostic_pop uint8_t *e0 = (void *)&elts[pos]->rearm_data; uint8_t *e1 = (void *)&elts[pos + 1]->rearm_data; uint8_t *e2 = (void *)&elts[pos + 2]->rearm_data; @@ -157,7 +158,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((volatile void *)(cq + pos + i)); __asm__ volatile ( /* A.1 load mCQEs into a 128bit register. 
*/ "ld1 {v16.16b - v17.16b}, [%[mcq]] \n\t" @@ -367,8 +368,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)&(cq + pos)->pkt_info; + rte_prefetch0((volatile void *)(cq + pos + 8)); + mcq = (volatile void *)&(cq + pos)->pkt_info; for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -383,7 +384,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -663,7 +664,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, mask = vcreate_u16(pkts_n - pos < MLX5_VPMD_DESCS_PER_LOOP ? -1UL >> ((pkts_n - pos) * sizeof(uint16_t) * 8) : 0); - p0 = (void *)&cq[pos].pkt_info; + p0 = (volatile void *)&cq[pos].pkt_info; p1 = p0 + (pkts_n - pos > 1) * sizeof(struct mlx5_cqe); p2 = p1 + (pkts_n - pos > 2) * sizeof(struct mlx5_cqe); p3 = p2 + (pkts_n - pos > 3) * sizeof(struct mlx5_cqe); diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..f4d08d5b30 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. 
*/ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -129,8 +126,11 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). */ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); +__rte_diagnostic_pop for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -138,6 +138,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, __m128i byte_cnt, invalid_mask; #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) @@ -145,6 +147,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* A.1 load mCQEs into a 128bit register. */ mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); +__rte_diagnostic_pop /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -354,9 +357,12 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* Move to next CQE and invalidate consumed CQEs. 
*/ if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (pos + 8 < mcqe_n) rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); +__rte_diagnostic_pop + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +377,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,6 +657,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. */ p3 = _mm_extract_epi16(p, 3); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqes[3] = _mm_loadl_epi64((__m128i *) &cq[pos + p3].sop_drop_qpn); rte_compiler_barrier(); @@ -683,6 +691,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); +__rte_diagnostic_pop cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,6 +709,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. */ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. 
*/ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); @@ -710,6 +721,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); +__rte_diagnostic_pop cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ diff --git a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c index 37075ea5e7..29e928472d 100644 --- a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c +++ b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c @@ -35,7 +35,10 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_NGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,12 +61,15 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_NGBE_RXQ_REARM_THRESH; @@ -484,7 +490,10 @@ vtx1(volatile struct ngbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c index d4d647fab5..a634cba97c 100644 --- a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c +++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c @@ -34,7 +34,10 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_TXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -57,12 +60,15 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_TXGBE_RXQ_REARM_THRESH; @@ -484,7 +490,10 @@ vtx1(volatile struct txgbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v11 0/3] add diagnostics macros to make code portable
  2024-12-27  1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
                   ` (11 preceding siblings ...)
  2025-01-03  0:12 ` [PATCH v10 0/3] " Andre Muezerie
@ 2025-01-03 15:36 ` Andre Muezerie
  2025-01-03 15:36   ` [PATCH v11 1/3] lib/eal: " Andre Muezerie
                     ` (3 more replies)
  2025-01-15  4:27 ` [PATCH v12 " Andre Muezerie
                   ` (4 subsequent siblings)
  17 siblings, 4 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-03 15:36 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v11:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v10:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v9:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v8:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v7:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v6:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v5:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v4:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the
   scope during which they are active, reducing the chance that
   unforeseen issues are hidden due to warning suppression.

Andre Muezerie (3):
  lib/eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++--
 drivers/net/axgbe/axgbe_rxtx.h                |  9 ----
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 --
 drivers/net/dpaa2/dpaa2_rxtx.c                | 16 ++-----
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 19 ++++++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  5 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 17 ++++++-
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 16 +++++--
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 --
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 18 ++++++--
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 22 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 21 +++++++--
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 27 +++++++++--
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 10 ++--
 drivers/net/iavf/iavf_rxtx_vec_neon.c         |  9 ++++
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 30 ++++++++++--
 drivers/net/ice/ice_rxtx_common_avx.h         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 16 +++++--
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 --
 drivers/net/ice/ice_rxtx_vec_sse.c            | 22 +++++++--
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 --
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       | 14 +++++-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 19 ++++++--
 drivers/net/mlx5/mlx5_flow.c                  |  6 +--
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         | 18 +++++---
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 26 ++++++++---
 drivers/net/ngbe/ngbe_rxtx_vec_neon.c         |  9 ++++
 drivers/net/tap/tap_flow.c                    |  6 +--
 drivers/net/txgbe/txgbe_rxtx_vec_neon.c       |  9 ++++
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 --
 lib/eal/include/rte_common.h                  | 23 ++++++++++
 37 files changed, 383 insertions(+), 143 deletions(-)

-- 
2.47.0.vfs.0.3

^ permalink raw reply	[flat|nested] 87+ messages in thread
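The v2 note about reducing the scope of the suppression can be illustrated with plain pragmas, before any macro is involved. The sketch below is not taken from the series: demo_unconst() is a hypothetical helper, and the compiler check is a simplified stand-in for the per-file guards the drivers used. It shows the push / ignored / pop sequence wrapped around a single cast, so that -Wcast-qual stays active for the rest of the file.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of DPDK): returns a writable alias of a
 * buffer that was passed in through a const-qualified pointer.
 */
static char *demo_unconst(const char *s)
{
	char *p;

	/* Simplified compiler check, assumed for this sketch only. */
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wcast-qual"
#endif
	p = (char *)s; /* the only statement exempt from -Wcast-qual */
#if defined(__GNUC__) || defined(__clang__)
#pragma GCC diagnostic pop
#endif

	return p;
}
```

Every call site written this way needs two preprocessor conditionals of its own, which is the clutter the cover letter proposes to hide behind macros.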
* [PATCH v11 1/3] lib/eal: add diagnostics macros to make code portable
  2025-01-03 15:36 ` [PATCH v11 0/3] " Andre Muezerie
@ 2025-01-03 15:36   ` Andre Muezerie
  2025-01-03 15:36   ` [PATCH v11 2/3] drivers/common: " Andre Muezerie
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-03 15:36 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 23 +++++++++++++++++++++++
 1 file changed, 23 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..13b7b92f46 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,29 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
 /**
  * Mark a function or variable to a weak reference.
  */
-- 
2.47.0.vfs.0.3

^ permalink raw reply	[flat|nested] 87+ messages in thread
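For readers who want to try the pattern outside the DPDK tree, here is a minimal standalone sketch of how macros like the ones in this patch are used around a cast that drops a qualifier. The DEMO_* macro names, struct demo_desc and demo_read_addr() are hypothetical, and the _MSC_VER test is only an assumed stand-in for RTE_TOOLCHAIN_MSVC, which is defined by the DPDK build system. The macros have to use the _Pragma operator because a #pragma directive cannot be produced by macro expansion.

```c
#include <stdint.h>

/*
 * Local stand-ins for the macros added to rte_common.h. The names and the
 * _MSC_VER check are assumptions made for this sketch.
 */
#if !defined(__INTEL_COMPILER) && !defined(_MSC_VER)
#define DEMO_DIAGNOSTIC_PUSH _Pragma("GCC diagnostic push")
#define DEMO_DIAGNOSTIC_POP _Pragma("GCC diagnostic pop")
#define DEMO_DIAGNOSTIC_IGNORED_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
/* Other toolchains: the macros expand to nothing. */
#define DEMO_DIAGNOSTIC_PUSH
#define DEMO_DIAGNOSTIC_POP
#define DEMO_DIAGNOSTIC_IGNORED_WCAST_QUAL
#endif

/* Hypothetical descriptor layout, loosely modelled on a NIC ring entry. */
struct demo_desc {
	uint64_t addr;
	uint64_t flags;
};

/* Reads one field through a pointer whose volatile qualifier is cast away. */
static inline uint64_t
demo_read_addr(volatile struct demo_desc *d)
{
	uint64_t v;

DEMO_DIAGNOSTIC_PUSH
DEMO_DIAGNOSTIC_IGNORED_WCAST_QUAL
	v = ((struct demo_desc *)d)->addr; /* cast drops volatile */
DEMO_DIAGNOSTIC_POP

	return v;
}
```

Compiled with gcc or clang and -Wcast-qual, the cast inside the push/pop pair should be accepted silently, while the same cast placed outside the pair would still be reported. On toolchains where the macros are empty the code is unchanged.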
* [PATCH v11 2/3] drivers/common: add diagnostics macros to make code portable 2025-01-03 15:36 ` [PATCH v11 0/3] " Andre Muezerie 2025-01-03 15:36 ` [PATCH v11 1/3] lib/eal: " Andre Muezerie @ 2025-01-03 15:36 ` Andre Muezerie 2025-01-03 15:36 ` [PATCH v11 3/3] drivers/net: " Andre Muezerie 2025-01-03 19:24 ` [PATCH v11 0/3] " Stephen Hemminger 3 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-03 15:36 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 46 +++++++++++++++++-- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..37cd0a43e2 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,8 +30,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -108,8 +107,11 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +166,11 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +221,13 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); +__rte_diagnostic_push 
+__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)rxdp, desc0_1); _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); +__rte_diagnostic_pop rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -336,6 +344,8 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -359,6 +369,7 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,8 +571,11 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -634,8 +648,11 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)&rxdp[i], dma_addr0); +__rte_diagnostic_pop } } rte_atomic_fetch_add_explicit(&rx_bufq->rx_stats.mbuf_alloc_failed, @@ -797,6 +814,8 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -820,6 +839,7 @@ _idpf_splitq_recv_raw_pkts_avx512(struct 
idpf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1151,10 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1201,10 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1435,7 +1461,10 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1480,7 +1509,10 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -1521,11 +1553,14 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); +__rte_diagnostic_pop nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1575,10 @@ 
idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); +__rte_diagnostic_pop tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v11 3/3] drivers/net: add diagnostics macros to make code portable 2025-01-03 15:36 ` [PATCH v11 0/3] " Andre Muezerie 2025-01-03 15:36 ` [PATCH v11 1/3] lib/eal: " Andre Muezerie 2025-01-03 15:36 ` [PATCH v11 2/3] drivers/common: " Andre Muezerie @ 2025-01-03 15:36 ` Andre Muezerie 2025-01-03 19:24 ` [PATCH v11 0/3] " Stephen Hemminger 3 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-03 15:36 UTC (permalink / raw) To: andremue; +Cc: dev, stephen It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 ------ drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 --- drivers/net/dpaa2/dpaa2_rxtx.c | 16 ++-------- drivers/net/fm10k/fm10k_rxtx_vec.c | 19 +++++++++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 5 ++-- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 -- drivers/net/i40e/i40e_rxtx_common_avx.h | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 17 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 --- drivers/net/i40e/i40e_rxtx_vec_neon.c | 18 +++++++++-- drivers/net/i40e/i40e_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 21 +++++++++---- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 27 +++++++++++++---- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 ++++--- drivers/net/iavf/iavf_rxtx_vec_neon.c | 9 ++++++ drivers/net/iavf/iavf_rxtx_vec_sse.c | 30 +++++++++++++++---- drivers/net/ice/ice_rxtx_common_avx.h | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_avx512.c | 16 +++++++--- drivers/net/ice/ice_rxtx_vec_common.h | 4 --- drivers/net/ice/ice_rxtx_vec_sse.c | 22 +++++++++++--- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 --- .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 -- drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 14 +++++++-- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 19 +++++++++--- drivers/net/mlx5/mlx5_flow.c | 6 ++-- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 ---- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 18 ++++++----- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 26 +++++++++++----- drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 9 ++++++ drivers/net/tap/tap_flow.c | 6 ++-- drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 9 ++++++ drivers/net/virtio/virtio_rxtx_simple.c | 4 --- 35 files changed, 318 insertions(+), 139 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h 
b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..21c306fd94 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,10 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); +__rte_diagnostic_pop dq_storage++; num_rx++; @@ -2118,8 +2113,3 @@ 
dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..3d534a91ac 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].q, dma_addr0); +__rte_diagnostic_pop } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +315,11 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); +__rte_diagnostic_pop /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +467,10 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +482,14 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs0[2] = 
_mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +744,10 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..99f080f3e8 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,11 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&desc->addr, val1); vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); +__rte_diagnostic_pop } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..bcabacf689 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ 
b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,8 +32,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -144,8 +146,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__*/ @@ -190,8 +195,11 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c 
b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..c5546132e9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -43,8 +41,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = (__vector unsigned long){}; for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp[i].read); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +85,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +290,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = *(__vector unsigned long *)(rxdp + 3); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +303,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = *(__vector unsigned long *)(rxdp + 2); rte_compiler_barrier(); descs[1] = *(__vector unsigned long *)(rxdp + 1); rte_compiler_barrier(); descs[0] = *(__vector unsigned long *)(rxdp); +__rte_diagnostic_pop /* B.2 copy 2 
mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +544,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual *(__vector unsigned long *)txdp = descriptor; +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..90a1d4661a 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,6 +275,8 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -291,6 +292,7 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256( 
_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +697,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -728,8 +733,11 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..b77989074f 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,11 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); +__rte_diagnostic_pop const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -263,6 +262,8 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ 
-286,6 +287,7 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +877,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void @@ -909,7 +914,10 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..cfc6d9ead3 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +38,10 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,11 +58,14 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +90,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,6 +427,8 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -433,6 +441,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 
+671,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..8a5537bcf2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,8 +37,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -72,8 +71,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +99,13 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); desc3_qw23 = _mm_loadu_si128((__m128i 
*)&(rxdp + 3)->wb.qword2); +__rte_diagnostic_pop /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i v_unpack_01, v_unpack_23; @@ -462,7 +467,10 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +482,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +692,10 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..f5503d9dae 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,6 +189,8 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void 
*)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -208,6 +206,7 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +508,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -742,6 +741,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -765,6 +766,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -959,6 +961,8 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -990,6 +994,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1669,10 @@ iavf_vtx1(volatile struct iavf_tx_desc 
*txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1719,8 +1727,11 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..04894debc8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -164,6 +160,8 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, #endif __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -187,6 +185,7 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +599,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -733,6 +732,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue 
*rxq, __m512i raw_desc0_3, raw_desc4_7; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -756,6 +757,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1111,6 +1113,8 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -1142,6 +1146,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1988,10 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_storeu_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } #define IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2045,10 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ @@ -2225,7 +2236,10 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual 
_mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); +__rte_diagnostic_pop } static __rte_always_inline void @@ -2300,7 +2314,10 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..690e0749e4 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,8 +418,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -458,8 +457,11 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c index 04be574683..5e723e1193 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_neon.c +++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c @@ -36,7 +36,10 @@ iavf_rxq_rearm(struct 
iavf_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxep[i] = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -53,11 +56,14 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH; @@ -269,6 +275,8 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); @@ -281,6 +289,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); +__rte_diagnostic_pop /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..373df0c935 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,8 +34,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = 
&rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -69,8 +68,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +580,10 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +595,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +791,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +872,10 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses 
backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +887,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -927,6 +941,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs_bh[3] = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); @@ -938,6 +954,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1366,10 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..b0d5232510 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif 
- #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,8 +29,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -77,8 +76,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } #else #ifdef __AVX512VL__ @@ -157,8 +159,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); +__rte_diagnostic_pop } } else #endif /* __AVX512VL__ */ @@ -213,8 +218,11 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); +__rte_diagnostic_pop } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..6c6a810a15 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER 
-#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,6 +250,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); @@ -269,6 +267,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,6 +443,8 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -475,6 +476,7 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +792,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -841,8 +846,11 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x (hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm256_store_si256((void *)(txdp + 2), desc2_3); _mm256_store_si256((void *)txdp, desc0_1); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..5a774550dc 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -243,6 +239,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; /* load in descriptors, in reverse order */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); rte_compiler_barrier(); @@ -266,6 +264,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, rte_compiler_barrier(); const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); +__rte_diagnostic_pop raw_desc6_7 = _mm256_inserti128_si256 @@ -474,6 +473,8 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh7 = _mm_load_si128 ((void *)(&rxdp[7].wb.status_error1)); @@ -505,6 +506,7 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +989,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, 
descriptor); +__rte_diagnostic_pop } static __rte_always_inline void @@ -1029,7 +1034,10 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm512_storeu_si512((void *)txdp, desc0_3); +__rte_diagnostic_pop } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..f2991cee1a 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,8 +48,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -91,8 +90,11 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } 
rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +427,10 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +442,14 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -489,6 +497,8 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual const __m128i raw_desc_bh3 = _mm_load_si128 ((void *)(&rxdp[3].wb.status_error1)); @@ -504,6 +514,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, const __m128i raw_desc_bh0 = _mm_load_si128 ((void *)(&rxdp[0].wb.status_error1)); +__rte_diagnostic_pop /** * to shift the 32b RSS hash value to the @@ -680,7 +691,10 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git 
a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..491d1aa546 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -36,8 +34,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp[i].read, zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -60,12 +61,15 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); vst1_u8((uint8_t 
*)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -367,10 +371,13 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); /* A. load 4 pkts descs */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[0] = vld1q_u64((uint64_t *)(rxdp)); descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); +__rte_diagnostic_pop /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); @@ -554,7 +561,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, pkt->buf_iova + pkt->data_off, (uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..8e4048a32f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,8 +37,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -76,8 +75,11 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ 
+__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +468,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); +__rte_diagnostic_pop rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +483,14 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); rte_compiler_barrier(); descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); rte_compiler_barrier(); descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); +__rte_diagnostic_pop #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +684,10 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual _mm_store_si128((__m128i *)&txdp->read, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..0a89d2c414 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,10 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop +__rte_diagnostic_pop return tunnel; } diff --git 
a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..b7e2aacff1 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * @@ -75,7 +73,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { volatile struct mlx5_mini_cqe8 *mcq = - (void *)&(cq + !rxq->cqe_comp_layout)->pkt_info; + (volatile void *)&(cq + !rxq->cqe_comp_layout)->pkt_info; /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -138,10 +136,13 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). 
*/ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { uint8_t *p = (void *)&mcq[pos % 8]; +__rte_diagnostic_pop uint8_t *e0 = (void *)&elts[pos]->rearm_data; uint8_t *e1 = (void *)&elts[pos + 1]->rearm_data; uint8_t *e2 = (void *)&elts[pos + 2]->rearm_data; @@ -157,7 +158,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((volatile void *)(cq + pos + i)); __asm__ volatile ( /* A.1 load mCQEs into a 128bit register. */ "ld1 {v16.16b - v17.16b}, [%[mcq]] \n\t" @@ -367,8 +368,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)&(cq + pos)->pkt_info; + rte_prefetch0((volatile void *)(cq + pos + 8)); + mcq = (volatile void *)&(cq + pos)->pkt_info; for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -383,7 +384,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -663,7 +664,10 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, mask = vcreate_u16(pkts_n - pos < MLX5_VPMD_DESCS_PER_LOOP ? 
-1UL >> ((pkts_n - pos) * sizeof(uint16_t) * 8) : 0); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual p0 = (void *)&cq[pos].pkt_info; +__rte_diagnostic_pop p1 = p0 + (pkts_n - pos > 1) * sizeof(struct mlx5_cqe); p2 = p1 + (pkts_n - pos > 2) * sizeof(struct mlx5_cqe); p3 = p2 + (pkts_n - pos > 3) * sizeof(struct mlx5_cqe); diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..f4d08d5b30 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -129,8 +126,11 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, * E. store flow tag (rte_flow mark). */ cycle: +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (rxq->cqe_comp_layout) rte_prefetch0((void *)(cq + mcqe_n)); +__rte_diagnostic_pop for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -138,6 +138,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, __m128i byte_cnt, invalid_mask; #endif +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) @@ -145,6 +147,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* A.1 load mCQEs into a 128bit register. 
*/ mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); +__rte_diagnostic_pop /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -354,9 +357,12 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* Move to next CQE and invalidate consumed CQEs. */ if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual if (pos + 8 < mcqe_n) rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); +__rte_diagnostic_pop + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +377,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,6 +657,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. */ p3 = _mm_extract_epi16(p, 3); +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqes[3] = _mm_loadl_epi64((__m128i *) &cq[pos + p3].sop_drop_qpn); rte_compiler_barrier(); @@ -683,6 +691,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); +__rte_diagnostic_pop cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,6 +709,8 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. 
*/ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); @@ -710,6 +721,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); +__rte_diagnostic_pop cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ diff --git a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c index 37075ea5e7..29e928472d 100644 --- a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c +++ b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c @@ -35,7 +35,10 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_NGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,12 +61,15 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_NGBE_RXQ_REARM_THRESH; @@ -484,7 +490,10 @@ vtx1(volatile struct ngbe_tx_desc *txdp, uint64x2_t 
descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c index d4d647fab5..a634cba97c 100644 --- a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c +++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c @@ -34,7 +34,10 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_TXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); +__rte_diagnostic_pop } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -57,12 +60,15 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); +__rte_diagnostic_pop } rxq->rxrearm_start += RTE_TXGBE_RXQ_REARM_THRESH; @@ -484,7 +490,10 @@ vtx1(volatile struct txgbe_tx_desc *txdp, uint64x2_t descriptor = 
{pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); +__rte_diagnostic_pop } static inline void diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
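For readers following the thread without lib/eal/include/rte_common.h at hand, here is a minimal sketch of how push/ignore/pop macros of this kind can be built on `_Pragma`. It is not the definition from patch 1/3: the `DEMO_` names, the empty fallback for other compilers, and the `demo_store128()` stand-in for a vector store intrinsic are all assumptions made for illustration.

```c
#include <stdint.h>

/* Hypothetical names; the real macros live in rte_common.h and may differ. */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_DIAG_PUSH _Pragma("GCC diagnostic push")
#define DEMO_DIAG_POP _Pragma("GCC diagnostic pop")
#define DEMO_DIAG_IGNORED_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
/* Assumption: on other compilers the macros expand to nothing. */
#define DEMO_DIAG_PUSH
#define DEMO_DIAG_POP
#define DEMO_DIAG_IGNORED_WCAST_QUAL
#endif

/* Stand-in for an intrinsic store that takes a non-volatile pointer. */
static void
demo_store128(uint64_t *dst, uint64_t lo, uint64_t hi)
{
	dst[0] = lo;
	dst[1] = hi;
}

/* Write one 16-byte descriptor through a volatile ring pointer. */
static void
demo_write_desc(volatile uint64_t *desc, uint64_t addr, uint64_t len)
{
DEMO_DIAG_PUSH
DEMO_DIAG_IGNORED_WCAST_QUAL
	/* The cast drops volatile; -Wcast-qual is silenced only here. */
	demo_store128((uint64_t *)desc, addr, len);
DEMO_DIAG_POP
}
```

Because the suppression is scoped by push and pop, a qualifier-dropping cast anywhere else in the file is still reported, which is the main difference from the file-wide pragmas the patch removes.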
* Re: [PATCH v11 0/3] add diagnostics macros to make code portable
  2025-01-03 15:36 ` [PATCH v11 0/3] " Andre Muezerie
                     ` (2 preceding siblings ...)
  2025-01-03 15:36   ` [PATCH v11 3/3] drivers/net: " Andre Muezerie
@ 2025-01-03 19:24   ` Stephen Hemminger
  2025-01-03 21:26     ` Andre Muezerie
  3 siblings, 1 reply; 87+ messages in thread
From: Stephen Hemminger @ 2025-01-03 19:24 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev

On Fri,  3 Jan 2025 07:36:48 -0800
Andre Muezerie <andremue@linux.microsoft.com> wrote:

> From: Andre Muezerie <andremue@linux.microsoft.com>
> To: andremue@linux.microsoft.com
> Cc: dev@dpdk.org, stephen@networkplumber.org
> Subject: [PATCH v11 0/3] add diagnostics macros to make code portable
> Date: Fri,  3 Jan 2025 07:36:48 -0800
> X-Mailer: git-send-email 1.8.3.1
>
> It was a common pattern to have "GCC diagnostic ignored" pragmas
> sprinkled over the code and only activate these pragmas for certain
> compilers (gcc and clang). Clang supports GCC's pragma for
> compatibility with existing source code, so #pragma GCC diagnostic
> and #pragma clang diagnostic are synonyms for Clang
> (https://clang.llvm.org/docs/UsersManual.html).
>
> Now that effort is being made to make the code compatible with MSVC
> these expressions would become more complex. It makes sense to hide
> this complexity behind macros. This makes maintenance easier as these
> macros are defined in a single place. As a plus the code becomes
> more readable as well.

Since 90% of these cases are about removing const from a pointer,
maybe it would be better to have a macro that did that?

Would not work for base driver code which is pretending to be platform independent.

^ permalink raw reply	[flat|nested] 87+ messages in thread
* Re: [PATCH v11 0/3] add diagnostics macros to make code portable
  2025-01-03 19:24 ` [PATCH v11 0/3] " Stephen Hemminger
@ 2025-01-03 21:26   ` Andre Muezerie
  2025-01-06 11:00     ` Bruce Richardson
  0 siblings, 1 reply; 87+ messages in thread
From: Andre Muezerie @ 2025-01-03 21:26 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On Fri, Jan 03, 2025 at 11:24:02AM -0800, Stephen Hemminger wrote:
> On Fri,  3 Jan 2025 07:36:48 -0800
> Andre Muezerie <andremue@linux.microsoft.com> wrote:
>
> > From: Andre Muezerie <andremue@linux.microsoft.com>
> > To: andremue@linux.microsoft.com
> > Cc: dev@dpdk.org, stephen@networkplumber.org
> > Subject: [PATCH v11 0/3] add diagnostics macros to make code portable
> > Date: Fri,  3 Jan 2025 07:36:48 -0800
> > X-Mailer: git-send-email 1.8.3.1
> >
> > It was a common pattern to have "GCC diagnostic ignored" pragmas
> > sprinkled over the code and only activate these pragmas for certain
> > compilers (gcc and clang). Clang supports GCC's pragma for
> > compatibility with existing source code, so #pragma GCC diagnostic
> > and #pragma clang diagnostic are synonyms for Clang
> > (https://clang.llvm.org/docs/UsersManual.html).
> >
> > Now that effort is being made to make the code compatible with MSVC
> > these expressions would become more complex. It makes sense to hide
> > this complexity behind macros. This makes maintenance easier as these
> > macros are defined in a single place. As a plus the code becomes
> > more readable as well.
>
> Since 90% of these cases are about removing const from a pointer,
> maybe it would be better to have a macro that did that?
>
> Would not work for base driver code which is pretending to be platform independent.

Most of the warnings I've seen were about dropping the volatile qualifier, like the one below:

../drivers/net/i40e/i40e_rxtx_vec_sse.c:42:32: warning: cast from 'volatile struct i40e_32byte_rx_desc::(unnamed at ../drivers/net/i40e/base/i40e_type.h:803:2) *' to '__attribute__((__vector_size__(2 * sizeof(long long)))) long long *' drops volatile qualifier [-Wcast-qual]
   42 |                 _mm_store_si128((__m128i *)&rxdp[i].read,
      |                                            ^

To make sure I understood your suggestion correctly, you're proposing to replace this

__rte_diagnostic_push
__rte_diagnostic_ignored_wcast_qual
_mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0);
__rte_diagnostic_pop

with something like this?

_mm_store_si128(RTE_IGNORE_CAST_QUAL((__m128i *)&rxdp[i].read), dma_addr0);

This could be done, and I think it does look better, despite the slight line length increase.

^ permalink raw reply [flat|nested] 87+ messages in thread
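For readers following along, the push/ignore/pop pattern described in the message above can be sketched as a small self-contained file. This is only an illustration: the DEMO_* macro names, the struct and the function are hypothetical stand-ins, and the compiler check is simplified to __GNUC__/__clang__ rather than the conditions used in the actual patch.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the __rte_diagnostic_* macros (assumption). */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_DIAG_PUSH _Pragma("GCC diagnostic push")
#define DEMO_DIAG_IGNORED_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#define DEMO_DIAG_POP _Pragma("GCC diagnostic pop")
#else
/* Other compilers: the macros expand to nothing. */
#define DEMO_DIAG_PUSH
#define DEMO_DIAG_IGNORED_WCAST_QUAL
#define DEMO_DIAG_POP
#endif

/* Hypothetical descriptor standing in for a hardware ring entry. */
struct demo_desc {
	uint64_t addr;
	uint64_t len;
};

/* Clear a descriptor that is only reachable through a volatile pointer. */
static void
demo_clear_desc(volatile struct demo_desc *d)
{
	struct demo_desc *p;

DEMO_DIAG_PUSH
DEMO_DIAG_IGNORED_WCAST_QUAL
	/* This cast drops volatile and would trigger -Wcast-qual. */
	p = (struct demo_desc *)d;
DEMO_DIAG_POP

	p->addr = 0;
	p->len = 0;
}
```

Keeping the push and pop as close as possible to the offending cast limits how much other code has the warning disabled.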
* Re: [PATCH v11 0/3] add diagnostics macros to make code portable 2025-01-03 21:26 ` Andre Muezerie @ 2025-01-06 11:00 ` Bruce Richardson 2025-01-08 2:46 ` Andre Muezerie 0 siblings, 1 reply; 87+ messages in thread From: Bruce Richardson @ 2025-01-06 11:00 UTC (permalink / raw) To: Andre Muezerie; +Cc: Stephen Hemminger, dev On Fri, Jan 03, 2025 at 01:26:34PM -0800, Andre Muezerie wrote: > On Fri, Jan 03, 2025 at 11:24:02AM -0800, Stephen Hemminger wrote: > > On Fri, 3 Jan 2025 07:36:48 -0800 > > Andre Muezerie <andremue@linux.microsoft.com> wrote: > > > > > From: Andre Muezerie <andremue@linux.microsoft.com> > > > To: andremue@linux.microsoft.com > > > Cc: dev@dpdk.org, stephen@networkplumber.org > > > Subject: [PATCH v11 0/3] add diagnostics macros to make code portable > > > Date: Fri, 3 Jan 2025 07:36:48 -0800 > > > X-Mailer: git-send-email 1.8.3.1 > > > > > > It was a common pattern to have "GCC diagnostic ignored" pragmas > > > sprinkled over the code and only activate these pragmas for certain > > > compilers (gcc and clang). Clang supports GCC's pragma for > > > compatibility with existing source code, so #pragma GCC diagnostic > > > and #pragma clang diagnostic are synonyms for Clang > > > (https://clang.llvm.org/docs/UsersManual.html). > > > > > > Now that effort is being made to make the code compatible with MSVC > > > these expressions would become more complex. It makes sense to hide > > > this complexity behind macros. This makes maintenance easier as these > > > macros are defined in a single place. As a plus the code becomes > > > more readable as well. > > > > Since 90% of these cases are about removing const from a pointer, > > maybe it would be better to have a macro that did that? > > > > Would not work for base driver code which is pretending to be platform independent. 
> > Most of the warnings I've seen were about dropping the volatile qualifier, like the one below: > > ../drivers/net/i40e/i40e_rxtx_vec_sse.c:42:32: warning: cast from 'volatile struct i40e_32byte_rx_desc::(unnamed at ../drivers/net/i40e/base/i40e_type.h:803:2) *' to '__attribute__((__vector_size__(2 * sizeof(long long)))) long long *' drops volatile qualifier [-Wcast-qual] > 42 | _mm_store_si128((__m128i *)&rxdp[i].read, > | ^ > > To make sure I understood your suggestion correctly, you're proposing to replace this > > __rte_diagnostic_push > __rte_diagnostic_ignored_wcast_qual > _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); > __rte_diagnostic_pop > > > with something like this? > > _mm_store_si128(RTE_IGNORE_CAST_QUAL((__m128i *)&rxdp[i].read), dma_addr0); > > This could be done, and I think it does look better, despite the slight line length increase. +1 for this option. One macro can be used to drop all qualifiers, both const and volatile, right? ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v11 0/3] add diagnostics macros to make code portable 2025-01-06 11:00 ` Bruce Richardson @ 2025-01-08 2:46 ` Andre Muezerie 2025-01-08 9:20 ` Bruce Richardson 0 siblings, 1 reply; 87+ messages in thread From: Andre Muezerie @ 2025-01-08 2:46 UTC (permalink / raw) To: Bruce Richardson; +Cc: Stephen Hemminger, dev On Mon, Jan 06, 2025 at 11:00:15AM +0000, Bruce Richardson wrote: > On Fri, Jan 03, 2025 at 01:26:34PM -0800, Andre Muezerie wrote: > > On Fri, Jan 03, 2025 at 11:24:02AM -0800, Stephen Hemminger wrote: > > > On Fri, 3 Jan 2025 07:36:48 -0800 > > > Andre Muezerie <andremue@linux.microsoft.com> wrote: > > > > > > > From: Andre Muezerie <andremue@linux.microsoft.com> > > > > To: andremue@linux.microsoft.com > > > > Cc: dev@dpdk.org, stephen@networkplumber.org > > > > Subject: [PATCH v11 0/3] add diagnostics macros to make code portable > > > > Date: Fri, 3 Jan 2025 07:36:48 -0800 > > > > X-Mailer: git-send-email 1.8.3.1 > > > > > > > > It was a common pattern to have "GCC diagnostic ignored" pragmas > > > > sprinkled over the code and only activate these pragmas for certain > > > > compilers (gcc and clang). Clang supports GCC's pragma for > > > > compatibility with existing source code, so #pragma GCC diagnostic > > > > and #pragma clang diagnostic are synonyms for Clang > > > > (https://clang.llvm.org/docs/UsersManual.html). > > > > > > > > Now that effort is being made to make the code compatible with MSVC > > > > these expressions would become more complex. It makes sense to hide > > > > this complexity behind macros. This makes maintenance easier as these > > > > macros are defined in a single place. As a plus the code becomes > > > > more readable as well. > > > > > > Since 90% of these cases are about removing const from a pointer, > > > maybe it would be better to have a macro that did that? > > > > > > Would not work for base driver code which is pretending to be platform independent. 
> >
> > Most of the warnings I've seen were about dropping the volatile qualifier, like the one below:
> >
> > ../drivers/net/i40e/i40e_rxtx_vec_sse.c:42:32: warning: cast from 'volatile struct i40e_32byte_rx_desc::(unnamed at ../drivers/net/i40e/base/i40e_type.h:803:2) *' to '__attribute__((__vector_size__(2 * sizeof(long long)))) long long *' drops volatile qualifier [-Wcast-qual]
> >    42 |                 _mm_store_si128((__m128i *)&rxdp[i].read,
> >       |                                            ^
> >
> > To make sure I understood your suggestion correctly, you're proposing to replace this
> >
> > __rte_diagnostic_push
> > __rte_diagnostic_ignored_wcast_qual
> > _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0);
> > __rte_diagnostic_pop
> >
> >
> > with something like this?
> >
> > _mm_store_si128(RTE_IGNORE_CAST_QUAL((__m128i *)&rxdp[i].read), dma_addr0);
> >
> > This could be done, and I think it does look better, despite the slight line length increase.
>
> +1 for this option. One macro can be used to drop all qualifiers, both
> const and volatile, right?

Yes, a single macro can drop all qualifiers. I did realize, though, that the macro must wrap the entire statement - it cannot be used around just one argument, unfortunately. It would look like:

RTE_IGNORE_CAST_QUAL(_mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0);)

This is still the same line length as before, but not as elegant.

For some code blocks where many consecutive lines (4+) would require this macro, I feel that it might still make sense to use __rte_diagnostic_ignored_wcast_qual, because using RTE_IGNORE_CAST_QUAL might result in additional lines anyway due to the 80-column width limit.

I'm planning to submit a new version of the patchset with this mixed approach unless there are objections.

^ permalink raw reply [flat|nested] 87+ messages in thread
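A statement-wrapping macro of the kind described above might look roughly like the sketch below. The DEMO_* names and helper functions are hypothetical, and the reason given for wrapping the whole statement (diagnostic pragmas being reliably accepted only at statement boundaries) is an assumption on my part; whether suppression works from inside a macro expansion is worth checking on each toolchain.

```c
#include <stdint.h>

/*
 * Hypothetical macro (assumption): wraps a complete statement in
 * push / ignore -Wcast-qual / pop. The trailing semicolon belongs to
 * the wrapped statement, so it is passed inside the parentheses.
 */
#if defined(__GNUC__) || defined(__clang__)
#define DEMO_IGNORE_CAST_QUAL(stmt) \
	_Pragma("GCC diagnostic push") \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"") \
	stmt \
	_Pragma("GCC diagnostic pop")
#else
#define DEMO_IGNORE_CAST_QUAL(stmt) stmt
#endif

/* Helper that takes an unqualified pointer, like the SSE store intrinsics. */
static void
demo_store(uint64_t *dst, uint64_t v)
{
	*dst = v;
}

/* Write through a volatile-qualified pointer and read the value back. */
static uint64_t
demo_write_slot(volatile uint64_t *slot, uint64_t v)
{
	DEMO_IGNORE_CAST_QUAL(demo_store((uint64_t *)slot, v);)
	return *slot;
}
```

The comma in the wrapped call is safe because it sits inside the call's own parentheses, so the preprocessor still sees a single macro argument.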
* Re: [PATCH v11 0/3] add diagnostics macros to make code portable 2025-01-08 2:46 ` Andre Muezerie @ 2025-01-08 9:20 ` Bruce Richardson 2025-01-14 19:20 ` Andre Muezerie 0 siblings, 1 reply; 87+ messages in thread From: Bruce Richardson @ 2025-01-08 9:20 UTC (permalink / raw) To: Andre Muezerie; +Cc: Stephen Hemminger, dev On Tue, Jan 07, 2025 at 06:46:48PM -0800, Andre Muezerie wrote: > On Mon, Jan 06, 2025 at 11:00:15AM +0000, Bruce Richardson wrote: > > On Fri, Jan 03, 2025 at 01:26:34PM -0800, Andre Muezerie wrote: > > > On Fri, Jan 03, 2025 at 11:24:02AM -0800, Stephen Hemminger wrote: > > > > On Fri, 3 Jan 2025 07:36:48 -0800 > > > > Andre Muezerie <andremue@linux.microsoft.com> wrote: > > > > > > > > > From: Andre Muezerie <andremue@linux.microsoft.com> > > > > > To: andremue@linux.microsoft.com > > > > > Cc: dev@dpdk.org, stephen@networkplumber.org > > > > > Subject: [PATCH v11 0/3] add diagnostics macros to make code portable > > > > > Date: Fri, 3 Jan 2025 07:36:48 -0800 > > > > > X-Mailer: git-send-email 1.8.3.1 > > > > > > > > > > It was a common pattern to have "GCC diagnostic ignored" pragmas > > > > > sprinkled over the code and only activate these pragmas for certain > > > > > compilers (gcc and clang). Clang supports GCC's pragma for > > > > > compatibility with existing source code, so #pragma GCC diagnostic > > > > > and #pragma clang diagnostic are synonyms for Clang > > > > > (https://clang.llvm.org/docs/UsersManual.html). > > > > > > > > > > Now that effort is being made to make the code compatible with MSVC > > > > > these expressions would become more complex. It makes sense to hide > > > > > this complexity behind macros. This makes maintenance easier as these > > > > > macros are defined in a single place. As a plus the code becomes > > > > > more readable as well. > > > > > > > > Since 90% of these cases are about removing const from a pointer, > > > > maybe it would be better to have a macro that did that? 
> > > >
> > > > Would not work for base driver code which is pretending to be platform independent.
> > >
> > > Most of the warnings I've seen were about dropping the volatile qualifier, like the one below:
> > >
> > > ../drivers/net/i40e/i40e_rxtx_vec_sse.c:42:32: warning: cast from 'volatile struct i40e_32byte_rx_desc::(unnamed at ../drivers/net/i40e/base/i40e_type.h:803:2) *' to '__attribute__((__vector_size__(2 * sizeof(long long)))) long long *' drops volatile qualifier [-Wcast-qual]
> > >    42 |                 _mm_store_si128((__m128i *)&rxdp[i].read,
> > >       |                                            ^
> > >
> > > To make sure I understood your suggestion correctly, you're proposing to replace this
> > >
> > > __rte_diagnostic_push
> > > __rte_diagnostic_ignored_wcast_qual
> > > _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0);
> > > __rte_diagnostic_pop
> > >
> > >
> > > with something like this?
> > >
> > > _mm_store_si128(RTE_IGNORE_CAST_QUAL((__m128i *)&rxdp[i].read), dma_addr0);
> > >
> > > This could be done, and I think it does look better, despite the slight line length increase.
> >
> > +1 for this option. One macro can be used to drop all qualifiers, both
> > const and volatile, right?
>
> Yes, a single macro can drop all qualifiers. I did realize though that the macro must involve the entire expression - it cannot be used just around one parameter, unfortunately.
>

For many use cases, those involving pointers, the qualifiers can be cast
away by passing through a uintptr_t. Just tested this with gcc and clang:

	volatile int x = 5;
	int *y = (int *)(uintptr_t)&x;
	printf("*y = %d\n", *y);

works without warnings or errors. Does this similarly work with MSVC? If
so, we can do a macro specifically for pointer types, which should cover
99% of what we need.

/Bruce

^ permalink raw reply [flat|nested] 87+ messages in thread
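The uintptr_t round trip suggested above can be tried in isolation with a sketch like the following. The function name is hypothetical. One deliberate difference from the snippet in the message: the underlying object here is not itself defined volatile, because the C standard leaves it undefined to access an object defined with a volatile-qualified type through a non-volatile lvalue; only the pointer carries the qualifiers.

```c
#include <stdint.h>

/*
 * Hypothetical helper (assumption): strips const and volatile from a
 * pointer by converting it to an integer and back. Because the value
 * passes through uintptr_t, the compiler sees no qualifier-dropping
 * pointer-to-pointer cast, so -Wcast-qual has nothing to report.
 */
static int
demo_read_unqualified(const volatile int *qp)
{
	int *p = (int *)(uintptr_t)qp;

	return *p;
}
```

The same double cast works for struct pointers such as the descriptor rings in the drivers; only the final pointer type changes.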
* Re: [PATCH v11 0/3] add diagnostics macros to make code portable 2025-01-08 9:20 ` Bruce Richardson @ 2025-01-14 19:20 ` Andre Muezerie 2025-01-15 11:11 ` Bruce Richardson 0 siblings, 1 reply; 87+ messages in thread From: Andre Muezerie @ 2025-01-14 19:20 UTC (permalink / raw) To: Bruce Richardson; +Cc: Stephen Hemminger, dev On Wed, Jan 08, 2025 at 09:20:27AM +0000, Bruce Richardson wrote: > On Tue, Jan 07, 2025 at 06:46:48PM -0800, Andre Muezerie wrote: > > On Mon, Jan 06, 2025 at 11:00:15AM +0000, Bruce Richardson wrote: > > > On Fri, Jan 03, 2025 at 01:26:34PM -0800, Andre Muezerie wrote: > > > > On Fri, Jan 03, 2025 at 11:24:02AM -0800, Stephen Hemminger wrote: > > > > > On Fri, 3 Jan 2025 07:36:48 -0800 > > > > > Andre Muezerie <andremue@linux.microsoft.com> wrote: > > > > > > > > > > > From: Andre Muezerie <andremue@linux.microsoft.com> > > > > > > To: andremue@linux.microsoft.com > > > > > > Cc: dev@dpdk.org, stephen@networkplumber.org > > > > > > Subject: [PATCH v11 0/3] add diagnostics macros to make code portable > > > > > > Date: Fri, 3 Jan 2025 07:36:48 -0800 > > > > > > X-Mailer: git-send-email 1.8.3.1 > > > > > > > > > > > > It was a common pattern to have "GCC diagnostic ignored" pragmas > > > > > > sprinkled over the code and only activate these pragmas for certain > > > > > > compilers (gcc and clang). Clang supports GCC's pragma for > > > > > > compatibility with existing source code, so #pragma GCC diagnostic > > > > > > and #pragma clang diagnostic are synonyms for Clang > > > > > > (https://clang.llvm.org/docs/UsersManual.html). > > > > > > > > > > > > Now that effort is being made to make the code compatible with MSVC > > > > > > these expressions would become more complex. It makes sense to hide > > > > > > this complexity behind macros. This makes maintenance easier as these > > > > > > macros are defined in a single place. As a plus the code becomes > > > > > > more readable as well. 
> > > > > > > > > > Since 90% of these cases are about removing const from a pointer, > > > > > maybe it would be better to have a macro that did that? > > > > > > > > > > Would not work for base driver code which is pretending to be platform independent. > > > > > > > > Most of the warnings I've seen were about dropping the volatile qualifier, like the one below: > > > > > > > > ../drivers/net/i40e/i40e_rxtx_vec_sse.c:42:32: warning: cast from 'volatile struct i40e_32byte_rx_desc::(unnamed at ../drivers/net/i40e/base/i40e_type.h:803:2) *' to '__attribute__((__vector_size__(2 * sizeof(long long)))) long long *' drops volatile qualifier [-Wcast-qual] > > > > 42 | _mm_store_si128((__m128i *)&rxdp[i].read, > > > > | ^ > > > > > > > > To make sure I understood your suggestion correctly, you're proposing to replace this > > > > > > > > __rte_diagnostic_push > > > > __rte_diagnostic_ignored_wcast_qual > > > > _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); > > > > __rte_diagnostic_pop > > > > > > > > > > > > with something like this? > > > > > > > > _mm_store_si128(RTE_IGNORE_CAST_QUAL((__m128i *)&rxdp[i].read), dma_addr0); > > > > > > > > This could be done, and I think it does look better, despite the slight line length increase. > > > > > > +1 for this option. One macro can be used to drop all qualifiers, both > > > const and volatile, right? > > > > Yes, a single macro can drop all qualifiers. I did realize though that the macro must involve the entire expression - it cannot be used just around one parameter, unfortunately. > > > For many use cases, those involving pointers, the qualifiers can be cast > away by passing through a uintptr_t. Just tested this with gcc and clang: > > volatile int x = 5; > int *y = (int *)(uintptr_t)&x; > printf("*y = %d\n", *y); > > works without warnings or errors. Does this similarly work with MSVC? If > so, we can do a macro specifically for pointers types, which should cover > 99% of what we need. 
>
> /Bruce

Yes, that also works with MSVC. So for the macro you mentioned, is this what you had in mind?

old code:
_mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0);

new code:
#define RTE_IGNORE_CAST_QUAL(X) \
	(uintptr_t)(X)

_mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), dma_addr0);

^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v11 0/3] add diagnostics macros to make code portable 2025-01-14 19:20 ` Andre Muezerie @ 2025-01-15 11:11 ` Bruce Richardson 0 siblings, 0 replies; 87+ messages in thread From: Bruce Richardson @ 2025-01-15 11:11 UTC (permalink / raw) To: Andre Muezerie; +Cc: Stephen Hemminger, dev On Tue, Jan 14, 2025 at 11:20:05AM -0800, Andre Muezerie wrote: > On Wed, Jan 08, 2025 at 09:20:27AM +0000, Bruce Richardson wrote: > > On Tue, Jan 07, 2025 at 06:46:48PM -0800, Andre Muezerie wrote: > > > On Mon, Jan 06, 2025 at 11:00:15AM +0000, Bruce Richardson wrote: > > > > On Fri, Jan 03, 2025 at 01:26:34PM -0800, Andre Muezerie wrote: > > > > > On Fri, Jan 03, 2025 at 11:24:02AM -0800, Stephen Hemminger wrote: > > > > > > On Fri, 3 Jan 2025 07:36:48 -0800 > > > > > > Andre Muezerie <andremue@linux.microsoft.com> wrote: > > > > > > > > > > > > > From: Andre Muezerie <andremue@linux.microsoft.com> > > > > > > > To: andremue@linux.microsoft.com > > > > > > > Cc: dev@dpdk.org, stephen@networkplumber.org > > > > > > > Subject: [PATCH v11 0/3] add diagnostics macros to make code portable > > > > > > > Date: Fri, 3 Jan 2025 07:36:48 -0800 > > > > > > > X-Mailer: git-send-email 1.8.3.1 > > > > > > > > > > > > > > It was a common pattern to have "GCC diagnostic ignored" pragmas > > > > > > > sprinkled over the code and only activate these pragmas for certain > > > > > > > compilers (gcc and clang). Clang supports GCC's pragma for > > > > > > > compatibility with existing source code, so #pragma GCC diagnostic > > > > > > > and #pragma clang diagnostic are synonyms for Clang > > > > > > > (https://clang.llvm.org/docs/UsersManual.html). > > > > > > > > > > > > > > Now that effort is being made to make the code compatible with MSVC > > > > > > > these expressions would become more complex. It makes sense to hide > > > > > > > this complexity behind macros. This makes maintenance easier as these > > > > > > > macros are defined in a single place. 
As a plus the code becomes > > > > > > > more readable as well. > > > > > > > > > > > > Since 90% of these cases are about removing const from a pointer, > > > > > > maybe it would be better to have a macro that did that? > > > > > > > > > > > > Would not work for base driver code which is pretending to be platform independent. > > > > > > > > > > Most of the warnings I've seen were about dropping the volatile qualifier, like the one below: > > > > > > > > > > ../drivers/net/i40e/i40e_rxtx_vec_sse.c:42:32: warning: cast from 'volatile struct i40e_32byte_rx_desc::(unnamed at ../drivers/net/i40e/base/i40e_type.h:803:2) *' to '__attribute__((__vector_size__(2 * sizeof(long long)))) long long *' drops volatile qualifier [-Wcast-qual] > > > > > 42 | _mm_store_si128((__m128i *)&rxdp[i].read, > > > > > | ^ > > > > > > > > > > To make sure I understood your suggestion correctly, you're proposing to replace this > > > > > > > > > > __rte_diagnostic_push > > > > > __rte_diagnostic_ignored_wcast_qual > > > > > _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0); > > > > > __rte_diagnostic_pop > > > > > > > > > > > > > > > with something like this? > > > > > > > > > > _mm_store_si128(RTE_IGNORE_CAST_QUAL((__m128i *)&rxdp[i].read), dma_addr0); > > > > > > > > > > This could be done, and I think it does look better, despite the slight line length increase. > > > > > > > > +1 for this option. One macro can be used to drop all qualifiers, both > > > > const and volatile, right? > > > > > > Yes, a single macro can drop all qualifiers. I did realize though that the macro must involve the entire expression - it cannot be used just around one parameter, unfortunately. > > > > > For many use cases, those involving pointers, the qualifiers can be cast > > away by passing through a uintptr_t. Just tested this with gcc and clang: > > > > volatile int x = 5; > > int *y = (int *)(uintptr_t)&x; > > printf("*y = %d\n", *y); > > > > works without warnings or errors. 
Does this similarly work with MSVC? If
> > so, we can do a macro specifically for pointers types, which should cover
> > 99% of what we need.
> >
> > /Bruce
>
> Yes, that also works with MSVC. So for the macro you mentioned, is this what you had in mind?
>
> old code:
> _mm_store_si128((__m128i *)&rxdp[i].read, dma_addr0);
>
>
> new code:
> #define RTE_IGNORE_CAST_QUAL(X) \
>     (uintptr_t)(X)
>
> _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), dma_addr0);

Something like that. However, I'd actually include a (void *) in the macro,
which should avoid the need for the cast in the store function call:

#define RTE_IGNORE_CAST_QUAL(X) (void *)(uintptr_t)(X)

Since void pointers are implicitly converted to any other pointer type, we
save typecasting in lots of other places.

If we want to avoid the risk of someone trying to use this on non-pointer
values, we may also be able to do this as an inline function to give a little
type-safety (untested, sadly :-(, just sharing as a possible idea):

static inline void *
rte_ignore_ptr_qualifiers(const volatile void *x)
{
	return (void *)(uintptr_t)x;
}

/Bruce

^ permalink raw reply [flat|nested] 87+ messages in thread
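As a rough check of the inline-function idea above, here is a minimal sketch with a caller. All demo_* names are hypothetical and this is not the code that ended up in DPDK; it only shows how the void * return removes the need for a cast at the call site in C.

```c
#include <stdint.h>

/*
 * Hypothetical inline helper (assumption), modelled on the idea above:
 * accepts any qualified object pointer, returns it with no qualifiers.
 */
static inline void *
demo_ignore_ptr_qualifiers(const volatile void *x)
{
	return (void *)(uintptr_t)x;
}

/* Hypothetical descriptor standing in for a hardware ring entry. */
struct demo_desc {
	uint64_t addr;
};

/* Write an address into the first ring entry and read it back. */
static uint64_t
demo_rearm(volatile struct demo_desc *ring, uint64_t addr)
{
	/* In C, void * converts implicitly, so no explicit cast is needed. */
	struct demo_desc *d = demo_ignore_ptr_qualifiers(&ring[0]);

	d->addr = addr;
	return ring[0].addr;
}
```

Compared with a macro, passing a plain integer to the function draws a diagnostic from the compiler, which is the small amount of type safety mentioned above.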
* [PATCH v12 0/3] add diagnostics macros to make code portable 2024-12-27 1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie ` (12 preceding siblings ...) 2025-01-03 15:36 ` [PATCH v11 0/3] " Andre Muezerie @ 2025-01-15 4:27 ` Andre Muezerie 2025-01-15 4:27 ` [PATCH v12 1/3] lib/eal: " Andre Muezerie ` (2 more replies) 2025-01-16 1:55 ` [PATCH v13 0/3] " Andre Muezerie ` (3 subsequent siblings) 17 siblings, 3 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-15 4:27 UTC (permalink / raw) To: andremue; +Cc: dev, stephen, bruce.richardson It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. v12: * Added macro RTE_IGNORE_CAST_QUAL and used it as a more compact and readable form to suppress warnings where a cast is used to remove a type qualifier. v11: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v10: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v9: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v8: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v7: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v6: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. 
v5: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v4: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v3: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v2: * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1). * Removed the pragmas from many files where they were not needed. * In the files where the pragmas were indeed needed, reduced the scope during which they are active, reducing the chance that unforeseen issues are hidden due to warning suppression. Andre Muezerie (3): lib/eal: add diagnostics macros to make code portable drivers/common: add diagnostics macros to make code portable drivers/net: add diagnostics macros to make code portable drivers/common/idpf/idpf_common_rxtx_avx512.c | 77 ++++++++--------- drivers/net/axgbe/axgbe_rxtx.h | 9 -- drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 - drivers/net/dpaa2/dpaa2_rxtx.c | 16 +--- drivers/net/fm10k/fm10k_rxtx_vec.c | 22 +++-- drivers/net/hns3/hns3_rxtx_vec_neon.h | 7 +- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 - drivers/net/i40e/i40e_rxtx_common_avx.h | 24 +++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 22 ++--- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 40 +++++---- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 30 ++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 - drivers/net/i40e/i40e_rxtx_vec_neon.c | 44 +++++----- drivers/net/i40e/i40e_rxtx_vec_sse.c | 34 ++++---- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 84 ++++++++++--------- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 78 ++++++++--------- drivers/net/iavf/iavf_rxtx_vec_common.h | 12 ++- drivers/net/iavf/iavf_rxtx_vec_neon.c | 27 +++--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 56 +++++++------ drivers/net/ice/ice_rxtx_common_avx.h | 24 +++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 82 +++++++++--------- drivers/net/ice/ice_rxtx_vec_avx512.c | 72 ++++++++-------- drivers/net/ice/ice_rxtx_vec_common.h | 4 - 
drivers/net/ice/ice_rxtx_vec_sse.c | 42 ++++------ drivers/net/idpf/idpf_rxtx_vec_common.h | 4 - .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 - drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 18 ++-- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 22 +++-- drivers/net/mlx5/mlx5_flow.c | 5 +- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 -- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 18 ++-- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 76 ++++++++++------- drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 11 ++- drivers/net/tap/tap_flow.c | 6 +- drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 8 +- drivers/net/virtio/virtio_rxtx_simple.c | 4 - lib/eal/include/rte_common.h | 26 ++++++ 37 files changed, 510 insertions(+), 511 deletions(-) -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v12 1/3] lib/eal: add diagnostics macros to make code portable
  2025-01-15  4:27 ` [PATCH v12 " Andre Muezerie
@ 2025-01-15  4:27   ` Andre Muezerie
  2025-01-15  9:05     ` Morten Brørup
  2025-01-15  4:27   ` [PATCH v12 2/3] drivers/common: " Andre Muezerie
  2025-01-15  4:27   ` [PATCH v12 3/3] drivers/net: " Andre Muezerie
  2 siblings, 1 reply; 87+ messages in thread
From: Andre Muezerie @ 2025-01-15  4:27 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen, bruce.richardson

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 4d299f2b36..2142dd968d 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -137,6 +137,32 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macro to ignore whenever a pointer is cast so as to remove a type
+ * qualifier from the target type.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_ignored_wcast_qual \
+		_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop  _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
+#define RTE_IGNORE_CAST_QUAL(X) \
+		((uintptr_t)(X))
+
 /**
  * Mark a function or variable to a weak reference.
  */
-- 
2.47.0.vfs.0.3

^ permalink raw reply [flat|nested] 87+ messages in thread
* RE: [PATCH v12 1/3] lib/eal: add diagnostics macros to make code portable
  2025-01-15  4:27   ` [PATCH v12 1/3] lib/eal: " Andre Muezerie
@ 2025-01-15  9:05     ` Morten Brørup
  0 siblings, 0 replies; 87+ messages in thread
From: Morten Brørup @ 2025-01-15  9:05 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev, stephen, bruce.richardson

> +/*
> + * Macro to ignore whenever a pointer is cast so as to remove a type
> + * qualifier from the target type.
> + */

This description could be better, something like the push/pop description: "Macro to disable compiler warnings about ..."

> +#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC

Prefer defined(NAME) over defined NAME.
Same for the description of push/pop below.
(I see both in rte_common.h, so perhaps it's just my personal preference.)

> +#define __rte_diagnostic_ignored_wcast_qual \
> +		_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
> +#else
> +#define __rte_diagnostic_ignored_wcast_qual
> +#endif
> +
> +/*
> + * Macros to cause the compiler to remember the state of the diagnostics as of
> + * each push, and restore to that point at each pop.
> + */
> +#if !defined __INTEL_COMPILER && !defined RTE_TOOLCHAIN_MSVC
> +#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
> +#define __rte_diagnostic_pop  _Pragma("GCC diagnostic pop")
> +#else
> +#define __rte_diagnostic_push
> +#define __rte_diagnostic_pop
> +#endif

> +#define RTE_IGNORE_CAST_QUAL(X) \
> +		((uintptr_t)(X))

A description of this macro is missing.

Rather than assigning a name that refers to the compiler's warning, could you come up with a name that describes what the macro does to X, i.e. discards qualifiers?
And if the macro is exclusively for pointers, perhaps it should have PTR somewhere in its name.

And do we really need this macro? Can't RTE_CAST_FIELD() be used instead?

Or can we make the macro more like RTE_CAST_FIELD()? Perhaps RTE_CAST(var, type)?

Or maybe, inspired by RTE_PTR_ADD():
#define RTE_PTR(ptr) ((void*)((uintptr_t)(ptr)))
#define RTE_CONST_PTR(ptr) ((const void*)((uintptr_t)(ptr)))

^ permalink raw reply [flat|nested] 87+ messages in thread
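To make the last suggestion above concrete, here is a small sketch of a pair of macros in that spirit, one returning void * and one returning const void *. The DEMO_* names and the two helper functions are hypothetical, and the macro parameter is named consistently so the sketch compiles on its own.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical macros (assumption), modelled on the idea above. */
#define DEMO_PTR(ptr) ((void *)((uintptr_t)(ptr)))
#define DEMO_CONST_PTR(ptr) ((const void *)((uintptr_t)(ptr)))

/* Read-only use: volatile is dropped, const is kept on the result. */
static size_t
demo_count_nonzero(const volatile uint8_t *buf, size_t n)
{
	const uint8_t *p = DEMO_CONST_PTR(buf);
	size_t i, count = 0;

	for (i = 0; i < n; i++)
		count += (p[i] != 0);
	return count;
}

/* Writable use: volatile is dropped so plain stores can be used. */
static void
demo_zero(volatile uint8_t *buf, size_t n)
{
	uint8_t *p = DEMO_PTR(buf);
	size_t i;

	for (i = 0; i < n; i++)
		p[i] = 0;
}
```

Having a separate const variant lets read-only call sites keep const on the result, so the macro does not silently hand out write access.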
* [PATCH v12 2/3] drivers/common: add diagnostics macros to make code portable 2025-01-15 4:27 ` [PATCH v12 " Andre Muezerie 2025-01-15 4:27 ` [PATCH v12 1/3] lib/eal: " Andre Muezerie @ 2025-01-15 4:27 ` Andre Muezerie 2025-01-15 11:13 ` Bruce Richardson 2025-01-15 4:27 ` [PATCH v12 3/3] drivers/net: " Andre Muezerie 2 siblings, 1 reply; 87+ messages in thread From: Andre Muezerie @ 2025-01-15 4:27 UTC (permalink / raw) To: andremue; +Cc: dev, stephen, bruce.richardson It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 77 ++++++++++--------- 1 file changed, 39 insertions(+), 38 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..fefc0a05ca 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,7 +30,7 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), dma_addr0); } } @@ -108,8 +104,10 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512((__m512i *)RTE_IGNORE_CAST_QUAL(&rxdp->read), + dma_addr0_3); + _mm512_store_si512((__m512i *)RTE_IGNORE_CAST_QUAL(&(rxdp + 4)->read), + dma_addr4_7); } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +162,8 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_storeu_si128((__m128i *)&rxdp[i].read, - dma_addr0); + _mm_storeu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp[i].read), dma_addr0); } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +214,10 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); - 
_mm512_storeu_si512((void *)rxdp, desc0_1); - _mm512_storeu_si512((void *)(rxdp + 2), desc2_3); - _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); - _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); + _mm512_storeu_si512((void *)RTE_IGNORE_CAST_QUAL(rxdp), desc0_1); + _mm512_storeu_si512((void *)RTE_IGNORE_CAST_QUAL((rxdp + 2)), desc2_3); + _mm512_storeu_si512((void *)RTE_IGNORE_CAST_QUAL((rxdp + 4)), desc4_5); + _mm512_storeu_si512((void *)RTE_IGNORE_CAST_QUAL((rxdp + 6)), desc6_7); rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -337,28 +335,28 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,7 +558,7 @@ idpf_splitq_rearm_common(struct idpf_rx_queue 
*rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i], + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i]), dma_addr0); } } @@ -634,7 +632,7 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; - _mm_storeu_si128((__m128i *)&rxdp[i], + _mm_storeu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i]), dma_addr0); } } @@ -798,28 +796,28 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1129,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i 
descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1176,7 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512((void *)RTE_IGNORE_CAST_QUAL(txdp), desc0_3); } /* do any last ones */ @@ -1435,7 +1433,7 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static __rte_always_inline void @@ -1480,7 +1478,7 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512((void *)RTE_IGNORE_CAST_QUAL(txdp), desc0_3); } /* do any last ones */ @@ -1521,11 +1519,13 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); - idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); + idpf_splitq_vtx((void *)RTE_IGNORE_CAST_QUAL(txdp), tx_pkts, n - 1, + cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); - idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); + idpf_splitq_vtx1((void *)RTE_IGNORE_CAST_QUAL(txdp), *tx_pkts++, + cmd_dtype); nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1540,8 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); - idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); + idpf_splitq_vtx((void *)RTE_IGNORE_CAST_QUAL(txdp), tx_pkts, nb_commit, + cmd_dtype); tx_id = 
(uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.0.vfs.0.3
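The call-site pattern this patch applies throughout the file can be sketched in isolation. This is only an illustration under stated assumptions: `DEMO_IGNORE_CAST_QUAL`, `struct demo_desc`, `demo_store` and `demo_vtx1` are hypothetical names, and the macro body (a round trip through `uintptr_t`) is a guess at the general technique, not the definition from patch 1/3, which may differ.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for RTE_IGNORE_CAST_QUAL (assumption: the real
 * macro in rte_common.h may be defined differently). Converting the
 * pointer to an integer first means the later cast is integer-to-pointer,
 * not pointer-to-pointer, so -Wcast-qual has nothing to report.
 */
#define DEMO_IGNORE_CAST_QUAL(x) ((uintptr_t)(x))

/* Hypothetical 16-byte descriptor standing in for a NIC ring entry. */
struct demo_desc {
	uint64_t addr;
	uint64_t flags;
};

/* Stand-in for a store intrinsic that takes an unqualified pointer. */
static inline void
demo_store(struct demo_desc *dst, struct demo_desc val)
{
	*dst = val;
}

/* The descriptor pointer is volatile-qualified, as in the vtx1() helpers. */
static inline void
demo_vtx1(volatile struct demo_desc *txdp, uint64_t addr, uint64_t flags)
{
	struct demo_desc val = { addr, flags };

	/*
	 * Previously: demo_store((struct demo_desc *)txdp, val) together
	 * with a file-wide "#pragma GCC diagnostic ignored" directive.
	 */
	demo_store((struct demo_desc *)DEMO_IGNORE_CAST_QUAL(txdp), val);
}
```

The point of the per-cast form is scope: the qualifier is dropped only where the macro appears, whereas the removed pragma silenced the warning for the rest of the translation unit.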
* Re: [PATCH v12 2/3] drivers/common: add diagnostics macros to make code portable
From: Bruce Richardson @ 2025-01-15 11:13 UTC
To: Andre Muezerie; +Cc: dev, stephen

On Tue, Jan 14, 2025 at 08:27:17PM -0800, Andre Muezerie wrote:
> It was a common pattern to have "GCC diagnostic ignored" pragmas
> sprinkled over the code and only activate these pragmas for certain
> compilers (gcc and clang). Clang supports GCC's pragma for
> compatibility with existing source code, so #pragma GCC diagnostic
> and #pragma clang diagnostic are synonyms for Clang
> (https://clang.llvm.org/docs/UsersManual.html).
>
> Now that effort is being made to make the code compatible with MSVC
> these expressions would become more complex. It makes sense to hide
> this complexity behind macros. This makes maintenance easier as these
> macros are defined in a single place. As a plus the code becomes
> more readable as well.
>
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
>  drivers/common/idpf/idpf_common_rxtx_avx512.c | 77 ++++++++++---------
>  1 file changed, 39 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c
> index b8450b03ae..fefc0a05ca 100644
> --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c
> +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c
> @@ -6,10 +6,6 @@
>  #include "idpf_common_device.h"
>  #include "idpf_common_rxtx.h"
>
> -#ifndef __INTEL_COMPILER
> -#pragma GCC diagnostic ignored "-Wcast-qual"
> -#endif
> -
>  #define IDPF_DESCS_PER_LOOP_AVX 8
>  #define PKTLEN_SHIFT 10
>
> @@ -34,7 +30,7 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq)
>  			dma_addr0 = _mm_setzero_si128();
>  			for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) {
>  				rxp[i] = &rxq->fake_mbuf;
> -				_mm_store_si128((__m128i *)&rxdp[i].read,
> +				_mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read),
>  						dma_addr0);
>  			}
>  		}

<SNIP for brevity>

Adding in the (void *) to the IGNORE macro will hugely reduce the amount of
casting being done in this file!

/Bruce
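The review suggestion above, folding the `(void *)` cast into the macro itself, might look roughly like the sketch below. Everything here is an assumption made for illustration: `DEMO_PTR_UNQUAL`, `demo_store16` and `demo_write_desc` are hypothetical names, and the macro body is one plausible way to implement the idea, not the definition that was eventually merged.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical variant of the macro with (void *) folded in, as the
 * review comment proposes. The uintptr_t round trip drops the
 * qualifiers; the outer cast hands back a plain void *.
 */
#define DEMO_PTR_UNQUAL(x) ((void *)(uintptr_t)(x))

/* Stand-in for a 16-byte store intrinsic taking an unqualified pointer. */
static inline void
demo_store16(uint8_t *dst, const uint8_t src[16])
{
	memcpy(dst, src, 16);
}

static inline void
demo_write_desc(volatile uint8_t *desc, const uint8_t src[16])
{
	/*
	 * In C, void * converts implicitly to any object pointer type,
	 * so the call site needs no explicit (uint8_t *) cast.
	 */
	demo_store16(DEMO_PTR_UNQUAL(desc), src);
}
```

The implicit conversion from `void *` is a C rule only; code compiled as C++ would still need an explicit cast to the target pointer type at the call site.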
* [PATCH v12 3/3] drivers/net: add diagnostics macros to make code portable
From: Andre Muezerie @ 2025-01-15 4:27 UTC
To: andremue; +Cc: dev, stephen, bruce.richardson

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 drivers/net/axgbe/axgbe_rxtx.h               |  9 --
 drivers/net/cpfl/cpfl_rxtx_vec_common.h      |  4 -
 drivers/net/dpaa2/dpaa2_rxtx.c               | 16 +---
 drivers/net/fm10k/fm10k_rxtx_vec.c           | 22 +++--
 drivers/net/hns3/hns3_rxtx_vec_neon.h        |  7 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h      | 24 +++---
 drivers/net/i40e/i40e_rxtx_vec_altivec.c     | 22 ++---
 drivers/net/i40e/i40e_rxtx_vec_avx2.c        | 40 +++++----
 drivers/net/i40e/i40e_rxtx_vec_avx512.c      | 30 ++++---
 drivers/net/i40e/i40e_rxtx_vec_common.h      |  4 -
 drivers/net/i40e/i40e_rxtx_vec_neon.c        | 44 +++++-----
 drivers/net/i40e/i40e_rxtx_vec_sse.c         | 34 ++++----
 drivers/net/iavf/iavf_rxtx_vec_avx2.c        | 84 ++++++++++---------
 drivers/net/iavf/iavf_rxtx_vec_avx512.c      | 78 ++++++++---------
 drivers/net/iavf/iavf_rxtx_vec_common.h      | 12 ++-
 drivers/net/iavf/iavf_rxtx_vec_neon.c        | 27 +++---
 drivers/net/iavf/iavf_rxtx_vec_sse.c         | 56 +++++++------
 drivers/net/ice/ice_rxtx_common_avx.h        | 24 +++---
 drivers/net/ice/ice_rxtx_vec_avx2.c          | 82 +++++++++---------
 drivers/net/ice/ice_rxtx_vec_avx512.c        | 72 ++++++++--------
 drivers/net/ice/ice_rxtx_vec_common.h        |  4 -
 drivers/net/ice/ice_rxtx_vec_sse.c           | 42 ++++------
 drivers/net/idpf/idpf_rxtx_vec_common.h      |  4 -
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c   |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c      | 18 ++--
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c       | 22 +++--
 drivers/net/mlx5/mlx5_flow.c                 |  5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h     |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h        | 18 ++--
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h         | 76 ++++++++++-------
 drivers/net/ngbe/ngbe_rxtx_vec_neon.c        | 11 ++-
 drivers/net/tap/tap_flow.c                   |  6 +-
 drivers/net/txgbe/txgbe_rxtx_vec_neon.c      |  8 +-
 drivers/net/virtio/virtio_rxtx_simple.c      |  4 -
 35 files changed, 445 insertions(+), 473 deletions(-)

diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h
index a326ba9ac8..f5f74a0a39 100644
---
a/drivers/net/axgbe/axgbe_rxtx.h +++ b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..fd73c655dd 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,8 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } - fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); + fd[num_rx] = (struct qbman_fd *)RTE_IGNORE_CAST_QUAL + (qbman_result_DQ_fd(dq_storage)); dq_storage++; num_rx++; @@ -2118,8 +2111,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if 
defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..eb5317198b 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,7 +266,7 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].q, + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].q), dma_addr0); } @@ -316,8 +312,10 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp++->q), + dma_addr0); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp++->q), + dma_addr1); /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +463,7 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs0[3] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +475,11 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs0[2] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); 
rte_compiler_barrier(); - descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs0[1] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); - descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs0[0] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +734,7 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..3adfbc74e7 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,9 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; - vst1q_u64((uint64_t *)&desc->addr, val1); - vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&desc->addr), val1); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&desc->tx.outer_vlan_tag), + val2); } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git 
a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..5d0b9ebc11 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,7 +32,7 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), dma_addr0); } } @@ -72,8 +68,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp++->read), + dma_addr0); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp++->read), + dma_addr1); } #else #ifdef __AVX512VL__ @@ -144,8 +142,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512((__m512i *)RTE_IGNORE_CAST_QUAL + (&rxdp->read), dma_addr0_3); + _mm512_store_si512((__m512i *)RTE_IGNORE_CAST_QUAL + (&(rxdp + 4)->read), dma_addr4_7); } } else #endif /* __AVX512VL__*/ @@ -190,8 +190,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ - 
_mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); - _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); + _mm256_store_si256((__m256i *)RTE_IGNORE_CAST_QUAL + (&rxdp->read), dma_addr0_1); + _mm256_store_si256((__m256i *)RTE_IGNORE_CAST_QUAL + (&(rxdp + 2)->read), dma_addr2_3); } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..4e089d8ae2 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -43,8 +41,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = (__vector unsigned long){}; for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vec_st(dma_addr0, 0, - (__vector unsigned long *)&rxdp[i].read); + vec_st(dma_addr0, 0, (__vector unsigned long *) + RTE_IGNORE_CAST_QUAL(&rxdp[i].read)); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +82,10 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); - vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); + vec_st(dma_addr0, 0, (__vector unsigned long *) + RTE_IGNORE_CAST_QUAL(&rxdp++->read)); + vec_st(dma_addr1, 0, (__vector unsigned long *) + RTE_IGNORE_CAST_QUAL(&rxdp++->read)); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +286,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = *(__vector unsigned long *)(rxdp + 3); + descs[3] = *(__vector unsigned long *)RTE_IGNORE_CAST_QUAL(rxdp + 3); rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ 
-296,11 +296,11 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ - descs[2] = *(__vector unsigned long *)(rxdp + 2); + descs[2] = *(__vector unsigned long *)RTE_IGNORE_CAST_QUAL(rxdp + 2); rte_compiler_barrier(); - descs[1] = *(__vector unsigned long *)(rxdp + 1); + descs[1] = *(__vector unsigned long *)RTE_IGNORE_CAST_QUAL(rxdp + 1); rte_compiler_barrier(); - descs[0] = *(__vector unsigned long *)(rxdp); + descs[0] = *(__vector unsigned long *)RTE_IGNORE_CAST_QUAL(rxdp); /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +534,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; - *(__vector unsigned long *)txdp = descriptor; + *(__vector unsigned long *)RTE_IGNORE_CAST_QUAL(txdp) = descriptor; } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..72cfd06934 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,10 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ - __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); - __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); + __m128i *rxdp_desc_0 = (void *)RTE_IGNORE_CAST_QUAL + ((&rxdp[desc_idx + 0].wb.qword2)); + __m128i *rxdp_desc_1 = (void *)RTE_IGNORE_CAST_QUAL + ((&rxdp[desc_idx + 1].wb.qword2)); const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ 
-276,21 +274,29 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +701,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static inline void @@ -728,8 +734,8 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = 
_mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256((void *)RTE_IGNORE_CAST_QUAL(txdp + 2), desc2_3); + _mm256_store_si256((void *)RTE_IGNORE_CAST_QUAL(txdp), desc0_1); } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..b8602f4b5c 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,10 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ - __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); - __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); + __m128i *rxdp_desc_0 = (void *)RTE_IGNORE_CAST_QUAL + (&rxdp[desc_idx + 0].wb.qword2); + __m128i *rxdp_desc_1 = (void *)RTE_IGNORE_CAST_QUAL + (&rxdp[desc_idx + 1].wb.qword2); const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -264,28 +262,28 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* load in descriptors, in reverse order */ const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - 
_mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 0)); raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +873,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static inline void @@ -909,7 +907,7 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512((void *)RTE_IGNORE_CAST_QUAL(txdp), desc0_3); } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..0a3be06f41 
100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +38,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, zero); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (&rxdp[i].read), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,11 +56,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&rxdp++->read), dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +85,14 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; - desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); - desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); - desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); - desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); + desc0_qw23 = vld1q_u64 + ((uint64_t *)RTE_IGNORE_CAST_QUAL(&(rxdp + 0)->wb.qword2)); + desc1_qw23 = vld1q_u64 + ((uint64_t *)RTE_IGNORE_CAST_QUAL(&(rxdp + 1)->wb.qword2)); + desc2_qw23 = vld1q_u64 + ((uint64_t *)RTE_IGNORE_CAST_QUAL(&(rxdp + 2)->wb.qword2)); + desc3_qw23 = vld1q_u64 + ((uint64_t *)RTE_IGNORE_CAST_QUAL(&(rxdp + 3)->wb.qword2)); /* FDIR ID data: move last u32 of each desc to 
4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,18 +423,22 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[0] = vld1q_u64((uint64_t *)(rxdp)); + descs[3] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); + descs[2] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); + descs[1] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); + descs[0] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp)); /* Use acquire fence to order loads of descriptor qwords */ rte_atomic_thread_fence(rte_memory_order_acquire); /* A.2 reload qword0 to make it ordered after qword1 load */ - descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0); - descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); - descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); - descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); + descs[3] = vld1q_lane_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (rxdp + 3), descs[3], 0); + descs[2] = vld1q_lane_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (rxdp + 2), descs[2], 0); + descs[1] = vld1q_lane_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (rxdp + 1), descs[1], 0); + descs[0] = vld1q_lane_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (rxdp), descs[0], 0); /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 +668,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; - vst1q_u64((uint64_t *)txdp, descriptor); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..6a2f3f443c 100644 --- 
a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +37,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), dma_addr0); } } @@ -72,8 +68,10 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr0); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +95,14 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; - desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); - desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); - desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); - desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2); + desc0_qw23 = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&(rxdp + 0)->wb.qword2)); + desc1_qw23 = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&(rxdp + 1)->wb.qword2)); + desc2_qw23 = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&(rxdp + 2)->wb.qword2)); + desc3_qw23 = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&(rxdp + 3)->wb.qword2)); /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i v_unpack_01, v_unpack_23; @@ 
-462,7 +464,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +476,11 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +683,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..bdd2a6e78d 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,21 +189,29 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = _mm_load_si128((void 
*)RTE_IGNORE_CAST_QUAL + (rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +513,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -743,28 +747,28 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 7)); 
rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 0)); raw_desc6_7 = _mm256_inserti128_si256 @@ -960,36 +964,36 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[7].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[6].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[5].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[4].wb.status_error1)); rte_compiler_barrier(); 
const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[3].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[2].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[1].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1668,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static __rte_always_inline void @@ -1719,8 +1723,8 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256((void *)RTE_IGNORE_CAST_QUAL(txdp + 2), desc2_3); + _mm256_store_si256((void *)RTE_IGNORE_CAST_QUAL(txdp), desc0_1); } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..064da0a5a3 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -165,28 +161,28 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, 
raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +596,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -734,28 +730,28 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128((void 
*)RTE_IGNORE_CAST_QUAL(rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1112,36 +1108,36 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[7].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[6].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[5].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[4].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh3 = - 
_mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[3].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[2].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[1].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1979,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } #define IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2033,7 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512((void *)RTE_IGNORE_CAST_QUAL(txdp), desc0_3); } /* do any last ones */ @@ -2225,7 +2221,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); - _mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); + _mm256_storeu_si256((__m256i *)RTE_IGNORE_CAST_QUAL(txdp), ctx_data_desc); } static __rte_always_inline void @@ -2300,7 +2296,7 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512((void *)RTE_IGNORE_CAST_QUAL(txdp), desc0_3); } if (nb_pkts) 
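Note on the casts in the hunks above: every changed line drops the volatile qualifier from a descriptor pointer before passing it to a vector load or store intrinsic, which is what -Wcast-qual reports. The definition of RTE_IGNORE_CAST_QUAL is in patch 1/3 (lib/eal/include/rte_common.h) and is not shown in this message. The sketch below is only an illustration of the general idea, assuming a macro that round-trips the pointer through uintptr_t; DEMO_IGNORE_CAST_QUAL, struct demo_desc, demo_write_desc and demo_read_qw are hypothetical names and are not part of DPDK.

```c
#include <assert.h>
#include <stdint.h>

/*
 * HYPOTHETICAL stand-in for RTE_IGNORE_CAST_QUAL. The real macro is
 * defined in patch 1/3 and may be implemented differently. Going through
 * uintptr_t discards const/volatile without a qualifier-dropping
 * pointer-to-pointer cast, so -Wcast-qual has nothing to report.
 */
#define DEMO_IGNORE_CAST_QUAL(ptr) ((void *)(uintptr_t)(ptr))

/* Hypothetical 16-byte descriptor, loosely modelled on a ring entry. */
struct demo_desc {
	uint64_t qw[2];
};

/* Store both qwords through an unqualified pointer, like the vst1q_u64 and
 * _mm_store_si128 call sites in the diff.
 */
static void
demo_write_desc(volatile struct demo_desc *d, uint64_t lo, uint64_t hi)
{
	uint64_t *p = (uint64_t *)DEMO_IGNORE_CAST_QUAL(d);

	p[0] = lo;
	p[1] = hi;
}

/* Load one qword through an unqualified pointer to a member, like the
 * &rxdp[n].wb.status_error1 call sites in the diff.
 */
static uint64_t
demo_read_qw(const volatile struct demo_desc *d, unsigned int idx)
{
	const uint64_t *p;

	assert(idx < 2);
	p = (const uint64_t *)DEMO_IGNORE_CAST_QUAL(&d->qw[idx]);
	return *p;
}
```

In this sketch the descriptor object itself should not be declared volatile; only the pointer used to reach it is qualified, as with the ring pointers in the drivers. Keeping the qualifier removal inside one macro means a per-compiler variant can be chosen in a single header instead of a pragma at the top of every file.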
diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..fe5ae4f704 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,7 +418,7 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), dma_addr0); } } @@ -458,8 +454,10 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr0); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr1); } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c index 04be574683..12ba5c9d22 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_neon.c +++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c @@ -36,7 +36,8 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxep[i] = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, zero); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (&rxdp[i].read), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -53,11 +54,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa 
dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&rxdp++->read), dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&rxdp++->read), dma_addr1); } rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH; @@ -269,18 +270,22 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[0] = vld1q_u64((uint64_t *)(rxdp)); + descs[3] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); + descs[2] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); + descs[1] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); + descs[0] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp)); /* Use acquire fence to order loads of descriptor qwords */ rte_atomic_thread_fence(rte_memory_order_acquire); /* A.2 reload qword0 to make it ordered after qword1 load */ - descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0); - descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); - descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); - descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); + descs[3] = vld1q_lane_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (rxdp + 3), descs[3], 0); + descs[2] = vld1q_lane_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (rxdp + 2), descs[2], 0); + descs[1] = vld1q_lane_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (rxdp + 1), descs[1], 0); + descs[0] = vld1q_lane_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (rxdp), descs[0], 0); /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..8767c71663 100644 --- 
a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,7 +34,7 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), dma_addr0); } } @@ -69,8 +65,10 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr0); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr1); } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +576,8 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +589,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128((__m128i 
*)RTE_IGNORE_CAST_QUAL + (rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +785,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +866,8 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +879,14 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -927,17 +933,17 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP || rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ - descs_bh[3] = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + descs_bh[3] = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[3].wb.status_error1)); rte_compiler_barrier(); - descs_bh[2] = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + descs_bh[2] = _mm_load_si128((void 
*)RTE_IGNORE_CAST_QUAL + (&rxdp[2].wb.status_error1)); rte_compiler_barrier(); - descs_bh[1] = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + descs_bh[1] = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[1].wb.status_error1)); rte_compiler_barrier(); - descs_bh[0] = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + descs_bh[0] = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[0].wb.status_error1)); } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1355,7 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..4cc1744150 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,7 +29,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), dma_addr0); } } @@ -77,8 +73,10 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr0); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + 
(&rxdp++->read), dma_addr1); } #else #ifdef __AVX512VL__ @@ -157,8 +155,10 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512((__m512i *)RTE_IGNORE_CAST_QUAL + (&rxdp->read), dma_addr0_3); + _mm512_store_si512((__m512i *)RTE_IGNORE_CAST_QUAL + (&(rxdp + 4)->read), dma_addr4_7); } } else #endif /* __AVX512VL__ */ @@ -213,8 +213,10 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ - _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); - _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); + _mm256_store_si256((__m256i *)RTE_IGNORE_CAST_QUAL + (&rxdp->read), dma_addr0_1); + _mm256_store_si256((__m256i *)RTE_IGNORE_CAST_QUAL + (&(rxdp + 2)->read), dma_addr2_3); } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..7876a2150d 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,21 +250,29 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = 
_mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL + (rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,37 +448,37 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ - const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + const __m128i raw_desc_bh7 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[7].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + const __m128i raw_desc_bh6 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[6].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + const __m128i raw_desc_bh5 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[5].wb.status_error1)); 
rte_compiler_barrier(); - const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + const __m128i raw_desc_bh4 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[4].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + const __m128i raw_desc_bh3 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[3].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + const __m128i raw_desc_bh2 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[2].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + const __m128i raw_desc_bh1 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[1].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + const __m128i raw_desc_bh0 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +794,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static __rte_always_inline void @@ -841,8 +845,8 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x (hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256((void *)RTE_IGNORE_CAST_QUAL(txdp + 2), desc2_3); + _mm256_store_si256((void *)RTE_IGNORE_CAST_QUAL(txdp), desc0_1); } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index 
add095ef06..63ce947e56 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -244,28 +240,28 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, /* load in descriptors, in reverse order */ const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128((void *)RTE_IGNORE_CAST_QUAL(rxdp + 0)); raw_desc6_7 = _mm256_inserti128_si256 @@ -474,37 +470,37 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ - const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + const __m128i raw_desc_bh7 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + 
(&rxdp[7].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + const __m128i raw_desc_bh6 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[6].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + const __m128i raw_desc_bh5 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[5].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + const __m128i raw_desc_bh4 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[4].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + const __m128i raw_desc_bh3 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[3].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + const __m128i raw_desc_bh2 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[2].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + const __m128i raw_desc_bh1 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[1].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + const __m128i raw_desc_bh0 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL + (&rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +983,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static __rte_always_inline void @@ -1029,7 +1025,7 @@ 
ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512((void *)RTE_IGNORE_CAST_QUAL(txdp), desc0_3); } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..6a04d726c0 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,7 +48,7 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), dma_addr0); } } @@ -91,8 +87,10 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr0); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr1); } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +423,7 @@ _ice_recv_raw_pkts_vec(struct 
ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +435,11 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -489,21 +487,17 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ - const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + const __m128i raw_desc_bh3 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + const __m128i raw_desc_bh2 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(&rxdp[2].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + const __m128i raw_desc_bh1 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(&rxdp[1].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + const 
__m128i raw_desc_bh0 = _mm_load_si128 + ((void *)RTE_IGNORE_CAST_QUAL(&rxdp[0].wb.status_error1)); /** * to shift the 32b RSS hash value to the @@ -680,7 +674,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..230e54ee01 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -36,7 +34,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - 
vst1q_u64((uint64_t *)&rxdp[i].read, + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), zero); } } @@ -60,12 +58,12 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&rxdp++->read), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -367,10 +365,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); /* A. load 4 pkts descs */ - descs[0] = vld1q_u64((uint64_t *)(rxdp)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); + descs[0] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp)); + descs[1] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); + descs[2] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); + descs[3] = vld1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); @@ -554,7 +552,7 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, pkt->buf_iova + pkt->data_off, (uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len}; - vst1q_u64((uint64_t *)&txdp->read, descriptor); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&txdp->read), descriptor); } static inline void diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..2b48dc723a 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> 
-#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,7 +37,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&rxdp[i].read), dma_addr0); } } @@ -76,8 +72,10 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr0); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +464,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +476,11 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL(rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +674,7 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i 
descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)&txdp->read, descriptor); + _mm_store_si128((__m128i *)RTE_IGNORE_CAST_QUAL(&txdp->read), descriptor); } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..e3b9d5b570 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,7 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" - tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop + tunnel = (typeof(tunnel))RTE_IGNORE_CAST_QUAL(flow->tunnel); return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..e7fe273ea4 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * @@ -75,7 +73,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { volatile struct mlx5_mini_cqe8 *mcq = - (void *)&(cq + !rxq->cqe_comp_layout)->pkt_info; + (volatile void *)&(cq + !rxq->cqe_comp_layout)->pkt_info; /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? 
&rxq->title_pkt : elts[0]; unsigned int pos; @@ -139,9 +137,9 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, */ cycle: if (rxq->cqe_comp_layout) - rte_prefetch0((void *)(cq + mcqe_n)); + rte_prefetch0((void *)RTE_IGNORE_CAST_QUAL(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { - uint8_t *p = (void *)&mcq[pos % 8]; + uint8_t *p = (void *)RTE_IGNORE_CAST_QUAL(&mcq[pos % 8]); uint8_t *e0 = (void *)&elts[pos]->rearm_data; uint8_t *e1 = (void *)&elts[pos + 1]->rearm_data; uint8_t *e2 = (void *)&elts[pos + 2]->rearm_data; @@ -157,7 +155,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((volatile void *)(cq + pos + i)); __asm__ volatile ( /* A.1 load mCQEs into a 128bit register. */ "ld1 {v16.16b - v17.16b}, [%[mcq]] \n\t" @@ -367,8 +365,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)&(cq + pos)->pkt_info; + rte_prefetch0((volatile void *)(cq + pos + 8)); + mcq = (volatile void *)&(cq + pos)->pkt_info; for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -383,7 +381,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -663,7 +661,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, mask = vcreate_u16(pkts_n - pos < MLX5_VPMD_DESCS_PER_LOOP ? 
-1UL >> ((pkts_n - pos) * sizeof(uint16_t) * 8) : 0); - p0 = (void *)&cq[pos].pkt_info; + p0 = (void *)RTE_IGNORE_CAST_QUAL(&cq[pos].pkt_info); p1 = p0 + (pkts_n - pos > 1) * sizeof(struct mlx5_cqe); p2 = p1 + (pkts_n - pos > 2) * sizeof(struct mlx5_cqe); p3 = p2 + (pkts_n - pos > 3) * sizeof(struct mlx5_cqe); diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..27572a3454 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -130,7 +127,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, */ cycle: if (rxq->cqe_comp_layout) - rte_prefetch0((void *)(cq + mcqe_n)); + rte_prefetch0((void *)RTE_IGNORE_CAST_QUAL(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -141,10 +138,12 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((void *)RTE_IGNORE_CAST_QUAL(cq + pos + i)); /* A.1 load mCQEs into a 128bit register. 
*/ - mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); - mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); + mcqe1 = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&mcq[pos % 8])); + mcqe2 = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&mcq[pos % 8 + 2])); /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -355,8 +354,9 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); + rte_prefetch0((void *)RTE_IGNORE_CAST_QUAL + (cq + pos + 8)); + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +371,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,38 +651,44 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. */ p3 = _mm_extract_epi16(p, 3); - cqes[3] = _mm_loadl_epi64((__m128i *) - &cq[pos + p3].sop_drop_qpn); + cqes[3] = _mm_loadl_epi64((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p3].sop_drop_qpn)); rte_compiler_barrier(); p2 = _mm_extract_epi16(p, 2); - cqes[2] = _mm_loadl_epi64((__m128i *) - &cq[pos + p2].sop_drop_qpn); + cqes[2] = _mm_loadl_epi64((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p2].sop_drop_qpn)); rte_compiler_barrier(); /* B.1 load mbuf pointers. */ mbp1 = _mm_loadu_si128((__m128i *)&elts[pos]); mbp2 = _mm_loadu_si128((__m128i *)&elts[pos + 2]); /* A.1 load a block having op_own. 
*/ p1 = _mm_extract_epi16(p, 1); - cqes[1] = _mm_loadl_epi64((__m128i *) - &cq[pos + p1].sop_drop_qpn); + cqes[1] = _mm_loadl_epi64((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p1].sop_drop_qpn)); rte_compiler_barrier(); - cqes[0] = _mm_loadl_epi64((__m128i *) - &cq[pos].sop_drop_qpn); + cqes[0] = _mm_loadl_epi64((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos].sop_drop_qpn)); /* B.2 copy mbuf pointers. */ _mm_storeu_si128((__m128i *)&pkts[pos], mbp1); _mm_storeu_si128((__m128i *)&pkts[pos + 2], mbp2); rte_io_rmb(); /* C.1 load remained CQE data and extract necessary fields. */ - cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p3]); - cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos + p2]); + cqe_tmp2 = _mm_load_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p3])); + cqe_tmp1 = _mm_load_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p2])); cqes[3] = _mm_blendv_epi8(cqes[3], cqe_tmp2, blend_mask); cqes[2] = _mm_blendv_epi8(cqes[2], cqe_tmp1, blend_mask); - cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p3].csum); - cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos + p2].csum); + cqe_tmp2 = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p3].csum)); + cqe_tmp1 = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p2].csum)); cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x30); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); - cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); - cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); + cqe_tmp2 = _mm_loadl_epi64((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p3].rsvd4[2])); + cqe_tmp1 = _mm_loadl_epi64((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p2].rsvd4[2])); cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,16 +706,22 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. 
*/ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. */ - cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); - cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); + cqe_tmp2 = _mm_load_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p1])); + cqe_tmp1 = _mm_load_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos])); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); cqes[0] = _mm_blendv_epi8(cqes[0], cqe_tmp1, blend_mask); - cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p1].csum); - cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos].csum); + cqe_tmp2 = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p1].csum)); + cqe_tmp1 = _mm_loadu_si128((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos].csum)); cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x30); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); - cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); - cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); + cqe_tmp2 = _mm_loadl_epi64((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos + p1].rsvd4[2])); + cqe_tmp1 = _mm_loadl_epi64((__m128i *)RTE_IGNORE_CAST_QUAL + (&cq[pos].rsvd4[2])); cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. 
*/ diff --git a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c index 37075ea5e7..29e5050aa0 100644 --- a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c +++ b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c @@ -35,7 +35,8 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_NGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (&rxdp[i]), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,12 +59,14 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (rxdp++), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL + (rxdp++), dma_addr1); } rxq->rxrearm_start += RTE_NGBE_RXQ_REARM_THRESH; @@ -484,7 +487,7 @@ vtx1(volatile struct ngbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; - vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static inline void diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff 
--git a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c index d4d647fab5..75e90e3a54 100644 --- a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c +++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c @@ -34,7 +34,7 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_TXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(&rxdp[i]), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -57,12 +57,12 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp++), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(rxdp++), dma_addr1); } rxq->rxrearm_start += RTE_TXGBE_RXQ_REARM_THRESH; @@ -484,7 +484,7 @@ vtx1(volatile struct txgbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; - vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); + vst1q_u64((uint64_t *)RTE_IGNORE_CAST_QUAL(txdp), descriptor); } static inline void diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.0.vfs.0.3 ^ permalink raw reply [flat|nested] 87+ messages in thread
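
For readers following the tap_flow.c hunk above, here is a minimal standalone sketch of how a push / ignore / pop triple limits warning suppression to one region. The demo_* names are hypothetical and are not part of DPDK. The compiler check on __GNUC__ is also an assumption made so the sketch builds outside a DPDK tree; the series itself tests __INTEL_COMPILER and RTE_TOOLCHAIN_MSVC.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-ins for the __rte_diagnostic_* macros of this series.
 * Assumption: GCC and Clang both define __GNUC__ and both accept the
 * "GCC diagnostic" pragmas; every other compiler gets empty macros.
 */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#define demo_diagnostic_push _Pragma("GCC diagnostic push")
#define demo_diagnostic_pop _Pragma("GCC diagnostic pop")
#define demo_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define demo_diagnostic_push
#define demo_diagnostic_pop
#define demo_diagnostic_ignored_wcast_qual
#endif

/* Suppress -Wcast-qual for this one function only. */
demo_diagnostic_push
demo_diagnostic_ignored_wcast_qual
static char *
demo_find_char(const char *s, char c)
{
	assert(s != NULL);
	for (; *s != '\0'; s++)
		if (*s == c)
			return (char *)s; /* the cast drops const on purpose */
	return NULL;
}
demo_diagnostic_pop
/* From here on -Wcast-qual is back to whatever the command line set. */
```

Keeping the ignored region this small means a later, unintended qualifier-dropping cast elsewhere in the file is still reported.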
* [PATCH v13 0/3] add diagnostics macros to make code portable
  2024-12-27  1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
                   ` (13 preceding siblings ...)
  2025-01-15  4:27 ` [PATCH v12 " Andre Muezerie
@ 2025-01-16  1:55 ` Andre Muezerie
  2025-01-16  1:55   ` [PATCH v13 1/3] eal: " Andre Muezerie
                     ` (3 more replies)
  2025-01-18  2:46 ` [PATCH v14 " Andre Muezerie
                   ` (2 subsequent siblings)
  17 siblings, 4 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-16  1:55 UTC
  To: andremue; +Cc: dev, stephen, bruce.richardson

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

v13:
 * Renamed RTE_IGNORE_CAST_QUAL into RTE_PTR_DROP_QUALIFIERS.
 * Added (void *) cast to RTE_PTR_DROP_QUALIFIERS to avoid the need
   for casting the result in most places where the macro is used.

v12:
 * Added macro RTE_IGNORE_CAST_QUAL and used it as a more compact
   and readable form to suppress warnings where a cast is used to
   remove a type qualifier.

v11:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v10:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v9:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v8:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v7:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v6:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v5:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v4:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the
   scope during which they are active, reducing the chance that
   unforeseen issues are hidden due to warning suppression.

Andre Muezerie (3):
  eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 77 ++++++++---------
 drivers/net/axgbe/axgbe_rxtx.h                |  9 --
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 -
 drivers/net/dpaa2/dpaa2_rxtx.c                | 15 +---
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 20 ++---
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  6 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 24 +++---
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 23 ++---
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 40 +++++----
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 30 ++++---
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 -
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 39 ++++-----
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 32 +++----
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 84 ++++++++++---------
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 78 ++++++++---------
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 12 ++-
 drivers/net/iavf/iavf_rxtx_vec_neon.c         | 26 +++---
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 52 ++++++------
 drivers/net/ice/ice_rxtx_common_avx.h         | 24 +++---
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 74 ++++++++--------
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 64 ++++++--------
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 -
 drivers/net/ice/ice_rxtx_vec_sse.c            | 28 +++----
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 -
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       | 18 ++--
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 20 ++---
 drivers/net/mlx5/mlx5_flow.c                  |  5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         | 18 ++--
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 71 +++++++++-------
 drivers/net/ngbe/ngbe_rxtx_vec_neon.c         |  8 +-
 drivers/net/tap/tap_flow.c                    |  6 +-
 drivers/net/txgbe/txgbe_rxtx_vec_neon.c       |  8 +-
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 -
 lib/eal/include/rte_common.h                  | 29 +++++++
 37 files changed, 466 insertions(+), 503 deletions(-)

--
2.47.2.vfs.0.0

* [PATCH v13 1/3] eal: add diagnostics macros to make code portable
  2025-01-16  1:55 ` [PATCH v13 0/3] " Andre Muezerie
@ 2025-01-16  1:55   ` Andre Muezerie
  2025-01-16  1:55   ` [PATCH v13 2/3] drivers/common: " Andre Muezerie
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-16  1:55 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen, bruce.richardson

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 40592f71b1..8ddab43ec0 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -156,6 +156,35 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/*
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
+/*
+ * Macro to disable compiler warnings about removing a type
+ * qualifier from the target type.
+ */
+#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/*
+ * Macro to disable compiler warnings about removing a type
+ * qualifier from a pointer.
+ */
+#define RTE_PTR_DROP_QUALIFIERS(X) ((void *)(uintptr_t)(X))
+
 /**
  * Mark a function or variable to a weak reference.
  */
--
2.47.2.vfs.0.0

^ permalink raw reply	[flat|nested] 87+ messages in thread
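For readers following the thread, the push/ignored/pop macros added above could be used to scope a -Wcast-qual suppression to a single function, roughly as sketched below. This is only an illustration, not code from the series: the DEMO_* macro names and demo_strlen_unqualified() are hypothetical stand-ins, and the `__GNUC__` test is an assumption replacing the RTE_TOOLCHAIN_MSVC check, which is only defined inside a DPDK build.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Local stand-ins for the macros added to rte_common.h (hypothetical
 * DEMO_* names). The pragmas are emitted only for GCC-compatible
 * compilers; elsewhere the macros expand to nothing.
 */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#define DEMO_DIAGNOSTIC_PUSH _Pragma("GCC diagnostic push")
#define DEMO_DIAGNOSTIC_POP _Pragma("GCC diagnostic pop")
#define DEMO_DIAGNOSTIC_IGNORED_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define DEMO_DIAGNOSTIC_PUSH
#define DEMO_DIAGNOSTIC_POP
#define DEMO_DIAGNOSTIC_IGNORED_WCAST_QUAL
#endif

/* Save the diagnostic state, then silence -Wcast-qual for this function only. */
DEMO_DIAGNOSTIC_PUSH
DEMO_DIAGNOSTIC_IGNORED_WCAST_QUAL
size_t
demo_strlen_unqualified(const char *s)
{
	/* Dropping const on purpose; this cast is what -Wcast-qual reports. */
	char *p = (char *)s;
	size_t n = 0;

	while (p[n] != '\0')
		n++;
	return n;
}
/* Restore the saved state so later code is checked as before. */
DEMO_DIAGNOSTIC_POP
```

Compared with a file-wide "#pragma GCC diagnostic ignored", the push/pop pair keeps the warning active for everything outside the bracketed region.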
* [PATCH v13 2/3] drivers/common: add diagnostics macros to make code portable 2025-01-16 1:55 ` [PATCH v13 0/3] " Andre Muezerie 2025-01-16 1:55 ` [PATCH v13 1/3] eal: " Andre Muezerie @ 2025-01-16 1:55 ` Andre Muezerie 2025-01-16 1:55 ` [PATCH v13 3/3] drivers/net: " Andre Muezerie 2025-01-16 8:58 ` [PATCH v13 0/3] " Bruce Richardson 3 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-16 1:55 UTC (permalink / raw) To: andremue; +Cc: dev, stephen, bruce.richardson It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 77 ++++++++++--------- 1 file changed, 39 insertions(+), 38 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..4fd06a98a3 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,7 +30,7 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read), dma_addr0); } } @@ -108,8 +104,10 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512(RTE_PTR_DROP_QUALIFIERS(&rxdp->read), + dma_addr0_3); + _mm512_store_si512(RTE_PTR_DROP_QUALIFIERS(&(rxdp + 4)->read), + dma_addr4_7); } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +162,8 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_storeu_si128((__m128i *)&rxdp[i].read, - dma_addr0); + _mm_storeu_si128(RTE_PTR_DROP_QUALIFIERS + (&rxdp[i].read), dma_addr0); } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +214,10 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); - _mm512_storeu_si512((void *)rxdp, desc0_1); - 
_mm512_storeu_si512((void *)(rxdp + 2), desc2_3); - _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); - _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); + _mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(rxdp), desc0_1); + _mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS((rxdp + 2)), desc2_3); + _mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS((rxdp + 4)), desc4_5); + _mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS((rxdp + 6)), desc6_7); rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -337,28 +335,28 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,7 +558,7 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = 
&rx_bufq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i], + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i]), dma_addr0); } } @@ -634,7 +632,7 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; - _mm_storeu_si128((__m128i *)&rxdp[i], + _mm_storeu_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i]), dma_addr0); } } @@ -798,28 +796,28 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1129,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_storeu_si128((__m128i *)txdp, descriptor); + 
_mm_storeu_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor); } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1176,7 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_3); } /* do any last ones */ @@ -1435,7 +1433,7 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor); } static __rte_always_inline void @@ -1480,7 +1478,7 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_3); } /* do any last ones */ @@ -1521,11 +1519,13 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); - idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); + idpf_splitq_vtx(RTE_PTR_DROP_QUALIFIERS(txdp), tx_pkts, n - 1, + cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); - idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); + idpf_splitq_vtx1(RTE_PTR_DROP_QUALIFIERS(txdp), *tx_pkts++, + cmd_dtype); nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1540,8 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); - idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); + idpf_splitq_vtx(RTE_PTR_DROP_QUALIFIERS(txdp), tx_pkts, nb_commit, + cmd_dtype); tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.2.vfs.0.0 ^ permalink raw reply [flat|nested] 87+ messages in thread
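The substitutions in this patch all follow one pattern: a volatile descriptor pointer is handed to a store or load routine that takes an unqualified pointer. A minimal sketch of that pattern follows, assuming a made-up descriptor layout. struct demo_desc, demo_store16() and demo_write_desc() are hypothetical; demo_store16() uses memcpy in place of the SIMD intrinsics so the sketch builds on any architecture. Only the RTE_PTR_DROP_QUALIFIERS definition is copied from patch 1/3.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Same definition as in the rte_common.h hunk of patch 1/3. */
#define RTE_PTR_DROP_QUALIFIERS(X) ((void *)(uintptr_t)(X))

/* Hypothetical 16-byte descriptor standing in for a real ring entry. */
struct demo_desc {
	uint64_t addr;
	uint64_t flags;
};

/*
 * Hypothetical helper that, like the store intrinsics in the patch,
 * takes an unqualified destination pointer.
 */
static void
demo_store16(void *dst, const uint64_t src[2])
{
	memcpy(dst, src, 16);
}

/* Write one descriptor through a volatile pointer, in the style of vtx1(). */
void
demo_write_desc(volatile struct demo_desc *txdp, uint64_t addr, uint64_t flags)
{
	const uint64_t qw[2] = { addr, flags };

	/*
	 * A direct (void *)txdp cast is the kind of cast -Wcast-qual
	 * reports; going through uintptr_t converts to an integer first,
	 * so no pointer-to-pointer cast drops the qualifier.
	 */
	demo_store16(RTE_PTR_DROP_QUALIFIERS(txdp), qw);
}
```

As with the explicit casts it replaces, the macro only changes the pointer's type; accesses made through the resulting pointer are no longer volatile accesses.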
* [PATCH v13 3/3] drivers/net: add diagnostics macros to make code portable 2025-01-16 1:55 ` [PATCH v13 0/3] " Andre Muezerie 2025-01-16 1:55 ` [PATCH v13 1/3] eal: " Andre Muezerie 2025-01-16 1:55 ` [PATCH v13 2/3] drivers/common: " Andre Muezerie @ 2025-01-16 1:55 ` Andre Muezerie 2025-01-16 8:57 ` Bruce Richardson 2025-01-16 9:08 ` Morten Brørup 2025-01-16 8:58 ` [PATCH v13 0/3] " Bruce Richardson 3 siblings, 2 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-16 1:55 UTC (permalink / raw) To: andremue; +Cc: dev, stephen, bruce.richardson It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 -- drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 - drivers/net/dpaa2/dpaa2_rxtx.c | 15 +--- drivers/net/fm10k/fm10k_rxtx_vec.c | 20 ++--- drivers/net/hns3/hns3_rxtx_vec_neon.h | 6 +- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 - drivers/net/i40e/i40e_rxtx_common_avx.h | 24 +++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 23 ++--- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 40 +++++---- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 30 ++++--- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 - drivers/net/i40e/i40e_rxtx_vec_neon.c | 39 ++++----- drivers/net/i40e/i40e_rxtx_vec_sse.c | 32 +++---- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 84 ++++++++++--------- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 78 ++++++++--------- drivers/net/iavf/iavf_rxtx_vec_common.h | 12 ++- drivers/net/iavf/iavf_rxtx_vec_neon.c | 26 +++--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 52 ++++++------ drivers/net/ice/ice_rxtx_common_avx.h | 24 +++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 74 ++++++++-------- drivers/net/ice/ice_rxtx_vec_avx512.c | 64 ++++++-------- drivers/net/ice/ice_rxtx_vec_common.h | 4 - drivers/net/ice/ice_rxtx_vec_sse.c | 28 +++---- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 - .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 - drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 18 ++-- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 20 ++--- drivers/net/mlx5/mlx5_flow.c | 5 +- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 -- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 18 ++-- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 71 +++++++++------- drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 8 +- drivers/net/tap/tap_flow.c | 6 +- drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 8 +- drivers/net/virtio/virtio_rxtx_simple.c | 4 - 35 files changed, 398 insertions(+), 465 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ 
b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..f8b07a5acd 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,7 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } - fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); + fd[num_rx] = RTE_PTR_DROP_QUALIFIERS(qbman_result_DQ_fd(dq_storage)); dq_storage++; num_rx++; @@ -2118,8 +2110,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif 
defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..6fc9097ebc 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,7 +266,7 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].q, + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].q), dma_addr0); } @@ -316,8 +312,8 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->q), dma_addr0); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->q), dma_addr1); /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +461,7 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs0[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +473,11 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs0[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2)); rte_compiler_barrier(); - descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs0[1] = 
_mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1)); rte_compiler_barrier(); - descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs0[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +732,7 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor); } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..a0acb2a3d6 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,8 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; - vst1q_u64((uint64_t *)&desc->addr, val1); - vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&desc->addr), val1); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&desc->tx.outer_vlan_tag), val2); } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 85958d6c81..f8a4a96eee 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h 
+++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,7 +32,7 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read), dma_addr0); } } @@ -72,8 +68,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), + dma_addr0); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), + dma_addr1); } #else #ifdef __AVX512VL__ @@ -144,8 +142,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512(RTE_PTR_DROP_QUALIFIERS + (&rxdp->read), dma_addr0_3); + _mm512_store_si512(RTE_PTR_DROP_QUALIFIERS + (&(rxdp + 4)->read), dma_addr4_7); } } else #endif /* __AVX512VL__*/ @@ -190,8 +190,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ - _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); - _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); + _mm256_store_si256(RTE_PTR_DROP_QUALIFIERS + (&rxdp->read), dma_addr0_1); + 
_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS + (&(rxdp + 2)->read), dma_addr2_3); } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..3c67f959b8 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -43,8 +41,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = (__vector unsigned long){}; for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vec_st(dma_addr0, 0, - (__vector unsigned long *)&rxdp[i].read); + vec_st(dma_addr0, 0, RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read)); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +81,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); - vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); + vec_st(dma_addr0, 0, RTE_PTR_DROP_QUALIFIERS(&rxdp++->read)); + vec_st(dma_addr1, 0, RTE_PTR_DROP_QUALIFIERS(&rxdp++->read)); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +283,8 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = *(__vector unsigned long *)(rxdp + 3); + descs[3] = *(__vector unsigned long *) + RTE_PTR_DROP_QUALIFIERS(rxdp + 3); rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +294,14 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ - descs[2] = *(__vector unsigned long *)(rxdp + 2); + descs[2] = *(__vector unsigned long *) + 
RTE_PTR_DROP_QUALIFIERS(rxdp + 2); rte_compiler_barrier(); - descs[1] = *(__vector unsigned long *)(rxdp + 1); + descs[1] = *(__vector unsigned long *) + RTE_PTR_DROP_QUALIFIERS(rxdp + 1); rte_compiler_barrier(); - descs[0] = *(__vector unsigned long *)(rxdp); + descs[0] = *(__vector unsigned long *) + RTE_PTR_DROP_QUALIFIERS(rxdp); /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +535,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; - *(__vector unsigned long *)txdp = descriptor; + *(__vector unsigned long *)RTE_PTR_DROP_QUALIFIERS(txdp) = descriptor; } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..217add8be7 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,10 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ - __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); - __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); + __m128i *rxdp_desc_0 = RTE_PTR_DROP_QUALIFIERS + ((&rxdp[desc_idx + 0].wb.qword2)); + __m128i *rxdp_desc_1 = RTE_PTR_DROP_QUALIFIERS + ((&rxdp[desc_idx + 1].wb.qword2)); const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,21 +274,29 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = 
_mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +701,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor); } static inline void @@ -728,8 +734,8 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp + 2), desc2_3); + 
_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_1); } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..52a54a9e79 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,10 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ - __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); - __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); + __m128i *rxdp_desc_0 = RTE_PTR_DROP_QUALIFIERS + (&rxdp[desc_idx + 0].wb.qword2); + __m128i *rxdp_desc_1 = RTE_PTR_DROP_QUALIFIERS + (&rxdp[desc_idx + 1].wb.qword2); const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -264,28 +262,28 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* load in descriptors, in reverse order */ const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp 
+ 2)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0)); raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +873,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor); } static inline void @@ -909,7 +907,7 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_3); } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..23525db319 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +38,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) 
rxq->nb_rx_desc) { for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, zero); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,11 +55,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +84,10 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; - desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); - desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); - desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); - desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); + desc0_qw23 = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(&(rxdp + 0)->wb.qword2)); + desc1_qw23 = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(&(rxdp + 1)->wb.qword2)); + desc2_qw23 = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(&(rxdp + 2)->wb.qword2)); + desc3_qw23 = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(&(rxdp + 3)->wb.qword2)); /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,18 +418,22 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[0] = vld1q_u64((uint64_t *)(rxdp)); + descs[3] = 
 vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
+		descs[2] = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
+		descs[1] = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
+		descs[0] = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp));
 		/* Use acquire fence to order loads of descriptor qwords */
 		rte_atomic_thread_fence(rte_memory_order_acquire);
 		/* A.2 reload qword0 to make it ordered after qword1 load */
-		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
-		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
-		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
-		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+		descs[3] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 3), descs[3], 0);
+		descs[2] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 2), descs[2], 0);
+		descs[1] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 1), descs[1], 0);
+		descs[0] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp), descs[0], 0);
 		/* B.1 load 4 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
@@ -662,7 +663,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 			((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
 	uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw};
-	vst1q_u64((uint64_t *)txdp, descriptor);
+	vst1q_u64(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 static inline void
diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c
index ad560d2b6b..61c71c8c98 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c
@@ -14,10 +14,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -41,7 +37,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+
 _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -72,8 +68,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1);
 	}
 	rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
@@ -97,10 +93,14 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt)
 {
 	/* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */
 	__m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23;
-	desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2);
-	desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2);
-	desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2);
-	desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2);
+	desc0_qw23 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&(rxdp + 0)->wb.qword2));
+	desc1_qw23 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&(rxdp + 1)->wb.qword2));
+	desc2_qw23 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&(rxdp + 2)->wb.qword2));
+	desc3_qw23 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+			(&(rxdp + 3)->wb.qword2));
 	/* FDIR ID data: move last u32 of each desc to 4 u32 lanes */
 	__m128i v_unpack_01, v_unpack_23;
@@ -462,7 +462,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
-		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+		descs[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -474,11 +474,11 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif
 		/* A.1 load desc[2-0] */
-
 descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		descs[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
-		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
-		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+		descs[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp));
 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts */
@@ -681,7 +681,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 static inline void
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
index 49d41af953..ab5c10fe03 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
@@ -6,10 +6,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline void
 iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 {
@@ -193,21 +189,29 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq,
 				_mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif
-		const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
+		const __m128i raw_desc7 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 7));
 		rte_compiler_barrier();
-		const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
+		const __m128i raw_desc6 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 6));
 		rte_compiler_barrier();
-		const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
+		const __m128i raw_desc5 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 5));
 		rte_compiler_barrier();
-		const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
+		const __m128i raw_desc4 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 4));
 		rte_compiler_barrier();
-		const __m128i raw_desc3 =
 _mm_load_si128((void *)(rxdp + 3));
+		const __m128i raw_desc3 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 3));
 		rte_compiler_barrier();
-		const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
+		const __m128i raw_desc2 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 2));
 		rte_compiler_barrier();
-		const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
+		const __m128i raw_desc1 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 1));
 		rte_compiler_barrier();
-		const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+		const __m128i raw_desc0 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 0));
 		const __m256i raw_desc6_7 =
 			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
@@ -509,7 +513,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 			0, rxq->mbuf_initializer);
 	struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail];
 	volatile union iavf_rx_flex_desc *rxdp =
-		(union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+		(volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
 	rte_prefetch0(rxdp);
@@ -743,28 +747,28 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 		__m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;
 		const __m128i raw_desc7 =
-			_mm_load_si128((void *)(rxdp + 7));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 =
-			_mm_load_si128((void *)(rxdp + 6));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6));
 		rte_compiler_barrier();
 		const __m128i raw_desc5 =
-			_mm_load_si128((void *)(rxdp + 5));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5));
 		rte_compiler_barrier();
 		const __m128i raw_desc4 =
-			_mm_load_si128((void *)(rxdp + 4));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4));
 		rte_compiler_barrier();
 		const __m128i raw_desc3 =
-			_mm_load_si128((void *)(rxdp + 3));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		const __m128i raw_desc2 =
-			_mm_load_si128((void *)(rxdp +
 2));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
 		const __m128i raw_desc1 =
-			_mm_load_si128((void *)(rxdp + 1));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
-			_mm_load_si128((void *)(rxdp + 0));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0));
 		raw_desc6_7 =
 			_mm256_inserti128_si256
@@ -960,36 +964,36 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 			rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
 			/* load bottom half of every 32B desc */
 			const __m128i raw_desc_bh7 =
-				_mm_load_si128
-					((void *)(&rxdp[7].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[7].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh6 =
-				_mm_load_si128
-					((void *)(&rxdp[6].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[6].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh5 =
-				_mm_load_si128
-					((void *)(&rxdp[5].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[5].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh4 =
-				_mm_load_si128
-					((void *)(&rxdp[4].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[4].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh3 =
-				_mm_load_si128
-					((void *)(&rxdp[3].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[3].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh2 =
-				_mm_load_si128
-					((void *)(&rxdp[2].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[2].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh1 =
-				_mm_load_si128
-					((void *)(&rxdp[1].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[1].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh0 =
-				_mm_load_si128
-					((void *)(&rxdp[0].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+
 (&rxdp[0].wb.status_error1));
 			__m256i raw_desc_bh6_7 =
 				_mm256_inserti128_si256
@@ -1664,7 +1668,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 static __rte_always_inline void
@@ -1719,8 +1723,8 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp,
 				 pkt[1]->buf_iova + pkt[1]->data_off,
 				 hi_qw0,
 				 pkt[0]->buf_iova + pkt[0]->data_off);
-		_mm256_store_si256((void *)(txdp + 2), desc2_3);
-		_mm256_store_si256((void *)txdp, desc0_1);
+		_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp + 2), desc2_3);
+		_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_1);
 	}
 	/* do any last ones */
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
index d6a861bf80..dbb9588a47 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
@@ -6,10 +6,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define IAVF_DESCS_PER_LOOP_AVX 8
 #define PKTLEN_SHIFT 10
@@ -165,28 +161,28 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq,
 		__m512i raw_desc0_3, raw_desc4_7;
 		const __m128i raw_desc7 =
-			_mm_load_si128((void *)(rxdp + 7));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 =
-			_mm_load_si128((void *)(rxdp + 6));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6));
 		rte_compiler_barrier();
 		const __m128i raw_desc5 =
-			_mm_load_si128((void *)(rxdp + 5));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5));
 		rte_compiler_barrier();
 		const __m128i raw_desc4 =
-			_mm_load_si128((void *)(rxdp + 4));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4));
 		rte_compiler_barrier();
 		const __m128i raw_desc3 =
-			_mm_load_si128((void *)(rxdp + 3));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		const
 __m128i raw_desc2 =
-			_mm_load_si128((void *)(rxdp + 2));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
 		const __m128i raw_desc1 =
-			_mm_load_si128((void *)(rxdp + 1));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
-			_mm_load_si128((void *)(rxdp + 0));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0));
 		raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4);
 		raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1);
@@ -600,7 +596,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 			rxq->mbuf_initializer);
 	struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail];
 	volatile union iavf_rx_flex_desc *rxdp =
-		(union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+		(volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
 	rte_prefetch0(rxdp);
@@ -734,28 +730,28 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 		__m512i raw_desc0_3, raw_desc4_7;
 		const __m128i raw_desc7 =
-			_mm_load_si128((void *)(rxdp + 7));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 =
-			_mm_load_si128((void *)(rxdp + 6));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6));
 		rte_compiler_barrier();
 		const __m128i raw_desc5 =
-			_mm_load_si128((void *)(rxdp + 5));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5));
 		rte_compiler_barrier();
 		const __m128i raw_desc4 =
-			_mm_load_si128((void *)(rxdp + 4));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4));
 		rte_compiler_barrier();
 		const __m128i raw_desc3 =
-			_mm_load_si128((void *)(rxdp + 3));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		const __m128i raw_desc2 =
-			_mm_load_si128((void *)(rxdp + 2));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
 		const __m128i raw_desc1 =
-			_mm_load_si128((void *)(rxdp + 1));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
 		const __m128i raw_desc0 =
-
 _mm_load_si128((void *)(rxdp + 0));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0));
 		raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4);
 		raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1);
@@ -1112,36 +1108,36 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 			rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
 			/* load bottom half of every 32B desc */
 			const __m128i raw_desc_bh7 =
-				_mm_load_si128
-					((void *)(&rxdp[7].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[7].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh6 =
-				_mm_load_si128
-					((void *)(&rxdp[6].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[6].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh5 =
-				_mm_load_si128
-					((void *)(&rxdp[5].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[5].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh4 =
-				_mm_load_si128
-					((void *)(&rxdp[4].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[4].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh3 =
-				_mm_load_si128
-					((void *)(&rxdp[3].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[3].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh2 =
-				_mm_load_si128
-					((void *)(&rxdp[2].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[2].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh1 =
-				_mm_load_si128
-					((void *)(&rxdp[1].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[1].wb.status_error1));
 			rte_compiler_barrier();
 			const __m128i raw_desc_bh0 =
-				_mm_load_si128
-					((void *)(&rxdp[0].wb.status_error1));
+				_mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[0].wb.status_error1));
 			__m256i raw_desc_bh6_7 =
 				_mm256_inserti128_si256
@@ -1983,7 +1979,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
-	_mm_storeu_si128((__m128i *)txdp, descriptor);
+	_mm_storeu_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 #define IAVF_TX_LEN_MASK 0xAA
@@ -2037,7 +2033,7 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp,
 				 pkt[1]->buf_iova + pkt[1]->data_off,
 				 hi_qw0,
 				 pkt[0]->buf_iova + pkt[0]->data_off);
-		_mm512_storeu_si512((void *)txdp, desc0_3);
+		_mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_3);
 	}
 	/* do any last ones */
@@ -2225,7 +2221,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
 	__m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off,
 							high_ctx_qw, low_ctx_qw);
-	_mm256_storeu_si256((__m256i *)txdp, ctx_data_desc);
+	_mm256_storeu_si256(RTE_PTR_DROP_QUALIFIERS(txdp), ctx_data_desc);
 }
 static __rte_always_inline void
@@ -2300,7 +2296,7 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp,
 				 hi_ctx_qw1, low_ctx_qw1,
 				 hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off,
 				 hi_ctx_qw0, low_ctx_qw0);
-		_mm512_storeu_si512((void *)txdp, desc0_3);
+		_mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_3);
 	}
 	if (nb_pkts)
diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h
index 5c5220048d..a7f7f977ec 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/iavf/iavf_rxtx_vec_common.h
@@ -11,10 +11,6 @@
 #include "iavf.h"
 #include "iavf_rxtx.h"
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline uint16_t
 reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 		   uint16_t nb_bufs, uint8_t *split_flags)
@@ -422,7 +418,7 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
 				rxp[i] = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+				_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -458,8 +454,10 @@ iavf_rxq_rearm_common(struct
 iavf_rx_queue *rxq, __rte_unused bool avx512)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp++->read), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp++->read), dma_addr1);
 	}
 #else
 #ifdef CC_AVX512_SUPPORT
diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c
index 04be574683..e989868d7a 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_neon.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c
@@ -36,7 +36,7 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 		    rxq->nb_rx_desc) {
 			for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
 				rxep[i] = &rxq->fake_mbuf;
-				vst1q_u64((uint64_t *)&rxdp[i].read, zero);
+				vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read), zero);
 			}
 		}
 		rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -53,11 +53,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 		dma_addr0 = vdupq_n_u64(paddr);
 		/* flush desc with pa dma_addr */
-		vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0);
 		paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM;
 		dma_addr1 = vdupq_n_u64(paddr);
-		vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1);
+		vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1);
 	}
 	rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH;
@@ -269,18 +269,22 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq,
 		int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT};
 		/* A.1 load desc[3-0] */
-		descs[3] = vld1q_u64((uint64_t *)(rxdp + 3));
-		descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
-		descs[1] = vld1q_u64((uint64_t *)(rxdp + 1));
-		descs[0] = vld1q_u64((uint64_t *)(rxdp));
+		descs[3] = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
+		descs[2] = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
+		descs[1] = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
+		descs[0] =
 vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp));
 		/* Use acquire fence to order loads of descriptor qwords */
 		rte_atomic_thread_fence(rte_memory_order_acquire);
 		/* A.2 reload qword0 to make it ordered after qword1 load */
-		descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
-		descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
-		descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
-		descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+		descs[3] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 3), descs[3], 0);
+		descs[2] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 2), descs[2], 0);
+		descs[1] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 1), descs[1], 0);
+		descs[0] = vld1q_lane_u64(RTE_PTR_DROP_QUALIFIERS
+				(rxdp), descs[0], 0);
 		/* B.1 load 4 mbuf point */
 		mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c
index 0db6fa8bd4..10173e2102 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_sse.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c
@@ -12,10 +12,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline void
 iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 {
@@ -38,7 +34,7 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
 				rxp[i] = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+				_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -69,8 +65,10 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp++->read), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp++->read), dma_addr1);
 	}
 	rxq->rxrearm_start += rxq->rx_free_thresh;
@@
-578,7 +576,8 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
-		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+		descs[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 3));
 		rte_compiler_barrier();
 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -590,11 +589,14 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif
 		/* A.1 load desc[2-0] */
-		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		descs[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 2));
 		rte_compiler_barrier();
-		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 1));
 		rte_compiler_barrier();
-		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+		descs[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp));
 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts */
@@ -783,7 +785,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 	/* Just the act of getting into the function from the application is
 	 * going to cost about 7 cycles
 	 */
-	rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+	rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
 	rte_prefetch0(rxdp);
@@ -864,7 +866,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 		mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 		/* Read desc statuses backwards to avoid race condition */
 		/* A.1 load desc[3] */
-		descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+		descs[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		/* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -876,11 +878,11 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 #endif
 		/* A.1 load desc[2-0] */
-		descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+		descs[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp +
 2));
 		rte_compiler_barrier();
-		descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+		descs[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1));
 		rte_compiler_barrier();
-		descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+		descs[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp));
 #if defined(RTE_ARCH_X86_64)
 		/* B.2 copy 2 mbuf point into rx_pkts */
@@ -927,17 +929,17 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq,
 		    offloads & RTE_ETH_RX_OFFLOAD_TIMESTAMP ||
 		    rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) {
 			/* load bottom half of every 32B desc */
-			descs_bh[3] = _mm_load_si128
-					((void *)(&rxdp[3].wb.status_error1));
+			descs_bh[3] = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[3].wb.status_error1));
 			rte_compiler_barrier();
-			descs_bh[2] = _mm_load_si128
-					((void *)(&rxdp[2].wb.status_error1));
+			descs_bh[2] = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[2].wb.status_error1));
 			rte_compiler_barrier();
-			descs_bh[1] = _mm_load_si128
-					((void *)(&rxdp[1].wb.status_error1));
+			descs_bh[1] = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[1].wb.status_error1));
 			rte_compiler_barrier();
-			descs_bh[0] = _mm_load_si128
-					((void *)(&rxdp[0].wb.status_error1));
+			descs_bh[0] = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp[0].wb.status_error1));
 		}
 		if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) {
@@ -1349,7 +1351,7 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 	__m128i descriptor = _mm_set_epi64x(high_qw,
 				pkt->buf_iova + pkt->data_off);
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 static inline void
diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h
index dacb87dcb0..e35e79e39f 100644
--- a/drivers/net/ice/ice_rxtx_common_avx.h
+++ b/drivers/net/ice/ice_rxtx_common_avx.h
@@ -7,10 +7,6 @@
 #include "ice_rxtx.h"
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #ifdef __AVX2__
 static
 __rte_always_inline void
 ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
@@ -33,7 +29,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr0 = _mm_setzero_si128();
 			for (i = 0; i < ICE_DESCS_PER_LOOP; i++) {
 				rxep[i].mbuf = &rxq->fake_mbuf;
-				_mm_store_si128((__m128i *)&rxdp[i].read,
+				_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read),
 						dma_addr0);
 			}
 		}
@@ -77,8 +73,10 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 		dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 		/* flush desc with pa dma_addr */
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
-		_mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp++->read), dma_addr0);
+		_mm_store_si128(RTE_PTR_DROP_QUALIFIERS
+				(&rxdp++->read), dma_addr1);
 	}
 #else
 #ifdef __AVX512VL__
@@ -157,8 +155,10 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room);
 			/* flush desc with pa dma_addr */
-			_mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3);
-			_mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7);
+			_mm512_store_si512(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp->read), dma_addr0_3);
+			_mm512_store_si512(RTE_PTR_DROP_QUALIFIERS
+					(&(rxdp + 4)->read), dma_addr4_7);
 		}
 	} else
 #endif /* __AVX512VL__ */
@@ -213,8 +213,10 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512)
 			dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room);
 			/* flush desc with pa dma_addr */
-			_mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1);
-			_mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3);
+			_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS
+					(&rxdp->read), dma_addr0_1);
+			_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS
+					(&(rxdp + 2)->read), dma_addr2_3);
 		}
 	}
diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c
index d6e88dbb29..da4f433db9 100644
---
 a/drivers/net/ice/ice_rxtx_vec_avx2.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx2.c
@@ -7,10 +7,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline void
 ice_rxq_rearm(struct ice_rx_queue *rxq)
 {
@@ -254,21 +250,29 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 				_mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif
-		const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
+		const __m128i raw_desc7 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 7));
 		rte_compiler_barrier();
-		const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
+		const __m128i raw_desc6 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 6));
 		rte_compiler_barrier();
-		const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
+		const __m128i raw_desc5 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 5));
 		rte_compiler_barrier();
-		const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
+		const __m128i raw_desc4 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 4));
 		rte_compiler_barrier();
-		const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3));
+		const __m128i raw_desc3 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 3));
 		rte_compiler_barrier();
-		const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
+		const __m128i raw_desc2 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 2));
 		rte_compiler_barrier();
-		const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
+		const __m128i raw_desc1 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 1));
 		rte_compiler_barrier();
-		const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+		const __m128i raw_desc0 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS
+				(rxdp + 0));
 		const __m256i raw_desc6_7 =
 			_mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
@@ -444,37 +448,29 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 		if
 (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads &
 				RTE_ETH_RX_OFFLOAD_RSS_HASH) {
 			/* load bottom half of every 32B desc */
-			const __m128i raw_desc_bh7 =
-				_mm_load_si128
-					((void *)(&rxdp[7].wb.status_error1));
+			const __m128i raw_desc_bh7 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(&rxdp[7].wb.status_error1));
 			rte_compiler_barrier();
-			const __m128i raw_desc_bh6 =
-				_mm_load_si128
-					((void *)(&rxdp[6].wb.status_error1));
+			const __m128i raw_desc_bh6 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(&rxdp[6].wb.status_error1));
 			rte_compiler_barrier();
-			const __m128i raw_desc_bh5 =
-				_mm_load_si128
-					((void *)(&rxdp[5].wb.status_error1));
+			const __m128i raw_desc_bh5 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(&rxdp[5].wb.status_error1));
 			rte_compiler_barrier();
-			const __m128i raw_desc_bh4 =
-				_mm_load_si128
-					((void *)(&rxdp[4].wb.status_error1));
+			const __m128i raw_desc_bh4 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(&rxdp[4].wb.status_error1));
 			rte_compiler_barrier();
-			const __m128i raw_desc_bh3 =
-				_mm_load_si128
-					((void *)(&rxdp[3].wb.status_error1));
+			const __m128i raw_desc_bh3 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(&rxdp[3].wb.status_error1));
 			rte_compiler_barrier();
-			const __m128i raw_desc_bh2 =
-				_mm_load_si128
-					((void *)(&rxdp[2].wb.status_error1));
+			const __m128i raw_desc_bh2 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(&rxdp[2].wb.status_error1));
 			rte_compiler_barrier();
-			const __m128i raw_desc_bh1 =
-				_mm_load_si128
-					((void *)(&rxdp[1].wb.status_error1));
+			const __m128i raw_desc_bh1 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(&rxdp[1].wb.status_error1));
 			rte_compiler_barrier();
-			const __m128i raw_desc_bh0 =
-				_mm_load_si128
-					((void *)(&rxdp[0].wb.status_error1));
+			const __m128i raw_desc_bh0 = _mm_load_si128
+				(RTE_PTR_DROP_QUALIFIERS(&rxdp[0].wb.status_error1));
 			__m256i raw_desc_bh6_7 =
 				_mm256_inserti128_si256
@@ -790,7 +786,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp,
 		ice_txd_enable_offload(pkt, &high_qw);
 	__m128i descriptor
 = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt));
-	_mm_store_si128((__m128i *)txdp, descriptor);
+	_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor);
 }
 static __rte_always_inline void
@@ -841,8 +837,8 @@ ice_vtx(volatile struct ice_tx_desc *txdp,
 			_mm256_set_epi64x
 				(hi_qw1, rte_pktmbuf_iova(pkt[1]),
 				 hi_qw0, rte_pktmbuf_iova(pkt[0]));
-		_mm256_store_si256((void *)(txdp + 2), desc2_3);
-		_mm256_store_si256((void *)txdp, desc0_1);
+		_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp + 2), desc2_3);
+		_mm256_store_si256(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_1);
 	}
 	/* do any last ones */
diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c
index add095ef06..5613478bca 100644
--- a/drivers/net/ice/ice_rxtx_vec_avx512.c
+++ b/drivers/net/ice/ice_rxtx_vec_avx512.c
@@ -7,10 +7,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define ICE_DESCS_PER_LOOP_AVX 8
 static __rte_always_inline void
@@ -244,28 +240,28 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq,
 		/* load in descriptors, in reverse order */
 		const __m128i raw_desc7 =
-			_mm_load_si128((void *)(rxdp + 7));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 7));
 		rte_compiler_barrier();
 		const __m128i raw_desc6 =
-			_mm_load_si128((void *)(rxdp + 6));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 6));
 		rte_compiler_barrier();
 		const __m128i raw_desc5 =
-			_mm_load_si128((void *)(rxdp + 5));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 5));
 		rte_compiler_barrier();
 		const __m128i raw_desc4 =
-			_mm_load_si128((void *)(rxdp + 4));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 4));
 		rte_compiler_barrier();
 		const __m128i raw_desc3 =
-			_mm_load_si128((void *)(rxdp + 3));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3));
 		rte_compiler_barrier();
 		const __m128i raw_desc2 =
-			_mm_load_si128((void *)(rxdp + 2));
+			_mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2));
 		rte_compiler_barrier();
 		const __m128i raw_desc1 =
-
_mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 0)); raw_desc6_7 = _mm256_inserti128_si256 @@ -474,37 +470,29 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ - const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + const __m128i raw_desc_bh7 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(&rxdp[7].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + const __m128i raw_desc_bh6 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(&rxdp[6].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + const __m128i raw_desc_bh5 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(&rxdp[5].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + const __m128i raw_desc_bh4 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(&rxdp[4].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + const __m128i raw_desc_bh3 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + const __m128i raw_desc_bh2 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(&rxdp[2].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + const __m128i raw_desc_bh1 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(&rxdp[1].wb.status_error1)); rte_compiler_barrier(); - const 
__m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + const __m128i raw_desc_bh0 = _mm_load_si128 + (RTE_PTR_DROP_QUALIFIERS(&rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +975,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor); } static __rte_always_inline void @@ -1029,7 +1017,7 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_PTR_DROP_QUALIFIERS(txdp), desc0_3); } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..5f536fe5c5 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,7 +48,7 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + 
_mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read), dma_addr0); } } @@ -91,8 +87,8 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1); } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +421,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +433,11 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -491,19 +487,19 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* load bottom half of every 32B desc */ const __m128i raw_desc_bh3 = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_PTR_DROP_QUALIFIERS(&rxdp[3].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh2 = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_PTR_DROP_QUALIFIERS(&rxdp[2].wb.status_error1)); 
rte_compiler_barrier(); const __m128i raw_desc_bh1 = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_PTR_DROP_QUALIFIERS(&rxdp[1].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh0 = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_PTR_DROP_QUALIFIERS(&rxdp[0].wb.status_error1)); /** * to shift the 32b RSS hash value to the @@ -680,7 +676,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor); } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..1c4a52a84a 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static 
inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -36,7 +34,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read), zero); } } @@ -60,12 +58,12 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -367,10 +365,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); /* A. 
load 4 pkts descs */ - descs[0] = vld1q_u64((uint64_t *)(rxdp)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); + descs[0] = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp)); + descs[1] = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 1)); + descs[2] = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 2)); + descs[3] = vld1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp + 3)); /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); @@ -554,7 +552,7 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, pkt->buf_iova + pkt->data_off, (uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len}; - vst1q_u64((uint64_t *)&txdp->read, descriptor); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&txdp->read), descriptor); } static inline void diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..c3c71d442f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,7 +37,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp[i].read), dma_addr0); } } @@ -76,8 +72,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr0); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +462,7 @@ 
_recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +474,11 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +672,7 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)&txdp->read, descriptor); + _mm_store_si128(RTE_PTR_DROP_QUALIFIERS(&txdp->read), descriptor); } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..8bfbc7290d 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,7 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" - tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop + tunnel = (typeof(tunnel))RTE_PTR_DROP_QUALIFIERS(flow->tunnel); return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ 
b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..290395cb5d 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * @@ -75,7 +73,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { volatile struct mlx5_mini_cqe8 *mcq = - (void *)&(cq + !rxq->cqe_comp_layout)->pkt_info; + (volatile void *)&(cq + !rxq->cqe_comp_layout)->pkt_info; /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -139,9 +137,9 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, */ cycle: if (rxq->cqe_comp_layout) - rte_prefetch0((void *)(cq + mcqe_n)); + rte_prefetch0(RTE_PTR_DROP_QUALIFIERS(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { - uint8_t *p = (void *)&mcq[pos % 8]; + uint8_t *p = RTE_PTR_DROP_QUALIFIERS(&mcq[pos % 8]); uint8_t *e0 = (void *)&elts[pos]->rearm_data; uint8_t *e1 = (void *)&elts[pos + 1]->rearm_data; uint8_t *e2 = (void *)&elts[pos + 2]->rearm_data; @@ -157,7 +155,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((volatile void *)(cq + pos + i)); __asm__ volatile ( /* A.1 load mCQEs into a 128bit register. 
*/ "ld1 {v16.16b - v17.16b}, [%[mcq]] \n\t" @@ -367,8 +365,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)&(cq + pos)->pkt_info; + rte_prefetch0((volatile void *)(cq + pos + 8)); + mcq = (volatile void *)&(cq + pos)->pkt_info; for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -383,7 +381,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -663,7 +661,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, mask = vcreate_u16(pkts_n - pos < MLX5_VPMD_DESCS_PER_LOOP ? -1UL >> ((pkts_n - pos) * sizeof(uint16_t) * 8) : 0); - p0 = (void *)&cq[pos].pkt_info; + p0 = RTE_PTR_DROP_QUALIFIERS(&cq[pos].pkt_info); p1 = p0 + (pkts_n - pos > 1) * sizeof(struct mlx5_cqe); p2 = p1 + (pkts_n - pos > 2) * sizeof(struct mlx5_cqe); p3 = p2 + (pkts_n - pos > 3) * sizeof(struct mlx5_cqe); diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..c235c8eeee 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile void *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. 
*/ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -130,7 +127,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, */ cycle: if (rxq->cqe_comp_layout) - rte_prefetch0((void *)(cq + mcqe_n)); + rte_prefetch0(RTE_PTR_DROP_QUALIFIERS(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -141,10 +138,10 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0(RTE_PTR_DROP_QUALIFIERS(cq + pos + i)); /* A.1 load mCQEs into a 128bit register. */ - mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); - mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); + mcqe1 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(&mcq[pos % 8])); + mcqe2 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS(&mcq[pos % 8 + 2])); /* B.1 store rearm data to mbuf. 
*/ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -355,8 +352,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); + rte_prefetch0(RTE_PTR_DROP_QUALIFIERS(cq + pos + 8)); + mcq = (volatile void *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +368,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile void *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,38 +648,44 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. */ p3 = _mm_extract_epi16(p, 3); - cqes[3] = _mm_loadl_epi64((__m128i *) - &cq[pos + p3].sop_drop_qpn); + cqes[3] = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p3].sop_drop_qpn)); rte_compiler_barrier(); p2 = _mm_extract_epi16(p, 2); - cqes[2] = _mm_loadl_epi64((__m128i *) - &cq[pos + p2].sop_drop_qpn); + cqes[2] = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p2].sop_drop_qpn)); rte_compiler_barrier(); /* B.1 load mbuf pointers. */ mbp1 = _mm_loadu_si128((__m128i *)&elts[pos]); mbp2 = _mm_loadu_si128((__m128i *)&elts[pos + 2]); /* A.1 load a block having op_own. */ p1 = _mm_extract_epi16(p, 1); - cqes[1] = _mm_loadl_epi64((__m128i *) - &cq[pos + p1].sop_drop_qpn); + cqes[1] = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p1].sop_drop_qpn)); rte_compiler_barrier(); - cqes[0] = _mm_loadl_epi64((__m128i *) - &cq[pos].sop_drop_qpn); + cqes[0] = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS + (&cq[pos].sop_drop_qpn)); /* B.2 copy mbuf pointers. 
*/ _mm_storeu_si128((__m128i *)&pkts[pos], mbp1); _mm_storeu_si128((__m128i *)&pkts[pos + 2], mbp2); rte_io_rmb(); /* C.1 load remained CQE data and extract necessary fields. */ - cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p3]); - cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos + p2]); + cqe_tmp2 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p3])); + cqe_tmp1 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p2])); cqes[3] = _mm_blendv_epi8(cqes[3], cqe_tmp2, blend_mask); cqes[2] = _mm_blendv_epi8(cqes[2], cqe_tmp1, blend_mask); - cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p3].csum); - cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos + p2].csum); + cqe_tmp2 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p3].csum)); + cqe_tmp1 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p2].csum)); cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x30); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); - cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); - cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); + cqe_tmp2 = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p3].rsvd4[2])); + cqe_tmp1 = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p2].rsvd4[2])); cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,16 +703,20 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. */ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. 
*/ - cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); - cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); + cqe_tmp2 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(&cq[pos + p1])); + cqe_tmp1 = _mm_load_si128(RTE_PTR_DROP_QUALIFIERS(&cq[pos])); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); cqes[0] = _mm_blendv_epi8(cqes[0], cqe_tmp1, blend_mask); - cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p1].csum); - cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos].csum); + cqe_tmp2 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p1].csum)); + cqe_tmp1 = _mm_loadu_si128(RTE_PTR_DROP_QUALIFIERS + (&cq[pos].csum)); cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x30); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); - cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); - cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); + cqe_tmp2 = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS + (&cq[pos + p1].rsvd4[2])); + cqe_tmp1 = _mm_loadl_epi64(RTE_PTR_DROP_QUALIFIERS + (&cq[pos].rsvd4[2])); cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. 
*/ diff --git a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c index 37075ea5e7..3c555214a8 100644 --- a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c +++ b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c @@ -35,7 +35,7 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_NGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp[i]), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,12 +58,12 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp++), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp++), dma_addr1); } rxq->rxrearm_start += RTE_NGBE_RXQ_REARM_THRESH; @@ -484,7 +484,7 @@ vtx1(volatile struct ngbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; - vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor); } static inline void diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git 
a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c index d4d647fab5..713b8fc26b 100644 --- a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c +++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c @@ -34,7 +34,7 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_TXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(&rxdp[i]), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -57,12 +57,12 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp++), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(rxdp++), dma_addr1); } rxq->rxrearm_start += RTE_TXGBE_RXQ_REARM_THRESH; @@ -484,7 +484,7 @@ vtx1(volatile struct txgbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; - vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); + vst1q_u64(RTE_PTR_DROP_QUALIFIERS(txdp), descriptor); } static inline void diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.2.vfs.0.0 ^ permalink raw reply [flat|nested] 87+ messages in thread
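The two styles touched by this patch can be compared side by side in a minimal sketch. The EX_* macros and struct ex_desc below are hypothetical stand-ins written for this illustration only; the real macros are defined in lib/eal/include/rte_common.h (patch 1/3) and may differ, in particular in how MSVC is handled. The sketch assumes that routing the pointer through uintptr_t is enough to keep -Wcast-qual quiet.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins, NOT the definitions from rte_common.h. */
#if defined(__GNUC__) || defined(__clang__)
#define EX_DIAGNOSTIC_PUSH _Pragma("GCC diagnostic push")
#define EX_DIAGNOSTIC_POP _Pragma("GCC diagnostic pop")
#define EX_DIAGNOSTIC_IGNORED_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
/* In this sketch the macros expand to nothing on other compilers. */
#define EX_DIAGNOSTIC_PUSH
#define EX_DIAGNOSTIC_POP
#define EX_DIAGNOSTIC_IGNORED_WCAST_QUAL
#endif

/* Going through uintptr_t discards qualifiers without a qualifier-dropping cast. */
#define EX_PTR_DROP_QUALIFIERS(ptr) ((void *)(uintptr_t)(ptr))

struct ex_desc {
	uint64_t lo;
	uint64_t hi;
};

/* Style 1: keep the plain cast and silence the warning around it. */
EX_DIAGNOSTIC_PUSH
EX_DIAGNOSTIC_IGNORED_WCAST_QUAL
static inline uint64_t ex_read_lo_pragma(const volatile struct ex_desc *d)
{
	struct ex_desc *p = (struct ex_desc *)d;
	return p->lo;
}
EX_DIAGNOSTIC_POP

/* Style 2: no pragma needed, the macro drops the qualifiers. */
static inline uint64_t ex_read_lo_macro(const volatile struct ex_desc *d)
{
	struct ex_desc *p = EX_PTR_DROP_QUALIFIERS(d);
	return p->lo;
}

/* Small self-check: both styles read the same value. */
static inline void ex_selfcheck(void)
{
	struct ex_desc d = { .lo = 42, .hi = 7 };
	assert(ex_read_lo_pragma(&d) == ex_read_lo_macro(&d));
}
```

Style 1 corresponds to the file-scope pragmas removed throughout this patch (and to the wrapped include in tap_flow.c); style 2 corresponds to the RTE_PTR_DROP_QUALIFIERS call sites that replace the `(void *)` and `(__m128i *)` casts.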
* Re: [PATCH v13 3/3] drivers/net: add diagnostics macros to make code portable
From: Bruce Richardson @ 2025-01-16 8:57 UTC
To: Andre Muezerie; +Cc: dev, stephen

On Wed, Jan 15, 2025 at 05:55:11PM -0800, Andre Muezerie wrote:
> It was a common pattern to have "GCC diagnostic ignored" pragmas
> sprinkled over the code and only activate these pragmas for certain
> compilers (gcc and clang). Clang supports GCC's pragma for
> compatibility with existing source code, so #pragma GCC diagnostic
> and #pragma clang diagnostic are synonyms for Clang
> (https://clang.llvm.org/docs/UsersManual.html).
>
> Now that effort is being made to make the code compatible with MSVC
> these expressions would become more complex. It makes sense to hide
> this complexity behind macros. This makes maintenance easier as these
> macros are defined in a single place. As a plus the code becomes
> more readable as well.
>
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

On a stylistic note, I think you can be slightly less aggressive in
wrapping the new code in the patch. DPDK allows lines up to 100 long
without wrapping, so please don't wrap at 80.

Thanks,
/Bruce
* Re: [PATCH v13 3/3] drivers/net: add diagnostics macros to make code portable
From: Andre Muezerie @ 2025-01-18 3:07 UTC
To: Bruce Richardson; +Cc: dev, stephen

On Thu, Jan 16, 2025 at 08:57:27AM +0000, Bruce Richardson wrote:
> On Wed, Jan 15, 2025 at 05:55:11PM -0800, Andre Muezerie wrote:
> > It was a common pattern to have "GCC diagnostic ignored" pragmas
> > sprinkled over the code and only activate these pragmas for certain
> > compilers (gcc and clang). Clang supports GCC's pragma for
> > compatibility with existing source code, so #pragma GCC diagnostic
> > and #pragma clang diagnostic are synonyms for Clang
> > (https://clang.llvm.org/docs/UsersManual.html).
> >
> > Now that effort is being made to make the code compatible with MSVC
> > these expressions would become more complex. It makes sense to hide
> > this complexity behind macros. This makes maintenance easier as these
> > macros are defined in a single place. As a plus the code becomes
> > more readable as well.
> >
> > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > ---
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
>
> On a stylistic note, I think you can be slightly less aggressive in
> wrapping the new code in the patch. DPDK allows lines up to 100 long
> without wrapping, so please don't wrap at 80.
>
> Thanks,
> /Bruce

Thanks for calling this out. I followed your suggestion in the v14 series
of this patchset.
* RE: [PATCH v13 3/3] drivers/net: add diagnostics macros to make code portable
From: Morten Brørup @ 2025-01-16 9:08 UTC
To: Andre Muezerie; +Cc: dev, stephen, bruce.richardson

> From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> Sent: Thursday, 16 January 2025 02.55
>
> It was a common pattern to have "GCC diagnostic ignored" pragmas
> sprinkled over the code and only activate these pragmas for certain
> compilers (gcc and clang). Clang supports GCC's pragma for
> compatibility with existing source code, so #pragma GCC diagnostic
> and #pragma clang diagnostic are synonyms for Clang
> (https://clang.llvm.org/docs/UsersManual.html).
>
> Now that effort is being made to make the code compatible with MSVC
> these expressions would become more complex. It makes sense to hide
> this complexity behind macros. This makes maintenance easier as these
> macros are defined in a single place. As a plus the code becomes
> more readable as well.

Here is some food for thought and discussion...

> @@ -2083,7 +2075,7 @@ dpaa2_dev_loopback_rx(void *queue,
>  if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) ==
> 0))
>  continue;
>  }
> - fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage);
> + fd[num_rx] = RTE_PTR_DROP_QUALIFIERS(qbman_result_DQ_fd(dq_storage));

I do not think this makes the code more readable; quite the opposite.
Before this, I could see which type the variable was being cast to.

How about a macro that resembles "traditional" type casting:

/**
 * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
 * without the compiler emitting a warning.
 *
 * @warning
 * Although this macro can be abused for casting a pointer to point to a different type,
 * alignment may be incorrect when casting to point to a larger type. E.g.:
 *   struct s {
 *     uint16_t a;
 *     uint8_t b;
 *     uint8_t c;
 *     uint8_t d;
 *   } v;
 *   uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
 */
#define RTE_CAST_PTR(type, ptr) \
    ((type)(uintptr_t)(ptr))

Writing the above warning led me down another path...
Can we somehow use __typeof_unqual__?
It is available in both GCC [1] and MSVC [2].

[1]: https://gcc.gnu.org/onlinedocs/gcc/Typeof.html
[2]: https://learn.microsoft.com/en-us/cpp/c-language/typeof-unqual-c?view=msvc-170

We are making a workaround, and should take care not to endorse overusing it,
especially for purposes other than the intended one.

Unfortunately, I think some of the type casts don't just remove qualifiers, but do exactly what my warning above describes: they cast a pointer to a completely different type.
If the new type is a larger type, the pointer's alignment becomes invalid, and if the compiler considers alignment a "qualifier", -Wcast-qual emits a warning about it.

Backtracking a bit...
If the macro is intended to remove qualifiers, and not to cast to a different type, RTE_PTR_DROP_QUALIFIERS(ptr) might be better than RTE_CAST_PTR(type, ptr).
For brevity and to resemble the C23 keyword typeof_unqual, it could be named RTE_PTR_UNQUAL instead of RTE_PTR_DROP_QUALIFIERS.
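The alignment concern raised in the @warning above can be checked with a small sketch. EX_CAST_PTR is a hypothetical name with the same shape as the proposed RTE_CAST_PTR, and the helper functions are invented for this illustration. The sketch assumes a platform where uint16_t requires 2-byte alignment, and it only inspects the misaligned pointer; it never dereferences it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as the proposed RTE_CAST_PTR, under a hypothetical name. */
#define EX_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))

/* The struct from the warning: member "c" sits at an odd offset. */
struct s {
	uint16_t a;
	uint8_t b;
	uint8_t c;
	uint8_t d;
};

/* Intended use: only the qualifier changes, the pointee type stays the same. */
static inline uint8_t *ex_drop_const(const uint8_t *p)
{
	return EX_CAST_PTR(uint8_t *, p);
}

/* Returns nonzero if p satisfies the alignment requirement of uint16_t. */
static inline int ex_is_u16_aligned(const void *p)
{
	return ((uintptr_t)p % _Alignof(uint16_t)) == 0;
}

/*
 * Abuse: the cast compiles without any diagnostic, but the result does not
 * meet the alignment of uint16_t. The pointer is inspected, not dereferenced.
 */
static inline int ex_abused_cast_is_aligned(struct s *v)
{
	uint16_t *p = EX_CAST_PTR(uint16_t *, &v->c);
	return ex_is_u16_aligned(p);
}

/* Self-check of the two cases described above. */
static inline void ex_demo(void)
{
	struct s v = { 1, 2, 3, 4 };
	assert(*ex_drop_const(&v.c) == 3);      /* qualifier-only use is fine */
	assert(!ex_abused_cast_is_aligned(&v)); /* type-changing use is misaligned */
}
```

Because the macro accepts any target type, nothing stops the second use; a macro without a type argument cannot be misused this way, which is the argument made above for RTE_PTR_DROP_QUALIFIERS(ptr) over RTE_CAST_PTR(type, ptr).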
* Re: [PATCH v13 3/3] drivers/net: add diagnostics macros to make code portable 2025-01-16 9:08 ` Morten Brørup @ 2025-01-17 3:56 ` Andre Muezerie 2025-01-18 3:05 ` Andre Muezerie 0 siblings, 1 reply; 87+ messages in thread From: Andre Muezerie @ 2025-01-17 3:56 UTC (permalink / raw) To: Morten Brørup; +Cc: dev, stephen, bruce.richardson On Thu, Jan 16, 2025 at 10:08:07AM +0100, Morten Brørup wrote: > > From: Andre Muezerie [mailto:andremue@linux.microsoft.com] > > Sent: Thursday, 16 January 2025 02.55 > > > > It was a common pattern to have "GCC diagnostic ignored" pragmas > > sprinkled over the code and only activate these pragmas for certain > > compilers (gcc and clang). Clang supports GCC's pragma for > > compatibility with existing source code, so #pragma GCC diagnostic > > and #pragma clang diagnostic are synonyms for Clang > > (https://clang.llvm.org/docs/UsersManual.html). > > > > Now that effort is being made to make the code compatible with MSVC > > these expressions would become more complex. It makes sense to hide > > this complexity behind macros. This makes maintenance easier as these > > macros are defined in a single place. As a plus the code becomes > > more readable as well. > > Here is some food for thought and discussion... > > > @@ -2083,7 +2075,7 @@ dpaa2_dev_loopback_rx(void *queue, > > if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == > > 0)) > > continue; > > } > > - fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); > > + fd[num_rx] = RTE_PTR_DROP_QUALIFIERS(qbman_result_DQ_fd(dq_storage)); > > I do not think this makes the code more readable; quite the opposite. > Before this, I could see which type the variable was being cast to. > > How about a macro that resembles "traditional" type casting: > > /** > * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer, > * without the compiler emitting a warning. 
> * > * @warning > * Although this macro can be abused for casting a pointer to point to a different type, > * alignment may be incorrect when casting to point to a larger type. E.g.: > * struct s { > * uint16_t a; > * uint8_t b; > * uint8_t c; > * uint8_t d; > * } v; > * uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned! > */ > #define RTE_CAST_PTR(type, ptr) \ > ((type)(uintptr_t)(ptr)) > > > Writing the above warning lead me down another path... > Can we somehow use __typeof_unqual__? > It is available in both GCC [1] and MSVC [2]. > > [1]: https://gcc.gnu.org/onlinedocs/gcc/Typeof.html > [2]: https://learn.microsoft.com/en-us/cpp/c-language/typeof-unqual-c?view=msvc-170 > > > We are making a workaround, and should take care to not endorse overusing it. > Especially for other purposes than intended. > > Unfortunately, I think some of the type casts don't just remove qualifiers, but does exactly what my warning above describes: Casts a pointer to completely different type. > If the new type is a larger type, the pointer's alignment becomes invalid, and if the compiler considers alignment a "qualifier", -Wcast-qual emits a warning about it. > > > Backtracking a bit... > If the macro is intended to remove qualifiers, and not to cast to a different type, RTE_PTR_DROP_QUALIFIERS(ptr) might be better than RTE_CAST_PTR(type, ptr). > For brevity and to resemble the C23 keyword typeof_unqual, it could be named RTE_PTR_UNQUAL instead of RTE_PTR_DROP_QUALIFIERS. > These are great suggestions, and __typeof_unqual__ seems to be exactly what we need to drop the qualifiers. I'll look more closely at the code and find out where a cast is actually being used for other purposes than removing the qualifier. ^ permalink raw reply [flat|nested] 87+ messages in thread
* Re: [PATCH v13 3/3] drivers/net: add diagnostics macros to make code portable 2025-01-17 3:56 ` Andre Muezerie @ 2025-01-18 3:05 ` Andre Muezerie 0 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-18 3:05 UTC (permalink / raw) To: Morten Brørup; +Cc: dev, stephen, bruce.richardson On Thu, Jan 16, 2025 at 07:56:52PM -0800, Andre Muezerie wrote: > On Thu, Jan 16, 2025 at 10:08:07AM +0100, Morten Brørup wrote: > > > From: Andre Muezerie [mailto:andremue@linux.microsoft.com] > > > Sent: Thursday, 16 January 2025 02.55 > > > > > > It was a common pattern to have "GCC diagnostic ignored" pragmas > > > sprinkled over the code and only activate these pragmas for certain > > > compilers (gcc and clang). Clang supports GCC's pragma for > > > compatibility with existing source code, so #pragma GCC diagnostic > > > and #pragma clang diagnostic are synonyms for Clang > > > (https://clang.llvm.org/docs/UsersManual.html). > > > > > > Now that effort is being made to make the code compatible with MSVC > > > these expressions would become more complex. It makes sense to hide > > > this complexity behind macros. This makes maintenance easier as these > > > macros are defined in a single place. As a plus the code becomes > > > more readable as well. > > > > Here is some food for thought and discussion... > > > > > @@ -2083,7 +2075,7 @@ dpaa2_dev_loopback_rx(void *queue, > > > if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == > > > 0)) > > > continue; > > > } > > > - fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); > > > + fd[num_rx] = RTE_PTR_DROP_QUALIFIERS(qbman_result_DQ_fd(dq_storage)); > > > > I do not think this makes the code more readable; quite the opposite. > > Before this, I could see which type the variable was being cast to. 
> > > > How about a macro that resembles "traditional" type casting: > > > > /** > > * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer, > > * without the compiler emitting a warning. > > * > > * @warning > > * Although this macro can be abused for casting a pointer to point to a different type, > > * alignment may be incorrect when casting to point to a larger type. E.g.: > > * struct s { > > * uint16_t a; > > * uint8_t b; > > * uint8_t c; > > * uint8_t d; > > * } v; > > * uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned! > > */ > > #define RTE_CAST_PTR(type, ptr) \ > > ((type)(uintptr_t)(ptr)) > > > > > > Writing the above warning lead me down another path... > > Can we somehow use __typeof_unqual__? > > It is available in both GCC [1] and MSVC [2]. > > > > [1]: https://gcc.gnu.org/onlinedocs/gcc/Typeof.html > > [2]: https://learn.microsoft.com/en-us/cpp/c-language/typeof-unqual-c?view=msvc-170 > > > > > > We are making a workaround, and should take care to not endorse overusing it. > > Especially for other purposes than intended. > > > > Unfortunately, I think some of the type casts don't just remove qualifiers, but does exactly what my warning above describes: Casts a pointer to completely different type. > > If the new type is a larger type, the pointer's alignment becomes invalid, and if the compiler considers alignment a "qualifier", -Wcast-qual emits a warning about it. > > > > > > Backtracking a bit... > > If the macro is intended to remove qualifiers, and not to cast to a different type, RTE_PTR_DROP_QUALIFIERS(ptr) might be better than RTE_CAST_PTR(type, ptr). > > For brevity and to resemble the C23 keyword typeof_unqual, it could be named RTE_PTR_UNQUAL instead of RTE_PTR_DROP_QUALIFIERS. > > > > These are great suggestions, and __typeof_unqual__ seems to be exactly what we need to drop the qualifiers. 
> I'll look more closely at the code and find out where a cast is actually being used for other purposes than removing the qualifier.

I took a closer look at the code and this is what I found:

* Only 2 places where qualifiers were being dropped were not casting to
  a different type. I used RTE_PTR_UNQUAL in those as suggested, for clarity.

* I experimented with C23 typeof_unqual. It indeed works on gcc, clang
  and MSVC, but there are some details:
  a) With gcc the project needs to be compiled with -std=c2x. Many other
     warnings show up, unrelated to the scope of this patchset. Some look
     suspicious and should be looked at. An error also showed up, for which
     I sent out a small patch.
  b) When using typeof_unqual and passing "-Wcast-qual" to the compiler,
     a warning about the qualifier being dropped is emitted. The project
     currently uses "-Wcast-qual".

  Due to (a) I decided to not use typeof_unqual for now, but it would be
  trivial to change the macro in the future to do so.

* All other places where I was using RTE_PTR_DROP_QUALIFIERS I'm using
  RTE_CAST_PTR now. I also think that the code became more readable by
  doing so.

^ permalink raw reply	[flat|nested] 87+ messages in thread
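The split between the two macros described above can be sketched as follows. This is an illustration under stated assumptions, not code from the patch: PTR_UNQUAL and CAST_PTR are local stand-ins with the same shape as RTE_PTR_UNQUAL and RTE_CAST_PTR, and struct frame and both helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Local stand-ins; the uintptr_t round trip is what avoids -Wcast-qual. */
#define PTR_UNQUAL(x) ((void *)(uintptr_t)(x))
#define CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))

/* Hypothetical frame descriptor, for illustration only. */
struct frame {
	uint32_t id;
};

/*
 * Qualifier dropped, pointee type unchanged: the void * result converts
 * implicitly to struct frame * in C, so no type has to be spelled out.
 */
static struct frame *
frame_unqual(const struct frame *f)
{
	return PTR_UNQUAL(f);
}

/*
 * Qualifier dropped and pointee type changed: the target type is named at
 * the call site, which keeps the conversion visible to the reader.
 */
static uint8_t *
frame_bytes(const struct frame *f)
{
	return CAST_PTR(uint8_t *, f);
}
```

Writing through either result is only valid when the underlying object was not itself defined const; the macros silence the warning, they do not change what the language permits.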
* Re: [PATCH v13 0/3] add diagnostics macros to make code portable 2025-01-16 1:55 ` [PATCH v13 0/3] " Andre Muezerie ` (2 preceding siblings ...) 2025-01-16 1:55 ` [PATCH v13 3/3] drivers/net: " Andre Muezerie @ 2025-01-16 8:58 ` Bruce Richardson 3 siblings, 0 replies; 87+ messages in thread From: Bruce Richardson @ 2025-01-16 8:58 UTC (permalink / raw) To: Andre Muezerie; +Cc: dev, stephen On Wed, Jan 15, 2025 at 05:55:08PM -0800, Andre Muezerie wrote: > It was a common pattern to have "GCC diagnostic ignored" pragmas > sprinkled over the code and only activate these pragmas for certain > compilers (gcc and clang). Clang supports GCC's pragma for > compatibility with existing source code, so #pragma GCC diagnostic > and #pragma clang diagnostic are synonyms for Clang > (https://clang.llvm.org/docs/UsersManual.html). > > Now that effort is being made to make the code compatible with MSVC > these expressions would become more complex. It makes sense to hide > this complexity behind macros. This makes maintenance easier as these > macros are defined in a single place. As a plus the code becomes > more readable as well. > Series-acked-by: Bruce Richardson <bruce.richardson@intel.com> > v13: > * Renamed RTE_IGNORE_CAST_QUAL into RTE_PTR_DROP_QUALIFIERS. > * Added (void *) cast to RTE_PTR_DROP_QUALIFIERS to avoid the need > for casting the result in most places where the macro is used. > > v12: > * Added macro RTE_IGNORE_CAST_QUAL and used it as a more compact and > readable form to suppress warnings where a cast is used to remove > a type qualifier. > > v11: > * Added __rte_diagnostic_ignored_wcast_qual to a few more places where > it was needed. > > v10: > * Added __rte_diagnostic_ignored_wcast_qual to a few more places where > it was needed. > > v9: > * Added __rte_diagnostic_ignored_wcast_qual to a few more places where > it was needed. > > v8: > * Added __rte_diagnostic_ignored_wcast_qual to a few more places where > it was needed. 
> > v7: > * Added __rte_diagnostic_ignored_wcast_qual to a few more places where > it was needed. > > v6: > * Added __rte_diagnostic_ignored_wcast_qual to a few more places where > it was needed. > > v5: > * Added __rte_diagnostic_ignored_wcast_qual to a few more places where > it was needed. > > v4: > * Added __rte_diagnostic_ignored_wcast_qual to a few more places where > it was needed. > > v3: > * Added __rte_diagnostic_ignored_wcast_qual to a few more places where > it was needed. > > v2: > * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced > in v1). > * Removed the pragmas from many files where they were not needed. > * In the files where the pragmas were indeed needed, reduced the > scope during which they are active, reducing the chance that > unforeseen issues are hidden due to warning suppression. > > Andre Muezerie (3): > eal: add diagnostics macros to make code portable > drivers/common: add diagnostics macros to make code portable > drivers/net: add diagnostics macros to make code portable > > drivers/common/idpf/idpf_common_rxtx_avx512.c | 77 ++++++++--------- > drivers/net/axgbe/axgbe_rxtx.h | 9 -- > drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 - > drivers/net/dpaa2/dpaa2_rxtx.c | 15 +--- > drivers/net/fm10k/fm10k_rxtx_vec.c | 20 ++--- > drivers/net/hns3/hns3_rxtx_vec_neon.h | 6 +- > .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 - > drivers/net/i40e/i40e_rxtx_common_avx.h | 24 +++--- > drivers/net/i40e/i40e_rxtx_vec_altivec.c | 23 ++--- > drivers/net/i40e/i40e_rxtx_vec_avx2.c | 40 +++++---- > drivers/net/i40e/i40e_rxtx_vec_avx512.c | 30 ++++--- > drivers/net/i40e/i40e_rxtx_vec_common.h | 4 - > drivers/net/i40e/i40e_rxtx_vec_neon.c | 39 ++++----- > drivers/net/i40e/i40e_rxtx_vec_sse.c | 32 +++---- > drivers/net/iavf/iavf_rxtx_vec_avx2.c | 84 ++++++++++--------- > drivers/net/iavf/iavf_rxtx_vec_avx512.c | 78 ++++++++--------- > drivers/net/iavf/iavf_rxtx_vec_common.h | 12 ++- > drivers/net/iavf/iavf_rxtx_vec_neon.c | 26 +++--- > 
drivers/net/iavf/iavf_rxtx_vec_sse.c | 52 ++++++------ > drivers/net/ice/ice_rxtx_common_avx.h | 24 +++--- > drivers/net/ice/ice_rxtx_vec_avx2.c | 74 ++++++++-------- > drivers/net/ice/ice_rxtx_vec_avx512.c | 64 ++++++-------- > drivers/net/ice/ice_rxtx_vec_common.h | 4 - > drivers/net/ice/ice_rxtx_vec_sse.c | 28 +++---- > drivers/net/idpf/idpf_rxtx_vec_common.h | 4 - > .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 - > drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 18 ++-- > drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 20 ++--- > drivers/net/mlx5/mlx5_flow.c | 5 +- > drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 -- > drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 18 ++-- > drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 71 +++++++++------- > drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 8 +- > drivers/net/tap/tap_flow.c | 6 +- > drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 8 +- > drivers/net/virtio/virtio_rxtx_simple.c | 4 - > lib/eal/include/rte_common.h | 29 +++++++ > 37 files changed, 466 insertions(+), 503 deletions(-) > > -- > 2.47.2.vfs.0.0 > ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v14 0/3] add diagnostics macros to make code portable 2024-12-27 1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie ` (14 preceding siblings ...) 2025-01-16 1:55 ` [PATCH v13 0/3] " Andre Muezerie @ 2025-01-18 2:46 ` Andre Muezerie 2025-01-18 2:46 ` [PATCH v14 1/3] eal: " Andre Muezerie ` (2 more replies) 2025-01-18 21:55 ` [PATCH v15 0/3] " Andre Muezerie 2025-01-21 22:36 ` [PATCH v16 0/3] " Andre Muezerie 17 siblings, 3 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-18 2:46 UTC (permalink / raw) To: andremue; +Cc: dev, stephen, bruce.richardson v14: * Renamed RTE_PTR_DROP_QUALIFIERS into RTE_PTR_UNQUAL to more resemble C23 typeof_unqual. * Added macro RTE_CAST_PTR to make the cast more readable when removing a type qualifier from a pointer. v13: * Renamed RTE_IGNORE_CAST_QUAL into RTE_PTR_DROP_QUALIFIERS. * Added (void *) cast to RTE_PTR_DROP_QUALIFIERS to avoid the need for casting the result in most places where the macro is used. v12: * Added macro RTE_IGNORE_CAST_QUAL and used it as a more compact and readable form to suppress warnings where a cast is used to remove a type qualifier. v11: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v10: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v9: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v8: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v7: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v6: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v5: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v4: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. v3: * Added __rte_diagnostic_ignored_wcast_qual to a few more places where it was needed. 
v2: * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1). * Removed the pragmas from many files where they were not needed. * In the files where the pragmas were indeed needed, reduced the scope during which they are active, reducing the chance that unforeseen issues are hidden due to warning suppression. Andre Muezerie (3): eal: add diagnostics macros to make code portable drivers/common: add diagnostics macros to make code portable drivers/net: add diagnostics macros to make code portable drivers/common/idpf/idpf_common_rxtx_avx512.c | 72 +++++++++--------- drivers/net/axgbe/axgbe_rxtx.h | 9 --- drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 - drivers/net/dpaa2/dpaa2_rxtx.c | 15 +--- drivers/net/fm10k/fm10k_rxtx_vec.c | 21 ++---- drivers/net/hns3/hns3_rxtx_vec_neon.h | 6 +- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 - drivers/net/i40e/i40e_rxtx_common_avx.h | 22 +++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 18 ++--- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 30 ++++---- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 28 +++---- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 - drivers/net/i40e/i40e_rxtx_vec_neon.c | 35 ++++----- drivers/net/i40e/i40e_rxtx_vec_sse.c | 28 +++---- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 60 +++++++-------- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 62 ++++++++-------- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 +-- drivers/net/iavf/iavf_rxtx_vec_neon.c | 22 +++--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 38 +++++----- drivers/net/ice/ice_rxtx_common_avx.h | 18 ++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 74 +++++++++---------- drivers/net/ice/ice_rxtx_vec_avx512.c | 64 +++++++--------- drivers/net/ice/ice_rxtx_vec_common.h | 4 - drivers/net/ice/ice_rxtx_vec_sse.c | 28 +++---- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 - .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 - drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 18 ++--- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 20 ++--- drivers/net/mlx5/mlx5_flow.c | 5 +- 
drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 -- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 18 ++--- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 61 ++++++++------- drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 8 +- drivers/net/tap/tap_flow.c | 6 +- drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 8 +- drivers/net/virtio/virtio_rxtx_simple.c | 4 - lib/eal/include/rte_common.h | 50 +++++++++++++ 37 files changed, 400 insertions(+), 483 deletions(-) -- Series-acked-by: Bruce Richardson <bruce.richardson@intel.com> 2.47.2.vfs.0.1 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v14 1/3] eal: add diagnostics macros to make code portable
  2025-01-18  2:46 ` [PATCH v14 " Andre Muezerie
@ 2025-01-18  2:46   ` Andre Muezerie
  2025-01-18  2:46   ` [PATCH v14 2/3] drivers/common: " Andre Muezerie
  2025-01-18  2:46   ` [PATCH v14 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-18  2:46 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen, bruce.richardson

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 50 ++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 40592f71b1..26da7a7476 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -156,6 +156,56 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/**
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
+/**
+ * Macro to disable compiler warnings about removing a type
+ * qualifier from the target type.
+ */
+#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/**
+ * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
+ * without the compiler emitting a warning.
+ *
+ * Once the project uses C23, the definition below can be used instead
+ * (-Wcast-qual will emit a warning with this new definition though)
+ * #define RTE_PTR_UNQUAL(X) ((typeof_unqual(*X)*)(X))
+ */
+#define RTE_PTR_UNQUAL(X) ((void *)(uintptr_t)(X))
+
+/**
+ * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer
+ * and cast it to a specific type, without the compiler emitting a warning.
+ *
+ * @warning
+ * Although this macro can be abused for casting a pointer to point to a different type,
+ * alignment may be incorrect when casting to point to a larger type. E.g.:
+ *   struct s {
+ *       uint16_t a;
+ *       uint8_t b;
+ *       uint8_t c;
+ *       uint8_t d;
+ *   } v;
+ *   uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
+ */
+#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))
+
 /**
  * Mark a function or variable to a weak reference.
  */
-- 
2.47.2.vfs.0.1

^ permalink raw reply	[flat|nested] 87+ messages in thread
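A scoped suppression built from push, ignored and pop macros like those in this patch might look like the sketch below. The names diag_push, diag_pop and diag_ignored_wcast_qual are local stand-ins (the double-underscore names are left to the real header), the compiler guard is simplified to a __GNUC__ check because RTE_TOOLCHAIN_MSVC is a DPDK build-system define, and find_char() is a hypothetical function whose direct cast would otherwise trigger -Wcast-qual.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the patch's macros; empty on compilers without GCC pragmas. */
#if defined(__GNUC__) && !defined(__INTEL_COMPILER)
#define diag_push _Pragma("GCC diagnostic push")
#define diag_pop _Pragma("GCC diagnostic pop")
#define diag_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define diag_push
#define diag_pop
#define diag_ignored_wcast_qual
#endif

/* The warning is disabled only between push and pop. */
diag_push
diag_ignored_wcast_qual
static char *
find_char(const char *s, char c)
{
	for (; *s != '\0'; s++) {
		if (*s == c)
			return (char *)s; /* direct cast drops const */
	}
	return NULL;
}
diag_pop
```

Keeping the ignored region to a single function follows the intent stated in the v2 changelog: the narrower the scope, the less chance that an unrelated qualifier problem is hidden by the suppression.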
* [PATCH v14 2/3] drivers/common: add diagnostics macros to make code portable 2025-01-18 2:46 ` [PATCH v14 " Andre Muezerie 2025-01-18 2:46 ` [PATCH v14 1/3] eal: " Andre Muezerie @ 2025-01-18 2:46 ` Andre Muezerie 2025-01-18 2:46 ` [PATCH v14 3/3] drivers/net: " Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-18 2:46 UTC (permalink / raw) To: andremue; +Cc: dev, stephen, bruce.richardson It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 72 +++++++++---------- 1 file changed, 34 insertions(+), 38 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..81052e72c1 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,7 +30,7 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -108,8 +104,8 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &rxdp->read), dma_addr0_3); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &(rxdp + 4)->read), dma_addr4_7); } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +160,8 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_storeu_si128((__m128i *)&rxdp[i].read, - dma_addr0); + _mm_storeu_si128(RTE_CAST_PTR + (__m128i *, &rxdp[i].read), dma_addr0); } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +212,10 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); - _mm512_storeu_si512((void *)rxdp, desc0_1); - 
_mm512_storeu_si512((void *)(rxdp + 2), desc2_3); - _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); - _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); + _mm512_storeu_si512(RTE_CAST_PTR(void *, rxdp), desc0_1); + _mm512_storeu_si512(RTE_CAST_PTR(void *, (rxdp + 2)), desc2_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, (rxdp + 4)), desc4_5); + _mm512_storeu_si512(RTE_CAST_PTR(void *, (rxdp + 6)), desc6_7); rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -337,28 +333,28 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,7 +556,7 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < 
IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i], + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i]), dma_addr0); } } @@ -634,7 +630,7 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; - _mm_storeu_si128((__m128i *)&rxdp[i], + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, &rxdp[i]), dma_addr0); } } @@ -798,28 +794,28 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1127,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); 
- _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1174,7 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ @@ -1435,7 +1431,7 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -1480,7 +1476,7 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ @@ -1521,11 +1517,11 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); - idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); + idpf_splitq_vtx(txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); - idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); + idpf_splitq_vtx1(txdp, *tx_pkts++, cmd_dtype); nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1536,7 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); - idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); + idpf_splitq_vtx(txdp, tx_pkts, nb_commit, cmd_dtype); tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.2.vfs.0.1 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v14 3/3] drivers/net: add diagnostics macros to make code portable 2025-01-18 2:46 ` [PATCH v14 " Andre Muezerie 2025-01-18 2:46 ` [PATCH v14 1/3] eal: " Andre Muezerie 2025-01-18 2:46 ` [PATCH v14 2/3] drivers/common: " Andre Muezerie @ 2025-01-18 2:46 ` Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-18 2:46 UTC (permalink / raw) To: andremue; +Cc: dev, stephen, bruce.richardson It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 --- drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 - drivers/net/dpaa2/dpaa2_rxtx.c | 15 +--- drivers/net/fm10k/fm10k_rxtx_vec.c | 21 ++---- drivers/net/hns3/hns3_rxtx_vec_neon.h | 6 +- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 - drivers/net/i40e/i40e_rxtx_common_avx.h | 22 +++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 18 ++--- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 30 ++++---- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 28 +++---- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 - drivers/net/i40e/i40e_rxtx_vec_neon.c | 35 ++++----- drivers/net/i40e/i40e_rxtx_vec_sse.c | 28 +++---- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 60 +++++++-------- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 62 ++++++++-------- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 +-- drivers/net/iavf/iavf_rxtx_vec_neon.c | 22 +++--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 38 +++++----- drivers/net/ice/ice_rxtx_common_avx.h | 18 ++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 74 +++++++++---------- drivers/net/ice/ice_rxtx_vec_avx512.c | 64 +++++++--------- drivers/net/ice/ice_rxtx_vec_common.h | 4 - drivers/net/ice/ice_rxtx_vec_sse.c | 28 +++---- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 - .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 - drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 18 ++--- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 20 ++--- drivers/net/mlx5/mlx5_flow.c | 5 +- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 -- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 18 ++--- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 61 ++++++++------- drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 8 +- drivers/net/tap/tap_flow.c | 6 +- drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 8 +- drivers/net/virtio/virtio_rxtx_simple.c | 4 - 35 files changed, 316 insertions(+), 445 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ 
b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..bfb5542bbc 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,7 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } - fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); + fd[num_rx] = RTE_PTR_UNQUAL(qbman_result_DQ_fd(dq_storage)); dq_storage++; num_rx++; @@ -2118,8 +2110,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif 
defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..715c891c30 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,7 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].q, - dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].q), dma_addr0); } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +311,8 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->q), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->q), dma_addr1); /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +460,7 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs0[3] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +472,11 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs0[2] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp + 2)); rte_compiler_barrier(); - descs0[1] = 
_mm_loadu_si128((__m128i *)(rxdp + 1)); + descs0[1] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp + 1)); rte_compiler_barrier(); - descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs0[0] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +731,7 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..f2e155c9f5 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,8 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; - vst1q_u64((uint64_t *)&desc->addr, val1); - vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &desc->addr), val1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &desc->tx.outer_vlan_tag), val2); } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 
85958d6c81..b66a808f9f 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,7 +32,7 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -72,8 +68,8 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } #else #ifdef __AVX512VL__ @@ -144,8 +140,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, + &rxdp->read), dma_addr0_3); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, + &(rxdp + 4)->read), dma_addr4_7); } } else #endif /* __AVX512VL__*/ @@ -190,8 +188,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ - _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); - _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); + 
_mm256_store_si256(RTE_CAST_PTR(__m256i *, + &rxdp->read), dma_addr0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, + &(rxdp + 2)->read), dma_addr2_3); } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..f9e26d18dd 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -44,7 +42,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; vec_st(dma_addr0, 0, - (__vector unsigned long *)&rxdp[i].read); + RTE_CAST_PTR(__vector unsigned long *, &rxdp[i].read)); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +82,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); - vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); + vec_st(dma_addr0, 0, RTE_CAST_PTR(__vector unsigned long *, &rxdp++->read)); + vec_st(dma_addr1, 0, RTE_CAST_PTR(__vector unsigned long *, &rxdp++->read)); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +284,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = *(__vector unsigned long *)(rxdp + 3); + descs[3] = *RTE_CAST_PTR(__vector unsigned long *, rxdp + 3); rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +294,11 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ - descs[2] = *(__vector unsigned long *)(rxdp + 2); + descs[2] = 
*RTE_CAST_PTR(__vector unsigned long *, rxdp + 2); rte_compiler_barrier(); - descs[1] = *(__vector unsigned long *)(rxdp + 1); + descs[1] = *RTE_CAST_PTR(__vector unsigned long *, rxdp + 1); rte_compiler_barrier(); - descs[0] = *(__vector unsigned long *)(rxdp); + descs[0] = *RTE_CAST_PTR(__vector unsigned long *, rxdp); /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +532,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; - *(__vector unsigned long *)txdp = descriptor; + *RTE_CAST_PTR(__vector unsigned long *, txdp) = descriptor; } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..86f06d67fb 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,8 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ - __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); - __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); + __m128i *rxdp_desc_0 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 0].wb.qword2); + __m128i *rxdp_desc_1 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 1].wb.qword2); const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,21 +272,21 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, 
rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +691,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void @@ -728,8 +724,8 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp + 2), desc2_3); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp), 
desc0_1); } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..587cbfb6d9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,8 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ - __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); - __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); + __m128i *rxdp_desc_0 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 0].wb.qword2); + __m128i *rxdp_desc_1 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 1].wb.qword2); const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -264,28 +260,28 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* load in descriptors, in reverse order */ const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + 
_mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +871,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void @@ -909,7 +905,7 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..12c8521824 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +38,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) 
rxq->nb_rx_desc) { for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, zero); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i].read), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,11 +55,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +84,10 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; - desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); - desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); - desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); - desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); + desc0_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 0)->wb.qword2)); + desc1_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 1)->wb.qword2)); + desc2_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 2)->wb.qword2)); + desc3_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 3)->wb.qword2)); /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,18 +418,18 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[0] = vld1q_u64((uint64_t *)(rxdp)); + descs[3] = 
vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3)); + descs[2] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2)); + descs[1] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1)); + descs[0] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp)); /* Use acquire fence to order loads of descriptor qwords */ rte_atomic_thread_fence(rte_memory_order_acquire); /* A.2 reload qword0 to make it ordered after qword1 load */ - descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0); - descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); - descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); - descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); + descs[3] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3), descs[3], 0); + descs[2] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2), descs[2], 0); + descs[1] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1), descs[1], 0); + descs[0] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp), descs[0], 0); /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 +659,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; - vst1q_u64((uint64_t *)txdp, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor); } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..3fb97f4528 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +37,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, 
&rxdp[i].read), dma_addr0); } } @@ -72,8 +68,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +93,10 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; - desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); - desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); - desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); - desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2); + desc0_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 0)->wb.qword2)); + desc1_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 1)->wb.qword2)); + desc2_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 2)->wb.qword2)); + desc3_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 3)->wb.qword2)); /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i v_unpack_01, v_unpack_23; @@ -462,7 +458,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +470,11 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = 
_mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +677,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..8cc61c484f 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,21 +189,21 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 
= _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +505,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -743,28 +739,28 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i 
raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc6_7 = _mm256_inserti128_si256 @@ -961,35 +957,35 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, /* load bottom half of every 32B desc */ const __m128i raw_desc_bh7 = _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh6 = _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh5 = _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh4 = _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh3 = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh2 = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh1 = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh0 = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1660,7 @@ 
iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -1719,8 +1715,8 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp + 2), desc2_3); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp), desc0_1); } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..3c238db104 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -165,28 +161,28 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + 
_mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +596,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -734,28 +730,28 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - 
_mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1113,35 +1109,35 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, /* load bottom half of every 32B desc */ const __m128i raw_desc_bh7 = _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh6 = _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh5 = _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh4 = _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh3 = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh2 = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh1 = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh0 = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1979,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } #define 
IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2033,7 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ @@ -2225,7 +2221,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); - _mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); + _mm256_storeu_si256(RTE_CAST_PTR(__m256i *, txdp), ctx_data_desc); } static __rte_always_inline void @@ -2300,7 +2296,7 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..18513a6d7f 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,7 +418,7 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -458,8 +454,8 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i 
*)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c index 04be574683..f1b82c7d56 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_neon.c +++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c @@ -36,7 +36,7 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxep[i] = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, zero); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i].read), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -53,11 +53,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH; @@ -269,18 +269,18 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[0] = vld1q_u64((uint64_t *)(rxdp)); + descs[3] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3)); + descs[2] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2)); + descs[1] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1)); + descs[0] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp)); /* Use acquire fence to order loads of descriptor qwords */ rte_atomic_thread_fence(rte_memory_order_acquire); /* A.2 reload qword0 to make 
it ordered after qword1 load */ - descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0); - descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); - descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); - descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); + descs[3] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3), descs[3], 0); + descs[2] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2), descs[2], 0); + descs[1] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1), descs[1], 0); + descs[0] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp), descs[0], 0); /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..490604bec8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,7 +34,7 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -69,8 +65,8 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +574,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* 
A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +586,11 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +779,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +860,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +872,11 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - 
descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -928,16 +924,16 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ descs_bh[3] = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); descs_bh[2] = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); descs_bh[1] = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1345,7 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..c62e60c70e 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,7 +29,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i 
*)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -77,8 +73,8 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } #else #ifdef __AVX512VL__ @@ -157,8 +153,8 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &rxdp->read), dma_addr0_3); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &(rxdp + 4)->read), dma_addr4_7); } } else #endif /* __AVX512VL__ */ @@ -213,8 +209,8 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ - _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); - _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, &rxdp->read), dma_addr0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, &(rxdp + 2)->read), dma_addr2_3); } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..b7c67d6396 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,21 +250,29 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf 
**rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,37 +448,29 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ - const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + const __m128i raw_desc_bh7 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1)); 
rte_compiler_barrier(); - const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + const __m128i raw_desc_bh6 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + const __m128i raw_desc_bh5 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + const __m128i raw_desc_bh4 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + const __m128i raw_desc_bh3 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + const __m128i raw_desc_bh2 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + const __m128i raw_desc_bh1 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + const __m128i raw_desc_bh0 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +786,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -841,8 +837,8 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x 
(hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp + 2), desc2_3); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp), desc0_1); } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..a770710ea0 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -244,28 +240,28 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, /* load in descriptors, in reverse order */ const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); 
raw_desc6_7 = _mm256_inserti128_si256 @@ -474,37 +470,29 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ - const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + const __m128i raw_desc_bh7 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + const __m128i raw_desc_bh6 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + const __m128i raw_desc_bh5 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + const __m128i raw_desc_bh4 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + const __m128i raw_desc_bh3 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + const __m128i raw_desc_bh2 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + const __m128i raw_desc_bh1 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + const __m128i raw_desc_bh0 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); 
__m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +975,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -1029,7 +1017,7 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..42eaea7326 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,7 +48,7 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -91,8 +87,8 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - 
_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +421,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +433,11 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -491,19 +487,19 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* load bottom half of every 32B desc */ const __m128i raw_desc_bh3 = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh2 = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh1 = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); 
const __m128i raw_desc_bh0 = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); /** * to shift the 32b RSS hash value to the @@ -680,7 +676,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..fa5702588c 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -36,7 +34,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) 
{ rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i].read), zero); } } @@ -60,12 +58,12 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -367,10 +365,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); /* A. load 4 pkts descs */ - descs[0] = vld1q_u64((uint64_t *)(rxdp)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); + descs[0] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp)); + descs[1] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1)); + descs[2] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2)); + descs[3] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3)); /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); @@ -554,7 +552,7 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, pkt->buf_iova + pkt->data_off, (uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len}; - vst1q_u64((uint64_t *)&txdp->read, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &txdp->read), descriptor); } static inline void diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..5c1dcb568f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER 
-#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,7 +37,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -76,8 +72,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +462,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +474,11 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +672,7 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | 
pkt->data_len, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)&txdp->read, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &txdp->read), descriptor); } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..bd13a243d5 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,7 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" - tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop + tunnel = RTE_PTR_UNQUAL(flow->tunnel); return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..69419610b3 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * @@ -75,7 +73,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { volatile struct mlx5_mini_cqe8 *mcq = - (void *)&(cq + !rxq->cqe_comp_layout)->pkt_info; + (volatile struct mlx5_mini_cqe8 *)&(cq + !rxq->cqe_comp_layout)->pkt_info; /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? 
&rxq->title_pkt : elts[0]; unsigned int pos; @@ -139,9 +137,9 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, */ cycle: if (rxq->cqe_comp_layout) - rte_prefetch0((void *)(cq + mcqe_n)); + rte_prefetch0((volatile void *), cq + mcqe_n); for (pos = 0; pos < mcqe_n; ) { - uint8_t *p = (void *)&mcq[pos % 8]; + uint8_t *p = RTE_CAST_PTR(uint8_t *, &mcq[pos % 8]); uint8_t *e0 = (void *)&elts[pos]->rearm_data; uint8_t *e1 = (void *)&elts[pos + 1]->rearm_data; uint8_t *e2 = (void *)&elts[pos + 2]->rearm_data; @@ -157,7 +155,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((volatile void *)(cq + pos + i)); __asm__ volatile ( /* A.1 load mCQEs into a 128bit register. */ "ld1 {v16.16b - v17.16b}, [%[mcq]] \n\t" @@ -367,8 +365,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)&(cq + pos)->pkt_info; + rte_prefetch0((volatile void *)(cq + pos + 8)); + mcq = (volatile struct mlx5_mini_cqe8 *)&(cq + pos)->pkt_info; for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -383,7 +381,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile struct mlx5_mini_cqe8 *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -663,7 +661,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, mask = vcreate_u16(pkts_n - pos < MLX5_VPMD_DESCS_PER_LOOP ? 
-1UL >> ((pkts_n - pos) * sizeof(uint16_t) * 8) : 0); - p0 = (void *)&cq[pos].pkt_info; + p0 = &cq[pos].pkt_info; p1 = p0 + (pkts_n - pos > 1) * sizeof(struct mlx5_cqe); p2 = p1 + (pkts_n - pos > 2) * sizeof(struct mlx5_cqe); p3 = p2 + (pkts_n - pos > 3) * sizeof(struct mlx5_cqe); diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..574df5c407 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile struct mlx5_mini_cqe8 *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -130,7 +127,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, */ cycle: if (rxq->cqe_comp_layout) - rte_prefetch0((void *)(cq + mcqe_n)); + rte_prefetch0((volatile void *)(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -141,10 +138,10 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((volatile void *)(cq + pos + i)); /* A.1 load mCQEs into a 128bit register. 
*/ - mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); - mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); + mcqe1 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &mcq[pos % 8])); + mcqe2 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &mcq[pos % 8 + 2])); /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -355,8 +352,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); + rte_prefetch0((volatile void *)(cq + pos + 8)); + mcq = (volatile struct mlx5_mini_cqe8 *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +368,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile struct mlx5_mini_cqe8 *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,38 +648,38 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. */ p3 = _mm_extract_epi16(p, 3); - cqes[3] = _mm_loadl_epi64((__m128i *) - &cq[pos + p3].sop_drop_qpn); + cqes[3] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos + p3].sop_drop_qpn)); rte_compiler_barrier(); p2 = _mm_extract_epi16(p, 2); - cqes[2] = _mm_loadl_epi64((__m128i *) - &cq[pos + p2].sop_drop_qpn); + cqes[2] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos + p2].sop_drop_qpn)); rte_compiler_barrier(); /* B.1 load mbuf pointers. */ mbp1 = _mm_loadu_si128((__m128i *)&elts[pos]); mbp2 = _mm_loadu_si128((__m128i *)&elts[pos + 2]); /* A.1 load a block having op_own. 
*/ p1 = _mm_extract_epi16(p, 1); - cqes[1] = _mm_loadl_epi64((__m128i *) - &cq[pos + p1].sop_drop_qpn); + cqes[1] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos + p1].sop_drop_qpn)); rte_compiler_barrier(); - cqes[0] = _mm_loadl_epi64((__m128i *) - &cq[pos].sop_drop_qpn); + cqes[0] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos].sop_drop_qpn)); /* B.2 copy mbuf pointers. */ _mm_storeu_si128((__m128i *)&pkts[pos], mbp1); _mm_storeu_si128((__m128i *)&pkts[pos + 2], mbp2); rte_io_rmb(); /* C.1 load remained CQE data and extract necessary fields. */ - cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p3]); - cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos + p2]); + cqe_tmp2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p3])); + cqe_tmp1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p2])); cqes[3] = _mm_blendv_epi8(cqes[3], cqe_tmp2, blend_mask); cqes[2] = _mm_blendv_epi8(cqes[2], cqe_tmp1, blend_mask); - cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p3].csum); - cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos + p2].csum); + cqe_tmp2 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p3].csum)); + cqe_tmp1 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p2].csum)); cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x30); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); - cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); - cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); + cqe_tmp2 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos + p3].rsvd4[2])); + cqe_tmp1 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos + p2].rsvd4[2])); cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,16 +697,16 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. 
*/ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. */ - cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); - cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); + cqe_tmp2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p1])); + cqe_tmp1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos])); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); cqes[0] = _mm_blendv_epi8(cqes[0], cqe_tmp1, blend_mask); - cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p1].csum); - cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos].csum); + cqe_tmp2 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p1].csum)); + cqe_tmp1 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos].csum)); cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x30); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); - cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); - cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); + cqe_tmp2 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos + p1].rsvd4[2])); + cqe_tmp1 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos].rsvd4[2])); cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. 
*/ diff --git a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c index 37075ea5e7..46391c9400 100644 --- a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c +++ b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c @@ -35,7 +35,7 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_NGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i]), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,12 +58,12 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr1); } rxq->rxrearm_start += RTE_NGBE_RXQ_REARM_THRESH; @@ -484,7 +484,7 @@ vtx1(volatile struct ngbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; - vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor); } static inline void diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git 
a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c index d4d647fab5..a56e2f4164 100644 --- a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c +++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c @@ -34,7 +34,7 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_TXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i]), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -57,12 +57,12 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr1); } rxq->rxrearm_start += RTE_TXGBE_RXQ_REARM_THRESH; @@ -484,7 +484,7 @@ vtx1(volatile struct txgbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; - vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor); } static inline void diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.2.vfs.0.1 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v15 0/3] add diagnostics macros to make code portable
  2024-12-27 1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
    ` (15 preceding siblings ...)
  2025-01-18 2:46 ` [PATCH v14 " Andre Muezerie
@ 2025-01-18 21:55 ` Andre Muezerie
  2025-01-18 21:55   ` [PATCH v15 1/3] eal: " Andre Muezerie
    ` (2 more replies)
  2025-01-21 22:36 ` [PATCH v16 0/3] " Andre Muezerie
  17 siblings, 3 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-18 21:55 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen, bruce.richardson

v15:
 * Fixed a comment in rte_common.h to make Doxygen happy.
 * Fixed a typo (extra comma).
 * Added missing RTE_PTR_UNQUAL needed for ARM64.

v14:
 * Renamed RTE_PTR_DROP_QUALIFIERS into RTE_PTR_UNQUAL to more
   resemble C23 typeof_unqual.
 * Added macro RTE_CAST_PTR to make the cast more readable when
   removing a type qualifier from a pointer.

v13:
 * Renamed RTE_IGNORE_CAST_QUAL into RTE_PTR_DROP_QUALIFIERS.
 * Added (void *) cast to RTE_PTR_DROP_QUALIFIERS to avoid the need
   for casting the result in most places where the macro is used.

v12:
 * Added macro RTE_IGNORE_CAST_QUAL and used it as a more compact
   and readable form to suppress warnings where a cast is used to
   remove a type qualifier.

v11:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v10:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v9:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v8:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v7:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v6:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v5:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v4:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v3:
 * Added __rte_diagnostic_ignored_wcast_qual to a few more places where
   it was needed.

v2:
 * Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
 * Removed the pragmas from many files where they were not needed.
 * In the files where the pragmas were indeed needed, reduced the
   scope during which they are active, reducing the chance that
   unforeseen issues are hidden due to warning suppression.

Andre Muezerie (3):
  eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 72 +++++++++---------
 drivers/net/axgbe/axgbe_rxtx.h                |  9 ---
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 -
 drivers/net/dpaa2/dpaa2_rxtx.c                | 15 +---
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 21 ++----
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  6 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 22 +++---
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 18 ++---
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 30 ++++----
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 28 +++----
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 -
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 35 ++++-----
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 28 +++----
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 60 +++++++--------
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 62 ++++++++--------
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 10 +--
 drivers/net/iavf/iavf_rxtx_vec_neon.c         | 22 +++---
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 38 +++++-----
 drivers/net/ice/ice_rxtx_common_avx.h         | 18 ++---
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 74 +++++++++----------
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 64 +++++++---------
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 -
 drivers/net/ice/ice_rxtx_vec_sse.c            | 28 +++----
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 -
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       | 18 ++---
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 20 ++---
 drivers/net/mlx5/mlx5_flow.c                  |  5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         | 18 ++---
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 61 ++++++++-------
 drivers/net/ngbe/ngbe_rxtx_vec_neon.c         |  8 +-
 drivers/net/tap/tap_flow.c                    |  6 +-
 drivers/net/txgbe/txgbe_rxtx_vec_neon.c       |  8 +-
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 -
 lib/eal/include/rte_common.h                  | 46 ++++++++++++
 37 files changed, 396 insertions(+), 483 deletions(-)

-- 
2.47.2.vfs.0.1

^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v15 1/3] eal: add diagnostics macros to make code portable
  2025-01-18 21:55 ` [PATCH v15 0/3] " Andre Muezerie
@ 2025-01-18 21:55   ` Andre Muezerie
  2025-01-21 9:53     ` Morten Brørup
  2025-01-18 21:55   ` [PATCH v15 2/3] drivers/common: " Andre Muezerie
  2025-01-18 21:55   ` [PATCH v15 3/3] drivers/net: " Andre Muezerie
  2 siblings, 1 reply; 87+ messages in thread
From: Andre Muezerie @ 2025-01-18 21:55 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen, bruce.richardson

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 46 ++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 40592f71b1..4b87a0a352 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -156,6 +156,52 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/**
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
+/**
+ * Macro to disable compiler warnings about removing a type
+ * qualifier from the target type.
+ */
+#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
+#define __rte_diagnostic_ignored_wcast_qual \
+		_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/**
+ * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
+ * without the compiler emitting a warning.
+ */
+#define RTE_PTR_UNQUAL(X) ((void *)(uintptr_t)(X))
+
+/**
+ * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer
+ * and cast it to a specific type, without the compiler emitting a warning.
+ *
+ * @warning
+ * Although this macro can be abused for casting a pointer to point to a different type,
+ * alignment may be incorrect when casting to point to a larger type. E.g.:
+ *   struct s {
+ *       uint16_t a;
+ *       uint8_t b;
+ *       uint8_t c;
+ *       uint8_t d;
+ *   } v;
+ *   uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
+ */
+#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))
+
 /**
  * Mark a function or variable to a weak reference.
  */
-- 
2.47.2.vfs.0.1

^ permalink raw reply [flat|nested] 87+ messages in thread
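For readers following along, the macros added by this patch could be exercised roughly as in the sketch below. The macro definitions are copied from the patch so that the snippet builds without DPDK headers; `struct demo_desc` and the `demo_*` helpers are hypothetical names invented for illustration and do not appear anywhere in the series.

```c
#include <assert.h>
#include <stdint.h>

/* Local copies of the macros from the patch, so the sketch builds without DPDK. */
#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
#define __rte_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define __rte_diagnostic_push
#define __rte_diagnostic_pop
#define __rte_diagnostic_ignored_wcast_qual
#endif
#define RTE_PTR_UNQUAL(X) ((void *)(uintptr_t)(X))
#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))

/* Hypothetical descriptor type, for illustration only. */
struct demo_desc {
	uint64_t addr;
	uint64_t flags;
};

/* RTE_CAST_PTR: drop the volatile qualifier and name the target type explicitly. */
static void demo_desc_clear(volatile struct demo_desc *d)
{
	struct demo_desc *plain = RTE_CAST_PTR(struct demo_desc *, d);

	plain->addr = 0;
	plain->flags = 0;
}

/* RTE_PTR_UNQUAL: yields void *, which converts implicitly in C. */
static struct demo_desc *demo_unconst(const struct demo_desc *d)
{
	return RTE_PTR_UNQUAL(d);
}

/* Scoped suppression: -Wcast-qual is silenced only between push and pop. */
static uint64_t *demo_scoped_cast(const uint64_t *p)
{
__rte_diagnostic_push
__rte_diagnostic_ignored_wcast_qual
	uint64_t *q = (uint64_t *)p;
__rte_diagnostic_pop
	return q;
}
```

The first two helpers go through the `uintptr_t` round trip, so no pragma is needed; the third shows the push/ignore/pop form that the series keeps for cases such as wrapping a generated header.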
* RE: [PATCH v15 1/3] eal: add diagnostics macros to make code portable
  2025-01-18 21:55   ` [PATCH v15 1/3] eal: " Andre Muezerie
@ 2025-01-21 9:53     ` Morten Brørup
  2025-01-21 14:28       ` Andre Muezerie
  0 siblings, 1 reply; 87+ messages in thread
From: Morten Brørup @ 2025-01-21 9:53 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev, stephen, bruce.richardson

> From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> Sent: Saturday, 18 January 2025 22.55
>
> It was a common pattern to have "GCC diagnostic ignored" pragmas
> sprinkled over the code and only activate these pragmas for certain
> compilers (gcc and clang). Clang supports GCC's pragma for
> compatibility with existing source code, so #pragma GCC diagnostic
> and #pragma clang diagnostic are synonyms for Clang
> (https://clang.llvm.org/docs/UsersManual.html).
>
> Now that effort is being made to make the code compatible with MSVC
> these expressions would become more complex. It makes sense to hide
> this complexity behind macros. This makes maintenance easier as these
> macros are defined in a single place. As a plus the code becomes
> more readable as well.
>
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
>  lib/eal/include/rte_common.h | 46 ++++++++++++++++++++++++++++++++++++
>  1 file changed, 46 insertions(+)
>
> diff --git a/lib/eal/include/rte_common.h
> b/lib/eal/include/rte_common.h
> index 40592f71b1..4b87a0a352 100644
> --- a/lib/eal/include/rte_common.h
> +++ b/lib/eal/include/rte_common.h
> @@ -156,6 +156,52 @@ typedef uint16_t unaligned_uint16_t;
>  #define RTE_DEPRECATED(x)
>  #endif
>
> +/**
> + * Macros to cause the compiler to remember the state of the diagnostics as of
> + * each push, and restore to that point at each pop.
> + */
> +#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
> +#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
> +#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
> +#else
> +#define __rte_diagnostic_push
> +#define __rte_diagnostic_pop
> +#endif
> +
> +/**
> + * Macro to disable compiler warnings about removing a type
> + * qualifier from the target type.
> + */
> +#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
> +#define __rte_diagnostic_ignored_wcast_qual \
> +		_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
> +#else
> +#define __rte_diagnostic_ignored_wcast_qual
> +#endif
> +
> +/**
> + * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
> + * without the compiler emitting a warning.
> + */
> +#define RTE_PTR_UNQUAL(X) ((void *)(uintptr_t)(X))

It seems the C23 typeof_unqual and the built-in pre-C23 __typeof_unqual__ couldn't be used.
Was it a generic issue, or only when operating on (the return value of) functions?

> +
> +/**
> + * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer
> + * and cast it to a specific type, without the compiler emitting a warning.

Propose new description with emphasis on casting rather than discarding qualifiers:

Workaround to cast a pointer to a specific type,
without the compiler emitting a warning about discarding qualifiers.

> + *
> + * @warning
> + * Although this macro can be abused for casting a pointer to point to a different type,
> + * alignment may be incorrect when casting to point to a larger type. E.g.:

This macro is now meant for abuse, so propose softening the warning:

When casting a pointer to point to a larger type,
the resulting pointer may be misaligned,
which causes undefined behavior.
E.g.:

> + *   struct s {
> + *       uint16_t a;
> + *       uint8_t b;
> + *       uint8_t c;
> + *       uint8_t d;
> + *   } v;
> + *   uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
> + */
> +#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))

I am somewhat concerned about these macros...

There's a good reason why MSVC doesn't allow casting to discard qualifiers or changing the type like this.

If in doubt, read this:
https://www.trust-in-soft.com/resources/blogs/2020-04-06-gcc-always-assumes-aligned-pointer-accesses

We need these workarounds because DPDK currently contains code with formally "undefined behavior".
And instead of fixing the root causes, we choose the pragmatic solution and introduce workarounds to allow it.

Would it be possible for the RTE_CAST_PTR macro to check if the casted-to pointer changes from a smaller type to a larger type, and warn/fail if it does?

Should the use of these workaround macros be disallowed in new code?
I.e. should checkpatches check for them?

^ permalink raw reply [flat|nested] 87+ messages in thread
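As an illustration of the question above, one possible shape for such a check is sketched below. This is not part of the series and not a proposal from the thread: `DEMO_CAST_PTR_CHECKED` and `demo_ptr_is_aligned` are hypothetical names, and the sketch assumes C11 `_Alignof` plus the GCC/Clang `__typeof__` extension, so it would not build as-is with every toolchain. It compares alignment requirements rather than sizes, since misalignment is the actual hazard in the `struct s` example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy of the macro from the patch. */
#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))

/*
 * Hypothetical variant (not in the patch): refuse to compile when the
 * pointed-to target type requires stricter alignment than the pointed-to
 * source type. A negative array size forces the compile error.
 */
#define DEMO_CAST_PTR_CHECKED(type, ptr) \
	((void)sizeof(char[(_Alignof(__typeof__(*(type)0)) <= \
			    _Alignof(__typeof__(*(ptr)))) ? 1 : -1]), \
	 RTE_CAST_PTR(type, ptr))

/* Hypothetical run-time counterpart: is the address a multiple of align? */
static int demo_ptr_is_aligned(const volatile void *ptr, size_t align)
{
	return ((uintptr_t)ptr % align) == 0;
}

/* The struct from the @warning example in the patch. */
struct s {
	uint16_t a;
	uint8_t b;
	uint8_t c;
	uint8_t d;
};
```

With this sketch, `DEMO_CAST_PTR_CHECKED(uint16_t *, &v.c)` would be rejected at build time because `uint8_t` is less strictly aligned than `uint16_t`. The static check is conservative: it only sees declared types, so it would likely also reject casts in the vector drivers where the memory happens to be suitably aligned at run time, which is where a run-time assertion along the lines of `demo_ptr_is_aligned` might fit better.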
* Re: [PATCH v15 1/3] eal: add diagnostics macros to make code portable
  2025-01-21 9:53     ` Morten Brørup
@ 2025-01-21 14:28       ` Andre Muezerie
  2025-01-21 14:41         ` Morten Brørup
  2025-01-21 15:01         ` Stephen Hemminger
  0 siblings, 2 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-21 14:28 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev, stephen, bruce.richardson

On Tue, Jan 21, 2025 at 10:53:14AM +0100, Morten Brørup wrote:
> > From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> > Sent: Saturday, 18 January 2025 22.55
> >
> > It was a common pattern to have "GCC diagnostic ignored" pragmas
> > sprinkled over the code and only activate these pragmas for certain
> > compilers (gcc and clang). Clang supports GCC's pragma for
> > compatibility with existing source code, so #pragma GCC diagnostic
> > and #pragma clang diagnostic are synonyms for Clang
> > (https://clang.llvm.org/docs/UsersManual.html).
> >
> > Now that effort is being made to make the code compatible with MSVC
> > these expressions would become more complex. It makes sense to hide
> > this complexity behind macros. This makes maintenance easier as these
> > macros are defined in a single place. As a plus the code becomes
> > more readable as well.
> >
> > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > ---
> >  lib/eal/include/rte_common.h | 46 ++++++++++++++++++++++++++++++++++++
> >  1 file changed, 46 insertions(+)
> >
> > diff --git a/lib/eal/include/rte_common.h
> > b/lib/eal/include/rte_common.h
> > index 40592f71b1..4b87a0a352 100644
> > --- a/lib/eal/include/rte_common.h
> > +++ b/lib/eal/include/rte_common.h
> > @@ -156,6 +156,52 @@ typedef uint16_t unaligned_uint16_t;
> >  #define RTE_DEPRECATED(x)
> >  #endif
> >
> > +/**
> > + * Macros to cause the compiler to remember the state of the diagnostics as of
> > + * each push, and restore to that point at each pop.
> > + */
> > +#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
> > +#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
> > +#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
> > +#else
> > +#define __rte_diagnostic_push
> > +#define __rte_diagnostic_pop
> > +#endif
> > +
> > +/**
> > + * Macro to disable compiler warnings about removing a type
> > + * qualifier from the target type.
> > + */
> > +#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
> > +#define __rte_diagnostic_ignored_wcast_qual \
> > +		_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
> > +#else
> > +#define __rte_diagnostic_ignored_wcast_qual
> > +#endif
> > +
> > +/**
> > + * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
> > + * without the compiler emitting a warning.
> > + */
> > +#define RTE_PTR_UNQUAL(X) ((void *)(uintptr_t)(X))
>
> It seems the C23 typeof_unqual and the built-in pre-C23 __typeof_unqual__ couldn't be used.
> Was it a generic issue, or only when operating on (the return value of) functions?

I experimented with C23 typeof_unqual. It indeed works on gcc, clang and MSVC, but there are some details:

a) With gcc the project needs to be compiled with -std=c2x. Many other warnings show up, unrelated to the scope of this patchset. Some look suspicious and should be looked at. An error also showed up, for which I sent out a small patch.

b) When using typeof_unqual and passing "-Wcast-qual" to the compiler, a warning about the qualifier being dropped is emitted. The project currently uses "-Wcast-qual". Perhaps it shouldn't?

Due to (a) I decided to not use typeof_unqual for now, but it would be trivial to change the macro to do so in the future.

>
> > +
> > +/**
> > + * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer
> > + * and cast it to a specific type, without the compiler emitting a warning.
>
> Propose new description with emphasis on casting rather than discarding qualifiers:
>
> Workaround to cast a pointer to a specific type,
> without the compiler emitting a warning about discarding qualifiers.
>

I'll update this.

> > + *
> > + * @warning
> > + * Although this macro can be abused for casting a pointer to point to a different type,
> > + * alignment may be incorrect when casting to point to a larger type. E.g.:
>
> This macro is now meant for abuse, so propose softening the warning:
>
> When casting a pointer to point to a larger type,
> the resulting pointer may be misaligned,
> which causes undefined behavior.

I'll update this.

> E.g.:
>
> > + *   struct s {
> > + *       uint16_t a;
> > + *       uint8_t b;
> > + *       uint8_t c;
> > + *       uint8_t d;
> > + *   } v;
> > + *   uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
> > + */
> > +#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))
>
> I am somewhat concerned about these macros...
>
> There's a good reason why MSVC doesn't allow casting to discard qualifiers or changing the type like this.
>
> If in doubt, read this:
> https://www.trust-in-soft.com/resources/blogs/2020-04-06-gcc-always-assumes-aligned-pointer-accesses
>
> We need these workarounds because DPDK currently contains code with formally "undefined behavior".
> And instead of fixing the root causes, we choose the pragmatic solution and introduce workarounds to allow it.
>
> Would it be possible for the RTE_CAST_PTR macro to check if the casted-to pointer changes from a smaller type to a larger type, and warn/fail if it does?

I'll think about it.

>
> Should the use of these workaround macros be disallowed in new code?
> I.e. should checkpatches check for them?

We can certainly add a check to checkpatches.

^ permalink raw reply [flat|nested] 87+ messages in thread
* RE: [PATCH v15 1/3] eal: add diagnostics macros to make code portable
  2025-01-21 14:28     ` Andre Muezerie
@ 2025-01-21 14:41       ` Morten Brørup
  2025-01-21 20:17         ` Andre Muezerie
  2025-01-21 15:01       ` Stephen Hemminger
  1 sibling, 1 reply; 87+ messages in thread
From: Morten Brørup @ 2025-01-21 14:41 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev, stephen, bruce.richardson

> From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> Sent: Tuesday, 21 January 2025 15.28
>
> On Tue, Jan 21, 2025 at 10:53:14AM +0100, Morten Brørup wrote:
> > > From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> > > Sent: Saturday, 18 January 2025 22.55
> > >
> > > It was a common pattern to have "GCC diagnostic ignored" pragmas
> > > sprinkled over the code and only activate these pragmas for certain
> > > compilers (gcc and clang). Clang supports GCC's pragma for
> > > compatibility with existing source code, so #pragma GCC diagnostic
> > > and #pragma clang diagnostic are synonyms for Clang
> > > (https://clang.llvm.org/docs/UsersManual.html).
> > >
> > > Now that effort is being made to make the code compatible with MSVC
> > > these expressions would become more complex. It makes sense to hide
> > > this complexity behind macros. This makes maintenance easier as these
> > > macros are defined in a single place. As a plus the code becomes
> > > more readable as well.
> > >
> > > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > > ---
> > > lib/eal/include/rte_common.h | 46 ++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 46 insertions(+)
> > >
> > > diff --git a/lib/eal/include/rte_common.h
> > > b/lib/eal/include/rte_common.h
> > > index 40592f71b1..4b87a0a352 100644
> > > --- a/lib/eal/include/rte_common.h
> > > +++ b/lib/eal/include/rte_common.h
> > > @@ -156,6 +156,52 @@ typedef uint16_t unaligned_uint16_t;
> > > #define RTE_DEPRECATED(x)
> > > #endif
> > >
> > > +/**
> > > + * Macros to cause the compiler to remember the state of the diagnostics as of
> > > + * each push, and restore to that point at each pop.
> > > + */
> > > +#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
> > > +#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
> > > +#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
> > > +#else
> > > +#define __rte_diagnostic_push
> > > +#define __rte_diagnostic_pop
> > > +#endif
> > > +
> > > +/**
> > > + * Macro to disable compiler warnings about removing a type
> > > + * qualifier from the target type.
> > > + */
> > > +#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
> > > +#define __rte_diagnostic_ignored_wcast_qual \
> > > + _Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
> > > +#else
> > > +#define __rte_diagnostic_ignored_wcast_qual
> > > +#endif
> > > +
> > > +/**
> > > + * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
> > > + * without the compiler emitting a warning.
> > > + */
> > > +#define RTE_PTR_UNQUAL(X) ((void *)(uintptr_t)(X))
> >
> > It seems the C23 typeof_unqual and the built-in pre-C23 __typeof_unqual__ couldn't be used.
> > Was it a generic issue, or only when operating on (the return value of) functions?
>
> I experimented with C23 typeof_unqual. It indeed works on gcc, clang
> and MSVC, but there are some details:
>
> a) With gcc the project needs to be compiled with -std=c2x. Many other
> warnings show up, unrelated to the scope of this patchset. Some look
> suspicious and should be looked at. An error also showed up, for which
> I sent out a small patch.
>
> b) When using typeof_unqual and passing "-Wcast-qual" to the compiler,
> a warning about the qualifier being dropped is emitted. The project
> currently uses "-Wcast-qual". Perhaps it shouldn't?

The compiler is our friend; when more warnings are enabled, the code quality requirements are higher.
Although this statement may not be universally true, I think it is for "-Wcast-qual".

>
> Due to (a) I decided to not use typeof_unqual for now, but it would be
> trivial to change the macro to do so in the future.

How about __typeof_unqual__ (with double underscores prefix and postfix)?
It seems to be available in both GCC [1] and MSVC [2] without requiring C23.

[1]: https://gcc.gnu.org/onlinedocs/gcc/Typeof.html
[2]: https://learn.microsoft.com/en-us/cpp/c-language/typeof-unqual-c?view=msvc-170

>
> >
> > > +
> > > +/**
> > > + * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer
> > > + * and cast it to a specific type, without the compiler emitting a warning.
> >
> > Propose new description with emphasis on casting rather than discarding qualifiers:
> >
> > Workaround to cast a pointer to a specific type,
> > without the compiler emitting a warning about discarding qualifiers.
> >
>
> I'll update this.
>
> > > + *
> > > + * @warning
> > > + * Although this macro can be abused for casting a pointer to point to a different type,
> > > + * alignment may be incorrect when casting to point to a larger type. E.g.:
> >
> > This macro is now meant for abuse, so propose softening the warning:
> >
> > When casting a pointer to point to a larger type,
> > the resulting pointer may be misaligned,
> > which causes undefined behavior.
>
> I'll update this.
>
> > E.g.:
> >
> > > + * struct s {
> > > + * uint16_t a;
> > > + * uint8_t b;
> > > + * uint8_t c;
> > > + * uint8_t d;
> > > + * } v;
> > > + * uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
> > > + */
> > > +#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))
> >
> > I am somewhat concerned about these macros...
> >
> > There's a good reason why MSVC doesn't allow casting to discard qualifiers or changing the type like this.
> >
> > If in doubt, read this:
> > https://www.trust-in-soft.com/resources/blogs/2020-04-06-gcc-always-assumes-aligned-pointer-accesses
> >
> > We need these workarounds because DPDK currently contains code with formally "undefined behavior".
> > And instead of fixing the root causes, we choose the pragmatic solution and introduce workarounds to allow it.
> >
> > Would it be possible for the RTE_CAST_PTR macro to check if the casted-to pointer changes from a smaller type to a larger type, and warn/fail if it does?
>
> I'll think about it.
>
> >
> > Should the use of these workaround macros be disallowed in new code?
> > I.e. should checkpatches check for them?
>
> We can certainly add a check to checkpatches.

^ permalink raw reply	[flat|nested] 87+ messages in thread
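On the question of whether RTE_CAST_PTR could fail when the target type is "larger": one possible shape is sketched below. This is not from the patch. CHECKED_CAST_PTR is a hypothetical name, the check compares alignment requirements rather than sizes, and it assumes a compiler with __typeof__ (GCC and clang, as an extension) plus C11 _Alignof. It does not handle void pointers, and it only compares declared pointee types, so it cannot prove that a given address really is aligned.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical variant of RTE_CAST_PTR, not part of the patch.
 * The array size becomes -1, which is a compile-time error, when the
 * target pointee type needs stricter alignment than the source pointee.
 */
#define CHECKED_CAST_PTR(type, ptr) \
	((void)sizeof(char[_Alignof(__typeof__(*(type)0)) <= \
			   _Alignof(__typeof__(*(ptr))) ? 1 : -1]), \
	 (type)(uintptr_t)(ptr))

/* Casting to a pointee with a weaker alignment requirement is accepted. */
static int
demo_checked_cast(void)
{
	static const uint32_t word = 0x11223344;
	const uint16_t *half = CHECKED_CAST_PTR(const uint16_t *, &word);

	/* Sanity check of the assumption this demo relies on. */
	assert(_Alignof(uint16_t) <= _Alignof(uint32_t));

	/* The cast keeps the address; only the static type changes. */
	return (uintptr_t)half == (uintptr_t)&word;
}
```

With this sketch, the struct example from the doc comment, CHECKED_CAST_PTR(uint16_t *, &v.c), would be rejected on targets where uint16_t needs stricter alignment than uint8_t. The cost is that legitimate casts from byte buffers known to be aligned would also be rejected, which may be too strict for the vector drivers touched by this series.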
* Re: [PATCH v15 1/3] eal: add diagnostics macros to make code portable
  2025-01-21 14:41       ` Morten Brørup
@ 2025-01-21 20:17         ` Andre Muezerie
  0 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-21 20:17 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev, stephen, bruce.richardson

On Tue, Jan 21, 2025 at 03:41:09PM +0100, Morten Brørup wrote:
> > From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> > Sent: Tuesday, 21 January 2025 15.28
> >
> > On Tue, Jan 21, 2025 at 10:53:14AM +0100, Morten Brørup wrote:
> > > > From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> > > > Sent: Saturday, 18 January 2025 22.55
> > > >
> > > > It was a common pattern to have "GCC diagnostic ignored" pragmas
> > > > sprinkled over the code and only activate these pragmas for certain
> > > > compilers (gcc and clang). Clang supports GCC's pragma for
> > > > compatibility with existing source code, so #pragma GCC diagnostic
> > > > and #pragma clang diagnostic are synonyms for Clang
> > > > (https://clang.llvm.org/docs/UsersManual.html).
> > > >
> > > > Now that effort is being made to make the code compatible with MSVC
> > > > these expressions would become more complex. It makes sense to hide
> > > > this complexity behind macros. This makes maintenance easier as these
> > > > macros are defined in a single place. As a plus the code becomes
> > > > more readable as well.
> > > >
> > > > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > > > ---
> > > > lib/eal/include/rte_common.h | 46 ++++++++++++++++++++++++++++++++++++
> > > > 1 file changed, 46 insertions(+)
> > > >
> > > > diff --git a/lib/eal/include/rte_common.h
> > > > b/lib/eal/include/rte_common.h
> > > > index 40592f71b1..4b87a0a352 100644
> > > > --- a/lib/eal/include/rte_common.h
> > > > +++ b/lib/eal/include/rte_common.h
> > > > @@ -156,6 +156,52 @@ typedef uint16_t unaligned_uint16_t;
> > > > #define RTE_DEPRECATED(x)
> > > > #endif
> > > >
> > > > +/**
> > > > + * Macros to cause the compiler to remember the state of the diagnostics as of
> > > > + * each push, and restore to that point at each pop.
> > > > + */
> > > > +#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
> > > > +#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
> > > > +#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
> > > > +#else
> > > > +#define __rte_diagnostic_push
> > > > +#define __rte_diagnostic_pop
> > > > +#endif
> > > > +
> > > > +/**
> > > > + * Macro to disable compiler warnings about removing a type
> > > > + * qualifier from the target type.
> > > > + */
> > > > +#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
> > > > +#define __rte_diagnostic_ignored_wcast_qual \
> > > > + _Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
> > > > +#else
> > > > +#define __rte_diagnostic_ignored_wcast_qual
> > > > +#endif
> > > > +
> > > > +/**
> > > > + * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
> > > > + * without the compiler emitting a warning.
> > > > + */
> > > > +#define RTE_PTR_UNQUAL(X) ((void *)(uintptr_t)(X))
> > >
> > > It seems the C23 typeof_unqual and the built-in pre-C23 __typeof_unqual__ couldn't be used.
> > > Was it a generic issue, or only when operating on (the return value of) functions?
> >
> > I experimented with C23 typeof_unqual. It indeed works on gcc, clang
> > and MSVC, but there are some details:
> >
> > a) With gcc the project needs to be compiled with -std=c2x. Many other
> > warnings show up, unrelated to the scope of this patchset. Some look
> > suspicious and should be looked at. An error also showed up, for which
> > I sent out a small patch.
> >
> > b) When using typeof_unqual and passing "-Wcast-qual" to the compiler,
> > a warning about the qualifier being dropped is emitted. The project
> > currently uses "-Wcast-qual". Perhaps it shouldn't?
>
> The compiler is our friend; when more warnings are enabled, the code quality requirements are higher.

I agree 100% with this.

> Although this statement may not be universally true, I think it is for "-Wcast-qual".
>
> >
> > Due to (a) I decided to not use typeof_unqual for now, but it would be
> > trivial to change the macro to do so in the future.
>
> How about __typeof_unqual__ (with double underscores prefix and postfix)?
> It seems to be available in both GCC [1] and MSVC [2] without requiring C23.
>
> [1]: https://gcc.gnu.org/onlinedocs/gcc/Typeof.html
> [2]: https://learn.microsoft.com/en-us/cpp/c-language/typeof-unqual-c?view=msvc-170

__typeof_unqual__ (with double underscores prefix and postfix) requires gcc-14.
On gcc-13 this keyword is not defined. It works exactly like C23 typeof_unqual:
when -Wcast-qual is passed to the compiler, a warning is emitted when discarding
a qualifier from the pointer target type, which is treated as an error in the DPDK build.
Without -Wcast-qual it compiles cleanly.

>
> >
> > >
> > > > +
> > > > +/**
> > > > + * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer
> > > > + * and cast it to a specific type, without the compiler emitting a warning.
> > >
> > > Propose new description with emphasis on casting rather than discarding qualifiers:
> > >
> > > Workaround to cast a pointer to a specific type,
> > > without the compiler emitting a warning about discarding qualifiers.
> > >
> >
> > I'll update this.
> >
> > > > + *
> > > > + * @warning
> > > > + * Although this macro can be abused for casting a pointer to point to a different type,
> > > > + * alignment may be incorrect when casting to point to a larger type. E.g.:
> > >
> > > This macro is now meant for abuse, so propose softening the warning:
> > >
> > > When casting a pointer to point to a larger type,
> > > the resulting pointer may be misaligned,
> > > which causes undefined behavior.
> >
> > I'll update this.
> >
> > > E.g.:
> > >
> > > > + * struct s {
> > > > + * uint16_t a;
> > > > + * uint8_t b;
> > > > + * uint8_t c;
> > > > + * uint8_t d;
> > > > + * } v;
> > > > + * uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
> > > > + */
> > > > +#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))
> > >
> > > I am somewhat concerned about these macros...
> > >
> > > There's a good reason why MSVC doesn't allow casting to discard qualifiers or changing the type like this.
> > >
> > > If in doubt, read this:
> > > https://www.trust-in-soft.com/resources/blogs/2020-04-06-gcc-always-assumes-aligned-pointer-accesses
> > >
> > > We need these workarounds because DPDK currently contains code with formally "undefined behavior".
> > > And instead of fixing the root causes, we choose the pragmatic solution and introduce workarounds to allow it.
> > >
> > > Would it be possible for the RTE_CAST_PTR macro to check if the casted-to pointer changes from a smaller type to a larger type, and warn/fail if it does?
> >
> > I'll think about it.
> >
> > >
> > > Should the use of these workaround macros be disallowed in new code?
> > > I.e. should checkpatches check for them?
> >
> > We can certainly add a check to checkpatches.

^ permalink raw reply	[flat|nested] 87+ messages in thread
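To make the version constraint from the message above concrete, here is a rough sketch of how a macro could prefer __typeof_unqual__ where it exists and fall back to the void * form elsewhere. PTR_UNQUAL is a hypothetical name, not the macro from the patch, and the gcc >= 14 guard is taken from the statement in this thread rather than verified independently. The cast still goes through uintptr_t, because a direct cast would trigger -Wcast-qual as described.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical macro, not the one from the patch. On gcc >= 14 keep the
 * pointee type and drop only its qualifiers; elsewhere fall back to the
 * void * form used by RTE_PTR_UNQUAL. Going through uintptr_t avoids the
 * -Wcast-qual diagnostic in both branches.
 */
#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ >= 14
#define PTR_UNQUAL(p) ((__typeof_unqual__(*(p)) *)(uintptr_t)(p))
#else
#define PTR_UNQUAL(p) ((void *)(uintptr_t)(p))
#endif

/*
 * Increment a counter through a const-qualified view of it.
 * This is only defined behavior if the underlying object is not const.
 */
static int
bump(const int *counter)
{
	int *writable = PTR_UNQUAL(counter);

	assert(writable != 0);
	*writable += 1;
	return *writable;
}
```

The typed branch has one advantage over the void * form: assigning the result to a pointer of an unrelated type would be diagnosed, whereas void * converts silently in C.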
* Re: [PATCH v15 1/3] eal: add diagnostics macros to make code portable
  2025-01-21 14:28     ` Andre Muezerie
  2025-01-21 14:41       ` Morten Brørup
@ 2025-01-21 15:01       ` Stephen Hemminger
  1 sibling, 0 replies; 87+ messages in thread
From: Stephen Hemminger @ 2025-01-21 15:01 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: Morten Brørup, dev, bruce.richardson

On Tue, 21 Jan 2025 06:28:16 -0800
Andre Muezerie <andremue@linux.microsoft.com> wrote:

> On Tue, Jan 21, 2025 at 10:53:14AM +0100, Morten Brørup wrote:
> > > From: Andre Muezerie [mailto:andremue@linux.microsoft.com]
> > > Sent: Saturday, 18 January 2025 22.55
> > >
> > > It was a common pattern to have "GCC diagnostic ignored" pragmas
> > > sprinkled over the code and only activate these pragmas for certain
> > > compilers (gcc and clang). Clang supports GCC's pragma for
> > > compatibility with existing source code, so #pragma GCC diagnostic
> > > and #pragma clang diagnostic are synonyms for Clang
> > > (https://clang.llvm.org/docs/UsersManual.html).
> > >
> > > Now that effort is being made to make the code compatible with MSVC
> > > these expressions would become more complex. It makes sense to hide
> > > this complexity behind macros. This makes maintenance easier as these
> > > macros are defined in a single place. As a plus the code becomes
> > > more readable as well.
> > >
> > > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > > ---
> > > lib/eal/include/rte_common.h | 46 ++++++++++++++++++++++++++++++++++++
> > > 1 file changed, 46 insertions(+)
> > >
> > > diff --git a/lib/eal/include/rte_common.h
> > > b/lib/eal/include/rte_common.h
> > > index 40592f71b1..4b87a0a352 100644
> > > --- a/lib/eal/include/rte_common.h
> > > +++ b/lib/eal/include/rte_common.h
> > > @@ -156,6 +156,52 @@ typedef uint16_t unaligned_uint16_t;
> > > #define RTE_DEPRECATED(x)
> > > #endif
> > >
> > > +/**
> > > + * Macros to cause the compiler to remember the state of the diagnostics as of
> > > + * each push, and restore to that point at each pop.
> > > + */
> > > +#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
> > > +#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
> > > +#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
> > > +#else
> > > +#define __rte_diagnostic_push
> > > +#define __rte_diagnostic_pop
> > > +#endif
> > > +
> > > +/**
> > > + * Macro to disable compiler warnings about removing a type
> > > + * qualifier from the target type.
> > > + */
> > > +#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
> > > +#define __rte_diagnostic_ignored_wcast_qual \
> > > + _Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
> > > +#else
> > > +#define __rte_diagnostic_ignored_wcast_qual
> > > +#endif
> > > +
> > > +/**
> > > + * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
> > > + * without the compiler emitting a warning.
> > > + */
> > > +#define RTE_PTR_UNQUAL(X) ((void *)(uintptr_t)(X))
> >
> > It seems the C23 typeof_unqual and the built-in pre-C23 __typeof_unqual__ couldn't be used.
> > Was it a generic issue, or only when operating on (the return value of) functions?
>
> I experimented with C23 typeof_unqual. It indeed works on gcc, clang and MSVC, but there are some details:
>
> a) With gcc the project needs to be compiled with -std=c2x. Many other warnings show up, unrelated to the scope of this patchset. Some look suspicious and should be looked at. An error also showed up, for which I sent out a small patch.
>
> b) When using typeof_unqual and passing "-Wcast-qual" to the compiler, a warning about the qualifier being dropped is emitted. The project currently uses "-Wcast-qual". Perhaps it shouldn't?
>
> Due to (a) I decided to not use typeof_unqual for now, but it would be trivial to change the macro to do so in the future.

Be careful, C23 changes some things like default initialization of padding bits.
Might break other stuff.

^ permalink raw reply	[flat|nested] 87+ messages in thread
* [PATCH v15 2/3] drivers/common: add diagnostics macros to make code portable 2025-01-18 21:55 ` [PATCH v15 0/3] " Andre Muezerie 2025-01-18 21:55 ` [PATCH v15 1/3] eal: " Andre Muezerie @ 2025-01-18 21:55 ` Andre Muezerie 2025-01-18 21:55 ` [PATCH v15 3/3] drivers/net: " Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-18 21:55 UTC (permalink / raw) To: andremue; +Cc: dev, stephen, bruce.richardson It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 72 +++++++++---------- 1 file changed, 34 insertions(+), 38 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..81052e72c1 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,7 +30,7 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -108,8 +104,8 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &rxdp->read), dma_addr0_3); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &(rxdp + 4)->read), dma_addr4_7); } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +160,8 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_storeu_si128((__m128i *)&rxdp[i].read, - dma_addr0); + _mm_storeu_si128(RTE_CAST_PTR + (__m128i *, &rxdp[i].read), dma_addr0); } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +212,10 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); - _mm512_storeu_si512((void *)rxdp, desc0_1); - 
_mm512_storeu_si512((void *)(rxdp + 2), desc2_3); - _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); - _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); + _mm512_storeu_si512(RTE_CAST_PTR(void *, rxdp), desc0_1); + _mm512_storeu_si512(RTE_CAST_PTR(void *, (rxdp + 2)), desc2_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, (rxdp + 4)), desc4_5); + _mm512_storeu_si512(RTE_CAST_PTR(void *, (rxdp + 6)), desc6_7); rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -337,28 +333,28 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,7 +556,7 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < 
IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i], + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i]), dma_addr0); } } @@ -634,7 +630,7 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; - _mm_storeu_si128((__m128i *)&rxdp[i], + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, &rxdp[i]), dma_addr0); } } @@ -798,28 +794,28 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1127,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); 
- _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1174,7 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ @@ -1435,7 +1431,7 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -1480,7 +1476,7 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ @@ -1521,11 +1517,11 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); - idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); + idpf_splitq_vtx(txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); - idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); + idpf_splitq_vtx1(txdp, *tx_pkts++, cmd_dtype); nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1536,7 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); - idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); + idpf_splitq_vtx(txdp, tx_pkts, nb_commit, cmd_dtype); tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.2.vfs.0.1 ^ permalink raw reply [flat|nested] 87+ messages in thread
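As a reference for the pattern described in the commit message of this series, the sketch below shows how push/pop style diagnostic macros can scope a -Wcast-qual suppression to a single function instead of a whole file. The DIAG_* names are hypothetical stand-ins for the __rte_diagnostic_* macros of patch 1/3, and the compiler check used here (__GNUC__ or __clang__) is an assumption that differs from the RTE_TOOLCHAIN_MSVC test in the real header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the __rte_diagnostic_* macros of patch 1/3. */
#if (defined(__GNUC__) || defined(__clang__)) && !defined(__INTEL_COMPILER)
#define DIAG_PUSH _Pragma("GCC diagnostic push")
#define DIAG_POP _Pragma("GCC diagnostic pop")
#define DIAG_IGNORED_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
/* Other compilers: the macros expand to nothing. */
#define DIAG_PUSH
#define DIAG_POP
#define DIAG_IGNORED_WCAST_QUAL
#endif

DIAG_PUSH
DIAG_IGNORED_WCAST_QUAL
/*
 * -Wcast-qual is silenced only between the push and the pop.
 * Return the old value and clear the word; the caller must pass a
 * pointer to an object that was not defined const.
 */
static uint32_t
read_and_clear(const uint32_t *word)
{
	uint32_t *plain = (uint32_t *)word; /* warns under -Wcast-qual otherwise */
	uint32_t old = *plain;

	assert(plain != 0);
	*plain = 0;
	return old;
}
DIAG_POP
```

Code after DIAG_POP is checked with the original warning settings again, which is the main difference from the file-wide "#pragma GCC diagnostic ignored" lines that this series removes.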
* [PATCH v15 3/3] drivers/net: add diagnostics macros to make code portable 2025-01-18 21:55 ` [PATCH v15 0/3] " Andre Muezerie 2025-01-18 21:55 ` [PATCH v15 1/3] eal: " Andre Muezerie 2025-01-18 21:55 ` [PATCH v15 2/3] drivers/common: " Andre Muezerie @ 2025-01-18 21:55 ` Andre Muezerie 2 siblings, 0 replies; 87+ messages in thread From: Andre Muezerie @ 2025-01-18 21:55 UTC (permalink / raw) To: andremue; +Cc: dev, stephen, bruce.richardson It was a common pattern to have "GCC diagnostic ignored" pragmas sprinkled over the code and only activate these pragmas for certain compilers (gcc and clang). Clang supports GCC's pragma for compatibility with existing source code, so #pragma GCC diagnostic and #pragma clang diagnostic are synonyms for Clang (https://clang.llvm.org/docs/UsersManual.html). Now that effort is being made to make the code compatible with MSVC these expressions would become more complex. It makes sense to hide this complexity behind macros. This makes maintenance easier as these macros are defined in a single place. As a plus the code becomes more readable as well. 
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 --- drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 - drivers/net/dpaa2/dpaa2_rxtx.c | 15 +--- drivers/net/fm10k/fm10k_rxtx_vec.c | 21 ++---- drivers/net/hns3/hns3_rxtx_vec_neon.h | 6 +- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 - drivers/net/i40e/i40e_rxtx_common_avx.h | 22 +++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 18 ++--- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 30 ++++---- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 28 +++---- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 - drivers/net/i40e/i40e_rxtx_vec_neon.c | 35 ++++----- drivers/net/i40e/i40e_rxtx_vec_sse.c | 28 +++---- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 60 +++++++-------- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 62 ++++++++-------- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 +-- drivers/net/iavf/iavf_rxtx_vec_neon.c | 22 +++--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 38 +++++----- drivers/net/ice/ice_rxtx_common_avx.h | 18 ++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 74 +++++++++---------- drivers/net/ice/ice_rxtx_vec_avx512.c | 64 +++++++--------- drivers/net/ice/ice_rxtx_vec_common.h | 4 - drivers/net/ice/ice_rxtx_vec_sse.c | 28 +++---- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 - .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 - drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 18 ++--- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 20 ++--- drivers/net/mlx5/mlx5_flow.c | 5 +- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 -- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 18 ++--- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 61 ++++++++------- drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 8 +- drivers/net/tap/tap_flow.c | 6 +- drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 8 +- drivers/net/virtio/virtio_rxtx_simple.c | 4 - 35 files changed, 316 insertions(+), 445 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ 
b/drivers/net/axgbe/axgbe_rxtx.h @@ -6,15 +6,6 @@ #ifndef _AXGBE_RXTX_H_ #define _AXGBE_RXTX_H_ -/* to suppress gcc warnings related to descriptor casting*/ -#ifdef RTE_TOOLCHAIN_GCC -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - -#ifdef RTE_TOOLCHAIN_CLANG -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /* Descriptor related defines */ #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/ #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3) diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h index 479e1ddcb9..5b98f86932 100644 --- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h +++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "cpfl_ethdev.h" #include "cpfl_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define CPFL_SCALAR_PATH 0 #define CPFL_VECTOR_PATH 1 #define CPFL_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c index e3b6c7e460..bfb5542bbc 100644 --- a/drivers/net/dpaa2/dpaa2_rxtx.c +++ b/drivers/net/dpaa2/dpaa2_rxtx.c @@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts) return num_tx; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" -#elif defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic push -#pragma clang diagnostic ignored "-Wcast-qual" -#endif - /* This function loopbacks all the received packets.*/ uint16_t dpaa2_dev_loopback_rx(void *queue, @@ -2083,7 +2075,7 @@ dpaa2_dev_loopback_rx(void *queue, if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0)) continue; } - fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage); + fd[num_rx] = RTE_PTR_UNQUAL(qbman_result_DQ_fd(dq_storage)); dq_storage++; num_rx++; @@ -2118,8 +2110,3 @@ dpaa2_dev_loopback_rx(void *queue, return 0; } -#if defined(RTE_TOOLCHAIN_GCC) -#pragma GCC diagnostic pop -#elif 
defined(RTE_TOOLCHAIN_CLANG) -#pragma clang diagnostic pop -#endif diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c index 68acaca75b..715c891c30 100644 --- a/drivers/net/fm10k/fm10k_rxtx_vec.c +++ b/drivers/net/fm10k/fm10k_rxtx_vec.c @@ -11,10 +11,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static void fm10k_reset_tx_queue(struct fm10k_tx_queue *txq); @@ -270,8 +266,7 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) /* Clean up all the HW/SW ring content */ for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) { mb_alloc[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].q, - dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].q), dma_addr0); } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -316,8 +311,8 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->q), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->q), dma_addr1); /* enforce 512B alignment on default Rx virtual addresses */ mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr @@ -465,7 +460,7 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs0[3] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -477,11 +472,11 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs0[2] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp + 2)); rte_compiler_barrier(); - descs0[1] = 
_mm_loadu_si128((__m128i *)(rxdp + 1)); + descs0[1] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp + 1)); rte_compiler_barrier(); - descs0[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs0[0] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -736,7 +731,7 @@ vtx1(volatile struct fm10k_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(flags << 56 | (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len, MBUF_DMA_ADDR(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h index bbb5478015..f2e155c9f5 100644 --- a/drivers/net/hns3/hns3_rxtx_vec_neon.h +++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h @@ -9,8 +9,6 @@ #include <arm_neon.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) { @@ -22,8 +20,8 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt) 0, ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT }; - vst1q_u64((uint64_t *)&desc->addr, val1); - vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &desc->addr), val1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &desc->tx.outer_vlan_tag), val2); } static uint16_t diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c index 14424c9921..6eafe51e3d 100644 --- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c +++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c @@ -10,8 +10,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h index 
85958d6c81..b66a808f9f 100644 --- a/drivers/net/i40e/i40e_rxtx_common_avx.h +++ b/drivers/net/i40e/i40e_rxtx_common_avx.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) @@ -36,7 +32,7 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -72,8 +68,8 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } #else #ifdef __AVX512VL__ @@ -144,8 +140,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, + &rxdp->read), dma_addr0_3); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, + &(rxdp + 4)->read), dma_addr4_7); } } else #endif /* __AVX512VL__*/ @@ -190,8 +188,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ - _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); - _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); + 
_mm256_store_si256(RTE_CAST_PTR(__m256i *, + &rxdp->read), dma_addr0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, + &(rxdp + 2)->read), dma_addr2_3); } } diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c index b6b0d38ec1..f9e26d18dd 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c +++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c @@ -15,8 +15,6 @@ #include <rte_altivec.h> -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -44,7 +42,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; vec_st(dma_addr0, 0, - (__vector unsigned long *)&rxdp[i].read); + RTE_CAST_PTR(__vector unsigned long *, &rxdp[i].read)); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -84,8 +82,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = vec_add(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read); - vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read); + vec_st(dma_addr0, 0, RTE_CAST_PTR(__vector unsigned long *, &rxdp++->read)); + vec_st(dma_addr1, 0, RTE_CAST_PTR(__vector unsigned long *, &rxdp++->read)); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -286,7 +284,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = *(__vector unsigned long *)&sw_ring[pos]; /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = *(__vector unsigned long *)(rxdp + 3); + descs[3] = *RTE_CAST_PTR(__vector unsigned long *, rxdp + 3); rte_compiler_barrier(); /* B.2 copy 2 mbuf point into rx_pkts */ @@ -296,11 +294,11 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2]; /* A.1 load desc[2-0] */ - descs[2] = *(__vector unsigned long *)(rxdp + 2); + descs[2] = 
*RTE_CAST_PTR(__vector unsigned long *, rxdp + 2); rte_compiler_barrier(); - descs[1] = *(__vector unsigned long *)(rxdp + 1); + descs[1] = *RTE_CAST_PTR(__vector unsigned long *, rxdp + 1); rte_compiler_barrier(); - descs[0] = *(__vector unsigned long *)(rxdp); + descs[0] = *RTE_CAST_PTR(__vector unsigned long *, rxdp); /* B.2 copy 2 mbuf point into rx_pkts */ *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2; @@ -534,7 +532,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __vector unsigned long descriptor = (__vector unsigned long){ pkt->buf_iova + pkt->data_off, high_qw}; - *(__vector unsigned long *)txdp = descriptor; + *RTE_CAST_PTR(__vector unsigned long *, txdp) = descriptor; } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c index 19cf0ac718..86f06d67fb 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -39,8 +35,8 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ - __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); - __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); + __m128i *rxdp_desc_0 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 0].wb.qword2); + __m128i *rxdp_desc_1 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 1].wb.qword2); const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -276,21 +272,21 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, 
rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256( _mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -695,7 +691,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void @@ -728,8 +724,8 @@ vtx(volatile struct i40e_tx_desc *txdp, __m256i desc0_1 = _mm256_set_epi64x( hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp + 2), desc2_3); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp), 
desc0_1); } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c index 3b2750221b..587cbfb6d9 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c +++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c @@ -15,10 +15,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define RTE_I40E_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -41,8 +37,8 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp, const uint32_t desc_idx) { /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */ - __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2); - __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2); + __m128i *rxdp_desc_0 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 0].wb.qword2); + __m128i *rxdp_desc_1 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 1].wb.qword2); const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0); const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1); @@ -264,28 +260,28 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* load in descriptors, in reverse order */ const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + 
_mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc6_7 = _mm256_inserti128_si256 @@ -875,7 +871,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void @@ -909,7 +905,7 @@ vtx(volatile struct i40e_tx_desc *txdp, hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off, hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h index 8b745630e4..ec59a68f9d 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_common.h +++ b/drivers/net/i40e/i40e_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "i40e_ethdev.h" #include "i40e_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c index e1c5c7041b..12c8521824 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_neon.c +++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c @@ -16,9 +16,6 @@ #include "i40e_rxtx.h" #include "i40e_rxtx_vec_common.h" - -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +38,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) 
rxq->nb_rx_desc) { for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, zero); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i].read), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,11 +55,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -87,10 +84,10 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; - desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2); - desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2); - desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2); - desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2); + desc0_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 0)->wb.qword2)); + desc1_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 1)->wb.qword2)); + desc2_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 2)->wb.qword2)); + desc3_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 3)->wb.qword2)); /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ uint32x4_t v_unpack_02, v_unpack_13; @@ -421,18 +418,18 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[0] = vld1q_u64((uint64_t *)(rxdp)); + descs[3] = 
vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3)); + descs[2] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2)); + descs[1] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1)); + descs[0] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp)); /* Use acquire fence to order loads of descriptor qwords */ rte_atomic_thread_fence(rte_memory_order_acquire); /* A.2 reload qword0 to make it ordered after qword1 load */ - descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0); - descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); - descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); - descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); + descs[3] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3), descs[3], 0); + descs[2] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2), descs[2], 0); + descs[1] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1), descs[1], 0); + descs[0] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp), descs[0], 0); /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); @@ -662,7 +659,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT)); uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw}; - vst1q_u64((uint64_t *)txdp, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor); } static inline void diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c index ad560d2b6b..3fb97f4528 100644 --- a/drivers/net/i40e/i40e_rxtx_vec_sse.c +++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c @@ -14,10 +14,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void i40e_rxq_rearm(struct i40e_rx_queue *rxq) { @@ -41,7 +37,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, 
&rxdp[i].read), dma_addr0); } } @@ -72,8 +68,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH; @@ -97,10 +93,10 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt) { /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */ __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23; - desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2); - desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2); - desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2); - desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2); + desc0_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 0)->wb.qword2)); + desc1_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 1)->wb.qword2)); + desc2_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 2)->wb.qword2)); + desc3_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 3)->wb.qword2)); /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */ __m128i v_unpack_01, v_unpack_23; @@ -462,7 +458,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -474,11 +470,11 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = 
_mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -681,7 +677,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c index 49d41af953..8cc61c484f 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -193,21 +189,21 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 
= _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -509,7 +505,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, 0, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -743,28 +739,28 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i 
raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc6_7 = _mm256_inserti128_si256 @@ -961,35 +957,35 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq, /* load bottom half of every 32B desc */ const __m128i raw_desc_bh7 = _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh6 = _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh5 = _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh4 = _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh3 = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh2 = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh1 = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh0 = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1664,7 +1660,7 @@ 
iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -1719,8 +1715,8 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp + 2), desc2_3); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp), desc0_1); } /* do any last ones */ diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c index d6a861bf80..3c238db104 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c +++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IAVF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -165,28 +161,28 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + 
_mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -600,7 +596,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, rxq->mbuf_initializer); struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail]; volatile union iavf_rx_flex_desc *rxdp = - (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -734,28 +730,28 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - 
_mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1113,35 +1109,35 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq, /* load bottom half of every 32B desc */ const __m128i raw_desc_bh7 = _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh6 = _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh5 = _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh4 = _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh3 = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh2 = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh1 = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh0 = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -1983,7 +1979,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } #define 
IAVF_TX_LEN_MASK 0xAA @@ -2037,7 +2033,7 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ @@ -2225,7 +2221,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off, high_ctx_qw, low_ctx_qw); - _mm256_storeu_si256((__m256i *)txdp, ctx_data_desc); + _mm256_storeu_si256(RTE_CAST_PTR(__m256i *, txdp), ctx_data_desc); } static __rte_always_inline void @@ -2300,7 +2296,7 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp, hi_ctx_qw1, low_ctx_qw1, hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off, hi_ctx_qw0, low_ctx_qw0); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } if (nb_pkts) diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h index 5c5220048d..18513a6d7f 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_common.h +++ b/drivers/net/iavf/iavf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "iavf.h" #include "iavf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline uint16_t reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) @@ -422,7 +418,7 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -458,8 +454,8 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i 
*)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } #else #ifdef CC_AVX512_SUPPORT diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c index 04be574683..f1b82c7d56 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_neon.c +++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c @@ -36,7 +36,7 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxep[i] = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, zero); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i].read), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -53,11 +53,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = vdupq_n_u64(paddr); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr0); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vdupq_n_u64(paddr); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH; @@ -269,18 +269,18 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq, int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT}; /* A.1 load desc[3-0] */ - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[0] = vld1q_u64((uint64_t *)(rxdp)); + descs[3] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3)); + descs[2] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2)); + descs[1] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1)); + descs[0] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp)); /* Use acquire fence to order loads of descriptor qwords */ rte_atomic_thread_fence(rte_memory_order_acquire); /* A.2 reload qword0 to make 
it ordered after qword1 load */ - descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0); - descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); - descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); - descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); + descs[3] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3), descs[3], 0); + descs[2] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2), descs[2], 0); + descs[1] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1), descs[1], 0); + descs[0] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp), descs[0], 0); /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..490604bec8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,7 +34,7 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -69,8 +65,8 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +574,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* 
A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +586,11 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +779,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +860,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +872,11 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - 
descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -928,16 +924,16 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ descs_bh[3] = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); descs_bh[2] = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); descs_bh[1] = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1345,7 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..c62e60c70e 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,7 +29,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i 
*)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -77,8 +73,8 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } #else #ifdef __AVX512VL__ @@ -157,8 +153,8 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &rxdp->read), dma_addr0_3); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &(rxdp + 4)->read), dma_addr4_7); } } else #endif /* __AVX512VL__ */ @@ -213,8 +209,8 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ - _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); - _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, &rxdp->read), dma_addr0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, &(rxdp + 2)->read), dma_addr2_3); } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..b7c67d6396 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,21 +250,29 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf 
**rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,37 +448,29 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ - const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + const __m128i raw_desc_bh7 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1)); 
rte_compiler_barrier(); - const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + const __m128i raw_desc_bh6 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + const __m128i raw_desc_bh5 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + const __m128i raw_desc_bh4 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + const __m128i raw_desc_bh3 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + const __m128i raw_desc_bh2 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + const __m128i raw_desc_bh1 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + const __m128i raw_desc_bh0 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +786,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -841,8 +837,8 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x 
(hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp + 2), desc2_3); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp), desc0_1); } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..a770710ea0 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -244,28 +240,28 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, /* load in descriptors, in reverse order */ const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); 
raw_desc6_7 = _mm256_inserti128_si256 @@ -474,37 +470,29 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ - const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + const __m128i raw_desc_bh7 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + const __m128i raw_desc_bh6 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + const __m128i raw_desc_bh5 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + const __m128i raw_desc_bh4 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + const __m128i raw_desc_bh3 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + const __m128i raw_desc_bh2 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + const __m128i raw_desc_bh1 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + const __m128i raw_desc_bh0 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1));
__m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +975,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -1029,7 +1017,7 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..42eaea7326 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,7 +48,7 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -91,8 +87,8 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - 
_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +421,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +433,11 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -491,19 +487,19 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* load bottom half of every 32B desc */ const __m128i raw_desc_bh3 = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh2 = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh1 = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); 
const __m128i raw_desc_bh0 = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); /** * to shift the 32b RSS hash value to the @@ -680,7 +676,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..fa5702588c 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -36,7 +34,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) 
{ rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i].read), zero); } } @@ -60,12 +58,12 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -367,10 +365,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); /* A. load 4 pkts descs */ - descs[0] = vld1q_u64((uint64_t *)(rxdp)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); + descs[0] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp)); + descs[1] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1)); + descs[2] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2)); + descs[3] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3)); /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); @@ -554,7 +552,7 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, pkt->buf_iova + pkt->data_off, (uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len}; - vst1q_u64((uint64_t *)&txdp->read, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &txdp->read), descriptor); } static inline void diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..5c1dcb568f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER 
-#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,7 +37,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -76,8 +72,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +462,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +474,11 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +672,7 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | 
pkt->data_len, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)&txdp->read, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &txdp->read), descriptor); } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..bd13a243d5 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,7 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" - tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop + tunnel = RTE_PTR_UNQUAL(flow->tunnel); return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..bb90625040 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * @@ -75,7 +73,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { volatile struct mlx5_mini_cqe8 *mcq = - (void *)&(cq + !rxq->cqe_comp_layout)->pkt_info; + (volatile struct mlx5_mini_cqe8 *)&(cq + !rxq->cqe_comp_layout)->pkt_info; /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? 
&rxq->title_pkt : elts[0]; unsigned int pos; @@ -139,9 +137,9 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, */ cycle: if (rxq->cqe_comp_layout) - rte_prefetch0((void *)(cq + mcqe_n)); + rte_prefetch0((volatile void *)(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { - uint8_t *p = (void *)&mcq[pos % 8]; + uint8_t *p = RTE_CAST_PTR(uint8_t *, &mcq[pos % 8]); uint8_t *e0 = (void *)&elts[pos]->rearm_data; uint8_t *e1 = (void *)&elts[pos + 1]->rearm_data; uint8_t *e2 = (void *)&elts[pos + 2]->rearm_data; @@ -157,7 +155,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((volatile void *)(cq + pos + i)); __asm__ volatile ( /* A.1 load mCQEs into a 128bit register. */ "ld1 {v16.16b - v17.16b}, [%[mcq]] \n\t" @@ -367,8 +365,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)&(cq + pos)->pkt_info; + rte_prefetch0((volatile void *)(cq + pos + 8)); + mcq = (volatile struct mlx5_mini_cqe8 *)&(cq + pos)->pkt_info; for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -383,7 +381,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile struct mlx5_mini_cqe8 *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -663,7 +661,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, mask = vcreate_u16(pkts_n - pos < MLX5_VPMD_DESCS_PER_LOOP ? 
-1UL >> ((pkts_n - pos) * sizeof(uint16_t) * 8) : 0); - p0 = (void *)&cq[pos].pkt_info; + p0 = RTE_PTR_UNQUAL(&cq[pos].pkt_info); p1 = p0 + (pkts_n - pos > 1) * sizeof(struct mlx5_cqe); p2 = p1 + (pkts_n - pos > 2) * sizeof(struct mlx5_cqe); p3 = p2 + (pkts_n - pos > 3) * sizeof(struct mlx5_cqe); diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..574df5c407 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile struct mlx5_mini_cqe8 *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -130,7 +127,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, */ cycle: if (rxq->cqe_comp_layout) - rte_prefetch0((void *)(cq + mcqe_n)); + rte_prefetch0((volatile void *)(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -141,10 +138,10 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((volatile void *)(cq + pos + i)); /* A.1 load mCQEs into a 128bit register. 
*/ - mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); - mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); + mcqe1 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &mcq[pos % 8])); + mcqe2 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &mcq[pos % 8 + 2])); /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -355,8 +352,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); + rte_prefetch0((volatile void *)(cq + pos + 8)); + mcq = (volatile struct mlx5_mini_cqe8 *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +368,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile struct mlx5_mini_cqe8 *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,38 +648,38 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. */ p3 = _mm_extract_epi16(p, 3); - cqes[3] = _mm_loadl_epi64((__m128i *) - &cq[pos + p3].sop_drop_qpn); + cqes[3] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos + p3].sop_drop_qpn)); rte_compiler_barrier(); p2 = _mm_extract_epi16(p, 2); - cqes[2] = _mm_loadl_epi64((__m128i *) - &cq[pos + p2].sop_drop_qpn); + cqes[2] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos + p2].sop_drop_qpn)); rte_compiler_barrier(); /* B.1 load mbuf pointers. */ mbp1 = _mm_loadu_si128((__m128i *)&elts[pos]); mbp2 = _mm_loadu_si128((__m128i *)&elts[pos + 2]); /* A.1 load a block having op_own. 
*/ p1 = _mm_extract_epi16(p, 1); - cqes[1] = _mm_loadl_epi64((__m128i *) - &cq[pos + p1].sop_drop_qpn); + cqes[1] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos + p1].sop_drop_qpn)); rte_compiler_barrier(); - cqes[0] = _mm_loadl_epi64((__m128i *) - &cq[pos].sop_drop_qpn); + cqes[0] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos].sop_drop_qpn)); /* B.2 copy mbuf pointers. */ _mm_storeu_si128((__m128i *)&pkts[pos], mbp1); _mm_storeu_si128((__m128i *)&pkts[pos + 2], mbp2); rte_io_rmb(); /* C.1 load remained CQE data and extract necessary fields. */ - cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p3]); - cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos + p2]); + cqe_tmp2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p3])); + cqe_tmp1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p2])); cqes[3] = _mm_blendv_epi8(cqes[3], cqe_tmp2, blend_mask); cqes[2] = _mm_blendv_epi8(cqes[2], cqe_tmp1, blend_mask); - cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p3].csum); - cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos + p2].csum); + cqe_tmp2 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p3].csum)); + cqe_tmp1 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p2].csum)); cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x30); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); - cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); - cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); + cqe_tmp2 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos + p3].rsvd4[2])); + cqe_tmp1 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos + p2].rsvd4[2])); cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,16 +697,16 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. 
*/ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. */ - cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); - cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); + cqe_tmp2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p1])); + cqe_tmp1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos])); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); cqes[0] = _mm_blendv_epi8(cqes[0], cqe_tmp1, blend_mask); - cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p1].csum); - cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos].csum); + cqe_tmp2 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p1].csum)); + cqe_tmp1 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos].csum)); cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x30); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); - cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); - cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); + cqe_tmp2 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos + p1].rsvd4[2])); + cqe_tmp1 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos].rsvd4[2])); cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. 
*/ diff --git a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c index 37075ea5e7..46391c9400 100644 --- a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c +++ b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c @@ -35,7 +35,7 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_NGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i]), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,12 +58,12 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr1); } rxq->rxrearm_start += RTE_NGBE_RXQ_REARM_THRESH; @@ -484,7 +484,7 @@ vtx1(volatile struct ngbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; - vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor); } static inline void diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git 
a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c index d4d647fab5..a56e2f4164 100644 --- a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c +++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c @@ -34,7 +34,7 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_TXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i]), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -57,12 +57,12 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr1); } rxq->rxrearm_start += RTE_TXGBE_RXQ_REARM_THRESH; @@ -484,7 +484,7 @@ vtx1(volatile struct txgbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; - vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor); } static inline void diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.2.vfs.0.1 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v16 0/3] add diagnostics macros to make code portable
  2024-12-27  1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
                   ` (16 preceding siblings ...)
  2025-01-18 21:55 ` [PATCH v15 0/3] " Andre Muezerie
@ 2025-01-21 22:36 ` Andre Muezerie
  2025-01-21 22:36   ` [PATCH v16 1/3] eal: " Andre Muezerie
                     ` (2 more replies)
  17 siblings, 3 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-21 22:36 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen, bruce.richardson

v16:
* Updated comment for RTE_CAST_PTR.

v15:
* Fixed a comment in rte_common.h to make Doxygen happy.
* Fixed a typo (extra comma).
* Added missing RTE_PTR_UNQUAL needed for ARM64.

v14:
* Renamed RTE_PTR_DROP_QUALIFIERS into RTE_PTR_UNQUAL to more
  resemble C23 typeof_unqual.
* Added macro RTE_CAST_PTR to make the cast more readable when
  removing a type qualifier from a pointer.

v13:
* Renamed RTE_IGNORE_CAST_QUAL into RTE_PTR_DROP_QUALIFIERS.
* Added (void *) cast to RTE_PTR_DROP_QUALIFIERS to avoid the need
  for casting the result in most places where the macro is used.

v12:
* Added macro RTE_IGNORE_CAST_QUAL and used it as a more compact
  and readable form to suppress warnings where a cast is used to
  remove a type qualifier.

v11:
* Added __rte_diagnostic_ignored_wcast_qual to a few more places where
  it was needed.

v10:
* Added __rte_diagnostic_ignored_wcast_qual to a few more places where
  it was needed.

v9:
* Added __rte_diagnostic_ignored_wcast_qual to a few more places where
  it was needed.

v8:
* Added __rte_diagnostic_ignored_wcast_qual to a few more places where
  it was needed.

v7:
* Added __rte_diagnostic_ignored_wcast_qual to a few more places where
  it was needed.

v6:
* Added __rte_diagnostic_ignored_wcast_qual to a few more places where
  it was needed.

v5:
* Added __rte_diagnostic_ignored_wcast_qual to a few more places where
  it was needed.

v4:
* Added __rte_diagnostic_ignored_wcast_qual to a few more places where
  it was needed.

v3:
* Added __rte_diagnostic_ignored_wcast_qual to a few more places where
  it was needed.

v2:
* Removed __rte_diagnostic_ignored_wstrict_aliasing (introduced in v1).
* Removed the pragmas from many files where they were not needed.
* In the files where the pragmas were indeed needed, reduced the
  scope during which they are active, reducing the chance that
  unforeseen issues are hidden due to warning suppression.

Andre Muezerie (3):
  eal: add diagnostics macros to make code portable
  drivers/common: add diagnostics macros to make code portable
  drivers/net: add diagnostics macros to make code portable

 drivers/common/idpf/idpf_common_rxtx_avx512.c | 72 +++++++++---------
 drivers/net/axgbe/axgbe_rxtx.h                |  9 ---
 drivers/net/cpfl/cpfl_rxtx_vec_common.h       |  4 -
 drivers/net/dpaa2/dpaa2_rxtx.c                | 15 +---
 drivers/net/fm10k/fm10k_rxtx_vec.c            | 21 ++----
 drivers/net/hns3/hns3_rxtx_vec_neon.h         |  6 +-
 .../net/i40e/i40e_recycle_mbufs_vec_common.c  |  2 -
 drivers/net/i40e/i40e_rxtx_common_avx.h       | 22 +++---
 drivers/net/i40e/i40e_rxtx_vec_altivec.c      | 18 ++---
 drivers/net/i40e/i40e_rxtx_vec_avx2.c         | 30 ++++----
 drivers/net/i40e/i40e_rxtx_vec_avx512.c       | 28 +++----
 drivers/net/i40e/i40e_rxtx_vec_common.h       |  4 -
 drivers/net/i40e/i40e_rxtx_vec_neon.c         | 35 ++++-----
 drivers/net/i40e/i40e_rxtx_vec_sse.c          | 28 +++----
 drivers/net/iavf/iavf_rxtx_vec_avx2.c         | 60 +++++++--------
 drivers/net/iavf/iavf_rxtx_vec_avx512.c       | 62 ++++++++--------
 drivers/net/iavf/iavf_rxtx_vec_common.h       | 10 +--
 drivers/net/iavf/iavf_rxtx_vec_neon.c         | 22 +++---
 drivers/net/iavf/iavf_rxtx_vec_sse.c          | 38 +++++-----
 drivers/net/ice/ice_rxtx_common_avx.h         | 18 ++---
 drivers/net/ice/ice_rxtx_vec_avx2.c           | 74 +++++++++----------
 drivers/net/ice/ice_rxtx_vec_avx512.c         | 64 +++++++---------
 drivers/net/ice/ice_rxtx_vec_common.h         |  4 -
 drivers/net/ice/ice_rxtx_vec_sse.c            | 28 +++----
 drivers/net/idpf/idpf_rxtx_vec_common.h       |  4 -
 .../ixgbe/ixgbe_recycle_mbufs_vec_common.c    |  2 -
 drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c       | 18 ++---
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c        | 20 ++---
 drivers/net/mlx5/mlx5_flow.c                  |  5 +-
 drivers/net/mlx5/mlx5_rxtx_vec_altivec.h      |  5 --
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h         | 18 ++---
 drivers/net/mlx5/mlx5_rxtx_vec_sse.h          | 61 ++++++++-------
 drivers/net/ngbe/ngbe_rxtx_vec_neon.c         |  8 +-
 drivers/net/tap/tap_flow.c                    |  6 +-
 drivers/net/txgbe/txgbe_rxtx_vec_neon.c       |  8 +-
 drivers/net/virtio/virtio_rxtx_simple.c       |  4 -
 lib/eal/include/rte_common.h                  | 48 ++++++++++++
 37 files changed, 398 insertions(+), 483 deletions(-)

--
2.47.2.vfs.0.1
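As an illustration of the v2 note about reducing the scope in which the
pragmas are active, here is a minimal sketch; it is not code from the series.
The three macro definitions are local copies of the ones this series adds to
rte_common.h, reproduced so the snippet builds without DPDK. demo_find_char
is a hypothetical helper invented for this example.

```c
#include <stddef.h>

/* Local copies of the series' macros, so the sketch is standalone. */
#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
#define __rte_diagnostic_pop  _Pragma("GCC diagnostic pop")
#define __rte_diagnostic_ignored_wcast_qual \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define __rte_diagnostic_push
#define __rte_diagnostic_pop
#define __rte_diagnostic_ignored_wcast_qual
#endif

/* Save the diagnostic state, then silence -Wcast-qual for one function. */
__rte_diagnostic_push
__rte_diagnostic_ignored_wcast_qual
/* Hypothetical helper: strchr-like search that casts away const. */
static char *
demo_find_char(const char *s, int c)
{
	for (; *s != '\0'; s++)
		if (*s == (char)c)
			return (char *)s; /* would trigger -Wcast-qual */
	return NULL;
}
/* Restore the saved state; later code is checked as before. */
__rte_diagnostic_pop
```

Only the cast inside demo_find_char is exempt from -Wcast-qual. On toolchains
where the macros expand to nothing, the function still compiles, just without
the suppression.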
* [PATCH v16 1/3] eal: add diagnostics macros to make code portable
  2025-01-21 22:36 ` [PATCH v16 0/3] " Andre Muezerie
@ 2025-01-21 22:36   ` Andre Muezerie
  2025-01-21 22:36   ` [PATCH v16 2/3] drivers/common: " Andre Muezerie
  2025-01-21 22:36   ` [PATCH v16 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-21 22:36 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen, bruce.richardson

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/include/rte_common.h | 48 ++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index 40592f71b1..9983eb586d 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -156,6 +156,54 @@ typedef uint16_t unaligned_uint16_t;
 #define RTE_DEPRECATED(x)
 #endif
 
+/**
+ * Macros to cause the compiler to remember the state of the diagnostics as of
+ * each push, and restore to that point at each pop.
+ */
+#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
+#define __rte_diagnostic_push _Pragma("GCC diagnostic push")
+#define __rte_diagnostic_pop _Pragma("GCC diagnostic pop")
+#else
+#define __rte_diagnostic_push
+#define __rte_diagnostic_pop
+#endif
+
+/**
+ * Macro to disable compiler warnings about removing a type
+ * qualifier from the target type.
+ */
+#if !defined(__INTEL_COMPILER) && !defined(RTE_TOOLCHAIN_MSVC)
+#define __rte_diagnostic_ignored_wcast_qual \
+	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
+#else
+#define __rte_diagnostic_ignored_wcast_qual
+#endif
+
+/**
+ * Workaround to discard qualifiers (such as const, volatile, restrict) from a pointer,
+ * without the compiler emitting a warning.
+ */
+#define RTE_PTR_UNQUAL(X) ((void *)(uintptr_t)(X))
+
+/**
+ * Workaround to cast a pointer to a specific type,
+ * without the compiler emitting a warning about discarding qualifiers.
+ *
+ * @warning
+ * When casting a pointer to point to a larger type, the resulting pointer may
+ * be misaligned, which results in undefined behavior.
+ * E.g.:
+ *
+ * struct s {
+ *     uint16_t a;
+ *     uint8_t b;
+ *     uint8_t c;
+ *     uint8_t d;
+ * } v;
+ * uint16_t * p = RTE_CAST_PTR(uint16_t *, &v.c); // "p" is not 16 bit aligned!
+ */
+#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))
+
 /**
  * Mark a function or variable to a weak reference.
  */
--
2.47.2.vfs.0.1
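A minimal sketch of how the two cast helpers might be used, loosely mirroring
the descriptor stores in the driver patches. The macros are copied locally
from the rte_common.h hunk above so the snippet builds without DPDK.
struct demo_desc, demo_write_desc and demo_read_flags are hypothetical names
invented for this example and do not appear in the series.

```c
#include <stdint.h>

/* Local copies of the macros added to rte_common.h by this patch. */
#define RTE_PTR_UNQUAL(X) ((void *)(uintptr_t)(X))
#define RTE_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))

/* Hypothetical descriptor, standing in for a hardware ring entry. */
struct demo_desc {
	uint64_t addr;
	uint64_t flags;
};

/* Hypothetical helper: store through a volatile descriptor pointer. */
static void
demo_write_desc(volatile struct demo_desc *d, uint64_t addr, uint64_t flags)
{
	/* Drop "volatile" and name the target type in one step. */
	uint64_t *pa = RTE_CAST_PTR(uint64_t *, &d->addr);
	uint64_t *pf = RTE_CAST_PTR(uint64_t *, &d->flags);

	*pa = addr;
	*pf = flags;
}

/* Hypothetical helper: read through a const descriptor pointer. */
static uint64_t
demo_read_flags(const struct demo_desc *d)
{
	/* RTE_PTR_UNQUAL yields a plain void *, which converts implicitly in C. */
	struct demo_desc *p = RTE_PTR_UNQUAL(d);

	return p->flags;
}
```

In line with the @warning in the hunk, the target type here has the same
alignment as the fields being addressed (both are uint64_t), so the casts do
not create a misaligned pointer.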
* [PATCH v16 2/3] drivers/common: add diagnostics macros to make code portable
  2025-01-21 22:36 ` [PATCH v16 0/3] " Andre Muezerie
  2025-01-21 22:36   ` [PATCH v16 1/3] eal: " Andre Muezerie
@ 2025-01-21 22:36   ` Andre Muezerie
  2025-01-21 22:36   ` [PATCH v16 3/3] drivers/net: " Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-21 22:36 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen, bruce.richardson

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/common/idpf/idpf_common_rxtx_avx512.c | 72 +++++++++---------- 1 file changed, 34 insertions(+), 38 deletions(-) diff --git a/drivers/common/idpf/idpf_common_rxtx_avx512.c b/drivers/common/idpf/idpf_common_rxtx_avx512.c index b8450b03ae..81052e72c1 100644 --- a/drivers/common/idpf/idpf_common_rxtx_avx512.c +++ b/drivers/common/idpf/idpf_common_rxtx_avx512.c @@ -6,10 +6,6 @@ #include "idpf_common_device.h" #include "idpf_common_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_DESCS_PER_LOOP_AVX 8 #define PKTLEN_SHIFT 10 @@ -34,7 +30,7 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -108,8 +104,8 @@ idpf_singleq_rearm_common(struct idpf_rx_queue *rxq) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &rxdp->read), dma_addr0_3); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &(rxdp + 4)->read), dma_addr4_7); } rxq->rxrearm_start += IDPF_RXQ_REARM_THRESH; @@ -164,8 +160,8 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_storeu_si128((__m128i *)&rxdp[i].read, - dma_addr0); + _mm_storeu_si128(RTE_CAST_PTR + (__m128i *, &rxdp[i].read), dma_addr0); } } rte_atomic_fetch_add_explicit(&rxq->rx_stats.mbuf_alloc_failed, @@ -216,10 +212,10 @@ idpf_singleq_rearm(struct idpf_rx_queue *rxq) iovas1); const __m512i desc6_7 = _mm512_bsrli_epi128(desc4_5, 8); - _mm512_storeu_si512((void *)rxdp, desc0_1); - 
_mm512_storeu_si512((void *)(rxdp + 2), desc2_3); - _mm512_storeu_si512((void *)(rxdp + 4), desc4_5); - _mm512_storeu_si512((void *)(rxdp + 6), desc6_7); + _mm512_storeu_si512(RTE_CAST_PTR(void *, rxdp), desc0_1); + _mm512_storeu_si512(RTE_CAST_PTR(void *, (rxdp + 2)), desc2_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, (rxdp + 4)), desc4_5); + _mm512_storeu_si512(RTE_CAST_PTR(void *, (rxdp + 6)), desc6_7); rxp += IDPF_DESCS_PER_LOOP_AVX; rxdp += IDPF_DESCS_PER_LOOP_AVX; @@ -337,28 +333,28 @@ _idpf_singleq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -560,7 +556,7 @@ idpf_splitq_rearm_common(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < 
IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i], + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i]), dma_addr0); } } @@ -634,7 +630,7 @@ idpf_splitq_rearm(struct idpf_rx_queue *rx_bufq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IDPF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rx_bufq->fake_mbuf; - _mm_storeu_si128((__m128i *)&rxdp[i], + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, &rxdp[i]), dma_addr0); } } @@ -798,28 +794,28 @@ _idpf_splitq_recv_raw_pkts_avx512(struct idpf_rx_queue *rxq, __m512i raw_desc0_3, raw_desc4_7; const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4); raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1); @@ -1131,7 +1127,7 @@ idpf_singleq_vtx1(volatile struct idpf_base_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); 
- _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } #define IDPF_TX_LEN_MASK 0xAA @@ -1178,7 +1174,7 @@ idpf_singleq_vtx(volatile struct idpf_base_tx_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ @@ -1435,7 +1431,7 @@ idpf_splitq_vtx1(volatile struct idpf_flex_tx_sched_desc *txdp, __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_storeu_si128((__m128i *)txdp, descriptor); + _mm_storeu_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -1480,7 +1476,7 @@ idpf_splitq_vtx(volatile struct idpf_flex_tx_sched_desc *txdp, pkt[1]->buf_iova + pkt[1]->data_off, hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ @@ -1521,11 +1517,11 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt if (nb_commit >= n) { tx_backlog_entry_avx512(txep, tx_pkts, n); - idpf_splitq_vtx((void *)txdp, tx_pkts, n - 1, cmd_dtype); + idpf_splitq_vtx(txdp, tx_pkts, n - 1, cmd_dtype); tx_pkts += (n - 1); txdp += (n - 1); - idpf_splitq_vtx1((void *)txdp, *tx_pkts++, cmd_dtype); + idpf_splitq_vtx1(txdp, *tx_pkts++, cmd_dtype); nb_commit = (uint16_t)(nb_commit - n); @@ -1540,7 +1536,7 @@ idpf_splitq_xmit_fixed_burst_vec_avx512(void *tx_queue, struct rte_mbuf **tx_pkt tx_backlog_entry_avx512(txep, tx_pkts, nb_commit); - idpf_splitq_vtx((void *)txdp, tx_pkts, nb_commit, cmd_dtype); + idpf_splitq_vtx(txdp, tx_pkts, nb_commit, cmd_dtype); tx_id = (uint16_t)(tx_id + nb_commit); if (tx_id > txq->next_rs) -- 2.47.2.vfs.0.1 ^ permalink raw reply [flat|nested] 87+ messages in thread
* [PATCH v16 3/3] drivers/net: add diagnostics macros to make code portable
  2025-01-21 22:36 ` [PATCH v16 0/3] " Andre Muezerie
  2025-01-21 22:36   ` [PATCH v16 1/3] eal: " Andre Muezerie
  2025-01-21 22:36   ` [PATCH v16 2/3] drivers/common: " Andre Muezerie
@ 2025-01-21 22:36   ` Andre Muezerie
  2 siblings, 0 replies; 87+ messages in thread
From: Andre Muezerie @ 2025-01-21 22:36 UTC (permalink / raw)
  To: andremue; +Cc: dev, stephen, bruce.richardson

It was a common pattern to have "GCC diagnostic ignored" pragmas
sprinkled over the code and only activate these pragmas for certain
compilers (gcc and clang). Clang supports GCC's pragma for
compatibility with existing source code, so #pragma GCC diagnostic
and #pragma clang diagnostic are synonyms for Clang
(https://clang.llvm.org/docs/UsersManual.html).

Now that effort is being made to make the code compatible with MSVC
these expressions would become more complex. It makes sense to hide
this complexity behind macros. This makes maintenance easier as these
macros are defined in a single place. As a plus the code becomes
more readable as well.
Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/axgbe/axgbe_rxtx.h | 9 --- drivers/net/cpfl/cpfl_rxtx_vec_common.h | 4 - drivers/net/dpaa2/dpaa2_rxtx.c | 15 +--- drivers/net/fm10k/fm10k_rxtx_vec.c | 21 ++---- drivers/net/hns3/hns3_rxtx_vec_neon.h | 6 +- .../net/i40e/i40e_recycle_mbufs_vec_common.c | 2 - drivers/net/i40e/i40e_rxtx_common_avx.h | 22 +++--- drivers/net/i40e/i40e_rxtx_vec_altivec.c | 18 ++--- drivers/net/i40e/i40e_rxtx_vec_avx2.c | 30 ++++---- drivers/net/i40e/i40e_rxtx_vec_avx512.c | 28 +++---- drivers/net/i40e/i40e_rxtx_vec_common.h | 4 - drivers/net/i40e/i40e_rxtx_vec_neon.c | 35 ++++----- drivers/net/i40e/i40e_rxtx_vec_sse.c | 28 +++---- drivers/net/iavf/iavf_rxtx_vec_avx2.c | 60 +++++++-------- drivers/net/iavf/iavf_rxtx_vec_avx512.c | 62 ++++++++-------- drivers/net/iavf/iavf_rxtx_vec_common.h | 10 +-- drivers/net/iavf/iavf_rxtx_vec_neon.c | 22 +++--- drivers/net/iavf/iavf_rxtx_vec_sse.c | 38 +++++----- drivers/net/ice/ice_rxtx_common_avx.h | 18 ++--- drivers/net/ice/ice_rxtx_vec_avx2.c | 74 +++++++++---------- drivers/net/ice/ice_rxtx_vec_avx512.c | 64 +++++++--------- drivers/net/ice/ice_rxtx_vec_common.h | 4 - drivers/net/ice/ice_rxtx_vec_sse.c | 28 +++---- drivers/net/idpf/idpf_rxtx_vec_common.h | 4 - .../ixgbe/ixgbe_recycle_mbufs_vec_common.c | 2 - drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c | 18 ++--- drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c | 20 ++--- drivers/net/mlx5/mlx5_flow.c | 5 +- drivers/net/mlx5/mlx5_rxtx_vec_altivec.h | 5 -- drivers/net/mlx5/mlx5_rxtx_vec_neon.h | 18 ++--- drivers/net/mlx5/mlx5_rxtx_vec_sse.h | 61 ++++++++------- drivers/net/ngbe/ngbe_rxtx_vec_neon.c | 8 +- drivers/net/tap/tap_flow.c | 6 +- drivers/net/txgbe/txgbe_rxtx_vec_neon.c | 8 +- drivers/net/virtio/virtio_rxtx_simple.c | 4 - 35 files changed, 316 insertions(+), 445 deletions(-) diff --git a/drivers/net/axgbe/axgbe_rxtx.h b/drivers/net/axgbe/axgbe_rxtx.h index a326ba9ac8..f5f74a0a39 100644 --- a/drivers/net/axgbe/axgbe_rxtx.h +++ 
b/drivers/net/axgbe/axgbe_rxtx.h
@@ -6,15 +6,6 @@
 #ifndef _AXGBE_RXTX_H_
 #define _AXGBE_RXTX_H_
-/* to suppress gcc warnings related to descriptor casting*/
-#ifdef RTE_TOOLCHAIN_GCC
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
-#ifdef RTE_TOOLCHAIN_CLANG
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 /* Descriptor related defines */
 #define AXGBE_MAX_RING_DESC 4096 /*should be power of 2*/
 #define AXGBE_TX_DESC_MIN_FREE (AXGBE_MAX_RING_DESC >> 3)
diff --git a/drivers/net/cpfl/cpfl_rxtx_vec_common.h b/drivers/net/cpfl/cpfl_rxtx_vec_common.h
index 479e1ddcb9..5b98f86932 100644
--- a/drivers/net/cpfl/cpfl_rxtx_vec_common.h
+++ b/drivers/net/cpfl/cpfl_rxtx_vec_common.h
@@ -11,10 +11,6 @@
 #include "cpfl_ethdev.h"
 #include "cpfl_rxtx.h"
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define CPFL_SCALAR_PATH 0
 #define CPFL_VECTOR_PATH 1
 #define CPFL_RX_NO_VECTOR_FLAGS ( \
diff --git a/drivers/net/dpaa2/dpaa2_rxtx.c b/drivers/net/dpaa2/dpaa2_rxtx.c
index e3b6c7e460..bfb5542bbc 100644
--- a/drivers/net/dpaa2/dpaa2_rxtx.c
+++ b/drivers/net/dpaa2/dpaa2_rxtx.c
@@ -1962,14 +1962,6 @@ dpaa2_dev_tx_ordered(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 return num_tx;
 }
-#if defined(RTE_TOOLCHAIN_GCC)
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#elif defined(RTE_TOOLCHAIN_CLANG)
-#pragma clang diagnostic push
-#pragma clang diagnostic ignored "-Wcast-qual"
-#endif
-
 /* This function loopbacks all the received packets.*/
 uint16_t
 dpaa2_dev_loopback_rx(void *queue,
@@ -2083,7 +2075,7 @@ dpaa2_dev_loopback_rx(void *queue,
 if (unlikely((status & QBMAN_DQ_STAT_VALIDFRAME) == 0))
 continue;
 }
- fd[num_rx] = (struct qbman_fd *)qbman_result_DQ_fd(dq_storage);
+ fd[num_rx] = RTE_PTR_UNQUAL(qbman_result_DQ_fd(dq_storage));
 dq_storage++;
 num_rx++;
@@ -2118,8 +2110,3 @@ dpaa2_dev_loopback_rx(void *queue,
 return 0;
 }
-#if defined(RTE_TOOLCHAIN_GCC)
-#pragma GCC diagnostic pop
-#elif defined(RTE_TOOLCHAIN_CLANG)
-#pragma clang diagnostic pop
-#endif
diff --git a/drivers/net/fm10k/fm10k_rxtx_vec.c b/drivers/net/fm10k/fm10k_rxtx_vec.c
index 68acaca75b..715c891c30 100644
--- a/drivers/net/fm10k/fm10k_rxtx_vec.c
+++ b/drivers/net/fm10k/fm10k_rxtx_vec.c
@@ -11,10 +11,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static void
 fm10k_reset_tx_queue(struct fm10k_tx_queue *txq);
@@ -270,8 +266,7 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 /* Clean up all the HW/SW ring content */
 for (i = 0; i < RTE_FM10K_RXQ_REARM_THRESH; i++) {
 mb_alloc[i] = &rxq->fake_mbuf;
- _mm_store_si128((__m128i *)&rxdp[i].q,
- dma_addr0);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].q), dma_addr0);
 }
 rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -316,8 +311,8 @@ fm10k_rxq_rearm(struct fm10k_rx_queue *rxq)
 dma_addr1 = _mm_and_si128(dma_addr1, hba_msk);
 /* flush desc with pa dma_addr */
- _mm_store_si128((__m128i *)&rxdp++->q, dma_addr0);
- _mm_store_si128((__m128i *)&rxdp++->q, dma_addr1);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->q), dma_addr0);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->q), dma_addr1);
 /* enforce 512B alignment on default Rx virtual addresses */
 mb0->data_off = (uint16_t)(RTE_PTR_ALIGN((char *)mb0->buf_addr
@@ -465,7 +460,7 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 /* Read desc statuses backwards to avoid race condition */
 /* A.1 load desc[3] */
- descs0[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+ descs0[3] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp + 3));
 rte_compiler_barrier();
 /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -477,11 +472,11 @@ fm10k_recv_raw_pkts_vec(void *rx_queue, struct rte_mbuf **rx_pkts,
 #endif
 /* A.1 load desc[2-0] */
- descs0[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+ descs0[2] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp + 2));
 rte_compiler_barrier();
- descs0[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+ descs0[1] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp + 1));
 rte_compiler_barrier();
- descs0[0] = _mm_loadu_si128((__m128i *)(rxdp));
+ descs0[0] = _mm_loadu_si128(RTE_CAST_PTR(__m128i *, rxdp));
 #if defined(RTE_ARCH_X86_64)
 /* B.2 copy 2 mbuf point into rx_pkts */
@@ -736,7 +731,7 @@ vtx1(volatile struct fm10k_tx_desc *txdp,
 __m128i descriptor = _mm_set_epi64x(flags << 56 |
 (uint64_t)pkt->vlan_tci << 16 | (uint64_t)pkt->data_len,
 MBUF_DMA_ADDR(pkt));
- _mm_store_si128((__m128i *)txdp, descriptor);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor);
 }
 static inline void
diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h
index bbb5478015..f2e155c9f5 100644
--- a/drivers/net/hns3/hns3_rxtx_vec_neon.h
+++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h
@@ -9,8 +9,6 @@
 #include <arm_neon.h>
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 static inline void
 hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt)
 {
@@ -22,8 +20,8 @@ hns3_vec_tx(volatile struct hns3_desc *desc, struct rte_mbuf *pkt)
 0,
 ((uint64_t)HNS3_TXD_DEFAULT_VLD_FE_BDTYPE) << HNS3_UINT32_BIT
 };
- vst1q_u64((uint64_t *)&desc->addr, val1);
- vst1q_u64((uint64_t *)&desc->tx.outer_vlan_tag, val2);
+ vst1q_u64(RTE_CAST_PTR(uint64_t *, &desc->addr), val1);
+ vst1q_u64(RTE_CAST_PTR(uint64_t *, &desc->tx.outer_vlan_tag), val2);
 }
 static uint16_t
diff --git a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
index 14424c9921..6eafe51e3d 100644
--- a/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
+++ b/drivers/net/i40e/i40e_recycle_mbufs_vec_common.c
@@ -10,8 +10,6 @@
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 void
 i40e_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs)
 {
diff --git a/drivers/net/i40e/i40e_rxtx_common_avx.h b/drivers/net/i40e/i40e_rxtx_common_avx.h
index 85958d6c81..b66a808f9f 100644
--- a/drivers/net/i40e/i40e_rxtx_common_avx.h
+++ b/drivers/net/i40e/i40e_rxtx_common_avx.h
@@ -11,10 +11,6 @@
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #ifdef __AVX2__
 static __rte_always_inline void
 i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
@@ -36,7 +32,7 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
 dma_addr0 = _mm_setzero_si128();
 for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 rxep[i].mbuf = &rxq->fake_mbuf;
- _mm_store_si128((__m128i *)&rxdp[i].read,
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read),
 dma_addr0);
 }
 }
@@ -72,8 +68,8 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
 dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 /* flush desc with pa dma_addr */
- _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
- _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1);
 }
 #else
 #ifdef __AVX512VL__
@@ -144,8 +140,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
 dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room);
 /* flush desc with pa dma_addr */
- _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3);
- _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7);
+ _mm512_store_si512(RTE_CAST_PTR(__m512i *,
+ &rxdp->read), dma_addr0_3);
+ _mm512_store_si512(RTE_CAST_PTR(__m512i *,
+ &(rxdp + 4)->read), dma_addr4_7);
 }
 } else
 #endif /* __AVX512VL__*/
@@ -190,8 +188,10 @@ i40e_rxq_rearm_common(struct i40e_rx_queue *rxq, __rte_unused bool avx512)
 dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room);
 /* flush desc with pa dma_addr */
- _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1);
- _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3);
+ _mm256_store_si256(RTE_CAST_PTR(__m256i *,
+ &rxdp->read), dma_addr0_1);
+ _mm256_store_si256(RTE_CAST_PTR(__m256i *,
+ &(rxdp + 2)->read), dma_addr2_3);
 }
 }
diff --git a/drivers/net/i40e/i40e_rxtx_vec_altivec.c b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
index b6b0d38ec1..f9e26d18dd 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_altivec.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_altivec.c
@@ -15,8 +15,6 @@
 #include <rte_altivec.h>
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 static inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -44,7 +42,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 rxep[i].mbuf = &rxq->fake_mbuf;
 vec_st(dma_addr0, 0,
- (__vector unsigned long *)&rxdp[i].read);
+ RTE_CAST_PTR(__vector unsigned long *, &rxdp[i].read));
 }
 }
 rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -84,8 +82,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 dma_addr1 = vec_add(dma_addr1, hdr_room);
 /* flush desc with pa dma_addr */
- vec_st(dma_addr0, 0, (__vector unsigned long *)&rxdp++->read);
- vec_st(dma_addr1, 0, (__vector unsigned long *)&rxdp++->read);
+ vec_st(dma_addr0, 0, RTE_CAST_PTR(__vector unsigned long *, &rxdp++->read));
+ vec_st(dma_addr1, 0, RTE_CAST_PTR(__vector unsigned long *, &rxdp++->read));
 }
 rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
@@ -286,7 +284,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 mbp1 = *(__vector unsigned long *)&sw_ring[pos];
 /* Read desc statuses backwards to avoid race condition */
 /* A.1 load desc[3] */
- descs[3] = *(__vector unsigned long *)(rxdp + 3);
+ descs[3] = *RTE_CAST_PTR(__vector unsigned long *, rxdp + 3);
 rte_compiler_barrier();
 /* B.2 copy 2 mbuf point into rx_pkts */
@@ -296,11 +294,11 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 mbp2 = *(__vector unsigned long *)&sw_ring[pos + 2];
 /* A.1 load desc[2-0] */
- descs[2] = *(__vector unsigned long *)(rxdp + 2);
+ descs[2] = *RTE_CAST_PTR(__vector unsigned long *, rxdp + 2);
 rte_compiler_barrier();
- descs[1] = *(__vector unsigned long *)(rxdp + 1);
+ descs[1] = *RTE_CAST_PTR(__vector unsigned long *, rxdp + 1);
 rte_compiler_barrier();
- descs[0] = *(__vector unsigned long *)(rxdp);
+ descs[0] = *RTE_CAST_PTR(__vector unsigned long *, rxdp);
 /* B.2 copy 2 mbuf point into rx_pkts */
 *(__vector unsigned long *)&rx_pkts[pos + 2] = mbp2;
@@ -534,7 +532,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 __vector unsigned long descriptor = (__vector unsigned long){
 pkt->buf_iova + pkt->data_off, high_qw};
- *(__vector unsigned long *)txdp = descriptor;
+ *RTE_CAST_PTR(__vector unsigned long *, txdp) = descriptor;
 }
 static inline void
diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx2.c b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
index 19cf0ac718..86f06d67fb 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx2.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx2.c
@@ -15,10 +15,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -39,8 +35,8 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp,
 const uint32_t desc_idx)
 {
 /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */
- __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2);
- __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2);
+ __m128i *rxdp_desc_0 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 0].wb.qword2);
+ __m128i *rxdp_desc_1 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 1].wb.qword2);
 const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0);
 const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1);
@@ -276,21 +272,21 @@ _recv_raw_pkts_vec_avx2(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 _mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif
- const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
+ const __m128i raw_desc7 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7));
 rte_compiler_barrier();
- const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
+ const __m128i raw_desc6 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6));
 rte_compiler_barrier();
- const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
+ const __m128i raw_desc5 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5));
 rte_compiler_barrier();
- const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
+ const __m128i raw_desc4 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4));
 rte_compiler_barrier();
- const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3));
+ const __m128i raw_desc3 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3));
 rte_compiler_barrier();
- const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
+ const __m128i raw_desc2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2));
 rte_compiler_barrier();
- const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
+ const __m128i raw_desc1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1));
 rte_compiler_barrier();
- const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+ const __m128i raw_desc0 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0));
 const __m256i raw_desc6_7 = _mm256_inserti128_si256(
 _mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
@@ -695,7 +691,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 __m128i descriptor = _mm_set_epi64x(high_qw,
 pkt->buf_iova + pkt->data_off);
- _mm_store_si128((__m128i *)txdp, descriptor);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor);
 }
 static inline void
@@ -728,8 +724,8 @@ vtx(volatile struct i40e_tx_desc *txdp,
 __m256i desc0_1 = _mm256_set_epi64x(
 hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off,
 hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off);
- _mm256_store_si256((void *)(txdp + 2), desc2_3);
- _mm256_store_si256((void *)txdp, desc0_1);
+ _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp + 2), desc2_3);
+ _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp), desc0_1);
 }
 /* do any last ones */
diff --git a/drivers/net/i40e/i40e_rxtx_vec_avx512.c b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
index 3b2750221b..587cbfb6d9 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_avx512.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_avx512.c
@@ -15,10 +15,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define RTE_I40E_DESCS_PER_LOOP_AVX 8
 static __rte_always_inline void
@@ -41,8 +37,8 @@ desc_fdir_processing_32b(volatile union i40e_rx_desc *rxdp,
 const uint32_t desc_idx)
 {
 /* 32B desc path: load rxdp.wb.qword2 for EXT_STATUS and FLEXBH_STAT */
- __m128i *rxdp_desc_0 = (void *)(&rxdp[desc_idx + 0].wb.qword2);
- __m128i *rxdp_desc_1 = (void *)(&rxdp[desc_idx + 1].wb.qword2);
+ __m128i *rxdp_desc_0 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 0].wb.qword2);
+ __m128i *rxdp_desc_1 = RTE_CAST_PTR(__m128i *, &rxdp[desc_idx + 1].wb.qword2);
 const __m128i desc_qw2_0 = _mm_load_si128(rxdp_desc_0);
 const __m128i desc_qw2_1 = _mm_load_si128(rxdp_desc_1);
@@ -264,28 +260,28 @@ _recv_raw_pkts_vec_avx512(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 /* load in descriptors, in reverse order */
 const __m128i raw_desc7 =
- _mm_load_si128((void *)(rxdp + 7));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7));
 rte_compiler_barrier();
 const __m128i raw_desc6 =
- _mm_load_si128((void *)(rxdp + 6));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6));
 rte_compiler_barrier();
 const __m128i raw_desc5 =
- _mm_load_si128((void *)(rxdp + 5));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5));
 rte_compiler_barrier();
 const __m128i raw_desc4 =
- _mm_load_si128((void *)(rxdp + 4));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4));
 rte_compiler_barrier();
 const __m128i raw_desc3 =
- _mm_load_si128((void *)(rxdp + 3));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3));
 rte_compiler_barrier();
 const __m128i raw_desc2 =
- _mm_load_si128((void *)(rxdp + 2));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2));
 rte_compiler_barrier();
 const __m128i raw_desc1 =
- _mm_load_si128((void *)(rxdp + 1));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1));
 rte_compiler_barrier();
 const __m128i raw_desc0 =
- _mm_load_si128((void *)(rxdp + 0));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0));
 raw_desc6_7 =
 _mm256_inserti128_si256
@@ -875,7 +871,7 @@ vtx1(volatile struct i40e_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags)
 __m128i descriptor = _mm_set_epi64x(high_qw,
 pkt->buf_iova + pkt->data_off);
- _mm_store_si128((__m128i *)txdp, descriptor);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor);
 }
 static inline void
@@ -909,7 +905,7 @@ vtx(volatile struct i40e_tx_desc *txdp,
 hi_qw2, pkt[2]->buf_iova + pkt[2]->data_off,
 hi_qw1, pkt[1]->buf_iova + pkt[1]->data_off,
 hi_qw0, pkt[0]->buf_iova + pkt[0]->data_off);
- _mm512_storeu_si512((void *)txdp, desc0_3);
+ _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3);
 }
 /* do any last ones */
diff --git a/drivers/net/i40e/i40e_rxtx_vec_common.h b/drivers/net/i40e/i40e_rxtx_vec_common.h
index 8b745630e4..ec59a68f9d 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_common.h
+++ b/drivers/net/i40e/i40e_rxtx_vec_common.h
@@ -11,10 +11,6 @@
 #include "i40e_ethdev.h"
 #include "i40e_rxtx.h"
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline uint16_t
 reassemble_packets(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 uint16_t nb_bufs, uint8_t *split_flags)
diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index e1c5c7041b..12c8521824 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -16,9 +16,6 @@
 #include "i40e_rxtx.h"
 #include "i40e_rxtx_vec_common.h"
-
-#pragma GCC diagnostic ignored "-Wcast-qual"
-
 static inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -41,7 +38,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 rxq->nb_rx_desc) {
 for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 rxep[i].mbuf = &rxq->fake_mbuf;
- vst1q_u64((uint64_t *)&rxdp[i].read, zero);
+ vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i].read), zero);
 }
 }
 rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -58,11 +55,11 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 dma_addr0 = vdupq_n_u64(paddr);
 /* flush desc with pa dma_addr */
- vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
+ vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr0);
 paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM;
 dma_addr1 = vdupq_n_u64(paddr);
- vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1);
+ vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr1);
 }
 rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
@@ -87,10 +84,10 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt)
 {
 /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */
 uint64x2_t desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23;
- desc0_qw23 = vld1q_u64((uint64_t *)&(rxdp + 0)->wb.qword2);
- desc1_qw23 = vld1q_u64((uint64_t *)&(rxdp + 1)->wb.qword2);
- desc2_qw23 = vld1q_u64((uint64_t *)&(rxdp + 2)->wb.qword2);
- desc3_qw23 = vld1q_u64((uint64_t *)&(rxdp + 3)->wb.qword2);
+ desc0_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 0)->wb.qword2));
+ desc1_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 1)->wb.qword2));
+ desc2_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 2)->wb.qword2));
+ desc3_qw23 = vld1q_u64(RTE_CAST_PTR(uint64_t *, &(rxdp + 3)->wb.qword2));
 /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */
 uint32x4_t v_unpack_02, v_unpack_13;
@@ -421,18 +418,18 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
 int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT};
 /* A.1 load desc[3-0] */
- descs[3] = vld1q_u64((uint64_t *)(rxdp + 3));
- descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
- descs[1] = vld1q_u64((uint64_t *)(rxdp + 1));
- descs[0] = vld1q_u64((uint64_t *)(rxdp));
+ descs[3] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3));
+ descs[2] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2));
+ descs[1] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1));
+ descs[0] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp));
 /* Use acquire fence to order loads of descriptor qwords */
 rte_atomic_thread_fence(rte_memory_order_acquire);
 /* A.2 reload qword0 to make it ordered after qword1 load */
- descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0);
- descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0);
- descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0);
- descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0);
+ descs[3] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3), descs[3], 0);
+ descs[2] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2), descs[2], 0);
+ descs[1] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1), descs[1], 0);
+ descs[0] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp), descs[0], 0);
 /* B.1 load 4 mbuf point */
 mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]);
@@ -662,7 +659,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 ((uint64_t)pkt->data_len << I40E_TXD_QW1_TX_BUF_SZ_SHIFT));
 uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, high_qw};
- vst1q_u64((uint64_t *)txdp, descriptor);
+ vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor);
 }
 static inline void
diff --git a/drivers/net/i40e/i40e_rxtx_vec_sse.c b/drivers/net/i40e/i40e_rxtx_vec_sse.c
index ad560d2b6b..3fb97f4528 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_sse.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_sse.c
@@ -14,10 +14,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static inline void
 i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 {
@@ -41,7 +37,7 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 dma_addr0 = _mm_setzero_si128();
 for (i = 0; i < RTE_I40E_DESCS_PER_LOOP; i++) {
 rxep[i].mbuf = &rxq->fake_mbuf;
- _mm_store_si128((__m128i *)&rxdp[i].read,
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read),
 dma_addr0);
 }
 }
@@ -72,8 +68,8 @@ i40e_rxq_rearm(struct i40e_rx_queue *rxq)
 dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 /* flush desc with pa dma_addr */
- _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
- _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1);
 }
 rxq->rxrearm_start += RTE_I40E_RXQ_REARM_THRESH;
@@ -97,10 +93,10 @@ descs_to_fdir_32b(volatile union i40e_rx_desc *rxdp, struct rte_mbuf **rx_pkt)
 {
 /* 32B descriptors: Load 2nd half of descriptors for FDIR ID data */
 __m128i desc0_qw23, desc1_qw23, desc2_qw23, desc3_qw23;
- desc0_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 0)->wb.qword2);
- desc1_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 1)->wb.qword2);
- desc2_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 2)->wb.qword2);
- desc3_qw23 = _mm_loadu_si128((__m128i *)&(rxdp + 3)->wb.qword2);
+ desc0_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 0)->wb.qword2));
+ desc1_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 1)->wb.qword2));
+ desc2_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 2)->wb.qword2));
+ desc3_qw23 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &(rxdp + 3)->wb.qword2));
 /* FDIR ID data: move last u32 of each desc to 4 u32 lanes */
 __m128i v_unpack_01, v_unpack_23;
@@ -462,7 +458,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]);
 /* Read desc statuses backwards to avoid race condition */
 /* A.1 load desc[3] */
- descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3));
+ descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3));
 rte_compiler_barrier();
 /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */
@@ -474,11 +470,11 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *rxq, struct rte_mbuf **rx_pkts,
 #endif
 /* A.1 load desc[2-0] */
- descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2));
+ descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2));
 rte_compiler_barrier();
- descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1));
+ descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1));
 rte_compiler_barrier();
- descs[0] = _mm_loadu_si128((__m128i *)(rxdp));
+ descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp));
 #if defined(RTE_ARCH_X86_64)
 /* B.2 copy 2 mbuf point into rx_pkts */
@@ -681,7 +677,7 @@ vtx1(volatile struct i40e_tx_desc *txdp,
 __m128i descriptor = _mm_set_epi64x(high_qw,
 pkt->buf_iova + pkt->data_off);
- _mm_store_si128((__m128i *)txdp, descriptor);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor);
 }
 static inline void
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx2.c b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
index 49d41af953..8cc61c484f 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx2.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx2.c
@@ -6,10 +6,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline void
 iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 {
@@ -193,21 +189,21 @@ _iavf_recv_raw_pkts_vec_avx2(struct iavf_rx_queue *rxq,
 _mm256_loadu_si256((void *)&sw_ring[i + 4]));
 #endif
- const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7));
+ const __m128i raw_desc7 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7));
 rte_compiler_barrier();
- const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6));
+ const __m128i raw_desc6 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6));
 rte_compiler_barrier();
- const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5));
+ const __m128i raw_desc5 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5));
 rte_compiler_barrier();
- const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4));
+ const __m128i raw_desc4 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4));
 rte_compiler_barrier();
- const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3));
+ const __m128i raw_desc3 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3));
 rte_compiler_barrier();
- const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2));
+ const __m128i raw_desc2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2));
 rte_compiler_barrier();
- const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1));
+ const __m128i raw_desc1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1));
 rte_compiler_barrier();
- const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0));
+ const __m128i raw_desc0 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0));
 const __m256i raw_desc6_7 =
 _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1);
@@ -509,7 +505,7 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 0, rxq->mbuf_initializer);
 struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail];
 volatile union iavf_rx_flex_desc *rxdp =
- (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+ (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
 rte_prefetch0(rxdp);
@@ -743,28 +739,28 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 __m256i raw_desc0_1, raw_desc2_3, raw_desc4_5, raw_desc6_7;
 const __m128i raw_desc7 =
- _mm_load_si128((void *)(rxdp + 7));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7));
 rte_compiler_barrier();
 const __m128i raw_desc6 =
- _mm_load_si128((void *)(rxdp + 6));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6));
 rte_compiler_barrier();
 const __m128i raw_desc5 =
- _mm_load_si128((void *)(rxdp + 5));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5));
 rte_compiler_barrier();
 const __m128i raw_desc4 =
- _mm_load_si128((void *)(rxdp + 4));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4));
 rte_compiler_barrier();
 const __m128i raw_desc3 =
- _mm_load_si128((void *)(rxdp + 3));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3));
 rte_compiler_barrier();
 const __m128i raw_desc2 =
- _mm_load_si128((void *)(rxdp + 2));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2));
 rte_compiler_barrier();
 const __m128i raw_desc1 =
- _mm_load_si128((void *)(rxdp + 1));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1));
 rte_compiler_barrier();
 const __m128i raw_desc0 =
- _mm_load_si128((void *)(rxdp + 0));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0));
 raw_desc6_7 =
 _mm256_inserti128_si256
@@ -961,35 +957,35 @@ _iavf_recv_raw_pkts_vec_avx2_flex_rxd(struct iavf_rx_queue *rxq,
 /* load bottom half of every 32B desc */
 const __m128i raw_desc_bh7 =
 _mm_load_si128
- ((void *)(&rxdp[7].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh6 =
 _mm_load_si128
- ((void *)(&rxdp[6].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh5 =
 _mm_load_si128
- ((void *)(&rxdp[5].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh4 =
 _mm_load_si128
- ((void *)(&rxdp[4].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh3 =
 _mm_load_si128
- ((void *)(&rxdp[3].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh2 =
 _mm_load_si128
- ((void *)(&rxdp[2].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh1 =
 _mm_load_si128
- ((void *)(&rxdp[1].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh0 =
 _mm_load_si128
- ((void *)(&rxdp[0].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1));
 __m256i raw_desc_bh6_7 =
 _mm256_inserti128_si256
@@ -1664,7 +1660,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 __m128i descriptor = _mm_set_epi64x(high_qw,
 pkt->buf_iova + pkt->data_off);
- _mm_store_si128((__m128i *)txdp, descriptor);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor);
 }
 static __rte_always_inline void
@@ -1719,8 +1715,8 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp,
 pkt[1]->buf_iova + pkt[1]->data_off,
 hi_qw0,
 pkt[0]->buf_iova + pkt[0]->data_off);
- _mm256_store_si256((void *)(txdp + 2), desc2_3);
- _mm256_store_si256((void *)txdp, desc0_1);
+ _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp + 2), desc2_3);
+ _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp), desc0_1);
 }
 /* do any last ones */
diff --git a/drivers/net/iavf/iavf_rxtx_vec_avx512.c b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
index d6a861bf80..3c238db104 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_avx512.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_avx512.c
@@ -6,10 +6,6 @@
 #include <rte_vect.h>
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 #define IAVF_DESCS_PER_LOOP_AVX 8
 #define PKTLEN_SHIFT 10
@@ -165,28 +161,28 @@ _iavf_recv_raw_pkts_vec_avx512(struct iavf_rx_queue *rxq,
 __m512i raw_desc0_3, raw_desc4_7;
 const __m128i raw_desc7 =
- _mm_load_si128((void *)(rxdp + 7));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7));
 rte_compiler_barrier();
 const __m128i raw_desc6 =
- _mm_load_si128((void *)(rxdp + 6));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6));
 rte_compiler_barrier();
 const __m128i raw_desc5 =
- _mm_load_si128((void *)(rxdp + 5));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5));
 rte_compiler_barrier();
 const __m128i raw_desc4 =
- _mm_load_si128((void *)(rxdp + 4));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4));
 rte_compiler_barrier();
 const __m128i raw_desc3 =
- _mm_load_si128((void *)(rxdp + 3));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3));
 rte_compiler_barrier();
 const __m128i raw_desc2 =
- _mm_load_si128((void *)(rxdp + 2));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2));
 rte_compiler_barrier();
 const __m128i raw_desc1 =
- _mm_load_si128((void *)(rxdp + 1));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1));
 rte_compiler_barrier();
 const __m128i raw_desc0 =
- _mm_load_si128((void *)(rxdp + 0));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0));
 raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4);
 raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1);
@@ -600,7 +596,7 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 rxq->mbuf_initializer);
 struct rte_mbuf **sw_ring = &rxq->sw_ring[rxq->rx_tail];
 volatile union iavf_rx_flex_desc *rxdp =
- (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
+ (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail;
 rte_prefetch0(rxdp);
@@ -734,28 +730,28 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 __m512i raw_desc0_3, raw_desc4_7;
 const __m128i raw_desc7 =
- _mm_load_si128((void *)(rxdp + 7));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7));
 rte_compiler_barrier();
 const __m128i raw_desc6 =
- _mm_load_si128((void *)(rxdp + 6));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6));
 rte_compiler_barrier();
 const __m128i raw_desc5 =
- _mm_load_si128((void *)(rxdp + 5));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5));
 rte_compiler_barrier();
 const __m128i raw_desc4 =
- _mm_load_si128((void *)(rxdp + 4));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4));
 rte_compiler_barrier();
 const __m128i raw_desc3 =
- _mm_load_si128((void *)(rxdp + 3));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3));
 rte_compiler_barrier();
 const __m128i raw_desc2 =
- _mm_load_si128((void *)(rxdp + 2));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2));
 rte_compiler_barrier();
 const __m128i raw_desc1 =
- _mm_load_si128((void *)(rxdp + 1));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1));
 rte_compiler_barrier();
 const __m128i raw_desc0 =
- _mm_load_si128((void *)(rxdp + 0));
+ _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0));
 raw_desc4_7 = _mm512_broadcast_i32x4(raw_desc4);
 raw_desc4_7 = _mm512_inserti32x4(raw_desc4_7, raw_desc5, 1);
@@ -1113,35 +1109,35 @@ _iavf_recv_raw_pkts_vec_avx512_flex_rxd(struct iavf_rx_queue *rxq,
 /* load bottom half of every 32B desc */
 const __m128i raw_desc_bh7 =
 _mm_load_si128
- ((void *)(&rxdp[7].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh6 =
 _mm_load_si128
- ((void *)(&rxdp[6].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh5 =
 _mm_load_si128
- ((void *)(&rxdp[5].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh4 =
 _mm_load_si128
- ((void *)(&rxdp[4].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh3 =
 _mm_load_si128
- ((void *)(&rxdp[3].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh2 =
 _mm_load_si128
- ((void *)(&rxdp[2].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh1 =
 _mm_load_si128
- ((void *)(&rxdp[1].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1));
 rte_compiler_barrier();
 const __m128i raw_desc_bh0 =
 _mm_load_si128
- ((void *)(&rxdp[0].wb.status_error1));
+ (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1));
 __m256i raw_desc_bh6_7 =
 _mm256_inserti128_si256
@@ -1983,7 +1979,7 @@ iavf_vtx1(volatile struct iavf_tx_desc *txdp,
 __m128i descriptor = _mm_set_epi64x(high_qw,
 pkt->buf_iova + pkt->data_off);
- _mm_storeu_si128((__m128i *)txdp, descriptor);
+ _mm_storeu_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor);
 }
 #define IAVF_TX_LEN_MASK 0xAA
@@ -2037,7 +2033,7 @@ iavf_vtx(volatile struct iavf_tx_desc *txdp,
 pkt[1]->buf_iova + pkt[1]->data_off,
 hi_qw0,
 pkt[0]->buf_iova + pkt[0]->data_off);
- _mm512_storeu_si512((void *)txdp, desc0_3);
+ _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3);
 }
 /* do any last ones */
@@ -2225,7 +2221,7 @@ ctx_vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt,
 __m256i ctx_data_desc = _mm256_set_epi64x(high_data_qw, pkt->buf_iova + pkt->data_off,
 high_ctx_qw, low_ctx_qw);
- _mm256_storeu_si256((__m256i *)txdp, ctx_data_desc);
+ _mm256_storeu_si256(RTE_CAST_PTR(__m256i *, txdp), ctx_data_desc);
 }
 static __rte_always_inline void
@@ -2300,7 +2296,7 @@ ctx_vtx(volatile struct iavf_tx_desc *txdp,
 hi_ctx_qw1, low_ctx_qw1,
 hi_data_qw0, pkt[0]->buf_iova + pkt[0]->data_off,
 hi_ctx_qw0, low_ctx_qw0);
- _mm512_storeu_si512((void *)txdp, desc0_3);
+ _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3);
 }
 if (nb_pkts)
diff --git a/drivers/net/iavf/iavf_rxtx_vec_common.h b/drivers/net/iavf/iavf_rxtx_vec_common.h
index 5c5220048d..18513a6d7f 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_common.h
+++ b/drivers/net/iavf/iavf_rxtx_vec_common.h
@@ -11,10 +11,6 @@
 #include "iavf.h"
 #include "iavf_rxtx.h"
-#ifndef __INTEL_COMPILER
-#pragma GCC diagnostic ignored "-Wcast-qual"
-#endif
-
 static __rte_always_inline uint16_t
 reassemble_packets(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_bufs,
 uint16_t nb_bufs, uint8_t *split_flags)
@@ -422,7 +418,7 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512)
 dma_addr0 = _mm_setzero_si128();
 for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
 rxp[i] = &rxq->fake_mbuf;
- _mm_store_si128((__m128i *)&rxdp[i].read,
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read),
 dma_addr0);
 }
 }
@@ -458,8 +454,8 @@ iavf_rxq_rearm_common(struct iavf_rx_queue *rxq, __rte_unused bool avx512)
 dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room);
 /* flush desc with pa dma_addr */
- _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0);
- _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0);
+ _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1);
 }
 #else
 #ifdef CC_AVX512_SUPPORT
diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c
index 04be574683..f1b82c7d56 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_neon.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c
@@ -36,7 +36,7 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 rxq->nb_rx_desc) {
 for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) {
 rxep[i] = &rxq->fake_mbuf;
- vst1q_u64((uint64_t *)&rxdp[i].read, zero);
+ vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i].read), zero);
 }
 }
 rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed +=
@@ -53,11 +53,11 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq)
 dma_addr0 = vdupq_n_u64(paddr);
 /* flush desc with pa dma_addr */
- vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0);
+ vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr0);
 paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM;
 dma_addr1 = vdupq_n_u64(paddr);
- vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1);
+ vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr1);
 }
 rxq->rxrearm_start += IAVF_RXQ_REARM_THRESH;
@@ -269,18 +269,18 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq,
 int32x4_t len_shl = {0, 0, 0, PKTLEN_SHIFT};
 /* A.1 load desc[3-0] */
- descs[3] = vld1q_u64((uint64_t *)(rxdp + 3));
- descs[2] = vld1q_u64((uint64_t *)(rxdp + 2));
- descs[1] = vld1q_u64((uint64_t *)(rxdp + 1));
- descs[0] = vld1q_u64((uint64_t *)(rxdp));
+ descs[3] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3));
+ descs[2] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2));
+ descs[1] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1));
+ descs[0] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp));
 /* Use acquire fence to order loads of descriptor qwords */
 rte_atomic_thread_fence(rte_memory_order_acquire);
 /* A.2 reload qword0 to make
it ordered after qword1 load */ - descs[3] = vld1q_lane_u64((uint64_t *)(rxdp + 3), descs[3], 0); - descs[2] = vld1q_lane_u64((uint64_t *)(rxdp + 2), descs[2], 0); - descs[1] = vld1q_lane_u64((uint64_t *)(rxdp + 1), descs[1], 0); - descs[0] = vld1q_lane_u64((uint64_t *)(rxdp), descs[0], 0); + descs[3] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3), descs[3], 0); + descs[2] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2), descs[2], 0); + descs[1] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1), descs[1], 0); + descs[0] = vld1q_lane_u64(RTE_CAST_PTR(uint64_t *, rxdp), descs[0], 0); /* B.1 load 4 mbuf point */ mbp1 = vld1q_u64((uint64_t *)&sw_ring[pos]); diff --git a/drivers/net/iavf/iavf_rxtx_vec_sse.c b/drivers/net/iavf/iavf_rxtx_vec_sse.c index 0db6fa8bd4..490604bec8 100644 --- a/drivers/net/iavf/iavf_rxtx_vec_sse.c +++ b/drivers/net/iavf/iavf_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void iavf_rxq_rearm(struct iavf_rx_queue *rxq) { @@ -38,7 +34,7 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < IAVF_VPMD_DESCS_PER_LOOP; i++) { rxp[i] = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -69,8 +65,8 @@ iavf_rxq_rearm(struct iavf_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += rxq->rx_free_thresh; @@ -578,7 +574,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* 
A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -590,11 +586,11 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -783,7 +779,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, /* Just the act of getting into the function from the application is * going to cost about 7 cycles */ - rxdp = (union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; + rxdp = (volatile union iavf_rx_flex_desc *)rxq->rx_ring + rxq->rx_tail; rte_prefetch0(rxdp); @@ -864,7 +860,7 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -876,11 +872,11 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - 
descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -928,16 +924,16 @@ _recv_raw_pkts_vec_flex_rxd(struct iavf_rx_queue *rxq, rxq->rx_flags & IAVF_RX_FLAGS_VLAN_TAG_LOC_L2TAG2_2) { /* load bottom half of every 32B desc */ descs_bh[3] = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); descs_bh[2] = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); descs_bh[1] = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); descs_bh[0] = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); } if (offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { @@ -1349,7 +1345,7 @@ vtx1(volatile struct iavf_tx_desc *txdp, struct rte_mbuf *pkt, uint64_t flags) __m128i descriptor = _mm_set_epi64x(high_qw, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void diff --git a/drivers/net/ice/ice_rxtx_common_avx.h b/drivers/net/ice/ice_rxtx_common_avx.h index dacb87dcb0..c62e60c70e 100644 --- a/drivers/net/ice/ice_rxtx_common_avx.h +++ b/drivers/net/ice/ice_rxtx_common_avx.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #ifdef __AVX2__ static __rte_always_inline void ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) @@ -33,7 +29,7 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i 
*)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -77,8 +73,8 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } #else #ifdef __AVX512VL__ @@ -157,8 +153,8 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr4_7 = _mm512_add_epi64(dma_addr4_7, hdr_room); /* flush desc with pa dma_addr */ - _mm512_store_si512((__m512i *)&rxdp->read, dma_addr0_3); - _mm512_store_si512((__m512i *)&(rxdp + 4)->read, dma_addr4_7); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &rxdp->read), dma_addr0_3); + _mm512_store_si512(RTE_CAST_PTR(__m512i *, &(rxdp + 4)->read), dma_addr4_7); } } else #endif /* __AVX512VL__ */ @@ -213,8 +209,8 @@ ice_rxq_rearm_common(struct ice_rx_queue *rxq, __rte_unused bool avx512) dma_addr2_3 = _mm256_add_epi64(dma_addr2_3, hdr_room); /* flush desc with pa dma_addr */ - _mm256_store_si256((__m256i *)&rxdp->read, dma_addr0_1); - _mm256_store_si256((__m256i *)&(rxdp + 2)->read, dma_addr2_3); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, &rxdp->read), dma_addr0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, &(rxdp + 2)->read), dma_addr2_3); } } diff --git a/drivers/net/ice/ice_rxtx_vec_avx2.c b/drivers/net/ice/ice_rxtx_vec_avx2.c index d6e88dbb29..b7c67d6396 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx2.c +++ b/drivers/net/ice/ice_rxtx_vec_avx2.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static __rte_always_inline void ice_rxq_rearm(struct ice_rx_queue *rxq) { @@ -254,21 +250,29 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf 
**rx_pkts, _mm256_loadu_si256((void *)&sw_ring[i + 4])); #endif - const __m128i raw_desc7 = _mm_load_si128((void *)(rxdp + 7)); + const __m128i raw_desc7 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); - const __m128i raw_desc6 = _mm_load_si128((void *)(rxdp + 6)); + const __m128i raw_desc6 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); - const __m128i raw_desc5 = _mm_load_si128((void *)(rxdp + 5)); + const __m128i raw_desc5 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); - const __m128i raw_desc4 = _mm_load_si128((void *)(rxdp + 4)); + const __m128i raw_desc4 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); - const __m128i raw_desc3 = _mm_load_si128((void *)(rxdp + 3)); + const __m128i raw_desc3 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); - const __m128i raw_desc2 = _mm_load_si128((void *)(rxdp + 2)); + const __m128i raw_desc2 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - const __m128i raw_desc1 = _mm_load_si128((void *)(rxdp + 1)); + const __m128i raw_desc1 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - const __m128i raw_desc0 = _mm_load_si128((void *)(rxdp + 0)); + const __m128i raw_desc0 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, rxdp + 0)); const __m256i raw_desc6_7 = _mm256_inserti128_si256(_mm256_castsi128_si256(raw_desc6), raw_desc7, 1); @@ -444,37 +448,29 @@ _ice_recv_raw_pkts_vec_avx2(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ - const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + const __m128i raw_desc_bh7 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1)); 
rte_compiler_barrier(); - const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + const __m128i raw_desc_bh6 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + const __m128i raw_desc_bh5 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + const __m128i raw_desc_bh4 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + const __m128i raw_desc_bh3 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + const __m128i raw_desc_bh2 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + const __m128i raw_desc_bh1 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + const __m128i raw_desc_bh0 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); __m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -790,7 +786,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -841,8 +837,8 @@ ice_vtx(volatile struct ice_tx_desc *txdp, _mm256_set_epi64x 
(hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); - _mm256_store_si256((void *)(txdp + 2), desc2_3); - _mm256_store_si256((void *)txdp, desc0_1); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp + 2), desc2_3); + _mm256_store_si256(RTE_CAST_PTR(__m256i *, txdp), desc0_1); } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_avx512.c b/drivers/net/ice/ice_rxtx_vec_avx512.c index add095ef06..a770710ea0 100644 --- a/drivers/net/ice/ice_rxtx_vec_avx512.c +++ b/drivers/net/ice/ice_rxtx_vec_avx512.c @@ -7,10 +7,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define ICE_DESCS_PER_LOOP_AVX 8 static __rte_always_inline void @@ -244,28 +240,28 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, /* load in descriptors, in reverse order */ const __m128i raw_desc7 = - _mm_load_si128((void *)(rxdp + 7)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 7)); rte_compiler_barrier(); const __m128i raw_desc6 = - _mm_load_si128((void *)(rxdp + 6)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 6)); rte_compiler_barrier(); const __m128i raw_desc5 = - _mm_load_si128((void *)(rxdp + 5)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 5)); rte_compiler_barrier(); const __m128i raw_desc4 = - _mm_load_si128((void *)(rxdp + 4)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 4)); rte_compiler_barrier(); const __m128i raw_desc3 = - _mm_load_si128((void *)(rxdp + 3)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); const __m128i raw_desc2 = - _mm_load_si128((void *)(rxdp + 2)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); const __m128i raw_desc1 = - _mm_load_si128((void *)(rxdp + 1)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); const __m128i raw_desc0 = - _mm_load_si128((void *)(rxdp + 0)); + _mm_load_si128(RTE_CAST_PTR(const __m128i *, rxdp + 0)); 
raw_desc6_7 = _mm256_inserti128_si256 @@ -474,37 +470,29 @@ _ice_recv_raw_pkts_vec_avx512(struct ice_rx_queue *rxq, if (rxq->vsi->adapter->pf.dev_data->dev_conf.rxmode.offloads & RTE_ETH_RX_OFFLOAD_RSS_HASH) { /* load bottom half of every 32B desc */ - const __m128i raw_desc_bh7 = - _mm_load_si128 - ((void *)(&rxdp[7].wb.status_error1)); + const __m128i raw_desc_bh7 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[7].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh6 = - _mm_load_si128 - ((void *)(&rxdp[6].wb.status_error1)); + const __m128i raw_desc_bh6 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[6].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh5 = - _mm_load_si128 - ((void *)(&rxdp[5].wb.status_error1)); + const __m128i raw_desc_bh5 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[5].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh4 = - _mm_load_si128 - ((void *)(&rxdp[4].wb.status_error1)); + const __m128i raw_desc_bh4 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[4].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh3 = - _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + const __m128i raw_desc_bh3 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh2 = - _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + const __m128i raw_desc_bh2 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh1 = - _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + const __m128i raw_desc_bh1 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); - const __m128i raw_desc_bh0 = - _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + const __m128i raw_desc_bh0 = _mm_load_si128 + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); 
__m256i raw_desc_bh6_7 = _mm256_inserti128_si256 @@ -987,7 +975,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, ice_txd_enable_offload(pkt, &high_qw); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static __rte_always_inline void @@ -1029,7 +1017,7 @@ ice_vtx(volatile struct ice_tx_desc *txdp, struct rte_mbuf **pkt, hi_qw2, rte_pktmbuf_iova(pkt[2]), hi_qw1, rte_pktmbuf_iova(pkt[1]), hi_qw0, rte_pktmbuf_iova(pkt[0])); - _mm512_storeu_si512((void *)txdp, desc0_3); + _mm512_storeu_si512(RTE_CAST_PTR(void *, txdp), desc0_3); } /* do any last ones */ diff --git a/drivers/net/ice/ice_rxtx_vec_common.h b/drivers/net/ice/ice_rxtx_vec_common.h index 4b73465af5..45147decff 100644 --- a/drivers/net/ice/ice_rxtx_vec_common.h +++ b/drivers/net/ice/ice_rxtx_vec_common.h @@ -7,10 +7,6 @@ #include "ice_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline uint16_t ice_rx_reassemble_packets(struct ice_rx_queue *rxq, struct rte_mbuf **rx_bufs, uint16_t nb_bufs, uint8_t *split_flags) diff --git a/drivers/net/ice/ice_rxtx_vec_sse.c b/drivers/net/ice/ice_rxtx_vec_sse.c index c01d8ede29..42eaea7326 100644 --- a/drivers/net/ice/ice_rxtx_vec_sse.c +++ b/drivers/net/ice/ice_rxtx_vec_sse.c @@ -6,10 +6,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline __m128i ice_flex_rxd_to_fdir_flags_vec(const __m128i fdir_id0_3) { @@ -52,7 +48,7 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < ICE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -91,8 +87,8 @@ ice_rxq_rearm(struct ice_rx_queue *rxq) dma_addr1 = _mm_add_epi64(dma_addr1, hdr_room); /* flush desc with pa dma_addr */ - 
_mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += ICE_RXQ_REARM_THRESH; @@ -425,7 +421,7 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp1 = _mm_loadu_si128((__m128i *)&sw_ring[pos]); /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -437,11 +433,11 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -491,19 +487,19 @@ _ice_recv_raw_pkts_vec(struct ice_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* load bottom half of every 32B desc */ const __m128i raw_desc_bh3 = _mm_load_si128 - ((void *)(&rxdp[3].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[3].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh2 = _mm_load_si128 - ((void *)(&rxdp[2].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[2].wb.status_error1)); rte_compiler_barrier(); const __m128i raw_desc_bh1 = _mm_load_si128 - ((void *)(&rxdp[1].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[1].wb.status_error1)); rte_compiler_barrier(); 
const __m128i raw_desc_bh0 = _mm_load_si128 - ((void *)(&rxdp[0].wb.status_error1)); + (RTE_CAST_PTR(const __m128i *, &rxdp[0].wb.status_error1)); /** * to shift the 32b RSS hash value to the @@ -680,7 +676,7 @@ ice_vtx1(volatile struct ice_tx_desc *txdp, struct rte_mbuf *pkt, ((uint64_t)pkt->data_len << ICE_TXD_QW1_TX_BUF_SZ_S)); __m128i descriptor = _mm_set_epi64x(high_qw, rte_pktmbuf_iova(pkt)); - _mm_store_si128((__m128i *)txdp, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, txdp), descriptor); } static inline void diff --git a/drivers/net/idpf/idpf_rxtx_vec_common.h b/drivers/net/idpf/idpf_rxtx_vec_common.h index 2787d27616..002c1e6948 100644 --- a/drivers/net/idpf/idpf_rxtx_vec_common.h +++ b/drivers/net/idpf/idpf_rxtx_vec_common.h @@ -11,10 +11,6 @@ #include "idpf_ethdev.h" #include "idpf_rxtx.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - #define IDPF_SCALAR_PATH 0 #define IDPF_VECTOR_PATH 1 #define IDPF_RX_NO_VECTOR_FLAGS ( \ diff --git a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c index d451562269..92a89f8def 100644 --- a/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c +++ b/drivers/net/ixgbe/ixgbe_recycle_mbufs_vec_common.c @@ -8,8 +8,6 @@ #include "ixgbe_ethdev.h" #include "ixgbe_rxtx.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - void ixgbe_recycle_rx_descriptors_refill_vec(void *rx_queue, uint16_t nb_mbufs) { diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c index 952b032eb6..fa5702588c 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_neon.c @@ -11,8 +11,6 @@ #include "ixgbe_rxtx.h" #include "ixgbe_rxtx_vec_common.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -36,7 +34,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) 
{ rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)&rxdp[i].read, + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i].read), zero); } } @@ -60,12 +58,12 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)&rxdp++->read, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -367,10 +365,10 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, mbp2 = vld1q_u64((uint64_t *)&sw_ring[pos + 2]); /* A. load 4 pkts descs */ - descs[0] = vld1q_u64((uint64_t *)(rxdp)); - descs[1] = vld1q_u64((uint64_t *)(rxdp + 1)); - descs[2] = vld1q_u64((uint64_t *)(rxdp + 2)); - descs[3] = vld1q_u64((uint64_t *)(rxdp + 3)); + descs[0] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp)); + descs[1] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 1)); + descs[2] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 2)); + descs[3] = vld1q_u64(RTE_CAST_PTR(uint64_t *, rxdp + 3)); /* B.2 copy 2 mbuf point into rx_pkts */ vst1q_u64((uint64_t *)&rx_pkts[pos + 2], mbp2); @@ -554,7 +552,7 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, pkt->buf_iova + pkt->data_off, (uint64_t)pkt->pkt_len << 46 | flags | pkt->data_len}; - vst1q_u64((uint64_t *)&txdp->read, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &txdp->read), descriptor); } static inline void diff --git a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c index a77370cdb7..5c1dcb568f 100644 --- a/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c +++ b/drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c @@ -12,10 +12,6 @@ #include <rte_vect.h> -#ifndef __INTEL_COMPILER 
-#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - static inline void ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) { @@ -41,7 +37,7 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr0 = _mm_setzero_si128(); for (i = 0; i < RTE_IXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - _mm_store_si128((__m128i *)&rxdp[i].read, + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp[i].read), dma_addr0); } } @@ -76,8 +72,8 @@ ixgbe_rxq_rearm(struct ixgbe_rx_queue *rxq) dma_addr1 = _mm_and_si128(dma_addr1, hba_msk); /* flush desc with pa dma_addr */ - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr0); - _mm_store_si128((__m128i *)&rxdp++->read, dma_addr1); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr0); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &rxdp++->read), dma_addr1); } rxq->rxrearm_start += RTE_IXGBE_RXQ_REARM_THRESH; @@ -466,7 +462,7 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, /* Read desc statuses backwards to avoid race condition */ /* A.1 load desc[3] */ - descs[3] = _mm_loadu_si128((__m128i *)(rxdp + 3)); + descs[3] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 3)); rte_compiler_barrier(); /* B.2 copy 2 64 bit or 4 32 bit mbuf point into rx_pkts */ @@ -478,11 +474,11 @@ _recv_raw_pkts_vec(struct ixgbe_rx_queue *rxq, struct rte_mbuf **rx_pkts, #endif /* A.1 load desc[2-0] */ - descs[2] = _mm_loadu_si128((__m128i *)(rxdp + 2)); + descs[2] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 2)); rte_compiler_barrier(); - descs[1] = _mm_loadu_si128((__m128i *)(rxdp + 1)); + descs[1] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp + 1)); rte_compiler_barrier(); - descs[0] = _mm_loadu_si128((__m128i *)(rxdp)); + descs[0] = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, rxdp)); #if defined(RTE_ARCH_X86_64) /* B.2 copy 2 mbuf point into rx_pkts */ @@ -676,7 +672,7 @@ vtx1(volatile union ixgbe_adv_tx_desc *txdp, __m128i descriptor = _mm_set_epi64x((uint64_t)pkt->pkt_len << 46 | flags | 
pkt->data_len, pkt->buf_iova + pkt->data_off); - _mm_store_si128((__m128i *)&txdp->read, descriptor); + _mm_store_si128(RTE_CAST_PTR(__m128i *, &txdp->read), descriptor); } static inline void diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c index 16ddd05448..bd13a243d5 100644 --- a/drivers/net/mlx5/mlx5_flow.c +++ b/drivers/net/mlx5/mlx5_flow.c @@ -7287,10 +7287,7 @@ flow_tunnel_from_rule(const struct mlx5_flow *flow) { struct mlx5_flow_tunnel *tunnel; -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" - tunnel = (typeof(tunnel))flow->tunnel; -#pragma GCC diagnostic pop + tunnel = RTE_PTR_UNQUAL(flow->tunnel); return tunnel; } diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h index 240987d03d..b37483bcca 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_altivec.h @@ -25,11 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#pragma GCC diagnostic ignored "-Wstrict-aliasing" -#endif - /** * Store free buffers to RX SW ring. * diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h index dc1d30753d..bb90625040 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h @@ -25,8 +25,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#pragma GCC diagnostic ignored "-Wcast-qual" - /** * Store free buffers to RX SW ring. * @@ -75,7 +73,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { volatile struct mlx5_mini_cqe8 *mcq = - (void *)&(cq + !rxq->cqe_comp_layout)->pkt_info; + (volatile struct mlx5_mini_cqe8 *)&(cq + !rxq->cqe_comp_layout)->pkt_info; /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? 
&rxq->title_pkt : elts[0]; unsigned int pos; @@ -139,9 +137,9 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, */ cycle: if (rxq->cqe_comp_layout) - rte_prefetch0((void *)(cq + mcqe_n)); + rte_prefetch0((volatile void *)(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { - uint8_t *p = (void *)&mcq[pos % 8]; + uint8_t *p = RTE_CAST_PTR(uint8_t *, &mcq[pos % 8]); uint8_t *e0 = (void *)&elts[pos]->rearm_data; uint8_t *e1 = (void *)&elts[pos + 1]->rearm_data; uint8_t *e2 = (void *)&elts[pos + 2]->rearm_data; @@ -157,7 +155,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((volatile void *)(cq + pos + i)); __asm__ volatile ( /* A.1 load mCQEs into a 128bit register. */ "ld1 {v16.16b - v17.16b}, [%[mcq]] \n\t" @@ -367,8 +365,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)&(cq + pos)->pkt_info; + rte_prefetch0((volatile void *)(cq + pos + 8)); + mcq = (volatile struct mlx5_mini_cqe8 *)&(cq + pos)->pkt_info; for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -383,7 +381,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile struct mlx5_mini_cqe8 *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -663,7 +661,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, mask = vcreate_u16(pkts_n - pos < MLX5_VPMD_DESCS_PER_LOOP ? 
-1UL >> ((pkts_n - pos) * sizeof(uint16_t) * 8) : 0); - p0 = (void *)&cq[pos].pkt_info; + p0 = RTE_PTR_UNQUAL(&cq[pos].pkt_info); p1 = p0 + (pkts_n - pos > 1) * sizeof(struct mlx5_cqe); p2 = p1 + (pkts_n - pos > 2) * sizeof(struct mlx5_cqe); p3 = p2 + (pkts_n - pos > 3) * sizeof(struct mlx5_cqe); diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h index 81a177fce7..574df5c407 100644 --- a/drivers/net/mlx5/mlx5_rxtx_vec_sse.h +++ b/drivers/net/mlx5/mlx5_rxtx_vec_sse.h @@ -24,10 +24,6 @@ #include "mlx5_rxtx_vec.h" #include "mlx5_autoconf.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - /** * Store free buffers to RX SW ring. * @@ -75,7 +71,8 @@ static inline uint16_t rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, struct rte_mbuf **elts, bool keep) { - volatile struct mlx5_mini_cqe8 *mcq = (void *)(cq + !rxq->cqe_comp_layout); + volatile struct mlx5_mini_cqe8 *mcq = + (volatile struct mlx5_mini_cqe8 *)(cq + !rxq->cqe_comp_layout); /* Title packet is pre-built. */ struct rte_mbuf *t_pkt = rxq->cqe_comp_layout ? &rxq->title_pkt : elts[0]; unsigned int pos; @@ -130,7 +127,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, */ cycle: if (rxq->cqe_comp_layout) - rte_prefetch0((void *)(cq + mcqe_n)); + rte_prefetch0((volatile void *)(cq + mcqe_n)); for (pos = 0; pos < mcqe_n; ) { __m128i mcqe1, mcqe2; __m128i rxdf1, rxdf2; @@ -141,10 +138,10 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) for (i = 0; i < MLX5_VPMD_DESCS_PER_LOOP; ++i) if (likely(pos + i < mcqe_n)) - rte_prefetch0((void *)(cq + pos + i)); + rte_prefetch0((volatile void *)(cq + pos + i)); /* A.1 load mCQEs into a 128bit register. 
*/ - mcqe1 = _mm_loadu_si128((__m128i *)&mcq[pos % 8]); - mcqe2 = _mm_loadu_si128((__m128i *)&mcq[pos % 8 + 2]); + mcqe1 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &mcq[pos % 8])); + mcqe2 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &mcq[pos % 8 + 2])); /* B.1 store rearm data to mbuf. */ _mm_storeu_si128((__m128i *)&elts[pos]->rearm_data, rearm); _mm_storeu_si128((__m128i *)&elts[pos + 1]->rearm_data, rearm); @@ -355,8 +352,8 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, if (!rxq->cqe_comp_layout) { if (!(pos & 0x7) && pos < mcqe_n) { if (pos + 8 < mcqe_n) - rte_prefetch0((void *)(cq + pos + 8)); - mcq = (void *)(cq + pos); + rte_prefetch0((volatile void *)(cq + pos + 8)); + mcq = (volatile struct mlx5_mini_cqe8 *)(cq + pos); for (i = 0; i < 8; ++i) cq[inv++].op_own = MLX5_CQE_INVALIDATE; } @@ -371,7 +368,7 @@ rxq_cq_decompress_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, MLX5_CQE_FORMAT(cq->op_own) == MLX5_COMPRESSED) { pos = 0; elts = &elts[mcqe_n]; - mcq = (void *)cq; + mcq = (volatile struct mlx5_mini_cqe8 *)cq; mcqe_n = MLX5_CQE_NUM_MINIS(cq->op_own) + 1; pkts_n += mcqe_n; goto cycle; @@ -651,38 +648,38 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, p = _mm_andnot_si128(mask, p); /* A.1 load cqes. */ p3 = _mm_extract_epi16(p, 3); - cqes[3] = _mm_loadl_epi64((__m128i *) - &cq[pos + p3].sop_drop_qpn); + cqes[3] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos + p3].sop_drop_qpn)); rte_compiler_barrier(); p2 = _mm_extract_epi16(p, 2); - cqes[2] = _mm_loadl_epi64((__m128i *) - &cq[pos + p2].sop_drop_qpn); + cqes[2] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos + p2].sop_drop_qpn)); rte_compiler_barrier(); /* B.1 load mbuf pointers. */ mbp1 = _mm_loadu_si128((__m128i *)&elts[pos]); mbp2 = _mm_loadu_si128((__m128i *)&elts[pos + 2]); /* A.1 load a block having op_own. 
*/ p1 = _mm_extract_epi16(p, 1); - cqes[1] = _mm_loadl_epi64((__m128i *) - &cq[pos + p1].sop_drop_qpn); + cqes[1] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos + p1].sop_drop_qpn)); rte_compiler_barrier(); - cqes[0] = _mm_loadl_epi64((__m128i *) - &cq[pos].sop_drop_qpn); + cqes[0] = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, + &cq[pos].sop_drop_qpn)); /* B.2 copy mbuf pointers. */ _mm_storeu_si128((__m128i *)&pkts[pos], mbp1); _mm_storeu_si128((__m128i *)&pkts[pos + 2], mbp2); rte_io_rmb(); /* C.1 load remained CQE data and extract necessary fields. */ - cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p3]); - cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos + p2]); + cqe_tmp2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p3])); + cqe_tmp1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p2])); cqes[3] = _mm_blendv_epi8(cqes[3], cqe_tmp2, blend_mask); cqes[2] = _mm_blendv_epi8(cqes[2], cqe_tmp1, blend_mask); - cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p3].csum); - cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos + p2].csum); + cqe_tmp2 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p3].csum)); + cqe_tmp1 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p2].csum)); cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x30); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x30); - cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p3].rsvd4[2]); - cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos + p2].rsvd4[2]); + cqe_tmp2 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos + p3].rsvd4[2])); + cqe_tmp1 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos + p2].rsvd4[2])); cqes[3] = _mm_blend_epi16(cqes[3], cqe_tmp2, 0x04); cqes[2] = _mm_blend_epi16(cqes[2], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. */ @@ -700,16 +697,16 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq, /* E.1 extract op_own field. 
*/ op_own_tmp2 = _mm_unpacklo_epi32(cqes[2], cqes[3]); /* C.1 load remained CQE data and extract necessary fields. */ - cqe_tmp2 = _mm_load_si128((__m128i *)&cq[pos + p1]); - cqe_tmp1 = _mm_load_si128((__m128i *)&cq[pos]); + cqe_tmp2 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p1])); + cqe_tmp1 = _mm_load_si128(RTE_CAST_PTR(const __m128i *, &cq[pos])); cqes[1] = _mm_blendv_epi8(cqes[1], cqe_tmp2, blend_mask); cqes[0] = _mm_blendv_epi8(cqes[0], cqe_tmp1, blend_mask); - cqe_tmp2 = _mm_loadu_si128((__m128i *)&cq[pos + p1].csum); - cqe_tmp1 = _mm_loadu_si128((__m128i *)&cq[pos].csum); + cqe_tmp2 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos + p1].csum)); + cqe_tmp1 = _mm_loadu_si128(RTE_CAST_PTR(const __m128i *, &cq[pos].csum)); cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x30); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x30); - cqe_tmp2 = _mm_loadl_epi64((__m128i *)&cq[pos + p1].rsvd4[2]); - cqe_tmp1 = _mm_loadl_epi64((__m128i *)&cq[pos].rsvd4[2]); + cqe_tmp2 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos + p1].rsvd4[2])); + cqe_tmp1 = _mm_loadl_epi64(RTE_CAST_PTR(const __m128i *, &cq[pos].rsvd4[2])); cqes[1] = _mm_blend_epi16(cqes[1], cqe_tmp2, 0x04); cqes[0] = _mm_blend_epi16(cqes[0], cqe_tmp1, 0x04); /* C.2 generate final structure for mbuf with swapping bytes. 
*/ diff --git a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c index 37075ea5e7..46391c9400 100644 --- a/drivers/net/ngbe/ngbe_rxtx_vec_neon.c +++ b/drivers/net/ngbe/ngbe_rxtx_vec_neon.c @@ -35,7 +35,7 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_NGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i]), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -58,12 +58,12 @@ ngbe_rxq_rearm(struct ngbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr1); } rxq->rxrearm_start += RTE_NGBE_RXQ_REARM_THRESH; @@ -484,7 +484,7 @@ vtx1(volatile struct ngbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; - vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor); } static inline void diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c index c0e44bb1a7..373b773e2d 100644 --- a/drivers/net/tap/tap_flow.c +++ b/drivers/net/tap/tap_flow.c @@ -23,10 +23,10 @@ #ifdef HAVE_BPF_RSS /* Workaround for warning in bpftool generated skeleton code */ -#pragma GCC diagnostic push -#pragma GCC diagnostic ignored "-Wcast-qual" +__rte_diagnostic_push +__rte_diagnostic_ignored_wcast_qual #include "tap_rss.skel.h" -#pragma GCC diagnostic pop +__rte_diagnostic_pop #endif #define ISOLATE_HANDLE 1 diff --git 
a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c index d4d647fab5..a56e2f4164 100644 --- a/drivers/net/txgbe/txgbe_rxtx_vec_neon.c +++ b/drivers/net/txgbe/txgbe_rxtx_vec_neon.c @@ -34,7 +34,7 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) rxq->nb_rx_desc) { for (i = 0; i < RTE_TXGBE_DESCS_PER_LOOP; i++) { rxep[i].mbuf = &rxq->fake_mbuf; - vst1q_u64((uint64_t *)(uintptr_t)&rxdp[i], zero); + vst1q_u64(RTE_CAST_PTR(uint64_t *, &rxdp[i]), zero); } } rte_eth_devices[rxq->port_id].data->rx_mbuf_alloc_failed += @@ -57,12 +57,12 @@ txgbe_rxq_rearm(struct txgbe_rx_queue *rxq) paddr = mb0->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr0 = vsetq_lane_u64(paddr, zero, 0); /* flush desc with pa dma_addr */ - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr0); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr0); vst1_u8((uint8_t *)&mb1->rearm_data, p); paddr = mb1->buf_iova + RTE_PKTMBUF_HEADROOM; dma_addr1 = vsetq_lane_u64(paddr, zero, 0); - vst1q_u64((uint64_t *)(uintptr_t)rxdp++, dma_addr1); + vst1q_u64(RTE_CAST_PTR(uint64_t *, rxdp++), dma_addr1); } rxq->rxrearm_start += RTE_TXGBE_RXQ_REARM_THRESH; @@ -484,7 +484,7 @@ vtx1(volatile struct txgbe_tx_desc *txdp, uint64x2_t descriptor = {pkt->buf_iova + pkt->data_off, (uint64_t)pkt_len << 45 | flags | pkt_len}; - vst1q_u64((uint64_t *)(uintptr_t)txdp, descriptor); + vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), descriptor); } static inline void diff --git a/drivers/net/virtio/virtio_rxtx_simple.c b/drivers/net/virtio/virtio_rxtx_simple.c index 438256970d..439e00a7e1 100644 --- a/drivers/net/virtio/virtio_rxtx_simple.c +++ b/drivers/net/virtio/virtio_rxtx_simple.c @@ -23,10 +23,6 @@ #include "virtio_rxtx_simple.h" -#ifndef __INTEL_COMPILER -#pragma GCC diagnostic ignored "-Wcast-qual" -#endif - int __rte_cold virtio_rxq_vec_setup(struct virtnet_rx *rxq) { -- 2.47.2.vfs.0.1 ^ permalink raw reply [flat|nested] 87+ messages in thread
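The hunks above replace open-coded casts and "GCC diagnostic" pragmas with RTE_CAST_PTR, RTE_PTR_UNQUAL and the __rte_diagnostic_* macros. The real definitions live in lib/eal/include/rte_common.h and are not quoted in this message, so what follows is only a minimal sketch of how helpers of this kind could be written. All SKETCH_* and sketch_* names are hypothetical. Routing the cast through uintptr_t is an assumption modelled on the "(uint64_t *)(uintptr_t)" casts the diff removes, and the empty fallback for compilers other than GCC and clang is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the casting helpers; NOT the DPDK definitions.
 * Going through uintptr_t drops const/volatile without tripping -Wcast-qual. */
#define SKETCH_PTR_UNQUAL(ptr) ((void *)(uintptr_t)(ptr))
#define SKETCH_CAST_PTR(type, ptr) ((type)(uintptr_t)(ptr))

/* Hypothetical diagnostic wrappers. GCC and clang both accept the
 * "GCC diagnostic" pragma spelling; other compilers get empty macros here
 * (an assumption, a real port may want compiler-specific pragmas). */
#if defined(__GNUC__) || defined(__clang__)
#define SKETCH_DIAGNOSTIC_PUSH _Pragma("GCC diagnostic push")
#define SKETCH_DIAGNOSTIC_POP _Pragma("GCC diagnostic pop")
#define SKETCH_DIAGNOSTIC_IGNORED_WCAST_QUAL \
	_Pragma("GCC diagnostic ignored \"-Wcast-qual\"")
#else
#define SKETCH_DIAGNOSTIC_PUSH
#define SKETCH_DIAGNOSTIC_POP
#define SKETCH_DIAGNOSTIC_IGNORED_WCAST_QUAL
#endif

/* Write a two-word descriptor through a plain pointer, similar in shape to
 * the vst1q_u64(RTE_CAST_PTR(uint64_t *, txdp), ...) call sites. */
static inline void
sketch_store_desc(volatile uint64_t *desc, uint64_t addr, uint64_t flags)
{
	uint64_t *raw = SKETCH_CAST_PTR(uint64_t *, desc);

	assert(raw != NULL);
	raw[0] = addr;
	raw[1] = flags;
}

/* Read one byte after stripping the qualifiers in a single step. */
static inline uint8_t
sketch_peek_byte(const volatile uint8_t *p)
{
	const uint8_t *q = SKETCH_PTR_UNQUAL(p);

	return q[0];
}

/* Code that keeps a qualifier-dropping cast can be fenced off locally,
 * instead of disabling the warning for the whole file. */
SKETCH_DIAGNOSTIC_PUSH
SKETCH_DIAGNOSTIC_IGNORED_WCAST_QUAL
static inline uint8_t *
sketch_legacy_cast(const uint8_t *p)
{
	return (uint8_t *)p;
}
SKETCH_DIAGNOSTIC_POP
```

With helpers like these, a file-wide pragma guarded by "#ifndef __INTEL_COMPILER" is no longer needed. Each cast states its intent at the call site, and any remaining suppression covers only the code between the push and the pop.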
end of thread, other threads:[~2025-01-21 22:36 UTC | newest]

Thread overview: 87+ messages
2024-12-27 1:33 [PATCH 0/3] add diagnostics macros to make code portable Andre Muezerie
2024-12-27 1:33 ` [PATCH 1/3] lib/eal: " Andre Muezerie
2024-12-27 1:33 ` [PATCH 2/3] drivers/common: " Andre Muezerie
2024-12-27 17:57 ` Stephen Hemminger
2024-12-27 19:43 ` Andre Muezerie
2024-12-27 1:33 ` [PATCH 3/3] drivers/net: " Andre Muezerie
2024-12-28 0:45 ` [PATCH v2 0/3] " Andre Muezerie
2024-12-28 0:45 ` [PATCH v2 1/3] lib/eal: " Andre Muezerie
2024-12-28 0:45 ` [PATCH v2 2/3] drivers/common: " Andre Muezerie
2024-12-28 0:45 ` [PATCH v2 3/3] drivers/net: " Andre Muezerie
2024-12-28 3:18 ` [PATCH v3 0/3] " Andre Muezerie
2024-12-28 3:18 ` [PATCH v3 1/3] lib/eal: " Andre Muezerie
2024-12-28 3:18 ` [PATCH v3 2/3] drivers/common: " Andre Muezerie
2024-12-28 3:18 ` [PATCH v3 3/3] drivers/net: " Andre Muezerie
2024-12-30 15:59 ` [PATCH v4 0/3] " Andre Muezerie
2024-12-30 15:59 ` [PATCH v4 1/3] lib/eal: " Andre Muezerie
2024-12-30 15:59 ` [PATCH v4 2/3] drivers/common: " Andre Muezerie
2024-12-30 15:59 ` [PATCH v4 3/3] drivers/net: " Andre Muezerie
2024-12-30 17:44 ` [PATCH v4 0/3] " Stephen Hemminger
2024-12-31 18:55 ` [PATCH v5 " Andre Muezerie
2024-12-31 18:55 ` [PATCH v5 1/3] lib/eal: " Andre Muezerie
2024-12-31 18:55 ` [PATCH v5 2/3] drivers/common: " Andre Muezerie
2024-12-31 18:55 ` [PATCH v5 3/3] drivers/net: " Andre Muezerie
2024-12-31 20:15 ` [PATCH v6 0/3] " Andre Muezerie
2024-12-31 20:15 ` [PATCH v6 1/3] lib/eal: " Andre Muezerie
2024-12-31 20:15 ` [PATCH v6 2/3] drivers/common: " Andre Muezerie
2024-12-31 20:15 ` [PATCH v6 3/3] drivers/net: " Andre Muezerie
2024-12-31 22:30 ` [PATCH v7 0/3] " Andre Muezerie
2024-12-31 22:30 ` [PATCH v7 1/3] lib/eal: " Andre Muezerie
2024-12-31 22:30 ` [PATCH v7 2/3] drivers/common: " Andre Muezerie
2024-12-31 22:30 ` [PATCH v7 3/3] drivers/net: " Andre Muezerie
2025-01-01 0:48 ` [PATCH v8 0/3] " Andre Muezerie
2025-01-01 0:48 ` [PATCH v8 1/3] lib/eal: " Andre Muezerie
2025-01-01 0:48 ` [PATCH v8 2/3] drivers/common: " Andre Muezerie
2025-01-01 0:48 ` [PATCH v8 3/3] drivers/net: " Andre Muezerie
2025-01-01 3:36 ` [PATCH v9 0/3] " Andre Muezerie
2025-01-01 3:36 ` [PATCH v9 1/3] lib/eal: " Andre Muezerie
2025-01-01 3:36 ` [PATCH v9 2/3] drivers/common: " Andre Muezerie
2025-01-01 3:36 ` [PATCH v9 3/3] drivers/net: " Andre Muezerie
2025-01-03 0:12 ` [PATCH v10 0/3] " Andre Muezerie
2025-01-03 0:12 ` [PATCH v10 1/3] lib/eal: " Andre Muezerie
2025-01-03 0:12 ` [PATCH v10 2/3] drivers/common: " Andre Muezerie
2025-01-03 0:12 ` [PATCH v10 3/3] drivers/net: " Andre Muezerie
2025-01-03 15:36 ` [PATCH v11 0/3] " Andre Muezerie
2025-01-03 15:36 ` [PATCH v11 1/3] lib/eal: " Andre Muezerie
2025-01-03 15:36 ` [PATCH v11 2/3] drivers/common: " Andre Muezerie
2025-01-03 15:36 ` [PATCH v11 3/3] drivers/net: " Andre Muezerie
2025-01-03 19:24 ` [PATCH v11 0/3] " Stephen Hemminger
2025-01-03 21:26 ` Andre Muezerie
2025-01-06 11:00 ` Bruce Richardson
2025-01-08 2:46 ` Andre Muezerie
2025-01-08 9:20 ` Bruce Richardson
2025-01-14 19:20 ` Andre Muezerie
2025-01-15 11:11 ` Bruce Richardson
2025-01-15 4:27 ` [PATCH v12 " Andre Muezerie
2025-01-15 4:27 ` [PATCH v12 1/3] lib/eal: " Andre Muezerie
2025-01-15 9:05 ` Morten Brørup
2025-01-15 4:27 ` [PATCH v12 2/3] drivers/common: " Andre Muezerie
2025-01-15 11:13 ` Bruce Richardson
2025-01-15 4:27 ` [PATCH v12 3/3] drivers/net: " Andre Muezerie
2025-01-16 1:55 ` [PATCH v13 0/3] " Andre Muezerie
2025-01-16 1:55 ` [PATCH v13 1/3] eal: " Andre Muezerie
2025-01-16 1:55 ` [PATCH v13 2/3] drivers/common: " Andre Muezerie
2025-01-16 1:55 ` [PATCH v13 3/3] drivers/net: " Andre Muezerie
2025-01-16 8:57 ` Bruce Richardson
2025-01-18 3:07 ` Andre Muezerie
2025-01-16 9:08 ` Morten Brørup
2025-01-17 3:56 ` Andre Muezerie
2025-01-18 3:05 ` Andre Muezerie
2025-01-16 8:58 ` [PATCH v13 0/3] " Bruce Richardson
2025-01-18 2:46 ` [PATCH v14 " Andre Muezerie
2025-01-18 2:46 ` [PATCH v14 1/3] eal: " Andre Muezerie
2025-01-18 2:46 ` [PATCH v14 2/3] drivers/common: " Andre Muezerie
2025-01-18 2:46 ` [PATCH v14 3/3] drivers/net: " Andre Muezerie
2025-01-18 21:55 ` [PATCH v15 0/3] " Andre Muezerie
2025-01-18 21:55 ` [PATCH v15 1/3] eal: " Andre Muezerie
2025-01-21 9:53 ` Morten Brørup
2025-01-21 14:28 ` Andre Muezerie
2025-01-21 14:41 ` Morten Brørup
2025-01-21 20:17 ` Andre Muezerie
2025-01-21 15:01 ` Stephen Hemminger
2025-01-18 21:55 ` [PATCH v15 2/3] drivers/common: " Andre Muezerie
2025-01-18 21:55 ` [PATCH v15 3/3] drivers/net: " Andre Muezerie
2025-01-21 22:36 ` [PATCH v16 0/3] " Andre Muezerie
2025-01-21 22:36 ` [PATCH v16 1/3] eal: " Andre Muezerie
2025-01-21 22:36 ` [PATCH v16 2/3] drivers/common: " Andre Muezerie
2025-01-21 22:36 ` [PATCH v16 3/3] drivers/net: " Andre Muezerie