+Arm folks.

From: Roger Melton (rmelton)
Date: Tuesday, December 3, 2024 at 3:39 AM
To: dev@dpdk.org, Ruifeng Wang
Subject: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM

Hey folks,

We are building DPDK with clang14 for a 32bit armv8-a based CPU and ran into a compile error with the following from lib/eal/arm/include/rte_vect.h:

    #if (defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \
    (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000))
    /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7-A(AArch32)
     * On AArch64, this intrinsic is supported since GCC version 7.
     */
    static inline uint32x4_t
    vcopyq_laneq_u32(uint32x4_t a, const int lane_a,
                     uint32x4_t b, const int lane_b)
    {
        return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a);
    }
    #endif

clang14 compile fails as follows:

    In file included from ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36:
    ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:24: error: argument to '__builtin_neon_vgetq_lane_i32' must be a constant integer
        return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a);
                              ^                 ~~~~~~
    /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:7697:22: note: expanded from macro 'vgetq_lane_u32'
        __ret = (uint32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \
                           ^                                              ~~~~
    /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24148:19: note: expanded from macro 'vsetq_lane_u32'
        uint32_t __s0 = __p0; \
                        ^~~~
    In file included from ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36:
    ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:9: error: argument to '__builtin_neon_vsetq_lane_i32' must be a constant integer
        return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a);
               ^                                            ~~~~~~
    /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24150:24: note: expanded from macro 'vsetq_lane_u32'
        __ret = (uint32x4_t) __builtin_neon_vsetq_lane_i32(__s0, (int32x4_t)__s1, __p2); \
                             ^                                                    ~~~~
    2 errors generated.

clang14 does appear to support the vcopyq_laneq_u32() intrinsic, so we want to skip the conditional implementation. Two approaches I have tested to resolve the error are:

1) skip if building with clang:

    #if !defined(__clang__) && ((defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \
    (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000)))

2) skip if not building for ARMv7:

    #if (defined(RTE_ARCH_ARMv7) && defined(RTE_ARCH_32)) || \
    (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000))

Both address our immediate problem, but may not be appropriate for all cases. Can anyone suggest the proper way to address this? I'll be submitting a patch once I have a solution that is acceptable to the community.

Regards,
Roger
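
P.S. To make the "may not be appropriate for all cases" concern concrete, here is a rough sketch that models the original guard and the two variants as plain C predicates, so they can be compared across toolchain/target combinations. The struct, its field names, and the function names are hypothetical stand-ins for the preprocessor tests quoted above, not DPDK code. It also assumes RTE_ARCH_ARMv7 is not defined in our armv8-a 32-bit build, which is what made approach 2 work for us.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a build configuration. Each field stands in for one
 * of the preprocessor tests quoted above; none of this is DPDK code.
 */
struct build_cfg {
	bool arch_arm;    /* defined(RTE_ARCH_ARM) */
	bool arch_armv7;  /* defined(RTE_ARCH_ARMv7) */
	bool arch_32;     /* defined(RTE_ARCH_32) */
	bool arch_arm64;  /* defined(RTE_ARCH_ARM64) */
	bool cc_is_gnu;   /* RTE_CC_IS_GNU */
	bool cc_is_clang; /* defined(__clang__) */
	int gcc_version;  /* GCC_VERSION */
};

/* Second half of every guard: AArch64 built with GCC older than 7. */
static inline bool old_gcc_arm64(const struct build_cfg *c)
{
	return c->arch_arm64 && c->cc_is_gnu && c->gcc_version < 70000;
}

/* Guard as it stands today: true means the fallback gets compiled. */
static inline bool guard_original(const struct build_cfg *c)
{
	return (c->arch_arm && c->arch_32) || old_gcc_arm64(c);
}

/* Approach 1: never compile the fallback when building with clang. */
static inline bool guard_skip_clang(const struct build_cfg *c)
{
	return !c->cc_is_clang && guard_original(c);
}

/* Approach 2: on 32-bit, compile the fallback only for ARMv7. */
static inline bool guard_armv7_only(const struct build_cfg *c)
{
	return (c->arch_armv7 && c->arch_32) || old_gcc_arm64(c);
}

/* Self-check for the reported setup: clang, 32-bit Arm, not ARMv7. */
static inline void check_reported_config(void)
{
	const struct build_cfg clang_v8_a32 = {
		.arch_arm = true, .arch_32 = true, .cc_is_clang = true,
	};

	assert(guard_original(&clang_v8_a32));    /* fallback compiled today */
	assert(!guard_skip_clang(&clang_v8_a32)); /* approach 1 skips it */
	assert(!guard_armv7_only(&clang_v8_a32)); /* approach 2 skips it */
}
```

The two variants agree on our configuration but diverge elsewhere: with clang targeting ARMv7, approach 1 drops the fallback while approach 2 keeps it; with GCC targeting 32-bit ARMv8, approach 1 keeps it while approach 2 drops it. Whether the compiler actually provides vcopyq_laneq_u32() in each of those cases is something I have not verified.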