* lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM @ 2024-12-02 19:39 Roger Melton (rmelton) 2024-12-03 1:25 ` Ruifeng Wang 0 siblings, 1 reply; 7+ messages in thread From: Roger Melton (rmelton) @ 2024-12-02 19:39 UTC (permalink / raw) To: dev, ruifeng.wang [-- Attachment #1: Type: text/plain, Size: 3020 bytes --] Hey folks, We are building DPDK with clang14 for a 32bit armv8-a based CPU and ran into a compile error with the following from lib/eal/arm/include/rte_vect.h: #if (defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU<https://elixir.bootlin.com/dpdk/v24.11/C/ident/RTE_CC_IS_GNU> && (GCC_VERSION<https://elixir.bootlin.com/dpdk/v24.11/C/ident/GCC_VERSION> < 70000)) /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7-A(AArch32) * On AArch64, this intrinsic is supported since GCC version 7. */ static inline uint32x4_t vcopyq_laneq_u32<https://elixir.bootlin.com/dpdk/v24.11/C/ident/vcopyq_laneq_u32>(uint32x4_t a, const int lane_a, uint32x4_t b, const int lane_b) { return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); } #endif clang14 compile fails as follows: In file included from ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:24: error: argument to '__builtin_neon_vgetq_lane_i32' must be a constant integer return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); ^ ~~~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:7697:22: note: expanded from macro 'vgetq_lane_u32' __ret = (uint32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \ ^ ~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24148:19: note: expanded from macro 'vsetq_lane_u32' uint32_t __s0 = __p0; \ ^~~~ In file included from ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:9: error: argument to '__builtin_neon_vsetq_lane_i32' must be a constant integer return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); ^ ~~~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24150:24: note: expanded from macro 'vsetq_lane_u32' __ret = (uint32x4_t) __builtin_neon_vsetq_lane_i32(__s0, (int32x4_t)__s1, __p2); \ ^ ~~~~ 2 errors generated. clang14 does appear to support the vcopyq_laneq_u32() intrinsic, s0 we want to skip the conditional implementation. Two approaches I have tested to resolve the error are: 1) skip if building with clang: #if !defined(__clang__) && ((defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ 72 (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000))) 2) skip if not building for ARMv7: #if (defined(RTE_ARCH_ARMv7) && defined(RTE_ARCH_32)) || \ (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000)) Both address our immediate problem, but may not be a appropriate for all cases. Can anyone suggest the proper way to address this? I'll be submitting an patch once I have a solution that is acceptable to the community. Regards, Roger [-- Attachment #2: Type: text/html, Size: 7285 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM 2024-12-02 19:39 lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM Roger Melton (rmelton) @ 2024-12-03 1:25 ` Ruifeng Wang 2024-12-04 0:37 ` Roger Melton (rmelton) 2024-12-04 15:38 ` Wathsala Wathawana Vithanage 0 siblings, 2 replies; 7+ messages in thread From: Ruifeng Wang @ 2024-12-03 1:25 UTC (permalink / raw) To: Roger Melton (rmelton), dev; +Cc: Wathsala Wathawana Vithanage, nd [-- Attachment #1: Type: text/plain, Size: 3313 bytes --] +Arm folks. From: Roger Melton (rmelton) <rmelton@cisco.com> Date: Tuesday, December 3, 2024 at 3:39 AM To: dev@dpdk.org <dev@dpdk.org>, Ruifeng Wang <Ruifeng.Wang@arm.com> Subject: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM Hey folks, We are building DPDK with clang14 for a 32bit armv8-a based CPU and ran into a compile error with the following from lib/eal/arm/include/rte_vect.h: #if (defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU<https://elixir.bootlin.com/dpdk/v24.11/C/ident/RTE_CC_IS_GNU> && (GCC_VERSION<https://elixir.bootlin.com/dpdk/v24.11/C/ident/GCC_VERSION> < 70000)) /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7-A(AArch32) * On AArch64, this intrinsic is supported since GCC version 7. */ static inline uint32x4_t vcopyq_laneq_u32<https://elixir.bootlin.com/dpdk/v24.11/C/ident/vcopyq_laneq_u32>(uint32x4_t a, const int lane_a, uint32x4_t b, const int lane_b) { return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); } #endif clang14 compile fails as follows: In file included from ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:24: error: argument to '__builtin_neon_vgetq_lane_i32' must be a constant integer return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); ^ ~~~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:7697:22: note: expanded from macro 'vgetq_lane_u32' __ret = (uint32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \ ^ ~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24148:19: note: expanded from macro 'vsetq_lane_u32' uint32_t __s0 = __p0; \ ^~~~ In file included from ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:9: error: argument to '__builtin_neon_vsetq_lane_i32' must be a constant integer return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); ^ ~~~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24150:24: note: expanded from macro 'vsetq_lane_u32' __ret = (uint32x4_t) __builtin_neon_vsetq_lane_i32(__s0, (int32x4_t)__s1, __p2); \ ^ ~~~~ 2 errors generated. clang14 does appear to support the vcopyq_laneq_u32() intrinsic, s0 we want to skip the conditional implementation. Two approaches I have tested to resolve the error are: 1) skip if building with clang: #if !defined(__clang__) && ((defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ 72 (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000))) 2) skip if not building for ARMv7: #if (defined(RTE_ARCH_ARMv7) && defined(RTE_ARCH_32)) || \ (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000)) Both address our immediate problem, but may not be a appropriate for all cases. Can anyone suggest the proper way to address this? I'll be submitting an patch once I have a solution that is acceptable to the community. Regards, Roger [-- Attachment #2: Type: text/html, Size: 10297 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM 2024-12-03 1:25 ` Ruifeng Wang @ 2024-12-04 0:37 ` Roger Melton (rmelton) 2024-12-04 17:24 ` Roger Melton (rmelton) 2024-12-04 15:38 ` Wathsala Wathawana Vithanage 1 sibling, 1 reply; 7+ messages in thread From: Roger Melton (rmelton) @ 2024-12-04 0:37 UTC (permalink / raw) To: Ruifeng Wang, dev; +Cc: Wathsala Wathawana Vithanage, nd [-- Attachment #1: Type: text/plain, Size: 4068 bytes --] After looking at this a bit closer today, I realize that my assertion that CLANG14 does support vcopyq_laneq_u32() for 32bit ARM was incorrect. It does not. The reason that disabling the implementation in rte_vect.h works for our clang builds is that we do not build the l3fwd app nor the ixgbe PMD for our application, and they are the only libraries that reference that function. The clang compile errors appear to be related to how clang handles compile time constants, but I'm am again unsure how to resolve them in a way that would work for both GNU and clang. Any suggestions? Regards, Roger On 12/2/24 8:26 PM, Ruifeng Wang wrote: +Arm folks. From: Roger Melton (rmelton) <rmelton@cisco.com><mailto:rmelton@cisco.com> Date: Tuesday, December 3, 2024 at 3:39 AM To: dev@dpdk.org<mailto:dev@dpdk.org> <dev@dpdk.org><mailto:dev@dpdk.org>, Ruifeng Wang <Ruifeng.Wang@arm.com><mailto:Ruifeng.Wang@arm.com> Subject: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM Hey folks, We are building DPDK with clang14 for a 32bit armv8-a based CPU and ran into a compile error with the following from lib/eal/arm/include/rte_vect.h: #if (defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU<https://elixir.bootlin.com/dpdk/v24.11/C/ident/RTE_CC_IS_GNU> && (GCC_VERSION<https://elixir.bootlin.com/dpdk/v24.11/C/ident/GCC_VERSION> < 70000)) /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7-A(AArch32) * On AArch64, this intrinsic is supported since GCC version 7. */ static inline uint32x4_t vcopyq_laneq_u32<https://elixir.bootlin.com/dpdk/v24.11/C/ident/vcopyq_laneq_u32>(uint32x4_t a, const int lane_a, uint32x4_t b, const int lane_b) { return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); } #endif clang14 compile fails as follows: In file included from ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:24: error: argument to '__builtin_neon_vgetq_lane_i32' must be a constant integer return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); ^ ~~~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:7697:22: note: expanded from macro 'vgetq_lane_u32' __ret = (uint32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \ ^ ~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24148:19: note: expanded from macro 'vsetq_lane_u32' uint32_t __s0 = __p0; \ ^~~~ In file included from ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:9: error: argument to '__builtin_neon_vsetq_lane_i32' must be a constant integer return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); ^ ~~~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24150:24: note: expanded from macro 'vsetq_lane_u32' __ret = (uint32x4_t) __builtin_neon_vsetq_lane_i32(__s0, (int32x4_t)__s1, __p2); \ ^ ~~~~ 2 errors generated. clang14 does appear to support the vcopyq_laneq_u32() intrinsic, s0 we want to skip the conditional implementation. Two approaches I have tested to resolve the error are: 1) skip if building with clang: #if !defined(__clang__) && ((defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ 72 (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000))) 2) skip if not building for ARMv7: #if (defined(RTE_ARCH_ARMv7) && defined(RTE_ARCH_32)) || \ (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000)) Both address our immediate problem, but may not be a appropriate for all cases. Can anyone suggest the proper way to address this? I'll be submitting an patch once I have a solution that is acceptable to the community. Regards, Roger [-- Attachment #2: Type: text/html, Size: 11450 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM 2024-12-04 0:37 ` Roger Melton (rmelton) @ 2024-12-04 17:24 ` Roger Melton (rmelton) 2024-12-05 19:33 ` Wathsala Wathawana Vithanage 0 siblings, 1 reply; 7+ messages in thread From: Roger Melton (rmelton) @ 2024-12-04 17:24 UTC (permalink / raw) To: Ruifeng Wang, dev; +Cc: Wathsala Wathawana Vithanage, nd [-- Attachment #1: Type: text/plain, Size: 7106 bytes --] Considering this problem further, I don't see a way to avoid the CLANG compiler error with a function implementation. We would need a macro implementation similar to CLANGS arm_neon.h. In addition, it may be necessary to provide separate implementations for CLANG and non-CLANG compilers since the builtins between the toolchains are different. One way to address this would be keep the existing function implementation, and add a new macro implementation for CLANG. For example, something like: #if !defined(RTE_CC_CLANG) #if (defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000)) /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7-A(AArch32) * On AArch64, this intrinsic is supported since GCC version 7. */ static inline uint32x4_t vcopyq_laneq_u32(uint32x4_t a, const int lane_a, uint32x4_t b, const int lane_b) { return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); } #endif #else #if defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32) /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7-A(AArch32) * On AArch64, this intrinsic is supported */ #ifdef LITTLE_ENDIAN #define vcopyq_laneq_u32(__arg1, __arg2, __arg3, __arg4) __extension__ ({ \ uint32x4_t __ret; \ uint32x4_t __lcl_arg1 = __arg1; \ uint32x4_t __lcl_arg3 = __arg3; \ __ret = vsetq_lane_u32(vgetq_lane_u32(__lcl_arg3, __arg4), __lcl_arg1, __arg2); \ __ret; \ }) #else #define __noswap_vsetq_lane_u32(__arg1, __arg2, __arg3) __extension__ ({ \ uint32x4_t __ret; \ uint32_t __lcl_arg1 = __arg1; \ uint32x4_t __lcl_arg2 = __arg2; \ __ret = (uint32x4_t) __builtin_neon_vsetq_lane_i32(__lcl_arg1, (int32x4_t)__lcl_arg2, __arg3); \ __ret; \ }) #define __noswap_vgetq_lane_u32(__arg1, __arg2) __extension__ ({ \ uint32_t __ret; \ uint32x4_t __lcl_arg1 = __arg1; \ __ret = (uint32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__lcl_arg1, __arg2); \ __ret; \ }) #define vcopyq_laneq_u32(__arg1, __arg2, __arg3, __arg4) __extension__ ({ \ uint32x4_t __ret; \ uint32x4_t __lcl_arg1 = __arg1; \ uint32x4_t __lcl_arg3 = __arg3; \ uint32x4_t __rev1; \ uint32x4_t __rev3; \ __rev1 = __builtin_shufflevector(__lcl_arg1, __lcl_arg1, 3, 2, 1, 0); \ __rev3 = __builtin_shufflevector(__lcl_arg3, __lcl_arg3, 3, 2, 1, 0); \ __ret = __noswap_vsetq_lane_u32(__noswap_vgetq_lane_u32(__rev3, __arg4), __rev1, __arg2); \ __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \ __ret; \ }) #endif #endif #endif NOTE1: I saw no reason the CLANG arm_neon.h AARCH64 macros would not work for AARCH32, so the macros in this sample implementation are copies CLANG originals modified for (my) readability. I'm not an attorney, but if used, it may be necessary to include the banner from the CLANG arm_neon.h. NOTE2: While I can build the CLANG ARM implementation, I lack the hardware to test it. Regards, Roger On 12/3/24 7:37 PM, Roger Melton (rmelton) wrote: After looking at this a bit closer today, I realize that my assertion that CLANG14 does support vcopyq_laneq_u32() for 32bit ARM was incorrect. It does not. The reason that disabling the implementation in rte_vect.h works for our clang builds is that we do not build the l3fwd app nor the ixgbe PMD for our application, and they are the only libraries that reference that function. The clang compile errors appear to be related to how clang handles compile time constants, but I'm am again unsure how to resolve them in a way that would work for both GNU and clang. Any suggestions? Regards, Roger On 12/2/24 8:26 PM, Ruifeng Wang wrote: +Arm folks. From: Roger Melton (rmelton) <rmelton@cisco.com><mailto:rmelton@cisco.com> Date: Tuesday, December 3, 2024 at 3:39 AM To: dev@dpdk.org<mailto:dev@dpdk.org> <dev@dpdk.org><mailto:dev@dpdk.org>, Ruifeng Wang <Ruifeng.Wang@arm.com><mailto:Ruifeng.Wang@arm.com> Subject: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM Hey folks, We are building DPDK with clang14 for a 32bit armv8-a based CPU and ran into a compile error with the following from lib/eal/arm/include/rte_vect.h: #if (defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU<https://elixir.bootlin.com/dpdk/v24.11/C/ident/RTE_CC_IS_GNU> && (GCC_VERSION<https://elixir.bootlin.com/dpdk/v24.11/C/ident/GCC_VERSION> < 70000)) /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7-A(AArch32) * On AArch64, this intrinsic is supported since GCC version 7. */ static inline uint32x4_t vcopyq_laneq_u32<https://elixir.bootlin.com/dpdk/v24.11/C/ident/vcopyq_laneq_u32>(uint32x4_t a, const int lane_a, uint32x4_t b, const int lane_b) { return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); } #endif clang14 compile fails as follows: In file included from ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:24: error: argument to '__builtin_neon_vgetq_lane_i32' must be a constant integer return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); ^ ~~~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:7697:22: note: expanded from macro 'vgetq_lane_u32' __ret = (uint32_t) __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \ ^ ~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24148:19: note: expanded from macro 'vsetq_lane_u32' uint32_t __s0 = __p0; \ ^~~~ In file included from ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: ../../../../../../cisco-dpdk-upstream-arm-clang-fixes.git/lib/eal/arm/include/rte_vect.h:80:9: error: argument to '__builtin_neon_vsetq_lane_i32' must be a constant integer return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); ^ ~~~~~~ /auto/binos-tools/llvm14/llvm-14.0-p24/lib/clang/14.0.5/include/arm_neon.h:24150:24: note: expanded from macro 'vsetq_lane_u32' __ret = (uint32x4_t) __builtin_neon_vsetq_lane_i32(__s0, (int32x4_t)__s1, __p2); \ ^ ~~~~ 2 errors generated. clang14 does appear to support the vcopyq_laneq_u32() intrinsic, s0 we want to skip the conditional implementation. Two approaches I have tested to resolve the error are: 1) skip if building with clang: #if !defined(__clang__) && ((defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ 72 (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000))) 2) skip if not building for ARMv7: #if (defined(RTE_ARCH_ARMv7) && defined(RTE_ARCH_32)) || \ (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION < 70000)) Both address our immediate problem, but may not be a appropriate for all cases. Can anyone suggest the proper way to address this? I'll be submitting an patch once I have a solution that is acceptable to the community. Regards, Roger [-- Attachment #2: Type: text/html, Size: 15777 bytes --] ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM 2024-12-04 17:24 ` Roger Melton (rmelton) @ 2024-12-05 19:33 ` Wathsala Wathawana Vithanage 2024-12-05 20:09 ` Roger Melton (rmelton) 0 siblings, 1 reply; 7+ messages in thread From: Wathsala Wathawana Vithanage @ 2024-12-05 19:33 UTC (permalink / raw) To: Roger Melton (rmelton), Ruifeng Wang, dev; +Cc: nd, nd What version of CLANG are you using? > -----Original Message----- > From: Roger Melton (rmelton) <rmelton@cisco.com> > Sent: Wednesday, December 4, 2024 11:24 AM > To: Ruifeng Wang <Ruifeng.Wang@arm.com>; dev@dpdk.org > Cc: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; nd > <nd@arm.com> > Subject: Re: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for > 32bit ARM > > Considering this problem further, I don't see a way to avoid the CLANG > compiler error with a function implementation. We would need a macro > implementation similar to CLANGS arm_neon.h. In addition, it may be > necessary to provide separate implementations for CLANG and non-CLANG > compilers since the builtins between the toolchains are different. One way to > address this would be keep the existing function implementation, and add a > new macro implementation for CLANG. > > For example, something like: > > > > #if !defined(RTE_CC_CLANG) > #if (defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ > (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION > < 70000)) > /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7- > A(AArch32) > * On AArch64, this intrinsic is supported since GCC version 7. > */ > static inline uint32x4_t > vcopyq_laneq_u32(uint32x4_t a, const int lane_a, > uint32x4_t b, const int lane_b) > { > return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); > } > #endif > #else > #if defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32) > /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7- > A(AArch32) > * On AArch64, this intrinsic is supported > */ > #ifdef LITTLE_ENDIAN > #define vcopyq_laneq_u32(__arg1, __arg2, __arg3, __arg4) > __extension__ ({ \ > uint32x4_t __ret; \ > uint32x4_t __lcl_arg1 = __arg1; \ > uint32x4_t __lcl_arg3 = __arg3; \ > __ret = vsetq_lane_u32(vgetq_lane_u32(__lcl_arg3, __arg4), > __lcl_arg1, __arg2); \ > __ret; \ > }) > #else > #define __noswap_vsetq_lane_u32(__arg1, __arg2, __arg3) > __extension__ ({ \ > uint32x4_t __ret; \ > uint32_t __lcl_arg1 = __arg1; \ > uint32x4_t __lcl_arg2 = __arg2; \ > __ret = (uint32x4_t) __builtin_neon_vsetq_lane_i32(__lcl_arg1, > (int32x4_t)__lcl_arg2, __arg3); \ > __ret; \ > }) > #define __noswap_vgetq_lane_u32(__arg1, __arg2) __extension__ ({ > \ > uint32_t __ret; \ > uint32x4_t __lcl_arg1 = __arg1; \ > __ret = (uint32_t) > __builtin_neon_vgetq_lane_i32((int32x4_t)__lcl_arg1, __arg2); \ > __ret; \ > }) > #define vcopyq_laneq_u32(__arg1, __arg2, __arg3, __arg4) > __extension__ ({ \ > uint32x4_t __ret; \ > uint32x4_t __lcl_arg1 = __arg1; \ > uint32x4_t __lcl_arg3 = __arg3; \ > uint32x4_t __rev1; \ > uint32x4_t __rev3; \ > __rev1 = __builtin_shufflevector(__lcl_arg1, __lcl_arg1, 3, 2, 1, 0); \ > __rev3 = __builtin_shufflevector(__lcl_arg3, __lcl_arg3, 3, 2, 1, 0); \ > __ret = > __noswap_vsetq_lane_u32(__noswap_vgetq_lane_u32(__rev3, __arg4), > __rev1, __arg2); \ > __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \ > __ret; \ > }) > #endif > #endif > #endif > > > > NOTE1: I saw no reason the CLANG arm_neon.h AARCH64 macros would not > work for AARCH32, so the macros in this sample implementation are copies > CLANG originals modified for (my) readability. I'm not an attorney, but if used, > it may be necessary to include the banner from the CLANG arm_neon.h. > > NOTE2: While I can build the CLANG ARM implementation, I lack the hardware > to test it. > > > Regards, > Roger > > On 12/3/24 7:37 PM, Roger Melton (rmelton) wrote: > > > After looking at this a bit closer today, I realize that my assertion that > CLANG14 does support vcopyq_laneq_u32() for 32bit ARM was incorrect. It > does not. The reason that disabling the implementation in rte_vect.h works > for our clang builds is that we do not build the l3fwd app nor the ixgbe PMD > for our application, and they are the only libraries that reference that function. > > The clang compile errors appear to be related to how clang handles > compile time constants, but I'm am again unsure how to resolve them in a way > that would work for both GNU and clang. > > Any suggestions? > > > Regards, > Roger > > > On 12/2/24 8:26 PM, Ruifeng Wang wrote: > > > +Arm folks. > > > > From: Roger Melton (rmelton) <rmelton@cisco.com> > <mailto:rmelton@cisco.com> > Date: Tuesday, December 3, 2024 at 3:39 AM > To: dev@dpdk.org <mailto:dev@dpdk.org> <dev@dpdk.org> > <mailto:dev@dpdk.org> , Ruifeng Wang <Ruifeng.Wang@arm.com> > <mailto:Ruifeng.Wang@arm.com> > Subject: lib/eal/arm/include/rte_vect.h fails to compile with > clang14 for 32bit ARM > > Hey folks, > > We are building DPDK with clang14 for a 32bit armv8-a based > CPU and ran into a compile error with the following from > lib/eal/arm/include/rte_vect.h: > > > > > > #if (defined(RTE_ARCH_ARM) && > defined(RTE_ARCH_32)) || \ > (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU > <https://elixir.bootlin.com/dpdk/v24.11/C/ident/RTE_CC_IS_GNU> && > (GCC_VERSION > <https://elixir.bootlin.com/dpdk/v24.11/C/ident/GCC_VERSION> < 70000)) > /* NEON intrinsic vcopyq_laneq_u32() is not > supported in ARMv7-A(AArch32) > * On AArch64, this intrinsic is supported since GCC > version 7. > */ > static inline uint32x4_t > vcopyq_laneq_u32 > <https://elixir.bootlin.com/dpdk/v24.11/C/ident/vcopyq_laneq_u32> > (uint32x4_t a, const int lane_a, > uint32x4_t b, const int lane_b) > { > return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), > a, lane_a); > } > #endif > > > clang14 compile fails as follows: > > > > In file included from ../../../../../../cisco-dpdk- > upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: > ../../../../../../cisco-dpdk-upstream-arm-clang- > fixes.git/lib/eal/arm/include/rte_vect.h:80:24: error: argument to > '__builtin_neon_vgetq_lane_i32' must be a constant integer > return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, > lane_a); > ^ ~~~~~~ > /auto/binos-tools/llvm14/llvm-14.0- > p24/lib/clang/14.0.5/include/arm_neon.h:7697:22: note: expanded from > macro 'vgetq_lane_u32' > __ret = (uint32_t) > __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \ > ^ ~~~~ > /auto/binos-tools/llvm14/llvm-14.0- > p24/lib/clang/14.0.5/include/arm_neon.h:24148:19: note: expanded from > macro 'vsetq_lane_u32' > uint32_t __s0 = __p0; \ > ^~~~ > In file included from ../../../../../../cisco-dpdk- > upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: > ../../../../../../cisco-dpdk-upstream-arm-clang- > fixes.git/lib/eal/arm/include/rte_vect.h:80:9: error: argument to > '__builtin_neon_vsetq_lane_i32' must be a constant integer > return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, > lane_a); > ^ ~~~~~~ > /auto/binos-tools/llvm14/llvm-14.0- > p24/lib/clang/14.0.5/include/arm_neon.h:24150:24: note: expanded from > macro 'vsetq_lane_u32' > __ret = (uint32x4_t) > __builtin_neon_vsetq_lane_i32(__s0, (int32x4_t)__s1, __p2); \ > ^ ~~~~ > 2 errors generated. > > > > clang14 does appear to support the vcopyq_laneq_u32() > intrinsic, s0 we want to skip the conditional implementation. > > Two approaches I have tested to resolve the error are: > > 1) skip if building with clang: > > > #if !defined(__clang__) && > ((defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ > 72 (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU > && (GCC_VERSION < 70000))) > > > > > 2) skip if not building for ARMv7: > > > > > #if (defined(RTE_ARCH_ARMv7) && > defined(RTE_ARCH_32)) || \ > (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && > (GCC_VERSION < 70000)) > > > > Both address our immediate problem, but may not be a > appropriate for all cases. > > Can anyone suggest the proper way to address this? I'll be > submitting an patch once I have a solution that is acceptable to the > community. > > Regards, > Roger > > > > > > > > > > > > > > > > > > > > > ^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM 2024-12-05 19:33 ` Wathsala Wathawana Vithanage @ 2024-12-05 20:09 ` Roger Melton (rmelton) 0 siblings, 0 replies; 7+ messages in thread From: Roger Melton (rmelton) @ 2024-12-05 20:09 UTC (permalink / raw) To: Wathsala Wathawana Vithanage, Ruifeng Wang, dev; +Cc: nd clang version 14.0.5 On 12/5/24 2:34 PM, Wathsala Wathawana Vithanage wrote: > What version of CLANG are you using? > >> -----Original Message----- >> From: Roger Melton (rmelton) <rmelton@cisco.com> >> Sent: Wednesday, December 4, 2024 11:24 AM >> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; dev@dpdk.org >> Cc: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; nd >> <nd@arm.com> >> Subject: Re: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for >> 32bit ARM >> >> Considering this problem further, I don't see a way to avoid the CLANG >> compiler error with a function implementation. We would need a macro >> implementation similar to CLANGS arm_neon.h. In addition, it may be >> necessary to provide separate implementations for CLANG and non-CLANG >> compilers since the builtins between the toolchains are different. One way to >> address this would be keep the existing function implementation, and add a >> new macro implementation for CLANG. >> >> For example, something like: >> >> >> >> #if !defined(RTE_CC_CLANG) >> #if (defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ >> (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION >> < 70000)) >> /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7- >> A(AArch32) >> * On AArch64, this intrinsic is supported since GCC version 7. >> */ >> static inline uint32x4_t >> vcopyq_laneq_u32(uint32x4_t a, const int lane_a, >> uint32x4_t b, const int lane_b) >> { >> return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a); >> } >> #endif >> #else >> #if defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32) >> /* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7- >> A(AArch32) >> * On AArch64, this intrinsic is supported >> */ >> #ifdef LITTLE_ENDIAN >> #define vcopyq_laneq_u32(__arg1, __arg2, __arg3, __arg4) >> __extension__ ({ \ >> uint32x4_t __ret; \ >> uint32x4_t __lcl_arg1 = __arg1; \ >> uint32x4_t __lcl_arg3 = __arg3; \ >> __ret = vsetq_lane_u32(vgetq_lane_u32(__lcl_arg3, __arg4), >> __lcl_arg1, __arg2); \ >> __ret; \ >> }) >> #else >> #define __noswap_vsetq_lane_u32(__arg1, __arg2, __arg3) >> __extension__ ({ \ >> uint32x4_t __ret; \ >> uint32_t __lcl_arg1 = __arg1; \ >> uint32x4_t __lcl_arg2 = __arg2; \ >> __ret = (uint32x4_t) __builtin_neon_vsetq_lane_i32(__lcl_arg1, >> (int32x4_t)__lcl_arg2, __arg3); \ >> __ret; \ >> }) >> #define __noswap_vgetq_lane_u32(__arg1, __arg2) __extension__ ({ >> \ >> uint32_t __ret; \ >> uint32x4_t __lcl_arg1 = __arg1; \ >> __ret = (uint32_t) >> __builtin_neon_vgetq_lane_i32((int32x4_t)__lcl_arg1, __arg2); \ >> __ret; \ >> }) >> #define vcopyq_laneq_u32(__arg1, __arg2, __arg3, __arg4) >> __extension__ ({ \ >> uint32x4_t __ret; \ >> uint32x4_t __lcl_arg1 = __arg1; \ >> uint32x4_t __lcl_arg3 = __arg3; \ >> uint32x4_t __rev1; \ >> uint32x4_t __rev3; \ >> __rev1 = __builtin_shufflevector(__lcl_arg1, __lcl_arg1, 3, 2, 1, 0); \ >> __rev3 = __builtin_shufflevector(__lcl_arg3, __lcl_arg3, 3, 2, 1, 0); \ >> __ret = >> __noswap_vsetq_lane_u32(__noswap_vgetq_lane_u32(__rev3, __arg4), >> __rev1, __arg2); \ >> __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \ >> __ret; \ >> }) >> #endif >> #endif >> #endif >> >> >> >> NOTE1: I saw no reason the CLANG arm_neon.h AARCH64 macros would not >> work for AARCH32, so the macros in this sample implementation are copies >> CLANG originals modified for (my) readability. I'm not an attorney, but if used, >> it may be necessary to include the banner from the CLANG arm_neon.h. >> >> NOTE2: While I can build the CLANG ARM implementation, I lack the hardware >> to test it. >> >> >> Regards, >> Roger >> >> On 12/3/24 7:37 PM, Roger Melton (rmelton) wrote: >> >> >> After looking at this a bit closer today, I realize that my assertion that >> CLANG14 does support vcopyq_laneq_u32() for 32bit ARM was incorrect. It >> does not. The reason that disabling the implementation in rte_vect.h works >> for our clang builds is that we do not build the l3fwd app nor the ixgbe PMD >> for our application, and they are the only libraries that reference that function. >> >> The clang compile errors appear to be related to how clang handles >> compile time constants, but I'm am again unsure how to resolve them in a way >> that would work for both GNU and clang. >> >> Any suggestions? >> >> >> Regards, >> Roger >> >> >> On 12/2/24 8:26 PM, Ruifeng Wang wrote: >> >> >> +Arm folks. >> >> >> >> From: Roger Melton (rmelton) <rmelton@cisco.com> >> <mailto:rmelton@cisco.com> >> Date: Tuesday, December 3, 2024 at 3:39 AM >> To: dev@dpdk.org <mailto:dev@dpdk.org> <dev@dpdk.org> >> <mailto:dev@dpdk.org> , Ruifeng Wang <Ruifeng.Wang@arm.com> >> <mailto:Ruifeng.Wang@arm.com> >> Subject: lib/eal/arm/include/rte_vect.h fails to compile with >> clang14 for 32bit ARM >> >> Hey folks, >> >> We are building DPDK with clang14 for a 32bit armv8-a based >> CPU and ran into a compile error with the following from >> lib/eal/arm/include/rte_vect.h: >> >> >> >> >> >> #if (defined(RTE_ARCH_ARM) && >> defined(RTE_ARCH_32)) || \ >> (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU >> <https://elixir.bootlin.com/dpdk/v24.11/C/ident/RTE_CC_IS_GNU> && >> (GCC_VERSION >> <https://elixir.bootlin.com/dpdk/v24.11/C/ident/GCC_VERSION> < 70000)) >> /* NEON intrinsic vcopyq_laneq_u32() is not >> supported in ARMv7-A(AArch32) >> * On AArch64, this intrinsic is supported since GCC >> version 7. >> */ >> static inline uint32x4_t >> vcopyq_laneq_u32 >> <https://elixir.bootlin.com/dpdk/v24.11/C/ident/vcopyq_laneq_u32> >> (uint32x4_t a, const int lane_a, >> uint32x4_t b, const int lane_b) >> { >> return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), >> a, lane_a); >> } >> #endif >> >> >> clang14 compile fails as follows: >> >> >> >> In file included from ../../../../../../cisco-dpdk- >> upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: >> ../../../../../../cisco-dpdk-upstream-arm-clang- >> fixes.git/lib/eal/arm/include/rte_vect.h:80:24: error: argument to >> '__builtin_neon_vgetq_lane_i32' must be a constant integer >> return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, >> lane_a); >> ^ ~~~~~~ >> /auto/binos-tools/llvm14/llvm-14.0- >> p24/lib/clang/14.0.5/include/arm_neon.h:7697:22: note: expanded from >> macro 'vgetq_lane_u32' >> __ret = (uint32_t) >> __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \ >> ^ ~~~~ >> /auto/binos-tools/llvm14/llvm-14.0- >> p24/lib/clang/14.0.5/include/arm_neon.h:24148:19: note: expanded from >> macro 'vsetq_lane_u32' >> uint32_t __s0 = __p0; \ >> ^~~~ >> In file included from ../../../../../../cisco-dpdk- >> upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36: >> ../../../../../../cisco-dpdk-upstream-arm-clang- >> fixes.git/lib/eal/arm/include/rte_vect.h:80:9: error: argument to >> '__builtin_neon_vsetq_lane_i32' must be a constant integer >> return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, >> lane_a); >> ^ ~~~~~~ >> /auto/binos-tools/llvm14/llvm-14.0- >> p24/lib/clang/14.0.5/include/arm_neon.h:24150:24: note: expanded from >> macro 'vsetq_lane_u32' >> __ret = (uint32x4_t) >> __builtin_neon_vsetq_lane_i32(__s0, (int32x4_t)__s1, __p2); \ >> ^ ~~~~ >> 2 errors generated. >> >> >> >> clang14 does appear to support the vcopyq_laneq_u32() >> intrinsic, s0 we want to skip the conditional implementation. >> >> Two approaches I have tested to resolve the error are: >> >> 1) skip if building with clang: >> >> >> #if !defined(__clang__) && >> ((defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \ >> 72 (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU >> && (GCC_VERSION < 70000))) >> >> >> >> >> 2) skip if not building for ARMv7: >> >> >> >> >> #if (defined(RTE_ARCH_ARMv7) && >> defined(RTE_ARCH_32)) || \ >> (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && >> (GCC_VERSION < 70000)) >> >> >> >> Both address our immediate problem, but may not be a >> appropriate for all cases. >> >> Can anyone suggest the proper way to address this? I'll be >> submitting an patch once I have a solution that is acceptable to the >> community. >> >> Regards, >> Roger >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > ^ permalink raw reply [flat|nested] 7+ messages in thread
* RE: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM 2024-12-03 1:25 ` Ruifeng Wang 2024-12-04 0:37 ` Roger Melton (rmelton) @ 2024-12-04 15:38 ` Wathsala Wathawana Vithanage 1 sibling, 0 replies; 7+ messages in thread From: Wathsala Wathawana Vithanage @ 2024-12-04 15:38 UTC (permalink / raw) To: Ruifeng Wang, Roger Melton (rmelton), dev; +Cc: nd, nd > > clang14 does appear to support the vcopyq_laneq_u32() intrinsic, s0 we want > to skip the conditional implementation. > > Two approaches I have tested to resolve the error are: > > 1) skip if building with clang: > > > #if !defined(__clang__) && ((defined(RTE_ARCH_ARM) && > defined(RTE_ARCH_32)) || \ > 72 (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && > (GCC_VERSION < 70000))) Use RTE_CC_CLANG instead of __clang__ > > 2) skip if not building for ARMv7: > > > > > #if (defined(RTE_ARCH_ARMv7) && defined(RTE_ARCH_32)) || \ > (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION > < 70000)) > > > > Both address our immediate problem, but may not be a appropriate for all > cases. > > Can anyone suggest the proper way to address this? I'll be submitting an > patch once I have a solution that is acceptable to the community. I prefer skipping for clang (option 1) --wathsala ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2024-12-05 20:09 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-12-02 19:39 lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM Roger Melton (rmelton) 2024-12-03 1:25 ` Ruifeng Wang 2024-12-04 0:37 ` Roger Melton (rmelton) 2024-12-04 17:24 ` Roger Melton (rmelton) 2024-12-05 19:33 ` Wathsala Wathawana Vithanage 2024-12-05 20:09 ` Roger Melton (rmelton) 2024-12-04 15:38 ` Wathsala Wathawana Vithanage
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).