DPDK patches and discussions
 help / color / mirror / Atom feed
From: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
To: "Roger Melton (rmelton)" <rmelton@cisco.com>,
	Ruifeng Wang <Ruifeng.Wang@arm.com>,
	"dev@dpdk.org" <dev@dpdk.org>
Cc: nd <nd@arm.com>, nd <nd@arm.com>
Subject: RE: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for 32bit ARM
Date: Thu, 5 Dec 2024 19:33:57 +0000	[thread overview]
Message-ID: <PAWPR08MB89097A810A15556D67B581629F302@PAWPR08MB8909.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <afe49b7c-2c8f-4bb7-ae18-34ce7e77095a@cisco.com>

What version of CLANG are you using?

> -----Original Message-----
> From: Roger Melton (rmelton) <rmelton@cisco.com>
> Sent: Wednesday, December 4, 2024 11:24 AM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; dev@dpdk.org
> Cc: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>; nd
> <nd@arm.com>
> Subject: Re: lib/eal/arm/include/rte_vect.h fails to compile with clang14 for
> 32bit ARM
> 
> Considering this problem further, I don't see a way to avoid the CLANG
> compiler error with a function implementation.  We would need a macro
> implementation similar to CLANGS arm_neon.h.  In addition, it may be
> necessary to provide separate implementations for CLANG and non-CLANG
> compilers since the builtins between the toolchains are different.  One way to
> address this would be keep the existing function implementation, and add a
> new macro implementation for CLANG.
> 
> For example, something like:
> 
> 
> 
> 	#if !defined(RTE_CC_CLANG)
> 	#if (defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \
> 	(defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU && (GCC_VERSION
> < 70000))
> 	/* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7-
> A(AArch32)
> 	 * On AArch64, this intrinsic is supported since GCC version 7.
> 	 */
> 	static inline uint32x4_t
> 	vcopyq_laneq_u32(uint32x4_t a, const int lane_a,
> 	         uint32x4_t b, const int lane_b)
> 	{
> 	    return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a, lane_a);
> 	}
> 	#endif
> 	#else
> 	#if defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)
> 	/* NEON intrinsic vcopyq_laneq_u32() is not supported in ARMv7-
> A(AArch32)
> 	 * On AArch64, this intrinsic is supported
> 	 */
> 	#ifdef LITTLE_ENDIAN
> 	#define vcopyq_laneq_u32(__arg1, __arg2, __arg3, __arg4)
> __extension__ ({ \
> 	  uint32x4_t __ret; \
> 	  uint32x4_t __lcl_arg1 = __arg1; \
> 	  uint32x4_t __lcl_arg3 = __arg3; \
> 	  __ret = vsetq_lane_u32(vgetq_lane_u32(__lcl_arg3, __arg4),
> __lcl_arg1, __arg2); \
> 	  __ret; \
> 	})
> 	#else
> 	#define __noswap_vsetq_lane_u32(__arg1, __arg2, __arg3)
> __extension__ ({ \
> 	  uint32x4_t __ret; \
> 	  uint32_t __lcl_arg1 = __arg1; \
> 	  uint32x4_t __lcl_arg2 = __arg2; \
> 	  __ret = (uint32x4_t) __builtin_neon_vsetq_lane_i32(__lcl_arg1,
> (int32x4_t)__lcl_arg2, __arg3); \
> 	  __ret; \
> 	})
> 	#define __noswap_vgetq_lane_u32(__arg1, __arg2) __extension__ ({
> \
> 	  uint32_t __ret; \
> 	  uint32x4_t __lcl_arg1 = __arg1; \
> 	  __ret = (uint32_t)
> __builtin_neon_vgetq_lane_i32((int32x4_t)__lcl_arg1, __arg2); \
> 	  __ret; \
> 	})
> 	#define vcopyq_laneq_u32(__arg1, __arg2, __arg3, __arg4)
> __extension__ ({ \
> 	  uint32x4_t __ret; \
> 	  uint32x4_t __lcl_arg1 = __arg1; \
> 	  uint32x4_t __lcl_arg3 = __arg3; \
> 	  uint32x4_t __rev1; \
> 	  uint32x4_t __rev3; \
> 	  __rev1 = __builtin_shufflevector(__lcl_arg1, __lcl_arg1, 3, 2, 1, 0); \
> 	  __rev3 = __builtin_shufflevector(__lcl_arg3, __lcl_arg3, 3, 2, 1, 0); \
> 	  __ret =
> __noswap_vsetq_lane_u32(__noswap_vgetq_lane_u32(__rev3, __arg4),
> __rev1, __arg2); \
> 	  __ret = __builtin_shufflevector(__ret, __ret, 3, 2, 1, 0); \
> 	  __ret; \
> 	})
> 	#endif
> 	#endif
> 	#endif
> 
> 
> 
> NOTE1:  I saw no reason the CLANG arm_neon.h AARCH64 macros would not
> work for AARCH32, so the macros in this sample implementation are copies
> CLANG originals modified for (my) readability.  I'm not an attorney, but if used,
> it may be necessary to include the banner from the CLANG arm_neon.h.
> 
> NOTE2: While I can build the CLANG ARM implementation, I lack the hardware
> to test it.
> 
> 
> Regards,
> Roger
> 
> On 12/3/24 7:37 PM, Roger Melton (rmelton) wrote:
> 
> 
> 	After looking at this a bit closer today, I realize that my assertion that
> CLANG14 does support vcopyq_laneq_u32() for 32bit ARM was incorrect.  It
> does not.  The reason that disabling the implementation in rte_vect.h works
> for our clang builds is that we do not build the l3fwd app nor the ixgbe PMD
> for our application, and they are the only libraries that reference that function.
> 
> 	The clang compile errors appear to be related to how clang handles
> compile time constants, but I'm am again unsure how to resolve them in a way
> that would work for both GNU and clang.
> 
> 	Any suggestions?
> 
> 
> 	Regards,
> 	Roger
> 
> 
> 	On 12/2/24 8:26 PM, Ruifeng Wang wrote:
> 
> 
> 		+Arm folks.
> 
> 
> 
> 		From: Roger Melton (rmelton) <rmelton@cisco.com>
> <mailto:rmelton@cisco.com>
> 		Date: Tuesday, December 3, 2024 at 3:39 AM
> 		To: dev@dpdk.org <mailto:dev@dpdk.org>  <dev@dpdk.org>
> <mailto:dev@dpdk.org> , Ruifeng Wang <Ruifeng.Wang@arm.com>
> <mailto:Ruifeng.Wang@arm.com>
> 		Subject: lib/eal/arm/include/rte_vect.h fails to compile with
> clang14 for 32bit ARM
> 
> 		Hey folks,
> 
> 		We are building DPDK with clang14 for a 32bit armv8-a based
> CPU and ran into a compile error with the following from
> lib/eal/arm/include/rte_vect.h:
> 
> 
> 
> 
> 
> 			#if (defined(RTE_ARCH_ARM) &&
> defined(RTE_ARCH_32)) || \
> 			(defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU
> <https://elixir.bootlin.com/dpdk/v24.11/C/ident/RTE_CC_IS_GNU>  &&
> (GCC_VERSION
> <https://elixir.bootlin.com/dpdk/v24.11/C/ident/GCC_VERSION>  < 70000))
> 			/* NEON intrinsic vcopyq_laneq_u32() is not
> supported in ARMv7-A(AArch32)
> 			 * On AArch64, this intrinsic is supported since GCC
> version 7.
> 			 */
> 			static inline uint32x4_t
> 			vcopyq_laneq_u32
> <https://elixir.bootlin.com/dpdk/v24.11/C/ident/vcopyq_laneq_u32>
> (uint32x4_t a, const int lane_a,
> 			          uint32x4_t b, const int lane_b)
> 			{
> 			  return vsetq_lane_u32(vgetq_lane_u32(b, lane_b),
> a, lane_a);
> 			}
> 			#endif
> 
> 
> 		clang14 compile fails as follows:
> 
> 
> 
> 			In file included from ../../../../../../cisco-dpdk-
> upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36:
> 			../../../../../../cisco-dpdk-upstream-arm-clang-
> fixes.git/lib/eal/arm/include/rte_vect.h:80:24: error: argument to
> '__builtin_neon_vgetq_lane_i32' must be a constant integer
> 			return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a,
> lane_a);
> 			^ ~~~~~~
> 			/auto/binos-tools/llvm14/llvm-14.0-
> p24/lib/clang/14.0.5/include/arm_neon.h:7697:22: note: expanded from
> macro 'vgetq_lane_u32'
> 			__ret = (uint32_t)
> __builtin_neon_vgetq_lane_i32((int32x4_t)__s0, __p1); \
> 			^ ~~~~
> 			/auto/binos-tools/llvm14/llvm-14.0-
> p24/lib/clang/14.0.5/include/arm_neon.h:24148:19: note: expanded from
> macro 'vsetq_lane_u32'
> 			uint32_t __s0 = __p0; \
> 			^~~~
> 			In file included from ../../../../../../cisco-dpdk-
> upstream-arm-clang-fixes.git/lib/eal/common/eal_common_options.c:36:
> 			../../../../../../cisco-dpdk-upstream-arm-clang-
> fixes.git/lib/eal/arm/include/rte_vect.h:80:9: error: argument to
> '__builtin_neon_vsetq_lane_i32' must be a constant integer
> 			return vsetq_lane_u32(vgetq_lane_u32(b, lane_b), a,
> lane_a);
> 			^ ~~~~~~
> 			/auto/binos-tools/llvm14/llvm-14.0-
> p24/lib/clang/14.0.5/include/arm_neon.h:24150:24: note: expanded from
> macro 'vsetq_lane_u32'
> 			__ret = (uint32x4_t)
> __builtin_neon_vsetq_lane_i32(__s0, (int32x4_t)__s1, __p2); \
> 			^ ~~~~
> 			2 errors generated.
> 
> 
> 
> 		clang14 does appear to support the vcopyq_laneq_u32()
> intrinsic, s0 we want to skip the conditional implementation.
> 
> 		Two approaches I have tested to resolve the error are:
> 
> 		1) skip if building with clang:
> 
> 
> 			#if !defined(__clang__) &&
> ((defined(RTE_ARCH_ARM) && defined(RTE_ARCH_32)) || \
> 			72 (defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU
> && (GCC_VERSION < 70000)))
> 
> 
> 
> 
> 		2) skip if not building for ARMv7:
> 
> 
> 
> 
> 			#if (defined(RTE_ARCH_ARMv7) &&
> defined(RTE_ARCH_32)) || \
> 			(defined(RTE_ARCH_ARM64) && RTE_CC_IS_GNU &&
> (GCC_VERSION < 70000))
> 
> 
> 
> 		Both address our immediate problem, but may not be a
> appropriate for all cases.
> 
> 		Can anyone suggest the proper way to address this?  I'll be
> submitting an patch once I have a solution that is acceptable to the
> community.
> 
> 		Regards,
> 		Roger
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 
> 


  reply	other threads:[~2024-12-05 19:34 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-12-02 19:39 Roger Melton (rmelton)
2024-12-03  1:25 ` Ruifeng Wang
2024-12-04  0:37   ` Roger Melton (rmelton)
2024-12-04 17:24     ` Roger Melton (rmelton)
2024-12-05 19:33       ` Wathsala Wathawana Vithanage [this message]
2024-12-05 20:09         ` Roger Melton (rmelton)
2024-12-04 15:38   ` Wathsala Wathawana Vithanage

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=PAWPR08MB89097A810A15556D67B581629F302@PAWPR08MB8909.eurprd08.prod.outlook.com \
    --to=wathsala.vithanage@arm.com \
    --cc=Ruifeng.Wang@arm.com \
    --cc=dev@dpdk.org \
    --cc=nd@arm.com \
    --cc=rmelton@cisco.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).