* [PATCH 0/6] allow faster instruction sets to be used with MSVC @ 2025-02-24 21:01 Andre Muezerie 2025-02-24 21:01 ` [PATCH 1/6] eal: make compatible with instruction set updates for MSVC Andre Muezerie ` (5 more replies) 0 siblings, 6 replies; 7+ messages in thread From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw) Cc: dev, Andre Muezerie Up to now MSVC has being used with the default mode, which uses SSE2 instructions for scalar floating-point and vector calculations. https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170 This series allows users to specify the CPU for which the generated code should be optimized for in the same way it's done for GCC: by passing the CPU name. When no explicit CPU name is passed, 'native' is assumed (like it happens with GCC) and the code will be optimized for the same CPU type used to compile the code. MSVC does not provide this functionality natively, so logic was added to handle these differences, detecting which instruction sets are supported by the CPU(s), passing the best options to MSVC and setting the correct macros (like __AVX512F__) so that the DPDK code can rely on them like it is done with GCC. Andre Muezerie (6): eal: make compatible with instruction set updates for MSVC eal: only use numbers as align parameters for MSVC config: allow faster instruction sets to be used with MSVC drivers/net: make compatible with instruction set updates for MSVC acl: make compatible with instruction set updates for MSVC member: make compatible with instruction set updates for MSVC config/x86/meson.build | 364 +++++++++++++++++++++++++---- drivers/net/bnxt/meson.build | 2 +- drivers/net/enic/meson.build | 2 +- drivers/net/intel/i40e/meson.build | 2 +- drivers/net/intel/iavf/meson.build | 2 +- drivers/net/intel/ice/meson.build | 2 +- drivers/net/intel/idpf/meson.build | 2 +- drivers/net/nfp/meson.build | 2 +- drivers/net/octeon_ep/meson.build | 4 +- lib/acl/meson.build | 16 +- lib/eal/common/rte_random.c | 2 + lib/eal/x86/include/rte_vect.h | 11 +- lib/member/meson.build | 11 +- 13 files changed, 363 insertions(+), 59 deletions(-) -- 2.48.1.vfs.0.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 1/6] eal: make compatible with instruction set updates for MSVC 2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie @ 2025-02-24 21:01 ` Andre Muezerie 2025-02-24 21:01 ` [PATCH 2/6] eal: only use numbers as align parameters " Andre Muezerie ` (4 subsequent siblings) 5 siblings, 0 replies; 7+ messages in thread From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw) To: Mattias Rönnblom, Tyler Retzlaff; +Cc: dev, Andre Muezerie After the instruction set updates for MSVC the error below poped up: ../lib/eal/common/rte_random.c(6): fatal error C1083: Cannot open include file: 'x86intrin.h': No such file or directory The fix is to not include header x86intrin.h with MSVC. Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- lib/eal/common/rte_random.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c index 8e62578176..9354358818 100644 --- a/lib/eal/common/rte_random.c +++ b/lib/eal/common/rte_random.c @@ -3,8 +3,10 @@ */ #ifdef __RDSEED__ +#ifndef RTE_TOOLCHAIN_MSVC #include <x86intrin.h> #endif +#endif #include <unistd.h> #include <rte_bitops.h> -- 2.48.1.vfs.0.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 2/6] eal: only use numbers as align parameters for MSVC 2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie 2025-02-24 21:01 ` [PATCH 1/6] eal: make compatible with instruction set updates for MSVC Andre Muezerie @ 2025-02-24 21:01 ` Andre Muezerie 2025-02-24 21:01 ` [PATCH 3/6] config: allow faster instruction sets to be used with MSVC Andre Muezerie ` (3 subsequent siblings) 5 siblings, 0 replies; 7+ messages in thread From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw) To: Bruce Richardson, Konstantin Ananyev; +Cc: dev, Andre Muezerie After the instruction set updates for MSVC the error below poped up: ..\lib\eal\x86\include\rte_vect.h(82): error C2059: syntax error: '(' The issue is that MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE). It only accepts numbers that are power of 2. So, even though RTE_X86_ZMM_SIZE represents a number that is a power of two it cannot be used directly. https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170 Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- lib/eal/x86/include/rte_vect.h | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/lib/eal/x86/include/rte_vect.h b/lib/eal/x86/include/rte_vect.h index 70c78e9b77..0a51c539a4 100644 --- a/lib/eal/x86/include/rte_vect.h +++ b/lib/eal/x86/include/rte_vect.h @@ -79,7 +79,16 @@ __extension__ ({ \ #define RTE_X86_ZMM_SIZE (sizeof(__m512i)) #define RTE_X86_ZMM_MASK (RTE_X86_ZMM_SIZE - 1) -typedef union __rte_aligned(RTE_X86_ZMM_SIZE) __rte_x86_zmm { +/* + * MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE). It only accepts + * numbers that are power of 2. So, even though RTE_X86_ZMM_SIZE represents a + * number that is a power of two it cannot be used directly. + * Ref: https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170 + * The static assert below ensures that RTE_X86_ZMM_SIZE is equal to what is + * used in the __rte_aligned() expression. + */ +static_assert(RTE_X86_ZMM_SIZE == 64, "Unexpected size of __m512i"); +typedef union __rte_aligned(64) __rte_x86_zmm { __m512i z; ymm_t y[RTE_X86_ZMM_SIZE / sizeof(ymm_t)]; xmm_t x[RTE_X86_ZMM_SIZE / sizeof(xmm_t)]; -- 2.48.1.vfs.0.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 3/6] config: allow faster instruction sets to be used with MSVC 2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie 2025-02-24 21:01 ` [PATCH 1/6] eal: make compatible with instruction set updates for MSVC Andre Muezerie 2025-02-24 21:01 ` [PATCH 2/6] eal: only use numbers as align parameters " Andre Muezerie @ 2025-02-24 21:01 ` Andre Muezerie 2025-02-24 21:01 ` [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie ` (2 subsequent siblings) 5 siblings, 0 replies; 7+ messages in thread From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw) To: Bruce Richardson, Konstantin Ananyev; +Cc: dev, Andre Muezerie Up to now MSVC has being used with the default mode, which uses SSE2 instructions for scalar floating-point and vector calculations. https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170 This patch allows users to specify the CPU for which the generated code should be optimized for in the same way it's done for GCC: by passing the CPU name. When no explicit CPU name is passed, 'native' is assumed (like it happens with GCC) and the code will be optimized for the same CPU type used to compile the code. MSVC does not provide this functionality natively, so logic was added to meson.build to handle these differences, detecting which instruction sets are supported by the CPU(s), passing the best options to MSVC and setting the correct macros (like __AVX512F__) so that the DPDK code can rely on them like it is done with GCC. Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- config/x86/meson.build | 364 ++++++++++++++++++++++++++++++++++++----- 1 file changed, 325 insertions(+), 39 deletions(-) diff --git a/config/x86/meson.build b/config/x86/meson.build index 47a5b0c04a..9260969c54 100644 --- a/config/x86/meson.build +++ b/config/x86/meson.build @@ -14,7 +14,194 @@ if is_linux or cc.get_id() == 'gcc' endif endif -cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw'] +cpuid_code = ''' + #include <stdio.h> + #include <stdint.h> + #include <intrin.h> + + uint32_t f1_ECX = 0; + uint32_t f1_EDX = 0; + uint32_t f7_EBX = 0; + uint32_t f7_ECX = 0; + + void get_support_flags() + { + int ids_max; + int data[4]; + + /* + * Calling __cpuid with 0x0 as the function_id argument + * gets the number of the highest valid function ID. + */ + __cpuid(data, 0); + ids_max = data[0]; + + if (1 <= ids_max) { + __cpuidex(data, 1, 0); + f1_ECX = data[2]; + f1_EDX = data[3]; + + if (7 <= ids_max) { + __cpuidex(data, 7, 0); + f7_EBX = data[1]; + f7_ECX = data[2]; + } + } + } + + int get_instruction_support() + { + get_support_flags(); + + #ifdef SSE3 + return (f1_ECX & (1UL << 0)) ? 1 : 0; + #endif + #ifdef PCLMUL + return (f1_ECX & (1UL << 1)) ? 1 : 0; + #endif + #ifdef SSSE3 + return (f1_ECX & (1UL << 9)) ? 1 : 0; + #endif + #ifdef SSE4_1 + return (f1_ECX & (1UL << 19)) ? 1 : 0; + #endif + #ifdef SSE4_2 + return (f1_ECX & (1UL << 20)) ? 1 : 0; + #endif + #ifdef AES + return (f1_ECX & (1UL << 25)) ? 1 : 0; + #endif + #ifdef AVX + return (f1_ECX & (1UL << 28)) ? 1 : 0; + #endif + #ifdef RDRND + return (f1_ECX & (1UL << 30)) ? 1 : 0; + #endif + #ifdef SSE + return (f1_EDX & (1UL << 25)) ? 1 : 0; + #endif + #ifdef SSE2 + return (f1_EDX & (1UL << 26)) ? 1 : 0; + #endif + #ifdef AVX2 + return (f7_EBX & (1UL << 5)) ? 1 : 0; + #endif + #ifdef AVX512F + return (f7_EBX & (1UL << 16)) ? 1 : 0; + #endif + #ifdef AVX512DQ + return (f7_EBX & (1UL << 17)) ? 1 : 0; + #endif + #ifdef RDSEED + return (f7_EBX & (1UL << 18)) ? 1 : 0; + #endif + #ifdef AVX512IFMA + return (f7_EBX & (1UL << 21)) ? 1 : 0; + #endif + #ifdef AVX512CD + return (f7_EBX & (1UL << 28)) ? 1 : 0; + #endif + #ifdef AVX512BW + return (f7_EBX & (1UL << 30)) ? 1 : 0; + #endif + #ifdef AVX512VL + return (f7_EBX & (1UL << 31)) ? 1 : 0; + #endif + #ifdef GFNI + return (f7_ECX & (1UL << 8)) ? 1 : 0; + #endif + #ifdef VPCLMULQDQ + return (f7_ECX & (1UL << 10)) ? 1 : 0; + #endif + + return -1; + } + + int main(int argc, char *argv[]) + { + int res = get_instruction_support(); + if (res == -1) { + printf("Unknown instruction set"); + return -1; + } + printf("%d", res); + + return 0; + } +''' + +# The data in table below can be found here: +# https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html +# This table only contains CPUs that have SSE4.2, as this instruction set is required by DPDK. +# That means that in addition to the instruction sets mentioned in the table, all these CPUs +# also have ['SSE', 'SSE2', 'SSE3', 'SSEE3', 'SSE4_1', 'SSE4_2'] +cpu_type_to_flags = { + 'x86-64-v2': [], + 'x86-64-v3': ['AVX', 'AVX2'], + 'x86-64-v4': ['AVX', 'AVX2', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'], + 'nehalem': [], + 'corei7': [], + 'westmere': ['PCLMUL'], + 'sandybridge': ['AVX', 'PCLMUL'], + 'corei7-avx': ['AVX', 'PCLMUL'], + 'ivybridge': ['AVX', 'PCLMUL', 'RDRND'], + 'core-avx-i': ['AVX', 'PCLMUL', 'RDRND'], + 'haswell': ['AVX', 'PCLMUL', 'RDRND', 'AVX2'], + 'core-avx2': ['AVX', 'PCLMUL', 'RDRND', 'AVX2'], + 'broadwell': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED'], + 'skylake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'], + 'skylake-avx512': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'], + 'cascadelake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'], + 'cannonlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA'], + 'cooperlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'], + 'icelake-client': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'icelake-server': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'tigerlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'rocketlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'alderlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], + 'raptorlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], + 'meteorlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], + 'gracemont': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], + 'arrowlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], + 'arrowlake-s': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], + 'lunarlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], + 'pantherlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], + 'sapphirerapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'emeraldrapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'graniterapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'graniterapids-d': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'diamondrapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'silvermont': ['PCLMUL', 'RDRND'], + 'slm': ['PCLMUL', 'RDRND'], + 'goldmont': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'], + 'goldmont-plus': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'], + 'tremont': ['PCLMUL', 'RDRND', 'RDSEED', 'AES', 'GFNI'], + 'sierraforest': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], + 'grandridge': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], +'clearwaterforest': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'], + 'bdver1': ['AVX', 'PCLMUL', 'AES'], + 'bdver2': ['AVX', 'PCLMUL', 'AES'], + 'bdver3': ['AVX', 'PCLMUL', 'AES'], + 'bdver4': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'AES'], + 'znver1': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'], + 'znver2': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'], + 'znver3': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ'], + 'znver4': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'znver5': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'], + 'btver2': ['AVX', 'PCLMUL', 'AES'], + 'lujiazui': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'], + 'yongfeng': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'], + 'shijidadao': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'], +} + +if is_ms_compiler + cc_avx2_flags = ['/arch:AVX2'] + cc_avx512_flags = ['/arch:AVX512'] +else + cc_avx2_flags = ['-mavx2'] + cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw'] +endif + cc_has_avx512 = false target_has_avx512 = false if (binutils_ok and cc.has_multi_arguments(cc_avx512_flags) @@ -30,12 +217,14 @@ if (binutils_ok and cc.has_multi_arguments(cc_avx512_flags) warning('Broken _mm512_extracti64x4_epi64, disabling AVX512 support') else cc_has_avx512 = true - target_has_avx512 = ( - cc.get_define('__AVX512F__', args: machine_args) != '' and - cc.get_define('__AVX512BW__', args: machine_args) != '' and - cc.get_define('__AVX512DQ__', args: machine_args) != '' and - cc.get_define('__AVX512VL__', args: machine_args) != '' - ) + if not is_ms_compiler + target_has_avx512 = ( + cc.get_define('__AVX512F__', args: machine_args) != '' and + cc.get_define('__AVX512BW__', args: machine_args) != '' and + cc.get_define('__AVX512DQ__', args: machine_args) != '' and + cc.get_define('__AVX512VL__', args: machine_args) != '' + ) + endif endif endif @@ -47,42 +236,139 @@ if not is_ms_compiler endif endif -# enable restricted transactional memory intrinsics -# https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html -if cc.get_id() != 'msvc' - machine_args += '-mrtm' -endif +if is_ms_compiler + # Determine cpu_flags for a given configuration. + # SSE instructions up to 4.2 are required for DPDK. + cpu_flags = ['SSE', 'SSE2', 'SSE3', 'SSEE3', 'SSE4_1', 'SSE4_2'] + + message('cpu_instruction_set: @0@'.format(cpu_instruction_set)) -base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2'] -foreach f:base_flags - compile_time_cpuflags += ['RTE_CPUFLAG_' + f] -endforeach - -optional_flags = [ - 'AES', - 'AVX', - 'AVX2', - 'AVX512BW', - 'AVX512CD', - 'AVX512DQ', - 'AVX512F', - 'AVX512VL', - 'PCLMUL', - 'RDRND', - 'RDSEED', - 'VPCLMULQDQ', -] -foreach f:optional_flags - if cc.get_define('__@0@__'.format(f), args: machine_args) == '1' - if f == 'PCLMUL' # special case flags with different defines - f = 'PCLMULQDQ' - elif f == 'RDRND' - f = 'RDRAND' + if cpu_instruction_set == '' + # Nothing to do as cpu_flags already holds all the required flags. + elif cpu_instruction_set == 'native' + # MSVC behaves differently than GCC regarding supported instruction sets. + # While GCC will create macros like __AVX512F__ when such instruction set is + # supported by the current CPU, MSVC does not do that. MSVC will create that + # macro when parameter /arch:AVX512 is passed to the compiler, even when the + # CPU does not have that instruction set (by design). So there's a need to + # look at CPUID flags to figure out what is really supported by the CPU, so + # that the correct /arch value can be passed to the compiler. + # The macros also need to be explicitly defined, as /arch will not create all + # macros GCC creates under the same conditions. + # As an example, /arch:AVX512 creates __AVX512BW__, but does not create __SSE2__. + # More details available here: + # https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros + + optional_flags = [ + 'PCLMUL', + 'AES', + 'AVX', + 'RDRND', + 'AVX2', + 'AVX512F', + 'AVX512BW', + 'AVX512DQ', + 'AVX512VL', + 'AVX512CD', + 'AVX512IFMA', + 'GFNI', + 'RDSEED', + 'VPCLMULQDQ', + ] + foreach f:optional_flags + result = cc.run(cpuid_code, args: '-D@0@'.format(f), + name: 'instruction set @0@'.format(f)) + has_instr_set = result.returncode() == 0 and result.stdout() == '1' + if has_instr_set + cpu_flags += f + endif + message('Target has @0@: @1@'.format(f, has_instr_set)) + endforeach + else + # An explicit cpu_instruction_set was provided. Get cpu_flags + # from cpu_type_to_flags table. + if cpu_instruction_set not in cpu_type_to_flags + error('CPU not known or not supported. Please update the table with known CPUs if needed.') endif - compile_time_cpuflags += ['RTE_CPUFLAG_' + f] + cpu_flags += cpu_type_to_flags[cpu_instruction_set] + endif + + # Now that all cpu_flags are known, set compile_time_cpuflags and also + # machine_args to ensure that the instruction set #defines (like __SSE2__) + # are always present in the preprocessor. + message('cpu_flags: @0@'.format(cpu_flags)) + + foreach flag:cpu_flags + machine_args += '/D__@0@__'.format(flag) + if flag == 'PCLMUL' + flag = 'PCLMULQDQ' + elif flag == 'RDRND' + flag = 'RDRAND' + endif + compile_time_cpuflags += ['RTE_CPUFLAG_' + flag] + endforeach + + target_has_avx512 = ('AVX512F' in cpu_flags and + 'AVX512BW' in cpu_flags and + 'AVX512DQ' in cpu_flags and + 'AVX512VL' in cpu_flags) + + # Decide which instruction sets should be used by the compiler. + # With MSVC, intrinsic functions are always enabled. However, for the + # compiler to use an extended instruction set for automatically + # generated code "/arch" needs to be passed. So we instruct the compiler + # to use the largest set that is supported by the CPU. It is implied that + # smaller sets than the largest selected are included, as described here: + # https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170 + if 'RTE_CPUFLAG_AVX512F' in compile_time_cpuflags + machine_args += ['/arch:AVX512'] + elif 'RTE_CPUFLAG_AVX2' in compile_time_cpuflags + machine_args += ['/arch:AVX2'] + elif 'RTE_CPUFLAG_AVX' in compile_time_cpuflags + machine_args += ['/arch:AVX'] + else + # SSE4.2 is expected to always be available + machine_args += ['/arch:SSE4.2'] endif -endforeach + message('machine_args: @0@'.format(machine_args)) +else + # enable restricted transactional memory intrinsics + # https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html + machine_args += '-mrtm' + + base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2'] + foreach f:base_flags + compile_time_cpuflags += ['RTE_CPUFLAG_' + f] + endforeach + + optional_flags = [ + 'AES', + 'AVX', + 'AVX2', + 'AVX512BW', + 'AVX512CD', + 'AVX512DQ', + 'AVX512F', + 'AVX512VL', + 'PCLMUL', + 'RDRND', + 'RDSEED', + 'VPCLMULQDQ', + ] + foreach f:optional_flags + if cc.get_define('__@0@__'.format(f), args: machine_args) == '1' + if f == 'PCLMUL' # special case flags with different defines + f = 'PCLMULQDQ' + elif f == 'RDRND' + f = 'RDRAND' + endif + compile_time_cpuflags += ['RTE_CPUFLAG_' + f] + endif + endforeach +endif + +message('compile_time_cpuflags: @0@'.format(compile_time_cpuflags)) dpdk_conf.set('RTE_ARCH_X86', 1) if dpdk_conf.get('RTE_ARCH_64') -- 2.48.1.vfs.0.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC 2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie ` (2 preceding siblings ...) 2025-02-24 21:01 ` [PATCH 3/6] config: allow faster instruction sets to be used with MSVC Andre Muezerie @ 2025-02-24 21:01 ` Andre Muezerie 2025-02-24 21:01 ` [PATCH 5/6] acl: " Andre Muezerie 2025-02-24 21:01 ` [PATCH 6/6] member: " Andre Muezerie 5 siblings, 0 replies; 7+ messages in thread From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw) To: Ajit Khaparde, Somnath Kotur, John Daley, Hyong Youb Kim, Ian Stokes, Bruce Richardson, Vladimir Medvedkin, Anatoly Burakov, Jingjing Wu, Praveen Shetty, Chaoyong He, Vamsi Attunuru Cc: dev, Andre Muezerie Top level 'cc_avx2_flags' was created and holds the correct flags depending on the compiler used. Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- drivers/net/bnxt/meson.build | 2 +- drivers/net/enic/meson.build | 2 +- drivers/net/intel/i40e/meson.build | 2 +- drivers/net/intel/iavf/meson.build | 2 +- drivers/net/intel/ice/meson.build | 2 +- drivers/net/intel/idpf/meson.build | 2 +- drivers/net/nfp/meson.build | 2 +- drivers/net/octeon_ep/meson.build | 4 ++-- 8 files changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/net/bnxt/meson.build b/drivers/net/bnxt/meson.build index e26cf13a65..fd82d0c409 100644 --- a/drivers/net/bnxt/meson.build +++ b/drivers/net/bnxt/meson.build @@ -65,7 +65,7 @@ if arch_subdir == 'x86' static_rte_bus_pci, static_rte_kvargs, static_rte_hash], include_directories: includes, - c_args: [cflags, '-mavx2']) + c_args: [cflags, cc_avx2_flags]) objs += bnxt_avx2_lib.extract_objects('bnxt_rxtx_vec_avx2.c') elif arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64') sources += files('bnxt_rxtx_vec_neon.c') diff --git a/drivers/net/enic/meson.build b/drivers/net/enic/meson.build index 1e26338350..cfe5ec170a 100644 --- a/drivers/net/enic/meson.build +++ b/drivers/net/enic/meson.build @@ -38,7 +38,7 @@ if dpdk_conf.has('RTE_ARCH_X86_64') 'enic_rxtx_vec_avx2.c', dependencies: [static_rte_ethdev, static_rte_bus_pci], include_directories: includes, - c_args: [cflags, '-mavx2']) + c_args: [cflags, cc_avx2_flags]) objs += enic_avx2_lib.extract_objects('enic_rxtx_vec_avx2.c') endif diff --git a/drivers/net/intel/i40e/meson.build b/drivers/net/intel/i40e/meson.build index ffa40c5d64..2973ed1a01 100644 --- a/drivers/net/intel/i40e/meson.build +++ b/drivers/net/intel/i40e/meson.build @@ -49,7 +49,7 @@ if arch_subdir == 'x86' 'i40e_rxtx_vec_avx2.c', dependencies: [static_rte_ethdev, static_rte_kvargs, static_rte_hash], include_directories: includes, - c_args: [cflags, '-mavx2']) + c_args: [cflags, cc_avx2_flags]) objs += i40e_avx2_lib.extract_objects('i40e_rxtx_vec_avx2.c') if cc_has_avx512 diff --git a/drivers/net/intel/iavf/meson.build b/drivers/net/intel/iavf/meson.build index 19cd1cfbc8..f7eac7c57a 100644 --- a/drivers/net/intel/iavf/meson.build +++ b/drivers/net/intel/iavf/meson.build @@ -37,7 +37,7 @@ if arch_subdir == 'x86' 'iavf_rxtx_vec_avx2.c', dependencies: [static_rte_ethdev], include_directories: includes, - c_args: [cflags, '-mavx2']) + c_args: [cflags, cc_avx2_flags]) objs += iavf_avx2_lib.extract_objects('iavf_rxtx_vec_avx2.c') if cc_has_avx512 diff --git a/drivers/net/intel/ice/meson.build b/drivers/net/intel/ice/meson.build index a34b7c966a..cbdf38c1c4 100644 --- a/drivers/net/intel/ice/meson.build +++ b/drivers/net/intel/ice/meson.build @@ -43,7 +43,7 @@ if arch_subdir == 'x86' 'ice_rxtx_vec_avx2.c', dependencies: [static_rte_ethdev, static_rte_hash], include_directories: includes, - c_args: [cflags, '-mavx2']) + c_args: [cflags, cc_avx2_flags]) objs += ice_avx2_lib.extract_objects('ice_rxtx_vec_avx2.c') if cc_has_avx512 diff --git a/drivers/net/intel/idpf/meson.build b/drivers/net/intel/idpf/meson.build index 802b13035b..4b272d02b1 100644 --- a/drivers/net/intel/idpf/meson.build +++ b/drivers/net/intel/idpf/meson.build @@ -23,7 +23,7 @@ if arch_subdir == 'x86' and dpdk_conf.get('RTE_IOVA_IN_MBUF') == 1 'idpf_common_rxtx_avx2.c', dependencies: [static_rte_ethdev, static_rte_hash], include_directories: includes, - c_args: [cflags, '-mavx2']) + c_args: [cflags, cc_avx2_flags]) objs += idpf_avx2_lib.extract_objects('idpf_common_rxtx_avx2.c') if cc_has_avx512 diff --git a/drivers/net/nfp/meson.build b/drivers/net/nfp/meson.build index 39762bd45a..0a12b7dce7 100644 --- a/drivers/net/nfp/meson.build +++ b/drivers/net/nfp/meson.build @@ -61,7 +61,7 @@ if arch_subdir == 'x86' avx2_sources, dependencies: [static_rte_ethdev, static_rte_bus_pci], include_directories: includes, - c_args: [cflags, '-mavx2'] + c_args: [cflags, cc_avx2_flags] ) objs += nfp_avx2_lib.extract_all_objects(recursive: true) diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build index d5d40b23a1..1b34db3edc 100644 --- a/drivers/net/octeon_ep/meson.build +++ b/drivers/net/octeon_ep/meson.build @@ -18,13 +18,13 @@ if arch_subdir == 'x86' if cc.get_define('__AVX2__', args: machine_args) != '' cflags += ['-DCC_AVX2_SUPPORT'] sources += files('cnxk_ep_rx_avx.c') - elif cc.has_argument('-mavx2') + elif cc.has_multi_arguments(cc_avx2_flags) cflags += ['-DCC_AVX2_SUPPORT'] otx_ep_avx2_lib = static_library('otx_ep_avx2_lib', 'cnxk_ep_rx_avx.c', dependencies: [static_rte_ethdev, static_rte_pci, static_rte_bus_pci], include_directories: includes, - c_args: [cflags, '-mavx2']) + c_args: [cflags, cc_avx2_flags]) objs += otx_ep_avx2_lib.extract_objects('cnxk_ep_rx_avx.c') endif endif -- 2.48.1.vfs.0.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 5/6] acl: make compatible with instruction set updates for MSVC 2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie ` (3 preceding siblings ...) 2025-02-24 21:01 ` [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie @ 2025-02-24 21:01 ` Andre Muezerie 2025-02-24 21:01 ` [PATCH 6/6] member: " Andre Muezerie 5 siblings, 0 replies; 7+ messages in thread From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw) To: Konstantin Ananyev; +Cc: dev, Andre Muezerie Top level 'cc_avx2_flags' was created and holds the correct flags depending on the compiler used. File meson.build was updated to handle the correct AVX512 flags depending on compiler used. Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- lib/acl/meson.build | 16 +++++++++------- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/lib/acl/meson.build b/lib/acl/meson.build index fefe131a48..24e47b6cc1 100644 --- a/lib/acl/meson.build +++ b/lib/acl/meson.build @@ -19,7 +19,7 @@ if dpdk_conf.has('RTE_ARCH_X86') avx2_tmplib = static_library('avx2_tmp', 'acl_run_avx2.c', dependencies: static_rte_eal, - c_args: cflags + ['-mavx2']) + c_args: [cflags, cc_avx2_flags]) objs += avx2_tmplib.extract_objects('acl_run_avx2.c') # compile AVX512 version if: @@ -38,6 +38,12 @@ if dpdk_conf.has('RTE_ARCH_X86') # compiler flags, and then have the .o file from static lib # linked into main lib. + if is_ms_compiler + acl_avx512_args = cc_avx512_flags + else + acl_avx512_args = ['-mavx512f', '-mavx512vl', '-mavx512cd', '-mavx512bw'] + endif + # check if all required flags already enabled (variant a). acl_avx512_flags = ['__AVX512F__', '__AVX512VL__', '__AVX512CD__', '__AVX512BW__'] @@ -55,15 +61,11 @@ if dpdk_conf.has('RTE_ARCH_X86') sources += files('acl_run_avx512.c') cflags += '-DCC_AVX512_SUPPORT' - elif cc.has_multi_arguments('-mavx512f', '-mavx512vl', - '-mavx512cd', '-mavx512bw') - + elif cc.has_multi_arguments(acl_avx512_args) avx512_tmplib = static_library('avx512_tmp', 'acl_run_avx512.c', dependencies: static_rte_eal, - c_args: cflags + - ['-mavx512f', '-mavx512vl', - '-mavx512cd', '-mavx512bw']) + c_args: cflags + acl_avx512_args) objs += avx512_tmplib.extract_objects( 'acl_run_avx512.c') cflags += '-DCC_AVX512_SUPPORT' -- 2.48.1.vfs.0.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
* [PATCH 6/6] member: make compatible with instruction set updates for MSVC 2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie ` (4 preceding siblings ...) 2025-02-24 21:01 ` [PATCH 5/6] acl: " Andre Muezerie @ 2025-02-24 21:01 ` Andre Muezerie 5 siblings, 0 replies; 7+ messages in thread From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw) To: Yipeng Wang, Sameh Gobriel; +Cc: dev, Andre Muezerie File meson.build was updated to handle the correct AVX512 flags depending on compiler used. Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com> --- lib/member/meson.build | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/lib/member/meson.build b/lib/member/meson.build index f92cbb7f25..8416dc6f8a 100644 --- a/lib/member/meson.build +++ b/lib/member/meson.build @@ -33,6 +33,12 @@ if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok # compiler flags, and then have the .o file from static lib # linked into main lib. + if is_ms_compiler + member_avx512_args = cc_avx512_flags + else + member_avx512_args = ['-mavx512f', '-mavx512dq', '-mavx512ifma'] + endif + # check if all required flags already enabled sketch_avx512_flags = ['__AVX512F__', '__AVX512DQ__', '__AVX512IFMA__'] @@ -46,13 +52,12 @@ if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok if sketch_avx512_on == true cflags += ['-DCC_AVX512_SUPPORT'] sources += files('rte_member_sketch_avx512.c') - elif cc.has_multi_arguments('-mavx512f', '-mavx512dq', '-mavx512ifma') + elif cc.has_multi_arguments(member_avx512_args) sketch_avx512_tmp = static_library('sketch_avx512_tmp', 'rte_member_sketch_avx512.c', include_directories: includes, dependencies: [static_rte_eal, static_rte_hash], - c_args: cflags + - ['-mavx512f', '-mavx512dq', '-mavx512ifma']) + c_args: cflags + member_avx512_args) objs += sketch_avx512_tmp.extract_objects('rte_member_sketch_avx512.c') cflags += ['-DCC_AVX512_SUPPORT'] endif -- 2.48.1.vfs.0.0 ^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2025-02-24 21:02 UTC | newest] Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie 2025-02-24 21:01 ` [PATCH 1/6] eal: make compatible with instruction set updates for MSVC Andre Muezerie 2025-02-24 21:01 ` [PATCH 2/6] eal: only use numbers as align parameters " Andre Muezerie 2025-02-24 21:01 ` [PATCH 3/6] config: allow faster instruction sets to be used with MSVC Andre Muezerie 2025-02-24 21:01 ` [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie 2025-02-24 21:01 ` [PATCH 5/6] acl: " Andre Muezerie 2025-02-24 21:01 ` [PATCH 6/6] member: " Andre Muezerie
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).