[PATCH 0/6] allow faster instruction sets to be used with MSVC

DPDK patches and discussions
 help / color / mirror / Atom feed

* [PATCH 0/6] allow faster instruction sets to be used with MSVC
@ 2025-02-24 21:01 Andre Muezerie
  2025-02-24 21:01 ` [PATCH 1/6] eal: make compatible with instruction set updates for MSVC Andre Muezerie
                   ` (7 more replies)
  0 siblings, 8 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw)
  Cc: dev, Andre Muezerie

Up to now MSVC has being used with the default mode, which uses SSE2
instructions for scalar floating-point and vector calculations.
https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170

This series allows users to specify the CPU for which the generated
code should be optimized for in the same way it's done for GCC: by
passing the CPU name.
When no explicit CPU name is passed, 'native' is assumed (like it
happens with GCC) and the code will be optimized for the same CPU
type used to compile the code.

MSVC does not provide this functionality natively, so logic was
added to handle these differences, detecting which
instruction sets are supported by the CPU(s), passing the best
options to MSVC and setting the correct macros (like __AVX512F__)
so that the DPDK code can rely on them like it is done with GCC.

Andre Muezerie (6):
  eal: make compatible with instruction set updates for MSVC
  eal: only use numbers as align parameters for MSVC
  config: allow faster instruction sets to be used with MSVC
  drivers/net: make compatible with instruction set updates for MSVC
  acl: make compatible with instruction set updates for MSVC
  member: make compatible with instruction set updates for MSVC

 config/x86/meson.build             | 364 +++++++++++++++++++++++++----
 drivers/net/bnxt/meson.build       |   2 +-
 drivers/net/enic/meson.build       |   2 +-
 drivers/net/intel/i40e/meson.build |   2 +-
 drivers/net/intel/iavf/meson.build |   2 +-
 drivers/net/intel/ice/meson.build  |   2 +-
 drivers/net/intel/idpf/meson.build |   2 +-
 drivers/net/nfp/meson.build        |   2 +-
 drivers/net/octeon_ep/meson.build  |   4 +-
 lib/acl/meson.build                |  16 +-
 lib/eal/common/rte_random.c        |   2 +
 lib/eal/x86/include/rte_vect.h     |  11 +-
 lib/member/meson.build             |  11 +-
 13 files changed, 363 insertions(+), 59 deletions(-)

--
2.48.1.vfs.0.0

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/6] eal: make compatible with instruction set updates for MSVC
  2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie
@ 2025-02-24 21:01 ` Andre Muezerie
  2025-02-24 21:01 ` [PATCH 2/6] eal: only use numbers as align parameters " Andre Muezerie
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw)
  To: Mattias Rönnblom, Tyler Retzlaff; +Cc: dev, Andre Muezerie

After the instruction set updates for MSVC the error below poped up:

../lib/eal/common/rte_random.c(6): fatal error C1083:
    Cannot open include file: 'x86intrin.h': No such file or directory

The fix is to not include header x86intrin.h with MSVC.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/common/rte_random.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
index 8e62578176..9354358818 100644
--- a/lib/eal/common/rte_random.c
+++ b/lib/eal/common/rte_random.c
@@ -3,8 +3,10 @@
  */
 
 #ifdef __RDSEED__
+#ifndef RTE_TOOLCHAIN_MSVC
 #include <x86intrin.h>
 #endif
+#endif
 #include <unistd.h>
 
 #include <rte_bitops.h>
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 2/6] eal: only use numbers as align parameters for MSVC
  2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie
  2025-02-24 21:01 ` [PATCH 1/6] eal: make compatible with instruction set updates for MSVC Andre Muezerie
@ 2025-02-24 21:01 ` Andre Muezerie
  2025-02-24 21:01 ` [PATCH 3/6] config: allow faster instruction sets to be used with MSVC Andre Muezerie
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw)
  To: Bruce Richardson, Konstantin Ananyev; +Cc: dev, Andre Muezerie

After the instruction set updates for MSVC the error below poped up:

..\lib\eal\x86\include\rte_vect.h(82): error C2059: syntax error: '('

The issue is that MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE).
It only accepts numbers that are power of 2. So, even though
RTE_X86_ZMM_SIZE represents a number that is a power of two it cannot
be used directly.
https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/x86/include/rte_vect.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/eal/x86/include/rte_vect.h b/lib/eal/x86/include/rte_vect.h
index 70c78e9b77..0a51c539a4 100644
--- a/lib/eal/x86/include/rte_vect.h
+++ b/lib/eal/x86/include/rte_vect.h
@@ -79,7 +79,16 @@ __extension__ ({                \
 #define RTE_X86_ZMM_SIZE	(sizeof(__m512i))
 #define RTE_X86_ZMM_MASK	(RTE_X86_ZMM_SIZE - 1)
 
-typedef union __rte_aligned(RTE_X86_ZMM_SIZE) __rte_x86_zmm {
+/*
+ * MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE). It only accepts
+ * numbers that are power of 2. So, even though RTE_X86_ZMM_SIZE represents a
+ * number that is a power of two it cannot be used directly.
+ * Ref: https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170
+ * The static assert below ensures that RTE_X86_ZMM_SIZE is equal to what is
+ * used in the __rte_aligned() expression.
+ */
+static_assert(RTE_X86_ZMM_SIZE == 64, "Unexpected size of __m512i");
+typedef union __rte_aligned(64) __rte_x86_zmm {
 	__m512i	 z;
 	ymm_t    y[RTE_X86_ZMM_SIZE / sizeof(ymm_t)];
 	xmm_t    x[RTE_X86_ZMM_SIZE / sizeof(xmm_t)];
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 3/6] config: allow faster instruction sets to be used with MSVC
  2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie
  2025-02-24 21:01 ` [PATCH 1/6] eal: make compatible with instruction set updates for MSVC Andre Muezerie
  2025-02-24 21:01 ` [PATCH 2/6] eal: only use numbers as align parameters " Andre Muezerie
@ 2025-02-24 21:01 ` Andre Muezerie
  2025-02-25 14:28   ` Bruce Richardson
  2025-02-24 21:01 ` [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw)
  To: Bruce Richardson, Konstantin Ananyev; +Cc: dev, Andre Muezerie

Up to now MSVC has being used with the default mode, which uses SSE2
instructions for scalar floating-point and vector calculations.
https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170

This patch allows users to specify the CPU for which the generated
code should be optimized for in the same way it's done for GCC: by
passing the CPU name.
When no explicit CPU name is passed, 'native' is assumed (like it
happens with GCC) and the code will be optimized for the same CPU
type used to compile the code.

MSVC does not provide this functionality natively, so logic was
added to meson.build to handle these differences, detecting which
instruction sets are supported by the CPU(s), passing the best
options to MSVC and setting the correct macros (like __AVX512F__)
so that the DPDK code can rely on them like it is done with GCC.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 config/x86/meson.build | 364 ++++++++++++++++++++++++++++++++++++-----
 1 file changed, 325 insertions(+), 39 deletions(-)

diff --git a/config/x86/meson.build b/config/x86/meson.build
index 47a5b0c04a..9260969c54 100644
--- a/config/x86/meson.build
+++ b/config/x86/meson.build
@@ -14,7 +14,194 @@ if is_linux or cc.get_id() == 'gcc'
     endif
 endif
 
-cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw']
+cpuid_code = '''
+    #include <stdio.h>
+    #include <stdint.h>
+    #include <intrin.h>
+
+    uint32_t f1_ECX = 0;
+    uint32_t f1_EDX = 0;
+    uint32_t f7_EBX = 0;
+    uint32_t f7_ECX = 0;
+
+    void get_support_flags()
+    {
+        int ids_max;
+        int data[4];
+
+        /*
+         * Calling __cpuid with 0x0 as the function_id argument
+         * gets the number of the highest valid function ID.
+         */
+        __cpuid(data, 0);
+        ids_max = data[0];
+
+        if (1 <= ids_max) {
+            __cpuidex(data, 1, 0);
+            f1_ECX = data[2];
+            f1_EDX = data[3];
+
+            if (7 <= ids_max) {
+                __cpuidex(data, 7, 0);
+                f7_EBX = data[1];
+                f7_ECX = data[2];
+            }
+        }
+    }
+
+    int get_instruction_support()
+    {
+        get_support_flags();
+
+    #ifdef SSE3
+        return (f1_ECX & (1UL << 0)) ? 1 : 0;
+    #endif
+    #ifdef PCLMUL
+        return (f1_ECX & (1UL << 1)) ? 1 : 0;
+    #endif
+    #ifdef SSSE3
+        return (f1_ECX & (1UL << 9)) ? 1 : 0;
+    #endif
+    #ifdef SSE4_1
+        return (f1_ECX & (1UL << 19)) ? 1 : 0;
+    #endif
+    #ifdef SSE4_2
+        return (f1_ECX & (1UL << 20)) ? 1 : 0;
+    #endif
+    #ifdef AES
+        return (f1_ECX & (1UL << 25)) ? 1 : 0;
+    #endif
+    #ifdef AVX
+        return (f1_ECX & (1UL << 28)) ? 1 : 0;
+    #endif
+    #ifdef RDRND
+        return (f1_ECX & (1UL << 30)) ? 1 : 0;
+    #endif
+    #ifdef SSE
+        return (f1_EDX & (1UL << 25)) ? 1 : 0;
+    #endif
+    #ifdef SSE2
+        return (f1_EDX & (1UL << 26)) ? 1 : 0;
+    #endif
+    #ifdef AVX2
+        return (f7_EBX & (1UL << 5)) ? 1 : 0;
+    #endif
+    #ifdef AVX512F
+        return (f7_EBX & (1UL << 16)) ? 1 : 0;
+    #endif
+    #ifdef AVX512DQ
+        return (f7_EBX & (1UL << 17)) ? 1 : 0;
+    #endif
+    #ifdef RDSEED
+        return (f7_EBX & (1UL << 18)) ? 1 : 0;
+    #endif
+    #ifdef AVX512IFMA
+        return (f7_EBX & (1UL << 21)) ? 1 : 0;
+    #endif
+    #ifdef AVX512CD
+        return (f7_EBX & (1UL << 28)) ? 1 : 0;
+    #endif
+    #ifdef AVX512BW
+        return (f7_EBX & (1UL << 30)) ? 1 : 0;
+    #endif
+    #ifdef AVX512VL
+        return (f7_EBX & (1UL << 31)) ? 1 : 0;
+    #endif
+    #ifdef GFNI
+        return (f7_ECX & (1UL << 8)) ? 1 : 0;
+    #endif
+    #ifdef VPCLMULQDQ
+        return (f7_ECX & (1UL << 10)) ? 1 : 0;
+    #endif
+
+        return -1;
+    }
+
+    int main(int argc, char *argv[])
+    {
+        int res = get_instruction_support();
+        if (res == -1) {
+            printf("Unknown instruction set");
+            return -1;
+        }
+        printf("%d", res);
+
+        return 0;
+    }
+'''
+
+# The data in table below can be found here:
+# https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
+# This table only contains CPUs that have SSE4.2, as this instruction set is required by DPDK.
+# That means that in addition to the instruction sets mentioned in the table, all these CPUs
+# also have ['SSE', 'SSE2', 'SSE3', 'SSEE3', 'SSE4_1', 'SSE4_2']
+cpu_type_to_flags = {
+       'x86-64-v2': [],
+       'x86-64-v3': ['AVX', 'AVX2'],
+       'x86-64-v4': ['AVX', 'AVX2', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
+         'nehalem': [],
+          'corei7': [],
+        'westmere': ['PCLMUL'],
+     'sandybridge': ['AVX', 'PCLMUL'],
+      'corei7-avx': ['AVX', 'PCLMUL'],
+       'ivybridge': ['AVX', 'PCLMUL', 'RDRND'],
+      'core-avx-i': ['AVX', 'PCLMUL', 'RDRND'],
+         'haswell': ['AVX', 'PCLMUL', 'RDRND', 'AVX2'],
+       'core-avx2': ['AVX', 'PCLMUL', 'RDRND', 'AVX2'],
+       'broadwell': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED'],
+         'skylake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
+  'skylake-avx512': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
+     'cascadelake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
+      'cannonlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA'],
+      'cooperlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
+  'icelake-client': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+  'icelake-server': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+       'tigerlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+      'rocketlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+       'alderlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+      'raptorlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+      'meteorlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+       'gracemont': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+       'arrowlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+     'arrowlake-s': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+       'lunarlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+     'pantherlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+  'sapphirerapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+   'emeraldrapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+   'graniterapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+ 'graniterapids-d': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+   'diamondrapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+      'silvermont': ['PCLMUL', 'RDRND'],
+             'slm': ['PCLMUL', 'RDRND'],
+        'goldmont': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'],
+   'goldmont-plus': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'],
+         'tremont': ['PCLMUL', 'RDRND', 'RDSEED', 'AES', 'GFNI'],
+    'sierraforest': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+      'grandridge': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+'clearwaterforest': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
+          'bdver1': ['AVX', 'PCLMUL', 'AES'],
+          'bdver2': ['AVX', 'PCLMUL', 'AES'],
+          'bdver3': ['AVX', 'PCLMUL', 'AES'],
+          'bdver4': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'AES'],
+          'znver1': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
+          'znver2': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
+          'znver3': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ'],
+          'znver4': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+          'znver5': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
+          'btver2': ['AVX', 'PCLMUL', 'AES'],
+        'lujiazui': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'],
+        'yongfeng': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
+      'shijidadao': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
+}
+
+if is_ms_compiler
+    cc_avx2_flags = ['/arch:AVX2']
+    cc_avx512_flags = ['/arch:AVX512']
+else
+    cc_avx2_flags = ['-mavx2']
+    cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw']
+endif
+
 cc_has_avx512 = false
 target_has_avx512 = false
 if (binutils_ok and cc.has_multi_arguments(cc_avx512_flags)
@@ -30,12 +217,14 @@ if (binutils_ok and cc.has_multi_arguments(cc_avx512_flags)
         warning('Broken _mm512_extracti64x4_epi64, disabling AVX512 support')
     else
         cc_has_avx512 = true
-        target_has_avx512 = (
-                cc.get_define('__AVX512F__', args: machine_args) != '' and
-                cc.get_define('__AVX512BW__', args: machine_args) != '' and
-                cc.get_define('__AVX512DQ__', args: machine_args) != '' and
-                cc.get_define('__AVX512VL__', args: machine_args) != ''
-            )
+        if not is_ms_compiler
+            target_has_avx512 = (
+                    cc.get_define('__AVX512F__', args: machine_args) != '' and
+                    cc.get_define('__AVX512BW__', args: machine_args) != '' and
+                    cc.get_define('__AVX512DQ__', args: machine_args) != '' and
+                    cc.get_define('__AVX512VL__', args: machine_args) != ''
+                )
+        endif
     endif
 endif
 
@@ -47,42 +236,139 @@ if not is_ms_compiler
     endif
 endif
 
-# enable restricted transactional memory intrinsics
-# https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html
-if cc.get_id() != 'msvc'
-    machine_args += '-mrtm'
-endif
+if is_ms_compiler
+    # Determine cpu_flags for a given configuration.
+    # SSE instructions up to 4.2 are required for DPDK.
+    cpu_flags = ['SSE', 'SSE2', 'SSE3', 'SSEE3', 'SSE4_1', 'SSE4_2']
+
+    message('cpu_instruction_set: @0@'.format(cpu_instruction_set))
 
-base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2']
-foreach f:base_flags
-    compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
-endforeach
-
-optional_flags = [
-        'AES',
-        'AVX',
-        'AVX2',
-        'AVX512BW',
-        'AVX512CD',
-        'AVX512DQ',
-        'AVX512F',
-        'AVX512VL',
-        'PCLMUL',
-        'RDRND',
-        'RDSEED',
-        'VPCLMULQDQ',
-]
-foreach f:optional_flags
-    if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
-        if f == 'PCLMUL' # special case flags with different defines
-            f = 'PCLMULQDQ'
-        elif f == 'RDRND'
-            f = 'RDRAND'
+    if cpu_instruction_set == ''
+        # Nothing to do as cpu_flags already holds all the required flags.
+    elif cpu_instruction_set == 'native'
+        # MSVC behaves differently than GCC regarding supported instruction sets.
+        # While GCC will create macros like __AVX512F__ when such instruction set is
+        # supported by the current CPU, MSVC does not do that. MSVC will create that
+        # macro when parameter /arch:AVX512 is passed to the compiler, even when the
+        # CPU does not have that instruction set (by design). So there's a need to
+        # look at CPUID flags to figure out what is really supported by the CPU, so
+        # that the correct /arch value can be passed to the compiler.
+        # The macros also need to be explicitly defined, as /arch will not create all
+        # macros GCC creates under the same conditions.
+        # As an example, /arch:AVX512 creates __AVX512BW__, but does not create __SSE2__.
+        # More details available here:
+        # https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros
+
+        optional_flags = [
+                'PCLMUL',
+                'AES',
+                'AVX',
+                'RDRND',
+                'AVX2',
+                'AVX512F',
+                'AVX512BW',
+                'AVX512DQ',
+                'AVX512VL',
+                'AVX512CD',
+                'AVX512IFMA',
+                'GFNI',
+                'RDSEED',
+                'VPCLMULQDQ',
+        ]
+        foreach f:optional_flags
+            result = cc.run(cpuid_code, args: '-D@0@'.format(f),
+                            name: 'instruction set @0@'.format(f))
+            has_instr_set = result.returncode() == 0 and result.stdout() == '1'
+            if has_instr_set
+                cpu_flags += f
+            endif
+            message('Target has @0@: @1@'.format(f, has_instr_set))
+        endforeach
+    else
+        # An explicit cpu_instruction_set was provided. Get cpu_flags
+        # from cpu_type_to_flags table.
+        if cpu_instruction_set not in cpu_type_to_flags
+            error('CPU not known or not supported. Please update the table with known CPUs if needed.')
         endif
-        compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
+        cpu_flags += cpu_type_to_flags[cpu_instruction_set]
+    endif
+
+    # Now that all cpu_flags are known, set compile_time_cpuflags and also
+    # machine_args to ensure that the instruction set #defines (like __SSE2__)
+    # are always present in the preprocessor.
+    message('cpu_flags: @0@'.format(cpu_flags))
+
+    foreach flag:cpu_flags
+        machine_args += '/D__@0@__'.format(flag)
+        if flag == 'PCLMUL'
+            flag = 'PCLMULQDQ'
+        elif flag == 'RDRND'
+            flag = 'RDRAND'
+        endif
+        compile_time_cpuflags += ['RTE_CPUFLAG_' + flag]
+    endforeach
+
+    target_has_avx512 = ('AVX512F'  in cpu_flags and
+                         'AVX512BW' in cpu_flags and
+                         'AVX512DQ' in cpu_flags and
+                         'AVX512VL' in cpu_flags)
+
+    # Decide which instruction sets should be used by the compiler.
+    # With MSVC, intrinsic functions are always enabled. However, for the
+    # compiler to use an extended instruction set for automatically
+    # generated code "/arch" needs to be passed. So we instruct the compiler
+    # to use the largest set that is supported by the CPU. It is implied that
+    # smaller sets than the largest selected are included, as described here:
+    # https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
+    if 'RTE_CPUFLAG_AVX512F' in compile_time_cpuflags
+        machine_args += ['/arch:AVX512']
+    elif 'RTE_CPUFLAG_AVX2' in compile_time_cpuflags
+        machine_args += ['/arch:AVX2']
+    elif 'RTE_CPUFLAG_AVX' in compile_time_cpuflags
+        machine_args += ['/arch:AVX']
+    else
+        # SSE4.2 is expected to always be available
+        machine_args += ['/arch:SSE4.2']
     endif
-endforeach
 
+    message('machine_args: @0@'.format(machine_args))
+else
+    # enable restricted transactional memory intrinsics
+    # https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html
+    machine_args += '-mrtm'
+
+    base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2']
+    foreach f:base_flags
+        compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
+    endforeach
+
+    optional_flags = [
+            'AES',
+            'AVX',
+            'AVX2',
+            'AVX512BW',
+            'AVX512CD',
+            'AVX512DQ',
+            'AVX512F',
+            'AVX512VL',
+            'PCLMUL',
+            'RDRND',
+            'RDSEED',
+            'VPCLMULQDQ',
+    ]
+    foreach f:optional_flags
+        if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
+            if f == 'PCLMUL' # special case flags with different defines
+                f = 'PCLMULQDQ'
+            elif f == 'RDRND'
+                f = 'RDRAND'
+            endif
+            compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
+        endif
+    endforeach
+endif
+
+message('compile_time_cpuflags: @0@'.format(compile_time_cpuflags))
 
 dpdk_conf.set('RTE_ARCH_X86', 1)
 if dpdk_conf.get('RTE_ARCH_64')
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC
  2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie
                   ` (2 preceding siblings ...)
  2025-02-24 21:01 ` [PATCH 3/6] config: allow faster instruction sets to be used with MSVC Andre Muezerie
@ 2025-02-24 21:01 ` Andre Muezerie
  2025-02-25  9:06   ` Bruce Richardson
  2025-02-24 21:01 ` [PATCH 5/6] acl: " Andre Muezerie
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw)
  To: Ajit Khaparde, Somnath Kotur, John Daley, Hyong Youb Kim,
	Ian Stokes, Bruce Richardson, Vladimir Medvedkin,
	Anatoly Burakov, Jingjing Wu, Praveen Shetty, Chaoyong He,
	Vamsi Attunuru
  Cc: dev, Andre Muezerie

Top level 'cc_avx2_flags' was created and holds the correct flags
depending on the compiler used.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 drivers/net/bnxt/meson.build       | 2 +-
 drivers/net/enic/meson.build       | 2 +-
 drivers/net/intel/i40e/meson.build | 2 +-
 drivers/net/intel/iavf/meson.build | 2 +-
 drivers/net/intel/ice/meson.build  | 2 +-
 drivers/net/intel/idpf/meson.build | 2 +-
 drivers/net/nfp/meson.build        | 2 +-
 drivers/net/octeon_ep/meson.build  | 4 ++--
 8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bnxt/meson.build b/drivers/net/bnxt/meson.build
index e26cf13a65..fd82d0c409 100644
--- a/drivers/net/bnxt/meson.build
+++ b/drivers/net/bnxt/meson.build
@@ -65,7 +65,7 @@ if arch_subdir == 'x86'
                 static_rte_bus_pci,
                 static_rte_kvargs, static_rte_hash],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
      objs += bnxt_avx2_lib.extract_objects('bnxt_rxtx_vec_avx2.c')
 elif arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
     sources += files('bnxt_rxtx_vec_neon.c')
diff --git a/drivers/net/enic/meson.build b/drivers/net/enic/meson.build
index 1e26338350..cfe5ec170a 100644
--- a/drivers/net/enic/meson.build
+++ b/drivers/net/enic/meson.build
@@ -38,7 +38,7 @@ if dpdk_conf.has('RTE_ARCH_X86_64')
             'enic_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev, static_rte_bus_pci],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += enic_avx2_lib.extract_objects('enic_rxtx_vec_avx2.c')
 endif
 
diff --git a/drivers/net/intel/i40e/meson.build b/drivers/net/intel/i40e/meson.build
index ffa40c5d64..2973ed1a01 100644
--- a/drivers/net/intel/i40e/meson.build
+++ b/drivers/net/intel/i40e/meson.build
@@ -49,7 +49,7 @@ if arch_subdir == 'x86'
             'i40e_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev, static_rte_kvargs, static_rte_hash],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += i40e_avx2_lib.extract_objects('i40e_rxtx_vec_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/intel/iavf/meson.build b/drivers/net/intel/iavf/meson.build
index 19cd1cfbc8..f7eac7c57a 100644
--- a/drivers/net/intel/iavf/meson.build
+++ b/drivers/net/intel/iavf/meson.build
@@ -37,7 +37,7 @@ if arch_subdir == 'x86'
             'iavf_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += iavf_avx2_lib.extract_objects('iavf_rxtx_vec_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/intel/ice/meson.build b/drivers/net/intel/ice/meson.build
index a34b7c966a..cbdf38c1c4 100644
--- a/drivers/net/intel/ice/meson.build
+++ b/drivers/net/intel/ice/meson.build
@@ -43,7 +43,7 @@ if arch_subdir == 'x86'
             'ice_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev, static_rte_hash],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += ice_avx2_lib.extract_objects('ice_rxtx_vec_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/intel/idpf/meson.build b/drivers/net/intel/idpf/meson.build
index 802b13035b..4b272d02b1 100644
--- a/drivers/net/intel/idpf/meson.build
+++ b/drivers/net/intel/idpf/meson.build
@@ -23,7 +23,7 @@ if arch_subdir == 'x86' and dpdk_conf.get('RTE_IOVA_IN_MBUF') == 1
         'idpf_common_rxtx_avx2.c',
         dependencies: [static_rte_ethdev, static_rte_hash],
         include_directories: includes,
-        c_args: [cflags, '-mavx2'])
+        c_args: [cflags, cc_avx2_flags])
     objs += idpf_avx2_lib.extract_objects('idpf_common_rxtx_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/nfp/meson.build b/drivers/net/nfp/meson.build
index 39762bd45a..0a12b7dce7 100644
--- a/drivers/net/nfp/meson.build
+++ b/drivers/net/nfp/meson.build
@@ -61,7 +61,7 @@ if arch_subdir == 'x86'
                 avx2_sources,
                 dependencies: [static_rte_ethdev, static_rte_bus_pci],
                 include_directories: includes,
-                c_args: [cflags, '-mavx2']
+                c_args: [cflags, cc_avx2_flags]
         )
 
         objs += nfp_avx2_lib.extract_all_objects(recursive: true)
diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build
index d5d40b23a1..1b34db3edc 100644
--- a/drivers/net/octeon_ep/meson.build
+++ b/drivers/net/octeon_ep/meson.build
@@ -18,13 +18,13 @@ if arch_subdir == 'x86'
     if cc.get_define('__AVX2__', args: machine_args) != ''
         cflags += ['-DCC_AVX2_SUPPORT']
         sources += files('cnxk_ep_rx_avx.c')
-    elif cc.has_argument('-mavx2')
+    elif cc.has_multi_arguments(cc_avx2_flags)
         cflags += ['-DCC_AVX2_SUPPORT']
         otx_ep_avx2_lib = static_library('otx_ep_avx2_lib',
                         'cnxk_ep_rx_avx.c',
                         dependencies: [static_rte_ethdev, static_rte_pci, static_rte_bus_pci],
                         include_directories: includes,
-                        c_args: [cflags, '-mavx2'])
+                        c_args: [cflags, cc_avx2_flags])
         objs += otx_ep_avx2_lib.extract_objects('cnxk_ep_rx_avx.c')
     endif
 endif
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 5/6] acl: make compatible with instruction set updates for MSVC
  2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie
                   ` (3 preceding siblings ...)
  2025-02-24 21:01 ` [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie
@ 2025-02-24 21:01 ` Andre Muezerie
  2025-02-25  9:03   ` Bruce Richardson
  2025-02-24 21:01 ` [PATCH 6/6] member: " Andre Muezerie
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 34+ messages in thread
From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, Andre Muezerie

Top level 'cc_avx2_flags' was created and holds the correct flags
depending on the compiler used.

File meson.build was updated to handle the correct AVX512 flags
depending on compiler used.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/acl/meson.build | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/lib/acl/meson.build b/lib/acl/meson.build
index fefe131a48..24e47b6cc1 100644
--- a/lib/acl/meson.build
+++ b/lib/acl/meson.build
@@ -19,7 +19,7 @@ if dpdk_conf.has('RTE_ARCH_X86')
     avx2_tmplib = static_library('avx2_tmp',
             'acl_run_avx2.c',
             dependencies: static_rte_eal,
-            c_args: cflags + ['-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += avx2_tmplib.extract_objects('acl_run_avx2.c')
 
     # compile AVX512 version if:
@@ -38,6 +38,12 @@ if dpdk_conf.has('RTE_ARCH_X86')
         # compiler flags, and then have the .o file from static lib
         # linked into main lib.
 
+        if is_ms_compiler
+            acl_avx512_args = cc_avx512_flags
+        else
+            acl_avx512_args = ['-mavx512f', '-mavx512vl', '-mavx512cd', '-mavx512bw']
+        endif
+
         # check if all required flags already enabled (variant a).
         acl_avx512_flags = ['__AVX512F__', '__AVX512VL__',
             '__AVX512CD__', '__AVX512BW__']
@@ -55,15 +61,11 @@ if dpdk_conf.has('RTE_ARCH_X86')
             sources += files('acl_run_avx512.c')
             cflags += '-DCC_AVX512_SUPPORT'
 
-        elif cc.has_multi_arguments('-mavx512f', '-mavx512vl',
-                    '-mavx512cd', '-mavx512bw')
-
+        elif cc.has_multi_arguments(acl_avx512_args)
             avx512_tmplib = static_library('avx512_tmp',
                 'acl_run_avx512.c',
                 dependencies: static_rte_eal,
-                c_args: cflags +
-                    ['-mavx512f', '-mavx512vl',
-                     '-mavx512cd', '-mavx512bw'])
+                c_args: cflags + acl_avx512_args)
             objs += avx512_tmplib.extract_objects(
                     'acl_run_avx512.c')
             cflags += '-DCC_AVX512_SUPPORT'
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 6/6] member: make compatible with instruction set updates for MSVC
  2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie
                   ` (4 preceding siblings ...)
  2025-02-24 21:01 ` [PATCH 5/6] acl: " Andre Muezerie
@ 2025-02-24 21:01 ` Andre Muezerie
  2025-02-26  1:06 ` [PATCH v2 0/5] allow faster instruction sets to be used with MSVC Andre Muezerie
  2025-02-26 21:21 ` [PATCH v3 0/3] allow faster instruction sets to be used with MSVC Andre Muezerie
  7 siblings, 0 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-24 21:01 UTC (permalink / raw)
  To: Yipeng Wang, Sameh Gobriel; +Cc: dev, Andre Muezerie

File meson.build was updated to handle the correct AVX512 flags
depending on compiler used.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/member/meson.build | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/lib/member/meson.build b/lib/member/meson.build
index f92cbb7f25..8416dc6f8a 100644
--- a/lib/member/meson.build
+++ b/lib/member/meson.build
@@ -33,6 +33,12 @@ if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok
     # compiler flags, and then have the .o file from static lib
     # linked into main lib.
 
+    if is_ms_compiler
+        member_avx512_args = cc_avx512_flags
+    else
+        member_avx512_args = ['-mavx512f', '-mavx512dq', '-mavx512ifma']
+    endif
+
     # check if all required flags already enabled
     sketch_avx512_flags = ['__AVX512F__', '__AVX512DQ__', '__AVX512IFMA__']
 
@@ -46,13 +52,12 @@ if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok
     if sketch_avx512_on == true
         cflags += ['-DCC_AVX512_SUPPORT']
         sources += files('rte_member_sketch_avx512.c')
-    elif cc.has_multi_arguments('-mavx512f', '-mavx512dq', '-mavx512ifma')
+    elif cc.has_multi_arguments(member_avx512_args)
         sketch_avx512_tmp = static_library('sketch_avx512_tmp',
             'rte_member_sketch_avx512.c',
             include_directories: includes,
             dependencies: [static_rte_eal, static_rte_hash],
-            c_args: cflags +
-                ['-mavx512f', '-mavx512dq', '-mavx512ifma'])
+            c_args: cflags + member_avx512_args)
         objs += sketch_avx512_tmp.extract_objects('rte_member_sketch_avx512.c')
         cflags += ['-DCC_AVX512_SUPPORT']
     endif
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] acl: make compatible with instruction set updates for MSVC
  2025-02-24 21:01 ` [PATCH 5/6] acl: " Andre Muezerie
@ 2025-02-25  9:03   ` Bruce Richardson
  2025-02-25 16:37     ` Andre Muezerie
  0 siblings, 1 reply; 34+ messages in thread
From: Bruce Richardson @ 2025-02-25  9:03 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: Konstantin Ananyev, dev

On Mon, Feb 24, 2025 at 01:01:18PM -0800, Andre Muezerie wrote:
> Top level 'cc_avx2_flags' was created and holds the correct flags
> depending on the compiler used.
> 
> File meson.build was updated to handle the correct AVX512 flags
> depending on compiler used.
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
>  lib/acl/meson.build | 16 +++++++++-------
>  1 file changed, 9 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/acl/meson.build b/lib/acl/meson.build
> index fefe131a48..24e47b6cc1 100644
> --- a/lib/acl/meson.build
> +++ b/lib/acl/meson.build
> @@ -19,7 +19,7 @@ if dpdk_conf.has('RTE_ARCH_X86')
>      avx2_tmplib = static_library('avx2_tmp',
>              'acl_run_avx2.c',
>              dependencies: static_rte_eal,
> -            c_args: cflags + ['-mavx2'])
> +            c_args: [cflags, cc_avx2_flags])
>      objs += avx2_tmplib.extract_objects('acl_run_avx2.c')
>  
>      # compile AVX512 version if:
> @@ -38,6 +38,12 @@ if dpdk_conf.has('RTE_ARCH_X86')
>          # compiler flags, and then have the .o file from static lib
>          # linked into main lib.
>  
> +        if is_ms_compiler
> +            acl_avx512_args = cc_avx512_flags
> +        else
> +            acl_avx512_args = ['-mavx512f', '-mavx512vl', '-mavx512cd', '-mavx512bw']
> +        endif
> +

in the non-msvc case are these flags not the same as cc_avx512_flags too?
If so, let's just get rid of the acl_avx512_args variable generally.

/Bruce


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC
  2025-02-24 21:01 ` [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie
@ 2025-02-25  9:06   ` Bruce Richardson
  2025-02-25 16:44     ` Andre Muezerie
  0 siblings, 1 reply; 34+ messages in thread
From: Bruce Richardson @ 2025-02-25  9:06 UTC (permalink / raw)
  To: Andre Muezerie
  Cc: Ajit Khaparde, Somnath Kotur, John Daley, Hyong Youb Kim,
	Ian Stokes, Vladimir Medvedkin, Anatoly Burakov, Jingjing Wu,
	Praveen Shetty, Chaoyong He, Vamsi Attunuru, dev

On Mon, Feb 24, 2025 at 01:01:17PM -0800, Andre Muezerie wrote:
> Top level 'cc_avx2_flags' was created and holds the correct flags
> depending on the compiler used.
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
>  drivers/net/bnxt/meson.build       | 2 +-
>  drivers/net/enic/meson.build       | 2 +-
>  drivers/net/intel/i40e/meson.build | 2 +-
>  drivers/net/intel/iavf/meson.build | 2 +-
>  drivers/net/intel/ice/meson.build  | 2 +-
>  drivers/net/intel/idpf/meson.build | 2 +-
>  drivers/net/nfp/meson.build        | 2 +-
>  drivers/net/octeon_ep/meson.build  | 4 ++--
>  8 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/net/bnxt/meson.build b/drivers/net/bnxt/meson.build
> index e26cf13a65..fd82d0c409 100644
> --- a/drivers/net/bnxt/meson.build
> +++ b/drivers/net/bnxt/meson.build
> @@ -65,7 +65,7 @@ if arch_subdir == 'x86'
>                  static_rte_bus_pci,
>                  static_rte_kvargs, static_rte_hash],
>              include_directories: includes,
> -            c_args: [cflags, '-mavx2'])
> +            c_args: [cflags, cc_avx2_flags])
>       objs += bnxt_avx2_lib.extract_objects('bnxt_rxtx_vec_avx2.c')

I like this change, and the consistency of the variable with the equivalent
avx512 one. To simplify getting this patchset in - or as much of it as
possible - can you perhaps add the cc_avx2_flags variable earlier in the
patchset and make these library changes to use it, ahead of the complicated
changes in patch 3. I think it may be simplier to have everything but patch
ready since they should be easy to review and merge, and then we can look
at patch 3 standalone. WDYT?

/Bruce

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] config: allow faster instruction sets to be used with MSVC
  2025-02-24 21:01 ` [PATCH 3/6] config: allow faster instruction sets to be used with MSVC Andre Muezerie
@ 2025-02-25 14:28   ` Bruce Richardson
  2025-02-26  2:01     ` Andre Muezerie
  0 siblings, 1 reply; 34+ messages in thread
From: Bruce Richardson @ 2025-02-25 14:28 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: Konstantin Ananyev, dev

On Mon, Feb 24, 2025 at 01:01:16PM -0800, Andre Muezerie wrote:
> Up to now MSVC has being used with the default mode, which uses SSE2
> instructions for scalar floating-point and vector calculations.
> https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
> 
> This patch allows users to specify the CPU for which the generated
> code should be optimized for in the same way it's done for GCC: by
> passing the CPU name.
> When no explicit CPU name is passed, 'native' is assumed (like it
> happens with GCC) and the code will be optimized for the same CPU
> type used to compile the code.
> 
> MSVC does not provide this functionality natively, so logic was
> added to meson.build to handle these differences, detecting which
> instruction sets are supported by the CPU(s), passing the best
> options to MSVC and setting the correct macros (like __AVX512F__)
> so that the DPDK code can rely on them like it is done with GCC.
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---

Hi Andre,

couple of initial thoughts inline below.

/Bruce

>  config/x86/meson.build | 364 ++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 325 insertions(+), 39 deletions(-)
> 

There is quite a lot of new code to be added here. Might it be worthwhile
creating a "config/x86/msvc/" subdirectory with its own meson.build file to
handle all the complexities of using it. We can have the common material at
the top of the x86/meson.build file, and then do 

if is_ms_compiler
    subdir(msvc)
    subdir_done()
endif

leaving the rest of the file for the gcc/clang/icx code.

> diff --git a/config/x86/meson.build b/config/x86/meson.build
> index 47a5b0c04a..9260969c54 100644
> --- a/config/x86/meson.build
> +++ b/config/x86/meson.build
> @@ -14,7 +14,194 @@ if is_linux or cc.get_id() == 'gcc'
>      endif
>  endif
>  
> -cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw']
> +cpuid_code = '''
> +    #include <stdio.h>
> +    #include <stdint.h>
> +    #include <intrin.h>
> +
> +    uint32_t f1_ECX = 0;
> +    uint32_t f1_EDX = 0;
> +    uint32_t f7_EBX = 0;
> +    uint32_t f7_ECX = 0;
> +
> +    void get_support_flags()
> +    {
> +        int ids_max;
> +        int data[4];
> +
> +        /*
> +         * Calling __cpuid with 0x0 as the function_id argument
> +         * gets the number of the highest valid function ID.
> +         */
> +        __cpuid(data, 0);
> +        ids_max = data[0];
> +
> +        if (1 <= ids_max) {
> +            __cpuidex(data, 1, 0);
> +            f1_ECX = data[2];
> +            f1_EDX = data[3];
> +
> +            if (7 <= ids_max) {
> +                __cpuidex(data, 7, 0);
> +                f7_EBX = data[1];
> +                f7_ECX = data[2];
> +            }
> +        }
> +    }
> +
> +    int get_instruction_support()
> +    {
> +        get_support_flags();
> +
> +    #ifdef SSE3
> +        return (f1_ECX & (1UL << 0)) ? 1 : 0;
> +    #endif
> +    #ifdef PCLMUL
> +        return (f1_ECX & (1UL << 1)) ? 1 : 0;
> +    #endif
> +    #ifdef SSSE3
> +        return (f1_ECX & (1UL << 9)) ? 1 : 0;
> +    #endif
> +    #ifdef SSE4_1
> +        return (f1_ECX & (1UL << 19)) ? 1 : 0;
> +    #endif
> +    #ifdef SSE4_2
> +        return (f1_ECX & (1UL << 20)) ? 1 : 0;
> +    #endif
> +    #ifdef AES
> +        return (f1_ECX & (1UL << 25)) ? 1 : 0;
> +    #endif
> +    #ifdef AVX
> +        return (f1_ECX & (1UL << 28)) ? 1 : 0;
> +    #endif
> +    #ifdef RDRND
> +        return (f1_ECX & (1UL << 30)) ? 1 : 0;
> +    #endif
> +    #ifdef SSE
> +        return (f1_EDX & (1UL << 25)) ? 1 : 0;
> +    #endif
> +    #ifdef SSE2
> +        return (f1_EDX & (1UL << 26)) ? 1 : 0;
> +    #endif
> +    #ifdef AVX2
> +        return (f7_EBX & (1UL << 5)) ? 1 : 0;
> +    #endif
> +    #ifdef AVX512F
> +        return (f7_EBX & (1UL << 16)) ? 1 : 0;
> +    #endif
> +    #ifdef AVX512DQ
> +        return (f7_EBX & (1UL << 17)) ? 1 : 0;
> +    #endif
> +    #ifdef RDSEED
> +        return (f7_EBX & (1UL << 18)) ? 1 : 0;
> +    #endif
> +    #ifdef AVX512IFMA
> +        return (f7_EBX & (1UL << 21)) ? 1 : 0;
> +    #endif
> +    #ifdef AVX512CD
> +        return (f7_EBX & (1UL << 28)) ? 1 : 0;
> +    #endif
> +    #ifdef AVX512BW
> +        return (f7_EBX & (1UL << 30)) ? 1 : 0;
> +    #endif
> +    #ifdef AVX512VL
> +        return (f7_EBX & (1UL << 31)) ? 1 : 0;
> +    #endif
> +    #ifdef GFNI
> +        return (f7_ECX & (1UL << 8)) ? 1 : 0;
> +    #endif
> +    #ifdef VPCLMULQDQ
> +        return (f7_ECX & (1UL << 10)) ? 1 : 0;
> +    #endif
> +
> +        return -1;
> +    }
> +
> +    int main(int argc, char *argv[])
> +    {
> +        int res = get_instruction_support();
> +        if (res == -1) {
> +            printf("Unknown instruction set");
> +            return -1;
> +        }
> +        printf("%d", res);
> +
> +        return 0;
> +    }
> +'''
> +
> +# The data in table below can be found here:
> +# https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
> +# This table only contains CPUs that have SSE4.2, as this instruction set is required by DPDK.
> +# That means that in addition to the instruction sets mentioned in the table, all these CPUs
> +# also have ['SSE', 'SSE2', 'SSE3', 'SSEE3', 'SSE4_1', 'SSE4_2']
> +cpu_type_to_flags = {
> +       'x86-64-v2': [],
> +       'x86-64-v3': ['AVX', 'AVX2'],
> +       'x86-64-v4': ['AVX', 'AVX2', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
> +         'nehalem': [],
> +          'corei7': [],
> +        'westmere': ['PCLMUL'],
> +     'sandybridge': ['AVX', 'PCLMUL'],
> +      'corei7-avx': ['AVX', 'PCLMUL'],
> +       'ivybridge': ['AVX', 'PCLMUL', 'RDRND'],
> +      'core-avx-i': ['AVX', 'PCLMUL', 'RDRND'],
> +         'haswell': ['AVX', 'PCLMUL', 'RDRND', 'AVX2'],
> +       'core-avx2': ['AVX', 'PCLMUL', 'RDRND', 'AVX2'],
> +       'broadwell': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED'],
> +         'skylake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
> +  'skylake-avx512': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
> +     'cascadelake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
> +      'cannonlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA'],
> +      'cooperlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
> +  'icelake-client': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> +  'icelake-server': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> +       'tigerlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> +      'rocketlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> +       'alderlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +      'raptorlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +      'meteorlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +       'gracemont': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +       'arrowlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +     'arrowlake-s': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +       'lunarlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +     'pantherlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +  'sapphirerapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> +   'emeraldrapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> +   'graniterapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> + 'graniterapids-d': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> +   'diamondrapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> +      'silvermont': ['PCLMUL', 'RDRND'],
> +             'slm': ['PCLMUL', 'RDRND'],
> +        'goldmont': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'],
> +   'goldmont-plus': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'],
> +         'tremont': ['PCLMUL', 'RDRND', 'RDSEED', 'AES', 'GFNI'],
> +    'sierraforest': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +      'grandridge': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +'clearwaterforest': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> +          'bdver1': ['AVX', 'PCLMUL', 'AES'],
> +          'bdver2': ['AVX', 'PCLMUL', 'AES'],
> +          'bdver3': ['AVX', 'PCLMUL', 'AES'],
> +          'bdver4': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'AES'],
> +          'znver1': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
> +          'znver2': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
> +          'znver3': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ'],
> +          'znver4': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> +          'znver5': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> +          'btver2': ['AVX', 'PCLMUL', 'AES'],
> +        'lujiazui': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'],
> +        'yongfeng': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
> +      'shijidadao': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
> +}
> +

I really don't want to have tables like this to maintain in our code if at
all possible. We used to have something a bit similar in DPDK IIRC, but we
found it a maintenance nightmare and just switched to using the compiler to
do all the work. In our existing builds, we just pass the
cpu_instruction_set parameter straight to the -march flag of the compiler.
For MSVC support, I believe we should just do the exact same.

Maintaining lists like this will be a problem as new platforms need to be
constantly added. Do we also look to backport them, because if equivalence
with the linux build is necessary then that will have to be done - as on
Linux when a new version of GCC comes out, we can then use the new
instruction set targets on the old releases of DPDK.

> +if is_ms_compiler
> +    cc_avx2_flags = ['/arch:AVX2']
> +    cc_avx512_flags = ['/arch:AVX512']
> +else
> +    cc_avx2_flags = ['-mavx2']
> +    cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw']
> +endif
> +
>  cc_has_avx512 = false
>  target_has_avx512 = false
>  if (binutils_ok and cc.has_multi_arguments(cc_avx512_flags)
> @@ -30,12 +217,14 @@ if (binutils_ok and cc.has_multi_arguments(cc_avx512_flags)
>          warning('Broken _mm512_extracti64x4_epi64, disabling AVX512 support')
>      else
>          cc_has_avx512 = true
> -        target_has_avx512 = (
> -                cc.get_define('__AVX512F__', args: machine_args) != '' and
> -                cc.get_define('__AVX512BW__', args: machine_args) != '' and
> -                cc.get_define('__AVX512DQ__', args: machine_args) != '' and
> -                cc.get_define('__AVX512VL__', args: machine_args) != ''
> -            )
> +        if not is_ms_compiler
> +            target_has_avx512 = (
> +                    cc.get_define('__AVX512F__', args: machine_args) != '' and
> +                    cc.get_define('__AVX512BW__', args: machine_args) != '' and
> +                    cc.get_define('__AVX512DQ__', args: machine_args) != '' and
> +                    cc.get_define('__AVX512VL__', args: machine_args) != ''
> +                )
> +        endif
>      endif
>  endif
>  
> @@ -47,42 +236,139 @@ if not is_ms_compiler
>      endif
>  endif
>  
> -# enable restricted transactional memory intrinsics
> -# https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html
> -if cc.get_id() != 'msvc'
> -    machine_args += '-mrtm'
> -endif
> +if is_ms_compiler
> +    # Determine cpu_flags for a given configuration.
> +    # SSE instructions up to 4.2 are required for DPDK.
> +    cpu_flags = ['SSE', 'SSE2', 'SSE3', 'SSEE3', 'SSE4_1', 'SSE4_2']
> +
> +    message('cpu_instruction_set: @0@'.format(cpu_instruction_set))
>  
> -base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2']
> -foreach f:base_flags
> -    compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
> -endforeach
> -
> -optional_flags = [
> -        'AES',
> -        'AVX',
> -        'AVX2',
> -        'AVX512BW',
> -        'AVX512CD',
> -        'AVX512DQ',
> -        'AVX512F',
> -        'AVX512VL',
> -        'PCLMUL',
> -        'RDRND',
> -        'RDSEED',
> -        'VPCLMULQDQ',
> -]
> -foreach f:optional_flags
> -    if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
> -        if f == 'PCLMUL' # special case flags with different defines
> -            f = 'PCLMULQDQ'
> -        elif f == 'RDRND'
> -            f = 'RDRAND'
> +    if cpu_instruction_set == ''
> +        # Nothing to do as cpu_flags already holds all the required flags.
> +    elif cpu_instruction_set == 'native'
> +        # MSVC behaves differently than GCC regarding supported instruction sets.
> +        # While GCC will create macros like __AVX512F__ when such instruction set is
> +        # supported by the current CPU, MSVC does not do that. MSVC will create that
> +        # macro when parameter /arch:AVX512 is passed to the compiler, even when the
> +        # CPU does not have that instruction set (by design). So there's a need to
> +        # look at CPUID flags to figure out what is really supported by the CPU, so
> +        # that the correct /arch value can be passed to the compiler.
> +        # The macros also need to be explicitly defined, as /arch will not create all
> +        # macros GCC creates under the same conditions.
> +        # As an example, /arch:AVX512 creates __AVX512BW__, but does not create __SSE2__.
> +        # More details available here:
> +        # https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros
> +
> +        optional_flags = [
> +                'PCLMUL',
> +                'AES',
> +                'AVX',
> +                'RDRND',
> +                'AVX2',
> +                'AVX512F',
> +                'AVX512BW',
> +                'AVX512DQ',
> +                'AVX512VL',
> +                'AVX512CD',
> +                'AVX512IFMA',
> +                'GFNI',
> +                'RDSEED',
> +                'VPCLMULQDQ',
> +        ]
> +        foreach f:optional_flags
> +            result = cc.run(cpuid_code, args: '-D@0@'.format(f),
> +                            name: 'instruction set @0@'.format(f))
> +            has_instr_set = result.returncode() == 0 and result.stdout() == '1'
> +            if has_instr_set
> +                cpu_flags += f
> +            endif
> +            message('Target has @0@: @1@'.format(f, has_instr_set))
> +        endforeach
> +    else
> +        # An explicit cpu_instruction_set was provided. Get cpu_flags
> +        # from cpu_type_to_flags table.
> +        if cpu_instruction_set not in cpu_type_to_flags
> +            error('CPU not known or not supported. Please update the table with known CPUs if needed.')
>          endif
> -        compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
> +        cpu_flags += cpu_type_to_flags[cpu_instruction_set]
> +    endif
> +
> +    # Now that all cpu_flags are known, set compile_time_cpuflags and also
> +    # machine_args to ensure that the instruction set #defines (like __SSE2__)
> +    # are always present in the preprocessor.
> +    message('cpu_flags: @0@'.format(cpu_flags))
> +
> +    foreach flag:cpu_flags
> +        machine_args += '/D__@0@__'.format(flag)
> +        if flag == 'PCLMUL'
> +            flag = 'PCLMULQDQ'
> +        elif flag == 'RDRND'
> +            flag = 'RDRAND'
> +        endif
> +        compile_time_cpuflags += ['RTE_CPUFLAG_' + flag]
> +    endforeach
> +
> +    target_has_avx512 = ('AVX512F'  in cpu_flags and
> +                         'AVX512BW' in cpu_flags and
> +                         'AVX512DQ' in cpu_flags and
> +                         'AVX512VL' in cpu_flags)
> +
> +    # Decide which instruction sets should be used by the compiler.
> +    # With MSVC, intrinsic functions are always enabled. However, for the
> +    # compiler to use an extended instruction set for automatically
> +    # generated code "/arch" needs to be passed. So we instruct the compiler
> +    # to use the largest set that is supported by the CPU. It is implied that
> +    # smaller sets than the largest selected are included, as described here:
> +    # https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
> +    if 'RTE_CPUFLAG_AVX512F' in compile_time_cpuflags
> +        machine_args += ['/arch:AVX512']
> +    elif 'RTE_CPUFLAG_AVX2' in compile_time_cpuflags
> +        machine_args += ['/arch:AVX2']
> +    elif 'RTE_CPUFLAG_AVX' in compile_time_cpuflags
> +        machine_args += ['/arch:AVX']
> +    else
> +        # SSE4.2 is expected to always be available
> +        machine_args += ['/arch:SSE4.2']
>      endif
> -endforeach
>  

Since these appear to be the only /arch flags supported by the compiler for
code generation, I would suggest that these would be the only instruction
set flags that we support on MSVC builds, and that we then build up the
actual CPU flags based on the minimum flags to be expected when each of
these instruction sets is present.

Similarly with 'native', rather than supporting all the different CPU types,
it would be a lot easier to just determine if it's an SSE4 machine, an AVX2
machine or AVX512, and run with that.

My thinking is that getting this as a first step should get us a lot of the
benefits without such a massive maintenance headache.

> +    message('machine_args: @0@'.format(machine_args))
> +else
> +    # enable restricted transactional memory intrinsics
> +    # https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html
> +    machine_args += '-mrtm'
> +
> +    base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2']
> +    foreach f:base_flags
> +        compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
> +    endforeach
> +
> +    optional_flags = [
> +            'AES',
> +            'AVX',
> +            'AVX2',
> +            'AVX512BW',
> +            'AVX512CD',
> +            'AVX512DQ',
> +            'AVX512F',
> +            'AVX512VL',
> +            'PCLMUL',
> +            'RDRND',
> +            'RDSEED',
> +            'VPCLMULQDQ',
> +    ]
> +    foreach f:optional_flags
> +        if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
> +            if f == 'PCLMUL' # special case flags with different defines
> +                f = 'PCLMULQDQ'
> +            elif f == 'RDRND'
> +                f = 'RDRAND'
> +            endif
> +            compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
> +        endif
> +    endforeach
> +endif
> +
> +message('compile_time_cpuflags: @0@'.format(compile_time_cpuflags))
>  
>  dpdk_conf.set('RTE_ARCH_X86', 1)
>  if dpdk_conf.get('RTE_ARCH_64')
> -- 
> 2.48.1.vfs.0.0
> 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] acl: make compatible with instruction set updates for MSVC
  2025-02-25  9:03   ` Bruce Richardson
@ 2025-02-25 16:37     ` Andre Muezerie
  2025-02-25 17:21       ` Bruce Richardson
  0 siblings, 1 reply; 34+ messages in thread
From: Andre Muezerie @ 2025-02-25 16:37 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Konstantin Ananyev, dev

On Tue, Feb 25, 2025 at 09:03:37AM +0000, Bruce Richardson wrote:
> On Mon, Feb 24, 2025 at 01:01:18PM -0800, Andre Muezerie wrote:
> > Top level 'cc_avx2_flags' was created and holds the correct flags
> > depending on the compiler used.
> > 
> > File meson.build was updated to handle the correct AVX512 flags
> > depending on compiler used.
> > 
> > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > ---
> >  lib/acl/meson.build | 16 +++++++++-------
> >  1 file changed, 9 insertions(+), 7 deletions(-)
> > 
> > diff --git a/lib/acl/meson.build b/lib/acl/meson.build
> > index fefe131a48..24e47b6cc1 100644
> > --- a/lib/acl/meson.build
> > +++ b/lib/acl/meson.build
> > @@ -19,7 +19,7 @@ if dpdk_conf.has('RTE_ARCH_X86')
> >      avx2_tmplib = static_library('avx2_tmp',
> >              'acl_run_avx2.c',
> >              dependencies: static_rte_eal,
> > -            c_args: cflags + ['-mavx2'])
> > +            c_args: [cflags, cc_avx2_flags])
> >      objs += avx2_tmplib.extract_objects('acl_run_avx2.c')
> >  
> >      # compile AVX512 version if:
> > @@ -38,6 +38,12 @@ if dpdk_conf.has('RTE_ARCH_X86')
> >          # compiler flags, and then have the .o file from static lib
> >          # linked into main lib.
> >  
> > +        if is_ms_compiler
> > +            acl_avx512_args = cc_avx512_flags
> > +        else
> > +            acl_avx512_args = ['-mavx512f', '-mavx512vl', '-mavx512cd', '-mavx512bw']
> > +        endif
> > +
> 
> in the non-msvc case are these flags not the same as cc_avx512_flags too?
> If so, let's just get rid of the acl_avx512_args variable generally.
> 
> /Bruce

It's not an exact match. I did not look further to see if this was intentional or result
of a typo. TBH I'm not even sure if it would be possible to deduct this from the code.
Also, all the CPUs I have looked at bring all these 5 instruction sets together, but we
know this might not hold true in the future as each one of them has an independent CPUID flag.

    cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw']

My choice was to keep the flags that were used initially, but I can change that if the
maintainers tell me this was a mistake.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC
  2025-02-25  9:06   ` Bruce Richardson
@ 2025-02-25 16:44     ` Andre Muezerie
  0 siblings, 0 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-25 16:44 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Ajit Khaparde, Somnath Kotur, John Daley, Hyong Youb Kim,
	Ian Stokes, Vladimir Medvedkin, Anatoly Burakov, Jingjing Wu,
	Praveen Shetty, Chaoyong He, Vamsi Attunuru, dev

On Tue, Feb 25, 2025 at 09:06:26AM +0000, Bruce Richardson wrote:
> On Mon, Feb 24, 2025 at 01:01:17PM -0800, Andre Muezerie wrote:
> > Top level 'cc_avx2_flags' was created and holds the correct flags
> > depending on the compiler used.
> > 
> > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > ---
> >  drivers/net/bnxt/meson.build       | 2 +-
> >  drivers/net/enic/meson.build       | 2 +-
> >  drivers/net/intel/i40e/meson.build | 2 +-
> >  drivers/net/intel/iavf/meson.build | 2 +-
> >  drivers/net/intel/ice/meson.build  | 2 +-
> >  drivers/net/intel/idpf/meson.build | 2 +-
> >  drivers/net/nfp/meson.build        | 2 +-
> >  drivers/net/octeon_ep/meson.build  | 4 ++--
> >  8 files changed, 9 insertions(+), 9 deletions(-)
> > 
> > diff --git a/drivers/net/bnxt/meson.build b/drivers/net/bnxt/meson.build
> > index e26cf13a65..fd82d0c409 100644
> > --- a/drivers/net/bnxt/meson.build
> > +++ b/drivers/net/bnxt/meson.build
> > @@ -65,7 +65,7 @@ if arch_subdir == 'x86'
> >                  static_rte_bus_pci,
> >                  static_rte_kvargs, static_rte_hash],
> >              include_directories: includes,
> > -            c_args: [cflags, '-mavx2'])
> > +            c_args: [cflags, cc_avx2_flags])
> >       objs += bnxt_avx2_lib.extract_objects('bnxt_rxtx_vec_avx2.c')
> 
> I like this change, and the consistency of the variable with the equivalent
> avx512 one. To simplify getting this patchset in - or as much of it as
> possible - can you perhaps add the cc_avx2_flags variable earlier in the
> patchset and make these library changes to use it, ahead of the complicated
> changes in patch 3. I think it may be simplier to have everything but patch
> ready since they should be easy to review and merge, and then we can look
> at patch 3 standalone. WDYT?
> 
> /Bruce

I had added all these changes in the same patchset because they are related,
but if splitting them makes the review process easier I’m all for it.
I’ll replace this series with 2 new patchsets.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] acl: make compatible with instruction set updates for MSVC
  2025-02-25 16:37     ` Andre Muezerie
@ 2025-02-25 17:21       ` Bruce Richardson
  2025-02-25 17:23         ` Andre Muezerie
  0 siblings, 1 reply; 34+ messages in thread
From: Bruce Richardson @ 2025-02-25 17:21 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: Konstantin Ananyev, dev

On Tue, Feb 25, 2025 at 08:37:27AM -0800, Andre Muezerie wrote:
> On Tue, Feb 25, 2025 at 09:03:37AM +0000, Bruce Richardson wrote:
> > On Mon, Feb 24, 2025 at 01:01:18PM -0800, Andre Muezerie wrote:
> > > Top level 'cc_avx2_flags' was created and holds the correct flags
> > > depending on the compiler used.
> > > 
> > > File meson.build was updated to handle the correct AVX512 flags
> > > depending on compiler used.
> > > 
> > > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > > ---
> > >  lib/acl/meson.build | 16 +++++++++-------
> > >  1 file changed, 9 insertions(+), 7 deletions(-)
> > > 
> > > diff --git a/lib/acl/meson.build b/lib/acl/meson.build
> > > index fefe131a48..24e47b6cc1 100644
> > > --- a/lib/acl/meson.build
> > > +++ b/lib/acl/meson.build
> > > @@ -19,7 +19,7 @@ if dpdk_conf.has('RTE_ARCH_X86')
> > >      avx2_tmplib = static_library('avx2_tmp',
> > >              'acl_run_avx2.c',
> > >              dependencies: static_rte_eal,
> > > -            c_args: cflags + ['-mavx2'])
> > > +            c_args: [cflags, cc_avx2_flags])
> > >      objs += avx2_tmplib.extract_objects('acl_run_avx2.c')
> > >  
> > >      # compile AVX512 version if:
> > > @@ -38,6 +38,12 @@ if dpdk_conf.has('RTE_ARCH_X86')
> > >          # compiler flags, and then have the .o file from static lib
> > >          # linked into main lib.
> > >  
> > > +        if is_ms_compiler
> > > +            acl_avx512_args = cc_avx512_flags
> > > +        else
> > > +            acl_avx512_args = ['-mavx512f', '-mavx512vl', '-mavx512cd', '-mavx512bw']
> > > +        endif
> > > +
> > 
> > in the non-msvc case are these flags not the same as cc_avx512_flags too?
> > If so, let's just get rid of the acl_avx512_args variable generally.
> > 
> > /Bruce
> 
> It's not an exact match. I did not look further to see if this was intentional or result
> of a typo. TBH I'm not even sure if it would be possible to deduct this from the code.
> Also, all the CPUs I have looked at bring all these 5 instruction sets together, but we
> know this might not hold true in the future as each one of them has an independent CPUID flag.
> 

Yes, they are independent flags. However, given that AVX-512 has been
around a long time without any CPUs being released with only partial
support of the initial 5 sets introduced with the first AVX-512 CPUs, I'd
take the view that we are probably ok just mandating all 5 for AVX-512
support. That way, if it does happen that a CPU with partial support is
released, we just end up without AVX-512 support on it, rather than a
broken build.  We can then fix that later if such a situation occurs. Until
such time, we get nice simplicity in our code of having a standard AVX-512
flag-set.

>     cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw']
> 
> My choice was to keep the flags that were used initially, but I can change that if the
> maintainers tell me this was a mistake.

I'd add in "avx512cd" into the basic avx512 flags and then reuse the
variable. I suspect I just missed it when creating the list of flags.

/Bruce

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 5/6] acl: make compatible with instruction set updates for MSVC
  2025-02-25 17:21       ` Bruce Richardson
@ 2025-02-25 17:23         ` Andre Muezerie
  0 siblings, 0 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-25 17:23 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Konstantin Ananyev, dev

On Tue, Feb 25, 2025 at 05:21:10PM +0000, Bruce Richardson wrote:
> On Tue, Feb 25, 2025 at 08:37:27AM -0800, Andre Muezerie wrote:
> > On Tue, Feb 25, 2025 at 09:03:37AM +0000, Bruce Richardson wrote:
> > > On Mon, Feb 24, 2025 at 01:01:18PM -0800, Andre Muezerie wrote:
> > > > Top level 'cc_avx2_flags' was created and holds the correct flags
> > > > depending on the compiler used.
> > > > 
> > > > File meson.build was updated to handle the correct AVX512 flags
> > > > depending on compiler used.
> > > > 
> > > > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > > > ---
> > > >  lib/acl/meson.build | 16 +++++++++-------
> > > >  1 file changed, 9 insertions(+), 7 deletions(-)
> > > > 
> > > > diff --git a/lib/acl/meson.build b/lib/acl/meson.build
> > > > index fefe131a48..24e47b6cc1 100644
> > > > --- a/lib/acl/meson.build
> > > > +++ b/lib/acl/meson.build
> > > > @@ -19,7 +19,7 @@ if dpdk_conf.has('RTE_ARCH_X86')
> > > >      avx2_tmplib = static_library('avx2_tmp',
> > > >              'acl_run_avx2.c',
> > > >              dependencies: static_rte_eal,
> > > > -            c_args: cflags + ['-mavx2'])
> > > > +            c_args: [cflags, cc_avx2_flags])
> > > >      objs += avx2_tmplib.extract_objects('acl_run_avx2.c')
> > > >  
> > > >      # compile AVX512 version if:
> > > > @@ -38,6 +38,12 @@ if dpdk_conf.has('RTE_ARCH_X86')
> > > >          # compiler flags, and then have the .o file from static lib
> > > >          # linked into main lib.
> > > >  
> > > > +        if is_ms_compiler
> > > > +            acl_avx512_args = cc_avx512_flags
> > > > +        else
> > > > +            acl_avx512_args = ['-mavx512f', '-mavx512vl', '-mavx512cd', '-mavx512bw']
> > > > +        endif
> > > > +
> > > 
> > > in the non-msvc case are these flags not the same as cc_avx512_flags too?
> > > If so, let's just get rid of the acl_avx512_args variable generally.
> > > 
> > > /Bruce
> > 
> > It's not an exact match. I did not look further to see if this was intentional or result
> > of a typo. TBH I'm not even sure if it would be possible to deduct this from the code.
> > Also, all the CPUs I have looked at bring all these 5 instruction sets together, but we
> > know this might not hold true in the future as each one of them has an independent CPUID flag.
> > 
> 
> Yes, they are independent flags. However, given that AVX-512 has been
> around a long time without any CPUs being released with only partial
> support of the initial 5 sets introduced with the first AVX-512 CPUs, I'd
> take the view that we are probably ok just mandating all 5 for AVX-512
> support. That way, if it does happen that a CPU with partial support is
> released, we just end up without AVX-512 support on it, rather than a
> broken build.  We can then fix that later if such a situation occurs. Until
> such time, we get nice simplicity in our code of having a standard AVX-512
> flag-set.
> 
> >     cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw']
> > 
> > My choice was to keep the flags that were used initially, but I can change that if the
> > maintainers tell me this was a mistake.
> 
> I'd add in "avx512cd" into the basic avx512 flags and then reuse the
> variable. I suspect I just missed it when creating the list of flags.
> 
> /Bruce

Sounds good. I'll do that.

^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 0/5] allow faster instruction sets to be used with MSVC
  2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie
                   ` (5 preceding siblings ...)
  2025-02-24 21:01 ` [PATCH 6/6] member: " Andre Muezerie
@ 2025-02-26  1:06 ` Andre Muezerie
  2025-02-26  1:06   ` [PATCH v2 1/5] eal: make compatible with instruction set updates for MSVC Andre Muezerie
                     ` (4 more replies)
  2025-02-26 21:21 ` [PATCH v3 0/3] allow faster instruction sets to be used with MSVC Andre Muezerie
  7 siblings, 5 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26  1:06 UTC (permalink / raw)
  To: andremue; +Cc: dev

Meson.build files were refactored to add MSVC support for the AVX2
instruction set with MSVC: Top level variable cc_avx2_flags was created
and set according to the compiler used. It is used in lower level
directories.

Andre Muezerie (5):
  eal: make compatible with instruction set updates for MSVC
  eal: only use numbers as align parameters for MSVC
  config: create top level variable cc_avx2_flags
  drivers/net: make compatible with instruction set updates for MSVC
  acl: make compatible with instruction set updates for MSVC

 config/x86/meson.build             |  6 ++++++
 drivers/net/bnxt/meson.build       |  2 +-
 drivers/net/enic/meson.build       |  2 +-
 drivers/net/intel/i40e/meson.build |  2 +-
 drivers/net/intel/iavf/meson.build |  2 +-
 drivers/net/intel/ice/meson.build  |  2 +-
 drivers/net/intel/idpf/meson.build |  2 +-
 drivers/net/nfp/meson.build        |  2 +-
 drivers/net/octeon_ep/meson.build  |  4 ++--
 lib/acl/meson.build                |  2 +-
 lib/eal/common/rte_random.c        |  2 ++
 lib/eal/x86/include/rte_vect.h     | 11 ++++++++++-
 12 files changed, 28 insertions(+), 11 deletions(-)

--
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 1/5] eal: make compatible with instruction set updates for MSVC
  2025-02-26  1:06 ` [PATCH v2 0/5] allow faster instruction sets to be used with MSVC Andre Muezerie
@ 2025-02-26  1:06   ` Andre Muezerie
  2025-02-26  9:50     ` Bruce Richardson
  2025-02-26  1:06   ` [PATCH v2 2/5] eal: only use numbers as align parameters " Andre Muezerie
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26  1:06 UTC (permalink / raw)
  To: andremue; +Cc: dev

After the instruction set updates for MSVC the error below popped up:

../lib/eal/common/rte_random.c(6): fatal error C1083:
    Cannot open include file: 'x86intrin.h': No such file or directory

The fix is to not include header x86intrin.h with MSVC.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/common/rte_random.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
index 8e62578176..9354358818 100644
--- a/lib/eal/common/rte_random.c
+++ b/lib/eal/common/rte_random.c
@@ -3,8 +3,10 @@
  */
 
 #ifdef __RDSEED__
+#ifndef RTE_TOOLCHAIN_MSVC
 #include <x86intrin.h>
 #endif
+#endif
 #include <unistd.h>
 
 #include <rte_bitops.h>
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 2/5] eal: only use numbers as align parameters for MSVC
  2025-02-26  1:06 ` [PATCH v2 0/5] allow faster instruction sets to be used with MSVC Andre Muezerie
  2025-02-26  1:06   ` [PATCH v2 1/5] eal: make compatible with instruction set updates for MSVC Andre Muezerie
@ 2025-02-26  1:06   ` Andre Muezerie
  2025-02-26  9:51     ` Bruce Richardson
  2025-02-26 10:15     ` Konstantin Ananyev
  2025-02-26  1:06   ` [PATCH v2 3/5] config: create top level variable cc_avx2_flags Andre Muezerie
                     ` (2 subsequent siblings)
  4 siblings, 2 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26  1:06 UTC (permalink / raw)
  To: andremue; +Cc: dev

After the instruction set updates for MSVC the error below popped up:

..\lib\eal\x86\include\rte_vect.h(82): error C2059: syntax error: '('

The issue is that MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE).
It only accepts numbers that are power of 2. So, even though
RTE_X86_ZMM_SIZE represents a number that is a power of two it cannot
be used directly.
https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/eal/x86/include/rte_vect.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/eal/x86/include/rte_vect.h b/lib/eal/x86/include/rte_vect.h
index 70c78e9b77..0a51c539a4 100644
--- a/lib/eal/x86/include/rte_vect.h
+++ b/lib/eal/x86/include/rte_vect.h
@@ -79,7 +79,16 @@ __extension__ ({                \
 #define RTE_X86_ZMM_SIZE	(sizeof(__m512i))
 #define RTE_X86_ZMM_MASK	(RTE_X86_ZMM_SIZE - 1)
 
-typedef union __rte_aligned(RTE_X86_ZMM_SIZE) __rte_x86_zmm {
+/*
+ * MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE). It only accepts
+ * numbers that are power of 2. So, even though RTE_X86_ZMM_SIZE represents a
+ * number that is a power of two it cannot be used directly.
+ * Ref: https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170
+ * The static assert below ensures that RTE_X86_ZMM_SIZE is equal to what is
+ * used in the __rte_aligned() expression.
+ */
+static_assert(RTE_X86_ZMM_SIZE == 64, "Unexpected size of __m512i");
+typedef union __rte_aligned(64) __rte_x86_zmm {
 	__m512i	 z;
 	ymm_t    y[RTE_X86_ZMM_SIZE / sizeof(ymm_t)];
 	xmm_t    x[RTE_X86_ZMM_SIZE / sizeof(xmm_t)];
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 3/5] config: create top level variable cc_avx2_flags
  2025-02-26  1:06 ` [PATCH v2 0/5] allow faster instruction sets to be used with MSVC Andre Muezerie
  2025-02-26  1:06   ` [PATCH v2 1/5] eal: make compatible with instruction set updates for MSVC Andre Muezerie
  2025-02-26  1:06   ` [PATCH v2 2/5] eal: only use numbers as align parameters " Andre Muezerie
@ 2025-02-26  1:06   ` Andre Muezerie
  2025-02-26  9:52     ` Bruce Richardson
  2025-02-26  1:06   ` [PATCH v2 4/5] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie
  2025-02-26  1:06   ` [PATCH v2 5/5] acl: " Andre Muezerie
  4 siblings, 1 reply; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26  1:06 UTC (permalink / raw)
  To: andremue; +Cc: dev

Top level variable cc_avx2_flags was created and set according to the
compiler used. It is used in lower level directories.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 config/x86/meson.build | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/config/x86/meson.build b/config/x86/meson.build
index 47a5b0c04a..941d4d921f 100644
--- a/config/x86/meson.build
+++ b/config/x86/meson.build
@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017-2020 Intel Corporation
 
+if is_ms_compiler
+    cc_avx2_flags = ['/arch:AVX2']
+else
+    cc_avx2_flags = ['-mavx2']
+endif
+
 # get binutils version for the workaround of Bug 97
 binutils_ok = true
 if is_linux or cc.get_id() == 'gcc'
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 4/5] drivers/net: make compatible with instruction set updates for MSVC
  2025-02-26  1:06 ` [PATCH v2 0/5] allow faster instruction sets to be used with MSVC Andre Muezerie
                     ` (2 preceding siblings ...)
  2025-02-26  1:06   ` [PATCH v2 3/5] config: create top level variable cc_avx2_flags Andre Muezerie
@ 2025-02-26  1:06   ` Andre Muezerie
  2025-02-26  9:53     ` Bruce Richardson
  2025-02-26  1:06   ` [PATCH v2 5/5] acl: " Andre Muezerie
  4 siblings, 1 reply; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26  1:06 UTC (permalink / raw)
  To: andremue; +Cc: dev

Top level 'cc_avx2_flags' was created and holds the correct flags
depending on the compiler used.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 drivers/net/bnxt/meson.build       | 2 +-
 drivers/net/enic/meson.build       | 2 +-
 drivers/net/intel/i40e/meson.build | 2 +-
 drivers/net/intel/iavf/meson.build | 2 +-
 drivers/net/intel/ice/meson.build  | 2 +-
 drivers/net/intel/idpf/meson.build | 2 +-
 drivers/net/nfp/meson.build        | 2 +-
 drivers/net/octeon_ep/meson.build  | 4 ++--
 8 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/net/bnxt/meson.build b/drivers/net/bnxt/meson.build
index e26cf13a65..fd82d0c409 100644
--- a/drivers/net/bnxt/meson.build
+++ b/drivers/net/bnxt/meson.build
@@ -65,7 +65,7 @@ if arch_subdir == 'x86'
                 static_rte_bus_pci,
                 static_rte_kvargs, static_rte_hash],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
      objs += bnxt_avx2_lib.extract_objects('bnxt_rxtx_vec_avx2.c')
 elif arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
     sources += files('bnxt_rxtx_vec_neon.c')
diff --git a/drivers/net/enic/meson.build b/drivers/net/enic/meson.build
index 1e26338350..cfe5ec170a 100644
--- a/drivers/net/enic/meson.build
+++ b/drivers/net/enic/meson.build
@@ -38,7 +38,7 @@ if dpdk_conf.has('RTE_ARCH_X86_64')
             'enic_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev, static_rte_bus_pci],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += enic_avx2_lib.extract_objects('enic_rxtx_vec_avx2.c')
 endif
 
diff --git a/drivers/net/intel/i40e/meson.build b/drivers/net/intel/i40e/meson.build
index ffa40c5d64..2973ed1a01 100644
--- a/drivers/net/intel/i40e/meson.build
+++ b/drivers/net/intel/i40e/meson.build
@@ -49,7 +49,7 @@ if arch_subdir == 'x86'
             'i40e_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev, static_rte_kvargs, static_rte_hash],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += i40e_avx2_lib.extract_objects('i40e_rxtx_vec_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/intel/iavf/meson.build b/drivers/net/intel/iavf/meson.build
index 19cd1cfbc8..f7eac7c57a 100644
--- a/drivers/net/intel/iavf/meson.build
+++ b/drivers/net/intel/iavf/meson.build
@@ -37,7 +37,7 @@ if arch_subdir == 'x86'
             'iavf_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += iavf_avx2_lib.extract_objects('iavf_rxtx_vec_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/intel/ice/meson.build b/drivers/net/intel/ice/meson.build
index a34b7c966a..cbdf38c1c4 100644
--- a/drivers/net/intel/ice/meson.build
+++ b/drivers/net/intel/ice/meson.build
@@ -43,7 +43,7 @@ if arch_subdir == 'x86'
             'ice_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev, static_rte_hash],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += ice_avx2_lib.extract_objects('ice_rxtx_vec_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/intel/idpf/meson.build b/drivers/net/intel/idpf/meson.build
index 802b13035b..4b272d02b1 100644
--- a/drivers/net/intel/idpf/meson.build
+++ b/drivers/net/intel/idpf/meson.build
@@ -23,7 +23,7 @@ if arch_subdir == 'x86' and dpdk_conf.get('RTE_IOVA_IN_MBUF') == 1
         'idpf_common_rxtx_avx2.c',
         dependencies: [static_rte_ethdev, static_rte_hash],
         include_directories: includes,
-        c_args: [cflags, '-mavx2'])
+        c_args: [cflags, cc_avx2_flags])
     objs += idpf_avx2_lib.extract_objects('idpf_common_rxtx_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/nfp/meson.build b/drivers/net/nfp/meson.build
index 39762bd45a..0a12b7dce7 100644
--- a/drivers/net/nfp/meson.build
+++ b/drivers/net/nfp/meson.build
@@ -61,7 +61,7 @@ if arch_subdir == 'x86'
                 avx2_sources,
                 dependencies: [static_rte_ethdev, static_rte_bus_pci],
                 include_directories: includes,
-                c_args: [cflags, '-mavx2']
+                c_args: [cflags, cc_avx2_flags]
         )
 
         objs += nfp_avx2_lib.extract_all_objects(recursive: true)
diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build
index d5d40b23a1..1b34db3edc 100644
--- a/drivers/net/octeon_ep/meson.build
+++ b/drivers/net/octeon_ep/meson.build
@@ -18,13 +18,13 @@ if arch_subdir == 'x86'
     if cc.get_define('__AVX2__', args: machine_args) != ''
         cflags += ['-DCC_AVX2_SUPPORT']
         sources += files('cnxk_ep_rx_avx.c')
-    elif cc.has_argument('-mavx2')
+    elif cc.has_multi_arguments(cc_avx2_flags)
         cflags += ['-DCC_AVX2_SUPPORT']
         otx_ep_avx2_lib = static_library('otx_ep_avx2_lib',
                         'cnxk_ep_rx_avx.c',
                         dependencies: [static_rte_ethdev, static_rte_pci, static_rte_bus_pci],
                         include_directories: includes,
-                        c_args: [cflags, '-mavx2'])
+                        c_args: [cflags, cc_avx2_flags])
         objs += otx_ep_avx2_lib.extract_objects('cnxk_ep_rx_avx.c')
     endif
 endif
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v2 5/5] acl: make compatible with instruction set updates for MSVC
  2025-02-26  1:06 ` [PATCH v2 0/5] allow faster instruction sets to be used with MSVC Andre Muezerie
                     ` (3 preceding siblings ...)
  2025-02-26  1:06   ` [PATCH v2 4/5] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie
@ 2025-02-26  1:06   ` Andre Muezerie
  2025-02-26  9:54     ` Bruce Richardson
  4 siblings, 1 reply; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26  1:06 UTC (permalink / raw)
  To: andremue; +Cc: dev

Top level 'cc_avx2_flags' was created and holds the correct flags
depending on the compiler used.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
---
 lib/acl/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/acl/meson.build b/lib/acl/meson.build
index fefe131a48..a28d70c23f 100644
--- a/lib/acl/meson.build
+++ b/lib/acl/meson.build
@@ -19,7 +19,7 @@ if dpdk_conf.has('RTE_ARCH_X86')
     avx2_tmplib = static_library('avx2_tmp',
             'acl_run_avx2.c',
             dependencies: static_rte_eal,
-            c_args: cflags + ['-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += avx2_tmplib.extract_objects('acl_run_avx2.c')
 
     # compile AVX512 version if:
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] config: allow faster instruction sets to be used with MSVC
  2025-02-25 14:28   ` Bruce Richardson
@ 2025-02-26  2:01     ` Andre Muezerie
  2025-02-26  9:44       ` Bruce Richardson
  0 siblings, 1 reply; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26  2:01 UTC (permalink / raw)
  To: Bruce Richardson; +Cc: Konstantin Ananyev, dev

On Tue, Feb 25, 2025 at 02:28:02PM +0000, Bruce Richardson wrote:
> On Mon, Feb 24, 2025 at 01:01:16PM -0800, Andre Muezerie wrote:
> > Up to now MSVC has being used with the default mode, which uses SSE2
> > instructions for scalar floating-point and vector calculations.
> > https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
> > 
> > This patch allows users to specify the CPU for which the generated
> > code should be optimized for in the same way it's done for GCC: by
> > passing the CPU name.
> > When no explicit CPU name is passed, 'native' is assumed (like it
> > happens with GCC) and the code will be optimized for the same CPU
> > type used to compile the code.
> > 
> > MSVC does not provide this functionality natively, so logic was
> > added to meson.build to handle these differences, detecting which
> > instruction sets are supported by the CPU(s), passing the best
> > options to MSVC and setting the correct macros (like __AVX512F__)
> > so that the DPDK code can rely on them like it is done with GCC.
> > 
> > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > ---
> 
> Hi Andre,
> 
> couple of initial thoughts inline below.
> 
> /Bruce
> 
> >  config/x86/meson.build | 364 ++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 325 insertions(+), 39 deletions(-)
> > 
> 
> There is quite a lot of new code to be added here. Might it be worthwhile
> creating a "config/x86/msvc/" subdirectory with its own meson.build file to
> handle all the complexities of using it. We can have the common material at
> the top of the x86/meson.build file, and then do 
> 
> if is_ms_compiler
>     subdir(msvc)
>     subdir_done()
> endif
> 
> leaving the rest of the file for the gcc/clang/icx code.

I think that makes sense, as there's not much common code there that is common to gcc and msvc.

> 
> > diff --git a/config/x86/meson.build b/config/x86/meson.build
> > index 47a5b0c04a..9260969c54 100644
> > --- a/config/x86/meson.build
> > +++ b/config/x86/meson.build
> > @@ -14,7 +14,194 @@ if is_linux or cc.get_id() == 'gcc'
> >      endif
> >  endif
> >  
> > -cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw']
> > +cpuid_code = '''
> > +    #include <stdio.h>
> > +    #include <stdint.h>
> > +    #include <intrin.h>
> > +
> > +    uint32_t f1_ECX = 0;
> > +    uint32_t f1_EDX = 0;
> > +    uint32_t f7_EBX = 0;
> > +    uint32_t f7_ECX = 0;
> > +
> > +    void get_support_flags()
> > +    {
> > +        int ids_max;
> > +        int data[4];
> > +
> > +        /*
> > +         * Calling __cpuid with 0x0 as the function_id argument
> > +         * gets the number of the highest valid function ID.
> > +         */
> > +        __cpuid(data, 0);
> > +        ids_max = data[0];
> > +
> > +        if (1 <= ids_max) {
> > +            __cpuidex(data, 1, 0);
> > +            f1_ECX = data[2];
> > +            f1_EDX = data[3];
> > +
> > +            if (7 <= ids_max) {
> > +                __cpuidex(data, 7, 0);
> > +                f7_EBX = data[1];
> > +                f7_ECX = data[2];
> > +            }
> > +        }
> > +    }
> > +
> > +    int get_instruction_support()
> > +    {
> > +        get_support_flags();
> > +
> > +    #ifdef SSE3
> > +        return (f1_ECX & (1UL << 0)) ? 1 : 0;
> > +    #endif
> > +    #ifdef PCLMUL
> > +        return (f1_ECX & (1UL << 1)) ? 1 : 0;
> > +    #endif
> > +    #ifdef SSSE3
> > +        return (f1_ECX & (1UL << 9)) ? 1 : 0;
> > +    #endif
> > +    #ifdef SSE4_1
> > +        return (f1_ECX & (1UL << 19)) ? 1 : 0;
> > +    #endif
> > +    #ifdef SSE4_2
> > +        return (f1_ECX & (1UL << 20)) ? 1 : 0;
> > +    #endif
> > +    #ifdef AES
> > +        return (f1_ECX & (1UL << 25)) ? 1 : 0;
> > +    #endif
> > +    #ifdef AVX
> > +        return (f1_ECX & (1UL << 28)) ? 1 : 0;
> > +    #endif
> > +    #ifdef RDRND
> > +        return (f1_ECX & (1UL << 30)) ? 1 : 0;
> > +    #endif
> > +    #ifdef SSE
> > +        return (f1_EDX & (1UL << 25)) ? 1 : 0;
> > +    #endif
> > +    #ifdef SSE2
> > +        return (f1_EDX & (1UL << 26)) ? 1 : 0;
> > +    #endif
> > +    #ifdef AVX2
> > +        return (f7_EBX & (1UL << 5)) ? 1 : 0;
> > +    #endif
> > +    #ifdef AVX512F
> > +        return (f7_EBX & (1UL << 16)) ? 1 : 0;
> > +    #endif
> > +    #ifdef AVX512DQ
> > +        return (f7_EBX & (1UL << 17)) ? 1 : 0;
> > +    #endif
> > +    #ifdef RDSEED
> > +        return (f7_EBX & (1UL << 18)) ? 1 : 0;
> > +    #endif
> > +    #ifdef AVX512IFMA
> > +        return (f7_EBX & (1UL << 21)) ? 1 : 0;
> > +    #endif
> > +    #ifdef AVX512CD
> > +        return (f7_EBX & (1UL << 28)) ? 1 : 0;
> > +    #endif
> > +    #ifdef AVX512BW
> > +        return (f7_EBX & (1UL << 30)) ? 1 : 0;
> > +    #endif
> > +    #ifdef AVX512VL
> > +        return (f7_EBX & (1UL << 31)) ? 1 : 0;
> > +    #endif
> > +    #ifdef GFNI
> > +        return (f7_ECX & (1UL << 8)) ? 1 : 0;
> > +    #endif
> > +    #ifdef VPCLMULQDQ
> > +        return (f7_ECX & (1UL << 10)) ? 1 : 0;
> > +    #endif
> > +
> > +        return -1;
> > +    }
> > +
> > +    int main(int argc, char *argv[])
> > +    {
> > +        int res = get_instruction_support();
> > +        if (res == -1) {
> > +            printf("Unknown instruction set");
> > +            return -1;
> > +        }
> > +        printf("%d", res);
> > +
> > +        return 0;
> > +    }
> > +'''
> > +
> > +# The data in table below can be found here:
> > +# https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html
> > +# This table only contains CPUs that have SSE4.2, as this instruction set is required by DPDK.
> > +# That means that in addition to the instruction sets mentioned in the table, all these CPUs
> > +# also have ['SSE', 'SSE2', 'SSE3', 'SSEE3', 'SSE4_1', 'SSE4_2']
> > +cpu_type_to_flags = {
> > +       'x86-64-v2': [],
> > +       'x86-64-v3': ['AVX', 'AVX2'],
> > +       'x86-64-v4': ['AVX', 'AVX2', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
> > +         'nehalem': [],
> > +          'corei7': [],
> > +        'westmere': ['PCLMUL'],
> > +     'sandybridge': ['AVX', 'PCLMUL'],
> > +      'corei7-avx': ['AVX', 'PCLMUL'],
> > +       'ivybridge': ['AVX', 'PCLMUL', 'RDRND'],
> > +      'core-avx-i': ['AVX', 'PCLMUL', 'RDRND'],
> > +         'haswell': ['AVX', 'PCLMUL', 'RDRND', 'AVX2'],
> > +       'core-avx2': ['AVX', 'PCLMUL', 'RDRND', 'AVX2'],
> > +       'broadwell': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED'],
> > +         'skylake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
> > +  'skylake-avx512': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
> > +     'cascadelake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
> > +      'cannonlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA'],
> > +      'cooperlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD'],
> > +  'icelake-client': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > +  'icelake-server': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > +       'tigerlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > +      'rocketlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > +       'alderlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +      'raptorlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +      'meteorlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +       'gracemont': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +       'arrowlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +     'arrowlake-s': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +       'lunarlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +     'pantherlake': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +  'sapphirerapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > +   'emeraldrapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > +   'graniterapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > + 'graniterapids-d': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > +   'diamondrapids': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > +      'silvermont': ['PCLMUL', 'RDRND'],
> > +             'slm': ['PCLMUL', 'RDRND'],
> > +        'goldmont': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'],
> > +   'goldmont-plus': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'],
> > +         'tremont': ['PCLMUL', 'RDRND', 'RDSEED', 'AES', 'GFNI'],
> > +    'sierraforest': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +      'grandridge': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +'clearwaterforest': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'GFNI'],
> > +          'bdver1': ['AVX', 'PCLMUL', 'AES'],
> > +          'bdver2': ['AVX', 'PCLMUL', 'AES'],
> > +          'bdver3': ['AVX', 'PCLMUL', 'AES'],
> > +          'bdver4': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'AES'],
> > +          'znver1': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
> > +          'znver2': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
> > +          'znver3': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ'],
> > +          'znver4': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > +          'znver5': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES', 'VPCLMULQDQ', 'AVX512F', 'AVX512VL', 'AVX512BW', 'AVX512DQ', 'AVX512CD', 'AVX512IFMA', 'GFNI'],
> > +          'btver2': ['AVX', 'PCLMUL', 'AES'],
> > +        'lujiazui': ['PCLMUL', 'RDRND', 'RDSEED', 'AES'],
> > +        'yongfeng': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
> > +      'shijidadao': ['AVX', 'PCLMUL', 'RDRND', 'AVX2', 'RDSEED', 'AES'],
> > +}
> > +
> 
> I really don't want to have tables like this to maintain in our code if at
> all possible. We used to have something a bit similar in DPDK IIRC, but we
> found it a maintenance nightmare and just switched to using the compiler to
> do all the work. In our existing builds, we just pass the
> cpu_instruction_set parameter straight to the -march flag of the compiler.
> For MSVC support, I believe we should just do the exact same.
> 
> Maintaining lists like this will be a problem as new platforms need to be
> constantly added.

It's great that when using gcc users can just pass the CPU type to it and
that it will set all the macros corresponding to that CPU for them. I
would love to be able to rely on MSVC for that as well, unfortunately MSVC
does not provide that level of granularity, that's why I came up with the
idea of having this table.

Initially I was also very concerned about the amount of data to be stored
there, and the work required to maintain it. Then I decided to throw away
all the CPU types that do not have SSE4_2. That reduced the table in half.
Adding the entries manually was still a lot of work and error prone. So,
I decided to write some code that uses gcc to build that table for me.
I'll polish that code and add it to the patch. With that it will be almost
zero effort to maintain that table. All that will be required is running
that code on a setup with the latest gcc, providing a file with the CPU
names. Assuming gcc knows about the latest CPUs a new table will be
generated and can be pasted in the meson.build file.

I assume this does not need to be done too often. In the worst case, if it
happens that DPDK was not updated with the latest CPUs, people can still
pick an earlier CPU with similar characteristics and have similar (if not
same) performance.


> Do we also look to backport them, because if equivalence
> with the linux build is necessary then that will have to be done - as on
> Linux when a new version of GCC comes out, we can then use the new
> instruction set targets on the old releases of DPDK.

That would be nice, and I'm willing to help with that. It makes it a
better user experience if we can minimize the perceived differences
between the toolsets. I'm not sure if it's a requirement though. If
you're concerned that this would add too much overhead, it could be
decided that no such backport should happen.

> 
> > +if is_ms_compiler
> > +    cc_avx2_flags = ['/arch:AVX2']
> > +    cc_avx512_flags = ['/arch:AVX512']
> > +else
> > +    cc_avx2_flags = ['-mavx2']
> > +    cc_avx512_flags = ['-mavx512f', '-mavx512vl', '-mavx512dq', '-mavx512bw']
> > +endif
> > +
> >  cc_has_avx512 = false
> >  target_has_avx512 = false
> >  if (binutils_ok and cc.has_multi_arguments(cc_avx512_flags)
> > @@ -30,12 +217,14 @@ if (binutils_ok and cc.has_multi_arguments(cc_avx512_flags)
> >          warning('Broken _mm512_extracti64x4_epi64, disabling AVX512 support')
> >      else
> >          cc_has_avx512 = true
> > -        target_has_avx512 = (
> > -                cc.get_define('__AVX512F__', args: machine_args) != '' and
> > -                cc.get_define('__AVX512BW__', args: machine_args) != '' and
> > -                cc.get_define('__AVX512DQ__', args: machine_args) != '' and
> > -                cc.get_define('__AVX512VL__', args: machine_args) != ''
> > -            )
> > +        if not is_ms_compiler
> > +            target_has_avx512 = (
> > +                    cc.get_define('__AVX512F__', args: machine_args) != '' and
> > +                    cc.get_define('__AVX512BW__', args: machine_args) != '' and
> > +                    cc.get_define('__AVX512DQ__', args: machine_args) != '' and
> > +                    cc.get_define('__AVX512VL__', args: machine_args) != ''
> > +                )
> > +        endif
> >      endif
> >  endif
> >  
> > @@ -47,42 +236,139 @@ if not is_ms_compiler
> >      endif
> >  endif
> >  
> > -# enable restricted transactional memory intrinsics
> > -# https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html
> > -if cc.get_id() != 'msvc'
> > -    machine_args += '-mrtm'
> > -endif
> > +if is_ms_compiler
> > +    # Determine cpu_flags for a given configuration.
> > +    # SSE instructions up to 4.2 are required for DPDK.
> > +    cpu_flags = ['SSE', 'SSE2', 'SSE3', 'SSEE3', 'SSE4_1', 'SSE4_2']
> > +
> > +    message('cpu_instruction_set: @0@'.format(cpu_instruction_set))
> >  
> > -base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2']
> > -foreach f:base_flags
> > -    compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
> > -endforeach
> > -
> > -optional_flags = [
> > -        'AES',
> > -        'AVX',
> > -        'AVX2',
> > -        'AVX512BW',
> > -        'AVX512CD',
> > -        'AVX512DQ',
> > -        'AVX512F',
> > -        'AVX512VL',
> > -        'PCLMUL',
> > -        'RDRND',
> > -        'RDSEED',
> > -        'VPCLMULQDQ',
> > -]
> > -foreach f:optional_flags
> > -    if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
> > -        if f == 'PCLMUL' # special case flags with different defines
> > -            f = 'PCLMULQDQ'
> > -        elif f == 'RDRND'
> > -            f = 'RDRAND'
> > +    if cpu_instruction_set == ''
> > +        # Nothing to do as cpu_flags already holds all the required flags.
> > +    elif cpu_instruction_set == 'native'
> > +        # MSVC behaves differently than GCC regarding supported instruction sets.
> > +        # While GCC will create macros like __AVX512F__ when such instruction set is
> > +        # supported by the current CPU, MSVC does not do that. MSVC will create that
> > +        # macro when parameter /arch:AVX512 is passed to the compiler, even when the
> > +        # CPU does not have that instruction set (by design). So there's a need to
> > +        # look at CPUID flags to figure out what is really supported by the CPU, so
> > +        # that the correct /arch value can be passed to the compiler.
> > +        # The macros also need to be explicitly defined, as /arch will not create all
> > +        # macros GCC creates under the same conditions.
> > +        # As an example, /arch:AVX512 creates __AVX512BW__, but does not create __SSE2__.
> > +        # More details available here:
> > +        # https://learn.microsoft.com/en-us/cpp/preprocessor/predefined-macros
> > +
> > +        optional_flags = [
> > +                'PCLMUL',
> > +                'AES',
> > +                'AVX',
> > +                'RDRND',
> > +                'AVX2',
> > +                'AVX512F',
> > +                'AVX512BW',
> > +                'AVX512DQ',
> > +                'AVX512VL',
> > +                'AVX512CD',
> > +                'AVX512IFMA',
> > +                'GFNI',
> > +                'RDSEED',
> > +                'VPCLMULQDQ',
> > +        ]
> > +        foreach f:optional_flags
> > +            result = cc.run(cpuid_code, args: '-D@0@'.format(f),
> > +                            name: 'instruction set @0@'.format(f))
> > +            has_instr_set = result.returncode() == 0 and result.stdout() == '1'
> > +            if has_instr_set
> > +                cpu_flags += f
> > +            endif
> > +            message('Target has @0@: @1@'.format(f, has_instr_set))
> > +        endforeach
> > +    else
> > +        # An explicit cpu_instruction_set was provided. Get cpu_flags
> > +        # from cpu_type_to_flags table.
> > +        if cpu_instruction_set not in cpu_type_to_flags
> > +            error('CPU not known or not supported. Please update the table with known CPUs if needed.')
> >          endif
> > -        compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
> > +        cpu_flags += cpu_type_to_flags[cpu_instruction_set]
> > +    endif
> > +
> > +    # Now that all cpu_flags are known, set compile_time_cpuflags and also
> > +    # machine_args to ensure that the instruction set #defines (like __SSE2__)
> > +    # are always present in the preprocessor.
> > +    message('cpu_flags: @0@'.format(cpu_flags))
> > +
> > +    foreach flag:cpu_flags
> > +        machine_args += '/D__@0@__'.format(flag)
> > +        if flag == 'PCLMUL'
> > +            flag = 'PCLMULQDQ'
> > +        elif flag == 'RDRND'
> > +            flag = 'RDRAND'
> > +        endif
> > +        compile_time_cpuflags += ['RTE_CPUFLAG_' + flag]
> > +    endforeach
> > +
> > +    target_has_avx512 = ('AVX512F'  in cpu_flags and
> > +                         'AVX512BW' in cpu_flags and
> > +                         'AVX512DQ' in cpu_flags and
> > +                         'AVX512VL' in cpu_flags)
> > +
> > +    # Decide which instruction sets should be used by the compiler.
> > +    # With MSVC, intrinsic functions are always enabled. However, for the
> > +    # compiler to use an extended instruction set for automatically
> > +    # generated code "/arch" needs to be passed. So we instruct the compiler
> > +    # to use the largest set that is supported by the CPU. It is implied that
> > +    # smaller sets than the largest selected are included, as described here:
> > +    # https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
> > +    if 'RTE_CPUFLAG_AVX512F' in compile_time_cpuflags
> > +        machine_args += ['/arch:AVX512']
> > +    elif 'RTE_CPUFLAG_AVX2' in compile_time_cpuflags
> > +        machine_args += ['/arch:AVX2']
> > +    elif 'RTE_CPUFLAG_AVX' in compile_time_cpuflags
> > +        machine_args += ['/arch:AVX']
> > +    else
> > +        # SSE4.2 is expected to always be available
> > +        machine_args += ['/arch:SSE4.2']
> >      endif
> > -endforeach
> >  
> 
> Since these appear to be the only /arch flags supported by the compiler for
> code generation, I would suggest that these would be the only instruction
> set flags that we support on MSVC builds, and that we then build up the
> actual CPU flags based on the minimum flags to be expected when each of
> these instruction sets is present.
> 
> Similarly with 'native', rather than supporting all the different CPU types,
> it would be a lot easier to just determine if it's an SSE4 machine, an AVX2
> machine or AVX512, and run with that.
> 
> My thinking is that getting this as a first step should get us a lot of the
> benefits without such a massive maintenance headache.

Providing only /arch option at first sight might look simpler, but I still
like the table approach much better. As I said earlier, maintaining that
table is not much work with that extra devtool.

Some drawbacks from the simpler approach:

1) User experience is not as good as it will differ from other toolsets.
   Users will have to learn about more parameter/values (how/when to use them).
2) DPDK code will not benefit from other instruction sets which might be
   present (__RDSEED__, __RDRND__, etc.) because these are not set by /arch.

Let me know your thoughts.

> 
> > +    message('machine_args: @0@'.format(machine_args))
> > +else
> > +    # enable restricted transactional memory intrinsics
> > +    # https://gcc.gnu.org/onlinedocs/gcc/x86-transactional-memory-intrinsics.html
> > +    machine_args += '-mrtm'
> > +
> > +    base_flags = ['SSE', 'SSE2', 'SSE3','SSSE3', 'SSE4_1', 'SSE4_2']
> > +    foreach f:base_flags
> > +        compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
> > +    endforeach
> > +
> > +    optional_flags = [
> > +            'AES',
> > +            'AVX',
> > +            'AVX2',
> > +            'AVX512BW',
> > +            'AVX512CD',
> > +            'AVX512DQ',
> > +            'AVX512F',
> > +            'AVX512VL',
> > +            'PCLMUL',
> > +            'RDRND',
> > +            'RDSEED',
> > +            'VPCLMULQDQ',
> > +    ]
> > +    foreach f:optional_flags
> > +        if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
> > +            if f == 'PCLMUL' # special case flags with different defines
> > +                f = 'PCLMULQDQ'
> > +            elif f == 'RDRND'
> > +                f = 'RDRAND'
> > +            endif
> > +            compile_time_cpuflags += ['RTE_CPUFLAG_' + f]
> > +        endif
> > +    endforeach
> > +endif
> > +
> > +message('compile_time_cpuflags: @0@'.format(compile_time_cpuflags))
> >  
> >  dpdk_conf.set('RTE_ARCH_X86', 1)
> >  if dpdk_conf.get('RTE_ARCH_64')
> > -- 
> > 2.48.1.vfs.0.0
> > 

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 3/6] config: allow faster instruction sets to be used with MSVC
  2025-02-26  2:01     ` Andre Muezerie
@ 2025-02-26  9:44       ` Bruce Richardson
  0 siblings, 0 replies; 34+ messages in thread
From: Bruce Richardson @ 2025-02-26  9:44 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: Konstantin Ananyev, dev

On Tue, Feb 25, 2025 at 06:01:38PM -0800, Andre Muezerie wrote:
> On Tue, Feb 25, 2025 at 02:28:02PM +0000, Bruce Richardson wrote:
> > On Mon, Feb 24, 2025 at 01:01:16PM -0800, Andre Muezerie wrote:
> > > Up to now MSVC has being used with the default mode, which uses SSE2
> > > instructions for scalar floating-point and vector calculations.
> > > https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
> > > 
> > > This patch allows users to specify the CPU for which the generated
> > > code should be optimized for in the same way it's done for GCC: by
> > > passing the CPU name.
> > > When no explicit CPU name is passed, 'native' is assumed (like it
> > > happens with GCC) and the code will be optimized for the same CPU
> > > type used to compile the code.
> > > 
> > > MSVC does not provide this functionality natively, so logic was
> > > added to meson.build to handle these differences, detecting which
> > > instruction sets are supported by the CPU(s), passing the best
> > > options to MSVC and setting the correct macros (like __AVX512F__)
> > > so that the DPDK code can rely on them like it is done with GCC.
> > > 
> > > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> > > ---
> > 
> > Hi Andre,
> > 
> > couple of initial thoughts inline below.
> > 
> > /Bruce
> > 
> > >  config/x86/meson.build | 364 ++++++++++++++++++++++++++++++++++++-----
> > >  1 file changed, 325 insertions(+), 39 deletions(-)
> > > 
> > 
> > There is quite a lot of new code to be added here. Might it be worthwhile
> > creating a "config/x86/msvc/" subdirectory with its own meson.build file to
> > handle all the complexities of using it. We can have the common material at
> > the top of the x86/meson.build file, and then do 
> > 
> > if is_ms_compiler
> >     subdir(msvc)
> >     subdir_done()
> > endif
> > 
> > leaving the rest of the file for the gcc/clang/icx code.
> 
> I think that makes sense, as there's not much common code there that is common to gcc and msvc.
> 

<snip>

> > 
> > I really don't want to have tables like this to maintain in our code if at
> > all possible. We used to have something a bit similar in DPDK IIRC, but we
> > found it a maintenance nightmare and just switched to using the compiler to
> > do all the work. In our existing builds, we just pass the
> > cpu_instruction_set parameter straight to the -march flag of the compiler.
> > For MSVC support, I believe we should just do the exact same.
> > 
> > Maintaining lists like this will be a problem as new platforms need to be
> > constantly added.
> 
> It's great that when using gcc users can just pass the CPU type to it and
> that it will set all the macros corresponding to that CPU for them. I
> would love to be able to rely on MSVC for that as well, unfortunately MSVC
> does not provide that level of granularity, that's why I came up with the
> idea of having this table.
> 
> Initially I was also very concerned about the amount of data to be stored
> there, and the work required to maintain it. Then I decided to throw away
> all the CPU types that do not have SSE4_2. That reduced the table in half.
> Adding the entries manually was still a lot of work and error prone. So,
> I decided to write some code that uses gcc to build that table for me.
> I'll polish that code and add it to the patch. With that it will be almost
> zero effort to maintain that table. All that will be required is running
> that code on a setup with the latest gcc, providing a file with the CPU
> names. Assuming gcc knows about the latest CPUs a new table will be
> generated and can be pasted in the meson.build file.
> 
> I assume this does not need to be done too often. In the worst case, if it
> happens that DPDK was not updated with the latest CPUs, people can still
> pick an earlier CPU with similar characteristics and have similar (if not
> same) performance.
> 

Thanks. Having it automated certainly helps, but I'm still not fully
convinced that it's a good idea. If everyone else is happy, though, I'm ok
to ignore those concerns :-). Let's see what others think.

> 
> > Do we also look to backport them, because if equivalence
> > with the linux build is necessary then that will have to be done - as on
> > Linux when a new version of GCC comes out, we can then use the new
> > instruction set targets on the old releases of DPDK.
> 
> That would be nice, and I'm willing to help with that. It makes it a
> better user experience if we can minimize the perceived differences
> between the toolsets. I'm not sure if it's a requirement though. If
> you're concerned that this would add too much overhead, it could be
> decided that no such backport should happen.
> 
> > 

<snip>

> > > +    if 'RTE_CPUFLAG_AVX512F' in compile_time_cpuflags
> > > +        machine_args += ['/arch:AVX512']
> > > +    elif 'RTE_CPUFLAG_AVX2' in compile_time_cpuflags
> > > +        machine_args += ['/arch:AVX2']
> > > +    elif 'RTE_CPUFLAG_AVX' in compile_time_cpuflags
> > > +        machine_args += ['/arch:AVX']
> > > +    else
> > > +        # SSE4.2 is expected to always be available
> > > +        machine_args += ['/arch:SSE4.2']
> > >      endif
> > > -endforeach
> > >  
> > 
> > Since these appear to be the only /arch flags supported by the compiler for
> > code generation, I would suggest that these would be the only instruction
> > set flags that we support on MSVC builds, and that we then build up the
> > actual CPU flags based on the minimum flags to be expected when each of
> > these instruction sets is present.
> > 
> > Similarly with 'native', rather than supporting all the different CPU types,
> > it would be a lot easier to just determine if it's an SSE4 machine, an AVX2
> > machine or AVX512, and run with that.
> > 
> > My thinking is that getting this as a first step should get us a lot of the
> > benefits without such a massive maintenance headache.
> 
> Providing only /arch option at first sight might look simpler, but I still
> like the table approach much better. As I said earlier, maintaining that
> table is not much work with that extra devtool.
> 
> Some drawbacks from the simpler approach:
> 
> 1) User experience is not as good as it will differ from other toolsets.
>    Users will have to learn about more parameter/values (how/when to use them).

This is already the case, in that different versions of compilers have
different march flags available. In practice, I think "native" and
"default" are the two most important to have from usability. For any other
values, I'd consider it advanced tuning and that the user probably already
knows what they need or want and what their compiler can provide.

> 2) DPDK code will not benefit from other instruction sets which might be
>    present (__RDSEED__, __RDRND__, etc.) because these are not set by /arch.
> 

RDRND looks to be available everywhere AVX2 is, and RDSEED whereever AVX512
is, so in a simplified model, I'd bundle those in that way.

/Bruce

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 1/5] eal: make compatible with instruction set updates for MSVC
  2025-02-26  1:06   ` [PATCH v2 1/5] eal: make compatible with instruction set updates for MSVC Andre Muezerie
@ 2025-02-26  9:50     ` Bruce Richardson
  0 siblings, 0 replies; 34+ messages in thread
From: Bruce Richardson @ 2025-02-26  9:50 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev

On Tue, Feb 25, 2025 at 05:06:52PM -0800, Andre Muezerie wrote:
> After the instruction set updates for MSVC the error below popped up:
> 
> ../lib/eal/common/rte_random.c(6): fatal error C1083:
>     Cannot open include file: 'x86intrin.h': No such file or directory
> 
> The fix is to not include header x86intrin.h with MSVC.
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 2/5] eal: only use numbers as align parameters for MSVC
  2025-02-26  1:06   ` [PATCH v2 2/5] eal: only use numbers as align parameters " Andre Muezerie
@ 2025-02-26  9:51     ` Bruce Richardson
  2025-02-26 10:15     ` Konstantin Ananyev
  1 sibling, 0 replies; 34+ messages in thread
From: Bruce Richardson @ 2025-02-26  9:51 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev

On Tue, Feb 25, 2025 at 05:06:53PM -0800, Andre Muezerie wrote:
> After the instruction set updates for MSVC the error below popped up:
> 
> ..\lib\eal\x86\include\rte_vect.h(82): error C2059: syntax error: '('
> 
> The issue is that MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE).
> It only accepts numbers that are power of 2. So, even though
> RTE_X86_ZMM_SIZE represents a number that is a power of two it cannot
> be used directly.
> https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 3/5] config: create top level variable cc_avx2_flags
  2025-02-26  1:06   ` [PATCH v2 3/5] config: create top level variable cc_avx2_flags Andre Muezerie
@ 2025-02-26  9:52     ` Bruce Richardson
  0 siblings, 0 replies; 34+ messages in thread
From: Bruce Richardson @ 2025-02-26  9:52 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev

On Tue, Feb 25, 2025 at 05:06:54PM -0800, Andre Muezerie wrote:
> Top level variable cc_avx2_flags was created and set according to the
> compiler used. It is used in lower level directories.
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
>  config/x86/meson.build | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
Acked-by: Bruce Richardson <bruce.richardson@intel.com>


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 4/5] drivers/net: make compatible with instruction set updates for MSVC
  2025-02-26  1:06   ` [PATCH v2 4/5] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie
@ 2025-02-26  9:53     ` Bruce Richardson
  0 siblings, 0 replies; 34+ messages in thread
From: Bruce Richardson @ 2025-02-26  9:53 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev

On Tue, Feb 25, 2025 at 05:06:55PM -0800, Andre Muezerie wrote:
> Top level 'cc_avx2_flags' was created and holds the correct flags
> depending on the compiler used.
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v2 5/5] acl: make compatible with instruction set updates for MSVC
  2025-02-26  1:06   ` [PATCH v2 5/5] acl: " Andre Muezerie
@ 2025-02-26  9:54     ` Bruce Richardson
  0 siblings, 0 replies; 34+ messages in thread
From: Bruce Richardson @ 2025-02-26  9:54 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev

On Tue, Feb 25, 2025 at 05:06:56PM -0800, Andre Muezerie wrote:
> Top level 'cc_avx2_flags' was created and holds the correct flags
> depending on the compiler used.
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
Acked-by: Bruce Richardson <bruce.richardson@intel.com>

This patch can probably be merged with the patch introducing the new build
variable. It and the previous-but-one patch are both very small and are
probably better merged.


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v2 2/5] eal: only use numbers as align parameters for MSVC
  2025-02-26  1:06   ` [PATCH v2 2/5] eal: only use numbers as align parameters " Andre Muezerie
  2025-02-26  9:51     ` Bruce Richardson
@ 2025-02-26 10:15     ` Konstantin Ananyev
  1 sibling, 0 replies; 34+ messages in thread
From: Konstantin Ananyev @ 2025-02-26 10:15 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev



> -----Original Message-----
> From: Andre Muezerie <andremue@linux.microsoft.com>
> Sent: Wednesday, February 26, 2025 1:07 AM
> To: andremue@linux.microsoft.com
> Cc: dev@dpdk.org
> Subject: [PATCH v2 2/5] eal: only use numbers as align parameters for MSVC
> 
> After the instruction set updates for MSVC the error below popped up:
> 
> ..\lib\eal\x86\include\rte_vect.h(82): error C2059: syntax error: '('
> 
> The issue is that MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE).
> It only accepts numbers that are power of 2. So, even though
> RTE_X86_ZMM_SIZE represents a number that is a power of two it cannot
> be used directly.
> https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> ---
>  lib/eal/x86/include/rte_vect.h | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/eal/x86/include/rte_vect.h b/lib/eal/x86/include/rte_vect.h
> index 70c78e9b77..0a51c539a4 100644
> --- a/lib/eal/x86/include/rte_vect.h
> +++ b/lib/eal/x86/include/rte_vect.h
> @@ -79,7 +79,16 @@ __extension__ ({                \
>  #define RTE_X86_ZMM_SIZE	(sizeof(__m512i))
>  #define RTE_X86_ZMM_MASK	(RTE_X86_ZMM_SIZE - 1)
> 
> -typedef union __rte_aligned(RTE_X86_ZMM_SIZE) __rte_x86_zmm {
> +/*
> + * MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE). It only accepts
> + * numbers that are power of 2. So, even though RTE_X86_ZMM_SIZE represents a
> + * number that is a power of two it cannot be used directly.
> + * Ref: https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170
> + * The static assert below ensures that RTE_X86_ZMM_SIZE is equal to what is
> + * used in the __rte_aligned() expression.
> + */
> +static_assert(RTE_X86_ZMM_SIZE == 64, "Unexpected size of __m512i");
> +typedef union __rte_aligned(64) __rte_x86_zmm {

Just wonder, would then MSVC understand something like:
#define RTE_X86_ZMM_SIZE	64
static_assert(RTE_X86_ZMM_SIZE == sizeof((__m512i), "Unexpected size of __m512i");
typedef union __rte_aligned(RTE_X86_ZMM_SIZE) __rte_x86_zmm {
? 

>  	__m512i	 z;
>  	ymm_t    y[RTE_X86_ZMM_SIZE / sizeof(ymm_t)];
>  	xmm_t    x[RTE_X86_ZMM_SIZE / sizeof(xmm_t)];
> --
> 2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 0/3] allow faster instruction sets to be used with MSVC
  2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie
                   ` (6 preceding siblings ...)
  2025-02-26  1:06 ` [PATCH v2 0/5] allow faster instruction sets to be used with MSVC Andre Muezerie
@ 2025-02-26 21:21 ` Andre Muezerie
  2025-02-26 21:21   ` [PATCH v3 1/3] eal: make compatible with instruction set updates for MSVC Andre Muezerie
                     ` (3 more replies)
  7 siblings, 4 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26 21:21 UTC (permalink / raw)
  To: andremue; +Cc: dev

Meson.build files were refactored to add MSVC support for the AVX2
instruction set with MSVC: Top level variable cc_avx2_flags was created
and set according to the compiler used. It is used in lower level
directories.

Andre Muezerie (3):
  eal: make compatible with instruction set updates for MSVC
  eal: only use numbers as align parameters for MSVC
  config: create top level variable cc_avx2_flags

 config/x86/meson.build             |  6 ++++++
 drivers/net/bnxt/meson.build       |  2 +-
 drivers/net/enic/meson.build       |  2 +-
 drivers/net/intel/i40e/meson.build |  2 +-
 drivers/net/intel/iavf/meson.build |  2 +-
 drivers/net/intel/ice/meson.build  |  2 +-
 drivers/net/intel/idpf/meson.build |  2 +-
 drivers/net/nfp/meson.build        |  2 +-
 drivers/net/octeon_ep/meson.build  |  4 ++--
 lib/acl/meson.build                |  2 +-
 lib/eal/common/rte_random.c        |  2 ++
 lib/eal/x86/include/rte_vect.h     | 11 ++++++++++-
 12 files changed, 28 insertions(+), 11 deletions(-)

--
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 1/3] eal: make compatible with instruction set updates for MSVC
  2025-02-26 21:21 ` [PATCH v3 0/3] allow faster instruction sets to be used with MSVC Andre Muezerie
@ 2025-02-26 21:21   ` Andre Muezerie
  2025-02-26 21:21   ` [PATCH v3 2/3] eal: only use numbers as align parameters " Andre Muezerie
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26 21:21 UTC (permalink / raw)
  To: andremue; +Cc: dev

After the instruction set updates for MSVC the error below popped up:

../lib/eal/common/rte_random.c(6): fatal error C1083:
    Cannot open include file: 'x86intrin.h': No such file or directory

The fix is to not include header x86intrin.h with MSVC.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/eal/common/rte_random.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/eal/common/rte_random.c b/lib/eal/common/rte_random.c
index 8e62578176..9354358818 100644
--- a/lib/eal/common/rte_random.c
+++ b/lib/eal/common/rte_random.c
@@ -3,8 +3,10 @@
  */
 
 #ifdef __RDSEED__
+#ifndef RTE_TOOLCHAIN_MSVC
 #include <x86intrin.h>
 #endif
+#endif
 #include <unistd.h>
 
 #include <rte_bitops.h>
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 2/3] eal: only use numbers as align parameters for MSVC
  2025-02-26 21:21 ` [PATCH v3 0/3] allow faster instruction sets to be used with MSVC Andre Muezerie
  2025-02-26 21:21   ` [PATCH v3 1/3] eal: make compatible with instruction set updates for MSVC Andre Muezerie
@ 2025-02-26 21:21   ` Andre Muezerie
  2025-02-27  8:39     ` Konstantin Ananyev
  2025-02-26 21:21   ` [PATCH v3 3/3] config: create top level variable cc_avx2_flags Andre Muezerie
  2025-03-06 13:57   ` [PATCH v3 0/3] allow faster instruction sets to be used with MSVC David Marchand
  3 siblings, 1 reply; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26 21:21 UTC (permalink / raw)
  To: andremue; +Cc: dev

After the instruction set updates for MSVC the error below popped up:

..\lib\eal\x86\include\rte_vect.h(82): error C2059: syntax error: '('

The issue is that MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE).
It only accepts numbers that are power of 2. So, even though
RTE_X86_ZMM_SIZE represents a number that is a power of two it cannot
be used directly.
https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 lib/eal/x86/include/rte_vect.h | 11 ++++++++++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/lib/eal/x86/include/rte_vect.h b/lib/eal/x86/include/rte_vect.h
index 70c78e9b77..9ea158b27e 100644
--- a/lib/eal/x86/include/rte_vect.h
+++ b/lib/eal/x86/include/rte_vect.h
@@ -76,9 +76,18 @@ __extension__ ({                \
 
 #ifdef __AVX512F__
 
-#define RTE_X86_ZMM_SIZE	(sizeof(__m512i))
+#define RTE_X86_ZMM_SIZE        64
 #define RTE_X86_ZMM_MASK	(RTE_X86_ZMM_SIZE - 1)
 
+/*
+ * MSVC does not allow __rte_aligned(sizeof(__m512i)). It only accepts
+ * numbers that are power of 2. So, even though sizeof(__m512i) represents a
+ * number that is a power of two it cannot be used directly.
+ * Ref: https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170
+ * The static assert below ensures that the hardcoded value defined as
+ * RTE_X86_ZMM_SIZE is equal to sizeof(__m512i).
+ */
+static_assert(RTE_X86_ZMM_SIZE == (sizeof(__m512i)), "Unexpected size of __m512i");
 typedef union __rte_aligned(RTE_X86_ZMM_SIZE) __rte_x86_zmm {
 	__m512i	 z;
 	ymm_t    y[RTE_X86_ZMM_SIZE / sizeof(ymm_t)];
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH v3 3/3] config: create top level variable cc_avx2_flags
  2025-02-26 21:21 ` [PATCH v3 0/3] allow faster instruction sets to be used with MSVC Andre Muezerie
  2025-02-26 21:21   ` [PATCH v3 1/3] eal: make compatible with instruction set updates for MSVC Andre Muezerie
  2025-02-26 21:21   ` [PATCH v3 2/3] eal: only use numbers as align parameters " Andre Muezerie
@ 2025-02-26 21:21   ` Andre Muezerie
  2025-03-06 13:57   ` [PATCH v3 0/3] allow faster instruction sets to be used with MSVC David Marchand
  3 siblings, 0 replies; 34+ messages in thread
From: Andre Muezerie @ 2025-02-26 21:21 UTC (permalink / raw)
  To: andremue; +Cc: dev

Top level variable cc_avx2_flags was created and set according to the
compiler used. It is used in lower level directories like drivers/net
and lib/acl.

Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
Acked-by: Bruce Richardson <bruce.richardson@intel.com>
---
 config/x86/meson.build             | 6 ++++++
 drivers/net/bnxt/meson.build       | 2 +-
 drivers/net/enic/meson.build       | 2 +-
 drivers/net/intel/i40e/meson.build | 2 +-
 drivers/net/intel/iavf/meson.build | 2 +-
 drivers/net/intel/ice/meson.build  | 2 +-
 drivers/net/intel/idpf/meson.build | 2 +-
 drivers/net/nfp/meson.build        | 2 +-
 drivers/net/octeon_ep/meson.build  | 4 ++--
 lib/acl/meson.build                | 2 +-
 10 files changed, 16 insertions(+), 10 deletions(-)

diff --git a/config/x86/meson.build b/config/x86/meson.build
index 47a5b0c04a..941d4d921f 100644
--- a/config/x86/meson.build
+++ b/config/x86/meson.build
@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2017-2020 Intel Corporation
 
+if is_ms_compiler
+    cc_avx2_flags = ['/arch:AVX2']
+else
+    cc_avx2_flags = ['-mavx2']
+endif
+
 # get binutils version for the workaround of Bug 97
 binutils_ok = true
 if is_linux or cc.get_id() == 'gcc'
diff --git a/drivers/net/bnxt/meson.build b/drivers/net/bnxt/meson.build
index e26cf13a65..fd82d0c409 100644
--- a/drivers/net/bnxt/meson.build
+++ b/drivers/net/bnxt/meson.build
@@ -65,7 +65,7 @@ if arch_subdir == 'x86'
                 static_rte_bus_pci,
                 static_rte_kvargs, static_rte_hash],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
      objs += bnxt_avx2_lib.extract_objects('bnxt_rxtx_vec_avx2.c')
 elif arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
     sources += files('bnxt_rxtx_vec_neon.c')
diff --git a/drivers/net/enic/meson.build b/drivers/net/enic/meson.build
index 1e26338350..cfe5ec170a 100644
--- a/drivers/net/enic/meson.build
+++ b/drivers/net/enic/meson.build
@@ -38,7 +38,7 @@ if dpdk_conf.has('RTE_ARCH_X86_64')
             'enic_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev, static_rte_bus_pci],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += enic_avx2_lib.extract_objects('enic_rxtx_vec_avx2.c')
 endif
 
diff --git a/drivers/net/intel/i40e/meson.build b/drivers/net/intel/i40e/meson.build
index ffa40c5d64..2973ed1a01 100644
--- a/drivers/net/intel/i40e/meson.build
+++ b/drivers/net/intel/i40e/meson.build
@@ -49,7 +49,7 @@ if arch_subdir == 'x86'
             'i40e_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev, static_rte_kvargs, static_rte_hash],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += i40e_avx2_lib.extract_objects('i40e_rxtx_vec_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/intel/iavf/meson.build b/drivers/net/intel/iavf/meson.build
index 19cd1cfbc8..f7eac7c57a 100644
--- a/drivers/net/intel/iavf/meson.build
+++ b/drivers/net/intel/iavf/meson.build
@@ -37,7 +37,7 @@ if arch_subdir == 'x86'
             'iavf_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += iavf_avx2_lib.extract_objects('iavf_rxtx_vec_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/intel/ice/meson.build b/drivers/net/intel/ice/meson.build
index a34b7c966a..cbdf38c1c4 100644
--- a/drivers/net/intel/ice/meson.build
+++ b/drivers/net/intel/ice/meson.build
@@ -43,7 +43,7 @@ if arch_subdir == 'x86'
             'ice_rxtx_vec_avx2.c',
             dependencies: [static_rte_ethdev, static_rte_hash],
             include_directories: includes,
-            c_args: [cflags, '-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += ice_avx2_lib.extract_objects('ice_rxtx_vec_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/intel/idpf/meson.build b/drivers/net/intel/idpf/meson.build
index 802b13035b..4b272d02b1 100644
--- a/drivers/net/intel/idpf/meson.build
+++ b/drivers/net/intel/idpf/meson.build
@@ -23,7 +23,7 @@ if arch_subdir == 'x86' and dpdk_conf.get('RTE_IOVA_IN_MBUF') == 1
         'idpf_common_rxtx_avx2.c',
         dependencies: [static_rte_ethdev, static_rte_hash],
         include_directories: includes,
-        c_args: [cflags, '-mavx2'])
+        c_args: [cflags, cc_avx2_flags])
     objs += idpf_avx2_lib.extract_objects('idpf_common_rxtx_avx2.c')
 
     if cc_has_avx512
diff --git a/drivers/net/nfp/meson.build b/drivers/net/nfp/meson.build
index 39762bd45a..0a12b7dce7 100644
--- a/drivers/net/nfp/meson.build
+++ b/drivers/net/nfp/meson.build
@@ -61,7 +61,7 @@ if arch_subdir == 'x86'
                 avx2_sources,
                 dependencies: [static_rte_ethdev, static_rte_bus_pci],
                 include_directories: includes,
-                c_args: [cflags, '-mavx2']
+                c_args: [cflags, cc_avx2_flags]
         )
 
         objs += nfp_avx2_lib.extract_all_objects(recursive: true)
diff --git a/drivers/net/octeon_ep/meson.build b/drivers/net/octeon_ep/meson.build
index d5d40b23a1..1b34db3edc 100644
--- a/drivers/net/octeon_ep/meson.build
+++ b/drivers/net/octeon_ep/meson.build
@@ -18,13 +18,13 @@ if arch_subdir == 'x86'
     if cc.get_define('__AVX2__', args: machine_args) != ''
         cflags += ['-DCC_AVX2_SUPPORT']
         sources += files('cnxk_ep_rx_avx.c')
-    elif cc.has_argument('-mavx2')
+    elif cc.has_multi_arguments(cc_avx2_flags)
         cflags += ['-DCC_AVX2_SUPPORT']
         otx_ep_avx2_lib = static_library('otx_ep_avx2_lib',
                         'cnxk_ep_rx_avx.c',
                         dependencies: [static_rte_ethdev, static_rte_pci, static_rte_bus_pci],
                         include_directories: includes,
-                        c_args: [cflags, '-mavx2'])
+                        c_args: [cflags, cc_avx2_flags])
         objs += otx_ep_avx2_lib.extract_objects('cnxk_ep_rx_avx.c')
     endif
 endif
diff --git a/lib/acl/meson.build b/lib/acl/meson.build
index fefe131a48..a28d70c23f 100644
--- a/lib/acl/meson.build
+++ b/lib/acl/meson.build
@@ -19,7 +19,7 @@ if dpdk_conf.has('RTE_ARCH_X86')
     avx2_tmplib = static_library('avx2_tmp',
             'acl_run_avx2.c',
             dependencies: static_rte_eal,
-            c_args: cflags + ['-mavx2'])
+            c_args: [cflags, cc_avx2_flags])
     objs += avx2_tmplib.extract_objects('acl_run_avx2.c')
 
     # compile AVX512 version if:
-- 
2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* RE: [PATCH v3 2/3] eal: only use numbers as align parameters for MSVC
  2025-02-26 21:21   ` [PATCH v3 2/3] eal: only use numbers as align parameters " Andre Muezerie
@ 2025-02-27  8:39     ` Konstantin Ananyev
  0 siblings, 0 replies; 34+ messages in thread
From: Konstantin Ananyev @ 2025-02-27  8:39 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev



> 
> After the instruction set updates for MSVC the error below popped up:
> 
> ..\lib\eal\x86\include\rte_vect.h(82): error C2059: syntax error: '('
> 
> The issue is that MSVC does not allow __rte_aligned(RTE_X86_ZMM_SIZE).
> It only accepts numbers that are power of 2. So, even though
> RTE_X86_ZMM_SIZE represents a number that is a power of two it cannot
> be used directly.
> https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170
> 
> Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> Acked-by: Bruce Richardson <bruce.richardson@intel.com>
> ---
>  lib/eal/x86/include/rte_vect.h | 11 ++++++++++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/eal/x86/include/rte_vect.h b/lib/eal/x86/include/rte_vect.h
> index 70c78e9b77..9ea158b27e 100644
> --- a/lib/eal/x86/include/rte_vect.h
> +++ b/lib/eal/x86/include/rte_vect.h
> @@ -76,9 +76,18 @@ __extension__ ({                \
> 
>  #ifdef __AVX512F__
> 
> -#define RTE_X86_ZMM_SIZE	(sizeof(__m512i))
> +#define RTE_X86_ZMM_SIZE        64
>  #define RTE_X86_ZMM_MASK	(RTE_X86_ZMM_SIZE - 1)
> 
> +/*
> + * MSVC does not allow __rte_aligned(sizeof(__m512i)). It only accepts
> + * numbers that are power of 2. So, even though sizeof(__m512i) represents a
> + * number that is a power of two it cannot be used directly.
> + * Ref: https://learn.microsoft.com/en-us/cpp/cpp/align-cpp?view=msvc-170
> + * The static assert below ensures that the hardcoded value defined as
> + * RTE_X86_ZMM_SIZE is equal to sizeof(__m512i).
> + */
> +static_assert(RTE_X86_ZMM_SIZE == (sizeof(__m512i)), "Unexpected size of __m512i");
>  typedef union __rte_aligned(RTE_X86_ZMM_SIZE) __rte_x86_zmm {
>  	__m512i	 z;
>  	ymm_t    y[RTE_X86_ZMM_SIZE / sizeof(ymm_t)];
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

> 2.48.1.vfs.0.0


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH v3 0/3] allow faster instruction sets to be used with MSVC
  2025-02-26 21:21 ` [PATCH v3 0/3] allow faster instruction sets to be used with MSVC Andre Muezerie
                     ` (2 preceding siblings ...)
  2025-02-26 21:21   ` [PATCH v3 3/3] config: create top level variable cc_avx2_flags Andre Muezerie
@ 2025-03-06 13:57   ` David Marchand
  3 siblings, 0 replies; 34+ messages in thread
From: David Marchand @ 2025-03-06 13:57 UTC (permalink / raw)
  To: Andre Muezerie; +Cc: dev

On Wed, Feb 26, 2025 at 10:21 PM Andre Muezerie
<andremue@linux.microsoft.com> wrote:
>
> Meson.build files were refactored to add MSVC support for the AVX2
> instruction set with MSVC: Top level variable cc_avx2_flags was created
> and set according to the compiler used. It is used in lower level
> directories.
>
> Andre Muezerie (3):
>   eal: make compatible with instruction set updates for MSVC
>   eal: only use numbers as align parameters for MSVC
>   config: create top level variable cc_avx2_flags
>
>  config/x86/meson.build             |  6 ++++++
>  drivers/net/bnxt/meson.build       |  2 +-
>  drivers/net/enic/meson.build       |  2 +-
>  drivers/net/intel/i40e/meson.build |  2 +-
>  drivers/net/intel/iavf/meson.build |  2 +-
>  drivers/net/intel/ice/meson.build  |  2 +-
>  drivers/net/intel/idpf/meson.build |  2 +-
>  drivers/net/nfp/meson.build        |  2 +-
>  drivers/net/octeon_ep/meson.build  |  4 ++--
>  lib/acl/meson.build                |  2 +-
>  lib/eal/common/rte_random.c        |  2 ++
>  lib/eal/x86/include/rte_vect.h     | 11 ++++++++++-
>  12 files changed, 28 insertions(+), 11 deletions(-)

Series applied, thanks André.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2025-03-06 13:57 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-02-24 21:01 [PATCH 0/6] allow faster instruction sets to be used with MSVC Andre Muezerie
2025-02-24 21:01 ` [PATCH 1/6] eal: make compatible with instruction set updates for MSVC Andre Muezerie
2025-02-24 21:01 ` [PATCH 2/6] eal: only use numbers as align parameters " Andre Muezerie
2025-02-24 21:01 ` [PATCH 3/6] config: allow faster instruction sets to be used with MSVC Andre Muezerie
2025-02-25 14:28   ` Bruce Richardson
2025-02-26  2:01     ` Andre Muezerie
2025-02-26  9:44       ` Bruce Richardson
2025-02-24 21:01 ` [PATCH 4/6] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie
2025-02-25  9:06   ` Bruce Richardson
2025-02-25 16:44     ` Andre Muezerie
2025-02-24 21:01 ` [PATCH 5/6] acl: " Andre Muezerie
2025-02-25  9:03   ` Bruce Richardson
2025-02-25 16:37     ` Andre Muezerie
2025-02-25 17:21       ` Bruce Richardson
2025-02-25 17:23         ` Andre Muezerie
2025-02-24 21:01 ` [PATCH 6/6] member: " Andre Muezerie
2025-02-26  1:06 ` [PATCH v2 0/5] allow faster instruction sets to be used with MSVC Andre Muezerie
2025-02-26  1:06   ` [PATCH v2 1/5] eal: make compatible with instruction set updates for MSVC Andre Muezerie
2025-02-26  9:50     ` Bruce Richardson
2025-02-26  1:06   ` [PATCH v2 2/5] eal: only use numbers as align parameters " Andre Muezerie
2025-02-26  9:51     ` Bruce Richardson
2025-02-26 10:15     ` Konstantin Ananyev
2025-02-26  1:06   ` [PATCH v2 3/5] config: create top level variable cc_avx2_flags Andre Muezerie
2025-02-26  9:52     ` Bruce Richardson
2025-02-26  1:06   ` [PATCH v2 4/5] drivers/net: make compatible with instruction set updates for MSVC Andre Muezerie
2025-02-26  9:53     ` Bruce Richardson
2025-02-26  1:06   ` [PATCH v2 5/5] acl: " Andre Muezerie
2025-02-26  9:54     ` Bruce Richardson
2025-02-26 21:21 ` [PATCH v3 0/3] allow faster instruction sets to be used with MSVC Andre Muezerie
2025-02-26 21:21   ` [PATCH v3 1/3] eal: make compatible with instruction set updates for MSVC Andre Muezerie
2025-02-26 21:21   ` [PATCH v3 2/3] eal: only use numbers as align parameters " Andre Muezerie
2025-02-27  8:39     ` Konstantin Ananyev
2025-02-26 21:21   ` [PATCH v3 3/3] config: create top level variable cc_avx2_flags Andre Muezerie
2025-03-06 13:57   ` [PATCH v3 0/3] allow faster instruction sets to be used with MSVC David Marchand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).