Re: [PATCH 1/2] config: allow AVX512 instructions to be used with MSVC

DPDK patches and discussions
 help / color / mirror / Atom feed

From: Andre Muezerie <andremue@linux.microsoft.com>
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>,
	Yipeng Wang <yipeng1.wang@intel.com>,
	Sameh Gobriel <sameh.gobriel@intel.com>,
	dev@dpdk.org
Subject: Re: [PATCH 1/2] config: allow AVX512 instructions to be used with MSVC
Date: Fri, 28 Feb 2025 16:47:55 -0800	[thread overview]
Message-ID: <20250301004755.GB2378@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net> (raw)
In-Reply-To: <Z8HIngY9yOjI3tTQ@bricha3-mobl1.ger.corp.intel.com>

On Fri, Feb 28, 2025 at 02:30:54PM +0000, Bruce Richardson wrote:
> On Thu, Feb 27, 2025 at 05:52:16PM -0800, Andre Muezerie wrote:
> > Up to now MSVC has being used with the default mode, which uses SSE2
> > instructions for scalar floating-point and vector calculations.
> > https://learn.microsoft.com/en-us/cpp/build/reference/arch-x64?view=msvc-170
> > 
> > This patch allows users to specify the CPU for which the generated
> > code should be optimized for in the same way it's done for GCC: by
> > passing the CPU name.
> > When no explicit CPU name is passed, 'native' is assumed (like it
> > happens with GCC) and the code will be optimized for the same CPU
> > type used to compile the code.
> > 
> > MSVC does not provide this functionality natively, so logic was
> > added to a new meson.build file under config/x86/msvc to handle
> > these differences, detecting which
> > instruction sets are supported by the CPU(s), passing the best
> > options to MSVC and setting the correct macros (like __AVX512F__)
> > so that the DPDK code can rely on them like it is done with GCC.
> > 
> > Signed-off-by: Andre Muezerie <andremue@linux.microsoft.com>
> 
> Thanks for splitting out this change from the earlier one, it allows more
> focused review.
> However, I think within this, we can split things a bit further two, and I
> think this patch could do with being split into separate changes.
> Specifically:
> * one patch for reordering the x86/meson.build file, i.e. just move the
>   code about and put a subdir_done() for MSVC once common part is
>   completed. That patch is a quick review since all you are doing is moving
>   things about.
> * separate patch for adding the new msvc file +  one-line change to add it
>   to the x86/meson.build file.
> * library changes in separate patch - or perhaps two patches so individual
>   maintainers can ack.
> 

I split them as suggested. In that process I found an old bug for which I sent
a patch as well.

> Some other comments inline below too.
> Thanks,
> /Bruce
> 
> > ---
> >  config/x86/meson.build      |  87 +++++------

<snip>

> > -
> > -cpu_instruction_set = get_option('cpu_instruction_set')
> > -if cpu_instruction_set == 'native'
> > -    foreach m:epyc_zen_cores.keys()
> > -        if cc.get_define(m, args: machine_args) != ''
> > -            dpdk_conf.set('RTE_MAX_LCORE', epyc_zen_cores[m])
> > -            break
> > -        endif
> > -    endforeach
> > -else
> > -    foreach m:epyc_zen_cores.keys()
> > -        if m.contains(cpu_instruction_set)
> > -            dpdk_conf.set('RTE_MAX_LCORE', epyc_zen_cores[m])
> > -            break
> > -        endif
> > -    endforeach
> > -endif
> > -
> > -dpdk_conf.set('RTE_MAX_NUMA_NODES', 32)
> 
> For changes to config/x86/meson.build you can add my Ack if it's a separate
> patch.

Done, and added the ACK.

> 
> > diff --git a/config/x86/msvc/meson.build b/config/x86/msvc/meson.build
> > new file mode 100644
> > index 0000000000..646c9a8515
> > --- /dev/null
> > +++ b/config/x86/msvc/meson.build
> > @@ -0,0 +1,287 @@

<snip>

> > +            'VPCLMULQDQ',
> > +    ]
> > +    foreach f:optional_flags
> > +        result = cc.run(cpuid_code, args: '-D@0@'.format(f),
> > +                        name: 'instruction set @0@'.format(f))
> 
> Is building a new binary for each instruction set the best way to do this?
> Would it not be better to have a single binary that outputs all the
> instruction sets in one go?

Building a new binary every time is probably the simplest approach. It also
makes teh code easy to read and understand. It is fast, and since it will
probably not be run more than once per DPDK release there's not much incentive
to come up with something better.

> 
> > +        has_instr_set = result.returncode() == 0 and result.stdout() == '1'
> > +        if has_instr_set
> > +            cpu_flags += f
> > +        endif
> > +        message('Target has @0@: @1@'.format(f, has_instr_set))
> > -                c_args: cflags +
> > -                    ['-mavx512f', '-mavx512vl',
> > -                     '-mavx512cd', '-mavx512bw'])
> > +                c_args: cflags + cc_avx512_flags)
> >              objs += avx512_tmplib.extract_objects(
> >                      'acl_run_avx512.c')
> >              cflags += '-DCC_AVX512_SUPPORT'
> 
> Ack from me for this change too.

I forgot to add the ACK.

> 
> > diff --git a/lib/member/meson.build b/lib/member/meson.build
> > index f92cbb7f25..8416dc6f8a 100644
> > --- a/lib/member/meson.build
> > +++ b/lib/member/meson.build
> > @@ -33,6 +33,12 @@ if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok
> >      # compiler flags, and then have the .o file from static lib
> >      # linked into main lib.
> >  
> > +    if is_ms_compiler
> > +        member_avx512_args = cc_avx512_flags
> > +    else
> > +        member_avx512_args = ['-mavx512f', '-mavx512dq', '-mavx512ifma']
> > +    endif
> > +
> 
> is ifma included as msvc as part of the AVX512 flags? How does this work
> for the windows build?

Option /arch:AVX512 does not set __AVX512IFMA__ so, my educated guess is
that msvc does not automatically generate instructions from that set.

There's also no msvc option specific for AVX512IFMA.

That being said, compiler intrinsics would still work as these do not require
any options to be passed to msvc. This is other difference when compared
to GCC.

Relying on cc_avx512_flags for msvc would still bring the benefit of using
other AVX512 (non-IFMA) instructions.

> 
> >      # check if all required flags already enabled
> >      sketch_avx512_flags = ['__AVX512F__', '__AVX512DQ__', '__AVX512IFMA__']
> >  
> > @@ -46,13 +52,12 @@ if dpdk_conf.has('RTE_ARCH_X86_64') and binutils_ok
> >      if sketch_avx512_on == true
> >          cflags += ['-DCC_AVX512_SUPPORT']
> >          sources += files('rte_member_sketch_avx512.c')
> > -    elif cc.has_multi_arguments('-mavx512f', '-mavx512dq', '-mavx512ifma')
> > +    elif cc.has_multi_arguments(member_avx512_args)
> >          sketch_avx512_tmp = static_library('sketch_avx512_tmp',
> >              'rte_member_sketch_avx512.c',
> >              include_directories: includes,
> >              dependencies: [static_rte_eal, static_rte_hash],
> > -            c_args: cflags +
> > -                ['-mavx512f', '-mavx512dq', '-mavx512ifma'])
> > +            c_args: cflags + member_avx512_args)
> >          objs += sketch_avx512_tmp.extract_objects('rte_member_sketch_avx512.c')
> >          cflags += ['-DCC_AVX512_SUPPORT']
> >      endif
> > -- 
> > 2.48.1.vfs.0.0
> >

next prev parent reply	other threads:[~2025-03-01  0:48 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-02-28  1:52 [PATCH 0/2] " Andre Muezerie
2025-02-28  1:52 ` [PATCH 1/2] config: " Andre Muezerie
2025-02-28 14:30   ` Bruce Richardson
2025-03-01  0:47     ` Andre Muezerie [this message]
2025-02-28  1:52 ` [PATCH 2/2] devtools/dump-cpu-flags: add tool to update CPU flags table Andre Muezerie
2025-02-28 14:33   ` Bruce Richardson
2025-03-01  0:38     ` Andre Muezerie
2025-02-28 21:43 ` [PATCH v2 0/2] allow AVX512 instructions to be used with MSVC Andre Muezerie
2025-02-28 21:43   ` [PATCH v2 1/2] config: " Andre Muezerie
2025-02-28 21:43   ` [PATCH v2 2/2] devtools/dump-cpu-flags: add tool to update CPU flags table Andre Muezerie
2025-03-06 19:08 ` [PATCH v3 0/2] allow AVX512 instructions to be used with MSVC Andre Muezerie
2025-03-06 19:08   ` [PATCH v3 1/2] config: " Andre Muezerie
2025-03-06 19:08   ` [PATCH v3 2/2] devtools/dump-cpu-flags: add tool to update CPU flags table Andre Muezerie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20250301004755.GB2378@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net \
    --to=andremue@linux.microsoft.com \
    --cc=bruce.richardson@intel.com \
    --cc=dev@dpdk.org \
    --cc=konstantin.v.ananyev@yandex.ru \
    --cc=sameh.gobriel@intel.com \
    --cc=yipeng1.wang@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).