DPDK patches and discussions
* [dpdk-dev] How to disable SVE auto vectorization while using GCC
@ 2021-04-30 11:57 fengchengwen
  2021-04-30 15:11 ` Jerin Jacob
  0 siblings, 1 reply; 10+ messages in thread
From: fengchengwen @ 2021-04-30 11:57 UTC (permalink / raw)
  To: dev; +Cc: jerinj, ruifeng.wang, humin29

Hi all,
We have a question and would appreciate your help:
  1. We have two platforms, both ARM64: one supports both NEON and SVE,
     the other supports only NEON.
  2. We want to run a single binary on both platforms and use the highest
     vector capability of each platform whenever possible.
  3. So we built the DPDK program with -march=armv8-a+sve+crc (GCC 10.2).
     However, the program hits illegal instructions when it runs on the
     machine that does not support SVE (please see below).
  4. The problem is caused by GCC's automatic vectorization emitting SVE
     instructions.

  So is there a way to disable GCC's automatic vectorization, or to make it
  use only NEON?

  BTW: we already tested -fno-tree-vectorize (see the link below) but it had no effect.
  https://stackoverflow.com/questions/7778174/how-can-i-disable-vectorization-while-using-gcc


The GDB output:
     EAL: Detected 128 lcore(s)
     EAL: Detected 4 NUMA nodes
     Option -w, --pci-whitelist is deprecated, use -a, --allow option instead

     Program received signal SIGILL, Illegal instruction.
     0x0000000000671b88 in eal_adjust_config ()
     (gdb)
     (gdb) where
     #0  0x0000000000671b88 in eal_adjust_config ()
     #1  0x0000000000682840 in rte_eal_init ()
     #2  0x000000000051c870 in main ()
     (gdb)

The disassembly output of eal_adjust_config:
     671b7c:       f8237a81        str     x1, [x20, x3, lsl #3]
     671b80:       f110001f        cmp     x0, #0x400
     671b84:       54ffff21        b.ne    671b68 <eal_adjust_config+0x1f4>  // b.any
     671b88:       043357f5        addvl   x21, x19, #-1
     671b8c:       043457e1        addvl   x1, x20, #-1
     671b90:       910562b5        add     x21, x21, #0x158
     671b94:       04e0e3e0        cntd    x0
     671b98:       914012b5        add     x21, x21, #0x4, lsl #12
     671b9c:       52800218        mov     w24, #0x10                      // #16
     671ba0:       25d8e3e1        ptrue   p1.d
     671ba4:       25f80fe0        whilelo p0.d, wzr, w24
     671ba8:       a5e04020        ld1d    {z0.d}, p0/z, [x1, x0, lsl #3]


Best regards.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC
  2021-04-30 11:57 [dpdk-dev] How to disable SVE auto vectorization while using GCC fengchengwen
@ 2021-04-30 15:11 ` Jerin Jacob
  2021-04-30 16:09   ` Stephen Hemminger
  2021-04-30 20:54   ` Honnappa Nagarahalli
  0 siblings, 2 replies; 10+ messages in thread
From: Jerin Jacob @ 2021-04-30 15:11 UTC (permalink / raw)
  To: fengchengwen, Richardson, Bruce, Thomas Monjalon, David Marchand,
	Honnappa Nagarahalli, Stephen Hemminger, Ananyev, Konstantin
  Cc: dev, Jerin Jacob, Ruifeng Wang (Arm Technology China), humin29

On Fri, Apr 30, 2021 at 5:27 PM fengchengwen <fengchengwen@huawei.com> wrote:
>
> Hi, ALL
> We have a question for your help:
>   1. We have two platforms, both of which are ARM64, one of which supports
>      both NEON and SVE, the other only support NEON.
>   2. We want to run on both platforms with a single binary file, and use the
>      highest vector capability of the corresponding platform whenever possible.

I see that VPP has a similar feature. IMO, it is not present in DPDK.
Basically, in order to do this:
- Compile the slow-path code (90% of DPDK) with minimal CPU instruction set support.
- Compile the fast-path functions at different CPU instruction set levels.
- In the slow path, attach the fast-path function pointer based on the CPU's
  instruction-level support.



* Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC
  2021-04-30 15:11 ` Jerin Jacob
@ 2021-04-30 16:09   ` Stephen Hemminger
  2021-05-08 19:17     ` Honnappa Nagarahalli
  2021-04-30 20:54   ` Honnappa Nagarahalli
  1 sibling, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2021-04-30 16:09 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: fengchengwen, Richardson, Bruce, Thomas Monjalon, David Marchand,
	Honnappa Nagarahalli, Stephen Hemminger, Ananyev, Konstantin,
	dev, Jerin Jacob, Ruifeng Wang (Arm Technology China),
	humin29

On Fri, 30 Apr 2021 20:41:13 +0530
Jerin Jacob <jerinjacobk@gmail.com> wrote:

> On Fri, Apr 30, 2021 at 5:27 PM fengchengwen <fengchengwen@huawei.com> wrote:
> >
> > Hi, ALL
> > We have a question for your help:
> >   1. We have two platforms, both of which are ARM64, one of which supports
> >      both NEON and SVE, the other only support NEON.
> >   2. We want to run on both platforms with a single binary file, and use the
> >      highest vector capability of the corresponding platform whenever possible.  
> 
> I see VPP has a similar feature. IMO, it is not present in DPDK.
> Basically, In order to do this.
> - Compile slow-path code(90% of DPDK) with minimal CPU instruction set support
> - Have fastpath function compile with different CPU instruction set levels
> -In slowpath, Attach the fastpath function pointer-based on CPU
> instruction-level support.
> 

Is there a way to use GCC function multiversioning for this?
https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html

Not sure whether this is available in all compiler versions that DPDK
claims to support. It looks like it made it into GCC 6 and LLVM 7.


* Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC
  2021-04-30 15:11 ` Jerin Jacob
  2021-04-30 16:09   ` Stephen Hemminger
@ 2021-04-30 20:54   ` Honnappa Nagarahalli
  2021-05-08  3:23     ` fengchengwen
  1 sibling, 1 reply; 10+ messages in thread
From: Honnappa Nagarahalli @ 2021-04-30 20:54 UTC (permalink / raw)
  To: Jerin Jacob, fengchengwen, Richardson, Bruce, thomas,
	David Marchand, Stephen Hemminger, Ananyev, Konstantin
  Cc: dev, jerinj, Ruifeng Wang, humin29, nd, Honnappa Nagarahalli, nd

<snip>

> 
> On Fri, Apr 30, 2021 at 5:27 PM fengchengwen
> <fengchengwen@huawei.com> wrote:
> >
> > Hi, ALL
> > We have a question for your help:
> >   1. We have two platforms, both of which are ARM64, one of which
> supports
> >      both NEON and SVE, the other only support NEON.
> >   2. We want to run on both platforms with a single binary file, and use the
> >      highest vector capability of the corresponding platform whenever
> possible.
> 
> I see VPP has a similar feature. IMO, it is not present in DPDK.
> Basically, In order to do this.
> - Compile slow-path code(90% of DPDK) with minimal CPU instruction set
>   support
> - Have fastpath function compile with different CPU instruction set levels
> - In slowpath, Attach the fastpath function pointer-based on CPU
>   instruction-level support.
Agree.

> 
> 
> >   3. So we build the DPDK program with -march=armv8-a+sve+crc (GCC
> 10.2).
This defines the minimum capabilities of the target machine.

> >      However, it is found that invalid instructions occur when the program
> >      runs on a machine that does not support SVE (pls see below).
> >   4. The problem is caused by the introduction of SVE in GCC automatic
> vector
> >      optimization.
> >
> >   So Is there a way to disable GCC automatic vector optimization or use only
> >   NEON to perform automatic vector optimization?
I do not think this is safe. Once SVE is enabled, the compiler is allowed to use SVE instructions wherever it sees fit.


* Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC
  2021-04-30 20:54   ` Honnappa Nagarahalli
@ 2021-05-08  3:23     ` fengchengwen
  2021-05-08 18:46       ` Honnappa Nagarahalli
  0 siblings, 1 reply; 10+ messages in thread
From: fengchengwen @ 2021-05-08  3:23 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Jerin Jacob, Richardson, Bruce, thomas,
	David Marchand, Stephen Hemminger, Ananyev, Konstantin
  Cc: dev, jerinj, Ruifeng Wang, humin29, nd

Thanks for your suggestions. We found that the -fno-tree-vectorize option does work.
PS: The option was not successfully applied in our earliest test.

Solution:
1. Use the -fno-tree-vectorize option to prevent the compiler from generating
   auto-vectorized code, so that the slow path works fine.
2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in arm/meson.build:
        'part_number_config': {
                'generic': {'machine_args': ['-march=armv8-a+crc',
                                             '-march=armv8-a+sve+crc',
                                             '-moutline-atomics']}
        }
   If the compiler doesn't support '-march=armv8-a+sve+crc', the build falls
   back to '-march=armv8-a+crc'.
   If the compiler supports '-march=armv8-a+sve+crc', the SVE-related code is
   compiled, so the IO path can support SVE.

Based on the above, we can achieve our initial goal.



* Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC
  2021-05-08  3:23     ` fengchengwen
@ 2021-05-08 18:46       ` Honnappa Nagarahalli
  2021-05-11 11:23         ` fengchengwen
  0 siblings, 1 reply; 10+ messages in thread
From: Honnappa Nagarahalli @ 2021-05-08 18:46 UTC (permalink / raw)
  To: fengchengwen, Jerin Jacob, Richardson, Bruce, thomas,
	David Marchand, Stephen Hemminger, Ananyev, Konstantin
  Cc: dev, jerinj, Ruifeng Wang, humin29, nd, Honnappa Nagarahalli, nd

<snip>

> 
> Thanks for your suggestions. We found that the -fno-tree-vectorize option
> does work.
> PS: The option was not successfully applied in our earliest test.
>
> Solution:
> 1. Use the -fno-tree-vectorize option to prevent the compiler from
>    generating auto-vectorized code, so that the slow path works fine.
> 2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in
>    arm/meson.build:
>         'part_number_config': {
>                 'generic': {'machine_args': ['-march=armv8-a+crc',
>                                              '-march=armv8-a+sve+crc',
>                                              '-moutline-atomics']}
>         }
>    If the compiler doesn't support '-march=armv8-a+sve+crc', the build falls
>    back to '-march=armv8-a+crc'.
>    If the compiler supports '-march=armv8-a+sve+crc', the SVE-related code
>    is compiled, so the IO path can support SVE.
>
> Based on the above, we can achieve our initial goal.
The 'generic' target is for generating a binary that works on all Armv8 machines. If you build with '-march=armv8-a+sve+crc', the IO path will not work on non-SVE machines.


* Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC
  2021-04-30 16:09   ` Stephen Hemminger
@ 2021-05-08 19:17     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 10+ messages in thread
From: Honnappa Nagarahalli @ 2021-05-08 19:17 UTC (permalink / raw)
  To: Stephen Hemminger, Jerin Jacob
  Cc: fengchengwen, Richardson, Bruce, thomas, David Marchand,
	Stephen Hemminger, Ananyev, Konstantin, dev, jerinj,
	Ruifeng Wang, humin29, nd, Honnappa Nagarahalli, nd

<snip>

> 
> Is there a way to use Gcc function multiversioning for this?
> https://gcc.gnu.org/onlinedocs/gcc/Function-Multiversioning.html
> 
> Not sure if this is only available on all compiler versions that DPDK claims to
> support. It looks like it made into GCC 6 and LLVM 7
It looks like it is not fully supported on Arm. For example, 'target_clones' is not supported and the automatic dispatcher does not seem to be supported either, so we need to write our own dispatcher. The following code works and should be sufficient for DPDK. There is no need to pass the SVE flag to the compiler on the command line. I do not have a machine with SVE, so the SVE part is not tested.

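#include <stdio.h>
#include <sys/auxv.h>	/* getauxval(), AT_HWCAP */

/* Fallback for older headers: SVE is bit 22 of AT_HWCAP on arm64 Linux. */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1 << 22)
#endif

/* Baseline variant: built for plain Armv8-A with CRC. */
__attribute__((target ("arch=armv8-a+crc")))
int foo_neon (void)
{
  printf ("Neon\n");
  return 1;
}

/* SVE variant: only this function is compiled with SVE enabled. */
__attribute__((target ("arch=armv8-a+sve")))
int foo_sve (void)
{
  printf ("SVE\n");
  return 2;
}

/*
 * The following code can go into IO function selection in DPDK during
 * initialization.
 */
void
foo_selector (void)
{
  static int (*foo)(void);

  if (!foo) {
     /* The following code can use DPDK wrappers */
     foo = (getauxval(AT_HWCAP) & HWCAP_SVE) ? foo_sve : foo_neon;
  }

  foo ();
}

int main (void)
{
  foo_selector();
  return 0;
}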


* Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC
  2021-05-08 18:46       ` Honnappa Nagarahalli
@ 2021-05-11 11:23         ` fengchengwen
  2021-05-11 14:10           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 10+ messages in thread
From: fengchengwen @ 2021-05-11 11:23 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Jerin Jacob, Richardson, Bruce, thomas,
	David Marchand, Stephen Hemminger, Ananyev, Konstantin
  Cc: dev, jerinj, Ruifeng Wang, humin29, nd



On 2021/5/9 2:46, Honnappa Nagarahalli wrote:
> <snip>
> 
>>
>> Thanks for your suggestions. We found that the -fno-tree-vectorize option
>> does work.
>> PS: The option was not successfully applied in our earliest test.
>>
>> Solution:
>> 1. Use the -fno-tree-vectorize option to prevent the compiler from
>>    generating auto-vectorized code, so that the slow path works fine.
>> 2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in
>>    arm/meson.build:
>>         'part_number_config': {
>>                 'generic': {'machine_args': ['-march=armv8-a+crc',
>>                                              '-march=armv8-a+sve+crc',
>>                                              '-moutline-atomics']}
>>         }
>>    If the compiler doesn't support '-march=armv8-a+sve+crc', the build falls
>>    back to '-march=armv8-a+crc'.
>>    If the compiler supports '-march=armv8-a+sve+crc', the SVE-related code
>>    is compiled, so the IO path can support SVE.
>>
>> Based on the above, we can achieve our initial goal.
> The 'generic' target is for generating a binary that would work on all ArmV8 machines. If you are building with '-march=armv8-a+sve+crc', the IO-Path would not work on non-SVE machines.
> 

The 'generic' target is only used in our local CI (note: both platforms are ARMv8 machines).

In the IO path, we support NEON and SVE Rx/Tx. That code is written using ACLE, so it
is not affected by the -fno-tree-vectorize option.

If the compiler supports '-march=armv8-a+sve+crc', it compiles both the NEON and the SVE
related code.
At runtime, the driver detects whether the platform supports SVE; if it does not, the
driver selects NEON.

Best regards.

>>
>>
>> On 2021/5/1 4:54, Honnappa Nagarahalli wrote:
>>> <snip>
>>>
>>>>
>>>> On Fri, Apr 30, 2021 at 5:27 PM fengchengwen
>>>> <fengchengwen@huawei.com> wrote:
>>>>>
>>>>> Hi, ALL
>>>>> We have a question for your help:
>>>>>   1. We have two platforms, both of which are ARM64, one of which
>>>> supports
>>>>>      both NEON and SVE, the other only support NEON.
>>>>>   2. We want to run on both platforms with a single binary file, and use
>> the
>>>>>      highest vector capability of the corresponding platform
>>>>> whenever
>>>> possible.
>>>>
>>>> I see VPP has a similar feature. IMO, it is not present in DPDK.
>>>> Basically, In order to do this.
>>>> - Compile slow-path code(90% of DPDK) with minimal CPU instruction
>>>> set support
>>>> - Have fastpath function compile with different CPU instruction set
>>>> levels -In slowpath, Attach the fastpath function pointer-based on
>>>> CPU instruction- level support.
>>> Agree.
>>>
>>>>
>>>>
>>>>>   3. So we build the DPDK program with -march=armv8-a+sve+crc (GCC
>>>> 10.2).
>>> This defines the minimum capabilities of the target machine.
>>>
>>>>>      However, it is found that invalid instructions occur when the program
>>>>>      runs on a machine that does not support SVE (pls see below).
>>>>>   4. The problem is caused by the introduction of SVE in GCC
>>>>> automatic
>>>> vector
>>>>>      optimization.
>>>>>
>>>>>   So Is there a way to disable GCC automatic vector optimization or use
>> only
>>>>>   NEON to perform automatic vector optimization?
>>> I do not think this is safe. Once SVE is enabled, compiler is allowed to use
>> the SVE instructions wherever it finds it fit.
>>>
>>>>>
>>>>>   BTW: we already test -fno-tree-vectorize (as link below) but found
>>>>> no
>>>> effect.
>>>>>
>>>>> https://stackoverflow.com/questions/7778174/how-can-i-disable-vector
>>>>> iz
>>>>> ation-while-using-gcc
>>>>>
>>>>>
>>>>> The GDB output:
>>>>>      EAL: Detected 128 lcore(s)
>>>>>      EAL: Detected 4 NUMA nodes
>>>>>      Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
>>>>>
>>>>>      Program received signal SIGILL, Illegal instruction.
>>>>>      0x0000000000671b88 in eal_adjust_config ()
>>>>>      (gdb)
>>>>>      (gdb) where
>>>>>      #0  0x0000000000671b88 in eal_adjust_config ()
>>>>>      #1  0x0000000000682840 in rte_eal_init ()
>>>>>      #2  0x000000000051c870 in main ()
>>>>>      (gdb)
>>>>>
>>>>> The disassembly output of eal_adjust_config:
>>>>>      671b7c:       f8237a81        str     x1, [x20, x3, lsl #3]
>>>>>      671b80:       f110001f        cmp     x0, #0x400
>>>>>      671b84:       54ffff21        b.ne    671b68 <eal_adjust_config+0x1f4>  // b.any
>>>>>      671b88:       043357f5        addvl   x21, x19, #-1
>>>>>      671b8c:       043457e1        addvl   x1, x20, #-1
>>>>>      671b90:       910562b5        add     x21, x21, #0x158
>>>>>      671b94:       04e0e3e0        cntd    x0
>>>>>      671b98:       914012b5        add     x21, x21, #0x4, lsl #12
>>>>>      671b9c:       52800218        mov     w24, #0x10                      // #16
>>>>>      671ba0:       25d8e3e1        ptrue   p1.d
>>>>>      671ba4:       25f80fe0        whilelo p0.d, wzr, w24
>>>>>      671ba8:       a5e04020        ld1d    {z0.d}, p0/z, [x1, x0, lsl #3]
>>>>>
>>>>>
>>>>> Best regards.
>>>>>
> 



* Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC
  2021-05-11 11:23         ` fengchengwen
@ 2021-05-11 14:10           ` Honnappa Nagarahalli
  2021-05-12  8:47             ` fengchengwen
  0 siblings, 1 reply; 10+ messages in thread
From: Honnappa Nagarahalli @ 2021-05-11 14:10 UTC (permalink / raw)
  To: fengchengwen, Jerin Jacob, Richardson, Bruce, thomas,
	David Marchand, Stephen Hemminger, Ananyev, Konstantin
  Cc: dev, jerinj, Ruifeng Wang, humin29, nd, Honnappa Nagarahalli, nd

<snip>
> >
> >>
> >> Thanks for your suggestions, we found that the -fno-tree-vectorize
> >> option works.
> >> PS: This option was not successfully added in our earliest test.
> >>
> >> Solution:
> >> 1. Use the -fno-tree-vectorize option to prevent the compiler from
> >>    generating auto-vectorized code, so that the slow path will work fine.
> >> 2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in
> >>    arm/meson.build:
> >>         'part_number_config': {
> >>                 'generic': {'machine_args': ['-march=armv8-a+crc',
> >>                                              '-march=armv8-a+sve+crc',
> >>                                              '-moutline-atomics']}
> >>         }
> >>    If the compiler doesn't support '-march=armv8-a+sve+crc', it will
> >>    fall back to '-march=armv8-a+crc'.
> >>    If the compiler supports '-march=armv8-a+sve+crc', it will compile
> >>    the SVE-related code, so the IO path can support SVE.
> >>
> >> Based on the above, we can achieve our initial target.
> > The 'generic' target is for generating a binary that would work on all ArmV8
> > machines. If you are building with '-march=armv8-a+sve+crc', the IO-Path
> > would not work on non-SVE machines.
> >
>
> 'generic' is only used in local CI (note: the two platforms are both ARMv8
> machines).
>
> In the IO path, we support NEON and SVE Rx/Tx. The code is written with
> ACLE, so it is not affected by the -fno-tree-vectorize option.
>
> If the compiler supports '-march=armv8-a+sve+crc', it will compile both
> the NEON and SVE related code.
Using '-march=armv8-a+sve+crc' and '-fno-tree-vectorize' does not provide an absolute guarantee that the compiler will not use SVE elsewhere.

The safest way to ensure that only specific functions use SVE is to compile without +sve (e.g. using -march=armv8-a) and use pragmas around the functions that are allowed to use SVE.  Ex:

#pragma GCC push_options
#pragma GCC target ("+sve")
void f(int *x) {
	for (int i = 0; i < 100; ++i) x[i] = i;
}
#pragma GCC pop_options
void g(int *x) {
	for (int i = 0; i < 100; ++i) x[i] = i;
}

compiles f() using SVE and g() with standard options.

You can also follow the function multiversioning discussed in the other thread.
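
The pragma approach above can be combined with a runtime check so that the SVE variant is only ever called on hardware that reports SVE. The following is a minimal sketch, not DPDK code: the names fill_baseline, fill_sve, cpu_has_sve, select_fill and the SKETCH_SVE_BUILD macro are hypothetical, and it assumes Linux on AArch64 with a GCC that accepts the "+sve" target pragma. On any other platform it falls back to the baseline function.

```c
#include <stddef.h>

#if defined(__aarch64__) && defined(__linux__) && \
    defined(__GNUC__) && !defined(__clang__)
#include <sys/auxv.h>            /* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)    /* value from the Linux arm64 uapi hwcap.h */
#endif
#define SKETCH_SVE_BUILD 1       /* hypothetical macro, used only in this sketch */
#endif

typedef void (*fill_fn)(int *x, size_t n);

/* Built with the baseline options from the command line (no +sve). */
static void fill_baseline(int *x, size_t n)
{
	for (size_t i = 0; i < n; ++i)
		x[i] = (int)i;
}

#ifdef SKETCH_SVE_BUILD
/* Only this function is allowed to use SVE instructions. */
#pragma GCC push_options
#pragma GCC target ("+sve")
static void fill_sve(int *x, size_t n)
{
	for (size_t i = 0; i < n; ++i)
		x[i] = (int)i;
}
#pragma GCC pop_options
#endif

/* Returns 1 if the kernel reports SVE for this CPU, 0 otherwise. */
static int cpu_has_sve(void)
{
#ifdef SKETCH_SVE_BUILD
	return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
	return 0;
#endif
}

/* Slow-path code: pick the implementation once, then call through the pointer. */
static fill_fn select_fill(void)
{
#ifdef SKETCH_SVE_BUILD
	if (cpu_has_sve())
		return fill_sve;
#endif
	return fill_baseline;
}
```

A caller would do something like `fill_fn fill = select_fill(); fill(buf, 8);` during initialization, so the check runs once and the data path only pays for an indirect call.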

> At runtime, the driver detects whether the platform supports SVE; if
> not, it selects NEON.
> 
> Best regards.
> 



* Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC
  2021-05-11 14:10           ` Honnappa Nagarahalli
@ 2021-05-12  8:47             ` fengchengwen
  0 siblings, 0 replies; 10+ messages in thread
From: fengchengwen @ 2021-05-12  8:47 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Jerin Jacob, Richardson, Bruce, thomas,
	David Marchand, Stephen Hemminger, Ananyev, Konstantin
  Cc: dev, jerinj, Ruifeng Wang, humin29, nd



On 2021/5/11 22:10, Honnappa Nagarahalli wrote:
> <snip>
>>>
>>>>
>>>> Thanks for your suggestions, we found that the -fno-tree-vectorize
>>>> option works.
>>>> PS: This option was not successfully added in our earliest test.
>>>>
>>>> Solution:
>>>> 1. Use the -fno-tree-vectorize option to prevent the compiler from
>>>>    generating auto-vectorized code, so that the slow path will work fine.
>>>> 2. Add a '-march=armv8-a+sve+crc' line to implementer_generic in
>>>>    arm/meson.build:
>>>>         'part_number_config': {
>>>>                 'generic': {'machine_args': ['-march=armv8-a+crc',
>>>>                                              '-march=armv8-a+sve+crc',
>>>>                                              '-moutline-atomics']}
>>>>         }
>>>>    If the compiler doesn't support '-march=armv8-a+sve+crc', it will
>>>>    fall back to '-march=armv8-a+crc'.
>>>>    If the compiler supports '-march=armv8-a+sve+crc', it will compile
>>>>    the SVE-related code, so the IO path can support SVE.
>>>>
>>>> Based on the above, we can achieve our initial target.
>>> The 'generic' target is for generating a binary that would work on all ArmV8
>>> machines. If you are building with '-march=armv8-a+sve+crc', the IO-Path
>>> would not work on non-SVE machines.
>>>
>>
>> 'generic' is only used in local CI (note: the two platforms are both ARMv8
>> machines).
>>
>> In the IO path, we support NEON and SVE Rx/Tx. The code is written with
>> ACLE, so it is not affected by the -fno-tree-vectorize option.
>>
>> If the compiler supports '-march=armv8-a+sve+crc', it will compile both
>> the NEON and SVE related code.
> Using '-march=armv8-a+sve+crc' and '-fno-tree-vectorize' does not provide an absolute guarantee that the compiler will not use SVE elsewhere.
> 
> The safest way to ensure that only specific functions use SVE is to compile without +sve (e.g. using -march=armv8-a) and use pragmas around the functions that are allowed to use SVE.  Ex:
> 
> #pragma GCC push_options
> #pragma GCC target ("+sve")
> void f(int *x) {
> 	for (int i = 0; i < 100; ++i) x[i] = i;
> }
> #pragma GCC pop_options
> void g(int *x) {
> 	for (int i = 0; i < 100; ++i) x[i] = i;
> }
> 
> compiles f() using SVE and g() with standard options.
> 
> You can also follow the function multiversioning discussed in the other thread.
> 

Thanks for your suggestions.

Because the SVE code is organized by file, we use the following scheme in the hns3 meson.build:
 if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
     sources += files('hns3_rxtx_vec.c')

     # compile the SVE code when either:
     # a. SVE is part of the minimum instruction set baseline, or
     # b. SVE is not in the baseline, but the compiler supports it
     if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
         cflags += ['-DCC_SVE_SUPPORT']
         sources += files('hns3_rxtx_vec_sve.c')
     elif cc.has_argument('-march=armv8.2-a+sve')
         cflags += ['-DCC_SVE_SUPPORT']
         hns3_sve_lib = static_library('hns3_sve_lib',
                         'hns3_rxtx_vec_sve.c',
                         dependencies: [static_rte_ethdev],
                         include_directories: includes,
                         c_args: [cflags, '-march=armv8.2-a+sve'])
         objs += hns3_sve_lib.extract_objects('hns3_rxtx_vec_sve.c')
     endif
 endif

Ref: https://patchwork.dpdk.org/project/dpdk/patch/1620808126-18876-3-git-send-email-fengchengwen@huawei.com/
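
On the C side, the CC_SVE_SUPPORT define set by the meson scheme above would typically gate any reference to the SVE file's symbols, with a runtime CPU check deciding between the paths that were compiled in. The sketch below only illustrates that selection logic and is not the hns3 driver code: enum rx_path and select_rx_path are hypothetical names, and the cpu_has_sve argument stands in for whatever CPU feature detection the platform provides.

```c
#include <stdbool.h>

/* Hypothetical identifiers for the two vector Rx/Tx paths. */
enum rx_path {
	RX_PATH_NEON,
	RX_PATH_SVE
};

/*
 * Choose the Rx/Tx path. The SVE path is only a candidate when the build
 * system compiled the SVE file (CC_SVE_SUPPORT defined) and the running
 * CPU reports SVE; otherwise the NEON path is used.
 */
static enum rx_path select_rx_path(bool cpu_has_sve)
{
#ifdef CC_SVE_SUPPORT
	if (cpu_has_sve)
		return RX_PATH_SVE;
#else
	(void)cpu_has_sve;   /* SVE code not compiled in; the argument is unused */
#endif
	return RX_PATH_NEON;
}
```

Keeping the compile-time gate and the runtime check in one place means a binary built with SVE support still selects NEON on a CPU without SVE, and a binary built by a compiler without SVE support never references the SVE symbols.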

Best regards.




end of thread, other threads:[~2021-05-12  8:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-30 11:57 [dpdk-dev] How to disable SVE auto vectorization while using GCC fengchengwen
2021-04-30 15:11 ` Jerin Jacob
2021-04-30 16:09   ` Stephen Hemminger
2021-05-08 19:17     ` Honnappa Nagarahalli
2021-04-30 20:54   ` Honnappa Nagarahalli
2021-05-08  3:23     ` fengchengwen
2021-05-08 18:46       ` Honnappa Nagarahalli
2021-05-11 11:23         ` fengchengwen
2021-05-11 14:10           ` Honnappa Nagarahalli
2021-05-12  8:47             ` fengchengwen
