From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id D4894A0C43;
	Wed, 12 May 2021 10:47:22 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id C75EC410F8;
	Wed, 12 May 2021 10:47:19 +0200 (CEST)
Received: from szxga07-in.huawei.com (szxga07-in.huawei.com [45.249.212.35])
 by mails.dpdk.org (Postfix) with ESMTP id 03D5B4003F
 for <dev@dpdk.org>; Wed, 12 May 2021 10:47:17 +0200 (CEST)
Received: from DGGEMS410-HUB.china.huawei.com (unknown [172.30.72.59])
 by szxga07-in.huawei.com (SkyGuard) with ESMTP id 4Fg7dB3jjkzCrZJ;
 Wed, 12 May 2021 16:44:30 +0800 (CST)
Received: from [127.0.0.1] (10.40.190.165) by DGGEMS410-HUB.china.huawei.com
 (10.3.19.210) with Microsoft SMTP Server id 14.3.498.0; Wed, 12 May 2021
 16:47:03 +0800
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>, Jerin Jacob
 <jerinjacobk@gmail.com>, "Richardson, Bruce" <bruce.richardson@intel.com>,
 "thomas@monjalon.net" <thomas@monjalon.net>, David Marchand
 <david.marchand@redhat.com>, Stephen Hemminger <sthemmin@microsoft.com>,
 "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
CC: "dev@dpdk.org" <dev@dpdk.org>, "jerinj@marvell.com" <jerinj@marvell.com>, 
 Ruifeng Wang <Ruifeng.Wang@arm.com>, "humin29@huawei.com"
 <humin29@huawei.com>, nd <nd@arm.com>
References: <d88df728-9603-96a5-b3d1-8e5336835720@huawei.com>
 <CALBAE1O362gLyV-oko0RwhRrWSgJbWK63vk7V8u9L9pvkD9oHA@mail.gmail.com>
 <DBAPR08MB5814C2F121FE3BCA234895F5985E9@DBAPR08MB5814.eurprd08.prod.outlook.com>
 <319916e1-3380-6ed5-afd3-38e1295c4733@huawei.com>
 <DBAPR08MB58149C10483B66EEC5F27B9698569@DBAPR08MB5814.eurprd08.prod.outlook.com>
 <d0fa5117-0efe-29b6-2241-de56d7a30c21@huawei.com>
 <AM8PR08MB58101B6F852C929A1556785D98539@AM8PR08MB5810.eurprd08.prod.outlook.com>
From: fengchengwen <fengchengwen@huawei.com>
Message-ID: <358a16d0-8489-4180-7c2b-2118544f78e5@huawei.com>
Date: Wed, 12 May 2021 16:47:04 +0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:68.0) Gecko/20100101
 Thunderbird/68.11.0
MIME-Version: 1.0
In-Reply-To: <AM8PR08MB58101B6F852C929A1556785D98539@AM8PR08MB5810.eurprd08.prod.outlook.com>
Content-Type: text/plain; charset="utf-8"
Content-Language: en-US
Content-Transfer-Encoding: 7bit
X-Originating-IP: [10.40.190.165]
X-CFilter-Loop: Reflected
Subject: Re: [dpdk-dev] How to disable SVE auto vectorization while using GCC
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>



On 2021/5/11 22:10, Honnappa Nagarahalli wrote:
> <snip>
>>>
>>>>
>>>> Thanks for your suggestions. We found that the -fno-tree-vectorize
>>>> option works.
>>>> PS: This option was not applied successfully in our earlier test.
>>>>
>>>> Solution:
>>>> 1. Use the -fno-tree-vectorize option to prevent the compiler from
>>>>    generating auto-vectorized code, so that the slow path works fine.
>>>> 2. Add a '-march=armv8-a+sve+crc' entry to the implementer_generic
>>>>    machine_args in arm/meson.build:
>>>>         'part_number_config': {
>>>>                 'generic': {'machine_args': ['-march=armv8-a+crc',
>>>>                                              '-march=armv8-a+sve+crc',
>>>>                                              '-moutline-atomics']}
>>>>         }
>>>>    If the compiler doesn't support '-march=armv8-a+sve+crc', it falls
>>>>    back to '-march=armv8-a+crc'.
>>>>    If the compiler supports '-march=armv8-a+sve+crc', it compiles the
>>>>    SVE-related code, so the IO path can support SVE.
>>>>
>>>> Based on the above, we could achieve our initial target.
>>> The 'generic' target is for generating a binary that would work on all
>>> Armv8 machines. If you are building with '-march=armv8-a+sve+crc', the
>>> IO path would not work on non-SVE machines.
>>>
>>
>> The 'generic' target is only used in our local CI (note: both platforms
>> are ARMv8 machines).
>>
>> In the IO path, we support NEON and SVE Rx/Tx. That code is written with
>> ACLE intrinsics, so it is not affected by the -fno-tree-vectorize option.
>>
>> If the compiler supports '-march=armv8-a+sve+crc', it compiles both the
>> NEON and the SVE code.
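For readers unfamiliar with ACLE, here is a minimal sketch of what hand-written SVE intrinsic code looks like, using a hypothetical helper `copy_u64` (not the hns3 Rx/Tx code). Because the SVE instructions come from explicit intrinsics rather than from the auto-vectorizer, -fno-tree-vectorize does not remove them. The SVE body is compiled only when `__ARM_FEATURE_SVE` is defined for the file; otherwise a plain scalar loop is used.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE)
#include <arm_sve.h>

/* Hypothetical helper: copy n 64-bit words with SVE ACLE intrinsics.
 * svcntd() returns the number of 64-bit lanes at run time, so the
 * same code works for any SVE vector length. */
void copy_u64(uint64_t *dst, const uint64_t *src, size_t n)
{
	for (uint64_t i = 0; i < n; i += svcntd()) {
		/* Predicate enables only lanes whose index is < n (tail). */
		svbool_t pg = svwhilelt_b64_u64(i, n);
		svuint64_t v = svld1_u64(pg, &src[i]);
		svst1_u64(pg, &dst[i], v);
	}
}
#else
/* Fallback used when this file is not built with SVE enabled. */
void copy_u64(uint64_t *dst, const uint64_t *src, size_t n)
{
	for (size_t i = 0; i < n; i++)
		dst[i] = src[i];
}
#endif
```

The predicated loop handles a length that is not a multiple of the vector length without a separate scalar tail.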
> Using '-march=armv8-a+sve+crc' and '-fno-tree-vectorize' does not provide an absolute guarantee that the compiler will not use SVE elsewhere.
> 
> The safest way to ensure that only specific functions use SVE is to compile without +sve (e.g. using -march=armv8-a) and use pragmas around the functions that are allowed to use SVE.  Ex:
> 
> #pragma GCC push_options
> #pragma GCC target ("+sve")
> void f(int *x) {
> 	for (int i = 0; i < 100; ++i) x[i] = i;
> }
> #pragma GCC pop_options
> void g(int *x) {
> 	for (int i = 0; i < 100; ++i) x[i] = i;
> }
> 
> This compiles f() using SVE and g() with the standard options.
> 
> You can also follow the function multiversioning discussed in the other thread.
> 
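The per-function scoping in the pragma example above can also be sketched with GCC's `target` function attribute, which on AArch64 accepts a bare "+feature" string such as "+sve". This is a sketch under stated assumptions: the function names and the `cpu_has_sve` parameter are hypothetical, and the SVE variant is only safe to call after a run-time check has confirmed SVE support.

```c
#include <stdbool.h>

/* Assumption: GCC on AArch64, which accepts "+feature" strings in the
 * target attribute. Other compilers/architectures use only the
 * generic path. */
#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__)
#define HAVE_SVE_TARGET_ATTR 1
#endif

#ifdef HAVE_SVE_TARGET_ATTR
/* Only this function may contain SVE instructions (for example from
 * auto-vectorizing the loop); the rest of the file keeps the baseline
 * -march options. */
__attribute__((target("+sve")))
void fill_sve(int *x)
{
	for (int i = 0; i < 100; ++i)
		x[i] = i;
}
#endif

/* Compiled with the standard (baseline) options only. */
void fill_generic(int *x)
{
	for (int i = 0; i < 100; ++i)
		x[i] = i;
}

/* Hypothetical wrapper: cpu_has_sve must come from a run-time check,
 * so the SVE variant is never executed on a non-SVE CPU. */
void fill(int *x, bool cpu_has_sve)
{
#ifdef HAVE_SVE_TARGET_ATTR
	if (cpu_has_sve) {
		fill_sve(x);
		return;
	}
#endif
	(void)cpu_has_sve;
	fill_generic(x);
}
```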

Thanks for your suggestions.

Because the SVE code is organized per file, we use the following scheme in the hns3 meson.build:
 if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
     sources += files('hns3_rxtx_vec.c')

     # compile the SVE file when either:
     # a. SVE is part of the minimum instruction set baseline, or
     # b. SVE is not in the baseline, but the compiler supports it
     if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
         cflags += ['-DCC_SVE_SUPPORT']
         sources += files('hns3_rxtx_vec_sve.c')
     elif cc.has_argument('-march=armv8.2-a+sve')
         cflags += ['-DCC_SVE_SUPPORT']
         hns3_sve_lib = static_library('hns3_sve_lib',
                         'hns3_rxtx_vec_sve.c',
                         dependencies: [static_rte_ethdev],
                         include_directories: includes,
                         c_args: [cflags, '-march=armv8.2-a+sve'])
         objs += hns3_sve_lib.extract_objects('hns3_rxtx_vec_sve.c')
     endif
 endif
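On the C side, the -DCC_SVE_SUPPORT flag set by this meson logic lets common code reference the SVE burst function only when hns3_rxtx_vec_sve.c was actually built. A minimal sketch, assuming hypothetical names (`rx_burst_vec_neon`, `rx_burst_vec_sve`, `select_rx_burst`) and a simplified signature rather than the real hns3 API; the run-time CPU check result is passed in as a parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified burst-function signature. */
typedef uint16_t (*rx_burst_t)(void *rxq, void **pkts, uint16_t nb_pkts);

/* NEON path: always built on arm64 in the scheme above. */
uint16_t rx_burst_vec_neon(void *rxq, void **pkts, uint16_t nb_pkts)
{
	(void)rxq;
	(void)pkts;
	(void)nb_pkts;
	return 0; /* placeholder: receives no packets */
}

#ifdef CC_SVE_SUPPORT
/* SVE path: defined in the separately compiled SVE object file, so it
 * is only referenced when that file was built. */
uint16_t rx_burst_vec_sve(void *rxq, void **pkts, uint16_t nb_pkts);
#endif

/* Called once from the slow path (e.g. at device start). */
rx_burst_t select_rx_burst(bool cpu_has_sve)
{
#ifdef CC_SVE_SUPPORT
	if (cpu_has_sve)
		return rx_burst_vec_sve;
#endif
	(void)cpu_has_sve;
	return rx_burst_vec_neon;
}
```

Without CC_SVE_SUPPORT the SVE symbol is never referenced, so the link does not depend on the SVE object existing.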

Ref: https://patchwork.dpdk.org/project/dpdk/patch/1620808126-18876-3-git-send-email-fengchengwen@huawei.com/

Best regards.

>> At runtime, the driver detects whether the platform supports SVE; if
>> not, it selects the NEON path.
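A minimal sketch of such a run-time check on Linux, assuming the kernel's hardware-capability bits are read with getauxval(AT_HWCAP) and tested against HWCAP_SVE (bit 22 in the arm64 hwcap ABI). The helper name is hypothetical; DPDK code may go through its own CPU-flag API instead. On non-AArch64 or non-Linux builds it reports false.

```c
#include <stdbool.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h> /* getauxval(), AT_HWCAP */
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22) /* arm64 hwcap ABI value for SVE */
#endif
#endif

/* Hypothetical helper: true if the running CPU and kernel support SVE. */
bool cpu_has_sve(void)
{
#if defined(__aarch64__) && defined(__linux__)
	return (getauxval(AT_HWCAP) & HWCAP_SVE) != 0;
#else
	return false;
#endif
}
```

The result can be computed once in the slow path and used to pick the Rx/Tx functions.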
>>
>> Best regards.
>>
>>>>
>>>>
>>>> On 2021/5/1 4:54, Honnappa Nagarahalli wrote:
>>>>> <snip>
>>>>>
>>>>>>
>>>>>> On Fri, Apr 30, 2021 at 5:27 PM fengchengwen
>>>>>> <fengchengwen@huawei.com> wrote:
>>>>>>>
>>>>>>> Hi, ALL
>>>>>>> We have a question we need your help with:
>>>>>>>   1. We have two platforms, both ARM64: one supports both NEON and
>>>>>>>      SVE, the other supports only NEON.
>>>>>>>   2. We want to run on both platforms with a single binary and use
>>>>>>>      the highest vector capability of each platform whenever
>>>>>>>      possible.
>>>>>>
>>>>>> I see VPP has a similar feature. IMO, it is not present in DPDK.
>>>>>> Basically, in order to do this:
>>>>>> - Compile the slow-path code (90% of DPDK) with minimal CPU
>>>>>>   instruction set support.
>>>>>> - Compile the fast-path functions at different CPU instruction set
>>>>>>   levels.
>>>>>> - In the slow path, attach the fast-path function pointer based on
>>>>>>   the CPU's instruction set support level.
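The approach above can be sketched as a small dispatch table: each fast-path variant is tagged with the instruction set level it needs, and the slow path picks the most capable supported variant once, at initialization. The level enum, the function names and the `cpu_level` input are hypothetical; in a real build each variant would live in its own file compiled with matching -march flags, whereas here they share a scalar body to keep the sketch short.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ISA levels, ordered from least to most capable. */
enum isa_level { ISA_BASE = 0, ISA_NEON = 1, ISA_SVE = 2 };

typedef void (*sum_fn)(const uint32_t *v, size_t n, uint64_t *out);

static void sum_scalar(const uint32_t *v, size_t n, uint64_t *out)
{
	uint64_t s = 0;
	for (size_t i = 0; i < n; i++)
		s += v[i];
	*out = s;
}
/* Stand-ins for per-ISA builds of the same fast-path function. */
static void sum_neon(const uint32_t *v, size_t n, uint64_t *out) { sum_scalar(v, n, out); }
static void sum_sve(const uint32_t *v, size_t n, uint64_t *out) { sum_scalar(v, n, out); }

/* Most capable variant first. */
static const struct {
	enum isa_level need;
	sum_fn fn;
} variants[] = {
	{ ISA_SVE, sum_sve },
	{ ISA_NEON, sum_neon },
	{ ISA_BASE, sum_scalar },
};

/* Fast-path entry point, attached from the slow path. */
sum_fn fastpath_sum = sum_scalar;

/* Slow path: attach the best variant the CPU supports and return its level. */
enum isa_level attach_fastpath(enum isa_level cpu_level)
{
	for (size_t i = 0; i < sizeof(variants) / sizeof(variants[0]); i++) {
		if (variants[i].need <= cpu_level) {
			fastpath_sum = variants[i].fn;
			return variants[i].need;
		}
	}
	fastpath_sum = sum_scalar;
	return ISA_BASE;
}
```

The fast path then calls through `fastpath_sum` with no per-packet feature checks.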
>>>>> Agree.
>>>>>
>>>>>>
>>>>>>
>>>>>>>   3. So we build the DPDK program with -march=armv8-a+sve+crc
>>>>>>>      (GCC 10.2).
>>>>> This defines the minimum capabilities of the target machine.
>>>>>
>>>>>>>      However, we found that an illegal instruction is executed when
>>>>>>>      the program runs on a machine that does not support SVE (see
>>>>>>>      below).
>>>>>>>   4. The problem is caused by GCC auto-vectorization generating SVE
>>>>>>>      instructions.
>>>>>>>
>>>>>>>   So is there a way to disable GCC auto-vectorization, or to make it
>>>>>>>   use only NEON?
>>>>> I do not think this is safe. Once SVE is enabled, the compiler is
>>>>> allowed to use SVE instructions wherever it sees fit.
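The -march baseline is visible to C code through the ACLE feature macros the compiler predefines: if `__ARM_FEATURE_SVE` is defined for the whole build, any function in the binary may contain SVE instructions. A small sketch, with hypothetical helper names and the run-time SVE result passed in as a parameter, of a startup guard that reports such a mismatch instead of failing later with SIGILL. It is only best-effort: it must run before any code that might already use SVE.

```c
#include <stdbool.h>
#include <stdio.h>

/* True if this file was compiled with SVE in the baseline, i.e. the
 * compiler may emit SVE instructions in any function. */
bool baseline_requires_sve(void)
{
#if defined(__ARM_FEATURE_SVE)
	return true;
#else
	return false;
#endif
}

/* Hypothetical startup check: returns 0 if it is safe to continue. */
int check_baseline(bool cpu_has_sve)
{
	if (baseline_requires_sve() && !cpu_has_sve) {
		fprintf(stderr, "binary requires SVE, but this CPU lacks it\n");
		return -1;
	}
	return 0;
}
```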
>>>>>
>>>>>>>
>>>>>>>   BTW: we already tested -fno-tree-vectorize (see the link below)
>>>>>>>   but found that it had no effect.
>>>>>>>
>>>>>>> https://stackoverflow.com/questions/7778174/how-can-i-disable-vectorization-while-using-gcc
>>>>>>>
>>>>>>>
>>>>>>> The GDB output:
>>>>>>>      EAL: Detected 128 lcore(s)
>>>>>>>      EAL: Detected 4 NUMA nodes
>>>>>>>      Option -w, --pci-whitelist is deprecated, use -a, --allow option instead
>>>>>>>
>>>>>>>      Program received signal SIGILL, Illegal instruction.
>>>>>>>      0x0000000000671b88 in eal_adjust_config ()
>>>>>>>      (gdb)
>>>>>>>      (gdb) where
>>>>>>>      #0  0x0000000000671b88 in eal_adjust_config ()
>>>>>>>      #1  0x0000000000682840 in rte_eal_init ()
>>>>>>>      #2  0x000000000051c870 in main ()
>>>>>>>      (gdb)
>>>>>>>
>>>>>>> The disassembly output of eal_adjust_config:
>>>>>>>      671b7c:       f8237a81        str     x1, [x20, x3, lsl #3]
>>>>>>>      671b80:       f110001f        cmp     x0, #0x400
>>>>>>>      671b84:       54ffff21        b.ne    671b68 <eal_adjust_config+0x1f4>  // b.any
>>>>>>>      671b88:       043357f5        addvl   x21, x19, #-1
>>>>>>>      671b8c:       043457e1        addvl   x1, x20, #-1
>>>>>>>      671b90:       910562b5        add     x21, x21, #0x158
>>>>>>>      671b94:       04e0e3e0        cntd    x0
>>>>>>>      671b98:       914012b5        add     x21, x21, #0x4, lsl #12
>>>>>>>      671b9c:       52800218        mov     w24, #0x10                      // #16
>>>>>>>      671ba0:       25d8e3e1        ptrue   p1.d
>>>>>>>      671ba4:       25f80fe0        whilelo p0.d, wzr, w24
>>>>>>>      671ba8:       a5e04020        ld1d    {z0.d}, p0/z, [x1, x0, lsl #3]
>>>>>>>
>>>>>>>
>>>>>>> Best regards.
>>>>>>>
>>>
>