From mboxrd@z Thu Jan 1 00:00:00 1970
From: Honnappa Nagarahalli
To: "jerinj@marvell.com", "dev@dpdk.org"
CC: "thomas@monjalon.net", "Gavin Hu (Arm Technology China)", "msantana@redhat.com", "aconole@redhat.com", "stable@dpdk.org", Honnappa Nagarahalli, nd, nd
Date: Tue, 11 Jun 2019 01:27:42 +0000
References: <20190606145054.39995-1-jerinj@marvell.com>
Subject: Re: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
List-Id: DPDK patches and discussions

> > > > --
> > > > > Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> > > > > compiler
> > > > >
> > > > > From: Jerin Jacob
> > > > >
> > > > > Some compilers reporting the following error, though the
> > > > > existing code doesn't have any uninitialized variable case.
> > > > > Just to make compiler happy, initialize the int32x4_t variable
> > > > > one shot in C language.
> > > > >
> > > > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
> > > > > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be
> > > > > used uninitialized in this function [-Werror=maybe-uninitialized]
> > > > >   int32x4_t input;
> > > > >
> > > > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
> > > > > Cc: stable@dpdk.org
> > > > >
> > > > > Signed-off-by: Jerin Jacob
> > > > > ---
> > > > >  lib/librte_acl/acl_run_neon.h | 29 ++++++++++++-----------------
> > > > >  1 file changed, 12 insertions(+), 17 deletions(-)
> > > > >
> > > > > diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
> > > > > index 01b9766d8..dc9e9efe9 100644
> > > > > --- a/lib/librte_acl/acl_run_neon.h
> > > > > +++ b/lib/librte_acl/acl_run_neon.h
> > > > > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > > > >  	uint64_t index_array[8];
> > > > >  	struct completion cmplt[8];
> > > > >  	struct parms parms[8];
> > > > > -	int32x4_t input0, input1;
> > > > >
> > > > >  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> > > > >  		total_packets, categories, ctx->trans_table);
> > > > > @@ -181,17 +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > > > >
> > > > >  	while (flows.started > 0) {
> > > > >  		/* Gather 4 bytes of input data for each stream. */
> > > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
> > > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
> > > > > -
> > > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
> > > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
> > > > > -
> > > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input0, 2);
> > > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), input1, 2);
> > > > > -
> > > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input0, 3);
> > > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), input1, 3);
> > > > > +		int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0),
> > > > > +				GET_NEXT_4BYTES(parms, 1),
> > > > > +				GET_NEXT_4BYTES(parms, 2),
> > > > > +				GET_NEXT_4BYTES(parms, 3)};
> > > > > +		int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4),
> > > > > +				GET_NEXT_4BYTES(parms, 5),
> > > > > +				GET_NEXT_4BYTES(parms, 6),
> > > > > +				GET_NEXT_4BYTES(parms, 7)};
> > > > >
> > > > This mixes the use of NEON intrinsics with GCC vector extensions.
> > > > ACLE (Arm C Language Extensions) specifically recommends not to
> > > > mix the two methods in section 12.2.6. IMO, Aaron's suggestion of
> > > > using a temp vector should be good.
> > >
> > > We are using this pattern across DPDK and SSE for x86 as well.
> > > https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n91
> > I am not sure about x86, I have not looked at a document similar to
> > ACLE for x86. IMO, it is not relevant here as this is Arm specific code.
>
> What I meant was its been already used in DPDK for arm64.
> https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n91
Ok, got it.
I have had discussions with compiler folks at Arm about mixing vector
programming models, and the recommendation has been to use NEON intrinsics
exclusively. I had the same discussion with the Marvell compiler folks some
time back.
>
> Please see offial page vector gcc gcc documentation. The examples are using
> this scheme.
> https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
>
> This is to just create 'input' variable. I am fine to use any other scheme with
> out additional cost of instructions.
>
> > >
> > > Since it used in fastpath, a temp variable would be additional cost
> > > for no reason.
> > Then, I would suggest we can go with using 'vdupq_n_s32'.
>
> We have to form uint64x2_t with 4 x uint32_t variable, How does
> 'vdupq_n_s32' help here?
We would use 'vdupq_n_s32' only for the first initialization; the rest of
the code remains the same (see the diff below).

> Can you share code snippet without any temp variable?

diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index 01b9766d8..b3196cd12 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,

 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
+		input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
+		input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));

 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,

 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
+		input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);

My understanding is that the generated code is the same for your patch and
for my changes above. The changes suggested above also conform to the ACLE
recommendation.
>
> >
> > > If GCC supports it then I think it is fine, I think, above usage
> > > matters with C++ portability.
> > I did not understand the C++ portability part. Can you elaborate more?
> > >
> > > >
> > > > >
> > > > >  		/* Process the 4 bytes of input on each stream. */
> > > > >
> > > > > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > > > >  	uint64_t index_array[4];
> > > > >  	struct completion cmplt[4];
> > > > >  	struct parms parms[4];
> > > > > -	int32x4_t input;
> > > > >
> > > > >  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> > > > >  		total_packets, categories, ctx->trans_table);
> > > > > @@ -242,10 +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
> > > > >
> > > > >  	while (flows.started > 0) {
> > > > >  		/* Gather 4 bytes of input data for each stream. */
> > > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> > > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
> > > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
> > > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> > > > > +		int32x4_t input = {GET_NEXT_4BYTES(parms, 0),
> > > > > +				GET_NEXT_4BYTES(parms, 1),
> > > > > +				GET_NEXT_4BYTES(parms, 2),
> > > > > +				GET_NEXT_4BYTES(parms, 3)};
> > > > >
> > > > >  		/* Process the 4 bytes of input on each stream. */
> > > > >  		input = transition4(input, flows.trans, index_array);
> > > > > --
> > > > > 2.21.0
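
For reference, the 'vdupq_n_s32' suggestion from the reply can be sketched in
isolation. This is only a minimal sketch under stated assumptions, not the
DPDK code: gather4() and its words[] argument are hypothetical stand-ins for
the GET_NEXT_4BYTES(parms, n) reads, and the #else branch holds hypothetical
scalar stand-ins for the three intrinsics so the file also builds on non-Arm
hosts. It shows why the warning should go away (the broadcast defines every
lane before any lane is overwritten); it does not show that the generated
instructions match the initializer-list version.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
#else
/*
 * Hypothetical scalar stand-ins (not part of DPDK or ACLE) so the sketch
 * also builds on non-Arm hosts. They mimic the lane semantics only.
 */
typedef struct { int32_t lane[4]; } int32x4_t;

static inline int32x4_t vdupq_n_s32(int32_t v)
{
	int32x4_t r = { { v, v, v, v } };
	return r;
}

static inline int32x4_t vsetq_lane_s32(int32_t v, int32x4_t vec, int lane)
{
	vec.lane[lane] = v;
	return vec;
}

static inline void vst1q_s32(int32_t *dst, int32x4_t vec)
{
	for (int i = 0; i < 4; i++)
		dst[i] = vec.lane[i];
}
#endif

/*
 * Hypothetical helper: words[0..3] stand in for
 * GET_NEXT_4BYTES(parms, 0..3) in acl_run_neon.h.
 */
void gather4(const int32_t words[4], int32_t out[4])
{
	int32x4_t input;

	assert(words != NULL && out != NULL);

	/* The broadcast writes all four lanes, so 'input' is fully defined. */
	input = vdupq_n_s32(words[0]);
	/* Overwrite lanes 1..3; lane indices are compile-time constants. */
	input = vsetq_lane_s32(words[1], input, 1);
	input = vsetq_lane_s32(words[2], input, 2);
	input = vsetq_lane_s32(words[3], input, 3);

	/* Store the lanes so the result can be inspected from plain C. */
	vst1q_s32(out, input);
}
```

Calling gather4() with four distinct words should leave them in out[] in lane
order, which is the same lane layout the 'vsetq_lane_s32'-only sequence
produces once all four lanes have been written.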