From: Aaron Conole
To: Jerin Jacob Kollanukkaran
Cc: "dev@dpdk.org", "gavin.hu@arm.com", "konstantin.ananyev@intel.com"
Subject: Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
Date: Wed, 05 Jun 2019 13:09:34 -0400
References: <20190408182420.4398-1-aconole@redhat.com> <20190408182420.4398-2-aconole@redhat.com>
In-Reply-To: (Jerin Jacob Kollanukkaran's message of "Wed, 5 Jun 2019 15:16:11 +0000")

Jerin Jacob Kollanukkaran writes:

>> -----Original
Message-----
>> From: Jerin Jacob Kollanukkaran
>> Sent: Wednesday, April 10, 2019 8:10 PM
>> To: dev@dpdk.org; aconole@redhat.com
>> Cc: gavin.hu@arm.com; konstantin.ananyev@intel.com
>> Subject: Re: [EXT] [PATCH 1/3] acl: fix arm argument types
>>
>> On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>> > -------------------------------------------------------------------
>> > ---
>> > Compiler complains of argument type mismatch, like:
>>
>> Can you share more details on how to reproduce this issue?
>>
>> We already have
>> CFLAGS_acl_run_neon.o += -flax-vector-conversions in the Makefile.
>>
>> If you are taking out -flax-vector-conversions the correct way to fix will be
>> use vreinterpret*.
>>
>> For me the code looks clean, If unnecessary casting is avoided.
>
>
> Considering the following patch is part of dpdk.org now. I think, We may not need this
> patch in benefit to avoid a lot of typecasting.
>
> https://git.dpdk.org/dpdk/commit/?id=e53ce4e4137974f46743e74bd9ab912e0166c8b1

Correct, the lax conversions aren't needed.
>
>
>
>>
>>
>> >
>> > ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>> > ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-
>> > conversions
>> > to permit conversions between vectors with differing element
>> > types
>> > or numbers of subparts
>> >   node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>> >   ^
>> > ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type
>> > for
>> > argument 2 of ‘vbicq_s32’
>> >
>> > Signed-off-by: Aaron Conole
>> > ---
>> > lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++-------------
>> > --
>> > 1 file changed, 27 insertions(+), 19 deletions(-)
>> >
>> >
>> >
>> > /*
>> > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>> > uint8_t **data,
>> > acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>> > acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>> >
>> > + memset(&input0, 0, sizeof(input0));
>> > + memset(&input1, 0, sizeof(input1));
>>
>> Why this memset only required for arm64? If it real issue, Shouldn't it
>> required for x86 and ppc ?
>>

Something for this part is still needed (see for example:
https://travis-ci.com/DPDK/dpdk/jobs/205675369).

I have two alternate approaches, but neither has even been compile tested
(and the obvious '-Wno-maybe-uninitialized' - but I dislike that approach
because it will afflict all routines):

1. Something like this:

@@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream.
*/
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
+		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdup_n_s32(0), 0);
+		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), vdup_n_s32(0), 0);
 
 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
+		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdup_n_s32(0), 0);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
---------

2: something like this

diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index a055a8240..0eb42865a 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -165,7 +165,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	uint64_t index_array[8];
 	struct completion cmplt[8];
 	struct parms parms[8];
-	int32x4_t input0, input1;
+	static int32x4_t ZERO_VAL;
+	int32x4_t input0 = ZERO_VAL, input1 = ZERO_VAL;
 
 	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
 		total_packets, categories, ctx->trans_table);
@@ -181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream.
*/
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdup_n_s32(0), 0);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), vdup_n_s32(0), 0);
+		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
+		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
 
 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -227,7 +228,8 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	uint64_t index_array[4];
 	struct completion cmplt[4];
 	struct parms parms[4];
-	int32x4_t input;
+	static int32x4_t ZERO_VAL;
+	int32x4_t input = ZERO_VAL;
 
 	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
 		total_packets, categories, ctx->trans_table);
@@ -242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdup_n_s32(0), 0);
+		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
---

WDYT?
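
For illustration only, here is a portable sketch of the idea behind the second approach: give the vector a defined value before the first lane insert reads it, so the compiler has no uninitialized read to warn about. It is not the DPDK code. The type v4i32 and the helpers set_lane() and gather4() are hypothetical stand-ins (built on the GCC/Clang vector extension) for int32x4_t, vsetq_lane_s32() and the GET_NEXT_4BYTES() gather loop. One note on the first approach: with real NEON intrinsics the 128-bit zero splat is vdupq_n_s32(0); vdup_n_s32(0) yields the 64-bit int32x2_t, so it would presumably not type-check as the vector argument of vsetq_lane_s32().

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: GCC/Clang vector extension as a stand-in for NEON's int32x4_t. */
typedef int32_t v4i32 __attribute__((vector_size(16)));

/*
 * Hypothetical analogue of vsetq_lane_s32(): returns a copy of 'vec' with
 * one lane replaced. Like the intrinsic, it reads the whole input vector,
 * which is why an uninitialized 'vec' can trigger -Wmaybe-uninitialized.
 */
static inline v4i32 set_lane(int32_t value, v4i32 vec, int lane)
{
	assert(lane >= 0 && lane < 4);
	vec[lane] = value;
	return vec;
}

/* Hypothetical gather: fills all four lanes from 'src', one lane at a time. */
static inline v4i32 gather4(const int32_t *src)
{
	/* Zero first, so the initial set_lane() reads a defined value. */
	v4i32 input = {0, 0, 0, 0};
	int lane;

	for (lane = 0; lane < 4; lane++)
		input = set_lane(src[lane], input, lane);

	return input;
}
```

Since every lane is overwritten on each pass, the zero value never reaches the result; it only gives the first read something defined to consume.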