From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by dpdk.space (Postfix) with ESMTP id 96D09A0096 for ; Wed, 10 Apr 2019 17:52:44 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 664861B204; Wed, 10 Apr 2019 17:52:42 +0200 (CEST) Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28]) by dpdk.org (Postfix) with ESMTP id 911481B1F6 for ; Wed, 10 Apr 2019 17:52:40 +0200 (CEST) Received: from smtp.corp.redhat.com (int-mx06.intmail.prod.int.phx2.redhat.com [10.5.11.16]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id D75D988ABF; Wed, 10 Apr 2019 15:52:39 +0000 (UTC) Received: from dhcp-25.97.bos.redhat.com (unknown [10.18.25.61]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 5178A5F9C3; Wed, 10 Apr 2019 15:52:39 +0000 (UTC) From: Aaron Conole To: Jerin Jacob Kollanukkaran Cc: "dev\@dpdk.org" , "gavin.hu\@arm.com" , "konstantin.ananyev\@intel.com" References: <20190408182420.4398-1-aconole@redhat.com> <20190408182420.4398-2-aconole@redhat.com> Date: Wed, 10 Apr 2019 11:52:38 -0400 In-Reply-To: (Jerin Jacob Kollanukkaran's message of "Wed, 10 Apr 2019 14:39:31 +0000") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Scanned-By: MIMEDefang 2.79 on 10.5.11.16 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.26]); Wed, 10 Apr 2019 15:52:39 +0000 (UTC) Subject: Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" Message-ID: <20190410155238.fennHkUPh88b6xq7kiZSQ945kBS9sPABKxgYPXiRHGI@z> Jerin Jacob Kollanukkaran writes: > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote: >> ------------------------------------------------------------------- >> --- >> Compiler complains of argument type mismatch, like: > > Can you share more details on how to reproduce this issue? It will be generated using the meson build after enabling the neon extension support (which isn't currently happening on ARM using meson as the build environment). > We already have > CFLAGS_acl_run_neon.o +=3D -flax-vector-conversions > in the Makefile. > > If you are taking out -flax-vector-conversions the correct way to > fix will be use vreinterpret*. > > For me the code looks clean, If unnecessary casting is avoided. I agree. I merely make explicit the casts that the compiler will be implicitly introducing. > >>=20 >> ../lib/librte_acl/acl_run_neon.h: In function =E2=80=98transition4=E2= =80=99: >> ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector- >> conversions >> to permit conversions between vectors with differing element >> types >> or numbers of subparts >> node_type =3D vbicq_s32(tr_hi_lo.val[0], index_msk); >> ^ >> ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type >> for >> argument 2 of =E2=80=98vbicq_s32=E2=80=99 >>=20 >> Signed-off-by: Aaron Conole >> --- >> lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++------------- >> -- >> 1 file changed, 27 insertions(+), 19 deletions(-) >>=20 >>=20 >>=20=20 >> /* >> @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx, >> const uint8_t **data, >> acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]); >> acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]); >>=20=20 >> + memset(&input0, 0, sizeof(input0)); >> + memset(&input1, 0, sizeof(input1)); > > Why this memset only required for arm64? If it real issue, Shouldn't > it required for x86 and ppc ? No. Please see the following lines (which is due to the ARM neon intrinsic for setting individual lanes): while (flows.started > 0) { /* Gather 4 bytes of input data for each stream. */ input0 =3D vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0); input1 =3D vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0); Note: the first time through this loop, input0 and input1 appear on the rhs of the assignment before appearing on the lhs. This will generate an uninitialized value warning, even though the assignments are to individual lanes of the vector. I squelched the warning from the compiler in the most brute-force way possible. Perhaps it would be better to use a static initialization for the vector but this code was intended to be RFC and to generate feedback. I guess one alternate approach could be: static const int32x4_t ZERO_VEC; int32x4_t input0 =3D ZERO_VEC, input1 =3D ZERO_VEC; ... int32x4_t input =3D ZERO_VEC; This would have the benefit of keeping the initializer as 'fast' as possible (although I recall a memset under a certain size threshold is the same effect, but not certain). Either way, I prefer it to squelching the warning, since the warning has been found to catch legitimate errors many times.