From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
    by dpdk.org (Postfix) with ESMTP id 1D97A5B36
    for ; Tue, 30 Apr 2019 14:57:10 +0200 (CEST)
Received: from smtp.corp.redhat.com
    (int-mx02.intmail.prod.int.phx2.redhat.com [10.5.11.12])
    (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
    (No client certificate requested)
    by mx1.redhat.com (Postfix) with ESMTPS id 7B71A3091786;
    Tue, 30 Apr 2019 12:57:08 +0000 (UTC)
Received: from dhcp-25.97.bos.redhat.com (unknown [10.18.25.61])
    by smtp.corp.redhat.com (Postfix) with ESMTPS id DA78578DEC;
    Tue, 30 Apr 2019 12:57:07 +0000 (UTC)
From: Aaron Conole
To: Jerin Jacob Kollanukkaran
Cc: "gavin.hu@arm.com", "dev@dpdk.org", "konstantin.ananyev@intel.com"
References: <20190408182420.4398-1-aconole@redhat.com>
    <20190408182420.4398-2-aconole@redhat.com>
    <6900f8accff6ff85b56bd4f5987572ff085a2153.camel@marvell.com>
Date: Tue, 30 Apr 2019 08:57:06 -0400
In-Reply-To: (Aaron Conole's message of "Wed, 10 Apr 2019 13:20:42 -0400")
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
    (mx1.redhat.com [10.5.110.41]); Tue, 30 Apr 2019 12:57:08 +0000 (UTC)
Subject: Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
X-List-Received-Date: Tue, 30 Apr 2019 12:57:10 -0000

Aaron Conole writes:

> Jerin Jacob Kollanukkaran writes:
>
>> On Wed, 2019-04-10 at 11:52 -0400, Aaron Conole wrote:
>>> Jerin Jacob Kollanukkaran writes:
>>>
>>> > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>>> > > ----------------------------------------------------------------------
>>> > > Compiler complains of argument type mismatch, like:
>>> >
>>> > Can you share more details on how to reproduce this issue?
>>>
>>> It will be generated using the meson build after enabling the neon
>>> extension support (which isn't currently happening on ARM using meson
>>> as the build environment).
>>
>>
>> Can you share the patch to enable this for testing.
>
> Sure - I'm using these:
>
> (needed)
> 1/3 - http://mails.dpdk.org/archives/dev/2019-March/128304.html
> 2/3 - http://mails.dpdk.org/archives/dev/2019-March/128305.html
>
> (following only needed for travis support)
> 3/3 - http://mails.dpdk.org/archives/dev/2019-March/128306.html
>
> -Aaron
>
>> Since the additional memcpy in fastpath, I need to check the overhead
>> and check the possibility to avoid the memcpy to case.

Were you able to test this?

>>
>>>
>>> > We already have
>>> > CFLAGS_acl_run_neon.o += -flax-vector-conversions
>>> > in the Makefile.
>>> >
>>> > If you are taking out -flax-vector-conversions the correct way to
>>> > fix will be use vreinterpret*.
>>> >
>>> > For me the code looks clean, If unnecessary casting is avoided.
>>>
>>> I agree.  I merely make explicit the casts that the compiler will be
>>> implicitly introducing.
>>>
>>> > > ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>>> > > ../lib/librte_acl/acl_run_neon.h:115:2: note: use
>>> > > -flax-vector-conversions to permit conversions between vectors
>>> > > with differing element types or numbers of subparts
>>> > >   node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>>> > >   ^
>>> > > ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type
>>> > > for argument 2 of ‘vbicq_s32’
>>> > >
>>> > > Signed-off-by: Aaron Conole
>>> > > ---
>>> > >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------------
>>> > >  1 file changed, 27 insertions(+), 19 deletions(-)
>>> > >
>>> > >
>>> > >
>>> > >  /*
>>> > > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>>> > >      acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>>> > >      acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>>> > >
>>> > > +    memset(&input0, 0, sizeof(input0));
>>> > > +    memset(&input1, 0, sizeof(input1));
>>> >
>>> > Why this memset only required for arm64? If it real issue,
>>> > Shouldn't it required for x86 and ppc ?
>>>
>>> No.  Please see the following lines (which is due to the ARM neon
>>> intrinsic for setting individual lanes):
>>>
>>>     while (flows.started > 0) {
>>>         /* Gather 4 bytes of input data for each stream. */
>>>         input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
>>>         input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
>>>
>>> Note: the first time through this loop, input0 and input1 appear on
>>> the rhs of the assignment before appearing on the lhs.  This will
>>> generate an uninitialized value warning, even though the assignments
>>> are to individual lanes of the vector.
>>>
>>> I squelched the warning from the compiler in the most brute-force way
>>> possible.
>>> Perhaps it would be better to use a static initialization for the
>>> vector but this code was intended to be RFC and to generate feedback.
>>>
>>> I guess one alternate approach could be:
>>>
>>>     static const int32x4_t ZERO_VEC;
>>>     int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;
>>>
>>>     ...
>>>
>>>     int32x4_t input = ZERO_VEC;
>>>
>>> This would have the benefit of keeping the initializer as 'fast' as
>>> possible (although I recall a memset under a certain size threshold
>>> is the same effect, but not certain).
>>>
>>> Either way, I prefer it to squelching the warning, since the warning
>>> has been found to catch legitimate errors many times.
>>
>> I will get back to this after reproducing the issue locally.
>
> Awesome - thanks.
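
For readers without an ARM toolchain, the two initialization options
discussed in the thread (memset versus copying a zeroed constant) can be
sketched roughly as below. This is only an illustration under stated
assumptions: vec4_s32 and set_lane() are hypothetical stand-ins for
int32x4_t and vsetq_lane_s32(), built on the GCC/Clang vector extension
so the code compiles on any architecture, and the gather_* helpers are
invented for the example. None of it is taken from acl_run_neon.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for NEON's int32x4_t, built on the GCC/Clang
 * vector extension so that the sketch compiles on any architecture. */
typedef int32_t vec4_s32 __attribute__((vector_size(16)));

/* Hypothetical stand-in for vsetq_lane_s32(value, vec, lane): returns a
 * copy of vec with one lane replaced.  The whole of vec is read, which
 * is why passing an uninitialized vec draws a compiler warning. */
static inline vec4_s32
set_lane(int32_t value, vec4_s32 vec, int lane)
{
    assert(lane >= 0 && lane < 4);
    vec[lane] = value;
    return vec;
}

/* Option 1: clear the vector with memset, as the RFC patch does. */
static inline vec4_s32
gather_with_memset(const int32_t src[4])
{
    vec4_s32 input;
    int lane;

    memset(&input, 0, sizeof(input));
    for (lane = 0; lane < 4; lane++)
        input = set_lane(src[lane], input, lane);
    return input;
}

/* Option 2: copy from a zeroed constant.  The version in the thread
 * omits the initializer and relies on static storage being zeroed; it
 * is spelled out here only for clarity. */
static const vec4_s32 ZERO_VEC = { 0, 0, 0, 0 };

static inline vec4_s32
gather_with_zero_vec(const int32_t src[4])
{
    vec4_s32 input = ZERO_VEC;
    int lane;

    for (lane = 0; lane < 4; lane++)
        input = set_lane(src[lane], input, lane);
    return input;
}
```

Both helpers produce the same vector. Whether the compiler emits the
same instructions for the two forms is the open question from the
thread, and this sketch does not settle it.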
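
On the -flax-vector-conversions point raised in the thread, the
difference between an implicit and an explicit vector conversion can be
shown with a small portable sketch. The assumptions are loud ones:
vec4_s32 and vec4_u32 are hypothetical stand-ins for int32x4_t and
uint32x4_t, bic_s32() is a stand-in for vbicq_s32() (bit clear,
a & ~b), and clear_masked_bits() is invented for the example. With real
NEON types the explicit step would be one of the vreinterpret*
intrinsics, for example vreinterpretq_s32_u32(); the cast below plays
that role for the generic vector types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for NEON's int32x4_t and uint32x4_t. */
typedef int32_t  vec4_s32 __attribute__((vector_size(16)));
typedef uint32_t vec4_u32 __attribute__((vector_size(16)));

/* Hypothetical stand-in for vbicq_s32(a, b): bit clear, a & ~b. */
static inline vec4_s32
bic_s32(vec4_s32 a, vec4_s32 b)
{
    return a & ~b;
}

/* Passing `mask` to bic_s32() directly is rejected by GCC unless
 * -flax-vector-conversions is given, because the element types differ.
 * The cast reinterprets the same 128 bits without changing them. */
static inline vec4_s32
clear_masked_bits(vec4_s32 value, vec4_u32 mask)
{
    return bic_s32(value, (vec4_s32)mask);
}
```

The cast only changes how the bits are typed, so it should not add
work on the fast path, though that is worth confirming on the target
compiler.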