From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from dpdk.org (dpdk.org [92.243.14.124])
	by dpdk.space (Postfix) with ESMTP id A08D0A0679
	for <public@inbox.dpdk.org>; Tue, 30 Apr 2019 14:57:12 +0200 (CEST)
Received: from [92.243.14.124] (localhost [127.0.0.1])
	by dpdk.org (Postfix) with ESMTP id 79B7F5F0D;
	Tue, 30 Apr 2019 14:57:11 +0200 (CEST)
Received: from mx1.redhat.com (mx1.redhat.com [209.132.183.28])
 by dpdk.org (Postfix) with ESMTP id 1D97A5B36
 for <dev@dpdk.org>; Tue, 30 Apr 2019 14:57:10 +0200 (CEST)
Received: from smtp.corp.redhat.com (int-mx02.intmail.prod.int.phx2.redhat.com
 [10.5.11.12])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mx1.redhat.com (Postfix) with ESMTPS id 7B71A3091786;
 Tue, 30 Apr 2019 12:57:08 +0000 (UTC)
Received: from dhcp-25.97.bos.redhat.com (unknown [10.18.25.61])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id DA78578DEC;
 Tue, 30 Apr 2019 12:57:07 +0000 (UTC)
From: Aaron Conole <aconole@redhat.com>
To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
Cc: "gavin.hu\@arm.com" <gavin.hu@arm.com>, "dev\@dpdk.org" <dev@dpdk.org>,
 "konstantin.ananyev\@intel.com" <konstantin.ananyev@intel.com>
References: <20190408182420.4398-1-aconole@redhat.com>
 <20190408182420.4398-2-aconole@redhat.com>
 <b38d3b8e74fbf89dd40f3d9b26e436b1c30612ff.camel@marvell.com>
 <f7tpnpteu2h.fsf@dhcp-25.97.bos.redhat.com>
 <6900f8accff6ff85b56bd4f5987572ff085a2153.camel@marvell.com>
 <f7tef69epzp.fsf@dhcp-25.97.bos.redhat.com>
Date: Tue, 30 Apr 2019 08:57:06 -0400
In-Reply-To: <f7tef69epzp.fsf@dhcp-25.97.bos.redhat.com> (Aaron Conole's
 message of "Wed, 10 Apr 2019 13:20:42 -0400")
Message-ID: <f7tftpzodlp.fsf@dhcp-25.97.bos.redhat.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Scanned-By: MIMEDefang 2.79 on 10.5.11.12
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16
 (mx1.redhat.com [10.5.110.41]); Tue, 30 Apr 2019 12:57:08 +0000 (UTC)
Subject: Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>
Message-ID: <20190430125706.tvJF8B3UKWCTePxhD0cVhh5uNc7X79UvxkUbIykduis@z>

Aaron Conole <aconole@redhat.com> writes:

> Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:
>
>> On Wed, 2019-04-10 at 11:52 -0400, Aaron Conole wrote:
>>> Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:
>>>
>>> > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>>> > > ---------------------------------------------------------------
>>> > > ----
>>> > > ---
>>> > > Compiler complains of argument type mismatch, like:
>>> >
>>> > Can you share more details on how to reproduce this issue?
>>>
>>> It will be generated using the meson build after enabling the neon
>>> extension support (which isn't currently happening on ARM using meson
>>> as the build environment).
>>
>>
>> Can you share the patch to enable this for testing?
>
> Sure - I'm using these:
>
> (needed)
> 1/3 - http://mails.dpdk.org/archives/dev/2019-March/128304.html
> 2/3 - http://mails.dpdk.org/archives/dev/2019-March/128305.html
>
> (following only needed for travis support)
> 3/3 - http://mails.dpdk.org/archives/dev/2019-March/128306.html
>
> -Aaron
>
>> Since there is an additional memcpy in the fast path, I need to check
>> the overhead and see whether the memcpy used for the cast can be avoided.

Were you able to test this?
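
If the memcpy overhead is the concern, the vreinterpret* route you mention
below may avoid it, since a reinterpret only changes the type and should
not generate any instructions.  A rough, untested-on-hardware sketch of the
idea follows.  The helper names bic_s32_u32() and bic_lane0() are
hypothetical and not from the patch, and I am assuming the mismatched
argument is a uint32x4_t mask.  The non-ARM branch uses GCC/Clang vector
extensions only as a stand-in so the sketch builds elsewhere:

```c
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>

/* Clear in val every bit that is set in mask.  The mask is reinterpreted
 * from uint32x4_t to int32x4_t explicitly, so no implicit vector
 * conversion (and no -flax-vector-conversions) is needed. */
static inline int32x4_t
bic_s32_u32(int32x4_t val, uint32x4_t mask)
{
	return vbicq_s32(val, vreinterpretq_s32_u32(mask));
}

/* Hypothetical scalar wrapper, only here to make the sketch checkable. */
int32_t
bic_lane0(int32_t val, uint32_t mask)
{
	int32x4_t r = bic_s32_u32(vdupq_n_s32(val), vdupq_n_u32(mask));

	return vgetq_lane_s32(r, 0);
}
#else
/* Stand-in for the NEON types (assumption: GCC or Clang). */
typedef int32_t v4s32 __attribute__((vector_size(16)));
typedef uint32_t v4u32 __attribute__((vector_size(16)));

static inline v4s32
bic_s32_u32(v4s32 val, v4u32 mask)
{
	/* A cast between same-size vector types reinterprets the bits. */
	return val & ~(v4s32)mask;
}

int32_t
bic_lane0(int32_t val, uint32_t mask)
{
	v4s32 v = { val, val, val, val };
	v4u32 m = { mask, mask, mask, mask };
	v4s32 r = bic_s32_u32(v, m);

	return r[0];
}
#endif
```

With this shape the call site keeps a single intrinsic call and the cast
is visible in the source rather than introduced by the compiler.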

>>
>>>
>>> > We already have
>>> > CFLAGS_acl_run_neon.o += -flax-vector-conversions
>>> > in the Makefile.
>>> >
>>> > If you are taking out -flax-vector-conversions, the correct way to
>>> > fix this will be to use vreinterpret*.
>>> >
>>> > To me, the code looks clean if unnecessary casting is avoided.
>>>
>>> I agree.  I merely make explicit the casts that the compiler will be
>>> implicitly introducing.
>>>
>>> > >    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>>> > >    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-conversions
>>> > >       to permit conversions between vectors with differing element types
>>> > >       or numbers of subparts
>>> > >      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>>> > >      ^
>>> > >    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type for
>>> > >       argument 2 of ‘vbicq_s32’
>>> > >
>>> > > Signed-off-by: Aaron Conole <aconole@redhat.com>
>>> > > ---
>>> > >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------------
>>> > >  1 file changed, 27 insertions(+), 19 deletions(-)
>>> > >
>>> > >
>>> > >
>>> > >  /*
>>> > > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>>> > >  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>>> > >  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>>> > >
>>> > > +	memset(&input0, 0, sizeof(input0));
>>> > > +	memset(&input1, 0, sizeof(input1));
>>> >
>>> > Why is this memset only required for arm64? If it is a real issue,
>>> > shouldn't it be required for x86 and ppc too?
>>>
>>> No.  Please see the following lines (the issue comes from the ARM neon
>>> intrinsic for setting individual lanes):
>>>
>>> 	while (flows.started > 0) {
>>> 		/* Gather 4 bytes of input data for each stream. */
>>> 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
>>> 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
>>>
>>> Note: the first time through this loop, input0 and input1 appear on the
>>> rhs of the assignment before appearing on the lhs.  This will generate
>>> an uninitialized value warning, even though the assignments are to
>>> individual lanes of the vector.
>>>
>>> I squelched the warning from the compiler in the most brute-force way
>>> possible.  Perhaps it would be better to use a static initialization for
>>> the vector but this code was intended to be RFC and to generate
>>> feedback.
>>>
>>> I guess one alternate approach could be:
>>>
>>>    static const int32x4_t ZERO_VEC;
>>>    int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;
>>>
>>>    ...
>>>
>>>    int32x4_t input = ZERO_VEC;
>>>
>>> This would have the benefit of keeping the initializer as 'fast' as
>>> possible (I recall that a memset under a certain size threshold has
>>> the same effect, but I am not certain).
>>>
>>> Either way, I prefer it to squelching the warning, since the warning
>>> has been found to catch legitimate errors many times.
>>
>> I will get back to this after reproducing the issue locally.
>
> Awesome - thanks.
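
For the uninitialized-value warning, here is a minimal sketch of the
ZERO_VEC alternative quoted above, in case it helps when reproducing.
The vector is given a defined value before the first lane set reads it.
The names vec4_t, VEC_SET_LANE0 and gather_lane0() are hypothetical and
only for illustration.  The non-ARM branch is a GCC/Clang vector
extension stand-in for int32x4_t and vsetq_lane_s32, so the sketch also
builds off ARM:

```c
#include <stdint.h>
#include <string.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
typedef int32x4_t vec4_t;
#define VEC_SET_LANE0(x, v) vsetq_lane_s32((x), (v), 0)
#else
/* Stand-in for int32x4_t (assumption: GCC or Clang). */
typedef int32_t vec4_t __attribute__((vector_size(16)));

static inline vec4_t
vec_set_lane0(int32_t x, vec4_t v)
{
	v[0] = x;	/* replaces lane 0, keeps lanes 1..3 */
	return v;
}
#define VEC_SET_LANE0(x, v) vec_set_lane0((x), (v))
#endif

/* Static storage duration, so all lanes are zero without an initializer. */
static const vec4_t ZERO_VEC;

/* Hypothetical helper: set lane 0 of a zeroed vector and store all four
 * lanes to out[]. */
void
gather_lane0(int32_t x, int32_t out[4])
{
	vec4_t input = ZERO_VEC;	/* defined before its first read */

	input = VEC_SET_LANE0(x, input);
	memcpy(out, &input, sizeof(input));
}
```

Whether this costs anything compared with the memset is something I have
not measured.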