From: Aaron Conole
To: Honnappa Nagarahalli
Cc: msantana@redhat.com, thomas@monjalon.net, Ruifeng Wang (Arm Technology China), Gavin Hu (Arm Technology China), Dharmik Thakkar, jerin.jacob@caviumnetworks.com, yskoh@mellanox.com, dev@dpdk.org, bruce.richardson@intel.com, nd
Date: Fri, 07 Jun 2019 09:24:27 -0400
In-Reply-To: Honnappa Nagarahalli's message of "Fri, 7 Jun 2019 05:10:46 +0000"
Subject: Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
Honnappa Nagarahalli writes:

>> > >> > Thomas Monjalon writes:
>> > >> >
>> > >> > The compilation of the master branch is failing for aarch64:
>> > >> > https://travis-ci.com/DPDK/dpdk
>> > >> > The log is so verbose that I am not able to understand what
>> > >> > is really wrong.
>> > >> > Please help to diagnose and fix, thanks.
>> > >> >
>> > >> > A discussion about this:
>> > >> >
>> > >> > http://mails.dpdk.org/archives/dev/2019-June/134012.html
>> > >> >
>> > >> > I see the error now.
>> > >> > It is printing the full log after the error, so I missed the error
>> > >> > at the top.
>> > >> >
>> > >> > I've read your comment about a possible error with the patch
>> > >> > removing weak functions, but neither Bruce nor I was able to
>> > >> > reproduce it.
>> > >> > What is the condition to see this compiler warning?
>> > >> >
>> > >> > It is only on ARM, and only when the NEON intrinsics are in use.
>> > >> > I am not able to reproduce it from the tip of master.
>> > >> >
>> > >> > I am using:
>> > >> > gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
>> > >> >
>> > >> > From the log on Travis, it looks like the compiler is:
>> > >> > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
>> > >> >
>> > >> > Is this the issue?
>> > >> >
>> > >> > Why are we seeing the error now?
>> > >> >
>> > >> > I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, and
>> > >> > it works fine. I cannot get hold of 5.4.0. I am not sure if it needs
>> > >> > to be supported.
>> > >> >
>> > >> > Are there any issues in upgrading to 7 or 8?
>> > >> >
>> > I have tested it on my Ubuntu 16.04 VM on commit
>> > 8cb511bb94ad92a76990f175cac76bb13d51daba
>> > (the head of master seems to be failing for other reasons on my VM).
>> >
>> > I tested the following gcc versions:
>> >
>> > gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
>> > gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
>> > gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
>> >
>> > All tested versions failed on the exact same error shown in Travis. I
>> > don't know if the compiler is at fault here. Maybe Aaron's patch is a
>> > viable option?
>> >
>> > >> > The issue is that the vector lane setting code looks like:
>> > >> >
>> > >> >   lval = lane_set(scalar, rval, lane id)
>> > >> >
>> > >> > In this case, 'rval' is being used before it is ever set, but it
>> > >> > really could be just 0 for the first lane setting code. Thereafter,
>> > >> > we use the old value of input as the rval, but each time a different
>> > >> > lane is set.
>> > >> >
>> > >> > It would be nice if there were an intrinsic that formatted correctly
>> > >> > from the start (something we could call like lval =
>> > >> > lane_set_from_array(scalar_array)).
>> > >> >
>> > [Honnappa] This exists already. ‘vdupq_n_s32’ can be used. Can you try
>> > the following?
>>
>> Well, it isn't exactly that. You are setting all lanes from a scalar.
> Yes, you are correct, it sets all the lanes. I am not sure how this
> will affect the performance.
>
>> I'd rather be able to say:
>>
>>    input0 = vdupq_nn_s32(&parms[0]);
>>    input1 = vdupq_nn_s32(&parms[4]);
>>
>> Something like that, which lets us delete all the rest of the lane-set
>> code. But it seems it doesn't exist.
>>
>> Regardless, I think either patch should work (either using the 'all lanes'
>> setting you have or the static variable). I have no preference on it; it's
>> up to you (or someone else) to say which is preferred.
>> I guess your version could be preferable since there's no static to
>> need to "explain" :)
> I think we can go ahead with your patch with using a temporary vector
> for the first set, as it does not introduce any change to the code and
> hence performance should not get affected.
>
> But, I do not understand why you have added 'static'. Also, changing
> 'ZEROVAL' to 'tmp' or something similar will be better.

The static is there to guarantee a '0' value: an object with static
storage duration is zero-initialized. Otherwise we would create a temp
variable that has to be initialized explicitly.

>>
>> > honnag01@qc2400f-1:~/dpdk$ git diff
>> > diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
>> > index 01b9766d8..b3196cd12 100644
>> > --- a/lib/librte_acl/acl_run_neon.h
>> > +++ b/lib/librte_acl/acl_run_neon.h
>> > @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>> >
>> >          while (flows.started > 0) {
>> >                  /* Gather 4 bytes of input data for each stream. */
>> > -                input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
>> > -                input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
>> > +                input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>> > +                input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
>> >
>> >                  input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
>> >                  input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
>> > @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
>> >
>> >          while (flows.started > 0) {
>> >                  /* Gather 4 bytes of input data for each stream. */
>> > -                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
>> > +                input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>> >                  input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
>> >                  input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
>> >                  input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
>> >
>> > >> > Then 'input' would never appear as an rval before it was set.
>> > >> >
>> > >> > I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
>> > >> > There are three 'fixes' I know exist: one is to squelch the warning
>> > >> > (but I don't like it because it could hide future code that introduces
>> > >> > this), one is to create a static and use assignment, and one is to
>> > >> > replace the first call and pass in a 0'd lane for the first one.
>> > >> >
>> > >> > Actually, I think I have a patch that could work to not introduce an
>> > >> > assignment, but squelch the warning. Something like the following
>> > >> > (not tested).
>> > >> >
>> > >> > ---
>> > >> >
>> > >> > diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
>> > >> > index 01b9766d8..37c984fef 100644
>> > >> > --- a/lib/librte_acl/acl_run_neon.h
>> > >> > +++ b/lib/librte_acl/acl_run_neon.h
>> > >> > @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>> > >> >          uint64_t index_array[8];
>> > >> >          struct completion cmplt[8];
>> > >> >          struct parms parms[8];
>> > >> > +        static int32x4_t ZEROVAL;
>> > >> >          int32x4_t input0, input1;
>> > >> >
>> > >> >          acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
>> > >> > @@ -181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>> > >> >
>> > >> >          while (flows.started > 0) {
>> > >> >                  /* Gather 4 bytes of input data for each stream. */
>> > >> > -                input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
>> > >> > -                input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
>> > >> > +                input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), ZEROVAL, 0);
>> > >> > +                input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), ZEROVAL, 0);
>> > >> >
>> > >> >                  input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
>> > >> >                  input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
>> > >> > @@ -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
>> > >> >          uint64_t index_array[4];
>> > >> >          struct completion cmplt[4];
>> > >> >          struct parms parms[4];
>> > >> > +        static int32x4_t ZEROVAL;
>> > >> >          int32x4_t input;
>> > >> >
>> > >> >          acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
>> > >> > @@ -242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
>> > >> >
>> > >> >          while (flows.started > 0) {
>> > >> >                  /* Gather 4 bytes of input data for each stream. */
>> > >> > -                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
>> > >> > +                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), ZEROVAL, 0);
>> > >> >                  input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
>> > >> >                  input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
>> > >> >                  input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
>> > >> > --
>> > >> > 2.21.0
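To compare the initialization options discussed in this thread, here is a minimal sketch. It is not code from acl_run_neon.h. It builds a 4-lane vector from four scalars without ever passing an uninitialized vector as the source operand of a lane set. The array vals[] is a hypothetical stand-in for GET_NEXT_4BYTES(parms, i). The helpers gather_dup, gather_static_zero and gather_explicit_zero and the VDUP/VSET/VSTORE macros are also hypothetical. So is the plain-C vec4 model used when __ARM_NEON is not defined, which is an assumption that lets the sketch build on any host. On Arm the macros map to the real vdupq_n_s32, vsetq_lane_s32 and vst1q_s32 intrinsics from <arm_neon.h>.

```c
/*
 * Sketch (hypothetical helpers, not DPDK code): fill four lanes without
 * ever reading an uninitialized vector as the "rval" of a lane set.
 * With NEON the real intrinsics are used; elsewhere a small plain-C
 * model with the same value semantics stands in for them (assumption).
 */
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON)
#include <arm_neon.h>
typedef int32x4_t vec4;
#define VDUP(s)          vdupq_n_s32(s)
#define VSET(s, v, lane) vsetq_lane_s32((s), (v), (lane))
#define VSTORE(p, v)     vst1q_s32((p), (v))
#else
/* Assumed model: a vector is a 4-element struct passed by value. */
typedef struct { int32_t lane[4]; } vec4;

static vec4 model_dup(int32_t s)
{
	vec4 v = { { s, s, s, s } };
	return v;
}

static vec4 model_set(int32_t s, vec4 v, int lane)
{
	assert(lane >= 0 && lane < 4);
	v.lane[lane] = s; /* returns a copy of v with one lane replaced */
	return v;
}

static void model_store(int32_t *p, vec4 v)
{
	for (int i = 0; i < 4; i++)
		p[i] = v.lane[i];
}

#define VDUP(s)          model_dup(s)
#define VSET(s, v, lane) model_set((s), (v), (lane))
#define VSTORE(p, v)     model_store((p), (v))
#endif

/* Option A: broadcast vals[0] to every lane, then overwrite lanes 1..3. */
static vec4 gather_dup(const int32_t vals[4])
{
	vec4 v = VDUP(vals[0]);

	v = VSET(vals[1], v, 1);
	v = VSET(vals[2], v, 2);
	v = VSET(vals[3], v, 3);
	return v;
}

/* Option B: start from a static vector, which C zero-initializes. */
static vec4 gather_static_zero(const int32_t vals[4])
{
	static vec4 zeroval; /* plays the role of ZEROVAL in the patch */
	vec4 v = VSET(vals[0], zeroval, 0);

	v = VSET(vals[1], v, 1);
	v = VSET(vals[2], v, 2);
	v = VSET(vals[3], v, 3);
	return v;
}

/* Option C: a local temporary that must be zeroed explicitly. */
static vec4 gather_explicit_zero(const int32_t vals[4])
{
	vec4 tmp = VDUP(0);
	vec4 v = VSET(vals[0], tmp, 0);

	v = VSET(vals[1], v, 1);
	v = VSET(vals[2], v, 2);
	v = VSET(vals[3], v, 3);
	return v;
}
```

Options B and C differ only in where the zero comes from. An object with static storage duration is zero-initialized by the language, which is the guarantee the static relies on. The local temporary needs an explicit VDUP(0) instead. Option A needs no zero at all, because lane 0 already holds vals[0] after the broadcast. The thread leaves its performance impact open.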