From: Aaron Conole <aconole@redhat.com>
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
Cc: "msantana\@redhat.com" <msantana@redhat.com>,
"thomas\@monjalon.net" <thomas@monjalon.net>,
"Ruifeng Wang \(Arm Technology China\)" <Ruifeng.Wang@arm.com>,
"Gavin Hu \(Arm Technology China\)" <Gavin.Hu@arm.com>,
Dharmik Thakkar <Dharmik.Thakkar@arm.com>,
"jerin.jacob\@caviumnetworks.com"
<jerin.jacob@caviumnetworks.com>,
"yskoh\@mellanox.com" <yskoh@mellanox.com>,
"dev\@dpdk.org" <dev@dpdk.org>,
"bruce.richardson\@intel.com" <bruce.richardson@intel.com>,
nd <nd@arm.com>
Subject: Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
Date: Fri, 07 Jun 2019 09:24:27 -0400 [thread overview]
Message-ID: <f7th891o7ec.fsf@dhcp-25.97.bos.redhat.com> (raw)
In-Reply-To: <VE1PR08MB5149A70A8B85F195A00B80CC98100@VE1PR08MB5149.eurprd08.prod.outlook.com> (Honnappa Nagarahalli's message of "Fri, 7 Jun 2019 05:10:46 +0000")
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
>> >
>> > Thomas Monjalon <thomas@monjalon.net> writes:
>> >
>> >
>> >
>> > The compilation of the master branch is failing for aarch64:
>> >
>> > https://travis-ci.com/DPDK/dpdk
>> >
>> > The log is so much verbose that I am not able to understand what
>> >
>> > is really wrong.
>> >
>> > Please help to diagnose and fix, thanks.
>> >
>> >
>> >
>> > A discussion about this:
>> >
>> >
>> >
>> > http://mails.dpdk.org/archives/dev/2019-June/134012.html
>> >
>> >
>> >
>> > I see the error now.
>> >
>> > It is printing the full log after the error, so I missed the error
>> >
>> > at the top.
>> >
>> >
>> >
>> > I've read your comment about a possible error with the patch
>> >
>> > removing weak functions but neither me nor Bruce were able to
>> > reproduce
>> >
>> > it.
>> >
>> > What is the condition to see this compiler warning?
>> >
>> >
>> >
>> > It is only on ARM, and only when the neon intrinsics are in use.
>> >
>> > I am not able to reproduce it from the tip of master.
>> >
>> >
>> >
>> > I am using:
>> >
>> > gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
>> >
>> >
>> >
>> > From the log on Travis, looks like the compiler is:
>> >
>> > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
>> >
>> >
>> >
>> > Is this the issue?
>> >
>> >
>> >
>> > Why are we seeing the error now?
>> >
>> > I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, it
>> works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
>> >
>> > Are there any issues in upgrading to 7 or 8?
>> >
>> > I have tested it on my ubuntu 16.04 vm on commit
>> > 8cb511bb94ad92a76990f175cac76bb13d51daba
>> > (head of master seems to be failing for other reasons on my vm).
>> > I tested the following gcc versions:
>> >
>> > gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
>> > gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
>> > gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
>> >
>> > All tested versions failed on the exact same error shown in travis. I
>> > don't know if the compiler is at fault here. Maybe Aaron's patch is a viable
>> option?
>> >
>> > The issue is the vector lane setting code looks like:
>> >
>> >
>> >
>> > lval = lane_set(scalar, rval, lane id)
>> >
>> >
>> >
>> > In this case, 'rval' is being used before it is ever set, but it
>> >
>> > really could be just 0 for the first lane setting code. Thereafter,
>> >
>> > we use the old value of input as the rval, but each time a different lane is
>> set.
>> >
>> >
>> >
>> > It would be nice if there were an intrinsic that formatted correctly
>> >
>> > from the start (something we could call like lval =
>> >
>> > lane_set_from_array(scalar_array)).
>> >
>> > [Honnappa] This exists already. ‘vdupq_n_s32’ can be used. Can you try the
>> following?
>>
>> Well, it isn't exactly that. You are setting all lanes from a scalar.
> Yes, you are correct, it sets all the lanes. I am not sure on how this
> will affect the performance.
>
>> I'd rather be able to say:
>>
>> input0 = vdupq_nn_s32(&parms[0]);
>> input1 = vdupq_nn_s32(&parms[4]);
>>
>> Something like that, which lets us delete all the rest of the lane-set code. But
>> it seems it doesn't exist.
>>
>> Regardless, I think either patch should work (either using the 'all lanes'
>> setting you have or the static variable). I have no preference on it - it's up to
>> you (or someone else) to say which is preferred. I guess your version could be
>> preferable since there's no static to need to "explain" :)
> I think we can go ahead with your patch with using a temporary vector
> for the first set, as it does not introduce any change to the code and
> hence performance should not get affected.
>
> But, I do not understand why you have added 'static'. Also, changing
> 'ZEROVAL' to 'tmp' or something similar will be better.
The static is there to guarantee '0' value. Otherwise we create a temp
variable that has to be initialized explicitly.
>>
>> > honnag01@qc2400f-1:~/dpdk$ git diff
>> >
>> > diff --git a/lib/librte_acl/acl_run_neon.h
>> > b/lib/librte_acl/acl_run_neon.h
>> >
>> > index 01b9766d8..b3196cd12 100644
>> >
>> > --- a/lib/librte_acl/acl_run_neon.h
>> >
>> > +++ b/lib/librte_acl/acl_run_neon.h
>> >
>> > @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>> > uint8_t **data,
>> >
>> >
>> >
>> > while (flows.started > 0) {
>> >
>> > /* Gather 4 bytes of input data for each stream. */
>> >
>> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
>> >
>> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
>> >
>> > + input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>> >
>> > + input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
>> >
>> >
>> >
>> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
>> > input0, 1);
>> >
>> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
>> > input1, 1);
>> >
>> > @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
>> > uint8_t **data,
>> >
>> >
>> >
>> > while (flows.started > 0) {
>> >
>> > /* Gather 4 bytes of input data for each stream. */
>> >
>> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
>> >
>> > + input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>> >
>> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
>> > input, 1);
>> >
>> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
>> > input, 2);
>> >
>> > input =
>> > vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
>> >
>> >
>> >
>> > Then 'input' would never appear as an rval before it was set.
>> >
>> >
>> >
>> > I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
>> >
>> > There are three 'fixes' I know exist - one is to squelch the warning
>> >
>> > (but I don't like it because it could hide future code that introduces
>> >
>> > this), one is to create a static and use assignment, one is to replace
>> >
>> > the first call and pass in a 0'd lane for the first one.
>> >
>> >
>> >
>> > Actually, I think I have a patch that could work to not introduce an
>> >
>> > assignment, but squelch the warning. Something like the following
>> > (not
>> >
>> > tested).
>> >
>> >
>> >
>> > ---
>> >
>> >
>> >
>> > diff --git a/lib/librte_acl/acl_run_neon.h
>> >
>> > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..37c984fef 100644
>> >
>> > --- a/lib/librte_acl/acl_run_neon.h
>> >
>> > +++ b/lib/librte_acl/acl_run_neon.h
>> >
>> > @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>> >
>> > uint8_t **data,
>> >
>> > uint64_t index_array[8];
>> >
>> > struct completion cmplt[8];
>> >
>> > struct parms parms[8];
>> >
>> > + static int32x4_t ZEROVAL;
>> >
>> > int32x4_t input0, input1;
>> >
>> >
>> >
>> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
>> >
>> > 181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>> >
>> > uint8_t **data,
>> >
>> >
>> >
>> > while (flows.started > 0) {
>> >
>> > /* Gather 4 bytes of input data for each stream. */
>> >
>> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
>> >
>> > 0);
>> >
>> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
>> >
>> > 0);
>> >
>> > + input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>> >
>> > ZEROVAL, 0);
>> >
>> > + input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
>> >
>> > ZEROVAL, 0);
>> >
>> >
>> >
>> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0,
>> >
>> > 1);
>> >
>> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
>> > input1,
>> >
>> > 1); @@
>> >
>> > -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
>> >
>> > uint8_t **data,
>> >
>> > uint64_t index_array[4];
>> >
>> > struct completion cmplt[4];
>> >
>> > struct parms parms[4];
>> >
>> > + static int32x4_t ZEROVAL;
>> >
>> > int32x4_t input;
>> >
>> >
>> >
>> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
>> >
>> > 242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
>> >
>> > uint8_t **data,
>> >
>> >
>> >
>> > while (flows.started > 0) {
>> >
>> > /* Gather 4 bytes of input data for each stream. */
>> >
>> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
>> >
>> > + input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>> >
>> > ZEROVAL, 0);
>> >
>> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input,
>> > 1);
>> >
>> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input,
>> > 2);
>> >
>> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input,
>> > 3);
>> >
>> > --
>> >
>> > 2.21.0
next prev parent reply other threads:[~2019-06-07 13:24 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-06-05 18:26 Thomas Monjalon
2019-06-05 19:40 ` Aaron Conole
2019-06-05 20:04 ` Thomas Monjalon
2019-06-05 20:58 ` Aaron Conole
2019-06-05 21:05 ` Honnappa Nagarahalli
2019-06-05 21:36 ` Honnappa Nagarahalli
2019-06-05 22:38 ` Michael Santana Francisco
2019-06-06 4:42 ` Honnappa Nagarahalli
2019-06-06 14:50 ` Aaron Conole
2019-06-07 5:10 ` Honnappa Nagarahalli
2019-06-07 13:24 ` Aaron Conole [this message]
2019-06-07 13:53 ` Honnappa Nagarahalli
2019-06-08 8:38 ` Jerin Jacob Kollanukkaran
2019-06-08 8:41 ` Jerin Jacob Kollanukkaran
2019-06-06 14:57 ` Jerin Jacob Kollanukkaran
2019-06-06 17:06 ` Michael Santana Francisco
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f7th891o7ec.fsf@dhcp-25.97.bos.redhat.com \
--to=aconole@redhat.com \
--cc=Dharmik.Thakkar@arm.com \
--cc=Gavin.Hu@arm.com \
--cc=Honnappa.Nagarahalli@arm.com \
--cc=Ruifeng.Wang@arm.com \
--cc=bruce.richardson@intel.com \
--cc=dev@dpdk.org \
--cc=jerin.jacob@caviumnetworks.com \
--cc=msantana@redhat.com \
--cc=nd@arm.com \
--cc=thomas@monjalon.net \
--cc=yskoh@mellanox.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).