* [dpdk-dev] DPDK compilation on arm is failing in Travis
@ 2019-06-05 18:26 Thomas Monjalon
2019-06-05 19:40 ` Aaron Conole
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Monjalon @ 2019-06-05 18:26 UTC (permalink / raw)
To: honnappa.nagarahalli
Cc: ruifeng.wang, gavin.hu, dharmik.thakkar, jerin.jacob,
Yongseok Koh, dev, msantana, aconole
The compilation of the master branch is failing for aarch64:
https://travis-ci.com/DPDK/dpdk
The log is so much verbose that I am not able to understand
what is really wrong.
Please help to diagnose and fix, thanks.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-05 18:26 [dpdk-dev] DPDK compilation on arm is failing in Travis Thomas Monjalon
@ 2019-06-05 19:40 ` Aaron Conole
2019-06-05 20:04 ` Thomas Monjalon
0 siblings, 1 reply; 16+ messages in thread
From: Aaron Conole @ 2019-06-05 19:40 UTC (permalink / raw)
To: Thomas Monjalon
Cc: honnappa.nagarahalli, ruifeng.wang, gavin.hu, dharmik.thakkar,
jerin.jacob, Yongseok Koh, dev, msantana
Thomas Monjalon <thomas@monjalon.net> writes:
> The compilation of the master branch is failing for aarch64:
> https://travis-ci.com/DPDK/dpdk
> The log is so much verbose that I am not able to understand
> what is really wrong.
> Please help to diagnose and fix, thanks.
A discussion about this:
http://mails.dpdk.org/archives/dev/2019-June/134012.html
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-05 19:40 ` Aaron Conole
@ 2019-06-05 20:04 ` Thomas Monjalon
2019-06-05 20:58 ` Aaron Conole
0 siblings, 1 reply; 16+ messages in thread
From: Thomas Monjalon @ 2019-06-05 20:04 UTC (permalink / raw)
To: Aaron Conole
Cc: honnappa.nagarahalli, ruifeng.wang, gavin.hu, dharmik.thakkar,
jerin.jacob, Yongseok Koh, dev, msantana, bruce.richardson
05/06/2019 21:40, Aaron Conole:
> Thomas Monjalon <thomas@monjalon.net> writes:
>
> > The compilation of the master branch is failing for aarch64:
> > https://travis-ci.com/DPDK/dpdk
> > The log is so much verbose that I am not able to understand
> > what is really wrong.
> > Please help to diagnose and fix, thanks.
>
> A discussion about this:
>
> http://mails.dpdk.org/archives/dev/2019-June/134012.html
I see the error now.
It is printing the full log after the error,
so I missed the error at the top.
I've read your comment about a possible error with the patch
removing weak functions but neither me nor Bruce were able
to reproduce it. What is the condition to see this compiler warning?
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-05 20:04 ` Thomas Monjalon
@ 2019-06-05 20:58 ` Aaron Conole
2019-06-05 21:05 ` Honnappa Nagarahalli
0 siblings, 1 reply; 16+ messages in thread
From: Aaron Conole @ 2019-06-05 20:58 UTC (permalink / raw)
To: Thomas Monjalon
Cc: honnappa.nagarahalli, ruifeng.wang, gavin.hu, dharmik.thakkar,
jerin.jacob, Yongseok Koh, dev, msantana, bruce.richardson
Thomas Monjalon <thomas@monjalon.net> writes:
> 05/06/2019 21:40, Aaron Conole:
>> Thomas Monjalon <thomas@monjalon.net> writes:
>>
>> > The compilation of the master branch is failing for aarch64:
>> > https://travis-ci.com/DPDK/dpdk
>> > The log is so much verbose that I am not able to understand
>> > what is really wrong.
>> > Please help to diagnose and fix, thanks.
>>
>> A discussion about this:
>>
>> http://mails.dpdk.org/archives/dev/2019-June/134012.html
>
> I see the error now.
> It is printing the full log after the error,
> so I missed the error at the top.
>
> I've read your comment about a possible error with the patch
> removing weak functions but neither me nor Bruce were able
> to reproduce it. What is the condition to see this compiler warning?
It is only on ARM, and only when the neon intrinsics are in use.
The issue is the vector lane setting code looks like:
lval = lane_set(scalar, rval, lane id)
In this case, 'rval' is being used before it is ever set, but it really
could be just 0 for the first lane setting code. Thereafter, we use the
old value of input as the rval, but each time a different lane is set.
It would be nice if there were an intrinsic that formatted correctly
from the start (something we could call like
lval = lane_set_from_array(scalar_array)). Then 'input' would never
appear as an rval before it was set.
I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
There are three 'fixes' I know exist - one is to squelch the warning
(but I don't like it because it could hide future code that introduces
this), one is to create a static and use assignment, one is to replace
the first call and pass in a 0'd lane for the first one.
Actually, I think I have a patch that could work to not introduce an
assignment, but squelch the warning. Something like the following (not
tested).
---
diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index 01b9766d8..37c984fef 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
uint64_t index_array[8];
struct completion cmplt[8];
struct parms parms[8];
+ static int32x4_t ZEROVAL;
int32x4_t input0, input1;
acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
@@ -181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
while (flows.started > 0) {
/* Gather 4 bytes of input data for each stream. */
- input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
- input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
+ input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), ZEROVAL, 0);
+ input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), ZEROVAL, 0);
input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
uint64_t index_array[4];
struct completion cmplt[4];
struct parms parms[4];
+ static int32x4_t ZEROVAL;
int32x4_t input;
acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
@@ -242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
while (flows.started > 0) {
/* Gather 4 bytes of input data for each stream. */
- input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
+ input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), ZEROVAL, 0);
input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
--
2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-05 20:58 ` Aaron Conole
@ 2019-06-05 21:05 ` Honnappa Nagarahalli
2019-06-05 21:36 ` Honnappa Nagarahalli
0 siblings, 1 reply; 16+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-05 21:05 UTC (permalink / raw)
To: Aaron Conole, thomas
Cc: Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, msantana,
bruce.richardson, Honnappa Nagarahalli, nd, nd
>
> Thomas Monjalon <thomas@monjalon.net> writes:
>
> > 05/06/2019 21:40, Aaron Conole:
> >> Thomas Monjalon <thomas@monjalon.net> writes:
> >>
> >> > The compilation of the master branch is failing for aarch64:
> >> > https://travis-ci.com/DPDK/dpdk
> >> > The log is so much verbose that I am not able to understand what is
> >> > really wrong.
> >> > Please help to diagnose and fix, thanks.
> >>
> >> A discussion about this:
> >>
> >> http://mails.dpdk.org/archives/dev/2019-June/134012.html
> >
> > I see the error now.
> > It is printing the full log after the error, so I missed the error at
> > the top.
> >
> > I've read your comment about a possible error with the patch removing
> > weak functions but neither me nor Bruce were able to reproduce it.
> > What is the condition to see this compiler warning?
>
> It is only on ARM, and only when the neon intrinsics are in use.
I am not able to reproduce it from the tip of master.
I am using:
gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
From the log on Travis, looks like the compiler is:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Is this the issue?
Why are we seeing the error now?
>
> The issue is the vector lane setting code looks like:
>
> lval = lane_set(scalar, rval, lane id)
>
> In this case, 'rval' is being used before it is ever set, but it really could be just 0
> for the first lane setting code. Thereafter, we use the old value of input as the
> rval, but each time a different lane is set.
>
> It would be nice if there were an intrinsic that formatted correctly from the
> start (something we could call like lval = lane_set_from_array(scalar_array)).
> Then 'input' would never appear as an rval before it was set.
>
> I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
> There are three 'fixes' I know exist - one is to squelch the warning (but I don't
> like it because it could hide future code that introduces this), one is to create a
> static and use assignment, one is to replace the first call and pass in a 0'd lane
> for the first one.
>
> Actually, I think I have a patch that could work to not introduce an assignment,
> but squelch the warning. Something like the following (not tested).
>
> ---
>
> diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h index
> 01b9766d8..37c984fef 100644
> --- a/lib/librte_acl/acl_run_neon.h
> +++ b/lib/librte_acl/acl_run_neon.h
> @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> uint8_t **data,
> uint64_t index_array[8];
> struct completion cmplt[8];
> struct parms parms[8];
> + static int32x4_t ZEROVAL;
> int32x4_t input0, input1;
>
> acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
> 181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t
> **data,
>
> while (flows.started > 0) {
> /* Gather 4 bytes of input data for each stream. */
> - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
> 0);
> - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
> 0);
> + input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> ZEROVAL, 0);
> + input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> ZEROVAL, 0);
>
> input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0,
> 1);
> input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1,
> 1); @@ -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> uint8_t **data,
> uint64_t index_array[4];
> struct completion cmplt[4];
> struct parms parms[4];
> + static int32x4_t ZEROVAL;
> int32x4_t input;
>
> acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
> 242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t
> **data,
>
> while (flows.started > 0) {
> /* Gather 4 bytes of input data for each stream. */
> - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> + input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> ZEROVAL, 0);
> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> --
> 2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-05 21:05 ` Honnappa Nagarahalli
@ 2019-06-05 21:36 ` Honnappa Nagarahalli
2019-06-05 22:38 ` Michael Santana Francisco
0 siblings, 1 reply; 16+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-05 21:36 UTC (permalink / raw)
To: Aaron Conole, thomas
Cc: Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, msantana,
bruce.richardson, Honnappa Nagarahalli, nd, nd
> >
> > Thomas Monjalon <thomas@monjalon.net> writes:
> >
> > > 05/06/2019 21:40, Aaron Conole:
> > >> Thomas Monjalon <thomas@monjalon.net> writes:
> > >>
> > >> > The compilation of the master branch is failing for aarch64:
> > >> > https://travis-ci.com/DPDK/dpdk
> > >> > The log is so much verbose that I am not able to understand what
> > >> > is really wrong.
> > >> > Please help to diagnose and fix, thanks.
> > >>
> > >> A discussion about this:
> > >>
> > >> http://mails.dpdk.org/archives/dev/2019-June/134012.html
> > >
> > > I see the error now.
> > > It is printing the full log after the error, so I missed the error
> > > at the top.
> > >
> > > I've read your comment about a possible error with the patch
> > > removing weak functions but neither me nor Bruce were able to reproduce
> it.
> > > What is the condition to see this compiler warning?
> >
> > It is only on ARM, and only when the neon intrinsics are in use.
> I am not able to reproduce it from the tip of master.
>
> I am using:
> gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
>
> From the log on Travis, looks like the compiler is:
> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
>
> Is this the issue?
>
> Why are we seeing the error now?
I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, it works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
Are there any issues in upgrading to 7 or 8?
>
> >
> > The issue is the vector lane setting code looks like:
> >
> > lval = lane_set(scalar, rval, lane id)
> >
> > In this case, 'rval' is being used before it is ever set, but it
> > really could be just 0 for the first lane setting code. Thereafter,
> > we use the old value of input as the rval, but each time a different lane is set.
> >
> > It would be nice if there were an intrinsic that formatted correctly
> > from the start (something we could call like lval =
> lane_set_from_array(scalar_array)).
> > Then 'input' would never appear as an rval before it was set.
> >
> > I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
> > There are three 'fixes' I know exist - one is to squelch the warning
> > (but I don't like it because it could hide future code that introduces
> > this), one is to create a static and use assignment, one is to replace
> > the first call and pass in a 0'd lane for the first one.
> >
> > Actually, I think I have a patch that could work to not introduce an
> > assignment, but squelch the warning. Something like the following (not
> tested).
> >
> > ---
> >
> > diff --git a/lib/librte_acl/acl_run_neon.h
> > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..37c984fef 100644
> > --- a/lib/librte_acl/acl_run_neon.h
> > +++ b/lib/librte_acl/acl_run_neon.h
> > @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> > uint8_t **data,
> > uint64_t index_array[8];
> > struct completion cmplt[8];
> > struct parms parms[8];
> > + static int32x4_t ZEROVAL;
> > int32x4_t input0, input1;
> >
> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
> > 181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> > uint8_t **data,
> >
> > while (flows.started > 0) {
> > /* Gather 4 bytes of input data for each stream. */
> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
> > 0);
> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
> > 0);
> > + input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> > ZEROVAL, 0);
> > + input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> > ZEROVAL, 0);
> >
> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0,
> 1);
> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1,
> 1); @@
> > -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> > uint8_t **data,
> > uint64_t index_array[4];
> > struct completion cmplt[4];
> > struct parms parms[4];
> > + static int32x4_t ZEROVAL;
> > int32x4_t input;
> >
> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
> > 242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> > uint8_t **data,
> >
> > while (flows.started > 0) {
> > /* Gather 4 bytes of input data for each stream. */
> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> > + input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> > ZEROVAL, 0);
> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> > --
> > 2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-05 21:36 ` Honnappa Nagarahalli
@ 2019-06-05 22:38 ` Michael Santana Francisco
2019-06-06 4:42 ` Honnappa Nagarahalli
2019-06-06 14:57 ` Jerin Jacob Kollanukkaran
0 siblings, 2 replies; 16+ messages in thread
From: Michael Santana Francisco @ 2019-06-05 22:38 UTC (permalink / raw)
To: Honnappa Nagarahalli, Aaron Conole, thomas
Cc: Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, bruce.richardson, nd
On 6/5/19 5:36 PM, Honnappa Nagarahalli wrote:
>>> Thomas Monjalon <thomas@monjalon.net> writes:
>>>
>>>> 05/06/2019 21:40, Aaron Conole:
>>>>> Thomas Monjalon <thomas@monjalon.net> writes:
>>>>>
>>>>>> The compilation of the master branch is failing for aarch64:
>>>>>> https://travis-ci.com/DPDK/dpdk
>>>>>> The log is so much verbose that I am not able to understand what
>>>>>> is really wrong.
>>>>>> Please help to diagnose and fix, thanks.
>>>>> A discussion about this:
>>>>>
>>>>> http://mails.dpdk.org/archives/dev/2019-June/134012.html
>>>> I see the error now.
>>>> It is printing the full log after the error, so I missed the error
>>>> at the top.
>>>>
>>>> I've read your comment about a possible error with the patch
>>>> removing weak functions but neither me nor Bruce were able to reproduce
>> it.
>>>> What is the condition to see this compiler warning?
>>> It is only on ARM, and only when the neon intrinsics are in use.
>> I am not able to reproduce it from the tip of master.
>>
>> I am using:
>> gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
>>
>> From the log on Travis, looks like the compiler is:
>> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
>>
>> Is this the issue?
>>
>> Why are we seeing the error now?
> I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, it works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
> Are there any issues in upgrading to 7 or 8?
I have tested it on my ubuntu 16.04 vm on commit
8cb511bb94ad92a76990f175cac76bb13d51daba (head of master seems to be
failing for other reasons on my vm).
I tested the following gcc versions:
gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
All tested versions failed on the exact same error shown in travis. I
don't know if the compiler is at fault here. Maybe Aaron's patch is a
viable option?
>
>>> The issue is the vector lane setting code looks like:
>>>
>>> lval = lane_set(scalar, rval, lane id)
>>>
>>> In this case, 'rval' is being used before it is ever set, but it
>>> really could be just 0 for the first lane setting code. Thereafter,
>>> we use the old value of input as the rval, but each time a different lane is set.
>>>
>>> It would be nice if there were an intrinsic that formatted correctly
>>> from the start (something we could call like lval =
>> lane_set_from_array(scalar_array)).
>>> Then 'input' would never appear as an rval before it was set.
>>>
>>> I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
>>> There are three 'fixes' I know exist - one is to squelch the warning
>>> (but I don't like it because it could hide future code that introduces
>>> this), one is to create a static and use assignment, one is to replace
>>> the first call and pass in a 0'd lane for the first one.
>>>
>>> Actually, I think I have a patch that could work to not introduce an
>>> assignment, but squelch the warning. Something like the following (not
>> tested).
>>> ---
>>>
>>> diff --git a/lib/librte_acl/acl_run_neon.h
>>> b/lib/librte_acl/acl_run_neon.h index 01b9766d8..37c984fef 100644
>>> --- a/lib/librte_acl/acl_run_neon.h
>>> +++ b/lib/librte_acl/acl_run_neon.h
>>> @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>>> uint8_t **data,
>>> uint64_t index_array[8];
>>> struct completion cmplt[8];
>>> struct parms parms[8];
>>> + static int32x4_t ZEROVAL;
>>> int32x4_t input0, input1;
>>>
>>> acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
>>> 181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>>> uint8_t **data,
>>>
>>> while (flows.started > 0) {
>>> /* Gather 4 bytes of input data for each stream. */
>>> - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
>>> 0);
>>> - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
>>> 0);
>>> + input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>>> ZEROVAL, 0);
>>> + input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
>>> ZEROVAL, 0);
>>>
>>> input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0,
>> 1);
>>> input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1,
>> 1); @@
>>> -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
>>> uint8_t **data,
>>> uint64_t index_array[4];
>>> struct completion cmplt[4];
>>> struct parms parms[4];
>>> + static int32x4_t ZEROVAL;
>>> int32x4_t input;
>>>
>>> acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
>>> 242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
>>> uint8_t **data,
>>>
>>> while (flows.started > 0) {
>>> /* Gather 4 bytes of input data for each stream. */
>>> - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
>>> + input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>>> ZEROVAL, 0);
>>> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
>>> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
>>> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
>>> --
>>> 2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-05 22:38 ` Michael Santana Francisco
@ 2019-06-06 4:42 ` Honnappa Nagarahalli
2019-06-06 14:50 ` Aaron Conole
2019-06-06 14:57 ` Jerin Jacob Kollanukkaran
1 sibling, 1 reply; 16+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-06 4:42 UTC (permalink / raw)
To: msantana, Aaron Conole, thomas
Cc: Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, bruce.richardson,
Honnappa Nagarahalli, nd, nd
From: Michael Santana Francisco <msantana@redhat.com>
Sent: Wednesday, June 5, 2019 5:39 PM
To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Aaron Conole <aconole@redhat.com>; thomas@monjalon.net
Cc: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>; Gavin Hu (Arm Technology China) <Gavin.Hu@arm.com>; Dharmik Thakkar <Dharmik.Thakkar@arm.com>; jerin.jacob@caviumnetworks.com; yskoh@mellanox.com; dev@dpdk.org; bruce.richardson@intel.com; nd <nd@arm.com>
Subject: Re: DPDK compilation on arm is failing in Travis
On 6/5/19 5:36 PM, Honnappa Nagarahalli wrote:
Thomas Monjalon <thomas@monjalon.net><mailto:thomas@monjalon.net> writes:
05/06/2019 21:40, Aaron Conole:
Thomas Monjalon <thomas@monjalon.net><mailto:thomas@monjalon.net> writes:
The compilation of the master branch is failing for aarch64:
https://travis-ci.com/DPDK/dpdk
The log is so much verbose that I am not able to understand what
is really wrong.
Please help to diagnose and fix, thanks.
A discussion about this:
http://mails.dpdk.org/archives/dev/2019-June/134012.html
I see the error now.
It is printing the full log after the error, so I missed the error
at the top.
I've read your comment about a possible error with the patch
removing weak functions but neither me nor Bruce were able to reproduce
it.
What is the condition to see this compiler warning?
It is only on ARM, and only when the neon intrinsics are in use.
I am not able to reproduce it from the tip of master.
I am using:
gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
From the log on Travis, looks like the compiler is:
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
Is this the issue?
Why are we seeing the error now?
I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, it works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
Are there any issues in upgrading to 7 or 8?
I have tested it on my ubuntu 16.04 vm on commit 8cb511bb94ad92a76990f175cac76bb13d51daba (head of master seems to be failing for other reasons on my vm).
I tested the following gcc versions:
gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
All tested versions failed on the exact same error shown in travis. I don't know if the compiler is at fault here. Maybe Aaron's patch is a viable option?
The issue is the vector lane setting code looks like:
lval = lane_set(scalar, rval, lane id)
In this case, 'rval' is being used before it is ever set, but it
really could be just 0 for the first lane setting code. Thereafter,
we use the old value of input as the rval, but each time a different lane is set.
It would be nice if there were an intrinsic that formatted correctly
from the start (something we could call like lval =
lane_set_from_array(scalar_array)).
[Honnappa] This exists already. ‘vdupq_n_s32’ can be used. Can you try the following?
honnag01@qc2400f-1:~/dpdk$ git diff
diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index 01b9766d8..b3196cd12 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
while (flows.started > 0) {
/* Gather 4 bytes of input data for each stream. */
- input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
- input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
+ input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
+ input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
while (flows.started > 0) {
/* Gather 4 bytes of input data for each stream. */
- input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
+ input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
Then 'input' would never appear as an rval before it was set.
I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
There are three 'fixes' I know exist - one is to squelch the warning
(but I don't like it because it could hide future code that introduces
this), one is to create a static and use assignment, one is to replace
the first call and pass in a 0'd lane for the first one.
Actually, I think I have a patch that could work to not introduce an
assignment, but squelch the warning. Something like the following (not
tested).
---
diff --git a/lib/librte_acl/acl_run_neon.h
b/lib/librte_acl/acl_run_neon.h index 01b9766d8..37c984fef 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
uint8_t **data,
uint64_t index_array[8];
struct completion cmplt[8];
struct parms parms[8];
+ static int32x4_t ZEROVAL;
int32x4_t input0, input1;
acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
uint8_t **data,
while (flows.started > 0) {
/* Gather 4 bytes of input data for each stream. */
- input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
0);
- input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
0);
+ input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
ZEROVAL, 0);
+ input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
ZEROVAL, 0);
input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0,
1);
input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1,
1); @@
-227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
uint8_t **data,
uint64_t index_array[4];
struct completion cmplt[4];
struct parms parms[4];
+ static int32x4_t ZEROVAL;
int32x4_t input;
acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
uint8_t **data,
while (flows.started > 0) {
/* Gather 4 bytes of input data for each stream. */
- input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
+ input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
ZEROVAL, 0);
input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
--
2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-06 4:42 ` Honnappa Nagarahalli
@ 2019-06-06 14:50 ` Aaron Conole
2019-06-07 5:10 ` Honnappa Nagarahalli
0 siblings, 1 reply; 16+ messages in thread
From: Aaron Conole @ 2019-06-06 14:50 UTC (permalink / raw)
To: Honnappa Nagarahalli
Cc: msantana, thomas, Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, bruce.richardson, nd
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
>
>
>
>
> From: Michael Santana Francisco <msantana@redhat.com>
> Sent: Wednesday, June 5, 2019 5:39 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Aaron Conole
> <aconole@redhat.com>; thomas@monjalon.net
> Cc: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>; Gavin Hu (Arm Technology
> China) <Gavin.Hu@arm.com>; Dharmik Thakkar <Dharmik.Thakkar@arm.com>;
> jerin.jacob@caviumnetworks.com; yskoh@mellanox.com; dev@dpdk.org;
> bruce.richardson@intel.com; nd <nd@arm.com>
> Subject: Re: DPDK compilation on arm is failing in Travis
>
>
>
> On 6/5/19 5:36 PM, Honnappa Nagarahalli wrote:
>
>
>
> Thomas Monjalon <thomas@monjalon.net> writes:
>
>
>
> 05/06/2019 21:40, Aaron Conole:
>
> Thomas Monjalon <thomas@monjalon.net> writes:
>
>
>
> The compilation of the master branch is failing for aarch64:
>
> https://travis-ci.com/DPDK/dpdk
>
> The log is so much verbose that I am not able to understand what
>
> is really wrong.
>
> Please help to diagnose and fix, thanks.
>
>
>
> A discussion about this:
>
>
>
> http://mails.dpdk.org/archives/dev/2019-June/134012.html
>
>
>
> I see the error now.
>
> It is printing the full log after the error, so I missed the error
>
> at the top.
>
>
>
> I've read your comment about a possible error with the patch
>
> removing weak functions but neither me nor Bruce were able to reproduce
>
> it.
>
> What is the condition to see this compiler warning?
>
>
>
> It is only on ARM, and only when the neon intrinsics are in use.
>
> I am not able to reproduce it from the tip of master.
>
>
>
> I am using:
>
> gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
>
>
>
> From the log on Travis, looks like the compiler is:
>
> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
>
>
>
> Is this the issue?
>
>
>
> Why are we seeing the error now?
>
> I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, it works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
>
> Are there any issues in upgrading to 7 or 8?
>
> I have tested it on my ubuntu 16.04 vm on commit 8cb511bb94ad92a76990f175cac76bb13d51daba
> (head of master seems to be failing for other reasons on my vm).
> I tested the following gcc versions:
>
> gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
> gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
> gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
>
> All tested versions failed on the exact same error shown in travis. I don't know if the compiler is at
> fault here. Maybe Aaron's patch is a viable option?
>
> The issue is the vector lane setting code looks like:
>
>
>
> lval = lane_set(scalar, rval, lane id)
>
>
>
> In this case, 'rval' is being used before it is ever set, but it
>
> really could be just 0 for the first lane setting code. Thereafter,
>
> we use the old value of input as the rval, but each time a different lane is set.
>
>
>
> It would be nice if there were an intrinsic that formatted correctly
>
> from the start (something we could call like lval =
>
> lane_set_from_array(scalar_array)).
>
> [Honnappa] This exists already. ‘vdupq_n_s32’ can be used. Can you try the following?
Well, it isn't exactly that. You are setting all lanes from a scalar.
I'd rather be able to say:
input0 = vdupq_nn_s32(&parms[0]);
input1 = vdupq_nn_s32(&parms[4]);
Something like that, which lets us delete all the rest of the lane-set
code. But it seems it doesn't exist.
Regardless, I think either patch should work (either using the 'all
lanes' setting you have or the static variable). I have no preference
on it - it's up to you (or someone else) to say which is preferred. I
guess your version could be preferable since there's no static to need
to "explain" :)
> honnag01@qc2400f-1:~/dpdk$ git diff
>
> diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
>
> index 01b9766d8..b3196cd12 100644
>
> --- a/lib/librte_acl/acl_run_neon.h
>
> +++ b/lib/librte_acl/acl_run_neon.h
>
> @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>
>
>
> while (flows.started > 0) {
>
> /* Gather 4 bytes of input data for each stream. */
>
> - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
>
> - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
>
> + input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>
> + input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
>
>
>
> input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
>
> input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
>
> @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
>
>
>
> while (flows.started > 0) {
>
> /* Gather 4 bytes of input data for each stream. */
>
> - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
>
> + input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>
> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
>
> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
>
> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
>
>
>
> Then 'input' would never appear as an rval before it was set.
>
>
>
> I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
>
> There are three 'fixes' I know exist - one is to squelch the warning
>
> (but I don't like it because it could hide future code that introduces
>
> this), one is to create a static and use assignment, one is to replace
>
> the first call and pass in a 0'd lane for the first one.
>
>
>
> Actually, I think I have a patch that could work to not introduce an
>
> assignment, but squelch the warning. Something like the following (not
>
> tested).
>
>
>
> ---
>
>
>
> diff --git a/lib/librte_acl/acl_run_neon.h
>
> b/lib/librte_acl/acl_run_neon.h index 01b9766d8..37c984fef 100644
>
> --- a/lib/librte_acl/acl_run_neon.h
>
> +++ b/lib/librte_acl/acl_run_neon.h
>
> @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>
> uint8_t **data,
>
> uint64_t index_array[8];
>
> struct completion cmplt[8];
>
> struct parms parms[8];
>
> + static int32x4_t ZEROVAL;
>
> int32x4_t input0, input1;
>
>
>
> acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
>
> 181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>
> uint8_t **data,
>
>
>
> while (flows.started > 0) {
>
> /* Gather 4 bytes of input data for each stream. */
>
> - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
>
> 0);
>
> - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
>
> 0);
>
> + input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>
> ZEROVAL, 0);
>
> + input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
>
> ZEROVAL, 0);
>
>
>
> input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0,
>
> 1);
>
> input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1,
>
> 1); @@
>
> -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
>
> uint8_t **data,
>
> uint64_t index_array[4];
>
> struct completion cmplt[4];
>
> struct parms parms[4];
>
> + static int32x4_t ZEROVAL;
>
> int32x4_t input;
>
>
>
> acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
>
> 242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
>
> uint8_t **data,
>
>
>
> while (flows.started > 0) {
>
> /* Gather 4 bytes of input data for each stream. */
>
> - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
>
> + input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>
> ZEROVAL, 0);
>
> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
>
> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
>
> input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
>
> --
>
> 2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-05 22:38 ` Michael Santana Francisco
2019-06-06 4:42 ` Honnappa Nagarahalli
@ 2019-06-06 14:57 ` Jerin Jacob Kollanukkaran
2019-06-06 17:06 ` Michael Santana Francisco
1 sibling, 1 reply; 16+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-06 14:57 UTC (permalink / raw)
To: msantana, Honnappa Nagarahalli, Aaron Conole, thomas
Cc: Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, bruce.richardson, nd
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Michael Santana
> Francisco
> Sent: Thursday, June 6, 2019 4:09 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Aaron Conole
> <aconole@redhat.com>; thomas@monjalon.net
> Cc: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>; Gavin
> Hu (Arm Technology China) <Gavin.Hu@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>; jerin.jacob@caviumnetworks.com;
> yskoh@mellanox.com; dev@dpdk.org; bruce.richardson@intel.com; nd
> <nd@arm.com>
> Subject: Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
>
> On 6/5/19 5:36 PM, Honnappa Nagarahalli wrote:
> >>> Thomas Monjalon <thomas@monjalon.net> writes:
> >>>
> >>>> 05/06/2019 21:40, Aaron Conole:
> >>>>> Thomas Monjalon <thomas@monjalon.net> writes:
> >>>>>
> >>>>>> The compilation of the master branch is failing for aarch64:
> >>>>>> https://travis-ci.com/DPDK/dpdk
> >>>>>> The log is so much verbose that I am not able to understand what
> >>>>>> is really wrong.
> >>>>>> Please help to diagnose and fix, thanks.
> >>>>> A discussion about this:
> >>>>>
> >>>>> http://mails.dpdk.org/archives/dev/2019-June/134012.html
> >>>> I see the error now.
> >>>> It is printing the full log after the error, so I missed the error
> >>>> at the top.
> >>>>
> >>>> I've read your comment about a possible error with the patch
> >>>> removing weak functions but neither me nor Bruce were able to
> >>>> reproduce
> >> it.
> >>>> What is the condition to see this compiler warning?
> >>> It is only on ARM, and only when the neon intrinsics are in use.
> >> I am not able to reproduce it from the tip of master.
> >>
> >> I am using:
> >> gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
> >>
> >> From the log on Travis, looks like the compiler is:
> >> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
> >>
> >> Is this the issue?
> >>
> >> Why are we seeing the error now?
> > I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, it
> works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
> > Are there any issues in upgrading to 7 or 8?
> I have tested it on my ubuntu 16.04 vm on commit
> 8cb511bb94ad92a76990f175cac76bb13d51daba (head of master seems to be
> failing for other reasons on my vm).
> I tested the following gcc versions:
>
> gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
> gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
> gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
>
Though I am not able to reproduce the issue.
Could you the following patch fixing the issue?
http://patches.dpdk.org/patch/54501/
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-06 14:57 ` Jerin Jacob Kollanukkaran
@ 2019-06-06 17:06 ` Michael Santana Francisco
0 siblings, 0 replies; 16+ messages in thread
From: Michael Santana Francisco @ 2019-06-06 17:06 UTC (permalink / raw)
To: Jerin Jacob Kollanukkaran, Honnappa Nagarahalli, Aaron Conole, thomas
Cc: Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, bruce.richardson, nd
On 6/6/19 10:57 AM, Jerin Jacob Kollanukkaran wrote:
>> -----Original Message-----
>> From: dev <dev-bounces@dpdk.org> On Behalf Of Michael Santana
>> Francisco
>> Sent: Thursday, June 6, 2019 4:09 AM
>> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Aaron Conole
>> <aconole@redhat.com>; thomas@monjalon.net
>> Cc: Ruifeng Wang (Arm Technology China) <Ruifeng.Wang@arm.com>; Gavin
>> Hu (Arm Technology China) <Gavin.Hu@arm.com>; Dharmik Thakkar
>> <Dharmik.Thakkar@arm.com>; jerin.jacob@caviumnetworks.com;
>> yskoh@mellanox.com; dev@dpdk.org; bruce.richardson@intel.com; nd
>> <nd@arm.com>
>> Subject: Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
>>
>> On 6/5/19 5:36 PM, Honnappa Nagarahalli wrote:
>>>>> Thomas Monjalon <thomas@monjalon.net> writes:
>>>>>
>>>>>> 05/06/2019 21:40, Aaron Conole:
>>>>>>> Thomas Monjalon <thomas@monjalon.net> writes:
>>>>>>>
>>>>>>>> The compilation of the master branch is failing for aarch64:
>>>>>>>> https://travis-ci.com/DPDK/dpdk
>>>>>>>> The log is so much verbose that I am not able to understand what
>>>>>>>> is really wrong.
>>>>>>>> Please help to diagnose and fix, thanks.
>>>>>>> A discussion about this:
>>>>>>>
>>>>>>> http://mails.dpdk.org/archives/dev/2019-June/134012.html
>>>>>> I see the error now.
>>>>>> It is printing the full log after the error, so I missed the error
>>>>>> at the top.
>>>>>>
>>>>>> I've read your comment about a possible error with the patch
>>>>>> removing weak functions but neither me nor Bruce were able to
>>>>>> reproduce
>>>> it.
>>>>>> What is the condition to see this compiler warning?
>>>>> It is only on ARM, and only when the neon intrinsics are in use.
>>>> I am not able to reproduce it from the tip of master.
>>>>
>>>> I am using:
>>>> gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
>>>>
>>>> From the log on Travis, looks like the compiler is:
>>>> gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
>>>>
>>>> Is this the issue?
>>>>
>>>> Why are we seeing the error now?
>>> I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, it
>> works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
>>> Are there any issues in upgrading to 7 or 8?
>> I have tested it on my ubuntu 16.04 vm on commit
>> 8cb511bb94ad92a76990f175cac76bb13d51daba (head of master seems to be
>> failing for other reasons on my vm).
>> I tested the following gcc versions:
>>
>> gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
>> gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
>> gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
>>
> Though I am not able to reproduce the issue.
> Could you the following patch fixing the issue?
>
> http://patches.dpdk.org/patch/54501/
Yes, I have tested this patch on travis and it fixes the issue. see:
https://travis-ci.com/Maickii/dpdk-2/builds/114612090
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-06 14:50 ` Aaron Conole
@ 2019-06-07 5:10 ` Honnappa Nagarahalli
2019-06-07 13:24 ` Aaron Conole
0 siblings, 1 reply; 16+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-07 5:10 UTC (permalink / raw)
To: Aaron Conole
Cc: msantana, thomas, Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, bruce.richardson, nd,
nd
> >
> > Thomas Monjalon <thomas@monjalon.net> writes:
> >
> >
> >
> > The compilation of the master branch is failing for aarch64:
> >
> > https://travis-ci.com/DPDK/dpdk
> >
> > The log is so much verbose that I am not able to understand what
> >
> > is really wrong.
> >
> > Please help to diagnose and fix, thanks.
> >
> >
> >
> > A discussion about this:
> >
> >
> >
> > http://mails.dpdk.org/archives/dev/2019-June/134012.html
> >
> >
> >
> > I see the error now.
> >
> > It is printing the full log after the error, so I missed the error
> >
> > at the top.
> >
> >
> >
> > I've read your comment about a possible error with the patch
> >
> > removing weak functions but neither me nor Bruce were able to
> > reproduce
> >
> > it.
> >
> > What is the condition to see this compiler warning?
> >
> >
> >
> > It is only on ARM, and only when the neon intrinsics are in use.
> >
> > I am not able to reproduce it from the tip of master.
> >
> >
> >
> > I am using:
> >
> > gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
> >
> >
> >
> > From the log on Travis, looks like the compiler is:
> >
> > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
> >
> >
> >
> > Is this the issue?
> >
> >
> >
> > Why are we seeing the error now?
> >
> > I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, it
> works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
> >
> > Are there any issues in upgrading to 7 or 8?
> >
> > I have tested it on my ubuntu 16.04 vm on commit
> > 8cb511bb94ad92a76990f175cac76bb13d51daba
> > (head of master seems to be failing for other reasons on my vm).
> > I tested the following gcc versions:
> >
> > gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
> > gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
> > gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
> >
> > All tested versions failed on the exact same error shown in travis. I
> > don't know if the compiler is at fault here. Maybe Aaron's patch is a viable
> option?
> >
> > The issue is the vector lane setting code looks like:
> >
> >
> >
> > lval = lane_set(scalar, rval, lane id)
> >
> >
> >
> > In this case, 'rval' is being used before it is ever set, but it
> >
> > really could be just 0 for the first lane setting code. Thereafter,
> >
> > we use the old value of input as the rval, but each time a different lane is
> set.
> >
> >
> >
> > It would be nice if there were an intrinsic that formatted correctly
> >
> > from the start (something we could call like lval =
> >
> > lane_set_from_array(scalar_array)).
> >
> > [Honnappa] This exists already. ‘vdupq_n_s32’ can be used. Can you try the
> following?
>
> Well, it isn't exactly that. You are setting all lanes from a scalar.
Yes, you are correct, it sets all the lanes. I am not sure on how this will affect the performance.
> I'd rather be able to say:
>
> input0 = vdupq_nn_s32(&parms[0]);
> input1 = vdupq_nn_s32(&parms[4]);
>
> Something like that, which lets us delete all the rest of the lane-set code. But
> it seems it doesn't exist.
>
> Regardless, I think either patch should work (either using the 'all lanes'
> setting you have or the static variable). I have no preference on it - it's up to
> you (or someone else) to say which is preferred. I guess your version could be
> preferable since there's no static to need to "explain" :)
I think we can go ahead with your patch with using a temporary vector for the first set, as it does not introduce any change to the code and hence performance should not get affected.
But, I do not understand why you have added 'static'. Also, changing 'ZEROVAL' to 'tmp' or something similar will be better.
>
> > honnag01@qc2400f-1:~/dpdk$ git diff
> >
> > diff --git a/lib/librte_acl/acl_run_neon.h
> > b/lib/librte_acl/acl_run_neon.h
> >
> > index 01b9766d8..b3196cd12 100644
> >
> > --- a/lib/librte_acl/acl_run_neon.h
> >
> > +++ b/lib/librte_acl/acl_run_neon.h
> >
> > @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> > uint8_t **data,
> >
> >
> >
> > while (flows.started > 0) {
> >
> > /* Gather 4 bytes of input data for each stream. */
> >
> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
> >
> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
> >
> > + input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> >
> > + input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
> >
> >
> >
> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> > input0, 1);
> >
> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
> > input1, 1);
> >
> > @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> > uint8_t **data,
> >
> >
> >
> > while (flows.started > 0) {
> >
> > /* Gather 4 bytes of input data for each stream. */
> >
> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> >
> > + input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> >
> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> > input, 1);
> >
> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> > input, 2);
> >
> > input =
> > vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> >
> >
> >
> > Then 'input' would never appear as an rval before it was set.
> >
> >
> >
> > I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
> >
> > There are three 'fixes' I know exist - one is to squelch the warning
> >
> > (but I don't like it because it could hide future code that introduces
> >
> > this), one is to create a static and use assignment, one is to replace
> >
> > the first call and pass in a 0'd lane for the first one.
> >
> >
> >
> > Actually, I think I have a patch that could work to not introduce an
> >
> > assignment, but squelch the warning. Something like the following
> > (not
> >
> > tested).
> >
> >
> >
> > ---
> >
> >
> >
> > diff --git a/lib/librte_acl/acl_run_neon.h
> >
> > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..37c984fef 100644
> >
> > --- a/lib/librte_acl/acl_run_neon.h
> >
> > +++ b/lib/librte_acl/acl_run_neon.h
> >
> > @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> >
> > uint8_t **data,
> >
> > uint64_t index_array[8];
> >
> > struct completion cmplt[8];
> >
> > struct parms parms[8];
> >
> > + static int32x4_t ZEROVAL;
> >
> > int32x4_t input0, input1;
> >
> >
> >
> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
> >
> > 181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> >
> > uint8_t **data,
> >
> >
> >
> > while (flows.started > 0) {
> >
> > /* Gather 4 bytes of input data for each stream. */
> >
> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
> >
> > 0);
> >
> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
> >
> > 0);
> >
> > + input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> >
> > ZEROVAL, 0);
> >
> > + input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> >
> > ZEROVAL, 0);
> >
> >
> >
> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0,
> >
> > 1);
> >
> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
> > input1,
> >
> > 1); @@
> >
> > -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> >
> > uint8_t **data,
> >
> > uint64_t index_array[4];
> >
> > struct completion cmplt[4];
> >
> > struct parms parms[4];
> >
> > + static int32x4_t ZEROVAL;
> >
> > int32x4_t input;
> >
> >
> >
> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
> >
> > 242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> >
> > uint8_t **data,
> >
> >
> >
> > while (flows.started > 0) {
> >
> > /* Gather 4 bytes of input data for each stream. */
> >
> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> >
> > + input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> >
> > ZEROVAL, 0);
> >
> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input,
> > 1);
> >
> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input,
> > 2);
> >
> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input,
> > 3);
> >
> > --
> >
> > 2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-07 5:10 ` Honnappa Nagarahalli
@ 2019-06-07 13:24 ` Aaron Conole
2019-06-07 13:53 ` Honnappa Nagarahalli
0 siblings, 1 reply; 16+ messages in thread
From: Aaron Conole @ 2019-06-07 13:24 UTC (permalink / raw)
To: Honnappa Nagarahalli
Cc: msantana, thomas, Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, bruce.richardson, nd
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> writes:
>> >
>> > Thomas Monjalon <thomas@monjalon.net> writes:
>> >
>> >
>> >
>> > The compilation of the master branch is failing for aarch64:
>> >
>> > https://travis-ci.com/DPDK/dpdk
>> >
>> > The log is so much verbose that I am not able to understand what
>> >
>> > is really wrong.
>> >
>> > Please help to diagnose and fix, thanks.
>> >
>> >
>> >
>> > A discussion about this:
>> >
>> >
>> >
>> > http://mails.dpdk.org/archives/dev/2019-June/134012.html
>> >
>> >
>> >
>> > I see the error now.
>> >
>> > It is printing the full log after the error, so I missed the error
>> >
>> > at the top.
>> >
>> >
>> >
>> > I've read your comment about a possible error with the patch
>> >
>> > removing weak functions but neither me nor Bruce were able to
>> > reproduce
>> >
>> > it.
>> >
>> > What is the condition to see this compiler warning?
>> >
>> >
>> >
>> > It is only on ARM, and only when the neon intrinsics are in use.
>> >
>> > I am not able to reproduce it from the tip of master.
>> >
>> >
>> >
>> > I am using:
>> >
>> > gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
>> >
>> >
>> >
>> > From the log on Travis, looks like the compiler is:
>> >
>> > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
>> >
>> >
>> >
>> > Is this the issue?
>> >
>> >
>> >
>> > Why are we seeing the error now?
>> >
>> > I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010, it
>> works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
>> >
>> > Are there any issues in upgrading to 7 or 8?
>> >
>> > I have tested it on my ubuntu 16.04 vm on commit
>> > 8cb511bb94ad92a76990f175cac76bb13d51daba
>> > (head of master seems to be failing for other reasons on my vm).
>> > I tested the following gcc versions:
>> >
>> > gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
>> > gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
>> > gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
>> >
>> > All tested versions failed on the exact same error shown in travis. I
>> > don't know if the compiler is at fault here. Maybe Aaron's patch is a viable
>> option?
>> >
>> > The issue is the vector lane setting code looks like:
>> >
>> >
>> >
>> > lval = lane_set(scalar, rval, lane id)
>> >
>> >
>> >
>> > In this case, 'rval' is being used before it is ever set, but it
>> >
>> > really could be just 0 for the first lane setting code. Thereafter,
>> >
>> > we use the old value of input as the rval, but each time a different lane is
>> set.
>> >
>> >
>> >
>> > It would be nice if there were an intrinsic that formatted correctly
>> >
>> > from the start (something we could call like lval =
>> >
>> > lane_set_from_array(scalar_array)).
>> >
>> > [Honnappa] This exists already. ‘vdupq_n_s32’ can be used. Can you try the
>> following?
>>
>> Well, it isn't exactly that. You are setting all lanes from a scalar.
> Yes, you are correct, it sets all the lanes. I am not sure on how this
> will affect the performance.
>
>> I'd rather be able to say:
>>
>> input0 = vdupq_nn_s32(&parms[0]);
>> input1 = vdupq_nn_s32(&parms[4]);
>>
>> Something like that, which lets us delete all the rest of the lane-set code. But
>> it seems it doesn't exist.
>>
>> Regardless, I think either patch should work (either using the 'all lanes'
>> setting you have or the static variable). I have no preference on it - it's up to
>> you (or someone else) to say which is preferred. I guess your version could be
>> preferable since there's no static to need to "explain" :)
> I think we can go ahead with your patch with using a temporary vector
> for the first set, as it does not introduce any change to the code and
> hence performance should not get affected.
>
> But, I do not understand why you have added 'static'. Also, changing
> 'ZEROVAL' to 'tmp' or something similar will be better.
The static is there to guarantee '0' value. Otherwise we create a temp
variable that has to be initialized explicitly.
>>
>> > honnag01@qc2400f-1:~/dpdk$ git diff
>> >
>> > diff --git a/lib/librte_acl/acl_run_neon.h
>> > b/lib/librte_acl/acl_run_neon.h
>> >
>> > index 01b9766d8..b3196cd12 100644
>> >
>> > --- a/lib/librte_acl/acl_run_neon.h
>> >
>> > +++ b/lib/librte_acl/acl_run_neon.h
>> >
>> > @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>> > uint8_t **data,
>> >
>> >
>> >
>> > while (flows.started > 0) {
>> >
>> > /* Gather 4 bytes of input data for each stream. */
>> >
>> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
>> >
>> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
>> >
>> > + input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>> >
>> > + input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
>> >
>> >
>> >
>> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
>> > input0, 1);
>> >
>> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
>> > input1, 1);
>> >
>> > @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
>> > uint8_t **data,
>> >
>> >
>> >
>> > while (flows.started > 0) {
>> >
>> > /* Gather 4 bytes of input data for each stream. */
>> >
>> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
>> >
>> > + input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>> >
>> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
>> > input, 1);
>> >
>> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
>> > input, 2);
>> >
>> > input =
>> > vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
>> >
>> >
>> >
>> > Then 'input' would never appear as an rval before it was set.
>> >
>> >
>> >
>> > I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
>> >
>> > There are three 'fixes' I know exist - one is to squelch the warning
>> >
>> > (but I don't like it because it could hide future code that introduces
>> >
>> > this), one is to create a static and use assignment, one is to replace
>> >
>> > the first call and pass in a 0'd lane for the first one.
>> >
>> >
>> >
>> > Actually, I think I have a patch that could work to not introduce an
>> >
>> > assignment, but squelch the warning. Something like the following
>> > (not
>> >
>> > tested).
>> >
>> >
>> >
>> > ---
>> >
>> >
>> >
>> > diff --git a/lib/librte_acl/acl_run_neon.h
>> >
>> > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..37c984fef 100644
>> >
>> > --- a/lib/librte_acl/acl_run_neon.h
>> >
>> > +++ b/lib/librte_acl/acl_run_neon.h
>> >
>> > @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>> >
>> > uint8_t **data,
>> >
>> > uint64_t index_array[8];
>> >
>> > struct completion cmplt[8];
>> >
>> > struct parms parms[8];
>> >
>> > + static int32x4_t ZEROVAL;
>> >
>> > int32x4_t input0, input1;
>> >
>> >
>> >
>> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
>> >
>> > 181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>> >
>> > uint8_t **data,
>> >
>> >
>> >
>> > while (flows.started > 0) {
>> >
>> > /* Gather 4 bytes of input data for each stream. */
>> >
>> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
>> >
>> > 0);
>> >
>> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
>> >
>> > 0);
>> >
>> > + input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>> >
>> > ZEROVAL, 0);
>> >
>> > + input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
>> >
>> > ZEROVAL, 0);
>> >
>> >
>> >
>> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0,
>> >
>> > 1);
>> >
>> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
>> > input1,
>> >
>> > 1); @@
>> >
>> > -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
>> >
>> > uint8_t **data,
>> >
>> > uint64_t index_array[4];
>> >
>> > struct completion cmplt[4];
>> >
>> > struct parms parms[4];
>> >
>> > + static int32x4_t ZEROVAL;
>> >
>> > int32x4_t input;
>> >
>> >
>> >
>> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
>> >
>> > 242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
>> >
>> > uint8_t **data,
>> >
>> >
>> >
>> > while (flows.started > 0) {
>> >
>> > /* Gather 4 bytes of input data for each stream. */
>> >
>> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
>> >
>> > + input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>> >
>> > ZEROVAL, 0);
>> >
>> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input,
>> > 1);
>> >
>> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input,
>> > 2);
>> >
>> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input,
>> > 3);
>> >
>> > --
>> >
>> > 2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-07 13:24 ` Aaron Conole
@ 2019-06-07 13:53 ` Honnappa Nagarahalli
2019-06-08 8:38 ` Jerin Jacob Kollanukkaran
0 siblings, 1 reply; 16+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-07 13:53 UTC (permalink / raw)
To: Aaron Conole
Cc: msantana, thomas, Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, bruce.richardson,
Honnappa Nagarahalli, nd, nd
> >> >
> >> > Thomas Monjalon <thomas@monjalon.net> writes:
> >> >
> >> >
> >> >
> >> > The compilation of the master branch is failing for aarch64:
> >> >
> >> > https://travis-ci.com/DPDK/dpdk
> >> >
> >> > The log is so much verbose that I am not able to understand what
> >> >
> >> > is really wrong.
> >> >
> >> > Please help to diagnose and fix, thanks.
> >> >
> >> >
> >> >
> >> > A discussion about this:
> >> >
> >> >
> >> >
> >> > http://mails.dpdk.org/archives/dev/2019-June/134012.html
> >> >
> >> >
> >> >
> >> > I see the error now.
> >> >
> >> > It is printing the full log after the error, so I missed the error
> >> >
> >> > at the top.
> >> >
> >> >
> >> >
> >> > I've read your comment about a possible error with the patch
> >> >
> >> > removing weak functions but neither me nor Bruce were able to
> >> > reproduce
> >> >
> >> > it.
> >> >
> >> > What is the condition to see this compiler warning?
> >> >
> >> >
> >> >
> >> > It is only on ARM, and only when the neon intrinsics are in use.
> >> >
> >> > I am not able to reproduce it from the tip of master.
> >> >
> >> >
> >> >
> >> > I am using:
> >> >
> >> > gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
> >> >
> >> >
> >> >
> >> > From the log on Travis, looks like the compiler is:
> >> >
> >> > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
> >> >
> >> >
> >> >
> >> > Is this the issue?
> >> >
> >> >
> >> >
> >> > Why are we seeing the error now?
> >> >
> >> > I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0 20171010,
> >> > it
> >> works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
> >> >
> >> > Are there any issues in upgrading to 7 or 8?
> >> >
> >> > I have tested it on my ubuntu 16.04 vm on commit
> >> > 8cb511bb94ad92a76990f175cac76bb13d51daba
> >> > (head of master seems to be failing for other reasons on my vm).
> >> > I tested the following gcc versions:
> >> >
> >> > gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
> >> > gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
> >> > gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
> >> >
> >> > All tested versions failed on the exact same error shown in travis.
> >> > I don't know if the compiler is at fault here. Maybe Aaron's patch
> >> > is a viable
> >> option?
> >> >
> >> > The issue is the vector lane setting code looks like:
> >> >
> >> >
> >> >
> >> > lval = lane_set(scalar, rval, lane id)
> >> >
> >> >
> >> >
> >> > In this case, 'rval' is being used before it is ever set, but it
> >> >
> >> > really could be just 0 for the first lane setting code.
> >> > Thereafter,
> >> >
> >> > we use the old value of input as the rval, but each time a
> >> > different lane is
> >> set.
> >> >
> >> >
> >> >
> >> > It would be nice if there were an intrinsic that formatted
> >> > correctly
> >> >
> >> > from the start (something we could call like lval =
> >> >
> >> > lane_set_from_array(scalar_array)).
> >> >
> >> > [Honnappa] This exists already. ‘vdupq_n_s32’ can be used. Can you
> >> > try the
> >> following?
> >>
> >> Well, it isn't exactly that. You are setting all lanes from a scalar.
> > Yes, you are correct, it sets all the lanes. I am not sure on how this
> > will affect the performance.
> >
> >> I'd rather be able to say:
> >>
> >> input0 = vdupq_nn_s32(&parms[0]);
> >> input1 = vdupq_nn_s32(&parms[4]);
> >>
> >> Something like that, which lets us delete all the rest of the
> >> lane-set code. But it seems it doesn't exist.
> >>
> >> Regardless, I think either patch should work (either using the 'all lanes'
> >> setting you have or the static variable). I have no preference on it
> >> - it's up to you (or someone else) to say which is preferred. I
> >> guess your version could be preferable since there's no static to
> >> need to "explain" :)
> > I think we can go ahead with your patch with using a temporary vector
> > for the first set, as it does not introduce any change to the code and
> > hence performance should not get affected.
> >
> > But, I do not understand why you have added 'static'. Also, changing
> > 'ZEROVAL' to 'tmp' or something similar will be better.
>
> The static is there to guarantee '0' value. Otherwise we create a temp
> variable that has to be initialized explicitly.
Ok, I am fine with this. I guess this is the explanation you wanted to avoid 😊.
>
> >>
> >> > honnag01@qc2400f-1:~/dpdk$ git diff
> >> >
> >> > diff --git a/lib/librte_acl/acl_run_neon.h
> >> > b/lib/librte_acl/acl_run_neon.h
> >> >
> >> > index 01b9766d8..b3196cd12 100644
> >> >
> >> > --- a/lib/librte_acl/acl_run_neon.h
> >> >
> >> > +++ b/lib/librte_acl/acl_run_neon.h
> >> >
> >> > @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> >> > const uint8_t **data,
> >> >
> >> >
> >> >
> >> > while (flows.started > 0) {
> >> >
> >> > /* Gather 4 bytes of input data for each stream. */
> >> >
> >> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
> 0);
> >> >
> >> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
> 0);
> >> >
> >> > + input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> >> >
> >> > + input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
> >> >
> >> >
> >> >
> >> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> >> > input0, 1);
> >> >
> >> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
> >> > input1, 1);
> >> >
> >> > @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx,
> >> > const uint8_t **data,
> >> >
> >> >
> >> >
> >> > while (flows.started > 0) {
> >> >
> >> > /* Gather 4 bytes of input data for each stream. */
> >> >
> >> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> >> >
> >> > + input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> >> >
> >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> >> > input, 1);
> >> >
> >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> >> > input, 2);
> >> >
> >> > input =
> >> > vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> >> >
> >> >
> >> >
> >> > Then 'input' would never appear as an rval before it was set.
> >> >
> >> >
> >> >
> >> > I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
> >> >
> >> > There are three 'fixes' I know exist - one is to squelch the
> >> > warning
> >> >
> >> > (but I don't like it because it could hide future code that
> >> > introduces
> >> >
> >> > this), one is to create a static and use assignment, one is to
> >> > replace
> >> >
> >> > the first call and pass in a 0'd lane for the first one.
> >> >
> >> >
> >> >
> >> > Actually, I think I have a patch that could work to not introduce
> >> > an
> >> >
> >> > assignment, but squelch the warning. Something like the following
> >> > (not
> >> >
> >> > tested).
> >> >
> >> >
> >> >
> >> > ---
> >> >
> >> >
> >> >
> >> > diff --git a/lib/librte_acl/acl_run_neon.h
> >> >
> >> > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..37c984fef 100644
> >> >
> >> > --- a/lib/librte_acl/acl_run_neon.h
> >> >
> >> > +++ b/lib/librte_acl/acl_run_neon.h
> >> >
> >> > @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> >> > const
> >> >
> >> > uint8_t **data,
> >> >
> >> > uint64_t index_array[8];
> >> >
> >> > struct completion cmplt[8];
> >> >
> >> > struct parms parms[8];
> >> >
> >> > + static int32x4_t ZEROVAL;
> >> >
> >> > int32x4_t input0, input1;
> >> >
> >> >
> >> >
> >> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
> >> >
> >> > 181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> >> >
> >> > uint8_t **data,
> >> >
> >> >
> >> >
> >> > while (flows.started > 0) {
> >> >
> >> > /* Gather 4 bytes of input data for each stream. */
> >> >
> >> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
> >> >
> >> > 0);
> >> >
> >> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
> >> >
> >> > 0);
> >> >
> >> > + input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> >> >
> >> > ZEROVAL, 0);
> >> >
> >> > + input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> >> >
> >> > ZEROVAL, 0);
> >> >
> >> >
> >> >
> >> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> >> > input0,
> >> >
> >> > 1);
> >> >
> >> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
> >> > input1,
> >> >
> >> > 1); @@
> >> >
> >> > -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx,
> >> > const
> >> >
> >> > uint8_t **data,
> >> >
> >> > uint64_t index_array[4];
> >> >
> >> > struct completion cmplt[4];
> >> >
> >> > struct parms parms[4];
> >> >
> >> > + static int32x4_t ZEROVAL;
> >> >
> >> > int32x4_t input;
> >> >
> >> >
> >> >
> >> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@ -
> >> >
> >> > 242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> >> >
> >> > uint8_t **data,
> >> >
> >> >
> >> >
> >> > while (flows.started > 0) {
> >> >
> >> > /* Gather 4 bytes of input data for each stream. */
> >> >
> >> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> >> >
> >> > + input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> >> >
> >> > ZEROVAL, 0);
> >> >
> >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> >> > input, 1);
> >> >
> >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> >> > input, 2);
> >> >
> >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3),
> >> > input, 3);
> >> >
> >> > --
> >> >
> >> > 2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-07 13:53 ` Honnappa Nagarahalli
@ 2019-06-08 8:38 ` Jerin Jacob Kollanukkaran
2019-06-08 8:41 ` Jerin Jacob Kollanukkaran
0 siblings, 1 reply; 16+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-08 8:38 UTC (permalink / raw)
To: Honnappa Nagarahalli, Aaron Conole
Cc: msantana, thomas, Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, bruce.richardson, nd,
nd
> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Honnappa Nagarahalli
> Sent: Friday, June 7, 2019 7:24 PM
> To: Aaron Conole <aconole@redhat.com>
> Cc: msantana@redhat.com; thomas@monjalon.net; Ruifeng Wang (Arm
> Technology China) <Ruifeng.Wang@arm.com>; Gavin Hu (Arm Technology
> China) <Gavin.Hu@arm.com>; Dharmik Thakkar <Dharmik.Thakkar@arm.com>;
> jerin.jacob@caviumnetworks.com; yskoh@mellanox.com; dev@dpdk.org;
> bruce.richardson@intel.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
>
> > >> >
> > >> > Thomas Monjalon <thomas@monjalon.net> writes:
> > >> >
> > >> >
> > >> >
> > >> > The compilation of the master branch is failing for aarch64:
> > >> >
> > >> > https://travis-ci.com/DPDK/dpdk
> > >> >
> > >> > The log is so much verbose that I am not able to understand what
> > >> >
> > >> > is really wrong.
> > >> >
> > >> > Please help to diagnose and fix, thanks.
> > >> >
> > >> >
> > >> >
> > >> > A discussion about this:
> > >> >
> > >> >
> > >> >
> > >> > http://mails.dpdk.org/archives/dev/2019-June/134012.html
> > >> >
> > >> >
> > >> >
> > >> > I see the error now.
> > >> >
> > >> > It is printing the full log after the error, so I missed the
> > >> > error
> > >> >
> > >> > at the top.
> > >> >
> > >> >
> > >> >
> > >> > I've read your comment about a possible error with the patch
> > >> >
> > >> > removing weak functions but neither me nor Bruce were able to
> > >> > reproduce
> > >> >
> > >> > it.
> > >> >
> > >> > What is the condition to see this compiler warning?
> > >> >
> > >> >
> > >> >
> > >> > It is only on ARM, and only when the neon intrinsics are in use.
> > >> >
> > >> > I am not able to reproduce it from the tip of master.
> > >> >
> > >> >
> > >> >
> > >> > I am using:
> > >> >
> > >> > gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
> > >> >
> > >> >
> > >> >
> > >> > From the log on Travis, looks like the compiler is:
> > >> >
> > >> > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
> > >> >
> > >> >
> > >> >
> > >> > Is this the issue?
> > >> >
> > >> >
> > >> >
> > >> > Why are we seeing the error now?
> > >> >
> > >> > I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0
> > >> > 20171010, it
> > >> works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
> > >> >
> > >> > Are there any issues in upgrading to 7 or 8?
> > >> >
> > >> > I have tested it on my ubuntu 16.04 vm on commit
> > >> > 8cb511bb94ad92a76990f175cac76bb13d51daba
> > >> > (head of master seems to be failing for other reasons on my vm).
> > >> > I tested the following gcc versions:
> > >> >
> > >> > gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
> > >> > gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
> > >> > gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
> > >> >
> > >> > All tested versions failed on the exact same error shown in travis.
> > >> > I don't know if the compiler is at fault here. Maybe Aaron's
> > >> > patch is a viable
> > >> option?
> > >> >
> > >> > The issue is the vector lane setting code looks like:
> > >> >
> > >> >
> > >> >
> > >> > lval = lane_set(scalar, rval, lane id)
> > >> >
> > >> >
> > >> >
> > >> > In this case, 'rval' is being used before it is ever set, but it
> > >> >
> > >> > really could be just 0 for the first lane setting code.
> > >> > Thereafter,
> > >> >
> > >> > we use the old value of input as the rval, but each time a
> > >> > different lane is
> > >> set.
> > >> >
> > >> >
> > >> >
> > >> > It would be nice if there were an intrinsic that formatted
> > >> > correctly
> > >> >
> > >> > from the start (something we could call like lval =
> > >> >
> > >> > lane_set_from_array(scalar_array)).
> > >> >
> > >> > [Honnappa] This exists already. ‘vdupq_n_s32’ can be used. Can
> > >> > you try the
> > >> following?
> > >>
> > >> Well, it isn't exactly that. You are setting all lanes from a scalar.
> > > Yes, you are correct, it sets all the lanes. I am not sure on how
> > > this will affect the performance.
> > >
> > >> I'd rather be able to say:
> > >>
> > >> input0 = vdupq_nn_s32(&parms[0]);
> > >> input1 = vdupq_nn_s32(&parms[4]);
> > >>
> > >> Something like that, which lets us delete all the rest of the
> > >> lane-set code. But it seems it doesn't exist.
> > >>
> > >> Regardless, I think either patch should work (either using the 'all lanes'
> > >> setting you have or the static variable). I have no preference on
> > >> it
> > >> - it's up to you (or someone else) to say which is preferred. I
> > >> guess your version could be preferable since there's no static to
> > >> need to "explain" :)
> > > I think we can go ahead with your patch with using a temporary
> > > vector for the first set, as it does not introduce any change to the
> > > code and hence performance should not get affected.
> > >
> > > But, I do not understand why you have added 'static'. Also, changing
> > > 'ZEROVAL' to 'tmp' or something similar will be better.
> >
> > The static is there to guarantee '0' value. Otherwise we create a
> > temp variable that has to be initialized explicitly.
> Ok, I am fine with this. I guess this is the explanation you wanted to avoid 😊.
Don’t use BSS for fastpath code. Let it use stack for better cache usage and multi thread
case. I already sent a simple fix for this with temp variable. Please don’t complicate.
>
> >
> > >>
> > >> > honnag01@qc2400f-1:~/dpdk$ git diff
> > >> >
> > >> > diff --git a/lib/librte_acl/acl_run_neon.h
> > >> > b/lib/librte_acl/acl_run_neon.h
> > >> >
> > >> > index 01b9766d8..b3196cd12 100644
> > >> >
> > >> > --- a/lib/librte_acl/acl_run_neon.h
> > >> >
> > >> > +++ b/lib/librte_acl/acl_run_neon.h
> > >> >
> > >> > @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> > >> > const uint8_t **data,
> > >> >
> > >> >
> > >> >
> > >> > while (flows.started > 0) {
> > >> >
> > >> > /* Gather 4 bytes of input data for each stream.
> > >> > */
> > >> >
> > >> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
> > 0);
> > >> >
> > >> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
> > 0);
> > >> >
> > >> > + input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> > >> >
> > >> > + input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
> > >> >
> > >> >
> > >> >
> > >> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> > >> > 1), input0, 1);
> > >> >
> > >> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> > >> > 5), input1, 1);
> > >> >
> > >> > @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx,
> > >> > const uint8_t **data,
> > >> >
> > >> >
> > >> >
> > >> > while (flows.started > 0) {
> > >> >
> > >> > /* Gather 4 bytes of input data for each stream.
> > >> > */
> > >> >
> > >> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> > >> >
> > >> > + input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> > >> >
> > >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> > >> > input, 1);
> > >> >
> > >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> > >> > input, 2);
> > >> >
> > >> > input =
> > >> > vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> > >> >
> > >> >
> > >> >
> > >> > Then 'input' would never appear as an rval before it was set.
> > >> >
> > >> >
> > >> >
> > >> > I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
> > >> >
> > >> > There are three 'fixes' I know exist - one is to squelch the
> > >> > warning
> > >> >
> > >> > (but I don't like it because it could hide future code that
> > >> > introduces
> > >> >
> > >> > this), one is to create a static and use assignment, one is to
> > >> > replace
> > >> >
> > >> > the first call and pass in a 0'd lane for the first one.
> > >> >
> > >> >
> > >> >
> > >> > Actually, I think I have a patch that could work to not introduce
> > >> > an
> > >> >
> > >> > assignment, but squelch the warning. Something like the
> > >> > following (not
> > >> >
> > >> > tested).
> > >> >
> > >> >
> > >> >
> > >> > ---
> > >> >
> > >> >
> > >> >
> > >> > diff --git a/lib/librte_acl/acl_run_neon.h
> > >> >
> > >> > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..37c984fef 100644
> > >> >
> > >> > --- a/lib/librte_acl/acl_run_neon.h
> > >> >
> > >> > +++ b/lib/librte_acl/acl_run_neon.h
> > >> >
> > >> > @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> > >> > const
> > >> >
> > >> > uint8_t **data,
> > >> >
> > >> > uint64_t index_array[8];
> > >> >
> > >> > struct completion cmplt[8];
> > >> >
> > >> > struct parms parms[8];
> > >> >
> > >> > + static int32x4_t ZEROVAL;
> > >> >
> > >> > int32x4_t input0, input1;
> > >> >
> > >> >
> > >> >
> > >> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@
> > >> > -
> > >> >
> > >> > 181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> > >> > const
> > >> >
> > >> > uint8_t **data,
> > >> >
> > >> >
> > >> >
> > >> > while (flows.started > 0) {
> > >> >
> > >> > /* Gather 4 bytes of input data for each stream. */
> > >> >
> > >> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
> > >> >
> > >> > 0);
> > >> >
> > >> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
> > >> >
> > >> > 0);
> > >> >
> > >> > + input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> > >> >
> > >> > ZEROVAL, 0);
> > >> >
> > >> > + input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> > >> >
> > >> > ZEROVAL, 0);
> > >> >
> > >> >
> > >> >
> > >> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> > >> > input0,
> > >> >
> > >> > 1);
> > >> >
> > >> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
> > >> > input1,
> > >> >
> > >> > 1); @@
> > >> >
> > >> > -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx,
> > >> > const
> > >> >
> > >> > uint8_t **data,
> > >> >
> > >> > uint64_t index_array[4];
> > >> >
> > >> > struct completion cmplt[4];
> > >> >
> > >> > struct parms parms[4];
> > >> >
> > >> > + static int32x4_t ZEROVAL;
> > >> >
> > >> > int32x4_t input;
> > >> >
> > >> >
> > >> >
> > >> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, @@
> > >> > -
> > >> >
> > >> > 242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx,
> > >> > const
> > >> >
> > >> > uint8_t **data,
> > >> >
> > >> >
> > >> >
> > >> > while (flows.started > 0) {
> > >> >
> > >> > /* Gather 4 bytes of input data for each stream. */
> > >> >
> > >> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> > >> >
> > >> > + input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> > >> >
> > >> > ZEROVAL, 0);
> > >> >
> > >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> > >> > input, 1);
> > >> >
> > >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> > >> > input, 2);
> > >> >
> > >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3),
> > >> > input, 3);
> > >> >
> > >> > --
> > >> >
> > >> > 2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
2019-06-08 8:38 ` Jerin Jacob Kollanukkaran
@ 2019-06-08 8:41 ` Jerin Jacob Kollanukkaran
0 siblings, 0 replies; 16+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-08 8:41 UTC (permalink / raw)
To: Jerin Jacob Kollanukkaran, Honnappa Nagarahalli, Aaron Conole
Cc: msantana, thomas, Ruifeng Wang (Arm Technology China),
Gavin Hu (Arm Technology China),
Dharmik Thakkar, jerin.jacob, yskoh, dev, bruce.richardson, nd,
nd
> -----Original Message-----
> From: Jerin Jacob Kollanukkaran <jerinj@marvell.com>
> Sent: Saturday, June 8, 2019 2:08 PM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Aaron Conole
> <aconole@redhat.com>
> Cc: msantana@redhat.com; thomas@monjalon.net; Ruifeng Wang (Arm
> Technology China) <Ruifeng.Wang@arm.com>; Gavin Hu (Arm Technology
> China) <Gavin.Hu@arm.com>; Dharmik Thakkar <Dharmik.Thakkar@arm.com>;
> jerin.jacob@caviumnetworks.com; yskoh@mellanox.com; dev@dpdk.org;
> bruce.richardson@intel.com; nd <nd@arm.com>; nd <nd@arm.com>
> Subject: RE: DPDK compilation on arm is failing in Travis
>
> > -----Original Message-----
> > From: dev <dev-bounces@dpdk.org> On Behalf Of Honnappa Nagarahalli
> > Sent: Friday, June 7, 2019 7:24 PM
> > To: Aaron Conole <aconole@redhat.com>
> > Cc: msantana@redhat.com; thomas@monjalon.net; Ruifeng Wang (Arm
> > Technology China) <Ruifeng.Wang@arm.com>; Gavin Hu (Arm Technology
> > China) <Gavin.Hu@arm.com>; Dharmik Thakkar
> <Dharmik.Thakkar@arm.com>;
> > jerin.jacob@caviumnetworks.com; yskoh@mellanox.com; dev@dpdk.org;
> > bruce.richardson@intel.com; Honnappa Nagarahalli
> > <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; nd <nd@arm.com>
> > Subject: Re: [dpdk-dev] DPDK compilation on arm is failing in Travis
> >
> > > >> >
> > > >> > Thomas Monjalon <thomas@monjalon.net> writes:
> > > >> >
> > > >> >
> > > >> >
> > > >> > The compilation of the master branch is failing for aarch64:
> > > >> >
> > > >> > https://travis-ci.com/DPDK/dpdk
> > > >> >
> > > >> > The log is so much verbose that I am not able to understand
> > > >> > what
> > > >> >
> > > >> > is really wrong.
> > > >> >
> > > >> > Please help to diagnose and fix, thanks.
> > > >> >
> > > >> >
> > > >> >
> > > >> > A discussion about this:
> > > >> >
> > > >> >
> > > >> >
> > > >> > http://mails.dpdk.org/archives/dev/2019-June/134012.html
> > > >> >
> > > >> >
> > > >> >
> > > >> > I see the error now.
> > > >> >
> > > >> > It is printing the full log after the error, so I missed the
> > > >> > error
> > > >> >
> > > >> > at the top.
> > > >> >
> > > >> >
> > > >> >
> > > >> > I've read your comment about a possible error with the patch
> > > >> >
> > > >> > removing weak functions but neither me nor Bruce were able to
> > > >> > reproduce
> > > >> >
> > > >> > it.
> > > >> >
> > > >> > What is the condition to see this compiler warning?
> > > >> >
> > > >> >
> > > >> >
> > > >> > It is only on ARM, and only when the neon intrinsics are in use.
> > > >> >
> > > >> > I am not able to reproduce it from the tip of master.
> > > >> >
> > > >> >
> > > >> >
> > > >> > I am using:
> > > >> >
> > > >> > gcc (Ubuntu 8.3.0-6ubuntu1~18.04) 8.3.0
> > > >> >
> > > >> >
> > > >> >
> > > >> > From the log on Travis, looks like the compiler is:
> > > >> >
> > > >> > gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
> > > >> >
> > > >> >
> > > >> >
> > > >> > Is this the issue?
> > > >> >
> > > >> >
> > > >> >
> > > >> > Why are we seeing the error now?
> > > >> >
> > > >> > I tested with gcc-5 (Ubuntu/Linaro 5.5.0-12ubuntu1) 5.5.0
> > > >> > 20171010, it
> > > >> works fine. I cannot get hold of 5.4.0. Not sure if needs to be supported.
> > > >> >
> > > >> > Are there any issues in upgrading to 7 or 8?
> > > >> >
> > > >> > I have tested it on my ubuntu 16.04 vm on commit
> > > >> > 8cb511bb94ad92a76990f175cac76bb13d51daba
> > > >> > (head of master seems to be failing for other reasons on my vm).
> > > >> > I tested the following gcc versions:
> > > >> >
> > > >> > gcc 5.5.0 "cc (Ubuntu 5.5.0-12ubuntu1~16.04) 5.5.0 20171010"
> > > >> > gcc 7.4.0 "cc (Ubuntu 7.4.0-1ubuntu1~16.04~ppa1) 7.4.0"
> > > >> > gcc 8.1.0 "cc (Ubuntu 8.1.0-5ubuntu1~16.04) 8.1.0"
> > > >> >
> > > >> > All tested versions failed on the exact same error shown in travis.
> > > >> > I don't know if the compiler is at fault here. Maybe Aaron's
> > > >> > patch is a viable
> > > >> option?
> > > >> >
> > > >> > The issue is the vector lane setting code looks like:
> > > >> >
> > > >> >
> > > >> >
> > > >> > lval = lane_set(scalar, rval, lane id)
> > > >> >
> > > >> >
> > > >> >
> > > >> > In this case, 'rval' is being used before it is ever set, but
> > > >> > it
> > > >> >
> > > >> > really could be just 0 for the first lane setting code.
> > > >> > Thereafter,
> > > >> >
> > > >> > we use the old value of input as the rval, but each time a
> > > >> > different lane is
> > > >> set.
> > > >> >
> > > >> >
> > > >> >
> > > >> > It would be nice if there were an intrinsic that formatted
> > > >> > correctly
> > > >> >
> > > >> > from the start (something we could call like lval =
> > > >> >
> > > >> > lane_set_from_array(scalar_array)).
> > > >> >
> > > >> > [Honnappa] This exists already. ‘vdupq_n_s32’ can be used. Can
> > > >> > you try the
> > > >> following?
> > > >>
> > > >> Well, it isn't exactly that. You are setting all lanes from a scalar.
> > > > Yes, you are correct, it sets all the lanes. I am not sure on how
> > > > this will affect the performance.
> > > >
> > > >> I'd rather be able to say:
> > > >>
> > > >> input0 = vdupq_nn_s32(&parms[0]);
> > > >> input1 = vdupq_nn_s32(&parms[4]);
> > > >>
> > > >> Something like that, which lets us delete all the rest of the
> > > >> lane-set code. But it seems it doesn't exist.
> > > >>
> > > >> Regardless, I think either patch should work (either using the 'all lanes'
> > > >> setting you have or the static variable). I have no preference
> > > >> on it
> > > >> - it's up to you (or someone else) to say which is preferred. I
> > > >> guess your version could be preferable since there's no static to
> > > >> need to "explain" :)
> > > > I think we can go ahead with your patch with using a temporary
> > > > vector for the first set, as it does not introduce any change to
> > > > the code and hence performance should not get affected.
> > > >
> > > > But, I do not understand why you have added 'static'. Also,
> > > > changing 'ZEROVAL' to 'tmp' or something similar will be better.
> > >
> > > The static is there to guarantee '0' value. Otherwise we create a
> > > temp variable that has to be initialized explicitly.
> > Ok, I am fine with this. I guess this is the explanation you wanted to avoid 😊.
>
> Don’t use BSS for fastpath code. Let it use stack for better cache usage and
> multi thread case. I already sent a simple fix for this with temp variable. Please
> don’t complicate.
Typo: s/with temp variable/ with out temp variable/g
>
>
> >
> > >
> > > >>
> > > >> > honnag01@qc2400f-1:~/dpdk$ git diff
> > > >> >
> > > >> > diff --git a/lib/librte_acl/acl_run_neon.h
> > > >> > b/lib/librte_acl/acl_run_neon.h
> > > >> >
> > > >> > index 01b9766d8..b3196cd12 100644
> > > >> >
> > > >> > --- a/lib/librte_acl/acl_run_neon.h
> > > >> >
> > > >> > +++ b/lib/librte_acl/acl_run_neon.h
> > > >> >
> > > >> > @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx
> > > >> > *ctx, const uint8_t **data,
> > > >> >
> > > >> >
> > > >> >
> > > >> > while (flows.started > 0) {
> > > >> >
> > > >> > /* Gather 4 bytes of input data for each stream.
> > > >> > */
> > > >> >
> > > >> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
> > > 0);
> > > >> >
> > > >> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
> > > 0);
> > > >> >
> > > >> > + input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms,
> > > >> > + 0));
> > > >> >
> > > >> > + input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms,
> > > >> > + 4));
> > > >> >
> > > >> >
> > > >> >
> > > >> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> > > >> > 1), input0, 1);
> > > >> >
> > > >> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> > > >> > 5), input1, 1);
> > > >> >
> > > >> > @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx
> > > >> > *ctx, const uint8_t **data,
> > > >> >
> > > >> >
> > > >> >
> > > >> > while (flows.started > 0) {
> > > >> >
> > > >> > /* Gather 4 bytes of input data for each stream.
> > > >> > */
> > > >> >
> > > >> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> > > >> >
> > > >> > + input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> > > >> >
> > > >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> > > >> > 1), input, 1);
> > > >> >
> > > >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> > > >> > 2), input, 2);
> > > >> >
> > > >> > input =
> > > >> > vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> > > >> >
> > > >> >
> > > >> >
> > > >> > Then 'input' would never appear as an rval before it was set.
> > > >> >
> > > >> >
> > > >> >
> > > >> > I thought Jerin Jacob (CC'd) would have some opinion on the right fix.
> > > >> >
> > > >> > There are three 'fixes' I know exist - one is to squelch the
> > > >> > warning
> > > >> >
> > > >> > (but I don't like it because it could hide future code that
> > > >> > introduces
> > > >> >
> > > >> > this), one is to create a static and use assignment, one is to
> > > >> > replace
> > > >> >
> > > >> > the first call and pass in a 0'd lane for the first one.
> > > >> >
> > > >> >
> > > >> >
> > > >> > Actually, I think I have a patch that could work to not
> > > >> > introduce an
> > > >> >
> > > >> > assignment, but squelch the warning. Something like the
> > > >> > following (not
> > > >> >
> > > >> > tested).
> > > >> >
> > > >> >
> > > >> >
> > > >> > ---
> > > >> >
> > > >> >
> > > >> >
> > > >> > diff --git a/lib/librte_acl/acl_run_neon.h
> > > >> >
> > > >> > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..37c984fef
> > > >> > 100644
> > > >> >
> > > >> > --- a/lib/librte_acl/acl_run_neon.h
> > > >> >
> > > >> > +++ b/lib/librte_acl/acl_run_neon.h
> > > >> >
> > > >> > @@ -165,6 +165,7 @@ search_neon_8(const struct rte_acl_ctx
> > > >> > *ctx, const
> > > >> >
> > > >> > uint8_t **data,
> > > >> >
> > > >> > uint64_t index_array[8];
> > > >> >
> > > >> > struct completion cmplt[8];
> > > >> >
> > > >> > struct parms parms[8];
> > > >> >
> > > >> > + static int32x4_t ZEROVAL;
> > > >> >
> > > >> > int32x4_t input0, input1;
> > > >> >
> > > >> >
> > > >> >
> > > >> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> > > >> > @@
> > > >> > -
> > > >> >
> > > >> > 181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> > > >> > const
> > > >> >
> > > >> > uint8_t **data,
> > > >> >
> > > >> >
> > > >> >
> > > >> > while (flows.started > 0) {
> > > >> >
> > > >> > /* Gather 4 bytes of input data for each stream. */
> > > >> >
> > > >> > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0,
> > > >> >
> > > >> > 0);
> > > >> >
> > > >> > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1,
> > > >> >
> > > >> > 0);
> > > >> >
> > > >> > + input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> > > >> >
> > > >> > ZEROVAL, 0);
> > > >> >
> > > >> > + input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> > > >> >
> > > >> > ZEROVAL, 0);
> > > >> >
> > > >> >
> > > >> >
> > > >> > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> > > >> > input0,
> > > >> >
> > > >> > 1);
> > > >> >
> > > >> > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
> > > >> > input1,
> > > >> >
> > > >> > 1); @@
> > > >> >
> > > >> > -227,6 +228,7 @@ search_neon_4(const struct rte_acl_ctx *ctx,
> > > >> > const
> > > >> >
> > > >> > uint8_t **data,
> > > >> >
> > > >> > uint64_t index_array[4];
> > > >> >
> > > >> > struct completion cmplt[4];
> > > >> >
> > > >> > struct parms parms[4];
> > > >> >
> > > >> > + static int32x4_t ZEROVAL;
> > > >> >
> > > >> > int32x4_t input;
> > > >> >
> > > >> >
> > > >> >
> > > >> > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> > > >> > @@
> > > >> > -
> > > >> >
> > > >> > 242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx,
> > > >> > const
> > > >> >
> > > >> > uint8_t **data,
> > > >> >
> > > >> >
> > > >> >
> > > >> > while (flows.started > 0) {
> > > >> >
> > > >> > /* Gather 4 bytes of input data for each stream. */
> > > >> >
> > > >> > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> > > >> >
> > > >> > + input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> > > >> >
> > > >> > ZEROVAL, 0);
> > > >> >
> > > >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> > > >> > input, 1);
> > > >> >
> > > >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> > > >> > input, 2);
> > > >> >
> > > >> > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3),
> > > >> > input, 3);
> > > >> >
> > > >> > --
> > > >> >
> > > >> > 2.21.0
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2019-06-08 8:41 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-05 18:26 [dpdk-dev] DPDK compilation on arm is failing in Travis Thomas Monjalon
2019-06-05 19:40 ` Aaron Conole
2019-06-05 20:04 ` Thomas Monjalon
2019-06-05 20:58 ` Aaron Conole
2019-06-05 21:05 ` Honnappa Nagarahalli
2019-06-05 21:36 ` Honnappa Nagarahalli
2019-06-05 22:38 ` Michael Santana Francisco
2019-06-06 4:42 ` Honnappa Nagarahalli
2019-06-06 14:50 ` Aaron Conole
2019-06-07 5:10 ` Honnappa Nagarahalli
2019-06-07 13:24 ` Aaron Conole
2019-06-07 13:53 ` Honnappa Nagarahalli
2019-06-08 8:38 ` Jerin Jacob Kollanukkaran
2019-06-08 8:41 ` Jerin Jacob Kollanukkaran
2019-06-06 14:57 ` Jerin Jacob Kollanukkaran
2019-06-06 17:06 ` Michael Santana Francisco
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).