* [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler @ 2019-06-06 14:50 jerinj 2019-06-06 15:55 ` Michael Santana Francisco ` (3 more replies) 0 siblings, 4 replies; 13+ messages in thread From: jerinj @ 2019-06-06 14:50 UTC (permalink / raw) To: dev Cc: thomas, gavin.hu, honnappa.nagarahalli, msantana, aconole, Jerin Jacob, stable From: Jerin Jacob <jerinj@marvell.com> Some compilers reporting the following error, though the existing code doesn't have any uninitialized variable case. Just to make compiler happy, initialize the int32x4_t variable one shot in C language. ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be used uninitialized in this function [-Werror=maybe-uninitialized] int32x4_t input; Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") Cc: stable@dpdk.org Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- lib/librte_acl/acl_run_neon.h | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h index 01b9766d8..dc9e9efe9 100644 --- a/lib/librte_acl/acl_run_neon.h +++ b/lib/librte_acl/acl_run_neon.h @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data, uint64_t index_array[8]; struct completion cmplt[8]; struct parms parms[8]; - int32x4_t input0, input1; acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, total_packets, categories, ctx->trans_table); @@ -181,17 +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data, while (flows.started > 0) { /* Gather 4 bytes of input data for each stream. */ - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0); - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0); - - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1); - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1); - - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input0, 2); - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), input1, 2); - - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input0, 3); - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), input1, 3); + int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0), + GET_NEXT_4BYTES(parms, 1), + GET_NEXT_4BYTES(parms, 2), + GET_NEXT_4BYTES(parms, 3)}; + int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4), + GET_NEXT_4BYTES(parms, 5), + GET_NEXT_4BYTES(parms, 6), + GET_NEXT_4BYTES(parms, 7)}; /* Process the 4 bytes of input on each stream. */ @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data, uint64_t index_array[4]; struct completion cmplt[4]; struct parms parms[4]; - int32x4_t input; acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, total_packets, categories, ctx->trans_table); @@ -242,10 +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data, while (flows.started > 0) { /* Gather 4 bytes of input data for each stream. */ - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0); - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1); - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2); - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3); + int32x4_t input = {GET_NEXT_4BYTES(parms, 0), + GET_NEXT_4BYTES(parms, 1), + GET_NEXT_4BYTES(parms, 2), + GET_NEXT_4BYTES(parms, 3)}; /* Process the 4 bytes of input on each stream. */ input = transition4(input, flows.trans, index_array); -- 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler 2019-06-06 14:50 [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler jerinj @ 2019-06-06 15:55 ` Michael Santana Francisco 2019-06-07 5:42 ` Honnappa Nagarahalli 2019-06-07 5:35 ` Honnappa Nagarahalli ` (2 subsequent siblings) 3 siblings, 1 reply; 13+ messages in thread From: Michael Santana Francisco @ 2019-06-06 15:55 UTC (permalink / raw) To: jerinj, dev; +Cc: thomas, gavin.hu, honnappa.nagarahalli, aconole, stable On 6/6/19 10:50 AM, jerinj@marvell.com wrote: > From: Jerin Jacob <jerinj@marvell.com> > > Some compilers reporting the following error, though the existing > code doesn't have any uninitialized variable case. > Just to make compiler happy, initialize the int32x4_t variable > one shot in C language. > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be > used uninitialized in this function [-Werror=maybe-uninitialized] > int32x4_t input; > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") > Cc: stable@dpdk.org > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > --- > lib/librte_acl/acl_run_neon.h | 29 ++++++++++++----------------- > 1 file changed, 12 insertions(+), 17 deletions(-) > > diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h > index 01b9766d8..dc9e9efe9 100644 > --- a/lib/librte_acl/acl_run_neon.h > +++ b/lib/librte_acl/acl_run_neon.h > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data, > uint64_t index_array[8]; > struct completion cmplt[8]; > struct parms parms[8]; > - int32x4_t input0, input1; > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > total_packets, categories, ctx->trans_table); > @@ -181,17 +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data, > > while (flows.started > 0) { > /* Gather 4 bytes of input data for each stream. */ > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0); > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0); > - > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1); > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1); > - > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input0, 2); > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), input1, 2); > - > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input0, 3); > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), input1, 3); > + int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0), > + GET_NEXT_4BYTES(parms, 1), > + GET_NEXT_4BYTES(parms, 2), > + GET_NEXT_4BYTES(parms, 3)}; > + int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4), > + GET_NEXT_4BYTES(parms, 5), > + GET_NEXT_4BYTES(parms, 6), > + GET_NEXT_4BYTES(parms, 7)}; > > /* Process the 4 bytes of input on each stream. */ > > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data, > uint64_t index_array[4]; > struct completion cmplt[4]; > struct parms parms[4]; > - int32x4_t input; > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > total_packets, categories, ctx->trans_table); > @@ -242,10 +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data, > > while (flows.started > 0) { > /* Gather 4 bytes of input data for each stream. */ > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0); > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1); > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2); > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3); > + int32x4_t input = {GET_NEXT_4BYTES(parms, 0), > + GET_NEXT_4BYTES(parms, 1), > + GET_NEXT_4BYTES(parms, 2), > + GET_NEXT_4BYTES(parms, 3)}; > > /* Process the 4 bytes of input on each stream. */ > input = transition4(input, flows.trans, index_array); Fixed on travis: https://travis-ci.com/Maickii/dpdk-2/builds/114612090 Acked-by: Michael Santana <msantana@redhat.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler 2019-06-06 15:55 ` Michael Santana Francisco @ 2019-06-07 5:42 ` Honnappa Nagarahalli 0 siblings, 0 replies; 13+ messages in thread From: Honnappa Nagarahalli @ 2019-06-07 5:42 UTC (permalink / raw) To: msantana, jerinj, dev Cc: thomas, Gavin Hu (Arm Technology China), aconole, stable, Honnappa Nagarahalli, nd, nd On 6/6/19 10:50 AM, mailto:jerinj@marvell.com wrote: From: Jerin Jacob mailto:jerinj@marvell.com Some compilers reporting the following error, though the existing code doesn't have any uninitialized variable case. Just to make compiler happy, initialize the int32x4_t variable one shot in C language. ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be used uninitialized in this function [-Werror=maybe-uninitialized] int32x4_t input; Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") Cc: mailto:stable@dpdk.org Signed-off-by: Jerin Jacob mailto:jerinj@marvell.com --- lib/librte_acl/acl_run_neon.h | 29 ++++++++++++----------------- 1 file changed, 12 insertions(+), 17 deletions(-) diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h index 01b9766d8..dc9e9efe9 100644 --- a/lib/librte_acl/acl_run_neon.h +++ b/lib/librte_acl/acl_run_neon.h @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data, uint64_t index_array[8]; struct completion cmplt[8]; struct parms parms[8]; - int32x4_t input0, input1; acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, total_packets, categories, ctx->trans_table); @@ -181,17 +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data, while (flows.started > 0) { /* Gather 4 bytes of input data for each stream. */ - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0); - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0); - - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1); - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1); - - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input0, 2); - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), input1, 2); - - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input0, 3); - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), input1, 3); + int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0), + GET_NEXT_4BYTES(parms, 1), + GET_NEXT_4BYTES(parms, 2), + GET_NEXT_4BYTES(parms, 3)}; + int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4), + GET_NEXT_4BYTES(parms, 5), + GET_NEXT_4BYTES(parms, 6), + GET_NEXT_4BYTES(parms, 7)}; /* Process the 4 bytes of input on each stream. */ @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data, uint64_t index_array[4]; struct completion cmplt[4]; struct parms parms[4]; - int32x4_t input; acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, total_packets, categories, ctx->trans_table); @@ -242,10 +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data, while (flows.started > 0) { /* Gather 4 bytes of input data for each stream. */ - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0); - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1); - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2); - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3); + int32x4_t input = {GET_NEXT_4BYTES(parms, 0), + GET_NEXT_4BYTES(parms, 1), + GET_NEXT_4BYTES(parms, 2), + GET_NEXT_4BYTES(parms, 3)}; /* Process the 4 bytes of input on each stream. */ input = transition4(input, flows.trans, index_array); Fixed on travis: https://travis-ci.com/Maickii/dpdk-2/builds/114612090 Acked-by: Michael Santana mailto:msantana@redhat.com [Honnappa] Prefer to go with Aaron's patch with a temp variable for setting the first lane. Mixing of NEON intrinsics and GCC vector extensions is not recommended as per Arm C Language Extensions guide 12.2.6 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler 2019-06-06 14:50 [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler jerinj 2019-06-06 15:55 ` Michael Santana Francisco @ 2019-06-07 5:35 ` Honnappa Nagarahalli 2019-06-07 6:21 ` Jerin Jacob Kollanukkaran 2019-06-10 12:10 ` Aaron Conole 2019-06-11 14:15 ` [dpdk-stable] [dpdk-dev] [PATCH v2] " jerinj 3 siblings, 1 reply; 13+ messages in thread From: Honnappa Nagarahalli @ 2019-06-07 5:35 UTC (permalink / raw) To: jerinj, dev Cc: thomas, Gavin Hu (Arm Technology China), msantana, aconole, jerinj, Honnappa Nagarahalli, stable, nd > Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler > > From: Jerin Jacob <jerinj@marvell.com> > > Some compilers reporting the following error, though the existing code > doesn't have any uninitialized variable case. > Just to make compiler happy, initialize the int32x4_t variable one shot in C > language. > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be used > uninitialized in this function [-Werror=maybe-uninitialized] > int32x4_t input; > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") > Cc: stable@dpdk.org > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > --- > lib/librte_acl/acl_run_neon.h | 29 ++++++++++++----------------- > 1 file changed, 12 insertions(+), 17 deletions(-) > > diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h > index 01b9766d8..dc9e9efe9 100644 > --- a/lib/librte_acl/acl_run_neon.h > +++ b/lib/librte_acl/acl_run_neon.h > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const > uint8_t **data, > uint64_t index_array[8]; > struct completion cmplt[8]; > struct parms parms[8]; > - int32x4_t input0, input1; > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > total_packets, categories, ctx->trans_table); @@ -181,17 > +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t > **data, > > while (flows.started > 0) { > /* Gather 4 bytes of input data for each stream. */ > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), > input0, 0); > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), > input1, 0); > - > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), > input0, 1); > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), > input1, 1); > - > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), > input0, 2); > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), > input1, 2); > - > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), > input0, 3); > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), > input1, 3); > + int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0), > + GET_NEXT_4BYTES(parms, 1), > + GET_NEXT_4BYTES(parms, 2), > + GET_NEXT_4BYTES(parms, 3)}; > + int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4), > + GET_NEXT_4BYTES(parms, 5), > + GET_NEXT_4BYTES(parms, 6), > + GET_NEXT_4BYTES(parms, 7)}; > This mixes the use of NEON intrinsics with GCC vector extensions. ACLE (Arm C Language Extensions) specifically recommends not to mix the two methods in section 12.2.6. IMO, Aaron's suggestion of using a temp vector should be good. > /* Process the 4 bytes of input on each stream. */ > > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const > uint8_t **data, > uint64_t index_array[4]; > struct completion cmplt[4]; > struct parms parms[4]; > - int32x4_t input; > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > total_packets, categories, ctx->trans_table); @@ -242,10 > +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t > **data, > > while (flows.started > 0) { > /* Gather 4 bytes of input data for each stream. */ > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, > 0); > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, > 1); > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, > 2); > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, > 3); > + int32x4_t input = {GET_NEXT_4BYTES(parms, 0), > + GET_NEXT_4BYTES(parms, 1), > + GET_NEXT_4BYTES(parms, 2), > + GET_NEXT_4BYTES(parms, 3)}; > > /* Process the 4 bytes of input on each stream. */ > input = transition4(input, flows.trans, index_array); > -- > 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler 2019-06-07 5:35 ` Honnappa Nagarahalli @ 2019-06-07 6:21 ` Jerin Jacob Kollanukkaran 2019-06-10 5:29 ` Honnappa Nagarahalli 0 siblings, 1 reply; 13+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-06-07 6:21 UTC (permalink / raw) To: Honnappa Nagarahalli, dev Cc: thomas, Gavin Hu (Arm Technology China), msantana, aconole, stable, nd > -----Original Message----- > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> > Sent: Friday, June 7, 2019 11:05 AM > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org > Cc: thomas@monjalon.net; Gavin Hu (Arm Technology China) > <Gavin.Hu@arm.com>; msantana@redhat.com; aconole@redhat.com; Jerin > Jacob Kollanukkaran <jerinj@marvell.com>; Honnappa Nagarahalli > <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com> > Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 > compiler > > ---------------------------------------------------------------------- > > Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 > > compiler > > > > From: Jerin Jacob <jerinj@marvell.com> > > > > Some compilers reporting the following error, though the existing code > > doesn't have any uninitialized variable case. > > Just to make compiler happy, initialize the int32x4_t variable one > > shot in C language. > > > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' > > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be used > > uninitialized in this function [-Werror=maybe-uninitialized] > > int32x4_t input; > > > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") > > Cc: stable@dpdk.org > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > > --- > > lib/librte_acl/acl_run_neon.h | 29 ++++++++++++----------------- > > 1 file changed, 12 insertions(+), 17 deletions(-) > > > > diff --git a/lib/librte_acl/acl_run_neon.h > > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..dc9e9efe9 100644 > > --- a/lib/librte_acl/acl_run_neon.h > > +++ b/lib/librte_acl/acl_run_neon.h > > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const > > uint8_t **data, > > uint64_t index_array[8]; > > struct completion cmplt[8]; > > struct parms parms[8]; > > - int32x4_t input0, input1; > > > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > > total_packets, categories, ctx->trans_table); @@ -181,17 > > +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t > > **data, > > > > while (flows.started > 0) { > > /* Gather 4 bytes of input data for each stream. */ > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), > > input0, 0); > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), > > input1, 0); > > - > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), > > input0, 1); > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), > > input1, 1); > > - > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), > > input0, 2); > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), > > input1, 2); > > - > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), > > input0, 3); > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), > > input1, 3); > > + int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0), > > + GET_NEXT_4BYTES(parms, 1), > > + GET_NEXT_4BYTES(parms, 2), > > + GET_NEXT_4BYTES(parms, 3)}; > > + int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4), > > + GET_NEXT_4BYTES(parms, 5), > > + GET_NEXT_4BYTES(parms, 6), > > + GET_NEXT_4BYTES(parms, 7)}; > > > This mixes the use of NEON intrinsics with GCC vector extensions. ACLE (Arm C > Language Extensions) specifically recommends not to mix the two methods in > section 12.2.6. IMO, Aaron's suggestion of using a temp vector should be good. We are using this pattern across DPDK and SSE for x86 as well. https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n91 Since it used in fastpath, a temp variable would be additional cost for no reason. If GCC supports it then I think it is fine, I think, above usage matters with C++ portability. > > > /* Process the 4 bytes of input on each stream. */ > > > > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const > > uint8_t **data, > > uint64_t index_array[4]; > > struct completion cmplt[4]; > > struct parms parms[4]; > > - int32x4_t input; > > > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > > total_packets, categories, ctx->trans_table); @@ -242,10 > > +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t > > **data, > > > > while (flows.started > 0) { > > /* Gather 4 bytes of input data for each stream. */ > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, > > 0); > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, > > 1); > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, > > 2); > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, > > 3); > > + int32x4_t input = {GET_NEXT_4BYTES(parms, 0), > > + GET_NEXT_4BYTES(parms, 1), > > + GET_NEXT_4BYTES(parms, 2), > > + GET_NEXT_4BYTES(parms, 3)}; > > > > /* Process the 4 bytes of input on each stream. */ > > input = transition4(input, flows.trans, index_array); > > -- > > 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler 2019-06-07 6:21 ` Jerin Jacob Kollanukkaran @ 2019-06-10 5:29 ` Honnappa Nagarahalli 2019-06-10 9:39 ` Jerin Jacob Kollanukkaran 0 siblings, 1 reply; 13+ messages in thread From: Honnappa Nagarahalli @ 2019-06-10 5:29 UTC (permalink / raw) To: jerinj, dev Cc: thomas, Gavin Hu (Arm Technology China), msantana, aconole, stable, Honnappa Nagarahalli, nd, nd > > > > ---------------------------------------------------------------------- > > > Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 > > > compiler > > > > > > From: Jerin Jacob <jerinj@marvell.com> > > > > > > Some compilers reporting the following error, though the existing > > > code doesn't have any uninitialized variable case. > > > Just to make compiler happy, initialize the int32x4_t variable one > > > shot in C language. > > > > > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' > > > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be used > > > uninitialized in this function [-Werror=maybe-uninitialized] > > > int32x4_t input; > > > > > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") > > > Cc: stable@dpdk.org > > > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > > > --- > > > lib/librte_acl/acl_run_neon.h | 29 ++++++++++++----------------- > > > 1 file changed, 12 insertions(+), 17 deletions(-) > > > > > > diff --git a/lib/librte_acl/acl_run_neon.h > > > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..dc9e9efe9 100644 > > > --- a/lib/librte_acl/acl_run_neon.h > > > +++ b/lib/librte_acl/acl_run_neon.h > > > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, > > > const uint8_t **data, > > > uint64_t index_array[8]; > > > struct completion cmplt[8]; > > > struct parms parms[8]; > > > - int32x4_t input0, input1; > > > > > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > > > total_packets, categories, ctx->trans_table); @@ -181,17 > > > +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const > > > +uint8_t > > > **data, > > > > > > while (flows.started > 0) { > > > /* Gather 4 bytes of input data for each stream. */ > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), > > > input0, 0); > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), > > > input1, 0); > > > - > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), > > > input0, 1); > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), > > > input1, 1); > > > - > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), > > > input0, 2); > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), > > > input1, 2); > > > - > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), > > > input0, 3); > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), > > > input1, 3); > > > + int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0), > > > + GET_NEXT_4BYTES(parms, 1), > > > + GET_NEXT_4BYTES(parms, 2), > > > + GET_NEXT_4BYTES(parms, 3)}; > > > + int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4), > > > + GET_NEXT_4BYTES(parms, 5), > > > + GET_NEXT_4BYTES(parms, 6), > > > + GET_NEXT_4BYTES(parms, 7)}; > > > > > This mixes the use of NEON intrinsics with GCC vector extensions. ACLE > > (Arm C Language Extensions) specifically recommends not to mix the two > > methods in section 12.2.6. IMO, Aaron's suggestion of using a temp vector > should be good. > > We are using this pattern across DPDK and SSE for x86 as well. > https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n91 I am not sure about x86, I have not looked at a document similar to ACLE for x86. IMO, it is not relevant here as this is Arm specific code. > > Since it used in fastpath, a temp variable would be additional cost for no > reason. Then, I would suggest we can go with using 'vdupq_n_s32'. > If GCC supports it then I think it is fine, I think, above usage matters with C++ > portability. I did not understand the C++ portability part. Can you elaborate more? > > > > > > > /* Process the 4 bytes of input on each stream. */ > > > > > > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, > > > const uint8_t **data, > > > uint64_t index_array[4]; > > > struct completion cmplt[4]; > > > struct parms parms[4]; > > > - int32x4_t input; > > > > > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > > > total_packets, categories, ctx->trans_table); @@ -242,10 > > > +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const > > > +uint8_t > > > **data, > > > > > > while (flows.started > 0) { > > > /* Gather 4 bytes of input data for each stream. */ > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, > > > 0); > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, > > > 1); > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, > > > 2); > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, > > > 3); > > > + int32x4_t input = {GET_NEXT_4BYTES(parms, 0), > > > + GET_NEXT_4BYTES(parms, 1), > > > + GET_NEXT_4BYTES(parms, 2), > > > + GET_NEXT_4BYTES(parms, 3)}; > > > > > > /* Process the 4 bytes of input on each stream. */ > > > input = transition4(input, flows.trans, index_array); > > > -- > > > 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler 2019-06-10 5:29 ` Honnappa Nagarahalli @ 2019-06-10 9:39 ` Jerin Jacob Kollanukkaran 2019-06-11 1:27 ` Honnappa Nagarahalli 0 siblings, 1 reply; 13+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-06-10 9:39 UTC (permalink / raw) To: Honnappa Nagarahalli, dev Cc: thomas, Gavin Hu (Arm Technology China), msantana, aconole, stable, nd, nd > -----Original Message----- > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> > Sent: Monday, June 10, 2019 11:00 AM > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org > Cc: thomas@monjalon.net; Gavin Hu (Arm Technology China) > <Gavin.Hu@arm.com>; msantana@redhat.com; aconole@redhat.com; > stable@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; > nd <nd@arm.com>; nd <nd@arm.com> > Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 > compiler > > > > -- > > > > Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 > > > > compiler > > > > > > > > From: Jerin Jacob <jerinj@marvell.com> > > > > > > > > Some compilers reporting the following error, though the existing > > > > code doesn't have any uninitialized variable case. > > > > Just to make compiler happy, initialize the int32x4_t variable one > > > > shot in C language. > > > > > > > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' > > > > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be > > > > used uninitialized in this function [-Werror=maybe-uninitialized] > > > > int32x4_t input; > > > > > > > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") > > > > Cc: stable@dpdk.org > > > > > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > > > > --- > > > > lib/librte_acl/acl_run_neon.h | 29 ++++++++++++----------------- > > > > 1 file changed, 12 insertions(+), 17 deletions(-) > > > > > > > > diff --git a/lib/librte_acl/acl_run_neon.h > > > > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..dc9e9efe9 100644 > > > > --- a/lib/librte_acl/acl_run_neon.h > > > > +++ b/lib/librte_acl/acl_run_neon.h > > > > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, > > > > const uint8_t **data, > > > > uint64_t index_array[8]; > > > > struct completion cmplt[8]; > > > > struct parms parms[8]; > > > > - int32x4_t input0, input1; > > > > > > > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > > > > total_packets, categories, ctx->trans_table); @@ -181,17 > > > > +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const > > > > +uint8_t > > > > **data, > > > > > > > > while (flows.started > 0) { > > > > /* Gather 4 bytes of input data for each stream. */ > > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), > > > > input0, 0); > > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), > > > > input1, 0); > > > > - > > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), > > > > input0, 1); > > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), > > > > input1, 1); > > > > - > > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), > > > > input0, 2); > > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), > > > > input1, 2); > > > > - > > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), > > > > input0, 3); > > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), > > > > input1, 3); > > > > + int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0), > > > > + GET_NEXT_4BYTES(parms, 1), > > > > + GET_NEXT_4BYTES(parms, 2), > > > > + GET_NEXT_4BYTES(parms, 3)}; > > > > + int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4), > > > > + GET_NEXT_4BYTES(parms, 5), > > > > + GET_NEXT_4BYTES(parms, 6), > > > > + GET_NEXT_4BYTES(parms, 7)}; > > > > > > > This mixes the use of NEON intrinsics with GCC vector extensions. > > > ACLE (Arm C Language Extensions) specifically recommends not to mix > > > the two methods in section 12.2.6. IMO, Aaron's suggestion of using > > > a temp vector > > should be good. > > > > We are using this pattern across DPDK and SSE for x86 as well. > > https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n > > 91 > I am not sure about x86, I have not looked at a document similar to ACLE for > x86. IMO, it is not relevant here as this is Arm specific code. What I meant was its been already used in DPDK for arm64. https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n91 Please see offial page vector gcc gcc documentation. The examples are using this scheme. https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html This is to just create 'input' variable. I am fine to use any other scheme with out additional cost of instructions. > > > > > Since it used in fastpath, a temp variable would be additional cost > > for no reason. > Then, I would suggest we can go with using 'vdupq_n_s32'. We have to form uint64x2_t with 4 x uint32_t variable, How does 'vdupq_n_s32' help here? Can you share code snippet without any temp variable? > > > If GCC supports it then I think it is fine, I think, above usage > > matters with C++ portability. > I did not understand the C++ portability part. Can you elaborate more? > > > > > > > > > > > > /* Process the 4 bytes of input on each stream. */ > > > > > > > > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, > > > > const uint8_t **data, > > > > uint64_t index_array[4]; > > > > struct completion cmplt[4]; > > > > struct parms parms[4]; > > > > - int32x4_t input; > > > > > > > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > > > > total_packets, categories, ctx->trans_table); @@ -242,10 > > > > +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const > > > > +uint8_t > > > > **data, > > > > > > > > while (flows.started > 0) { > > > > /* Gather 4 bytes of input data for each stream. */ > > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, > > > > 0); > > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, > > > > 1); > > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, > > > > 2); > > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, > > > > 3); > > > > + int32x4_t input = {GET_NEXT_4BYTES(parms, 0), > > > > + GET_NEXT_4BYTES(parms, 1), > > > > + GET_NEXT_4BYTES(parms, 2), > > > > + GET_NEXT_4BYTES(parms, 3)}; > > > > > > > > /* Process the 4 bytes of input on each stream. */ > > > > input = transition4(input, flows.trans, index_array); > > > > -- > > > > 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler 2019-06-10 9:39 ` Jerin Jacob Kollanukkaran @ 2019-06-11 1:27 ` Honnappa Nagarahalli 2019-06-11 14:24 ` Jerin Jacob Kollanukkaran 0 siblings, 1 reply; 13+ messages in thread From: Honnappa Nagarahalli @ 2019-06-11 1:27 UTC (permalink / raw) To: jerinj, dev Cc: thomas, Gavin Hu (Arm Technology China), msantana, aconole, stable, Honnappa Nagarahalli, nd, nd > > > > -- > > > > > Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 > > > > > compiler > > > > > > > > > > From: Jerin Jacob <jerinj@marvell.com> > > > > > > > > > > Some compilers reporting the following error, though the > > > > > existing code doesn't have any uninitialized variable case. > > > > > Just to make compiler happy, initialize the int32x4_t variable > > > > > one shot in C language. > > > > > > > > > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' > > > > > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be > > > > > used uninitialized in this function [-Werror=maybe-uninitialized] > > > > > int32x4_t input; > > > > > > > > > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") > > > > > Cc: stable@dpdk.org > > > > > > > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > > > > > --- > > > > > lib/librte_acl/acl_run_neon.h | 29 > > > > > ++++++++++++----------------- > > > > > 1 file changed, 12 insertions(+), 17 deletions(-) > > > > > > > > > > diff --git a/lib/librte_acl/acl_run_neon.h > > > > > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..dc9e9efe9 > > > > > 100644 > > > > > --- a/lib/librte_acl/acl_run_neon.h > > > > > +++ b/lib/librte_acl/acl_run_neon.h > > > > > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, > > > > > const uint8_t **data, > > > > > uint64_t index_array[8]; > > > > > struct completion cmplt[8]; > > > > > struct parms parms[8]; > > > > > - int32x4_t input0, input1; > > > > > > > > > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > > > > > total_packets, categories, ctx->trans_table); @@ -181,17 > > > > > +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const > > > > > +uint8_t > > > > > **data, > > > > > > > > > > while (flows.started > 0) { > > > > > /* Gather 4 bytes of input data for each stream. */ > > > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, > 0), > > > > > input0, 0); > > > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, > 4), > > > > > input1, 0); > > > > > - > > > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, > 1), > > > > > input0, 1); > > > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, > 5), > > > > > input1, 1); > > > > > - > > > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, > 2), > > > > > input0, 2); > > > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, > 6), > > > > > input1, 2); > > > > > - > > > > > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, > 3), > > > > > input0, 3); > > > > > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, > 7), > > > > > input1, 3); > > > > > + int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0), > > > > > + GET_NEXT_4BYTES(parms, 1), > > > > > + GET_NEXT_4BYTES(parms, 2), > > > > > + GET_NEXT_4BYTES(parms, 3)}; > > > > > + int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4), > > > > > + GET_NEXT_4BYTES(parms, 5), > > > > > + GET_NEXT_4BYTES(parms, 6), > > > > > + GET_NEXT_4BYTES(parms, 7)}; > > > > > > > > > This mixes the use of NEON intrinsics with GCC vector extensions. > > > > ACLE (Arm C Language Extensions) specifically recommends not to > > > > mix the two methods in section 12.2.6. IMO, Aaron's suggestion of > > > > using a temp vector > > > should be good. > > > > > > We are using this pattern across DPDK and SSE for x86 as well. > > > https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c > > > #n > > > 91 > > I am not sure about x86, I have not looked at a document similar to > > ACLE for x86. IMO, it is not relevant here as this is Arm specific code. > > What I meant was its been already used in DPDK for arm64. > https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n91 Ok, got it. I have had discussion with compiler folks at Arm with mixing vector programming models and the recommendation has been to use NEON exclusively. I have had this discussion with Marvel compiler folks too some time back. > > Please see offial page vector gcc gcc documentation. The examples are using > this scheme. > https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html > > This is to just create 'input' variable. I am fine to use any other scheme with > out additional cost of instructions. > > > > > > > > > Since it used in fastpath, a temp variable would be additional cost > > > for no reason. > > Then, I would suggest we can go with using 'vdupq_n_s32'. > > We have to form uint64x2_t with 4 x uint32_t variable, How does > 'vdupq_n_s32' help here? We would use 'vdupq_n_s32' only for the first initialization, the rest of the code remains the same (see the diff below) > Can you share code snippet without any temp variable? diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h index 01b9766d8..b3196cd12 100644 --- a/lib/librte_acl/acl_run_neon.h +++ b/lib/librte_acl/acl_run_neon.h @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data, while (flows.started > 0) { /* Gather 4 bytes of input data for each stream. */ - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0); - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0); + input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0)); + input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4)); input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1); input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1); @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data, while (flows.started > 0) { /* Gather 4 bytes of input data for each stream. */ - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0); + input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0)); input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1); input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2); input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3); My understanding is that the generated code for both your patch and my changes above is the same. Above suggested changes will conform to ACLE recommendation. > > > > > > If GCC supports it then I think it is fine, I think, above usage > > > matters with C++ portability. > > I did not understand the C++ portability part. Can you elaborate more? > > > > > > > > > > > > > > > > > /* Process the 4 bytes of input on each stream. */ > > > > > > > > > > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, > > > > > const uint8_t **data, > > > > > uint64_t index_array[4]; > > > > > struct completion cmplt[4]; > > > > > struct parms parms[4]; > > > > > - int32x4_t input; > > > > > > > > > > acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results, > > > > > total_packets, categories, ctx->trans_table); @@ -242,10 > > > > > +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const > > > > > +uint8_t > > > > > **data, > > > > > > > > > > while (flows.started > 0) { > > > > > /* Gather 4 bytes of input data for each stream. */ > > > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), > input, > > > > > 0); > > > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), > input, > > > > > 1); > > > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), > input, > > > > > 2); > > > > > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), > input, > > > > > 3); > > > > > + int32x4_t input = {GET_NEXT_4BYTES(parms, 0), > > > > > + GET_NEXT_4BYTES(parms, 1), > > > > > + GET_NEXT_4BYTES(parms, 2), > > > > > + GET_NEXT_4BYTES(parms, 3)}; > > > > > > > > > > /* Process the 4 bytes of input on each stream. */ > > > > > input = transition4(input, flows.trans, index_array); > > > > > -- > > > > > 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler 2019-06-11 1:27 ` Honnappa Nagarahalli @ 2019-06-11 14:24 ` Jerin Jacob Kollanukkaran 0 siblings, 0 replies; 13+ messages in thread From: Jerin Jacob Kollanukkaran @ 2019-06-11 14:24 UTC (permalink / raw) To: Honnappa Nagarahalli, dev Cc: thomas, Gavin Hu (Arm Technology China), msantana, aconole, stable, nd, nd > -----Original Message----- > From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com> > Sent: Tuesday, June 11, 2019 6:58 AM > To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org > Cc: thomas@monjalon.net; Gavin Hu (Arm Technology China) > <Gavin.Hu@arm.com>; msantana@redhat.com; aconole@redhat.com; > stable@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; > nd <nd@arm.com>; nd <nd@arm.com> > Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 > compiler > > > > > > > > > > > > Since it used in fastpath, a temp variable would be additional > > > > cost for no reason. > > > Then, I would suggest we can go with using 'vdupq_n_s32'. > > > > We have to form uint64x2_t with 4 x uint32_t variable, How does > > 'vdupq_n_s32' help here? > We would use 'vdupq_n_s32' only for the first initialization, the rest of the code > remains the same (see the diff below) > > > Can you share code snippet without any temp variable? > diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h index > 01b9766d8..b3196cd12 100644 > --- a/lib/librte_acl/acl_run_neon.h > +++ b/lib/librte_acl/acl_run_neon.h > @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const > uint8_t **data, > > while (flows.started > 0) { > /* Gather 4 bytes of input data for each stream. */ > - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0); > - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0); > + input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0)); > + input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4)); > > input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1); > input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1); @@ - > 242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t > **data, > > while (flows.started > 0) { > /* Gather 4 bytes of input data for each stream. */ > - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0); > + input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0)); > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1); > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2); > input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3); > > My understanding is that the generated code for both your patch and my > changes above is the same. Above suggested changes will conform to ACLE > recommendation. Though instructions are different. Effective cycles are same even though First dup updates the four positions. To make forward progress send the v2 based on the updated logic just to make ACLE Spec happy, I don’t see any real reason to do it though 😊 http://patches.dpdk.org/patch/54656/ ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler 2019-06-06 14:50 [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler jerinj 2019-06-06 15:55 ` Michael Santana Francisco 2019-06-07 5:35 ` Honnappa Nagarahalli @ 2019-06-10 12:10 ` Aaron Conole 2019-06-11 14:15 ` [dpdk-stable] [dpdk-dev] [PATCH v2] " jerinj 3 siblings, 0 replies; 13+ messages in thread From: Aaron Conole @ 2019-06-10 12:10 UTC (permalink / raw) To: jerinj; +Cc: dev, thomas, gavin.hu, honnappa.nagarahalli, msantana, stable <jerinj@marvell.com> writes: > From: Jerin Jacob <jerinj@marvell.com> > > Some compilers reporting the following error, though the existing > code doesn't have any uninitialized variable case. > Just to make compiler happy, initialize the int32x4_t variable > one shot in C language. > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be > used uninitialized in this function [-Werror=maybe-uninitialized] > int32x4_t input; > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") > Cc: stable@dpdk.org > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > --- This pattern is easy to understand, congruent with other usages in the code base, has good patch statistics, and solves the issue. Acked-by: Aaron Conole <aconole@redhat.com> I prefer this solution to the others posted. Thanks for looking into it, Jerin! ^ permalink raw reply [flat|nested] 13+ messages in thread
* [dpdk-stable] [dpdk-dev] [PATCH v2] acl: fix build issue with some arm64 compiler 2019-06-06 14:50 [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler jerinj ` (2 preceding siblings ...) 2019-06-10 12:10 ` Aaron Conole @ 2019-06-11 14:15 ` jerinj 2019-06-11 14:53 ` Aaron Conole 3 siblings, 1 reply; 13+ messages in thread From: jerinj @ 2019-06-11 14:15 UTC (permalink / raw) To: Jerin Jacob, Gavin Hu, Konstantin Ananyev Cc: dev, thomas, msantana, aconole, stable, Honnappa Nagarahalli From: Jerin Jacob <jerinj@marvell.com> Some compilers reporting the following error, though the existing code doesn't have any uninitialized variable case. Just to make compiler happy, initialize the int32x4_t variable one shot using vdupq_n_s32. ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be used uninitialized in this function [-Werror=maybe-uninitialized] int32x4_t input; Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") Cc: stable@dpdk.org Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> --- v2: - Changed C based initializion to vdupq_n_s32 for better comparability with ACLE(Honnappa) --- lib/librte_acl/acl_run_neon.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h index 01b9766d8..b3196cd12 100644 --- a/lib/librte_acl/acl_run_neon.h +++ b/lib/librte_acl/acl_run_neon.h @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data, while (flows.started > 0) { /* Gather 4 bytes of input data for each stream. */ - input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0); - input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0); + input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0)); + input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4)); input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1); input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1); @@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data, while (flows.started > 0) { /* Gather 4 bytes of input data for each stream. */ - input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0); + input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0)); input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1); input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2); input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3); -- 2.21.0 ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v2] acl: fix build issue with some arm64 compiler 2019-06-11 14:15 ` [dpdk-stable] [dpdk-dev] [PATCH v2] " jerinj @ 2019-06-11 14:53 ` Aaron Conole 2019-06-11 15:07 ` Thomas Monjalon 0 siblings, 1 reply; 13+ messages in thread From: Aaron Conole @ 2019-06-11 14:53 UTC (permalink / raw) To: jerinj Cc: Gavin Hu, Konstantin Ananyev, dev, thomas, msantana, stable, Honnappa Nagarahalli <jerinj@marvell.com> writes: > From: Jerin Jacob <jerinj@marvell.com> > > Some compilers reporting the following error, though the existing > code doesn't have any uninitialized variable case. > Just to make compiler happy, initialize the int32x4_t variable > one shot using vdupq_n_s32. > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be > used uninitialized in this function [-Werror=maybe-uninitialized] > int32x4_t input; > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") > Cc: stable@dpdk.org > > Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > --- LGTM Acked-by: Aaron Conole <aconole@redhat.com> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [dpdk-stable] [dpdk-dev] [PATCH v2] acl: fix build issue with some arm64 compiler 2019-06-11 14:53 ` Aaron Conole @ 2019-06-11 15:07 ` Thomas Monjalon 0 siblings, 0 replies; 13+ messages in thread From: Thomas Monjalon @ 2019-06-11 15:07 UTC (permalink / raw) To: jerinj Cc: dev, Aaron Conole, Gavin Hu, Konstantin Ananyev, msantana, stable, Honnappa Nagarahalli 11/06/2019 23:53, Aaron Conole: > <jerinj@marvell.com> writes: > > > From: Jerin Jacob <jerinj@marvell.com> > > > > Some compilers reporting the following error, though the existing > > code doesn't have any uninitialized variable case. > > Just to make compiler happy, initialize the int32x4_t variable > > one shot using vdupq_n_s32. > > > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4' > > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be > > used uninitialized in this function [-Werror=maybe-uninitialized] > > int32x4_t input; > > > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8") > > Cc: stable@dpdk.org > > > > Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com> > > Signed-off-by: Jerin Jacob <jerinj@marvell.com> > > --- > > LGTM > > Acked-by: Aaron Conole <aconole@redhat.com> Applied, thanks ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2019-06-11 15:08 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-06-06 14:50 [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler jerinj 2019-06-06 15:55 ` Michael Santana Francisco 2019-06-07 5:42 ` Honnappa Nagarahalli 2019-06-07 5:35 ` Honnappa Nagarahalli 2019-06-07 6:21 ` Jerin Jacob Kollanukkaran 2019-06-10 5:29 ` Honnappa Nagarahalli 2019-06-10 9:39 ` Jerin Jacob Kollanukkaran 2019-06-11 1:27 ` Honnappa Nagarahalli 2019-06-11 14:24 ` Jerin Jacob Kollanukkaran 2019-06-10 12:10 ` Aaron Conole 2019-06-11 14:15 ` [dpdk-stable] [dpdk-dev] [PATCH v2] " jerinj 2019-06-11 14:53 ` Aaron Conole 2019-06-11 15:07 ` Thomas Monjalon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).