patches for DPDK stable branches
 help / color / mirror / Atom feed
* [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
@ 2019-06-06 14:50 jerinj
  2019-06-06 15:55 ` Michael Santana Francisco
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: jerinj @ 2019-06-06 14:50 UTC (permalink / raw)
  To: dev
  Cc: thomas, gavin.hu, honnappa.nagarahalli, msantana, aconole,
	Jerin Jacob, stable

From: Jerin Jacob <jerinj@marvell.com>

Some compilers reporting the following error, though the existing
code doesn't have any uninitialized variable case.
Just to make compiler happy, initialize the int32x4_t variable
one shot in C language.

../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be
used uninitialized in this function [-Werror=maybe-uninitialized]
  int32x4_t input;

Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
Cc: stable@dpdk.org

Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---
 lib/librte_acl/acl_run_neon.h | 29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index 01b9766d8..dc9e9efe9 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	uint64_t index_array[8];
 	struct completion cmplt[8];
 	struct parms parms[8];
-	int32x4_t input0, input1;
 
 	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
 		     total_packets, categories, ctx->trans_table);
@@ -181,17 +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
-
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
-
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input0, 2);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), input1, 2);
-
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input0, 3);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), input1, 3);
+		int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0),
+				    GET_NEXT_4BYTES(parms, 1),
+				    GET_NEXT_4BYTES(parms, 2),
+				    GET_NEXT_4BYTES(parms, 3)};
+		int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4),
+				    GET_NEXT_4BYTES(parms, 5),
+				    GET_NEXT_4BYTES(parms, 6),
+				    GET_NEXT_4BYTES(parms, 7)};
 
 		/* Process the 4 bytes of input on each stream. */
 
@@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	uint64_t index_array[4];
 	struct completion cmplt[4];
 	struct parms parms[4];
-	int32x4_t input;
 
 	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
 		     total_packets, categories, ctx->trans_table);
@@ -242,10 +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
+		int32x4_t input = {GET_NEXT_4BYTES(parms, 0),
+				   GET_NEXT_4BYTES(parms, 1),
+				   GET_NEXT_4BYTES(parms, 2),
+				   GET_NEXT_4BYTES(parms, 3)};
 
 		/* Process the 4 bytes of input on each stream. */
 		input = transition4(input, flows.trans, index_array);
-- 
2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
  2019-06-06 14:50 [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler jerinj
@ 2019-06-06 15:55 ` Michael Santana Francisco
  2019-06-07  5:42   ` Honnappa Nagarahalli
  2019-06-07  5:35 ` Honnappa Nagarahalli
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Michael Santana Francisco @ 2019-06-06 15:55 UTC (permalink / raw)
  To: jerinj, dev; +Cc: thomas, gavin.hu, honnappa.nagarahalli, aconole, stable

On 6/6/19 10:50 AM, jerinj@marvell.com wrote:
> From: Jerin Jacob <jerinj@marvell.com>
>
> Some compilers reporting the following error, though the existing
> code doesn't have any uninitialized variable case.
> Just to make compiler happy, initialize the int32x4_t variable
> one shot in C language.
>
> ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
> ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be
> used uninitialized in this function [-Werror=maybe-uninitialized]
>    int32x4_t input;
>
> Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
> Cc: stable@dpdk.org
>
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> ---
>   lib/librte_acl/acl_run_neon.h | 29 ++++++++++++-----------------
>   1 file changed, 12 insertions(+), 17 deletions(-)
>
> diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
> index 01b9766d8..dc9e9efe9 100644
> --- a/lib/librte_acl/acl_run_neon.h
> +++ b/lib/librte_acl/acl_run_neon.h
> @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>   	uint64_t index_array[8];
>   	struct completion cmplt[8];
>   	struct parms parms[8];
> -	int32x4_t input0, input1;
>   
>   	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
>   		     total_packets, categories, ctx->trans_table);
> @@ -181,17 +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
>   
>   	while (flows.started > 0) {
>   		/* Gather 4 bytes of input data for each stream. */
> -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
> -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
> -
> -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
> -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
> -
> -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input0, 2);
> -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), input1, 2);
> -
> -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input0, 3);
> -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), input1, 3);
> +		int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0),
> +				    GET_NEXT_4BYTES(parms, 1),
> +				    GET_NEXT_4BYTES(parms, 2),
> +				    GET_NEXT_4BYTES(parms, 3)};
> +		int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4),
> +				    GET_NEXT_4BYTES(parms, 5),
> +				    GET_NEXT_4BYTES(parms, 6),
> +				    GET_NEXT_4BYTES(parms, 7)};
>   
>   		/* Process the 4 bytes of input on each stream. */
>   
> @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
>   	uint64_t index_array[4];
>   	struct completion cmplt[4];
>   	struct parms parms[4];
> -	int32x4_t input;
>   
>   	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
>   		     total_packets, categories, ctx->trans_table);
> @@ -242,10 +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
>   
>   	while (flows.started > 0) {
>   		/* Gather 4 bytes of input data for each stream. */
> -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
> -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
> -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> +		int32x4_t input = {GET_NEXT_4BYTES(parms, 0),
> +				   GET_NEXT_4BYTES(parms, 1),
> +				   GET_NEXT_4BYTES(parms, 2),
> +				   GET_NEXT_4BYTES(parms, 3)};
>   
>   		/* Process the 4 bytes of input on each stream. */
>   		input = transition4(input, flows.trans, index_array);

Fixed on travis: https://travis-ci.com/Maickii/dpdk-2/builds/114612090

Acked-by: Michael Santana <msantana@redhat.com>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
  2019-06-06 14:50 [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler jerinj
  2019-06-06 15:55 ` Michael Santana Francisco
@ 2019-06-07  5:35 ` Honnappa Nagarahalli
  2019-06-07  6:21   ` Jerin Jacob Kollanukkaran
  2019-06-10 12:10 ` Aaron Conole
  2019-06-11 14:15 ` [dpdk-stable] [dpdk-dev] [PATCH v2] " jerinj
  3 siblings, 1 reply; 13+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-07  5:35 UTC (permalink / raw)
  To: jerinj, dev
  Cc: thomas, Gavin Hu (Arm Technology China),
	msantana, aconole, jerinj, Honnappa Nagarahalli, stable, nd

> Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
> 
> From: Jerin Jacob <jerinj@marvell.com>
> 
> Some compilers reporting the following error, though the existing code
> doesn't have any uninitialized variable case.
> Just to make compiler happy, initialize the int32x4_t variable one shot in C
> language.
> 
> ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
> ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be used
> uninitialized in this function [-Werror=maybe-uninitialized]
>   int32x4_t input;
> 
> Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> ---
>  lib/librte_acl/acl_run_neon.h | 29 ++++++++++++-----------------
>  1 file changed, 12 insertions(+), 17 deletions(-)
> 
> diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
> index 01b9766d8..dc9e9efe9 100644
> --- a/lib/librte_acl/acl_run_neon.h
> +++ b/lib/librte_acl/acl_run_neon.h
> @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> uint8_t **data,
>  	uint64_t index_array[8];
>  	struct completion cmplt[8];
>  	struct parms parms[8];
> -	int32x4_t input0, input1;
> 
>  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
>  		     total_packets, categories, ctx->trans_table); @@ -181,17
> +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t
> **data,
> 
>  	while (flows.started > 0) {
>  		/* Gather 4 bytes of input data for each stream. */
> -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> input0, 0);
> -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> input1, 0);
> -
> -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> input0, 1);
> -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
> input1, 1);
> -
> -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> input0, 2);
> -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6),
> input1, 2);
> -
> -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3),
> input0, 3);
> -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7),
> input1, 3);
> +		int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0),
> +				    GET_NEXT_4BYTES(parms, 1),
> +				    GET_NEXT_4BYTES(parms, 2),
> +				    GET_NEXT_4BYTES(parms, 3)};
> +		int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4),
> +				    GET_NEXT_4BYTES(parms, 5),
> +				    GET_NEXT_4BYTES(parms, 6),
> +				    GET_NEXT_4BYTES(parms, 7)};
> 
This mixes the use of NEON intrinsics with GCC vector extensions. ACLE (Arm C Language Extensions) specifically recommends not to mix the two methods in section 12.2.6. IMO, Aaron's suggestion of using a temp vector should be good.

>  		/* Process the 4 bytes of input on each stream. */
> 
> @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> uint8_t **data,
>  	uint64_t index_array[4];
>  	struct completion cmplt[4];
>  	struct parms parms[4];
> -	int32x4_t input;
> 
>  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
>  		     total_packets, categories, ctx->trans_table); @@ -242,10
> +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t
> **data,
> 
>  	while (flows.started > 0) {
>  		/* Gather 4 bytes of input data for each stream. */
> -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input,
> 0);
> -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input,
> 1);
> -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input,
> 2);
> -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input,
> 3);
> +		int32x4_t input = {GET_NEXT_4BYTES(parms, 0),
> +				   GET_NEXT_4BYTES(parms, 1),
> +				   GET_NEXT_4BYTES(parms, 2),
> +				   GET_NEXT_4BYTES(parms, 3)};
> 
>  		/* Process the 4 bytes of input on each stream. */
>  		input = transition4(input, flows.trans, index_array);
> --
> 2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
  2019-06-06 15:55 ` Michael Santana Francisco
@ 2019-06-07  5:42   ` Honnappa Nagarahalli
  0 siblings, 0 replies; 13+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-07  5:42 UTC (permalink / raw)
  To: msantana, jerinj, dev
  Cc: thomas, Gavin Hu (Arm Technology China),
	aconole, stable, Honnappa Nagarahalli, nd, nd

On 6/6/19 10:50 AM, mailto:jerinj@marvell.com wrote:
From: Jerin Jacob mailto:jerinj@marvell.com

Some compilers reporting the following error, though the existing
code doesn't have any uninitialized variable case.
Just to make compiler happy, initialize the int32x4_t variable
one shot in C language.

../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be
used uninitialized in this function [-Werror=maybe-uninitialized]
  int32x4_t input;

Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
Cc: mailto:stable@dpdk.org

Signed-off-by: Jerin Jacob mailto:jerinj@marvell.com
---
 lib/librte_acl/acl_run_neon.h | 29 ++++++++++++-----------------
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index 01b9766d8..dc9e9efe9 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	uint64_t index_array[8];
 	struct completion cmplt[8];
 	struct parms parms[8];
-	int32x4_t input0, input1;
 
 	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
 		     total_packets, categories, ctx->trans_table);
@@ -181,17 +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
-
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
-
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input0, 2);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6), input1, 2);
-
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input0, 3);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7), input1, 3);
+		int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0),
+				    GET_NEXT_4BYTES(parms, 1),
+				    GET_NEXT_4BYTES(parms, 2),
+				    GET_NEXT_4BYTES(parms, 3)};
+		int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4),
+				    GET_NEXT_4BYTES(parms, 5),
+				    GET_NEXT_4BYTES(parms, 6),
+				    GET_NEXT_4BYTES(parms, 7)};
 
 		/* Process the 4 bytes of input on each stream. */
 
@@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	uint64_t index_array[4];
 	struct completion cmplt[4];
 	struct parms parms[4];
-	int32x4_t input;
 
 	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
 		     total_packets, categories, ctx->trans_table);
@@ -242,10 +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
+		int32x4_t input = {GET_NEXT_4BYTES(parms, 0),
+				   GET_NEXT_4BYTES(parms, 1),
+				   GET_NEXT_4BYTES(parms, 2),
+				   GET_NEXT_4BYTES(parms, 3)};
 
 		/* Process the 4 bytes of input on each stream. */
 		input = transition4(input, flows.trans, index_array);
Fixed on travis: https://travis-ci.com/Maickii/dpdk-2/builds/114612090
Acked-by: Michael Santana mailto:msantana@redhat.com

[Honnappa] Prefer to go with Aaron's patch with a temp variable for setting the first lane. Mixing of NEON intrinsics and GCC vector extensions is not recommended as per Arm C Language Extensions guide 12.2.6


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
  2019-06-07  5:35 ` Honnappa Nagarahalli
@ 2019-06-07  6:21   ` Jerin Jacob Kollanukkaran
  2019-06-10  5:29     ` Honnappa Nagarahalli
  0 siblings, 1 reply; 13+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-07  6:21 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev
  Cc: thomas, Gavin Hu (Arm Technology China), msantana, aconole, stable, nd

	
> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Friday, June 7, 2019 11:05 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; msantana@redhat.com; aconole@redhat.com; Jerin
> Jacob Kollanukkaran <jerinj@marvell.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; stable@dpdk.org; nd <nd@arm.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> compiler
> 
> ----------------------------------------------------------------------
> > Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> > compiler
> >
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > Some compilers reporting the following error, though the existing code
> > doesn't have any uninitialized variable case.
> > Just to make compiler happy, initialize the int32x4_t variable one
> > shot in C language.
> >
> > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
> > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be used
> > uninitialized in this function [-Werror=maybe-uninitialized]
> >   int32x4_t input;
> >
> > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > ---
> >  lib/librte_acl/acl_run_neon.h | 29 ++++++++++++-----------------
> >  1 file changed, 12 insertions(+), 17 deletions(-)
> >
> > diff --git a/lib/librte_acl/acl_run_neon.h
> > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..dc9e9efe9 100644
> > --- a/lib/librte_acl/acl_run_neon.h
> > +++ b/lib/librte_acl/acl_run_neon.h
> > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> > uint8_t **data,
> >  	uint64_t index_array[8];
> >  	struct completion cmplt[8];
> >  	struct parms parms[8];
> > -	int32x4_t input0, input1;
> >
> >  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> >  		     total_packets, categories, ctx->trans_table); @@ -181,17
> > +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t
> > **data,
> >
> >  	while (flows.started > 0) {
> >  		/* Gather 4 bytes of input data for each stream. */
> > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> > input0, 0);
> > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> > input1, 0);
> > -
> > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> > input0, 1);
> > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
> > input1, 1);
> > -
> > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> > input0, 2);
> > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6),
> > input1, 2);
> > -
> > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3),
> > input0, 3);
> > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7),
> > input1, 3);
> > +		int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0),
> > +				    GET_NEXT_4BYTES(parms, 1),
> > +				    GET_NEXT_4BYTES(parms, 2),
> > +				    GET_NEXT_4BYTES(parms, 3)};
> > +		int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4),
> > +				    GET_NEXT_4BYTES(parms, 5),
> > +				    GET_NEXT_4BYTES(parms, 6),
> > +				    GET_NEXT_4BYTES(parms, 7)};
> >
> This mixes the use of NEON intrinsics with GCC vector extensions. ACLE (Arm C
> Language Extensions) specifically recommends not to mix the two methods in
> section 12.2.6. IMO, Aaron's suggestion of using a temp vector should be good.

We are using this pattern across DPDK and SSE for x86 as well.
https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n91

Since it used in fastpath, a temp variable would be additional cost for no reason.
If GCC supports it then I think it is fine, I think, above usage matters with C++ portability.


> 
> >  		/* Process the 4 bytes of input on each stream. */
> >
> > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> > uint8_t **data,
> >  	uint64_t index_array[4];
> >  	struct completion cmplt[4];
> >  	struct parms parms[4];
> > -	int32x4_t input;
> >
> >  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> >  		     total_packets, categories, ctx->trans_table); @@ -242,10
> > +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t
> > **data,
> >
> >  	while (flows.started > 0) {
> >  		/* Gather 4 bytes of input data for each stream. */
> > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input,
> > 0);
> > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input,
> > 1);
> > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input,
> > 2);
> > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input,
> > 3);
> > +		int32x4_t input = {GET_NEXT_4BYTES(parms, 0),
> > +				   GET_NEXT_4BYTES(parms, 1),
> > +				   GET_NEXT_4BYTES(parms, 2),
> > +				   GET_NEXT_4BYTES(parms, 3)};
> >
> >  		/* Process the 4 bytes of input on each stream. */
> >  		input = transition4(input, flows.trans, index_array);
> > --
> > 2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
  2019-06-07  6:21   ` Jerin Jacob Kollanukkaran
@ 2019-06-10  5:29     ` Honnappa Nagarahalli
  2019-06-10  9:39       ` Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 13+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-10  5:29 UTC (permalink / raw)
  To: jerinj, dev
  Cc: thomas, Gavin Hu (Arm Technology China),
	msantana, aconole, stable, Honnappa Nagarahalli, nd, nd

> >
> > ----------------------------------------------------------------------
> > > Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> > > compiler
> > >
> > > From: Jerin Jacob <jerinj@marvell.com>
> > >
> > > Some compilers reporting the following error, though the existing
> > > code doesn't have any uninitialized variable case.
> > > Just to make compiler happy, initialize the int32x4_t variable one
> > > shot in C language.
> > >
> > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
> > > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be used
> > > uninitialized in this function [-Werror=maybe-uninitialized]
> > >   int32x4_t input;
> > >
> > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
> > > Cc: stable@dpdk.org
> > >
> > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > > ---
> > >  lib/librte_acl/acl_run_neon.h | 29 ++++++++++++-----------------
> > >  1 file changed, 12 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/lib/librte_acl/acl_run_neon.h
> > > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..dc9e9efe9 100644
> > > --- a/lib/librte_acl/acl_run_neon.h
> > > +++ b/lib/librte_acl/acl_run_neon.h
> > > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> > > const uint8_t **data,
> > >  	uint64_t index_array[8];
> > >  	struct completion cmplt[8];
> > >  	struct parms parms[8];
> > > -	int32x4_t input0, input1;
> > >
> > >  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> > >  		     total_packets, categories, ctx->trans_table); @@ -181,17
> > > +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> > > +uint8_t
> > > **data,
> > >
> > >  	while (flows.started > 0) {
> > >  		/* Gather 4 bytes of input data for each stream. */
> > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> > > input0, 0);
> > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> > > input1, 0);
> > > -
> > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> > > input0, 1);
> > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
> > > input1, 1);
> > > -
> > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> > > input0, 2);
> > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6),
> > > input1, 2);
> > > -
> > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3),
> > > input0, 3);
> > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7),
> > > input1, 3);
> > > +		int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0),
> > > +				    GET_NEXT_4BYTES(parms, 1),
> > > +				    GET_NEXT_4BYTES(parms, 2),
> > > +				    GET_NEXT_4BYTES(parms, 3)};
> > > +		int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4),
> > > +				    GET_NEXT_4BYTES(parms, 5),
> > > +				    GET_NEXT_4BYTES(parms, 6),
> > > +				    GET_NEXT_4BYTES(parms, 7)};
> > >
> > This mixes the use of NEON intrinsics with GCC vector extensions. ACLE
> > (Arm C Language Extensions) specifically recommends not to mix the two
> > methods in section 12.2.6. IMO, Aaron's suggestion of using a temp vector
> should be good.
> 
> We are using this pattern across DPDK and SSE for x86 as well.
> https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n91
I am not sure about x86, I have not looked at a document similar to ACLE for x86. IMO, it is not relevant here as this is Arm specific code.

> 
> Since it used in fastpath, a temp variable would be additional cost for no
> reason.
Then, I would suggest we can go with using 'vdupq_n_s32'.

> If GCC supports it then I think it is fine, I think, above usage matters with C++
> portability.
I did not understand the C++ portability part. Can you elaborate more?

> 
> 
> >
> > >  		/* Process the 4 bytes of input on each stream. */
> > >
> > > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx,
> > > const uint8_t **data,
> > >  	uint64_t index_array[4];
> > >  	struct completion cmplt[4];
> > >  	struct parms parms[4];
> > > -	int32x4_t input;
> > >
> > >  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> > >  		     total_packets, categories, ctx->trans_table); @@ -242,10
> > > +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> > > +uint8_t
> > > **data,
> > >
> > >  	while (flows.started > 0) {
> > >  		/* Gather 4 bytes of input data for each stream. */
> > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input,
> > > 0);
> > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input,
> > > 1);
> > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input,
> > > 2);
> > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input,
> > > 3);
> > > +		int32x4_t input = {GET_NEXT_4BYTES(parms, 0),
> > > +				   GET_NEXT_4BYTES(parms, 1),
> > > +				   GET_NEXT_4BYTES(parms, 2),
> > > +				   GET_NEXT_4BYTES(parms, 3)};
> > >
> > >  		/* Process the 4 bytes of input on each stream. */
> > >  		input = transition4(input, flows.trans, index_array);
> > > --
> > > 2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
  2019-06-10  5:29     ` Honnappa Nagarahalli
@ 2019-06-10  9:39       ` Jerin Jacob Kollanukkaran
  2019-06-11  1:27         ` Honnappa Nagarahalli
  0 siblings, 1 reply; 13+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-10  9:39 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev
  Cc: thomas, Gavin Hu (Arm Technology China),
	msantana, aconole, stable, nd, nd

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Monday, June 10, 2019 11:00 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; msantana@redhat.com; aconole@redhat.com;
> stable@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> nd <nd@arm.com>; nd <nd@arm.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> compiler
> 
> > > --
> > > > Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> > > > compiler
> > > >
> > > > From: Jerin Jacob <jerinj@marvell.com>
> > > >
> > > > Some compilers reporting the following error, though the existing
> > > > code doesn't have any uninitialized variable case.
> > > > Just to make compiler happy, initialize the int32x4_t variable one
> > > > shot in C language.
> > > >
> > > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
> > > > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be
> > > > used uninitialized in this function [-Werror=maybe-uninitialized]
> > > >   int32x4_t input;
> > > >
> > > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
> > > > Cc: stable@dpdk.org
> > > >
> > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > > > ---
> > > >  lib/librte_acl/acl_run_neon.h | 29 ++++++++++++-----------------
> > > >  1 file changed, 12 insertions(+), 17 deletions(-)
> > > >
> > > > diff --git a/lib/librte_acl/acl_run_neon.h
> > > > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..dc9e9efe9 100644
> > > > --- a/lib/librte_acl/acl_run_neon.h
> > > > +++ b/lib/librte_acl/acl_run_neon.h
> > > > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> > > > const uint8_t **data,
> > > >  	uint64_t index_array[8];
> > > >  	struct completion cmplt[8];
> > > >  	struct parms parms[8];
> > > > -	int32x4_t input0, input1;
> > > >
> > > >  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> > > >  		     total_packets, categories, ctx->trans_table); @@ -181,17
> > > > +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> > > > +uint8_t
> > > > **data,
> > > >
> > > >  	while (flows.started > 0) {
> > > >  		/* Gather 4 bytes of input data for each stream. */
> > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> > > > input0, 0);
> > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> > > > input1, 0);
> > > > -
> > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> > > > input0, 1);
> > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5),
> > > > input1, 1);
> > > > -
> > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> > > > input0, 2);
> > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 6),
> > > > input1, 2);
> > > > -
> > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3),
> > > > input0, 3);
> > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 7),
> > > > input1, 3);
> > > > +		int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0),
> > > > +				    GET_NEXT_4BYTES(parms, 1),
> > > > +				    GET_NEXT_4BYTES(parms, 2),
> > > > +				    GET_NEXT_4BYTES(parms, 3)};
> > > > +		int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4),
> > > > +				    GET_NEXT_4BYTES(parms, 5),
> > > > +				    GET_NEXT_4BYTES(parms, 6),
> > > > +				    GET_NEXT_4BYTES(parms, 7)};
> > > >
> > > This mixes the use of NEON intrinsics with GCC vector extensions.
> > > ACLE (Arm C Language Extensions) specifically recommends not to mix
> > > the two methods in section 12.2.6. IMO, Aaron's suggestion of using
> > > a temp vector
> > should be good.
> >
> > We are using this pattern across DPDK and SSE for x86 as well.
> > https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n
> > 91
> I am not sure about x86, I have not looked at a document similar to ACLE for
> x86. IMO, it is not relevant here as this is Arm specific code.

What I meant was its been already used in DPDK for arm64.
https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n91

Please see offial page vector gcc gcc documentation. The examples are using this scheme.
https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html

This is to just create 'input' variable. I am fine to use any other scheme with out additional cost
of instructions.

> 
> >
> > Since it used in fastpath, a temp variable would be additional cost
> > for no reason.
> Then, I would suggest we can go with using 'vdupq_n_s32'.

We have to form uint64x2_t with 4 x uint32_t variable, How does 'vdupq_n_s32' help here?
Can you share code snippet without any temp variable?

> 
> > If GCC supports it then I think it is fine, I think, above usage
> > matters with C++ portability.
> I did not understand the C++ portability part. Can you elaborate more?
> 
> >
> >
> > >
> > > >  		/* Process the 4 bytes of input on each stream. */
> > > >
> > > > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx,
> > > > const uint8_t **data,
> > > >  	uint64_t index_array[4];
> > > >  	struct completion cmplt[4];
> > > >  	struct parms parms[4];
> > > > -	int32x4_t input;
> > > >
> > > >  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> > > >  		     total_packets, categories, ctx->trans_table); @@ -242,10
> > > > +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> > > > +uint8_t
> > > > **data,
> > > >
> > > >  	while (flows.started > 0) {
> > > >  		/* Gather 4 bytes of input data for each stream. */
> > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input,
> > > > 0);
> > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input,
> > > > 1);
> > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input,
> > > > 2);
> > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input,
> > > > 3);
> > > > +		int32x4_t input = {GET_NEXT_4BYTES(parms, 0),
> > > > +				   GET_NEXT_4BYTES(parms, 1),
> > > > +				   GET_NEXT_4BYTES(parms, 2),
> > > > +				   GET_NEXT_4BYTES(parms, 3)};
> > > >
> > > >  		/* Process the 4 bytes of input on each stream. */
> > > >  		input = transition4(input, flows.trans, index_array);
> > > > --
> > > > 2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
  2019-06-06 14:50 [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler jerinj
  2019-06-06 15:55 ` Michael Santana Francisco
  2019-06-07  5:35 ` Honnappa Nagarahalli
@ 2019-06-10 12:10 ` Aaron Conole
  2019-06-11 14:15 ` [dpdk-stable] [dpdk-dev] [PATCH v2] " jerinj
  3 siblings, 0 replies; 13+ messages in thread
From: Aaron Conole @ 2019-06-10 12:10 UTC (permalink / raw)
  To: jerinj; +Cc: dev, thomas, gavin.hu, honnappa.nagarahalli, msantana, stable

<jerinj@marvell.com> writes:

> From: Jerin Jacob <jerinj@marvell.com>
>
> Some compilers reporting the following error, though the existing
> code doesn't have any uninitialized variable case.
> Just to make compiler happy, initialize the int32x4_t variable
> one shot in C language.
>
> ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
> ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be
> used uninitialized in this function [-Werror=maybe-uninitialized]
>   int32x4_t input;
>
> Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
> Cc: stable@dpdk.org
>
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> ---

This pattern is easy to understand, congruent with other usages in the
code base, has good patch statistics, and solves the issue.

Acked-by: Aaron Conole <aconole@redhat.com>

I prefer this solution to the others posted.  Thanks for looking into
it, Jerin!

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
  2019-06-10  9:39       ` Jerin Jacob Kollanukkaran
@ 2019-06-11  1:27         ` Honnappa Nagarahalli
  2019-06-11 14:24           ` Jerin Jacob Kollanukkaran
  0 siblings, 1 reply; 13+ messages in thread
From: Honnappa Nagarahalli @ 2019-06-11  1:27 UTC (permalink / raw)
  To: jerinj, dev
  Cc: thomas, Gavin Hu (Arm Technology China),
	msantana, aconole, stable, Honnappa Nagarahalli, nd, nd

> > > > --
> > > > > Subject: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> > > > > compiler
> > > > >
> > > > > From: Jerin Jacob <jerinj@marvell.com>
> > > > >
> > > > > Some compilers reporting the following error, though the
> > > > > existing code doesn't have any uninitialized variable case.
> > > > > Just to make compiler happy, initialize the int32x4_t variable
> > > > > one shot in C language.
> > > > >
> > > > > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
> > > > > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be
> > > > > used uninitialized in this function [-Werror=maybe-uninitialized]
> > > > >   int32x4_t input;
> > > > >
> > > > > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
> > > > > Cc: stable@dpdk.org
> > > > >
> > > > > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > > > > ---
> > > > >  lib/librte_acl/acl_run_neon.h | 29
> > > > > ++++++++++++-----------------
> > > > >  1 file changed, 12 insertions(+), 17 deletions(-)
> > > > >
> > > > > diff --git a/lib/librte_acl/acl_run_neon.h
> > > > > b/lib/librte_acl/acl_run_neon.h index 01b9766d8..dc9e9efe9
> > > > > 100644
> > > > > --- a/lib/librte_acl/acl_run_neon.h
> > > > > +++ b/lib/librte_acl/acl_run_neon.h
> > > > > @@ -165,7 +165,6 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> > > > > const uint8_t **data,
> > > > >  	uint64_t index_array[8];
> > > > >  	struct completion cmplt[8];
> > > > >  	struct parms parms[8];
> > > > > -	int32x4_t input0, input1;
> > > > >
> > > > >  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> > > > >  		     total_packets, categories, ctx->trans_table); @@ -181,17
> > > > > +180,14 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> > > > > +uint8_t
> > > > > **data,
> > > > >
> > > > >  	while (flows.started > 0) {
> > > > >  		/* Gather 4 bytes of input data for each stream. */
> > > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> 0),
> > > > > input0, 0);
> > > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> 4),
> > > > > input1, 0);
> > > > > -
> > > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> 1),
> > > > > input0, 1);
> > > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> 5),
> > > > > input1, 1);
> > > > > -
> > > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> 2),
> > > > > input0, 2);
> > > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> 6),
> > > > > input1, 2);
> > > > > -
> > > > > -		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> 3),
> > > > > input0, 3);
> > > > > -		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms,
> 7),
> > > > > input1, 3);
> > > > > +		int32x4_t input0 = {GET_NEXT_4BYTES(parms, 0),
> > > > > +				    GET_NEXT_4BYTES(parms, 1),
> > > > > +				    GET_NEXT_4BYTES(parms, 2),
> > > > > +				    GET_NEXT_4BYTES(parms, 3)};
> > > > > +		int32x4_t input1 = {GET_NEXT_4BYTES(parms, 4),
> > > > > +				    GET_NEXT_4BYTES(parms, 5),
> > > > > +				    GET_NEXT_4BYTES(parms, 6),
> > > > > +				    GET_NEXT_4BYTES(parms, 7)};
> > > > >
> > > > This mixes the use of NEON intrinsics with GCC vector extensions.
> > > > ACLE (Arm C Language Extensions) specifically recommends not to
> > > > mix the two methods in section 12.2.6. IMO, Aaron's suggestion of
> > > > using a temp vector
> > > should be good.
> > >
> > > We are using this pattern across DPDK and SSE for x86 as well.
> > > https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c
> > > #n
> > > 91
> > I am not sure about x86, I have not looked at a document similar to
> > ACLE for x86. IMO, it is not relevant here as this is Arm specific code.
> 
> What I meant was its been already used in DPDK for arm64.
> https://git.dpdk.org/dpdk/tree/drivers/net/i40e/i40e_rxtx_vec_neon.c#n91
Ok, got it. I have had discussion with compiler folks at Arm with mixing vector programming models and the recommendation has been to use NEON exclusively. I have had this discussion with Marvel compiler folks too some time back.

> 
> Please see offial page vector gcc gcc documentation. The examples are using
> this scheme.
> https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
> 
> This is to just create 'input' variable. I am fine to use any other scheme with
> out additional cost of instructions.
> 
> >
> > >
> > > Since it used in fastpath, a temp variable would be additional cost
> > > for no reason.
> > Then, I would suggest we can go with using 'vdupq_n_s32'.
> 
> We have to form uint64x2_t with 4 x uint32_t variable, How does
> 'vdupq_n_s32' help here?
We would use 'vdupq_n_s32' only for the first initialization, the rest of the code remains the same (see the diff below)

> Can you share code snippet without any temp variable?
diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index 01b9766d8..b3196cd12 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,

        while (flows.started > 0) {
                /* Gather 4 bytes of input data for each stream. */
-               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
-               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
+               input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
+               input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));

                input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
                input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,

        while (flows.started > 0) {
                /* Gather 4 bytes of input data for each stream. */
-               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
+               input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);

My understanding is that the generated code for both your patch and my changes above is the same. Above suggested changes will conform to ACLE recommendation.

> 
> >
> > > If GCC supports it then I think it is fine, I think, above usage
> > > matters with C++ portability.
> > I did not understand the C++ portability part. Can you elaborate more?
> >
> > >
> > >
> > > >
> > > > >  		/* Process the 4 bytes of input on each stream. */
> > > > >
> > > > > @@ -227,7 +223,6 @@ search_neon_4(const struct rte_acl_ctx *ctx,
> > > > > const uint8_t **data,
> > > > >  	uint64_t index_array[4];
> > > > >  	struct completion cmplt[4];
> > > > >  	struct parms parms[4];
> > > > > -	int32x4_t input;
> > > > >
> > > > >  	acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
> > > > >  		     total_packets, categories, ctx->trans_table); @@ -242,10
> > > > > +237,10 @@ search_neon_4(const struct rte_acl_ctx *ctx, const
> > > > > +uint8_t
> > > > > **data,
> > > > >
> > > > >  	while (flows.started > 0) {
> > > > >  		/* Gather 4 bytes of input data for each stream. */
> > > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> input,
> > > > > 0);
> > > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1),
> input,
> > > > > 1);
> > > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2),
> input,
> > > > > 2);
> > > > > -		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3),
> input,
> > > > > 3);
> > > > > +		int32x4_t input = {GET_NEXT_4BYTES(parms, 0),
> > > > > +				   GET_NEXT_4BYTES(parms, 1),
> > > > > +				   GET_NEXT_4BYTES(parms, 2),
> > > > > +				   GET_NEXT_4BYTES(parms, 3)};
> > > > >
> > > > >  		/* Process the 4 bytes of input on each stream. */
> > > > >  		input = transition4(input, flows.trans, index_array);
> > > > > --
> > > > > 2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [dpdk-stable] [dpdk-dev] [PATCH v2] acl: fix build issue with some arm64 compiler
  2019-06-06 14:50 [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler jerinj
                   ` (2 preceding siblings ...)
  2019-06-10 12:10 ` Aaron Conole
@ 2019-06-11 14:15 ` jerinj
  2019-06-11 14:53   ` Aaron Conole
  3 siblings, 1 reply; 13+ messages in thread
From: jerinj @ 2019-06-11 14:15 UTC (permalink / raw)
  To: Jerin Jacob, Gavin Hu, Konstantin Ananyev
  Cc: dev, thomas, msantana, aconole, stable, Honnappa Nagarahalli

From: Jerin Jacob <jerinj@marvell.com>

Some compilers reporting the following error, though the existing
code doesn't have any uninitialized variable case.
Just to make compiler happy, initialize the int32x4_t variable
one shot using vdupq_n_s32.

../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be
used uninitialized in this function [-Werror=maybe-uninitialized]
  int32x4_t input;

Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
Cc: stable@dpdk.org

Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Signed-off-by: Jerin Jacob <jerinj@marvell.com>
---

v2:
- Changed C based initializion to vdupq_n_s32 for better comparability with
  ACLE(Honnappa)

---
 lib/librte_acl/acl_run_neon.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index 01b9766d8..b3196cd12 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
+		input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
+		input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
 
 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
+		input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
-- 
2.21.0


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler
  2019-06-11  1:27         ` Honnappa Nagarahalli
@ 2019-06-11 14:24           ` Jerin Jacob Kollanukkaran
  0 siblings, 0 replies; 13+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-11 14:24 UTC (permalink / raw)
  To: Honnappa Nagarahalli, dev
  Cc: thomas, Gavin Hu (Arm Technology China),
	msantana, aconole, stable, nd, nd

> -----Original Message-----
> From: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>
> Sent: Tuesday, June 11, 2019 6:58 AM
> To: Jerin Jacob Kollanukkaran <jerinj@marvell.com>; dev@dpdk.org
> Cc: thomas@monjalon.net; Gavin Hu (Arm Technology China)
> <Gavin.Hu@arm.com>; msantana@redhat.com; aconole@redhat.com;
> stable@dpdk.org; Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>;
> nd <nd@arm.com>; nd <nd@arm.com>
> Subject: [EXT] RE: [dpdk-dev] [PATCH] acl: fix build issue with some arm64
> compiler
> 
> > >
> > > >
> > > > Since it used in fastpath, a temp variable would be additional
> > > > cost for no reason.
> > > Then, I would suggest we can go with using 'vdupq_n_s32'.
> >
> > We have to form uint64x2_t with 4 x uint32_t variable, How does
> > 'vdupq_n_s32' help here?
> We would use 'vdupq_n_s32' only for the first initialization, the rest of the code
> remains the same (see the diff below)
> 
> > Can you share code snippet without any temp variable?
> diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h index
> 01b9766d8..b3196cd12 100644
> --- a/lib/librte_acl/acl_run_neon.h
> +++ b/lib/librte_acl/acl_run_neon.h
> @@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> uint8_t **data,
> 
>         while (flows.started > 0) {
>                 /* Gather 4 bytes of input data for each stream. */
> -               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
> -               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
> +               input0 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
> +               input1 = vdupq_n_s32(GET_NEXT_4BYTES(parms, 4));
> 
>                 input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
>                 input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1); @@ -
> 242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t
> **data,
> 
>         while (flows.started > 0) {
>                 /* Gather 4 bytes of input data for each stream. */
> -               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
> +               input = vdupq_n_s32(GET_NEXT_4BYTES(parms, 0));
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
>                 input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
> 
> My understanding is that the generated code for both your patch and my
> changes above is the same. Above suggested changes will conform to ACLE
> recommendation.

Though instructions are different. Effective cycles are same even though
First dup updates the four positions.
To make forward progress send the v2 based on the updated logic
 just to make ACLE  Spec happy, I don’t see any real reason to do it though 😊

http://patches.dpdk.org/patch/54656/



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH v2] acl: fix build issue with some arm64 compiler
  2019-06-11 14:15 ` [dpdk-stable] [dpdk-dev] [PATCH v2] " jerinj
@ 2019-06-11 14:53   ` Aaron Conole
  2019-06-11 15:07     ` Thomas Monjalon
  0 siblings, 1 reply; 13+ messages in thread
From: Aaron Conole @ 2019-06-11 14:53 UTC (permalink / raw)
  To: jerinj
  Cc: Gavin Hu, Konstantin Ananyev, dev, thomas, msantana, stable,
	Honnappa Nagarahalli

<jerinj@marvell.com> writes:

> From: Jerin Jacob <jerinj@marvell.com>
>
> Some compilers reporting the following error, though the existing
> code doesn't have any uninitialized variable case.
> Just to make compiler happy, initialize the int32x4_t variable
> one shot using vdupq_n_s32.
>
> ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
> ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be
> used uninitialized in this function [-Werror=maybe-uninitialized]
>   int32x4_t input;
>
> Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
> Cc: stable@dpdk.org
>
> Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> ---

LGTM

Acked-by: Aaron Conole <aconole@redhat.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [dpdk-stable] [dpdk-dev] [PATCH v2] acl: fix build issue with some arm64 compiler
  2019-06-11 14:53   ` Aaron Conole
@ 2019-06-11 15:07     ` Thomas Monjalon
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Monjalon @ 2019-06-11 15:07 UTC (permalink / raw)
  To: jerinj
  Cc: dev, Aaron Conole, Gavin Hu, Konstantin Ananyev, msantana,
	stable, Honnappa Nagarahalli

11/06/2019 23:53, Aaron Conole:
> <jerinj@marvell.com> writes:
> 
> > From: Jerin Jacob <jerinj@marvell.com>
> >
> > Some compilers reporting the following error, though the existing
> > code doesn't have any uninitialized variable case.
> > Just to make compiler happy, initialize the int32x4_t variable
> > one shot using vdupq_n_s32.
> >
> > ../lib/librte_acl/acl_run_neon.h: In function 'search_neon_4'
> > ../lib/librte_acl/acl_run_neon.h:230:12: error: 'input' may be
> > used uninitialized in this function [-Werror=maybe-uninitialized]
> >   int32x4_t input;
> >
> > Fixes: 34fa6c27c156 ("acl: add NEON optimization for ARMv8")
> > Cc: stable@dpdk.org
> >
> > Suggested-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> > Signed-off-by: Jerin Jacob <jerinj@marvell.com>
> > ---
> 
> LGTM
> 
> Acked-by: Aaron Conole <aconole@redhat.com>

Applied, thanks




^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2019-06-11 15:08 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-06 14:50 [dpdk-stable] [dpdk-dev] [PATCH] acl: fix build issue with some arm64 compiler jerinj
2019-06-06 15:55 ` Michael Santana Francisco
2019-06-07  5:42   ` Honnappa Nagarahalli
2019-06-07  5:35 ` Honnappa Nagarahalli
2019-06-07  6:21   ` Jerin Jacob Kollanukkaran
2019-06-10  5:29     ` Honnappa Nagarahalli
2019-06-10  9:39       ` Jerin Jacob Kollanukkaran
2019-06-11  1:27         ` Honnappa Nagarahalli
2019-06-11 14:24           ` Jerin Jacob Kollanukkaran
2019-06-10 12:10 ` Aaron Conole
2019-06-11 14:15 ` [dpdk-stable] [dpdk-dev] [PATCH v2] " jerinj
2019-06-11 14:53   ` Aaron Conole
2019-06-11 15:07     ` Thomas Monjalon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).