DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build
@ 2019-04-08 18:24 Aaron Conole
  2019-04-08 18:24 ` Aaron Conole
                   ` (4 more replies)
  0 siblings, 5 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-08 18:24 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, Jerin Jacob, Gavin Hu

This series fixes the following conditions in the RTE_ACL library:

  1. Fix outstanding compilation issues on ARM with the NEON optimized code
     These consisted mostly of compiler type-cast warnings.  Additionally, some
     of the vector code didn't initialize memory properly.

  2. Properly include ARM, and PPC objects when building on those platforms
     During the meson port, only the scalar code, and some of the x86_64 code was
     ported.

  3. Allow the unit tests to pass
     In order to support this, the unsupported symbols were moved to a separate
     file, which was needed to prevent the compiler from inlining references to the
     functions (resulting in non-scalar code always falling into the -ENOTSUP case).

     The tests were modified to primarily test the scalar version - a better system
     for exercising the non-scalar code needs to be developed.

Aaron Conole (3):
  acl: fix arm argument types
  acl: update the build for multi-arch
  acl: adjust the tests

 app/test/test_acl.c             | 62 +++++++++++++--------------------
 lib/librte_acl/Makefile         |  1 +
 lib/librte_acl/acl_run_neon.h   | 46 ++++++++++++++----------
 lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
 lib/librte_acl/meson.build      |  9 +++--
 5 files changed, 104 insertions(+), 60 deletions(-)
 create mode 100644 lib/librte_acl/acl_run_notsup.c

-- 
2.19.1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build
  2019-04-08 18:24 [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build Aaron Conole
@ 2019-04-08 18:24 ` Aaron Conole
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 1/3] acl: fix arm argument types Aaron Conole
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-08 18:24 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, Jerin Jacob, Gavin Hu

This series fixes the following conditions in the RTE_ACL library:

  1. Fix outstanding compilation issues on ARM with the NEON optimized code
     These consisted mostly of compiler type-cast warnings.  Additionally, some
     of the vector code didn't initialize memory properly.

  2. Properly include ARM, and PPC objects when building on those platforms
     During the meson port, only the scalar code, and some of the x86_64 code was
     ported.

  3. Allow the unit tests to pass
     In order to support this, the unsupported symbols were moved to a separate
     file, which was needed to prevent the compiler from inlining references to the
     functions (resulting in non-scalar code always falling into the -ENOTSUP case).

     The tests were modified to primarily test the scalar version - a better system
     for exercising the non-scalar code needs to be developed.

Aaron Conole (3):
  acl: fix arm argument types
  acl: update the build for multi-arch
  acl: adjust the tests

 app/test/test_acl.c             | 62 +++++++++++++--------------------
 lib/librte_acl/Makefile         |  1 +
 lib/librte_acl/acl_run_neon.h   | 46 ++++++++++++++----------
 lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
 lib/librte_acl/meson.build      |  9 +++--
 5 files changed, 104 insertions(+), 60 deletions(-)
 create mode 100644 lib/librte_acl/acl_run_notsup.c

-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH 1/3] acl: fix arm argument types
  2019-04-08 18:24 [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build Aaron Conole
  2019-04-08 18:24 ` Aaron Conole
@ 2019-04-08 18:24 ` Aaron Conole
  2019-04-08 18:24   ` Aaron Conole
  2019-04-10 14:39   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 2/3] acl: update the build for multi-arch Aaron Conole
                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-08 18:24 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, Jerin Jacob, Gavin Hu

Compiler complains of argument type mismatch, like:

   ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
   ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-conversions
      to permit conversions between vectors with differing element types
      or numbers of subparts
     node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
     ^
   ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type for
      argument 2 of ‘vbicq_s32’

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index 01b9766d8..4a8e4b681 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -112,37 +112,41 @@ transition4(int32x4_t next_input, const uint64_t *trans, uint64_t transitions[])
 	index_msk = vld1q_u32((const uint32_t *)&neon_acl_const.xmm_index_mask);
 
 	/* Calc node type and node addr */
-	node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
-	addr = vandq_s32(tr_hi_lo.val[0], index_msk);
+	node_type = (uint32x4_t) vbicq_s32(tr_hi_lo.val[0],
+				       (int32x4_t)index_msk);
+	addr = (uint32x4_t) vandq_s32(tr_hi_lo.val[0], (int32x4_t) index_msk);
 
 	/* t = 0 */
-	t = veorq_s32(node_type, node_type);
+	t = veorq_s32((int32x4_t)node_type, (int32x4_t)node_type);
 
 	/* mask for DFA type(0) nodes */
-	dfa_msk = vceqq_u32(node_type, t);
+	dfa_msk = vceqq_u32(node_type, (uint32x4_t)t);
 
-	mask = vld1q_s32((const int32_t *)&neon_acl_const.xmm_shuffle_input);
-	in = vqtbl1q_u8((uint8x16_t)next_input, (uint8x16_t)mask);
+	mask = (uint32x4_t)
+	       vld1q_s32((const int32_t *)&neon_acl_const.xmm_shuffle_input);
+	in = (int32x4_t) vqtbl1q_u8((uint8x16_t)next_input, (uint8x16_t)mask);
 
 	/* DFA calculations. */
-	r = vshrq_n_u32(in, 30); /* div by 64 */
-	mask = vld1q_s32((const int32_t *)&neon_acl_const.range_base);
-	r = vaddq_u8(r, mask);
-	t = vshrq_n_u32(in, 24);
-	r = vqtbl1q_u8((uint8x16_t)tr_hi_lo.val[1], (uint8x16_t)r);
-	dfa_ofs = vsubq_s32(t, r);
+	r = (int32x4_t) vshrq_n_u32((uint32x4_t) in, 30); /* div by 64 */
+	mask = (uint32x4_t)
+	       vld1q_s32((const int32_t *)&neon_acl_const.range_base);
+	r = (int32x4_t) vaddq_u8((uint8x16_t)r, (uint8x16_t)mask);
+	t = (int32x4_t) vshrq_n_u32((uint32x4_t)in, 24);
+	r = (int32x4_t) vqtbl1q_u8((uint8x16_t)tr_hi_lo.val[1], (uint8x16_t)r);
+	dfa_ofs = (uint32x4_t) vsubq_s32(t, r);
 
 	/* QUAD/SINGLE calculations. */
-	t = vcgtq_s8(in, tr_hi_lo.val[1]);
-	t = vabsq_s8(t);
-	t = vpaddlq_u8(t);
-	quad_ofs = vpaddlq_u16(t);
+	t = (int32x4_t) vcgtq_s8((int8x16_t)in, (int8x16_t)tr_hi_lo.val[1]);
+	t = (int32x4_t) vabsq_s8((int8x16_t)t);
+	t = (int32x4_t) vpaddlq_u8((uint8x16_t)t);
+	quad_ofs = vpaddlq_u16((uint16x8_t)t);
 
 	/* blend DFA and QUAD/SINGLE. */
-	t = vbslq_u8(dfa_msk, dfa_ofs, quad_ofs);
+	t = (int32x4_t) vbslq_u8((uint8x16_t)dfa_msk, (uint8x16_t)dfa_ofs,
+				 (uint8x16_t)quad_ofs);
 
 	/* calculate address for next transitions */
-	addr = vaddq_u32(addr, t);
+	addr = vaddq_u32(addr, (uint32x4_t)t);
 
 	/* Fill next transitions */
 	transitions[0] = trans[vgetq_lane_u32(addr, 0)];
@@ -150,7 +154,7 @@ transition4(int32x4_t next_input, const uint64_t *trans, uint64_t transitions[])
 	transitions[2] = trans[vgetq_lane_u32(addr, 2)];
 	transitions[3] = trans[vgetq_lane_u32(addr, 3)];
 
-	return vshrq_n_u32(next_input, CHAR_BIT);
+	return (int32x4_t) vshrq_n_u32((uint32x4_t)next_input, CHAR_BIT);
 }
 
 /*
@@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
 	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
 
+	memset(&input0, 0, sizeof(input0));
+	memset(&input1, 0, sizeof(input1));
+
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
@@ -240,6 +247,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	/* Check for any matches. */
 	acl_match_check_x4(0, ctx, parms, &flows, index_array);
 
+	memset(&input, 0, sizeof(input));
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
-- 
2.19.1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH 1/3] acl: fix arm argument types
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 1/3] acl: fix arm argument types Aaron Conole
@ 2019-04-08 18:24   ` Aaron Conole
  2019-04-10 14:39   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
  1 sibling, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-08 18:24 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, Jerin Jacob, Gavin Hu

Compiler complains of argument type mismatch, like:

   ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
   ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-conversions
      to permit conversions between vectors with differing element types
      or numbers of subparts
     node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
     ^
   ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type for
      argument 2 of ‘vbicq_s32’

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------------
 1 file changed, 27 insertions(+), 19 deletions(-)

diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index 01b9766d8..4a8e4b681 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -112,37 +112,41 @@ transition4(int32x4_t next_input, const uint64_t *trans, uint64_t transitions[])
 	index_msk = vld1q_u32((const uint32_t *)&neon_acl_const.xmm_index_mask);
 
 	/* Calc node type and node addr */
-	node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
-	addr = vandq_s32(tr_hi_lo.val[0], index_msk);
+	node_type = (uint32x4_t) vbicq_s32(tr_hi_lo.val[0],
+				       (int32x4_t)index_msk);
+	addr = (uint32x4_t) vandq_s32(tr_hi_lo.val[0], (int32x4_t) index_msk);
 
 	/* t = 0 */
-	t = veorq_s32(node_type, node_type);
+	t = veorq_s32((int32x4_t)node_type, (int32x4_t)node_type);
 
 	/* mask for DFA type(0) nodes */
-	dfa_msk = vceqq_u32(node_type, t);
+	dfa_msk = vceqq_u32(node_type, (uint32x4_t)t);
 
-	mask = vld1q_s32((const int32_t *)&neon_acl_const.xmm_shuffle_input);
-	in = vqtbl1q_u8((uint8x16_t)next_input, (uint8x16_t)mask);
+	mask = (uint32x4_t)
+	       vld1q_s32((const int32_t *)&neon_acl_const.xmm_shuffle_input);
+	in = (int32x4_t) vqtbl1q_u8((uint8x16_t)next_input, (uint8x16_t)mask);
 
 	/* DFA calculations. */
-	r = vshrq_n_u32(in, 30); /* div by 64 */
-	mask = vld1q_s32((const int32_t *)&neon_acl_const.range_base);
-	r = vaddq_u8(r, mask);
-	t = vshrq_n_u32(in, 24);
-	r = vqtbl1q_u8((uint8x16_t)tr_hi_lo.val[1], (uint8x16_t)r);
-	dfa_ofs = vsubq_s32(t, r);
+	r = (int32x4_t) vshrq_n_u32((uint32x4_t) in, 30); /* div by 64 */
+	mask = (uint32x4_t)
+	       vld1q_s32((const int32_t *)&neon_acl_const.range_base);
+	r = (int32x4_t) vaddq_u8((uint8x16_t)r, (uint8x16_t)mask);
+	t = (int32x4_t) vshrq_n_u32((uint32x4_t)in, 24);
+	r = (int32x4_t) vqtbl1q_u8((uint8x16_t)tr_hi_lo.val[1], (uint8x16_t)r);
+	dfa_ofs = (uint32x4_t) vsubq_s32(t, r);
 
 	/* QUAD/SINGLE calculations. */
-	t = vcgtq_s8(in, tr_hi_lo.val[1]);
-	t = vabsq_s8(t);
-	t = vpaddlq_u8(t);
-	quad_ofs = vpaddlq_u16(t);
+	t = (int32x4_t) vcgtq_s8((int8x16_t)in, (int8x16_t)tr_hi_lo.val[1]);
+	t = (int32x4_t) vabsq_s8((int8x16_t)t);
+	t = (int32x4_t) vpaddlq_u8((uint8x16_t)t);
+	quad_ofs = vpaddlq_u16((uint16x8_t)t);
 
 	/* blend DFA and QUAD/SINGLE. */
-	t = vbslq_u8(dfa_msk, dfa_ofs, quad_ofs);
+	t = (int32x4_t) vbslq_u8((uint8x16_t)dfa_msk, (uint8x16_t)dfa_ofs,
+				 (uint8x16_t)quad_ofs);
 
 	/* calculate address for next transitions */
-	addr = vaddq_u32(addr, t);
+	addr = vaddq_u32(addr, (uint32x4_t)t);
 
 	/* Fill next transitions */
 	transitions[0] = trans[vgetq_lane_u32(addr, 0)];
@@ -150,7 +154,7 @@ transition4(int32x4_t next_input, const uint64_t *trans, uint64_t transitions[])
 	transitions[2] = trans[vgetq_lane_u32(addr, 2)];
 	transitions[3] = trans[vgetq_lane_u32(addr, 3)];
 
-	return vshrq_n_u32(next_input, CHAR_BIT);
+	return (int32x4_t) vshrq_n_u32((uint32x4_t)next_input, CHAR_BIT);
 }
 
 /*
@@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
 	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
 
+	memset(&input0, 0, sizeof(input0));
+	memset(&input1, 0, sizeof(input1));
+
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
@@ -240,6 +247,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 	/* Check for any matches. */
 	acl_match_check_x4(0, ctx, parms, &flows, index_array);
 
+	memset(&input, 0, sizeof(input));
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH 2/3] acl: update the build for multi-arch
  2019-04-08 18:24 [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build Aaron Conole
  2019-04-08 18:24 ` Aaron Conole
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 1/3] acl: fix arm argument types Aaron Conole
@ 2019-04-08 18:24 ` Aaron Conole
  2019-04-08 18:24   ` Aaron Conole
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 3/3] acl: adjust the tests Aaron Conole
  2019-04-08 20:40 ` [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build Aaron Conole
  4 siblings, 1 reply; 44+ messages in thread
From: Aaron Conole @ 2019-04-08 18:24 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, Jerin Jacob, Gavin Hu

For the introduction of the meson build, the build file for the ACL library
architecture specific files was not ported.  This means the compiler didn't
know about the optimized versions when building the RTE_ACL library for
each architecture.

Now hook up the different architecures by checking the architecture build
environment and including the right objects.

Weak symbols aren't working with this commit but will get fixed to properly
select the right runtime version in a future commit.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_acl/meson.build | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_acl/meson.build b/lib/librte_acl/meson.build
index 2207dbafe..03c19e4e5 100644
--- a/lib/librte_acl/meson.build
+++ b/lib/librte_acl/meson.build
@@ -27,5 +27,8 @@ if arch_subdir == 'x86'
 		objs += avx2_tmplib.extract_objects('acl_run_avx2.c')
 		cflags += '-DCC_AVX2_SUPPORT'
 	endif
-
+elif arch_subdir == 'arm' and host_machine.cpu_family().startswith('aarch64')
+	sources += files('acl_run_neon.c')
+elif arch_subdir == 'ppc_64'
+	sources += files('acl_run_altivec.c')
 endif
-- 
2.19.1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH 2/3] acl: update the build for multi-arch
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 2/3] acl: update the build for multi-arch Aaron Conole
@ 2019-04-08 18:24   ` Aaron Conole
  0 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-08 18:24 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, Jerin Jacob, Gavin Hu

For the introduction of the meson build, the build file for the ACL library
architecture specific files was not ported.  This means the compiler didn't
know about the optimized versions when building the RTE_ACL library for
each architecture.

Now hook up the different architecures by checking the architecture build
environment and including the right objects.

Weak symbols aren't working with this commit but will get fixed to properly
select the right runtime version in a future commit.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 lib/librte_acl/meson.build | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/lib/librte_acl/meson.build b/lib/librte_acl/meson.build
index 2207dbafe..03c19e4e5 100644
--- a/lib/librte_acl/meson.build
+++ b/lib/librte_acl/meson.build
@@ -27,5 +27,8 @@ if arch_subdir == 'x86'
 		objs += avx2_tmplib.extract_objects('acl_run_avx2.c')
 		cflags += '-DCC_AVX2_SUPPORT'
 	endif
-
+elif arch_subdir == 'arm' and host_machine.cpu_family().startswith('aarch64')
+	sources += files('acl_run_neon.c')
+elif arch_subdir == 'ppc_64'
+	sources += files('acl_run_altivec.c')
 endif
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-08 18:24 [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build Aaron Conole
                   ` (2 preceding siblings ...)
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 2/3] acl: update the build for multi-arch Aaron Conole
@ 2019-04-08 18:24 ` Aaron Conole
  2019-04-08 18:24   ` Aaron Conole
  2019-04-09  8:41   ` Ananyev, Konstantin
  2019-04-08 20:40 ` [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build Aaron Conole
  4 siblings, 2 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-08 18:24 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, Jerin Jacob, Gavin Hu

This makes the tests pass, and also ensures that on platforms where the
testing is supported, we can properly test the implementation specific
code.  One edge case is when we run on x86_64 systems that don't support
AVX2, but where the compiler can generate such instructions.  That could
be an enhancement in the future, but for now at least the tests will
pass.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 app/test/test_acl.c             | 62 +++++++++++++--------------------
 lib/librte_acl/Makefile         |  1 +
 lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
 lib/librte_acl/meson.build      |  4 +--
 4 files changed, 73 insertions(+), 40 deletions(-)
 create mode 100644 lib/librte_acl/acl_run_notsup.c

diff --git a/app/test/test_acl.c b/app/test/test_acl.c
index b1f75d1bc..c44faa251 100644
--- a/app/test/test_acl.c
+++ b/app/test/test_acl.c
@@ -408,6 +408,9 @@ test_classify(void)
 		return -1;
 	}
 
+	/* Always use the scalar testing for now. */
+	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
+
 	ret = 0;
 	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
 
@@ -547,6 +550,7 @@ test_build_ports_range(void)
 	for (i = 0; i != RTE_DIM(test_data); i++)
 		data[i] = (uint8_t *)&test_data[i];
 
+	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
 	for (i = 0; i != RTE_DIM(test_rules); i++) {
 		rte_acl_reset(acx);
 		ret = test_classify_buid(acx, test_rules, i + 1);
@@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
 		return -1;
 	}
 
+	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
+
 	rc = convert_rules(acx, convert, acl_test_rules,
 		RTE_DIM(acl_test_rules));
 	if (rc != 0)
@@ -1352,7 +1358,7 @@ test_invalid_parameters(void)
 	struct rte_acl_param param;
 	struct rte_acl_ctx *acx;
 	struct rte_acl_ipv4vlan_rule rule;
-	int result;
+	int i, result;
 
 	uint32_t layout[RTE_ACL_IPV4VLAN_NUM] = {0};
 
@@ -1513,45 +1519,25 @@ test_invalid_parameters(void)
 		return -1;
 	}
 
-	/* SSE classify test */
-
-	/* cover zero categories in classify (should not fail) */
-	result = rte_acl_classify(acx, NULL, NULL, 0, 0);
-	if (result != 0) {
-		printf("Line %i: SSE classify with zero categories "
-				"failed!\n", __LINE__);
-		rte_acl_free(acx);
-		return -1;
-	}
-
-	/* cover invalid but positive categories in classify */
-	result = rte_acl_classify(acx, NULL, NULL, 0, 3);
-	if (result == 0) {
-		printf("Line %i: SSE classify with 3 categories "
-				"should have failed!\n", __LINE__);
-		rte_acl_free(acx);
-		return -1;
-	}
-
-	/* scalar classify test */
+	for (i = RTE_ACL_CLASSIFY_DEFAULT; i < RTE_ACL_CLASSIFY_NUM; ++i) {
+		rte_acl_set_ctx_classify(acx, i); /* set up the classify code */
 
-	/* cover zero categories in classify (should not fail) */
-	result = rte_acl_classify_alg(acx, NULL, NULL, 0, 0,
-		RTE_ACL_CLASSIFY_SCALAR);
-	if (result != 0) {
-		printf("Line %i: Scalar classify with zero categories "
-				"failed!\n", __LINE__);
-		rte_acl_free(acx);
-		return -1;
-	}
+		/* cover zero categories in classify (should not fail) */
+		result = rte_acl_classify(acx, NULL, NULL, 0, 0);
+		if (result != 0 && result != -ENOTSUP) {
+			printf("AGL: %d, ACL classify with zero categories failed: %d!\n",
+			       i, result);
+			return -1;
+		}
 
-	/* cover invalid but positive categories in classify */
-	result = rte_acl_classify(acx, NULL, NULL, 0, 3);
-	if (result == 0) {
-		printf("Line %i: Scalar classify with 3 categories "
-				"should have failed!\n", __LINE__);
-		rte_acl_free(acx);
-		return -1;
+		/* cover invalid but positive categories in classify */
+		result = rte_acl_classify(acx, NULL, NULL, 0, 3);
+		/* we don't check for -ENOTSUP here, since it is a failure */
+		if (result == 0) {
+			printf("AGL: %d, ACL classify with 3 categories should fail!\n",
+			       i);
+			return -1;
+		}
 	}
 
 	/* free ACL context */
diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
index ea5edf00a..c5dfdb832 100644
--- a/lib/librte_acl/Makefile
+++ b/lib/librte_acl/Makefile
@@ -21,6 +21,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += rte_acl.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_notsup.c
 
 ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_neon.c
diff --git a/lib/librte_acl/acl_run_notsup.c b/lib/librte_acl/acl_run_notsup.c
new file mode 100644
index 000000000..2bcc6e67f
--- /dev/null
+++ b/lib/librte_acl/acl_run_notsup.c
@@ -0,0 +1,46 @@
+#include <rte_acl.h>
+#include "acl.h"
+
+/*
+ * If the compiler doesn't support AVX2 instructions,
+ * then the dummy one would be used instead for AVX2 classify method.
+ */
+int __rte_weak
+rte_acl_classify_avx2(__rte_unused const struct rte_acl_ctx *ctx,
+	__rte_unused const uint8_t **data,
+	__rte_unused uint32_t *results,
+	__rte_unused uint32_t num,
+	__rte_unused uint32_t categories)
+{
+	return -ENOTSUP;
+}
+
+int __rte_weak
+rte_acl_classify_sse(__rte_unused const struct rte_acl_ctx *ctx,
+	__rte_unused const uint8_t **data,
+	__rte_unused uint32_t *results,
+	__rte_unused uint32_t num,
+	__rte_unused uint32_t categories)
+{
+	return -ENOTSUP;
+}
+
+int __rte_weak
+rte_acl_classify_neon(__rte_unused const struct rte_acl_ctx *ctx,
+	__rte_unused const uint8_t **data,
+	__rte_unused uint32_t *results,
+	__rte_unused uint32_t num,
+	__rte_unused uint32_t categories)
+{
+	return -ENOTSUP;
+}
+
+int __rte_weak
+rte_acl_classify_altivec(__rte_unused const struct rte_acl_ctx *ctx,
+	__rte_unused const uint8_t **data,
+	__rte_unused uint32_t *results,
+	__rte_unused uint32_t num,
+	__rte_unused uint32_t categories)
+{
+	return -ENOTSUP;
+}
diff --git a/lib/librte_acl/meson.build b/lib/librte_acl/meson.build
index 03c19e4e5..fc8689aa9 100644
--- a/lib/librte_acl/meson.build
+++ b/lib/librte_acl/meson.build
@@ -2,8 +2,8 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
-sources = files('acl_bld.c', 'acl_gen.c', 'acl_run_scalar.c',
-		'rte_acl.c', 'tb_mem.c')
+sources = files('acl_bld.c', 'acl_gen.c', 'acl_run_notsup.c',
+		'acl_run_scalar.c', 'rte_acl.c', 'tb_mem.c')
 headers = files('rte_acl.h', 'rte_acl_osdep.h')
 
 if arch_subdir == 'x86'
-- 
2.19.1

^ permalink raw reply	[flat|nested] 44+ messages in thread

* [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 3/3] acl: adjust the tests Aaron Conole
@ 2019-04-08 18:24   ` Aaron Conole
  2019-04-09  8:41   ` Ananyev, Konstantin
  1 sibling, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-08 18:24 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, Jerin Jacob, Gavin Hu

This makes the tests pass, and also ensures that on platforms where the
testing is supported, we can properly test the implementation specific
code.  One edge case is when we run on x86_64 systems that don't support
AVX2, but where the compiler can generate such instructions.  That could
be an enhancement in the future, but for now at least the tests will
pass.

Signed-off-by: Aaron Conole <aconole@redhat.com>
---
 app/test/test_acl.c             | 62 +++++++++++++--------------------
 lib/librte_acl/Makefile         |  1 +
 lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
 lib/librte_acl/meson.build      |  4 +--
 4 files changed, 73 insertions(+), 40 deletions(-)
 create mode 100644 lib/librte_acl/acl_run_notsup.c

diff --git a/app/test/test_acl.c b/app/test/test_acl.c
index b1f75d1bc..c44faa251 100644
--- a/app/test/test_acl.c
+++ b/app/test/test_acl.c
@@ -408,6 +408,9 @@ test_classify(void)
 		return -1;
 	}
 
+	/* Always use the scalar testing for now. */
+	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
+
 	ret = 0;
 	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
 
@@ -547,6 +550,7 @@ test_build_ports_range(void)
 	for (i = 0; i != RTE_DIM(test_data); i++)
 		data[i] = (uint8_t *)&test_data[i];
 
+	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
 	for (i = 0; i != RTE_DIM(test_rules); i++) {
 		rte_acl_reset(acx);
 		ret = test_classify_buid(acx, test_rules, i + 1);
@@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
 		return -1;
 	}
 
+	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
+
 	rc = convert_rules(acx, convert, acl_test_rules,
 		RTE_DIM(acl_test_rules));
 	if (rc != 0)
@@ -1352,7 +1358,7 @@ test_invalid_parameters(void)
 	struct rte_acl_param param;
 	struct rte_acl_ctx *acx;
 	struct rte_acl_ipv4vlan_rule rule;
-	int result;
+	int i, result;
 
 	uint32_t layout[RTE_ACL_IPV4VLAN_NUM] = {0};
 
@@ -1513,45 +1519,25 @@ test_invalid_parameters(void)
 		return -1;
 	}
 
-	/* SSE classify test */
-
-	/* cover zero categories in classify (should not fail) */
-	result = rte_acl_classify(acx, NULL, NULL, 0, 0);
-	if (result != 0) {
-		printf("Line %i: SSE classify with zero categories "
-				"failed!\n", __LINE__);
-		rte_acl_free(acx);
-		return -1;
-	}
-
-	/* cover invalid but positive categories in classify */
-	result = rte_acl_classify(acx, NULL, NULL, 0, 3);
-	if (result == 0) {
-		printf("Line %i: SSE classify with 3 categories "
-				"should have failed!\n", __LINE__);
-		rte_acl_free(acx);
-		return -1;
-	}
-
-	/* scalar classify test */
+	for (i = RTE_ACL_CLASSIFY_DEFAULT; i < RTE_ACL_CLASSIFY_NUM; ++i) {
+		rte_acl_set_ctx_classify(acx, i); /* set up the classify code */
 
-	/* cover zero categories in classify (should not fail) */
-	result = rte_acl_classify_alg(acx, NULL, NULL, 0, 0,
-		RTE_ACL_CLASSIFY_SCALAR);
-	if (result != 0) {
-		printf("Line %i: Scalar classify with zero categories "
-				"failed!\n", __LINE__);
-		rte_acl_free(acx);
-		return -1;
-	}
+		/* cover zero categories in classify (should not fail) */
+		result = rte_acl_classify(acx, NULL, NULL, 0, 0);
+		if (result != 0 && result != -ENOTSUP) {
+			printf("AGL: %d, ACL classify with zero categories failed: %d!\n",
+			       i, result);
+			return -1;
+		}
 
-	/* cover invalid but positive categories in classify */
-	result = rte_acl_classify(acx, NULL, NULL, 0, 3);
-	if (result == 0) {
-		printf("Line %i: Scalar classify with 3 categories "
-				"should have failed!\n", __LINE__);
-		rte_acl_free(acx);
-		return -1;
+		/* cover invalid but positive categories in classify */
+		result = rte_acl_classify(acx, NULL, NULL, 0, 3);
+		/* we don't check for -ENOTSUP here, since it is a failure */
+		if (result == 0) {
+			printf("AGL: %d, ACL classify with 3 categories should fail!\n",
+			       i);
+			return -1;
+		}
 	}
 
 	/* free ACL context */
diff --git a/lib/librte_acl/Makefile b/lib/librte_acl/Makefile
index ea5edf00a..c5dfdb832 100644
--- a/lib/librte_acl/Makefile
+++ b/lib/librte_acl/Makefile
@@ -21,6 +21,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_ACL) += rte_acl.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_bld.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_gen.c
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_scalar.c
+SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_notsup.c
 
 ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),)
 SRCS-$(CONFIG_RTE_LIBRTE_ACL) += acl_run_neon.c
diff --git a/lib/librte_acl/acl_run_notsup.c b/lib/librte_acl/acl_run_notsup.c
new file mode 100644
index 000000000..2bcc6e67f
--- /dev/null
+++ b/lib/librte_acl/acl_run_notsup.c
@@ -0,0 +1,46 @@
+#include <rte_acl.h>
+#include "acl.h"
+
+/*
+ * If the compiler doesn't support AVX2 instructions,
+ * then the dummy one would be used instead for AVX2 classify method.
+ */
+int __rte_weak
+rte_acl_classify_avx2(__rte_unused const struct rte_acl_ctx *ctx,
+	__rte_unused const uint8_t **data,
+	__rte_unused uint32_t *results,
+	__rte_unused uint32_t num,
+	__rte_unused uint32_t categories)
+{
+	return -ENOTSUP;
+}
+
+int __rte_weak
+rte_acl_classify_sse(__rte_unused const struct rte_acl_ctx *ctx,
+	__rte_unused const uint8_t **data,
+	__rte_unused uint32_t *results,
+	__rte_unused uint32_t num,
+	__rte_unused uint32_t categories)
+{
+	return -ENOTSUP;
+}
+
+int __rte_weak
+rte_acl_classify_neon(__rte_unused const struct rte_acl_ctx *ctx,
+	__rte_unused const uint8_t **data,
+	__rte_unused uint32_t *results,
+	__rte_unused uint32_t num,
+	__rte_unused uint32_t categories)
+{
+	return -ENOTSUP;
+}
+
+int __rte_weak
+rte_acl_classify_altivec(__rte_unused const struct rte_acl_ctx *ctx,
+	__rte_unused const uint8_t **data,
+	__rte_unused uint32_t *results,
+	__rte_unused uint32_t num,
+	__rte_unused uint32_t categories)
+{
+	return -ENOTSUP;
+}
diff --git a/lib/librte_acl/meson.build b/lib/librte_acl/meson.build
index 03c19e4e5..fc8689aa9 100644
--- a/lib/librte_acl/meson.build
+++ b/lib/librte_acl/meson.build
@@ -2,8 +2,8 @@
 # Copyright(c) 2017 Intel Corporation
 
 version = 2
-sources = files('acl_bld.c', 'acl_gen.c', 'acl_run_scalar.c',
-		'rte_acl.c', 'tb_mem.c')
+sources = files('acl_bld.c', 'acl_gen.c', 'acl_run_notsup.c',
+		'acl_run_scalar.c', 'rte_acl.c', 'tb_mem.c')
 headers = files('rte_acl.h', 'rte_acl_osdep.h')
 
 if arch_subdir == 'x86'
-- 
2.19.1


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build
  2019-04-08 18:24 [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build Aaron Conole
                   ` (3 preceding siblings ...)
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 3/3] acl: adjust the tests Aaron Conole
@ 2019-04-08 20:40 ` Aaron Conole
  2019-04-08 20:40   ` Aaron Conole
  4 siblings, 1 reply; 44+ messages in thread
From: Aaron Conole @ 2019-04-08 20:40 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, Jerin Jacob, Gavin Hu, David Marchand

Aaron Conole <aconole@redhat.com> writes:

> This series fixes the following conditions in the RTE_ACL library:
>
>   1. Fix outstanding compilation issues on ARM with the NEON optimized code
>      These consisted mostly of compiler type-cast warnings.  Additionally, some
>      of the vector code didn't initialize memory properly.
>
>   2. Properly include ARM, and PPC objects when building on those platforms
>      During the meson port, only the scalar code, and some of the x86_64 code was
>      ported.
>
>   3. Allow the unit tests to pass
>      In order to support this, the unsupported symbols were moved to a separate
>      file, which was needed to prevent the compiler from inlining references to the
>      functions (resulting in non-scalar code always falling into the -ENOTSUP case).
>
>      The tests were modified to primarily test the scalar version - a better system
>      for exercising the non-scalar code needs to be developed.
>
> Aaron Conole (3):
>   acl: fix arm argument types
>   acl: update the build for multi-arch
>   acl: adjust the tests
>
>  app/test/test_acl.c             | 62 +++++++++++++--------------------
>  lib/librte_acl/Makefile         |  1 +
>  lib/librte_acl/acl_run_neon.h   | 46 ++++++++++++++----------
>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
>  lib/librte_acl/meson.build      |  9 +++--
>  5 files changed, 104 insertions(+), 60 deletions(-)
>  create mode 100644 lib/librte_acl/acl_run_notsup.c

Might have sent prematurely - please ignore for the time being.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build
  2019-04-08 20:40 ` [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build Aaron Conole
@ 2019-04-08 20:40   ` Aaron Conole
  0 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-08 20:40 UTC (permalink / raw)
  To: dev; +Cc: Konstantin Ananyev, Jerin Jacob, Gavin Hu, David Marchand

Aaron Conole <aconole@redhat.com> writes:

> This series fixes the following conditions in the RTE_ACL library:
>
>   1. Fix outstanding compilation issues on ARM with the NEON optimized code
>      These consisted mostly of compiler type-cast warnings.  Additionally, some
>      of the vector code didn't initialize memory properly.
>
>   2. Properly include ARM, and PPC objects when building on those platforms
>      During the meson port, only the scalar code, and some of the x86_64 code was
>      ported.
>
>   3. Allow the unit tests to pass
>      In order to support this, the unsupported symbols were moved to a separate
>      file, which was needed to prevent the compiler from inlining references to the
>      functions (resulting in non-scalar code always falling into the -ENOTSUP case).
>
>      The tests were modified to primarily test the scalar version - a better system
>      for exercising the non-scalar code needs to be developed.
>
> Aaron Conole (3):
>   acl: fix arm argument types
>   acl: update the build for multi-arch
>   acl: adjust the tests
>
>  app/test/test_acl.c             | 62 +++++++++++++--------------------
>  lib/librte_acl/Makefile         |  1 +
>  lib/librte_acl/acl_run_neon.h   | 46 ++++++++++++++----------
>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
>  lib/librte_acl/meson.build      |  9 +++--
>  5 files changed, 104 insertions(+), 60 deletions(-)
>  create mode 100644 lib/librte_acl/acl_run_notsup.c

Might have sent prematurely - please ignore for the time being.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 3/3] acl: adjust the tests Aaron Conole
  2019-04-08 18:24   ` Aaron Conole
@ 2019-04-09  8:41   ` Ananyev, Konstantin
  2019-04-09  8:41     ` Ananyev, Konstantin
  2019-04-09 13:01     ` Aaron Conole
  1 sibling, 2 replies; 44+ messages in thread
From: Ananyev, Konstantin @ 2019-04-09  8:41 UTC (permalink / raw)
  To: Aaron Conole, dev; +Cc: Jerin Jacob, Gavin Hu

Hi Aaron,

> 
> This makes the tests pass, and also ensures that on platforms where the
> testing is supported, we can properly test the implementation specific
> code.  One edge case is when we run on x86_64 systems that don't support
> AVX2, but where the compiler can generate such instructions.  That could
> be an enhancement in the future, but for now at least the tests will
> pass.
> 
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> ---
>  app/test/test_acl.c             | 62 +++++++++++++--------------------
>  lib/librte_acl/Makefile         |  1 +
>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
>  lib/librte_acl/meson.build      |  4 +--
>  4 files changed, 73 insertions(+), 40 deletions(-)
>  create mode 100644 lib/librte_acl/acl_run_notsup.c
> 
> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> index b1f75d1bc..c44faa251 100644
> --- a/app/test/test_acl.c
> +++ b/app/test/test_acl.c
> @@ -408,6 +408,9 @@ test_classify(void)
>  		return -1;
>  	}
> 
> +	/* Always use the scalar testing for now. */
> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> +
>  	ret = 0;
>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> 
> @@ -547,6 +550,7 @@ test_build_ports_range(void)
>  	for (i = 0; i != RTE_DIM(test_data); i++)
>  		data[i] = (uint8_t *)&test_data[i];
> 
> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
>  		rte_acl_reset(acx);
>  		ret = test_classify_buid(acx, test_rules, i + 1);
> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
>  		return -1;
>  	}
> 
> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> +

As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
with scalar one, right?
That looks like reduction of test coverage for x86.
Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09  8:41   ` Ananyev, Konstantin
@ 2019-04-09  8:41     ` Ananyev, Konstantin
  2019-04-09 13:01     ` Aaron Conole
  1 sibling, 0 replies; 44+ messages in thread
From: Ananyev, Konstantin @ 2019-04-09  8:41 UTC (permalink / raw)
  To: Aaron Conole, dev; +Cc: Jerin Jacob, Gavin Hu

Hi Aaron,

> 
> This makes the tests pass, and also ensures that on platforms where the
> testing is supported, we can properly test the implementation specific
> code.  One edge case is when we run on x86_64 systems that don't support
> AVX2, but where the compiler can generate such instructions.  That could
> be an enhancement in the future, but for now at least the tests will
> pass.
> 
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> ---
>  app/test/test_acl.c             | 62 +++++++++++++--------------------
>  lib/librte_acl/Makefile         |  1 +
>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
>  lib/librte_acl/meson.build      |  4 +--
>  4 files changed, 73 insertions(+), 40 deletions(-)
>  create mode 100644 lib/librte_acl/acl_run_notsup.c
> 
> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> index b1f75d1bc..c44faa251 100644
> --- a/app/test/test_acl.c
> +++ b/app/test/test_acl.c
> @@ -408,6 +408,9 @@ test_classify(void)
>  		return -1;
>  	}
> 
> +	/* Always use the scalar testing for now. */
> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> +
>  	ret = 0;
>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> 
> @@ -547,6 +550,7 @@ test_build_ports_range(void)
>  	for (i = 0; i != RTE_DIM(test_data); i++)
>  		data[i] = (uint8_t *)&test_data[i];
> 
> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
>  		rte_acl_reset(acx);
>  		ret = test_classify_buid(acx, test_rules, i + 1);
> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
>  		return -1;
>  	}
> 
> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> +

As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
with scalar one, right?
That looks like reduction of test coverage for x86.
Konstantin



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09  8:41   ` Ananyev, Konstantin
  2019-04-09  8:41     ` Ananyev, Konstantin
@ 2019-04-09 13:01     ` Aaron Conole
  2019-04-09 13:01       ` Aaron Conole
  2019-04-09 16:03       ` Ananyev, Konstantin
  1 sibling, 2 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-09 13:01 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: dev, Jerin Jacob, Gavin Hu, Bruce Richardson, Michael Santana

"Ananyev, Konstantin" <konstantin.ananyev@intel.com> writes:

> Hi Aaron,
>
>> 
>> This makes the tests pass, and also ensures that on platforms where the
>> testing is supported, we can properly test the implementation specific
>> code.  One edge case is when we run on x86_64 systems that don't support
>> AVX2, but where the compiler can generate such instructions.  That could
>> be an enhancement in the future, but for now at least the tests will
>> pass.
>> 
>> Signed-off-by: Aaron Conole <aconole@redhat.com>
>> ---
>>  app/test/test_acl.c             | 62 +++++++++++++--------------------
>>  lib/librte_acl/Makefile         |  1 +
>>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
>>  lib/librte_acl/meson.build      |  4 +--
>>  4 files changed, 73 insertions(+), 40 deletions(-)
>>  create mode 100644 lib/librte_acl/acl_run_notsup.c
>> 
>> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
>> index b1f75d1bc..c44faa251 100644
>> --- a/app/test/test_acl.c
>> +++ b/app/test/test_acl.c
>> @@ -408,6 +408,9 @@ test_classify(void)
>>  		return -1;
>>  	}
>> 
>> +	/* Always use the scalar testing for now. */
>> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>> +
>>  	ret = 0;
>>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
>> 
>> @@ -547,6 +550,7 @@ test_build_ports_range(void)
>>  	for (i = 0; i != RTE_DIM(test_data); i++)
>>  		data[i] = (uint8_t *)&test_data[i];
>> 
>> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
>>  		rte_acl_reset(acx);
>>  		ret = test_classify_buid(acx, test_rules, i + 1);
>> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
>>  		return -1;
>>  	}
>> 
>> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>> +
>
> As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
> with scalar one, right?
> That looks like reduction of test coverage for x86.

In one way, you're right.  However, the tests weren't testing what they
purported anyway.  Actually, it's just a shift I think (previously, it
would have tested the AVX2 but I don't see AVX2 having a fallback into
the SSE code - unlike the SSE code falling back into scalar).

The tests were failing for a number of reasons when built with meson,
and on the systems I tested with (including tests under travis).

1. Any meson build that I observed didn't correctly fill anything but
   the scalar variable.  I had to remove the -ENOTSUP definitions in the
   rte_acl.c file (forgot to git add it), and make the second version.

2. The tests never selected scalar, or nor sse implementations.  Rather,
   they selected only what the currently running platform provided.
   This meant that I was always seeing the AVX2 code executed, but never
   SSE nor scalar (but for one case) - at least as far as I could see.

There were others - I iterated on these for a few days.

This is why I changed a block to run through each implementation in one
of the versions.

HOWEVER, it's still deficient.

We need to fully cover all the cases.  BUT it's better than the failure
that currently happens on almost every system I've tried - including
shipping the build to travis to run.  So, I figured running > failing with
almost no reason why.  And looking into the failure revealed that the
meson build didn't even include the platform specific builds.

During my rework, I can change the test cases to iterate as in other
test cases.  It will extend the time.  And I don't know how to resolve
the case where we run on a system that doesn't support AVX2 but we have
a compiler that supports AVX2 (since that case will fail - but we
shouldn't even attempt it).

WDYT?

> Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 13:01     ` Aaron Conole
@ 2019-04-09 13:01       ` Aaron Conole
  2019-04-09 16:03       ` Ananyev, Konstantin
  1 sibling, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-09 13:01 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: dev, Jerin Jacob, Gavin Hu, Bruce Richardson, Michael Santana

"Ananyev, Konstantin" <konstantin.ananyev@intel.com> writes:

> Hi Aaron,
>
>> 
>> This makes the tests pass, and also ensures that on platforms where the
>> testing is supported, we can properly test the implementation specific
>> code.  One edge case is when we run on x86_64 systems that don't support
>> AVX2, but where the compiler can generate such instructions.  That could
>> be an enhancement in the future, but for now at least the tests will
>> pass.
>> 
>> Signed-off-by: Aaron Conole <aconole@redhat.com>
>> ---
>>  app/test/test_acl.c             | 62 +++++++++++++--------------------
>>  lib/librte_acl/Makefile         |  1 +
>>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
>>  lib/librte_acl/meson.build      |  4 +--
>>  4 files changed, 73 insertions(+), 40 deletions(-)
>>  create mode 100644 lib/librte_acl/acl_run_notsup.c
>> 
>> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
>> index b1f75d1bc..c44faa251 100644
>> --- a/app/test/test_acl.c
>> +++ b/app/test/test_acl.c
>> @@ -408,6 +408,9 @@ test_classify(void)
>>  		return -1;
>>  	}
>> 
>> +	/* Always use the scalar testing for now. */
>> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>> +
>>  	ret = 0;
>>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
>> 
>> @@ -547,6 +550,7 @@ test_build_ports_range(void)
>>  	for (i = 0; i != RTE_DIM(test_data); i++)
>>  		data[i] = (uint8_t *)&test_data[i];
>> 
>> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
>>  		rte_acl_reset(acx);
>>  		ret = test_classify_buid(acx, test_rules, i + 1);
>> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
>>  		return -1;
>>  	}
>> 
>> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>> +
>
> As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
> with scalar one, right?
> That looks like reduction of test coverage for x86.

In one way, you're right.  However, the tests weren't testing what they
purported anyway.  Actually, it's just a shift I think (previously, it
would have tested the AVX2 but I don't see AVX2 having a fallback into
the SSE code - unlike the SSE code falling back into scalar).

The tests were failing for a number of reasons when built with meson,
and on the systems I tested with (including tests under travis).

1. Any meson build that I observed didn't correctly fill anything but
   the scalar variable.  I had to remove the -ENOTSUP definitions in the
   rte_acl.c file (forgot to git add it), and make the second version.

2. The tests never selected scalar, or nor sse implementations.  Rather,
   they selected only what the currently running platform provided.
   This meant that I was always seeing the AVX2 code executed, but never
   SSE nor scalar (but for one case) - at least as far as I could see.

There were others - I iterated on these for a few days.

This is why I changed a block to run through each implementation in one
of the versions.

HOWEVER, it's still deficient.

We need to fully cover all the cases.  BUT it's better than the failure
that currently happens on almost every system I've tried - including
shipping the build to travis to run.  So, I figured running > failing with
almost no reason why.  And looking into the failure revealed that the
meson build didn't even include the platform specific builds.

During my rework, I can change the test cases to iterate as in other
test cases.  It will extend the time.  And I don't know how to resolve
the case where we run on a system that doesn't support AVX2 but we have
a compiler that supports AVX2 (since that case will fail - but we
shouldn't even attempt it).

WDYT?

> Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 13:01     ` Aaron Conole
  2019-04-09 13:01       ` Aaron Conole
@ 2019-04-09 16:03       ` Ananyev, Konstantin
  2019-04-09 16:03         ` Ananyev, Konstantin
                           ` (2 more replies)
  1 sibling, 3 replies; 44+ messages in thread
From: Ananyev, Konstantin @ 2019-04-09 16:03 UTC (permalink / raw)
  To: Aaron Conole
  Cc: dev, Jerin Jacob, Gavin Hu, Richardson, Bruce, Michael Santana


> > Hi Aaron,
> >
> >>
> >> This makes the tests pass, and also ensures that on platforms where the
> >> testing is supported, we can properly test the implementation specific
> >> code.  One edge case is when we run on x86_64 systems that don't support
> >> AVX2, but where the compiler can generate such instructions.  That could
> >> be an enhancement in the future, but for now at least the tests will
> >> pass.
> >>
> >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> >> ---
> >>  app/test/test_acl.c             | 62 +++++++++++++--------------------
> >>  lib/librte_acl/Makefile         |  1 +
> >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> >>  lib/librte_acl/meson.build      |  4 +--
> >>  4 files changed, 73 insertions(+), 40 deletions(-)
> >>  create mode 100644 lib/librte_acl/acl_run_notsup.c
> >>
> >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> >> index b1f75d1bc..c44faa251 100644
> >> --- a/app/test/test_acl.c
> >> +++ b/app/test/test_acl.c
> >> @@ -408,6 +408,9 @@ test_classify(void)
> >>  		return -1;
> >>  	}
> >>
> >> +	/* Always use the scalar testing for now. */
> >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >> +
> >>  	ret = 0;
> >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> >>
> >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> >>  		data[i] = (uint8_t *)&test_data[i];
> >>
> >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> >>  		rte_acl_reset(acx);
> >>  		ret = test_classify_buid(acx, test_rules, i + 1);
> >> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
> >>  		return -1;
> >>  	}
> >>
> >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >> +
> >
> > As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
> > with scalar one, right?
> > That looks like reduction of test coverage for x86.
> 
> In one way, you're right.  However, the tests weren't testing what they
> purported anyway.

Could you explain a bit more here?
What I am seeing: tests were running bot sse(or avx2) and scalar classify() method.
Now they always running scalar only.
To me it definitely looks like reduction in coverage.

>  Actually, it's just a shift I think (previously, it
> would have tested the AVX2 but I don't see AVX2 having a fallback into
> the SSE code - unlike the SSE code falling back into scalar).

Not sure I understand you here.
What fallback for AVX2 you expect that you think is missing?

> 
> The tests were failing for a number of reasons when built with meson,

Ok, but with legacy build system (make) on x86 all tests passes, right?
So the problem is in new build system, not in the test itself.
Why we should compromise our test coverage to make it work with
new tools? 
That just hides the problem without fixing it.
Instead I think the build system needs to be fixed.
Looking at it a bit closely, for .so meson+ninja generates code with
correct version of the function:
 
nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep acl_classify_sse
000000000000fa50 t rte_acl_classify_sse

So for 'meson -Ddefault_library=shared'
acl_autotest passes without the problem. 

Though for static lib we have both:
nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep acl_classify_sse
0000000000000000 W rte_acl_classify_sse
0000000000004880 T rte_acl_classify_sse

And then linker pickups the wrong one:
nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep acl_classify_sse
00000000005f6100 W rte_acl_classify_sse

While for make:
$ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep acl_classify_sse 
0000000000000000 W rte_acl_classify_sse
0000000000004880 T rte_acl_classify_sse
$ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
0000000000240440 T rte_acl_classify_sse

Linker pickups the right one.


> and on the systems I tested with (including tests under travis).
> 
> 1. Any meson build that I observed didn't correctly fill anything but
>    the scalar variable.  I had to remove the -ENOTSUP definitions in the
>    rte_acl.c file (forgot to git add it), and make the second version.
> 
> 2. The tests never selected scalar, or nor sse implementations. 

As I can see test_classify_run() *always* run both default method (sse/avx2 on x86)
and then scalar one. 

>  Rather,
>    they selected only what the currently running platform provided.
>    This meant that I was always seeing the AVX2 code executed, but never
>    SSE nor scalar (but for one case) - at least as far as I could see.
> 
> There were others - I iterated on these for a few days.
> 
> This is why I changed a block to run through each implementation in one
> of the versions.
> 
> HOWEVER, it's still deficient.
> 
> We need to fully cover all the cases.  BUT it's better than the failure
> that currently happens on almost every system I've tried - including
> shipping the build to travis to run.  So, I figured running > failing with
> almost no reason why.  And looking into the failure revealed that the
> meson build didn't even include the platform specific builds.
> 
> During my rework, I can change the test cases to iterate as in other
> test cases.  It will extend the time.  And I don't know how to resolve
> the case where we run on a system that doesn't support AVX2 but we have
> a compiler that supports AVX2 (since that case will fail - but we
> shouldn't even attempt it).

I don't see why that should happen.
At rte_acl_init() we do check does that machine supports AVX2(SSE, NEON) 
instructions or not.
Are you saying under some circumstances rte_acl_init() are not working properly,
or not get invoked?

Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 16:03       ` Ananyev, Konstantin
@ 2019-04-09 16:03         ` Ananyev, Konstantin
  2019-04-09 17:04         ` Ananyev, Konstantin
  2019-04-09 17:05         ` Richardson, Bruce
  2 siblings, 0 replies; 44+ messages in thread
From: Ananyev, Konstantin @ 2019-04-09 16:03 UTC (permalink / raw)
  To: Aaron Conole
  Cc: dev, Jerin Jacob, Gavin Hu, Richardson, Bruce, Michael Santana


> > Hi Aaron,
> >
> >>
> >> This makes the tests pass, and also ensures that on platforms where the
> >> testing is supported, we can properly test the implementation specific
> >> code.  One edge case is when we run on x86_64 systems that don't support
> >> AVX2, but where the compiler can generate such instructions.  That could
> >> be an enhancement in the future, but for now at least the tests will
> >> pass.
> >>
> >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> >> ---
> >>  app/test/test_acl.c             | 62 +++++++++++++--------------------
> >>  lib/librte_acl/Makefile         |  1 +
> >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> >>  lib/librte_acl/meson.build      |  4 +--
> >>  4 files changed, 73 insertions(+), 40 deletions(-)
> >>  create mode 100644 lib/librte_acl/acl_run_notsup.c
> >>
> >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> >> index b1f75d1bc..c44faa251 100644
> >> --- a/app/test/test_acl.c
> >> +++ b/app/test/test_acl.c
> >> @@ -408,6 +408,9 @@ test_classify(void)
> >>  		return -1;
> >>  	}
> >>
> >> +	/* Always use the scalar testing for now. */
> >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >> +
> >>  	ret = 0;
> >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> >>
> >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> >>  		data[i] = (uint8_t *)&test_data[i];
> >>
> >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> >>  		rte_acl_reset(acx);
> >>  		ret = test_classify_buid(acx, test_rules, i + 1);
> >> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
> >>  		return -1;
> >>  	}
> >>
> >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >> +
> >
> > As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
> > with scalar one, right?
> > That looks like reduction of test coverage for x86.
> 
> In one way, you're right.  However, the tests weren't testing what they
> purported anyway.

Could you explain a bit more here?
What I am seeing: tests were running bot sse(or avx2) and scalar classify() method.
Now they always running scalar only.
To me it definitely looks like reduction in coverage.

>  Actually, it's just a shift I think (previously, it
> would have tested the AVX2 but I don't see AVX2 having a fallback into
> the SSE code - unlike the SSE code falling back into scalar).

Not sure I understand you here.
What fallback for AVX2 you expect that you think is missing?

> 
> The tests were failing for a number of reasons when built with meson,

Ok, but with legacy build system (make) on x86 all tests passes, right?
So the problem is in new build system, not in the test itself.
Why we should compromise our test coverage to make it work with
new tools? 
That just hides the problem without fixing it.
Instead I think the build system needs to be fixed.
Looking at it a bit closely, for .so meson+ninja generates code with
correct version of the function:
 
nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep acl_classify_sse
000000000000fa50 t rte_acl_classify_sse

So for 'meson -Ddefault_library=shared'
acl_autotest passes without the problem. 

Though for static lib we have both:
nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep acl_classify_sse
0000000000000000 W rte_acl_classify_sse
0000000000004880 T rte_acl_classify_sse

And then linker pickups the wrong one:
nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep acl_classify_sse
00000000005f6100 W rte_acl_classify_sse

While for make:
$ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep acl_classify_sse 
0000000000000000 W rte_acl_classify_sse
0000000000004880 T rte_acl_classify_sse
$ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
0000000000240440 T rte_acl_classify_sse

Linker pickups the right one.


> and on the systems I tested with (including tests under travis).
> 
> 1. Any meson build that I observed didn't correctly fill anything but
>    the scalar variable.  I had to remove the -ENOTSUP definitions in the
>    rte_acl.c file (forgot to git add it), and make the second version.
> 
> 2. The tests never selected scalar, or nor sse implementations. 

As I can see test_classify_run() *always* run both default method (sse/avx2 on x86)
and then scalar one. 

>  Rather,
>    they selected only what the currently running platform provided.
>    This meant that I was always seeing the AVX2 code executed, but never
>    SSE nor scalar (but for one case) - at least as far as I could see.
> 
> There were others - I iterated on these for a few days.
> 
> This is why I changed a block to run through each implementation in one
> of the versions.
> 
> HOWEVER, it's still deficient.
> 
> We need to fully cover all the cases.  BUT it's better than the failure
> that currently happens on almost every system I've tried - including
> shipping the build to travis to run.  So, I figured running > failing with
> almost no reason why.  And looking into the failure revealed that the
> meson build didn't even include the platform specific builds.
> 
> During my rework, I can change the test cases to iterate as in other
> test cases.  It will extend the time.  And I don't know how to resolve
> the case where we run on a system that doesn't support AVX2 but we have
> a compiler that supports AVX2 (since that case will fail - but we
> shouldn't even attempt it).

I don't see why that should happen.
At rte_acl_init() we do check does that machine supports AVX2(SSE, NEON) 
instructions or not.
Are you saying under some circumstances rte_acl_init() are not working properly,
or not get invoked?

Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 16:03       ` Ananyev, Konstantin
  2019-04-09 16:03         ` Ananyev, Konstantin
@ 2019-04-09 17:04         ` Ananyev, Konstantin
  2019-04-09 17:04           ` Ananyev, Konstantin
                             ` (2 more replies)
  2019-04-09 17:05         ` Richardson, Bruce
  2 siblings, 3 replies; 44+ messages in thread
From: Ananyev, Konstantin @ 2019-04-09 17:04 UTC (permalink / raw)
  To: Ananyev, Konstantin, Aaron Conole
  Cc: dev, Jerin Jacob, Gavin Hu, Richardson, Bruce, Michael Santana



> 
> > > Hi Aaron,
> > >
> > >>
> > >> This makes the tests pass, and also ensures that on platforms where the
> > >> testing is supported, we can properly test the implementation specific
> > >> code.  One edge case is when we run on x86_64 systems that don't support
> > >> AVX2, but where the compiler can generate such instructions.  That could
> > >> be an enhancement in the future, but for now at least the tests will
> > >> pass.
> > >>
> > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> > >> ---
> > >>  app/test/test_acl.c             | 62 +++++++++++++--------------------
> > >>  lib/librte_acl/Makefile         |  1 +
> > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> > >>  lib/librte_acl/meson.build      |  4 +--
> > >>  4 files changed, 73 insertions(+), 40 deletions(-)
> > >>  create mode 100644 lib/librte_acl/acl_run_notsup.c
> > >>
> > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> > >> index b1f75d1bc..c44faa251 100644
> > >> --- a/app/test/test_acl.c
> > >> +++ b/app/test/test_acl.c
> > >> @@ -408,6 +408,9 @@ test_classify(void)
> > >>  		return -1;
> > >>  	}
> > >>
> > >> +	/* Always use the scalar testing for now. */
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >> +
> > >>  	ret = 0;
> > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> > >>
> > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> > >>  		data[i] = (uint8_t *)&test_data[i];
> > >>
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> > >>  		rte_acl_reset(acx);
> > >>  		ret = test_classify_buid(acx, test_rules, i + 1);
> > >> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
> > >>  		return -1;
> > >>  	}
> > >>
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >> +
> > >
> > > As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
> > > with scalar one, right?
> > > That looks like reduction of test coverage for x86.
> >
> > In one way, you're right.  However, the tests weren't testing what they
> > purported anyway.
> 
> Could you explain a bit more here?
> What I am seeing: tests were running bot sse(or avx2) and scalar classify() method.
> Now they always running scalar only.
> To me it definitely looks like reduction in coverage.
> 
> >  Actually, it's just a shift I think (previously, it
> > would have tested the AVX2 but I don't see AVX2 having a fallback into
> > the SSE code - unlike the SSE code falling back into scalar).
> 
> Not sure I understand you here.
> What fallback for AVX2 you expect that you think is missing?
> 
> >
> > The tests were failing for a number of reasons when built with meson,
> 
> Ok, but with legacy build system (make) on x86 all tests passes, right?
> So the problem is in new build system, not in the test itself.
> Why we should compromise our test coverage to make it work with
> new tools?
> That just hides the problem without fixing it.
> Instead I think the build system needs to be fixed.
> Looking at it a bit closely, for .so meson+ninja generates code with
> correct version of the function:
> 
> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep acl_classify_sse
> 000000000000fa50 t rte_acl_classify_sse
> 
> So for 'meson -Ddefault_library=shared'
> acl_autotest passes without the problem.
> 
> Though for static lib we have both:
> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep acl_classify_sse
> 0000000000000000 W rte_acl_classify_sse
> 0000000000004880 T rte_acl_classify_sse
> 
> And then linker pickups the wrong one:
> nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep acl_classify_sse
> 00000000005f6100 W rte_acl_classify_sse
> 
> While for make:
> $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep acl_classify_sse
> 0000000000000000 W rte_acl_classify_sse
> 0000000000004880 T rte_acl_classify_sse
> $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> 0000000000240440 T rte_acl_classify_sse
> 
> Linker pickups the right one.

And the changes below make linker to pick-up the proper version of the function
and make acl_autotest to pass for static build too.

diff --git a/app/test/meson.build b/app/test/meson.build
index 867cc5863..4364be932 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -328,6 +328,7 @@ test_dep_objs += cc.find_library('execinfo', required: false)
 link_libs = []
 if get_option('default_library') == 'static'
        link_libs = dpdk_drivers
+       link_libs += dpdk_static_libraries
 endif

 if get_option('tests')
diff --git a/meson.build b/meson.build
index a96486597..df1e1c41c 100644
--- a/meson.build
+++ b/meson.build
@@ -62,6 +62,7 @@ configure_file(output: build_cfg,
 # for static builds, include the drivers as libs and we need to "whole-archive"
 # them.
 dpdk_drivers = ['-Wl,--whole-archive'] + dpdk_drivers + ['-Wl,--no-whole-archive']
+dpdk_static_libraries = ['-Wl,--whole-archive'] + dpdk_static_libraries + ['-Wl,--no-whole-archive']

Not saying that's the proper patch, but just to prove that linking librte_acl.a
with '--whole-archive' does fix the problem.
Bruce, could you point is the best way to fix things here
(my meson knowledge is very limited)?
Do we need extra container here as 'whole_archive_static_libraries[]' or so?
Thanks
Konstantin


> 
> 
> > and on the systems I tested with (including tests under travis).
> >
> > 1. Any meson build that I observed didn't correctly fill anything but
> >    the scalar variable.  I had to remove the -ENOTSUP definitions in the
> >    rte_acl.c file (forgot to git add it), and make the second version.
> >
> > 2. The tests never selected scalar, or nor sse implementations.
> 
> As I can see test_classify_run() *always* run both default method (sse/avx2 on x86)
> and then scalar one.
> 
> >  Rather,
> >    they selected only what the currently running platform provided.
> >    This meant that I was always seeing the AVX2 code executed, but never
> >    SSE nor scalar (but for one case) - at least as far as I could see.
> >
> > There were others - I iterated on these for a few days.
> >
> > This is why I changed a block to run through each implementation in one
> > of the versions.
> >
> > HOWEVER, it's still deficient.
> >
> > We need to fully cover all the cases.  BUT it's better than the failure
> > that currently happens on almost every system I've tried - including
> > shipping the build to travis to run.  So, I figured running > failing with
> > almost no reason why.  And looking into the failure revealed that the
> > meson build didn't even include the platform specific builds.
> >
> > During my rework, I can change the test cases to iterate as in other
> > test cases.  It will extend the time.  And I don't know how to resolve
> > the case where we run on a system that doesn't support AVX2 but we have
> > a compiler that supports AVX2 (since that case will fail - but we
> > shouldn't even attempt it).
> 
> I don't see why that should happen.
> At rte_acl_init() we do check does that machine supports AVX2(SSE, NEON)
> instructions or not.
> Are you saying under some circumstances rte_acl_init() are not working properly,
> or not get invoked?
> 
> Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 17:04         ` Ananyev, Konstantin
@ 2019-04-09 17:04           ` Ananyev, Konstantin
  2019-04-10  8:13           ` Richardson, Bruce
  2019-04-10 13:10           ` Aaron Conole
  2 siblings, 0 replies; 44+ messages in thread
From: Ananyev, Konstantin @ 2019-04-09 17:04 UTC (permalink / raw)
  To: Ananyev, Konstantin, Aaron Conole
  Cc: dev, Jerin Jacob, Gavin Hu, Richardson, Bruce, Michael Santana



> 
> > > Hi Aaron,
> > >
> > >>
> > >> This makes the tests pass, and also ensures that on platforms where the
> > >> testing is supported, we can properly test the implementation specific
> > >> code.  One edge case is when we run on x86_64 systems that don't support
> > >> AVX2, but where the compiler can generate such instructions.  That could
> > >> be an enhancement in the future, but for now at least the tests will
> > >> pass.
> > >>
> > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> > >> ---
> > >>  app/test/test_acl.c             | 62 +++++++++++++--------------------
> > >>  lib/librte_acl/Makefile         |  1 +
> > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> > >>  lib/librte_acl/meson.build      |  4 +--
> > >>  4 files changed, 73 insertions(+), 40 deletions(-)
> > >>  create mode 100644 lib/librte_acl/acl_run_notsup.c
> > >>
> > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> > >> index b1f75d1bc..c44faa251 100644
> > >> --- a/app/test/test_acl.c
> > >> +++ b/app/test/test_acl.c
> > >> @@ -408,6 +408,9 @@ test_classify(void)
> > >>  		return -1;
> > >>  	}
> > >>
> > >> +	/* Always use the scalar testing for now. */
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >> +
> > >>  	ret = 0;
> > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> > >>
> > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> > >>  		data[i] = (uint8_t *)&test_data[i];
> > >>
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> > >>  		rte_acl_reset(acx);
> > >>  		ret = test_classify_buid(acx, test_rules, i + 1);
> > >> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
> > >>  		return -1;
> > >>  	}
> > >>
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >> +
> > >
> > > As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
> > > with scalar one, right?
> > > That looks like reduction of test coverage for x86.
> >
> > In one way, you're right.  However, the tests weren't testing what they
> > purported anyway.
> 
> Could you explain a bit more here?
> What I am seeing: tests were running bot sse(or avx2) and scalar classify() method.
> Now they always running scalar only.
> To me it definitely looks like reduction in coverage.
> 
> >  Actually, it's just a shift I think (previously, it
> > would have tested the AVX2 but I don't see AVX2 having a fallback into
> > the SSE code - unlike the SSE code falling back into scalar).
> 
> Not sure I understand you here.
> What fallback for AVX2 you expect that you think is missing?
> 
> >
> > The tests were failing for a number of reasons when built with meson,
> 
> Ok, but with legacy build system (make) on x86 all tests passes, right?
> So the problem is in new build system, not in the test itself.
> Why we should compromise our test coverage to make it work with
> new tools?
> That just hides the problem without fixing it.
> Instead I think the build system needs to be fixed.
> Looking at it a bit closely, for .so meson+ninja generates code with
> correct version of the function:
> 
> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep acl_classify_sse
> 000000000000fa50 t rte_acl_classify_sse
> 
> So for 'meson -Ddefault_library=shared'
> acl_autotest passes without the problem.
> 
> Though for static lib we have both:
> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep acl_classify_sse
> 0000000000000000 W rte_acl_classify_sse
> 0000000000004880 T rte_acl_classify_sse
> 
> And then linker pickups the wrong one:
> nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep acl_classify_sse
> 00000000005f6100 W rte_acl_classify_sse
> 
> While for make:
> $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep acl_classify_sse
> 0000000000000000 W rte_acl_classify_sse
> 0000000000004880 T rte_acl_classify_sse
> $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> 0000000000240440 T rte_acl_classify_sse
> 
> Linker pickups the right one.

And the changes below make linker to pick-up the proper version of the function
and make acl_autotest to pass for static build too.

diff --git a/app/test/meson.build b/app/test/meson.build
index 867cc5863..4364be932 100644
--- a/app/test/meson.build
+++ b/app/test/meson.build
@@ -328,6 +328,7 @@ test_dep_objs += cc.find_library('execinfo', required: false)
 link_libs = []
 if get_option('default_library') == 'static'
        link_libs = dpdk_drivers
+       link_libs += dpdk_static_libraries
 endif

 if get_option('tests')
diff --git a/meson.build b/meson.build
index a96486597..df1e1c41c 100644
--- a/meson.build
+++ b/meson.build
@@ -62,6 +62,7 @@ configure_file(output: build_cfg,
 # for static builds, include the drivers as libs and we need to "whole-archive"
 # them.
 dpdk_drivers = ['-Wl,--whole-archive'] + dpdk_drivers + ['-Wl,--no-whole-archive']
+dpdk_static_libraries = ['-Wl,--whole-archive'] + dpdk_static_libraries + ['-Wl,--no-whole-archive']

Not saying that's the proper patch, but just to prove that linking librte_acl.a
with '--whole-archive' does fix the problem.
Bruce, could you point is the best way to fix things here
(my meson knowledge is very limited)?
Do we need extra container here as 'whole_archive_static_libraries[]' or so?
Thanks
Konstantin


> 
> 
> > and on the systems I tested with (including tests under travis).
> >
> > 1. Any meson build that I observed didn't correctly fill anything but
> >    the scalar variable.  I had to remove the -ENOTSUP definitions in the
> >    rte_acl.c file (forgot to git add it), and make the second version.
> >
> > 2. The tests never selected scalar, or nor sse implementations.
> 
> As I can see test_classify_run() *always* run both default method (sse/avx2 on x86)
> and then scalar one.
> 
> >  Rather,
> >    they selected only what the currently running platform provided.
> >    This meant that I was always seeing the AVX2 code executed, but never
> >    SSE nor scalar (but for one case) - at least as far as I could see.
> >
> > There were others - I iterated on these for a few days.
> >
> > This is why I changed a block to run through each implementation in one
> > of the versions.
> >
> > HOWEVER, it's still deficient.
> >
> > We need to fully cover all the cases.  BUT it's better than the failure
> > that currently happens on almost every system I've tried - including
> > shipping the build to travis to run.  So, I figured running > failing with
> > almost no reason why.  And looking into the failure revealed that the
> > meson build didn't even include the platform specific builds.
> >
> > During my rework, I can change the test cases to iterate as in other
> > test cases.  It will extend the time.  And I don't know how to resolve
> > the case where we run on a system that doesn't support AVX2 but we have
> > a compiler that supports AVX2 (since that case will fail - but we
> > shouldn't even attempt it).
> 
> I don't see why that should happen.
> At rte_acl_init() we do check does that machine supports AVX2(SSE, NEON)
> instructions or not.
> Are you saying under some circumstances rte_acl_init() are not working properly,
> or not get invoked?
> 
> Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 16:03       ` Ananyev, Konstantin
  2019-04-09 16:03         ` Ananyev, Konstantin
  2019-04-09 17:04         ` Ananyev, Konstantin
@ 2019-04-09 17:05         ` Richardson, Bruce
  2019-04-09 17:05           ` Richardson, Bruce
  2019-04-09 18:29           ` Ananyev, Konstantin
  2 siblings, 2 replies; 44+ messages in thread
From: Richardson, Bruce @ 2019-04-09 17:05 UTC (permalink / raw)
  To: Ananyev, Konstantin, Aaron Conole
  Cc: dev, Jerin Jacob, Gavin Hu, Michael Santana



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, April 9, 2019 5:03 PM
> To: Aaron Conole <aconole@redhat.com>
> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Gavin Hu
> <gavin.hu@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> Michael Santana <msantana@redhat.com>
> Subject: RE: [PATCH 3/3] acl: adjust the tests
> 
> 
> > > Hi Aaron,
> > >
> > >>
> > >> This makes the tests pass, and also ensures that on platforms where
> > >> the testing is supported, we can properly test the implementation
> > >> specific code.  One edge case is when we run on x86_64 systems that
> > >> don't support AVX2, but where the compiler can generate such
> > >> instructions.  That could be an enhancement in the future, but for
> > >> now at least the tests will pass.
> > >>
> > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> > >> ---
> > >>  app/test/test_acl.c             | 62 +++++++++++++------------------
> --
> > >>  lib/librte_acl/Makefile         |  1 +
> > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> > >>  lib/librte_acl/meson.build      |  4 +--
> > >>  4 files changed, 73 insertions(+), 40 deletions(-)  create mode
> > >> 100644 lib/librte_acl/acl_run_notsup.c
> > >>
> > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c index
> > >> b1f75d1bc..c44faa251 100644
> > >> --- a/app/test/test_acl.c
> > >> +++ b/app/test/test_acl.c
> > >> @@ -408,6 +408,9 @@ test_classify(void)
> > >>  		return -1;
> > >>  	}
> > >>
> > >> +	/* Always use the scalar testing for now. */
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >> +
> > >>  	ret = 0;
> > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> > >>
> > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> > >>  		data[i] = (uint8_t *)&test_data[i];
> > >>
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> > >>  		rte_acl_reset(acx);
> > >>  		ret = test_classify_buid(acx, test_rules, i + 1); @@ -
> 911,6
> > >> +915,8 @@ test_convert_rules(const char *desc,
> > >>  		return -1;
> > >>  	}
> > >>
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >> +
> > >
> > > As I understand here and above, on x86 you replaced default algo
> > > (SSE, AVX2) with scalar one, right?
> > > That looks like reduction of test coverage for x86.
> >
> > In one way, you're right.  However, the tests weren't testing what
> > they purported anyway.
> 
> Could you explain a bit more here?
> What I am seeing: tests were running bot sse(or avx2) and scalar
> classify() method.
> Now they always running scalar only.
> To me it definitely looks like reduction in coverage.
> 
> >  Actually, it's just a shift I think (previously, it would have tested
> > the AVX2 but I don't see AVX2 having a fallback into the SSE code -
> > unlike the SSE code falling back into scalar).
> 
> Not sure I understand you here.
> What fallback for AVX2 you expect that you think is missing?
> 
> >
> > The tests were failing for a number of reasons when built with meson,
> 
> Ok, but with legacy build system (make) on x86 all tests passes, right?
> So the problem is in new build system, not in the test itself.
> Why we should compromise our test coverage to make it work with new tools?
> That just hides the problem without fixing it.
> Instead I think the build system needs to be fixed.
> Looking at it a bit closely, for .so meson+ninja generates code with
> correct version of the function:
> 
> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep
> acl_classify_sse
> 000000000000fa50 t rte_acl_classify_sse
> 
> So for 'meson -Ddefault_library=shared'
> acl_autotest passes without the problem.
> 
> Though for static lib we have both:
> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep
> acl_classify_sse
> 0000000000000000 W rte_acl_classify_sse
> 0000000000004880 T rte_acl_classify_sse
> 
> And then linker pickups the wrong one:
> nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep
> acl_classify_sse
> 00000000005f6100 W rte_acl_classify_sse
> 
> While for make:
> $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep
> acl_classify_sse
> 0000000000000000 W rte_acl_classify_sse
> 0000000000004880 T rte_acl_classify_sse
> $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> 0000000000240440 T rte_acl_classify_sse
> 
> Linker pickups the right one.
> 

I assume the same issues occurs for AVX2, but for SSE specifically why do we even compile up two copies of the function for x86 platforms, since SSE will always be supported?

/Bruce

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 17:05         ` Richardson, Bruce
@ 2019-04-09 17:05           ` Richardson, Bruce
  2019-04-09 18:29           ` Ananyev, Konstantin
  1 sibling, 0 replies; 44+ messages in thread
From: Richardson, Bruce @ 2019-04-09 17:05 UTC (permalink / raw)
  To: Ananyev, Konstantin, Aaron Conole
  Cc: dev, Jerin Jacob, Gavin Hu, Michael Santana



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, April 9, 2019 5:03 PM
> To: Aaron Conole <aconole@redhat.com>
> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Gavin Hu
> <gavin.hu@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> Michael Santana <msantana@redhat.com>
> Subject: RE: [PATCH 3/3] acl: adjust the tests
> 
> 
> > > Hi Aaron,
> > >
> > >>
> > >> This makes the tests pass, and also ensures that on platforms where
> > >> the testing is supported, we can properly test the implementation
> > >> specific code.  One edge case is when we run on x86_64 systems that
> > >> don't support AVX2, but where the compiler can generate such
> > >> instructions.  That could be an enhancement in the future, but for
> > >> now at least the tests will pass.
> > >>
> > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> > >> ---
> > >>  app/test/test_acl.c             | 62 +++++++++++++------------------
> --
> > >>  lib/librte_acl/Makefile         |  1 +
> > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> > >>  lib/librte_acl/meson.build      |  4 +--
> > >>  4 files changed, 73 insertions(+), 40 deletions(-)  create mode
> > >> 100644 lib/librte_acl/acl_run_notsup.c
> > >>
> > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c index
> > >> b1f75d1bc..c44faa251 100644
> > >> --- a/app/test/test_acl.c
> > >> +++ b/app/test/test_acl.c
> > >> @@ -408,6 +408,9 @@ test_classify(void)
> > >>  		return -1;
> > >>  	}
> > >>
> > >> +	/* Always use the scalar testing for now. */
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >> +
> > >>  	ret = 0;
> > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> > >>
> > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> > >>  		data[i] = (uint8_t *)&test_data[i];
> > >>
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> > >>  		rte_acl_reset(acx);
> > >>  		ret = test_classify_buid(acx, test_rules, i + 1); @@ -
> 911,6
> > >> +915,8 @@ test_convert_rules(const char *desc,
> > >>  		return -1;
> > >>  	}
> > >>
> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > >> +
> > >
> > > As I understand here and above, on x86 you replaced default algo
> > > (SSE, AVX2) with scalar one, right?
> > > That looks like reduction of test coverage for x86.
> >
> > In one way, you're right.  However, the tests weren't testing what
> > they purported anyway.
> 
> Could you explain a bit more here?
> What I am seeing: tests were running bot sse(or avx2) and scalar
> classify() method.
> Now they always running scalar only.
> To me it definitely looks like reduction in coverage.
> 
> >  Actually, it's just a shift I think (previously, it would have tested
> > the AVX2 but I don't see AVX2 having a fallback into the SSE code -
> > unlike the SSE code falling back into scalar).
> 
> Not sure I understand you here.
> What fallback for AVX2 you expect that you think is missing?
> 
> >
> > The tests were failing for a number of reasons when built with meson,
> 
> Ok, but with legacy build system (make) on x86 all tests passes, right?
> So the problem is in new build system, not in the test itself.
> Why we should compromise our test coverage to make it work with new tools?
> That just hides the problem without fixing it.
> Instead I think the build system needs to be fixed.
> Looking at it a bit closely, for .so meson+ninja generates code with
> correct version of the function:
> 
> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep
> acl_classify_sse
> 000000000000fa50 t rte_acl_classify_sse
> 
> So for 'meson -Ddefault_library=shared'
> acl_autotest passes without the problem.
> 
> Though for static lib we have both:
> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep
> acl_classify_sse
> 0000000000000000 W rte_acl_classify_sse
> 0000000000004880 T rte_acl_classify_sse
> 
> And then linker pickups the wrong one:
> nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep
> acl_classify_sse
> 00000000005f6100 W rte_acl_classify_sse
> 
> While for make:
> $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep
> acl_classify_sse
> 0000000000000000 W rte_acl_classify_sse
> 0000000000004880 T rte_acl_classify_sse
> $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> 0000000000240440 T rte_acl_classify_sse
> 
> Linker pickups the right one.
> 

I assume the same issues occurs for AVX2, but for SSE specifically why do we even compile up two copies of the function for x86 platforms, since SSE will always be supported?

/Bruce

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 17:05         ` Richardson, Bruce
  2019-04-09 17:05           ` Richardson, Bruce
@ 2019-04-09 18:29           ` Ananyev, Konstantin
  2019-04-09 18:29             ` Ananyev, Konstantin
  2019-04-10  9:06             ` Bruce Richardson
  1 sibling, 2 replies; 44+ messages in thread
From: Ananyev, Konstantin @ 2019-04-09 18:29 UTC (permalink / raw)
  To: Richardson, Bruce, Aaron Conole
  Cc: dev, Jerin Jacob, Gavin Hu, Michael Santana


> >
> > > > Hi Aaron,
> > > >
> > > >>
> > > >> This makes the tests pass, and also ensures that on platforms where
> > > >> the testing is supported, we can properly test the implementation
> > > >> specific code.  One edge case is when we run on x86_64 systems that
> > > >> don't support AVX2, but where the compiler can generate such
> > > >> instructions.  That could be an enhancement in the future, but for
> > > >> now at least the tests will pass.
> > > >>
> > > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > >> ---
> > > >>  app/test/test_acl.c             | 62 +++++++++++++------------------
> > --
> > > >>  lib/librte_acl/Makefile         |  1 +
> > > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> > > >>  lib/librte_acl/meson.build      |  4 +--
> > > >>  4 files changed, 73 insertions(+), 40 deletions(-)  create mode
> > > >> 100644 lib/librte_acl/acl_run_notsup.c
> > > >>
> > > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c index
> > > >> b1f75d1bc..c44faa251 100644
> > > >> --- a/app/test/test_acl.c
> > > >> +++ b/app/test/test_acl.c
> > > >> @@ -408,6 +408,9 @@ test_classify(void)
> > > >>  		return -1;
> > > >>  	}
> > > >>
> > > >> +	/* Always use the scalar testing for now. */
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >> +
> > > >>  	ret = 0;
> > > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> > > >>
> > > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> > > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> > > >>  		data[i] = (uint8_t *)&test_data[i];
> > > >>
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> > > >>  		rte_acl_reset(acx);
> > > >>  		ret = test_classify_buid(acx, test_rules, i + 1); @@ -
> > 911,6
> > > >> +915,8 @@ test_convert_rules(const char *desc,
> > > >>  		return -1;
> > > >>  	}
> > > >>
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >> +
> > > >
> > > > As I understand here and above, on x86 you replaced default algo
> > > > (SSE, AVX2) with scalar one, right?
> > > > That looks like reduction of test coverage for x86.
> > >
> > > In one way, you're right.  However, the tests weren't testing what
> > > they purported anyway.
> >
> > Could you explain a bit more here?
> > What I am seeing: tests were running bot sse(or avx2) and scalar
> > classify() method.
> > Now they always running scalar only.
> > To me it definitely looks like reduction in coverage.
> >
> > >  Actually, it's just a shift I think (previously, it would have tested
> > > the AVX2 but I don't see AVX2 having a fallback into the SSE code -
> > > unlike the SSE code falling back into scalar).
> >
> > Not sure I understand you here.
> > What fallback for AVX2 you expect that you think is missing?
> >
> > >
> > > The tests were failing for a number of reasons when built with meson,
> >
> > Ok, but with legacy build system (make) on x86 all tests passes, right?
> > So the problem is in new build system, not in the test itself.
> > Why we should compromise our test coverage to make it work with new tools?
> > That just hides the problem without fixing it.
> > Instead I think the build system needs to be fixed.
> > Looking at it a bit closely, for .so meson+ninja generates code with
> > correct version of the function:
> >
> > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep
> > acl_classify_sse
> > 000000000000fa50 t rte_acl_classify_sse
> >
> > So for 'meson -Ddefault_library=shared'
> > acl_autotest passes without the problem.
> >
> > Though for static lib we have both:
> > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep
> > acl_classify_sse
> > 0000000000000000 W rte_acl_classify_sse
> > 0000000000004880 T rte_acl_classify_sse
> >
> > And then linker pickups the wrong one:
> > nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep
> > acl_classify_sse
> > 00000000005f6100 W rte_acl_classify_sse
> >
> > While for make:
> > $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep
> > acl_classify_sse
> > 0000000000000000 W rte_acl_classify_sse
> > 0000000000004880 T rte_acl_classify_sse
> > $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> > 0000000000240440 T rte_acl_classify_sse
> >
> > Linker pickups the right one.
> >
> 
> I assume the same issues occurs for AVX2, 

Yes, I just used sse because it is always available on x86. 

but for SSE specifically why do we even compile up two copies of the function for x86 platforms,
> since SSE will always be supported?

for non IA  platforms.
Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 18:29           ` Ananyev, Konstantin
@ 2019-04-09 18:29             ` Ananyev, Konstantin
  2019-04-10  9:06             ` Bruce Richardson
  1 sibling, 0 replies; 44+ messages in thread
From: Ananyev, Konstantin @ 2019-04-09 18:29 UTC (permalink / raw)
  To: Richardson, Bruce, Aaron Conole
  Cc: dev, Jerin Jacob, Gavin Hu, Michael Santana


> >
> > > > Hi Aaron,
> > > >
> > > >>
> > > >> This makes the tests pass, and also ensures that on platforms where
> > > >> the testing is supported, we can properly test the implementation
> > > >> specific code.  One edge case is when we run on x86_64 systems that
> > > >> don't support AVX2, but where the compiler can generate such
> > > >> instructions.  That could be an enhancement in the future, but for
> > > >> now at least the tests will pass.
> > > >>
> > > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > >> ---
> > > >>  app/test/test_acl.c             | 62 +++++++++++++------------------
> > --
> > > >>  lib/librte_acl/Makefile         |  1 +
> > > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> > > >>  lib/librte_acl/meson.build      |  4 +--
> > > >>  4 files changed, 73 insertions(+), 40 deletions(-)  create mode
> > > >> 100644 lib/librte_acl/acl_run_notsup.c
> > > >>
> > > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c index
> > > >> b1f75d1bc..c44faa251 100644
> > > >> --- a/app/test/test_acl.c
> > > >> +++ b/app/test/test_acl.c
> > > >> @@ -408,6 +408,9 @@ test_classify(void)
> > > >>  		return -1;
> > > >>  	}
> > > >>
> > > >> +	/* Always use the scalar testing for now. */
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >> +
> > > >>  	ret = 0;
> > > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> > > >>
> > > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> > > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> > > >>  		data[i] = (uint8_t *)&test_data[i];
> > > >>
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> > > >>  		rte_acl_reset(acx);
> > > >>  		ret = test_classify_buid(acx, test_rules, i + 1); @@ -
> > 911,6
> > > >> +915,8 @@ test_convert_rules(const char *desc,
> > > >>  		return -1;
> > > >>  	}
> > > >>
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >> +
> > > >
> > > > As I understand here and above, on x86 you replaced default algo
> > > > (SSE, AVX2) with scalar one, right?
> > > > That looks like reduction of test coverage for x86.
> > >
> > > In one way, you're right.  However, the tests weren't testing what
> > > they purported anyway.
> >
> > Could you explain a bit more here?
> > What I am seeing: tests were running bot sse(or avx2) and scalar
> > classify() method.
> > Now they always running scalar only.
> > To me it definitely looks like reduction in coverage.
> >
> > >  Actually, it's just a shift I think (previously, it would have tested
> > > the AVX2 but I don't see AVX2 having a fallback into the SSE code -
> > > unlike the SSE code falling back into scalar).
> >
> > Not sure I understand you here.
> > What fallback for AVX2 you expect that you think is missing?
> >
> > >
> > > The tests were failing for a number of reasons when built with meson,
> >
> > Ok, but with legacy build system (make) on x86 all tests passes, right?
> > So the problem is in new build system, not in the test itself.
> > Why we should compromise our test coverage to make it work with new tools?
> > That just hides the problem without fixing it.
> > Instead I think the build system needs to be fixed.
> > Looking at it a bit closely, for .so meson+ninja generates code with
> > correct version of the function:
> >
> > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep
> > acl_classify_sse
> > 000000000000fa50 t rte_acl_classify_sse
> >
> > So for 'meson -Ddefault_library=shared'
> > acl_autotest passes without the problem.
> >
> > Though for static lib we have both:
> > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep
> > acl_classify_sse
> > 0000000000000000 W rte_acl_classify_sse
> > 0000000000004880 T rte_acl_classify_sse
> >
> > And then linker pickups the wrong one:
> > nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep
> > acl_classify_sse
> > 00000000005f6100 W rte_acl_classify_sse
> >
> > While for make:
> > $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep
> > acl_classify_sse
> > 0000000000000000 W rte_acl_classify_sse
> > 0000000000004880 T rte_acl_classify_sse
> > $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> > 0000000000240440 T rte_acl_classify_sse
> >
> > Linker pickups the right one.
> >
> 
> I assume the same issues occurs for AVX2, 

Yes, I just used sse because it is always available on x86. 

but for SSE specifically why do we even compile up two copies of the function for x86 platforms,
> since SSE will always be supported?

for non IA  platforms.
Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 17:04         ` Ananyev, Konstantin
  2019-04-09 17:04           ` Ananyev, Konstantin
@ 2019-04-10  8:13           ` Richardson, Bruce
  2019-04-10  8:13             ` Richardson, Bruce
  2019-04-10 13:10           ` Aaron Conole
  2 siblings, 1 reply; 44+ messages in thread
From: Richardson, Bruce @ 2019-04-10  8:13 UTC (permalink / raw)
  To: Ananyev, Konstantin, Aaron Conole
  Cc: dev, Jerin Jacob, Gavin Hu, Michael Santana



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, April 9, 2019 6:05 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Aaron Conole
> <aconole@redhat.com>
> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Gavin Hu
> <gavin.hu@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> Michael Santana <msantana@redhat.com>
> Subject: RE: [PATCH 3/3] acl: adjust the tests
> 
> 
> 
> >
> > > > Hi Aaron,
> > > >
> > > >>
> > > >> This makes the tests pass, and also ensures that on platforms
> > > >> where the testing is supported, we can properly test the
> > > >> implementation specific code.  One edge case is when we run on
> > > >> x86_64 systems that don't support AVX2, but where the compiler
> > > >> can generate such instructions.  That could be an enhancement in
> > > >> the future, but for now at least the tests will pass.
> > > >>
> > > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > >> ---
> > > >>  app/test/test_acl.c             | 62 +++++++++++++----------------
> ----
> > > >>  lib/librte_acl/Makefile         |  1 +
> > > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> > > >>  lib/librte_acl/meson.build      |  4 +--
> > > >>  4 files changed, 73 insertions(+), 40 deletions(-)  create mode
> > > >> 100644 lib/librte_acl/acl_run_notsup.c
> > > >>
> > > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c index
> > > >> b1f75d1bc..c44faa251 100644
> > > >> --- a/app/test/test_acl.c
> > > >> +++ b/app/test/test_acl.c
> > > >> @@ -408,6 +408,9 @@ test_classify(void)
> > > >>  		return -1;
> > > >>  	}
> > > >>
> > > >> +	/* Always use the scalar testing for now. */
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >> +
> > > >>  	ret = 0;
> > > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> > > >>
> > > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> > > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> > > >>  		data[i] = (uint8_t *)&test_data[i];
> > > >>
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> > > >>  		rte_acl_reset(acx);
> > > >>  		ret = test_classify_buid(acx, test_rules, i + 1); @@ -
> 911,6
> > > >> +915,8 @@ test_convert_rules(const char *desc,
> > > >>  		return -1;
> > > >>  	}
> > > >>
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >> +
> > > >
> > > > As I understand here and above, on x86 you replaced default algo
> > > > (SSE, AVX2) with scalar one, right?
> > > > That looks like reduction of test coverage for x86.
> > >
> > > In one way, you're right.  However, the tests weren't testing what
> > > they purported anyway.
> >
> > Could you explain a bit more here?
> > What I am seeing: tests were running bot sse(or avx2) and scalar
> classify() method.
> > Now they always running scalar only.
> > To me it definitely looks like reduction in coverage.
> >
> > >  Actually, it's just a shift I think (previously, it would have
> > > tested the AVX2 but I don't see AVX2 having a fallback into the SSE
> > > code - unlike the SSE code falling back into scalar).
> >
> > Not sure I understand you here.
> > What fallback for AVX2 you expect that you think is missing?
> >
> > >
> > > The tests were failing for a number of reasons when built with
> > > meson,
> >
> > Ok, but with legacy build system (make) on x86 all tests passes, right?
> > So the problem is in new build system, not in the test itself.
> > Why we should compromise our test coverage to make it work with new
> > tools?
> > That just hides the problem without fixing it.
> > Instead I think the build system needs to be fixed.
> > Looking at it a bit closely, for .so meson+ninja generates code with
> > correct version of the function:
> >
> > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep
> > acl_classify_sse
> > 000000000000fa50 t rte_acl_classify_sse
> >
> > So for 'meson -Ddefault_library=shared'
> > acl_autotest passes without the problem.
> >
> > Though for static lib we have both:
> > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep
> > acl_classify_sse
> > 0000000000000000 W rte_acl_classify_sse
> > 0000000000004880 T rte_acl_classify_sse
> >
> > And then linker pickups the wrong one:
> > nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep
> > acl_classify_sse
> > 00000000005f6100 W rte_acl_classify_sse
> >
> > While for make:
> > $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep
> > acl_classify_sse
> > 0000000000000000 W rte_acl_classify_sse
> > 0000000000004880 T rte_acl_classify_sse $ nm
> > x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> > 0000000000240440 T rte_acl_classify_sse
> >
> > Linker pickups the right one.
> 
> And the changes below make linker to pick-up the proper version of the
> function and make acl_autotest to pass for static build too.
> 
> diff --git a/app/test/meson.build b/app/test/meson.build index
> 867cc5863..4364be932 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -328,6 +328,7 @@ test_dep_objs += cc.find_library('execinfo', required:
> false)  link_libs = []  if get_option('default_library') == 'static'
>         link_libs = dpdk_drivers
> +       link_libs += dpdk_static_libraries
>  endif
> 
>  if get_option('tests')
> diff --git a/meson.build b/meson.build
> index a96486597..df1e1c41c 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -62,6 +62,7 @@ configure_file(output: build_cfg,  # for static builds,
> include the drivers as libs and we need to "whole-archive"
>  # them.
>  dpdk_drivers = ['-Wl,--whole-archive'] + dpdk_drivers + ['-Wl,--no-whole-
> archive']
> +dpdk_static_libraries = ['-Wl,--whole-archive'] + dpdk_static_libraries
> ++ ['-Wl,--no-whole-archive']
> 
> Not saying that's the proper patch, but just to prove that linking
> librte_acl.a with '--whole-archive' does fix the problem.
> Bruce, could you point is the best way to fix things here (my meson
> knowledge is very limited)?
> Do we need extra container here as 'whole_archive_static_libraries[]' or
> so?
> Thanks
> Konstantin
> 
I'll look into this. I'd really rather avoid having to have all DPDK libraries
linked with link-whole, but if not, we'll need some sort of similar solution.

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-10  8:13           ` Richardson, Bruce
@ 2019-04-10  8:13             ` Richardson, Bruce
  0 siblings, 0 replies; 44+ messages in thread
From: Richardson, Bruce @ 2019-04-10  8:13 UTC (permalink / raw)
  To: Ananyev, Konstantin, Aaron Conole
  Cc: dev, Jerin Jacob, Gavin Hu, Michael Santana



> -----Original Message-----
> From: Ananyev, Konstantin
> Sent: Tuesday, April 9, 2019 6:05 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>; Aaron Conole
> <aconole@redhat.com>
> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Gavin Hu
> <gavin.hu@arm.com>; Richardson, Bruce <bruce.richardson@intel.com>;
> Michael Santana <msantana@redhat.com>
> Subject: RE: [PATCH 3/3] acl: adjust the tests
> 
> 
> 
> >
> > > > Hi Aaron,
> > > >
> > > >>
> > > >> This makes the tests pass, and also ensures that on platforms
> > > >> where the testing is supported, we can properly test the
> > > >> implementation specific code.  One edge case is when we run on
> > > >> x86_64 systems that don't support AVX2, but where the compiler
> > > >> can generate such instructions.  That could be an enhancement in
> > > >> the future, but for now at least the tests will pass.
> > > >>
> > > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > >> ---
> > > >>  app/test/test_acl.c             | 62 +++++++++++++----------------
> ----
> > > >>  lib/librte_acl/Makefile         |  1 +
> > > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> > > >>  lib/librte_acl/meson.build      |  4 +--
> > > >>  4 files changed, 73 insertions(+), 40 deletions(-)  create mode
> > > >> 100644 lib/librte_acl/acl_run_notsup.c
> > > >>
> > > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c index
> > > >> b1f75d1bc..c44faa251 100644
> > > >> --- a/app/test/test_acl.c
> > > >> +++ b/app/test/test_acl.c
> > > >> @@ -408,6 +408,9 @@ test_classify(void)
> > > >>  		return -1;
> > > >>  	}
> > > >>
> > > >> +	/* Always use the scalar testing for now. */
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >> +
> > > >>  	ret = 0;
> > > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> > > >>
> > > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> > > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> > > >>  		data[i] = (uint8_t *)&test_data[i];
> > > >>
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> > > >>  		rte_acl_reset(acx);
> > > >>  		ret = test_classify_buid(acx, test_rules, i + 1); @@ -
> 911,6
> > > >> +915,8 @@ test_convert_rules(const char *desc,
> > > >>  		return -1;
> > > >>  	}
> > > >>
> > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > >> +
> > > >
> > > > As I understand here and above, on x86 you replaced default algo
> > > > (SSE, AVX2) with scalar one, right?
> > > > That looks like reduction of test coverage for x86.
> > >
> > > In one way, you're right.  However, the tests weren't testing what
> > > they purported anyway.
> >
> > Could you explain a bit more here?
> > What I am seeing: tests were running bot sse(or avx2) and scalar
> classify() method.
> > Now they always running scalar only.
> > To me it definitely looks like reduction in coverage.
> >
> > >  Actually, it's just a shift I think (previously, it would have
> > > tested the AVX2 but I don't see AVX2 having a fallback into the SSE
> > > code - unlike the SSE code falling back into scalar).
> >
> > Not sure I understand you here.
> > What fallback for AVX2 you expect that you think is missing?
> >
> > >
> > > The tests were failing for a number of reasons when built with
> > > meson,
> >
> > Ok, but with legacy build system (make) on x86 all tests passes, right?
> > So the problem is in new build system, not in the test itself.
> > Why we should compromise our test coverage to make it work with new
> > tools?
> > That just hides the problem without fixing it.
> > Instead I think the build system needs to be fixed.
> > Looking at it a bit closely, for .so meson+ninja generates code with
> > correct version of the function:
> >
> > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep
> > acl_classify_sse
> > 000000000000fa50 t rte_acl_classify_sse
> >
> > So for 'meson -Ddefault_library=shared'
> > acl_autotest passes without the problem.
> >
> > Though for static lib we have both:
> > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep
> > acl_classify_sse
> > 0000000000000000 W rte_acl_classify_sse
> > 0000000000004880 T rte_acl_classify_sse
> >
> > And then linker pickups the wrong one:
> > nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep
> > acl_classify_sse
> > 00000000005f6100 W rte_acl_classify_sse
> >
> > While for make:
> > $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep
> > acl_classify_sse
> > 0000000000000000 W rte_acl_classify_sse
> > 0000000000004880 T rte_acl_classify_sse $ nm
> > x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> > 0000000000240440 T rte_acl_classify_sse
> >
> > Linker pickups the right one.
> 
> And the changes below make linker to pick-up the proper version of the
> function and make acl_autotest to pass for static build too.
> 
> diff --git a/app/test/meson.build b/app/test/meson.build index
> 867cc5863..4364be932 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -328,6 +328,7 @@ test_dep_objs += cc.find_library('execinfo', required:
> false)  link_libs = []  if get_option('default_library') == 'static'
>         link_libs = dpdk_drivers
> +       link_libs += dpdk_static_libraries
>  endif
> 
>  if get_option('tests')
> diff --git a/meson.build b/meson.build
> index a96486597..df1e1c41c 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -62,6 +62,7 @@ configure_file(output: build_cfg,  # for static builds,
> include the drivers as libs and we need to "whole-archive"
>  # them.
>  dpdk_drivers = ['-Wl,--whole-archive'] + dpdk_drivers + ['-Wl,--no-whole-
> archive']
> +dpdk_static_libraries = ['-Wl,--whole-archive'] + dpdk_static_libraries
> ++ ['-Wl,--no-whole-archive']
> 
> Not saying that's the proper patch, but just to prove that linking
> librte_acl.a with '--whole-archive' does fix the problem.
> Bruce, could you point is the best way to fix things here (my meson
> knowledge is very limited)?
> Do we need extra container here as 'whole_archive_static_libraries[]' or
> so?
> Thanks
> Konstantin
> 
I'll look into this. I'd really rather avoid having to have all DPDK libraries
linked with link-whole, but if not, we'll need some sort of similar solution.

Regards,
/Bruce

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 18:29           ` Ananyev, Konstantin
  2019-04-09 18:29             ` Ananyev, Konstantin
@ 2019-04-10  9:06             ` Bruce Richardson
  2019-04-10  9:06               ` Bruce Richardson
  1 sibling, 1 reply; 44+ messages in thread
From: Bruce Richardson @ 2019-04-10  9:06 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Aaron Conole, dev, Jerin Jacob, Gavin Hu, Michael Santana

On Tue, Apr 09, 2019 at 07:29:09PM +0100, Ananyev, Konstantin wrote:
> 
> > >
> > > > > Hi Aaron,
> > > > >
> > > > >>
> > > > >> This makes the tests pass, and also ensures that on platforms where
> > > > >> the testing is supported, we can properly test the implementation
> > > > >> specific code.  One edge case is when we run on x86_64 systems that
> > > > >> don't support AVX2, but where the compiler can generate such
> > > > >> instructions.  That could be an enhancement in the future, but for
> > > > >> now at least the tests will pass.
> > > > >>
> > > > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > > >> ---
> > > > >>  app/test/test_acl.c             | 62 +++++++++++++------------------
> > > --
> > > > >>  lib/librte_acl/Makefile         |  1 +
> > > > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> > > > >>  lib/librte_acl/meson.build      |  4 +--
> > > > >>  4 files changed, 73 insertions(+), 40 deletions(-)  create mode
> > > > >> 100644 lib/librte_acl/acl_run_notsup.c
> > > > >>
> > > > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c index
> > > > >> b1f75d1bc..c44faa251 100644
> > > > >> --- a/app/test/test_acl.c
> > > > >> +++ b/app/test/test_acl.c
> > > > >> @@ -408,6 +408,9 @@ test_classify(void)
> > > > >>  		return -1;
> > > > >>  	}
> > > > >>
> > > > >> +	/* Always use the scalar testing for now. */
> > > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > > >> +
> > > > >>  	ret = 0;
> > > > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> > > > >>
> > > > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> > > > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> > > > >>  		data[i] = (uint8_t *)&test_data[i];
> > > > >>
> > > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> > > > >>  		rte_acl_reset(acx);
> > > > >>  		ret = test_classify_buid(acx, test_rules, i + 1); @@ -
> > > 911,6
> > > > >> +915,8 @@ test_convert_rules(const char *desc,
> > > > >>  		return -1;
> > > > >>  	}
> > > > >>
> > > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > > >> +
> > > > >
> > > > > As I understand here and above, on x86 you replaced default algo
> > > > > (SSE, AVX2) with scalar one, right?
> > > > > That looks like reduction of test coverage for x86.
> > > >
> > > > In one way, you're right.  However, the tests weren't testing what
> > > > they purported anyway.
> > >
> > > Could you explain a bit more here?
> > > What I am seeing: tests were running bot sse(or avx2) and scalar
> > > classify() method.
> > > Now they always running scalar only.
> > > To me it definitely looks like reduction in coverage.
> > >
> > > >  Actually, it's just a shift I think (previously, it would have tested
> > > > the AVX2 but I don't see AVX2 having a fallback into the SSE code -
> > > > unlike the SSE code falling back into scalar).
> > >
> > > Not sure I understand you here.
> > > What fallback for AVX2 you expect that you think is missing?
> > >
> > > >
> > > > The tests were failing for a number of reasons when built with meson,
> > >
> > > Ok, but with legacy build system (make) on x86 all tests passes, right?
> > > So the problem is in new build system, not in the test itself.
> > > Why we should compromise our test coverage to make it work with new tools?
> > > That just hides the problem without fixing it.
> > > Instead I think the build system needs to be fixed.
> > > Looking at it a bit closely, for .so meson+ninja generates code with
> > > correct version of the function:
> > >
> > > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep
> > > acl_classify_sse
> > > 000000000000fa50 t rte_acl_classify_sse
> > >
> > > So for 'meson -Ddefault_library=shared'
> > > acl_autotest passes without the problem.
> > >
> > > Though for static lib we have both:
> > > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep
> > > acl_classify_sse
> > > 0000000000000000 W rte_acl_classify_sse
> > > 0000000000004880 T rte_acl_classify_sse
> > >
> > > And then linker pickups the wrong one:
> > > nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep
> > > acl_classify_sse
> > > 00000000005f6100 W rte_acl_classify_sse
> > >
> > > While for make:
> > > $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep
> > > acl_classify_sse
> > > 0000000000000000 W rte_acl_classify_sse
> > > 0000000000004880 T rte_acl_classify_sse
> > > $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> > > 0000000000240440 T rte_acl_classify_sse
> > >
> > > Linker pickups the right one.
> > >
> > 
> > I assume the same issues occurs for AVX2, 
> 
> Yes, I just used sse because it is always available on x86. 
> 
> but for SSE specifically why do we even compile up two copies of the function for x86 platforms,
> > since SSE will always be supported?
> 
> for non IA  platforms.

Yes, I realise that, but there is no point in compiling the weak version
for IA platforms, since the normal version will be guaranteed available. In
any case, it doesn't matter much, since the issue needs fixing for AVX2
anyway.

/Bruce

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-10  9:06             ` Bruce Richardson
@ 2019-04-10  9:06               ` Bruce Richardson
  0 siblings, 0 replies; 44+ messages in thread
From: Bruce Richardson @ 2019-04-10  9:06 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Aaron Conole, dev, Jerin Jacob, Gavin Hu, Michael Santana

On Tue, Apr 09, 2019 at 07:29:09PM +0100, Ananyev, Konstantin wrote:
> 
> > >
> > > > > Hi Aaron,
> > > > >
> > > > >>
> > > > >> This makes the tests pass, and also ensures that on platforms where
> > > > >> the testing is supported, we can properly test the implementation
> > > > >> specific code.  One edge case is when we run on x86_64 systems that
> > > > >> don't support AVX2, but where the compiler can generate such
> > > > >> instructions.  That could be an enhancement in the future, but for
> > > > >> now at least the tests will pass.
> > > > >>
> > > > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > > >> ---
> > > > >>  app/test/test_acl.c             | 62 +++++++++++++------------------
> > > --
> > > > >>  lib/librte_acl/Makefile         |  1 +
> > > > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> > > > >>  lib/librte_acl/meson.build      |  4 +--
> > > > >>  4 files changed, 73 insertions(+), 40 deletions(-)  create mode
> > > > >> 100644 lib/librte_acl/acl_run_notsup.c
> > > > >>
> > > > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c index
> > > > >> b1f75d1bc..c44faa251 100644
> > > > >> --- a/app/test/test_acl.c
> > > > >> +++ b/app/test/test_acl.c
> > > > >> @@ -408,6 +408,9 @@ test_classify(void)
> > > > >>  		return -1;
> > > > >>  	}
> > > > >>
> > > > >> +	/* Always use the scalar testing for now. */
> > > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > > >> +
> > > > >>  	ret = 0;
> > > > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> > > > >>
> > > > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> > > > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> > > > >>  		data[i] = (uint8_t *)&test_data[i];
> > > > >>
> > > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> > > > >>  		rte_acl_reset(acx);
> > > > >>  		ret = test_classify_buid(acx, test_rules, i + 1); @@ -
> > > 911,6
> > > > >> +915,8 @@ test_convert_rules(const char *desc,
> > > > >>  		return -1;
> > > > >>  	}
> > > > >>
> > > > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> > > > >> +
> > > > >
> > > > > As I understand here and above, on x86 you replaced default algo
> > > > > (SSE, AVX2) with scalar one, right?
> > > > > That looks like reduction of test coverage for x86.
> > > >
> > > > In one way, you're right.  However, the tests weren't testing what
> > > > they purported anyway.
> > >
> > > Could you explain a bit more here?
> > > What I am seeing: tests were running bot sse(or avx2) and scalar
> > > classify() method.
> > > Now they always running scalar only.
> > > To me it definitely looks like reduction in coverage.
> > >
> > > >  Actually, it's just a shift I think (previously, it would have tested
> > > > the AVX2 but I don't see AVX2 having a fallback into the SSE code -
> > > > unlike the SSE code falling back into scalar).
> > >
> > > Not sure I understand you here.
> > > What fallback for AVX2 you expect that you think is missing?
> > >
> > > >
> > > > The tests were failing for a number of reasons when built with meson,
> > >
> > > Ok, but with legacy build system (make) on x86 all tests passes, right?
> > > So the problem is in new build system, not in the test itself.
> > > Why we should compromise our test coverage to make it work with new tools?
> > > That just hides the problem without fixing it.
> > > Instead I think the build system needs to be fixed.
> > > Looking at it a bit closely, for .so meson+ninja generates code with
> > > correct version of the function:
> > >
> > > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep
> > > acl_classify_sse
> > > 000000000000fa50 t rte_acl_classify_sse
> > >
> > > So for 'meson -Ddefault_library=shared'
> > > acl_autotest passes without the problem.
> > >
> > > Though for static lib we have both:
> > > nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep
> > > acl_classify_sse
> > > 0000000000000000 W rte_acl_classify_sse
> > > 0000000000004880 T rte_acl_classify_sse
> > >
> > > And then linker pickups the wrong one:
> > > nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep
> > > acl_classify_sse
> > > 00000000005f6100 W rte_acl_classify_sse
> > >
> > > While for make:
> > > $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep
> > > acl_classify_sse
> > > 0000000000000000 W rte_acl_classify_sse
> > > 0000000000004880 T rte_acl_classify_sse
> > > $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> > > 0000000000240440 T rte_acl_classify_sse
> > >
> > > Linker pickups the right one.
> > >
> > 
> > I assume the same issues occurs for AVX2, 
> 
> Yes, I just used sse because it is always available on x86. 
> 
> but for SSE specifically why do we even compile up two copies of the function for x86 platforms,
> > since SSE will always be supported?
> 
> for non IA  platforms.

Yes, I realise that, but there is no point in compiling the weak version
for IA platforms, since the normal version will be guaranteed available. In
any case, it doesn't matter much, since the issue needs fixing for AVX2
anyway.

/Bruce

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-09 17:04         ` Ananyev, Konstantin
  2019-04-09 17:04           ` Ananyev, Konstantin
  2019-04-10  8:13           ` Richardson, Bruce
@ 2019-04-10 13:10           ` Aaron Conole
  2019-04-10 13:10             ` Aaron Conole
  2019-04-10 13:24             ` Bruce Richardson
  2 siblings, 2 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-10 13:10 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: dev, Jerin Jacob, Gavin Hu, Richardson, Bruce, Michael Santana

"Ananyev, Konstantin" <konstantin.ananyev@intel.com> writes:

>> 
>> > > Hi Aaron,
>> > >
>> > >>
>> > >> This makes the tests pass, and also ensures that on platforms where the
>> > >> testing is supported, we can properly test the implementation specific
>> > >> code.  One edge case is when we run on x86_64 systems that don't support
>> > >> AVX2, but where the compiler can generate such instructions.  That could
>> > >> be an enhancement in the future, but for now at least the tests will
>> > >> pass.
>> > >>
>> > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
>> > >> ---
>> > >>  app/test/test_acl.c             | 62 +++++++++++++--------------------
>> > >>  lib/librte_acl/Makefile         |  1 +
>> > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
>> > >>  lib/librte_acl/meson.build      |  4 +--
>> > >>  4 files changed, 73 insertions(+), 40 deletions(-)
>> > >>  create mode 100644 lib/librte_acl/acl_run_notsup.c
>> > >>
>> > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
>> > >> index b1f75d1bc..c44faa251 100644
>> > >> --- a/app/test/test_acl.c
>> > >> +++ b/app/test/test_acl.c
>> > >> @@ -408,6 +408,9 @@ test_classify(void)
>> > >>  		return -1;
>> > >>  	}
>> > >>
>> > >> +	/* Always use the scalar testing for now. */
>> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>> > >> +
>> > >>  	ret = 0;
>> > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
>> > >>
>> > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
>> > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
>> > >>  		data[i] = (uint8_t *)&test_data[i];
>> > >>
>> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>> > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
>> > >>  		rte_acl_reset(acx);
>> > >>  		ret = test_classify_buid(acx, test_rules, i + 1);
>> > >> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
>> > >>  		return -1;
>> > >>  	}
>> > >>
>> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>> > >> +
>> > >
>> > > As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
>> > > with scalar one, right?
>> > > That looks like reduction of test coverage for x86.
>> >
>> > In one way, you're right.  However, the tests weren't testing what they
>> > purported anyway.
>> 
>> Could you explain a bit more here?
>> What I am seeing: tests were running bot sse(or avx2) and scalar classify() method.
>> Now they always running scalar only.
>> To me it definitely looks like reduction in coverage.
>> 
>> >  Actually, it's just a shift I think (previously, it
>> > would have tested the AVX2 but I don't see AVX2 having a fallback into
>> > the SSE code - unlike the SSE code falling back into scalar).
>> 
>> Not sure I understand you here.
>> What fallback for AVX2 you expect that you think is missing?
>> 
>> >
>> > The tests were failing for a number of reasons when built with meson,
>> 
>> Ok, but with legacy build system (make) on x86 all tests passes, right?
>> So the problem is in new build system, not in the test itself.
>> Why we should compromise our test coverage to make it work with
>> new tools?
>> That just hides the problem without fixing it.
>> Instead I think the build system needs to be fixed.
>> Looking at it a bit closely, for .so meson+ninja generates code with
>> correct version of the function:
>> 
>> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep acl_classify_sse
>> 000000000000fa50 t rte_acl_classify_sse
>> 
>> So for 'meson -Ddefault_library=shared'
>> acl_autotest passes without the problem.
>> 
>> Though for static lib we have both:
>> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep acl_classify_sse
>> 0000000000000000 W rte_acl_classify_sse
>> 0000000000004880 T rte_acl_classify_sse
>> 
>> And then linker pickups the wrong one:
>> nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep acl_classify_sse
>> 00000000005f6100 W rte_acl_classify_sse
>> 
>> While for make:
>> $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep acl_classify_sse
>> 0000000000000000 W rte_acl_classify_sse
>> 0000000000004880 T rte_acl_classify_sse
>> $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
>> 0000000000240440 T rte_acl_classify_sse
>> 
>> Linker pickups the right one.
>
> And the changes below make linker to pick-up the proper version of the function
> and make acl_autotest to pass for static build too.
>
> diff --git a/app/test/meson.build b/app/test/meson.build
> index 867cc5863..4364be932 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -328,6 +328,7 @@ test_dep_objs += cc.find_library('execinfo', required: false)
>  link_libs = []
>  if get_option('default_library') == 'static'
>         link_libs = dpdk_drivers
> +       link_libs += dpdk_static_libraries
>  endif
>
>  if get_option('tests')
> diff --git a/meson.build b/meson.build
> index a96486597..df1e1c41c 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -62,6 +62,7 @@ configure_file(output: build_cfg,
>  # for static builds, include the drivers as libs and we need to "whole-archive"
>  # them.
>  dpdk_drivers = ['-Wl,--whole-archive'] + dpdk_drivers + ['-Wl,--no-whole-archive']
> +dpdk_static_libraries = ['-Wl,--whole-archive'] + dpdk_static_libraries + ['-Wl,--no-whole-archive']
>
> Not saying that's the proper patch, but just to prove that linking librte_acl.a
> with '--whole-archive' does fix the problem.
> Bruce, could you point is the best way to fix things here
> (my meson knowledge is very limited)?
> Do we need extra container here as 'whole_archive_static_libraries[]' or so?
> Thanks
> Konstantin

Okay - I'll look at this part more.  I think I went down the path of
explicitly setting these because the comments didn't match with what was
occuring (for example, in the section that I changed that loops through
all versions, only the AVX2 and Scalar were being tested on my system,
while the comment implied SSE).

I also believe that I split out the functions because of the linking
issue (I guess the way the linker resolves the functions works properly
when the weak versions are in a different translation unit)?  I'll spend
some time trying to get it working in a different way.

Regardless, this wasn't ready for posting as 'PATCH' - I meant it as
RFC.  I don't intend to change the first two patches, though.

And thank you for the all the feedback!

>
>> 
>> 
>> > and on the systems I tested with (including tests under travis).
>> >
>> > 1. Any meson build that I observed didn't correctly fill anything but
>> >    the scalar variable.  I had to remove the -ENOTSUP definitions in the
>> >    rte_acl.c file (forgot to git add it), and make the second version.
>> >
>> > 2. The tests never selected scalar, or nor sse implementations.
>> 
>> As I can see test_classify_run() *always* run both default method (sse/avx2 on x86)
>> and then scalar one.
>> 
>> >  Rather,
>> >    they selected only what the currently running platform provided.
>> >    This meant that I was always seeing the AVX2 code executed, but never
>> >    SSE nor scalar (but for one case) - at least as far as I could see.
>> >
>> > There were others - I iterated on these for a few days.
>> >
>> > This is why I changed a block to run through each implementation in one
>> > of the versions.
>> >
>> > HOWEVER, it's still deficient.
>> >
>> > We need to fully cover all the cases.  BUT it's better than the failure
>> > that currently happens on almost every system I've tried - including
>> > shipping the build to travis to run.  So, I figured running > failing with
>> > almost no reason why.  And looking into the failure revealed that the
>> > meson build didn't even include the platform specific builds.
>> >
>> > During my rework, I can change the test cases to iterate as in other
>> > test cases.  It will extend the time.  And I don't know how to resolve
>> > the case where we run on a system that doesn't support AVX2 but we have
>> > a compiler that supports AVX2 (since that case will fail - but we
>> > shouldn't even attempt it).
>> 
>> I don't see why that should happen.
>> At rte_acl_init() we do check does that machine supports AVX2(SSE, NEON)
>> instructions or not.
>> Are you saying under some circumstances rte_acl_init() are not working properly,
>> or not get invoked?
>> 
>> Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-10 13:10           ` Aaron Conole
@ 2019-04-10 13:10             ` Aaron Conole
  2019-04-10 13:24             ` Bruce Richardson
  1 sibling, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-10 13:10 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: dev, Jerin Jacob, Gavin Hu, Richardson, Bruce, Michael Santana

"Ananyev, Konstantin" <konstantin.ananyev@intel.com> writes:

>> 
>> > > Hi Aaron,
>> > >
>> > >>
>> > >> This makes the tests pass, and also ensures that on platforms where the
>> > >> testing is supported, we can properly test the implementation specific
>> > >> code.  One edge case is when we run on x86_64 systems that don't support
>> > >> AVX2, but where the compiler can generate such instructions.  That could
>> > >> be an enhancement in the future, but for now at least the tests will
>> > >> pass.
>> > >>
>> > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
>> > >> ---
>> > >>  app/test/test_acl.c             | 62 +++++++++++++--------------------
>> > >>  lib/librte_acl/Makefile         |  1 +
>> > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
>> > >>  lib/librte_acl/meson.build      |  4 +--
>> > >>  4 files changed, 73 insertions(+), 40 deletions(-)
>> > >>  create mode 100644 lib/librte_acl/acl_run_notsup.c
>> > >>
>> > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
>> > >> index b1f75d1bc..c44faa251 100644
>> > >> --- a/app/test/test_acl.c
>> > >> +++ b/app/test/test_acl.c
>> > >> @@ -408,6 +408,9 @@ test_classify(void)
>> > >>  		return -1;
>> > >>  	}
>> > >>
>> > >> +	/* Always use the scalar testing for now. */
>> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>> > >> +
>> > >>  	ret = 0;
>> > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
>> > >>
>> > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
>> > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
>> > >>  		data[i] = (uint8_t *)&test_data[i];
>> > >>
>> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>> > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
>> > >>  		rte_acl_reset(acx);
>> > >>  		ret = test_classify_buid(acx, test_rules, i + 1);
>> > >> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
>> > >>  		return -1;
>> > >>  	}
>> > >>
>> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
>> > >> +
>> > >
>> > > As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
>> > > with scalar one, right?
>> > > That looks like reduction of test coverage for x86.
>> >
>> > In one way, you're right.  However, the tests weren't testing what they
>> > purported anyway.
>> 
>> Could you explain a bit more here?
>> What I am seeing: tests were running bot sse(or avx2) and scalar classify() method.
>> Now they always running scalar only.
>> To me it definitely looks like reduction in coverage.
>> 
>> >  Actually, it's just a shift I think (previously, it
>> > would have tested the AVX2 but I don't see AVX2 having a fallback into
>> > the SSE code - unlike the SSE code falling back into scalar).
>> 
>> Not sure I understand you here.
>> What fallback for AVX2 you expect that you think is missing?
>> 
>> >
>> > The tests were failing for a number of reasons when built with meson,
>> 
>> Ok, but with legacy build system (make) on x86 all tests passes, right?
>> So the problem is in new build system, not in the test itself.
>> Why we should compromise our test coverage to make it work with
>> new tools?
>> That just hides the problem without fixing it.
>> Instead I think the build system needs to be fixed.
>> Looking at it a bit closely, for .so meson+ninja generates code with
>> correct version of the function:
>> 
>> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep acl_classify_sse
>> 000000000000fa50 t rte_acl_classify_sse
>> 
>> So for 'meson -Ddefault_library=shared'
>> acl_autotest passes without the problem.
>> 
>> Though for static lib we have both:
>> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep acl_classify_sse
>> 0000000000000000 W rte_acl_classify_sse
>> 0000000000004880 T rte_acl_classify_sse
>> 
>> And then linker pickups the wrong one:
>> nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep acl_classify_sse
>> 00000000005f6100 W rte_acl_classify_sse
>> 
>> While for make:
>> $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep acl_classify_sse
>> 0000000000000000 W rte_acl_classify_sse
>> 0000000000004880 T rte_acl_classify_sse
>> $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
>> 0000000000240440 T rte_acl_classify_sse
>> 
>> Linker pickups the right one.
>
> And the changes below make linker to pick-up the proper version of the function
> and make acl_autotest to pass for static build too.
>
> diff --git a/app/test/meson.build b/app/test/meson.build
> index 867cc5863..4364be932 100644
> --- a/app/test/meson.build
> +++ b/app/test/meson.build
> @@ -328,6 +328,7 @@ test_dep_objs += cc.find_library('execinfo', required: false)
>  link_libs = []
>  if get_option('default_library') == 'static'
>         link_libs = dpdk_drivers
> +       link_libs += dpdk_static_libraries
>  endif
>
>  if get_option('tests')
> diff --git a/meson.build b/meson.build
> index a96486597..df1e1c41c 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -62,6 +62,7 @@ configure_file(output: build_cfg,
>  # for static builds, include the drivers as libs and we need to "whole-archive"
>  # them.
>  dpdk_drivers = ['-Wl,--whole-archive'] + dpdk_drivers + ['-Wl,--no-whole-archive']
> +dpdk_static_libraries = ['-Wl,--whole-archive'] + dpdk_static_libraries + ['-Wl,--no-whole-archive']
>
> Not saying that's the proper patch, but just to prove that linking librte_acl.a
> with '--whole-archive' does fix the problem.
> Bruce, could you point is the best way to fix things here
> (my meson knowledge is very limited)?
> Do we need extra container here as 'whole_archive_static_libraries[]' or so?
> Thanks
> Konstantin

Okay - I'll look at this part more.  I think I went down the path of
explicitly setting these because the comments didn't match with what was
occuring (for example, in the section that I changed that loops through
all versions, only the AVX2 and Scalar were being tested on my system,
while the comment implied SSE).

I also believe that I split out the functions because of the linking
issue (I guess the way the linker resolves the functions works properly
when the weak versions are in a different translation unit)?  I'll spend
some time trying to get it working in a different way.

Regardless, this wasn't ready for posting as 'PATCH' - I meant it as
RFC.  I don't intend to change the first two patches, though.

And thank you for the all the feedback!

>
>> 
>> 
>> > and on the systems I tested with (including tests under travis).
>> >
>> > 1. Any meson build that I observed didn't correctly fill anything but
>> >    the scalar variable.  I had to remove the -ENOTSUP definitions in the
>> >    rte_acl.c file (forgot to git add it), and make the second version.
>> >
>> > 2. The tests never selected scalar, or nor sse implementations.
>> 
>> As I can see test_classify_run() *always* run both default method (sse/avx2 on x86)
>> and then scalar one.
>> 
>> >  Rather,
>> >    they selected only what the currently running platform provided.
>> >    This meant that I was always seeing the AVX2 code executed, but never
>> >    SSE nor scalar (but for one case) - at least as far as I could see.
>> >
>> > There were others - I iterated on these for a few days.
>> >
>> > This is why I changed a block to run through each implementation in one
>> > of the versions.
>> >
>> > HOWEVER, it's still deficient.
>> >
>> > We need to fully cover all the cases.  BUT it's better than the failure
>> > that currently happens on almost every system I've tried - including
>> > shipping the build to travis to run.  So, I figured running > failing with
>> > almost no reason why.  And looking into the failure revealed that the
>> > meson build didn't even include the platform specific builds.
>> >
>> > During my rework, I can change the test cases to iterate as in other
>> > test cases.  It will extend the time.  And I don't know how to resolve
>> > the case where we run on a system that doesn't support AVX2 but we have
>> > a compiler that supports AVX2 (since that case will fail - but we
>> > shouldn't even attempt it).
>> 
>> I don't see why that should happen.
>> At rte_acl_init() we do check does that machine supports AVX2(SSE, NEON)
>> instructions or not.
>> Are you saying under some circumstances rte_acl_init() are not working properly,
>> or not get invoked?
>> 
>> Konstantin

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-10 13:10           ` Aaron Conole
  2019-04-10 13:10             ` Aaron Conole
@ 2019-04-10 13:24             ` Bruce Richardson
  2019-04-10 13:24               ` Bruce Richardson
  2019-04-10 13:46               ` Bruce Richardson
  1 sibling, 2 replies; 44+ messages in thread
From: Bruce Richardson @ 2019-04-10 13:24 UTC (permalink / raw)
  To: Aaron Conole
  Cc: Ananyev, Konstantin, dev, Jerin Jacob, Gavin Hu, Michael Santana

On Wed, Apr 10, 2019 at 09:10:25AM -0400, Aaron Conole wrote:
> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> writes:
> 
> >> 
> >> > > Hi Aaron,
> >> > >
> >> > >>
> >> > >> This makes the tests pass, and also ensures that on platforms where the
> >> > >> testing is supported, we can properly test the implementation specific
> >> > >> code.  One edge case is when we run on x86_64 systems that don't support
> >> > >> AVX2, but where the compiler can generate such instructions.  That could
> >> > >> be an enhancement in the future, but for now at least the tests will
> >> > >> pass.
> >> > >>
> >> > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> >> > >> ---
> >> > >>  app/test/test_acl.c             | 62 +++++++++++++--------------------
> >> > >>  lib/librte_acl/Makefile         |  1 +
> >> > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> >> > >>  lib/librte_acl/meson.build      |  4 +--
> >> > >>  4 files changed, 73 insertions(+), 40 deletions(-)
> >> > >>  create mode 100644 lib/librte_acl/acl_run_notsup.c
> >> > >>
> >> > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> >> > >> index b1f75d1bc..c44faa251 100644
> >> > >> --- a/app/test/test_acl.c
> >> > >> +++ b/app/test/test_acl.c
> >> > >> @@ -408,6 +408,9 @@ test_classify(void)
> >> > >>  		return -1;
> >> > >>  	}
> >> > >>
> >> > >> +	/* Always use the scalar testing for now. */
> >> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >> > >> +
> >> > >>  	ret = 0;
> >> > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> >> > >>
> >> > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> >> > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> >> > >>  		data[i] = (uint8_t *)&test_data[i];
> >> > >>
> >> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >> > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> >> > >>  		rte_acl_reset(acx);
> >> > >>  		ret = test_classify_buid(acx, test_rules, i + 1);
> >> > >> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
> >> > >>  		return -1;
> >> > >>  	}
> >> > >>
> >> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >> > >> +
> >> > >
> >> > > As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
> >> > > with scalar one, right?
> >> > > That looks like reduction of test coverage for x86.
> >> >
> >> > In one way, you're right.  However, the tests weren't testing what they
> >> > purported anyway.
> >> 
> >> Could you explain a bit more here?
> >> What I am seeing: tests were running bot sse(or avx2) and scalar classify() method.
> >> Now they always running scalar only.
> >> To me it definitely looks like reduction in coverage.
> >> 
> >> >  Actually, it's just a shift I think (previously, it
> >> > would have tested the AVX2 but I don't see AVX2 having a fallback into
> >> > the SSE code - unlike the SSE code falling back into scalar).
> >> 
> >> Not sure I understand you here.
> >> What fallback for AVX2 you expect that you think is missing?
> >> 
> >> >
> >> > The tests were failing for a number of reasons when built with meson,
> >> 
> >> Ok, but with legacy build system (make) on x86 all tests passes, right?
> >> So the problem is in new build system, not in the test itself.
> >> Why we should compromise our test coverage to make it work with
> >> new tools?
> >> That just hides the problem without fixing it.
> >> Instead I think the build system needs to be fixed.
> >> Looking at it a bit closely, for .so meson+ninja generates code with
> >> correct version of the function:
> >> 
> >> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep acl_classify_sse
> >> 000000000000fa50 t rte_acl_classify_sse
> >> 
> >> So for 'meson -Ddefault_library=shared'
> >> acl_autotest passes without the problem.
> >> 
> >> Though for static lib we have both:
> >> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep acl_classify_sse
> >> 0000000000000000 W rte_acl_classify_sse
> >> 0000000000004880 T rte_acl_classify_sse
> >> 
> >> And then linker pickups the wrong one:
> >> nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep acl_classify_sse
> >> 00000000005f6100 W rte_acl_classify_sse
> >> 
> >> While for make:
> >> $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep acl_classify_sse
> >> 0000000000000000 W rte_acl_classify_sse
> >> 0000000000004880 T rte_acl_classify_sse
> >> $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> >> 0000000000240440 T rte_acl_classify_sse
> >> 
> >> Linker pickups the right one.
> >
> > And the changes below make linker to pick-up the proper version of the function
> > and make acl_autotest to pass for static build too.
> >
> > diff --git a/app/test/meson.build b/app/test/meson.build
> > index 867cc5863..4364be932 100644
> > --- a/app/test/meson.build
> > +++ b/app/test/meson.build
> > @@ -328,6 +328,7 @@ test_dep_objs += cc.find_library('execinfo', required: false)
> >  link_libs = []
> >  if get_option('default_library') == 'static'
> >         link_libs = dpdk_drivers
> > +       link_libs += dpdk_static_libraries
> >  endif
> >
> >  if get_option('tests')
> > diff --git a/meson.build b/meson.build
> > index a96486597..df1e1c41c 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -62,6 +62,7 @@ configure_file(output: build_cfg,
> >  # for static builds, include the drivers as libs and we need to "whole-archive"
> >  # them.
> >  dpdk_drivers = ['-Wl,--whole-archive'] + dpdk_drivers + ['-Wl,--no-whole-archive']
> > +dpdk_static_libraries = ['-Wl,--whole-archive'] + dpdk_static_libraries + ['-Wl,--no-whole-archive']
> >
> > Not saying that's the proper patch, but just to prove that linking librte_acl.a
> > with '--whole-archive' does fix the problem.
> > Bruce, could you point is the best way to fix things here
> > (my meson knowledge is very limited)?
> > Do we need extra container here as 'whole_archive_static_libraries[]' or so?
> > Thanks
> > Konstantin
> 
> Okay - I'll look at this part more.  I think I went down the path of
> explicitly setting these because the comments didn't match with what was
> occuring (for example, in the section that I changed that loops through
> all versions, only the AVX2 and Scalar were being tested on my system,
> while the comment implied SSE).
> 
> I also believe that I split out the functions because of the linking
> issue (I guess the way the linker resolves the functions works properly
> when the weak versions are in a different translation unit)?  I'll spend
> some time trying to get it working in a different way.
> 
> Regardless, this wasn't ready for posting as 'PATCH' - I meant it as
> RFC.  I don't intend to change the first two patches, though.
> 
> And thank you for the all the feedback!
> 
I've dug into this a bit, and I'm doing up a patch to remove the use of
weak symbols in our libraries (note, just libs, not drivers) entirely.
That's fairly easy to do, and not a big change, but should make this
problem go away.

/Bruce

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-10 13:24             ` Bruce Richardson
@ 2019-04-10 13:24               ` Bruce Richardson
  2019-04-10 13:46               ` Bruce Richardson
  1 sibling, 0 replies; 44+ messages in thread
From: Bruce Richardson @ 2019-04-10 13:24 UTC (permalink / raw)
  To: Aaron Conole
  Cc: Ananyev, Konstantin, dev, Jerin Jacob, Gavin Hu, Michael Santana

On Wed, Apr 10, 2019 at 09:10:25AM -0400, Aaron Conole wrote:
> "Ananyev, Konstantin" <konstantin.ananyev@intel.com> writes:
> 
> >> 
> >> > > Hi Aaron,
> >> > >
> >> > >>
> >> > >> This makes the tests pass, and also ensures that on platforms where the
> >> > >> testing is supported, we can properly test the implementation specific
> >> > >> code.  One edge case is when we run on x86_64 systems that don't support
> >> > >> AVX2, but where the compiler can generate such instructions.  That could
> >> > >> be an enhancement in the future, but for now at least the tests will
> >> > >> pass.
> >> > >>
> >> > >> Signed-off-by: Aaron Conole <aconole@redhat.com>
> >> > >> ---
> >> > >>  app/test/test_acl.c             | 62 +++++++++++++--------------------
> >> > >>  lib/librte_acl/Makefile         |  1 +
> >> > >>  lib/librte_acl/acl_run_notsup.c | 46 ++++++++++++++++++++++++
> >> > >>  lib/librte_acl/meson.build      |  4 +--
> >> > >>  4 files changed, 73 insertions(+), 40 deletions(-)
> >> > >>  create mode 100644 lib/librte_acl/acl_run_notsup.c
> >> > >>
> >> > >> diff --git a/app/test/test_acl.c b/app/test/test_acl.c
> >> > >> index b1f75d1bc..c44faa251 100644
> >> > >> --- a/app/test/test_acl.c
> >> > >> +++ b/app/test/test_acl.c
> >> > >> @@ -408,6 +408,9 @@ test_classify(void)
> >> > >>  		return -1;
> >> > >>  	}
> >> > >>
> >> > >> +	/* Always use the scalar testing for now. */
> >> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >> > >> +
> >> > >>  	ret = 0;
> >> > >>  	for (i = 0; i != TEST_CLASSIFY_ITER; i++) {
> >> > >>
> >> > >> @@ -547,6 +550,7 @@ test_build_ports_range(void)
> >> > >>  	for (i = 0; i != RTE_DIM(test_data); i++)
> >> > >>  		data[i] = (uint8_t *)&test_data[i];
> >> > >>
> >> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >> > >>  	for (i = 0; i != RTE_DIM(test_rules); i++) {
> >> > >>  		rte_acl_reset(acx);
> >> > >>  		ret = test_classify_buid(acx, test_rules, i + 1);
> >> > >> @@ -911,6 +915,8 @@ test_convert_rules(const char *desc,
> >> > >>  		return -1;
> >> > >>  	}
> >> > >>
> >> > >> +	rte_acl_set_ctx_classify(acx, RTE_ACL_CLASSIFY_SCALAR);
> >> > >> +
> >> > >
> >> > > As I understand here and above, on x86 you replaced default algo (SSE, AVX2)
> >> > > with scalar one, right?
> >> > > That looks like reduction of test coverage for x86.
> >> >
> >> > In one way, you're right.  However, the tests weren't testing what they
> >> > purported anyway.
> >> 
> >> Could you explain a bit more here?
> >> What I am seeing: tests were running bot sse(or avx2) and scalar classify() method.
> >> Now they always running scalar only.
> >> To me it definitely looks like reduction in coverage.
> >> 
> >> >  Actually, it's just a shift I think (previously, it
> >> > would have tested the AVX2 but I don't see AVX2 having a fallback into
> >> > the SSE code - unlike the SSE code falling back into scalar).
> >> 
> >> Not sure I understand you here.
> >> What fallback for AVX2 you expect that you think is missing?
> >> 
> >> >
> >> > The tests were failing for a number of reasons when built with meson,
> >> 
> >> Ok, but with legacy build system (make) on x86 all tests passes, right?
> >> So the problem is in new build system, not in the test itself.
> >> Why we should compromise our test coverage to make it work with
> >> new tools?
> >> That just hides the problem without fixing it.
> >> Instead I think the build system needs to be fixed.
> >> Looking at it a bit closely, for .so meson+ninja generates code with
> >> correct version of the function:
> >> 
> >> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.so.2 | grep acl_classify_sse
> >> 000000000000fa50 t rte_acl_classify_sse
> >> 
> >> So for 'meson -Ddefault_library=shared'
> >> acl_autotest passes without the problem.
> >> 
> >> Though for static lib we have both:
> >> nm x86_64-native-linuxapp-gcc-meson/lib/librte_acl.a | grep acl_classify_sse
> >> 0000000000000000 W rte_acl_classify_sse
> >> 0000000000004880 T rte_acl_classify_sse
> >> 
> >> And then linker pickups the wrong one:
> >> nm x86_64-native-linuxapp-gcc-meson/app/test/dpdk-test | grep acl_classify_sse
> >> 00000000005f6100 W rte_acl_classify_sse
> >> 
> >> While for make:
> >> $ nm x86_64-native-linuxapp-gcc-aesmb/lib/librte_acl.a | grep acl_classify_sse
> >> 0000000000000000 W rte_acl_classify_sse
> >> 0000000000004880 T rte_acl_classify_sse
> >> $ nm x86_64-native-linuxapp-gcc-aesmb/app/test | grep acl_classify_sse
> >> 0000000000240440 T rte_acl_classify_sse
> >> 
> >> Linker pickups the right one.
> >
> > And the changes below make linker to pick-up the proper version of the function
> > and make acl_autotest to pass for static build too.
> >
> > diff --git a/app/test/meson.build b/app/test/meson.build
> > index 867cc5863..4364be932 100644
> > --- a/app/test/meson.build
> > +++ b/app/test/meson.build
> > @@ -328,6 +328,7 @@ test_dep_objs += cc.find_library('execinfo', required: false)
> >  link_libs = []
> >  if get_option('default_library') == 'static'
> >         link_libs = dpdk_drivers
> > +       link_libs += dpdk_static_libraries
> >  endif
> >
> >  if get_option('tests')
> > diff --git a/meson.build b/meson.build
> > index a96486597..df1e1c41c 100644
> > --- a/meson.build
> > +++ b/meson.build
> > @@ -62,6 +62,7 @@ configure_file(output: build_cfg,
> >  # for static builds, include the drivers as libs and we need to "whole-archive"
> >  # them.
> >  dpdk_drivers = ['-Wl,--whole-archive'] + dpdk_drivers + ['-Wl,--no-whole-archive']
> > +dpdk_static_libraries = ['-Wl,--whole-archive'] + dpdk_static_libraries + ['-Wl,--no-whole-archive']
> >
> > Not saying that's the proper patch, but just to prove that linking librte_acl.a
> > with '--whole-archive' does fix the problem.
> > Bruce, could you point is the best way to fix things here
> > (my meson knowledge is very limited)?
> > Do we need extra container here as 'whole_archive_static_libraries[]' or so?
> > Thanks
> > Konstantin
> 
> Okay - I'll look at this part more.  I think I went down the path of
> explicitly setting these because the comments didn't match with what was
> occuring (for example, in the section that I changed that loops through
> all versions, only the AVX2 and Scalar were being tested on my system,
> while the comment implied SSE).
> 
> I also believe that I split out the functions because of the linking
> issue (I guess the way the linker resolves the functions works properly
> when the weak versions are in a different translation unit)?  I'll spend
> some time trying to get it working in a different way.
> 
> Regardless, this wasn't ready for posting as 'PATCH' - I meant it as
> RFC.  I don't intend to change the first two patches, though.
> 
> And thank you for the all the feedback!
> 
I've dug into this a bit, and I'm doing up a patch to remove the use of
weak symbols in our libraries (note, just libs, not drivers) entirely.
That's fairly easy to do, and not a big change, but should make this
problem go away.

/Bruce

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-10 13:24             ` Bruce Richardson
  2019-04-10 13:24               ` Bruce Richardson
@ 2019-04-10 13:46               ` Bruce Richardson
  2019-04-10 13:46                 ` Bruce Richardson
  1 sibling, 1 reply; 44+ messages in thread
From: Bruce Richardson @ 2019-04-10 13:46 UTC (permalink / raw)
  To: Aaron Conole
  Cc: Ananyev, Konstantin, dev, Jerin Jacob, Gavin Hu, Michael Santana

On Wed, Apr 10, 2019 at 02:24:56PM +0100, Bruce Richardson wrote:
> On Wed, Apr 10, 2019 at 09:10:25AM -0400, Aaron Conole wrote:
> > 
> > Okay - I'll look at this part more.  I think I went down the path of
> > explicitly setting these because the comments didn't match with what was
> > occuring (for example, in the section that I changed that loops through
> > all versions, only the AVX2 and Scalar were being tested on my system,
> > while the comment implied SSE).
> > 
> > I also believe that I split out the functions because of the linking
> > issue (I guess the way the linker resolves the functions works properly
> > when the weak versions are in a different translation unit)?  I'll spend
> > some time trying to get it working in a different way.
> > 
> > Regardless, this wasn't ready for posting as 'PATCH' - I meant it as
> > RFC.  I don't intend to change the first two patches, though.
> > 
> > And thank you for the all the feedback!
> > 
> I've dug into this a bit, and I'm doing up a patch to remove the use of
> weak symbols in our libraries (note, just libs, not drivers) entirely.
> That's fairly easy to do, and not a big change, but should make this
> problem go away.
> 
> /Bruce

Ref: http://patches.dpdk.org/project/dpdk/list/?series=4242

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [PATCH 3/3] acl: adjust the tests
  2019-04-10 13:46               ` Bruce Richardson
@ 2019-04-10 13:46                 ` Bruce Richardson
  0 siblings, 0 replies; 44+ messages in thread
From: Bruce Richardson @ 2019-04-10 13:46 UTC (permalink / raw)
  To: Aaron Conole
  Cc: Ananyev, Konstantin, dev, Jerin Jacob, Gavin Hu, Michael Santana

On Wed, Apr 10, 2019 at 02:24:56PM +0100, Bruce Richardson wrote:
> On Wed, Apr 10, 2019 at 09:10:25AM -0400, Aaron Conole wrote:
> > 
> > Okay - I'll look at this part more.  I think I went down the path of
> > explicitly setting these because the comments didn't match with what was
> > occuring (for example, in the section that I changed that loops through
> > all versions, only the AVX2 and Scalar were being tested on my system,
> > while the comment implied SSE).
> > 
> > I also believe that I split out the functions because of the linking
> > issue (I guess the way the linker resolves the functions works properly
> > when the weak versions are in a different translation unit)?  I'll spend
> > some time trying to get it working in a different way.
> > 
> > Regardless, this wasn't ready for posting as 'PATCH' - I meant it as
> > RFC.  I don't intend to change the first two patches, though.
> > 
> > And thank you for the all the feedback!
> > 
> I've dug into this a bit, and I'm doing up a patch to remove the use of
> weak symbols in our libraries (note, just libs, not drivers) entirely.
> That's fairly easy to do, and not a big change, but should make this
> problem go away.
> 
> /Bruce

Ref: http://patches.dpdk.org/project/dpdk/list/?series=4242

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-08 18:24 ` [dpdk-dev] [PATCH 1/3] acl: fix arm argument types Aaron Conole
  2019-04-08 18:24   ` Aaron Conole
@ 2019-04-10 14:39   ` Jerin Jacob Kollanukkaran
  2019-04-10 14:39     ` Jerin Jacob Kollanukkaran
                       ` (2 more replies)
  1 sibling, 3 replies; 44+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-04-10 14:39 UTC (permalink / raw)
  To: dev, aconole; +Cc: gavin.hu, konstantin.ananyev

On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
> -------------------------------------------------------------------
> ---
> Compiler complains of argument type mismatch, like:

Can you share more details on how to reproduce this issue?

We already have
CFLAGS_acl_run_neon.o += -flax-vector-conversions
in the Makefile.

If you are taking out -flax-vector-conversions the correct way to
fix will be use vreinterpret*.

For me the code looks clean, If unnecessary casting is avoided.


> 
>    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-
> conversions
>       to permit conversions between vectors with differing element
> types
>       or numbers of subparts
>      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>      ^
>    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type
> for
>       argument 2 of ‘vbicq_s32’
> 
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> ---
>  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++-------------
> --
>  1 file changed, 27 insertions(+), 19 deletions(-)
> 
> 
>  
>  /*
> @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> const uint8_t **data,
>  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>  
> +	memset(&input0, 0, sizeof(input0));
> +	memset(&input1, 0, sizeof(input1));

Why this memset only required for arm64? If it real issue, Shouldn't
it required for x86 and ppc ?



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-10 14:39   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
@ 2019-04-10 14:39     ` Jerin Jacob Kollanukkaran
  2019-04-10 15:52     ` Aaron Conole
  2019-06-05 15:16     ` Jerin Jacob Kollanukkaran
  2 siblings, 0 replies; 44+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-04-10 14:39 UTC (permalink / raw)
  To: dev, aconole; +Cc: gavin.hu, konstantin.ananyev

On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
> -------------------------------------------------------------------
> ---
> Compiler complains of argument type mismatch, like:

Can you share more details on how to reproduce this issue?

We already have
CFLAGS_acl_run_neon.o += -flax-vector-conversions
in the Makefile.

If you are taking out -flax-vector-conversions the correct way to
fix will be use vreinterpret*.

For me the code looks clean, If unnecessary casting is avoided.


> 
>    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-
> conversions
>       to permit conversions between vectors with differing element
> types
>       or numbers of subparts
>      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>      ^
>    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type
> for
>       argument 2 of ‘vbicq_s32’
> 
> Signed-off-by: Aaron Conole <aconole@redhat.com>
> ---
>  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++-------------
> --
>  1 file changed, 27 insertions(+), 19 deletions(-)
> 
> 
>  
>  /*
> @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> const uint8_t **data,
>  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>  
> +	memset(&input0, 0, sizeof(input0));
> +	memset(&input1, 0, sizeof(input1));

Why this memset only required for arm64? If it real issue, Shouldn't
it required for x86 and ppc ?



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-10 14:39   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
  2019-04-10 14:39     ` Jerin Jacob Kollanukkaran
@ 2019-04-10 15:52     ` Aaron Conole
  2019-04-10 15:52       ` Aaron Conole
  2019-04-10 16:07       ` Jerin Jacob Kollanukkaran
  2019-06-05 15:16     ` Jerin Jacob Kollanukkaran
  2 siblings, 2 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-10 15:52 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: dev, gavin.hu, konstantin.ananyev

Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:

> On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>> -------------------------------------------------------------------
>> ---
>> Compiler complains of argument type mismatch, like:
>
> Can you share more details on how to reproduce this issue?

It will be generated using the meson build after enabling the neon
extension support (which isn't currently happening on ARM using meson as
the build environment).

> We already have
> CFLAGS_acl_run_neon.o += -flax-vector-conversions
> in the Makefile.
>
> If you are taking out -flax-vector-conversions the correct way to
> fix will be use vreinterpret*.
>
> For me the code looks clean, If unnecessary casting is avoided.

I agree.  I merely make explicit the casts that the compiler will be
implicitly introducing.

>
>> 
>>    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>>    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-
>> conversions
>>       to permit conversions between vectors with differing element
>> types
>>       or numbers of subparts
>>      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>>      ^
>>    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type
>> for
>>       argument 2 of ‘vbicq_s32’
>> 
>> Signed-off-by: Aaron Conole <aconole@redhat.com>
>> ---
>>  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++-------------
>> --
>>  1 file changed, 27 insertions(+), 19 deletions(-)
>> 
>> 
>>  
>>  /*
>> @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
>> const uint8_t **data,
>>  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>>  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>>  
>> +	memset(&input0, 0, sizeof(input0));
>> +	memset(&input1, 0, sizeof(input1));
>
> Why this memset only required for arm64? If it real issue, Shouldn't
> it required for x86 and ppc ?

No.  Please see the following lines (which is due to the ARM neon
intrinsic for setting individual lanes):

	while (flows.started > 0) {
		/* Gather 4 bytes of input data for each stream. */
		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);

Note: the first time through this loop, input0 and input1 appear on the
rhs of the assignment before appearing on the lhs.  This will generate
an uninitialized value warning, even though the assignments are to
individual lanes of the vector.

I squelched the warning from the compiler in the most brute-force way
possible.  Perhaps it would be better to use a static initialization for
the vector but this code was intended to be RFC and to generate
feedback.

I guess one alternate approach could be:

   static const int32x4_t ZERO_VEC;
   int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;

   ...

   int32x4_t input = ZERO_VEC;

This would have the benefit of keeping the initializer as 'fast' as
possible (although I recall a memset under a certain size threshold is
the same effect, but not certain).

Either way, I prefer it to squelching the warning, since the warning
has been found to catch legitimate errors many times.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-10 15:52     ` Aaron Conole
@ 2019-04-10 15:52       ` Aaron Conole
  2019-04-10 16:07       ` Jerin Jacob Kollanukkaran
  1 sibling, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-10 15:52 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: dev, gavin.hu, konstantin.ananyev

Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:

> On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>> -------------------------------------------------------------------
>> ---
>> Compiler complains of argument type mismatch, like:
>
> Can you share more details on how to reproduce this issue?

It will be generated using the meson build after enabling the neon
extension support (which isn't currently happening on ARM using meson as
the build environment).

> We already have
> CFLAGS_acl_run_neon.o += -flax-vector-conversions
> in the Makefile.
>
> If you are taking out -flax-vector-conversions the correct way to
> fix will be use vreinterpret*.
>
> For me the code looks clean, If unnecessary casting is avoided.

I agree.  I merely make explicit the casts that the compiler will be
implicitly introducing.

>
>> 
>>    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>>    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-
>> conversions
>>       to permit conversions between vectors with differing element
>> types
>>       or numbers of subparts
>>      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>>      ^
>>    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type
>> for
>>       argument 2 of ‘vbicq_s32’
>> 
>> Signed-off-by: Aaron Conole <aconole@redhat.com>
>> ---
>>  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++-------------
>> --
>>  1 file changed, 27 insertions(+), 19 deletions(-)
>> 
>> 
>>  
>>  /*
>> @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
>> const uint8_t **data,
>>  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>>  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>>  
>> +	memset(&input0, 0, sizeof(input0));
>> +	memset(&input1, 0, sizeof(input1));
>
> Why this memset only required for arm64? If it real issue, Shouldn't
> it required for x86 and ppc ?

No.  Please see the following lines (which is due to the ARM neon
intrinsic for setting individual lanes):

	while (flows.started > 0) {
		/* Gather 4 bytes of input data for each stream. */
		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);

Note: the first time through this loop, input0 and input1 appear on the
rhs of the assignment before appearing on the lhs.  This will generate
an uninitialized value warning, even though the assignments are to
individual lanes of the vector.

I squelched the warning from the compiler in the most brute-force way
possible.  Perhaps it would be better to use a static initialization for
the vector but this code was intended to be RFC and to generate
feedback.

I guess one alternate approach could be:

   static const int32x4_t ZERO_VEC;
   int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;

   ...

   int32x4_t input = ZERO_VEC;

This would have the benefit of keeping the initializer as 'fast' as
possible (although I recall a memset under a certain size threshold is
the same effect, but not certain).

Either way, I prefer it to squelching the warning, since the warning
has been found to catch legitimate errors many times.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-10 15:52     ` Aaron Conole
  2019-04-10 15:52       ` Aaron Conole
@ 2019-04-10 16:07       ` Jerin Jacob Kollanukkaran
  2019-04-10 16:07         ` Jerin Jacob Kollanukkaran
  2019-04-10 17:20         ` Aaron Conole
  1 sibling, 2 replies; 44+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-04-10 16:07 UTC (permalink / raw)
  To: aconole; +Cc: gavin.hu, dev, konstantin.ananyev

On Wed, 2019-04-10 at 11:52 -0400, Aaron Conole wrote:
> Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:
> 
> > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
> > > ---------------------------------------------------------------
> > > ----
> > > ---
> > > Compiler complains of argument type mismatch, like:
> > 
> > Can you share more details on how to reproduce this issue?
> 
> It will be generated using the meson build after enabling the neon
> extension support (which isn't currently happening on ARM using meson
> as
> the build environment).


Can you share the patch to enable this for testing.

Since the additional memcpy in fastpath, I need to check the overhead
and check the possibility to avoid the memcpy to case.


> 
> > We already have
> > CFLAGS_acl_run_neon.o += -flax-vector-conversions
> > in the Makefile.
> > 
> > If you are taking out -flax-vector-conversions the correct way to
> > fix will be use vreinterpret*.
> > 
> > For me the code looks clean, If unnecessary casting is avoided.
> 
> I agree.  I merely make explicit the casts that the compiler will be
> implicitly introducing.
> 
> > >    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
> > >    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-
> > > vector-
> > > conversions
> > >       to permit conversions between vectors with differing
> > > element
> > > types
> > >       or numbers of subparts
> > >      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
> > >      ^
> > >    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible
> > > type
> > > for
> > >       argument 2 of ‘vbicq_s32’
> > > 
> > > Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > ---
> > >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------
> > > ----
> > > --
> > >  1 file changed, 27 insertions(+), 19 deletions(-)
> > > 
> > > 
> > >  
> > >  /*
> > > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> > > const uint8_t **data,
> > >  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
> > >  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
> > >  
> > > +	memset(&input0, 0, sizeof(input0));
> > > +	memset(&input1, 0, sizeof(input1));
> > 
> > Why this memset only required for arm64? If it real issue,
> > Shouldn't
> > it required for x86 and ppc ?
> 
> No.  Please see the following lines (which is due to the ARM neon
> intrinsic for setting individual lanes):
> 
> 	while (flows.started > 0) {
> 		/* Gather 4 bytes of input data for each stream. */
> 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> input0, 0);
> 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> input1, 0);
> 
> Note: the first time through this loop, input0 and input1 appear on
> the
> rhs of the assignment before appearing on the lhs.  This will
> generate
> an uninitialized value warning, even though the assignments are to
> individual lanes of the vector.
> 
> I squelched the warning from the compiler in the most brute-force way
> possible.  Perhaps it would be better to use a static initialization
> for
> the vector but this code was intended to be RFC and to generate
> feedback.
> 
> I guess one alternate approach could be:
> 
>    static const int32x4_t ZERO_VEC;
>    int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;
> 
>    ...
> 
>    int32x4_t input = ZERO_VEC;
> 
> This would have the benefit of keeping the initializer as 'fast' as
> possible (although I recall a memset under a certain size threshold
> is
> the same effect, but not certain).
> 
> Either way, I prefer it to squelching the warning, since the warning
> has been found to catch legitimate errors many times.

I will get back to this after reproducing the issue locally.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-10 16:07       ` Jerin Jacob Kollanukkaran
@ 2019-04-10 16:07         ` Jerin Jacob Kollanukkaran
  2019-04-10 17:20         ` Aaron Conole
  1 sibling, 0 replies; 44+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-04-10 16:07 UTC (permalink / raw)
  To: aconole; +Cc: gavin.hu, dev, konstantin.ananyev

On Wed, 2019-04-10 at 11:52 -0400, Aaron Conole wrote:
> Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:
> 
> > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
> > > ---------------------------------------------------------------
> > > ----
> > > ---
> > > Compiler complains of argument type mismatch, like:
> > 
> > Can you share more details on how to reproduce this issue?
> 
> It will be generated using the meson build after enabling the neon
> extension support (which isn't currently happening on ARM using meson
> as
> the build environment).


Can you share the patch to enable this for testing.

Since the additional memcpy in fastpath, I need to check the overhead
and check the possibility to avoid the memcpy to case.


> 
> > We already have
> > CFLAGS_acl_run_neon.o += -flax-vector-conversions
> > in the Makefile.
> > 
> > If you are taking out -flax-vector-conversions the correct way to
> > fix will be use vreinterpret*.
> > 
> > For me the code looks clean, If unnecessary casting is avoided.
> 
> I agree.  I merely make explicit the casts that the compiler will be
> implicitly introducing.
> 
> > >    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
> > >    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-
> > > vector-
> > > conversions
> > >       to permit conversions between vectors with differing
> > > element
> > > types
> > >       or numbers of subparts
> > >      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
> > >      ^
> > >    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible
> > > type
> > > for
> > >       argument 2 of ‘vbicq_s32’
> > > 
> > > Signed-off-by: Aaron Conole <aconole@redhat.com>
> > > ---
> > >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------
> > > ----
> > > --
> > >  1 file changed, 27 insertions(+), 19 deletions(-)
> > > 
> > > 
> > >  
> > >  /*
> > > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
> > > const uint8_t **data,
> > >  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
> > >  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
> > >  
> > > +	memset(&input0, 0, sizeof(input0));
> > > +	memset(&input1, 0, sizeof(input1));
> > 
> > Why this memset only required for arm64? If it real issue,
> > Shouldn't
> > it required for x86 and ppc ?
> 
> No.  Please see the following lines (which is due to the ARM neon
> intrinsic for setting individual lanes):
> 
> 	while (flows.started > 0) {
> 		/* Gather 4 bytes of input data for each stream. */
> 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
> input0, 0);
> 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
> input1, 0);
> 
> Note: the first time through this loop, input0 and input1 appear on
> the
> rhs of the assignment before appearing on the lhs.  This will
> generate
> an uninitialized value warning, even though the assignments are to
> individual lanes of the vector.
> 
> I squelched the warning from the compiler in the most brute-force way
> possible.  Perhaps it would be better to use a static initialization
> for
> the vector but this code was intended to be RFC and to generate
> feedback.
> 
> I guess one alternate approach could be:
> 
>    static const int32x4_t ZERO_VEC;
>    int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;
> 
>    ...
> 
>    int32x4_t input = ZERO_VEC;
> 
> This would have the benefit of keeping the initializer as 'fast' as
> possible (although I recall a memset under a certain size threshold
> is
> the same effect, but not certain).
> 
> Either way, I prefer it to squelching the warning, since the warning
> has been found to catch legitimate errors many times.

I will get back to this after reproducing the issue locally.



^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-10 16:07       ` Jerin Jacob Kollanukkaran
  2019-04-10 16:07         ` Jerin Jacob Kollanukkaran
@ 2019-04-10 17:20         ` Aaron Conole
  2019-04-10 17:20           ` Aaron Conole
  2019-04-30 12:57           ` Aaron Conole
  1 sibling, 2 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-10 17:20 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: gavin.hu, dev, konstantin.ananyev

Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:

> On Wed, 2019-04-10 at 11:52 -0400, Aaron Conole wrote:
>> Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:
>> 
>> > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>> > > ---------------------------------------------------------------
>> > > ----
>> > > ---
>> > > Compiler complains of argument type mismatch, like:
>> > 
>> > Can you share more details on how to reproduce this issue?
>> 
>> It will be generated using the meson build after enabling the neon
>> extension support (which isn't currently happening on ARM using meson
>> as
>> the build environment).
>
>
> Can you share the patch to enable this for testing.

Sure - I'm using these:

(needed)
1/3 - http://mails.dpdk.org/archives/dev/2019-March/128304.html
2/3 - http://mails.dpdk.org/archives/dev/2019-March/128305.html

(following only needed for travis support)
3/3 - http://mails.dpdk.org/archives/dev/2019-March/128306.html

-Aaron

> Since the additional memcpy in fastpath, I need to check the overhead
> and check the possibility to avoid the memcpy to case.
>
>
>> 
>> > We already have
>> > CFLAGS_acl_run_neon.o += -flax-vector-conversions
>> > in the Makefile.
>> > 
>> > If you are taking out -flax-vector-conversions the correct way to
>> > fix will be use vreinterpret*.
>> > 
>> > For me the code looks clean, If unnecessary casting is avoided.
>> 
>> I agree.  I merely make explicit the casts that the compiler will be
>> implicitly introducing.
>> 
>> > >    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>> > >    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-
>> > > vector-
>> > > conversions
>> > >       to permit conversions between vectors with differing
>> > > element
>> > > types
>> > >       or numbers of subparts
>> > >      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>> > >      ^
>> > >    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible
>> > > type
>> > > for
>> > >       argument 2 of ‘vbicq_s32’
>> > > 
>> > > Signed-off-by: Aaron Conole <aconole@redhat.com>
>> > > ---
>> > >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------
>> > > ----
>> > > --
>> > >  1 file changed, 27 insertions(+), 19 deletions(-)
>> > > 
>> > > 
>> > >  
>> > >  /*
>> > > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
>> > > const uint8_t **data,
>> > >  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>> > >  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>> > >  
>> > > +	memset(&input0, 0, sizeof(input0));
>> > > +	memset(&input1, 0, sizeof(input1));
>> > 
>> > Why this memset only required for arm64? If it real issue,
>> > Shouldn't
>> > it required for x86 and ppc ?
>> 
>> No.  Please see the following lines (which is due to the ARM neon
>> intrinsic for setting individual lanes):
>> 
>> 	while (flows.started > 0) {
>> 		/* Gather 4 bytes of input data for each stream. */
>> 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>> input0, 0);
>> 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
>> input1, 0);
>> 
>> Note: the first time through this loop, input0 and input1 appear on
>> the
>> rhs of the assignment before appearing on the lhs.  This will
>> generate
>> an uninitialized value warning, even though the assignments are to
>> individual lanes of the vector.
>> 
>> I squelched the warning from the compiler in the most brute-force way
>> possible.  Perhaps it would be better to use a static initialization
>> for
>> the vector but this code was intended to be RFC and to generate
>> feedback.
>> 
>> I guess one alternate approach could be:
>> 
>>    static const int32x4_t ZERO_VEC;
>>    int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;
>> 
>>    ...
>> 
>>    int32x4_t input = ZERO_VEC;
>> 
>> This would have the benefit of keeping the initializer as 'fast' as
>> possible (although I recall a memset under a certain size threshold
>> is
>> the same effect, but not certain).
>> 
>> Either way, I prefer it to squelching the warning, since the warning
>> has been found to catch legitimate errors many times.
>
> I will get back to this after reproducing the issue locally.

Awesome - thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-10 17:20         ` Aaron Conole
@ 2019-04-10 17:20           ` Aaron Conole
  2019-04-30 12:57           ` Aaron Conole
  1 sibling, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-10 17:20 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: gavin.hu, dev, konstantin.ananyev

Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:

> On Wed, 2019-04-10 at 11:52 -0400, Aaron Conole wrote:
>> Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:
>> 
>> > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>> > > ---------------------------------------------------------------
>> > > ----
>> > > ---
>> > > Compiler complains of argument type mismatch, like:
>> > 
>> > Can you share more details on how to reproduce this issue?
>> 
>> It will be generated using the meson build after enabling the neon
>> extension support (which isn't currently happening on ARM using meson
>> as
>> the build environment).
>
>
> Can you share the patch to enable this for testing.

Sure - I'm using these:

(needed)
1/3 - http://mails.dpdk.org/archives/dev/2019-March/128304.html
2/3 - http://mails.dpdk.org/archives/dev/2019-March/128305.html

(following only needed for travis support)
3/3 - http://mails.dpdk.org/archives/dev/2019-March/128306.html

-Aaron

> Since the additional memcpy in fastpath, I need to check the overhead
> and check the possibility to avoid the memcpy to case.
>
>
>> 
>> > We already have
>> > CFLAGS_acl_run_neon.o += -flax-vector-conversions
>> > in the Makefile.
>> > 
>> > If you are taking out -flax-vector-conversions the correct way to
>> > fix will be use vreinterpret*.
>> > 
>> > For me the code looks clean, If unnecessary casting is avoided.
>> 
>> I agree.  I merely make explicit the casts that the compiler will be
>> implicitly introducing.
>> 
>> > >    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>> > >    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-
>> > > vector-
>> > > conversions
>> > >       to permit conversions between vectors with differing
>> > > element
>> > > types
>> > >       or numbers of subparts
>> > >      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>> > >      ^
>> > >    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible
>> > > type
>> > > for
>> > >       argument 2 of ‘vbicq_s32’
>> > > 
>> > > Signed-off-by: Aaron Conole <aconole@redhat.com>
>> > > ---
>> > >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------
>> > > ----
>> > > --
>> > >  1 file changed, 27 insertions(+), 19 deletions(-)
>> > > 
>> > > 
>> > >  
>> > >  /*
>> > > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
>> > > const uint8_t **data,
>> > >  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>> > >  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>> > >  
>> > > +	memset(&input0, 0, sizeof(input0));
>> > > +	memset(&input1, 0, sizeof(input1));
>> > 
>> > Why this memset only required for arm64? If it real issue,
>> > Shouldn't
>> > it required for x86 and ppc ?
>> 
>> No.  Please see the following lines (which is due to the ARM neon
>> intrinsic for setting individual lanes):
>> 
>> 	while (flows.started > 0) {
>> 		/* Gather 4 bytes of input data for each stream. */
>> 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>> input0, 0);
>> 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
>> input1, 0);
>> 
>> Note: the first time through this loop, input0 and input1 appear on
>> the
>> rhs of the assignment before appearing on the lhs.  This will
>> generate
>> an uninitialized value warning, even though the assignments are to
>> individual lanes of the vector.
>> 
>> I squelched the warning from the compiler in the most brute-force way
>> possible.  Perhaps it would be better to use a static initialization
>> for
>> the vector but this code was intended to be RFC and to generate
>> feedback.
>> 
>> I guess one alternate approach could be:
>> 
>>    static const int32x4_t ZERO_VEC;
>>    int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;
>> 
>>    ...
>> 
>>    int32x4_t input = ZERO_VEC;
>> 
>> This would have the benefit of keeping the initializer as 'fast' as
>> possible (although I recall a memset under a certain size threshold
>> is
>> the same effect, but not certain).
>> 
>> Either way, I prefer it to squelching the warning, since the warning
>> has been found to catch legitimate errors many times.
>
> I will get back to this after reproducing the issue locally.

Awesome - thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-10 17:20         ` Aaron Conole
  2019-04-10 17:20           ` Aaron Conole
@ 2019-04-30 12:57           ` Aaron Conole
  2019-04-30 12:57             ` Aaron Conole
  1 sibling, 1 reply; 44+ messages in thread
From: Aaron Conole @ 2019-04-30 12:57 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: gavin.hu, dev, konstantin.ananyev

Aaron Conole <aconole@redhat.com> writes:

> Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:
>
>> On Wed, 2019-04-10 at 11:52 -0400, Aaron Conole wrote:
>>> Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:
>>> 
>>> > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>>> > > ---------------------------------------------------------------
>>> > > ----
>>> > > ---
>>> > > Compiler complains of argument type mismatch, like:
>>> > 
>>> > Can you share more details on how to reproduce this issue?
>>> 
>>> It will be generated using the meson build after enabling the neon
>>> extension support (which isn't currently happening on ARM using meson
>>> as
>>> the build environment).
>>
>>
>> Can you share the patch to enable this for testing.
>
> Sure - I'm using these:
>
> (needed)
> 1/3 - http://mails.dpdk.org/archives/dev/2019-March/128304.html
> 2/3 - http://mails.dpdk.org/archives/dev/2019-March/128305.html
>
> (following only needed for travis support)
> 3/3 - http://mails.dpdk.org/archives/dev/2019-March/128306.html
>
> -Aaron
>
>> Since the additional memcpy in fastpath, I need to check the overhead
>> and check the possibility to avoid the memcpy to case.

Were you able to test this?

>>
>>> 
>>> > We already have
>>> > CFLAGS_acl_run_neon.o += -flax-vector-conversions
>>> > in the Makefile.
>>> > 
>>> > If you are taking out -flax-vector-conversions the correct way to
>>> > fix will be use vreinterpret*.
>>> > 
>>> > For me the code looks clean, If unnecessary casting is avoided.
>>> 
>>> I agree.  I merely make explicit the casts that the compiler will be
>>> implicitly introducing.
>>> 
>>> > >    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>>> > >    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-
>>> > > vector-
>>> > > conversions
>>> > >       to permit conversions between vectors with differing
>>> > > element
>>> > > types
>>> > >       or numbers of subparts
>>> > >      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>>> > >      ^
>>> > >    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible
>>> > > type
>>> > > for
>>> > >       argument 2 of ‘vbicq_s32’
>>> > > 
>>> > > Signed-off-by: Aaron Conole <aconole@redhat.com>
>>> > > ---
>>> > >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------
>>> > > ----
>>> > > --
>>> > >  1 file changed, 27 insertions(+), 19 deletions(-)
>>> > > 
>>> > > 
>>> > >  
>>> > >  /*
>>> > > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
>>> > > const uint8_t **data,
>>> > >  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>>> > >  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>>> > >  
>>> > > +	memset(&input0, 0, sizeof(input0));
>>> > > +	memset(&input1, 0, sizeof(input1));
>>> > 
>>> > Why this memset only required for arm64? If it real issue,
>>> > Shouldn't
>>> > it required for x86 and ppc ?
>>> 
>>> No.  Please see the following lines (which is due to the ARM neon
>>> intrinsic for setting individual lanes):
>>> 
>>> 	while (flows.started > 0) {
>>> 		/* Gather 4 bytes of input data for each stream. */
>>> 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>>> input0, 0);
>>> 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
>>> input1, 0);
>>> 
>>> Note: the first time through this loop, input0 and input1 appear on
>>> the
>>> rhs of the assignment before appearing on the lhs.  This will
>>> generate
>>> an uninitialized value warning, even though the assignments are to
>>> individual lanes of the vector.
>>> 
>>> I squelched the warning from the compiler in the most brute-force way
>>> possible.  Perhaps it would be better to use a static initialization
>>> for
>>> the vector but this code was intended to be RFC and to generate
>>> feedback.
>>> 
>>> I guess one alternate approach could be:
>>> 
>>>    static const int32x4_t ZERO_VEC;
>>>    int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;
>>> 
>>>    ...
>>> 
>>>    int32x4_t input = ZERO_VEC;
>>> 
>>> This would have the benefit of keeping the initializer as 'fast' as
>>> possible (although I recall a memset under a certain size threshold
>>> is
>>> the same effect, but not certain).
>>> 
>>> Either way, I prefer it to squelching the warning, since the warning
>>> has been found to catch legitimate errors many times.
>>
>> I will get back to this after reproducing the issue locally.
>
> Awesome - thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-30 12:57           ` Aaron Conole
@ 2019-04-30 12:57             ` Aaron Conole
  0 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-04-30 12:57 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: gavin.hu, dev, konstantin.ananyev

Aaron Conole <aconole@redhat.com> writes:

> Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:
>
>> On Wed, 2019-04-10 at 11:52 -0400, Aaron Conole wrote:
>>> Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:
>>> 
>>> > On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>>> > > ---------------------------------------------------------------
>>> > > ----
>>> > > ---
>>> > > Compiler complains of argument type mismatch, like:
>>> > 
>>> > Can you share more details on how to reproduce this issue?
>>> 
>>> It will be generated using the meson build after enabling the neon
>>> extension support (which isn't currently happening on ARM using meson
>>> as
>>> the build environment).
>>
>>
>> Can you share the patch to enable this for testing.
>
> Sure - I'm using these:
>
> (needed)
> 1/3 - http://mails.dpdk.org/archives/dev/2019-March/128304.html
> 2/3 - http://mails.dpdk.org/archives/dev/2019-March/128305.html
>
> (following only needed for travis support)
> 3/3 - http://mails.dpdk.org/archives/dev/2019-March/128306.html
>
> -Aaron
>
>> Since the additional memcpy in fastpath, I need to check the overhead
>> and check the possibility to avoid the memcpy to case.

Were you able to test this?

>>
>>> 
>>> > We already have
>>> > CFLAGS_acl_run_neon.o += -flax-vector-conversions
>>> > in the Makefile.
>>> > 
>>> > If you are taking out -flax-vector-conversions the correct way to
>>> > fix will be use vreinterpret*.
>>> > 
>>> > For me the code looks clean, If unnecessary casting is avoided.
>>> 
>>> I agree.  I merely make explicit the casts that the compiler will be
>>> implicitly introducing.
>>> 
>>> > >    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>>> > >    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-
>>> > > vector-
>>> > > conversions
>>> > >       to permit conversions between vectors with differing
>>> > > element
>>> > > types
>>> > >       or numbers of subparts
>>> > >      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>>> > >      ^
>>> > >    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible
>>> > > type
>>> > > for
>>> > >       argument 2 of ‘vbicq_s32’
>>> > > 
>>> > > Signed-off-by: Aaron Conole <aconole@redhat.com>
>>> > > ---
>>> > >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++---------
>>> > > ----
>>> > > --
>>> > >  1 file changed, 27 insertions(+), 19 deletions(-)
>>> > > 
>>> > > 
>>> > >  
>>> > >  /*
>>> > > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx,
>>> > > const uint8_t **data,
>>> > >  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>>> > >  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>>> > >  
>>> > > +	memset(&input0, 0, sizeof(input0));
>>> > > +	memset(&input1, 0, sizeof(input1));
>>> > 
>>> > Why this memset only required for arm64? If it real issue,
>>> > Shouldn't
>>> > it required for x86 and ppc ?
>>> 
>>> No.  Please see the following lines (which is due to the ARM neon
>>> intrinsic for setting individual lanes):
>>> 
>>> 	while (flows.started > 0) {
>>> 		/* Gather 4 bytes of input data for each stream. */
>>> 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0),
>>> input0, 0);
>>> 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4),
>>> input1, 0);
>>> 
>>> Note: the first time through this loop, input0 and input1 appear on
>>> the
>>> rhs of the assignment before appearing on the lhs.  This will
>>> generate
>>> an uninitialized value warning, even though the assignments are to
>>> individual lanes of the vector.
>>> 
>>> I squelched the warning from the compiler in the most brute-force way
>>> possible.  Perhaps it would be better to use a static initialization
>>> for
>>> the vector but this code was intended to be RFC and to generate
>>> feedback.
>>> 
>>> I guess one alternate approach could be:
>>> 
>>>    static const int32x4_t ZERO_VEC;
>>>    int32x4_t input0 = ZERO_VEC, input1 = ZERO_VEC;
>>> 
>>>    ...
>>> 
>>>    int32x4_t input = ZERO_VEC;
>>> 
>>> This would have the benefit of keeping the initializer as 'fast' as
>>> possible (although I recall a memset under a certain size threshold
>>> is
>>> the same effect, but not certain).
>>> 
>>> Either way, I prefer it to squelching the warning, since the warning
>>> has been found to catch legitimate errors many times.
>>
>> I will get back to this after reproducing the issue locally.
>
> Awesome - thanks.

^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-04-10 14:39   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
  2019-04-10 14:39     ` Jerin Jacob Kollanukkaran
  2019-04-10 15:52     ` Aaron Conole
@ 2019-06-05 15:16     ` Jerin Jacob Kollanukkaran
  2019-06-05 17:09       ` Aaron Conole
  2 siblings, 1 reply; 44+ messages in thread
From: Jerin Jacob Kollanukkaran @ 2019-06-05 15:16 UTC (permalink / raw)
  To: dev, aconole; +Cc: gavin.hu, konstantin.ananyev

> -----Original Message-----
> From: Jerin Jacob Kollanukkaran
> Sent: Wednesday, April 10, 2019 8:10 PM
> To: dev@dpdk.org; aconole@redhat.com
> Cc: gavin.hu@arm.com; konstantin.ananyev@intel.com
> Subject: Re: [EXT] [PATCH 1/3] acl: fix arm argument types
> 
> On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
> > -------------------------------------------------------------------
> > ---
> > Compiler complains of argument type mismatch, like:
> 
> Can you share more details on how to reproduce this issue?
> 
> We already have
> CFLAGS_acl_run_neon.o += -flax-vector-conversions in the Makefile.
> 
> If you are taking out -flax-vector-conversions the correct way to fix will be
> use vreinterpret*.
> 
> For me the code looks clean, If unnecessary casting is avoided.


Considering the following patch is part of dpdk.org now. I think, We may not need this
patch in benefit to avoid a lot of typecasting.

https://git.dpdk.org/dpdk/commit/?id=e53ce4e4137974f46743e74bd9ab912e0166c8b1




> 
> 
> >
> >    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
> >    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-
> > conversions
> >       to permit conversions between vectors with differing element
> > types
> >       or numbers of subparts
> >      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
> >      ^
> >    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type
> > for
> >       argument 2 of ‘vbicq_s32’
> >
> > Signed-off-by: Aaron Conole <aconole@redhat.com>
> > ---
> >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++-------------
> > --
> >  1 file changed, 27 insertions(+), 19 deletions(-)
> >
> >
> >
> >  /*
> > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
> > uint8_t **data,
> >  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
> >  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
> >
> > +	memset(&input0, 0, sizeof(input0));
> > +	memset(&input1, 0, sizeof(input1));
> 
> Why this memset only required for arm64? If it real issue, Shouldn't it
> required for x86 and ppc ?
> 


^ permalink raw reply	[flat|nested] 44+ messages in thread

* Re: [dpdk-dev] [EXT] [PATCH 1/3] acl: fix arm argument types
  2019-06-05 15:16     ` Jerin Jacob Kollanukkaran
@ 2019-06-05 17:09       ` Aaron Conole
  0 siblings, 0 replies; 44+ messages in thread
From: Aaron Conole @ 2019-06-05 17:09 UTC (permalink / raw)
  To: Jerin Jacob Kollanukkaran; +Cc: dev, gavin.hu, konstantin.ananyev

Jerin Jacob Kollanukkaran <jerinj@marvell.com> writes:

>> -----Original Message-----
>> From: Jerin Jacob Kollanukkaran
>> Sent: Wednesday, April 10, 2019 8:10 PM
>> To: dev@dpdk.org; aconole@redhat.com
>> Cc: gavin.hu@arm.com; konstantin.ananyev@intel.com
>> Subject: Re: [EXT] [PATCH 1/3] acl: fix arm argument types
>> 
>> On Mon, 2019-04-08 at 14:24 -0400, Aaron Conole wrote:
>> > -------------------------------------------------------------------
>> > ---
>> > Compiler complains of argument type mismatch, like:
>> 
>> Can you share more details on how to reproduce this issue?
>> 
>> We already have
>> CFLAGS_acl_run_neon.o += -flax-vector-conversions in the Makefile.
>> 
>> If you are taking out -flax-vector-conversions the correct way to fix will be
>> use vreinterpret*.
>> 
>> For me the code looks clean, If unnecessary casting is avoided.
>
>
> Considering the following patch is part of dpdk.org now. I think, We may not need this
> patch in benefit to avoid a lot of typecasting.
>
> https://git.dpdk.org/dpdk/commit/?id=e53ce4e4137974f46743e74bd9ab912e0166c8b1

Correct, the lax conversions aren't needed.

>
>
>
>> 
>> 
>> >
>> >    ../lib/librte_acl/acl_run_neon.h: In function ‘transition4’:
>> >    ../lib/librte_acl/acl_run_neon.h:115:2: note: use -flax-vector-
>> > conversions
>> >       to permit conversions between vectors with differing element
>> > types
>> >       or numbers of subparts
>> >      node_type = vbicq_s32(tr_hi_lo.val[0], index_msk);
>> >      ^
>> >    ../lib/librte_acl/acl_run_neon.h:115:41: error: incompatible type
>> > for
>> >       argument 2 of ‘vbicq_s32’
>> >
>> > Signed-off-by: Aaron Conole <aconole@redhat.com>
>> > ---
>> >  lib/librte_acl/acl_run_neon.h | 46 ++++++++++++++++++++-------------
>> > --
>> >  1 file changed, 27 insertions(+), 19 deletions(-)
>> >
>> >
>> >
>> >  /*
>> > @@ -179,6 +183,9 @@ search_neon_8(const struct rte_acl_ctx *ctx, const
>> > uint8_t **data,
>> >  	acl_match_check_x4(0, ctx, parms, &flows, &index_array[0]);
>> >  	acl_match_check_x4(4, ctx, parms, &flows, &index_array[4]);
>> >
>> > +	memset(&input0, 0, sizeof(input0));
>> > +	memset(&input1, 0, sizeof(input1));
>> 
>> Why this memset only required for arm64? If it real issue, Shouldn't it
>> required for x86 and ppc ?
>> 

Something for this part is still needed (see for example:
https://travis-ci.com/DPDK/dpdk/jobs/205675369).

I have two alternate approaches, butneither have even been compile tested
(and the obvious '-Wno-maybe-uninitialized' - but I dislike that
 approach because it will afflict all routines):

1.  Something like this:

@@ -181,8 +181,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
-		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
+		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdup_n_s32(0), 0);
+		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), vdup_n_s32(0), 0);
 
 		input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
 		input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -242,7 +242,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
 	while (flows.started > 0) {
 		/* Gather 4 bytes of input data for each stream. */
-		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
+		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdup_n_s32(0), 0);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
 		input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);

---------

2: something like this

diff --git a/lib/librte_acl/acl_run_neon.h b/lib/librte_acl/acl_run_neon.h
index a055a8240..0eb42865a 100644
--- a/lib/librte_acl/acl_run_neon.h
+++ b/lib/librte_acl/acl_run_neon.h
@@ -165,7 +165,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
        uint64_t index_array[8];
        struct completion cmplt[8];
        struct parms parms[8];
-       int32x4_t input0, input1;
+       static int32x4_t ZERO_VAL;
+       int32x4_t input0 = ZERO_VAL, input1 = ZERO_VAL;
 
        acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
                     total_packets, categories, ctx->trans_table);
@@ -181,8 +182,8 @@ search_neon_8(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
        while (flows.started > 0) {
                /* Gather 4 bytes of input data for each stream. */
-               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdup_n_s32(0), 0);
-               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), vdup_n_s32(0), 0);
+               input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input0, 0);
+               input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 4), input1, 0);
 
                input0 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input0, 1);
                input1 = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 5), input1, 1);
@@ -227,7 +228,8 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
        uint64_t index_array[4];
        struct completion cmplt[4];
        struct parms parms[4];
-       int32x4_t input;
+       static int32x4_t ZERO_VAL;
+       int32x4_t input = ZERO_VAL;
 
        acl_set_flow(&flows, cmplt, RTE_DIM(cmplt), data, results,
                     total_packets, categories, ctx->trans_table);
@@ -242,7 +244,7 @@ search_neon_4(const struct rte_acl_ctx *ctx, const uint8_t **data,
 
        while (flows.started > 0) {
                /* Gather 4 bytes of input data for each stream. */
-               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), vdup_n_s32(0), 0);
+               input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 0), input, 0);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 1), input, 1);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 2), input, 2);
                input = vsetq_lane_s32(GET_NEXT_4BYTES(parms, 3), input, 3);
---

WDYT?

^ permalink raw reply	[flat|nested] 44+ messages in thread

end of thread, other threads:[~2019-06-05 17:09 UTC | newest]

Thread overview: 44+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-04-08 18:24 [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build Aaron Conole
2019-04-08 18:24 ` Aaron Conole
2019-04-08 18:24 ` [dpdk-dev] [PATCH 1/3] acl: fix arm argument types Aaron Conole
2019-04-08 18:24   ` Aaron Conole
2019-04-10 14:39   ` [dpdk-dev] [EXT] " Jerin Jacob Kollanukkaran
2019-04-10 14:39     ` Jerin Jacob Kollanukkaran
2019-04-10 15:52     ` Aaron Conole
2019-04-10 15:52       ` Aaron Conole
2019-04-10 16:07       ` Jerin Jacob Kollanukkaran
2019-04-10 16:07         ` Jerin Jacob Kollanukkaran
2019-04-10 17:20         ` Aaron Conole
2019-04-10 17:20           ` Aaron Conole
2019-04-30 12:57           ` Aaron Conole
2019-04-30 12:57             ` Aaron Conole
2019-06-05 15:16     ` Jerin Jacob Kollanukkaran
2019-06-05 17:09       ` Aaron Conole
2019-04-08 18:24 ` [dpdk-dev] [PATCH 2/3] acl: update the build for multi-arch Aaron Conole
2019-04-08 18:24   ` Aaron Conole
2019-04-08 18:24 ` [dpdk-dev] [PATCH 3/3] acl: adjust the tests Aaron Conole
2019-04-08 18:24   ` Aaron Conole
2019-04-09  8:41   ` Ananyev, Konstantin
2019-04-09  8:41     ` Ananyev, Konstantin
2019-04-09 13:01     ` Aaron Conole
2019-04-09 13:01       ` Aaron Conole
2019-04-09 16:03       ` Ananyev, Konstantin
2019-04-09 16:03         ` Ananyev, Konstantin
2019-04-09 17:04         ` Ananyev, Konstantin
2019-04-09 17:04           ` Ananyev, Konstantin
2019-04-10  8:13           ` Richardson, Bruce
2019-04-10  8:13             ` Richardson, Bruce
2019-04-10 13:10           ` Aaron Conole
2019-04-10 13:10             ` Aaron Conole
2019-04-10 13:24             ` Bruce Richardson
2019-04-10 13:24               ` Bruce Richardson
2019-04-10 13:46               ` Bruce Richardson
2019-04-10 13:46                 ` Bruce Richardson
2019-04-09 17:05         ` Richardson, Bruce
2019-04-09 17:05           ` Richardson, Bruce
2019-04-09 18:29           ` Ananyev, Konstantin
2019-04-09 18:29             ` Ananyev, Konstantin
2019-04-10  9:06             ` Bruce Richardson
2019-04-10  9:06               ` Bruce Richardson
2019-04-08 20:40 ` [dpdk-dev] [PATCH 0/3] librte_acl: fixes related to testing with the meson build Aaron Conole
2019-04-08 20:40   ` Aaron Conole

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).