DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH v2] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only
@ 2016-03-19  9:26 Jan Viktorin
  2016-03-19 11:05 ` Jan Viktorin
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: Jan Viktorin @ 2016-03-19  9:26 UTC (permalink / raw)
  To: thomas.monjalon
  Cc: jerin.jacob, tomaszx.kulasek, jianbo.liu, dev, Jan Viktorin

The RTE_MACHINE_CPUFLAG_NEON was only a result of the gcc testing. However,
the target CPU may not support NEON or the user can disable to use it (as it
does not always improve the performance).

The RTE_MACHINE_CPUFLAG_NEON detection is now based on both, the __ARM_NEON_FP
feature from gcc and CONFIG_RTE_ARCH_ARM_NEON from the .config. The memcpy
implemention is driven by RTE_MACHINE_CPUFLAG_NEON, so the reason to disable
NEON is hidden for the actual code.

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
v2: fix l3fwm_em.c to refer RTE_MACHINE_CPUFLAG_NEON instead of __ARM_NEON
---
 examples/l3fwd/l3fwd_em.c                              | 2 +-
 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 4 ++--
 mk/machine/armv7a/rte.vars.mk                          | 2 +-
 mk/rte.cpuflags.mk                                     | 2 ++
 4 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 0adf8f4..4983eed 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -250,7 +250,7 @@ em_mask_key(void *key, xmm_t mask)
 
 	return _mm_and_si128(data, mask);
 }
-#elif defined(__ARM_NEON)
+#elif defined(RTE_MACHINE_CPUFLAG_NEON)
 static inline xmm_t
 em_mask_key(void *key, xmm_t mask)
 {
diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
index df47c0d..ad8bc65 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
@@ -42,7 +42,7 @@ extern "C" {
 
 #include "generic/rte_memcpy.h"
 
-#ifdef __ARM_NEON_FP
+#ifdef RTE_MACHINE_CPUFLAG_NEON
 
 /* ARM NEON Intrinsics are used to copy data */
 #include <arm_neon.h>
@@ -325,7 +325,7 @@ rte_memcpy_func(void *dst, const void *src, size_t n)
 	return memcpy(dst, src, n);
 }
 
-#endif /* __ARM_NEON_FP */
+#endif /* RTE_MACHINE_CPUFLAG_NEON */
 
 #ifdef __cplusplus
 }
diff --git a/mk/machine/armv7a/rte.vars.mk b/mk/machine/armv7a/rte.vars.mk
index 48d3979..7a167c1 100644
--- a/mk/machine/armv7a/rte.vars.mk
+++ b/mk/machine/armv7a/rte.vars.mk
@@ -62,6 +62,6 @@ ifdef CONFIG_RTE_ARCH_ARM_TUNE
 MACHINE_CFLAGS += -mtune=$(CONFIG_RTE_ARCH_ARM_TUNE)
 endif
 
-ifeq ($(CONFIG_RTE_ARCH_ARM_NEON),y)
+ifdef $(RTE_MACHINE_CPUFLAG_NEON)
 MACHINE_CFLAGS += -mfpu=neon
 endif
diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index 19a3e7e..1947511 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -111,9 +111,11 @@ CPUFLAGS += VSX
 endif
 
 # ARM flags
+ifeq ($(CONFIG_RTE_ARCH_ARM_NEON),y)
 ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_NEON_FP),)
 CPUFLAGS += NEON
 endif
+endif
 
 ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_FEATURE_CRC32),)
 CPUFLAGS += CRC32
-- 
2.7.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v2] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only
  2016-03-19  9:26 [dpdk-dev] [PATCH v2] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only Jan Viktorin
@ 2016-03-19 11:05 ` Jan Viktorin
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 0/4] " Jan Viktorin
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Jan Viktorin @ 2016-03-19 11:05 UTC (permalink / raw)
  To: thomas.monjalon; +Cc: jerin.jacob, tomaszx.kulasek, jianbo.liu, dev

On Sat, 19 Mar 2016 10:26:30 +0100
Jan Viktorin <viktorin@rehivetech.com> wrote:

> The RTE_MACHINE_CPUFLAG_NEON was only a result of the gcc testing. However,
> the target CPU may not support NEON or the user can disable to use it (as it
> does not always improve the performance).
> 
> The RTE_MACHINE_CPUFLAG_NEON detection is now based on both, the __ARM_NEON_FP
> feature from gcc and CONFIG_RTE_ARCH_ARM_NEON from the .config. The memcpy
> implemention is driven by RTE_MACHINE_CPUFLAG_NEON, so the reason to disable
> NEON is hidden for the actual code.

Unfortunately, I've overlooked a mistake. I have to remake the patch a
bit, sorry. I am a bit confused about the __ARM_NEON and __ARM_NEON_FP
settings.

The arm_neon.h is available only when the __ARM_NEON is present. But...

$ arm-buildroot-linux-gnueabi-gcc -dM -E - < /dev/null  | grep "_FP\|_NEON"
#define __ARM_FP 12
#define __ARM_NEON_FP 4
#define __VFP_FP__ 1

Without -mfpu=neon we don't have arm_neon.h. I consider this strange as
we are not interested in the FPU features but in the SIMD features...

$ arm-buildroot-linux-gnueabi-gcc -mfpu=neon -dM -E - < /dev/null  | grep "_FP\|_NEON"
#define __ARM_FP 12
#define __ARM_NEON_FP 4
#define __ARM_NEON__ 1
#define __VFP_FP__ 1
#define __ARM_NEON 1

$ arm-buildroot-linux-gnueabi-gcc -mfpu=neon-vfpv4 -dM -E - < /dev/null  | grep "_FP\|_NEON"
#define __ARM_FP 14
#define __ARM_NEON_FP 6
#define __FP_FAST_FMAF 1
#define __FP_FAST_FMAL 1
#define __ARM_NEON__ 1
#define __VFP_FP__ 1
#define __ARM_NEON 1
#define __FP_FAST_FMA 1

ARM64 is OK here...

$ aarch64-buildroot-linux-gnu-gcc -dM -E - < /dev/null | grep "NEON\|FP"
#define __FP_FAST_FMAF 1
#define __ARM_NEON 1
#define __FP_FAST_FMA 1

So...

> 
> Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> ---
> v2: fix l3fwm_em.c to refer RTE_MACHINE_CPUFLAG_NEON instead of __ARM_NEON
> ---
>  examples/l3fwd/l3fwd_em.c                              | 2 +-
>  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 4 ++--
>  mk/machine/armv7a/rte.vars.mk                          | 2 +-
>  mk/rte.cpuflags.mk                                     | 2 ++
>  4 files changed, 6 insertions(+), 4 deletions(-)
> 
[...]
>  #ifdef __cplusplus
>  }
> diff --git a/mk/machine/armv7a/rte.vars.mk b/mk/machine/armv7a/rte.vars.mk
> index 48d3979..7a167c1 100644
> --- a/mk/machine/armv7a/rte.vars.mk
> +++ b/mk/machine/armv7a/rte.vars.mk
> @@ -62,6 +62,6 @@ ifdef CONFIG_RTE_ARCH_ARM_TUNE
>  MACHINE_CFLAGS += -mtune=$(CONFIG_RTE_ARCH_ARM_TUNE)
>  endif
>  
> -ifeq ($(CONFIG_RTE_ARCH_ARM_NEON),y)
> +ifdef $(RTE_MACHINE_CPUFLAG_NEON)
>  MACHINE_CFLAGS += -mfpu=neon
>  endif

RTE_MACHINE_CPUFLAG_NEON is not *yet* set here (cpuflags are detected later)...
So the -mfpu=neon is never configured and the build fails. The
MACHINE_CFLAGS should rather depend on the CONFIG_RTE_ARCH_ARM_NEON
telling the build-system "we want NEON".

> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
> index 19a3e7e..1947511 100644
> --- a/mk/rte.cpuflags.mk
> +++ b/mk/rte.cpuflags.mk
> @@ -111,9 +111,11 @@ CPUFLAGS += VSX
>  endif
>  
>  # ARM flags
> +ifeq ($(CONFIG_RTE_ARCH_ARM_NEON),y)
>  ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_NEON_FP),)

Here, we should check __ARM_NEON (to be ARM32/64 compatible) but we
cannot see __ARM_NEON without the -mfpu=neon flag.

Jerin, does the current DPDK detect NEON feature on ARM64? I'd say, it
cannot.

So, we should probably check both __ARM_NEON and __ARM_NEON_FP here.

Another point, related to the original discussion:

http://dpdk.org/ml/archives/dev/2016-March/thread.html#35972

we should probably have a config option to enable memcpy optimizations
separated from the NEON support. The NEON support can then be detected
only by the __ARM_NEON flag. The ARMv7 would have the -mfpu=neon always
set. If somebody likes to customize this, she would do it by hand. The
result is, we correctly detect NEON during build time from the GCC.

>  CPUFLAGS += NEON
>  endif
> +endif
>  
>  ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_FEATURE_CRC32),)
>  CPUFLAGS += CRC32



-- 
  Jan Viktorin                E-mail: Viktorin@RehiveTech.com
  System Architect            Web:    www.RehiveTech.com
  RehiveTech
  Brno, Czech Republic

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH v3 0/4] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only
  2016-03-19  9:26 [dpdk-dev] [PATCH v2] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only Jan Viktorin
  2016-03-19 11:05 ` Jan Viktorin
@ 2016-03-19 19:58 ` Jan Viktorin
  2016-03-24 16:47   ` Thomas Monjalon
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 1/4] arm: remove CONFIG_RTE_ARCH_ARM_NEON Jan Viktorin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: Jan Viktorin @ 2016-03-19 19:58 UTC (permalink / raw)
  To: dev
  Cc: Jan Viktorin, thomas.monjalon, jerin.jacob, tomaszx.kulasek,
	jianbo.liu, david.marchand

Hello,

finally, I've broken the original patch into 4 pieces as it solves more issues
and not just a single one.

* As Thomas have already mentioned, the CONFIG_RTE_ARCH_ARM_NEON is confusing. 
  So, I've decided to remove it entirely and provide another option for a more
  specific purpose: CONFIG_RTE_ARCH_ARM_NEON_MEMCPY.

* The RTE_MACHINE_CPUFLAG_NEON detection is now based on __ARM_NEON as only
  this compiler definition gives us the arm_neon.h and is compatible with
  arm64. In DPDK, the RTE_MACHINE_CPUFLAG_NEON should be prefered over the
  __ARM_NEON. I'd recommend the same for x86 code (__SSE2__)... 

History:
v2
* fix l3fwm_em.c to refer RTE_MACHINE_CPUFLAG_NEON instead of __ARM_NEON

v3
* divided into 4 patches as there are more independent problems
* compiles well for armv7
* (probably) fixes RTE_MACHINE_CPUFLAG_NEON detection on arm64

Jan Viktorin (4):
  arm: remove CONFIG_RTE_ARCH_ARM_NEON
  arm: detect NEON cpu feature by checking __ARM_NEON
  arm: detect NEON by checking RTE_MACHINE_CPUFLAG_NEON
  eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY

 config/defconfig_arm-armv7a-linuxapp-gcc               | 2 +-
 config/defconfig_arm64-armv8a-linuxapp-gcc             | 1 -
 examples/l3fwd/l3fwd_em.c                              | 2 +-
 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
 mk/machine/armv7a/rte.vars.mk                          | 2 --
 mk/rte.cpuflags.mk                                     | 2 +-
 6 files changed, 9 insertions(+), 8 deletions(-)

-- 
2.7.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH v3 1/4] arm: remove CONFIG_RTE_ARCH_ARM_NEON
  2016-03-19  9:26 [dpdk-dev] [PATCH v2] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only Jan Viktorin
  2016-03-19 11:05 ` Jan Viktorin
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 0/4] " Jan Viktorin
@ 2016-03-19 19:58 ` Jan Viktorin
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 2/4] arm: detect NEON cpu feature by checking __ARM_NEON Jan Viktorin
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 19+ messages in thread
From: Jan Viktorin @ 2016-03-19 19:58 UTC (permalink / raw)
  To: dev
  Cc: Jan Viktorin, thomas.monjalon, jerin.jacob, tomaszx.kulasek, jianbo.liu

ARMv7 machines have usually the NEON available. Customization of the -mfpu=neon
must be done by hand or by defining another machine rte.vars.mk. So, the
CONFIG_RTE_ARCH_ARM_NEON is useless (and confusing).

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
 config/defconfig_arm-armv7a-linuxapp-gcc   | 1 -
 config/defconfig_arm64-armv8a-linuxapp-gcc | 1 -
 mk/machine/armv7a/rte.vars.mk              | 2 --
 3 files changed, 4 deletions(-)

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index b007ca7..96c3343 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -36,7 +36,6 @@ CONFIG_RTE_ARCH="arm"
 CONFIG_RTE_ARCH_ARM=y
 CONFIG_RTE_ARCH_ARMv7=y
 CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
-CONFIG_RTE_ARCH_ARM_NEON=y
 
 CONFIG_RTE_FORCE_INTRINSICS=y
 CONFIG_RTE_ARCH_STRICT_ALIGN=y
diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc
index b0b17cf..9abeca4 100644
--- a/config/defconfig_arm64-armv8a-linuxapp-gcc
+++ b/config/defconfig_arm64-armv8a-linuxapp-gcc
@@ -36,7 +36,6 @@ CONFIG_RTE_MACHINE="armv8a"
 CONFIG_RTE_ARCH="arm64"
 CONFIG_RTE_ARCH_ARM64=y
 CONFIG_RTE_ARCH_64=y
-CONFIG_RTE_ARCH_ARM_NEON=y
 
 CONFIG_RTE_FORCE_INTRINSICS=y
 
diff --git a/mk/machine/armv7a/rte.vars.mk b/mk/machine/armv7a/rte.vars.mk
index 48d3979..abdb15e 100644
--- a/mk/machine/armv7a/rte.vars.mk
+++ b/mk/machine/armv7a/rte.vars.mk
@@ -62,6 +62,4 @@ ifdef CONFIG_RTE_ARCH_ARM_TUNE
 MACHINE_CFLAGS += -mtune=$(CONFIG_RTE_ARCH_ARM_TUNE)
 endif
 
-ifeq ($(CONFIG_RTE_ARCH_ARM_NEON),y)
 MACHINE_CFLAGS += -mfpu=neon
-endif
-- 
2.7.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH v3 2/4] arm: detect NEON cpu feature by checking __ARM_NEON
  2016-03-19  9:26 [dpdk-dev] [PATCH v2] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only Jan Viktorin
                   ` (2 preceding siblings ...)
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 1/4] arm: remove CONFIG_RTE_ARCH_ARM_NEON Jan Viktorin
@ 2016-03-19 19:58 ` Jan Viktorin
  2016-03-20 17:27   ` Jerin Jacob
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 3/4] arm: detect NEON by checking RTE_MACHINE_CPUFLAG_NEON Jan Viktorin
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY Jan Viktorin
  5 siblings, 1 reply; 19+ messages in thread
From: Jan Viktorin @ 2016-03-19 19:58 UTC (permalink / raw)
  To: dev
  Cc: Jan Viktorin, thomas.monjalon, jerin.jacob, tomaszx.kulasek, jianbo.liu

The __ARM_NEON declares that the arm_neon.h is available which is not true for
the __ARM_NEON_FP. The __ARM_NEON_FP is not provided by aarch64 gcc.

 $ arm-linux-gnueabi-gcc -dM -E - < /dev/null  | grep "_FP\|_NEON"
 #define __ARM_FP 12
 #define __ARM_NEON_FP 4
 #define __VFP_FP__ 1

 $ arm-linux-gnueabi-gcc -mfpu=neon -dM -E - < /dev/null  | grep "_FP\|_NEON"
 #define __ARM_FP 12
 #define __ARM_NEON_FP 4
 #define __ARM_NEON__ 1
 #define __VFP_FP__ 1
 #define __ARM_NEON 1

 $ aarch64-linux-gnu-gcc -dM -E - < /dev/null | grep "NEON\|FP"
 #define __FP_FAST_FMAF 1
 #define __ARM_NEON 1
 #define __FP_FAST_FMA 1

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
 mk/rte.cpuflags.mk | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index 19a3e7e..529bcef 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -111,7 +111,7 @@ CPUFLAGS += VSX
 endif
 
 # ARM flags
-ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_NEON_FP),)
+ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_NEON),)
 CPUFLAGS += NEON
 endif
 
-- 
2.7.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH v3 3/4] arm: detect NEON by checking RTE_MACHINE_CPUFLAG_NEON
  2016-03-19  9:26 [dpdk-dev] [PATCH v2] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only Jan Viktorin
                   ` (3 preceding siblings ...)
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 2/4] arm: detect NEON cpu feature by checking __ARM_NEON Jan Viktorin
@ 2016-03-19 19:58 ` Jan Viktorin
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY Jan Viktorin
  5 siblings, 0 replies; 19+ messages in thread
From: Jan Viktorin @ 2016-03-19 19:58 UTC (permalink / raw)
  To: dev
  Cc: Jan Viktorin, thomas.monjalon, jerin.jacob, tomaszx.kulasek, jianbo.liu

User applications and DPDK libraries should detect the NEON by the
RTE_MACHINE_CPUFLAG_NEON. It guarantees that the arm_neon.h is present.

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
 examples/l3fwd/l3fwd_em.c                              | 2 +-
 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index 0adf8f4..4983eed 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -250,7 +250,7 @@ em_mask_key(void *key, xmm_t mask)
 
 	return _mm_and_si128(data, mask);
 }
-#elif defined(__ARM_NEON)
+#elif defined(RTE_MACHINE_CPUFLAG_NEON)
 static inline xmm_t
 em_mask_key(void *key, xmm_t mask)
 {
diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
index df47c0d..ad8bc65 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
@@ -42,7 +42,7 @@ extern "C" {
 
 #include "generic/rte_memcpy.h"
 
-#ifdef __ARM_NEON_FP
+#ifdef RTE_MACHINE_CPUFLAG_NEON
 
 /* ARM NEON Intrinsics are used to copy data */
 #include <arm_neon.h>
@@ -325,7 +325,7 @@ rte_memcpy_func(void *dst, const void *src, size_t n)
 	return memcpy(dst, src, n);
 }
 
-#endif /* __ARM_NEON_FP */
+#endif /* RTE_MACHINE_CPUFLAG_NEON */
 
 #ifdef __cplusplus
 }
-- 
2.7.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-19  9:26 [dpdk-dev] [PATCH v2] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only Jan Viktorin
                   ` (4 preceding siblings ...)
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 3/4] arm: detect NEON by checking RTE_MACHINE_CPUFLAG_NEON Jan Viktorin
@ 2016-03-19 19:58 ` Jan Viktorin
  2016-03-19 20:14   ` Thomas Monjalon
  2016-03-21  5:42   ` Jianbo Liu
  5 siblings, 2 replies; 19+ messages in thread
From: Jan Viktorin @ 2016-03-19 19:58 UTC (permalink / raw)
  To: dev
  Cc: Jan Viktorin, thomas.monjalon, jerin.jacob, tomaszx.kulasek, jianbo.liu

The flag is used to enable memcpy optimizations in EAL. As it is not always
the performance benefit, the flag allows to disable it.

Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
---
 config/defconfig_arm-armv7a-linuxapp-gcc               | 1 +
 lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
index 96c3343..2c60c2c 100644
--- a/config/defconfig_arm-armv7a-linuxapp-gcc
+++ b/config/defconfig_arm-armv7a-linuxapp-gcc
@@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
 CONFIG_RTE_ARCH_ARM=y
 CONFIG_RTE_ARCH_ARMv7=y
 CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
+CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
 
 CONFIG_RTE_FORCE_INTRINSICS=y
 CONFIG_RTE_ARCH_STRICT_ALIGN=y
diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
index ad8bc65..988125b 100644
--- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
+++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h
@@ -42,7 +42,11 @@ extern "C" {
 
 #include "generic/rte_memcpy.h"
 
-#ifdef RTE_MACHINE_CPUFLAG_NEON
+#ifdef RTE_ARCH_ARM_NEON_MEMCPY
+
+#ifndef RTE_MACHINE_CPUFLAG_NEON
+#error "Cannot optimize memcpy by NEON as the CPU seems to not support this"
+#endif
 
 /* ARM NEON Intrinsics are used to copy data */
 #include <arm_neon.h>
@@ -325,7 +329,7 @@ rte_memcpy_func(void *dst, const void *src, size_t n)
 	return memcpy(dst, src, n);
 }
 
-#endif /* RTE_MACHINE_CPUFLAG_NEON */
+#endif /* RTE_ARCH_ARM_NEON_MEMCPY */
 
 #ifdef __cplusplus
 }
-- 
2.7.0

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY Jan Viktorin
@ 2016-03-19 20:14   ` Thomas Monjalon
  2016-03-20  9:41     ` Jan Viktorin
  2016-03-21  5:42   ` Jianbo Liu
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Monjalon @ 2016-03-19 20:14 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev, jerin.jacob, tomaszx.kulasek, jianbo.liu

2016-03-19 20:58, Jan Viktorin:
> The flag is used to enable memcpy optimizations in EAL. As it is not always
> the performance benefit, the flag allows to disable it.

Ideally the default should be to choose the best optimization.
If it is not possible, it would help to have some comments explaining
how to choose wether enabling NEON memcpy or not.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-19 20:14   ` Thomas Monjalon
@ 2016-03-20  9:41     ` Jan Viktorin
  2016-03-20  9:46       ` Jan Viktorin
  2016-03-20 10:29       ` Thomas Monjalon
  0 siblings, 2 replies; 19+ messages in thread
From: Jan Viktorin @ 2016-03-20  9:41 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, jerin.jacob, tomaszx.kulasek, jianbo.liu

On Sat, 19 Mar 2016 21:14:57 +0100
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2016-03-19 20:58, Jan Viktorin:
> > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > the performance benefit, the flag allows to disable it.  
> 
> Ideally the default should be to choose the best optimization.
> If it is not possible, it would help to have some comments explaining
> how to choose wether enabling NEON memcpy or not.

Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
delete it from the defconfig and change the test in rte_memcpy_32.h to

#ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY

Alternatively, to have a positive test like

#ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY

I can create a bigger change that moves the non-neon-memcpy up in the
file...

Should I resend the whole series as v3?

Regards
Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-20  9:41     ` Jan Viktorin
@ 2016-03-20  9:46       ` Jan Viktorin
  2016-03-20 10:33         ` Thomas Monjalon
  2016-03-20 10:29       ` Thomas Monjalon
  1 sibling, 1 reply; 19+ messages in thread
From: Jan Viktorin @ 2016-03-20  9:46 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: dev, jerin.jacob, tomaszx.kulasek, jianbo.liu

On Sun, 20 Mar 2016 10:41:10 +0100
Jan Viktorin <viktorin@rehivetech.com> wrote:

> On Sat, 19 Mar 2016 21:14:57 +0100
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
> > 2016-03-19 20:58, Jan Viktorin:  
> > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > the performance benefit, the flag allows to disable it.    
> > 
> > Ideally the default should be to choose the best optimization.
> > If it is not possible, it would help to have some comments explaining
> > how to choose wether enabling NEON memcpy or not.

The related statistics are mentioned here:

commit 04a2fde35daf5e9a271e72331a70b48b951d7568
Author: Vlastimil Kosar <kosar@rehivetech.com>
Date:   Tue Nov 3 00:47:20 2015 +0100

    eal/arm: add vector memcpy for ARMv7

It's quite difficult to easily summarize it, especially for so many
CPUs...
 
> 
> Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
> delete it from the defconfig and change the test in rte_memcpy_32.h to
> 
> #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> Alternatively, to have a positive test like
> 
> #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> I can create a bigger change that moves the non-neon-memcpy up in the
> file...
> 
> Should I resend the whole series as v3?
> 
> Regards
> Jan


-- 
  Jan Viktorin                E-mail: Viktorin@RehiveTech.com
  System Architect            Web:    www.RehiveTech.com
  RehiveTech
  Brno, Czech Republic

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-20  9:41     ` Jan Viktorin
  2016-03-20  9:46       ` Jan Viktorin
@ 2016-03-20 10:29       ` Thomas Monjalon
  2016-03-20 17:38         ` Jerin Jacob
  1 sibling, 1 reply; 19+ messages in thread
From: Thomas Monjalon @ 2016-03-20 10:29 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev, jerin.jacob, tomaszx.kulasek, jianbo.liu

2016-03-20 10:41, Jan Viktorin:
> On Sat, 19 Mar 2016 21:14:57 +0100
> Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> 
> > 2016-03-19 20:58, Jan Viktorin:
> > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > the performance benefit, the flag allows to disable it.  
> > 
> > Ideally the default should be to choose the best optimization.
> > If it is not possible, it would help to have some comments explaining
> > how to choose wether enabling NEON memcpy or not.
> 
> Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
> delete it from the defconfig and change the test in rte_memcpy_32.h to
> 
> #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> Alternatively, to have a positive test like
> 
> #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> 
> I can create a bigger change that moves the non-neon-memcpy up in the
> file...
> 
> Should I resend the whole series as v3?

No, I don't think changing the name of the config or moving code
will change anything.
We just need to understand when it must be enabled or disabled.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-20  9:46       ` Jan Viktorin
@ 2016-03-20 10:33         ` Thomas Monjalon
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Monjalon @ 2016-03-20 10:33 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev, jerin.jacob, tomaszx.kulasek, jianbo.liu

2016-03-20 10:46, Jan Viktorin:
> On Sun, 20 Mar 2016 10:41:10 +0100
> Jan Viktorin <viktorin@rehivetech.com> wrote:
> 
> > On Sat, 19 Mar 2016 21:14:57 +0100
> > Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> > 
> > > 2016-03-19 20:58, Jan Viktorin:  
> > > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > > the performance benefit, the flag allows to disable it.    
> > > 
> > > Ideally the default should be to choose the best optimization.
> > > If it is not possible, it would help to have some comments explaining
> > > how to choose wether enabling NEON memcpy or not.
> 
> The related statistics are mentioned here:
> 
> commit 04a2fde35daf5e9a271e72331a70b48b951d7568
> Author: Vlastimil Kosar <kosar@rehivetech.com>
> Date:   Tue Nov 3 00:47:20 2015 +0100
> 
>     eal/arm: add vector memcpy for ARMv7
> 
> It's quite difficult to easily summarize it, especially for so many
> CPUs...

If it is difficult for you, it will be  impossible for the users of
this config option.
When someone will ask what is the best value for his CPU, what will
you answer?
At least, we can add a comment explaining that the performance is not
always better, depending of the buffer size and the CPU.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 2/4] arm: detect NEON cpu feature by checking __ARM_NEON
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 2/4] arm: detect NEON cpu feature by checking __ARM_NEON Jan Viktorin
@ 2016-03-20 17:27   ` Jerin Jacob
  0 siblings, 0 replies; 19+ messages in thread
From: Jerin Jacob @ 2016-03-20 17:27 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev, thomas.monjalon, tomaszx.kulasek, jianbo.liu

On Sat, Mar 19, 2016 at 08:58:03PM +0100, Jan Viktorin wrote:
> The __ARM_NEON declares that the arm_neon.h is available which is not true for
> the __ARM_NEON_FP. The __ARM_NEON_FP is not provided by aarch64 gcc.

It depends on specific aarch64 compiler builds. Some aarch64 gcc versions
do provide __ARM_NEON_FP.

[~] $ aarch64-thunderx-linux-gnu-gcc -dM -E - < /dev/null |grep "NEON\|FP"
#define __ARM_FP 12
#define __ARM_NEON_FP 12
#define __FP_FAST_FMAF 1
#define __ARM_NEON 1
#define __FP_FAST_FMA 1

However, This patch is correct, we should use __ARM_NEON.

For this series,
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>


> 
>  $ arm-linux-gnueabi-gcc -dM -E - < /dev/null  | grep "_FP\|_NEON"
>  #define __ARM_FP 12
>  #define __ARM_NEON_FP 4
>  #define __VFP_FP__ 1
> 
>  $ arm-linux-gnueabi-gcc -mfpu=neon -dM -E - < /dev/null  | grep "_FP\|_NEON"
>  #define __ARM_FP 12
>  #define __ARM_NEON_FP 4
>  #define __ARM_NEON__ 1
>  #define __VFP_FP__ 1
>  #define __ARM_NEON 1
> 
>  $ aarch64-linux-gnu-gcc -dM -E - < /dev/null | grep "NEON\|FP"
>  #define __FP_FAST_FMAF 1
>  #define __ARM_NEON 1
>  #define __FP_FAST_FMA 1
> 
> Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> ---
>  mk/rte.cpuflags.mk | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
> index 19a3e7e..529bcef 100644
> --- a/mk/rte.cpuflags.mk
> +++ b/mk/rte.cpuflags.mk
> @@ -111,7 +111,7 @@ CPUFLAGS += VSX
>  endif
>  
>  # ARM flags
> -ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_NEON_FP),)
> +ifneq ($(filter $(AUTO_CPUFLAGS),__ARM_NEON),)
>  CPUFLAGS += NEON
>  endif
>  
> -- 
> 2.7.0
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-20 10:29       ` Thomas Monjalon
@ 2016-03-20 17:38         ` Jerin Jacob
  0 siblings, 0 replies; 19+ messages in thread
From: Jerin Jacob @ 2016-03-20 17:38 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Jan Viktorin, dev, tomaszx.kulasek, jianbo.liu

On Sun, Mar 20, 2016 at 11:29:48AM +0100, Thomas Monjalon wrote:
> 2016-03-20 10:41, Jan Viktorin:
> > On Sat, 19 Mar 2016 21:14:57 +0100
> > Thomas Monjalon <thomas.monjalon@6wind.com> wrote:
> > 
> > > 2016-03-19 20:58, Jan Viktorin:
> > > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > > the performance benefit, the flag allows to disable it.  
> > > 
> > > Ideally the default should be to choose the best optimization.
> > > If it is not possible, it would help to have some comments explaining
> > > how to choose wether enabling NEON memcpy or not.
> > 
> > Ok, we can rename the option to CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY,
> > delete it from the defconfig and change the test in rte_memcpy_32.h to
> > 
> > #ifndef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> > 
> > Alternatively, to have a positive test like
> > 
> > #ifdef CONFIG_RTE_ARCH_ARM_AVOID_NEON_MEMCPY
> > 
> > I can create a bigger change that moves the non-neon-memcpy up in the
> > file...
> > 
> > Should I resend the whole series as v3?
> 
> No, I don't think changing the name of the config or moving code
> will change anything.
> We just need to understand when it must be enabled or disabled.

By default, NEON implementation should be enabled in default config file,
if a given arm target/cpu has issue with NEON specific implementation
at target/cpu config level it can be disabled. IMO, Its inline with Jan's Patch.

The factors like NEON instruction execution cycles and pipelines
supported etc highly depend on the ARM target vendor implementation.
(ie arm specification does not mandate those fine-grained details)
so let target/cpu configuration decides any expectation is required or
not.

Jerin

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY Jan Viktorin
  2016-03-19 20:14   ` Thomas Monjalon
@ 2016-03-21  5:42   ` Jianbo Liu
  2016-03-21 12:21     ` Jan Viktorin
  1 sibling, 1 reply; 19+ messages in thread
From: Jianbo Liu @ 2016-03-21  5:42 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: dev, Thomas Monjalon, Jerin Jacob, tomaszx.kulasek

On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote:
> The flag is used to enable memcpy optimizations in EAL. As it is not always
> the performance benefit, the flag allows to disable it.
>
> Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> ---
>  config/defconfig_arm-armv7a-linuxapp-gcc               | 1 +
>  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
>  2 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> index 96c3343..2c60c2c 100644
> --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
>  CONFIG_RTE_ARCH_ARM=y
>  CONFIG_RTE_ARCH_ARMv7=y
>  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
>
If it's not always benefit, why not disable here since it is common
armv7a config, and enable in your or other user's own config file?

Thanks!
Jianbo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-21  5:42   ` Jianbo Liu
@ 2016-03-21 12:21     ` Jan Viktorin
  2016-03-21 13:24       ` Thomas Monjalon
  0 siblings, 1 reply; 19+ messages in thread
From: Jan Viktorin @ 2016-03-21 12:21 UTC (permalink / raw)
  To: Jianbo Liu; +Cc: dev, Thomas Monjalon, Jerin Jacob, tomaszx.kulasek

On Mon, 21 Mar 2016 13:42:31 +0800
Jianbo Liu <jianbo.liu@linaro.org> wrote:

> On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote:
> > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > the performance benefit, the flag allows to disable it.
> >
> > Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> > ---
> >  config/defconfig_arm-armv7a-linuxapp-gcc               | 1 +
> >  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
> >  2 files changed, 7 insertions(+), 2 deletions(-)
> >
> > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> > index 96c3343..2c60c2c 100644
> > --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
> >  CONFIG_RTE_ARCH_ARM=y
> >  CONFIG_RTE_ARCH_ARMv7=y
> >  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
> >  
> If it's not always benefit, why not disable here since it is common
> armv7a config, and enable in your or other user's own config file?

Jianbo, you are right. In that case, I'd just turn it off by default.
And when there is a new platform-specific defconfig, it can enable it.

Anyway, I am thinking of adding some comment into the rte_memcpy_32.h
file describing the potential of the NEON code. What about:

/* Enable in your defconfig to accelerate memcpy operations. Consider
   enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might
   accelerate short data copies (< 64 B). */

Thomas, do you consider this enough?

Jan

> 
> Thanks!
> Jianbo



-- 
   Jan Viktorin                  E-mail: Viktorin@RehiveTech.com
   System Architect              Web:    www.RehiveTech.com
   RehiveTech
   Brno, Czech Republic

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-21 12:21     ` Jan Viktorin
@ 2016-03-21 13:24       ` Thomas Monjalon
  2016-03-21 14:01         ` Jan Viktorin
  0 siblings, 1 reply; 19+ messages in thread
From: Thomas Monjalon @ 2016-03-21 13:24 UTC (permalink / raw)
  To: Jan Viktorin; +Cc: Jianbo Liu, dev, Jerin Jacob, tomaszx.kulasek

2016-03-21 13:21, Jan Viktorin:
> On Mon, 21 Mar 2016 13:42:31 +0800
> Jianbo Liu <jianbo.liu@linaro.org> wrote:
> 
> > On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote:
> > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > the performance benefit, the flag allows to disable it.
> > >
> > > Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> > > ---
> > >  config/defconfig_arm-armv7a-linuxapp-gcc               | 1 +
> > >  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
> > >  2 files changed, 7 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > index 96c3343..2c60c2c 100644
> > > --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> > > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
> > >  CONFIG_RTE_ARCH_ARM=y
> > >  CONFIG_RTE_ARCH_ARMv7=y
> > >  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> > > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
> > >  
> > If it's not always benefit, why not disable here since it is common
> > armv7a config, and enable in your or other user's own config file?
> 
> Jianbo, you are right. In that case, I'd just turn it off by default.
> And when there is a new platform-specific defconfig, it can enable it.
> 
> Anyway, I am thinking of adding some comment into the rte_memcpy_32.h
> file describing the potential of the NEON code. What about:
> 
> /* Enable in your defconfig to accelerate memcpy operations. Consider
>    enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might
>    accelerate short data copies (< 64 B). */
> 
> Thomas, do you consider this enough?

Yes it is perfect.
Why not put it in defconfig_arm-armv7a-linuxapp-gcc?

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY
  2016-03-21 13:24       ` Thomas Monjalon
@ 2016-03-21 14:01         ` Jan Viktorin
  0 siblings, 0 replies; 19+ messages in thread
From: Jan Viktorin @ 2016-03-21 14:01 UTC (permalink / raw)
  To: Thomas Monjalon; +Cc: Jianbo Liu, dev, Jerin Jacob, tomaszx.kulasek

On Mon, 21 Mar 2016 06:24:37 -0700 (PDT)
Thomas Monjalon <thomas.monjalon@6wind.com> wrote:

> 2016-03-21 13:21, Jan Viktorin:
> > On Mon, 21 Mar 2016 13:42:31 +0800
> > Jianbo Liu <jianbo.liu@linaro.org> wrote:
> >   
> > > On 20 March 2016 at 03:58, Jan Viktorin <viktorin@rehivetech.com> wrote:  
> > > > The flag is used to enable memcpy optimizations in EAL. As it is not always
> > > > the performance benefit, the flag allows to disable it.
> > > >
> > > > Signed-off-by: Jan Viktorin <viktorin@rehivetech.com>
> > > > ---
> > > >  config/defconfig_arm-armv7a-linuxapp-gcc               | 1 +
> > > >  lib/librte_eal/common/include/arch/arm/rte_memcpy_32.h | 8 ++++++--
> > > >  2 files changed, 7 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > > index 96c3343..2c60c2c 100644
> > > > --- a/config/defconfig_arm-armv7a-linuxapp-gcc
> > > > +++ b/config/defconfig_arm-armv7a-linuxapp-gcc
> > > > @@ -36,6 +36,7 @@ CONFIG_RTE_ARCH="arm"
> > > >  CONFIG_RTE_ARCH_ARM=y
> > > >  CONFIG_RTE_ARCH_ARMv7=y
> > > >  CONFIG_RTE_ARCH_ARM_TUNE="cortex-a9"
> > > > +CONFIG_RTE_ARCH_ARM_NEON_MEMCPY=y
> > > >    
> > > If it's not always benefit, why not disable here since it is common
> > > armv7a config, and enable in your or other user's own config file?  
> > 
> > Jianbo, you are right. In that case, I'd just turn it off by default.
> > And when there is a new platform-specific defconfig, it can enable it.
> > 
> > Anyway, I am thinking of adding some comment into the rte_memcpy_32.h
> > file describing the potential of the NEON code. What about:
> > 
> > /* Enable in your defconfig to accelerate memcpy operations. Consider
> >    enabling this for Cortex-A15. For Cortex-A7 and Cortex-A9, It might
> >    accelerate short data copies (< 64 B). */
> > 
> > Thomas, do you consider this enough?  
> 
> Yes it is perfect.
> Why not put it in defconfig_arm-armv7a-linuxapp-gcc?

So, for now, I leave the patch as is and just add the comment.

Jan

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [dpdk-dev] [PATCH v3 0/4] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only
  2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 0/4] " Jan Viktorin
@ 2016-03-24 16:47   ` Thomas Monjalon
  0 siblings, 0 replies; 19+ messages in thread
From: Thomas Monjalon @ 2016-03-24 16:47 UTC (permalink / raw)
  To: Jan Viktorin
  Cc: dev, jerin.jacob, tomaszx.kulasek, jianbo.liu, david.marchand

2016-03-19 20:58, Jan Viktorin:
> Hello,
> 
> finally, I've broken the original patch into 4 pieces as it solves more issues
> and not just a single one.
> 
> * As Thomas have already mentioned, the CONFIG_RTE_ARCH_ARM_NEON is confusing. 
>   So, I've decided to remove it entirely and provide another option for a more
>   specific purpose: CONFIG_RTE_ARCH_ARM_NEON_MEMCPY.
> 
> * The RTE_MACHINE_CPUFLAG_NEON detection is now based on __ARM_NEON as only
>   this compiler definition gives us the arm_neon.h and is compatible with
>   arm64. In DPDK, the RTE_MACHINE_CPUFLAG_NEON should be prefered over the
>   __ARM_NEON. I'd recommend the same for x86 code (__SSE2__)... 
> 
> History:
> v2
> * fix l3fwm_em.c to refer RTE_MACHINE_CPUFLAG_NEON instead of __ARM_NEON
> 
> v3
> * divided into 4 patches as there are more independent problems
> * compiles well for armv7
> * (probably) fixes RTE_MACHINE_CPUFLAG_NEON detection on arm64

Applied with discussed changes.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2016-03-24 16:49 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-19  9:26 [dpdk-dev] [PATCH v2] arm: detect NEON by RTE_MACHINE_CPUFLAG_NEON flag only Jan Viktorin
2016-03-19 11:05 ` Jan Viktorin
2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 0/4] " Jan Viktorin
2016-03-24 16:47   ` Thomas Monjalon
2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 1/4] arm: remove CONFIG_RTE_ARCH_ARM_NEON Jan Viktorin
2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 2/4] arm: detect NEON cpu feature by checking __ARM_NEON Jan Viktorin
2016-03-20 17:27   ` Jerin Jacob
2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 3/4] arm: detect NEON by checking RTE_MACHINE_CPUFLAG_NEON Jan Viktorin
2016-03-19 19:58 ` [dpdk-dev] [PATCH v3 4/4] eal/arm: introduce CONFIG_RTE_ARCH_ARM_NEON_MEMCPY Jan Viktorin
2016-03-19 20:14   ` Thomas Monjalon
2016-03-20  9:41     ` Jan Viktorin
2016-03-20  9:46       ` Jan Viktorin
2016-03-20 10:33         ` Thomas Monjalon
2016-03-20 10:29       ` Thomas Monjalon
2016-03-20 17:38         ` Jerin Jacob
2016-03-21  5:42   ` Jianbo Liu
2016-03-21 12:21     ` Jan Viktorin
2016-03-21 13:24       ` Thomas Monjalon
2016-03-21 14:01         ` Jan Viktorin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).