patches for DPDK stable branches
* [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled
       [not found] ` <20210108082523.1062058-1-ruifeng.wang@arm.com>
@ 2021-01-08  8:25   ` Ruifeng Wang
  2021-01-09  0:06     ` Honnappa Nagarahalli
  2021-01-09  2:15     ` oulijun
  2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 3/5] net/octeontx: " Ruifeng Wang
  2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 4/5] common/octeontx2: " Ruifeng Wang
  2 siblings, 2 replies; 17+ messages in thread
From: Ruifeng Wang @ 2021-01-08  8:25 UTC (permalink / raw)
  To: Wei Hu (Xavier), Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Huisong Li, Chengchang Tang,
	Chengwen Feng
  Cc: dev, vladimir.medvedkin, jerinj, hemant.agrawal,
	honnappa.nagarahalli, nd, Ruifeng Wang, stable

Building with SVE extension enabled stopped with error:

 error: ACLE function ‘svwhilelt_b64_s32’ requires ISA extension ‘sve’
   18 | #define PG64_256BIT  svwhilelt_b64(0, 4)

This is caused by unintentional cflags reset.
Fixed the issue by appending required flag to cflags instead of
overriding it.

Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
Cc: xavier.huwei@huawei.com
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/hns3/meson.build | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/hns3/meson.build b/drivers/net/hns3/meson.build
index 45cee34d9..798086357 100644
--- a/drivers/net/hns3/meson.build
+++ b/drivers/net/hns3/meson.build
@@ -32,7 +32,7 @@ deps += ['hash']
 if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
 	sources += files('hns3_rxtx_vec.c')
 	if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
-		cflags = ['-DCC_SVE_SUPPORT']
+		cflags += ['-DCC_SVE_SUPPORT']
 		sources += files('hns3_rxtx_vec_sve.c')
 	endif
 endif
-- 
2.25.1



* [dpdk-stable] [PATCH v2 3/5] net/octeontx: fix build with sve enabled
       [not found] ` <20210108082523.1062058-1-ruifeng.wang@arm.com>
  2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled Ruifeng Wang
@ 2021-01-08  8:25   ` Ruifeng Wang
  2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 4/5] common/octeontx2: " Ruifeng Wang
  2 siblings, 0 replies; 17+ messages in thread
From: Ruifeng Wang @ 2021-01-08  8:25 UTC (permalink / raw)
  To: Harman Kalra, Jerin Jacob, Santosh Shukla
  Cc: dev, vladimir.medvedkin, jerinj, hemant.agrawal,
	honnappa.nagarahalli, nd, Ruifeng Wang, stable

Building with gcc 10.2 with SVE extension enabled got error:

{standard input}: Assembler messages:
{standard input}:91: Error: selected processor does not support `addvl x4,x8,#-1'
{standard input}:95: Error: selected processor does not support `ptrue p1.d,all'
{standard input}:135: Error: selected processor does not support `whilelo p2.d,xzr,x5'
{standard input}:137: Error: selected processor does not support `decb x1'

This is because inline assembly code explicitly resets cpu model to
not have SVE support. Thus SVE instructions generated by compiler
auto vectorization got rejected by assembler.

Fixed the issue by replacing inline assembly with equivalent atomic
built-ins. Compiler will generate LSE instructions for cpu that has
the extension.

Fixes: f0c7bb1bf778 ("net/octeontx/base: add octeontx IO operations")
Cc: jerinj@marvell.com
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/net/octeontx/base/octeontx_io.h | 16 ++++------------
 1 file changed, 4 insertions(+), 12 deletions(-)

diff --git a/drivers/net/octeontx/base/octeontx_io.h b/drivers/net/octeontx/base/octeontx_io.h
index 04b9ce191..0bf9b100d 100644
--- a/drivers/net/octeontx/base/octeontx_io.h
+++ b/drivers/net/octeontx/base/octeontx_io.h
@@ -58,14 +58,8 @@ do {							\
 static inline uint64_t
 octeontx_reg_ldadd_u64(void *addr, int64_t off)
 {
-	uint64_t old_val;
-
-	__asm__ volatile(
-		" .cpu		generic+lse\n"
-		" ldadd	%1, %0, [%2]\n"
-		: "=r" (old_val) : "r" (off), "r" (addr) : "memory");
-
-	return old_val;
+	return (uint64_t)__atomic_fetch_add((int64_t *)addr, off,
+						__ATOMIC_RELAXED);
 }
 
 /**
@@ -97,10 +91,8 @@ octeontx_reg_lmtst(void *lmtline_va, void *ioreg_va, const uint64_t cmdbuf[],
 		}
 
 		/* LDEOR initiates atomic transfer to I/O device */
-		__asm__ volatile(
-			" .cpu		generic+lse\n"
-			" ldeor	xzr, %0, [%1]\n"
-			: "=r" (result) : "r" (ioreg_va) : "memory");
+		result = __atomic_fetch_xor((uint64_t *)ioreg_va, 0,
+						__ATOMIC_RELAXED);
 	} while (!result);
 }
 
-- 
2.25.1
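
The replacement in this patch relies on two properties of the GCC/Clang
__atomic built-ins: __atomic_fetch_add returns the value held before the
addition, and __atomic_fetch_xor with an operand of 0 returns the current
value without changing it. A minimal sketch in plain C, run against ordinary
memory rather than a device register. The names sketch_ldadd_u64 and
sketch_read_via_xor are hypothetical and are not part of DPDK:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for octeontx_reg_ldadd_u64(): atomically add
 * 'off' to *addr and return the value that was stored before the add. */
static inline uint64_t
sketch_ldadd_u64(void *addr, int64_t off)
{
	assert(addr != NULL);
	return (uint64_t)__atomic_fetch_add((int64_t *)addr, off,
					    __ATOMIC_RELAXED);
}

/* Hypothetical helper: XOR with 0 leaves the stored value unchanged,
 * so the call only returns what is currently stored at addr. */
static inline uint64_t
sketch_read_via_xor(void *addr)
{
	assert(addr != NULL);
	return __atomic_fetch_xor((uint64_t *)addr, 0, __ATOMIC_RELAXED);
}
```

Which instruction the compiler emits for these calls depends on the target
flags it is given, so the sketch says nothing about the generated opcode.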



* [dpdk-stable] [PATCH v2 4/5] common/octeontx2: fix build with sve enabled
       [not found] ` <20210108082523.1062058-1-ruifeng.wang@arm.com>
  2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled Ruifeng Wang
  2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 3/5] net/octeontx: " Ruifeng Wang
@ 2021-01-08  8:25   ` Ruifeng Wang
  2021-01-08 10:29     ` [dpdk-stable] [EXT] " Pavan Nikhilesh Bhagavatula
  2 siblings, 1 reply; 17+ messages in thread
From: Ruifeng Wang @ 2021-01-08  8:25 UTC (permalink / raw)
  To: Jerin Jacob, Nithin Dabilpuram, Pavan Nikhilesh
  Cc: dev, vladimir.medvedkin, hemant.agrawal, honnappa.nagarahalli,
	nd, Ruifeng Wang, stable

Building with gcc 10.2 with SVE extension enabled got error:

{standard input}: Assembler messages:
{standard input}:4002: Error: selected processor does not support `mov z3.b,#0'
{standard input}:4003: Error: selected processor does not support `whilelo p1.b,xzr,x7'
{standard input}:4005: Error: selected processor does not support `ld1b z0.b,p1/z,[x8]'
{standard input}:4006: Error: selected processor does not support `whilelo p4.s,wzr,w7'

This is because inline assembly code explicitly resets cpu model to
not have SVE support. Thus SVE instructions generated by compiler
auto vectorization got rejected by assembler.

Fixed the issue by replacing inline assembly with equivalent atomic
built-ins. Compiler will generate LSE instructions for cpu that has
the extension.

Fixes: 8a4f835971f5 ("common/octeontx2: add IO handling APIs")
Cc: jerinj@marvell.com
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
 drivers/common/octeontx2/otx2_io_arm64.h | 37 +++---------------------
 1 file changed, 4 insertions(+), 33 deletions(-)

diff --git a/drivers/common/octeontx2/otx2_io_arm64.h b/drivers/common/octeontx2/otx2_io_arm64.h
index b5c85d9a6..8843a79b5 100644
--- a/drivers/common/octeontx2/otx2_io_arm64.h
+++ b/drivers/common/octeontx2/otx2_io_arm64.h
@@ -24,55 +24,26 @@
 static __rte_always_inline uint64_t
 otx2_atomic64_add_nosync(int64_t incr, int64_t *ptr)
 {
-	uint64_t result;
-
 	/* Atomic add with no ordering */
-	asm volatile (
-		".cpu  generic+lse\n"
-		"ldadd %x[i], %x[r], [%[b]]"
-		: [r] "=r" (result), "+m" (*ptr)
-		: [i] "r" (incr), [b] "r" (ptr)
-		: "memory");
-	return result;
+	return (uint64_t)__atomic_fetch_add(ptr, incr, __ATOMIC_RELAXED);
 }
 
 static __rte_always_inline uint64_t
 otx2_atomic64_add_sync(int64_t incr, int64_t *ptr)
 {
-	uint64_t result;
-
-	/* Atomic add with ordering */
-	asm volatile (
-		".cpu  generic+lse\n"
-		"ldadda %x[i], %x[r], [%[b]]"
-		: [r] "=r" (result), "+m" (*ptr)
-		: [i] "r" (incr), [b] "r" (ptr)
-		: "memory");
-	return result;
+	return (uint64_t)__atomic_fetch_add(ptr, incr, __ATOMIC_ACQUIRE);
 }
 
 static __rte_always_inline uint64_t
 otx2_lmt_submit(rte_iova_t io_address)
 {
-	uint64_t result;
-
-	asm volatile (
-		".cpu  generic+lse\n"
-		"ldeor xzr,%x[rf],[%[rs]]" :
-		 [rf] "=r"(result): [rs] "r"(io_address));
-	return result;
+	return __atomic_fetch_xor((uint64_t *)io_address, 0, __ATOMIC_RELAXED);
 }
 
 static __rte_always_inline uint64_t
 otx2_lmt_submit_release(rte_iova_t io_address)
 {
-	uint64_t result;
-
-	asm volatile (
-		".cpu  generic+lse\n"
-		"ldeorl xzr,%x[rf],[%[rs]]" :
-		 [rf] "=r"(result) : [rs] "r"(io_address));
-	return result;
+	return __atomic_fetch_xor((uint64_t *)io_address, 0, __ATOMIC_RELEASE);
 }
 
 static __rte_always_inline void
-- 
2.25.1



* Re: [dpdk-stable] [EXT] [PATCH v2 4/5] common/octeontx2: fix build with sve enabled
  2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 4/5] common/octeontx2: " Ruifeng Wang
@ 2021-01-08 10:29     ` Pavan Nikhilesh Bhagavatula
  2021-01-11  9:51       ` Ruifeng Wang
  0 siblings, 1 reply; 17+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2021-01-08 10:29 UTC (permalink / raw)
  To: Ruifeng Wang, Jerin Jacob Kollanukkaran, Nithin Kumar Dabilpuram
  Cc: dev, vladimir.medvedkin, hemant.agrawal, honnappa.nagarahalli,
	nd, stable

Hi Ruifeng,

>Building with gcc 10.2 with SVE extension enabled got error:
>
>{standard input}: Assembler messages:
>{standard input}:4002: Error: selected processor does not support `mov
>z3.b,#0'
>{standard input}:4003: Error: selected processor does not support
>`whilelo p1.b,xzr,x7'
>{standard input}:4005: Error: selected processor does not support `ld1b
>z0.b,p1/z,[x8]'
>{standard input}:4006: Error: selected processor does not support
>`whilelo p4.s,wzr,w7'
>
>This is because inline assembly code explicitly resets cpu model to
>not have SVE support. Thus SVE instructions generated by compiler
>auto vectorization got rejected by assembler.
>
>Fixed the issue by replacing inline assembly with equivalent atomic
>built-ins. Compiler will generate LSE instructions for cpu that has
>the extension.
>
>Fixes: 8a4f835971f5 ("common/octeontx2: add IO handling APIs")
>Cc: jerinj@marvell.com
>Cc: stable@dpdk.org
>
>Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
>---
> drivers/common/octeontx2/otx2_io_arm64.h | 37 +++--------------------
>-
> 1 file changed, 4 insertions(+), 33 deletions(-)
>
>diff --git a/drivers/common/octeontx2/otx2_io_arm64.h
>b/drivers/common/octeontx2/otx2_io_arm64.h
>index b5c85d9a6..8843a79b5 100644
>--- a/drivers/common/octeontx2/otx2_io_arm64.h
>+++ b/drivers/common/octeontx2/otx2_io_arm64.h
>@@ -24,55 +24,26 @@
> static __rte_always_inline uint64_t
> otx2_atomic64_add_nosync(int64_t incr, int64_t *ptr)
> {
>-	uint64_t result;
>-
> 	/* Atomic add with no ordering */
>-	asm volatile (
>-		".cpu  generic+lse\n"
>-		"ldadd %x[i], %x[r], [%[b]]"
>-		: [r] "=r" (result), "+m" (*ptr)
>-		: [i] "r" (incr), [b] "r" (ptr)
>-		: "memory");
>-	return result;
>+	return (uint64_t)__atomic_fetch_add(ptr, incr,
>__ATOMIC_RELAXED);
> }
>

Here LDADD acts as a way to interface to co-processors, i.e. the
LDADD instruction opcode plus a specific I/O address is recognized by
a HW interceptor and dispatched to the specific coprocessor.

Leaving it to the compiler to use the correct instruction is a bad idea.
This breaks the arm64_armv8_linux_gcc build as it doesn't have
+lse enabled.
__atomic_fetch_add will generate a different instruction with SVE
enabled.

Instead, can we add +sve to the first line to prevent the outer loop from
optimizing out the trap?

I tested with gcc 10.2 and the n2 config; the change below works fine.
-" .cpu          generic+lse\n"
+" .cpu		generic+lse+sve\n"

Regards,
Pavan.

> static __rte_always_inline uint64_t
> otx2_atomic64_add_sync(int64_t incr, int64_t *ptr)
> {
>-	uint64_t result;
>-
>-	/* Atomic add with ordering */
>-	asm volatile (
>-		".cpu  generic+lse\n"
>-		"ldadda %x[i], %x[r], [%[b]]"
>-		: [r] "=r" (result), "+m" (*ptr)
>-		: [i] "r" (incr), [b] "r" (ptr)
>-		: "memory");
>-	return result;
>+	return (uint64_t)__atomic_fetch_add(ptr, incr,
>__ATOMIC_ACQUIRE);
> }
>
> static __rte_always_inline uint64_t
> otx2_lmt_submit(rte_iova_t io_address)
> {
>-	uint64_t result;
>-
>-	asm volatile (
>-		".cpu  generic+lse\n"
>-		"ldeor xzr,%x[rf],[%[rs]]" :
>-		 [rf] "=r"(result): [rs] "r"(io_address));
>-	return result;
>+	return __atomic_fetch_xor((uint64_t *)io_address, 0,
>__ATOMIC_RELAXED);
> }
>
> static __rte_always_inline uint64_t
> otx2_lmt_submit_release(rte_iova_t io_address)
> {
>-	uint64_t result;
>-
>-	asm volatile (
>-		".cpu  generic+lse\n"
>-		"ldeorl xzr,%x[rf],[%[rs]]" :
>-		 [rf] "=r"(result) : [rs] "r"(io_address));
>-	return result;
>+	return __atomic_fetch_xor((uint64_t *)io_address, 0,
>__ATOMIC_RELEASE);
> }
>
> static __rte_always_inline void
>--
>2.25.1



* Re: [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled
  2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled Ruifeng Wang
@ 2021-01-09  0:06     ` Honnappa Nagarahalli
  2021-01-09  2:11       ` oulijun
  2021-01-09  2:15     ` oulijun
  1 sibling, 1 reply; 17+ messages in thread
From: Honnappa Nagarahalli @ 2021-01-09  0:06 UTC (permalink / raw)
  To: Ruifeng Wang, Wei Hu (Xavier), Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Huisong Li, Chengchang Tang,
	Chengwen Feng
  Cc: dev, vladimir.medvedkin, jerinj, hemant.agrawal, nd,
	Ruifeng Wang, stable, Honnappa Nagarahalli, nd

<snip>

> 
> Building with SVE extension enabled stopped with error:
> 
>  error: ACLE function ‘svwhilelt_b64_s32’ requires ISA extension ‘sve’
>    18 | #define PG64_256BIT  svwhilelt_b64(0, 4)
> 
> This is caused by unintentional cflags reset.
> Fixed the issue by appending required flag to cflags instead of overriding it.
> 
> Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
> Cc: xavier.huwei@huawei.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>  drivers/net/hns3/meson.build | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/hns3/meson.build b/drivers/net/hns3/meson.build
> index 45cee34d9..798086357 100644
> --- a/drivers/net/hns3/meson.build
> +++ b/drivers/net/hns3/meson.build
> @@ -32,7 +32,7 @@ deps += ['hash']
>  if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
>  	sources += files('hns3_rxtx_vec.c')
>  	if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
> -		cflags = ['-DCC_SVE_SUPPORT']
> +		cflags += ['-DCC_SVE_SUPPORT']
This comment is unrelated to this patch. We need to be consistent with the macro definitions. Is '__ARM_FEATURE_SVE' not enough? If we need to define an additional flag, I would name it something like 'RTE_ARM_FEATURE_SVE'.

>  		sources += files('hns3_rxtx_vec_sve.c')
>  	endif
>  endif
> --
> 2.25.1



* Re: [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled
  2021-01-09  0:06     ` Honnappa Nagarahalli
@ 2021-01-09  2:11       ` oulijun
  2021-01-11  2:39         ` Ruifeng Wang
  0 siblings, 1 reply; 17+ messages in thread
From: oulijun @ 2021-01-09  2:11 UTC (permalink / raw)
  To: Honnappa Nagarahalli, Ruifeng Wang, Wei Hu (Xavier),
	Min Hu (Connor),
	Yisen Zhuang, Huisong Li, Chengchang Tang, Chengwen Feng
  Cc: dev, vladimir.medvedkin, jerinj, hemant.agrawal, nd, stable


On 2021/1/9 8:06, Honnappa Nagarahalli wrote:
> <snip>
> 
>>
>> Building with SVE extension enabled stopped with error:
>>
>>   error: ACLE function ‘svwhilelt_b64_s32’ requires ISA extension ‘sve’
>>     18 | #define PG64_256BIT  svwhilelt_b64(0, 4)
>>
>> This is caused by unintentional cflags reset.
>> Fixed the issue by appending required flag to cflags instead of overriding it.
>>
>> Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
>> Cc: xavier.huwei@huawei.com
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
>> ---
>>   drivers/net/hns3/meson.build | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/hns3/meson.build b/drivers/net/hns3/meson.build
>> index 45cee34d9..798086357 100644
>> --- a/drivers/net/hns3/meson.build
>> +++ b/drivers/net/hns3/meson.build
>> @@ -32,7 +32,7 @@ deps += ['hash']
>>   if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
>>   	sources += files('hns3_rxtx_vec.c')
>>   	if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
>> -		cflags = ['-DCC_SVE_SUPPORT']
>> +		cflags += ['-DCC_SVE_SUPPORT']
> This comment is unrelated to this patch. We need to be consistent with the macro definitions. Is '__ARM_FEATURE_SVE' not enough? If we need to define an additional flag, I would name it something like 'RTE_ARM_FEATURE_SVE'.
> 
I think __ARM_FEATURE_SVE is OK. If the gcc version in use includes SVE
support and the SVE flag is given, it is identified by __ARM_FEATURE_SVE.
The macro is defined in the ARM SVE document.
>>   		sources += files('hns3_rxtx_vec_sve.c')
>>   	endif
>>   endif
>> --
>> 2.25.1
> 


* Re: [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled
  2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled Ruifeng Wang
  2021-01-09  0:06     ` Honnappa Nagarahalli
@ 2021-01-09  2:15     ` oulijun
  2021-01-11  2:27       ` Ruifeng Wang
  1 sibling, 1 reply; 17+ messages in thread
From: oulijun @ 2021-01-09  2:15 UTC (permalink / raw)
  To: Ruifeng Wang, Wei Hu (Xavier), Min Hu (Connor),
	Yisen Zhuang, Huisong Li, Chengchang Tang, Chengwen Feng
  Cc: dev, vladimir.medvedkin, jerinj, hemant.agrawal,
	honnappa.nagarahalli, nd, stable



On 2021/1/8 16:25, Ruifeng Wang wrote:
> Building with SVE extension enabled stopped with error:
> 
>   error: ACLE function ‘svwhilelt_b64_s32’ requires ISA extension ‘sve’
>     18 | #define PG64_256BIT  svwhilelt_b64(0, 4)
> 
> This is caused by unintentional cflags reset.
> Fixed the issue by appending required flag to cflags instead of
> overriding it.
> 
> Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
> Cc: xavier.huwei@huawei.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
>   drivers/net/hns3/meson.build | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/hns3/meson.build b/drivers/net/hns3/meson.build
> index 45cee34d9..798086357 100644
> --- a/drivers/net/hns3/meson.build
> +++ b/drivers/net/hns3/meson.build
> @@ -32,7 +32,7 @@ deps += ['hash']
>   if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
>   	sources += files('hns3_rxtx_vec.c')
>   	if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
> -		cflags = ['-DCC_SVE_SUPPORT']
> +		cflags += ['-DCC_SVE_SUPPORT']
Hi,
   I noticed this patch, but when I checked, the hns3 driver did not use
this function. How did you compile it?

Thanks
Lijun Ou
>   		sources += files('hns3_rxtx_vec_sve.c')
>   	endif
>   endif
> 


* Re: [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled
  2021-01-09  2:15     ` oulijun
@ 2021-01-11  2:27       ` Ruifeng Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Ruifeng Wang @ 2021-01-11  2:27 UTC (permalink / raw)
  To: oulijun, Wei Hu (Xavier), Min Hu (Connor),
	Yisen Zhuang, Huisong Li, Chengchang Tang, Chengwen Feng
  Cc: dev, vladimir.medvedkin, jerinj, hemant.agrawal,
	Honnappa Nagarahalli, nd, stable, nd


> -----Original Message-----
> From: oulijun <oulijun@huawei.com>
> Sent: Saturday, January 9, 2021 10:16 AM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; Wei Hu (Xavier)
> <xavier.huwei@huawei.com>; Min Hu (Connor) <humin29@huawei.com>;
> Yisen Zhuang <yisen.zhuang@huawei.com>; Huisong Li
> <lihuisong@huawei.com>; Chengchang Tang
> <tangchengchang@huawei.com>; Chengwen Feng
> <fengchengwen@huawei.com>
> Cc: dev@dpdk.org; vladimir.medvedkin@intel.com; jerinj@marvell.com;
> hemant.agrawal@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org
> Subject: Re: [PATCH v2 2/5] net/hns3: fix build with sve enabled
> 
> 
> 
> On 2021/1/8 16:25, Ruifeng Wang wrote:
> > Building with SVE extension enabled stopped with error:
> >
> >   error: ACLE function ‘svwhilelt_b64_s32’ requires ISA extension ‘sve’
> >     18 | #define PG64_256BIT  svwhilelt_b64(0, 4)
> >
> > This is caused by unintentional cflags reset.
> > Fixed the issue by appending required flag to cflags instead of
> > overriding it.
> >
> > Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
> > Cc: xavier.huwei@huawei.com
> > Cc: stable@dpdk.org
> >
> > Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > ---
> >   drivers/net/hns3/meson.build | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/hns3/meson.build
> > b/drivers/net/hns3/meson.build index 45cee34d9..798086357 100644
> > --- a/drivers/net/hns3/meson.build
> > +++ b/drivers/net/hns3/meson.build
> > @@ -32,7 +32,7 @@ deps += ['hash']
> >   if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
> >   	sources += files('hns3_rxtx_vec.c')
> >   	if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
> > -		cflags = ['-DCC_SVE_SUPPORT']
> > +		cflags += ['-DCC_SVE_SUPPORT']
> Hi,
>    I noticed this patch, but when I checked, the hns3 driver did not use this
> function. How did you compile it?

Hi,
The hns3 driver has an SVE rx/tx implementation in hns3_rxtx_vec_sve.c. This
path is enabled when compiling with the SVE feature enabled.

I compiled it using gcc-10.2 with the flag '-march=armv8.3-a+sve'.
You can try compiling for n2 with the cross file added in this series (5/5).

Thanks,
Ruifeng
> 
> Thanks
> Lijun Ou
> >   		sources += files('hns3_rxtx_vec_sve.c')
> >   	endif
> >   endif
> >


* Re: [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled
  2021-01-09  2:11       ` oulijun
@ 2021-01-11  2:39         ` Ruifeng Wang
  2021-01-11 13:38           ` Honnappa Nagarahalli
  0 siblings, 1 reply; 17+ messages in thread
From: Ruifeng Wang @ 2021-01-11  2:39 UTC (permalink / raw)
  To: oulijun, Honnappa Nagarahalli, Min Hu (Connor),
	Yisen Zhuang, Huisong Li, Chengchang Tang, Chengwen Feng
  Cc: dev, vladimir.medvedkin, jerinj, hemant.agrawal, nd, stable, nd


> -----Original Message-----
> From: oulijun <oulijun@huawei.com>
> Sent: Saturday, January 9, 2021 10:12 AM
> To: Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; Wei Hu (Xavier) <xavier.huwei@huawei.com>;
> Min Hu (Connor) <humin29@huawei.com>; Yisen Zhuang
> <yisen.zhuang@huawei.com>; Huisong Li <lihuisong@huawei.com>;
> Chengchang Tang <tangchengchang@huawei.com>; Chengwen Feng
> <fengchengwen@huawei.com>
> Cc: dev@dpdk.org; vladimir.medvedkin@intel.com; jerinj@marvell.com;
> hemant.agrawal@nxp.com; nd <nd@arm.com>; stable@dpdk.org
> Subject: Re: [PATCH v2 2/5] net/hns3: fix build with sve enabled
> 
> 
> On 2021/1/9 8:06, Honnappa Nagarahalli wrote:
> > <snip>
> >
> >>
> >> Building with SVE extension enabled stopped with error:
> >>
> >>   error: ACLE function ‘svwhilelt_b64_s32’ requires ISA extension ‘sve’
> >>     18 | #define PG64_256BIT  svwhilelt_b64(0, 4)
> >>
> >> This is caused by unintentional cflags reset.
> >> Fixed the issue by appending required flag to cflags instead of overriding it.
> >>
> >> Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
> >> Cc: xavier.huwei@huawei.com
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >> ---
> >>   drivers/net/hns3/meson.build | 2 +-
> >>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/net/hns3/meson.build
> >> b/drivers/net/hns3/meson.build index 45cee34d9..798086357 100644
> >> --- a/drivers/net/hns3/meson.build
> >> +++ b/drivers/net/hns3/meson.build
> >> @@ -32,7 +32,7 @@ deps += ['hash']
> >>   if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
> >>   	sources += files('hns3_rxtx_vec.c')
> >>   	if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
> >> -		cflags = ['-DCC_SVE_SUPPORT']
> >> +		cflags += ['-DCC_SVE_SUPPORT']
> > This comment is unrelated to this patch. We need to be consistent with the
> macro definitions. Is '__ARM_FEATURE_SVE' not enough? If we need to
> define an additional flag, I would name it something like
> 'RTE_ARM_FEATURE_SVE'.
> >
> I think __ARM_FEATURE_SVE is OK. If the gcc version in use includes SVE
> support and the SVE flag is given, it is identified by __ARM_FEATURE_SVE.
> The macro is defined in the ARM SVE document.

Yes, we can rely on the flags defined by the compiler, so no extra flag is needed.
I can update this in the next version: remove this section from the meson file
and replace CC_SVE_SUPPORT in the code.
> >>   		sources += files('hns3_rxtx_vec_sve.c')
> >>   	endif
> >>   endif
> >> --
> >> 2.25.1
> >


* Re: [dpdk-stable] [EXT] [PATCH v2 4/5] common/octeontx2: fix build with sve enabled
  2021-01-08 10:29     ` [dpdk-stable] [EXT] " Pavan Nikhilesh Bhagavatula
@ 2021-01-11  9:51       ` Ruifeng Wang
  0 siblings, 0 replies; 17+ messages in thread
From: Ruifeng Wang @ 2021-01-11  9:51 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, jerinj, Nithin Kumar Dabilpuram
  Cc: dev, vladimir.medvedkin, hemant.agrawal, Honnappa Nagarahalli,
	nd, stable, nd


> -----Original Message-----
> From: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>
> Sent: Friday, January 8, 2021 6:29 PM
> To: Ruifeng Wang <Ruifeng.Wang@arm.com>; jerinj@marvell.com; Nithin
> Kumar Dabilpuram <ndabilpuram@marvell.com>
> Cc: dev@dpdk.org; vladimir.medvedkin@intel.com;
> hemant.agrawal@nxp.com; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; nd <nd@arm.com>; stable@dpdk.org
> Subject: RE: [EXT] [PATCH v2 4/5] common/octeontx2: fix build with sve
> enabled
> 
> Hi Ruifeng,
> 
> >Building with gcc 10.2 with SVE extension enabled got error:
> >
> >{standard input}: Assembler messages:
> >{standard input}:4002: Error: selected processor does not support `mov
> >z3.b,#0'
> >{standard input}:4003: Error: selected processor does not support
> >`whilelo p1.b,xzr,x7'
> >{standard input}:4005: Error: selected processor does not support `ld1b
> >z0.b,p1/z,[x8]'
> >{standard input}:4006: Error: selected processor does not support
> >`whilelo p4.s,wzr,w7'
> >
> >This is because inline assembly code explicitly resets cpu model to not
> >have SVE support. Thus SVE instructions generated by compiler auto
> >vectorization got rejected by assembler.
> >
> >Fixed the issue by replacing inline assembly with equivalent atomic
> >built-ins. Compiler will generate LSE instructions for cpu that has the
> >extension.
> >
> >Fixes: 8a4f835971f5 ("common/octeontx2: add IO handling APIs")
> >Cc: jerinj@marvell.com
> >Cc: stable@dpdk.org
> >
> >Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> >---
> > drivers/common/octeontx2/otx2_io_arm64.h | 37 +++--------------------
> >-
> > 1 file changed, 4 insertions(+), 33 deletions(-)
> >
> >diff --git a/drivers/common/octeontx2/otx2_io_arm64.h
> >b/drivers/common/octeontx2/otx2_io_arm64.h
> >index b5c85d9a6..8843a79b5 100644
> >--- a/drivers/common/octeontx2/otx2_io_arm64.h
> >+++ b/drivers/common/octeontx2/otx2_io_arm64.h
> >@@ -24,55 +24,26 @@
> > static __rte_always_inline uint64_t
> > otx2_atomic64_add_nosync(int64_t incr, int64_t *ptr)  {
> >-	uint64_t result;
> >-
> > 	/* Atomic add with no ordering */
> >-	asm volatile (
> >-		".cpu  generic+lse\n"
> >-		"ldadd %x[i], %x[r], [%[b]]"
> >-		: [r] "=r" (result), "+m" (*ptr)
> >-		: [i] "r" (incr), [b] "r" (ptr)
> >-		: "memory");
> >-	return result;
> >+	return (uint64_t)__atomic_fetch_add(ptr, incr,
> >__ATOMIC_RELAXED);
> > }
> >
> 
> Here LDADD acts as a way to interface to co-processors i.e.
> LDADD instruction opcode + specific io address are recognized by HW
> interceptor and dispatched to the specific coprocessor.

OK. Now I understand the background.
> 
> Leaving it to the compiler to use the correct instruction is a bad idea.
> This breaks the arm64_armv8_linux_gcc build as it doesn't have the
> +lse enabled.
> __atomic_fetch_add will generate a different instruction with SVE enabled.
> 
> Instead can we add +sve to the first line to prevent outer loop from
> optimizing out the trap?

Since the inline assembly needs to be preserved, we have to tune the enabled extensions.
I will change this in the next version.

Thanks,
Ruifeng
> 
> I tested with gcc 10.2 and the n2 config; the change below works fine.
> -" .cpu          generic+lse\n"
> +" .cpu		generic+lse+sve\n"
> 
> Regards,
> Pavan.
> 
> > static __rte_always_inline uint64_t
> > otx2_atomic64_add_sync(int64_t incr, int64_t *ptr)  {
> >-	uint64_t result;
> >-
> >-	/* Atomic add with ordering */
> >-	asm volatile (
> >-		".cpu  generic+lse\n"
> >-		"ldadda %x[i], %x[r], [%[b]]"
> >-		: [r] "=r" (result), "+m" (*ptr)
> >-		: [i] "r" (incr), [b] "r" (ptr)
> >-		: "memory");
> >-	return result;
> >+	return (uint64_t)__atomic_fetch_add(ptr, incr,
> >__ATOMIC_ACQUIRE);
> > }
> >
> > static __rte_always_inline uint64_t
> > otx2_lmt_submit(rte_iova_t io_address)  {
> >-	uint64_t result;
> >-
> >-	asm volatile (
> >-		".cpu  generic+lse\n"
> >-		"ldeor xzr,%x[rf],[%[rs]]" :
> >-		 [rf] "=r"(result): [rs] "r"(io_address));
> >-	return result;
> >+	return __atomic_fetch_xor((uint64_t *)io_address, 0,
> >__ATOMIC_RELAXED);
> > }
> >
> > static __rte_always_inline uint64_t
> > otx2_lmt_submit_release(rte_iova_t io_address)  {
> >-	uint64_t result;
> >-
> >-	asm volatile (
> >-		".cpu  generic+lse\n"
> >-		"ldeorl xzr,%x[rf],[%[rs]]" :
> >-		 [rf] "=r"(result) : [rs] "r"(io_address));
> >-	return result;
> >+	return __atomic_fetch_xor((uint64_t *)io_address, 0,
> >__ATOMIC_RELEASE);
> > }
> >
> > static __rte_always_inline void
> >--
> >2.25.1



* Re: [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled
  2021-01-11  2:39         ` Ruifeng Wang
@ 2021-01-11 13:38           ` Honnappa Nagarahalli
  0 siblings, 0 replies; 17+ messages in thread
From: Honnappa Nagarahalli @ 2021-01-11 13:38 UTC (permalink / raw)
  To: Ruifeng Wang, oulijun, Min Hu (Connor),
	Yisen Zhuang, Huisong Li, Chengchang Tang, Chengwen Feng
  Cc: dev, vladimir.medvedkin, jerinj, hemant.agrawal, nd, stable,
	Honnappa Nagarahalli, nd

<snip>

> > >
> > >>
> > >> Building with SVE extension enabled stopped with error:
> > >>
> > >>   error: ACLE function ‘svwhilelt_b64_s32’ requires ISA extension ‘sve’
> > >>     18 | #define PG64_256BIT  svwhilelt_b64(0, 4)
> > >>
> > >> This is caused by unintentional cflags reset.
> > >> Fixed the issue by appending required flag to cflags instead of overriding it.
> > >>
> > >> Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
> > >> Cc: xavier.huwei@huawei.com
> > >> Cc: stable@dpdk.org
> > >>
> > >> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> > >> ---
> > >>   drivers/net/hns3/meson.build | 2 +-
> > >>   1 file changed, 1 insertion(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/net/hns3/meson.build
> > >> b/drivers/net/hns3/meson.build index 45cee34d9..798086357 100644
> > >> --- a/drivers/net/hns3/meson.build
> > >> +++ b/drivers/net/hns3/meson.build
> > >> @@ -32,7 +32,7 @@ deps += ['hash']
> > >>   if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
> > >>   	sources += files('hns3_rxtx_vec.c')
> > >>   	if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
> > >> -		cflags = ['-DCC_SVE_SUPPORT']
> > >> +		cflags += ['-DCC_SVE_SUPPORT']
> > > This comment is unrelated to this patch. We need to be consistent
> > > with the
> > macro definitions. Is '__ARM_FEATURE_SVE' not enough? If we need to
> > define an additional flag, I would name it something like
> > 'RTE_ARM_FEATURE_SVE'.
> > >
> > > I think __ARM_FEATURE_SVE is OK. If the gcc version in use includes
> > > SVE support and the SVE flag is given, it is identified by
> > > __ARM_FEATURE_SVE. The macro is defined in the ARM SVE document.
> 
> Yes, we can rely on flags defined by compiler and no extra flag is needed.
> I can update in next version to remove this section from meson file and replace
> CC_SVE_SUPPORT in code.
Sounds good to me.

> > >>   		sources += files('hns3_rxtx_vec_sve.c')
> > >>   	endif
> > >>   endif
> > >> --
> > >> 2.25.1
> > >
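
The outcome agreed in this sub-thread is to test the compiler-defined
__ARM_FEATURE_SVE macro directly instead of passing an extra -D flag from
meson. A minimal sketch of the compile-time half of such a check follows.
The name sketch_built_with_sve is hypothetical, and the run-time CPU flag
query that a driver would still need is deliberately left out to keep the
example self-contained:

```c
#include <stdbool.h>

/* Returns true only when this translation unit was compiled for AArch64
 * with SVE enabled (for example via -march=armv8.3-a+sve), in which case
 * the compiler predefines __ARM_FEATURE_SVE. */
static inline bool
sketch_built_with_sve(void)
{
#if defined(__aarch64__) && defined(__ARM_FEATURE_SVE)
	return true;
#else
	return false;
#endif
}
```

A true result only says the SVE code path was compiled in; it does not say
that the CPU the binary runs on supports SVE.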



* [dpdk-stable] [PATCH v3 2/5] net/hns3: fix build with sve enabled
       [not found] ` <20210112025709.1121523-1-ruifeng.wang@arm.com>
@ 2021-01-12  2:57   ` Ruifeng Wang
  2021-01-13  2:16     ` Honnappa Nagarahalli
  2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 3/5] net/octeontx: " Ruifeng Wang
  2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 4/5] common/octeontx2: " Ruifeng Wang
  2 siblings, 1 reply; 17+ messages in thread
From: Ruifeng Wang @ 2021-01-12  2:57 UTC (permalink / raw)
  To: Wei Hu (Xavier), Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Chengwen Feng, Chengchang Tang,
	Huisong Li
  Cc: dev, vladimir.medvedkin, pbhagavatula, jerinj, hemant.agrawal,
	honnappa.nagarahalli, nd, Ruifeng Wang, stable

Building with the SVE extension enabled stops with the error:

 error: ACLE function ‘svwhilelt_b64_s32’ requires ISA extension ‘sve’
   18 | #define PG64_256BIT  svwhilelt_b64(0, 4)

This is caused by an unintentional reset of cflags in meson.build.
Fixed the issue by leaving cflags untouched and using the macro defined
by the compiler (__ARM_FEATURE_SVE) in place of CC_SVE_SUPPORT.

Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
Cc: stable@dpdk.org

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v3:
Removed extra flag, use compiler flag instead.

 drivers/net/hns3/hns3_rxtx.c | 4 ++--
 drivers/net/hns3/meson.build | 1 -
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/net/hns3/hns3_rxtx.c b/drivers/net/hns3/hns3_rxtx.c
index 88d3baba4..5ac36b314 100644
--- a/drivers/net/hns3/hns3_rxtx.c
+++ b/drivers/net/hns3/hns3_rxtx.c
@@ -10,7 +10,7 @@
 #include <rte_io.h>
 #include <rte_net.h>
 #include <rte_malloc.h>
-#if defined(RTE_ARCH_ARM64) && defined(CC_SVE_SUPPORT)
+#if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_SVE)
 #include <rte_cpuflags.h>
 #endif
 
@@ -2467,7 +2467,7 @@ hns3_rx_burst_mode_get(struct rte_eth_dev *dev, __rte_unused uint16_t queue_id,
 static bool
 hns3_check_sve_support(void)
 {
-#if defined(RTE_ARCH_ARM64) && defined(CC_SVE_SUPPORT)
+#if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_SVE)
 	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
 		return true;
 #endif
diff --git a/drivers/net/hns3/meson.build b/drivers/net/hns3/meson.build
index 45cee34d9..5674d986b 100644
--- a/drivers/net/hns3/meson.build
+++ b/drivers/net/hns3/meson.build
@@ -32,7 +32,6 @@ deps += ['hash']
 if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
 	sources += files('hns3_rxtx_vec.c')
 	if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
-		cflags = ['-DCC_SVE_SUPPORT']
 		sources += files('hns3_rxtx_vec_sve.c')
 	endif
 endif
-- 
2.25.1
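
For readers following the v3 change, the two-level check it relies on can be
sketched outside DPDK as below. This is a minimal illustration under stated
assumptions, not the driver code: SVE_COMPILED_IN, cpu_has_sve_stub() and
check_sve_support() are hypothetical names, __aarch64__ stands in for
RTE_ARCH_ARM64, and the stub replaces the runtime query that the patch makes
with rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE).

```c
#include <stdbool.h>

/* 1 when the compiler targets AArch64 with SVE code generation enabled.
 * Assumption: __aarch64__ stands in for DPDK's RTE_ARCH_ARM64 here. */
#if defined(__aarch64__) && defined(__ARM_FEATURE_SVE)
#define SVE_COMPILED_IN 1
#else
#define SVE_COMPILED_IN 0
#endif

/* Hypothetical stand-in for the runtime CPU flag query. */
static bool
cpu_has_sve_stub(void)
{
	return true;
}

/* The SVE path is usable only if it was compiled in (compile-time gate)
 * and the running CPU reports the feature (run-time gate). */
static bool
check_sve_support(void)
{
#if SVE_COMPILED_IN
	if (cpu_has_sve_stub())
		return true;
#endif
	return false;
}
```

Because the gate uses a macro that the compiler defines itself, no extra -D
flag has to be passed through cflags, which is the point of the v3 change.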



* [dpdk-stable] [PATCH v3 3/5] net/octeontx: fix build with sve enabled
       [not found] ` <20210112025709.1121523-1-ruifeng.wang@arm.com>
  2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 2/5] net/hns3: " Ruifeng Wang
@ 2021-01-12  2:57   ` Ruifeng Wang
  2021-01-12  4:39     ` [dpdk-stable] [dpdk-dev] " Jerin Jacob
  2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 4/5] common/octeontx2: " Ruifeng Wang
  2 siblings, 1 reply; 17+ messages in thread
From: Ruifeng Wang @ 2021-01-12  2:57 UTC (permalink / raw)
  To: Harman Kalra, Santosh Shukla, Jerin Jacob
  Cc: dev, vladimir.medvedkin, pbhagavatula, jerinj, hemant.agrawal,
	honnappa.nagarahalli, nd, Ruifeng Wang, stable

Building with gcc 10.2 with the SVE extension enabled fails with:

{standard input}: Assembler messages:
{standard input}:91: Error: selected processor does not support `addvl x4,x8,#-1'
{standard input}:95: Error: selected processor does not support `ptrue p1.d,all'
{standard input}:135: Error: selected processor does not support `whilelo p2.d,xzr,x5'
{standard input}:137: Error: selected processor does not support `decb x1'

This is because the inline assembly code explicitly resets the cpu model
to one without SVE support. As a result, the SVE instructions generated
by compiler auto-vectorization are rejected by the assembler.

Added SVE to the cpu model specified by the inline assembly when SVE is
enabled. The inline assembly is not replaced with C atomics because the
driver relies on specific LSE instructions to interface to the
co-processor [1].

Fixes: f0c7bb1bf778 ("net/octeontx/base: add octeontx IO operations")
Cc: jerinj@marvell.com
Cc: stable@dpdk.org

[1] https://mails.dpdk.org/archives/dev/2021-January/196092.html

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v3:
Keep inline assembly and add sve extension to fix issue. (Pavan)

 drivers/net/octeontx/base/octeontx_io.h | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/net/octeontx/base/octeontx_io.h b/drivers/net/octeontx/base/octeontx_io.h
index 04b9ce191..d0b9cfbc6 100644
--- a/drivers/net/octeontx/base/octeontx_io.h
+++ b/drivers/net/octeontx/base/octeontx_io.h
@@ -52,6 +52,11 @@ do {							\
 #endif
 
 #if defined(RTE_ARCH_ARM64)
+#if defined(__ARM_FEATURE_SVE)
+#define __LSE_PREAMBLE " .cpu	generic+lse+sve\n"
+#else
+#define __LSE_PREAMBLE " .cpu	generic+lse\n"
+#endif
 /**
  * Perform an atomic fetch-and-add operation.
  */
@@ -61,7 +66,7 @@ octeontx_reg_ldadd_u64(void *addr, int64_t off)
 	uint64_t old_val;
 
 	__asm__ volatile(
-		" .cpu		generic+lse\n"
+		__LSE_PREAMBLE
 		" ldadd	%1, %0, [%2]\n"
 		: "=r" (old_val) : "r" (off), "r" (addr) : "memory");
 
@@ -98,12 +103,13 @@ octeontx_reg_lmtst(void *lmtline_va, void *ioreg_va, const uint64_t cmdbuf[],
 
 		/* LDEOR initiates atomic transfer to I/O device */
 		__asm__ volatile(
-			" .cpu		generic+lse\n"
+			__LSE_PREAMBLE
 			" ldeor	xzr, %0, [%1]\n"
 			: "=r" (result) : "r" (ioreg_va) : "memory");
 	} while (!result);
 }
 
+#undef __LSE_PREAMBLE
 #else
 
 static inline uint64_t
-- 
2.25.1
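
The preamble macro in this patch works because adjacent string literals are
concatenated by the compiler into one assembler template. A minimal sketch of
that mechanism follows; it only builds the template string and does not
execute any LSE instruction. LSE_PREAMBLE, ldadd_template and
template_enables_sve() are hypothetical names chosen for the illustration
(the patch itself uses __LSE_PREAMBLE).

```c
#include <string.h>

/* Select the cpu model line the assembler will see. The SVE variant is
 * chosen only when the compiler itself was told to generate SVE code. */
#if defined(__ARM_FEATURE_SVE)
#define LSE_PREAMBLE " .cpu  generic+lse+sve\n"
#else
#define LSE_PREAMBLE " .cpu  generic+lse\n"
#endif

/* Adjacent string literals are joined at compile time, so the preamble
 * and the instruction form a single template, as in the patched asm. */
static const char ldadd_template[] =
	LSE_PREAMBLE
	" ldadd	%1, %0, [%2]\n";

/* Returns 1 when the template selects a cpu model that includes SVE. */
static int
template_enables_sve(void)
{
	return strstr(ldadd_template, "+sve") != NULL;
}
```

Keeping the model string in one macro means each asm statement picks up the
same cpu model, and the #undef at the end of the header keeps the helper from
leaking into files that include it.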



* [dpdk-stable] [PATCH v3 4/5] common/octeontx2: fix build with sve enabled
       [not found] ` <20210112025709.1121523-1-ruifeng.wang@arm.com>
  2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 2/5] net/hns3: " Ruifeng Wang
  2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 3/5] net/octeontx: " Ruifeng Wang
@ 2021-01-12  2:57   ` Ruifeng Wang
  2021-01-12  4:38     ` [dpdk-stable] [dpdk-dev] " Jerin Jacob
  2 siblings, 1 reply; 17+ messages in thread
From: Ruifeng Wang @ 2021-01-12  2:57 UTC (permalink / raw)
  To: Jerin Jacob, Nithin Dabilpuram, Pavan Nikhilesh
  Cc: dev, vladimir.medvedkin, hemant.agrawal, honnappa.nagarahalli,
	nd, Ruifeng Wang, stable

Building with gcc 10.2 with the SVE extension enabled fails with:

{standard input}: Assembler messages:
{standard input}:4002: Error: selected processor does not support `mov z3.b,#0'
{standard input}:4003: Error: selected processor does not support `whilelo p1.b,xzr,x7'
{standard input}:4005: Error: selected processor does not support `ld1b z0.b,p1/z,[x8]'
{standard input}:4006: Error: selected processor does not support `whilelo p4.s,wzr,w7'

This is because the inline assembly code explicitly resets the cpu model
to one without SVE support. As a result, the SVE instructions generated
by compiler auto-vectorization are rejected by the assembler.

Added SVE to the cpu model specified by the inline assembly when SVE is
enabled. The inline assembly is not replaced with C atomics because the
driver relies on specific LSE instructions to interface to the
co-processor [1].

Fixes: 8a4f835971f5 ("common/octeontx2: add IO handling APIs")
Cc: jerinj@marvell.com
Cc: stable@dpdk.org

[1] https://mails.dpdk.org/archives/dev/2021-January/196092.html

Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
---
v3:
Keep inline assembly and add sve extension to fix issue. (Pavan)

 drivers/common/octeontx2/otx2_io_arm64.h | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/common/octeontx2/otx2_io_arm64.h b/drivers/common/octeontx2/otx2_io_arm64.h
index b5c85d9a6..34268e3af 100644
--- a/drivers/common/octeontx2/otx2_io_arm64.h
+++ b/drivers/common/octeontx2/otx2_io_arm64.h
@@ -21,6 +21,12 @@
 #define otx2_prefetch_store_keep(ptr) ({\
 	asm volatile("prfm pstl1keep, [%x0]\n" : : "r" (ptr)); })
 
+#if defined(__ARM_FEATURE_SVE)
+#define __LSE_PREAMBLE " .cpu  generic+lse+sve\n"
+#else
+#define __LSE_PREAMBLE " .cpu  generic+lse\n"
+#endif
+
 static __rte_always_inline uint64_t
 otx2_atomic64_add_nosync(int64_t incr, int64_t *ptr)
 {
@@ -28,7 +34,7 @@ otx2_atomic64_add_nosync(int64_t incr, int64_t *ptr)
 
 	/* Atomic add with no ordering */
 	asm volatile (
-		".cpu  generic+lse\n"
+		__LSE_PREAMBLE
 		"ldadd %x[i], %x[r], [%[b]]"
 		: [r] "=r" (result), "+m" (*ptr)
 		: [i] "r" (incr), [b] "r" (ptr)
@@ -43,7 +49,7 @@ otx2_atomic64_add_sync(int64_t incr, int64_t *ptr)
 
 	/* Atomic add with ordering */
 	asm volatile (
-		".cpu  generic+lse\n"
+		__LSE_PREAMBLE
 		"ldadda %x[i], %x[r], [%[b]]"
 		: [r] "=r" (result), "+m" (*ptr)
 		: [i] "r" (incr), [b] "r" (ptr)
@@ -57,7 +63,7 @@ otx2_lmt_submit(rte_iova_t io_address)
 	uint64_t result;
 
 	asm volatile (
-		".cpu  generic+lse\n"
+		__LSE_PREAMBLE
 		"ldeor xzr,%x[rf],[%[rs]]" :
 		 [rf] "=r"(result): [rs] "r"(io_address));
 	return result;
@@ -69,7 +75,7 @@ otx2_lmt_submit_release(rte_iova_t io_address)
 	uint64_t result;
 
 	asm volatile (
-		".cpu  generic+lse\n"
+		__LSE_PREAMBLE
 		"ldeorl xzr,%x[rf],[%[rs]]" :
 		 [rf] "=r"(result) : [rs] "r"(io_address));
 	return result;
@@ -104,4 +110,5 @@ otx2_lmt_mov_seg(void *out, const void *in, const uint16_t segdw)
 		dst128[i] = src128[i];
 }
 
+#undef __LSE_PREAMBLE
 #endif /* _OTX2_IO_ARM64_H_ */
-- 
2.25.1
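
To illustrate the difference between the "no ordering" and "with ordering"
adds touched by this patch (ldadd and ldadda, the latter carrying acquire
semantics), here is a rough portable analogue using C11 atomics. It is only a
sketch for reading the semantics: the patch deliberately keeps the inline
assembly, and add_nosync() and add_sync() are hypothetical names, not the
otx2 functions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Analogue of an atomic add with no ordering: returns the old value and
 * imposes no ordering on surrounding memory accesses. */
static int64_t
add_nosync(_Atomic int64_t *ptr, int64_t incr)
{
	return atomic_fetch_add_explicit(ptr, incr, memory_order_relaxed);
}

/* Analogue of an atomic add with acquire ordering: later loads and
 * stores cannot be reordered before this operation. */
static int64_t
add_sync(_Atomic int64_t *ptr, int64_t incr)
{
	return atomic_fetch_add_explicit(ptr, incr, memory_order_acquire);
}
```

A compiler is free to pick any instruction sequence for these calls, which is
why they cannot stand in for the driver's inline assembly when the hardware
interface depends on a specific LSE instruction.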



* Re: [dpdk-stable] [dpdk-dev] [PATCH v3 4/5] common/octeontx2: fix build with sve enabled
  2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 4/5] common/octeontx2: " Ruifeng Wang
@ 2021-01-12  4:38     ` Jerin Jacob
  0 siblings, 0 replies; 17+ messages in thread
From: Jerin Jacob @ 2021-01-12  4:38 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: Jerin Jacob, Nithin Dabilpuram, Pavan Nikhilesh, dpdk-dev,
	Vladimir Medvedkin, Hemant Agrawal, Honnappa Nagarahalli, nd,
	dpdk stable

On Tue, Jan 12, 2021 at 8:28 AM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
>
> Building with gcc 10.2 with SVE extension enabled got error:
>
> {standard input}: Assembler messages:
> {standard input}:4002: Error: selected processor does not support `mov z3.b,#0'
> {standard input}:4003: Error: selected processor does not support `whilelo p1.b,xzr,x7'
> {standard input}:4005: Error: selected processor does not support `ld1b z0.b,p1/z,[x8]'
> {standard input}:4006: Error: selected processor does not support `whilelo p4.s,wzr,w7'
>
> This is because inline assembly code explicitly resets cpu model to
> not have SVE support. Thus SVE instructions generated by compiler
> auto vectorization got rejected by assembler.
>
> Added SVE to the cpu model specified by inline assembly for SVE support.
> Not replacing the inline assembly with C atomics because the driver relies
> on specific LSE instruction to interface to co-processor [1].
>
> Fixes: 8a4f835971f5 ("common/octeontx2: add IO handling APIs")
> Cc: jerinj@marvell.com
> Cc: stable@dpdk.org

Reviewed-by: Jerin Jacob <jerinj@marvell.com>



>
> [1] https://mails.dpdk.org/archives/dev/2021-January/196092.html
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>
> ---
> v3:
> Keep inline assembly and add sve extension to fix issue. (Pavan)
>
>  drivers/common/octeontx2/otx2_io_arm64.h | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/common/octeontx2/otx2_io_arm64.h b/drivers/common/octeontx2/otx2_io_arm64.h
> index b5c85d9a6..34268e3af 100644
> --- a/drivers/common/octeontx2/otx2_io_arm64.h
> +++ b/drivers/common/octeontx2/otx2_io_arm64.h
> @@ -21,6 +21,12 @@
>  #define otx2_prefetch_store_keep(ptr) ({\
>         asm volatile("prfm pstl1keep, [%x0]\n" : : "r" (ptr)); })
>
> +#if defined(__ARM_FEATURE_SVE)
> +#define __LSE_PREAMBLE " .cpu  generic+lse+sve\n"
> +#else
> +#define __LSE_PREAMBLE " .cpu  generic+lse\n"
> +#endif
> +
>  static __rte_always_inline uint64_t
>  otx2_atomic64_add_nosync(int64_t incr, int64_t *ptr)
>  {
> @@ -28,7 +34,7 @@ otx2_atomic64_add_nosync(int64_t incr, int64_t *ptr)
>
>         /* Atomic add with no ordering */
>         asm volatile (
> -               ".cpu  generic+lse\n"
> +               __LSE_PREAMBLE
>                 "ldadd %x[i], %x[r], [%[b]]"
>                 : [r] "=r" (result), "+m" (*ptr)
>                 : [i] "r" (incr), [b] "r" (ptr)
> @@ -43,7 +49,7 @@ otx2_atomic64_add_sync(int64_t incr, int64_t *ptr)
>
>         /* Atomic add with ordering */
>         asm volatile (
> -               ".cpu  generic+lse\n"
> +               __LSE_PREAMBLE
>                 "ldadda %x[i], %x[r], [%[b]]"
>                 : [r] "=r" (result), "+m" (*ptr)
>                 : [i] "r" (incr), [b] "r" (ptr)
> @@ -57,7 +63,7 @@ otx2_lmt_submit(rte_iova_t io_address)
>         uint64_t result;
>
>         asm volatile (
> -               ".cpu  generic+lse\n"
> +               __LSE_PREAMBLE
>                 "ldeor xzr,%x[rf],[%[rs]]" :
>                  [rf] "=r"(result): [rs] "r"(io_address));
>         return result;
> @@ -69,7 +75,7 @@ otx2_lmt_submit_release(rte_iova_t io_address)
>         uint64_t result;
>
>         asm volatile (
> -               ".cpu  generic+lse\n"
> +               __LSE_PREAMBLE
>                 "ldeorl xzr,%x[rf],[%[rs]]" :
>                  [rf] "=r"(result) : [rs] "r"(io_address));
>         return result;
> @@ -104,4 +110,5 @@ otx2_lmt_mov_seg(void *out, const void *in, const uint16_t segdw)
>                 dst128[i] = src128[i];
>  }
>
> +#undef __LSE_PREAMBLE
>  #endif /* _OTX2_IO_ARM64_H_ */
> --
> 2.25.1
>


* Re: [dpdk-stable] [dpdk-dev] [PATCH v3 3/5] net/octeontx: fix build with sve enabled
  2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 3/5] net/octeontx: " Ruifeng Wang
@ 2021-01-12  4:39     ` Jerin Jacob
  0 siblings, 0 replies; 17+ messages in thread
From: Jerin Jacob @ 2021-01-12  4:39 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: Harman Kalra, Santosh Shukla, Jerin Jacob, dpdk-dev,
	Vladimir Medvedkin, Pavan Nikhilesh, Jerin Jacob, Hemant Agrawal,
	Honnappa Nagarahalli, nd, dpdk stable

On Tue, Jan 12, 2021 at 8:28 AM Ruifeng Wang <ruifeng.wang@arm.com> wrote:
>
> Building with gcc 10.2 with SVE extension enabled got error:
>
> {standard input}: Assembler messages:
> {standard input}:91: Error: selected processor does not support `addvl x4,x8,#-1'
> {standard input}:95: Error: selected processor does not support `ptrue p1.d,all'
> {standard input}:135: Error: selected processor does not support `whilelo p2.d,xzr,x5'
> {standard input}:137: Error: selected processor does not support `decb x1'
>
> This is because inline assembly code explicitly resets cpu model to
> not have SVE support. Thus SVE instructions generated by compiler
> auto vectorization got rejected by assembler.
>
> Added SVE to the cpu model specified by inline assembly for SVE support.
> Not replacing the inline assembly with C atomics because the driver relies
> on specific LSE instruction to interface to co-processor [1].
>
> Fixes: f0c7bb1bf778 ("net/octeontx/base: add octeontx IO operations")
> Cc: jerinj@marvell.com
> Cc: stable@dpdk.org
>
> [1] https://mails.dpdk.org/archives/dev/2021-January/196092.html
>
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>


Reviewed-by: Jerin Jacob <jerinj@marvell.com>


> ---
> v3:
> Keep inline assembly and add sve extension to fix issue. (Pavan)
>
>  drivers/net/octeontx/base/octeontx_io.h | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/net/octeontx/base/octeontx_io.h b/drivers/net/octeontx/base/octeontx_io.h
> index 04b9ce191..d0b9cfbc6 100644
> --- a/drivers/net/octeontx/base/octeontx_io.h
> +++ b/drivers/net/octeontx/base/octeontx_io.h
> @@ -52,6 +52,11 @@ do {                                                 \
>  #endif
>
>  #if defined(RTE_ARCH_ARM64)
> +#if defined(__ARM_FEATURE_SVE)
> +#define __LSE_PREAMBLE " .cpu  generic+lse+sve\n"
> +#else
> +#define __LSE_PREAMBLE " .cpu  generic+lse\n"
> +#endif
>  /**
>   * Perform an atomic fetch-and-add operation.
>   */
> @@ -61,7 +66,7 @@ octeontx_reg_ldadd_u64(void *addr, int64_t off)
>         uint64_t old_val;
>
>         __asm__ volatile(
> -               " .cpu          generic+lse\n"
> +               __LSE_PREAMBLE
>                 " ldadd %1, %0, [%2]\n"
>                 : "=r" (old_val) : "r" (off), "r" (addr) : "memory");
>
> @@ -98,12 +103,13 @@ octeontx_reg_lmtst(void *lmtline_va, void *ioreg_va, const uint64_t cmdbuf[],
>
>                 /* LDEOR initiates atomic transfer to I/O device */
>                 __asm__ volatile(
> -                       " .cpu          generic+lse\n"
> +                       __LSE_PREAMBLE
>                         " ldeor xzr, %0, [%1]\n"
>                         : "=r" (result) : "r" (ioreg_va) : "memory");
>         } while (!result);
>  }
>
> +#undef __LSE_PREAMBLE
>  #else
>
>  static inline uint64_t
> --
> 2.25.1
>


* Re: [dpdk-stable] [PATCH v3 2/5] net/hns3: fix build with sve enabled
  2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 2/5] net/hns3: " Ruifeng Wang
@ 2021-01-13  2:16     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 17+ messages in thread
From: Honnappa Nagarahalli @ 2021-01-13  2:16 UTC (permalink / raw)
  To: Ruifeng Wang, Wei Hu (Xavier), Min Hu (Connor),
	Yisen Zhuang, Lijun Ou, Chengwen Feng, Chengchang Tang,
	Huisong Li
  Cc: dev, vladimir.medvedkin, pbhagavatula, jerinj, hemant.agrawal,
	nd, Ruifeng Wang, stable, Honnappa Nagarahalli, nd

<snip>

> 
> Building with SVE extension enabled stopped with error:
> 
>  error: ACLE function ‘svwhilelt_b64_s32’ requires ISA extension ‘sve’
>    18 | #define PG64_256BIT  svwhilelt_b64(0, 4)
> 
> This is caused by unintentional cflags reset.
> Fixed the issue by not touching cflags, and using flags defined by compiler.
> 
> Fixes: 952ebacce4f2 ("net/hns3: support SVE Rx")
> Cc: stable@dpdk.org
> 
> Signed-off-by: Ruifeng Wang <ruifeng.wang@arm.com>

Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
> v3:
> Removed extra flag, use compiler flag instead.
> 
>  drivers/net/hns3/hns3_rxtx.c | 4 ++--
>  drivers/net/hns3/meson.build | 1 -
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/net/hns3/hns3_rxtx.c b/drivers/net/hns3/hns3_rxtx.c
> index 88d3baba4..5ac36b314 100644
> --- a/drivers/net/hns3/hns3_rxtx.c
> +++ b/drivers/net/hns3/hns3_rxtx.c
> @@ -10,7 +10,7 @@
>  #include <rte_io.h>
>  #include <rte_net.h>
>  #include <rte_malloc.h>
> -#if defined(RTE_ARCH_ARM64) && defined(CC_SVE_SUPPORT)
> +#if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_SVE)
>  #include <rte_cpuflags.h>
>  #endif
> 
> @@ -2467,7 +2467,7 @@ hns3_rx_burst_mode_get(struct rte_eth_dev *dev, __rte_unused uint16_t queue_id,
>  static bool
>  hns3_check_sve_support(void)
>  {
> -#if defined(RTE_ARCH_ARM64) && defined(CC_SVE_SUPPORT)
> +#if defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_SVE)
>  	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_SVE))
>  		return true;
>  #endif
> diff --git a/drivers/net/hns3/meson.build b/drivers/net/hns3/meson.build
> index 45cee34d9..5674d986b 100644
> --- a/drivers/net/hns3/meson.build
> +++ b/drivers/net/hns3/meson.build
> @@ -32,7 +32,6 @@ deps += ['hash']
>  if arch_subdir == 'arm' and dpdk_conf.get('RTE_ARCH_64')
>  	sources += files('hns3_rxtx_vec.c')
>  	if cc.get_define('__ARM_FEATURE_SVE', args: machine_args) != ''
> -		cflags = ['-DCC_SVE_SUPPORT']
>  		sources += files('hns3_rxtx_vec_sve.c')
>  	endif
>  endif
> --
> 2.25.1



Thread overview: 17+ messages
     [not found] <20201218101210.356836-1-ruifeng.wang@arm.com>
     [not found] ` <20210108082523.1062058-1-ruifeng.wang@arm.com>
2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 2/5] net/hns3: fix build with sve enabled Ruifeng Wang
2021-01-09  0:06     ` Honnappa Nagarahalli
2021-01-09  2:11       ` oulijun
2021-01-11  2:39         ` Ruifeng Wang
2021-01-11 13:38           ` Honnappa Nagarahalli
2021-01-09  2:15     ` oulijun
2021-01-11  2:27       ` Ruifeng Wang
2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 3/5] net/octeontx: " Ruifeng Wang
2021-01-08  8:25   ` [dpdk-stable] [PATCH v2 4/5] common/octeontx2: " Ruifeng Wang
2021-01-08 10:29     ` [dpdk-stable] [EXT] " Pavan Nikhilesh Bhagavatula
2021-01-11  9:51       ` Ruifeng Wang
     [not found] ` <20210112025709.1121523-1-ruifeng.wang@arm.com>
2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 2/5] net/hns3: " Ruifeng Wang
2021-01-13  2:16     ` Honnappa Nagarahalli
2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 3/5] net/octeontx: " Ruifeng Wang
2021-01-12  4:39     ` [dpdk-stable] [dpdk-dev] " Jerin Jacob
2021-01-12  2:57   ` [dpdk-stable] [PATCH v3 4/5] common/octeontx2: " Ruifeng Wang
2021-01-12  4:38     ` [dpdk-stable] [dpdk-dev] " Jerin Jacob
