DPDK patches and discussions
 help / color / mirror / Atom feed
* [dpdk-dev] [PATCH] build: disable compiler AVX512F support
@ 2018-10-23 21:23 Yongseok Koh
  2018-11-01 23:11 ` Thomas Monjalon
                   ` (3 more replies)
  0 siblings, 4 replies; 30+ messages in thread
From: Yongseok Koh @ 2018-10-23 21:23 UTC (permalink / raw)
  To: Thomas Monjalon, bruce.richardson
  Cc: dev, Shahaf Shuler, Yongseok Koh, stable

This is a workaround to prevent a crash, which might be caused by
optimization of newer gcc (7.3.0) on Intel Skylake.

Bugzilla ID: 97

Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---
 config/x86/meson.build | 5 +++++
 mk/rte.cpuflags.mk     | 5 +++++
 2 files changed, 10 insertions(+)

diff --git a/config/x86/meson.build b/config/x86/meson.build
index 33efb5e547..e10ba872ac 100644
--- a/config/x86/meson.build
+++ b/config/x86/meson.build
@@ -47,6 +47,11 @@ endif
 if cc.get_define('__AVX512F__', args: march_opt) != ''
 	dpdk_conf.set('RTE_MACHINE_CPUFLAG_AVX512F', 1)
 	compile_time_cpuflags += ['RTE_CPUFLAG_AVX512F']
+else
+# disable compiler's AVX512F support as a workaround for Bug 97
+	if cc.has_argument('-mavx512f')
+		machine_args += '-mno-avx512f'
+	endif
 endif
 
 dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index 43ed84155b..8fdb0cc2c3 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -68,6 +68,11 @@ endif
 ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
 ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
 CPUFLAGS += AVX512F
+else
+# disable compiler's AVX512F support as a workaround for Bug 97
+ifeq ($(shell $(CC) --target-help | grep -q mavx512f && echo 1), 1)
+MACHINE_CFLAGS += -mno-avx512f
+endif
 endif
 endif
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [PATCH] build: disable compiler AVX512F support
  2018-10-23 21:23 [dpdk-dev] [PATCH] build: disable compiler AVX512F support Yongseok Koh
@ 2018-11-01 23:11 ` Thomas Monjalon
  2018-11-02 12:42 ` [dpdk-dev] [dpdk-stable] " Ferruh Yigit
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 30+ messages in thread
From: Thomas Monjalon @ 2018-11-01 23:11 UTC (permalink / raw)
  To: dev, bruce.richardson, konstantin.ananyev
  Cc: Yongseok Koh, Shahaf Shuler, stable, ferruh.yigit

23/10/2018 23:23, Yongseok Koh:
> This is a workaround to prevent a crash, which might be caused by
> optimization of newer gcc (7.3.0) on Intel Skylake.
> 
> Bugzilla ID: 97
> 
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

I was asked to wait before applying this patch.
10 days passed and there is no comment, so I will apply it tomorrow.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH] build: disable compiler AVX512F support
  2018-10-23 21:23 [dpdk-dev] [PATCH] build: disable compiler AVX512F support Yongseok Koh
  2018-11-01 23:11 ` Thomas Monjalon
@ 2018-11-02 12:42 ` Ferruh Yigit
  2018-11-02 13:48   ` Ferruh Yigit
  2018-11-02 21:04 ` [dpdk-dev] [PATCH v2] " Yongseok Koh
  2018-11-03  1:06 ` [dpdk-dev] [PATCH v3] build: disable gcc AVX512F support Yongseok Koh
  3 siblings, 1 reply; 30+ messages in thread
From: Ferruh Yigit @ 2018-11-02 12:42 UTC (permalink / raw)
  To: Yongseok Koh, Thomas Monjalon, bruce.richardson
  Cc: dev, Shahaf Shuler, stable, Konstantin Ananyev

On 10/23/2018 10:23 PM, Yongseok Koh wrote:
> This is a workaround to prevent a crash, which might be caused by
> optimization of newer gcc (7.3.0) on Intel Skylake.
> 
> Bugzilla ID: 97

After checking the defect description again, this is the issue observed in
rte_memcpy() implementation for AVX2, compiler uses AVX512F instructions while
compiling it which causes the failure, so this may be a compiler defect but we
don't know the root cause yet.

I think best solution is to find the root cause and fix either avx2
implementation or compiler, but this seems won't be soon, at least for rc2.

What this patch does is to prevent compiler to use avx512f instruction when
"CONFIG_RTE_ENABLE_AVX512=n".

Concern is this will affect all DPDK generated code for x86, but since
rte_memcpy() in header file there is no way to disable using avx512f
instructions locally for rte_memcpy().
I can't think of any other solution for now, so OK to go with this patch for
now. Please find below comment.

> 
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> ---
>  config/x86/meson.build | 5 +++++
>  mk/rte.cpuflags.mk     | 5 +++++
>  2 files changed, 10 insertions(+)
> 
> diff --git a/config/x86/meson.build b/config/x86/meson.build
> index 33efb5e547..e10ba872ac 100644
> --- a/config/x86/meson.build
> +++ b/config/x86/meson.build
> @@ -47,6 +47,11 @@ endif
>  if cc.get_define('__AVX512F__', args: march_opt) != ''
>  	dpdk_conf.set('RTE_MACHINE_CPUFLAG_AVX512F', 1)
>  	compile_time_cpuflags += ['RTE_CPUFLAG_AVX512F']
> +else
> +# disable compiler's AVX512F support as a workaround for Bug 97
> +	if cc.has_argument('-mavx512f')
> +		machine_args += '-mno-avx512f'
> +	endif
>  endif
>  
>  dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
> index 43ed84155b..8fdb0cc2c3 100644
> --- a/mk/rte.cpuflags.mk
> +++ b/mk/rte.cpuflags.mk
> @@ -68,6 +68,11 @@ endif
>  ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
>  ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
>  CPUFLAGS += AVX512F
> +else
> +# disable compiler's AVX512F support as a workaround for Bug 97
> +ifeq ($(shell $(CC) --target-help | grep -q mavx512f && echo 1), 1)

This will not work for ICC, and do we need this? AUTO_CPUFLAGS already should
have what you are looking for, so I think this check can be removed.

> +MACHINE_CFLAGS += -mno-avx512f
> +endif
>  endif
>  endif
>  
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH] build: disable compiler AVX512F support
  2018-11-02 12:42 ` [dpdk-dev] [dpdk-stable] " Ferruh Yigit
@ 2018-11-02 13:48   ` Ferruh Yigit
  2018-11-02 20:59     ` Yongseok Koh
  0 siblings, 1 reply; 30+ messages in thread
From: Ferruh Yigit @ 2018-11-02 13:48 UTC (permalink / raw)
  To: Yongseok Koh, Thomas Monjalon, bruce.richardson
  Cc: dev, Shahaf Shuler, stable, Konstantin Ananyev, Anatoly Burakov

On 11/2/2018 12:42 PM, Ferruh Yigit wrote:
> On 10/23/2018 10:23 PM, Yongseok Koh wrote:
>> This is a workaround to prevent a crash, which might be caused by
>> optimization of newer gcc (7.3.0) on Intel Skylake.
>>
>> Bugzilla ID: 97
> 
> After checking the defect description again, this is the issue observed in
> rte_memcpy() implementation for AVX2, compiler uses AVX512F instructions while
> compiling it which causes the failure, so this may be a compiler defect but we
> don't know the root cause yet.

Is the issue only with gcc, and only with specific version of gcc?
If so can we reduce the disabling avx512 only to that gcc version?

> 
> I think best solution is to find the root cause and fix either avx2
> implementation or compiler, but this seems won't be soon, at least for rc2.
> 
> What this patch does is to prevent compiler to use avx512f instruction when
> "CONFIG_RTE_ENABLE_AVX512=n".
> 
> Concern is this will affect all DPDK generated code for x86, but since
> rte_memcpy() in header file there is no way to disable using avx512f
> instructions locally for rte_memcpy().
> I can't think of any other solution for now, so OK to go with this patch for
> now. Please find below comment.
> 
>>
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>> ---
>>  config/x86/meson.build | 5 +++++
>>  mk/rte.cpuflags.mk     | 5 +++++
>>  2 files changed, 10 insertions(+)
>>
>> diff --git a/config/x86/meson.build b/config/x86/meson.build
>> index 33efb5e547..e10ba872ac 100644
>> --- a/config/x86/meson.build
>> +++ b/config/x86/meson.build
>> @@ -47,6 +47,11 @@ endif
>>  if cc.get_define('__AVX512F__', args: march_opt) != ''
>>  	dpdk_conf.set('RTE_MACHINE_CPUFLAG_AVX512F', 1)
>>  	compile_time_cpuflags += ['RTE_CPUFLAG_AVX512F']
>> +else
>> +# disable compiler's AVX512F support as a workaround for Bug 97
>> +	if cc.has_argument('-mavx512f')
>> +		machine_args += '-mno-avx512f'
>> +	endif
>>  endif
>>  
>>  dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
>> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
>> index 43ed84155b..8fdb0cc2c3 100644
>> --- a/mk/rte.cpuflags.mk
>> +++ b/mk/rte.cpuflags.mk
>> @@ -68,6 +68,11 @@ endif
>>  ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
>>  ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
>>  CPUFLAGS += AVX512F
>> +else
>> +# disable compiler's AVX512F support as a workaround for Bug 97
>> +ifeq ($(shell $(CC) --target-help | grep -q mavx512f && echo 1), 1)
> 
> This will not work for ICC, and do we need this? AUTO_CPUFLAGS already should
> have what you are looking for, so I think this check can be removed.
> 
>> +MACHINE_CFLAGS += -mno-avx512f
>> +endif
>>  endif
>>  endif
>>  
>>
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH] build: disable compiler AVX512F support
  2018-11-02 13:48   ` Ferruh Yigit
@ 2018-11-02 20:59     ` Yongseok Koh
  2018-11-02 21:46       ` Ferruh Yigit
  0 siblings, 1 reply; 30+ messages in thread
From: Yongseok Koh @ 2018-11-02 20:59 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Thomas Monjalon, bruce.richardson, dev, Shahaf Shuler, stable,
	Konstantin Ananyev, Anatoly Burakov

On Fri, Nov 02, 2018 at 01:48:11PM +0000, Ferruh Yigit wrote:
> On 11/2/2018 12:42 PM, Ferruh Yigit wrote:
> > On 10/23/2018 10:23 PM, Yongseok Koh wrote:
> >> This is a workaround to prevent a crash, which might be caused by
> >> optimization of newer gcc (7.3.0) on Intel Skylake.
> >>
> >> Bugzilla ID: 97
> > 
> > After checking the defect description again, this is the issue observed in
> > rte_memcpy() implementation for AVX2, compiler uses AVX512F instructions while
> > compiling it which causes the failure, so this may be a compiler defect but we
> > don't know the root cause yet.
> 
> Is the issue only with gcc, and only with specific version of gcc?
> If so can we reduce the disabling avx512 only to that gcc version?
> 
> > 
> > I think best solution is to find the root cause and fix either avx2
> > implementation or compiler, but this seems won't be soon, at least for rc2.
> > 
> > What this patch does is to prevent compiler to use avx512f instruction when
> > "CONFIG_RTE_ENABLE_AVX512=n".
> > 
> > Concern is this will affect all DPDK generated code for x86, but since
> > rte_memcpy() in header file there is no way to disable using avx512f
> > instructions locally for rte_memcpy().
> > I can't think of any other solution for now, so OK to go with this patch for
> > now. Please find below comment.
> > 
> >>
> >> Cc: stable@dpdk.org
> >>
> >> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> >> ---
> >>  config/x86/meson.build | 5 +++++
> >>  mk/rte.cpuflags.mk     | 5 +++++
> >>  2 files changed, 10 insertions(+)
> >>
> >> diff --git a/config/x86/meson.build b/config/x86/meson.build
> >> index 33efb5e547..e10ba872ac 100644
> >> --- a/config/x86/meson.build
> >> +++ b/config/x86/meson.build
> >> @@ -47,6 +47,11 @@ endif
> >>  if cc.get_define('__AVX512F__', args: march_opt) != ''
> >>  	dpdk_conf.set('RTE_MACHINE_CPUFLAG_AVX512F', 1)
> >>  	compile_time_cpuflags += ['RTE_CPUFLAG_AVX512F']
> >> +else
> >> +# disable compiler's AVX512F support as a workaround for Bug 97
> >> +	if cc.has_argument('-mavx512f')
> >> +		machine_args += '-mno-avx512f'
> >> +	endif
> >>  endif
> >>  
> >>  dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
> >> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
> >> index 43ed84155b..8fdb0cc2c3 100644
> >> --- a/mk/rte.cpuflags.mk
> >> +++ b/mk/rte.cpuflags.mk
> >> @@ -68,6 +68,11 @@ endif
> >>  ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
> >>  ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
> >>  CPUFLAGS += AVX512F
> >> +else
> >> +# disable compiler's AVX512F support as a workaround for Bug 97
> >> +ifeq ($(shell $(CC) --target-help | grep -q mavx512f && echo 1), 1)
> > 
> > This will not work for ICC, and do we need this? AUTO_CPUFLAGS already should
> > have what you are looking for, so I think this check can be removed.

This is different from AUTO_CPUFLAGS as it tries to check compiler flag support.
And per your question, I have only tested it with gcc, so I agree on applying it
only for gcc. Will submit v2. But, I don't think we need to check gcc version as
there's no fix reported yet in a newer gcc version and this patch would have
very limited impact. avx512f support is quite new and kinda experimental so
far. Dropping a bit of performance would be better than crash. :-)

Thanks for your review,
Yongseok

> >> +MACHINE_CFLAGS += -mno-avx512f
> >> +endif
> >>  endif
> >>  endif
> >>  
> >>
> > 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [dpdk-dev] [PATCH v2] build: disable compiler AVX512F support
  2018-10-23 21:23 [dpdk-dev] [PATCH] build: disable compiler AVX512F support Yongseok Koh
  2018-11-01 23:11 ` Thomas Monjalon
  2018-11-02 12:42 ` [dpdk-dev] [dpdk-stable] " Ferruh Yigit
@ 2018-11-02 21:04 ` Yongseok Koh
  2018-11-05 14:06   ` Wiles, Keith
  2018-11-03  1:06 ` [dpdk-dev] [PATCH v3] build: disable gcc AVX512F support Yongseok Koh
  3 siblings, 1 reply; 30+ messages in thread
From: Yongseok Koh @ 2018-11-02 21:04 UTC (permalink / raw)
  To: Thomas Monjalon, bruce.richardson, ferruh.yigit
  Cc: dev, Shahaf Shuler, konstantin.ananyev, anatoly.burakov,
	Yongseok Koh, stable

This is a workaround to prevent a crash, which might be caused by
optimization of newer gcc (7.3.0) on Intel Skylake.

Bugzilla ID: 97

Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---

v2:
* disable the flag only in case of gcc

 config/x86/meson.build | 5 +++++
 mk/rte.cpuflags.mk     | 7 +++++++
 2 files changed, 12 insertions(+)

diff --git a/config/x86/meson.build b/config/x86/meson.build
index 33efb5e547..8ddca0ea9f 100644
--- a/config/x86/meson.build
+++ b/config/x86/meson.build
@@ -47,6 +47,11 @@ endif
 if cc.get_define('__AVX512F__', args: march_opt) != ''
 	dpdk_conf.set('RTE_MACHINE_CPUFLAG_AVX512F', 1)
 	compile_time_cpuflags += ['RTE_CPUFLAG_AVX512F']
+else
+# disable AVX512F support of gcc as a workaround for Bug 97
+	if cc.get_id() == 'gcc' and cc.has_argument('-mavx512f')
+		machine_args += '-mno-avx512f'
+	endif
 endif
 
 dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index 43ed84155b..a8c26fb011 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -68,6 +68,13 @@ endif
 ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
 ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
 CPUFLAGS += AVX512F
+else
+# disable AVX512F support of gcc as a workaround for Bug 97
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+	ifeq ($(shell $(CC) --target-help | grep -q mavx512f && echo 1), 1)
+		MACHINE_CFLAGS += -mno-avx512f
+	endif
+endif
 endif
 endif
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH] build: disable compiler AVX512F support
  2018-11-02 20:59     ` Yongseok Koh
@ 2018-11-02 21:46       ` Ferruh Yigit
  2018-11-02 23:31         ` Yongseok Koh
  0 siblings, 1 reply; 30+ messages in thread
From: Ferruh Yigit @ 2018-11-02 21:46 UTC (permalink / raw)
  To: Yongseok Koh
  Cc: Thomas Monjalon, bruce.richardson, dev, Shahaf Shuler, stable,
	Konstantin Ananyev, Anatoly Burakov

On 11/2/2018 8:59 PM, Yongseok Koh wrote:
> On Fri, Nov 02, 2018 at 01:48:11PM +0000, Ferruh Yigit wrote:
>> On 11/2/2018 12:42 PM, Ferruh Yigit wrote:
>>> On 10/23/2018 10:23 PM, Yongseok Koh wrote:
>>>> This is a workaround to prevent a crash, which might be caused by
>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
>>>>
>>>> Bugzilla ID: 97
>>>
>>> After checking the defect description again, this is the issue observed in
>>> rte_memcpy() implementation for AVX2, compiler uses AVX512F instructions while
>>> compiling it which causes the failure, so this may be a compiler defect but we
>>> don't know the root cause yet.
>>
>> Is the issue only with gcc, and only with specific version of gcc?
>> If so can we reduce the disabling avx512 only to that gcc version?
>>
>>>
>>> I think best solution is to find the root cause and fix either avx2
>>> implementation or compiler, but this seems won't be soon, at least for rc2.
>>>
>>> What this patch does is to prevent compiler to use avx512f instruction when
>>> "CONFIG_RTE_ENABLE_AVX512=n".
>>>
>>> Concern is this will affect all DPDK generated code for x86, but since
>>> rte_memcpy() in header file there is no way to disable using avx512f
>>> instructions locally for rte_memcpy().
>>> I can't think of any other solution for now, so OK to go with this patch for
>>> now. Please find below comment.
>>>
>>>>
>>>> Cc: stable@dpdk.org
>>>>
>>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>>>> ---
>>>>  config/x86/meson.build | 5 +++++
>>>>  mk/rte.cpuflags.mk     | 5 +++++
>>>>  2 files changed, 10 insertions(+)
>>>>
>>>> diff --git a/config/x86/meson.build b/config/x86/meson.build
>>>> index 33efb5e547..e10ba872ac 100644
>>>> --- a/config/x86/meson.build
>>>> +++ b/config/x86/meson.build
>>>> @@ -47,6 +47,11 @@ endif
>>>>  if cc.get_define('__AVX512F__', args: march_opt) != ''
>>>>  	dpdk_conf.set('RTE_MACHINE_CPUFLAG_AVX512F', 1)
>>>>  	compile_time_cpuflags += ['RTE_CPUFLAG_AVX512F']
>>>> +else
>>>> +# disable compiler's AVX512F support as a workaround for Bug 97
>>>> +	if cc.has_argument('-mavx512f')
>>>> +		machine_args += '-mno-avx512f'
>>>> +	endif
>>>>  endif
>>>>  
>>>>  dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
>>>> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
>>>> index 43ed84155b..8fdb0cc2c3 100644
>>>> --- a/mk/rte.cpuflags.mk
>>>> +++ b/mk/rte.cpuflags.mk
>>>> @@ -68,6 +68,11 @@ endif
>>>>  ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
>>>>  ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
>>>>  CPUFLAGS += AVX512F
>>>> +else
>>>> +# disable compiler's AVX512F support as a workaround for Bug 97
>>>> +ifeq ($(shell $(CC) --target-help | grep -q mavx512f && echo 1), 1)
>>>
>>> This will not work for ICC, and do we need this? AUTO_CPUFLAGS already should
>>> have what you are looking for, so I think this check can be removed.
> 
> This is different from AUTO_CPUFLAGS as it tries to check compiler flag support.

What AUTO_CPUFLAGS does?

It is output of `cc -march=xxx -dM -E - < /dev/null`, which list defined macros
for that specific march provided.

Like if you use `-march=corei7` you won't see __AVX2__ set.

And for `native`, if compiler doesn't support AVX2, I assume it won't able to
output __AVX2__

Is there a case AUTO_CPUFLAGS has __AVX512F__ but "$(CC) --target-help" doesn't
have `mavx512f`?

> And per your question, I have only tested it with gcc, so I agree on applying it
> only for gcc. Will submit v2. But, I don't think we need to check gcc version as
> there's no fix reported yet in a newer gcc version and this patch would have
> very limited impact. avx512f support is quite new and kinda experimental so
> far. Dropping a bit of performance would be better than crash. :-)
> 
> Thanks for your review,
> Yongseok
> 
>>>> +MACHINE_CFLAGS += -mno-avx512f
>>>> +endif
>>>>  endif
>>>>  endif
>>>>  
>>>>
>>>
>>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] [PATCH] build: disable compiler AVX512F support
  2018-11-02 21:46       ` Ferruh Yigit
@ 2018-11-02 23:31         ` Yongseok Koh
  0 siblings, 0 replies; 30+ messages in thread
From: Yongseok Koh @ 2018-11-02 23:31 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Thomas Monjalon, bruce.richardson, dev, Shahaf Shuler, stable,
	Konstantin Ananyev, Anatoly Burakov

On Fri, Nov 02, 2018 at 09:46:09PM +0000, Ferruh Yigit wrote:
> On 11/2/2018 8:59 PM, Yongseok Koh wrote:
> > On Fri, Nov 02, 2018 at 01:48:11PM +0000, Ferruh Yigit wrote:
> >> On 11/2/2018 12:42 PM, Ferruh Yigit wrote:
> >>> On 10/23/2018 10:23 PM, Yongseok Koh wrote:
> >>>> This is a workaround to prevent a crash, which might be caused by
> >>>> optimization of newer gcc (7.3.0) on Intel Skylake.
> >>>>
> >>>> Bugzilla ID: 97
> >>>
> >>> After checking the defect description again, this is the issue observed in
> >>> rte_memcpy() implementation for AVX2, compiler uses AVX512F instructions while
> >>> compiling it which causes the failure, so this may be a compiler defect but we
> >>> don't know the root cause yet.
> >>
> >> Is the issue only with gcc, and only with specific version of gcc?
> >> If so can we reduce the disabling avx512 only to that gcc version?
> >>
> >>>
> >>> I think best solution is to find the root cause and fix either avx2
> >>> implementation or compiler, but this seems won't be soon, at least for rc2.
> >>>
> >>> What this patch does is to prevent compiler to use avx512f instruction when
> >>> "CONFIG_RTE_ENABLE_AVX512=n".
> >>>
> >>> Concern is this will affect all DPDK generated code for x86, but since
> >>> rte_memcpy() in header file there is no way to disable using avx512f
> >>> instructions locally for rte_memcpy().
> >>> I can't think of any other solution for now, so OK to go with this patch for
> >>> now. Please find below comment.
> >>>
> >>>>
> >>>> Cc: stable@dpdk.org
> >>>>
> >>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> >>>> ---
> >>>>  config/x86/meson.build | 5 +++++
> >>>>  mk/rte.cpuflags.mk     | 5 +++++
> >>>>  2 files changed, 10 insertions(+)
> >>>>
> >>>> diff --git a/config/x86/meson.build b/config/x86/meson.build
> >>>> index 33efb5e547..e10ba872ac 100644
> >>>> --- a/config/x86/meson.build
> >>>> +++ b/config/x86/meson.build
> >>>> @@ -47,6 +47,11 @@ endif
> >>>>  if cc.get_define('__AVX512F__', args: march_opt) != ''
> >>>>  	dpdk_conf.set('RTE_MACHINE_CPUFLAG_AVX512F', 1)
> >>>>  	compile_time_cpuflags += ['RTE_CPUFLAG_AVX512F']
> >>>> +else
> >>>> +# disable compiler's AVX512F support as a workaround for Bug 97
> >>>> +	if cc.has_argument('-mavx512f')
> >>>> +		machine_args += '-mno-avx512f'
> >>>> +	endif
> >>>>  endif
> >>>>  
> >>>>  dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
> >>>> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
> >>>> index 43ed84155b..8fdb0cc2c3 100644
> >>>> --- a/mk/rte.cpuflags.mk
> >>>> +++ b/mk/rte.cpuflags.mk
> >>>> @@ -68,6 +68,11 @@ endif
> >>>>  ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
> >>>>  ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
> >>>>  CPUFLAGS += AVX512F
> >>>> +else
> >>>> +# disable compiler's AVX512F support as a workaround for Bug 97
> >>>> +ifeq ($(shell $(CC) --target-help | grep -q mavx512f && echo 1), 1)
> >>>
> >>> This will not work for ICC, and do we need this? AUTO_CPUFLAGS already should
> >>> have what you are looking for, so I think this check can be removed.
> > 
> > This is different from AUTO_CPUFLAGS as it tries to check compiler flag support.
> 
> What AUTO_CPUFLAGS does?
> 
> It is output of `cc -march=xxx -dM -E - < /dev/null`, which list defined macros
> for that specific march provided.
> 
> Like if you use `-march=corei7` you won't see __AVX2__ set.
> 
> And for `native`, if compiler doesn't support AVX2, I assume it won't able to
> output __AVX2__
> 
> Is there a case AUTO_CPUFLAGS has __AVX512F__ but "$(CC) --target-help" doesn't
> have `mavx512f`?

Right. For some reason, I have wrong impression that the flag grabs cpuflags of
the compile host. Will submit a new version.

> > And per your question, I have only tested it with gcc, so I agree on applying it
> > only for gcc. Will submit v2. But, I don't think we need to check gcc version as
> > there's no fix reported yet in a newer gcc version and this patch would have
> > very limited impact. avx512f support is quite new and kinda experimental so
> > far. Dropping a bit of performance would be better than crash. :-)
> > 
> > Thanks for your review,
> > Yongseok
> > 
> >>>> +MACHINE_CFLAGS += -mno-avx512f
> >>>> +endif
> >>>>  endif
> >>>>  endif
> >>>>  
> >>>>
> >>>
> >>
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [dpdk-dev] [PATCH v3] build: disable gcc AVX512F support
  2018-10-23 21:23 [dpdk-dev] [PATCH] build: disable compiler AVX512F support Yongseok Koh
                   ` (2 preceding siblings ...)
  2018-11-02 21:04 ` [dpdk-dev] [PATCH v2] " Yongseok Koh
@ 2018-11-03  1:06 ` Yongseok Koh
  2018-11-04 20:56   ` Thomas Monjalon
  3 siblings, 1 reply; 30+ messages in thread
From: Yongseok Koh @ 2018-11-03  1:06 UTC (permalink / raw)
  To: Thomas Monjalon, bruce.richardson, ferruh.yigit
  Cc: dev, Shahaf Shuler, konstantin.ananyev, anatoly.burakov,
	Yongseok Koh, stable

This is a workaround to prevent a crash, which might be caused by
optimization of newer gcc (7.3.0) on Intel Skylake.

This disables AVX512F support of gcc by adding -mno-avx512f if it is
disabled in DPDK (CONFIG_RTE_ENABLE_AVX512=n).

This does not apply to the meson build as that doesn't have such an option
but always enable AVX512F whenever supported.

Bugzilla ID: 97

Cc: stable@dpdk.org

Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
---

v3:
* use AUTO_CPUFLAGS instead of redundant check for gcc
* remove the change from meson build

v2:
* disable the flag only in case of gcc

 mk/rte.cpuflags.mk | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
index 43ed84155b..c3291b17a1 100644
--- a/mk/rte.cpuflags.mk
+++ b/mk/rte.cpuflags.mk
@@ -68,6 +68,11 @@ endif
 ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
 ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
 CPUFLAGS += AVX512F
+else
+# disable AVX512F support of gcc as a workaround for Bug 97
+ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
+MACHINE_CFLAGS += -mno-avx512f
+endif
 endif
 endif
 
-- 
2.11.0

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [PATCH v3] build: disable gcc AVX512F support
  2018-11-03  1:06 ` [dpdk-dev] [PATCH v3] build: disable gcc AVX512F support Yongseok Koh
@ 2018-11-04 20:56   ` Thomas Monjalon
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Monjalon @ 2018-11-04 20:56 UTC (permalink / raw)
  To: Yongseok Koh
  Cc: dev, bruce.richardson, ferruh.yigit, Shahaf Shuler,
	konstantin.ananyev, anatoly.burakov, stable

03/11/2018 02:06, Yongseok Koh:
> This is a workaround to prevent a crash, which might be caused by
> optimization of newer gcc (7.3.0) on Intel Skylake.
> 
> This disables AVX512F support of gcc by adding -mno-avx512f if it is
> disabled in DPDK (CONFIG_RTE_ENABLE_AVX512=n).
> 
> This does not apply to the meson build as that doesn't have such an option
> but always enable AVX512F whenever supported.
> 
> Bugzilla ID: 97
> 
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>

Applied, thanks

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [PATCH v2] build: disable compiler AVX512F support
  2018-11-02 21:04 ` [dpdk-dev] [PATCH v2] " Yongseok Koh
@ 2018-11-05 14:06   ` Wiles, Keith
  2018-11-06 21:30     ` Yongseok Koh
  0 siblings, 1 reply; 30+ messages in thread
From: Wiles, Keith @ 2018-11-05 14:06 UTC (permalink / raw)
  To: Yongseok Koh
  Cc: Thomas Monjalon, Richardson, Bruce, Yigit, Ferruh, dev,
	Shahaf Shuler, Ananyev, Konstantin, Burakov, Anatoly, stable



> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> This is a workaround to prevent a crash, which might be caused by
> optimization of newer gcc (7.3.0) on Intel Skylake.

Should the code below not also test for the gcc version and the Sky Lake processor, maybe I am wrong but it seems it is turning AVX512 for all GCC builds

Also bug 97 seems a bit obscure reference, maybe you know the bug report, but more details would be good?
> 
> Bugzilla ID: 97
> 
> Cc: stable@dpdk.org
> 
> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
> ---
> 
> v2:
> * disable the flag only in case of gcc
> 
> config/x86/meson.build | 5 +++++
> mk/rte.cpuflags.mk     | 7 +++++++
> 2 files changed, 12 insertions(+)
> 
> diff --git a/config/x86/meson.build b/config/x86/meson.build
> index 33efb5e547..8ddca0ea9f 100644
> --- a/config/x86/meson.build
> +++ b/config/x86/meson.build
> @@ -47,6 +47,11 @@ endif
> if cc.get_define('__AVX512F__', args: march_opt) != ''
> 	dpdk_conf.set('RTE_MACHINE_CPUFLAG_AVX512F', 1)
> 	compile_time_cpuflags += ['RTE_CPUFLAG_AVX512F']
> +else
> +# disable AVX512F support of gcc as a workaround for Bug 97
> +	if cc.get_id() == 'gcc' and cc.has_argument('-mavx512f')
> +		machine_args += '-mno-avx512f'
> +	endif
> endif
> 
> dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
> index 43ed84155b..a8c26fb011 100644
> --- a/mk/rte.cpuflags.mk
> +++ b/mk/rte.cpuflags.mk
> @@ -68,6 +68,13 @@ endif
> ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
> ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
> CPUFLAGS += AVX512F
> +else
> +# disable AVX512F support of gcc as a workaround for Bug 97
> +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
> +	ifeq ($(shell $(CC) --target-help | grep -q mavx512f && echo 1), 1)
> +		MACHINE_CFLAGS += -mno-avx512f
> +	endif
> +endif
> endif
> endif
> 
> -- 
> 2.11.0
> 

Regards,
Keith

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [PATCH v2] build: disable compiler AVX512F support
  2018-11-05 14:06   ` Wiles, Keith
@ 2018-11-06 21:30     ` Yongseok Koh
  2018-11-07  9:04       ` Wiles, Keith
  0 siblings, 1 reply; 30+ messages in thread
From: Yongseok Koh @ 2018-11-06 21:30 UTC (permalink / raw)
  To: Wiles, Keith
  Cc: Thomas Monjalon, Richardson, Bruce, Yigit, Ferruh, dev,
	Shahaf Shuler, Ananyev, Konstantin, Burakov, Anatoly, stable


> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> 
> 
> 
>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>> 
>> This is a workaround to prevent a crash, which might be caused by
>> optimization of newer gcc (7.3.0) on Intel Skylake.
> 
> Should the code below not also test for the gcc version and the Sky Lake processor, maybe I am wrong but it seems it is turning AVX512 for all GCC builds

I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.

> 
> Also bug 97 seems a bit obscure reference, maybe you know the bug report, but more details would be good?

I sent out the report to dev list two month ago. And I created the Bug 97 in order to reference it in the commit message.
I didn't want to repeat same message here and there, but it would've been better to have some sort of summary of the Bug, although v3 has a few more words. However, v3 has been merged.

Thanks,
Yongseok

>> 
>> Bugzilla ID: 97
>> 
>> Cc: stable@dpdk.org
>> 
>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>> ---
>> 
>> v2:
>> * disable the flag only in case of gcc
>> 
>> config/x86/meson.build | 5 +++++
>> mk/rte.cpuflags.mk     | 7 +++++++
>> 2 files changed, 12 insertions(+)
>> 
>> diff --git a/config/x86/meson.build b/config/x86/meson.build
>> index 33efb5e547..8ddca0ea9f 100644
>> --- a/config/x86/meson.build
>> +++ b/config/x86/meson.build
>> @@ -47,6 +47,11 @@ endif
>> if cc.get_define('__AVX512F__', args: march_opt) != ''
>> 	dpdk_conf.set('RTE_MACHINE_CPUFLAG_AVX512F', 1)
>> 	compile_time_cpuflags += ['RTE_CPUFLAG_AVX512F']
>> +else
>> +# disable AVX512F support of gcc as a workaround for Bug 97
>> +	if cc.get_id() == 'gcc' and cc.has_argument('-mavx512f')
>> +		machine_args += '-mno-avx512f'
>> +	endif
>> endif
>> 
>> dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
>> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
>> index 43ed84155b..a8c26fb011 100644
>> --- a/mk/rte.cpuflags.mk
>> +++ b/mk/rte.cpuflags.mk
>> @@ -68,6 +68,13 @@ endif
>> ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
>> ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
>> CPUFLAGS += AVX512F
>> +else
>> +# disable AVX512F support of gcc as a workaround for Bug 97
>> +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
>> +	ifeq ($(shell $(CC) --target-help | grep -q mavx512f && echo 1), 1)
>> +		MACHINE_CFLAGS += -mno-avx512f
>> +	endif
>> +endif
>> endif
>> endif
>> 
>> -- 
>> 2.11.0
>> 
> 
> Regards,
> Keith

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [PATCH v2] build: disable compiler AVX512F support
  2018-11-06 21:30     ` Yongseok Koh
@ 2018-11-07  9:04       ` Wiles, Keith
  2018-11-08 15:59         ` [dpdk-dev] AVX512 bug on SkyLake Thomas Monjalon
  0 siblings, 1 reply; 30+ messages in thread
From: Wiles, Keith @ 2018-11-07  9:04 UTC (permalink / raw)
  To: Yongseok Koh
  Cc: Thomas Monjalon, Richardson, Bruce, Yigit, Ferruh, dev,
	Shahaf Shuler, Ananyev, Konstantin, Burakov, Anatoly, stable



> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> 
> 
>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
>> 
>> 
>> 
>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>> 
>>> This is a workaround to prevent a crash, which might be caused by
>>> optimization of newer gcc (7.3.0) on Intel Skylake.
>> 
>> Should the code below not also test for the gcc version and the Sky Lake processor, maybe I am wrong but it seems it is turning AVX512 for all GCC builds
> 
> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.

Are you not turning off all of the GCC versions for AVX512. And you can test for range or greater then GCC version and it just seems like we are turning off every gcc version, is that true?
> 
>> 
>> Also bug 97 seems a bit obscure reference, maybe you know the bug report, but more details would be good?
> 
> I sent out the report to dev list two month ago. And I created the Bug 97 in order to reference it in the commit message.
> I didn't want to repeat same message here and there, but it would've been better to have some sort of summary of the Bug, although v3 has a few more words. However, v3 has been merged.

Still this is too obscure if nothing else give a link to a specific bug not just 97.
> 
> Thanks,
> Yongseok
> 
>>> 
>>> Bugzilla ID: 97
>>> 
>>> Cc: stable@dpdk.org
>>> 
>>> Signed-off-by: Yongseok Koh <yskoh@mellanox.com>
>>> ---
>>> 
>>> v2:
>>> * disable the flag only in case of gcc
>>> 
>>> config/x86/meson.build | 5 +++++
>>> mk/rte.cpuflags.mk     | 7 +++++++
>>> 2 files changed, 12 insertions(+)
>>> 
>>> diff --git a/config/x86/meson.build b/config/x86/meson.build
>>> index 33efb5e547..8ddca0ea9f 100644
>>> --- a/config/x86/meson.build
>>> +++ b/config/x86/meson.build
>>> @@ -47,6 +47,11 @@ endif
>>> if cc.get_define('__AVX512F__', args: march_opt) != ''
>>> 	dpdk_conf.set('RTE_MACHINE_CPUFLAG_AVX512F', 1)
>>> 	compile_time_cpuflags += ['RTE_CPUFLAG_AVX512F']
>>> +else
>>> +# disable AVX512F support of gcc as a workaround for Bug 97
>>> +	if cc.get_id() == 'gcc' and cc.has_argument('-mavx512f')
>>> +		machine_args += '-mno-avx512f'
>>> +	endif
>>> endif
>>> 
>>> dpdk_conf.set('RTE_CACHE_LINE_SIZE', 64)
>>> diff --git a/mk/rte.cpuflags.mk b/mk/rte.cpuflags.mk
>>> index 43ed84155b..a8c26fb011 100644
>>> --- a/mk/rte.cpuflags.mk
>>> +++ b/mk/rte.cpuflags.mk
>>> @@ -68,6 +68,13 @@ endif
>>> ifneq ($(filter $(AUTO_CPUFLAGS),__AVX512F__),)
>>> ifeq ($(CONFIG_RTE_ENABLE_AVX512),y)
>>> CPUFLAGS += AVX512F
>>> +else
>>> +# disable AVX512F support of gcc as a workaround for Bug 97
>>> +ifeq ($(CONFIG_RTE_TOOLCHAIN_GCC),y)
>>> +	ifeq ($(shell $(CC) --target-help | grep -q mavx512f && echo 1), 1)
>>> +		MACHINE_CFLAGS += -mno-avx512f
>>> +	endif
>>> +endif
>>> endif
>>> endif
>>> 
>>> -- 
>>> 2.11.0
>>> 
>> 
>> Regards,
>> Keith
> 

Regards,
Keith

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] AVX512 bug on SkyLake
  2018-11-07  9:04       ` Wiles, Keith
@ 2018-11-08 15:59         ` Thomas Monjalon
  2018-11-08 17:21           ` Ferruh Yigit
                             ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Thomas Monjalon @ 2018-11-08 15:59 UTC (permalink / raw)
  To: keith.wiles, Yongseok Koh, ferruh.yigit
  Cc: dev, bruce.richardson, Shahaf Shuler, konstantin.ananyev,
	anatoly.burakov, stable, justin.parus, christian.ehrhardt,
	david.coronel, josh.powers, jay.vosburgh, dan.streetman

Hi,

We need to gather more information about this bug.
More below.

07/11/2018 10:04, Wiles, Keith:
> > On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> >>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>> 
> >>> This is a workaround to prevent a crash, which might be caused by
> >>> optimization of newer gcc (7.3.0) on Intel Skylake.
> >> 
> >> Should the code below not also test for the gcc version and
> >> the Sky Lake processor, maybe I am wrong but it seems it is
> >> turning AVX512 for all GCC builds
> > 
> > I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
> > Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
> > Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
> > And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
> 
> Are you not turning off all of the GCC versions for AVX512.
> And you can test for range or greater then GCC version and
> it just seems like we are turning off every gcc version, is that true?

Do we know exactly which GCC versions are affected?

> >> Also bug 97 seems a bit obscure reference, maybe you know
> >> the bug report, but more details would be good?
> > 
> > I sent out the report to dev list two month ago.
> > And I created the Bug 97 in order to reference it
> > in the commit message.
> > I didn't want to repeat same message here and there,
> > but it would've been better to have some sort of summary
> > of the Bug, although v3 has a few more words.
> > However, v3 has been merged.
> 
> Still this is too obscure if nothing else give a link to
> a specific bug not just 97.

The URL is
	https://bugs.dpdk.org/show_bug.cgi?id=97
The bug is also pointing to an email:
	https://mails.dpdk.org/archives/dev/2018-September/111522.html

Summary:
	- CPU: Intel Skylake
	- Linux environment: Ubuntu 18.04
	- Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
	- Scenario: testpmd crashes when it starts forwarding
	- Behaviour: AVX2 version of rte_memcpy() optimized with 512b instructions
	- Fix: disable AVX512 optimization with -mno-avx512f

It seems to have been reproduced only when using mlx5 PMD so far.
Any other experience?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] AVX512 bug on SkyLake
  2018-11-08 15:59         ` [dpdk-dev] AVX512 bug on SkyLake Thomas Monjalon
@ 2018-11-08 17:21           ` Ferruh Yigit
  2018-11-08 23:01             ` Yongseok Koh
  2018-11-09 18:46           ` [dpdk-dev] " Stephen Hemminger
  2018-11-10  2:13           ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
  2 siblings, 1 reply; 30+ messages in thread
From: Ferruh Yigit @ 2018-11-08 17:21 UTC (permalink / raw)
  To: Thomas Monjalon, keith.wiles, Yongseok Koh
  Cc: dev, bruce.richardson, Shahaf Shuler, konstantin.ananyev,
	anatoly.burakov, stable, justin.parus, christian.ehrhardt,
	david.coronel, josh.powers, jay.vosburgh, dan.streetman

On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
> Hi,
> 
> We need to gather more information about this bug.
> More below.
> 
> 07/11/2018 10:04, Wiles, Keith:
>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>>>>
>>>>> This is a workaround to prevent a crash, which might be caused by
>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
>>>>
>>>> Should the code below not also test for the gcc version and
>>>> the Sky Lake processor, maybe I am wrong but it seems it is
>>>> turning AVX512 for all GCC builds
>>>
>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
>>> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
>>> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
>>
>> Are you not turning off all of the GCC versions for AVX512.
>> And you can test for range or greater then GCC version and
>> it just seems like we are turning off every gcc version, is that true?
> 
> Do we know exactly which GCC versions are affected?
> 
>>>> Also bug 97 seems a bit obscure reference, maybe you know
>>>> the bug report, but more details would be good?
>>>
>>> I sent out the report to dev list two month ago.
>>> And I created the Bug 97 in order to reference it
>>> in the commit message.
>>> I didn't want to repeat same message here and there,
>>> but it would've been better to have some sort of summary
>>> of the Bug, although v3 has a few more words.
>>> However, v3 has been merged.
>>
>> Still this is too obscure if nothing else give a link to
>> a specific bug not just 97.
> 
> The URL is
> 	https://bugs.dpdk.org/show_bug.cgi?id=97
> The bug is also pointing to an email:
> 	https://mails.dpdk.org/archives/dev/2018-September/111522.html
> 
> Summary:
> 	- CPU: Intel Skylake
> 	- Linux environment: Ubuntu 18.04
> 	- Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)

Is it possible to test a few other gcc versions to check if the issue is
specific to this compiler version?

> 	- Scenario: testpmd crashes when it starts forwarding
> 	- Behaviour: AVX2 version of rte_memcpy() optimized with 512b instructions
> 	- Fix: disable AVX512 optimization with -mno-avx512f
> 
> It seems to have been reproduced only when using mlx5 PMD so far.
> Any other experience?
> 
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] AVX512 bug on SkyLake
  2018-11-08 17:21           ` Ferruh Yigit
@ 2018-11-08 23:01             ` Yongseok Koh
  2018-11-09  6:27               ` Christian Ehrhardt
  2018-11-09 10:03               ` Ferruh Yigit
  0 siblings, 2 replies; 30+ messages in thread
From: Yongseok Koh @ 2018-11-08 23:01 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Thomas Monjalon, Wiles, Keith, dev, Richardson, Bruce,
	Shahaf Shuler, konstantin.ananyev, anatoly.burakov, stable,
	justin.parus, christian.ehrhardt, david.coronel, josh.powers,
	jay.vosburgh, dan.streetman


> On Nov 8, 2018, at 9:21 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> 
> On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
>> Hi,
>> 
>> We need to gather more information about this bug.
>> More below.
>> 
>> 07/11/2018 10:04, Wiles, Keith:
>>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
>>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>>>>> 
>>>>>> This is a workaround to prevent a crash, which might be caused by
>>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
>>>>> 
>>>>> Should the code below not also test for the gcc version and
>>>>> the Sky Lake processor, maybe I am wrong but it seems it is
>>>>> turning AVX512 for all GCC builds
>>>> 
>>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
>>>> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
>>>> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
>>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
>>> 
>>> Are you not turning off all of the GCC versions for AVX512.
>>> And you can test for range or greater then GCC version and
>>> it just seems like we are turning off every gcc version, is that true?
>> 
>> Do we know exactly which GCC versions are affected?
>> 
>>>>> Also bug 97 seems a bit obscure reference, maybe you know
>>>>> the bug report, but more details would be good?
>>>> 
>>>> I sent out the report to dev list two month ago.
>>>> And I created the Bug 97 in order to reference it
>>>> in the commit message.
>>>> I didn't want to repeat same message here and there,
>>>> but it would've been better to have some sort of summary
>>>> of the Bug, although v3 has a few more words.
>>>> However, v3 has been merged.
>>> 
>>> Still this is too obscure if nothing else give a link to
>>> a specific bug not just 97.
>> 
>> The URL is
>> 	https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D97&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=2o%2Fg203aWrKCYg16S6oI4BcS41igpLu1DloS%2FrRnknc%3D&amp;reserved=0
>> The bug is also pointing to an email:
>> 	https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2018-September%2F111522.html&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=NCFKxaREd69iZ8eyFKg%2FWBP73CLTXkxrNQQeii%2Bbsao%3D&amp;reserved=0
>> 
>> Summary:
>> 	- CPU: Intel Skylake
>> 	- Linux environment: Ubuntu 18.04
>> 	- Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
> 
> Is it possible to test a few other gcc versions to check if the issue is
> specific to this compiler version?

Nothing's impossible but even with my quick search in gcc.gnu.org,
I could find the following documents mention mavx512f support:

GCC 4.9.0
April 22, 2014 (changes, documentation)
 
GCC 5.1
April 22, 2015 (changes, documentation)
 
GCC 6.4
July 4, 2017 (changes, documentation)
 
GCC 7.1
May 2, 2017 (changes, documentation)
 
GCC 8.1
May 2, 2018 (changes, documentation)

We altogether have to put quite large resource to verify all of the versions.
 
I assumed older than gcc 7 would have the same issue. I know it was a speculation
but like I mentioned I wanted to be more conservative. I didn't mean this is a permanent fix.
For two months, we couldn't have any tangible solution (actually nobody cared including myself),
so I submitted the patch to temporarily disable mavx512f.

I'm still not sure what the best option is...

Thanks,
Yongseok

> 
>> 	- Scenario: testpmd crashes when it starts forwarding
>> 	- Behaviour: AVX2 version of rte_memcpy() optimized with 512b instructions
>> 	- Fix: disable AVX512 optimization with -mno-avx512f
>> 
>> It seems to have been reproduced only when using mlx5 PMD so far.
>> Any other experience?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] AVX512 bug on SkyLake
  2018-11-08 23:01             ` Yongseok Koh
@ 2018-11-09  6:27               ` Christian Ehrhardt
  2018-11-09  9:49                 ` Ferruh Yigit
  2018-11-09 10:03               ` Ferruh Yigit
  1 sibling, 1 reply; 30+ messages in thread
From: Christian Ehrhardt @ 2018-11-09  6:27 UTC (permalink / raw)
  To: yskoh
  Cc: Ferruh Yigit, Thomas Monjalon, keith.wiles, dev,
	Bruce Richardson, Shahaf Shuler, Ananyev, Konstantin,
	anatoly.burakov, stable, justin.parus, David Coronel,
	Josh Powers, Jay Vosburgh, Dan Streetman

On Fri, Nov 9, 2018 at 12:01 AM Yongseok Koh <yskoh@mellanox.com> wrote:
>
>
> > On Nov 8, 2018, at 9:21 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >
> > On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
> >> Hi,
> >>
> >> We need to gather more information about this bug.
> >> More below.
> >>

Thanks Thomas for looping us in!

> >> 07/11/2018 10:04, Wiles, Keith:
> >>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> >>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>>>>>
> >>>>>> This is a workaround to prevent a crash, which might be caused by
> >>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
> >>>>>
> >>>>> Should the code below not also test for the gcc version and
> >>>>> the Sky Lake processor, maybe I am wrong but it seems it is
> >>>>> turning AVX512 for all GCC builds
> >>>>
> >>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
> >>>> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
> >>>> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
> >>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
> >>>
> >>> Are you not turning off all of the GCC versions for AVX512.
> >>> And you can test for range or greater then GCC version and
> >>> it just seems like we are turning off every gcc version, is that true?
> >>
> >> Do we know exactly which GCC versions are affected?
> >>
> >>>>> Also bug 97 seems a bit obscure reference, maybe you know
> >>>>> the bug report, but more details would be good?
> >>>>
> >>>> I sent out the report to dev list two month ago.
> >>>> And I created the Bug 97 in order to reference it
> >>>> in the commit message.
> >>>> I didn't want to repeat same message here and there,
> >>>> but it would've been better to have some sort of summary
> >>>> of the Bug, although v3 has a few more words.
> >>>> However, v3 has been merged.
> >>>
> >>> Still this is too obscure if nothing else give a link to
> >>> a specific bug not just 97.
> >>
> >> The URL is
> >>      https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D97&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=2o%2Fg203aWrKCYg16S6oI4BcS41igpLu1DloS%2FrRnknc%3D&amp;reserved=0
> >> The bug is also pointing to an email:
> >>      https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2018-September%2F111522.html&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=NCFKxaREd69iZ8eyFKg%2FWBP73CLTXkxrNQQeii%2Bbsao%3D&amp;reserved=0
> >>
> >> Summary:
> >>      - CPU: Intel Skylake
> >>      - Linux environment: Ubuntu 18.04
> >>      - Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
> >
> > Is it possible to test a few other gcc versions to check if the issue is
> > specific to this compiler version?
>
> Nothing's impossible but even with my quick search in gcc.gnu.org,
> I could find the following documents mention mavx512f support:
>
> GCC 4.9.0
> April 22, 2014 (changes, documentation)
>
> GCC 5.1
> April 22, 2015 (changes, documentation)
>
> GCC 6.4
> July 4, 2017 (changes, documentation)
>
> GCC 7.1
> May 2, 2017 (changes, documentation)
>
> GCC 8.1
> May 2, 2018 (changes, documentation)
>
> We altogether have to put quite large resource to verify all of the versions.
>
> I assumed older than gcc 7 would have the same issue. I know it was a speculation
> but like I mentioned I wanted to be more conservative. I didn't mean this is a permanent fix.
> For two months, we couldn't have any tangible solution (actually nobody cared including myself),
> so I submitted the patch to temporarily disable mavx512f.
>
> I'm still not sure what the best option is...
>

What I wonder in all of this as I don't understand that part of it yet is this.
I assume you are building on Ubuntu as that is your gcc reference.
FYI: as people asked for bug references, there also is [1] which seems
pretty much the same issue.

It builds with mostly defaults, that means per
mk/machine/default/rte.vars.mk and similar it sets -march=corei7

But when I look at what that implies all avx512 is disabled
$ gcc -Q --help=target -m64 -march=corei7 | grep avx512f
 -mavx512f                             [disabled]

So I wonder what/why -mno-avx512f should help at all.
I used the full list of gcc args we have for the build (e.g. [2] of a
18.05 build), but that doesn't change that (mostly -W, -I and -D).
So I wonder, did people do a custom build and bump up march or enable
-mavx512f on their own to hit that?
Or are we facing a real gcc issue where " -mavx512f [disabled]" is not
the same as -mno-avx512f ?
Maybe someone who hit the bug could clarify that please?

BTW: per reports I've seen it also seems to apply to the latest
compiler update of the same series - at least it was said to be fully
updated, that would be 7.3.0-27ubuntu1~18.04
But this is 2nd grade information as I don't have a system with the
right combo MLX5+Skylake available atm, so I can't confirm for sure
:-/

[1]: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1799397
[2]: https://launchpadlibrarian.net/373589345/buildlog_ubuntu-bionic-amd64.dpdk_18.05-1~ubuntu0.18.04.1_BUILDING.txt.gz

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] AVX512 bug on SkyLake
  2018-11-09  6:27               ` Christian Ehrhardt
@ 2018-11-09  9:49                 ` Ferruh Yigit
  2018-11-09 11:35                   ` Thomas Monjalon
  0 siblings, 1 reply; 30+ messages in thread
From: Ferruh Yigit @ 2018-11-09  9:49 UTC (permalink / raw)
  To: Christian Ehrhardt, yskoh
  Cc: Thomas Monjalon, keith.wiles, dev, Bruce Richardson,
	Shahaf Shuler, Ananyev, Konstantin, anatoly.burakov, stable,
	justin.parus, David Coronel, Josh Powers, Jay Vosburgh,
	Dan Streetman

On 11/9/2018 6:27 AM, Christian Ehrhardt wrote:
> On Fri, Nov 9, 2018 at 12:01 AM Yongseok Koh <yskoh@mellanox.com> wrote:
>>
>>
>>> On Nov 8, 2018, at 9:21 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>>>
>>> On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
>>>> Hi,
>>>>
>>>> We need to gather more information about this bug.
>>>> More below.
>>>>
> 
> Thanks Thomas for looping us in!
> 
>>>> 07/11/2018 10:04, Wiles, Keith:
>>>>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>>>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
>>>>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>>>>>>>
>>>>>>>> This is a workaround to prevent a crash, which might be caused by
>>>>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
>>>>>>>
>>>>>>> Should the code below not also test for the gcc version and
>>>>>>> the Sky Lake processor, maybe I am wrong but it seems it is
>>>>>>> turning AVX512 for all GCC builds
>>>>>>
>>>>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
>>>>>> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
>>>>>> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
>>>>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
>>>>>
>>>>> Are you not turning off all of the GCC versions for AVX512.
>>>>> And you can test for range or greater then GCC version and
>>>>> it just seems like we are turning off every gcc version, is that true?
>>>>
>>>> Do we know exactly which GCC versions are affected?
>>>>
>>>>>>> Also bug 97 seems a bit obscure reference, maybe you know
>>>>>>> the bug report, but more details would be good?
>>>>>>
>>>>>> I sent out the report to dev list two month ago.
>>>>>> And I created the Bug 97 in order to reference it
>>>>>> in the commit message.
>>>>>> I didn't want to repeat same message here and there,
>>>>>> but it would've been better to have some sort of summary
>>>>>> of the Bug, although v3 has a few more words.
>>>>>> However, v3 has been merged.
>>>>>
>>>>> Still this is too obscure if nothing else give a link to
>>>>> a specific bug not just 97.
>>>>
>>>> The URL is
>>>>      https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D97&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=2o%2Fg203aWrKCYg16S6oI4BcS41igpLu1DloS%2FrRnknc%3D&amp;reserved=0
>>>> The bug is also pointing to an email:
>>>>      https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2018-September%2F111522.html&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=NCFKxaREd69iZ8eyFKg%2FWBP73CLTXkxrNQQeii%2Bbsao%3D&amp;reserved=0
>>>>
>>>> Summary:
>>>>      - CPU: Intel Skylake
>>>>      - Linux environment: Ubuntu 18.04
>>>>      - Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
>>>
>>> Is it possible to test a few other gcc versions to check if the issue is
>>> specific to this compiler version?
>>
>> Nothing's impossible but even with my quick search in gcc.gnu.org,
>> I could find the following documents mention mavx512f support:
>>
>> GCC 4.9.0
>> April 22, 2014 (changes, documentation)
>>
>> GCC 5.1
>> April 22, 2015 (changes, documentation)
>>
>> GCC 6.4
>> July 4, 2017 (changes, documentation)
>>
>> GCC 7.1
>> May 2, 2017 (changes, documentation)
>>
>> GCC 8.1
>> May 2, 2018 (changes, documentation)
>>
>> We altogether have to put quite large resource to verify all of the versions.
>>
>> I assumed older than gcc 7 would have the same issue. I know it was a speculation
>> but like I mentioned I wanted to be more conservative. I didn't mean this is a permanent fix.
>> For two months, we couldn't have any tangible solution (actually nobody cared including myself),
>> so I submitted the patch to temporarily disable mavx512f.
>>
>> I'm still not sure what the best option is...
>>
> 
> What I wonder in all of this as I don't understand that part of it yet is this.
> I assume you are building on Ubuntu as that is your gcc reference.
> FYI: as people asked for bug references, there also is [1] which seems
> pretty much the same issue.
> 
> It builds with mostly defaults, that means per
> mk/machine/default/rte.vars.mk and similar it sets -march=corei7
> 
> But when I look at what that implies all avx512 is disabled
> $ gcc -Q --help=target -m64 -march=corei7 | grep avx512f
>  -mavx512f                             [disabled]

This is output is the "corei7" architecture. Defect is for "native" build on
Skylake.

For "native" arch:
gcc -Q --help=target -m64 -march=native | grep avx512f
  -mavx512f                             [enabled]


Also I can confirm from disassembly output that avx512 instructions are used
without '-mno-avx512f' provided.

> 
> So I wonder what/why -mno-avx512f should help at all.
> I used the full list of gcc args we have for the build (e.g. [2] of a
> 18.05 build), but that doesn't change that (mostly -W, -I and -D).
> So I wonder, did people do a custom build and bump up march or enable
> -mavx512f on their own to hit that?
> Or are we facing a real gcc issue where " -mavx512f [disabled]" is not
> the same as -mno-avx512f ?
> Maybe someone who hit the bug could clarify that please?
> 
> BTW: per reports I've seen it also seems to apply to the latest
> compiler update of the same series - at least it was said to be fully
> updated, that would be 7.3.0-27ubuntu1~18.04
> But this is 2nd grade information as I don't have a system with the
> right combo MLX5+Skylake available atm, so I can't confirm for sure
> :-/
> 
> [1]: https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/1799397
> [2]: https://launchpadlibrarian.net/373589345/buildlog_ubuntu-bionic-amd64.dpdk_18.05-1~ubuntu0.18.04.1_BUILDING.txt.gz
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] AVX512 bug on SkyLake
  2018-11-08 23:01             ` Yongseok Koh
  2018-11-09  6:27               ` Christian Ehrhardt
@ 2018-11-09 10:03               ` Ferruh Yigit
  2018-11-09 13:17                 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
  1 sibling, 1 reply; 30+ messages in thread
From: Ferruh Yigit @ 2018-11-09 10:03 UTC (permalink / raw)
  To: Yongseok Koh
  Cc: Thomas Monjalon, Wiles, Keith, dev, Richardson, Bruce,
	Shahaf Shuler, konstantin.ananyev, anatoly.burakov, stable,
	justin.parus, christian.ehrhardt, david.coronel, josh.powers,
	jay.vosburgh, dan.streetman

On 11/8/2018 11:01 PM, Yongseok Koh wrote:
> 
>> On Nov 8, 2018, at 9:21 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>>
>> On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
>>> Hi,
>>>
>>> We need to gather more information about this bug.
>>> More below.
>>>
>>> 07/11/2018 10:04, Wiles, Keith:
>>>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
>>>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>>>>>>
>>>>>>> This is a workaround to prevent a crash, which might be caused by
>>>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
>>>>>>
>>>>>> Should the code below not also test for the gcc version and
>>>>>> the Sky Lake processor, maybe I am wrong but it seems it is
>>>>>> turning AVX512 for all GCC builds
>>>>>
>>>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
>>>>> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
>>>>> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
>>>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
>>>>
>>>> Are you not turning off all of the GCC versions for AVX512.
>>>> And you can test for range or greater then GCC version and
>>>> it just seems like we are turning off every gcc version, is that true?
>>>
>>> Do we know exactly which GCC versions are affected?
>>>
>>>>>> Also bug 97 seems a bit obscure reference, maybe you know
>>>>>> the bug report, but more details would be good?
>>>>>
>>>>> I sent out the report to dev list two month ago.
>>>>> And I created the Bug 97 in order to reference it
>>>>> in the commit message.
>>>>> I didn't want to repeat same message here and there,
>>>>> but it would've been better to have some sort of summary
>>>>> of the Bug, although v3 has a few more words.
>>>>> However, v3 has been merged.
>>>>
>>>> Still this is too obscure if nothing else give a link to
>>>> a specific bug not just 97.
>>>
>>> The URL is
>>> 	https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D97&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=2o%2Fg203aWrKCYg16S6oI4BcS41igpLu1DloS%2FrRnknc%3D&amp;reserved=0
>>> The bug is also pointing to an email:
>>> 	https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2018-September%2F111522.html&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=NCFKxaREd69iZ8eyFKg%2FWBP73CLTXkxrNQQeii%2Bbsao%3D&amp;reserved=0
>>>
>>> Summary:
>>> 	- CPU: Intel Skylake
>>> 	- Linux environment: Ubuntu 18.04
>>> 	- Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
>>
>> Is it possible to test a few other gcc versions to check if the issue is
>> specific to this compiler version?
> 
> Nothing's impossible but even with my quick search in gcc.gnu.org,
> I could find the following documents mention mavx512f support:
> 
> GCC 4.9.0
> April 22, 2014 (changes, documentation)
>  
> GCC 5.1
> April 22, 2015 (changes, documentation)
>  
> GCC 6.4
> July 4, 2017 (changes, documentation)
>  
> GCC 7.1
> May 2, 2017 (changes, documentation)
>  
> GCC 8.1
> May 2, 2018 (changes, documentation)
> 
> We altogether have to put quite large resource to verify all of the versions.
>  
> I assumed older than gcc 7 would have the same issue. I know it was a speculation
> but like I mentioned I wanted to be more conservative. I didn't mean this is a permanent fix.
> For two months, we couldn't have any tangible solution (actually nobody cared including myself),
> so I submitted the patch to temporarily disable mavx512f.
> 
> I'm still not sure what the best option is...

For permanent fix we need more information, currently we can't re-produce this
defect. Since you can reproduce it we need your support.

Right now we don't know if this is compiler issue or code defect in rte_memcpy()
or something else.

It is easy to disable mavx512f as temporarily solution but it is coming with the
cost of the performance drop, also without knowing the actual root cause I
wouldn't say this is being conservative, actual issue may be just hidden with
this change.

I think as first thing we need to find a way to reproduce this issue in any
other way than using mlx5 PMD. So that we can put more organized effort to fix this.
I attached a simple unit test for rte_memcpy(), if this is a rte_memcpy() with
avx512f defect as claimed, you should be able to see the issue with that, right?
Did you able to find a chance to test it? Do you observer any crash there?

> 
> Thanks,
> Yongseok
> 
>>
>>> 	- Scenario: testpmd crashes when it starts forwarding
>>> 	- Behaviour: AVX2 version of rte_memcpy() optimized with 512b instructions
>>> 	- Fix: disable AVX512 optimization with -mno-avx512f
>>>
>>> It seems to have been reproduced only when using mlx5 PMD so far.
>>> Any other experience?
> 

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] AVX512 bug on SkyLake
  2018-11-09  9:49                 ` Ferruh Yigit
@ 2018-11-09 11:35                   ` Thomas Monjalon
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Monjalon @ 2018-11-09 11:35 UTC (permalink / raw)
  To: Ferruh Yigit, Christian Ehrhardt
  Cc: dev, yskoh, keith.wiles, Bruce Richardson, Shahaf Shuler,
	Ananyev, Konstantin, anatoly.burakov, stable, justin.parus,
	David Coronel, Josh Powers, Jay Vosburgh, Dan Streetman

09/11/2018 10:49, Ferruh Yigit:
> On 11/9/2018 6:27 AM, Christian Ehrhardt wrote:
> > On Fri, Nov 9, 2018 at 12:01 AM Yongseok Koh <yskoh@mellanox.com> wrote:
> >>
> >>
> >>> On Nov 8, 2018, at 9:21 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >>>
> >>> On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
> >>>> Hi,
> >>>>
> >>>> We need to gather more information about this bug.
> >>>> More below.
> >>>>
> > 
> > Thanks Thomas for looping us in!
> > 
> >>>> 07/11/2018 10:04, Wiles, Keith:
> >>>>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>>>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> >>>>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>>>>>>>
> >>>>>>>> This is a workaround to prevent a crash, which might be caused by
> >>>>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
> >>>>>>>
> >>>>>>> Should the code below not also test for the gcc version and
> >>>>>>> the Sky Lake processor, maybe I am wrong but it seems it is
> >>>>>>> turning AVX512 for all GCC builds
> >>>>>>
> >>>>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
> >>>>>> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
> >>>>>> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
> >>>>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
> >>>>>
> >>>>> Are you not turning off all of the GCC versions for AVX512.
> >>>>> And you can test for range or greater then GCC version and
> >>>>> it just seems like we are turning off every gcc version, is that true?
> >>>>
> >>>> Do we know exactly which GCC versions are affected?
> >>>>
> >>>>>>> Also bug 97 seems a bit obscure reference, maybe you know
> >>>>>>> the bug report, but more details would be good?
> >>>>>>
> >>>>>> I sent out the report to dev list two month ago.
> >>>>>> And I created the Bug 97 in order to reference it
> >>>>>> in the commit message.
> >>>>>> I didn't want to repeat same message here and there,
> >>>>>> but it would've been better to have some sort of summary
> >>>>>> of the Bug, although v3 has a few more words.
> >>>>>> However, v3 has been merged.
> >>>>>
> >>>>> Still this is too obscure if nothing else give a link to
> >>>>> a specific bug not just 97.
> >>>>
> >>>> The URL is
> >>>>      https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D97&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=2o%2Fg203aWrKCYg16S6oI4BcS41igpLu1DloS%2FrRnknc%3D&amp;reserved=0
> >>>> The bug is also pointing to an email:
> >>>>      https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2018-September%2F111522.html&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=NCFKxaREd69iZ8eyFKg%2FWBP73CLTXkxrNQQeii%2Bbsao%3D&amp;reserved=0
> >>>>
> >>>> Summary:
> >>>>      - CPU: Intel Skylake
> >>>>      - Linux environment: Ubuntu 18.04
> >>>>      - Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
> >>>
> >>> Is it possible to test a few other gcc versions to check if the issue is
> >>> specific to this compiler version?
> >>
> >> Nothing's impossible but even with my quick search in gcc.gnu.org,
> >> I could find the following documents mention mavx512f support:
> >>
> >> GCC 4.9.0
> >> April 22, 2014 (changes, documentation)
> >>
> >> GCC 5.1
> >> April 22, 2015 (changes, documentation)
> >>
> >> GCC 6.4
> >> July 4, 2017 (changes, documentation)
> >>
> >> GCC 7.1
> >> May 2, 2017 (changes, documentation)
> >>
> >> GCC 8.1
> >> May 2, 2018 (changes, documentation)
> >>
> >> We altogether have to put quite large resource to verify all of the versions.
> >>
> >> I assumed older than gcc 7 would have the same issue. I know it was a speculation
> >> but like I mentioned I wanted to be more conservative. I didn't mean this is a permanent fix.
> >> For two months, we couldn't have any tangible solution (actually nobody cared including myself),
> >> so I submitted the patch to temporarily disable mavx512f.
> >>
> >> I'm still not sure what the best option is...
> > 
> > What I wonder in all of this as I don't understand that part of it yet is this.
> > I assume you are building on Ubuntu as that is your gcc reference.
> > FYI: as people asked for bug references, there also is [1] which seems
> > pretty much the same issue.
> > 
> > It builds with mostly defaults, that means per
> > mk/machine/default/rte.vars.mk and similar it sets -march=corei7
> > 
> > But when I look at what that implies all avx512 is disabled
> > $ gcc -Q --help=target -m64 -march=corei7 | grep avx512f
> >  -mavx512f                             [disabled]
> 
> This is output is the "corei7" architecture. Defect is for "native" build on
> Skylake.
> 
> For "native" arch:
> gcc -Q --help=target -m64 -march=native | grep avx512f
>   -mavx512f                             [enabled]

I confirm.


> Also I can confirm from disassembly output that avx512 instructions are used
> without '-mno-avx512f' provided.
> 
> > So I wonder what/why -mno-avx512f should help at all.
> > I used the full list of gcc args we have for the build (e.g. [2] of a
> > 18.05 build), but that doesn't change that (mostly -W, -I and -D).
> > So I wonder, did people do a custom build and bump up march or enable
> > -mavx512f on their own to hit that?
> > Or are we facing a real gcc issue where " -mavx512f [disabled]" is not
> > the same as -mno-avx512f ?
> > Maybe someone who hit the bug could clarify that please?
> > 
> > BTW: per reports I've seen it also seems to apply to the latest
> > compiler update of the same series - at least it was said to be fully
> > updated, that would be 7.3.0-27ubuntu1~18.04
> > But this is 2nd grade information as I don't have a system with the
> > right combo MLX5+Skylake available atm, so I can't confirm for sure
> > :-/

I tried latest gcc 7.3.0-27ubuntu1~18.04, and the bug is still there.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake
  2018-11-09 10:03               ` Ferruh Yigit
@ 2018-11-09 13:17                 ` Thomas Monjalon
  2018-11-09 14:27                   ` Thomas Monjalon
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Monjalon @ 2018-11-09 13:17 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: stable, Yongseok Koh, keith.wiles, dev, bruce.richardson,
	Shahaf Shuler, konstantin.ananyev, anatoly.burakov, justin.parus,
	christian.ehrhardt, david.coronel, josh.powers, jay.vosburgh,
	dan.streetman

09/11/2018 11:03, Ferruh Yigit:
> On 11/8/2018 11:01 PM, Yongseok Koh wrote:
> > 
> >> On Nov 8, 2018, at 9:21 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> >>
> >> On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
> >>> Hi,
> >>>
> >>> We need to gather more information about this bug.
> >>> More below.
> >>>
> >>> 07/11/2018 10:04, Wiles, Keith:
> >>>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> >>>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> >>>>>>>
> >>>>>>> This is a workaround to prevent a crash, which might be caused by
> >>>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
> >>>>>>
> >>>>>> Should the code below not also test for the gcc version and
> >>>>>> the Sky Lake processor, maybe I am wrong but it seems it is
> >>>>>> turning AVX512 for all GCC builds
> >>>>>
> >>>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
> >>>>> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
> >>>>> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
> >>>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
> >>>>
> >>>> Are you not turning off all of the GCC versions for AVX512.
> >>>> And you can test for range or greater then GCC version and
> >>>> it just seems like we are turning off every gcc version, is that true?
> >>>
> >>> Do we know exactly which GCC versions are affected?
> >>>
> >>>>>> Also bug 97 seems a bit obscure reference, maybe you know
> >>>>>> the bug report, but more details would be good?
> >>>>>
> >>>>> I sent out the report to dev list two month ago.
> >>>>> And I created the Bug 97 in order to reference it
> >>>>> in the commit message.
> >>>>> I didn't want to repeat same message here and there,
> >>>>> but it would've been better to have some sort of summary
> >>>>> of the Bug, although v3 has a few more words.
> >>>>> However, v3 has been merged.
> >>>>
> >>>> Still this is too obscure if nothing else give a link to
> >>>> a specific bug not just 97.
> >>>
> >>> The URL is
> >>> 	https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D97&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=2o%2Fg203aWrKCYg16S6oI4BcS41igpLu1DloS%2FrRnknc%3D&amp;reserved=0
> >>> The bug is also pointing to an email:
> >>> 	https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2018-September%2F111522.html&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=NCFKxaREd69iZ8eyFKg%2FWBP73CLTXkxrNQQeii%2Bbsao%3D&amp;reserved=0
> >>>
> >>> Summary:
> >>> 	- CPU: Intel Skylake
> >>> 	- Linux environment: Ubuntu 18.04
> >>> 	- Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
> >>
> >> Is it possible to test a few other gcc versions to check if the issue is
> >> specific to this compiler version?
> > 
> > Nothing's impossible but even with my quick search in gcc.gnu.org,
> > I could find the following documents mention mavx512f support:
> > 
> > GCC 4.9.0
> > April 22, 2014 (changes, documentation)
> >  
> > GCC 5.1
> > April 22, 2015 (changes, documentation)
> >  
> > GCC 6.4
> > July 4, 2017 (changes, documentation)
> >  
> > GCC 7.1
> > May 2, 2017 (changes, documentation)
> >  
> > GCC 8.1
> > May 2, 2018 (changes, documentation)
> > 
> > We altogether have to put quite large resource to verify all of the versions.
> >  
> > I assumed older than gcc 7 would have the same issue. I know it was a speculation
> > but like I mentioned I wanted to be more conservative. I didn't mean this is a permanent fix.
> > For two months, we couldn't have any tangible solution (actually nobody cared including myself),
> > so I submitted the patch to temporarily disable mavx512f.
> > 
> > I'm still not sure what the best option is...
> 
> For permanent fix we need more information, currently we can't re-produce this
> defect. Since you can reproduce it we need your support.
> 
> Right now we don't know if this is compiler issue or code defect in rte_memcpy()
> or something else.
> 
> It is easy to disable mavx512f as temporarily solution but it is coming with the
> cost of the performance drop, also without knowing the actual root cause I
> wouldn't say this is being conservative, actual issue may be just hidden with
> this change.
> 
> I think as first thing we need to find a way to reproduce this issue in any
> other way than using mlx5 PMD. So that we can put more organized effort to fix this.
> I attached a simple unit test for rte_memcpy(), if this is a rte_memcpy() with
> avx512f defect as claimed, you should be able to see the issue with that, right?
> Did you able to find a chance to test it? Do you observer any crash there?

I am able to connect to a machine where the issue is reproduced.
So I have tested replacing rte_memcpy with memcpy,
and the crash disappears when using memcpy.
So it confirms that the issue is in rte_memcpy.

About the unit test you attached in bugzilla:
	https://bugs.dpdk.org/attachment.cgi?id=15
It does not reproduce the bug:
	RTE>>rte_memcpy_autotest
	.................................................................
	Test OK

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake
  2018-11-09 13:17                 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
@ 2018-11-09 14:27                   ` Thomas Monjalon
  2018-11-09 20:06                     ` Ferruh Yigit
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Monjalon @ 2018-11-09 14:27 UTC (permalink / raw)
  To: dev
  Cc: Ferruh Yigit, stable, Yongseok Koh, keith.wiles,
	bruce.richardson, Shahaf Shuler, konstantin.ananyev,
	anatoly.burakov, justin.parus, christian.ehrhardt, david.coronel,
	josh.powers, jay.vosburgh, dan.streetman

09/11/2018 14:17, Thomas Monjalon:
> 09/11/2018 11:03, Ferruh Yigit:
> > On 11/8/2018 11:01 PM, Yongseok Koh wrote:
> > > 
> > >> On Nov 8, 2018, at 9:21 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
> > >>
> > >> On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
> > >>> Hi,
> > >>>
> > >>> We need to gather more information about this bug.
> > >>> More below.
> > >>>
> > >>> 07/11/2018 10:04, Wiles, Keith:
> > >>>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> > >>>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
> > >>>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> > >>>>>>>
> > >>>>>>> This is a workaround to prevent a crash, which might be caused by
> > >>>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
> > >>>>>>
> > >>>>>> Should the code below not also test for the gcc version and
> > >>>>>> the Sky Lake processor, maybe I am wrong but it seems it is
> > >>>>>> turning AVX512 for all GCC builds
> > >>>>>
> > >>>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
> > >>>>> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
> > >>>>> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
> > >>>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
> > >>>>
> > >>>> Are you not turning off all of the GCC versions for AVX512.
> > >>>> And you can test for range or greater then GCC version and
> > >>>> it just seems like we are turning off every gcc version, is that true?
> > >>>
> > >>> Do we know exactly which GCC versions are affected?
> > >>>
> > >>>>>> Also bug 97 seems a bit obscure reference, maybe you know
> > >>>>>> the bug report, but more details would be good?
> > >>>>>
> > >>>>> I sent out the report to dev list two month ago.
> > >>>>> And I created the Bug 97 in order to reference it
> > >>>>> in the commit message.
> > >>>>> I didn't want to repeat same message here and there,
> > >>>>> but it would've been better to have some sort of summary
> > >>>>> of the Bug, although v3 has a few more words.
> > >>>>> However, v3 has been merged.
> > >>>>
> > >>>> Still this is too obscure if nothing else give a link to
> > >>>> a specific bug not just 97.
> > >>>
> > >>> The URL is
> > >>> 	https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D97&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=2o%2Fg203aWrKCYg16S6oI4BcS41igpLu1DloS%2FrRnknc%3D&amp;reserved=0
> > >>> The bug is also pointing to an email:
> > >>> 	https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2018-September%2F111522.html&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=NCFKxaREd69iZ8eyFKg%2FWBP73CLTXkxrNQQeii%2Bbsao%3D&amp;reserved=0
> > >>>
> > >>> Summary:
> > >>> 	- CPU: Intel Skylake
> > >>> 	- Linux environment: Ubuntu 18.04
> > >>> 	- Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
> > >>
> > >> Is it possible to test a few other gcc versions to check if the issue is
> > >> specific to this compiler version?
> > > 
> > > Nothing's impossible but even with my quick search in gcc.gnu.org,
> > > I could find the following documents mention mavx512f support:
> > > 
> > > GCC 4.9.0
> > > April 22, 2014 (changes, documentation)
> > >  
> > > GCC 5.1
> > > April 22, 2015 (changes, documentation)
> > >  
> > > GCC 6.4
> > > July 4, 2017 (changes, documentation)
> > >  
> > > GCC 7.1
> > > May 2, 2017 (changes, documentation)
> > >  
> > > GCC 8.1
> > > May 2, 2018 (changes, documentation)
> > > 
> > > We altogether have to put quite large resource to verify all of the versions.
> > >  
> > > I assumed older than gcc 7 would have the same issue. I know it was a speculation
> > > but like I mentioned I wanted to be more conservative. I didn't mean this is a permanent fix.
> > > For two months, we couldn't have any tangible solution (actually nobody cared including myself),
> > > so I submitted the patch to temporarily disable mavx512f.
> > > 
> > > I'm still not sure what the best option is...
> > 
> > For permanent fix we need more information, currently we can't re-produce this
> > defect. Since you can reproduce it we need your support.
> > 
> > Right now we don't know if this is compiler issue or code defect in rte_memcpy()
> > or something else.
> > 
> > It is easy to disable mavx512f as temporarily solution but it is coming with the
> > cost of the performance drop, also without knowing the actual root cause I
> > wouldn't say this is being conservative, actual issue may be just hidden with
> > this change.
> > 
> > I think as first thing we need to find a way to reproduce this issue in any
> > other way than using mlx5 PMD. So that we can put more organized effort to fix this.
> > I attached a simple unit test for rte_memcpy(), if this is a rte_memcpy() with
> > avx512f defect as claimed, you should be able to see the issue with that, right?
> > Did you able to find a chance to test it? Do you observer any crash there?
> 
> I am able to connect to a machine where the issue is reproduced.
> So I have tested replacing rte_memcpy with memcpy,
> and the crash disappears when using memcpy.
> So it confirms that the issue is in rte_memcpy.

One workaround is to disable CONFIG_RTE_ENABLE_AVX,
but it is disabling AVX and AVX2 for all DPDK code.

A more limited fix (tested) can be to disable AVX2 version of rte_memcpy
and rely on the AVX version (which is not crashing):

--- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
+++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
-#elif defined RTE_MACHINE_CPUFLAG_AVX2
+#elif defined RTE_MACHINE_CPUFLAG_AVX2_disable

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] AVX512 bug on SkyLake
  2018-11-08 15:59         ` [dpdk-dev] AVX512 bug on SkyLake Thomas Monjalon
  2018-11-08 17:21           ` Ferruh Yigit
@ 2018-11-09 18:46           ` Stephen Hemminger
  2018-11-10  2:13           ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
  2 siblings, 0 replies; 30+ messages in thread
From: Stephen Hemminger @ 2018-11-09 18:46 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: keith.wiles, Yongseok Koh, ferruh.yigit, dev, bruce.richardson,
	Shahaf Shuler, konstantin.ananyev, anatoly.burakov, stable,
	justin.parus, christian.ehrhardt, david.coronel, josh.powers,
	jay.vosburgh, dan.streetman

On Thu, 08 Nov 2018 16:59:22 +0100
Thomas Monjalon <thomas@monjalon.net> wrote:

> Hi,
> 
> We need to gather more information about this bug.
> More below.
> 
> 07/11/2018 10:04, Wiles, Keith:
> > > On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:  
> > >> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:  
> > >>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
> > >>> 
> > >>> This is a workaround to prevent a crash, which might be caused by
> > >>> optimization of newer gcc (7.3.0) on Intel Skylake.  
> > >> 
> > >> Should the code below not also test for the gcc version and
> > >> the Sky Lake processor, maybe I am wrong but it seems it is
> > >> turning AVX512 for all GCC builds  
> > > 
> > > I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
> > > Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
> > > Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
> > > And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.  
> > 
> > Are you not turning off all of the GCC versions for AVX512.
> > And you can test for range or greater then GCC version and
> > it just seems like we are turning off every gcc version, is that true?  
> 
> Do we know exactly which GCC versions are affected?
> 
> > >> Also bug 97 seems a bit obscure reference, maybe you know
> > >> the bug report, but more details would be good?  
> > > 
> > > I sent out the report to dev list two month ago.
> > > And I created the Bug 97 in order to reference it
> > > in the commit message.
> > > I didn't want to repeat same message here and there,
> > > but it would've been better to have some sort of summary
> > > of the Bug, although v3 has a few more words.
> > > However, v3 has been merged.  
> > 
> > Still this is too obscure if nothing else give a link to
> > a specific bug not just 97.  
> 
> The URL is
> 	https://bugs.dpdk.org/show_bug.cgi?id=97
> The bug is also pointing to an email:
> 	https://mails.dpdk.org/archives/dev/2018-September/111522.html
> 
> Summary:
> 	- CPU: Intel Skylake
> 	- Linux environment: Ubuntu 18.04
> 	- Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
> 	- Scenario: testpmd crashes when it starts forwarding
> 	- Behaviour: AVX2 version of rte_memcpy() optimized with 512b instructions
> 	- Fix: disable AVX512 optimization with -mno-avx512f
> 
> It seems to have been reproduced only when using mlx5 PMD so far.
> Any other experience?
> 
> 

I did a little checking, there are only a few machine types on Azure that
are Skylake.   These are obviously the high end lateset ones...
 https://azure.microsoft.com/en-us/blog/fv2-vms-are-now-available-the-fastest-vms-on-azure/

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake
  2018-11-09 14:27                   ` Thomas Monjalon
@ 2018-11-09 20:06                     ` Ferruh Yigit
  0 siblings, 0 replies; 30+ messages in thread
From: Ferruh Yigit @ 2018-11-09 20:06 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: stable, Yongseok Koh, keith.wiles, bruce.richardson,
	Shahaf Shuler, konstantin.ananyev, anatoly.burakov, justin.parus,
	christian.ehrhardt, david.coronel, josh.powers, jay.vosburgh,
	dan.streetman

On 11/9/2018 2:27 PM, Thomas Monjalon wrote:
> 09/11/2018 14:17, Thomas Monjalon:
>> 09/11/2018 11:03, Ferruh Yigit:
>>> On 11/8/2018 11:01 PM, Yongseok Koh wrote:
>>>>
>>>>> On Nov 8, 2018, at 9:21 AM, Ferruh Yigit <ferruh.yigit@intel.com> wrote:
>>>>>
>>>>> On 11/8/2018 3:59 PM, Thomas Monjalon wrote:
>>>>>> Hi,
>>>>>>
>>>>>> We need to gather more information about this bug.
>>>>>> More below.
>>>>>>
>>>>>> 07/11/2018 10:04, Wiles, Keith:
>>>>>>>> On Nov 6, 2018, at 9:30 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>>>>>>>> On Nov 5, 2018, at 6:06 AM, Wiles, Keith <keith.wiles@intel.com> wrote:
>>>>>>>>>> On Nov 2, 2018, at 9:04 PM, Yongseok Koh <yskoh@mellanox.com> wrote:
>>>>>>>>>>
>>>>>>>>>> This is a workaround to prevent a crash, which might be caused by
>>>>>>>>>> optimization of newer gcc (7.3.0) on Intel Skylake.
>>>>>>>>>
>>>>>>>>> Should the code below not also test for the gcc version and
>>>>>>>>> the Sky Lake processor, maybe I am wrong but it seems it is
>>>>>>>>> turning AVX512 for all GCC builds
>>>>>>>>
>>>>>>>> I didn't want to check gcc version as 7.3.0 is very new. Only gcc 8 is newly up since then (gcc 8.2).
>>>>>>>> Also, I wasn't able to test every gcc versions and I wanted to be a bit conservative for this crash.
>>>>>>>> Performance drop (if any) by disabling a new (experimental) feature would be less risky than unaccountable crash.
>>>>>>>> And, it does disable the feature only if CONFIG_RTE_ENABLE_AVX512=n. Please refer to v3.
>>>>>>>
>>>>>>> Are you not turning off all of the GCC versions for AVX512.
>>>>>>> And you can test for range or greater then GCC version and
>>>>>>> it just seems like we are turning off every gcc version, is that true?
>>>>>>
>>>>>> Do we know exactly which GCC versions are affected?
>>>>>>
>>>>>>>>> Also bug 97 seems a bit obscure reference, maybe you know
>>>>>>>>> the bug report, but more details would be good?
>>>>>>>>
>>>>>>>> I sent out the report to dev list two month ago.
>>>>>>>> And I created the Bug 97 in order to reference it
>>>>>>>> in the commit message.
>>>>>>>> I didn't want to repeat same message here and there,
>>>>>>>> but it would've been better to have some sort of summary
>>>>>>>> of the Bug, although v3 has a few more words.
>>>>>>>> However, v3 has been merged.
>>>>>>>
>>>>>>> Still this is too obscure if nothing else give a link to
>>>>>>> a specific bug not just 97.
>>>>>>
>>>>>> The URL is
>>>>>> 	https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugs.dpdk.org%2Fshow_bug.cgi%3Fid%3D97&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=2o%2Fg203aWrKCYg16S6oI4BcS41igpLu1DloS%2FrRnknc%3D&amp;reserved=0
>>>>>> The bug is also pointing to an email:
>>>>>> 	https://emea01.safelinks.protection.outlook.com/?url=https%3A%2F%2Fmails.dpdk.org%2Farchives%2Fdev%2F2018-September%2F111522.html&amp;data=02%7C01%7Cyskoh%40mellanox.com%7C90ff6c361faf422b976108d6459eb490%7Ca652971c7d2e4d9ba6a4d149256f461b%7C0%7C0%7C636772945282345908&amp;sdata=NCFKxaREd69iZ8eyFKg%2FWBP73CLTXkxrNQQeii%2Bbsao%3D&amp;reserved=0
>>>>>>
>>>>>> Summary:
>>>>>> 	- CPU: Intel Skylake
>>>>>> 	- Linux environment: Ubuntu 18.04
>>>>>> 	- Compiler: gcc-7.3 (Ubuntu 7.3.0-16ubuntu3)
>>>>>
>>>>> Is it possible to test a few other gcc versions to check if the issue is
>>>>> specific to this compiler version?
>>>>
>>>> Nothing's impossible but even with my quick search in gcc.gnu.org,
>>>> I could find the following documents mention mavx512f support:
>>>>
>>>> GCC 4.9.0
>>>> April 22, 2014 (changes, documentation)
>>>>  
>>>> GCC 5.1
>>>> April 22, 2015 (changes, documentation)
>>>>  
>>>> GCC 6.4
>>>> July 4, 2017 (changes, documentation)
>>>>  
>>>> GCC 7.1
>>>> May 2, 2017 (changes, documentation)
>>>>  
>>>> GCC 8.1
>>>> May 2, 2018 (changes, documentation)
>>>>
>>>> We altogether have to put quite large resource to verify all of the versions.
>>>>  
>>>> I assumed older than gcc 7 would have the same issue. I know it was a speculation
>>>> but like I mentioned I wanted to be more conservative. I didn't mean this is a permanent fix.
>>>> For two months, we couldn't have any tangible solution (actually nobody cared including myself),
>>>> so I submitted the patch to temporarily disable mavx512f.
>>>>
>>>> I'm still not sure what the best option is...
>>>
>>> For permanent fix we need more information, currently we can't re-produce this
>>> defect. Since you can reproduce it we need your support.
>>>
>>> Right now we don't know if this is compiler issue or code defect in rte_memcpy()
>>> or something else.
>>>
>>> It is easy to disable mavx512f as temporarily solution but it is coming with the
>>> cost of the performance drop, also without knowing the actual root cause I
>>> wouldn't say this is being conservative, actual issue may be just hidden with
>>> this change.
>>>
>>> I think as first thing we need to find a way to reproduce this issue in any
>>> other way than using mlx5 PMD. So that we can put more organized effort to fix this.
>>> I attached a simple unit test for rte_memcpy(), if this is a rte_memcpy() with
>>> avx512f defect as claimed, you should be able to see the issue with that, right?
>>> Did you able to find a chance to test it? Do you observer any crash there?
>>
>> I am able to connect to a machine where the issue is reproduced.
>> So I have tested replacing rte_memcpy with memcpy,
>> and the crash disappears when using memcpy.
>> So it confirms that the issue is in rte_memcpy.
> 
> One workaround is to disable CONFIG_RTE_ENABLE_AVX,
> but it is disabling AVX and AVX2 for all DPDK code.
> 
> A more limited fix (tested) can be to disable AVX2 version of rte_memcpy
> and rely on the AVX version (which is not crashing):
> 
> --- a/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
> +++ b/lib/librte_eal/common/include/arch/x86/rte_memcpy.h
> -#elif defined RTE_MACHINE_CPUFLAG_AVX2
> +#elif defined RTE_MACHINE_CPUFLAG_AVX2_disable

I put a patch into bugzilla:
https://bugs.dpdk.org/attachment.cgi?id=18&action=diff

Can you please check if this workaround prevents the crash without performance drop.

Also there is another suggestion from Yongseok, that looks simpler, but not
covering CONFIG_RTE_ENABLE_AVX512=y case.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake
  2018-11-08 15:59         ` [dpdk-dev] AVX512 bug on SkyLake Thomas Monjalon
  2018-11-08 17:21           ` Ferruh Yigit
  2018-11-09 18:46           ` [dpdk-dev] " Stephen Hemminger
@ 2018-11-10  2:13           ` Thomas Monjalon
  2018-11-11 14:15             ` Ananyev, Konstantin
  2 siblings, 1 reply; 30+ messages in thread
From: Thomas Monjalon @ 2018-11-10  2:13 UTC (permalink / raw)
  To: ferruh.yigit, bruce.richardson, konstantin.ananyev
  Cc: stable, keith.wiles, Yongseok Koh, dev, Shahaf Shuler,
	anatoly.burakov, justin.parus, christian.ehrhardt, david.coronel,
	josh.powers, jay.vosburgh, dan.streetman

Below is my conclusion for this bug.
An expert of x86 is required to follow-up.

Summary:
	- CPU: Intel Skylake
	- Linux environment: Ubuntu 18.04
	- Compiler: GCC 7 or 8
	- Scenario: testpmd crashes when it starts forwarding
	- Behaviour: AVX2 version of rte_memcpy() fails if optimized for AVX512
	- Context: inline rte_memcpy() is called from
			inline rte_mempool_put_bulk(), called from
			mlx5_tx_complete() (inline or not)
	- Analysis: AVX512 optimization changes vmovdqu to vmovdqu8

Latest status can be found in Bugzilla:
	https://bugs.dpdk.org/show_bug.cgi?id=97#c35

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake
  2018-11-10  2:13           ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
@ 2018-11-11 14:15             ` Ananyev, Konstantin
  2018-11-11 18:15               ` Thomas Monjalon
  0 siblings, 1 reply; 30+ messages in thread
From: Ananyev, Konstantin @ 2018-11-11 14:15 UTC (permalink / raw)
  To: Thomas Monjalon, Yigit, Ferruh, Richardson, Bruce
  Cc: stable, Wiles, Keith, Yongseok Koh, dev, Shahaf Shuler, Burakov,
	Anatoly, justin.parus, christian.ehrhardt, david.coronel,
	josh.powers, jay.vosburgh, dan.streetman

Hi Thomas,

> 
> Below is my conclusion for this bug.
> An expert of x86 is required to follow-up.
> 
> Summary:
> 	- CPU: Intel Skylake
> 	- Linux environment: Ubuntu 18.04
> 	- Compiler: GCC 7 or 8
> 	- Scenario: testpmd crashes when it starts forwarding
> 	- Behaviour: AVX2 version of rte_memcpy() fails if optimized for AVX512
> 	- Context: inline rte_memcpy() is called from
> 			inline rte_mempool_put_bulk(), called from
> 			mlx5_tx_complete() (inline or not)
> 	- Analysis: AVX512 optimization changes vmovdqu to vmovdqu8
> 
> Latest status can be found in Bugzilla:
> 	https://bugs.dpdk.org/show_bug.cgi?id=97#c35


Looking at dissamled output from the bug report, it seems that the
problem is not in vmovdqu8 instruction itself, but in the wrong offsets
generated by the compiler:

   vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
   vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
    vmovups XMMWORD PTR [rsi+0x20],xmm0
    vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
    vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
    vmovups XMMWORD PTR [rsi+0x40],xmm0
    vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]

Should be:
vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x20]
I think.

Same for next two offsets: 0x4 and 0x6 respectively should be 0x40 and 0x60.

Not sure what causing compiler behaves that way.
BTW, looking though testpmd objdump output - it seems that only mlx5 driver
exhibits such problem (I didn't enable mlx4 actually, probably same problem here).
Which looks a bit weird to me.
Konstantin

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake
  2018-11-11 14:15             ` Ananyev, Konstantin
@ 2018-11-11 18:15               ` Thomas Monjalon
  2018-11-12  9:09                 ` Christian Ehrhardt
  2018-11-12  9:26                 ` Ananyev, Konstantin
  0 siblings, 2 replies; 30+ messages in thread
From: Thomas Monjalon @ 2018-11-11 18:15 UTC (permalink / raw)
  To: Ananyev, Konstantin
  Cc: Yigit, Ferruh, Richardson, Bruce, stable, Wiles, Keith,
	Yongseok Koh, dev, Shahaf Shuler, Burakov, Anatoly, justin.parus,
	christian.ehrhardt, david.coronel, josh.powers, jay.vosburgh,
	dan.streetman

11/11/2018 15:15, Ananyev, Konstantin:
> Hi Thomas,
> 
> > Below is my conclusion for this bug.
> > An expert of x86 is required to follow-up.
> > 
> > Summary:
> > 	- CPU: Intel Skylake
> > 	- Linux environment: Ubuntu 18.04
> > 	- Compiler: GCC 7 or 8
> > 	- Scenario: testpmd crashes when it starts forwarding
> > 	- Behaviour: AVX2 version of rte_memcpy() fails if optimized for AVX512
> > 	- Context: inline rte_memcpy() is called from
> > 			inline rte_mempool_put_bulk(), called from
> > 			mlx5_tx_complete() (inline or not)
> > 	- Analysis: AVX512 optimization changes vmovdqu to vmovdqu8
> > 
> > Latest status can be found in Bugzilla:
> > 	https://bugs.dpdk.org/show_bug.cgi?id=97#c35
> 
> 
> Looking at dissamled output from the bug report, it seems that the
> problem is not in vmovdqu8 instruction itself, but in the wrong offsets
> generated by the compiler:
> 
>    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
>    vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
>     vmovups XMMWORD PTR [rsi+0x20],xmm0
>     vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
>     vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
>     vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
>     vmovups XMMWORD PTR [rsi+0x40],xmm0
>     vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
>     vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
> 
> Should be:
> vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x20]
> I think.
> 
> Same for next two offsets: 0x4 and 0x6 respectively should be 0x40 and 0x60.

Yes, you're right, I missed it, thank you!

The full diff is below:

--- bad-avx512-enabled
+++ good-avx512-disabled
-    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x0]
+    vmovdqu xmm0,XMMWORD PTR [rax*8+0x0]
     vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x10],0x1
     vmovups XMMWORD PTR [rsi],xmm0
     vextracti128 XMMWORD PTR [rsi+0x10],ymm0,0x1
-    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
+    vmovdqu xmm0,XMMWORD PTR [rax*8+0x20]
     vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
     vmovups XMMWORD PTR [rsi+0x20],xmm0
     vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
-    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
+    vmovdqu xmm0,XMMWORD PTR [rax*8+0x40]
     vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
     vmovups XMMWORD PTR [rsi+0x40],xmm0
     vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
-    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
+    vmovdqu xmm0,XMMWORD PTR [rax*8+0x60]
     vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x70],0x1
     vmovups XMMWORD PTR [rsi+0x60],xmm0
     vextracti128 XMMWORD PTR [rsi+0x70],ymm0,0x1

> Not sure what causing compiler behaves that way.
> BTW, looking though testpmd objdump output - it seems that only mlx5 driver
> exhibits such problem (I didn't enable mlx4 actually, probably same problem here).
> Which looks a bit weird to me.

Yes it's weird. I don't see how the mlx5 code could influence
the compiler to generate this bad code in AVX512 mode.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake
  2018-11-11 18:15               ` Thomas Monjalon
@ 2018-11-12  9:09                 ` Christian Ehrhardt
  2018-11-12  9:21                   ` Thomas Monjalon
  2018-11-12  9:26                 ` Ananyev, Konstantin
  1 sibling, 1 reply; 30+ messages in thread
From: Christian Ehrhardt @ 2018-11-12  9:09 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Ananyev, Konstantin, Ferruh Yigit, Bruce Richardson, stable,
	keith.wiles, yskoh, dev, Shahaf Shuler, anatoly.burakov,
	justin.parus, David Coronel, Josh Powers, Jay Vosburgh,
	Dan Streetman

> -    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
> +    vmovdqu xmm0,XMMWORD PTR [rax*8+0x60]
>      vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x70],0x1
>      vmovups XMMWORD PTR [rsi+0x60],xmm0
>      vextracti128 XMMWORD PTR [rsi+0x70],ymm0,0x1
>
> > Not sure what causing compiler behaves that way.
> > BTW, looking though testpmd objdump output - it seems that only mlx5 driver
> > exhibits such problem (I didn't enable mlx4 actually, probably same problem here).
> > Which looks a bit weird to me.
>
> Yes it's weird. I don't see how the mlx5 code could influence
> the compiler to generate this bad code in AVX512 mode.

Thomas you have all this set up, do you have any chance to test this
on the GCC's in Ubuntu 18.10 and 19.04
If easy I'd love to see results wit hgcc-7 & gcc-8 as in Ubuntu 19.04
(current -dev).
If the above is too hard, at least could you try the gcc-8 in Bionic
is 8.2.0-1ubuntu2~18.04 that is rather close.

If you could share the simplified build options that you need to
reproduce that would help (at least me) - thanks in advance

-- 
Christian Ehrhardt
Software Engineer, Ubuntu Server
Canonical Ltd

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake
  2018-11-12  9:09                 ` Christian Ehrhardt
@ 2018-11-12  9:21                   ` Thomas Monjalon
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Monjalon @ 2018-11-12  9:21 UTC (permalink / raw)
  To: Christian Ehrhardt
  Cc: Ananyev, Konstantin, Ferruh Yigit, Bruce Richardson, stable,
	keith.wiles, yskoh, dev, Shahaf Shuler, anatoly.burakov,
	justin.parus, David Coronel, Josh Powers, Jay Vosburgh,
	Dan Streetman

12/11/2018 10:09, Christian Ehrhardt:
> > -    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
> > +    vmovdqu xmm0,XMMWORD PTR [rax*8+0x60]
> >      vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x70],0x1
> >      vmovups XMMWORD PTR [rsi+0x60],xmm0
> >      vextracti128 XMMWORD PTR [rsi+0x70],ymm0,0x1
> >
> > > Not sure what causing compiler behaves that way.
> > > BTW, looking though testpmd objdump output - it seems that only mlx5 driver
> > > exhibits such problem (I didn't enable mlx4 actually, probably same problem here).
> > > Which looks a bit weird to me.
> >
> > Yes it's weird. I don't see how the mlx5 code could influence
> > the compiler to generate this bad code in AVX512 mode.
> 
> Thomas you have all this set up, do you have any chance to test this
> on the GCC's in Ubuntu 18.10 and 19.04
> If easy I'd love to see results wit hgcc-7 & gcc-8 as in Ubuntu 19.04
> (current -dev).
> If the above is too hard, at least could you try the gcc-8 in Bionic
> is 8.2.0-1ubuntu2~18.04 that is rather close.

I already tested updated GCC 7 and 8 (it is the same result):
	https://bugs.dpdk.org/show_bug.cgi?id=97#c18

Versions are:
	gcc-7 (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0
	gcc-8 (Ubuntu 8.2.0-1ubuntu2~18.04) 8.2.0

> If you could share the simplified build options that you need to
> reproduce that would help (at least me) - thanks in advance

You just need to compile mlx5, nothing special.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [dpdk-dev] [dpdk-stable] AVX512 bug on SkyLake
  2018-11-11 18:15               ` Thomas Monjalon
  2018-11-12  9:09                 ` Christian Ehrhardt
@ 2018-11-12  9:26                 ` Ananyev, Konstantin
  1 sibling, 0 replies; 30+ messages in thread
From: Ananyev, Konstantin @ 2018-11-12  9:26 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Yigit, Ferruh, Richardson, Bruce, stable, Wiles, Keith,
	Yongseok Koh, dev, Shahaf Shuler, Burakov, Anatoly, justin.parus,
	christian.ehrhardt, david.coronel, josh.powers, jay.vosburgh,
	dan.streetman


> 11/11/2018 15:15, Ananyev, Konstantin:
> > Hi Thomas,
> >
> > > Below is my conclusion for this bug.
> > > An expert of x86 is required to follow-up.
> > >
> > > Summary:
> > > 	- CPU: Intel Skylake
> > > 	- Linux environment: Ubuntu 18.04
> > > 	- Compiler: GCC 7 or 8
> > > 	- Scenario: testpmd crashes when it starts forwarding
> > > 	- Behaviour: AVX2 version of rte_memcpy() fails if optimized for AVX512
> > > 	- Context: inline rte_memcpy() is called from
> > > 			inline rte_mempool_put_bulk(), called from
> > > 			mlx5_tx_complete() (inline or not)
> > > 	- Analysis: AVX512 optimization changes vmovdqu to vmovdqu8
> > >
> > > Latest status can be found in Bugzilla:
> > > 	https://bugs.dpdk.org/show_bug.cgi?id=97#c35
> >
> >
> > Looking at dissamled output from the bug report, it seems that the
> > problem is not in vmovdqu8 instruction itself, but in the wrong offsets
> > generated by the compiler:
> >
> >    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
> >    vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
> >     vmovups XMMWORD PTR [rsi+0x20],xmm0
> >     vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
> >     vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
> >     vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
> >     vmovups XMMWORD PTR [rsi+0x40],xmm0
> >     vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
> >     vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
> >
> > Should be:
> > vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x20]
> > I think.
> >
> > Same for next two offsets: 0x4 and 0x6 respectively should be 0x40 and 0x60.
> 
> Yes, you're right, I missed it, thank you!
> 
> The full diff is below:
> 
> --- bad-avx512-enabled
> +++ good-avx512-disabled
> -    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x0]
> +    vmovdqu xmm0,XMMWORD PTR [rax*8+0x0]
>      vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x10],0x1
>      vmovups XMMWORD PTR [rsi],xmm0
>      vextracti128 XMMWORD PTR [rsi+0x10],ymm0,0x1
> -    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x2]
> +    vmovdqu xmm0,XMMWORD PTR [rax*8+0x20]
>      vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x30],0x1
>      vmovups XMMWORD PTR [rsi+0x20],xmm0
>      vextracti128 XMMWORD PTR [rsi+0x30],ymm0,0x1
> -    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x4]
> +    vmovdqu xmm0,XMMWORD PTR [rax*8+0x40]
>      vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x50],0x1
>      vmovups XMMWORD PTR [rsi+0x40],xmm0
>      vextracti128 XMMWORD PTR [rsi+0x50],ymm0,0x1
> -    vmovdqu8 xmm0,XMMWORD PTR [rax*8+0x6]
> +    vmovdqu xmm0,XMMWORD PTR [rax*8+0x60]
>      vinserti128 ymm0,ymm0,XMMWORD PTR [rax*8+0x70],0x1
>      vmovups XMMWORD PTR [rsi+0x60],xmm0
>      vextracti128 XMMWORD PTR [rsi+0x70],ymm0,0x1
> 
> > Not sure what causing compiler behaves that way.
> > BTW, looking though testpmd objdump output - it seems that only mlx5 driver
> > exhibits such problem (I didn't enable mlx4 actually, probably same problem here).
> > Which looks a bit weird to me.
> 
> Yes it's weird. I don't see how the mlx5 code could influence
> the compiler to generate this bad code in AVX512 mode.

Same here, looked through mlx5_rxtx code, it is unclear to me
what triggers the issue.
So far looks like gcc bug to me.
Konstantin

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2018-11-12  9:26 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-10-23 21:23 [dpdk-dev] [PATCH] build: disable compiler AVX512F support Yongseok Koh
2018-11-01 23:11 ` Thomas Monjalon
2018-11-02 12:42 ` [dpdk-dev] [dpdk-stable] " Ferruh Yigit
2018-11-02 13:48   ` Ferruh Yigit
2018-11-02 20:59     ` Yongseok Koh
2018-11-02 21:46       ` Ferruh Yigit
2018-11-02 23:31         ` Yongseok Koh
2018-11-02 21:04 ` [dpdk-dev] [PATCH v2] " Yongseok Koh
2018-11-05 14:06   ` Wiles, Keith
2018-11-06 21:30     ` Yongseok Koh
2018-11-07  9:04       ` Wiles, Keith
2018-11-08 15:59         ` [dpdk-dev] AVX512 bug on SkyLake Thomas Monjalon
2018-11-08 17:21           ` Ferruh Yigit
2018-11-08 23:01             ` Yongseok Koh
2018-11-09  6:27               ` Christian Ehrhardt
2018-11-09  9:49                 ` Ferruh Yigit
2018-11-09 11:35                   ` Thomas Monjalon
2018-11-09 10:03               ` Ferruh Yigit
2018-11-09 13:17                 ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
2018-11-09 14:27                   ` Thomas Monjalon
2018-11-09 20:06                     ` Ferruh Yigit
2018-11-09 18:46           ` [dpdk-dev] " Stephen Hemminger
2018-11-10  2:13           ` [dpdk-dev] [dpdk-stable] " Thomas Monjalon
2018-11-11 14:15             ` Ananyev, Konstantin
2018-11-11 18:15               ` Thomas Monjalon
2018-11-12  9:09                 ` Christian Ehrhardt
2018-11-12  9:21                   ` Thomas Monjalon
2018-11-12  9:26                 ` Ananyev, Konstantin
2018-11-03  1:06 ` [dpdk-dev] [PATCH v3] build: disable gcc AVX512F support Yongseok Koh
2018-11-04 20:56   ` Thomas Monjalon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).