DPDK patches and discussions
* [dpdk-dev] [PATCH] app/testpmd: fix IP checksum calculation
@ 2020-12-03 13:59 George Prekas
  2020-12-03 16:08 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: George Prekas @ 2020-12-03 13:59 UTC (permalink / raw)
  To: Wenzhuo Lu, Beilei Xing, Bernard Iremonger; +Cc: dev, George Prekas

Insert a compiler barrier to make sure that the IP checksum calculation
happens after setting all the fields of the IP header.

Signed-off-by: George Prekas <prekageo@amazon.com>
---
 app/test-pmd/flowgen.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index acf3e2460..893b4b0b8 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -150,6 +150,7 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
 							   next_flow);
 		ip_hdr->total_length	= RTE_CPU_TO_BE_16(pkt_size -
 							   sizeof(*eth_hdr));
+		rte_compiler_barrier();
 		ip_hdr->hdr_checksum	= ip_sum((unaligned_uint16_t *)ip_hdr,
 						 sizeof(*ip_hdr));
 
-- 
2.17.1



* Re: [dpdk-dev] [PATCH] app/testpmd: fix IP checksum calculation
  2020-12-03 13:59 [dpdk-dev] [PATCH] app/testpmd: fix IP checksum calculation George Prekas
@ 2020-12-03 16:08 ` Stephen Hemminger
  2020-12-03 16:35   ` George Prekas
  2020-12-04  8:59 ` Ferruh Yigit
  2020-12-05  5:42 ` [dpdk-dev] [PATCH v2] " George Prekas
  2 siblings, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2020-12-03 16:08 UTC (permalink / raw)
  To: George Prekas; +Cc: Wenzhuo Lu, Beilei Xing, Bernard Iremonger, dev

On Thu, 3 Dec 2020 07:59:54 -0600
George Prekas <prekageo@amazon.com> wrote:

> Insert a compiler barrier to make sure that the IP checksum calculation
> happens after setting all the fields of the IP header.
> 
> Signed-off-by: George Prekas <prekageo@amazon.com>

I don't think this is necessary. Other OSes don't have to do this.
The CPU is going to maintain proper memory order.


* Re: [dpdk-dev] [PATCH] app/testpmd: fix IP checksum calculation
  2020-12-03 16:08 ` Stephen Hemminger
@ 2020-12-03 16:35   ` George Prekas
  2020-12-03 18:33     ` Stephen Hemminger
  0 siblings, 1 reply; 21+ messages in thread
From: George Prekas @ 2020-12-03 16:35 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Wenzhuo Lu, Beilei Xing, Bernard Iremonger, dev

On 12/3/2020 10:08 AM, Stephen Hemminger wrote:
> On Thu, 3 Dec 2020 07:59:54 -0600
> George Prekas <prekageo@amazon.com> wrote:
>
>> Insert a compiler barrier to make sure that the IP checksum calculation
>> happens after setting all the fields of the IP header.
>>
>> Signed-off-by: George Prekas <prekageo@amazon.com>
> I don't think this is necessary. All other OS's don't have to do this.
> The CPU is going to maintain proper memory order.

Hi Stephen,

This is not a CPU or OS issue. This is a compiler issue. The compiler is 
free to reorder statements if it does not detect dependencies between 
them. Without this compiler barrier, the calculated IP checksum is wrong.




* Re: [dpdk-dev] [PATCH] app/testpmd: fix IP checksum calculation
  2020-12-03 16:35   ` George Prekas
@ 2020-12-03 18:33     ` Stephen Hemminger
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Hemminger @ 2020-12-03 18:33 UTC (permalink / raw)
  To: George Prekas; +Cc: Wenzhuo Lu, Beilei Xing, Bernard Iremonger, dev

On Thu, 3 Dec 2020 10:35:50 -0600
George Prekas <prekageo@amazon.com> wrote:

> On 12/3/2020 10:08 AM, Stephen Hemminger wrote:
> > On Thu, 3 Dec 2020 07:59:54 -0600
> > George Prekas <prekageo@amazon.com> wrote:
> >  
> >> Insert a compiler barrier to make sure that the IP checksum calculation
> >> happens after setting all the fields of the IP header.
> >>
> >> Signed-off-by: George Prekas <prekageo@amazon.com>  
> > I don't think this is necessary. All other OS's don't have to do this.
> > The CPU is going to maintain proper memory order.  
> 
> Hi Stephen,
> 
> This is not a CPU or OS issue. This is a compiler issue. The compiler is 
> free to reorder statements if it does not detect dependencies between 
> them. Without this compiler barrier, the calculated IP checksum is wrong.
> 
> 

But the compiler should be detecting any aliasing here; if not, you have
a bad compiler.


* Re: [dpdk-dev] [PATCH] app/testpmd: fix IP checksum calculation
  2020-12-03 13:59 [dpdk-dev] [PATCH] app/testpmd: fix IP checksum calculation George Prekas
  2020-12-03 16:08 ` Stephen Hemminger
@ 2020-12-04  8:59 ` Ferruh Yigit
  2020-12-05  5:47   ` George Prekas
  2020-12-05  5:42 ` [dpdk-dev] [PATCH v2] " George Prekas
  2 siblings, 1 reply; 21+ messages in thread
From: Ferruh Yigit @ 2020-12-04  8:59 UTC (permalink / raw)
  To: George Prekas, Wenzhuo Lu, Beilei Xing, Bernard Iremonger; +Cc: dev

On 12/3/2020 1:59 PM, George Prekas wrote:
> Insert a compiler barrier to make sure that the IP checksum calculation
> happens after setting all the fields of the IP header.
> 

Can you please provide the compiler details, and if there is any specific 
instruction on how to reproduce this failure?

> Signed-off-by: George Prekas <prekageo@amazon.com>
> ---
>   app/test-pmd/flowgen.c | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
> index acf3e2460..893b4b0b8 100644
> --- a/app/test-pmd/flowgen.c
> +++ b/app/test-pmd/flowgen.c
> @@ -150,6 +150,7 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
>   							   next_flow);
>   		ip_hdr->total_length	= RTE_CPU_TO_BE_16(pkt_size -
>   							   sizeof(*eth_hdr));
> +		rte_compiler_barrier();
>   		ip_hdr->hdr_checksum	= ip_sum((unaligned_uint16_t *)ip_hdr,
>   						 sizeof(*ip_hdr));
>   
> 



* [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2020-12-03 13:59 [dpdk-dev] [PATCH] app/testpmd: fix IP checksum calculation George Prekas
  2020-12-03 16:08 ` Stephen Hemminger
  2020-12-04  8:59 ` Ferruh Yigit
@ 2020-12-05  5:42 ` George Prekas
  2021-01-05 16:26   ` George Prekas
                     ` (2 more replies)
  2 siblings, 3 replies; 21+ messages in thread
From: George Prekas @ 2020-12-05  5:42 UTC (permalink / raw)
  To: Wenzhuo Lu, Beilei Xing, Bernard Iremonger
  Cc: dev, Stephen Hemminger, Ferruh Yigit, George Prekas

Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
and the calculated IP checksum is wrong on GCC 9 and GCC 10.

Signed-off-by: George Prekas <prekageo@amazon.com>
---
v2:
* Instead of a compiler barrier, use a compiler flag.
---
 app/test-pmd/meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
index 7e9c7bdd6..5d24e807f 100644
--- a/app/test-pmd/meson.build
+++ b/app/test-pmd/meson.build
@@ -4,6 +4,7 @@
 # override default name to drop the hyphen
 name = 'testpmd'
 cflags += '-Wno-deprecated-declarations'
+cflags += '-fno-strict-aliasing'
 sources = files('5tswap.c',
 	'cmdline.c',
 	'cmdline_flow.c',
-- 
2.17.1



* Re: [dpdk-dev] [PATCH] app/testpmd: fix IP checksum calculation
  2020-12-04  8:59 ` Ferruh Yigit
@ 2020-12-05  5:47   ` George Prekas
  0 siblings, 0 replies; 21+ messages in thread
From: George Prekas @ 2020-12-05  5:47 UTC (permalink / raw)
  To: Ferruh Yigit, Wenzhuo Lu, Beilei Xing, Bernard Iremonger; +Cc: dev


On 12/4/2020 2:59 AM, Ferruh Yigit wrote:
> On 12/3/2020 1:59 PM, George Prekas wrote:
>> Insert a compiler barrier to make sure that the IP checksum calculation
>> happens after setting all the fields of the IP header.
>>
>
> Can you please provide the compiler details, and if there is any specific
> instruction on how to reproduce this failure?

This happens with GCC 9 and GCC 10. It works fine on GCC 8.

Stephen was right that a compiler barrier here is not the right 
solution. After spending some time on it, I realized that it is an 
aliasing problem when casting the IP header to uint16_t*. As far as I 
understand, this is not allowed by the C standard. As far as I know, 
there are 3 ways to fix this problem: Use a union, use memcpy, or set 
the compiler flag -fno-strict-aliasing. I assume that the last option is 
the least intrusive. I've submitted a second version of the patch with it.

Let me know what you think.
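
Of the three options listed above, the memcpy one can be sketched roughly as
follows. This is a minimal illustration, not the change that was submitted:
ip_sum_memcpy is a hypothetical name, and the loop follows the shape of a
16-bit one's-complement sum only as an assumption about what ip_sum() does.
Each 16-bit word is copied out with memcpy, which is well-defined whatever the
declared type of the header is. Optimizing compilers usually turn such a
fixed-size memcpy into a plain load, but that is an expectation, not something
measured here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not part of DPDK or of this patch): one's-complement
 * checksum over hdr_len bytes. Words are read with memcpy, so no object is
 * accessed through an lvalue of an incompatible type.
 */
static uint16_t ip_sum_memcpy(const void *hdr, size_t hdr_len)
{
	const unsigned char *p = hdr;
	uint32_t sum = 0;

	assert(hdr != NULL);

	while (hdr_len > 1) {
		uint16_t word;

		/* Copy two bytes into a real uint16_t object (host byte order). */
		memcpy(&word, p, sizeof(word));
		sum += word;
		if (sum & 0x80000000)
			sum = (sum & 0xFFFF) + (sum >> 16);
		p += sizeof(word);
		hdr_len -= sizeof(word);
	}

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);

	return (uint16_t)~sum;
}
```

A call site would then pass the header without a cast, for example
ip_sum_memcpy(ip_hdr, sizeof(*ip_hdr)).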

>
>> Signed-off-by: George Prekas <prekageo@amazon.com>
>> ---
>>   app/test-pmd/flowgen.c | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
>> index acf3e2460..893b4b0b8 100644
>> --- a/app/test-pmd/flowgen.c
>> +++ b/app/test-pmd/flowgen.c
>> @@ -150,6 +150,7 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
>>                                                          next_flow);
>>               ip_hdr->total_length    = RTE_CPU_TO_BE_16(pkt_size -
>>                                                          sizeof(*eth_hdr));
>> +             rte_compiler_barrier();
>>               ip_hdr->hdr_checksum    = ip_sum((unaligned_uint16_t *)ip_hdr,
>>                                                sizeof(*ip_hdr));
>>
>>
>


* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2020-12-05  5:42 ` [dpdk-dev] [PATCH v2] " George Prekas
@ 2021-01-05 16:26   ` George Prekas
  2021-01-06 18:02   ` Ferruh Yigit
  2021-01-07 20:42   ` [dpdk-dev] [PATCH v3] " George Prekas
  2 siblings, 0 replies; 21+ messages in thread
From: George Prekas @ 2021-01-05 16:26 UTC (permalink / raw)
  To: Wenzhuo Lu, Beilei Xing, Bernard Iremonger, Stephen Hemminger,
	Ferruh Yigit
  Cc: dev

On 12/4/2020 11:42 PM, George Prekas wrote:
> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
> and the calculated IP checksum is wrong on GCC 9 and GCC 10.
>
> Signed-off-by: George Prekas <prekageo@amazon.com>
> ---
> v2:
> * Instead of a compiler barrier, use a compiler flag.
> ---
>   app/test-pmd/meson.build | 1 +
>   1 file changed, 1 insertion(+)
>
> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
> index 7e9c7bdd6..5d24e807f 100644
> --- a/app/test-pmd/meson.build
> +++ b/app/test-pmd/meson.build
> @@ -4,6 +4,7 @@
>   # override default name to drop the hyphen
>   name = 'testpmd'
>   cflags += '-Wno-deprecated-declarations'
> +cflags += '-fno-strict-aliasing'
>   sources = files('5tswap.c',
>   	'cmdline.c',
>   	'cmdline_flow.c',
Happy New Year!

Any updates on this?

Thanks,
George


* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2020-12-05  5:42 ` [dpdk-dev] [PATCH v2] " George Prekas
  2021-01-05 16:26   ` George Prekas
@ 2021-01-06 18:02   ` Ferruh Yigit
  2021-01-07  5:25     ` Stephen Hemminger
  2021-01-07  5:39     ` George Prekas
  2021-01-07 20:42   ` [dpdk-dev] [PATCH v3] " George Prekas
  2 siblings, 2 replies; 21+ messages in thread
From: Ferruh Yigit @ 2021-01-06 18:02 UTC (permalink / raw)
  To: George Prekas, Wenzhuo Lu, Beilei Xing, Bernard Iremonger
  Cc: dev, Stephen Hemminger

On 12/5/2020 5:42 AM, George Prekas wrote:
> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
> and the calculated IP checksum is wrong on GCC 9 and GCC 10.
> 
> Signed-off-by: George Prekas <prekageo@amazon.com>
> ---
> v2:
> * Instead of a compiler barrier, use a compiler flag.
> ---
>   app/test-pmd/meson.build | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
> index 7e9c7bdd6..5d24e807f 100644
> --- a/app/test-pmd/meson.build
> +++ b/app/test-pmd/meson.build
> @@ -4,6 +4,7 @@
>   # override default name to drop the hyphen
>   name = 'testpmd'
>   cflags += '-Wno-deprecated-declarations'
> +cflags += '-fno-strict-aliasing'
>   sources = files('5tswap.c',
>   	'cmdline.c',
>   	'cmdline_flow.c',
> 

Hi George,

I am trying to understand this, the relevant code is as below:
ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr));

You suspect a strict-aliasing violation. In more detail, the concern is that
"struct rte_ipv4_hdr *ip_hdr;" is aliased by "const unaligned_uint16_t *hdr",
so the compiler may optimize out the calculations that use the data pointed to
by 'hdr': since 'hdr' is never used to modify the data, the compiler may assume
the data has not changed at all.

1) But the pointer 'hdr' is assigned in the loop, from another pointer whose
content is changing. Why does this not help the compiler figure out that the
data 'hdr' points to has changed?

2) I tried to debug this, but I am not able to reproduce the issue: 'ip_sum()'
is called each time and the checksum is calculated correctly, using gcc
10.2.1-9. Can you confirm the case with a debugger, or from the assembly/object
file?


And if the issue is a strict-aliasing violation as you said, the compiler flag
is an option, but I am not sure how much it reduces the benefit of compiler
optimization. I guess the other options are not great either: memcpy adds too
much work at runtime, and a union requires a bigger change and makes the code
more complex.
I wonder if making 'ip_sum()' a non-inline function can help. Can you please
give it a try, since you can reproduce it?


* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2021-01-06 18:02   ` Ferruh Yigit
@ 2021-01-07  5:25     ` Stephen Hemminger
  2021-01-07  5:39     ` George Prekas
  1 sibling, 0 replies; 21+ messages in thread
From: Stephen Hemminger @ 2021-01-07  5:25 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: George Prekas, Wenzhuo Lu, Beilei Xing, Bernard Iremonger, dev

On Wed, 6 Jan 2021 18:02:49 +0000
Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> On 12/5/2020 5:42 AM, George Prekas wrote:
> > Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
> > and the calculated IP checksum is wrong on GCC 9 and GCC 10.
> > 
> > Signed-off-by: George Prekas <prekageo@amazon.com>
> > ---
> > v2:
> > * Instead of a compiler barrier, use a compiler flag.
> > ---
> >   app/test-pmd/meson.build | 1 +
> >   1 file changed, 1 insertion(+)
> > 
> > diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
> > index 7e9c7bdd6..5d24e807f 100644
> > --- a/app/test-pmd/meson.build
> > +++ b/app/test-pmd/meson.build
> > @@ -4,6 +4,7 @@
> >   # override default name to drop the hyphen
> >   name = 'testpmd'
> >   cflags += '-Wno-deprecated-declarations'
> > +cflags += '-fno-strict-aliasing'
> >   sources = files('5tswap.c',
> >   	'cmdline.c',
> >   	'cmdline_flow.c',
> >   
> 
> Hi George,
> 
> I am trying to understand this, the relevant code is as below:
> ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr));
> 
> You are suspicious of strict aliasing rule violation, with more details:
> The concern is the "struct rte_ipv4_hdr *ip_hdr;" aliased to "const 
> unaligned_uint16_t *hdr", and compiler can optimize out the calculations using 
> data pointed by 'hdr' pointer, since the 'hdr' pointer is not used to alter the 
> data and compiler may think data is not changed at all.
> 
> 1) But the pointer "hdr" is assigned in the loop, from another pointer whose 
> content is changing, why this is not helping to figure out that the data 'hdr' 
> pointing is changed.
> 
> 2) I tried to debug this, but I am not able to reproduce the issue, 'ip_sum()' 
> called each time and checksum calculated correctly. Using gcc 10.2.1-9. Can you 
> able to confirm the case with debug, or from the assembly/object file?
> 
> 
> And if the issue is strict aliasing rule violation as you said, compiler flag is 
> an option but not sure how much it reduces the compiler optimization benefit, I 
> guess other options also not so good, memcpy brings too much work on runtime and 
> union requires bigger change and makes code complex.
> I wonder if making 'ip_sum()' a non inline function can help, can you please 
> give a try since you can reproduce it?

If it is an aliasing problem, it should be fixed with a union instead of a compiler flag.
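
A union-based variant could look roughly like the sketch below. It is an
illustration under stated assumptions, not the eventual patch:
struct ipv4_hdr_example is a local stand-in for struct rte_ipv4_hdr with an
assumed 20-byte layout, and union ipv4_hdr_words and ip_sum_union are
hypothetical names. Reading a union member other than the one last written is
permitted in C (C11 6.5.2.3), and GCC documents this form of type punning as
supported as long as the access goes through the union type.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct rte_ipv4_hdr; the field layout is an assumption. */
struct ipv4_hdr_example {
	uint8_t  version_ihl;
	uint8_t  type_of_service;
	uint16_t total_length;
	uint16_t packet_id;
	uint16_t fragment_offset;
	uint8_t  time_to_live;
	uint8_t  next_proto_id;
	uint16_t hdr_checksum;
	uint32_t src_addr;
	uint32_t dst_addr;
};

/* Hypothetical union: the same bytes viewed as a header or as 16-bit words. */
union ipv4_hdr_words {
	struct ipv4_hdr_example hdr;
	uint16_t words[sizeof(struct ipv4_hdr_example) / sizeof(uint16_t)];
};

/* One's-complement checksum, reading the header only through the union. */
static uint16_t ip_sum_union(const union ipv4_hdr_words *u)
{
	uint32_t sum = 0;
	size_t i;

	/* The two views must cover the same bytes (no padding in the struct). */
	assert(sizeof(u->words) == sizeof(u->hdr));

	for (i = 0; i < sizeof(u->words) / sizeof(u->words[0]); i++)
		sum += u->words[i];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);

	return (uint16_t)~sum;
}
```

The cost of this approach is that the header has to be declared, or at least
accessed, as the union type at the call site, so it touches more code than a
build flag does.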


* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2021-01-06 18:02   ` Ferruh Yigit
  2021-01-07  5:25     ` Stephen Hemminger
@ 2021-01-07  5:39     ` George Prekas
  2021-01-07 11:32       ` Ferruh Yigit
  2021-01-07 15:50       ` Stephen Hemminger
  1 sibling, 2 replies; 21+ messages in thread
From: George Prekas @ 2021-01-07  5:39 UTC (permalink / raw)
  To: Ferruh Yigit, Wenzhuo Lu, Beilei Xing, Bernard Iremonger
  Cc: dev, Stephen Hemminger, George Prekas (prekageo)



On 1/6/2021 12:02 PM, Ferruh Yigit wrote:
> On 12/5/2020 5:42 AM, George Prekas wrote:
>> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
>> and the calculated IP checksum is wrong on GCC 9 and GCC 10.
>>
>> Signed-off-by: George Prekas <prekageo@amazon.com>
>> ---
>> v2:
>> * Instead of a compiler barrier, use a compiler flag.
>> ---
>>   app/test-pmd/meson.build | 1 +
>>   1 file changed, 1 insertion(+)
>>
>> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
>> index 7e9c7bdd6..5d24e807f 100644
>> --- a/app/test-pmd/meson.build
>> +++ b/app/test-pmd/meson.build
>> @@ -4,6 +4,7 @@
>>   # override default name to drop the hyphen
>>   name = 'testpmd'
>>   cflags += '-Wno-deprecated-declarations'
>> +cflags += '-fno-strict-aliasing'
>>   sources = files('5tswap.c',
>>       'cmdline.c',
>>       'cmdline_flow.c',
>>
> 
> Hi George,
> 
> I am trying to understand this, the relevant code is as below:
> ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr));
> 
> You are suspicious of strict aliasing rule violation, with more details:
> The concern is the "struct rte_ipv4_hdr *ip_hdr;" aliased to "const
> unaligned_uint16_t *hdr", and compiler can optimize out the calculations using
> data pointed by 'hdr' pointer, since the 'hdr' pointer is not used to alter the
> data and compiler may think data is not changed at all.
> 
> 1) But the pointer "hdr" is assigned in the loop, from another pointer whose
> content is changing, why this is not helping to figure out that the data 'hdr'
> pointing is changed.
> 
> 2) I tried to debug this, but I am not able to reproduce the issue, 'ip_sum()'
> called each time and checksum calculated correctly. Using gcc 10.2.1-9. Can you
> able to confirm the case with debug, or from the assembly/object file?
> 
> 
> And if the issue is strict aliasing rule violation as you said, compiler flag is
> an option but not sure how much it reduces the compiler optimization benefit, I
> guess other options also not so good, memcpy brings too much work on runtime and
> union requires bigger change and makes code complex.
> I wonder if making 'ip_sum()' a non inline function can help, can you please
> give a try since you can reproduce it?

Hi Ferruh,

Thanks for looking into it.

I am copy-pasting at the end of this email a minimal reproduction. It calculates a checksum and prints it. The correct value is f8d9. If you compile it with -O0 or -O3 -fno-strict-aliasing, you will get the correct value. If you compile it with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and -O3, you will get f8e8. You can also try it on https://godbolt.org/ and see how different versions behave.

My understanding is that the code violates the C standard (https://stackoverflow.com/a/99010).

--- cut here --- 

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct rte_ipv4_hdr {
	uint8_t  version_ihl;
	uint8_t  type_of_service;
	uint16_t total_length;
	uint16_t packet_id;
	uint16_t fragment_offset;
	uint8_t  time_to_live;
	uint8_t  next_proto_id;
	uint16_t hdr_checksum;
	uint32_t src_addr;
	uint32_t dst_addr;
};

static inline uint16_t ip_sum(const uint16_t *hdr, int hdr_len)
{
	uint32_t sum = 0;

	while (hdr_len > 1)
	{
		sum += *hdr++;
		if (sum & 0x80000000)
			sum = (sum & 0xFFFF) + (sum >> 16);
		hdr_len -= 2;
	}

	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);

	return ~sum;
}

static void pkt_burst_flow_gen(void)
{
	struct rte_ipv4_hdr *ip_hdr = (struct rte_ipv4_hdr *) malloc(4096);
	memset(ip_hdr, 0, sizeof(*ip_hdr));
	ip_hdr->version_ihl	= 1;
	ip_hdr->type_of_service	= 2;
	ip_hdr->fragment_offset	= 3;
	ip_hdr->time_to_live	= 4;
	ip_hdr->next_proto_id	= 5;
	ip_hdr->packet_id	= 6;
	ip_hdr->src_addr	= 7;
	ip_hdr->dst_addr	= 8;
	ip_hdr->total_length	= 9;
	ip_hdr->hdr_checksum	= ip_sum((uint16_t *)ip_hdr, sizeof(*ip_hdr));
	printf("%x\n", ip_hdr->hdr_checksum);
}

int main(void)
{
	pkt_burst_flow_gen();
	return 0;
}


* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2021-01-07  5:39     ` George Prekas
@ 2021-01-07 11:32       ` Ferruh Yigit
  2021-01-07 13:06         ` Ferruh Yigit
  2021-01-07 14:20         ` George Prekas
  2021-01-07 15:50       ` Stephen Hemminger
  1 sibling, 2 replies; 21+ messages in thread
From: Ferruh Yigit @ 2021-01-07 11:32 UTC (permalink / raw)
  To: George Prekas, Wenzhuo Lu, Beilei Xing, Bernard Iremonger
  Cc: dev, Stephen Hemminger

On 1/7/2021 5:39 AM, George Prekas wrote:
> 
> 
> On 1/6/2021 12:02 PM, Ferruh Yigit wrote:
>> On 12/5/2020 5:42 AM, George Prekas wrote:
>>> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
>>> and the calculated IP checksum is wrong on GCC 9 and GCC 10.
>>>
>>> Signed-off-by: George Prekas <prekageo@amazon.com>
>>> ---
>>> v2:
>>> * Instead of a compiler barrier, use a compiler flag.
>>> ---
>>>    app/test-pmd/meson.build | 1 +
>>>    1 file changed, 1 insertion(+)
>>>
>>> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
>>> index 7e9c7bdd6..5d24e807f 100644
>>> --- a/app/test-pmd/meson.build
>>> +++ b/app/test-pmd/meson.build
>>> @@ -4,6 +4,7 @@
>>>    # override default name to drop the hyphen
>>>    name = 'testpmd'
>>>    cflags += '-Wno-deprecated-declarations'
>>> +cflags += '-fno-strict-aliasing'
>>>    sources = files('5tswap.c',
>>>        'cmdline.c',
>>>        'cmdline_flow.c',
>>>
>>
>> Hi George,
>>
>> I am trying to understand this, the relevant code is as below:
>> ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr));
>>
>> You are suspicious of strict aliasing rule violation, with more details:
>> The concern is the "struct rte_ipv4_hdr *ip_hdr;" aliased to "const
>> unaligned_uint16_t *hdr", and compiler can optimize out the calculations using
>> data pointed by 'hdr' pointer, since the 'hdr' pointer is not used to alter the
>> data and compiler may think data is not changed at all.
>>
>> 1) But the pointer "hdr" is assigned in the loop, from another pointer whose
>> content is changing, why this is not helping to figure out that the data 'hdr'
>> pointing is changed.
>>
>> 2) I tried to debug this, but I am not able to reproduce the issue, 'ip_sum()'
>> called each time and checksum calculated correctly. Using gcc 10.2.1-9. Can you
>> able to confirm the case with debug, or from the assembly/object file?
>>
>>
>> And if the issue is strict aliasing rule violation as you said, compiler flag is
>> an option but not sure how much it reduces the compiler optimization benefit, I
>> guess other options also not so good, memcpy brings too much work on runtime and
>> union requires bigger change and makes code complex.
>> I wonder if making 'ip_sum()' a non inline function can help, can you please
>> give a try since you can reproduce it?
> 
> Hi Ferruh,
> 
> Thanks for looking into it.
> 
> I am copy-pasting at the end of this email a minimal reproduction. It calculates a checksum and prints it. The correct value is f8d9. If you compile it with -O0 or -O3 -fno-strict-aliasing, you will get the correct value. If you compile it with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and -O3, you will get f8e8. You can also try it on https://godbolt.org/ and see how different versions behave.
> 
> My understanding is that the code violates the C standard (https://stackoverflow.com/a/99010).
> 

Thanks for the sample code below, I copied it to godbolt:
https://godbolt.org/z/6fMK19

In gcc 10, the checksum calculation is done at compile time (when
optimization is enabled) and the value is returned directly:
mov    $0xffed,%esi

Since a calculation is happening, I assume the compiler knows about the
aliasing and is OK with it.

But that optimized calculation seems wrong; when it is disabled [1], the
checksum is correct again.

[1] All of the following seem to disable the compile-time calculation:
- disabling optimization
- putting a compiler barrier
- putting a 'printf' inside 'ip_sum()'
- -fno-strict-aliasing

gcc 8 & 9 are not doing this compile-time calculation, hence they are not
affected.

This feels like an optimization issue in gcc 10, but I am not sure of the exact
root cause, or how to disable it properly in our case.




* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2021-01-07 11:32       ` Ferruh Yigit
@ 2021-01-07 13:06         ` Ferruh Yigit
  2021-01-07 14:20         ` George Prekas
  1 sibling, 0 replies; 21+ messages in thread
From: Ferruh Yigit @ 2021-01-07 13:06 UTC (permalink / raw)
  To: George Prekas, Wenzhuo Lu, Beilei Xing, Bernard Iremonger
  Cc: dev, Stephen Hemminger, Harry van Haaren

On 1/7/2021 11:32 AM, Ferruh Yigit wrote:
> On 1/7/2021 5:39 AM, George Prekas wrote:
>>
>>
>> On 1/6/2021 12:02 PM, Ferruh Yigit wrote:
>>> On 12/5/2020 5:42 AM, George Prekas wrote:
>>>> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
>>>> and the calculated IP checksum is wrong on GCC 9 and GCC 10.
>>>>
>>>> Signed-off-by: George Prekas <prekageo@amazon.com>
>>>> ---
>>>> v2:
>>>> * Instead of a compiler barrier, use a compiler flag.
>>>> ---
>>>>    app/test-pmd/meson.build | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
>>>> index 7e9c7bdd6..5d24e807f 100644
>>>> --- a/app/test-pmd/meson.build
>>>> +++ b/app/test-pmd/meson.build
>>>> @@ -4,6 +4,7 @@
>>>>    # override default name to drop the hyphen
>>>>    name = 'testpmd'
>>>>    cflags += '-Wno-deprecated-declarations'
>>>> +cflags += '-fno-strict-aliasing'
>>>>    sources = files('5tswap.c',
>>>>        'cmdline.c',
>>>>        'cmdline_flow.c',
>>>>
>>>
>>> Hi George,
>>>
>>> I am trying to understand this, the relevant code is as below:
>>> ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr));
>>>
>>> You are suspicious of strict aliasing rule violation, with more details:
>>> The concern is the "struct rte_ipv4_hdr *ip_hdr;" aliased to "const
>>> unaligned_uint16_t *hdr", and compiler can optimize out the calculations using
>>> data pointed by 'hdr' pointer, since the 'hdr' pointer is not used to alter the
>>> data and compiler may think data is not changed at all.
>>>
>>> 1) But the pointer "hdr" is assigned in the loop, from another pointer whose
>>> content is changing, why this is not helping to figure out that the data 'hdr'
>>> pointing is changed.
>>>
>>> 2) I tried to debug this, but I am not able to reproduce the issue, 'ip_sum()'
>>> called each time and checksum calculated correctly. Using gcc 10.2.1-9. Can you
>>> able to confirm the case with debug, or from the assembly/object file?
>>>
>>>
>>> And if the issue is strict aliasing rule violation as you said, compiler flag is
>>> an option but not sure how much it reduces the compiler optimization benefit, I
>>> guess other options also not so good, memcpy brings too much work on runtime and
>>> union requires bigger change and makes code complex.
>>> I wonder if making 'ip_sum()' a non inline function can help, can you please
>>> give a try since you can reproduce it?
>>
>> Hi Ferruh,
>>
>> Thanks for looking into it.
>>
>> I am copy-pasting at the end of this email a minimal reproduction. It 
>> calculates a checksum and prints it. The correct value is f8d9. If you compile 
>> it with -O0 or -O3 -fno-strict-aliasing, you will get the correct value. If 
>> you compile it with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and -O3, you will 
>> get f8e8. You can also try it on https://godbolt.org/ and see how different 
>> versions behave.
>>
>> My understanding is that the code violates the C standard 
>> (https://stackoverflow.com/a/99010).
>>
> 
> Thanks for the sample code below, I copied to the godbolt:
> https://godbolt.org/z/6fMK19
> 
> In gcc 10, the checksum calculation is done during compilation (when 
> optimization is enabled) and the value is returned directly:
> mov    $0xffed,%esi
> 
> Since a calculation is happening I assume the compiler knows about the aliasing 
> and OK with it.
> 
> But that optimized calculation seems wrong, when it is disabled [1] the checksum 
> is correct again.
> 
> [1] all following seems helping to disable compile time calculation
> - disabling optimization
> - putting a compiler barrier
> - putting a 'printf' inside 'ip_sum()'
> - fno-strict-aliasing
> 
> gcc 8 & 9 is not doing this compile time calculation, hence they are not affected.
> 
> This feels like an optimization issue in gcc10, but not sure exactly on the root 
> cause, and how to disable it properly in our case.
> 

After checking with Harry, the latest finding is that gcc 10 leaves out every
_non_-uint16_t member of the struct during its compile-time calculation. I am
not sure if this is because of broken aliasing or a gcc defect; I will report
the issue.

Meanwhile, as a short-term solution, can you please force 'ip_sum()' to not be
inlined and try again?
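
The values reported in this thread are consistent with that finding, as the
arithmetic sketch below tries to show. It recomputes the checksum of the
reproduction's header three ways: with every store visible, with the uint32_t
stores (src_addr, dst_addr) ignored, and with only the uint16_t stores kept.
The word values assume a little-endian host, and all names are made up for
illustration. Attributing f8e8 to dropped uint32_t stores is an inference from
the arithmetic, not something stated in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold carries into 16 bits and return the one's complement. */
static uint16_t fold_and_invert(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Checksum over n 16-bit words that are real uint16_t objects (no aliasing). */
static uint16_t checksum_words(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	assert(words != NULL);
	for (i = 0; i < n; i++)
		sum += words[i];
	return fold_and_invert(sum);
}

/* Header from the reproduction as ten words on a little-endian host. */
static const uint16_t all_stores[10] =
	{0x0201, 9, 6, 3, 0x0504, 0, 7, 0, 8, 0};   /* sum 0x0726 -> f8d9 */

/* Same header if the uint32_t stores were ignored. */
static const uint16_t no_u32_stores[10] =
	{0x0201, 9, 6, 3, 0x0504, 0, 0, 0, 0, 0};   /* sum 0x0717 -> f8e8 */

/* Same header if only the uint16_t stores were kept. */
static const uint16_t u16_stores_only[10] =
	{0, 9, 6, 3, 0, 0, 0, 0, 0, 0};             /* sum 0x0012 -> ffed */
```

Calling checksum_words() on the three arrays gives f8d9 (the value reported as
correct), f8e8 (reported for gcc 9.3.0 at -O3) and ffed (the constant in the
gcc 10 output on godbolt).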





* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2021-01-07 11:32       ` Ferruh Yigit
  2021-01-07 13:06         ` Ferruh Yigit
@ 2021-01-07 14:20         ` George Prekas
  2021-01-07 15:22           ` Ferruh Yigit
  1 sibling, 1 reply; 21+ messages in thread
From: George Prekas @ 2021-01-07 14:20 UTC (permalink / raw)
  To: Ferruh Yigit, Wenzhuo Lu, Beilei Xing, Bernard Iremonger
  Cc: dev, Stephen Hemminger, George Prekas (prekageo)

On 1/7/2021 5:32 AM, Ferruh Yigit wrote:
> On 1/7/2021 5:39 AM, George Prekas wrote:
>> On 1/6/2021 12:02 PM, Ferruh Yigit wrote:
>>> On 12/5/2020 5:42 AM, George Prekas wrote:
>>>> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
>>>> and the calculated IP checksum is wrong on GCC 9 and GCC 10.
>>>>
>>>> Signed-off-by: George Prekas <prekageo@amazon.com>
>>>> ---
>>>> v2:
>>>> * Instead of a compiler barrier, use a compiler flag.
>>>> ---
>>>>    app/test-pmd/meson.build | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
>>>> index 7e9c7bdd6..5d24e807f 100644
>>>> --- a/app/test-pmd/meson.build
>>>> +++ b/app/test-pmd/meson.build
>>>> @@ -4,6 +4,7 @@
>>>>    # override default name to drop the hyphen
>>>>    name = 'testpmd'
>>>>    cflags += '-Wno-deprecated-declarations'
>>>> +cflags += '-fno-strict-aliasing'
>>>>    sources = files('5tswap.c',
>>>>        'cmdline.c',
>>>>        'cmdline_flow.c',
>>>>
>>>
>>> Hi George,
>>>
>>> I am trying to understand this, the relevant code is as below:
>>> ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr));
>>>
>>> You are suspicious of strict aliasing rule violation, with more details:
>>> The concern is the "struct rte_ipv4_hdr *ip_hdr;" aliased to "const
>>> unaligned_uint16_t *hdr", and compiler can optimize out the calculations using
>>> data pointed by 'hdr' pointer, since the 'hdr' pointer is not used to alter the
>>> data and compiler may think data is not changed at all.
>>>
>>> 1) But the pointer "hdr" is assigned in the loop, from another pointer whose
>>> content is changing, why this is not helping to figure out that the data 'hdr'
>>> pointing is changed.
>>>
>>> 2) I tried to debug this, but I am not able to reproduce the issue, 'ip_sum()'
>>> called each time and checksum calculated correctly. Using gcc 10.2.1-9. Can you
>>> able to confirm the case with debug, or from the assembly/object file?
>>>
>>>
>>> And if the issue is strict aliasing rule violation as you said, compiler flag is
>>> an option but not sure how much it reduces the compiler optimization benefit, I
>>> guess other options also not so good, memcpy brings too much work on runtime and
>>> union requires bigger change and makes code complex.
>>> I wonder if making 'ip_sum()' a non inline function can help, can you please
>>> give a try since you can reproduce it?
>>
>> Hi Ferruh,
>>
>> Thanks for looking into it.
>>
>> I am copy-pasting at the end of this email a minimal reproduction. It calculates a checksum and prints it. The correct value is f8d9. If you compile it with -O0 or -O3 -fno-strict-aliasing, you will get the correct value. If you compile it with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and -O3, you will get f8e8. You can also try it on https://godbolt.org/ and see how different versions behave.
>>
>> My understanding is that the code violates the C standard (https://stackoverflow.com/a/99010).
>>
> 
> Thanks for the sample code below, I copied to the godbolt:
> https://godbolt.org/z/6fMK19
> 
> In gcc 10, the checksum calculation is done during compilation (when
> optimization is enabled) and the value is returned directly:
> mov    $0xffed,%esi
> 
> Since a calculation is happening I assume the compiler knows about the aliasing
> and OK with it.

According to https://gcc.gnu.org/bugs/: "if compiling with -fno-strict-aliasing -fwrapv
-fno-aggressive-loop-optimizations makes a difference ... then your code is probably not
correct"

> 
> But that optimized calculation seems wrong, when it is disabled [1] the checksum
> is correct again.
> 
> [1] all following seems helping to disable compile time calculation
> - disabling optimization
> - putting a compiler barrier
> - putting a 'printf' inside 'ip_sum()'
> - fno-strict-aliasing
> 
> gcc 8 & 9 is not doing this compile time calculation, hence they are not affected.

I just checked gcc 8.3 and gcc 9.3 on godbolt and I got f8e8 (which is wrong; the correct
value is f8d9).

> 
> This feels like an optimization issue in gcc10, but not sure exactly on the root
> cause, and how to disable it properly in our case.

I've tried with __attribute__ ((noinline)) and it fixes the problem. But keep in mind
that we are dealing with broken C code. This attribute just prevents the optimization that
reveals the problem. It does not guarantee that the problem will not reappear in a future
compiler version.

I've also tried to use a union as suggested by Stephen Hemminger and it works correctly but
it requires significant code changes: you have to copy paste the IP header structure inside
a union and access it only through the union.

As a side note, here is Linus Torvalds's opinion on strict aliasing:
https://lkml.org/lkml/2018/6/5/769

DPDK already uses -fno-strict-aliasing for librte_node and librte_vhost.
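
For reference, the memcpy option mentioned earlier in this thread could look
roughly like the sketch below. The name ip_sum_memcpy is hypothetical and this
is not the code from flowgen.c; it only keeps the folding logic of the
reproducer and loads each 16-bit word with memcpy(), so the header is never
read through a uint16_t lvalue. GCC and Clang usually compile a fixed two-byte
memcpy() into a plain load when optimizing, so the runtime cost may be smaller
than feared, but that has not been measured in testpmd.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Same folding as the reproducer's ip_sum(), but every 16-bit word is
 * copied out with memcpy() instead of being read through a cast pointer.
 */
static uint16_t ip_sum_memcpy(const void *hdr, size_t hdr_len)
{
	const unsigned char *p = hdr;
	uint32_t sum = 0;

	while (hdr_len > 1) {
		uint16_t word;

		memcpy(&word, p, sizeof(word));
		sum += word;
		if (sum & 0x80000000)
			sum = (sum & 0xFFFF) + (sum >> 16);
		p += sizeof(word);
		hdr_len -= sizeof(word);
	}

	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);

	return (uint16_t)~sum;
}
```

A caller would pass the header directly, for example
ip_sum_memcpy(ip_hdr, sizeof(*ip_hdr)), with no pointer cast needed.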

> 
>> --- cut here ---
>>
>> #include <stdint.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>>
>> struct rte_ipv4_hdr {
>>       uint8_t  version_ihl;
>>       uint8_t  type_of_service;
>>       uint16_t total_length;
>>       uint16_t packet_id;
>>       uint16_t fragment_offset;
>>       uint8_t  time_to_live;
>>       uint8_t  next_proto_id;
>>       uint16_t hdr_checksum;
>>       uint32_t src_addr;
>>       uint32_t dst_addr;
>> };
>>
>> static inline uint16_t ip_sum(const uint16_t *hdr, int hdr_len)
>> {
>>       uint32_t sum = 0;
>>
>>       while (hdr_len > 1)
>>       {
>>               sum += *hdr++;
>>               if (sum & 0x80000000)
>>                       sum = (sum & 0xFFFF) + (sum >> 16);
>>               hdr_len -= 2;
>>       }
>>
>>       while (sum >> 16)
>>               sum = (sum & 0xFFFF) + (sum >> 16);
>>
>>       return ~sum;
>> }
>>
>> static void pkt_burst_flow_gen(void)
>> {
>>       struct rte_ipv4_hdr *ip_hdr = (struct rte_ipv4_hdr *) malloc(4096);
>>       memset(ip_hdr, 0, sizeof(*ip_hdr));
>>       ip_hdr->version_ihl     = 1;
>>       ip_hdr->type_of_service = 2;
>>       ip_hdr->fragment_offset = 3;
>>       ip_hdr->time_to_live    = 4;
>>       ip_hdr->next_proto_id   = 5;
>>       ip_hdr->packet_id       = 6;
>>       ip_hdr->src_addr        = 7;
>>       ip_hdr->dst_addr        = 8;
>>       ip_hdr->total_length    = 9;
>>       ip_hdr->hdr_checksum    = ip_sum((uint16_t *)ip_hdr, sizeof(*ip_hdr));
>>       printf("%x\n", ip_hdr->hdr_checksum);
>> }
>>
>> int main(void)
>> {
>>       pkt_burst_flow_gen();
>>       return 0;
>> }
>>
> 


* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2021-01-07 14:20         ` George Prekas
@ 2021-01-07 15:22           ` Ferruh Yigit
  2021-01-07 20:45             ` George Prekas
  0 siblings, 1 reply; 21+ messages in thread
From: Ferruh Yigit @ 2021-01-07 15:22 UTC (permalink / raw)
  To: George Prekas, Wenzhuo Lu, Beilei Xing, Bernard Iremonger
  Cc: dev, Stephen Hemminger, Harry van Haaren

On 1/7/2021 2:20 PM, George Prekas wrote:
> On 1/7/2021 5:32 AM, Ferruh Yigit wrote:
>> On 1/7/2021 5:39 AM, George Prekas wrote:
>>> On 1/6/2021 12:02 PM, Ferruh Yigit wrote:
>>>> On 12/5/2020 5:42 AM, George Prekas wrote:
>>>>> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
>>>>> and the calculated IP checksum is wrong on GCC 9 and GCC 10.
>>>>>
>>>>> Signed-off-by: George Prekas <prekageo@amazon.com>
>>>>> ---
>>>>> v2:
>>>>> * Instead of a compiler barrier, use a compiler flag.
>>>>> ---
>>>>>     app/test-pmd/meson.build | 1 +
>>>>>     1 file changed, 1 insertion(+)
>>>>>
>>>>> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
>>>>> index 7e9c7bdd6..5d24e807f 100644
>>>>> --- a/app/test-pmd/meson.build
>>>>> +++ b/app/test-pmd/meson.build
>>>>> @@ -4,6 +4,7 @@
>>>>>     # override default name to drop the hyphen
>>>>>     name = 'testpmd'
>>>>>     cflags += '-Wno-deprecated-declarations'
>>>>> +cflags += '-fno-strict-aliasing'
>>>>>     sources = files('5tswap.c',
>>>>>         'cmdline.c',
>>>>>         'cmdline_flow.c',
>>>>>
>>>>
>>>> Hi George,
>>>>
>>>> I am trying to understand this, the relevant code is as below:
>>>> ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr));
>>>>
>>>> You are suspicious of strict aliasing rule violation, with more details:
>>>> The concern is the "struct rte_ipv4_hdr *ip_hdr;" aliased to "const
>>>> unaligned_uint16_t *hdr", and compiler can optimize out the calculations using
>>>> data pointed by 'hdr' pointer, since the 'hdr' pointer is not used to alter the
>>>> data and compiler may think data is not changed at all.
>>>>
>>>> 1) But the pointer "hdr" is assigned in the loop, from another pointer whose
>>>> content is changing, why this is not helping to figure out that the data 'hdr'
>>>> pointing is changed.
>>>>
>>>> 2) I tried to debug this, but I am not able to reproduce the issue, 'ip_sum()'
>>>> called each time and checksum calculated correctly. Using gcc 10.2.1-9. Can you
>>>> able to confirm the case with debug, or from the assembly/object file?
>>>>
>>>>
>>>> And if the issue is strict aliasing rule violation as you said, compiler flag is
>>>> an option but not sure how much it reduces the compiler optimization benefit, I
>>>> guess other options also not so good, memcpy brings too much work on runtime and
>>>> union requires bigger change and makes code complex.
>>>> I wonder if making 'ip_sum()' a non inline function can help, can you please
>>>> give a try since you can reproduce it?
>>>
>>> Hi Ferruh,
>>>
>>> Thanks for looking into it.
>>>
>>> I am copy-pasting at the end of this email a minimal reproduction. It calculates a checksum and prints it. The correct value is f8d9. If you compile it with -O0 or -O3 -fno-strict-aliasing, you will get the correct value. If you compile it with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and -O3, you will get f8e8. You can also try it on https://godbolt.org/ and see how different versions behave.
>>>
>>> My understanding is that the code violates the C standard (https://stackoverflow.com/a/99010).
>>>
>>
>> Thanks for the sample code below, I copied to the godbolt:
>> https://godbolt.org/z/6fMK19
>>
>> In gcc 10, the checksum calculation is done during compilation (when
>> optimization is enabled) and the value is returned directly:
>> mov    $0xffed,%esi
>>
>> Since a calculation is happening I assume the compiler knows about the aliasing
>> and OK with it.
> 
> According to https://gcc.gnu.org/bugs/: "if compiling with -fno-strict-aliasing -fwrapv
> -fno-aggressive-loop-optimizations makes a difference ... then your code is probably not
> correct"
> 

Yep, I saw it while submitting the gcc ticket, and it seems it was right:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98582

>>
>> But that optimized calculation seems wrong, when it is disabled [1] the checksum
>> is correct again.
>>
>> [1] all following seems helping to disable compile time calculation
>> - disabling optimization
>> - putting a compiler barrier
>> - putting a 'printf' inside 'ip_sum()'
>> - fno-strict-aliasing
>>
>> gcc 8 & 9 is not doing this compile time calculation, hence they are not affected.
> 
> I just checked gcc 8.3 and gcc 9.3 on godbolt and I got f8e8 (which is wrong; the correct
> is f8d9).
> 

True, I missed that they generate the wrong value.

>>
>> This feels like an optimization issue in gcc10, but not sure exactly on the root
>> cause, and how to disable it properly in our case.
> 
> I've tried with __attribute__ ((noinline)) and it fixes the problem. But keep in mind
> that we are dealing with broken C code. This attribute just prevents the optimization that
> reveals the problem. It does not guarantee that the problem will not reappear in a future
> compiler version.
> 
> I've also tried to use a union as suggested by Stephen Hemminger and it works correctly but
> it requires significant code changes: you have to copy paste the IP header structure inside
> a union and access it only through the union.
> 
> As a side note, here is a piece of opinion from Linus Torvalds regarding strict aliasing:
> https://lkml.org/lkml/2018/6/5/769
> 
> DPDK already uses -fno-strict-aliasing for librte_node and librte_vhost.

The above ticket also suggests the 'may_alias' attribute, which works for the 
sample. Can you please try it too?
It may be better to allow incompatible aliasing only for a single function, 
instead of for the whole binary.

typedef uint16_t alias_int16_t __attribute__((may_alias));
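
Applied to the sample, the typedef could be used roughly as in the sketch
below. This is only an illustration under a few assumptions: the helper name
build_and_sum is made up, the header lives on the stack instead of coming from
malloc(), the attribute is a GCC/Clang extension, and the expected value f8d9
holds on a little-endian CPU.

```c
#include <stdint.h>
#include <string.h>

struct rte_ipv4_hdr {
	uint8_t  version_ihl;
	uint8_t  type_of_service;
	uint16_t total_length;
	uint16_t packet_id;
	uint16_t fragment_offset;
	uint8_t  time_to_live;
	uint8_t  next_proto_id;
	uint16_t hdr_checksum;
	uint32_t src_addr;
	uint32_t dst_addr;
};

/* Accesses through this type are allowed to alias objects of any type. */
typedef uint16_t alias_int16_t __attribute__((may_alias));

static inline uint16_t ip_sum(const alias_int16_t *hdr, int hdr_len)
{
	uint32_t sum = 0;

	while (hdr_len > 1) {
		sum += *hdr++;
		if (sum & 0x80000000)
			sum = (sum & 0xFFFF) + (sum >> 16);
		hdr_len -= 2;
	}

	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Hypothetical helper: fill the sample header and return its checksum. */
static uint16_t build_and_sum(void)
{
	struct rte_ipv4_hdr ip_hdr;

	memset(&ip_hdr, 0, sizeof(ip_hdr));
	ip_hdr.version_ihl     = 1;
	ip_hdr.type_of_service = 2;
	ip_hdr.fragment_offset = 3;
	ip_hdr.time_to_live    = 4;
	ip_hdr.next_proto_id   = 5;
	ip_hdr.packet_id       = 6;
	ip_hdr.src_addr        = 7;
	ip_hdr.dst_addr        = 8;
	ip_hdr.total_length    = 9;

	return ip_sum((const alias_int16_t *)&ip_hdr, sizeof(ip_hdr));
}
```

The only change to ip_sum() is the parameter type, so the aliasing exemption
stays local to this one function rather than applying to the whole binary.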

> 
>>
>>> --- cut here ---
>>>
>>> #include <stdint.h>
>>> #include <stdio.h>
>>> #include <stdlib.h>
>>> #include <string.h>
>>>
>>> struct rte_ipv4_hdr {
>>>        uint8_t  version_ihl;
>>>        uint8_t  type_of_service;
>>>        uint16_t total_length;
>>>        uint16_t packet_id;
>>>        uint16_t fragment_offset;
>>>        uint8_t  time_to_live;
>>>        uint8_t  next_proto_id;
>>>        uint16_t hdr_checksum;
>>>        uint32_t src_addr;
>>>        uint32_t dst_addr;
>>> };
>>>
>>> static inline uint16_t ip_sum(const uint16_t *hdr, int hdr_len)
>>> {
>>>        uint32_t sum = 0;
>>>
>>>        while (hdr_len > 1)
>>>        {
>>>                sum += *hdr++;
>>>                if (sum & 0x80000000)
>>>                        sum = (sum & 0xFFFF) + (sum >> 16);
>>>                hdr_len -= 2;
>>>        }
>>>
>>>        while (sum >> 16)
>>>                sum = (sum & 0xFFFF) + (sum >> 16);
>>>
>>>        return ~sum;
>>> }
>>>
>>> static void pkt_burst_flow_gen(void)
>>> {
>>>        struct rte_ipv4_hdr *ip_hdr = (struct rte_ipv4_hdr *) malloc(4096);
>>>        memset(ip_hdr, 0, sizeof(*ip_hdr));
>>>        ip_hdr->version_ihl     = 1;
>>>        ip_hdr->type_of_service = 2;
>>>        ip_hdr->fragment_offset = 3;
>>>        ip_hdr->time_to_live    = 4;
>>>        ip_hdr->next_proto_id   = 5;
>>>        ip_hdr->packet_id       = 6;
>>>        ip_hdr->src_addr        = 7;
>>>        ip_hdr->dst_addr        = 8;
>>>        ip_hdr->total_length    = 9;
>>>        ip_hdr->hdr_checksum    = ip_sum((uint16_t *)ip_hdr, sizeof(*ip_hdr));
>>>        printf("%x\n", ip_hdr->hdr_checksum);
>>> }
>>>
>>> int main(void)
>>> {
>>>        pkt_burst_flow_gen();
>>>        return 0;
>>> }
>>>
>>



* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2021-01-07  5:39     ` George Prekas
  2021-01-07 11:32       ` Ferruh Yigit
@ 2021-01-07 15:50       ` Stephen Hemminger
  2021-01-07 15:59         ` Ferruh Yigit
  1 sibling, 1 reply; 21+ messages in thread
From: Stephen Hemminger @ 2021-01-07 15:50 UTC (permalink / raw)
  To: George Prekas
  Cc: Ferruh Yigit, Wenzhuo Lu, Beilei Xing, Bernard Iremonger, dev

On Wed, 6 Jan 2021 23:39:39 -0600
George Prekas <prekageo@amazon.com> wrote:

> On 1/6/2021 12:02 PM, Ferruh Yigit wrote:
> > On 12/5/2020 5:42 AM, George Prekas wrote:  
> >> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
> >> and the calculated IP checksum is wrong on GCC 9 and GCC 10.
> >>
> >> Signed-off-by: George Prekas <prekageo@amazon.com>
> >> ---
> >> v2:
> >> * Instead of a compiler barrier, use a compiler flag.
> >> ---
> >>   app/test-pmd/meson.build | 1 +
> >>   1 file changed, 1 insertion(+)
> >>
> >> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
> >> index 7e9c7bdd6..5d24e807f 100644
> >> --- a/app/test-pmd/meson.build
> >> +++ b/app/test-pmd/meson.build
> >> @@ -4,6 +4,7 @@
> >>   # override default name to drop the hyphen
> >>   name = 'testpmd'
> >>   cflags += '-Wno-deprecated-declarations'
> >> +cflags += '-fno-strict-aliasing'
> >>   sources = files('5tswap.c',
> >>       'cmdline.c',
> >>       'cmdline_flow.c',
> >>  
> > 
> > Hi George,
> > 
> > I am trying to understand this, the relevant code is as below:
> > ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr));
> > 
> > You are suspicious of strict aliasing rule violation, with more details:
> > The concern is the "struct rte_ipv4_hdr *ip_hdr;" aliased to "const
> > unaligned_uint16_t *hdr", and compiler can optimize out the calculations using
> > data pointed by 'hdr' pointer, since the 'hdr' pointer is not used to alter the
> > data and compiler may think data is not changed at all.
> > 
> > 1) But the pointer "hdr" is assigned in the loop, from another pointer whose
> > content is changing, why this is not helping to figure out that the data 'hdr'
> > pointing is changed.
> > 
> > 2) I tried to debug this, but I am not able to reproduce the issue, 'ip_sum()'
> > called each time and checksum calculated correctly. Using gcc 10.2.1-9. Can you
> > able to confirm the case with debug, or from the assembly/object file?
> > 
> > 
> > And if the issue is strict aliasing rule violation as you said, compiler flag is
> > an option but not sure how much it reduces the compiler optimization benefit, I
> > guess other options also not so good, memcpy brings too much work on runtime and
> > union requires bigger change and makes code complex.
> > I wonder if making 'ip_sum()' a non inline function can help, can you please
> > give a try since you can reproduce it?  
> 
> Hi Ferruh,
> 
> Thanks for looking into it.
> 
> I am copy-pasting at the end of this email a minimal reproduction. It calculates a checksum and prints it. The correct value is f8d9. If you compile it with -O0 or -O3 -fno-strict-aliasing, you will get the correct value. If you compile it with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and -O3, you will get f8e8. You can also try it on https://godbolt.org/ and see how different versions behave.
> 
> My understanding is that the code violates the C standard (https://stackoverflow.com/a/99010).
> 
> --- cut here --- 
> 
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> 
> struct rte_ipv4_hdr {
> 	uint8_t  version_ihl;
> 	uint8_t  type_of_service;
> 	uint16_t total_length;
> 	uint16_t packet_id;
> 	uint16_t fragment_offset;
> 	uint8_t  time_to_live;
> 	uint8_t  next_proto_id;
> 	uint16_t hdr_checksum;
> 	uint32_t src_addr;
> 	uint32_t dst_addr;
> };
> 
> static inline uint16_t ip_sum(const uint16_t *hdr, int hdr_len)
> {
> 	uint32_t sum = 0;
> 
> 	while (hdr_len > 1)
> 	{
> 		sum += *hdr++;
> 		if (sum & 0x80000000)
> 			sum = (sum & 0xFFFF) + (sum >> 16);
> 		hdr_len -= 2;
> 	}
> 
> 	while (sum >> 16)
> 		sum = (sum & 0xFFFF) + (sum >> 16);
> 
> 	return ~sum;
> }
> 
> static void pkt_burst_flow_gen(void)
> {
> 	struct rte_ipv4_hdr *ip_hdr = (struct rte_ipv4_hdr *) malloc(4096);
> 	memset(ip_hdr, 0, sizeof(*ip_hdr));
> 	ip_hdr->version_ihl	= 1;
> 	ip_hdr->type_of_service	= 2;
> 	ip_hdr->fragment_offset	= 3;
> 	ip_hdr->time_to_live	= 4;
> 	ip_hdr->next_proto_id	= 5;
> 	ip_hdr->packet_id	= 6;
> 	ip_hdr->src_addr	= 7;
> 	ip_hdr->dst_addr	= 8;
> 	ip_hdr->total_length	= 9;
> 	ip_hdr->hdr_checksum	= ip_sum((uint16_t *)ip_hdr, sizeof(*ip_hdr));
> 	printf("%x\n", ip_hdr->hdr_checksum);
> }
> 
> int main(void)
> {
> 	pkt_burst_flow_gen();
> 	return 0;
> }

If I change your code like this to use a union, GCC 10 is still broken.
It is a compiler bug.  It may be because the optimizer is not smart enough
to know that memset has cleared the header.


#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct rte_ipv4_hdr {
	uint8_t  version_ihl;
	uint8_t  type_of_service;
	uint16_t total_length;
	uint16_t packet_id;
	uint16_t fragment_offset;
	uint8_t  time_to_live;
	uint8_t  next_proto_id;
	uint16_t hdr_checksum;
	uint32_t src_addr;
	uint32_t dst_addr;
};

static inline uint16_t ip_sum(const uint16_t *hdr, int hdr_len)
{
	uint32_t sum = 0;

	while (hdr_len > 1)
	{
		sum += *hdr++;
		if (sum & 0x80000000)
			sum = (sum & 0xFFFF) + (sum >> 16);
		hdr_len -= 2;
	}

	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);

	return ~sum;
}

static void pkt_burst_flow_gen(void)
{
	union {
		struct rte_ipv4_hdr ip;
		uint16_t data[10];
	} *hdr;

	hdr = malloc(sizeof(*hdr));

	memset(hdr, 0, sizeof(*hdr));
	hdr->ip.version_ihl	= 1;
	hdr->ip.type_of_service	= 2;
	hdr->ip.fragment_offset	= 3;
	hdr->ip.time_to_live	= 4;
	hdr->ip.next_proto_id	= 5;
	hdr->ip.packet_id	= 6;
	hdr->ip.src_addr	= 7;
	hdr->ip.dst_addr	= 8;
	hdr->ip.total_length	= 9;
	hdr->ip.hdr_checksum	= ip_sum(hdr->data, sizeof(*hdr));
	printf("%x\n", hdr->ip.hdr_checksum);
}

int main(void)
{
	pkt_burst_flow_gen();
	return 0;
}


* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2021-01-07 15:50       ` Stephen Hemminger
@ 2021-01-07 15:59         ` Ferruh Yigit
  2021-01-07 16:29           ` Stephen Hemminger
  0 siblings, 1 reply; 21+ messages in thread
From: Ferruh Yigit @ 2021-01-07 15:59 UTC (permalink / raw)
  To: Stephen Hemminger, George Prekas
  Cc: Wenzhuo Lu, Beilei Xing, Bernard Iremonger, dev

On 1/7/2021 3:50 PM, Stephen Hemminger wrote:
> On Wed, 6 Jan 2021 23:39:39 -0600
> George Prekas <prekageo@amazon.com> wrote:
> 
>> On 1/6/2021 12:02 PM, Ferruh Yigit wrote:
>>> On 12/5/2020 5:42 AM, George Prekas wrote:
>>>> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
>>>> and the calculated IP checksum is wrong on GCC 9 and GCC 10.
>>>>
>>>> Signed-off-by: George Prekas <prekageo@amazon.com>
>>>> ---
>>>> v2:
>>>> * Instead of a compiler barrier, use a compiler flag.
>>>> ---
>>>>    app/test-pmd/meson.build | 1 +
>>>>    1 file changed, 1 insertion(+)
>>>>
>>>> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
>>>> index 7e9c7bdd6..5d24e807f 100644
>>>> --- a/app/test-pmd/meson.build
>>>> +++ b/app/test-pmd/meson.build
>>>> @@ -4,6 +4,7 @@
>>>>    # override default name to drop the hyphen
>>>>    name = 'testpmd'
>>>>    cflags += '-Wno-deprecated-declarations'
>>>> +cflags += '-fno-strict-aliasing'
>>>>    sources = files('5tswap.c',
>>>>        'cmdline.c',
>>>>        'cmdline_flow.c',
>>>>   
>>>
>>> Hi George,
>>>
>>> I am trying to understand this, the relevant code is as below:
>>> ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr));
>>>
>>> You are suspicious of strict aliasing rule violation, with more details:
>>> The concern is the "struct rte_ipv4_hdr *ip_hdr;" aliased to "const
>>> unaligned_uint16_t *hdr", and compiler can optimize out the calculations using
>>> data pointed by 'hdr' pointer, since the 'hdr' pointer is not used to alter the
>>> data and compiler may think data is not changed at all.
>>>
>>> 1) But the pointer "hdr" is assigned in the loop, from another pointer whose
>>> content is changing, why this is not helping to figure out that the data 'hdr'
>>> pointing is changed.
>>>
>>> 2) I tried to debug this, but I am not able to reproduce the issue, 'ip_sum()'
>>> called each time and checksum calculated correctly. Using gcc 10.2.1-9. Can you
>>> able to confirm the case with debug, or from the assembly/object file?
>>>
>>>
>>> And if the issue is strict aliasing rule violation as you said, compiler flag is
>>> an option but not sure how much it reduces the compiler optimization benefit, I
>>> guess other options also not so good, memcpy brings too much work on runtime and
>>> union requires bigger change and makes code complex.
>>> I wonder if making 'ip_sum()' a non inline function can help, can you please
>>> give a try since you can reproduce it?
>>
>> Hi Ferruh,
>>
>> Thanks for looking into it.
>>
>> I am copy-pasting at the end of this email a minimal reproduction. It calculates a checksum and prints it. The correct value is f8d9. If you compile it with -O0 or -O3 -fno-strict-aliasing, you will get the correct value. If you compile it with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and -O3, you will get f8e8. You can also try it on https://godbolt.org/ and see how different versions behave.
>>
>> My understanding is that the code violates the C standard (https://stackoverflow.com/a/99010).
>>
>> --- cut here ---
>>
>> #include <stdint.h>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <string.h>
>>
>> struct rte_ipv4_hdr {
>> 	uint8_t  version_ihl;
>> 	uint8_t  type_of_service;
>> 	uint16_t total_length;
>> 	uint16_t packet_id;
>> 	uint16_t fragment_offset;
>> 	uint8_t  time_to_live;
>> 	uint8_t  next_proto_id;
>> 	uint16_t hdr_checksum;
>> 	uint32_t src_addr;
>> 	uint32_t dst_addr;
>> };
>>
>> static inline uint16_t ip_sum(const uint16_t *hdr, int hdr_len)
>> {
>> 	uint32_t sum = 0;
>>
>> 	while (hdr_len > 1)
>> 	{
>> 		sum += *hdr++;
>> 		if (sum & 0x80000000)
>> 			sum = (sum & 0xFFFF) + (sum >> 16);
>> 		hdr_len -= 2;
>> 	}
>>
>> 	while (sum >> 16)
>> 		sum = (sum & 0xFFFF) + (sum >> 16);
>>
>> 	return ~sum;
>> }
>>
>> static void pkt_burst_flow_gen(void)
>> {
>> 	struct rte_ipv4_hdr *ip_hdr = (struct rte_ipv4_hdr *) malloc(4096);
>> 	memset(ip_hdr, 0, sizeof(*ip_hdr));
>> 	ip_hdr->version_ihl	= 1;
>> 	ip_hdr->type_of_service	= 2;
>> 	ip_hdr->fragment_offset	= 3;
>> 	ip_hdr->time_to_live	= 4;
>> 	ip_hdr->next_proto_id	= 5;
>> 	ip_hdr->packet_id	= 6;
>> 	ip_hdr->src_addr	= 7;
>> 	ip_hdr->dst_addr	= 8;
>> 	ip_hdr->total_length	= 9;
>> 	ip_hdr->hdr_checksum	= ip_sum((uint16_t *)ip_hdr, sizeof(*ip_hdr));
>> 	printf("%x\n", ip_hdr->hdr_checksum);
>> }
>>
>> int main(void)
>> {
>> 	pkt_burst_flow_gen();
>> 	return 0;
>> }
> 
> If I change your code like this to use union, Gcc 10 is still broken.

This worked fine for me: https://godbolt.org/z/vdsxh9

> It is a compiler bug.  It maybe because optimizer is not smart enough
> to know that memset has cleared the header.
> 
> 
> #include <stdint.h>
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> 
> struct rte_ipv4_hdr {
> 	uint8_t  version_ihl;
> 	uint8_t  type_of_service;
> 	uint16_t total_length;
> 	uint16_t packet_id;
> 	uint16_t fragment_offset;
> 	uint8_t  time_to_live;
> 	uint8_t  next_proto_id;
> 	uint16_t hdr_checksum;
> 	uint32_t src_addr;
> 	uint32_t dst_addr;
> };
> 
> static inline uint16_t ip_sum(const uint16_t *hdr, int hdr_len)
> {
> 	uint32_t sum = 0;
> 
> 	while (hdr_len > 1)
> 	{
> 		sum += *hdr++;
> 		if (sum & 0x80000000)
> 			sum = (sum & 0xFFFF) + (sum >> 16);
> 		hdr_len -= 2;
> 	}
> 
> 	while (sum >> 16)
> 		sum = (sum & 0xFFFF) + (sum >> 16);
> 
> 	return ~sum;
> }
> 
> static void pkt_burst_flow_gen(void)
> {
> 	union {
> 		struct rte_ipv4_hdr ip;
> 		uint16_t data[10];
> 	} *hdr;
> 
> 	hdr = malloc(sizeof(*hdr));
> 
> 	memset(hdr, 0, sizeof(*hdr));
> 	hdr->ip.version_ihl	= 1;
> 	hdr->ip.type_of_service	= 2;
> 	hdr->ip.fragment_offset	= 3;
> 	hdr->ip.time_to_live	= 4;
> 	hdr->ip.next_proto_id	= 5;
> 	hdr->ip.packet_id	= 6;
> 	hdr->ip.src_addr	= 7;
> 	hdr->ip.dst_addr	= 8;
> 	hdr->ip.total_length	= 9;
> 	hdr->ip.hdr_checksum	= ip_sum(hdr->data, sizeof(*hdr));
> 	printf("%x\n", hdr->ip.hdr_checksum);
> }
> 
> int main(void)
> {
> 	pkt_burst_flow_gen();
> 	return 0;
> }
> 



* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2021-01-07 15:59         ` Ferruh Yigit
@ 2021-01-07 16:29           ` Stephen Hemminger
  0 siblings, 0 replies; 21+ messages in thread
From: Stephen Hemminger @ 2021-01-07 16:29 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: George Prekas, Wenzhuo Lu, Beilei Xing, Bernard Iremonger, dev

On Thu, 7 Jan 2021 15:59:59 +0000
Ferruh Yigit <ferruh.yigit@intel.com> wrote:

> >>
> >> int main(void)
> >> {
> >> 	pkt_burst_flow_gen();
> >> 	return 0;
> >> }  
> > 
> > If I change your code like this to use union, Gcc 10 is still broken.  
> 
> This worked fine for me: https://godbolt.org/z/vdsxh9

I was looking for the wrong result.
The union version gives the same answer with and without optimization.


* [dpdk-dev] [PATCH v3] app/testpmd: fix IP checksum calculation
  2020-12-05  5:42 ` [dpdk-dev] [PATCH v2] " George Prekas
  2021-01-05 16:26   ` George Prekas
  2021-01-06 18:02   ` Ferruh Yigit
@ 2021-01-07 20:42   ` George Prekas
  2021-01-18 15:20     ` Ferruh Yigit
  2 siblings, 1 reply; 21+ messages in thread
From: George Prekas @ 2021-01-07 20:42 UTC (permalink / raw)
  To: Wenzhuo Lu, Beilei Xing, Bernard Iremonger, Stephen Hemminger,
	Ferruh Yigit, Harry van Haaren
  Cc: dev, George Prekas

Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c and
the calculated IP checksum is wrong. Use attribute __may_alias__ to fix
the problem.

Signed-off-by: George Prekas <prekageo@amazon.com>
---
v3:
* Instead of a compiler flag, use a compiler attribute.
v2:
* Instead of a compiler barrier, use a compiler flag.
---
 app/test-pmd/flowgen.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/app/test-pmd/flowgen.c b/app/test-pmd/flowgen.c
index acf3e2460..cabfc688f 100644
--- a/app/test-pmd/flowgen.c
+++ b/app/test-pmd/flowgen.c
@@ -53,8 +53,11 @@ static struct rte_ether_addr cfg_ether_dst =
 
 #define IP_DEFTTL  64   /* from RFC 1340. */
 
+/* Use this type to inform GCC that ip_sum violates aliasing rules. */
+typedef unaligned_uint16_t alias_int16_t __attribute__((__may_alias__));
+
 static inline uint16_t
-ip_sum(const unaligned_uint16_t *hdr, int hdr_len)
+ip_sum(const alias_int16_t *hdr, int hdr_len)
 {
 	uint32_t sum = 0;
 
@@ -150,7 +153,7 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
 							   next_flow);
 		ip_hdr->total_length	= RTE_CPU_TO_BE_16(pkt_size -
 							   sizeof(*eth_hdr));
-		ip_hdr->hdr_checksum	= ip_sum((unaligned_uint16_t *)ip_hdr,
+		ip_hdr->hdr_checksum	= ip_sum((const alias_int16_t *)ip_hdr,
 						 sizeof(*ip_hdr));
 
 		/* Initialize UDP header. */
-- 
2.17.1



* Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation
  2021-01-07 15:22           ` Ferruh Yigit
@ 2021-01-07 20:45             ` George Prekas
  0 siblings, 0 replies; 21+ messages in thread
From: George Prekas @ 2021-01-07 20:45 UTC (permalink / raw)
  To: Ferruh Yigit, Wenzhuo Lu, Beilei Xing, Bernard Iremonger
  Cc: dev, Stephen Hemminger, Harry van Haaren

On 1/7/2021 9:22 AM, Ferruh Yigit wrote:
> On 1/7/2021 2:20 PM, George Prekas wrote:
>> On 1/7/2021 5:32 AM, Ferruh Yigit wrote:
>>> On 1/7/2021 5:39 AM, George Prekas wrote:
>>>> On 1/6/2021 12:02 PM, Ferruh Yigit wrote:
>>>>> On 12/5/2020 5:42 AM, George Prekas wrote:
>>>>>> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c
>>>>>> and the calculated IP checksum is wrong on GCC 9 and GCC 10.
>>>>>>
>>>>>> Signed-off-by: George Prekas <prekageo@amazon.com>
>>>>>> ---
>>>>>> v2:
>>>>>> * Instead of a compiler barrier, use a compiler flag.
>>>>>> ---
>>>>>>     app/test-pmd/meson.build | 1 +
>>>>>>     1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build
>>>>>> index 7e9c7bdd6..5d24e807f 100644
>>>>>> --- a/app/test-pmd/meson.build
>>>>>> +++ b/app/test-pmd/meson.build
>>>>>> @@ -4,6 +4,7 @@
>>>>>>     # override default name to drop the hyphen
>>>>>>     name = 'testpmd'
>>>>>>     cflags += '-Wno-deprecated-declarations'
>>>>>> +cflags += '-fno-strict-aliasing'
>>>>>>     sources = files('5tswap.c',
>>>>>>         'cmdline.c',
>>>>>>         'cmdline_flow.c',
>>>>>>
>>>>>
>>>>> Hi George,
>>>>>
>>>>> I am trying to understand this, the relevant code is as below:
>>>>> ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr));
>>>>>
>>>>> You suspect a strict-aliasing rule violation. In more detail:
>>>>> the concern is that "struct rte_ipv4_hdr *ip_hdr;" is aliased to "const
>>>>> unaligned_uint16_t *hdr", and the compiler can optimize out the calculations
>>>>> that use the data pointed to by 'hdr', since the 'hdr' pointer is not used to
>>>>> alter the data and the compiler may assume the data has not changed at all.
>>>>>
>>>>> 1) But the pointer "hdr" is assigned in the loop, from another pointer whose
>>>>> content is changing. Why does this not help the compiler figure out that the
>>>>> data 'hdr' points to has changed?
>>>>>
>>>>> 2) I tried to debug this, but I am not able to reproduce the issue: 'ip_sum()'
>>>>> is called each time and the checksum is calculated correctly, using gcc
>>>>> 10.2.1-9. Are you able to confirm the case with a debugger, or from the
>>>>> assembly/object file?
>>>>>
>>>>>
>>>>> And if the issue is a strict-aliasing rule violation as you said, a compiler
>>>>> flag is an option, but I am not sure how much it reduces the benefit of
>>>>> compiler optimization. I guess the other options are not so good either:
>>>>> memcpy adds too much work at runtime, and a union requires a bigger change and
>>>>> makes the code complex.
>>>>> I wonder if making 'ip_sum()' a non-inline function can help. Can you please
>>>>> give it a try, since you can reproduce it?
>>>>
>>>> Hi Ferruh,
>>>>
>>>> Thanks for looking into it.
>>>>
>>>> I am copy-pasting at the end of this email a minimal reproduction. It calculates a checksum and prints it. The correct value is f8d9. If you compile it with -O0 or -O3 -fno-strict-aliasing, you will get the correct value. If you compile it with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and -O3, you will get f8e8. You can also try it on https://godbolt.org/ and see how different versions behave.
>>>>
>>>> My understanding is that the code violates the C standard (https://stackoverflow.com/a/99010).
>>>>
>>>
>>> Thanks for the sample code below, I copied to the godbolt:
>>> https://godbolt.org/z/6fMK19
>>>
>>> In gcc 10, the checksum calculation is done during compilation (when
>>> optimization is enabled) and the value is returned directly:
>>> mov    $0xffed,%esi
>>>
>>> Since a calculation is happening, I assume the compiler knows about the aliasing
>>> and is OK with it.
>>
>> According to https://gcc.gnu.org/bugs/: "if compiling with -fno-strict-aliasing -fwrapv
>> -fno-aggressive-loop-optimizations makes a difference ... then your code is probably not
>> correct"
>>
> 
> Yep, I saw it while submitting the gcc ticket, and it seems it was right:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98582
> 
>>>
>>> But that optimized calculation seems wrong; when it is disabled [1], the checksum
>>> is correct again.
>>>
>>> [1] All of the following seem to disable the compile-time calculation:
>>> - disabling optimization
>>> - putting a compiler barrier
>>> - putting a 'printf' inside 'ip_sum()'
>>> - -fno-strict-aliasing
>>>
>>> gcc 8 & 9 are not doing this compile-time calculation, hence they are not affected.
>>
>> I just checked gcc 8.3 and gcc 9.3 on godbolt and I got f8e8 (which is wrong; the
>> correct value is f8d9).
>>
> 
> True, I missed that they generate a wrong value.
> 
>>>
>>> This feels like an optimization issue in gcc10, but I am not sure exactly what
>>> the root cause is, or how to disable it properly in our case.
>>
>> I've tried with __attribute__ ((noinline)) and it fixes the problem. But keep in mind
>> that we are dealing with broken C code. This attribute just prevents the optimization that
>> reveals the problem. It does not guarantee that the problem will not reappear in a future
>> compiler version.
>>
>> I've also tried to use a union as suggested by Stephen Hemminger and it works correctly but
>> it requires significant code changes: you have to copy paste the IP header structure inside
>> a union and access it only through the union.
>>
>> As a side note, here is an opinion from Linus Torvalds regarding strict aliasing:
>> https://lkml.org/lkml/2018/6/5/769
>>
>> DPDK already uses -fno-strict-aliasing for librte_node and librte_vhost.
> 
> In the above ticket, the 'may_alias' attribute is also suggested, which works
> for the sample. Can you please try it too?
> It may be better to allow incompatible aliasing only for a single function,
> instead of for the whole binary.
> 
> typedef uint16_t alias_int16_t __attribute__((may_alias));

I've tested with may_alias and it works correctly between 2 AWS EC2 instances. may_alias is used in
other places in DPDK as well. I've posted a new version of the patch.
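
For reference, here is a minimal sketch of the may_alias approach applied to
the reproduction quoted in this message. It is an illustration under stated
assumptions, not the applied patch: the names alias_u16_t, ipv4_hdr_sketch,
ip_sum_alias, ip_sum_ref and fill_sample_hdr are made up for this example,
plain uint16_t stands in for DPDK's unaligned_uint16_t, and the attribute is a
GCC/Clang extension. ip_sum_ref recomputes the same sum with memcpy loads so
the result can be cross-checked without relying on the attribute.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical name: a 16-bit type that is allowed to alias any other
 * object type (GCC/Clang extension). */
typedef uint16_t alias_u16_t __attribute__((__may_alias__));

/* Same field layout as the struct in the reproduction. */
struct ipv4_hdr_sketch {
	uint8_t  version_ihl;
	uint8_t  type_of_service;
	uint16_t total_length;
	uint16_t packet_id;
	uint16_t fragment_offset;
	uint8_t  time_to_live;
	uint8_t  next_proto_id;
	uint16_t hdr_checksum;
	uint32_t src_addr;
	uint32_t dst_addr;
};

/* Fold the carries back into the low 16 bits and complement. */
static inline uint16_t fold_sum(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Checksum that reads the header through the may_alias type. */
static inline uint16_t ip_sum_alias(const alias_u16_t *hdr, int hdr_len)
{
	uint32_t sum = 0;

	while (hdr_len > 1) {
		sum += *hdr++;
		if (sum & 0x80000000)
			sum = (sum & 0xFFFF) + (sum >> 16);
		hdr_len -= 2;
	}
	return fold_sum(sum);
}

/* Reference: the same sum, with each 16-bit word loaded via memcpy. */
static inline uint16_t ip_sum_ref(const void *hdr, int hdr_len)
{
	const unsigned char *p = hdr;
	uint32_t sum = 0;
	uint16_t word;

	for (; hdr_len > 1; hdr_len -= 2, p += 2) {
		memcpy(&word, p, sizeof(word));
		sum += word;
		if (sum & 0x80000000)
			sum = (sum & 0xFFFF) + (sum >> 16);
	}
	return fold_sum(sum);
}

/* Fill the header with the field values used in the reproduction. */
static inline void fill_sample_hdr(struct ipv4_hdr_sketch *h)
{
	assert(h != NULL);
	memset(h, 0, sizeof(*h));
	h->version_ihl     = 1;
	h->type_of_service = 2;
	h->fragment_offset = 3;
	h->time_to_live    = 4;
	h->next_proto_id   = 5;
	h->packet_id       = 6;
	h->src_addr        = 7;
	h->dst_addr        = 8;
	h->total_length    = 9;
}
```

Calling fill_sample_hdr() on a malloc'ed header and then
ip_sum_alias((const alias_u16_t *)h, sizeof(*h)) should agree with
ip_sum_ref(h, sizeof(*h)) at any optimization level. On a little-endian host
both should give f8d9 for these field values, the value reported as correct
earlier in this thread.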

> 
>>
>>>
>>>> --- cut here ---
>>>>
>>>> #include <stdint.h>
>>>> #include <stdio.h>
>>>> #include <stdlib.h>
>>>> #include <string.h>
>>>>
>>>> struct rte_ipv4_hdr {
>>>>        uint8_t  version_ihl;
>>>>        uint8_t  type_of_service;
>>>>        uint16_t total_length;
>>>>        uint16_t packet_id;
>>>>        uint16_t fragment_offset;
>>>>        uint8_t  time_to_live;
>>>>        uint8_t  next_proto_id;
>>>>        uint16_t hdr_checksum;
>>>>        uint32_t src_addr;
>>>>        uint32_t dst_addr;
>>>> };
>>>>
>>>> static inline uint16_t ip_sum(const uint16_t *hdr, int hdr_len)
>>>> {
>>>>        uint32_t sum = 0;
>>>>
>>>>        while (hdr_len > 1)
>>>>        {
>>>>                sum += *hdr++;
>>>>                if (sum & 0x80000000)
>>>>                        sum = (sum & 0xFFFF) + (sum >> 16);
>>>>                hdr_len -= 2;
>>>>        }
>>>>
>>>>        while (sum >> 16)
>>>>                sum = (sum & 0xFFFF) + (sum >> 16);
>>>>
>>>>        return ~sum;
>>>> }
>>>>
>>>> static void pkt_burst_flow_gen(void)
>>>> {
>>>>        struct rte_ipv4_hdr *ip_hdr = (struct rte_ipv4_hdr *) malloc(4096);
>>>>        memset(ip_hdr, 0, sizeof(*ip_hdr));
>>>>        ip_hdr->version_ihl     = 1;
>>>>        ip_hdr->type_of_service = 2;
>>>>        ip_hdr->fragment_offset = 3;
>>>>        ip_hdr->time_to_live    = 4;
>>>>        ip_hdr->next_proto_id   = 5;
>>>>        ip_hdr->packet_id       = 6;
>>>>        ip_hdr->src_addr        = 7;
>>>>        ip_hdr->dst_addr        = 8;
>>>>        ip_hdr->total_length    = 9;
>>>>        ip_hdr->hdr_checksum    = ip_sum((uint16_t *)ip_hdr, sizeof(*ip_hdr));
>>>>        printf("%x\n", ip_hdr->hdr_checksum);
>>>> }
>>>>
>>>> int main(void)
>>>> {
>>>>        pkt_burst_flow_gen();
>>>>        return 0;
>>>> }
>>>>
>>>
> 
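
The union alternative mentioned earlier in this message could look roughly
like the sketch below. It is only an illustration: ipv4_hdr_sketch,
ipv4_hdr_view, HDR_WORDS, ip_sum_union and fill_sample_hdr are hypothetical
names, and the struct is a copy of the one in the reproduction. The point is
that every read of the 16-bit words goes through the union type itself, which
GCC documents as allowed even with -fstrict-aliasing. Taking a uint16_t
pointer to the member and reading through that pointer would not have the
same guarantee.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same field layout as the struct in the reproduction. */
struct ipv4_hdr_sketch {
	uint8_t  version_ihl;
	uint8_t  type_of_service;
	uint16_t total_length;
	uint16_t packet_id;
	uint16_t fragment_offset;
	uint8_t  time_to_live;
	uint8_t  next_proto_id;
	uint16_t hdr_checksum;
	uint32_t src_addr;
	uint32_t dst_addr;
};

#define HDR_WORDS (sizeof(struct ipv4_hdr_sketch) / sizeof(uint16_t))

/* Hypothetical wrapper: the header and its 16-bit view share storage. */
union ipv4_hdr_view {
	struct ipv4_hdr_sketch hdr;
	uint16_t words[HDR_WORDS];
};

/* Checksum over the header; every load is written as u->words[i], so the
 * access is made through the union type. */
static inline uint16_t ip_sum_union(const union ipv4_hdr_view *u)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < HDR_WORDS; i++)
		sum += u->words[i];
	while (sum >> 16)
		sum = (sum & 0xFFFF) + (sum >> 16);
	return (uint16_t)~sum;
}

/* Fill the header with the field values used in the reproduction,
 * writing only through the union's struct member. */
static inline void fill_sample_hdr(union ipv4_hdr_view *u)
{
	assert(u != NULL);
	memset(u, 0, sizeof(*u));
	u->hdr.version_ihl     = 1;
	u->hdr.type_of_service = 2;
	u->hdr.fragment_offset = 3;
	u->hdr.time_to_live    = 4;
	u->hdr.next_proto_id   = 5;
	u->hdr.packet_id       = 6;
	u->hdr.src_addr        = 7;
	u->hdr.dst_addr        = 8;
	u->hdr.total_length    = 9;
}
```

This shows why the approach needs larger code changes than may_alias: the
header has to live inside the union, and callers have to pass the union
around instead of a bare header pointer.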


* Re: [dpdk-dev] [PATCH v3] app/testpmd: fix IP checksum calculation
  2021-01-07 20:42   ` [dpdk-dev] [PATCH v3] " George Prekas
@ 2021-01-18 15:20     ` Ferruh Yigit
  0 siblings, 0 replies; 21+ messages in thread
From: Ferruh Yigit @ 2021-01-18 15:20 UTC (permalink / raw)
  To: George Prekas, Wenzhuo Lu, Beilei Xing, Bernard Iremonger,
	Stephen Hemminger, Harry van Haaren
  Cc: dev

On 1/7/2021 8:42 PM, George Prekas wrote:
> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c and
> the calculated IP checksum is wrong. Use attribute __may_alias__ to fix
> the problem.
> 
> Signed-off-by: George Prekas <prekageo@amazon.com>

     Fixes: e9e23a617eb8 ("app/testpmd: add flowgen forwarding engine")
     Cc: stable@dpdk.org

Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>

Applied to dpdk-next-net/main, thanks.


end of thread, other threads:[~2021-01-18 15:21 UTC | newest]

Thread overview: 21+ messages
2020-12-03 13:59 [dpdk-dev] [PATCH] app/testpmd: fix IP checksum calculation George Prekas
2020-12-03 16:08 ` Stephen Hemminger
2020-12-03 16:35   ` George Prekas
2020-12-03 18:33     ` Stephen Hemminger
2020-12-04  8:59 ` Ferruh Yigit
2020-12-05  5:47   ` George Prekas
2020-12-05  5:42 ` [dpdk-dev] [PATCH v2] " George Prekas
2021-01-05 16:26   ` George Prekas
2021-01-06 18:02   ` Ferruh Yigit
2021-01-07  5:25     ` Stephen Hemminger
2021-01-07  5:39     ` George Prekas
2021-01-07 11:32       ` Ferruh Yigit
2021-01-07 13:06         ` Ferruh Yigit
2021-01-07 14:20         ` George Prekas
2021-01-07 15:22           ` Ferruh Yigit
2021-01-07 20:45             ` George Prekas
2021-01-07 15:50       ` Stephen Hemminger
2021-01-07 15:59         ` Ferruh Yigit
2021-01-07 16:29           ` Stephen Hemminger
2021-01-07 20:42   ` [dpdk-dev] [PATCH v3] " George Prekas
2021-01-18 15:20     ` Ferruh Yigit
