From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (xvm-189-124.dc0.ghst.net [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id EAE49A0524; Thu, 7 Jan 2021 21:45:46 +0100 (CET) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 8CF51140DC9; Thu, 7 Jan 2021 21:45:46 +0100 (CET) Received: from smtp-fw-9101.amazon.com (smtp-fw-9101.amazon.com [207.171.184.25]) by mails.dpdk.org (Postfix) with ESMTP id D606F140DBD for ; Thu, 7 Jan 2021 21:45:44 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=amazon.com; i=@amazon.com; q=dns/txt; s=amazon201209; t=1610052345; x=1641588345; h=to:cc:references:from:message-id:date:mime-version: in-reply-to:content-transfer-encoding:subject; bh=GuuTGsv9+QwwUS3tuMvPQBxyrq3pCwJIVCtClh/rfPk=; b=Z0fMHvcuIyiFX7MQ0Myn5eyBTRpYZPYfMq7oc/jNZb+FPmuoOwK9FB8W rE+xA2exzoVbQVvifIZtV36JDl1oQDA+4lzuB9TctGYHQBcnUdgz+eTN5 3gIIOaWuXlnDXWoxj6rDoHbfxnwcjnbPFUDE520hcb/AjPBwnQEU0G5wQ c=; X-IronPort-AV: E=Sophos;i="5.79,330,1602547200"; d="scan'208";a="102069485" Received: from sea32-co-svc-lb4-vlan3.sea.corp.amazon.com (HELO email-inbound-relay-2b-c300ac87.us-west-2.amazon.com) ([10.47.23.38]) by smtp-border-fw-out-9101.sea19.amazon.com with ESMTP; 07 Jan 2021 20:45:44 +0000 Received: from EX13MTAUWC002.ant.amazon.com (pdx1-ws-svc-p6-lb9-vlan2.pdx.amazon.com [10.236.137.194]) by email-inbound-relay-2b-c300ac87.us-west-2.amazon.com (Postfix) with ESMTPS id E2DE8A2D7A; Thu, 7 Jan 2021 20:45:41 +0000 (UTC) Received: from EX13D12UWC002.ant.amazon.com (10.43.162.253) by EX13MTAUWC002.ant.amazon.com (10.43.162.240) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 7 Jan 2021 20:45:41 +0000 Received: from [192.168.4.85] (10.43.162.252) by EX13D12UWC002.ant.amazon.com (10.43.162.253) with Microsoft SMTP Server (TLS) id 15.0.1497.2; Thu, 7 Jan 2021 20:45:39 +0000 To: Ferruh Yigit , Wenzhuo Lu , Beilei Xing , Bernard Iremonger CC: , Stephen Hemminger , "Harry van Haaren" References: <20201203135954.1127-1-prekageo@amazon.com> <20201205054238.12469-1-prekageo@amazon.com> <8c1fd6ad-302d-f45e-f48a-e81fd5f8ba85@intel.com> <6539f363-97b1-82b2-1b09-036aa75c9dc9@amazon.com> <6a84a356-191a-fca6-607f-3caf88eb3da6@intel.com> <4832ffec-526e-239b-82bc-bf2c6877f934@intel.com> From: George Prekas Message-ID: <537da54e-cf1d-f465-df20-70df41faf88f@amazon.com> Date: Thu, 7 Jan 2021 14:45:38 -0600 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.6.0 MIME-Version: 1.0 In-Reply-To: <4832ffec-526e-239b-82bc-bf2c6877f934@intel.com> Content-Type: text/plain; charset="utf-8" Content-Language: en-US Content-Transfer-Encoding: 8bit X-Originating-IP: [10.43.162.252] X-ClientProxiedBy: EX13D10UWB003.ant.amazon.com (10.43.161.106) To EX13D12UWC002.ant.amazon.com (10.43.162.253) Subject: Re: [dpdk-dev] [PATCH v2] app/testpmd: fix IP checksum calculation X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 1/7/2021 9:22 AM, Ferruh Yigit wrote: > On 1/7/2021 2:20 PM, George Prekas wrote: >> On 1/7/2021 5:32 AM, Ferruh Yigit wrote: >>> On 1/7/2021 5:39 AM, George Prekas wrote: >>>> On 1/6/2021 12:02 PM, Ferruh Yigit wrote: >>>>> On 12/5/2020 5:42 AM, George Prekas wrote: >>>>>> Strict-aliasing rules are violated by cast to uint16_t* in flowgen.c >>>>>> and the calculated IP checksum is wrong on GCC 9 and GCC 10. >>>>>> >>>>>> Signed-off-by: George Prekas >>>>>> --- >>>>>> v2: >>>>>> * Instead of a compiler barrier, use a compiler flag. >>>>>> --- >>>>>>     app/test-pmd/meson.build | 1 + >>>>>>     1 file changed, 1 insertion(+) >>>>>> >>>>>> diff --git a/app/test-pmd/meson.build b/app/test-pmd/meson.build >>>>>> index 7e9c7bdd6..5d24e807f 100644 >>>>>> --- a/app/test-pmd/meson.build >>>>>> +++ b/app/test-pmd/meson.build >>>>>> @@ -4,6 +4,7 @@ >>>>>>     # override default name to drop the hyphen >>>>>>     name = 'testpmd' >>>>>>     cflags += '-Wno-deprecated-declarations' >>>>>> +cflags += '-fno-strict-aliasing' >>>>>>     sources = files('5tswap.c', >>>>>>         'cmdline.c', >>>>>>         'cmdline_flow.c', >>>>>> >>>>> >>>>> Hi George, >>>>> >>>>> I am trying to understand this, the relevant code is as below: >>>>> ip_hdr->hdr_checksum = ip_sum((unaligned_uint16_t *)ip_hdr, sizeof(*ip_hdr)); >>>>> >>>>> You are suspicious of strict aliasing rule violation, with more details: >>>>> The concern is the "struct rte_ipv4_hdr *ip_hdr;" aliased to "const >>>>> unaligned_uint16_t *hdr", and compiler can optimize out the calculations using >>>>> data pointed by 'hdr' pointer, since the 'hdr' pointer is not used to alter the >>>>> data and compiler may think data is not changed at all. >>>>> >>>>> 1) But the pointer "hdr" is assigned in the loop, from another pointer whose >>>>> content is changing, why this is not helping to figure out that the data 'hdr' >>>>> pointing is changed. >>>>> >>>>> 2) I tried to debug this, but I am not able to reproduce the issue, 'ip_sum()' >>>>> called each time and checksum calculated correctly. Using gcc 10.2.1-9. Can you >>>>> able to confirm the case with debug, or from the assembly/object file? >>>>> >>>>> >>>>> And if the issue is strict aliasing rule violation as you said, compiler flag is >>>>> an option but not sure how much it reduces the compiler optimization benefit, I >>>>> guess other options also not so good, memcpy brings too much work on runtime and >>>>> union requires bigger change and makes code complex. >>>>> I wonder if making 'ip_sum()' a non inline function can help, can you please >>>>> give a try since you can reproduce it? >>>> >>>> Hi Ferruh, >>>> >>>> Thanks for looking into it. >>>> >>>> I am copy-pasting at the end of this email a minimal reproduction. It calculates a checksum and prints it. The correct value is f8d9. If you compile it with -O0 or -O3 -fno-strict-aliasing, you will get the correct value. If you compile it with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0 and -O3, you will get f8e8. You can also try it on https://godbolt.org/ and see how different versions behave. >>>> >>>> My understanding is that the code violates the C standard (https://stackoverflow.com/a/99010). >>>> >>> >>> Thanks for the sample code below, I copied to the godbolt: >>> https://godbolt.org/z/6fMK19 >>> >>> In gcc 10, the checksum calculation is done during compilation (when >>> optimization is enabled) and the value is returned directly: >>> mov    $0xffed,%esi >>> >>> Since a calculation is happening I assume the compiler knows about the aliasing >>> and OK with it. >> >> According to https://gcc.gnu.org/bugs/: "if compiling with -fno-strict-aliasing -fwrapv >> -fno-aggressive-loop-optimizations makes a difference ... then your code is probably not >> correct" >> > > Yep, I saw it while submitting the gcc ticket, and it seems it was right: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98582 > >>> >>> But that optimized calculation seems wrong, when it is disabled [1] the checksum >>> is correct again. >>> >>> [1] all following seems helping to disable compile time calculation >>> - disabling optimization >>> - putting a compiler barrier >>> - putting a 'printf' inside 'ip_sum()' >>> - fno-strict-aliasing >>> >>> gcc 8 & 9 is not doing this compile time calculation, hence they are not affected. >> >> I just checked gcc 8.3 and gcc 9.3 on godbolt and I got f8e8 (which is wrong; the correct >> is f8d9). >> > > True, I missed that they generate wrong value. > >>> >>> This feels like an optimization issue in gcc10, but not sure exactly on the root >>> cause, and how to disable it properly in our case. >> >> I've tried with __attribute__ ((noinline)) and it fixes the problem. But keep in mind >> that we are dealing with broken C code. This attribute just prevents the optimization that >> reveals the problem. It does not guarantee that the problem will not reappear in a future >> compiler version. >> >> I've also tried to use a union as suggested by Stephen Hemminger and it works correctly but >> it requires significant code changes: you have to copy paste the IP header structure inside >> a union and access it only through the union. >> >> As a side note, here is a piece of opinion from Linus Torvalds regarding strict aliasing: >> https://lkml.org/lkml/2018/6/5/769 >> >> DPDK already uses -fno-strict-aliasing for librte_node and librte_vhost. > > In the above ticket, 'may_alias' attribute is also suggested, which is working > for the sample, can you please try with it too? > It may be better to allow non compatible aliasing only for single function, > instead of whole binary. > > typedef uint16_t alias_int16_t __attribute__((may_alias)); I've tested with may_alias and it works correctly between 2 AWS EC2 instances. may_alias is used in other places in DPDK as well. I've posted a new version of the patch. > >> >>> >>>> --- cut here --- >>>> >>>> #include >>>> #include >>>> #include >>>> #include >>>> >>>> struct rte_ipv4_hdr { >>>>        uint8_t  version_ihl; >>>>        uint8_t  type_of_service; >>>>        uint16_t total_length; >>>>        uint16_t packet_id; >>>>        uint16_t fragment_offset; >>>>        uint8_t  time_to_live; >>>>        uint8_t  next_proto_id; >>>>        uint16_t hdr_checksum; >>>>        uint32_t src_addr; >>>>        uint32_t dst_addr; >>>> }; >>>> >>>> static inline uint16_t ip_sum(const uint16_t *hdr, int hdr_len) >>>> { >>>>        uint32_t sum = 0; >>>> >>>>        while (hdr_len > 1) >>>>        { >>>>                sum += *hdr++; >>>>                if (sum & 0x80000000) >>>>                        sum = (sum & 0xFFFF) + (sum >> 16); >>>>                hdr_len -= 2; >>>>        } >>>> >>>>        while (sum >> 16) >>>>                sum = (sum & 0xFFFF) + (sum >> 16); >>>> >>>>        return ~sum; >>>> } >>>> >>>> static void pkt_burst_flow_gen(void) >>>> { >>>>        struct rte_ipv4_hdr *ip_hdr = (struct rte_ipv4_hdr *) malloc(4096); >>>>        memset(ip_hdr, 0, sizeof(*ip_hdr)); >>>>        ip_hdr->version_ihl     = 1; >>>>        ip_hdr->type_of_service = 2; >>>>        ip_hdr->fragment_offset = 3; >>>>        ip_hdr->time_to_live    = 4; >>>>        ip_hdr->next_proto_id   = 5; >>>>        ip_hdr->packet_id       = 6; >>>>        ip_hdr->src_addr        = 7; >>>>        ip_hdr->dst_addr        = 8; >>>>        ip_hdr->total_length    = 9; >>>>        ip_hdr->hdr_checksum    = ip_sum((uint16_t *)ip_hdr, sizeof(*ip_hdr)); >>>>        printf("%x\n", ip_hdr->hdr_checksum); >>>> } >>>> >>>> int main(void) >>>> { >>>>        pkt_burst_flow_gen(); >>>>        return 0; >>>> } >>>> >>> >