Message-ID: <58432e09-11c1-5ce0-3e8c-9b3df7266e6a@xilinx.com>
Date: Fri, 8 Jul 2022 15:44:14 +0100
Subject: Re: [PATCH v2 2/2] net: have checksum routines accept unaligned data
From: Ferruh Yigit
To: Mattias Rönnblom
CC: Emil Berg, Morten Brørup
References: <6839721a-8050-0e11-0c66-0f735ec8c56d@ericsson.com> <20220708125608.24532-1-mattias.ronnblom@ericsson.com> <20220708125608.24532-2-mattias.ronnblom@ericsson.com>
In-Reply-To: <20220708125608.24532-2-mattias.ronnblom@ericsson.com>
List-Id: patches for DPDK stable branches

On 7/8/2022 1:56 PM, Mattias Rönnblom wrote:
> __rte_raw_cksum() (used by rte_raw_cksum() among others) accessed its
> data through an uint16_t pointer, which allowed the compiler to assume
> the data was 16-bit aligned. This in turn would, with certain
> architectures and compiler flag combinations, result in code with SIMD
> load or store instructions with restrictions on data alignment.
>
> This patch keeps the old algorithm, but data is read using memcpy()
> instead of direct pointer access, forcing the compiler to always
> generate code that handles unaligned input. The __may_alias__ GCC
> attribute is no longer needed.
>
> The data on which the Internet checksum functions operates are almost
> always 16-bit aligned, but there are exceptions. In particular, the
> PDCP protocol header may (literally) have an odd size.
>
> Performance impact seems to range from none to a very slight
> regression.
>
> Bugzilla ID: 1035
> Cc: stable@dpdk.org
>
> ---
>
> v2:
>   * Simplified the odd-length conditional (Morten Brørup).
>
> Reviewed-by: Morten Brørup
>
> Signed-off-by: Mattias Rönnblom
> ---
>  lib/net/rte_ip.h | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h
> index b502481670..a0334d931e 100644
> --- a/lib/net/rte_ip.h
> +++ b/lib/net/rte_ip.h
> @@ -160,18 +160,21 @@ rte_ipv4_hdr_len(const struct rte_ipv4_hdr *ipv4_hdr)
>  static inline uint32_t
>  __rte_raw_cksum(const void *buf, size_t len, uint32_t sum)
>  {
> -	/* extend strict-aliasing rules */
> -	typedef uint16_t __attribute__((__may_alias__)) u16_p;
> -	const u16_p *u16_buf = (const u16_p *)buf;
> -	const u16_p *end = u16_buf + len / sizeof(*u16_buf);
> +	const void *end;
>
> -	for (; u16_buf != end; ++u16_buf)
> -		sum += *u16_buf;
> +	for (end = RTE_PTR_ADD(buf, (len/sizeof(uint16_t)) * sizeof(uint16_t));
> +	     buf != end; buf = RTE_PTR_ADD(buf, sizeof(uint16_t))) {
> +		uint16_t v;
> +
> +		memcpy(&v, buf, sizeof(uint16_t));
> +		sum += v;
> +	}
>
>  	/* if length is odd, keeping it byte order independent */
>  	if (unlikely(len % 2)) {
>  		uint16_t left = 0;
> -		*(unsigned char *)&left = *(const unsigned char *)end;
> +
> +		memcpy(&left, end, 1);
>  		sum += left;
>  	}
>

Hi Mattias,

I got the following result [1] with the patches applied, on [2].
Could you shed some light on a few questions I have:
1) For 1500 bytes, why does 'Unaligned' access perform better than
   'Aligned' access?
2) Why do 21/101 bytes cost almost twice as many cycles per block as
   20/100 bytes?
3) Why does 1501 bytes perform better than 1500 bytes?

Btw, I don't see any noticeable performance difference with and without
the patch.

[1]
RTE>>cksum_perf_autotest
### rte_raw_cksum() performance ###
Alignment   Block size   TSC cycles/block   TSC cycles/byte
Aligned     20           25.1               1.25
Unaligned   20           25.1               1.25
Aligned     21           51.5               2.45
Unaligned   21           51.5               2.45
Aligned     100          28.2               0.28
Unaligned   100          28.2               0.28
Aligned     101          54.5               0.54
Unaligned   101          54.5               0.54
Aligned     1500         188.9              0.13
Unaligned   1500         138.7              0.09
Aligned     1501         114.1              0.08
Unaligned   1501         110.1              0.07
Test OK
RTE>>

[2]
AMD EPYC 7543P
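
For reference, here is a minimal stand-alone sketch of the memcpy()-based
approach described in the quoted commit message. It is not the DPDK code:
raw_cksum_sketch(), fold_cksum_sketch() and same_sum_at_odd_offset() are
hypothetical names used only for illustration, and RTE_PTR_ADD() and
unlikely() are replaced with plain pointer arithmetic so that it builds
without DPDK headers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-alone analogue of the patched __rte_raw_cksum().
 * Each 16-bit word is read with memcpy(), so the compiler may not
 * assume that buf is 2-byte aligned.
 */
static uint32_t
raw_cksum_sketch(const void *buf, size_t len, uint32_t sum)
{
	const unsigned char *p = buf;
	/* end of the last complete 16-bit word */
	const unsigned char *end =
		p + (len / sizeof(uint16_t)) * sizeof(uint16_t);

	for (; p != end; p += sizeof(uint16_t)) {
		uint16_t v;

		memcpy(&v, p, sizeof(v));
		sum += v;
	}

	/* odd length: place the trailing byte in the first byte of a
	 * zeroed word, which keeps the result byte order independent */
	if (len % 2) {
		uint16_t left = 0;

		memcpy(&left, end, 1);
		sum += left;
	}

	return sum;
}

/* Fold the 32-bit accumulator into 16 bits (one's complement add). */
static uint16_t
fold_cksum_sketch(uint32_t sum)
{
	sum = (sum & 0xffff) + (sum >> 16);
	sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/*
 * Copy the data to an odd offset inside a 2-byte aligned scratch area
 * and check that the sum matches the one computed on the original.
 * Returns 1 on match, 0 on mismatch or if len does not fit.
 */
static int
same_sum_at_odd_offset(const void *data, size_t len)
{
	uint16_t scratch[64]; /* 128 bytes, at least 2-byte aligned */
	unsigned char *unaligned = (unsigned char *)scratch + 1;

	if (len + 1 > sizeof(scratch))
		return 0;

	memcpy(unaligned, data, len);
	return raw_cksum_sketch(data, len, 0) ==
	       raw_cksum_sketch(unaligned, len, 0);
}
```

Calling same_sum_at_odd_offset() on a small buffer with both an even and
an odd length exercises the two paths (word loop and trailing byte) on
deliberately misaligned input. It only checks correctness, not the cycle
counts reported in [1].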