From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id EDC5FA0554 for ; Mon, 27 Jun 2022 14:28:34 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id E1A8B410EF; Mon, 27 Jun 2022 14:28:34 +0200 (CEST) Received: from mail.lysator.liu.se (mail.lysator.liu.se [130.236.254.3]) by mails.dpdk.org (Postfix) with ESMTP id E1B0E400D5; Mon, 27 Jun 2022 14:28:31 +0200 (CEST) Received: from mail.lysator.liu.se (localhost [127.0.0.1]) by mail.lysator.liu.se (Postfix) with ESMTP id 860191DEC4; Mon, 27 Jun 2022 14:28:31 +0200 (CEST) Received: by mail.lysator.liu.se (Postfix, from userid 1004) id 70B511E205; Mon, 27 Jun 2022 14:28:31 +0200 (CEST) X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on hermod.lysator.liu.se X-Spam-Level: X-Spam-Status: No, score=-2.0 required=5.0 tests=ALL_TRUSTED, AWL, NICE_REPLY_A, T_SCC_BODY_TEXT_LINE autolearn=disabled version=3.4.6 X-Spam-Score: -2.0 Received: from [192.168.1.59] (unknown [62.63.215.114]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mail.lysator.liu.se (Postfix) with ESMTPSA id 497271E383; Mon, 27 Jun 2022 14:28:28 +0200 (CEST) Message-ID: <9f543fc0-ae8a-067b-d13f-38a0503dd619@lysator.liu.se> Date: Mon, 27 Jun 2022 14:28:27 +0200 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.9.1 Subject: Re: [PATCH v4] net: fix checksum with unaligned buffer Content-Language: en-US To: =?UTF-8?Q?Morten_Br=c3=b8rup?= , emil.berg@ericsson.com, bruce.richardson@intel.com, dev@dpdk.org Cc: stephen@networkplumber.org, stable@dpdk.org, bugzilla@dpdk.org, olivier.matz@6wind.com References: <98CBD80474FA8B44BF855DF32C47DC35D87139@smartserver.smartshare.dk> <20220623123900.38283-1-mb@smartsharesystems.com> <98CBD80474FA8B44BF855DF32C47DC35D87169@smartserver.smartshare.dk> From: =?UTF-8?Q?Mattias_R=c3=b6nnblom?= In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35D87169@smartserver.smartshare.dk> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Virus-Scanned: ClamAV using ClamSMTP X-BeenThere: stable@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: patches for DPDK stable branches List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: stable-bounces@dpdk.org On 2022-06-23 14:51, Morten Brørup wrote: >> From: Morten Brørup [mailto:mb@smartsharesystems.com] >> Sent: Thursday, 23 June 2022 14.39 >> >> With this patch, the checksum can be calculated on an unaligned buffer. >> I.e. the buf parameter is no longer required to be 16 bit aligned. >> >> The checksum is still calculated using a 16 bit aligned pointer, so the >> compiler can auto-vectorize the function's inner loop. >> >> When the buffer is unaligned, the first byte of the buffer is handled >> separately. Furthermore, the calculated checksum of the buffer is byte >> shifted before being added to the initial checksum, to compensate for >> the >> checksum having been calculated on the buffer shifted by one byte. >> >> v4: >> * Add copyright notice. >> * Include stdbool.h (Emil Berg). >> * Use RTE_PTR_ADD (Emil Berg). >> * Fix one more typo in commit message. Is 'unligned' even a word? >> v3: >> * Remove braces from single statement block. >> * Fix typo in commit message. >> v2: >> * Do not assume that the buffer is part of an aligned packet buffer. >> >> Bugzilla ID: 1035 >> Cc: stable@dpdk.org >> >> Signed-off-by: Morten Brørup >> --- >> lib/net/rte_ip.h | 32 +++++++++++++++++++++++++++----- >> 1 file changed, 27 insertions(+), 5 deletions(-) >> >> diff --git a/lib/net/rte_ip.h b/lib/net/rte_ip.h >> index b502481670..738d643da0 100644 >> --- a/lib/net/rte_ip.h >> +++ b/lib/net/rte_ip.h >> @@ -3,6 +3,7 @@ >> * The Regents of the University of California. >> * Copyright(c) 2010-2014 Intel Corporation. >> * Copyright(c) 2014 6WIND S.A. >> + * Copyright(c) 2022 SmartShare Systems. >> * All rights reserved. >> */ >> >> @@ -15,6 +16,7 @@ >> * IP-related defines >> */ >> >> +#include >> #include >> >> #ifdef RTE_EXEC_ENV_WINDOWS >> @@ -162,20 +164,40 @@ __rte_raw_cksum(const void *buf, size_t len, >> uint32_t sum) >> { >> /* extend strict-aliasing rules */ >> typedef uint16_t __attribute__((__may_alias__)) u16_p; >> - const u16_p *u16_buf = (const u16_p *)buf; >> - const u16_p *end = u16_buf + len / sizeof(*u16_buf); >> + const u16_p *u16_buf; >> + const u16_p *end; >> + uint32_t bsum = 0; >> + const bool unaligned = (uintptr_t)buf & 1; >> + >> + /* if buffer is unaligned, keeping it byte order independent */ >> + if (unlikely(unaligned)) { >> + uint16_t first = 0; >> + if (unlikely(len == 0)) >> + return 0; >> + ((unsigned char *)&first)[1] = *(const unsigned char *)buf; >> + bsum += first; >> + buf = RTE_PTR_ADD(buf, 1); >> + len--; >> + } >> >> + /* aligned access for compiler auto-vectorization */ The compiler will be able to auto vectorize even unaligned accesses, just with different instructions. From what I can tell, there's no performance impact, at least not on the x86_64 systems I tried on. I think you should remove the first special case conditional and use memcpy() instead of the cumbersome __may_alias__ construct to retrieve the data. >> + u16_buf = (const u16_p *)buf; >> + end = u16_buf + len / sizeof(*u16_buf); >> for (; u16_buf != end; ++u16_buf) >> - sum += *u16_buf; >> + bsum += *u16_buf; >> >> /* if length is odd, keeping it byte order independent */ >> if (unlikely(len % 2)) { >> uint16_t left = 0; >> *(unsigned char *)&left = *(const unsigned char *)end; >> - sum += left; >> + bsum += left; >> } >> >> - return sum; >> + /* if buffer is unaligned, swap the checksum bytes */ >> + if (unlikely(unaligned)) >> + bsum = (bsum & 0xFF00FF00) >> 8 | (bsum & 0x00FF00FF) << 8; >> + >> + return sum + bsum; >> } >> >> /** >> -- >> 2.17.1 > > @Emil, thank you for thoroughly reviewing the previous versions. > > If your test succeeds and you are satisfied with the patch, remember to reply with a "Tested-by" tag for patchwork. >