From: Xueming Li
To: Luc Pelletier
CC: Konstantin Ananyev, dpdk stable
Subject: patch 'eal/x86: fix unaligned access for small memcpy' has been queued to stable release 20.11.6
Date: Tue, 21 Jun 2022 11:01:59 +0300
Message-ID: <20220621080301.2315720-54-xuemingl@nvidia.com>
X-Mailer: git-send-email 2.27.0
In-Reply-To: <20220621080301.2315720-1-xuemingl@nvidia.com>
References: <20220621080301.2315720-1-xuemingl@nvidia.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
List-Id: patches for DPDK stable branches

Hi,

FYI, your patch has been queued to stable release 20.11.6

Note it hasn't been pushed to http://dpdk.org/browse/dpdk-stable yet.
It will be pushed if I get no objections before 06/23/22. So please
shout if anyone has objections.

Also note that after the patch there's a diff of the upstream commit vs the
patch applied to the branch. This will indicate if there was any rebasing
needed to apply to the stable branch. If there were code changes for rebasing
(ie: not only metadata diffs), please double check that the rebase was
correctly done.

Queued patches are on a temporary branch at:
https://github.com/steevenlee/dpdk

This queued commit can be viewed at:
https://github.com/steevenlee/dpdk/commit/705be73150297d6254e327bc64c57e48409f02df

Thanks.

Xueming Li

---
From 705be73150297d6254e327bc64c57e48409f02df Mon Sep 17 00:00:00 2001
From: Luc Pelletier
Date: Fri, 25 Feb 2022 11:38:05 -0500
Subject: [PATCH] eal/x86: fix unaligned access for small memcpy
Cc: Xueming Li

[ upstream commit 00901e4d1a9ee7c7b43d0a3592683f0a420a331d ]

Calls to rte_memcpy for 1 < n < 16 could result in unaligned
loads/stores, which is undefined behaviour according to the C
standard, and strict aliasing violations.

The code was changed to use a packed structure that allows aliasing
(using the __may_alias__ attribute) to perform the load/store
operations. This results in code that has the same performance as the
original code and that is also C standards-compliant.

Fixes: af75078fece3 ("first public release")

Signed-off-by: Luc Pelletier
Acked-by: Konstantin Ananyev
Tested-by: Konstantin Ananyev
---
 lib/librte_eal/include/rte_common.h     |   5 +
 lib/librte_eal/x86/include/rte_memcpy.h | 133 +++++++++---------------
 2 files changed, 56 insertions(+), 82 deletions(-)

diff --git a/lib/librte_eal/include/rte_common.h b/lib/librte_eal/include/rte_common.h
index 1b630baf16..677b52a2f8 100644
--- a/lib/librte_eal/include/rte_common.h
+++ b/lib/librte_eal/include/rte_common.h
@@ -83,6 +83,11 @@ typedef uint16_t unaligned_uint16_t;
  */
 #define __rte_packed __attribute__((__packed__))
 
+/**
+ * Macro to mark a type that is not subject to type-based aliasing rules
+ */
+#define __rte_may_alias __attribute__((__may_alias__))
+
 /******* Macro to mark functions and fields scheduled for removal *****/
 #define __rte_deprecated	__attribute__((__deprecated__))
 #define __rte_deprecated_msg(msg)	__attribute__((__deprecated__(msg)))
diff --git a/lib/librte_eal/x86/include/rte_memcpy.h b/lib/librte_eal/x86/include/rte_memcpy.h
index 1b6c6e585f..18aa4e43a7 100644
--- a/lib/librte_eal/x86/include/rte_memcpy.h
+++ b/lib/librte_eal/x86/include/rte_memcpy.h
@@ -45,6 +45,52 @@ extern "C" {
 static __rte_always_inline void *
 rte_memcpy(void *dst, const void *src, size_t n);
 
+/**
+ * Copy bytes from one location to another,
+ * locations should not overlap.
+ * Use with n <= 15.
+ */
+static __rte_always_inline void *
+rte_mov15_or_less(void *dst, const void *src, size_t n)
+{
+	/**
+	 * Use the following structs to avoid violating C standard
+	 * alignment requirements and to avoid strict aliasing bugs
+	 */
+	struct rte_uint64_alias {
+		uint64_t val;
+	} __rte_packed __rte_may_alias;
+	struct rte_uint32_alias {
+		uint32_t val;
+	} __rte_packed __rte_may_alias;
+	struct rte_uint16_alias {
+		uint16_t val;
+	} __rte_packed __rte_may_alias;
+
+	void *ret = dst;
+	if (n & 8) {
+		((struct rte_uint64_alias *)dst)->val =
+			((const struct rte_uint64_alias *)src)->val;
+		src = (const uint64_t *)src + 1;
+		dst = (uint64_t *)dst + 1;
+	}
+	if (n & 4) {
+		((struct rte_uint32_alias *)dst)->val =
+			((const struct rte_uint32_alias *)src)->val;
+		src = (const uint32_t *)src + 1;
+		dst = (uint32_t *)dst + 1;
+	}
+	if (n & 2) {
+		((struct rte_uint16_alias *)dst)->val =
+			((const struct rte_uint16_alias *)src)->val;
+		src = (const uint16_t *)src + 1;
+		dst = (uint16_t *)dst + 1;
+	}
+	if (n & 1)
+		*(uint8_t *)dst = *(const uint8_t *)src;
+	return ret;
+}
+
 #if defined __AVX512F__ && defined RTE_MEMCPY_AVX512
 
 #define ALIGNMENT_MASK 0x3F
@@ -171,8 +217,6 @@ rte_mov512blocks(uint8_t *dst, const uint8_t *src, size_t n)
 static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t bits;
@@ -181,24 +225,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08)
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		return ret;
+		return rte_mov15_or_less(dst, src, n);
 	}
 
 	/**
@@ -379,8 +406,6 @@ rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n)
 static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t bits;
@@ -389,25 +414,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08) {
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		}
-		return ret;
+		return rte_mov15_or_less(dst, src, n);
 	}
 
 	/**
@@ -672,8 +679,6 @@ static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
 	__m128i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8;
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t srcofs;
@@ -682,25 +687,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08) {
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		}
-		return ret;
+		return rte_mov15_or_less(dst, src, n);
 	}
 
 	/**
@@ -818,27 +805,9 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
 {
 	void *ret = dst;
 
-	/* Copy size <= 16 bytes */
+	/* Copy size < 16 bytes */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dst = *(const uint8_t *)src;
-			src = (const uint8_t *)src + 1;
-			dst = (uint8_t *)dst + 1;
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dst = *(const uint16_t *)src;
-			src = (const uint16_t *)src + 1;
-			dst = (uint16_t *)dst + 1;
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dst = *(const uint32_t *)src;
-			src = (const uint32_t *)src + 1;
-			dst = (uint32_t *)dst + 1;
-		}
-		if (n & 0x08)
-			*(uint64_t *)dst = *(const uint64_t *)src;
-
-		return ret;
+		return rte_mov15_or_less(dst, src, n);
 	}
 
 	/* Copy 16 <= size <= 32 bytes */
-- 
2.35.1

---
  Diff of the applied patch vs upstream commit (please double-check if non-empty:
---
--- -	2022-06-21 15:37:51.730081067 +0800
+++ 0053-eal-x86-fix-unaligned-access-for-small-memcpy.patch	2022-06-21 15:37:49.074451382 +0800
@@ -1 +1 @@
-From 00901e4d1a9ee7c7b43d0a3592683f0a420a331d Mon Sep 17 00:00:00 2001
+From 705be73150297d6254e327bc64c57e48409f02df Mon Sep 17 00:00:00 2001
@@ -4,0 +5,3 @@
+Cc: Xueming Li
+
+[ upstream commit 00901e4d1a9ee7c7b43d0a3592683f0a420a331d ]
@@ -16 +18,0 @@
-Cc: stable@dpdk.org
@@ -22,2 +24,2 @@
- lib/eal/include/rte_common.h     |   5 ++
- lib/eal/x86/include/rte_memcpy.h | 133 ++++++++++++-------------------
+ lib/librte_eal/include/rte_common.h     |   5 +
+ lib/librte_eal/x86/include/rte_memcpy.h | 133 +++++++++---------------
@@ -26,5 +28,5 @@
-diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
-index d56a7570c0..a96cc2a138 100644
---- a/lib/eal/include/rte_common.h
-+++ b/lib/eal/include/rte_common.h
-@@ -85,6 +85,11 @@ typedef uint16_t unaligned_uint16_t;
+diff --git a/lib/librte_eal/include/rte_common.h b/lib/librte_eal/include/rte_common.h
+index 1b630baf16..677b52a2f8 100644
+--- a/lib/librte_eal/include/rte_common.h
++++ b/lib/librte_eal/include/rte_common.h
+@@ -83,6 +83,11 @@ typedef uint16_t unaligned_uint16_t;
@@ -42 +44 @@
-diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
+diff --git a/lib/librte_eal/x86/include/rte_memcpy.h b/lib/librte_eal/x86/include/rte_memcpy.h
@@ -44,2 +46,2 @@
---- a/lib/eal/x86/include/rte_memcpy.h
-+++ b/lib/eal/x86/include/rte_memcpy.h
+--- a/lib/librte_eal/x86/include/rte_memcpy.h
++++ b/lib/librte_eal/x86/include/rte_memcpy.h
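
For anyone reviewing the backport without a DPDK tree at hand, the technique described in the commit message (loads and stores through packed, may_alias structs, with each set bit of n selecting one 8/4/2/1-byte move) can be tried in isolation. What follows is only a minimal standalone sketch, not the DPDK code: the names copy_small, u64_alias, u32_alias, u16_alias and copy_small_matches_memcpy are hypothetical, the GCC/Clang attributes are written out directly instead of through __rte_packed and __rte_may_alias, and a compiler that supports those attributes is assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-ins for the structs local to rte_mov15_or_less().
 * packed: the struct has alignment 1, so access at any address is allowed.
 * may_alias: access through the struct is exempt from type-based aliasing.
 */
struct u64_alias { uint64_t val; } __attribute__((__packed__, __may_alias__));
struct u32_alias { uint32_t val; } __attribute__((__packed__, __may_alias__));
struct u16_alias { uint16_t val; } __attribute__((__packed__, __may_alias__));

/* Copy n < 16 bytes; regions must not overlap. */
static inline void *
copy_small(void *dst, const void *src, size_t n)
{
	void *ret = dst;

	assert(n < 16);
	if (n & 8) {
		((struct u64_alias *)dst)->val =
			((const struct u64_alias *)src)->val;
		src = (const uint8_t *)src + 8;
		dst = (uint8_t *)dst + 8;
	}
	if (n & 4) {
		((struct u32_alias *)dst)->val =
			((const struct u32_alias *)src)->val;
		src = (const uint8_t *)src + 4;
		dst = (uint8_t *)dst + 4;
	}
	if (n & 2) {
		((struct u16_alias *)dst)->val =
			((const struct u16_alias *)src)->val;
		src = (const uint8_t *)src + 2;
		dst = (uint8_t *)dst + 2;
	}
	if (n & 1)
		*(uint8_t *)dst = *(const uint8_t *)src;
	return ret;
}

/*
 * Return 1 if copy_small() and memcpy() produce identical buffers for
 * n bytes copied between deliberately misaligned (odd) offsets.
 */
static int
copy_small_matches_memcpy(size_t n)
{
	uint8_t src[32], got[32], want[32];
	size_t i;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (uint8_t)(i * 7 + 1);
	memset(got, 0xAA, sizeof(got));
	memset(want, 0xAA, sizeof(want));

	copy_small(got + 3, src + 1, n);
	memcpy(want + 3, src + 1, n);

	/* Comparing whole buffers also catches writes past n bytes. */
	return memcmp(got, want, sizeof(got)) == 0;
}
```

Calling copy_small_matches_memcpy() for every n from 0 to 15 exercises each combination of the four moves. The sketch copies the widest chunk first, as the patch does, so the order of moves differs from the removed code, which started with the single byte.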