From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luc Pelletier
To: bruce.richardson@intel.com, konstantin.ananyev@intel.com
Cc: dev@dpdk.org, Luc Pelletier, Xiaoyun Li, stable@dpdk.org
Subject: [PATCH] eal: fix unaligned loads/stores in rte_memcpy_generic
Date: Sat, 15 Jan 2022 14:41:03 -0500
Message-Id: <20220115194102.444140-1-lucp.at.work@gmail.com>
List-Id: DPDK patches and discussions

Calls to rte_memcpy_generic could result in unaligned loads/stores for
1 < n < 16. This is undefined behavior according to the C standard, and
it gets flagged by the clang undefined behavior sanitizer.

rte_memcpy_generic is called with unaligned src and dst addresses. When
1 < n < 16, the code would cast both src and dst to a qword, dword or
word pointer, without verifying the alignment of src/dst. The code was
changed to use inline assembly for the load/store operations. Unaligned
load/store operations are permitted in x86-64 assembly.

Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
Cc: Xiaoyun Li
Cc: stable@dpdk.org

Signed-off-by: Luc Pelletier
---
I have just submitted another patch related to rte_memcpy. The changes
in the other patch are straightforward. The changes in this patch are
likely to require more discussion, so I've decided to keep both patches
completely separate. Please let me know if I should have done otherwise.
Based on the benchmarks of rte_memcpy that are in the dpdk tree, the
changes in this patch do not negatively affect performance.

 lib/eal/x86/include/rte_memcpy.h | 106 +++++++++++++------------------
 1 file changed, 44 insertions(+), 62 deletions(-)

diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index 1b6c6e585f..d2a6099200 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -45,6 +45,47 @@ extern "C" {
 static __rte_always_inline void *
 rte_memcpy(void *dst, const void *src, size_t n);
 
+/**
+ * Copy bytes from one location to another,
+ * locations should not overlap.
+ * Use with unaligned src/dst, and n <= 15.
+ */
+static __rte_always_inline void *
+rte_mov15_or_less_unaligned(void *dst, const void *src, size_t n)
+{
+	void *ret = dst;
+	if (n & 0x08) {
+		asm ("movq (%[src]), %%rax\n"
+		"movq %%rax, (%[dst])"
+		:
+		: [dst] "r" (dst), [src] "r" (src)
+		: "rax", "memory");
+		src = (const uint64_t *)src + 1;
+		dst = (uint64_t *)dst + 1;
+	}
+	if (n & 0x04) {
+		asm ("movl (%[src]), %%eax\n"
+		"movl %%eax, (%[dst])"
+		:
+		: [dst] "r" (dst), [src] "r" (src)
+		: "eax", "memory");
+		src = (const uint32_t *)src + 1;
+		dst = (uint32_t *)dst + 1;
+	}
+	if (n & 0x02) {
+		asm ("movw (%[src]), %%ax\n"
+		"movw %%ax, (%[dst])"
+		:
+		: [dst] "r" (dst), [src] "r" (src)
+		: "ax", "memory");
+		src = (const uint16_t *)src + 1;
+		dst = (uint16_t *)dst + 1;
+	}
+	if (n & 0x01)
+		*(uint8_t *)dst = *(const uint8_t *)src;
+	return ret;
+}
+
 #if defined __AVX512F__ && defined RTE_MEMCPY_AVX512
 
 #define ALIGNMENT_MASK 0x3F
@@ -171,8 +212,6 @@ rte_mov512blocks(uint8_t *dst, const uint8_t *src, size_t n)
 static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t bits;
@@ -181,24 +220,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08)
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		return ret;
+		return rte_mov15_or_less_unaligned(dst, src, n);
 	}
 
 	/**
@@ -379,8 +401,6 @@ rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n)
 static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t bits;
@@ -389,25 +409,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08) {
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		}
-		return ret;
+		return rte_mov15_or_less_unaligned(dst, src, n);
 	}
 
 	/**
@@ -672,8 +674,6 @@ static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
 	__m128i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8;
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t srcofs;
@@ -682,25 +682,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08) {
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		}
-		return ret;
+		return rte_mov15_or_less_unaligned(dst, src, n);
 	}
 
 	/**
-- 
2.25.1