From mboxrd@z Thu Jan 1 00:00:00 1970
From: Luc Pelletier
To: bruce.richardson@intel.com, konstantin.ananyev@intel.com
Cc: dev@dpdk.org, Luc Pelletier, Xiaoyun Li, stable@dpdk.org
Subject: [PATCH v5] eal: fix unaligned loads/stores in rte_memcpy_generic
Date: Mon, 17 Jan 2022 10:37:12 -0500
Message-Id: <20220117153711.32829-1-lucp.at.work@gmail.com>
In-Reply-To: <20220115194102.444140-1-lucp.at.work@gmail.com>
References: <20220115194102.444140-1-lucp.at.work@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
List-Id: DPDK patches and discussions

Calls to rte_memcpy_generic could result in unaligned loads/stores for
1 < n < 16. This is undefined behavior according to the C standard, and
it gets flagged by the clang undefined behavior sanitizer.

rte_memcpy_generic is called with unaligned src and dst addresses. When
1 < n < 16, the code would cast both src and dst to a qword, dword or
word pointer without verifying the alignment of src/dst. The code was
changed to use a packed structure to perform the unaligned load/store
operations, which makes those operations compliant with the C standard.
Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
Cc: Xiaoyun Li
Cc: stable@dpdk.org

Signed-off-by: Luc Pelletier
---
Thanks to Stephen's pointer to look at the Linux kernel, I was able to
find a way to perform the unaligned loads/stores in pure C code. The new
load/store functions could likely be moved to a different file, and the
code duplication could likely be eliminated with a macro. However, I
will hold off on making these changes until the maintainers confirm that
this technique is acceptable and is what we want to move forward with.

 lib/eal/x86/include/rte_memcpy.h | 142 +++++++++++++++++--------------
 1 file changed, 80 insertions(+), 62 deletions(-)

diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index 1b6c6e585f..4e876d39eb 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -45,6 +45,83 @@ extern "C" {
 static __rte_always_inline void *
 rte_memcpy(void *dst, const void *src, size_t n);
 
+static __rte_always_inline uint64_t
+rte_load_unaligned_uint64(const void *ptr)
+{
+	struct unaligned_uint64 { uint64_t val; } __rte_packed;
+	return ((const struct unaligned_uint64 *)ptr)->val;
+}
+
+static __rte_always_inline uint32_t
+rte_load_unaligned_uint32(const void *ptr)
+{
+	struct unaligned_uint32 { uint32_t val; } __rte_packed;
+	return ((const struct unaligned_uint32 *)ptr)->val;
+}
+
+static __rte_always_inline uint16_t
+rte_load_unaligned_uint16(const void *ptr)
+{
+	struct unaligned_uint16 { uint16_t val; } __rte_packed;
+	return ((const struct unaligned_uint16 *)ptr)->val;
+}
+
+static __rte_always_inline void
+rte_store_unaligned_uint64(void *ptr, uint64_t val)
+{
+	struct unaligned_uint64 { uint64_t val; } __rte_packed;
+	((struct unaligned_uint64 *)ptr)->val = val;
+}
+
+static __rte_always_inline void
+rte_store_unaligned_uint32(void *ptr, uint32_t val)
+{
+	struct unaligned_uint32 { uint32_t val; } __rte_packed;
+	((struct unaligned_uint32 *)ptr)->val = val;
+}
+
+static __rte_always_inline void
+rte_store_unaligned_uint16(void *ptr, uint16_t val)
+{
+	struct unaligned_uint16 { uint16_t val; } __rte_packed;
+	((struct unaligned_uint16 *)ptr)->val = val;
+}
+
+/**
+ * Copy bytes from one location to another,
+ * locations should not overlap.
+ * Use with unaligned src/dst, and n <= 15.
+ */
+static __rte_always_inline void *
+rte_mov15_or_less_unaligned(void *dst, const void *src, size_t n)
+{
+	void *ret = dst;
+	if (n & 8) {
+		rte_store_unaligned_uint64(
+			dst,
+			rte_load_unaligned_uint64(src));
+		src = ((const uint64_t *)src + 1);
+		dst = ((uint64_t *)dst + 1);
+	}
+	if (n & 4) {
+		rte_store_unaligned_uint32(
+			dst,
+			rte_load_unaligned_uint32(src));
+		src = ((const uint32_t *)src + 1);
+		dst = ((uint32_t *)dst + 1);
+	}
+	if (n & 2) {
+		rte_store_unaligned_uint16(
+			dst,
+			rte_load_unaligned_uint16(src));
+		src = ((const uint16_t *)src + 1);
+		dst = ((uint16_t *)dst + 1);
+	}
+	if (n & 1)
+		*(uint8_t *)dst = *(const uint8_t *)src;
+	return ret;
+}
+
 #if defined __AVX512F__ && defined RTE_MEMCPY_AVX512
 
 #define ALIGNMENT_MASK 0x3F
@@ -171,8 +248,6 @@ rte_mov512blocks(uint8_t *dst, const uint8_t *src, size_t n)
 static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t bits;
@@ -181,24 +256,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08)
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		return ret;
+		return rte_mov15_or_less_unaligned(dst, src, n);
 	}
 
 	/**
@@ -379,8 +437,6 @@ rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n)
 static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t bits;
@@ -389,25 +445,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08) {
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		}
-		return ret;
+		return rte_mov15_or_less_unaligned(dst, src, n);
 	}
 
 	/**
@@ -672,8 +710,6 @@ static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
 	__m128i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8;
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t srcofs;
@@ -682,25 +718,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08) {
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		}
-		return ret;
+		return rte_mov15_or_less_unaligned(dst, src, n);
 	}
 
 	/**
-- 
2.25.1