From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <stable-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 3791CA034C
	for <public@inbox.dpdk.org>; Fri, 25 Feb 2022 17:12:40 +0100 (CET)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 01B054068B;
	Fri, 25 Feb 2022 17:12:40 +0100 (CET)
Received: from mail-qv1-f54.google.com (mail-qv1-f54.google.com
 [209.85.219.54])
 by mails.dpdk.org (Postfix) with ESMTP id 80B824068B;
 Fri, 25 Feb 2022 17:12:39 +0100 (CET)
Received: by mail-qv1-f54.google.com with SMTP id a1so7028540qvl.6;
 Fri, 25 Feb 2022 08:12:39 -0800 (PST)
X-Received: by 2002:a0c:90c9:0:b0:432:3dbf:5a68 with SMTP id
 p67-20020a0c90c9000000b004323dbf5a68mr6581315qvp.2.1645805558877; 
 Fri, 25 Feb 2022 08:12:38 -0800 (PST)
Received: from localhost.localdomain
 (bras-base-hullpq2034w-grc-11-74-14-174-69.dsl.bell.ca. [74.14.174.69])
 by smtp.gmail.com with ESMTPSA id
 d16-20020ac85ad0000000b002d71c463d9csm1703018qtd.24.2022.02.25.08.12.38
 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
 Fri, 25 Feb 2022 08:12:38 -0800 (PST)
From: Luc Pelletier <lucp.at.work@gmail.com>
To: bruce.richardson@intel.com,
	konstantin.ananyev@intel.com
Cc: dev@dpdk.org, Luc Pelletier <lucp.at.work@gmail.com>,
 Xiaoyun Li <xiaoyun.li@intel.com>, stable@dpdk.org
Subject: [PATCH v6] eal: fix rte_memcpy strict aliasing/alignment bugs
Date: Fri, 25 Feb 2022 10:51:28 -0500
Message-Id: <20220225155127.503849-1-lucp.at.work@gmail.com>
In-Reply-To: <20220115194102.444140-1-lucp.at.work@gmail.com>
References: <20220115194102.444140-1-lucp.at.work@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-BeenThere: stable@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: patches for DPDK stable branches <stable.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/stable>,
 <mailto:stable-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/stable/>
List-Post: <mailto:stable@dpdk.org>
List-Help: <mailto:stable-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/stable>,
 <mailto:stable-request@dpdk.org?subject=subscribe>
Errors-To: stable-bounces@dpdk.org

Calls to rte_memcpy for 1 < n < 16 could result in unaligned
loads/stores as well as strict aliasing violations, both of which are
undefined behaviour according to the C standard.

The code was changed to perform the load/store operations through a
packed structure that is allowed to alias other types (using the
__may_alias__ attribute). The resulting code has the same performance
as the original and is also C standard-compliant.

Fixes: d35cc1fe6a7a ("eal/x86: revert select optimized memcpy at run-time")
Cc: Xiaoyun Li <xiaoyun.li@intel.com>
Cc: stable@dpdk.org

Signed-off-by: Luc Pelletier <lucp.at.work@gmail.com>
---
v6:
* Refocused the patch on fixing the strict aliasing problems discovered
following discussions in this thread.
* Modified the code to use __may_alias__ and a packed structure. This
fixes both the undefined behaviour of unaligned access (which is not
the main concern) and the strict aliasing violations (which can cause
major bugs, as demonstrated in a previous message in this thread).
* Renamed new function from rte_mov15_or_less_unaligned to
rte_mov15_or_less.
* Modified code that copies <= 15 bytes to call rte_mov15_or_less.

v5:
* Replaced assembly with pure C code that uses a packed structure to make
unaligned loads conform to the C standard.

v4:
* Added volatile to asm statements, which is required under gcc.

v3:
* Removed for loop and went back to using assembly.

v2:
* Replaced assembly with a regular for loop that copies bytes one by
one.

v1:
* Fix undefined behaviour of unaligned stores/loads by using assembly
to perform stores/loads.

 lib/eal/x86/include/rte_memcpy.h | 133 ++++++++++++-------------------
 1 file changed, 51 insertions(+), 82 deletions(-)

diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
index 1b6c6e585f..360b7e069e 100644
--- a/lib/eal/x86/include/rte_memcpy.h
+++ b/lib/eal/x86/include/rte_memcpy.h
@@ -45,6 +45,52 @@ extern "C" {
 static __rte_always_inline void *
 rte_memcpy(void *dst, const void *src, size_t n);
 
+/**
+ * Copy bytes from one location to another,
+ * locations should not overlap.
+ * Use with n <= 15.
+ */
+static __rte_always_inline void *
+rte_mov15_or_less(void *dst, const void *src, size_t n)
+{
+	/**
+	 * Use the following structs to avoid violating C standard
+	 * alignment requirements and to avoid strict aliasing bugs
+	 */
+	struct rte_uint64_alias {
+		uint64_t val;
+	} __attribute__((__may_alias__, __packed__));
+	struct rte_uint32_alias {
+		uint32_t val;
+	} __attribute__((__may_alias__, __packed__));
+	struct rte_uint16_alias {
+		uint16_t val;
+	} __attribute__((__may_alias__, __packed__));
+
+	void *ret = dst;
+	if (n & 8) {
+		((struct rte_uint64_alias *)dst)->val =
+			((const struct rte_uint64_alias *)src)->val;
+		src = (const uint64_t *)src + 1;
+		dst = (uint64_t *)dst + 1;
+	}
+	if (n & 4) {
+		((struct rte_uint32_alias *)dst)->val =
+			((const struct rte_uint32_alias *)src)->val;
+		src = (const uint32_t *)src + 1;
+		dst = (uint32_t *)dst + 1;
+	}
+	if (n & 2) {
+		((struct rte_uint16_alias *)dst)->val =
+			((const struct rte_uint16_alias *)src)->val;
+		src = (const uint16_t *)src + 1;
+		dst = (uint16_t *)dst + 1;
+	}
+	if (n & 1)
+		*(uint8_t *)dst = *(const uint8_t *)src;
+	return ret;
+}
+
 #if defined __AVX512F__ && defined RTE_MEMCPY_AVX512
 
 #define ALIGNMENT_MASK 0x3F
@@ -171,8 +217,6 @@ rte_mov512blocks(uint8_t *dst, const uint8_t *src, size_t n)
 static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t bits;
@@ -181,24 +225,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08)
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		return ret;
+		return rte_mov15_or_less(dst, src, n);
 	}
 
 	/**
@@ -379,8 +406,6 @@ rte_mov128blocks(uint8_t *dst, const uint8_t *src, size_t n)
 static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t bits;
@@ -389,25 +414,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08) {
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		}
-		return ret;
+		return rte_mov15_or_less(dst, src, n);
 	}
 
 	/**
@@ -672,8 +679,6 @@ static __rte_always_inline void *
 rte_memcpy_generic(void *dst, const void *src, size_t n)
 {
 	__m128i xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm8;
-	uintptr_t dstu = (uintptr_t)dst;
-	uintptr_t srcu = (uintptr_t)src;
 	void *ret = dst;
 	size_t dstofss;
 	size_t srcofs;
@@ -682,25 +687,7 @@ rte_memcpy_generic(void *dst, const void *src, size_t n)
 	 * Copy less than 16 bytes
 	 */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dstu = *(const uint8_t *)srcu;
-			srcu = (uintptr_t)((const uint8_t *)srcu + 1);
-			dstu = (uintptr_t)((uint8_t *)dstu + 1);
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dstu = *(const uint16_t *)srcu;
-			srcu = (uintptr_t)((const uint16_t *)srcu + 1);
-			dstu = (uintptr_t)((uint16_t *)dstu + 1);
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dstu = *(const uint32_t *)srcu;
-			srcu = (uintptr_t)((const uint32_t *)srcu + 1);
-			dstu = (uintptr_t)((uint32_t *)dstu + 1);
-		}
-		if (n & 0x08) {
-			*(uint64_t *)dstu = *(const uint64_t *)srcu;
-		}
-		return ret;
+		return rte_mov15_or_less(dst, src, n);
 	}
 
 	/**
@@ -818,27 +805,9 @@ rte_memcpy_aligned(void *dst, const void *src, size_t n)
 {
 	void *ret = dst;
 
-	/* Copy size <= 16 bytes */
+	/* Copy size < 16 bytes */
 	if (n < 16) {
-		if (n & 0x01) {
-			*(uint8_t *)dst = *(const uint8_t *)src;
-			src = (const uint8_t *)src + 1;
-			dst = (uint8_t *)dst + 1;
-		}
-		if (n & 0x02) {
-			*(uint16_t *)dst = *(const uint16_t *)src;
-			src = (const uint16_t *)src + 1;
-			dst = (uint16_t *)dst + 1;
-		}
-		if (n & 0x04) {
-			*(uint32_t *)dst = *(const uint32_t *)src;
-			src = (const uint32_t *)src + 1;
-			dst = (uint32_t *)dst + 1;
-		}
-		if (n & 0x08)
-			*(uint64_t *)dst = *(const uint64_t *)src;
-
-		return ret;
+		return rte_mov15_or_less(dst, src, n);
 	}
 
 	/* Copy 16 <= size <= 32 bytes */
-- 
2.35.1