From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 57AC1459D7;
	Fri, 20 Sep 2024 12:36:51 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 9AB99433C6;
	Fri, 20 Sep 2024 12:36:40 +0200 (CEST)
Received: from EUR05-DB8-obe.outbound.protection.outlook.com
 (mail-db8eur05on2076.outbound.protection.outlook.com [40.107.20.76])
 by mails.dpdk.org (Postfix) with ESMTP id 115AA402AE
 for <dev@dpdk.org>; Fri, 20 Sep 2024 12:36:37 +0200 (CEST)
Received: from AM6PR01CA0070.eurprd01.prod.exchangelabs.com
 (2603:10a6:20b:e0::47) by GV1PR07MB9119.eurprd07.prod.outlook.com
 (2603:10a6:150:8a::15) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7982.16; Fri, 20 Sep
 2024 10:36:33 +0000
Received: from AM3PEPF0000A79A.eurprd04.prod.outlook.com
 (2603:10a6:20b:e0:cafe::8b) by AM6PR01CA0070.outlook.office365.com
 (2603:10a6:20b:e0::47) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7939.30 via Frontend
 Transport; Fri, 20 Sep 2024 10:36:33 +0000
X-MS-Exchange-Authentication-Results: spf=pass (sender IP is 192.176.1.74)
 smtp.mailfrom=ericsson.com; dkim=none (message not signed)
 header.d=none;dmarc=pass action=none header.from=ericsson.com;
Received-SPF: Pass (protection.outlook.com: domain of ericsson.com designates
 192.176.1.74 as permitted sender)
 receiver=protection.outlook.com; 
 client-ip=192.176.1.74; helo=oa.msg.ericsson.com; pr=C
Received: from oa.msg.ericsson.com (192.176.1.74) by
 AM3PEPF0000A79A.mail.protection.outlook.com (10.167.16.105) with Microsoft
 SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 15.20.7918.13 via Frontend Transport; Fri, 20 Sep 2024 10:36:33 +0000
Received: from seliicinfr00050.seli.gic.ericsson.se (153.88.142.248) by
 smtp-central.internal.ericsson.com (100.87.178.65) with Microsoft SMTP Server
 id 15.2.1544.11; Fri, 20 Sep 2024 12:36:32 +0200
Received: from breslau.. (seliicwb00002.seli.gic.ericsson.se [10.156.25.100])
 by seliicinfr00050.seli.gic.ericsson.se (Postfix) with ESMTP id
 B2CFD1C006B; Fri, 20 Sep 2024 12:36:32 +0200 (CEST)
From: =?UTF-8?q?Mattias=20R=C3=B6nnblom?= <mattias.ronnblom@ericsson.com>
To: <dev@dpdk.org>
CC: =?UTF-8?q?Mattias=20R=C3=B6nnblom?= <hofors@lysator.liu.se>,
 =?UTF-8?q?Morten=20Br=C3=B8rup?= <mb@smartsharesystems.com>, "Stephen
 Hemminger" <stephen@networkplumber.org>, David Marchand
 <david.marchand@redhat.com>, Pavan Nikhilesh <pbhagavatula@marvell.com>,
 Bruce Richardson <bruce.richardson@intel.com>,
 =?UTF-8?q?Mattias=20R=C3=B6nnblom?= <mattias.ronnblom@ericsson.com>
Subject: [PATCH v6 7/7] vhost: optimize memcpy routines when cc memcpy is used
Date: Fri, 20 Sep 2024 12:27:16 +0200
Message-ID: <20240920102716.738940-8-mattias.ronnblom@ericsson.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240920102716.738940-1-mattias.ronnblom@ericsson.com>
References: <20240724075357.546248-2-mattias.ronnblom@ericsson.com>
 <20240920102716.738940-1-mattias.ronnblom@ericsson.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit
X-EOPAttributedMessage: 0
X-MS-PublicTrafficType: Email
X-MS-TrafficTypeDiagnostic: AM3PEPF0000A79A:EE_|GV1PR07MB9119:EE_
X-MS-Office365-Filtering-Correlation-Id: 03f8348e-7a3b-472a-ae2c-08dcd960187f
X-MS-Exchange-SenderADCheck: 1
X-MS-Exchange-AntiSpam-Relay: 0
X-Microsoft-Antispam: BCL:0;
 ARA:13230040|82310400026|376014|36860700013|1800799024; 
X-Forefront-Antispam-Report: CIP:192.176.1.74; CTRY:SE; LANG:en; SCL:1; SRV:;
 IPV:NLI; SFV:NSPM; H:oa.msg.ericsson.com; PTR:office365.se.ericsson.net;
 CAT:NONE; SFS:(13230040)(82310400026)(376014)(36860700013)(1800799024);
 DIR:OUT; SFP:1101; 
X-OriginatorOrg: ericsson.com
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 20 Sep 2024 10:36:33.0251 (UTC)
X-MS-Exchange-CrossTenant-Network-Message-Id: 03f8348e-7a3b-472a-ae2c-08dcd960187f
X-MS-Exchange-CrossTenant-Id: 92e84ceb-fbfd-47ab-be52-080c6b87953f
X-MS-Exchange-CrossTenant-OriginalAttributedTenantConnectingIp: TenantId=92e84ceb-fbfd-47ab-be52-080c6b87953f; Ip=[192.176.1.74];
 Helo=[oa.msg.ericsson.com]
X-MS-Exchange-CrossTenant-AuthSource: AM3PEPF0000A79A.eurprd04.prod.outlook.com
X-MS-Exchange-CrossTenant-AuthAs: Anonymous
X-MS-Exchange-CrossTenant-FromEntityHeader: HybridOnPrem
X-MS-Exchange-Transport-CrossTenantHeadersStamped: GV1PR07MB9119
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

In builds where use_cc_memcpy is set to true, the vhost user PMD
suffers a large performance drop on Intel P-cores for small packets
(roughly 18% for 64-byte packets in the GCC results below), at least
when built by GCC and (to a much lesser extent) clang.

This patch addresses that issue by using a custom virtio
memcpy()-based packet copying routine.

Performance results from a Raptor Lake @ 3.2 GHz:

GCC 12.3.0
64-byte packets
Core  Mode              Mpps
E     RTE memcpy        9.5
E     cc memcpy         9.7
E     cc memcpy+pktcpy  9.0

P     RTE memcpy        16.4
P     cc memcpy         13.5
P     cc memcpy+pktcpy  16.2

GCC 12.3.0
1500-byte packets
Core  Mode              Mpps
P     RTE memcpy        5.8
P     cc memcpy         5.9
P     cc memcpy+pktcpy  5.9

clang 15.0.7
64-byte packets
Core  Mode              Mpps
P     RTE memcpy        13.3
P     cc memcpy         12.9
P     cc memcpy+pktcpy  13.9

"RTE memcpy" is use_cc_memcpy=false, "cc memcpy" is use_cc_memcpy=true
and "pktcpy" is when this patch is applied.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/vhost/virtio_net.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)
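
For anyone who wants to experiment with the copy strategy outside of
DPDK, below is a minimal standalone sketch of it, not the patch itself.
It assumes GCC or clang (for __builtin_assume_aligned) and C11 (for
_Alignas). The names pktcpy_sketch(), pktcpy_selfcheck() and PTR_ADD()
are hypothetical and exist only in this example; PTR_ADD() stands in
for RTE_PTR_ADD(). As in the patch, the caller must guarantee 16-byte
alignment of both buffers.

```c
#include <assert.h>	/* for callers that assert on the self-check result */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for DPDK's RTE_PTR_ADD(). */
#define PTR_ADD(ptr, off) ((void *)((uintptr_t)(ptr) + (off)))

/* Copy len bytes. Both pointers must be 16-byte aligned; passing
 * unaligned pointers is undefined behavior because of the alignment
 * hints given to the compiler.
 */
static inline void
pktcpy_sketch(void *restrict in_dst, const void *restrict in_src, size_t len)
{
	void *dst = __builtin_assume_aligned(in_dst, 16);
	const void *src = __builtin_assume_aligned(in_src, 16);

	if (len <= 256) {
		size_t left;

		/* Constant-size copies, which the compiler can expand inline. */
		for (left = len; left >= 32; left -= 32) {
			memcpy(dst, src, 32);
			dst = PTR_ADD(dst, 32);
			src = PTR_ADD(src, 32);
		}

		/* Tail of 0..31 bytes. */
		memcpy(dst, src, left);
	} else
		memcpy(dst, src, len);
}

/* Compare pktcpy_sketch() against memcpy() for every length in
 * 0..max_len. Returns 0 on success, 1 on a mismatch and -1 if max_len
 * exceeds the test buffers.
 */
static int
pktcpy_selfcheck(size_t max_len)
{
	static _Alignas(16) unsigned char src[512];
	static _Alignas(16) unsigned char dst[512];
	unsigned char ref[512];
	size_t i, len;

	if (max_len > sizeof(src))
		return -1;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (unsigned char)(i * 31 + 7);

	for (len = 0; len <= max_len; len++) {
		memset(dst, 0xAA, sizeof(dst));
		memset(ref, 0xAA, sizeof(ref));
		pktcpy_sketch(dst, src, len);
		memcpy(ref, src, len);
		/* Whole-buffer compare also catches writes past len. */
		if (memcmp(dst, ref, sizeof(dst)) != 0)
			return 1;
	}

	return 0;
}
```

Calling pktcpy_selfcheck(300) from a small main() exercises both the
chunked path (up to 256 bytes) and the plain memcpy() fallback. The
sketch only checks correctness; it says nothing about performance,
which depends on the compiler, flags and core type.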

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 370402d849..63571587a8 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -231,6 +231,39 @@ vhost_async_dma_check_completed(struct virtio_net *dev, int16_t dma_id, uint16_t
 	return nr_copies;
 }
 
+/* The code generated by GCC (and, to a lesser extent, clang) for a
+ * plain memcpy() of packet data is less than optimal on Intel
+ * P-cores for small packets. Hence the need for this specialized
+ * copy routine in builds where use_cc_memcpy is set to true.
+ */
+#if defined(RTE_USE_CC_MEMCPY) && defined(RTE_ARCH_X86_64)
+static __rte_always_inline void
+pktcpy(void *restrict in_dst, const void *restrict in_src, size_t len)
+{
+	void *dst = __builtin_assume_aligned(in_dst, 16);
+	const void *src = __builtin_assume_aligned(in_src, 16);
+
+	if (len <= 256) {
+		size_t left;
+
+		for (left = len; left >= 32; left -= 32) {
+			memcpy(dst, src, 32);
+			dst = RTE_PTR_ADD(dst, 32);
+			src = RTE_PTR_ADD(src, 32);
+		}
+
+		memcpy(dst, src, left);
+	} else
+		memcpy(dst, src, len);
+}
+#else
+static __rte_always_inline void
+pktcpy(void *dst, const void *src, size_t len)
+{
+	rte_memcpy(dst, src, len);
+}
+#endif
+
 static inline void
 do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	__rte_shared_locks_required(&vq->iotlb_lock)
@@ -240,7 +273,7 @@ do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	int i;
 
 	for (i = 0; i < count; i++) {
-		rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
+		pktcpy(elem[i].dst, elem[i].src, elem[i].len);
 		vhost_log_cache_write_iova(dev, vq, elem[i].log_addr,
 					   elem[i].len);
 		PRINT_PACKET(dev, (uintptr_t)elem[i].dst, elem[i].len, 0);
@@ -257,7 +290,7 @@ do_data_copy_dequeue(struct vhost_virtqueue *vq)
 	int i;
 
 	for (i = 0; i < count; i++)
-		rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
+		pktcpy(elem[i].dst, elem[i].src, elem[i].len);
 
 	vq->batch_copy_nb_elems = 0;
 }
-- 
2.43.0