From: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
To: <dev@dpdk.org>
CC: Mattias Rönnblom <hofors@lysator.liu.se>,
 Morten Brørup <mb@smartsharesystems.com>, "Stephen
 Hemminger" <stephen@networkplumber.org>, David Marchand
 <david.marchand@redhat.com>, Pavan Nikhilesh <pbhagavatula@marvell.com>,
 Bruce Richardson <bruce.richardson@intel.com>,
 Mattias Rönnblom <mattias.ronnblom@ericsson.com>
Subject: [PATCH v5 6/6] vhost: optimize memcpy routines when cc memcpy is used
Date: Wed, 24 Jul 2024 09:53:57 +0200
Message-ID: <20240724075357.546248-7-mattias.ronnblom@ericsson.com>
X-Mailer: git-send-email 2.34.1
In-Reply-To: <20240724075357.546248-1-mattias.ronnblom@ericsson.com>
References: <20240620175731.420639-2-mattias.ronnblom@ericsson.com>
 <20240724075357.546248-1-mattias.ronnblom@ericsson.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

In builds where use_cc_memcpy is set to true, the vhost user PMD
suffers a large performance drop on Intel P-cores for small packets,
at least when built with GCC and (to a much lesser extent) clang.

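This patch addresses that issue by introducing pktcpy(), a custom
memcpy()-based packet copying routine, in the vhost library. In
x86_64 builds with use_cc_memcpy set to true, copies of up to 256
bytes are performed as a series of fixed-size 32-byte memcpy() calls
followed by a variable-size tail; larger copies use a single
memcpy() call. All other builds continue to use rte_memcpy().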

Performance results from a Raptor Lake @ 3.2 GHz:

GCC 12.3.0
64-byte packets
Core  Mode              Mpps
E     RTE memcpy        9.5
E     cc memcpy         9.7
E     cc memcpy+pktcpy  9.0

P     RTE memcpy        16.4
P     cc memcpy         13.5
P     cc memcpy+pktcpy  16.2

GCC 12.3.0
1500-byte packets
Core  Mode              Mpps
P     RTE memcpy        5.8
P     cc memcpy         5.9
P     cc memcpy+pktcpy  5.9

clang 15.0.7
64-byte packets
Core  Mode              Mpps
P     RTE memcpy        13.3
P     cc memcpy         12.9
P     cc memcpy+pktcpy  13.9

"RTE memcpy" is use_cc_memcpy=false, "cc memcpy" is use_cc_memcpy=true,
and "cc memcpy+pktcpy" is use_cc_memcpy=true with this patch applied.

Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
---
 lib/vhost/virtio_net.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)
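
For readers who want to experiment with the copy strategy outside of
DPDK, here is a minimal stand-alone sketch of it. It is an
illustration only and not part of the patch. The names
pktcpy_sketch() and pktcpy_sketch_selfcheck() are hypothetical, plain
pointer arithmetic stands in for RTE_PTR_ADD(), and the 16-byte
alignment hint (__builtin_assume_aligned()) used in the patch is left
out so that the sketch is safe for arbitrary pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-alone variant of pktcpy(); not the patch itself. */
static inline void
pktcpy_sketch(void *restrict in_dst, const void *restrict in_src, size_t len)
{
	unsigned char *dst = in_dst;
	const unsigned char *src = in_src;

	if (len <= 256) {
		size_t left;

		/* Fixed-size copies, which the compiler can expand inline. */
		for (left = len; left >= 32; left -= 32) {
			memcpy(dst, src, 32);
			dst += 32;
			src += 32;
		}

		/* Variable-size tail of 0 to 31 bytes. */
		memcpy(dst, src, left);
	} else {
		memcpy(dst, src, len);
	}
}

/* Compare against the source for every length from 0 to 512 bytes and
 * verify that the byte just past the copy is left untouched. Returns
 * the number of lengths checked.
 */
static size_t
pktcpy_sketch_selfcheck(void)
{
	uint8_t src[512];
	uint8_t dst[sizeof(src) + 1];
	size_t len;
	size_t i;

	for (i = 0; i < sizeof(src); i++)
		src[i] = (uint8_t)(i * 7 + 1);

	for (len = 0; len <= sizeof(src); len++) {
		memset(dst, 0xAA, sizeof(dst));
		pktcpy_sketch(dst, src, len);
		assert(memcmp(dst, src, len) == 0);
		assert(dst[len] == 0xAA);
	}

	return len;
}
```

Calling pktcpy_sketch_selfcheck() from a test harness exercises both
the chunked path (len <= 256) and the single memcpy() fallback. The
sketch only checks correctness; it says nothing about the throughput
figures above, which depend on the compiler, the flags and the core
type.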

diff --git a/lib/vhost/virtio_net.c b/lib/vhost/virtio_net.c
index 370402d849..63571587a8 100644
--- a/lib/vhost/virtio_net.c
+++ b/lib/vhost/virtio_net.c
@@ -231,6 +231,39 @@ vhost_async_dma_check_completed(struct virtio_net *dev, int16_t dma_id, uint16_t
 	return nr_copies;
 }
 
+/* The code generated by GCC (and to a lesser extent, clang) with just
+ * a straight memcpy() to copy packets is less than optimal on Intel
+ * P-cores, for small packets. Hence the need for this specialized
+ * memcpy() in builds where use_cc_memcpy is set to true.
+ */
+#if defined(RTE_USE_CC_MEMCPY) && defined(RTE_ARCH_X86_64)
+static __rte_always_inline void
+pktcpy(void *restrict in_dst, const void *restrict in_src, size_t len)
+{
+	void *dst = __builtin_assume_aligned(in_dst, 16);
+	const void *src = __builtin_assume_aligned(in_src, 16);
+
+	if (len <= 256) {
+		size_t left;
+
+		for (left = len; left >= 32; left -= 32) {
+			memcpy(dst, src, 32);
+			dst = RTE_PTR_ADD(dst, 32);
+			src = RTE_PTR_ADD(src, 32);
+		}
+
+		memcpy(dst, src, left);
+	} else
+		memcpy(dst, src, len);
+}
+#else
+static __rte_always_inline void
+pktcpy(void *dst, const void *src, size_t len)
+{
+	rte_memcpy(dst, src, len);
+}
+#endif
+
 static inline void
 do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	__rte_shared_locks_required(&vq->iotlb_lock)
@@ -240,7 +273,7 @@ do_data_copy_enqueue(struct virtio_net *dev, struct vhost_virtqueue *vq)
 	int i;
 
 	for (i = 0; i < count; i++) {
-		rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
+		pktcpy(elem[i].dst, elem[i].src, elem[i].len);
 		vhost_log_cache_write_iova(dev, vq, elem[i].log_addr,
 					   elem[i].len);
 		PRINT_PACKET(dev, (uintptr_t)elem[i].dst, elem[i].len, 0);
@@ -257,7 +290,7 @@ do_data_copy_dequeue(struct vhost_virtqueue *vq)
 	int i;
 
 	for (i = 0; i < count; i++)
-		rte_memcpy(elem[i].dst, elem[i].src, elem[i].len);
+		pktcpy(elem[i].dst, elem[i].src, elem[i].len);
 
 	vq->batch_copy_nb_elems = 0;
 }
-- 
2.34.1