From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vipin Varghese
Subject: [PATCH] app/testpmd: improve sse based macswap
Date: Sat, 13 Jul 2024 20:49:49 +0530
Message-ID: <20240713151949.832-1-vipin.varghese@amd.com>
X-Mailer: git-send-email 2.41.0.windows.3
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
List-Id: DPDK patches and discussions

The goal of this patch is to improve SSE macswap on x86_64 by reducing
stalls in the backend engine. The original SSE macswap implementation
issues multiple load, shuffle and store operations in each loop
iteration.
Using SIMD ISA interleaving we can reduce the stalls caused by
 - load SSE token exhaustion
 - shuffle and load dependency

Other changes that improve packets per second are
 - filling the load-to-shuffle gap with the MBUF offload flag writes,
   which touch a separate cacheline
 - using the register keyword

Test results:
------------
Platform: AMD EPYC SIENA 8594P @2.3GHz, no boost
DPDK: 24.03

------------------------------------------------
TEST IO 64B: baseline
 - mellanox CX-7 2*200Gbps : 42.0
 - intel E810 1*100Gbps : 82.0
 - intel E810 2*200Gbps (2CQ-DA2): 83.0
------------------------------------------------
TEST MACSWAP 64B:
 - mellanox CX-7 2*200Gbps : 31.533 : 31.90
 - intel E810 1*100Gbps : 50.380 : 47.0
 - intel E810 2*200Gbps (2CQ-DA2): 48.840 : 49.827
------------------------------------------------
TEST MACSWAP 128B:
 - mellanox CX-7 2*200Gbps : 30.946 : 31.770
 - intel E810 1*100Gbps : 49.386 : 46.366
 - intel E810 2*200Gbps (2CQ-DA2): 47.979 : 49.503
------------------------------------------------
TEST MACSWAP 256B:
 - mellanox CX-7 2*200Gbps : 32.480 : 33.150
 - intel E810 1*100Gbps : 45.29 : 44.571
 - intel E810 2*200Gbps (2CQ-DA2): 45.033 : 45.117
------------------------------------------------

With multiple queues and lcores, MPPs increases linearly.

Signed-off-by: Vipin Varghese
---
 app/test-pmd/macswap_sse.h | 40 ++++++++++++++++++--------------------
 1 file changed, 19 insertions(+), 21 deletions(-)

diff --git a/app/test-pmd/macswap_sse.h b/app/test-pmd/macswap_sse.h
index 223f87a539..a3d3a274e5 100644
--- a/app/test-pmd/macswap_sse.h
+++ b/app/test-pmd/macswap_sse.h
@@ -11,21 +11,21 @@ static inline void
 do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
 		struct rte_port *txp)
 {
-	struct rte_ether_hdr *eth_hdr[4];
-	struct rte_mbuf *mb[4];
+	register struct rte_ether_hdr *eth_hdr[8];
+	register struct rte_mbuf *mb[8];
 	uint64_t ol_flags;
 	int i;
 	int r;
-	__m128i addr0, addr1, addr2, addr3;
+	register __m128i addr0, addr1, addr2, addr3;
 	/**
 	 * shuffle mask be used to shuffle the 16 bytes.
 	 * byte 0-5 wills be swapped with byte 6-11.
 	 * byte 12-15 will keep unchanged.
 	 */
-	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
-					5, 4, 3, 2,
-					1, 0, 11, 10,
-					9, 8, 7, 6);
+	register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
+					5, 4, 3, 2,
+					1, 0, 11, 10,
+					9, 8, 7, 6);
 
 	ol_flags = ol_flags_init(txp->dev_conf.txmode.offloads);
 	vlan_qinq_set(pkts, nb, ol_flags,
@@ -44,23 +44,24 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
 
 		mb[0] = pkts[i++];
 		eth_hdr[0] = rte_pktmbuf_mtod(mb[0], struct rte_ether_hdr *);
-		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
-
 		mb[1] = pkts[i++];
 		eth_hdr[1] = rte_pktmbuf_mtod(mb[1], struct rte_ether_hdr *);
-		addr1 = _mm_loadu_si128((__m128i *)eth_hdr[1]);
-
-
 		mb[2] = pkts[i++];
 		eth_hdr[2] = rte_pktmbuf_mtod(mb[2], struct rte_ether_hdr *);
-		addr2 = _mm_loadu_si128((__m128i *)eth_hdr[2]);
-
 		mb[3] = pkts[i++];
 		eth_hdr[3] = rte_pktmbuf_mtod(mb[3], struct rte_ether_hdr *);
-		addr3 = _mm_loadu_si128((__m128i *)eth_hdr[3]);
 
+		/* Interleave load, shuffle & set */
+		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
+		mbuf_field_set(mb[0], ol_flags);
+		addr1 = _mm_loadu_si128((__m128i *)eth_hdr[1]);
+		mbuf_field_set(mb[1], ol_flags);
 		addr0 = _mm_shuffle_epi8(addr0, shfl_msk);
+		addr2 = _mm_loadu_si128((__m128i *)eth_hdr[2]);
+		mbuf_field_set(mb[2], ol_flags);
 		addr1 = _mm_shuffle_epi8(addr1, shfl_msk);
+		addr3 = _mm_loadu_si128((__m128i *)eth_hdr[3]);
+		mbuf_field_set(mb[3], ol_flags);
 		addr2 = _mm_shuffle_epi8(addr2, shfl_msk);
 		addr3 = _mm_shuffle_epi8(addr3, shfl_msk);
 
@@ -69,25 +70,22 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
 		_mm_storeu_si128((__m128i *)eth_hdr[2], addr2);
 		_mm_storeu_si128((__m128i *)eth_hdr[3], addr3);
 
-		mbuf_field_set(mb[0], ol_flags);
-		mbuf_field_set(mb[1], ol_flags);
-		mbuf_field_set(mb[2], ol_flags);
-		mbuf_field_set(mb[3], ol_flags);
 		r -= 4;
 	}
 
 	for ( ; i < nb; i++) {
 		if (i < nb - 1)
 			rte_prefetch0(rte_pktmbuf_mtod(pkts[i+1], void *));
+
 		mb[0] = pkts[i];
 		eth_hdr[0] = rte_pktmbuf_mtod(mb[0], struct rte_ether_hdr *);
 
 		/* Swap dest and src mac addresses. */
 		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
+		/* MBUF and Ethernet are 2 separate cacheline */
+		mbuf_field_set(mb[0], ol_flags);
 		addr0 = _mm_shuffle_epi8(addr0, shfl_msk);
 		_mm_storeu_si128((__m128i *)eth_hdr[0], addr0);
-
-		mbuf_field_set(mb[0], ol_flags);
 	}
 }
 
--
2.34.1
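
For readers who want to check the shuffle mask outside of DPDK, the
following is a minimal standalone sketch and not part of the patch. It
applies the same 16-byte permutation as shfl_msk to a plain byte buffer.
The function name swap_mac16 and the scalar fallback are assumptions
made for illustration; only the mask values and the load, shuffle and
store intrinsics come from the diff above. The SSSE3 path is compiled
only when the compiler defines __SSSE3__ (for example with -mssse3).

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSSE3__)
#include <tmmintrin.h> /* _mm_shuffle_epi8 is an SSSE3 intrinsic */
#endif

/*
 * Hypothetical helper: swap destination and source MAC addresses in the
 * first 16 bytes of an Ethernet frame. Bytes 0-5 trade places with
 * bytes 6-11, bytes 12-15 are left unchanged.
 */
static void swap_mac16(uint8_t *hdr)
{
#if defined(__SSSE3__)
	/* Same mask as the patch; _mm_set_epi8 lists byte 15 first. */
	const __m128i msk = _mm_set_epi8(15, 14, 13, 12,
					 5, 4, 3, 2,
					 1, 0, 11, 10,
					 9, 8, 7, 6);
	__m128i v = _mm_loadu_si128((const __m128i *)hdr);

	v = _mm_shuffle_epi8(v, msk);
	_mm_storeu_si128((__m128i *)hdr, v);
#else
	/* Scalar model of the same mask, listed from byte 0 upwards. */
	static const uint8_t idx[16] = { 6, 7, 8, 9, 10, 11,
					 0, 1, 2, 3, 4, 5,
					 12, 13, 14, 15 };
	uint8_t out[16];
	int i;

	/* Output byte i is taken from input byte idx[i]. */
	for (i = 0; i < 16; i++)
		out[i] = hdr[idx[i]];
	memcpy(hdr, out, sizeof(out));
#endif
}
```

Because the permutation is its own inverse, calling swap_mac16() twice
on the same buffer restores the original header, which makes for a quick
sanity check of the mask.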