From mboxrd@z Thu Jan  1 00:00:00 1970
From: Vipin Varghese
Subject: [PATCH] app/testpmd: improve sse based macswap
Date: Tue, 16 Jul 2024 12:07:24 +0530
Message-ID: <20240716063724.850-1-vipin.varghese@amd.com>
X-Mailer: git-send-email 2.41.0.windows.3
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
Content-Type: text/plain
List-Id: DPDK patches and discussions

The goal of this patch is to improve SSE macswap on x86_64 by reducing
stalls in the CPU backend. The original SSE macswap implementation
performs multiple loads, shuffles and stores in each loop iteration.

Using SIMD ISA interleaving, we can reduce the stalls caused by
 - load SSE token exhaustion
 - shuffle and load dependency

Other changes that improve packets per second are
 - filling the gaps between loads with the MBUF accesses for the
   offload flags, which are on a separate cacheline
 - using the register keyword

Build test using meson script:
``````````````````````````````

build-gcc-static
buildtools
build-gcc-shared
build-mini
build-clang-static
build-clang-shared
build-x86-generic

Test Results:
`````````````

Platform-1: AMD EPYC SIENA 8594P @2.3GHz, no boost

------------------------------------------------
TEST IO 64B: baseline
 - mellanox CX-7 2*200Gbps : 42.0
 - intel E810 1*100Gbps : 82.0
 - intel E810 2*200Gbps (2CQ-DA2): 82.45
------------------------------------------------
TEST MACSWAP 64B: <Ver-1> : <Ver-2>
 - mellanox CX-7 2*200Gbps : 31.533 : 31.90
 - intel E810 1*100Gbps : 50.380 : 47.0
 - intel E810 2*200Gbps (2CQ-DA2): 48.840 : 49.827
------------------------------------------------
TEST MACSWAP 128B: <Ver-1> : <Ver-2>
 - mellanox CX-7 2*200Gbps: 30.946 : 31.770
 - intel E810 1*100Gbps: 49.386 : 46.366
 - intel E810 2*200Gbps (2CQ-DA2): 47.979 : 49.503
------------------------------------------------
TEST MACSWAP 256B: <Ver-1> : <Ver-2>
 - mellanox CX-7 2*200Gbps: 32.480 : 33.150
 - intel E810 1 * 100Gbps: 45.29 : 44.571
 - intel E810 2 * 200Gbps (2CQ-DA2): 45.033 : 45.117
------------------------------------------------

Platform-2: AMD EPYC 9554 @3.1GHz, no boost

------------------------------------------------
TEST IO 64B: baseline
 - intel E810 2*200Gbps (2CQ-DA2): 82.49
------------------------------------------------
<Ver-1> : <Ver-2>
TEST MACSWAP: 1Q 1C1T
 64B: : 45.0 : 45.54
128B: : 44.48 : 44.43
256B: : 42.0 : 41.99
+++++++++++++++++++++++++
TEST MACSWAP: 2Q 2C2T
 64B: : 59.5 : 60.55
128B: : 56.78 : 58.1
256B: : 41.85 : 41.99
------------------------------------------------

Signed-off-by: Vipin Varghese
---
 app/test-pmd/macswap_sse.h | 37 +++++++++++++++++--------------------
 1 file changed, 17 insertions(+), 20 deletions(-)

diff --git a/app/test-pmd/macswap_sse.h b/app/test-pmd/macswap_sse.h
index 223f87a539..6e4ed21924 100644
--- a/app/test-pmd/macswap_sse.h
+++ b/app/test-pmd/macswap_sse.h
@@ -16,16 +16,16 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
 	uint64_t ol_flags;
 	int i;
 	int r;
-	__m128i addr0, addr1, addr2, addr3;
+	register __m128i addr0, addr1, addr2, addr3;
 	/**
 	 * shuffle mask be used to shuffle the 16 bytes.
 	 * byte 0-5 wills be swapped with byte 6-11.
 	 * byte 12-15 will keep unchanged.
 	 */
-	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
-					5, 4, 3, 2,
-					1, 0, 11, 10,
-					9, 8, 7, 6);
+	register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
+						5, 4, 3, 2,
+						1, 0, 11, 10,
+						9, 8, 7, 6);
 
 	ol_flags = ol_flags_init(txp->dev_conf.txmode.offloads);
 	vlan_qinq_set(pkts, nb, ol_flags,
@@ -44,23 +44,24 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
 
 		mb[0] = pkts[i++];
 		eth_hdr[0] = rte_pktmbuf_mtod(mb[0], struct rte_ether_hdr *);
-		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
-
 		mb[1] = pkts[i++];
 		eth_hdr[1] = rte_pktmbuf_mtod(mb[1], struct rte_ether_hdr *);
-		addr1 = _mm_loadu_si128((__m128i *)eth_hdr[1]);
-
-
 		mb[2] = pkts[i++];
 		eth_hdr[2] = rte_pktmbuf_mtod(mb[2], struct rte_ether_hdr *);
-		addr2 = _mm_loadu_si128((__m128i *)eth_hdr[2]);
-
 		mb[3] = pkts[i++];
 		eth_hdr[3] = rte_pktmbuf_mtod(mb[3], struct rte_ether_hdr *);
-		addr3 = _mm_loadu_si128((__m128i *)eth_hdr[3]);
 
+		/* Interleave loads and shuffle with field set */
+		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
+		mbuf_field_set(mb[0], ol_flags);
+		addr1 = _mm_loadu_si128((__m128i *)eth_hdr[1]);
+		mbuf_field_set(mb[1], ol_flags);
 		addr0 = _mm_shuffle_epi8(addr0, shfl_msk);
+		addr2 = _mm_loadu_si128((__m128i *)eth_hdr[2]);
+		mbuf_field_set(mb[2], ol_flags);
 		addr1 = _mm_shuffle_epi8(addr1, shfl_msk);
+		addr3 = _mm_loadu_si128((__m128i *)eth_hdr[3]);
+		mbuf_field_set(mb[3], ol_flags);
 		addr2 = _mm_shuffle_epi8(addr2, shfl_msk);
 		addr3 = _mm_shuffle_epi8(addr3, shfl_msk);
 
@@ -69,25 +70,21 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
 		_mm_storeu_si128((__m128i *)eth_hdr[2], addr2);
 		_mm_storeu_si128((__m128i *)eth_hdr[3], addr3);
 
-		mbuf_field_set(mb[0], ol_flags);
-		mbuf_field_set(mb[1], ol_flags);
-		mbuf_field_set(mb[2], ol_flags);
-		mbuf_field_set(mb[3], ol_flags);
 		r -= 4;
 	}
 
 	for ( ; i < nb; i++) {
-		if (i < nb - 1)
+		if (i < (nb - 1))
 			rte_prefetch0(rte_pktmbuf_mtod(pkts[i+1], void *));
 		mb[0] = pkts[i];
 		eth_hdr[0] = rte_pktmbuf_mtod(mb[0], struct rte_ether_hdr *);
 
 		/* Swap dest and src mac addresses. */
 		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
+		/* invoke field_set as it is on separate cacheline */
+		mbuf_field_set(mb[0], ol_flags);
 		addr0 = _mm_shuffle_epi8(addr0, shfl_msk);
 		_mm_storeu_si128((__m128i *)eth_hdr[0], addr0);
-
-		mbuf_field_set(mb[0], ol_flags);
 	}
 }
 
-- 
2.34.1
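
Note on the shuffle mask used in the diff above: the sketch below is a
plain C scalar model of what load + _mm_shuffle_epi8(addr, shfl_msk) +
store does to the first 16 bytes of a frame. It is only an illustration
and not part of the patch; the names shfl_msk_bytes and
macswap_scalar_model are hypothetical. The table is the mask written
lowest byte first, which is the reverse of the _mm_set_epi8() argument
order, since that intrinsic takes the highest byte first.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Byte i of this table is the i-th (lowest-first) byte of shfl_msk.
 * _mm_set_epi8() takes its arguments highest byte first, so the
 * argument list in the patch reads in the reverse order.
 */
static const uint8_t shfl_msk_bytes[16] = {
	6, 7, 8, 9, 10, 11,	/* bytes 0-5  <- source MAC */
	0, 1, 2, 3, 4, 5,	/* bytes 6-11 <- destination MAC */
	12, 13, 14, 15		/* bytes 12-15 unchanged */
};

/*
 * Hypothetical helper (not part of testpmd): scalar model of
 * load + _mm_shuffle_epi8 + store on the first 16 bytes of a frame.
 * All mask indices are below 0x80, so the zeroing case of the
 * instruction never triggers and is not modelled here.
 */
static void
macswap_scalar_model(uint8_t hdr[16])
{
	uint8_t in[16];
	int i;

	memcpy(in, hdr, sizeof(in));	/* models _mm_loadu_si128 */
	for (i = 0; i < 16; i++)	/* models the byte shuffle + store */
		hdr[i] = in[shfl_msk_bytes[i] & 0x0f];
}
```

Calling macswap_scalar_model() on a buffer holding a destination MAC, a
source MAC and the next four header bytes exchanges the two addresses
and leaves bytes 12-15 untouched; calling it a second time restores the
original header. The model says nothing about timing, it only pins down
the byte mapping that the interleaved loads and shuffles must preserve.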