Date: Mon, 15 Jul 2024 16:07:45 +0100
From: Bruce Richardson
To: Vipin Varghese
Subject: Re: [PATCH] app/testpmd: improve sse based macswap
In-Reply-To: <20240713151949.832-1-vipin.varghese@amd.com>
List-Id: DPDK patches and discussions
X-BeenThere: dev@dpdk.org

On Sat, Jul 13, 2024 at 08:49:49PM +0530, Vipin Varghese wrote:
> Goal of the patch is to improve SSE macswap on x86_64 by reducing
> the stalls in backend engine. Original implementation of the SSE
> macswap makes loop call to multiple load, shuffle & store. Using
> SIMD ISA interleaving we can reduce the stalls for
> - load SSE token exhaustion
> - Shuffle and Load dependency
>
> Also other changes which improves packet per second are
> - Filling access to MBUF for offload flags which is separate cacheline,
> - using register keyword
>
> Test results:
> ------------
> Platform: AMD EPYC SIENA 8594P @2.3GHz, no boost
> DPDK: 24.03
>
> ------------------------------------------------
> TEST IO 64B: baseline
> - mellanox CX-7 2*200Gbps : 42.0
> - intel E810 1*100Gbps : 82.0
> - intel E810 2*200Gbps (2CQ-DA2): 83.0
> ------------------------------------------------
> TEST MACSWAP 64B:
> - mellanox CX-7 2*200Gbps : 31.533 : 31.90
> - intel E810 1*100Gbps : 50.380 : 47.0
> - intel E810 2*200Gbps (2CQ-DA2): 48.840 : 49.827
> ------------------------------------------------
> TEST MACSWAP 128B:
> - mellanox CX-7 2*200Gbps: 30.946 : 31.770
> - intel E810 1*100Gbps: 49.386 : 46.366
> - intel E810 2*200Gbps (2CQ-DA2): 47.979 : 49.503
> ------------------------------------------------
> TEST MACSWAP 256B:
> - mellanox CX-7 2*200Gbps: 32.480 : 33.150
> - intel E810 1 * 100Gbps: 45.29 : 44.571
> - intel E810 2 * 200Gbps (2CQ-DA2): 45.033 : 45.117
> ------------------------------------------------
>

Hi,

Interesting patch. Do you know why we see regressions in some of the
cases above? For 1x100G at 64B and 128B packet sizes we see performance
drops of about 3 Mpps, versus smaller gains in the other two cases at
each size (much smaller in the 64B case).

A couple of other questions inline below, too.

Thanks,
/Bruce

> using multiple queues and lcore there is linear increase in MPPs.
>
> Signed-off-by: Vipin Varghese
> ---
>  app/test-pmd/macswap_sse.h | 40 ++++++++++++++++++--------------------
>  1 file changed, 19 insertions(+), 21 deletions(-)
>
> diff --git a/app/test-pmd/macswap_sse.h b/app/test-pmd/macswap_sse.h
> index 223f87a539..a3d3a274e5 100644
> --- a/app/test-pmd/macswap_sse.h
> +++ b/app/test-pmd/macswap_sse.h
> @@ -11,21 +11,21 @@ static inline void
>  do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
>  		struct rte_port *txp)
>  {
> -	struct rte_ether_hdr *eth_hdr[4];
> -	struct rte_mbuf *mb[4];
> +	register struct rte_ether_hdr *eth_hdr[8];
> +	register struct rte_mbuf *mb[8];

Does using "register" actually make a difference to the generated code?

Also, why increase the array sizes from 4 to 8, when the code below only
uses 4 elements of each array anyway? Is it perhaps for cache alignment
purposes? If so, please use explicit cache alignment attributes to
specify this, rather than leaving it implicit in the array sizes.

>  	uint64_t ol_flags;
>  	int i;
>  	int r;
> -	__m128i addr0, addr1, addr2, addr3;
> +	register __m128i addr0, addr1, addr2, addr3;
>  	/**
>  	 * shuffle mask be used to shuffle the 16 bytes.
>  	 * byte 0-5 wills be swapped with byte 6-11.
>  	 * byte 12-15 will keep unchanged.
>  	 */
> -	__m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> -					5, 4, 3, 2,
> -					1, 0, 11, 10,
> -					9, 8, 7, 6);
> +	register const __m128i shfl_msk = _mm_set_epi8(15, 14, 13, 12,
> +						       5, 4, 3, 2,
> +						       1, 0, 11, 10,
> +						       9, 8, 7, 6);
>
>  	ol_flags = ol_flags_init(txp->dev_conf.txmode.offloads);
>  	vlan_qinq_set(pkts, nb, ol_flags,
> @@ -44,23 +44,24 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
>
>  		mb[0] = pkts[i++];
>  		eth_hdr[0] = rte_pktmbuf_mtod(mb[0], struct rte_ether_hdr *);
> -		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
> -
>  		mb[1] = pkts[i++];
>  		eth_hdr[1] = rte_pktmbuf_mtod(mb[1], struct rte_ether_hdr *);
> -		addr1 = _mm_loadu_si128((__m128i *)eth_hdr[1]);
> -
> -
>  		mb[2] = pkts[i++];
>  		eth_hdr[2] = rte_pktmbuf_mtod(mb[2], struct rte_ether_hdr *);
> -		addr2 = _mm_loadu_si128((__m128i *)eth_hdr[2]);
> -
>  		mb[3] = pkts[i++];
>  		eth_hdr[3] = rte_pktmbuf_mtod(mb[3], struct rte_ether_hdr *);
> -		addr3 = _mm_loadu_si128((__m128i *)eth_hdr[3]);
>
> +		/* Interleave load, shuffle & set */
> +		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
> +		mbuf_field_set(mb[0], ol_flags);
> +		addr1 = _mm_loadu_si128((__m128i *)eth_hdr[1]);
> +		mbuf_field_set(mb[1], ol_flags);
>  		addr0 = _mm_shuffle_epi8(addr0, shfl_msk);
> +		addr2 = _mm_loadu_si128((__m128i *)eth_hdr[2]);
> +		mbuf_field_set(mb[2], ol_flags);
>  		addr1 = _mm_shuffle_epi8(addr1, shfl_msk);
> +		addr3 = _mm_loadu_si128((__m128i *)eth_hdr[3]);
> +		mbuf_field_set(mb[3], ol_flags);
>  		addr2 = _mm_shuffle_epi8(addr2, shfl_msk);
>  		addr3 = _mm_shuffle_epi8(addr3, shfl_msk);
>
> @@ -69,25 +70,22 @@ do_macswap(struct rte_mbuf *pkts[], uint16_t nb,
>  		_mm_storeu_si128((__m128i *)eth_hdr[2], addr2);
>  		_mm_storeu_si128((__m128i *)eth_hdr[3], addr3);
>
> -		mbuf_field_set(mb[0], ol_flags);
> -		mbuf_field_set(mb[1], ol_flags);
> -		mbuf_field_set(mb[2], ol_flags);
> -		mbuf_field_set(mb[3], ol_flags);
>  		r -= 4;
>  	}
>
>  	for ( ; i < nb; i++) {
>  		if (i < nb - 1)
>  			rte_prefetch0(rte_pktmbuf_mtod(pkts[i+1], void *));
> +
>  		mb[0] = pkts[i];
>  		eth_hdr[0] = rte_pktmbuf_mtod(mb[0], struct rte_ether_hdr *);
>
>  		/* Swap dest and src mac addresses. */
>  		addr0 = _mm_loadu_si128((__m128i *)eth_hdr[0]);
> +		/* MBUF and Ethernet are 2 separate cacheline */
> +		mbuf_field_set(mb[0], ol_flags);
>  		addr0 = _mm_shuffle_epi8(addr0, shfl_msk);
>  		_mm_storeu_si128((__m128i *)eth_hdr[0], addr0);
> -
> -		mbuf_field_set(mb[0], ol_flags);
>  	}

Since the final loop only handles the leftover elements at the end of the
array of buffers, does changing the instruction ordering here really make
a performance difference?
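As a rough illustration of the explicit-alignment suggestion above, the sketch below keeps each array at the four entries the unrolled loop actually uses and requests cache-line alignment directly, instead of implying it by doubling the array length. It is not code from the patch. It assumes a 64-byte cache line via a hypothetical `SKETCH_CACHE_LINE` macro and uses hypothetical `sketch_*` stand-ins for `struct rte_ether_hdr` and `struct rte_mbuf`, with plain C11 `alignas` so it builds without DPDK headers. Inside DPDK, `RTE_CACHE_LINE_SIZE` and the `__rte_cache_aligned` attribute would be the natural equivalents.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size (hypothetical macro; 64 bytes on typical x86_64). */
#define SKETCH_CACHE_LINE 64

/* Hypothetical stand-ins for struct rte_ether_hdr and struct rte_mbuf. */
struct sketch_ether_hdr {
	uint8_t dst[6];
	uint8_t src[6];
	uint16_t ether_type;
};

struct sketch_mbuf {
	void *buf_addr;
	uint64_t ol_flags;
};

/*
 * Per-burst scratch state: each array holds exactly the four entries the
 * unrolled loop uses, and each starts on its own cache line because the
 * alignment is requested explicitly on the member.
 */
struct macswap_scratch {
	alignas(SKETCH_CACHE_LINE) struct sketch_ether_hdr *eth_hdr[4];
	alignas(SKETCH_CACHE_LINE) struct sketch_mbuf *mb[4];
};

/* Compile-time checks that the layout is what the comment above claims. */
static_assert(alignof(struct macswap_scratch) == SKETCH_CACHE_LINE,
	      "scratch struct must be cache-line aligned");
static_assert(offsetof(struct macswap_scratch, mb) % SKETCH_CACHE_LINE == 0,
	      "mb[] must start on a cache-line boundary");

/* Returns 1 if p sits on a cache-line boundary, 0 otherwise. */
static inline int is_cache_aligned(const void *p)
{
	return ((uintptr_t)p % SKETCH_CACHE_LINE) == 0;
}
```

Declaring a `struct macswap_scratch` local inside a function like `do_macswap()` would give both arrays cache-line placement on the stack; on x86_64, GCC and Clang realign the stack frame for over-aligned locals. Whether that placement actually helps throughput is something only a before/after benchmark can show.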