Date: Fri, 5 Apr 2024 14:17:21 +0100
From: Bruce Richardson
To: Morten Brørup
Subject: Re: [PATCH v3] eal/x86: improve rte_memcpy const size 16 performance
References: <20240302234812.9137-1-mb@smartsharesystems.com> <20240405124628.47151-1-mb@smartsharesystems.com>
In-Reply-To: <20240405124628.47151-1-mb@smartsharesystems.com>
List-Id: DPDK patches and discussions

On Fri, Apr 05, 2024 at 02:46:28PM +0200, Morten Brørup wrote:
> When the rte_memcpy() size is 16, the same 16 bytes are copied twice.
> In the case where the size is known to be 16 at build time, omit the
> duplicate copy.
>
> Reduced the amount of effectively copy-pasted code by using #ifdef
> inside functions instead of outside functions.
>
> Suggested-by: Stephen Hemminger
> Signed-off-by: Morten Brørup
> Acked-by: Bruce Richardson
> ---
> v3:
> * AVX2 is a superset of AVX;
>   for a block of AVX code, testing for AVX suffices. (Bruce Richardson)
> * Define RTE_MEMCPY_AVX if AVX is available, to avoid copy-pasting the
>   check for older GCC version. (Bruce Richardson)
> v2:
> * For GCC, version 11 is required for proper AVX handling;
>   if older GCC version, treat AVX as SSE.
>   Clang does not have this issue.
>   Note: Original code always treated AVX as SSE, regardless of compiler.
> * Do not add copyright. (Stephen Hemminger)
> ---
>  lib/eal/x86/include/rte_memcpy.h | 234 ++++++++-----------------------
>  1 file changed, 59 insertions(+), 175 deletions(-)
>
> diff --git a/lib/eal/x86/include/rte_memcpy.h b/lib/eal/x86/include/rte_memcpy.h
> index 72a92290e0..b56bc46713 100644
> --- a/lib/eal/x86/include/rte_memcpy.h
> +++ b/lib/eal/x86/include/rte_memcpy.h
> @@ -27,6 +27,11 @@ extern "C" {
>  #pragma GCC diagnostic ignored "-Wstringop-overflow"
>  #endif
>
> +/* GCC prior to version 11 doesn't compile AVX properly, so use SSE instead. */
> +#if defined __AVX__ && !(defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 110000))
> +#define RTE_MEMCPY_AVX
> +#endif
> +

Strictly speaking, to have the same behaviour as before, you need to check
for AVX2 also, since the issue with GCC < 11 is for (AVX && !AVX2), i.e. if
AVX2 is supported, all compilers are fine. My suggestion:

#ifdef __AVX2__
#define RTE_MEMCPY_AVX
#elif defined __AVX__ && !(defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < 110000))
#define RTE_MEMCPY_AVX
#endif

You can obviously merge the two branches if you want, but I find the split
slightly easier to follow than a mix of && and || with brackets for
precedence.

A final alternative I see: you can change defined(RTE_MEMCPY_AVX) to
"defined(__AVX2__) || defined(RTE_MEMCPY_AVX)" in each place it's used.

/Bruce
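
For anyone following the thread, here is a minimal sketch of the conditions
discussed above, written as plain C predicates rather than preprocessor tests
so that they can be compared side by side. Everything in it is hypothetical
and not part of DPDK: the struct build_env type and the cond_* function names
only stand in for "is __AVX2__ defined", "is __AVX__ defined", "is the
toolchain GCC" and the GCC_VERSION value that the patch compares against
110000. It is meant to illustrate the logic, not to replace the #if blocks.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the build-time facts the preprocessor tests. */
struct build_env {
	bool avx2;        /* would be: defined(__AVX2__) */
	bool avx;         /* would be: defined(__AVX__) */
	bool gcc;         /* would be: defined(RTE_TOOLCHAIN_GCC) */
	int gcc_version;  /* would be: GCC_VERSION, compared against 110000 */
};

/* Condition as written in the v3 patch. */
bool cond_v3(struct build_env e)
{
	return e.avx && !(e.gcc && e.gcc_version < 110000);
}

/* Suggested split form: the #ifdef __AVX2__ branch, then the #elif branch. */
bool cond_split(struct build_env e)
{
	if (e.avx2)
		return true;
	else if (e.avx && !(e.gcc && e.gcc_version < 110000))
		return true;
	return false;
}

/* The same two branches merged into a single expression. */
bool cond_merged(struct build_env e)
{
	return e.avx2 || (e.avx && !(e.gcc && e.gcc_version < 110000));
}

/*
 * Walk every flag combination (including ones a real compiler would never
 * produce, such as AVX2 without AVX) and assert that the split and merged
 * forms always agree.
 */
void check_split_equals_merged(void)
{
	const int versions[] = { 100000, 110000 };

	for (int bits = 0; bits < 8; bits++) {
		for (int v = 0; v < 2; v++) {
			struct build_env e = {
				.avx2 = bits & 1,
				.avx = bits & 2,
				.gcc = bits & 4,
				.gcc_version = versions[v],
			};
			assert(cond_split(e) == cond_merged(e));
		}
	}
}
```

The one case where cond_v3() and the suggested forms differ is an AVX2 build
with GCC older than 11: the v3 condition rejects it, while the split and
merged forms accept it, which is the behaviour change the review comment
points out.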