From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 84FCA43DFA; Thu, 4 Apr 2024 15:29:20 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 0F5AC40268; Thu, 4 Apr 2024 15:29:20 +0200 (CEST) Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.16]) by mails.dpdk.org (Postfix) with ESMTP id 4B96F4025D for ; Thu, 4 Apr 2024 15:29:18 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1712237358; x=1743773358; h=date:from:to:cc:subject:message-id:references: content-transfer-encoding:in-reply-to:mime-version; bh=oRb9q0R0bAnu5nBqt8cY1xrfjPDi0LRB5PdGLC4DePU=; b=QgIBjUyyg0Mu+W5u05U5k24SEqHfANkfHNJWRENdMkpEcC2KB6bANZb5 LC3GrwqsQFc9NQ/nqxtmDTDLYlikKEK0lmRBW5S25mmu3SPWfdPxGsoaC N9df3Lg972Nn74g75ekSpmSCMxNOr2Zo/hGg3yUNlzwXq4+eoHgCbccBU N+JQY6wWLGQ1eT9VDnhGZmJovZZkeAretk1iDHQI463Zk1POZLN+FzLvJ YcQyw+PRE+gc1CCOjYVcZjQj+ExkNprVyT+ZavTq/CALucGyVVpvCqT1o Vr9WgoRK96YQIxMrk9Oc11YcD7as8hRaGSVafUu2Rthks+k7DFMuN6sJp A==; X-CSE-ConnectionGUID: QENOW8dKTZ6AKEbZtJLP2A== X-CSE-MsgGUID: kSTuJS5uS+iNe6EiAXcVrQ== X-IronPort-AV: E=McAfee;i="6600,9927,11033"; a="8095296" X-IronPort-AV: E=Sophos;i="6.07,179,1708416000"; d="scan'208";a="8095296" Received: from fmviesa003.fm.intel.com ([10.60.135.143]) by fmvoesa110.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 04 Apr 2024 06:29:17 -0700 X-CSE-ConnectionGUID: iAA/AvYbSo6m8l0IeqA45Q== X-CSE-MsgGUID: CpL1kUm4RY+JVd3Iw8oTAw== X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.07,179,1708416000"; d="scan'208";a="23279262" Received: from fmsmsx601.amr.corp.intel.com ([10.18.126.81]) by fmviesa003.fm.intel.com with ESMTP/TLS/AES256-GCM-SHA384; 04 Apr 2024 06:29:16 -0700 Received: from fmsmsx611.amr.corp.intel.com (10.18.126.91) by fmsmsx601.amr.corp.intel.com (10.18.126.81) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Thu, 4 Apr 2024 06:29:16 -0700 Received: from fmsmsx612.amr.corp.intel.com (10.18.126.92) by fmsmsx611.amr.corp.intel.com (10.18.126.91) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35; Thu, 4 Apr 2024 06:29:15 -0700 Received: from fmsedg602.ED.cps.intel.com (10.1.192.136) by fmsmsx612.amr.corp.intel.com (10.18.126.92) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256) id 15.1.2507.35 via Frontend Transport; Thu, 4 Apr 2024 06:29:15 -0700 Received: from NAM10-DM6-obe.outbound.protection.outlook.com (104.47.58.100) by edgegateway.intel.com (192.55.55.71) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.35; Thu, 4 Apr 2024 06:29:15 -0700 ARC-Seal: i=1; a=rsa-sha256; s=arcselector9901; d=microsoft.com; cv=none; b=OzRlHrgsFscYwmesdr9nGfh79ohLSQTIifM8dlB0hZMvMHJE4wrWvCszjPgWVs81++oCeFXX5YgsnhLLaJWvgt4eG5hgtTHGoan6SyenHWS0ySJ3LYAniTfaKvEvGZUYTkaYSISiEnqM5xxJGZzT80oJnOEjU+rA+qS4tLWDegl7eXovPdqlN4+H7svYwE0v13I3O1bYgkOfPaIg76FSfYOOYW2etoJZnt+ii7iIelJfh7j7FfmUjzE5m+8v5M4zqFEU65j/A5gpN/h0qLeUAoLCHBKWxbMdrU5+nlKS1gH0u7m3Ql8aWUM6TocBdil7uypqHTMylmITt8AzPlBJ1A== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=microsoft.com; s=arcselector9901; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-AntiSpam-MessageData-ChunkCount:X-MS-Exchange-AntiSpam-MessageData-0:X-MS-Exchange-AntiSpam-MessageData-1; bh=eYWkUzK1K4D5ckN3L5BB4qFUV4zfhRu+kkkYHaXLNH8=; b=lAVdMnVgx8hLLvAAaR/ePPTEpLGhLH2xsFz6VGtbzdhDW6R7tLfPpyUS9jYcWGP52MKYIELenUMfV0gJcjEoYXrCv9m3N22AS2jcgD/lWRs4J8MzsBCQcWFSSF5qIOmE12rk17dCsP/5Z03FyGwtFv0MLcdR8LkXm7UrdOSmigu+eGPghAFl8FJsb3NW0Z82E0rywq8+q+Kv/4yaLhbh0XyNpvOchZiam6WHHx26tlzXDSRDS2Rt3zSCBIju1yLjEPhrdy47VhZZLrd17bkHe3MPWQy4PJJxW1/jnrBW3wZqdHmi09FQ8kHDpw+jXgCdSIcz+gGvoyV575ghrsJ1VQ== ARC-Authentication-Results: i=1; mx.microsoft.com 1; spf=pass smtp.mailfrom=intel.com; dmarc=pass action=none header.from=intel.com; dkim=pass header.d=intel.com; arc=none Received: from DS0PR11MB7309.namprd11.prod.outlook.com (2603:10b6:8:13e::17) by DS7PR11MB6149.namprd11.prod.outlook.com (2603:10b6:8:9e::19) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.7452.25; Thu, 4 Apr 2024 13:29:13 +0000 Received: from DS0PR11MB7309.namprd11.prod.outlook.com ([fe80::487e:e20c:ad88:9c0f]) by DS0PR11MB7309.namprd11.prod.outlook.com ([fe80::487e:e20c:ad88:9c0f%7]) with mapi id 15.20.7452.019; Thu, 4 Apr 2024 13:29:13 +0000 Date: Thu, 4 Apr 2024 14:29:08 +0100 From: Bruce Richardson To: Morten =?iso-8859-1?Q?Br=F8rup?= CC: , , , Subject: Re: [PATCH v2] eal/x86: improve rte_memcpy const size 16 performance Message-ID: References: <20240302234812.9137-1-mb@smartsharesystems.com> <20240303094621.16404-1-mb@smartsharesystems.com> <98CBD80474FA8B44BF855DF32C47DC35E9F35C@smartserver.smartshare.dk> Content-Type: text/plain; charset="iso-8859-1" Content-Disposition: inline Content-Transfer-Encoding: 8bit In-Reply-To: <98CBD80474FA8B44BF855DF32C47DC35E9F35C@smartserver.smartshare.dk> X-ClientProxiedBy: DUZPR01CA0111.eurprd01.prod.exchangelabs.com (2603:10a6:10:4bb::12) To DS0PR11MB7309.namprd11.prod.outlook.com (2603:10b6:8:13e::17) MIME-Version: 1.0 X-MS-PublicTrafficType: Email X-MS-TrafficTypeDiagnostic: DS0PR11MB7309:EE_|DS7PR11MB6149:EE_ X-MS-Exchange-SenderADCheck: 1 X-MS-Exchange-AntiSpam-Relay: 0 X-Microsoft-Antispam: BCL:0; X-Microsoft-Antispam-Message-Info: 4wnE3loIZJyhWWCxy+S37XHiX6XHP4uecDlgL+p5FXw+FhfINCQuq5kXH4G1G4iPpBg0Zpp3tvKzA485ZcwRm0wEFrxeQZpIe8MVJlp28KZE63EBWJsQZEgjmo8W/HMJeZ7F5fc0oSxlZ5uTNhSqUuIreLVnR7xNDOCWpCcaT4MG/EsrMEgLnJvMo55pVj1xN0ixjgywMFNtbqiIU1pai+pCe9fOAkG/Ch2CVA2NDe1UMkSfv6VltIg2PGcuKs2S6LBrsxFUAQZE2w4vLuYX7NEhIOhJ58+o82+42UKkSj8c/MwLiraeuYzHDoeHSRujrP4E6yDUdJASMUSDLDrpSXFM0618TGzNRVfLDqe4A9Xe0WtbN77WdbFqZ/s6vpjmIfeT1+z3EjoVFFljECyZDH/CHaRK81HtAJEg0UVnCsCNsSRshL6Mwy7ClKvbOjuj+8rCvmG5hAjQpJSCP6WcfO8nuEhfKG9qXi/QDft9xtMVIfn7Y9bisyOEjB2j1m1e+gib05V5PJg4nJwTUSMKjhv9s5UImebLougex9TKW5anPie/yQdRf75D3xW4TxkWkCtHTE5ZUZo52MEPcs0XdFQo8QbUT1scuABfwmbunNbq0d2YuvVAMSUwH7Q9VeLi X-Forefront-Antispam-Report: CIP:255.255.255.255; CTRY:; LANG:en; SCL:1; SRV:; IPV:NLI; SFV:NSPM; H:DS0PR11MB7309.namprd11.prod.outlook.com; PTR:; CAT:NONE; SFS:(13230031)(366007)(376005)(1800799015); DIR:OUT; SFP:1102; X-MS-Exchange-AntiSpam-MessageData-ChunkCount: 1 X-MS-Exchange-AntiSpam-MessageData-0: =?iso-8859-1?Q?Scbjyrt7OBdz681RrJOmXyu9QcQZT9M7i+0t4Ze2/lU05UC/hqadRBr6s7?= =?iso-8859-1?Q?iPaSBs3RIfAZN27aldgRFGowz3gHMUqz6c1w/gXdDmwdpttJy3QP9QQnzQ?= =?iso-8859-1?Q?5xhp6k5Zuk0quN8cWs4Ts25MPSVPojtpVccjVxyzZbCoEoVQG/SLdhC070?= =?iso-8859-1?Q?TcIMPY73/dDg+CCRGWbeO0OpOLOtF3C3vwD69NBlIyyeczvjT2ZB7QKAy0?= =?iso-8859-1?Q?lLEBfLtVMDztSlnnp9R3rzUac56uJ7pCGcGU5MA0N76n15ffvelkKBI+tJ?= =?iso-8859-1?Q?b0yFwcjg7s/9uUFq22xyRjBcXAFwno0eEQwURcK8t1bUBfxKANX92q/x9A?= =?iso-8859-1?Q?t1f0mKZpOpfDF1jp6wDwBUmkd/DTMa+3rPHtW/OicAV5Y2UYf2YGKAEYNk?= =?iso-8859-1?Q?qe7eBYzRSSe8A27+F2iyEFTE5yvT0ydjm4cqVFzgtrrJkWegVx24BRZvF8?= =?iso-8859-1?Q?efqprnQAJvHgZ8CKURdDkJnw9oA0jclSIIbEhLBUC8UyJxgW9wCOmslR72?= =?iso-8859-1?Q?73rpWeuD2+avWPKyiuzrZva3CuOfc7oJkmxJlrNd8IvciNKij0amqxWNNp?= =?iso-8859-1?Q?8xY41lqekAbSJ06mB0roxS6Ns9nakVe//VzXlBBGYwq6uyBZIxhNhUrIuT?= =?iso-8859-1?Q?UYNzurMHvs72ZrL2m7xlkuGLGlxvmSdDyojRJsUMerKfUXA4HuLCLWxrGO?= =?iso-8859-1?Q?XYdV+R8hzbqwBdDpGjQXyfssiVJl2xiOTAAx83jsptlEbKMA5XGclkuyWr?= =?iso-8859-1?Q?gIxi9X/nID9Yn97W4yeczsL7Ncny+JlXrxJWUttgMz7/crAM/3JQuI8udc?= =?iso-8859-1?Q?nHBRXCSaZ46Xs7+YYwF4VluT6w70CWNCVeeraHJYTk9FTl8H+qCIn8JIh7?= =?iso-8859-1?Q?WN3gSpcfhnBx6EoV8nc3Jj1xc2LYNxawTUXHQX5C8KchFOjQJsuE61PRLN?= =?iso-8859-1?Q?Zpmx4TFp10rIozoobYwbi+IjGbpShrXsoRkx0jBEW6g3DnlCCnWA7KoT+O?= =?iso-8859-1?Q?sbDDiHS9mkOKGtFJtZSEK5zvfVSg1eO9lGGyzVmB16ubgPIN0S1S4YB1hQ?= =?iso-8859-1?Q?g1zUMNrQt9dHJN8zSi0WhdVVnhNJA68WATBlW0Rz37PSGg3+zT3F5gZKuh?= =?iso-8859-1?Q?VhfUiq1XvDnHYF8T0huVStkb5pjPz8bfGs+sp6a7bBANITdBr+STT8pEv7?= =?iso-8859-1?Q?qeTRO4DX1U8zNVlvnYSHQA9iGZkspsyz0/wKE+5VrAASfhyIWO7GQTV8tm?= =?iso-8859-1?Q?8yn5mxnuiWS+MJTWcjYfL0s+HZoV3sDATDKfqwA17JYRAVUIq5ulaoS26D?= =?iso-8859-1?Q?EoDtG9KYcWp+bzMzbdk3ZQ+hFfj/1kFgvnrWOGJyv5cC3p5EKUayzW00xn?= =?iso-8859-1?Q?C5exH7PKI3byndajDdT1VmpmQ66JDTqQo7+TCZqErErORrOYIJX6GcGEGk?= =?iso-8859-1?Q?T9XKFCLZ/Gpcj3d6upZy79dxHDaUP2hDrdytyW0NuQyGywxTngxGgYjFua?= =?iso-8859-1?Q?hpQa5e+qw5CPzTYso6ukZ1RrETgsGowXT/svjUYGLAAsa5kkeMirC8cc0Q?= =?iso-8859-1?Q?bEyp/2sF1nreoocjsGXKLc/0DBT/1kmnrHG+KKkPYU6AIgrn/liXjPc+LT?= =?iso-8859-1?Q?jfXfs0BZj2f7lsw1PeZMtg4Amtl1LvSrdU1YObAhf/kSnTUlA44cMpwA?= =?iso-8859-1?Q?=3D=3D?= X-MS-Exchange-CrossTenant-Network-Message-Id: 880800c9-67d0-4825-cd2f-08dc54ab37c9 X-MS-Exchange-CrossTenant-AuthSource: DS0PR11MB7309.namprd11.prod.outlook.com X-MS-Exchange-CrossTenant-AuthAs: Internal X-MS-Exchange-CrossTenant-OriginalArrivalTime: 04 Apr 2024 13:29:13.4570 (UTC) X-MS-Exchange-CrossTenant-FromEntityHeader: Hosted X-MS-Exchange-CrossTenant-Id: 46c98d88-e344-4ed4-8496-4ed7712e255d X-MS-Exchange-CrossTenant-MailboxType: HOSTED X-MS-Exchange-CrossTenant-UserPrincipalName: JD3rcmdtmJvD0VLn1AQFo/SN/NEkcI01DQ1RnUutbg/sz81oPUt7qktIA/eGETFDqqiVgf1MTgR8HUImw65tFaTYXvnhCxD0KS9PnfpRId4= X-MS-Exchange-Transport-CrossTenantHeadersStamped: DS7PR11MB6149 X-OriginatorOrg: intel.com X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org On Thu, Apr 04, 2024 at 01:19:54PM +0200, Morten Brørup wrote: > > From: Bruce Richardson [mailto:bruce.richardson@intel.com] > > Sent: Thursday, 4 April 2024 12.07 > > > > On Sun, Mar 03, 2024 at 10:46:21AM +0100, Morten Brørup wrote: > > > When the rte_memcpy() size is 16, the same 16 bytes are copied twice. > > > In the case where the size is known to be 16 at build tine, omit the > > > duplicate copy. > > > > > > Reduced the amount of effectively copy-pasted code by using #ifdef > > > inside functions instead of outside functions. > > > > > > Suggested-by: Stephen Hemminger > > > Signed-off-by: Morten Brørup > > > > Changes in general look good to me. Comments inline below. > > > > /Bruce > > > > > --- > > > v2: > > > * For GCC, version 11 is required for proper AVX handling; > > > if older GCC version, treat AVX as SSE. > > > Clang does not have this issue. > > > Note: Original code always treated AVX as SSE, regardless of compiler. > > > * Do not add copyright. (Stephen Hemminger) > > > --- > > > lib/eal/x86/include/rte_memcpy.h | 231 ++++++++----------------------- > > > 1 file changed, 56 insertions(+), 175 deletions(-) > > > > > > diff --git a/lib/eal/x86/include/rte_memcpy.h > > b/lib/eal/x86/include/rte_memcpy.h > > > index 72a92290e0..d1df841f5e 100644 > > > --- a/lib/eal/x86/include/rte_memcpy.h > > > +++ b/lib/eal/x86/include/rte_memcpy.h > > > @@ -91,14 +91,6 @@ rte_mov15_or_less(void *dst, const void *src, size_t n) > > > return ret; > > > } > > > > > > -#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 > > > - > > > -#define ALIGNMENT_MASK 0x3F > > > - > > > -/** > > > - * AVX512 implementation below > > > - */ > > > - > > > /** > > > * Copy 16 bytes from one location to another, > > > * locations should not overlap. > > > @@ -119,10 +111,16 @@ rte_mov16(uint8_t *dst, const uint8_t *src) > > > static __rte_always_inline void > > > rte_mov32(uint8_t *dst, const uint8_t *src) > > > { > > > +#if (defined __AVX512F__ && defined RTE_MEMCPY_AVX512) || defined __AVX2__ > > || \ > > > + (defined __AVX__ && !(defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION > > < 110000))) > > > > I think we can drop the AVX512 checks here, since I'm not aware of any > > system where we'd have AVX512 but not AVX2 available, so just checking for > > AVX2 support should be sufficient. > > RTE_MEMCPY_AVX512 must be manually defined at build time to enable AVX512: > https://elixir.bootlin.com/dpdk/latest/source/lib/eal/include/generic/rte_memcpy.h#L98 > > Without it, the AVX2 version will be used, regardless if the CPU has AVX512. > > Also, there are some binutils bugs that might disable compilation for AVX512: > https://elixir.bootlin.com/dpdk/latest/source/config/x86/meson.build#L4 > https://elixir.bootlin.com/dpdk/latest/source/config/x86/meson.build#L17 > Yes, I realise that, but the guard here is for an AVX2 block only, so there is no point in checking for AVX512 - it's AVX512 or AVX2. > > > > On the final compiler-based check, I don't strongly object to it, but I > > just wonder as to its real value. AVX2 was first introduced by Intel over 10 > > years ago, and (from what I find in wikipedia), it's been in AMD CPUs since > > ~2015. While we did have CPUs still being produced without AVX2 since that > > time, they generally didn't have AVX1 either, only having SSE instructions. > > Therefore the number of systems which require this additional check is > > likely very small at this stage. > > That said, I'm ok to either keep or omit it at your choice. > > I kept it for consistency, and to support older compilers still officially supported by DPDK. > > I don't feel qualified to change support for CPU features; I'll leave that to the CPU vendors. > Also, I have no clue what has been produced by Intel and AMD. :-) > > > If you do keep > > it, how about putting the check once at the top of the file and using a > > single short define instead for the multiple places it's used e.g. > > > > #if (defined __AVX__ && !(defined(RTE_TOOLCHAIN_GCC) && (GCC_VERSION < > > 110000))) > > #define RTE_MEMCPY_AVX2 > > #endif > > Much of the code reorganization in this patch was done with the intention to improve readability. > > And I don't think this suggestion improves readability; especially considering that RTE_MEMCPY_AVX512 is something manually defined. > > However, I get your point; and if the conditional was very long or very complex, I might agree to a "shadow" definition to keep it short. > I just find it long enough that duplication of it seems painful. :-) I'd rather we check once at the top if we can use an AVX copy vs SSE, rather than duplicate the compiler version checks multiple times. > > > > > > > __m256i ymm0; > > > > > > ymm0 = _mm256_loadu_si256((const __m256i *)src); > > > _mm256_storeu_si256((__m256i *)dst, ymm0); > > > +#else /* SSE implementation */ > > > + rte_mov16((uint8_t *)dst + 0 * 16, (const uint8_t *)src + 0 * 16); > > > + rte_mov16((uint8_t *)dst + 1 * 16, (const uint8_t *)src + 1 * 16); > > > +#endif > > > } > > > > > > /** > > > @@ -132,10 +130,15 @@ rte_mov32(uint8_t *dst, const uint8_t *src) > > > static __rte_always_inline void > > > rte_mov64(uint8_t *dst, const uint8_t *src) > > > { > > > +#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 > > > __m512i zmm0; > > > > > > zmm0 = _mm512_loadu_si512((const void *)src); > > > _mm512_storeu_si512((void *)dst, zmm0); > > > +#else /* AVX2, AVX & SSE implementation */ > > > + rte_mov32((uint8_t *)dst + 0 * 32, (const uint8_t *)src + 0 * 32); > > > + rte_mov32((uint8_t *)dst + 1 * 32, (const uint8_t *)src + 1 * 32); > > > +#endif > > > } > > > > > > /** > > > @@ -156,12 +159,18 @@ rte_mov128(uint8_t *dst, const uint8_t *src) > > > static __rte_always_inline void > > > rte_mov256(uint8_t *dst, const uint8_t *src) > > > { > > > - rte_mov64(dst + 0 * 64, src + 0 * 64); > > > - rte_mov64(dst + 1 * 64, src + 1 * 64); > > > - rte_mov64(dst + 2 * 64, src + 2 * 64); > > > - rte_mov64(dst + 3 * 64, src + 3 * 64); > > > + rte_mov128(dst + 0 * 128, src + 0 * 128); > > > + rte_mov128(dst + 1 * 128, src + 1 * 128); > > > } > > > > > > +#if defined __AVX512F__ && defined RTE_MEMCPY_AVX512 > > > + > > > +/** > > > + * AVX512 implementation below > > > + */ > > > + > > > +#define ALIGNMENT_MASK 0x3F > > > + > > > /** > > > * Copy 128-byte blocks from one location to another, > > > * locations should not overlap. > > > @@ -231,12 +240,22 @@ rte_memcpy_generic(void *dst, const void *src, size_t > > n) > > > /** > > > * Fast way when copy size doesn't exceed 512 bytes > > > */ > > > + if (__builtin_constant_p(n) && n == 32) { > > > + rte_mov32((uint8_t *)dst, (const uint8_t *)src); > > > + return ret; > > > + } > > > > There's an outstanding patchset from Stephen to replace all use of > > rte_memcpy with a constant parameter with an actual call to regular memcpy. > > On a wider scale should we not look to do something similar in this file, > > have calls to rte_memcpy with constant parameter always turn into a call to > > regular memcpy? We used to have such a macro in older DPDK e.g. > > from DPDK 1.8 > > > > http://git.dpdk.org/dpdk/tree/lib/librte_eal/common/include/arch/x86/rte_memcp > > y.h?h=v1.8.0#n171 > > > > This would elminiate the need to put in constant_p checks all through the > > code. > > The old macro in DPDK 1.8 was removed with the description "Remove slow glibc call for constant copies": > https://git.dpdk.org/dpdk/commit/lib/librte_eal/common/include/arch/x86/rte_memcpy.h?id=9144d6bcdefd5096a9f3f89a3ce433a54ed84475 > > Stephen believes that the memcpy() built-ins provided by compilers are faster than rte_memcpy() for constant size. > I'm not convinced. > Such a change should be backed up by performance tests, preferably for all supported compilers - especially the old compilers that come with some of the supported distros might not be as good as we would hope. > I would tend to agree with Stephen that whereever possible we should use the built-in memcpy calls. Hence my suggestion of re-introducing the macro. I'm not sure why it previously was seen as slower, it may be that the compiler-expanded memcpy calls are not done beyond a certain size. However, since we lack data, I'm ok with taking the changes in your patch as-is. With the above-flagged superfluous AVX512 check on AVX2 code removed: Acked-by: Bruce Richardson