From: "Ananyev, Konstantin"
To: Morten Brørup, "Richardson, Bruce"
CC: Jan Viktorin, Ruifeng Wang, David Christensen, "dev@dpdk.org"
Subject: RE: rte_memcpy alignment
Date: Fri, 14 Jan 2022 10:54:14 +0000
> -----Original Message-----
> From: Morten Brørup
> Sent: Friday, January 14, 2022 9:54 AM
> To: Richardson, Bruce
> Cc: Jan Viktorin; Ruifeng Wang; David Christensen;
> Ananyev, Konstantin; dev@dpdk.org
> Subject: RE: rte_memcpy alignment
>
> > From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> > Sent: Friday, 14 January 2022 10.11
> >
> > On Fri, Jan 14, 2022 at 09:56:50AM +0100, Morten Brørup wrote:
> > > Dear ARM/POWER/x86 maintainers,
> > >
> > > The architecture-specific rte_memcpy() provides optimized variants for
> > > copying aligned data. However, the alignment requirements depend on the
> > > hardware architecture, and there is no common definition for the
> > > alignment.
> > >
> > > DPDK provides __rte_cache_aligned for cache optimization purposes,
> > > with architecture-specific values. Would you consider providing an
> > > __rte_memcpy_aligned for rte_memcpy() optimization purposes?
> > >
> > > Or should I just use __rte_cache_aligned, although it is overkill?
> > >
> > > Specifically, I am working on a mempool optimization where the objs
> > > field in the rte_mempool_cache structure may benefit from being aligned
> > > for optimized rte_memcpy().
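To make the request concrete, here is a minimal sketch of what an architecture-dependent alignment attribute in the style of __rte_cache_aligned could look like. DEMO_MEMCPY_ALIGN, DEMO_MEMCPY_ALIGNED, demo_objs and is_memcpy_aligned() are hypothetical names, not DPDK APIs, and the ISA tests are simplified assumptions: the real x86 rte_memcpy header uses its own build conditions when choosing ALIGNMENT_MASK.

```c
#include <stdint.h>

/*
 * Hypothetical, NOT a DPDK macro: choose an alignment from the vector ISA
 * the compiler targets. Simplified assumption; DPDK's real header may use
 * additional build options to select its AVX-512 path.
 */
#if defined(__AVX512F__)
#define DEMO_MEMCPY_ALIGN 64
#elif defined(__AVX2__)
#define DEMO_MEMCPY_ALIGN 32
#else
#define DEMO_MEMCPY_ALIGN 16
#endif

/* Hypothetical attribute, analogous in spirit to __rte_cache_aligned. */
#define DEMO_MEMCPY_ALIGNED __attribute__((aligned(DEMO_MEMCPY_ALIGN)))

/* Example use: an array of object pointers placed on that boundary. */
static void *demo_objs[64] DEMO_MEMCPY_ALIGNED;

/* Returns nonzero if p sits on a DEMO_MEMCPY_ALIGN boundary. */
static inline int
is_memcpy_aligned(const void *p)
{
	return ((uintptr_t)p & (DEMO_MEMCPY_ALIGN - 1)) == 0;
}
```

Because the value follows the target ISA, the same source yields 16-, 32- or 64-byte alignment depending on how it is compiled, which is exactly why a single public definition would be convenient.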
> > >
> >
> > For me the difficulty with such a memcpy proposal - apart from probably
> > adding to the amount of memcpy code we have to maintain - is the specific
> > meaning of "aligned" in the memcpy case. Unlike for a struct definition,
> > the possible meanings of "aligned" in memcpy could be:
> > * the source address is aligned
> > * the destination address is aligned
> > * both source and destination are aligned
> > * both source and destination are aligned and the copy length is a
> >   multiple of the alignment length
> > * the data is aligned to a cacheline boundary
> > * the data is aligned to the largest load-store size for the system
> > * the data is aligned to the boundary suitable for the copy size, e.g.
> >   memcpy of 8 bytes is 8-byte aligned etc.
> >
> > Can you clarify a bit more on your own thinking here? Personally, I am a
> > little dubious of the benefit of general memcpy optimization, but I do
> > believe that for specific usecases there is value in having their own copy
> > operations which include constraints for that specific usecase. For
> > example, in the AVX-512 ice/i40e PMD code, we fold the memcpy from the
> > mempool cache into the descriptor rearm function because we know we can
> > always do 64-byte loads and stores, and also because we know that for each
> > load in the copy, we can reuse the data just after storing it (giving a
> > good perf boost). Perhaps something similar could work for you in your
> > mempool optimization.
> >
> > /Bruce
>
> I'm going to copy an array of pointers, specifically the 'objs' array in
> the rte_mempool_cache structure.
>
> The 'objs' array starts at byte 24, which is only 8-byte aligned. So it
> always fails the ALIGNMENT_MASK test in the x86-specific rte_memcpy(), and
> thus can never use the optimized rte_memcpy_aligned() function to copy the
> array, but will use the rte_memcpy_generic() function instead.
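The effect of the field offset can be checked with offsetof. The two structures below are hypothetical stand-ins, not the real struct rte_mempool_cache: the header is modelled as three 8-byte words only so that objs lands at byte 24, as described above. Assuming the enclosing structure itself starts on a 64-byte boundary, the field offset alone decides the best alignment the array can have.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in, NOT the real struct rte_mempool_cache:
 * a 24-byte header followed by an array of object pointers. */
struct demo_cache_plain {
	uint64_t hdr[3];   /* 24 bytes of header fields (assumed) */
	void *objs[512];   /* starts right after the header: byte 24 */
};

/* Same layout, but objs is forced onto a 64-byte boundary. */
struct demo_cache_aligned {
	uint64_t hdr[3];
	void *objs[512] __attribute__((aligned(64)));
};

/* Largest power of two, capped at 64, that divides 'off': the best
 * alignment a field at offset 'off' can have when the enclosing
 * structure starts on a 64-byte boundary. */
static size_t
offset_alignment(size_t off)
{
	size_t a = 64;

	while (a > 1 && off % a != 0)
		a /= 2;
	return a;
}
```

In this layout, forcing 64-byte alignment moves objs from byte 24 to byte 64, i.e. 40 bytes of padding per cache, which is the cost to weigh against reaching the aligned copy path.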
>
> If the 'objs' array were optimally aligned, and the other array that is
> being copied to/from were also optimally aligned, rte_memcpy() would use
> the optimized rte_memcpy_aligned() function.
>
> Please also note that the value of ALIGNMENT_MASK depends on which vector
> instruction set DPDK is being compiled with.
>
> The other CPU architectures have similar stuff in their rte_memcpy()
> implementations, and their alignment requirements are also different.
>
> Please also note that rte_memcpy() becomes even more optimized when the
> size of the memcpy() operation is known at compile time.

If the size is known at compile time, rte_memcpy() is probably overkill -
modern compilers usually generate fast enough code for such cases.

>
> So I am asking for a public #define __rte_memcpy_aligned I can use to meet
> the alignment requirements for optimal rte_memcpy().

Even on x86, ALIGNMENT_MASK can have different values (15/31/63) depending
on the ISA. So 64 as a 'generic' value is probably the safest bet.
Though I wonder whether we really need such micro-optimizations here.
Would it make such a huge difference if you called rte_memcpy_aligned()
instead of rte_memcpy()?
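A minimal sketch of why 64 works as a 'generic' value: it imitates the x86 dispatch test discussed in this thread, where the aligned path is taken only when neither the source nor the destination address has any ALIGNMENT_MASK bit set. takes_aligned_path(), src_buf and dst_buf are hypothetical illustrations, not DPDK code; the masks 15, 31 and 63 are the values quoted above (as we understand it, for SSE, AVX2 and AVX-512 builds respectively).

```c
#include <stdint.h>

/* Imitation of the dispatch test (not DPDK code): the aligned path is
 * taken only if neither address has any of the mask bits set. */
static int
takes_aligned_path(const void *dst, const void *src, uintptr_t mask)
{
	return (((uintptr_t)dst | (uintptr_t)src) & mask) == 0;
}

/* Hypothetical buffers aligned to the 'generic' 64-byte value. */
static uint8_t src_buf[256] __attribute__((aligned(64)));
static uint8_t dst_buf[256] __attribute__((aligned(64)));
```

A 64-byte boundary clears all three masks, while an address that is only 16-byte aligned passes the 15 mask but fails the 31 mask, so aligning to 64 covers every x86 variant at the price of extra padding.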