From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 18 Dec 2017 09:47:43 +0530
From: Jerin Jacob
To: Herbert Guan
Cc: Jianbo Liu, "dev@dpdk.org"
Message-ID: <20171218041742.GA5033@jerin>
References: <1511768985-21639-1-git-send-email-herbert.guan@arm.com> <20171129123154.GA22644@jerin> <20171215040623.GB5874@jerin>
Subject: Re: [dpdk-dev] [PATCH] arch/arm: optimization for memcpy on AArch64
List-Id: DPDK patches and discussions

-----Original Message-----
> Date: Mon, 18 Dec 2017 02:51:19 +0000
> From: Herbert Guan
> To: Jerin Jacob
> CC: Jianbo Liu, "dev@dpdk.org"
> Subject: RE: [PATCH] arch/arm: optimization for memcpy on AArch64
>
> Hi Jerin,

Hi Herbert,

> > >
> > > Here the value of '64' is not the cache line size. But because
> > > prefetch itself costs some cycles, it's not worthwhile to prefetch for
> > > small copies (e.g. < 64 bytes). Per my test, prefetching for small copies
> > > actually lowers performance.
> >
> > But
> > I think '64' is a function of cache size, i.e. is there any reason why we haven't used the
> > rte_memcpy_ge16_lt128()/rte_memcpy_ge128 pair instead of the
> > rte_memcpy_ge16_lt64/rte_memcpy_ge64 pair?
> > I think if you can add one more conditional compilation to choose between
> > rte_memcpy_ge16_lt128()/rte_memcpy_ge128 and
> > rte_memcpy_ge16_lt64/rte_memcpy_ge64,
> > that will address all the arm64 variants supported in current DPDK.
> >
>
> The logic for 128B cache lines is implemented as you've suggested, and has been added in the V3 patch.
>
> > >
> > > On the other hand, I can only find one 128B cache line aarch64 machine here.
> > > And there do exist some optimizations specific to this machine. Not sure if they'll be
> > > beneficial for other 128B cache machines or not. I prefer not to put them in this
> > > patch but in a later standalone patch specific to 128B cache machines.
> > >
> > > > > +__builtin_prefetch(src, 0, 0); // rte_prefetch_non_temporal(src);
> > > > > +__builtin_prefetch(dst, 1, 0); // * unchanged *
> >
> > # Why is __builtin_prefetch used only once? Why not invoke it in the
> > rte_memcpy_ge64 loop?
> > # Does it make sense to prefetch src + 64/128 * n?
>
> Prefetch is only necessary once at the beginning. The CPU will do auto-incremental prefetch once the continuous memory access starts, so it's not necessary to prefetch in the loop. In fact, doing it in the loop will break the CPU's HW prefetch and degrade performance.

Yes. But the aarch64 specification does not mandate that all implementations
have a HW prefetch mechanism (i.e. it is IMPLEMENTATION DEFINED).
I think you have provided a good start for the memcpy implementation, and we
can fine-tune it _later_ based on the different microarchitectures.

Your v3 looks good.

> IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

Please remove such notice from public mailing list.
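
For readers following the thread, here is a rough sketch of the two ideas discussed above: choosing the small/bulk split at compile time from the cache line size, and issuing the prefetch once before the copy loop rather than inside it. This is not the code from the patch. All SKETCH_* and sketch_* names are hypothetical, SKETCH_CACHE_LINE_SIZE is an assumed stand-in for the build-time value that DPDK exposes as RTE_CACHE_LINE_SIZE, and plain memcpy() stands in for the real per-block copy routines.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for the build-time cache line size
 * (DPDK provides RTE_CACHE_LINE_SIZE for this purpose). */
#ifndef SKETCH_CACHE_LINE_SIZE
#define SKETCH_CACHE_LINE_SIZE 64
#endif

/* Conditional compilation picks the 64-byte or 128-byte split. */
#if SKETCH_CACHE_LINE_SIZE >= 128
#define SKETCH_BULK_THRESHOLD 128
#else
#define SKETCH_BULK_THRESHOLD 64
#endif

/* Hypothetical small-copy path: no prefetch, since the prefetch
 * itself costs cycles that a short copy cannot win back. */
static inline void
sketch_copy_small(uint8_t *dst, const uint8_t *src, size_t n)
{
	memcpy(dst, src, n);
}

/* Hypothetical bulk path: prefetch once, then copy block by block. */
static inline void
sketch_copy_bulk(uint8_t *dst, const uint8_t *src, size_t n)
{
	/* Issued once before the loop; later lines are left to the
	 * hardware prefetcher, where the implementation has one. */
	__builtin_prefetch(src, 0, 0);
	__builtin_prefetch(dst, 1, 0);

	while (n >= SKETCH_BULK_THRESHOLD) {
		memcpy(dst, src, SKETCH_BULK_THRESHOLD);
		dst += SKETCH_BULK_THRESHOLD;
		src += SKETCH_BULK_THRESHOLD;
		n -= SKETCH_BULK_THRESHOLD;
	}
	if (n > 0)
		memcpy(dst, src, n); /* remaining tail bytes */
}

/* Hypothetical entry point: dispatch on the compile-time threshold. */
static inline void *
sketch_memcpy(void *dst, const void *src, size_t n)
{
	if (n < SKETCH_BULK_THRESHOLD)
		sketch_copy_small((uint8_t *)dst, (const uint8_t *)src, n);
	else
		sketch_copy_bulk((uint8_t *)dst, (const uint8_t *)src, n);
	return dst;
}
```

Building with -DSKETCH_CACHE_LINE_SIZE=128 selects the 128-byte split. Whether a single up-front prefetch is enough depends on the core, since hardware prefetch is IMPLEMENTATION DEFINED, so any per-microarchitecture tuning would need measurement on the target machine.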