From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 15 Dec 2017 09:36:24 +0530
From: Jerin Jacob
To: Herbert Guan
Cc: Jianbo Liu, "dev@dpdk.org"
Message-ID: <20171215040623.GB5874@jerin>
References: <1511768985-21639-1-git-send-email-herbert.guan@arm.com>
 <20171129123154.GA22644@jerin>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.9.1 (2017-09-22)
Subject: Re: [dpdk-dev] [PATCH] arch/arm: optimization for memcpy on AArch64
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.15
Precedence: list
List-Id: DPDK patches and discussions
X-List-Received-Date: Fri, 15 Dec 2017 04:06:46 -0000

-----Original Message-----
> Date: Sun, 3 Dec 2017 12:37:30 +0000
> From: Herbert Guan
> To: Jerin Jacob
> CC: Jianbo Liu, "dev@dpdk.org"
> Subject: RE: [PATCH] arch/arm: optimization for memcpy on AArch64
>
> Jerin,

Hi Herbert,

>
> Thanks a lot for your review and comments.  Please find my comments
> below inline.
>
> Best regards,
> Herbert
>
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Wednesday, November 29, 2017 20:32
> > To: Herbert Guan
> > Cc: Jianbo Liu; dev@dpdk.org
> > Subject: Re: [PATCH] arch/arm: optimization for memcpy on AArch64
> >
> > -----Original Message-----
> > > Date: Mon, 27 Nov 2017 15:49:45 +0800
> > > From: Herbert Guan
> > > To: jerin.jacob@caviumnetworks.com, jianbo.liu@arm.com, dev@dpdk.org
> > > CC: Herbert Guan
> > > Subject: [PATCH] arch/arm: optimization for memcpy on AArch64
> > > X-Mailer: git-send-email 1.8.3.1
> > > +
> > > +/**************************************
> > > + * Beginning of customization section
> > > +**************************************/
> > > +#define ALIGNMENT_MASK 0x0F
> > > +#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
> > > +// Only src unalignment will be treated as unaligned copy
> >
> > C++ style comments. They may generate checkpatch errors.
>
> I'll change it to use C style comments in version 2.
>
> >
> > > +#define IS_UNALIGNED_COPY(dst, src) ((uintptr_t)(dst) & ALIGNMENT_MASK)
> > > +#else
> > > +// Both dst and src unalignment will be treated as unaligned copy
> > > +#define IS_UNALIGNED_COPY(dst, src) \
> > > +(((uintptr_t)(dst) | (uintptr_t)(src)) & ALIGNMENT_MASK)
> > > +#endif
> > > +
> > > +
> > > +// If copy size is larger than threshold, memcpy() will be used.
> > > +// Run "memcpy_perf_autotest" to determine the proper threshold.
> > > +#define ALIGNED_THRESHOLD ((size_t)(0xffffffff))
> > > +#define UNALIGNED_THRESHOLD ((size_t)(0xffffffff))
> >
> > Do you see any case where this threshold is useful?
>
> Yes, on some platforms, and/or with some glibc versions, the glibc
> memcpy has better performance for larger sizes (e.g., >512, >4096...).
> So developers should run the unit test to find the best threshold.  The
> default value of 0xffffffff should be replaced with the evaluated
> values.

OK

> >
> > > +
> > > +static inline void *__attribute__ ((__always_inline__))

use __rte_always_inline

> > > +rte_memcpy(void *restrict dst, const void *restrict src, size_t n)
> > > +{
> > > +	if (n < 16) {
> > > +		rte_memcpy_lt16((uint8_t *)dst, (const uint8_t *)src, n);
> > > +		return dst;
> > > +	}
> > > +	if (n < 64) {
> > > +		rte_memcpy_ge16_lt64((uint8_t *)dst, (const uint8_t *)src, n);
> > > +		return dst;
> > > +	}
> >
> > Unfortunately we have 128B cache line arm64 implementations too. Could
> > you please take care of that based on RTE_CACHE_LINE_SIZE
> >
>
> Here the value of '64' is not the cache line size.  But because the
> prefetch itself costs some cycles, it's not worthwhile to prefetch for
> small size (e.g. < 64 bytes) copies.  Per my test, prefetching for
> small size copies will actually lower the performance.

But I think '64' is a function of cache size, i.e. is there any reason
why we haven't used the rte_memcpy_ge16_lt128()/rte_memcpy_ge128 pair
instead of the rte_memcpy_ge16_lt64/rte_memcpy_ge64 pair?

I think that if you add one more conditional compilation to choose
between rte_memcpy_ge16_lt128()/rte_memcpy_ge128 and
rte_memcpy_ge16_lt64/rte_memcpy_ge64, that will address all the arm64
variants supported in current DPDK.

>
> On the other hand, I can only find one 128B cache line aarch64 machine
> here.  And there do exist some specific optimizations for this machine.
> Not sure if they'll be beneficial for other 128B cache machines or not.
> I prefer not to put it in this patch but in a later standalone specific
> patch for 128B cache machines.
>
> > > +	__builtin_prefetch(src, 0, 0);  // rte_prefetch_non_temporal(src);
> > > +	__builtin_prefetch(dst, 1, 0);  //  * unchanged *

# Why is __builtin_prefetch used only once? Why not invoke it in the
  rte_memcpy_ge64 loop?
# Does it make sense to prefetch src + 64/128 * n?
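
To make the two points above concrete (choosing the 64B or 128B variant at
compile time, and prefetching inside the copy loop instead of once), here
is a rough sketch. It is only an illustration under assumptions, not the
patch's code: every sketch_* and SKETCH_* name is hypothetical, the helpers
are plain stand-ins for rte_memcpy_lt16()/rte_memcpy_ge16_lt64() and
rte_memcpy_ge64(), RTE_CACHE_LINE_SIZE is defaulted to 64 so the file
builds outside DPDK, and __builtin_prefetch assumes GCC or Clang. Whether
the in-loop prefetch actually helps would still have to be measured with
memcpy_perf_autotest.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the DPDK build config normally defines RTE_CACHE_LINE_SIZE.
 * Default it to 64 here only so that this sketch compiles standalone. */
#ifndef RTE_CACHE_LINE_SIZE
#define RTE_CACHE_LINE_SIZE 64
#endif

/* Hypothetical name: block size of the bulk loop, tied to the cache line. */
#if RTE_CACHE_LINE_SIZE >= 128
#define SKETCH_BLOCK_SIZE 128
#else
#define SKETCH_BLOCK_SIZE 64
#endif

/* Stand-in for rte_memcpy_lt16() and rte_memcpy_ge16_lt64() (or _lt128()):
 * a plain byte loop, no prefetch, since prefetch costs cycles on small copies. */
static inline void
sketch_copy_small(uint8_t *dst, const uint8_t *src, size_t n)
{
	while (n--)
		*dst++ = *src++;
}

/* Stand-in for rte_memcpy_ge64()/rte_memcpy_ge128(): copy block by block and
 * prefetch the following block on every iteration rather than only once. */
static inline void
sketch_copy_blocks(uint8_t *dst, const uint8_t *src, size_t n)
{
	while (n >= SKETCH_BLOCK_SIZE) {
		/* Prefetch does not fault, even when the next block is past the end. */
		__builtin_prefetch(src + SKETCH_BLOCK_SIZE, 0, 0);
		__builtin_prefetch(dst + SKETCH_BLOCK_SIZE, 1, 0);
		/* Fixed-size memcpy; compilers typically expand this inline. */
		memcpy(dst, src, SKETCH_BLOCK_SIZE);
		dst += SKETCH_BLOCK_SIZE;
		src += SKETCH_BLOCK_SIZE;
		n -= SKETCH_BLOCK_SIZE;
	}
	/* Copy the tail that is shorter than one block. */
	sketch_copy_small(dst, src, n);
}

/* Hypothetical entry point mirroring the shape of rte_memcpy() in the patch. */
static inline void *
sketch_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	assert(n == 0 || (dst != NULL && src != NULL));

	if (n < SKETCH_BLOCK_SIZE) {
		sketch_copy_small((uint8_t *)dst, (const uint8_t *)src, n);
		return dst;
	}
	sketch_copy_blocks((uint8_t *)dst, (const uint8_t *)src, n);
	return dst;
}
```

Building the same file with -DRTE_CACHE_LINE_SIZE=128 selects the 128-byte
block size, so the small-copy cutoff and the loop stride both follow the
cache line without touching the callers.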