From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 19 Dec 2017 12:54:32 +0530
From: Jerin Jacob
To: Herbert Guan
Cc: "dev@dpdk.org", nd
Message-ID: <20171219072431.GA19364@jerin>
References: <1511768985-21639-1-git-send-email-herbert.guan@arm.com> <1513565664-19509-1-git-send-email-herbert.guan@arm.com> <20171218074349.GA16659@jerin>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.9.2 (2017-12-15)
Subject: Re: [dpdk-dev] [PATCH v3] arch/arm: optimization for memcpy on AArch64
List-Id: DPDK patches and discussions

-----Original Message-----
> Date: Tue, 19 Dec 2017 05:33:19 +0000
> From: Herbert Guan
> To: Jerin Jacob
> CC: "dev@dpdk.org", nd
> Subject: RE: [PATCH v3] arch/arm: optimization for memcpy on AArch64
>
> Jerin,
>
> Thanks for review and comments. Please find my feedbacks below inline.
>
> > -----Original Message-----
> > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> > Sent: Monday, December 18, 2017 15:44
> > To: Herbert Guan
> > Cc: dev@dpdk.org
> > Subject: Re: [PATCH v3] arch/arm: optimization for memcpy on AArch64
> >
> > -----Original Message-----
> > > Date: Mon, 18 Dec 2017 10:54:24 +0800
> > > From: Herbert Guan
> > > To: dev@dpdk.org, jerin.jacob@caviumnetworks.com
> > > CC: Herbert Guan
> > > Subject: [PATCH v3] arch/arm: optimization for memcpy on AArch64
> > > X-Mailer: git-send-email 1.8.3.1
> > >
> > > Signed-off-by: Herbert Guan
> > > ---
> > >  config/common_armv8a_linuxapp                | 6 +
> > >  .../common/include/arch/arm/rte_memcpy_64.h  | 292 +++++++++++++++++++++
> > >  2 files changed, 298 insertions(+)
> > >
> > > diff --git a/config/common_armv8a_linuxapp b/config/common_armv8a_linuxapp
> > > index 6732d1e..8f0cbed 100644
> > > --- a/config/common_armv8a_linuxapp
> > > +++ b/config/common_armv8a_linuxapp
> > > @@ -44,6 +44,12 @@ CONFIG_RTE_FORCE_INTRINSICS=y
> > >  # to address minimum DMA alignment across all arm64 implementations.
> > >  CONFIG_RTE_CACHE_LINE_SIZE=128
> > >
> > > +# Accelarate rte_memcpy. Be sure to run unit test to determine the
> >
> > Additional space before "Be". Rather than just mentioning the unit test,
> > mention the absolute test case name(memcpy_perf_autotest)
> >
> > > +# best threshold in code. Refer to notes in source file
> >
> > Additional space before "Refer"
>
> Fixed in new version.
>
> >
> > > +# (lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h) for more
> > > +# info.
> > > +CONFIG_RTE_ARCH_ARM64_MEMCPY=n
> > > +
> > >  CONFIG_RTE_LIBRTE_FM10K_PMD=n
> > >  CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n
> > >  CONFIG_RTE_LIBRTE_AVP_PMD=n
> > > diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> > > index b80d8ba..1ea275d 100644
> > > --- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> > > +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> > > @@ -42,6 +42,296 @@
> > >
> > >  #include "generic/rte_memcpy.h"
> > >
> > > +#ifdef RTE_ARCH_ARM64_MEMCPY
> >
> > See the comment below at "(GCC_VERSION < 50400)" check
> >
> > > +#include
> > > +#include
> > > +
> > > +/*
> > > + * The memory copy performance differs on different AArch64 micro-architectures.
> > > + * And the most recent glibc (e.g. 2.23 or later) can provide a better memcpy()
> > > + * performance compared to old glibc versions. It's always suggested to use a
> > > + * more recent glibc if possible, from which the entire system can get benefit.
> > > + *
> > > + * This implementation improves memory copy on some aarch64 micro-architectures,
> > > + * when an old glibc (e.g. 2.19, 2.17...) is being used. It is disabled by
> > > + * default and needs "RTE_ARCH_ARM64_MEMCPY" defined to activate. It's not
> > > + * always providing better performance than memcpy() so users need to run unit
> > > + * test "memcpy_perf_autotest" and customize parameters in customization section
> > > + * below for best performance.
> > > + *
> > > + * Compiler version will also impact the rte_memcpy() performance. It's observed
> > > + * on some platforms and with the same code, GCC 7.2.0 compiled binaries can
> > > + * provide better performance than GCC 4.8.5 compiled binaries.
> > > + */
> > > +
> > > +/**************************************
> > > + * Beginning of customization section
> > > +**************************************/
> > > +#define ALIGNMENT_MASK 0x0F
> >
> > This symbol will be included in public rte_memcpy.h version for arm64 DPDK
> > build.
> > Please use RTE_ prefix to avoid multi
> > definition.(RTE_ARCH_ARM64_ALIGN_MASK ? or any shorter name)
> >
> Changed to RTE_AARCH64_ALIGN_MASK in new version.

Since this relates to memcpy on arm64, I would prefer
RTE_ARM64_MEMCPY_ALIGN_MASK.

> >
> > > +#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
> > > +/* Only src unalignment will be treaed as unaligned copy */
> > > +#define IS_UNALIGNED_COPY(dst, src) ((uintptr_t)(dst) & ALIGNMENT_MASK)
> > > +#else
> > > +/* Both dst and src unalignment will be treated as unaligned copy */
> > > +#define IS_UNALIGNED_COPY(dst, src) \
> > > +	(((uintptr_t)(dst) | (uintptr_t)(src)) & ALIGNMENT_MASK)
> > > +#endif
> > > +
> > > +
> > > +/*
> > > + * If copy size is larger than threshold, memcpy() will be used.
> > > + * Run "memcpy_perf_autotest" to determine the proper threshold.
> > > + */
> > > +#define ALIGNED_THRESHOLD ((size_t)(0xffffffff))
> > > +#define UNALIGNED_THRESHOLD ((size_t)(0xffffffff))
> >
> > Same as above comment.
> Added RTE_AARCH64_ prefix in new version.

Same as above: I would prefer the RTE_ARM64_MEMCPY_ prefix here too.

> >
> > > +
> > > +/**************************************
> > > + * End of customization section
> > > + **************************************/
> > > +#ifdef RTE_TOOLCHAIN_GCC
> > > +#if (GCC_VERSION < 50400)
> > > +#warning "The GCC version is quite old, which may result in sub-optimal \
> > > +performance of the compiled code. It is suggested that at least GCC 5.4.0 \
> > > +be used."
> >
> > Even though it is warning, based on where this file get included it will
> > generate error(see below)
> > How about, selecting optimized memcpy when RTE_ARCH_ARM64_MEMCPY
> > && if (GCC_VERSION >= 50400) ?
> >
> Fully understand that. While I'm not tending to make it 'silent'. GCC 4.x is just
> quite old and may not provide best optimized code -- not only for DPDK app.
> We can provide another option RTE_AARCH64_SKIP_GCC_VERSION_CHECK to allow
> skipping the GCC version check. How do you think?

I would prefer to reduce the number of options. But I have no strong opinion
on this, as the RTE_ARCH_ARM64_MEMCPY option is disabled by default
(i.e. no risk).
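
For reference, the alternative suggested above (take the optimized path only
when RTE_ARCH_ARM64_MEMCPY is defined and GCC is at least 5.4.0, instead of
emitting a #warning) could look roughly like the sketch below. This is an
illustration under assumptions, not the patch: the SKETCH_* and sketch_* names
are hypothetical, the GCC_VERSION fallback is a stand-in so the fragment builds
outside DPDK, and the optimized copy body is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stand-in for DPDK's GCC_VERSION so the sketch builds on its own. */
#ifndef GCC_VERSION
#if defined(__GNUC__)
#define GCC_VERSION \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define GCC_VERSION 0
#endif
#endif

/* Name proposed in this reply for the patch's ALIGNMENT_MASK. */
#define RTE_ARM64_MEMCPY_ALIGN_MASK 0x0F

/* Hypothetical gate: both conditions must hold, so no #warning is needed. */
#if defined(RTE_ARCH_ARM64_MEMCPY) && (GCC_VERSION >= 50400)
#define SKETCH_USE_ARM64_MEMCPY 1
#else
#define SKETCH_USE_ARM64_MEMCPY 0
#endif

/* Strict variant from the patch: either pointer off a 16-byte boundary counts. */
static inline int
sketch_is_unaligned_copy(const void *dst, const void *src)
{
	return (((uintptr_t)dst | (uintptr_t)src) &
		RTE_ARM64_MEMCPY_ALIGN_MASK) != 0;
}

/* Hypothetical wrapper: falls back to libc memcpy() unless the gate is open. */
static inline void *
sketch_memcpy(void *dst, const void *src, size_t n)
{
	assert(n == 0 || (dst != NULL && src != NULL));
#if SKETCH_USE_ARM64_MEMCPY
	/* The patch's optimized copy routines would be dispatched here. */
#endif
	return memcpy(dst, src, n);
}
```

With this shape an old compiler silently gets the generic memcpy() path, so
including the header cannot turn a warning into a build error.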