From mboxrd@z Thu Jan 1 00:00:00 1970
From: Herbert Guan
To: Jerin Jacob
CC: Jianbo Liu, "dev@dpdk.org"
Date: Sun, 3 Dec 2017 12:37:30 +0000
Subject: Re: [dpdk-dev] [PATCH] arch/arm: optimization for memcpy on AArch64
References: <1511768985-21639-1-git-send-email-herbert.guan@arm.com> <20171129123154.GA22644@jerin>
In-Reply-To: <20171129123154.GA22644@jerin>
List-Id: DPDK patches and discussions

Jerin,

Thanks a lot for your review and comments. Please find my comments inline below.

Best regards,
Herbert

> -----Original Message-----
> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Wednesday, November 29, 2017 20:32
> To: Herbert Guan
> Cc: Jianbo Liu ; dev@dpdk.org
> Subject: Re: [PATCH] arch/arm: optimization for memcpy on AArch64
>
> -----Original Message-----
> > Date: Mon, 27 Nov 2017 15:49:45 +0800
> > From: Herbert Guan
> > To: jerin.jacob@caviumnetworks.com, jianbo.liu@arm.com, dev@dpdk.org
> > CC: Herbert Guan
> > Subject: [PATCH] arch/arm: optimization for memcpy on AArch64
> > X-Mailer: git-send-email 1.8.3.1
> > +
> > +/**************************************
> > + * Beginning of customization section
> > +**************************************/
> > +#define ALIGNMENT_MASK 0x0F
> > +#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
> > +// Only src unalignment will be treaed as unaligned copy
>
> C++ style comments. It may generate check patch errors.

I'll change these to C style comments in version 2.

>
> > +#define IS_UNALIGNED_COPY(dst, src) ((uintptr_t)(dst) & ALIGNMENT_MASK)
> > +#else
> > +// Both dst and src unalignment will be treated as unaligned copy
> > +#define IS_UNALIGNED_COPY(dst, src) \
> > +(((uintptr_t)(dst) | (uintptr_t)(src)) & ALIGNMENT_MASK)
> > +#endif
> > +
> > +
> > +// If copy size is larger than threshold, memcpy() will be used.
> > +// Run "memcpy_perf_autotest" to determine the proper threshold.
> > +#define ALIGNED_THRESHOLD ((size_t)(0xffffffff))
> > +#define UNALIGNED_THRESHOLD ((size_t)(0xffffffff))
>
> Do you see any case where this threshold is useful.

Yes. On some platforms, and/or with some glibc versions, the glibc memcpy performs better for larger copy sizes (e.g., >512, >4096...). Developers should therefore run the unit test to find the best threshold for their platform, and replace the default value of 0xffffffff with the measured values.

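To illustrate the threshold idea discussed above, here is a minimal sketch of how a copy routine might fall back to the libc memcpy() once the size exceeds a tuned threshold. This is not the code from the patch: every `sketch_`/`SKETCH_` name is hypothetical, the threshold values 4096 and 512 are placeholders standing in for whatever "memcpy_perf_autotest" would suggest on a given platform, the byte loop merely stands in for the optimized inline path, and the alignment test shown is the strict variant that checks both pointers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_ALIGNMENT_MASK 0x0F

/* Placeholder thresholds (assumptions); real values come from measurement. */
#define SKETCH_ALIGNED_THRESHOLD   ((size_t)4096)
#define SKETCH_UNALIGNED_THRESHOLD ((size_t)512)

enum sketch_path { SKETCH_PATH_INLINE, SKETCH_PATH_LIBC };

/* Strict variant: either pointer off a 16-byte boundary counts as unaligned. */
static inline int sketch_is_unaligned(const void *dst, const void *src)
{
	return (((uintptr_t)dst | (uintptr_t)src) & SKETCH_ALIGNMENT_MASK) != 0;
}

/* Choose the copy path from the alignment class and the copy size. */
static inline enum sketch_path
sketch_pick_path(const void *dst, const void *src, size_t n)
{
	size_t threshold = sketch_is_unaligned(dst, src) ?
		SKETCH_UNALIGNED_THRESHOLD : SKETCH_ALIGNED_THRESHOLD;

	return n > threshold ? SKETCH_PATH_LIBC : SKETCH_PATH_INLINE;
}

static inline void *sketch_memcpy(void *dst, const void *src, size_t n)
{
	uint8_t *d = dst;
	const uint8_t *s = src;

	if (sketch_pick_path(dst, src, n) == SKETCH_PATH_LIBC)
		return memcpy(dst, src, n);

	/* The inline path only ever sees sizes up to the larger threshold. */
	assert(n <= SKETCH_ALIGNED_THRESHOLD);

	/* Stand-in for the optimized inline copy: a plain byte loop. */
	for (size_t i = 0; i < n; i++)
		d[i] = s[i];
	return dst;
}
```

With the default of 0xffffffff quoted above, the size comparison would essentially never select the libc path, which is why the values are meant to be tuned per platform.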
>
> > +
> > +static inline void *__attribute__ ((__always_inline__))
> > +rte_memcpy(void *restrict dst, const void *restrict src, size_t n)
> > +{
> > +if (n < 16) {
> > +rte_memcpy_lt16((uint8_t *)dst, (const uint8_t *)src, n);
> > +return dst;
> > +}
> > +if (n < 64) {
> > +rte_memcpy_ge16_lt64((uint8_t *)dst, (const uint8_t *)src, n);
> > +return dst;
> > +}
>
> Unfortunately we have 128B cache arm64 implementation too. Could you
> please take care that based on RTE_CACHE_LINE_SIZE
>

The value '64' here is not the cache line size. A prefetch itself costs some cycles, so it is not worthwhile to prefetch for small copies (e.g. < 64 bytes). In my tests, prefetching for small copies actually lowers performance.

On the other hand, I can only find one aarch64 machine with a 128B cache line here, and there are some optimizations specific to that machine. I am not sure whether they would benefit other 128B cache line machines. I would prefer not to put them in this patch, but in a later standalone patch for 128B cache line machines.

> > +__builtin_prefetch(src, 0, 0); // rte_prefetch_non_temporal(src);
> > +__builtin_prefetch(dst, 1, 0); // * unchanged *
>
> See above point and Please use DPDK equivalents. rte_prefetch*()

I can use rte_prefetch_non_temporal() for the read prefetch. However, there is no DPDK equivalent for the write prefetch. Would you suggest that we add such an API to DPDK?

BTW, the current DPDK rte_prefetch*() functions use ASM instructions. It might be better to use __builtin_prefetch(src, 0, 0/1/2/3) for better compatibility with future aarch64 architectures.
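
As an illustration of the two points above (skipping the prefetch for small copies, and using the compiler builtin for the write prefetch), here is a minimal sketch. The `sketch_` helper names are hypothetical and are not DPDK APIs; the 64-byte cutoff is taken from the quoted patch; memcpy() stands in for the real copy loop. It assumes GCC or Clang, where `__builtin_prefetch(addr, rw, locality)` takes rw = 0 for read or 1 for write, and a locality from 0 (no temporal locality) to 3.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Below this size the prefetch is skipped (cutoff from the quoted patch). */
#define SKETCH_PREFETCH_MIN_SIZE ((size_t)64)

/* Hypothetical wrapper: read prefetch with no temporal locality. */
static inline void sketch_prefetch_read_nt(const void *p)
{
	__builtin_prefetch(p, 0, 0);
}

/* Hypothetical wrapper: write prefetch with no temporal locality. */
static inline void sketch_prefetch_write_nt(const void *p)
{
	__builtin_prefetch(p, 1, 0);
}

static inline void *
sketch_copy_with_prefetch(void *dst, const void *src, size_t n)
{
	/* Prefetch only when the copy is large enough to amortize its cost. */
	if (n >= SKETCH_PREFETCH_MIN_SIZE) {
		sketch_prefetch_read_nt(src);
		sketch_prefetch_write_nt(dst);
	}
	/* Stand-in for the optimized copy body. */
	return memcpy(dst, src, n);
}
```

A prefetch is only a hint, so the copied bytes are identical with or without it; whether it helps at a given size is something to measure per platform.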