From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Fri, 15 Dec 2017 11:39:47 +0800 (CST)
From: liupan1234
To: "dev@dpdk.org"
Message-ID: <7accecd7.4d04.1605841acbf.Coremail.liupan1234@163.com>
Subject: [dpdk-dev] shared memory statistic
List-Id: DPDK patches and discussions

Hi all,

I have an urgent question:

1) When an app runs, how can I get the memory it is actually using, in real time? For example, it uses the -m parameter to specify 1G of memory but only uses 500 MB; how can I get this information?
2) When several apps run with shared memory (shm 0), is there any method to get the memory usage of each process?

Thanks
Pan

>From Jerin.JacobKollanukkaran@cavium.com Fri Dec 15 04:41:51 2017
Date: Fri, 15 Dec 2017 09:11:29 +0530
From: Jerin Jacob
To: Herbert Guan
Cc: dev@dpdk.org, pbhagavatula@caviumnetworks.com, jianbo.liu@arm.com
Message-ID: <20171215034127.GA5874@jerin>
References: <1511768985-21639-1-git-send-email-herbert.guan@arm.com> <1512453723-4513-1-git-send-email-herbert.guan@arm.com>
In-Reply-To: <1512453723-4513-1-git-send-email-herbert.guan@arm.com>
Subject: Re: [dpdk-dev] [PATCH v2] arch/arm: optimization for memcpy on AArch64

-----Original Message-----
> Date: Tue, 5 Dec 2017 14:02:03 +0800
> From: Herbert Guan
> To: dev@dpdk.org
> CC: jerin.jacob@caviumnetworks.com, pbhagavatula@caviumnetworks.com,
>  jianbo.liu@arm.com, Herbert Guan
> Subject: [PATCH v2] arch/arm: optimization for memcpy on AArch64
> X-Mailer: git-send-email 1.8.3.1
>
> This patch provides an option to do rte_memcpy() using 'restrict'
> qualifier, which can induce GCC to do optimizations by using more
> efficient instructions, providing some performance gain over memcpy()
> on some AArch64 platforms/enviroments.
>
> The memory copy performance differs between different AArch64
> platforms. And a more recent glibc (e.g. 2.23 or later)
> can provide a better memcpy() performance compared to old glibc
> versions. It's always suggested to use a more recent glibc if
> possible, from which the entire system can get benefit. If for some
> reason an old glibc has to be used, this patch is provided for an
> alternative.
>
> This implementation can improve memory copy on some AArch64
> platforms, when an old glibc (e.g. 2.19, 2.17...) is being used.
> It is disabled by default and needs "RTE_ARCH_ARM64_MEMCPY"
> defined to activate. It's not always proving better performance
> than memcpy() so users need to run DPDK unit test
> "memcpy_perf_autotest" and customize parameters in "customization
> section" in rte_memcpy_64.h for best performance.
>
> Compiler version will also impact the rte_memcpy() performance.
> It's observed on some platforms and with the same code, GCC 7.2.0
> compiled binary can provide better performance than GCC 4.8.5. It's
> suggested to use GCC 5.4.0 or later.

Description looks good.

>
> Signed-off-by: Herbert Guan
> ---
>  config/common_armv8a_linuxapp | 6 +
>  .../common/include/arch/arm/rte_memcpy_64.h | 195 +++++++++++++++++++++
>  2 files changed, 201 insertions(+)
>
> diff --git a/config/common_armv8a_linuxapp b/config/common_armv8a_linuxapp
> index 6732d1e..158ce00 100644
> --- a/config/common_armv8a_linuxapp
> +++ b/config/common_armv8a_linuxapp
> @@ -44,6 +44,12 @@ CONFIG_RTE_FORCE_INTRINSICS=y
>  # to address minimum DMA alignment across all arm64 implementations.
>  CONFIG_RTE_CACHE_LINE_SIZE=128
>
> +# Accelarate rte_memcpy. Be sure to run unit test to determine the
> +# best threshold in code. Refer to notes in source file
> +# (lib/librte_eam/common/include/arch/arm/rte_memcpy_64.h) for more

s/librte_eam/librte_eal

> +# info.
> +CONFIG_RTE_ARCH_ARM64_MEMCPY=n
> +
>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>  CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n
>  CONFIG_RTE_LIBRTE_AVP_PMD=n
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> index b80d8ba..a6ad286 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> @@ -42,6 +42,199 @@
>
>  #include "generic/rte_memcpy.h"
>
> +#ifdef RTE_ARCH_ARM64_MEMCPY
> +#include
> +#include
> +
> +/*******************************************************************************

Please remove the "*******************************". Standard C comments don't have that.

> + * The memory copy performance differs on different AArch64 micro-architectures.
> + * And the most recent glibc (e.g. 2.23 or later) can provide a better memcpy()
> + * performance compared to old glibc versions. It's always suggested to use a
> + * more recent glibc if possible, from which the entire system can get benefit.
> + *
> + * This implementation improves memory copy on some aarch64 micro-architectures,
> + * when an old glibc (e.g. 2.19, 2.17...) is being used. It is disabled by
> + * default and needs "RTE_ARCH_ARM64_MEMCPY" defined to activate. It's not
> + * always providing better performance than memcpy() so users need to run unit
> + * test "memcpy_perf_autotest" and customize parameters in customization section
> + * below for best performance.
> + *
> + * Compiler version will also impact the rte_memcpy() performance. It's observed
> + * on some platforms and with the same code, GCC 7.2.0 compiled binaries can
> + * provide better performance than GCC 4.8.5 compiled binaries.
> + ******************************************************************************/
> +
> +/**************************************
> + * Beginning of customization section
> + **************************************/
> +#define ALIGNMENT_MASK 0x0F
> +#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
> +/* Only src unalignment will be treaed as unaligned copy */
> +#define IS_UNALIGNED_COPY(dst, src) ((uintptr_t)(dst) & ALIGNMENT_MASK)
> +#else
> +/* Both dst and src unalignment will be treated as unaligned copy */
> +#define IS_UNALIGNED_COPY(dst, src) \
> +	(((uintptr_t)(dst) | (uintptr_t)(src)) & ALIGNMENT_MASK)
> +#endif
> +
> +
> +/*
> + * If copy size is larger than threshold, memcpy() will be used.
> + * Run "memcpy_perf_autotest" to determine the proper threshold.
> + */
> +#define ALIGNED_THRESHOLD ((size_t)(0xffffffff))
> +#define UNALIGNED_THRESHOLD ((size_t)(0xffffffff))
> +
> +
> +/**************************************
> + * End of customization section
> + **************************************/
> +#ifdef RTE_TOOLCHAIN_GCC
> +#if (GCC_VERSION < 50400)
> +#warning "The GCC version is quite old, which may result in sub-optimal \
> +performance of the compiled code. It is suggested that at least GCC 5.4.0 \
> +be used."
> +#endif
> +#endif
> +
> +static inline void __attribute__ ((__always_inline__))
> +rte_mov16(uint8_t *restrict dst, const uint8_t *restrict src)
> +{
> +	__uint128_t *restrict dst128 = (__uint128_t *restrict)dst;
> +	const __uint128_t *restrict src128 = (const __uint128_t *restrict)src;
> +	*dst128 = *src128;
> +}
> +
> +static inline void __attribute__ ((__always_inline__))
> +rte_mov32(uint8_t *restrict dst, const uint8_t *restrict src)
> +{
> +	__uint128_t *restrict dst128 = (__uint128_t *restrict)dst;
> +	const __uint128_t *restrict src128 = (const __uint128_t *restrict)src;
> +	dst128[0] = src128[0];
> +	dst128[1] = src128[1];
> +}
> +
> +static inline void __attribute__ ((__always_inline__))
> +rte_mov48(uint8_t *restrict dst, const uint8_t *restrict src)
> +{
> +	__uint128_t *restrict dst128 = (__uint128_t *restrict)dst;
> +	const __uint128_t *restrict src128 = (const __uint128_t *restrict)src;
> +	dst128[0] = src128[0];
> +	dst128[1] = src128[1];
> +	dst128[2] = src128[2];
> +}
> +
> +static inline void __attribute__ ((__always_inline__))
> +rte_mov64(uint8_t *restrict dst, const uint8_t *restrict src)
> +{
> +	__uint128_t *restrict dst128 = (__uint128_t *restrict)dst;
> +	const __uint128_t *restrict src128 = (const __uint128_t *restrict)src;
> +	dst128[0] = src128[0];
> +	dst128[1] = src128[1];
> +	dst128[2] = src128[2];
> +	dst128[3] = src128[3];
> +}
> +
> +static inline void __attribute__ ((__always_inline__))
> +rte_mov128(uint8_t *restrict dst, const uint8_t *restrict src)
> +{
> +	rte_mov64(dst, src);
> +	rte_mov64(dst + 64, src + 64);
> +}
> +
> +static inline void __attribute__ ((__always_inline__))
> +rte_mov256(uint8_t *restrict dst, const uint8_t *restrict src)
> +{
> +	rte_mov128(dst, src);
> +	rte_mov128(dst + 128, src + 128);
> +}
> +
> +static inline void __attribute__ ((__always_inline__))
> +rte_memcpy_lt16(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
> +{
> +	if (n & 0x08) {
> +		/* copy 8 ~ 15 bytes */
> +		*(uint64_t *)dst = *(const uint64_t *)src;
> +		*(uint64_t *)(dst - 8 + n) = *(const uint64_t *)(src - 8 + n);
> +	} else if (n & 0x04) {
> +		/* copy 4 ~ 7 bytes */
> +		*(uint32_t *)dst = *(const uint32_t *)src;
> +		*(uint32_t *)(dst - 4 + n) = *(const uint32_t *)(src - 4 + n);
> +	} else if (n & 0x02) {
> +		/* copy 2 ~ 3 bytes */
> +		*(uint16_t *)dst = *(const uint16_t *)src;
> +		*(uint16_t *)(dst - 2 + n) = *(const uint16_t *)(src - 2 + n);
> +	} else if (n & 0x01) {
> +		/* copy 1 byte */
> +		*dst = *src;
> +	}
> +}
> +
> +static inline void __attribute__ ((__always_inline__))
> +rte_memcpy_ge16_lt64
> +(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
> +{
> +	if (n == 16) {
> +		rte_mov16(dst, src);
> +	} else if (n <= 32) {
> +		rte_mov16(dst, src);
> +		rte_mov16(dst - 16 + n, src - 16 + n);
> +	} else if (n <= 48) {
> +		rte_mov32(dst, src);
> +		rte_mov16(dst - 16 + n, src - 16 + n);
> +	} else {
> +		rte_mov48(dst, src);
> +		rte_mov16(dst - 16 + n, src - 16 + n);
> +	}
> +}
> +
> +static inline void __attribute__ ((__always_inline__))
> +rte_memcpy_ge64(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
> +{
> +	do {
> +		rte_mov64(dst, src);
> +		src += 64;
> +		dst += 64;
> +		n -= 64;
> +	} while (likely(n >= 64));
> +
> +	if (likely(n)) {
> +		if (n > 48)
> +			rte_mov64(dst - 64 + n, src - 64 + n);
> +		else if (n > 32)
> +			rte_mov48(dst - 48 + n, src - 48 + n);
> +		else if (n > 16)
> +			rte_mov32(dst - 32 + n, src - 32 + n);
> +		else
> +			rte_mov16(dst - 16 + n, src - 16 + n);
> +	}
> +}
> +
> +static inline void *__attribute__ ((__always_inline__))
> +rte_memcpy(void *restrict dst, const void *restrict src, size_t n)
> +{
> +	if (n < 16) {
> +		rte_memcpy_lt16((uint8_t *)dst, (const uint8_t *)src, n);
> +		return dst;
> +	}
> +	if (n < 64) {
> +		rte_memcpy_ge16_lt64((uint8_t *)dst, (const uint8_t *)src, n);
> +		return dst;
> +	}

I have a comment here; I will reply to the original thread.
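
For anyone following the quoted rte_memcpy_lt16(), here is a minimal standalone sketch of the small-copy idea it relies on: pick the move width from the highest set bit of n, then issue one move from the start and one ending at the last byte, letting the two overlap in the middle. The name copy_lt16_sketch is hypothetical and not part of the patch, and fixed-size memcpy() calls are swapped in for the patch's pointer casts so the sketch makes no alignment or aliasing assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): copy n < 16 bytes using at
 * most two fixed-size moves that may overlap in the middle.
 * Fixed-size memcpy() stands in for the casts used in the quoted code.
 */
static void
copy_lt16_sketch(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
{
	assert(n < 16);
	if (n & 0x08) {
		/* 8..15 bytes: first 8 bytes, then the last 8 bytes */
		memcpy(dst, src, 8);
		memcpy(dst + n - 8, src + n - 8, 8);
	} else if (n & 0x04) {
		/* 4..7 bytes: first 4 bytes, then the last 4 bytes */
		memcpy(dst, src, 4);
		memcpy(dst + n - 4, src + n - 4, 4);
	} else if (n & 0x02) {
		/* 2..3 bytes: first 2 bytes, then the last 2 bytes */
		memcpy(dst, src, 2);
		memcpy(dst + n - 2, src + n - 2, 2);
	} else if (n & 0x01) {
		/* exactly 1 byte */
		*dst = *src;
	}
	/* n == 0: nothing to copy */
}
```

The point of the overlap is that no byte-by-byte tail loop is needed: for n = 11, the two 8-byte moves cover bytes 0..7 and 3..10, and the bytes written twice receive the same value both times.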