From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <Jerin.JacobKollanukkaran@cavium.com>
Received: from NAM03-DM3-obe.outbound.protection.outlook.com
 (mail-dm3nam03on0040.outbound.protection.outlook.com [104.47.41.40])
 by dpdk.org (Postfix) with ESMTP id 33A4D1D7
 for <dev@dpdk.org>; Mon, 18 Dec 2017 08:44:43 +0100 (CET)
Received: from jerin (111.93.218.67) by
 CY1PR07MB2522.namprd07.prod.outlook.com (10.167.16.13) with Microsoft SMTP
 Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA384_P256) id
 15.20.282.5; Mon, 18 Dec 2017 07:44:39 +0000
Date: Mon, 18 Dec 2017 13:13:51 +0530
From: Jerin Jacob <jerin.jacob@caviumnetworks.com>
To: Herbert Guan <herbert.guan@arm.com>
Cc: dev@dpdk.org
Message-ID: <20171218074349.GA16659@jerin>
References: <1511768985-21639-1-git-send-email-herbert.guan@arm.com>
 <1513565664-19509-1-git-send-email-herbert.guan@arm.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <1513565664-19509-1-git-send-email-herbert.guan@arm.com>
User-Agent: Mutt/1.9.2 (2017-12-15)
X-MS-Exchange-CrossTenant-OriginalArrivalTime: 18 Dec 2017 07:44:39.8299 (UTC)
Subject: Re: [dpdk-dev] [PATCH v3] arch/arm: optimization for memcpy on
	AArch64
X-List-Received-Date: Mon, 18 Dec 2017 07:44:43 -0000

-----Original Message-----
> Date: Mon, 18 Dec 2017 10:54:24 +0800
> From: Herbert Guan <herbert.guan@arm.com>
> To: dev@dpdk.org, jerin.jacob@caviumnetworks.com
> CC: Herbert Guan <herbert.guan@arm.com>
> Subject: [PATCH v3] arch/arm: optimization for memcpy on AArch64
> X-Mailer: git-send-email 1.8.3.1
> 
> Signed-off-by: Herbert Guan <herbert.guan@arm.com>
> ---
>  config/common_armv8a_linuxapp                      |   6 +
>  .../common/include/arch/arm/rte_memcpy_64.h        | 292 +++++++++++++++++++++
>  2 files changed, 298 insertions(+)
> 
> diff --git a/config/common_armv8a_linuxapp b/config/common_armv8a_linuxapp
> index 6732d1e..8f0cbed 100644
> --- a/config/common_armv8a_linuxapp
> +++ b/config/common_armv8a_linuxapp
> @@ -44,6 +44,12 @@ CONFIG_RTE_FORCE_INTRINSICS=y
>  # to address minimum DMA alignment across all arm64 implementations.
>  CONFIG_RTE_CACHE_LINE_SIZE=128
>  
> +# Accelerate rte_memcpy.  Be sure to run unit test to determine the

Additional space before "Be". Rather than just mentioning the unit test, mention
the exact test case name (memcpy_perf_autotest).

> +# best threshold in code.  Refer to notes in source file

Additional space before "Refer"

> +# (lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h) for more
> +# info.
> +CONFIG_RTE_ARCH_ARM64_MEMCPY=n
> +
>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>  CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n
>  CONFIG_RTE_LIBRTE_AVP_PMD=n
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> index b80d8ba..1ea275d 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> @@ -42,6 +42,296 @@
>  
>  #include "generic/rte_memcpy.h"
>  
> +#ifdef RTE_ARCH_ARM64_MEMCPY

See the comment below at the "(GCC_VERSION < 50400)" check.

> +#include <rte_common.h>
> +#include <rte_branch_prediction.h>
> +
> +/*
> + * The memory copy performance differs on different AArch64 micro-architectures.
> + * And the most recent glibc (e.g. 2.23 or later) can provide a better memcpy()
> + * performance compared to old glibc versions. It's always suggested to use a
> + * more recent glibc if possible, from which the entire system can get benefit.
> + *
> + * This implementation improves memory copy on some aarch64 micro-architectures,
> + * when an old glibc (e.g. 2.19, 2.17...) is being used. It is disabled by
> + * default and needs "RTE_ARCH_ARM64_MEMCPY" defined to activate. It's not
> + * always providing better performance than memcpy() so users need to run unit
> + * test "memcpy_perf_autotest" and customize parameters in customization section
> + * below for best performance.
> + *
> + * Compiler version will also impact the rte_memcpy() performance. It's observed
> + * on some platforms and with the same code, GCC 7.2.0 compiled binaries can
> + * provide better performance than GCC 4.8.5 compiled binaries.
> + */
> +
> +/**************************************
> + * Beginning of customization section
> + **************************************/
> +#define ALIGNMENT_MASK 0x0F

This symbol will be included in the public rte_memcpy.h for arm64 DPDK builds.
Please use an RTE_ prefix to avoid multiple definitions (RTE_ARCH_ARM64_ALIGN_MASK? or any shorter name).

> +#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
> +/* Only src unalignment will be treated as unaligned copy */
> +#define IS_UNALIGNED_COPY(dst, src) ((uintptr_t)(dst) & ALIGNMENT_MASK)
> +#else
> +/* Both dst and src unalignment will be treated as unaligned copy */
> +#define IS_UNALIGNED_COPY(dst, src) \
> +		(((uintptr_t)(dst) | (uintptr_t)(src)) & ALIGNMENT_MASK)
> +#endif
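For illustration, a minimal sketch of how this alignment check might look once the names carry a prefix. RTE_ARM64_MEMCPY_ALIGN_MASK and RTE_ARM64_MEMCPY_IS_UNALIGNED_COPY are hypothetical names, not existing DPDK symbols. Note also that the non-strict comment quoted above says only src alignment matters, while the quoted macro tests dst; this sketch assumes the comment describes the intent and tests src.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical RTE_-prefixed replacement for ALIGNMENT_MASK (16-byte alignment). */
#define RTE_ARM64_MEMCPY_ALIGN_MASK 0x0F

/* The mask must be (power of two) - 1 for the AND test below to work. */
static_assert(((RTE_ARM64_MEMCPY_ALIGN_MASK + 1) & RTE_ARM64_MEMCPY_ALIGN_MASK) == 0,
	      "alignment mask must be 2^k - 1");

#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
/* Assumed intent: only src misalignment makes this an unaligned copy. */
#define RTE_ARM64_MEMCPY_IS_UNALIGNED_COPY(dst, src) \
	((uintptr_t)(src) & RTE_ARM64_MEMCPY_ALIGN_MASK)
#else
/* Strict mode: misalignment of either dst or src makes it an unaligned copy. */
#define RTE_ARM64_MEMCPY_IS_UNALIGNED_COPY(dst, src) \
	(((uintptr_t)(dst) | (uintptr_t)(src)) & RTE_ARM64_MEMCPY_ALIGN_MASK)
#endif
```

With RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN undefined, a misaligned dst alone does not select the unaligned path.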
> +
> +
> +/*
> + * If copy size is larger than threshold, memcpy() will be used.
> + * Run "memcpy_perf_autotest" to determine the proper threshold.
> + */
> +#define ALIGNED_THRESHOLD       ((size_t)(0xffffffff))
> +#define UNALIGNED_THRESHOLD     ((size_t)(0xffffffff))

Same comment as above: please use an RTE_ prefix for these thresholds too.

> +
> +/**************************************
> + * End of customization section
> + **************************************/
> +#ifdef RTE_TOOLCHAIN_GCC
> +#if (GCC_VERSION < 50400)
> +#warning "The GCC version is quite old, which may result in sub-optimal \
> +performance of the compiled code. It is suggested that at least GCC 5.4.0 \
> +be used."

Even though it is a warning, depending on where this file gets included it turns into an error under -Werror (see below).
How about selecting the optimized memcpy only when RTE_ARCH_ARM64_MEMCPY is defined and GCC_VERSION >= 50400?

  CC eal_common_options.o
In file included from /home/jerin/dpdk.org/build/include/rte_memcpy.h:37:0,
                 from /home/jerin/dpdk.org/lib/librte_eal/common/eal_common_options.c:53:
/home/jerin/dpdk.org/build/include/rte_memcpy_64.h:93:2: error: #warning "The GCC version is quite old, which may result in sub-optimal performance of the compiled code. It is suggested that at least GCC 5.4.0 be used." [-Werror=cpp]
 #warning "The GCC version is quite old, which may result in sub-optimal \
  ^


> +#endif
> +#endif
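A minimal sketch of the gating suggested above, assuming GCC_VERSION encodes GCC x.y.z as x*10000 + y*100 + z (which is what the 50400 constant implies). SKETCH_GCC_VERSION, SKETCH_USE_ARM64_MEMCPY and sketch_memcpy_variant() are hypothetical names used only for illustration. An old compiler then silently falls back to the generic copy, so there is no #warning left to trip -Werror=cpp.

```c
#include <assert.h>

/*
 * Hypothetical stand-in for DPDK's GCC_VERSION: GCC x.y.z becomes
 * x*10000 + y*100 + z, so GCC 5.4.0 is 50400. Clang also defines
 * __GNUC__, so it is excluded explicitly.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define SKETCH_GCC_VERSION \
	(__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define SKETCH_GCC_VERSION 0
#endif

/*
 * Use the optimized copy only when it was requested AND the compiler is new
 * enough; otherwise fall back quietly instead of emitting a #warning.
 */
#if defined(RTE_ARCH_ARM64_MEMCPY) && SKETCH_GCC_VERSION >= 50400
#define SKETCH_USE_ARM64_MEMCPY 1
#else
#define SKETCH_USE_ARM64_MEMCPY 0
#endif

/* Reports which implementation this build selected. */
static inline const char *sketch_memcpy_variant(void)
{
	return SKETCH_USE_ARM64_MEMCPY ? "arm64-optimized" : "generic";
}
```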
> +
> +
> +#if RTE_CACHE_LINE_SIZE >= 128

We can remove this conditional compilation check, i.e. the function can be compiled in both cases,
but it will only be used when RTE_CACHE_LINE_SIZE >= 128.

> +static __rte_always_inline void
> +rte_memcpy_ge16_lt128
> +(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
> +{
> +	if (n < 64) {
> +		if (n == 16) {
> +			rte_mov16(dst, src);
> +		} else if (n <= 32) {
> +			rte_mov16(dst, src);
> +			rte_mov16(dst - 16 + n, src - 16 + n);
> +		} else if (n <= 48) {
> +			rte_mov32(dst, src);
> +			rte_mov16(dst - 16 + n, src - 16 + n);
> +		} else {
> +			rte_mov48(dst, src);
> +			rte_mov16(dst - 16 + n, src - 16 + n);
> +		}
> +	} else {
> +		rte_mov64((uint8_t *)dst, (const uint8_t *)src);
> +		if (n > 48 + 64)
> +			rte_mov64(dst - 64 + n, src - 64 + n);
> +		else if (n > 32 + 64)
> +			rte_mov48(dst - 48 + n, src - 48 + n);
> +		else if (n > 16 + 64)
> +			rte_mov32(dst - 32 + n, src - 32 + n);
> +		else if (n > 64)
> +			rte_mov16(dst - 16 + n, src - 16 + n);
> +	}
> +}
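The key trick in rte_memcpy_ge16_lt128() is that sizes which are not a multiple of 16 are handled by a head block plus one fixed-size tail block ending exactly at dst + n. The two blocks may overlap, which is harmless because both write the same source bytes. The sketch below keeps the quoted branch structure but swaps rte_movN() for hypothetical memcpy()-based stand-ins (mov16() to mov64() are not DPDK APIs), so the logic can be checked against plain memcpy() for every n in [16, 128) on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical portable stand-ins for rte_movN(): each copies exactly N bytes. */
static inline void mov16(uint8_t *d, const uint8_t *s) { memcpy(d, s, 16); }
static inline void mov32(uint8_t *d, const uint8_t *s) { memcpy(d, s, 32); }
static inline void mov48(uint8_t *d, const uint8_t *s) { memcpy(d, s, 48); }
static inline void mov64(uint8_t *d, const uint8_t *s) { memcpy(d, s, 64); }

/* Same branch structure as the quoted rte_memcpy_ge16_lt128(). */
static void copy_ge16_lt128(uint8_t *dst, const uint8_t *src, size_t n)
{
	assert(n >= 16 && n < 128);
	if (n < 64) {
		if (n == 16) {
			mov16(dst, src);
		} else if (n <= 32) {
			mov16(dst, src);
			mov16(dst - 16 + n, src - 16 + n); /* tail may overlap head */
		} else if (n <= 48) {
			mov32(dst, src);
			mov16(dst - 16 + n, src - 16 + n);
		} else {
			mov48(dst, src);
			mov16(dst - 16 + n, src - 16 + n);
		}
	} else {
		mov64(dst, src);
		/* Pick the smallest tail block that reaches back past byte 64. */
		if (n > 48 + 64)
			mov64(dst - 64 + n, src - 64 + n);
		else if (n > 32 + 64)
			mov48(dst - 48 + n, src - 48 + n);
		else if (n > 16 + 64)
			mov32(dst - 32 + n, src - 32 + n);
		else if (n > 64)
			mov16(dst - 16 + n, src - 16 + n);
	}
}

/*
 * Compares every size in [16, 128) with memcpy(), including the bytes past
 * n, so an over-long write is caught too. Returns 1 on success.
 */
static int check_all_sizes(void)
{
	uint8_t src[128], dst[128], ref[128];

	for (size_t i = 0; i < sizeof(src); i++)
		src[i] = (uint8_t)(i * 7 + 1);
	for (size_t n = 16; n < 128; n++) {
		memset(dst, 0xAA, sizeof(dst));
		memset(ref, 0xAA, sizeof(ref));
		copy_ge16_lt128(dst, src, n);
		memcpy(ref, src, n);
		if (memcmp(dst, ref, sizeof(dst)) != 0)
			return 0;
	}
	return 1;
}
```

Since rte_memcpy_ge16_lt64() has the same body as the n < 64 branch, the same check covers it.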
> +
> +
> +#else

Same comment as above: this #else can go away as well.

> +static __rte_always_inline void
> +rte_memcpy_ge16_lt64
> +(uint8_t *restrict dst, const uint8_t *restrict src, size_t n)
> +{
> +	if (n == 16) {
> +		rte_mov16(dst, src);
> +	} else if (n <= 32) {
> +		rte_mov16(dst, src);
> +		rte_mov16(dst - 16 + n, src - 16 + n);
> +	} else if (n <= 48) {
> +		rte_mov32(dst, src);
> +		rte_mov16(dst - 16 + n, src - 16 + n);
> +	} else {
> +		rte_mov48(dst, src);
> +		rte_mov16(dst - 16 + n, src - 16 + n);
> +	}
> +}
> +
> +
> +static __rte_always_inline void *
> +rte_memcpy(void *restrict dst, const void *restrict src, size_t n)
> +{
> +	if (n < 16) {
> +		rte_memcpy_lt16((uint8_t *)dst, (const uint8_t *)src, n);
> +		return dst;
> +	}
> +#if RTE_CACHE_LINE_SIZE >= 128
> +	if (n < 128) {
> +		rte_memcpy_ge16_lt128((uint8_t *)dst, (const uint8_t *)src, n);
> +		return dst;
> +	}
> +#else
> +	if (n < 64) {
> +		rte_memcpy_ge16_lt64((uint8_t *)dst, (const uint8_t *)src, n);
> +		return dst;
> +	}
> +#endif
> +	__builtin_prefetch(src, 0, 0);
> +	__builtin_prefetch(dst, 1, 0);
> +	if (likely(
> +		  (!IS_UNALIGNED_COPY(dst, src) && n <= ALIGNED_THRESHOLD)
> +		   || (IS_UNALIGNED_COPY(dst, src) && n <= UNALIGNED_THRESHOLD)
> +		  )) {
> +#if RTE_CACHE_LINE_SIZE >= 128
> +		rte_memcpy_ge128((uint8_t *)dst, (const uint8_t *)src, n);
> +#else
> +		rte_memcpy_ge64((uint8_t *)dst, (const uint8_t *)src, n);
> +#endif

Can we remove this #ifdef clutter (there are two of them in the same function)?

I suggest removing this clutter by having separate routines, i.e.:
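1)
#if RTE_CACHE_LINE_SIZE >= 128
static __rte_always_inline void *
rte_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	/* variant using rte_memcpy_ge16_lt128() and rte_memcpy_ge128() */
}
#else
static __rte_always_inline void *
rte_memcpy(void *restrict dst, const void *restrict src, size_t n)
{
	/* variant using rte_memcpy_ge16_lt64() and rte_memcpy_ge64() */
}
#endif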

2) Have a separate inline function that resolves the following logic, and use it
in both variants.

	if (likely(
		  (!IS_UNALIGNED_COPY(dst, src) && n <= ALIGNED_THRESHOLD)
		   || (IS_UNALIGNED_COPY(dst, src) && n <= UNALIGNED_THRESHOLD)
		  )) {
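A minimal sketch of item 2, with hypothetical names and example thresholds (2048 and 1024 bytes are placeholders, not measured values; real ones would come from memcpy_perf_autotest). It also evaluates the alignment test once, where the quoted condition evaluates IS_UNALIGNED_COPY() twice. Like the quoted non-strict comment, it checks only src; that is an assumption about the intent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names and example (unmeasured) values for the customization section. */
#define SKETCH_ALIGN_MASK          0x0F
#define SKETCH_ALIGNED_THRESHOLD   ((size_t)2048)
#define SKETCH_UNALIGNED_THRESHOLD ((size_t)1024)

/*
 * Returns 1 when the custom copy loop should handle n bytes, 0 when the
 * caller should fall back to memcpy().
 */
static inline int
sketch_memcpy_use_custom(const void *dst, const void *src, size_t n)
{
	(void)dst; /* would also be tested in the strict-alignment variant */
	if ((uintptr_t)src & SKETCH_ALIGN_MASK)
		return n <= SKETCH_UNALIGNED_THRESHOLD;
	return n <= SKETCH_ALIGNED_THRESHOLD;
}
```

Each RTE_CACHE_LINE_SIZE variant of rte_memcpy() would then reduce to a single if (likely(sketch_memcpy_use_custom(dst, src, n))) that calls rte_memcpy_ge128() or rte_memcpy_ge64(), with memcpy() on the other path.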

With the above changes:
Acked-by: Jerin Jacob <jerin.jacob@caviumnetworks.com>