From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Mon, 2 Nov 2015 10:27:29 +0530
From: Jerin Jacob <Jerin.Jacob@caviumnetworks.com>
To: David Hunt <david.hunt@intel.com>
Message-ID: <20151102045728.GB16413@localhost.localdomain>
References: <1446212959-19832-1-git-send-email-david.hunt@intel.com>
 <1446212959-19832-2-git-send-email-david.hunt@intel.com>
In-Reply-To: <1446212959-19832-2-git-send-email-david.hunt@intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
User-Agent: Mutt/1.5.23 (2014-03-12)
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] [PATCH v3 1/6] eal/arm: add 64-bit armv8 version of rte_memcpy.h

On Fri, Oct 30, 2015 at 01:49:14PM +0000, David Hunt wrote:
> Signed-off-by: David Hunt <david.hunt@intel.com>
> ---
>  .../common/include/arch/arm/rte_memcpy.h           |   4 +
>  .../common/include/arch/arm/rte_memcpy_64.h        | 308 +++++++++++++++++++++
>  2 files changed, 312 insertions(+)
>  create mode 100644 lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
>
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy.h
> index d9f5bf1..1d562c3 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_memcpy.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy.h
> @@ -33,6 +33,10 @@
>  #ifndef _RTE_MEMCPY_ARM_H_
>  #define _RTE_MEMCPY_ARM_H_
>
> +#ifdef RTE_ARCH_64
> +#include <rte_memcpy_64.h>
> +#else
>  #include <rte_memcpy_32.h>
> +#endif
>
>  #endif /* _RTE_MEMCPY_ARM_H_ */
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> new file mode 100644
> index 0000000..6d85113
> --- /dev/null
> +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> @@ -0,0 +1,308 @@
> +/*
> + *   BSD LICENSE
> + *
> + *   Copyright (C) IBM Corporation 2014.
> + *
> + *   Redistribution and use in source and binary forms, with or without
> + *   modification, are permitted provided that the following conditions
> + *   are met:
> + *
> + *     * Redistributions of source code must retain the above copyright
> + *       notice, this list of conditions and the following disclaimer.
> + *     * Redistributions in binary form must reproduce the above copyright
> + *       notice, this list of conditions and the following disclaimer in
> + *       the documentation and/or other materials provided with the
> + *       distribution.
> + *     * Neither the name of IBM Corporation nor the names of its
> + *       contributors may be used to endorse or promote products derived
> + *       from this software without specific prior written permission.
> + *
> + *   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
> + *   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
> + *   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
> + *   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
> + *   OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
> + *   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
> + *   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
> + *   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
> + *   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
> + *   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
> + *   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
> +*/
> +
> +#ifndef _RTE_MEMCPY_ARM_64_H_
> +#define _RTE_MEMCPY_ARM_64_H_
> +
> +#include <stdint.h>
> +#include <string.h>
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include "generic/rte_memcpy.h"
> +
> +#ifdef __ARM_NEON_FP

SIMD is not optional in the armv8 spec, so every armv8 machine will have
SIMD instructions, unlike armv7. Moreover, the LDP/STP instructions are
not part of SIMD, so this check is not required; alternatively, it could
be replaced with a check that selects memcpy from either libc or this
specific implementation (a sketch of such a guard is given further down).

> +
> +/* ARM NEON Intrinsics are used to copy data */
> +#include <arm_neon.h>
> +
> +static inline void
> +rte_mov16(uint8_t *dst, const uint8_t *src)
> +{
> +        asm volatile("LDP d0, d1, [%0]\n\t"
> +                     "STP d0, d1, [%1]\n\t"
> +                     : : "r" (src), "r" (dst) :
> +        );
> +}

IMO, there is no need to hardcode the registers used for the memory move
(d0, d1). Let the compiler schedule the registers for better performance.
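A minimal sketch (untested) using the standard arm_neon.h intrinsics,
which leaves register allocation entirely to the compiler:

#include <arm_neon.h>

static inline void
rte_mov16(uint8_t *dst, const uint8_t *src)
{
        /* 16-byte vector load and store; the compiler picks the
         * vector registers instead of pinning d0/d1. */
        vst1q_u8(dst, vld1q_u8(src));
}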
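And returning to the __ARM_NEON_FP point above, a rough sketch of the
suggested alternative guard; RTE_ARCH_ARM64_MEMCPY is a hypothetical
build-time option, named here purely for illustration:

#include <string.h>

/* Hypothetical config knob: use the hand-written copy routines only
 * when explicitly enabled, otherwise trust libc. */
#ifdef RTE_ARCH_ARM64_MEMCPY
/* hand-optimized LDP/STP based routines, as in this patch */
#else
static inline void *
rte_memcpy(void *dst, const void *src, size_t n)
{
        return memcpy(dst, src, n);
}
#endif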
> +
> +static inline void
> +rte_mov32(uint8_t *dst, const uint8_t *src)
> +{
> +        asm volatile("LDP q0, q1, [%0]\n\t"
> +                     "STP q0, q1, [%1]\n\t"
> +                     : : "r" (src), "r" (dst) :
> +        );
> +}
> +
> +static inline void
> +rte_mov48(uint8_t *dst, const uint8_t *src)
> +{
> +        asm volatile("LDP q0, q1, [%0]\n\t"
> +                     "STP q0, q1, [%1]\n\t"
> +                     "LDP d0, d1, [%0 , #32]\n\t"
> +                     "STP d0, d1, [%1 , #32]\n\t"
> +                     : : "r" (src), "r" (dst) :
> +        );
> +}
> +
> +static inline void
> +rte_mov64(uint8_t *dst, const uint8_t *src)
> +{
> +        asm volatile("LDP q0, q1, [%0]\n\t"
> +                     "STP q0, q1, [%1]\n\t"
> +                     "LDP q0, q1, [%0 , #32]\n\t"
> +                     "STP q0, q1, [%1 , #32]\n\t"
> +                     : : "r" (src), "r" (dst) :
> +        );
> +}
> +
> +static inline void
> +rte_mov128(uint8_t *dst, const uint8_t *src)
> +{
> +        asm volatile("LDP q0, q1, [%0]\n\t"
> +                     "STP q0, q1, [%1]\n\t"
> +                     "LDP q0, q1, [%0 , #32]\n\t"
> +                     "STP q0, q1, [%1 , #32]\n\t"
> +                     "LDP q0, q1, [%0 , #64]\n\t"
> +                     "STP q0, q1, [%1 , #64]\n\t"
> +                     "LDP q0, q1, [%0 , #96]\n\t"
> +                     "STP q0, q1, [%1 , #96]\n\t"
> +                     : : "r" (src), "r" (dst) :
> +        );
> +}
> +
> +static inline void
> +rte_mov256(uint8_t *dst, const uint8_t *src)
> +{
> +        asm volatile("LDP q0, q1, [%0]\n\t"
> +                     "STP q0, q1, [%1]\n\t"
> +                     "LDP q0, q1, [%0 , #32]\n\t"
> +                     "STP q0, q1, [%1 , #32]\n\t"
> +                     "LDP q0, q1, [%0 , #64]\n\t"
> +                     "STP q0, q1, [%1 , #64]\n\t"
> +                     "LDP q0, q1, [%0 , #96]\n\t"
> +                     "STP q0, q1, [%1 , #96]\n\t"
> +                     "LDP q0, q1, [%0 , #128]\n\t"
> +                     "STP q0, q1, [%1 , #128]\n\t"
> +                     "LDP q0, q1, [%0 , #160]\n\t"
> +                     "STP q0, q1, [%1 , #160]\n\t"
> +                     "LDP q0, q1, [%0 , #192]\n\t"
> +                     "STP q0, q1, [%1 , #192]\n\t"
> +                     "LDP q0, q1, [%0 , #224]\n\t"
> +                     "STP q0, q1, [%1 , #224]\n\t"
> +                     : : "r" (src), "r" (dst) :
> +        );
> +}
> +
> +#define rte_memcpy(dst, src, n)              \
> +        ({ (__builtin_constant_p(n)) ?       \
> +        memcpy((dst), (src), (n)) :          \
> +        rte_memcpy_func((dst), (src), (n)); })
> +
> +static inline void *
> +rte_memcpy_func(void *dst, const void *src, size_t n)
> +{
> +        void *ret = dst;
> +
> +        /* We can't copy < 16 bytes using XMM registers so do it manually. */
> +        if (n < 16) {
> +                if (n & 0x01) {
> +                        *(uint8_t *)dst = *(const uint8_t *)src;
> +                        dst = (uint8_t *)dst + 1;
> +                        src = (const uint8_t *)src + 1;
> +                }
> +                if (n & 0x02) {
> +                        *(uint16_t *)dst = *(const uint16_t *)src;
> +                        dst = (uint16_t *)dst + 1;
> +                        src = (const uint16_t *)src + 1;
> +                }
> +                if (n & 0x04) {
> +                        *(uint32_t *)dst = *(const uint32_t *)src;
> +                        dst = (uint32_t *)dst + 1;
> +                        src = (const uint32_t *)src + 1;
> +                }
> +                if (n & 0x08)
> +                        *(uint64_t *)dst = *(const uint64_t *)src;
> +                return ret;
> +        }
> +
> +        /* Special fast cases for <= 128 bytes */
> +        if (n <= 32) {
> +                rte_mov16((uint8_t *)dst, (const uint8_t *)src);
> +                rte_mov16((uint8_t *)dst - 16 + n,
> +                        (const uint8_t *)src - 16 + n);
> +                return ret;
> +        }
> +
> +        if (n <= 64) {
> +                rte_mov32((uint8_t *)dst, (const uint8_t *)src);
> +                rte_mov32((uint8_t *)dst - 32 + n,
> +                        (const uint8_t *)src - 32 + n);
> +                return ret;
> +        }
> +
> +        if (n <= 128) {
> +                rte_mov64((uint8_t *)dst, (const uint8_t *)src);
> +                rte_mov64((uint8_t *)dst - 64 + n,
> +                        (const uint8_t *)src - 64 + n);
> +                return ret;
> +        }
> +
> +        /*
> +         * For large copies > 128 bytes. This combination of 256, 64 and 16 byte
> +         * copies was found to be faster than doing 128 and 32 byte copies as
> +         * well.
> +         */
> +        for ( ; n >= 256; n -= 256) {

There is room for prefetching the next cache line here, based on the
cache line size.
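For example, a rough sketch with GCC's __builtin_prefetch, assuming a
64-byte cache line (so four hints cover the next 256-byte block):

        for ( ; n >= 256; n -= 256) {
                /* Hint the next source block into cache, one hint per
                 * assumed 64-byte line; on aarch64 a prefetch that runs
                 * a little past the end of the buffer cannot fault. */
                __builtin_prefetch((const uint8_t *)src + 256, 0, 3);
                __builtin_prefetch((const uint8_t *)src + 320, 0, 3);
                __builtin_prefetch((const uint8_t *)src + 384, 0, 3);
                __builtin_prefetch((const uint8_t *)src + 448, 0, 3);
                rte_mov256((uint8_t *)dst, (const uint8_t *)src);
                dst = (uint8_t *)dst + 256;
                src = (const uint8_t *)src + 256;
        }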
> +                rte_mov256((uint8_t *)dst, (const uint8_t *)src);
> +                dst = (uint8_t *)dst + 256;
> +                src = (const uint8_t *)src + 256;
> +        }
> +
> +        /*
> +         * We split the remaining bytes (which will be less than 256) into
> +         * 64byte (2^6) chunks.
> +         * Using incrementing integers in the case labels of a switch statement
> +         * enourages the compiler to use a jump table. To get incrementing
> +         * integers, we shift the 2 relevant bits to the LSB position to first
> +         * get decrementing integers, and then subtract.
> +         */
> +        switch (3 - (n >> 6)) {
> +        case 0x00:
> +                rte_mov64((uint8_t *)dst, (const uint8_t *)src);
> +                n -= 64;
> +                dst = (uint8_t *)dst + 64;
> +                src = (const uint8_t *)src + 64;      /* fallthrough */
> +        case 0x01:
> +                rte_mov64((uint8_t *)dst, (const uint8_t *)src);
> +                n -= 64;
> +                dst = (uint8_t *)dst + 64;
> +                src = (const uint8_t *)src + 64;      /* fallthrough */
> +        case 0x02:
> +                rte_mov64((uint8_t *)dst, (const uint8_t *)src);
> +                n -= 64;
> +                dst = (uint8_t *)dst + 64;
> +                src = (const uint8_t *)src + 64;      /* fallthrough */
> +        default:
> +                break;
> +        }
> +
> +        /*
> +         * We split the remaining bytes (which will be less than 64) into
> +         * 16byte (2^4) chunks, using the same switch structure as above.
> +         */
> +        switch (3 - (n >> 4)) {
> +        case 0x00:
> +                rte_mov16((uint8_t *)dst, (const uint8_t *)src);
> +                n -= 16;
> +                dst = (uint8_t *)dst + 16;
> +                src = (const uint8_t *)src + 16;      /* fallthrough */
> +        case 0x01:
> +                rte_mov16((uint8_t *)dst, (const uint8_t *)src);
> +                n -= 16;
> +                dst = (uint8_t *)dst + 16;
> +                src = (const uint8_t *)src + 16;      /* fallthrough */
> +        case 0x02:
> +                rte_mov16((uint8_t *)dst, (const uint8_t *)src);
> +                n -= 16;
> +                dst = (uint8_t *)dst + 16;
> +                src = (const uint8_t *)src + 16;      /* fallthrough */
> +        default:
> +                break;
> +        }
> +
> +        /* Copy any remaining bytes, without going beyond end of buffers */
> +        if (n != 0)
> +                rte_mov16((uint8_t *)dst - 16 + n,
> +                        (const uint8_t *)src - 16 + n);
> +        return ret;
> +}
> +
> +#else
> +
> +static inline void
> +rte_mov16(uint8_t *dst, const uint8_t *src)
> +{
> +        memcpy(dst, src, 16);
> +}
> +
> +static inline void
> +rte_mov32(uint8_t *dst, const uint8_t *src)
> +{
> +        memcpy(dst, src, 32);
> +}
> +
> +static inline void
> +rte_mov48(uint8_t *dst, const uint8_t *src)
> +{
> +        memcpy(dst, src, 48);
> +}
> +
> +static inline void
> +rte_mov64(uint8_t *dst, const uint8_t *src)
> +{
> +        memcpy(dst, src, 64);
> +}
> +
> +static inline void
> +rte_mov128(uint8_t *dst, const uint8_t *src)
> +{
> +        memcpy(dst, src, 128);
> +}
> +
> +static inline void
> +rte_mov256(uint8_t *dst, const uint8_t *src)
> +{
> +        memcpy(dst, src, 256);
> +}
> +
> +static inline void *
> +rte_memcpy(void *dst, const void *src, size_t n)
> +{
> +        return memcpy(dst, src, n);
> +}
> +
> +static inline void *
> +rte_memcpy_func(void *dst, const void *src, size_t n)
> +{
> +        return memcpy(dst, src, n);
> +}
> +
> +#endif /* __ARM_NEON_FP */
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_MEMCPY_ARM_64_H_ */
> --
> 1.9.1
>