Date: Wed, 3 Jan 2018 19:05:15 +0530
From: Jerin Jacob
To: Herbert Guan
Cc: dev@dpdk.org
Message-ID: <20180103133513.GA30368@jerin>
References: <1513565664-19509-1-git-send-email-herbert.guan@arm.com>
 <1513834427-12635-1-git-send-email-herbert.guan@arm.com>
In-Reply-To: <1513834427-12635-1-git-send-email-herbert.guan@arm.com>
Subject: Re: [dpdk-dev] [PATCH v4] arch/arm: optimization for memcpy on AArch64
List-Id: DPDK patches and discussions

-----Original Message-----
> Date: Thu, 21 Dec 2017 13:33:47 +0800
> From: Herbert Guan
> To: dev@dpdk.org, jerin.jacob@caviumnetworks.com
> CC: Herbert Guan
> Subject: [PATCH v4] arch/arm: optimization for memcpy on AArch64
> X-Mailer: git-send-email 1.8.3.1
>
> This patch provides an option to do rte_memcpy() using 'restrict'
> qualifier, which can induce GCC to do optimizations by using more
> efficient instructions, providing some performance gain over memcpy()
> on some AArch64 platforms/enviroments.
>
> The memory copy performance differs between different AArch64
> platforms. And a more recent glibc (e.g. 2.23 or later)
> can provide a better memcpy() performance compared to old glibc
> versions. It's always suggested to use a more recent glibc if
> possible, from which the entire system can get benefit. If for some
> reason an old glibc has to be used, this patch is provided for an
> alternative.
>
> This implementation can improve memory copy on some AArch64
> platforms, when an old glibc (e.g. 2.19, 2.17...) is being used.
> It is disabled by default and needs "RTE_ARCH_ARM64_MEMCPY"
> defined to activate. It's not always proving better performance
> than memcpy() so users need to run DPDK unit test
> "memcpy_perf_autotest" and customize parameters in "customization
> section" in rte_memcpy_64.h for best performance.
>
> Compiler version will also impact the rte_memcpy() performance.
> It's observed on some platforms and with the same code, GCC 7.2.0
> compiled binary can provide better performance than GCC 4.8.5. It's
> suggested to use GCC 5.4.0 or later.
>
> Signed-off-by: Herbert Guan

Looks good. Please find inline requests for some minor changes.
Feel free to add my Acked-by with those changes.

> ---
>  config/common_armv8a_linuxapp                      |   6 +
>  .../common/include/arch/arm/rte_memcpy_64.h        | 287 +++++++++++++++++++++
>  2 files changed, 293 insertions(+)
>
> diff --git a/config/common_armv8a_linuxapp b/config/common_armv8a_linuxapp
> index 6732d1e..8f0cbed 100644
> --- a/config/common_armv8a_linuxapp
> +++ b/config/common_armv8a_linuxapp
> @@ -44,6 +44,12 @@ CONFIG_RTE_FORCE_INTRINSICS=y
>  # to address minimum DMA alignment across all arm64 implementations.
>  CONFIG_RTE_CACHE_LINE_SIZE=128
>
> +# Accelarate rte_memcpy. Be sure to run unit test to determine the
> +# best threshold in code. Refer to notes in source file
> +# (lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h) for more
> +# info.
> +CONFIG_RTE_ARCH_ARM64_MEMCPY=n
> +
>  CONFIG_RTE_LIBRTE_FM10K_PMD=n
>  CONFIG_RTE_LIBRTE_SFC_EFX_PMD=n
>  CONFIG_RTE_LIBRTE_AVP_PMD=n
> diff --git a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> index b80d8ba..b269f34 100644
> --- a/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> +++ b/lib/librte_eal/common/include/arch/arm/rte_memcpy_64.h
> @@ -42,6 +42,291 @@
>
>  #include "generic/rte_memcpy.h"
>
> +#ifdef RTE_ARCH_ARM64_MEMCPY
> +#include
> +#include
> +
> +/*
> + * The memory copy performance differs on different AArch64 micro-architectures.
> + * And the most recent glibc (e.g. 2.23 or later) can provide a better memcpy()
> + * performance compared to old glibc versions. It's always suggested to use a
> + * more recent glibc if possible, from which the entire system can get benefit.
> + *
> + * This implementation improves memory copy on some aarch64 micro-architectures,
> + * when an old glibc (e.g. 2.19, 2.17...) is being used. It is disabled by
> + * default and needs "RTE_ARCH_ARM64_MEMCPY" defined to activate. It's not
> + * always providing better performance than memcpy() so users need to run unit
> + * test "memcpy_perf_autotest" and customize parameters in customization section
> + * below for best performance.
> + *
> + * Compiler version will also impact the rte_memcpy() performance. It's observed
> + * on some platforms and with the same code, GCC 7.2.0 compiled binaries can
> + * provide better performance than GCC 4.8.5 compiled binaries.
> + */
> +
> +/**************************************
> + * Beginning of customization section
> + **************************************/
> +#define RTE_ARM64_MEMCPY_ALIGN_MASK 0x0F
> +#ifndef RTE_ARCH_ARM64_MEMCPY_STRICT_ALIGN
> +/* Only src unalignment will be treaed as unaligned copy */
> +#define IS_UNALIGNED_COPY(dst, src) \

Better to change this to RTE_ARM64_MEMCPY_IS_UNALIGNED_COPY, as it is
defined in a public DPDK header file.
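
To illustrate the rename, here is a rough sketch of how the customization
macros could look with the prefix applied. This is only an assumption about
the final shape, not the patch itself. RTE_ARM64_MEMCPY_USE_RTE_MEMCPY and
demo_is_unaligned() are hypothetical names used for illustration. The sketch
follows the quoted comment ("only src unalignment") and tests src, whereas
the quoted macro tests dst, and it parenthesizes n.

```c
#include <stddef.h>
#include <stdint.h>

#define RTE_ARM64_MEMCPY_ALIGN_MASK 0x0F

/* Prefixed name as suggested in the review. Assumption: only src is checked. */
#define RTE_ARM64_MEMCPY_IS_UNALIGNED_COPY(dst, src) \
	((uintptr_t)(src) & RTE_ARM64_MEMCPY_ALIGN_MASK)

/* Thresholds as in the quoted patch. */
#define RTE_ARM64_MEMCPY_ALIGNED_THRESHOLD ((size_t)(0xffffffff))
#define RTE_ARM64_MEMCPY_UNALIGNED_THRESHOLD ((size_t)(0xffffffff))

/* Hypothetical prefixed form of USE_RTE_MEMCPY(), with n parenthesized. */
#define RTE_ARM64_MEMCPY_USE_RTE_MEMCPY(dst, src, n) \
	((!RTE_ARM64_MEMCPY_IS_UNALIGNED_COPY(dst, src) && \
	  (n) <= RTE_ARM64_MEMCPY_ALIGNED_THRESHOLD) || \
	 (RTE_ARM64_MEMCPY_IS_UNALIGNED_COPY(dst, src) && \
	  (n) <= RTE_ARM64_MEMCPY_UNALIGNED_THRESHOLD))

/* Hypothetical helper: returns 1 if the copy counts as unaligned, else 0. */
static inline int
demo_is_unaligned(const void *dst, const void *src)
{
	(void)dst; /* unused in the src-only variant */
	return RTE_ARM64_MEMCPY_IS_UNALIGNED_COPY(dst, src) != 0;
}
```

With the prefix, an application that includes this header and defines its own
IS_UNALIGNED_COPY or USE_RTE_MEMCPY does not collide with the DPDK definition.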

> +	((uintptr_t)(dst) & RTE_ARM64_MEMCPY_ALIGN_MASK)
> +#else
> +/* Both dst and src unalignment will be treated as unaligned copy */
> +#define IS_UNALIGNED_COPY(dst, src) \
> +	(((uintptr_t)(dst) | (uintptr_t)(src)) & RTE_ARM64_MEMCPY_ALIGN_MASK)

Same as above: rename to RTE_ARM64_MEMCPY_IS_UNALIGNED_COPY.

> +#endif
> +
> +
> +/*
> + * If copy size is larger than threshold, memcpy() will be used.
> + * Run "memcpy_perf_autotest" to determine the proper threshold.
> + */
> +#define RTE_ARM64_MEMCPY_ALIGNED_THRESHOLD ((size_t)(0xffffffff))
> +#define RTE_ARM64_MEMCPY_UNALIGNED_THRESHOLD ((size_t)(0xffffffff))
> +
> +/*
> + * The logic of USE_RTE_MEMCPY() can also be modified to best fit platform.
> + */
> +#define USE_RTE_MEMCPY(dst, src, n) \
> +((!IS_UNALIGNED_COPY(dst, src) && n <= RTE_ARM64_MEMCPY_ALIGNED_THRESHOLD) \
> +|| (IS_UNALIGNED_COPY(dst, src) && n <= RTE_ARM64_MEMCPY_UNALIGNED_THRESHOLD))
> +
> +
> +/**************************************
> + * End of customization section
> + **************************************/
> +#if defined(RTE_TOOLCHAIN_GCC) && !defined(RTE_AARCH64_SKIP_GCC_VERSION_CHECK)

To maintain consistency,
s/RTE_AARCH64_SKIP_GCC_VERSION_CHECK/RTE_ARM64_MEMCPY_SKIP_GCC_VERSION_CHECK

> +#if (GCC_VERSION < 50400)
> +#warning "The GCC version is quite old, which may result in sub-optimal \
> +performance of the compiled code. It is suggested that at least GCC 5.4.0 \
> +be used."
> +#endif
> +#endif
> +
> +static __rte_always_inline void rte_mov16(uint8_t *dst, const uint8_t *src)

static __rte_always_inline void rte_mov16(uint8_t *dst, const uint8_t *src)

> +{
> +	__uint128_t *dst128 = (__uint128_t *)dst;
> +	const __uint128_t *src128 = (const __uint128_t *)src;
> +	*dst128 = *src128;
> +}
> +
> +static __rte_always_inline void rte_mov32(uint8_t *dst, const uint8_t *src)

See above

> +{
> +	__uint128_t *dst128 = (__uint128_t *)dst;
> +	const __uint128_t *src128 = (const __uint128_t *)src;
> +	const __uint128_t x0 = src128[0], x1 = src128[1];
> +	dst128[0] = x0;
> +	dst128[1] = x1;
> +}
> +
> +static __rte_always_inline void rte_mov48(uint8_t *dst, const uint8_t *src)
> +{

See above

> +	__uint128_t *dst128 = (__uint128_t *)dst;
> +	const __uint128_t *src128 = (const __uint128_t *)src;
> +	const __uint128_t x0 = src128[0], x1 = src128[1], x2 = src128[2];
> +	dst128[0] = x0;
> +	dst128[1] = x1;
> +	dst128[2] = x2;
> +}
> +
> +static __rte_always_inline void rte_mov64(uint8_t *dst, const uint8_t *src)
> +{

See above

> +	__uint128_t *dst128 = (__uint128_t *)dst;
> +	const __uint128_t *src128 = (const __uint128_t *)src;
> +	const __uint128_t
> +		x0 = src128[0], x1 = src128[1], x2 = src128[2], x3 = src128[3];
> +	dst128[0] = x0;
> +	dst128[1] = x1;
> +	dst128[2] = x2;
> +	dst128[3] = x3;
> +}
> +
> +static __rte_always_inline void rte_mov128(uint8_t *dst, const uint8_t *src)
> +{

See above

> +	__uint128_t *dst128 = (__uint128_t *)dst;
> +	const __uint128_t *src128 = (const __uint128_t *)src;
> +	/* Keep below declaration & copy sequence for optimized instructions */
> +	const __uint128_t
> +		x0 = src128[0], x1 = src128[1], x2 = src128[2], x3 = src128[3];
> +	dst128[0] = x0;
> +	__uint128_t x4 = src128[4];
> +	dst128[1] = x1;
> +	__uint128_t x5 = src128[5];
> +	dst128[2] = x2;
> +	__uint128_t x6 = src128[6];
> +	dst128[3] = x3;
> +	__uint128_t x7 = src128[7];
> +	dst128[4] = x4;
> +	dst128[5] = x5;
> +	dst128[6] = x6;
> +	dst128[7] = x7;
> +}
> +
> +static __rte_always_inline void rte_mov256(uint8_t *dst, const uint8_t *src)
> +{

See above