* [dpdk-dev] [PATCH 0/3] add lpm support for NEON @ 2015-11-30 17:24 Jerin Jacob 2015-11-30 17:24 ` [dpdk-dev] [PATCH 1/3] eal: introduce rte_vect_* abstractions Jerin Jacob ` (4 more replies) 0 siblings, 5 replies; 47+ messages in thread From: Jerin Jacob @ 2015-11-30 17:24 UTC (permalink / raw) To: dev - Introduce new rte_vect_* abstractions in eal - This patch set has the changes required for optimised pm library usage in arm64 perspective - Tested on Juno and Thunder boards - Tested and verified the changes with following DPDK unit test cases --lpm_autotest --lpm6_autotest - This patch set has dependency on [dpdk-dev] [PATCH v4 0/2] disable CONFIG_RTE_SCHED_VECTOR for arm - With these changes, arm64 platform supports all DPDK libraries(in feature wise) Jerin Jacob (3): eal: introduce rte_vect_* abstractions lpm: add support for NEON maintainers: claim responsibility for arm64 specific files of hash and lpm MAINTAINERS | 3 + app/test/test_lpm.c | 10 +- config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - lib/librte_eal/common/include/arch/arm/rte_vect.h | 17 ++- lib/librte_eal/common/include/arch/x86/rte_vect.h | 8 + lib/librte_lpm/Makefile | 3 + lib/librte_lpm/rte_lpm.h | 5 + lib/librte_lpm/rte_lpm_neon.h | 172 ++++++++++++++++++++++ 8 files changed, 212 insertions(+), 9 deletions(-) create mode 100644 lib/librte_lpm/rte_lpm_neon.h -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH 1/3] eal: introduce rte_vect_* abstractions 2015-11-30 17:24 [dpdk-dev] [PATCH 0/3] add lpm support for NEON Jerin Jacob @ 2015-11-30 17:24 ` Jerin Jacob 2015-12-02 13:43 ` Jan Viktorin 2015-11-30 17:24 ` [dpdk-dev] [PATCH 2/3] lpm: add support for NEON Jerin Jacob ` (3 subsequent siblings) 4 siblings, 1 reply; 47+ messages in thread From: Jerin Jacob @ 2015-11-30 17:24 UTC (permalink / raw) To: dev introduce rte_vect_* abstractions to remove SSE/AVX specific code in the common code(i.e the test applications) The patch does not provide any functional change for IA, the goal is to have infrastructure to reuse the common vector-based test code across all the architectures. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> --- lib/librte_eal/common/include/arch/arm/rte_vect.h | 17 ++++++++++++++++- lib/librte_eal/common/include/arch/x86/rte_vect.h | 8 ++++++++ 2 files changed, 24 insertions(+), 1 deletion(-) diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h index 21cdb4d..d300951 100644 --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h @@ -33,13 +33,14 @@ #ifndef _RTE_VECT_ARM_H_ #define _RTE_VECT_ARM_H_ -#include "arm_neon.h" +#include <arm_neon.h> #ifdef __cplusplus extern "C" { #endif typedef int32x4_t xmm_t; +typedef int32x4_t __m128i; #define XMM_SIZE (sizeof(xmm_t)) #define XMM_MASK (XMM_SIZE - 1) @@ -53,6 +54,20 @@ typedef union rte_xmm { double pd[XMM_SIZE / sizeof(double)]; } __attribute__((aligned(16))) rte_xmm_t; +/* rte_vect_* abstraction implementation using NEON */ + +/* loads the __m128i value from address p(does not need to be 16-byte aligned)*/ +#define rte_vect_loadu_sil128(p) vld1q_s32((const int32_t *)p) + +/* sets the 4 signed 32-bit integer values and returns the __m128i variable */ +static inline __m128i __attribute__((always_inline)) +rte_vect_set_epi32(int i3, int i2, int i1, int i0) +{ + int32_t data[4] = {i0, i1, i2, i3}; + + return vld1q_s32(data); +} + #ifdef __cplusplus } #endif diff --git a/lib/librte_eal/common/include/arch/x86/rte_vect.h b/lib/librte_eal/common/include/arch/x86/rte_vect.h index b698797..91c6523 100644 --- a/lib/librte_eal/common/include/arch/x86/rte_vect.h +++ b/lib/librte_eal/common/include/arch/x86/rte_vect.h @@ -125,6 +125,14 @@ typedef union rte_ymm { }) #endif /* (defined(__ICC) && __ICC < 1210) */ +/* rte_vect_* abstraction implementation using SSE */ + +/* loads the __m128i value from address p(does not need to be 16-byte aligned)*/ +#define rte_vect_loadu_sil128(p) _mm_loadu_si128(p) + +/* sets the 4 signed 32-bit integer values and returns the __m128i variable */ +#define rte_vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) + #ifdef __cplusplus } #endif -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH 1/3] eal: introduce rte_vect_* abstractions 2015-11-30 17:24 ` [dpdk-dev] [PATCH 1/3] eal: introduce rte_vect_* abstractions Jerin Jacob @ 2015-12-02 13:43 ` Jan Viktorin 2015-12-02 14:51 ` Jerin Jacob 0 siblings, 1 reply; 47+ messages in thread From: Jan Viktorin @ 2015-12-02 13:43 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev On Mon, 30 Nov 2015 22:54:11 +0530 Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > introduce rte_vect_* abstractions to remove SSE/AVX specific > code in the common code(i.e the test applications) > > The patch does not provide any functional change for IA, the goal is to Does IA mean Intel Architecture? > have infrastructure to reuse the common vector-based test code across > all the architectures. > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > --- > lib/librte_eal/common/include/arch/arm/rte_vect.h | 17 ++++++++++++++++- > lib/librte_eal/common/include/arch/x86/rte_vect.h | 8 ++++++++ > 2 files changed, 24 insertions(+), 1 deletion(-) > > diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h > index 21cdb4d..d300951 100644 > --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h > +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h > @@ -33,13 +33,14 @@ > #ifndef _RTE_VECT_ARM_H_ > #define _RTE_VECT_ARM_H_ > > -#include "arm_neon.h" > +#include <arm_neon.h> > > #ifdef __cplusplus > extern "C" { > #endif > > typedef int32x4_t xmm_t; > +typedef int32x4_t __m128i; As Jianbo pointed out recently, the __m128i type should be refactored in a general rte_vect API too. If we do something like #if SSE typedef __m128i rte_128i; #elif NEON typedef int32x4_y rte_128i; #endif does it make somebody angry? I am afraid that it will influence a lot of code. However, from the ABI point of view, it is OK, isn't it? > > #define XMM_SIZE (sizeof(xmm_t)) > #define XMM_MASK (XMM_SIZE - 1) > @@ -53,6 +54,20 @@ typedef union rte_xmm { > double pd[XMM_SIZE / sizeof(double)]; > } __attribute__((aligned(16))) rte_xmm_t; > > +/* rte_vect_* abstraction implementation using NEON */ > + > +/* loads the __m128i value from address p(does not need to be 16-byte aligned)*/ > +#define rte_vect_loadu_sil128(p) vld1q_s32((const int32_t *)p) > + > +/* sets the 4 signed 32-bit integer values and returns the __m128i variable */ > +static inline __m128i __attribute__((always_inline)) > +rte_vect_set_epi32(int i3, int i2, int i1, int i0) > +{ > + int32_t data[4] = {i0, i1, i2, i3}; > + > + return vld1q_s32(data); > +} > + > #ifdef __cplusplus > } > #endif > diff --git a/lib/librte_eal/common/include/arch/x86/rte_vect.h b/lib/librte_eal/common/include/arch/x86/rte_vect.h > index b698797..91c6523 100644 > --- a/lib/librte_eal/common/include/arch/x86/rte_vect.h > +++ b/lib/librte_eal/common/include/arch/x86/rte_vect.h > @@ -125,6 +125,14 @@ typedef union rte_ymm { > }) > #endif /* (defined(__ICC) && __ICC < 1210) */ > > +/* rte_vect_* abstraction implementation using SSE */ > + > +/* loads the __m128i value from address p(does not need to be 16-byte aligned)*/ > +#define rte_vect_loadu_sil128(p) _mm_loadu_si128(p) > + > +/* sets the 4 signed 32-bit integer values and returns the __m128i variable */ > +#define rte_vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) > + > #ifdef __cplusplus > } > #endif I like this approach. It is a question whether to inherit names from SSE. However, why to reinvent the wheel... We probably need other people to give their ideas about such generalization of the API. I think, there should be an autotest of the rte_vect API. Is it possible to create one? Regards Jan -- Jan Viktorin E-mail: Viktorin@RehiveTech.com System Architect Web: www.RehiveTech.com RehiveTech Brno, Czech Republic ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH 1/3] eal: introduce rte_vect_* abstractions 2015-12-02 13:43 ` Jan Viktorin @ 2015-12-02 14:51 ` Jerin Jacob 0 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2015-12-02 14:51 UTC (permalink / raw) To: Jan Viktorin; +Cc: dev On Wed, Dec 02, 2015 at 02:43:34PM +0100, Jan Viktorin wrote: > On Mon, 30 Nov 2015 22:54:11 +0530 > Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > > > introduce rte_vect_* abstractions to remove SSE/AVX specific > > code in the common code(i.e the test applications) > > > > The patch does not provide any functional change for IA, the goal is to > > Does IA mean Intel Architecture? Yes. > > > have infrastructure to reuse the common vector-based test code across > > all the architectures. > > > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > --- > > lib/librte_eal/common/include/arch/arm/rte_vect.h | 17 ++++++++++++++++- > > lib/librte_eal/common/include/arch/x86/rte_vect.h | 8 ++++++++ > > 2 files changed, 24 insertions(+), 1 deletion(-) > > > > diff --git a/lib/librte_eal/common/include/arch/arm/rte_vect.h b/lib/librte_eal/common/include/arch/arm/rte_vect.h > > index 21cdb4d..d300951 100644 > > --- a/lib/librte_eal/common/include/arch/arm/rte_vect.h > > +++ b/lib/librte_eal/common/include/arch/arm/rte_vect.h > > @@ -33,13 +33,14 @@ > > #ifndef _RTE_VECT_ARM_H_ > > #define _RTE_VECT_ARM_H_ > > > > -#include "arm_neon.h" > > +#include <arm_neon.h> > > > > #ifdef __cplusplus > > extern "C" { > > #endif > > > > typedef int32x4_t xmm_t; > > +typedef int32x4_t __m128i; > > As Jianbo pointed out recently, the __m128i type should be refactored in > a general rte_vect API too. If we do something like > > #if SSE > typedef __m128i rte_128i; > #elif NEON > typedef int32x4_y rte_128i; > #endif > > does it make somebody angry? I am afraid that it will influence a lot of > code. However, from the ABI point of view, it is OK, isn't it? > > > > > #define XMM_SIZE (sizeof(xmm_t)) > > #define XMM_MASK (XMM_SIZE - 1) > > @@ -53,6 +54,20 @@ typedef union rte_xmm { > > double pd[XMM_SIZE / sizeof(double)]; > > } __attribute__((aligned(16))) rte_xmm_t; > > > > +/* rte_vect_* abstraction implementation using NEON */ > > + > > +/* loads the __m128i value from address p(does not need to be 16-byte aligned)*/ > > +#define rte_vect_loadu_sil128(p) vld1q_s32((const int32_t *)p) > > + > > +/* sets the 4 signed 32-bit integer values and returns the __m128i variable */ > > +static inline __m128i __attribute__((always_inline)) > > +rte_vect_set_epi32(int i3, int i2, int i1, int i0) > > +{ > > + int32_t data[4] = {i0, i1, i2, i3}; > > + > > + return vld1q_s32(data); > > +} > > + > > #ifdef __cplusplus > > } > > #endif > > diff --git a/lib/librte_eal/common/include/arch/x86/rte_vect.h b/lib/librte_eal/common/include/arch/x86/rte_vect.h > > index b698797..91c6523 100644 > > --- a/lib/librte_eal/common/include/arch/x86/rte_vect.h > > +++ b/lib/librte_eal/common/include/arch/x86/rte_vect.h > > @@ -125,6 +125,14 @@ typedef union rte_ymm { > > }) > > #endif /* (defined(__ICC) && __ICC < 1210) */ > > > > +/* rte_vect_* abstraction implementation using SSE */ > > + > > +/* loads the __m128i value from address p(does not need to be 16-byte aligned)*/ > > +#define rte_vect_loadu_sil128(p) _mm_loadu_si128(p) > > + > > +/* sets the 4 signed 32-bit integer values and returns the __m128i variable */ > > +#define rte_vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) > > + > > #ifdef __cplusplus > > } > > #endif > > I like this approach. It is a question whether to inherit names from > SSE. However, why to reinvent the wheel... > > We probably need other people to give their ideas about such > generalization of the API. Yes, I would like get the feedback from other people. ret_vect_* abstraction only for the common code (i.e test code) which typically used to call the SIMD DPDK API's across the architecture. > > I think, there should be an autotest of the rte_vect API. Is it > possible to create one? Yes > > Regards > Jan > > -- > Jan Viktorin E-mail: Viktorin@RehiveTech.com > System Architect Web: www.RehiveTech.com > RehiveTech > Brno, Czech Republic ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH 2/3] lpm: add support for NEON 2015-11-30 17:24 [dpdk-dev] [PATCH 0/3] add lpm support for NEON Jerin Jacob 2015-11-30 17:24 ` [dpdk-dev] [PATCH 1/3] eal: introduce rte_vect_* abstractions Jerin Jacob @ 2015-11-30 17:24 ` Jerin Jacob 2015-12-02 13:43 ` Jan Viktorin 2015-11-30 17:24 ` [dpdk-dev] [PATCH 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob ` (2 subsequent siblings) 4 siblings, 1 reply; 47+ messages in thread From: Jerin Jacob @ 2015-11-30 17:24 UTC (permalink / raw) To: dev enabled CONFIG_RTE_LIBRTE_LPM, CONFIG_RTE_LIBRTE_TABLE, CONFIG_RTE_LIBRTE_PIPELINE libraries for arm64. TABLE, PIPELINE libraries were disabled due to LPM library dependency. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> --- app/test/test_lpm.c | 10 +- config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - lib/librte_lpm/Makefile | 3 + lib/librte_lpm/rte_lpm.h | 5 + lib/librte_lpm/rte_lpm_neon.h | 172 +++++++++++++++++++++++++++++ 5 files changed, 185 insertions(+), 8 deletions(-) create mode 100644 lib/librte_lpm/rte_lpm_neon.h diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c index 8b4ded9..207301b 100644 --- a/app/test/test_lpm.c +++ b/app/test/test_lpm.c @@ -324,7 +324,7 @@ test7(void) status = rte_lpm_lookup(lpm, ip, &next_hop_return); TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip, ip + 0x100, ip - 0x100, ip); + ipx4 = rte_vect_set_epi32(ip, ip + 0x100, ip - 0x100, ip); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == next_hop_add); TEST_LPM_ASSERT(hop[1] == UINT16_MAX); @@ -380,7 +380,7 @@ test8(void) TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip2, ip1, ip2, ip1); + ipx4 = rte_vect_set_epi32(ip2, ip1, ip2, ip1); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == UINT16_MAX); TEST_LPM_ASSERT(hop[1] == next_hop_add); @@ -408,7 +408,7 @@ test8(void) status = rte_lpm_lookup(lpm, ip1, &next_hop_return); TEST_LPM_ASSERT(status == -ENOENT); - ipx4 = _mm_set_epi32(ip1, ip1, ip2, ip2); + ipx4 = rte_vect_set_epi32(ip1, ip1, ip2, ip2); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); if (depth != 1) { TEST_LPM_ASSERT(hop[0] == next_hop_add); @@ -872,7 +872,7 @@ test12(void) TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip, ip + 1, ip, ip - 1); + ipx4 = rte_vect_set_epi32(ip, ip + 1, ip, ip - 1); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == UINT16_MAX); TEST_LPM_ASSERT(hop[1] == next_hop_add); @@ -1291,7 +1291,7 @@ perf_test(void) unsigned k; __m128i ipx4; - ipx4 = _mm_loadu_si128((__m128i *)(ip_batch + j)); + ipx4 = rte_vect_loadu_sil128((__m128i *)(ip_batch + j)); ipx4 = *(__m128i *)(ip_batch + j); rte_lpm_lookupx4(lpm, ipx4, next_hops, UINT16_MAX); for (k = 0; k < RTE_DIM(next_hops); k++) diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc index 504f3ed..57f7941 100644 --- a/config/defconfig_arm64-armv8a-linuxapp-gcc +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n CONFIG_RTE_LIBRTE_FM10K_PMD=n CONFIG_RTE_LIBRTE_I40E_PMD=n -CONFIG_RTE_LIBRTE_LPM=n -CONFIG_RTE_LIBRTE_TABLE=n -CONFIG_RTE_LIBRTE_PIPELINE=n CONFIG_RTE_SCHED_VECTOR=n diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index 688cfc9..2fd5305 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -46,6 +46,9 @@ SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_lpm.c rte_lpm6.c # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_lpm.h rte_lpm6.h +ifeq ($(CONFIG_RTE_ARCH_ARM64),y) +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_neon.h +endif # this lib needs eal DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index c299ce2..12b75ce 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -361,6 +361,9 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips, /* Mask four results. */ #define RTE_LPM_MASKX4_RES UINT64_C(0x00ff00ff00ff00ff) +#if defined(RTE_ARCH_ARM64) +#include "rte_lpm_neon.h" +#else /** * Lookup four IP addresses in an LPM table. * @@ -473,6 +476,8 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4], hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; } +#endif + #ifdef __cplusplus } #endif diff --git a/lib/librte_lpm/rte_lpm_neon.h b/lib/librte_lpm/rte_lpm_neon.h new file mode 100644 index 0000000..6ec4255 --- /dev/null +++ b/lib/librte_lpm/rte_lpm_neon.h @@ -0,0 +1,172 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Cavium Networks. All rights reserved. + * All rights reserved. + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Derived rte_lpm_lookupx4 implementation from lib/librte_lpm/rte_lpm.h + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Cavium Networks nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_LPM_NEON_H_ +#define _RTE_LPM_NEON_H_ + +/** + * @file + * RTE Longest Prefix Match (LPM) lookup for neon + */ + +#include <rte_branch_prediction.h> +#include <rte_byteorder.h> +#include <rte_memory.h> +#include <rte_common.h> +#include <rte_vect.h> + +#ifdef __cplusplus +extern "C" { +#endif + +#ifdef __cplusplus +} +#endif + +/** + * Lookup four IP addresses in an LPM table. + * + * @param lpm + * LPM object handle + * @param ip + * Four IPs to be looked up in the LPM table + * @param hop + * Next hop of the most specific rule found for IP (valid on lookup hit only). + * This is an 4 elements array of two byte values. + * If the lookup was succesfull for the given IP, then least significant byte + * of the corresponding element is the actual next hop and the most + * significant byte is zero. + * If the lookup for the given IP failed, then corresponding element would + * contain default value, see description of then next parameter. + * @param defv + * Default value to populate into corresponding element of hop[] array, + * if lookup would fail. + */ +static inline void +rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4], + uint16_t defv) +{ + uint32x4_t i24; + rte_xmm_t i8; + uint16_t tbl[4]; + uint64_t idx, pt; + + const uint32_t mask = UINT8_MAX; + const int32x4_t mask8 = vdupq_n_s32(mask); + + /* + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries + * as one 64-bit value (0x0300030003000300). + */ + const uint64_t mask_xv = + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); + + /* + * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries + * as one 64-bit value (0x0100010001000100). + */ + const uint64_t mask_v = + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); + + /* get 4 indexes for tbl24[]. */ + i24 = vshrq_n_u32((uint32x4_t)ip, CHAR_BIT); + + /* extract values from tbl24[] */ + idx = vgetq_lane_u64((uint64x2_t)i24, 0); + + tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + idx = vgetq_lane_u64((uint64x2_t)i24, 1); + + tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + /* get 4 indexes for tbl8[]. */ + i8.x = vandq_s32(ip, mask8); + + pt = (uint64_t)tbl[0] | + (uint64_t)tbl[1] << 16 | + (uint64_t)tbl[2] << 32 | + (uint64_t)tbl[3] << 48; + + /* search successfully finished for all 4 IP addresses. */ + if (likely((pt & mask_xv) == mask_v)) { + uintptr_t ph = (uintptr_t)hop; + *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; + return; + } + + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[0] = i8.u32[0] + + (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; + } + if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[1] = i8.u32[1] + + (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; + } + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[2] = i8.u32[2] + + (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; + } + if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[3] = i8.u32[3] + + (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; + } + + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; +} + +#endif /* _RTE_LPM_NEON_H_ */ -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH 2/3] lpm: add support for NEON 2015-11-30 17:24 ` [dpdk-dev] [PATCH 2/3] lpm: add support for NEON Jerin Jacob @ 2015-12-02 13:43 ` Jan Viktorin 2015-12-02 14:56 ` Jerin Jacob 0 siblings, 1 reply; 47+ messages in thread From: Jan Viktorin @ 2015-12-02 13:43 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev On Mon, 30 Nov 2015 22:54:12 +0530 Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > enabled CONFIG_RTE_LIBRTE_LPM, CONFIG_RTE_LIBRTE_TABLE, > CONFIG_RTE_LIBRTE_PIPELINE libraries for arm64. > > TABLE, PIPELINE libraries were disabled due to LPM library dependency. > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > --- > app/test/test_lpm.c | 10 +- > config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > lib/librte_lpm/Makefile | 3 + > lib/librte_lpm/rte_lpm.h | 5 + > lib/librte_lpm/rte_lpm_neon.h | 172 +++++++++++++++++++++++++++++ > 5 files changed, 185 insertions(+), 8 deletions(-) > create mode 100644 lib/librte_lpm/rte_lpm_neon.h > > [snip] > > # this lib needs eal > DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h > index c299ce2..12b75ce 100644 > --- a/lib/librte_lpm/rte_lpm.h > +++ b/lib/librte_lpm/rte_lpm.h > @@ -361,6 +361,9 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips, > /* Mask four results. */ > #define RTE_LPM_MASKX4_RES UINT64_C(0x00ff00ff00ff00ff) > > +#if defined(RTE_ARCH_ARM64) > +#include "rte_lpm_neon.h" > +#else > /** > * Lookup four IP addresses in an LPM table. > * > @@ -473,6 +476,8 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4], > hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; > } > > +#endif > + I would separate the SSE implementation into its own file as well. Otherwise, I like this patch. I hope to be able to test it soon. > [snip] -- Jan Viktorin E-mail: Viktorin@RehiveTech.com System Architect Web: www.RehiveTech.com RehiveTech Brno, Czech Republic ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH 2/3] lpm: add support for NEON 2015-12-02 13:43 ` Jan Viktorin @ 2015-12-02 14:56 ` Jerin Jacob 2015-12-02 15:00 ` Jan Viktorin 0 siblings, 1 reply; 47+ messages in thread From: Jerin Jacob @ 2015-12-02 14:56 UTC (permalink / raw) To: Jan Viktorin; +Cc: dev On Wed, Dec 02, 2015 at 02:43:40PM +0100, Jan Viktorin wrote: > On Mon, 30 Nov 2015 22:54:12 +0530 > Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > > > enabled CONFIG_RTE_LIBRTE_LPM, CONFIG_RTE_LIBRTE_TABLE, > > CONFIG_RTE_LIBRTE_PIPELINE libraries for arm64. > > > > TABLE, PIPELINE libraries were disabled due to LPM library dependency. > > > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > --- > > app/test/test_lpm.c | 10 +- > > config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > > lib/librte_lpm/Makefile | 3 + > > lib/librte_lpm/rte_lpm.h | 5 + > > lib/librte_lpm/rte_lpm_neon.h | 172 +++++++++++++++++++++++++++++ > > 5 files changed, 185 insertions(+), 8 deletions(-) > > create mode 100644 lib/librte_lpm/rte_lpm_neon.h > > > > [snip] > > > > # this lib needs eal > > DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal > > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h > > index c299ce2..12b75ce 100644 > > --- a/lib/librte_lpm/rte_lpm.h > > +++ b/lib/librte_lpm/rte_lpm.h > > @@ -361,6 +361,9 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips, > > /* Mask four results. */ > > #define RTE_LPM_MASKX4_RES UINT64_C(0x00ff00ff00ff00ff) > > > > +#if defined(RTE_ARCH_ARM64) > > +#include "rte_lpm_neon.h" > > +#else > > /** > > * Lookup four IP addresses in an LPM table. > > * > > @@ -473,6 +476,8 @@ rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4], > > hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; > > } > > > > +#endif > > + > > I would separate the SSE implementation into its own file as well. make sense. planning to make it as lib/librte_lpm/rte_lpm_sse.h and lib/librte_lpm/rte_lpm_neon.h. OK ? I can fix it in next revision. > > Otherwise, I like this patch. I hope to be able to test it soon. > > > [snip] > > > -- > Jan Viktorin E-mail: Viktorin@RehiveTech.com > System Architect Web: www.RehiveTech.com > RehiveTech > Brno, Czech Republic ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH 2/3] lpm: add support for NEON 2015-12-02 14:56 ` Jerin Jacob @ 2015-12-02 15:00 ` Jan Viktorin 0 siblings, 0 replies; 47+ messages in thread From: Jan Viktorin @ 2015-12-02 15:00 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev On Wed, 2 Dec 2015 20:26:08 +0530 Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > > [snip] > > I would separate the SSE implementation into its own file as well. > > make sense. planning to make it as lib/librte_lpm/rte_lpm_sse.h > and lib/librte_lpm/rte_lpm_neon.h. OK ? > > I can fix it in next revision. Yes, please. Jan ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm 2015-11-30 17:24 [dpdk-dev] [PATCH 0/3] add lpm support for NEON Jerin Jacob 2015-11-30 17:24 ` [dpdk-dev] [PATCH 1/3] eal: introduce rte_vect_* abstractions Jerin Jacob 2015-11-30 17:24 ` [dpdk-dev] [PATCH 2/3] lpm: add support for NEON Jerin Jacob @ 2015-11-30 17:24 ` Jerin Jacob 2015-12-02 13:43 ` Jan Viktorin 2015-12-02 13:43 ` [dpdk-dev] [PATCH 0/3] add lpm support for NEON Jan Viktorin 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 " Jerin Jacob 4 siblings, 1 reply; 47+ messages in thread From: Jerin Jacob @ 2015-11-30 17:24 UTC (permalink / raw) To: dev Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> --- MAINTAINERS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 4478862..dc8f80a 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -130,6 +130,9 @@ ARM v8 M: Jerin Jacob <jerin.jacob@caviumnetworks.com> F: lib/librte_eal/common/include/arch/arm/*_64.h F: lib/librte_acl/acl_run_neon.* +F: lib/librte_lpm/rte_lpm_neon.h +F: lib/librte_hash/rte_crc_arm64.h +F: lib/librte_hash/rte_cmp_arm64.h EZchip TILE-Gx M: Zhigang Lu <zlu@ezchip.com> -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm 2015-11-30 17:24 ` [dpdk-dev] [PATCH 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob @ 2015-12-02 13:43 ` Jan Viktorin 2015-12-02 14:57 ` Jerin Jacob 0 siblings, 1 reply; 47+ messages in thread From: Jan Viktorin @ 2015-12-02 13:43 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev On Mon, 30 Nov 2015 22:54:13 +0530 Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > --- > MAINTAINERS | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 4478862..dc8f80a 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -130,6 +130,9 @@ ARM v8 > M: Jerin Jacob <jerin.jacob@caviumnetworks.com> > F: lib/librte_eal/common/include/arch/arm/*_64.h > F: lib/librte_acl/acl_run_neon.* > +F: lib/librte_lpm/rte_lpm_neon.h > +F: lib/librte_hash/rte_crc_arm64.h > +F: lib/librte_hash/rte_cmp_arm64.h I can't see the librte_hash/* files in the patch set. Is it by mistake? > > EZchip TILE-Gx > M: Zhigang Lu <zlu@ezchip.com> -- Jan Viktorin E-mail: Viktorin@RehiveTech.com System Architect Web: www.RehiveTech.com RehiveTech Brno, Czech Republic ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm 2015-12-02 13:43 ` Jan Viktorin @ 2015-12-02 14:57 ` Jerin Jacob 0 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2015-12-02 14:57 UTC (permalink / raw) To: Jan Viktorin; +Cc: dev On Wed, Dec 02, 2015 at 02:43:52PM +0100, Jan Viktorin wrote: > On Mon, 30 Nov 2015 22:54:13 +0530 > Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > --- > > MAINTAINERS | 3 +++ > > 1 file changed, 3 insertions(+) > > > > diff --git a/MAINTAINERS b/MAINTAINERS > > index 4478862..dc8f80a 100644 > > --- a/MAINTAINERS > > +++ b/MAINTAINERS > > @@ -130,6 +130,9 @@ ARM v8 > > M: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > F: lib/librte_eal/common/include/arch/arm/*_64.h > > F: lib/librte_acl/acl_run_neon.* > > +F: lib/librte_lpm/rte_lpm_neon.h > > +F: lib/librte_hash/rte_crc_arm64.h > > +F: lib/librte_hash/rte_cmp_arm64.h > > I can't see the librte_hash/* files in the patch set. Is it by mistake? Those files are already in upstream. > > > > > EZchip TILE-Gx > > M: Zhigang Lu <zlu@ezchip.com> > > > > -- > Jan Viktorin E-mail: Viktorin@RehiveTech.com > System Architect Web: www.RehiveTech.com > RehiveTech > Brno, Czech Republic ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH 0/3] add lpm support for NEON 2015-11-30 17:24 [dpdk-dev] [PATCH 0/3] add lpm support for NEON Jerin Jacob ` (2 preceding siblings ...) 2015-11-30 17:24 ` [dpdk-dev] [PATCH 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob @ 2015-12-02 13:43 ` Jan Viktorin 2015-12-02 14:41 ` Jerin Jacob 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 " Jerin Jacob 4 siblings, 1 reply; 47+ messages in thread From: Jan Viktorin @ 2015-12-02 13:43 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev Hello Jerin, thank you for this patch series. Please CC me next time when doing an ARM-related changes. It took me a while to find the related e-mails on the mail server. On Mon, 30 Nov 2015 22:54:10 +0530 Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > - Introduce new rte_vect_* abstractions in eal > - This patch set has the changes required for optimised pm library usage in arm64 perspective > - Tested on Juno and Thunder boards > - Tested and verified the changes with following DPDK unit test cases > --lpm_autotest > --lpm6_autotest > - This patch set has dependency on [dpdk-dev] [PATCH v4 0/2] disable CONFIG_RTE_SCHED_VECTOR for arm What kind of dependency is it? Functional? > - With these changes, arm64 platform supports all DPDK libraries(in feature wise) Is there some ARMv8 specific NEON instruction? > > Jerin Jacob (3): > eal: introduce rte_vect_* abstractions > lpm: add support for NEON > maintainers: claim responsibility for arm64 specific files of hash and > lpm > > MAINTAINERS | 3 + > app/test/test_lpm.c | 10 +- > config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > lib/librte_eal/common/include/arch/arm/rte_vect.h | 17 ++- > lib/librte_eal/common/include/arch/x86/rte_vect.h | 8 + > lib/librte_lpm/Makefile | 3 + > lib/librte_lpm/rte_lpm.h | 5 + > lib/librte_lpm/rte_lpm_neon.h | 172 ++++++++++++++++++++++ > 8 files changed, 212 insertions(+), 9 deletions(-) > create mode 100644 lib/librte_lpm/rte_lpm_neon.h > > -- > 2.1.0 > -- Jan Viktorin E-mail: Viktorin@RehiveTech.com System Architect Web: www.RehiveTech.com RehiveTech Brno, Czech Republic ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH 0/3] add lpm support for NEON 2015-12-02 13:43 ` [dpdk-dev] [PATCH 0/3] add lpm support for NEON Jan Viktorin @ 2015-12-02 14:41 ` Jerin Jacob 0 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2015-12-02 14:41 UTC (permalink / raw) To: Jan Viktorin; +Cc: dev On Wed, Dec 02, 2015 at 02:43:12PM +0100, Jan Viktorin wrote: > Hello Jerin, > > thank you for this patch series. Please CC me next time when doing an > ARM-related changes. It took me a while to find the related e-mails on > the mail server. It's was my mistake. Sorry about that. > > On Mon, 30 Nov 2015 22:54:10 +0530 > Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > > > - Introduce new rte_vect_* abstractions in eal > > - This patch set has the changes required for optimised pm library usage in arm64 perspective > > - Tested on Juno and Thunder boards > > - Tested and verified the changes with following DPDK unit test cases > > --lpm_autotest > > --lpm6_autotest > > - This patch set has dependency on [dpdk-dev] [PATCH v4 0/2] disable CONFIG_RTE_SCHED_VECTOR for arm > > What kind of dependency is it? Functional? Not functional, Just "git am" dependency on config file change due to recent config file re structuring. > > > - With these changes, arm64 platform supports all DPDK libraries(in feature wise) > > Is there some ARMv8 specific NEON instruction? NO. I just said as covering note as ACL on armv7 was not supported at that time. > > > > > Jerin Jacob (3): > > eal: introduce rte_vect_* abstractions > > lpm: add support for NEON > > maintainers: claim responsibility for arm64 specific files of hash and > > lpm > > > > MAINTAINERS | 3 + > > app/test/test_lpm.c | 10 +- > > config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > > lib/librte_eal/common/include/arch/arm/rte_vect.h | 17 ++- > > lib/librte_eal/common/include/arch/x86/rte_vect.h | 8 + > > lib/librte_lpm/Makefile | 3 + > > lib/librte_lpm/rte_lpm.h | 5 + > > lib/librte_lpm/rte_lpm_neon.h | 172 ++++++++++++++++++++++ > > 8 files changed, 212 insertions(+), 9 deletions(-) > > create mode 100644 lib/librte_lpm/rte_lpm_neon.h > > > > -- > > 2.1.0 > > > > > > -- > Jan Viktorin E-mail: Viktorin@RehiveTech.com > System Architect Web: www.RehiveTech.com > RehiveTech > Brno, Czech Republic ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v2 0/3] add lpm support for NEON 2015-11-30 17:24 [dpdk-dev] [PATCH 0/3] add lpm support for NEON Jerin Jacob ` (3 preceding siblings ...) 2015-12-02 13:43 ` [dpdk-dev] [PATCH 0/3] add lpm support for NEON Jan Viktorin @ 2015-12-04 15:14 ` Jerin Jacob 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob ` (3 more replies) 4 siblings, 4 replies; 47+ messages in thread From: Jerin Jacob @ 2015-12-04 15:14 UTC (permalink / raw) To: dev - This patch enabled lpm for ARM - Used architecture agnostic xmm_t to represent 128 bit SIMD variable in rte_lpm_lookupx4 API definition - Tested on Juno and Thunderx boards - Tested and verified the changes with following DPDK unit test cases --lpm_autotest --lpm6_autotest v1..v2 - make rte_lpm_lookupx4 API definition architecture agnostic - vect_* abstraction scope reduce to only app/test as this abstraction used only to load/store and set vectors in test application which is the consumer of rte_lpm_lookupx4 like API - support for armv7 apart from armv8 - taken changes from Jianbo's lpm patches Jerin Jacob (3): lpm: make rte_lpm_lookupx4 API definition architecture agnostic lpm: add support for NEON maintainers: claim responsibility for arm64 specific files of hash and lpm MAINTAINERS | 3 + app/test/test_lpm.c | 21 ++-- app/test/test_xmmt_ops.h | 67 +++++++++++++ config/defconfig_arm-armv7a-linuxapp-gcc | 3 - config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - lib/librte_lpm/Makefile | 6 ++ lib/librte_lpm/rte_lpm.h | 99 ++----------------- lib/librte_lpm/rte_lpm_neon.h | 148 +++++++++++++++++++++++++++++ lib/librte_lpm/rte_lpm_sse.h | 143 ++++++++++++++++++++++++++++ 9 files changed, 386 insertions(+), 107 deletions(-) create mode 100644 app/test/test_xmmt_ops.h create mode 100644 lib/librte_lpm/rte_lpm_neon.h create mode 100644 lib/librte_lpm/rte_lpm_sse.h -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 " Jerin Jacob @ 2015-12-04 15:14 ` Jerin Jacob 2015-12-07 6:15 ` Jianbo Liu 2015-12-07 14:06 ` Ananyev, Konstantin 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 2/3] lpm: add support for NEON Jerin Jacob ` (2 subsequent siblings) 3 siblings, 2 replies; 47+ messages in thread From: Jerin Jacob @ 2015-12-04 15:14 UTC (permalink / raw) To: dev -Used architecture agnostic xmm_t to represent 128 bit SIMD variable -Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4 API in architecture agnostic way -Moved rte_lpm_lookupx4 SSE implementation to architecture specific rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation for a different architecture. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> --- app/test/test_lpm.c | 21 ++++--- app/test/test_xmmt_ops.h | 47 ++++++++++++++ lib/librte_lpm/Makefile | 2 + lib/librte_lpm/rte_lpm.h | 93 +--------------------------- lib/librte_lpm/rte_lpm_sse.h | 143 +++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 206 insertions(+), 100 deletions(-) create mode 100644 app/test/test_xmmt_ops.h create mode 100644 lib/librte_lpm/rte_lpm_sse.h diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c index 8b4ded9..59674f1 100644 --- a/app/test/test_lpm.c +++ b/app/test/test_lpm.c @@ -49,6 +49,7 @@ #include "rte_lpm.h" #include "test_lpm_routes.h" +#include "test_xmmt_ops.h" #define TEST_LPM_ASSERT(cond) do { \ if (!(cond)) { \ @@ -308,7 +309,7 @@ test6(void) int32_t test7(void) { - __m128i ipx4; + xmm_t ipx4; uint16_t hop[4]; struct rte_lpm *lpm = NULL; uint32_t ip = IPv4(0, 0, 0, 0); @@ -324,7 +325,7 @@ test7(void) status = rte_lpm_lookup(lpm, ip, &next_hop_return); TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip, ip + 0x100, ip - 0x100, ip); + ipx4 = vect_set_epi32(ip, ip + 0x100, ip - 0x100, ip); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == next_hop_add); TEST_LPM_ASSERT(hop[1] == UINT16_MAX); @@ -354,7 +355,7 @@ test7(void) int32_t test8(void) { - __m128i ipx4; + xmm_t ipx4; uint16_t hop[4]; struct rte_lpm *lpm = NULL; uint32_t ip1 = IPv4(127, 255, 255, 255), ip2 = IPv4(128, 0, 0, 0); @@ -380,7 +381,7 @@ test8(void) TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip2, ip1, ip2, ip1); + ipx4 = vect_set_epi32(ip2, ip1, ip2, ip1); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == UINT16_MAX); TEST_LPM_ASSERT(hop[1] == next_hop_add); @@ -408,7 +409,7 @@ test8(void) status = rte_lpm_lookup(lpm, ip1, &next_hop_return); TEST_LPM_ASSERT(status == -ENOENT); - ipx4 = _mm_set_epi32(ip1, ip1, ip2, ip2); + ipx4 = vect_set_epi32(ip1, ip1, ip2, ip2); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); if (depth != 1) { TEST_LPM_ASSERT(hop[0] == next_hop_add); @@ -850,7 +851,7 @@ test11(void) int32_t test12(void) { - __m128i ipx4; + xmm_t ipx4; uint16_t hop[4]; struct rte_lpm *lpm = NULL; uint32_t ip, i; @@ -872,7 +873,7 @@ test12(void) TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip, ip + 1, ip, ip - 1); + ipx4 = vect_set_epi32(ip, ip + 1, ip, ip - 1); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == UINT16_MAX); TEST_LPM_ASSERT(hop[1] == next_hop_add); @@ -1289,10 +1290,10 @@ perf_test(void) begin = rte_rdtsc(); for (j = 0; j < BATCH_SIZE; j += RTE_DIM(next_hops)) { unsigned k; - __m128i ipx4; + xmm_t ipx4; - ipx4 = _mm_loadu_si128((__m128i *)(ip_batch + j)); - ipx4 = *(__m128i *)(ip_batch + j); + ipx4 = vect_loadu_sil128((xmm_t *)(ip_batch + j)); + ipx4 = *(xmm_t *)(ip_batch + j); rte_lpm_lookupx4(lpm, ipx4, next_hops, UINT16_MAX); for (k = 0; k < RTE_DIM(next_hops); k++) if (unlikely(next_hops[k] == UINT16_MAX)) diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h new file mode 100644 index 0000000..c055912 --- /dev/null +++ b/app/test/test_xmmt_ops.h @@ -0,0 +1,47 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Cavium Networks. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Cavium Networks nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _TEST_XMMT_OPS_H_ +#define _TEST_XMMT_OPS_H_ + +#include <rte_vect.h> + +/* vect_* abstraction implementation using SSE */ + +/* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ +#define vect_loadu_sil128(p) _mm_loadu_si128(p) + +/* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ +#define vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) + +#endif /* _TEST_XMMT_OPS_H_ */ diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index 688cfc9..ce3a1d1 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -47,6 +47,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_lpm.c rte_lpm6.c # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_lpm.h rte_lpm6.h +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h + # this lib needs eal DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index c299ce2..dfe1378 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -381,97 +381,10 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips, * if lookup would fail. */ static inline void -rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4], - uint16_t defv) -{ - __m128i i24; - rte_xmm_t i8; - uint16_t tbl[4]; - uint64_t idx, pt; +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], + uint16_t defv); - const __m128i mask8 = - _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX); - - /* - * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries - * as one 64-bit value (0x0300030003000300). - */ - const uint64_t mask_xv = - ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); - - /* - * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries - * as one 64-bit value (0x0100010001000100). - */ - const uint64_t mask_v = - ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); - - /* get 4 indexes for tbl24[]. */ - i24 = _mm_srli_epi32(ip, CHAR_BIT); - - /* extract values from tbl24[] */ - idx = _mm_cvtsi128_si64(i24); - i24 = _mm_srli_si128(i24, sizeof(uint64_t)); - - tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; - tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; - - idx = _mm_cvtsi128_si64(i24); - - tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; - tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; - - /* get 4 indexes for tbl8[]. */ - i8.x = _mm_and_si128(ip, mask8); - - pt = (uint64_t)tbl[0] | - (uint64_t)tbl[1] << 16 | - (uint64_t)tbl[2] << 32 | - (uint64_t)tbl[3] << 48; - - /* search successfully finished for all 4 IP addresses. */ - if (likely((pt & mask_xv) == mask_v)) { - uintptr_t ph = (uintptr_t)hop; - *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; - return; - } - - if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[0] = i8.u32[0] + - (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; - } - if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[1] = i8.u32[1] + - (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; - } - if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[2] = i8.u32[2] + - (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; - } - if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[3] = i8.u32[3] + - (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; - } - - hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; - hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; - hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; - hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; -} +#include "rte_lpm_sse.h" #ifdef __cplusplus } diff --git a/lib/librte_lpm/rte_lpm_sse.h b/lib/librte_lpm/rte_lpm_sse.h new file mode 100644 index 0000000..2b7eeec --- /dev/null +++ b/lib/librte_lpm/rte_lpm_sse.h @@ -0,0 +1,143 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_LPM_SSE_H_ +#define _RTE_LPM_SSE_H_ + +#include <rte_branch_prediction.h> +#include <rte_byteorder.h> +#include <rte_common.h> +#include <rte_vect.h> + +#ifdef __cplusplus +extern "C" { +#endif + +static inline void +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], + uint16_t defv) +{ + __m128i i24; + rte_xmm_t i8; + uint16_t tbl[4]; + uint64_t idx, pt; + + const __m128i mask8 = + _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX); + + /* + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries + * as one 64-bit value (0x0300030003000300). + */ + const uint64_t mask_xv = + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); + + /* + * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries + * as one 64-bit value (0x0100010001000100). + */ + const uint64_t mask_v = + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); + + /* get 4 indexes for tbl24[]. */ + i24 = _mm_srli_epi32(ip, CHAR_BIT); + + /* extract values from tbl24[] */ + idx = _mm_cvtsi128_si64(i24); + i24 = _mm_srli_si128(i24, sizeof(uint64_t)); + + tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + idx = _mm_cvtsi128_si64(i24); + + tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + /* get 4 indexes for tbl8[]. */ + i8.x = _mm_and_si128(ip, mask8); + + pt = (uint64_t)tbl[0] | + (uint64_t)tbl[1] << 16 | + (uint64_t)tbl[2] << 32 | + (uint64_t)tbl[3] << 48; + + /* search successfully finished for all 4 IP addresses. */ + if (likely((pt & mask_xv) == mask_v)) { + uintptr_t ph = (uintptr_t)hop; + *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; + return; + } + + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[0] = i8.u32[0] + + (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; + } + if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[1] = i8.u32[1] + + (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; + } + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[2] = i8.u32[2] + + (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; + } + if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[3] = i8.u32[3] + + (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; + } + + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_LPM_SSE_H_ */ -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob @ 2015-12-07 6:15 ` Jianbo Liu 2015-12-07 6:57 ` Jerin Jacob 2015-12-07 14:06 ` Ananyev, Konstantin 1 sibling, 1 reply; 47+ messages in thread From: Jianbo Liu @ 2015-12-07 6:15 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev On 4 December 2015 at 23:14, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > -Used architecture agnostic xmm_t to represent 128 bit SIMD variable > > -Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4 > API in architecture agnostic way > > -Moved rte_lpm_lookupx4 SSE implementation to architecture specific > rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation > for a different architecture. > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > --- > app/test/test_lpm.c | 21 ++++--- > app/test/test_xmmt_ops.h | 47 ++++++++++++++ > lib/librte_lpm/Makefile | 2 + > lib/librte_lpm/rte_lpm.h | 93 +--------------------------- > lib/librte_lpm/rte_lpm_sse.h | 143 +++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 206 insertions(+), 100 deletions(-) > create mode 100644 app/test/test_xmmt_ops.h > create mode 100644 lib/librte_lpm/rte_lpm_sse.h > > diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c > index 8b4ded9..59674f1 100644 > --- a/app/test/test_lpm.c > +++ b/app/test/test_lpm.c > @@ -49,6 +49,7 @@ > > #include "rte_lpm.h" > #include "test_lpm_routes.h" > +#include "test_xmmt_ops.h" > > #define TEST_LPM_ASSERT(cond) do { \ > if (!(cond)) { \ > @@ -308,7 +309,7 @@ test6(void) > int32_t > test7(void) > { > - __m128i ipx4; > + xmm_t ipx4; > uint16_t hop[4]; > struct rte_lpm *lpm = NULL; > uint32_t ip = IPv4(0, 0, 0, 0); > @@ -324,7 +325,7 @@ test7(void) > status = rte_lpm_lookup(lpm, ip, &next_hop_return); > TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); > > - ipx4 = _mm_set_epi32(ip, ip + 0x100, ip - 0x100, ip); > + ipx4 = vect_set_epi32(ip, ip + 0x100, ip - 0x100, ip); > rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); > TEST_LPM_ASSERT(hop[0] == next_hop_add); > TEST_LPM_ASSERT(hop[1] == UINT16_MAX); > @@ -354,7 +355,7 @@ test7(void) > int32_t > test8(void) > { > - __m128i ipx4; > + xmm_t ipx4; > uint16_t hop[4]; > struct rte_lpm *lpm = NULL; > uint32_t ip1 = IPv4(127, 255, 255, 255), ip2 = IPv4(128, 0, 0, 0); > @@ -380,7 +381,7 @@ test8(void) > TEST_LPM_ASSERT((status == 0) && > (next_hop_return == next_hop_add)); > > - ipx4 = _mm_set_epi32(ip2, ip1, ip2, ip1); > + ipx4 = vect_set_epi32(ip2, ip1, ip2, ip1); > rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); > TEST_LPM_ASSERT(hop[0] == UINT16_MAX); > TEST_LPM_ASSERT(hop[1] == next_hop_add); > @@ -408,7 +409,7 @@ test8(void) > status = rte_lpm_lookup(lpm, ip1, &next_hop_return); > TEST_LPM_ASSERT(status == -ENOENT); > > - ipx4 = _mm_set_epi32(ip1, ip1, ip2, ip2); > + ipx4 = vect_set_epi32(ip1, ip1, ip2, ip2); > rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); > if (depth != 1) { > TEST_LPM_ASSERT(hop[0] == next_hop_add); > @@ -850,7 +851,7 @@ test11(void) > int32_t > test12(void) > { > - __m128i ipx4; > + xmm_t ipx4; > uint16_t hop[4]; > struct rte_lpm *lpm = NULL; > uint32_t ip, i; > @@ -872,7 +873,7 @@ test12(void) > TEST_LPM_ASSERT((status == 0) && > (next_hop_return == next_hop_add)); > > - ipx4 = _mm_set_epi32(ip, ip + 1, ip, ip - 1); > + ipx4 = vect_set_epi32(ip, ip + 1, ip, ip - 1); > rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); > TEST_LPM_ASSERT(hop[0] == UINT16_MAX); > TEST_LPM_ASSERT(hop[1] == next_hop_add); > @@ -1289,10 +1290,10 @@ perf_test(void) > begin = rte_rdtsc(); > for (j = 0; j < BATCH_SIZE; j += RTE_DIM(next_hops)) { > unsigned k; > - __m128i ipx4; > + xmm_t ipx4; > > - ipx4 = _mm_loadu_si128((__m128i *)(ip_batch + j)); > - ipx4 = *(__m128i *)(ip_batch + j); > + ipx4 = vect_loadu_sil128((xmm_t *)(ip_batch + j)); > + ipx4 = *(xmm_t *)(ip_batch + j); > rte_lpm_lookupx4(lpm, ipx4, next_hops, UINT16_MAX); > for (k = 0; k < RTE_DIM(next_hops); k++) > if (unlikely(next_hops[k] == UINT16_MAX)) > diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h > new file mode 100644 > index 0000000..c055912 > --- /dev/null > +++ b/app/test/test_xmmt_ops.h Why add this new file under app/test, which is only for test app? Should vect_loadu_sil128/vect_set_epi32 be in each ARCH's rte_vect.h? > @@ -0,0 +1,47 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2015 Cavium Networks. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Cavium Networks nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#ifndef _TEST_XMMT_OPS_H_ > +#define _TEST_XMMT_OPS_H_ > + > +#include <rte_vect.h> > + > +/* vect_* abstraction implementation using SSE */ > + > +/* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ > +#define vect_loadu_sil128(p) _mm_loadu_si128(p) > + > +/* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ > +#define vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) > + > +#endif /* _TEST_XMMT_OPS_H_ */ > diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile > index 688cfc9..ce3a1d1 100644 > --- a/lib/librte_lpm/Makefile > +++ b/lib/librte_lpm/Makefile > @@ -47,6 +47,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_lpm.c rte_lpm6.c > # install this header file > SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_lpm.h rte_lpm6.h > > +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h > + > # this lib needs eal > DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal > > diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h > index c299ce2..dfe1378 100644 > --- a/lib/librte_lpm/rte_lpm.h > +++ b/lib/librte_lpm/rte_lpm.h > @@ -381,97 +381,10 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips, > * if lookup would fail. > */ > static inline void > -rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4], > - uint16_t defv) > -{ > - __m128i i24; > - rte_xmm_t i8; > - uint16_t tbl[4]; > - uint64_t idx, pt; > +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], > + uint16_t defv); > > - const __m128i mask8 = > - _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX); > - > - /* > - * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries > - * as one 64-bit value (0x0300030003000300). > - */ > - const uint64_t mask_xv = > - ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | > - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | > - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | > - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); > - > - /* > - * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries > - * as one 64-bit value (0x0100010001000100). > - */ > - const uint64_t mask_v = > - ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | > - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | > - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | > - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); > - > - /* get 4 indexes for tbl24[]. */ > - i24 = _mm_srli_epi32(ip, CHAR_BIT); > - > - /* extract values from tbl24[] */ > - idx = _mm_cvtsi128_si64(i24); > - i24 = _mm_srli_si128(i24, sizeof(uint64_t)); > - > - tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; > - tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; > - > - idx = _mm_cvtsi128_si64(i24); > - > - tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; > - tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; > - > - /* get 4 indexes for tbl8[]. */ > - i8.x = _mm_and_si128(ip, mask8); > - > - pt = (uint64_t)tbl[0] | > - (uint64_t)tbl[1] << 16 | > - (uint64_t)tbl[2] << 32 | > - (uint64_t)tbl[3] << 48; > - > - /* search successfully finished for all 4 IP addresses. */ > - if (likely((pt & mask_xv) == mask_v)) { > - uintptr_t ph = (uintptr_t)hop; > - *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; > - return; > - } > - > - if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == > - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { > - i8.u32[0] = i8.u32[0] + > - (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; > - tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; > - } > - if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == > - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { > - i8.u32[1] = i8.u32[1] + > - (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; > - tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; > - } > - if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == > - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { > - i8.u32[2] = i8.u32[2] + > - (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; > - tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; > - } > - if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == > - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { > - i8.u32[3] = i8.u32[3] + > - (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; > - tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; > - } > - > - hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; > - hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; > - hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; > - hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; > -} > +#include "rte_lpm_sse.h" > > #ifdef __cplusplus > } > diff --git a/lib/librte_lpm/rte_lpm_sse.h b/lib/librte_lpm/rte_lpm_sse.h > new file mode 100644 > index 0000000..2b7eeec > --- /dev/null > +++ b/lib/librte_lpm/rte_lpm_sse.h > @@ -0,0 +1,143 @@ > +/*- > + * BSD LICENSE > + * > + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. > + * All rights reserved. > + * > + * Redistribution and use in source and binary forms, with or without > + * modification, are permitted provided that the following conditions > + * are met: > + * > + * * Redistributions of source code must retain the above copyright > + * notice, this list of conditions and the following disclaimer. > + * * Redistributions in binary form must reproduce the above copyright > + * notice, this list of conditions and the following disclaimer in > + * the documentation and/or other materials provided with the > + * distribution. > + * * Neither the name of Intel Corporation nor the names of its > + * contributors may be used to endorse or promote products derived > + * from this software without specific prior written permission. > + * > + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS > + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT > + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR > + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT > + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, > + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY > + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT > + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE > + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. > + */ > + > +#ifndef _RTE_LPM_SSE_H_ > +#define _RTE_LPM_SSE_H_ > + > +#include <rte_branch_prediction.h> > +#include <rte_byteorder.h> > +#include <rte_common.h> > +#include <rte_vect.h> > + > +#ifdef __cplusplus > +extern "C" { > +#endif > + > +static inline void > +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], > + uint16_t defv) > +{ > + __m128i i24; > + rte_xmm_t i8; > + uint16_t tbl[4]; > + uint64_t idx, pt; > + > + const __m128i mask8 = > + _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX); > + > + /* > + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries > + * as one 64-bit value (0x0300030003000300). > + */ > + const uint64_t mask_xv = > + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | > + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | > + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | > + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); > + > + /* > + * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries > + * as one 64-bit value (0x0100010001000100). > + */ > + const uint64_t mask_v = > + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | > + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | > + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | > + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); > + > + /* get 4 indexes for tbl24[]. */ > + i24 = _mm_srli_epi32(ip, CHAR_BIT); > + > + /* extract values from tbl24[] */ > + idx = _mm_cvtsi128_si64(i24); > + i24 = _mm_srli_si128(i24, sizeof(uint64_t)); > + > + tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; > + tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; > + > + idx = _mm_cvtsi128_si64(i24); > + > + tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; > + tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; > + > + /* get 4 indexes for tbl8[]. */ > + i8.x = _mm_and_si128(ip, mask8); > + > + pt = (uint64_t)tbl[0] | > + (uint64_t)tbl[1] << 16 | > + (uint64_t)tbl[2] << 32 | > + (uint64_t)tbl[3] << 48; > + > + /* search successfully finished for all 4 IP addresses. */ > + if (likely((pt & mask_xv) == mask_v)) { > + uintptr_t ph = (uintptr_t)hop; > + *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; > + return; > + } > + > + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == > + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { > + i8.u32[0] = i8.u32[0] + > + (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; > + tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; > + } > + if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == > + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { > + i8.u32[1] = i8.u32[1] + > + (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; > + tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; > + } > + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == > + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { > + i8.u32[2] = i8.u32[2] + > + (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; > + tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; > + } > + if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == > + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { > + i8.u32[3] = i8.u32[3] + > + (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; > + tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; > + } > + > + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; > + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; > + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; > + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; > +} > + > +#ifdef __cplusplus > +} > +#endif > + > +#endif /* _RTE_LPM_SSE_H_ */ > -- > 2.1.0 > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic 2015-12-07 6:15 ` Jianbo Liu @ 2015-12-07 6:57 ` Jerin Jacob 0 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2015-12-07 6:57 UTC (permalink / raw) To: Jianbo Liu; +Cc: dev On Mon, Dec 07, 2015 at 02:15:28PM +0800, Jianbo Liu wrote: > On 4 December 2015 at 23:14, Jerin Jacob <jerin.jacob@caviumnetworks.com> wrote: > > -Used architecture agnostic xmm_t to represent 128 bit SIMD variable > > > > -Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4 > > API in architecture agnostic way > > > > -Moved rte_lpm_lookupx4 SSE implementation to architecture specific > > rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation > > for a different architecture. > > > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > --- > > app/test/test_lpm.c | 21 ++++--- > > app/test/test_xmmt_ops.h | 47 ++++++++++++++ > > lib/librte_lpm/Makefile | 2 + > > lib/librte_lpm/rte_lpm.h | 93 +--------------------------- > > lib/librte_lpm/rte_lpm_sse.h | 143 +++++++++++++++++++++++++++++++++++++++++++ > > 5 files changed, 206 insertions(+), 100 deletions(-) > > create mode 100644 app/test/test_xmmt_ops.h > > create mode 100644 lib/librte_lpm/rte_lpm_sse.h > > > > diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c > > index 8b4ded9..59674f1 100644 > > --- a/app/test/test_lpm.c > > +++ b/app/test/test_lpm.c > > @@ -49,6 +49,7 @@ > > > > #include "rte_lpm.h" > > #include "test_lpm_routes.h" > > +#include "test_xmmt_ops.h" > > > > #define TEST_LPM_ASSERT(cond) do { \ > > if (!(cond)) { \ > > @@ -308,7 +309,7 @@ test6(void) > > int32_t > > test7(void) > > { > > - __m128i ipx4; > > + xmm_t ipx4; > > uint16_t hop[4]; > > struct rte_lpm *lpm = NULL; > > uint32_t ip = IPv4(0, 0, 0, 0); > > @@ -324,7 +325,7 @@ test7(void) > > status = rte_lpm_lookup(lpm, ip, &next_hop_return); > > TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); > > > > - ipx4 = _mm_set_epi32(ip, ip + 0x100, ip - 0x100, ip); > > + ipx4 = vect_set_epi32(ip, ip + 0x100, ip - 0x100, ip); > > rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); > > TEST_LPM_ASSERT(hop[0] == next_hop_add); > > TEST_LPM_ASSERT(hop[1] == UINT16_MAX); > > @@ -354,7 +355,7 @@ test7(void) > > int32_t > > test8(void) > > { > > - __m128i ipx4; > > + xmm_t ipx4; > > uint16_t hop[4]; > > struct rte_lpm *lpm = NULL; > > uint32_t ip1 = IPv4(127, 255, 255, 255), ip2 = IPv4(128, 0, 0, 0); > > @@ -380,7 +381,7 @@ test8(void) > > TEST_LPM_ASSERT((status == 0) && > > (next_hop_return == next_hop_add)); > > > > - ipx4 = _mm_set_epi32(ip2, ip1, ip2, ip1); > > + ipx4 = vect_set_epi32(ip2, ip1, ip2, ip1); > > rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); > > TEST_LPM_ASSERT(hop[0] == UINT16_MAX); > > TEST_LPM_ASSERT(hop[1] == next_hop_add); > > @@ -408,7 +409,7 @@ test8(void) > > status = rte_lpm_lookup(lpm, ip1, &next_hop_return); > > TEST_LPM_ASSERT(status == -ENOENT); > > > > - ipx4 = _mm_set_epi32(ip1, ip1, ip2, ip2); > > + ipx4 = vect_set_epi32(ip1, ip1, ip2, ip2); > > rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); > > if (depth != 1) { > > TEST_LPM_ASSERT(hop[0] == next_hop_add); > > @@ -850,7 +851,7 @@ test11(void) > > int32_t > > test12(void) > > { > > - __m128i ipx4; > > + xmm_t ipx4; > > uint16_t hop[4]; > > struct rte_lpm *lpm = NULL; > > uint32_t ip, i; > > @@ -872,7 +873,7 @@ test12(void) > > TEST_LPM_ASSERT((status == 0) && > > (next_hop_return == next_hop_add)); > > > > - ipx4 = _mm_set_epi32(ip, ip + 1, ip, ip - 1); > > + ipx4 = vect_set_epi32(ip, ip + 1, ip, ip - 1); > > rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); > > TEST_LPM_ASSERT(hop[0] == UINT16_MAX); > > TEST_LPM_ASSERT(hop[1] == next_hop_add); > > @@ -1289,10 +1290,10 @@ perf_test(void) > > begin = rte_rdtsc(); > > for (j = 0; j < BATCH_SIZE; j += RTE_DIM(next_hops)) { > > unsigned k; > > - __m128i ipx4; > > + xmm_t ipx4; > > > > - ipx4 = _mm_loadu_si128((__m128i *)(ip_batch + j)); > > - ipx4 = *(__m128i *)(ip_batch + j); > > + ipx4 = vect_loadu_sil128((xmm_t *)(ip_batch + j)); > > + ipx4 = *(xmm_t *)(ip_batch + j); > > rte_lpm_lookupx4(lpm, ipx4, next_hops, UINT16_MAX); > > for (k = 0; k < RTE_DIM(next_hops); k++) > > if (unlikely(next_hops[k] == UINT16_MAX)) > > diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h > > new file mode 100644 > > index 0000000..c055912 > > --- /dev/null > > +++ b/app/test/test_xmmt_ops.h > Why add this new file under app/test, which is only for test app? > Should vect_loadu_sil128/vect_set_epi32 be in each ARCH's rte_vect.h? > V1 was like that, I thought of moving the file under app/test because 1) all the ARCH can't have the implementation for vector primitives if architecture doesn't support it like ppc64 and tile so moving EAL may not be a good idea 2) scope of vector abstraction only for using the API(i.e test app), NOT for implementing the library. So its boils down to load/store/set should not be beyond that. and I am afraid that if we opening up EAL abstraction that will change the scope and which will have performance implication to use emulating the logic in library 3) It's been discussed, There was no disagreement on this http://dpdk.org/ml/archives/dev/2015-December/029404.html Thanks, Jerin [snip] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob 2015-12-07 6:15 ` Jianbo Liu @ 2015-12-07 14:06 ` Ananyev, Konstantin 1 sibling, 0 replies; 47+ messages in thread From: Ananyev, Konstantin @ 2015-12-07 14:06 UTC (permalink / raw) To: Jerin Jacob, dev > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com] > Sent: Friday, December 04, 2015 3:15 PM > To: dev@dpdk.org > Cc: thomas.monjalon@6wind.com; Ananyev, Konstantin; viktorin@rehivetech.com; jianbo.liu@linaro.org; Jerin Jacob > Subject: [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic > > -Used architecture agnostic xmm_t to represent 128 bit SIMD variable > > -Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4 > API in architecture agnostic way > > -Moved rte_lpm_lookupx4 SSE implementation to architecture specific > rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation > for a different architecture. > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > --- Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v2 2/3] lpm: add support for NEON 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 " Jerin Jacob 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob @ 2015-12-04 15:14 ` Jerin Jacob 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob 3 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2015-12-04 15:14 UTC (permalink / raw) To: dev Enabled CONFIG_RTE_LIBRTE_LPM, CONFIG_RTE_LIBRTE_TABLE, CONFIG_RTE_LIBRTE_PIPELINE libraries for arm and arm64 TABLE, PIPELINE libraries were disabled due to LPM library dependency. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org> --- app/test/test_xmmt_ops.h | 20 ++++ config/defconfig_arm-armv7a-linuxapp-gcc | 3 - config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - lib/librte_lpm/Makefile | 4 + lib/librte_lpm/rte_lpm.h | 4 + lib/librte_lpm/rte_lpm_neon.h | 148 +++++++++++++++++++++++++++++ 6 files changed, 176 insertions(+), 6 deletions(-) create mode 100644 lib/librte_lpm/rte_lpm_neon.h diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h index c055912..c18fc12 100644 --- a/app/test/test_xmmt_ops.h +++ b/app/test/test_xmmt_ops.h @@ -36,6 +36,24 @@ #include <rte_vect.h> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) + +/* vect_* abstraction implementation using NEON */ + +/* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ +#define vect_loadu_sil128(p) vld1q_s32((const int32_t *)p) + +/* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ +static inline xmm_t __attribute__((always_inline)) +vect_set_epi32(int i3, int i2, int i1, int i0) +{ + int32_t data[4] = {i0, i1, i2, i3}; + + return vld1q_s32(data); +} + +#else + /* vect_* abstraction implementation using SSE */ /* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ @@ -44,4 +62,6 @@ /* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ #define vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) +#endif + #endif /* _TEST_XMMT_OPS_H_ */ diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc index 9924ff9..cdbf4ac 100644 --- a/config/defconfig_arm-armv7a-linuxapp-gcc +++ b/config/defconfig_arm-armv7a-linuxapp-gcc @@ -54,9 +54,6 @@ CONFIG_RTE_EAL_IGB_UIO=n # fails to compile on ARM CONFIG_RTE_LIBRTE_ACL=n -CONFIG_RTE_LIBRTE_LPM=n -CONFIG_RTE_LIBRTE_TABLE=n -CONFIG_RTE_LIBRTE_PIPELINE=n CONFIG_RTE_SCHED_VECTOR=n # cannot use those on ARM diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc index 504f3ed..57f7941 100644 --- a/config/defconfig_arm64-armv8a-linuxapp-gcc +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n CONFIG_RTE_LIBRTE_FM10K_PMD=n CONFIG_RTE_LIBRTE_I40E_PMD=n -CONFIG_RTE_LIBRTE_LPM=n -CONFIG_RTE_LIBRTE_TABLE=n -CONFIG_RTE_LIBRTE_PIPELINE=n CONFIG_RTE_SCHED_VECTOR=n diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index ce3a1d1..7f93006 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -47,7 +47,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_lpm.c rte_lpm6.c # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_lpm.h rte_lpm6.h +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_neon.h +else SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h +endif # this lib needs eal DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index dfe1378..0c892de 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -384,7 +384,11 @@ static inline void rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], uint16_t defv); +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) +#include "rte_lpm_neon.h" +#else #include "rte_lpm_sse.h" +#endif #ifdef __cplusplus } diff --git a/lib/librte_lpm/rte_lpm_neon.h b/lib/librte_lpm/rte_lpm_neon.h new file mode 100644 index 0000000..fcd2a8a --- /dev/null +++ b/lib/librte_lpm/rte_lpm_neon.h @@ -0,0 +1,148 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Cavium Networks. All rights reserved. + * All rights reserved. + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Derived rte_lpm_lookupx4 implementation from lib/librte_lpm/rte_lpm_sse.h + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Cavium Networks nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_LPM_NEON_H_ +#define _RTE_LPM_NEON_H_ + +#include <rte_branch_prediction.h> +#include <rte_byteorder.h> +#include <rte_memory.h> +#include <rte_common.h> +#include <rte_vect.h> + +#ifdef __cplusplus +extern "C" { +#endif + +static inline void +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], + uint16_t defv) +{ + uint32x4_t i24; + rte_xmm_t i8; + uint16_t tbl[4]; + uint64_t idx, pt; + + const uint32_t mask = UINT8_MAX; + const int32x4_t mask8 = vdupq_n_s32(mask); + + /* + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries + * as one 64-bit value (0x0300030003000300). + */ + const uint64_t mask_xv = + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); + + /* + * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries + * as one 64-bit value (0x0100010001000100). + */ + const uint64_t mask_v = + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); + + /* get 4 indexes for tbl24[]. */ + i24 = vshrq_n_u32((uint32x4_t)ip, CHAR_BIT); + + /* extract values from tbl24[] */ + idx = vgetq_lane_u64((uint64x2_t)i24, 0); + + tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + idx = vgetq_lane_u64((uint64x2_t)i24, 1); + + tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + /* get 4 indexes for tbl8[]. */ + i8.x = vandq_s32(ip, mask8); + + pt = (uint64_t)tbl[0] | + (uint64_t)tbl[1] << 16 | + (uint64_t)tbl[2] << 32 | + (uint64_t)tbl[3] << 48; + + /* search successfully finished for all 4 IP addresses. */ + if (likely((pt & mask_xv) == mask_v)) { + uintptr_t ph = (uintptr_t)hop; + *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; + return; + } + + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[0] = i8.u32[0] + + (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; + } + if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[1] = i8.u32[1] + + (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; + } + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[2] = i8.u32[2] + + (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; + } + if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[3] = i8.u32[3] + + (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; + } + + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_LPM_NEON_H_ */ -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v2 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 " Jerin Jacob 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 2/3] lpm: add support for NEON Jerin Jacob @ 2015-12-04 15:14 ` Jerin Jacob 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob 3 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2015-12-04 15:14 UTC (permalink / raw) To: dev Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> --- MAINTAINERS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 460245b..b8ca465 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -133,6 +133,9 @@ ARM v8 M: Jerin Jacob <jerin.jacob@caviumnetworks.com> F: lib/librte_eal/common/include/arch/arm/*_64.h F: lib/librte_acl/acl_run_neon.* +F: lib/librte_lpm/rte_lpm_neon.h +F: lib/librte_hash/rte_crc_arm64.h +F: lib/librte_hash/rte_cmp_arm64.h EZchip TILE-Gx M: Zhigang Lu <zlu@ezchip.com> -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 " Jerin Jacob ` (2 preceding siblings ...) 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob @ 2016-01-29 4:10 ` Jerin Jacob 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob ` (4 more replies) 3 siblings, 5 replies; 47+ messages in thread From: Jerin Jacob @ 2016-01-29 4:10 UTC (permalink / raw) To: dev; +Cc: viktorin - This patch enables lpm for ARM - Used architecture agnostic xmm_t to represent 128 bit SIMD variable in rte_lpm_lookupx4 API definition - Tested on Juno and Thunderx boards - Tested and verified the changes with following DPDK unit test cases --lpm_autotest --lpm6_autotest v1..v2 - make rte_lpm_lookupx4 API definition architecture agnostic - vect_* abstraction scope reduce to only app/test as this abstraction used only to load/store and set vectors in test application which is the consumer of rte_lpm_lookupx4 like API - support for armv7 apart from armv8 - taken changes from Jianbo's lpm patches v2..v3 - add Acked-by for 0001-lpm-make-rte_lpm_lookupx4-API-definition-architectur.patch - re-based to DPDK 2.2 -- fixed the conflict in config/defconfig_arm-armv7a-linuxapp-gcc and MAINTAINERS file Jerin Jacob (3): lpm: make rte_lpm_lookupx4 API definition architecture agnostic lpm: add support for NEON maintainers: claim responsibility for arm64 specific files of hash and lpm MAINTAINERS | 3 + app/test/test_lpm.c | 21 ++-- app/test/test_xmmt_ops.h | 67 +++++++++++++ config/defconfig_arm-armv7a-linuxapp-gcc | 3 - config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - lib/librte_lpm/Makefile | 6 ++ lib/librte_lpm/rte_lpm.h | 99 ++----------------- lib/librte_lpm/rte_lpm_neon.h | 148 +++++++++++++++++++++++++++++ lib/librte_lpm/rte_lpm_sse.h | 143 ++++++++++++++++++++++++++++ 9 files changed, 386 insertions(+), 107 deletions(-) create mode 100644 app/test/test_xmmt_ops.h create mode 100644 lib/librte_lpm/rte_lpm_neon.h create mode 100644 lib/librte_lpm/rte_lpm_sse.h -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v3 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob @ 2016-01-29 4:10 ` Jerin Jacob 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 2/3] lpm: add support for NEON Jerin Jacob ` (3 subsequent siblings) 4 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2016-01-29 4:10 UTC (permalink / raw) To: dev; +Cc: viktorin -Used architecture agnostic xmm_t to represent 128 bit SIMD variable -Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4 API in architecture agnostic way -Moved rte_lpm_lookupx4 SSE implementation to architecture specific rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation for a different architecture. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- app/test/test_lpm.c | 21 ++++--- app/test/test_xmmt_ops.h | 47 ++++++++++++++ lib/librte_lpm/Makefile | 2 + lib/librte_lpm/rte_lpm.h | 93 +--------------------------- lib/librte_lpm/rte_lpm_sse.h | 143 +++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 206 insertions(+), 100 deletions(-) create mode 100644 app/test/test_xmmt_ops.h create mode 100644 lib/librte_lpm/rte_lpm_sse.h diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c index 8b4ded9..59674f1 100644 --- a/app/test/test_lpm.c +++ b/app/test/test_lpm.c @@ -49,6 +49,7 @@ #include "rte_lpm.h" #include "test_lpm_routes.h" +#include "test_xmmt_ops.h" #define TEST_LPM_ASSERT(cond) do { \ if (!(cond)) { \ @@ -308,7 +309,7 @@ test6(void) int32_t test7(void) { - __m128i ipx4; + xmm_t ipx4; uint16_t hop[4]; struct rte_lpm *lpm = NULL; uint32_t ip = IPv4(0, 0, 0, 0); @@ -324,7 +325,7 @@ test7(void) status = rte_lpm_lookup(lpm, ip, &next_hop_return); TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip, ip + 0x100, ip - 0x100, ip); + ipx4 = vect_set_epi32(ip, ip + 0x100, ip - 0x100, ip); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == next_hop_add); TEST_LPM_ASSERT(hop[1] == UINT16_MAX); @@ -354,7 +355,7 @@ test7(void) int32_t test8(void) { - __m128i ipx4; + xmm_t ipx4; uint16_t hop[4]; struct rte_lpm *lpm = NULL; uint32_t ip1 = IPv4(127, 255, 255, 255), ip2 = IPv4(128, 0, 0, 0); @@ -380,7 +381,7 @@ test8(void) TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip2, ip1, ip2, ip1); + ipx4 = vect_set_epi32(ip2, ip1, ip2, ip1); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == UINT16_MAX); TEST_LPM_ASSERT(hop[1] == next_hop_add); @@ -408,7 +409,7 @@ test8(void) status = rte_lpm_lookup(lpm, ip1, &next_hop_return); TEST_LPM_ASSERT(status == -ENOENT); - ipx4 = _mm_set_epi32(ip1, ip1, ip2, ip2); + ipx4 = vect_set_epi32(ip1, ip1, ip2, ip2); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); if (depth != 1) { TEST_LPM_ASSERT(hop[0] == next_hop_add); @@ -850,7 +851,7 @@ test11(void) int32_t test12(void) { - __m128i ipx4; + xmm_t ipx4; uint16_t hop[4]; struct rte_lpm *lpm = NULL; uint32_t ip, i; @@ -872,7 +873,7 @@ test12(void) TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip, ip + 1, ip, ip - 1); + ipx4 = vect_set_epi32(ip, ip + 1, ip, ip - 1); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == UINT16_MAX); TEST_LPM_ASSERT(hop[1] == next_hop_add); @@ -1289,10 +1290,10 @@ perf_test(void) begin = rte_rdtsc(); for (j = 0; j < BATCH_SIZE; j += RTE_DIM(next_hops)) { unsigned k; - __m128i ipx4; + xmm_t ipx4; - ipx4 = _mm_loadu_si128((__m128i *)(ip_batch + j)); - ipx4 = *(__m128i *)(ip_batch + j); + ipx4 = vect_loadu_sil128((xmm_t *)(ip_batch + j)); + ipx4 = *(xmm_t *)(ip_batch + j); rte_lpm_lookupx4(lpm, ipx4, next_hops, UINT16_MAX); for (k = 0; k < RTE_DIM(next_hops); k++) if (unlikely(next_hops[k] == UINT16_MAX)) diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h new file mode 100644 index 0000000..c055912 --- /dev/null +++ b/app/test/test_xmmt_ops.h @@ -0,0 +1,47 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Cavium Networks. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Cavium Networks nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _TEST_XMMT_OPS_H_ +#define _TEST_XMMT_OPS_H_ + +#include <rte_vect.h> + +/* vect_* abstraction implementation using SSE */ + +/* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ +#define vect_loadu_sil128(p) _mm_loadu_si128(p) + +/* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ +#define vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) + +#endif /* _TEST_XMMT_OPS_H_ */ diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index 688cfc9..ce3a1d1 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -47,6 +47,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_lpm.c rte_lpm6.c # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_lpm.h rte_lpm6.h +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h + # this lib needs eal DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index c299ce2..dfe1378 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -381,97 +381,10 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips, * if lookup would fail. */ static inline void -rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4], - uint16_t defv) -{ - __m128i i24; - rte_xmm_t i8; - uint16_t tbl[4]; - uint64_t idx, pt; +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], + uint16_t defv); - const __m128i mask8 = - _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX); - - /* - * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries - * as one 64-bit value (0x0300030003000300). - */ - const uint64_t mask_xv = - ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); - - /* - * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries - * as one 64-bit value (0x0100010001000100). - */ - const uint64_t mask_v = - ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); - - /* get 4 indexes for tbl24[]. */ - i24 = _mm_srli_epi32(ip, CHAR_BIT); - - /* extract values from tbl24[] */ - idx = _mm_cvtsi128_si64(i24); - i24 = _mm_srli_si128(i24, sizeof(uint64_t)); - - tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; - tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; - - idx = _mm_cvtsi128_si64(i24); - - tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; - tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; - - /* get 4 indexes for tbl8[]. */ - i8.x = _mm_and_si128(ip, mask8); - - pt = (uint64_t)tbl[0] | - (uint64_t)tbl[1] << 16 | - (uint64_t)tbl[2] << 32 | - (uint64_t)tbl[3] << 48; - - /* search successfully finished for all 4 IP addresses. */ - if (likely((pt & mask_xv) == mask_v)) { - uintptr_t ph = (uintptr_t)hop; - *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; - return; - } - - if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[0] = i8.u32[0] + - (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; - } - if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[1] = i8.u32[1] + - (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; - } - if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[2] = i8.u32[2] + - (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; - } - if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[3] = i8.u32[3] + - (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; - } - - hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; - hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; - hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; - hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; -} +#include "rte_lpm_sse.h" #ifdef __cplusplus } diff --git a/lib/librte_lpm/rte_lpm_sse.h b/lib/librte_lpm/rte_lpm_sse.h new file mode 100644 index 0000000..2b7eeec --- /dev/null +++ b/lib/librte_lpm/rte_lpm_sse.h @@ -0,0 +1,143 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_LPM_SSE_H_ +#define _RTE_LPM_SSE_H_ + +#include <rte_branch_prediction.h> +#include <rte_byteorder.h> +#include <rte_common.h> +#include <rte_vect.h> + +#ifdef __cplusplus +extern "C" { +#endif + +static inline void +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], + uint16_t defv) +{ + __m128i i24; + rte_xmm_t i8; + uint16_t tbl[4]; + uint64_t idx, pt; + + const __m128i mask8 = + _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX); + + /* + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries + * as one 64-bit value (0x0300030003000300). + */ + const uint64_t mask_xv = + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); + + /* + * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries + * as one 64-bit value (0x0100010001000100). + */ + const uint64_t mask_v = + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); + + /* get 4 indexes for tbl24[]. */ + i24 = _mm_srli_epi32(ip, CHAR_BIT); + + /* extract values from tbl24[] */ + idx = _mm_cvtsi128_si64(i24); + i24 = _mm_srli_si128(i24, sizeof(uint64_t)); + + tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + idx = _mm_cvtsi128_si64(i24); + + tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + /* get 4 indexes for tbl8[]. */ + i8.x = _mm_and_si128(ip, mask8); + + pt = (uint64_t)tbl[0] | + (uint64_t)tbl[1] << 16 | + (uint64_t)tbl[2] << 32 | + (uint64_t)tbl[3] << 48; + + /* search successfully finished for all 4 IP addresses. */ + if (likely((pt & mask_xv) == mask_v)) { + uintptr_t ph = (uintptr_t)hop; + *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; + return; + } + + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[0] = i8.u32[0] + + (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; + } + if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[1] = i8.u32[1] + + (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; + } + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[2] = i8.u32[2] + + (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; + } + if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[3] = i8.u32[3] + + (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; + } + + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_LPM_SSE_H_ */ -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v3 2/3] lpm: add support for NEON 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob @ 2016-01-29 4:10 ` Jerin Jacob 2016-02-11 11:46 ` Thomas Monjalon 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob ` (2 subsequent siblings) 4 siblings, 1 reply; 47+ messages in thread From: Jerin Jacob @ 2016-01-29 4:10 UTC (permalink / raw) To: dev; +Cc: viktorin Enabled CONFIG_RTE_LIBRTE_LPM, CONFIG_RTE_LIBRTE_TABLE, CONFIG_RTE_LIBRTE_PIPELINE libraries for arm and arm64 TABLE, PIPELINE libraries were disabled due to LPM library dependency. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org> --- app/test/test_xmmt_ops.h | 20 ++++ config/defconfig_arm-armv7a-linuxapp-gcc | 3 - config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - lib/librte_lpm/Makefile | 4 + lib/librte_lpm/rte_lpm.h | 4 + lib/librte_lpm/rte_lpm_neon.h | 148 +++++++++++++++++++++++++++++ 6 files changed, 176 insertions(+), 6 deletions(-) create mode 100644 lib/librte_lpm/rte_lpm_neon.h diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h index c055912..c18fc12 100644 --- a/app/test/test_xmmt_ops.h +++ b/app/test/test_xmmt_ops.h @@ -36,6 +36,24 @@ #include <rte_vect.h> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) + +/* vect_* abstraction implementation using NEON */ + +/* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ +#define vect_loadu_sil128(p) vld1q_s32((const int32_t *)p) + +/* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ +static inline xmm_t __attribute__((always_inline)) +vect_set_epi32(int i3, int i2, int i1, int i0) +{ + int32_t data[4] = {i0, i1, i2, i3}; + + return vld1q_s32(data); +} + +#else + /* vect_* abstraction implementation using SSE */ /* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ @@ -44,4 +62,6 @@ /* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ #define vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) +#endif + #endif /* _TEST_XMMT_OPS_H_ */ diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc index cbebd64..efffa1f 100644 --- a/config/defconfig_arm-armv7a-linuxapp-gcc +++ b/config/defconfig_arm-armv7a-linuxapp-gcc @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n CONFIG_RTE_EAL_IGB_UIO=n # fails to compile on ARM -CONFIG_RTE_LIBRTE_LPM=n -CONFIG_RTE_LIBRTE_TABLE=n -CONFIG_RTE_LIBRTE_PIPELINE=n CONFIG_RTE_SCHED_VECTOR=n # cannot use those on ARM diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc index 504f3ed..57f7941 100644 --- a/config/defconfig_arm64-armv8a-linuxapp-gcc +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc @@ -51,7 +51,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n CONFIG_RTE_LIBRTE_FM10K_PMD=n CONFIG_RTE_LIBRTE_I40E_PMD=n -CONFIG_RTE_LIBRTE_LPM=n -CONFIG_RTE_LIBRTE_TABLE=n -CONFIG_RTE_LIBRTE_PIPELINE=n CONFIG_RTE_SCHED_VECTOR=n diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index ce3a1d1..7f93006 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -47,7 +47,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_lpm.c rte_lpm6.c # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_lpm.h rte_lpm6.h +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_neon.h +else SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h +endif # this lib needs eal DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index dfe1378..0c892de 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -384,7 +384,11 @@ static inline void rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], uint16_t defv); +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) +#include "rte_lpm_neon.h" +#else #include "rte_lpm_sse.h" +#endif #ifdef __cplusplus } diff --git a/lib/librte_lpm/rte_lpm_neon.h b/lib/librte_lpm/rte_lpm_neon.h new file mode 100644 index 0000000..fcd2a8a --- /dev/null +++ b/lib/librte_lpm/rte_lpm_neon.h @@ -0,0 +1,148 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Cavium Networks. All rights reserved. + * All rights reserved. + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Derived rte_lpm_lookupx4 implementation from lib/librte_lpm/rte_lpm_sse.h + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Cavium Networks nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_LPM_NEON_H_ +#define _RTE_LPM_NEON_H_ + +#include <rte_branch_prediction.h> +#include <rte_byteorder.h> +#include <rte_memory.h> +#include <rte_common.h> +#include <rte_vect.h> + +#ifdef __cplusplus +extern "C" { +#endif + +static inline void +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], + uint16_t defv) +{ + uint32x4_t i24; + rte_xmm_t i8; + uint16_t tbl[4]; + uint64_t idx, pt; + + const uint32_t mask = UINT8_MAX; + const int32x4_t mask8 = vdupq_n_s32(mask); + + /* + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries + * as one 64-bit value (0x0300030003000300). + */ + const uint64_t mask_xv = + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); + + /* + * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries + * as one 64-bit value (0x0100010001000100). + */ + const uint64_t mask_v = + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); + + /* get 4 indexes for tbl24[]. */ + i24 = vshrq_n_u32((uint32x4_t)ip, CHAR_BIT); + + /* extract values from tbl24[] */ + idx = vgetq_lane_u64((uint64x2_t)i24, 0); + + tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + idx = vgetq_lane_u64((uint64x2_t)i24, 1); + + tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + /* get 4 indexes for tbl8[]. */ + i8.x = vandq_s32(ip, mask8); + + pt = (uint64_t)tbl[0] | + (uint64_t)tbl[1] << 16 | + (uint64_t)tbl[2] << 32 | + (uint64_t)tbl[3] << 48; + + /* search successfully finished for all 4 IP addresses. */ + if (likely((pt & mask_xv) == mask_v)) { + uintptr_t ph = (uintptr_t)hop; + *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; + return; + } + + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[0] = i8.u32[0] + + (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; + } + if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[1] = i8.u32[1] + + (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; + } + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[2] = i8.u32[2] + + (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; + } + if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[3] = i8.u32[3] + + (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; + } + + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_LPM_NEON_H_ */ -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v3 2/3] lpm: add support for NEON 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 2/3] lpm: add support for NEON Jerin Jacob @ 2016-02-11 11:46 ` Thomas Monjalon 2016-02-12 6:47 ` Jerin Jacob 0 siblings, 1 reply; 47+ messages in thread From: Thomas Monjalon @ 2016-02-11 11:46 UTC (permalink / raw) To: Jerin Jacob, konstantin.ananyev; +Cc: dev, viktorin 2016-01-29 09:40, Jerin Jacob: > --- a/app/test/test_xmmt_ops.h > +++ b/app/test/test_xmmt_ops.h > +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) [...] > +#else [...] > --- a/lib/librte_lpm/Makefile > +++ b/lib/librte_lpm/Makefile > +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) > +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_neon.h > +else > SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h > +endif [...] > --- a/lib/librte_lpm/rte_lpm.h > +++ b/lib/librte_lpm/rte_lpm.h > +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) > +#include "rte_lpm_neon.h" > +#else > #include "rte_lpm_sse.h" > +#endif Instead of defaulting to x86 SSE, it would be better to replace "else" by "elif X86/SSE". I suggest using RTE_ARCH_X86 or RTE_CPUFLAG_SSEx. By the way, what is the minimum SSE version required? ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v3 2/3] lpm: add support for NEON 2016-02-11 11:46 ` Thomas Monjalon @ 2016-02-12 6:47 ` Jerin Jacob 2016-02-12 8:42 ` Thomas Monjalon 0 siblings, 1 reply; 47+ messages in thread From: Jerin Jacob @ 2016-02-12 6:47 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev, viktorin On Thu, Feb 11, 2016 at 12:46:33PM +0100, Thomas Monjalon wrote: > 2016-01-29 09:40, Jerin Jacob: > > --- a/lib/librte_lpm/Makefile > > +++ b/lib/librte_lpm/Makefile > > +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) > > +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_neon.h > > +else > > SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h > > +endif > [...] > > --- a/lib/librte_lpm/rte_lpm.h > > +++ b/lib/librte_lpm/rte_lpm.h > > +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) > > +#include "rte_lpm_neon.h" > > +#else > > #include "rte_lpm_sse.h" > > +#endif > > Instead of defaulting to x86 SSE, it would be better to replace > "else" by "elif X86/SSE". > I suggest using RTE_ARCH_X86 or RTE_CPUFLAG_SSEx. Some architectures(tile)[1] are planning to emulate SSE instruction used in LPM for LPM library support.So that way it makes sense to use SSE as default. But if anyone has any objections then I can add the check else let keep in existing way. [1] http://dpdk.org/ml/archives/dev/2016-January/031147.html Jerin > By the way, what is the minimum SSE version required? > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v3 2/3] lpm: add support for NEON 2016-02-12 6:47 ` Jerin Jacob @ 2016-02-12 8:42 ` Thomas Monjalon 0 siblings, 0 replies; 47+ messages in thread From: Thomas Monjalon @ 2016-02-12 8:42 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, viktorin 2016-02-12 12:17, Jerin Jacob: > On Thu, Feb 11, 2016 at 12:46:33PM +0100, Thomas Monjalon wrote: > > 2016-01-29 09:40, Jerin Jacob: > > > --- a/lib/librte_lpm/Makefile > > > +++ b/lib/librte_lpm/Makefile > > > +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) > > > +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_neon.h > > > +else > > > SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h > > > +endif > > [...] > > > --- a/lib/librte_lpm/rte_lpm.h > > > +++ b/lib/librte_lpm/rte_lpm.h > > > +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) > > > +#include "rte_lpm_neon.h" > > > +#else > > > #include "rte_lpm_sse.h" > > > +#endif > > > > Instead of defaulting to x86 SSE, it would be better to replace > > "else" by "elif X86/SSE". > > I suggest using RTE_ARCH_X86 or RTE_CPUFLAG_SSEx. > > Some architectures(tile)[1] are planning to emulate SSE instruction used > in LPM for LPM library support.So that way it makes sense to use SSE as default. Not sure it is a great idea to emulate instructions of another arch. > But if anyone has any objections then I can add the check else let > keep in existing way. If Tile wants to use x86 code, it's better to do it explicitly (X86 || TILE). > [1] > http://dpdk.org/ml/archives/dev/2016-January/031147.html ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v3 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 2/3] lpm: add support for NEON Jerin Jacob @ 2016-01-29 4:10 ` Jerin Jacob 2016-02-08 9:29 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 " Jerin Jacob 4 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2016-01-29 4:10 UTC (permalink / raw) To: dev; +Cc: viktorin Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> --- MAINTAINERS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index b90aeea..e3fab58 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -138,6 +138,9 @@ M: Jerin Jacob <jerin.jacob@caviumnetworks.com> M: Jianbo Liu <jianbo.liu@linaro.org> F: lib/librte_eal/common/include/arch/arm/*_64.h F: lib/librte_acl/acl_run_neon.* +F: lib/librte_lpm/rte_lpm_neon.h +F: lib/librte_hash/rte_crc_arm64.h +F: lib/librte_hash/rte_cmp_arm64.h EZchip TILE-Gx M: Zhigang Lu <zlu@ezchip.com> -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob ` (2 preceding siblings ...) 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob @ 2016-02-08 9:29 ` Jerin Jacob 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 " Jerin Jacob 4 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2016-02-08 9:29 UTC (permalink / raw) To: dev; +Cc: viktorin On Fri, Jan 29, 2016 at 09:40:42AM +0530, Jerin Jacob wrote: > - This patch enables lpm for ARM > - Used architecture agnostic xmm_t to represent 128 bit SIMD variable in > rte_lpm_lookupx4 API definition > - Tested on Juno and Thunderx boards > - Tested and verified the changes with following DPDK unit test cases > --lpm_autotest > --lpm6_autotest > v1..v2 > - make rte_lpm_lookupx4 API definition architecture agnostic > - vect_* abstraction scope reduce to only app/test as this abstraction used > only to load/store and set vectors in test application which is > the consumer of rte_lpm_lookupx4 like API > - support for armv7 apart from armv8 > - taken changes from Jianbo's lpm patches > > v2..v3 > - add Acked-by for 0001-lpm-make-rte_lpm_lookupx4-API-definition-architectur.patch > - re-based to DPDK 2.2 > -- fixed the conflict in config/defconfig_arm-armv7a-linuxapp-gcc and MAINTAINERS file > > Jerin Jacob (3): > lpm: make rte_lpm_lookupx4 API definition architecture agnostic > lpm: add support for NEON > maintainers: claim responsibility for arm64 specific files of hash and > lpm > ping for review/merge. > MAINTAINERS | 3 + > app/test/test_lpm.c | 21 ++-- > app/test/test_xmmt_ops.h | 67 +++++++++++++ > config/defconfig_arm-armv7a-linuxapp-gcc | 3 - > config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > lib/librte_lpm/Makefile | 6 ++ > lib/librte_lpm/rte_lpm.h | 99 ++----------------- > lib/librte_lpm/rte_lpm_neon.h | 148 +++++++++++++++++++++++++++++ > lib/librte_lpm/rte_lpm_sse.h | 143 ++++++++++++++++++++++++++++ > 9 files changed, 386 insertions(+), 107 deletions(-) > create mode 100644 app/test/test_xmmt_ops.h > create mode 100644 lib/librte_lpm/rte_lpm_neon.h > create mode 100644 lib/librte_lpm/rte_lpm_sse.h > > -- > 2.1.0 > ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob ` (3 preceding siblings ...) 2016-02-08 9:29 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob @ 2016-02-12 12:28 ` Jerin Jacob 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob ` (4 more replies) 4 siblings, 5 replies; 47+ messages in thread From: Jerin Jacob @ 2016-02-12 12:28 UTC (permalink / raw) To: dev; +Cc: viktorin - This patch enables lpm for ARM - Used architecture agnostic xmm_t to represent 128 bit SIMD variable in rte_lpm_lookupx4 API definition - Tested on Juno and Thunderx boards - Tested and verified the changes with following DPDK unit test cases --lpm_autotest --lpm6_autotest v1..v2 - make rte_lpm_lookupx4 API definition architecture agnostic - vect_* abstraction scope reduce to only app/test as this abstraction used only to load/store and set vectors in test application which is the consumer of rte_lpm_lookupx4 like API - support for armv7 apart from armv8 - taken changes from Jianbo's lpm patches v2..v3 - add Acked-by for 0001-lpm-make-rte_lpm_lookupx4-API-definition-architectur.patch - re-based to DPDK 2.2 -- fixed the conflict in config/defconfig_arm-armv7a-linuxapp-gcc and MAINTAINERS file v3..v4 -Instead of defaulting the lpm implementation to SSE, SSE implementation kept under RTE_ARCH_X86 conditional compilation check as suggested by Thomas Jerin Jacob (3): lpm: make rte_lpm_lookupx4 API definition architecture agnostic lpm: add support for NEON maintainers: claim responsibility for arm64 specific files of hash and lpm MAINTAINERS | 3 + app/test/test_lpm.c | 21 ++-- app/test/test_xmmt_ops.h | 67 +++++++++++++ config/defconfig_arm-armv7a-linuxapp-gcc | 3 - config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - lib/librte_lpm/Makefile | 6 ++ lib/librte_lpm/rte_lpm.h | 99 ++----------------- lib/librte_lpm/rte_lpm_neon.h | 148 +++++++++++++++++++++++++++++ lib/librte_lpm/rte_lpm_sse.h | 143 ++++++++++++++++++++++++++++ 9 files changed, 386 insertions(+), 107 deletions(-) create mode 100644 app/test/test_xmmt_ops.h create mode 100644 lib/librte_lpm/rte_lpm_neon.h create mode 100644 lib/librte_lpm/rte_lpm_sse.h -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v4 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 " Jerin Jacob @ 2016-02-12 12:28 ` Jerin Jacob 2016-03-01 17:42 ` Thomas Monjalon 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 2/3] lpm: add support for NEON Jerin Jacob ` (3 subsequent siblings) 4 siblings, 1 reply; 47+ messages in thread From: Jerin Jacob @ 2016-02-12 12:28 UTC (permalink / raw) To: dev; +Cc: viktorin -Used architecture agnostic xmm_t to represent 128 bit SIMD variable -Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4 API in architecture agnostic way -Moved rte_lpm_lookupx4 SSE implementation to architecture specific rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation for a different architecture. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- app/test/test_lpm.c | 21 ++++--- app/test/test_xmmt_ops.h | 47 ++++++++++++++ lib/librte_lpm/Makefile | 2 + lib/librte_lpm/rte_lpm.h | 93 +--------------------------- lib/librte_lpm/rte_lpm_sse.h | 143 +++++++++++++++++++++++++++++++++++++++++++ 5 files changed, 206 insertions(+), 100 deletions(-) create mode 100644 app/test/test_xmmt_ops.h create mode 100644 lib/librte_lpm/rte_lpm_sse.h diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c index 8b4ded9..59674f1 100644 --- a/app/test/test_lpm.c +++ b/app/test/test_lpm.c @@ -49,6 +49,7 @@ #include "rte_lpm.h" #include "test_lpm_routes.h" +#include "test_xmmt_ops.h" #define TEST_LPM_ASSERT(cond) do { \ if (!(cond)) { \ @@ -308,7 +309,7 @@ test6(void) int32_t test7(void) { - __m128i ipx4; + xmm_t ipx4; uint16_t hop[4]; struct rte_lpm *lpm = NULL; uint32_t ip = IPv4(0, 0, 0, 0); @@ -324,7 +325,7 @@ test7(void) status = rte_lpm_lookup(lpm, ip, &next_hop_return); TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip, ip + 0x100, ip - 0x100, ip); + ipx4 = vect_set_epi32(ip, ip + 0x100, ip - 0x100, ip); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == next_hop_add); TEST_LPM_ASSERT(hop[1] == UINT16_MAX); @@ -354,7 +355,7 @@ test7(void) int32_t test8(void) { - __m128i ipx4; + xmm_t ipx4; uint16_t hop[4]; struct rte_lpm *lpm = NULL; uint32_t ip1 = IPv4(127, 255, 255, 255), ip2 = IPv4(128, 0, 0, 0); @@ -380,7 +381,7 @@ test8(void) TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip2, ip1, ip2, ip1); + ipx4 = vect_set_epi32(ip2, ip1, ip2, ip1); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == UINT16_MAX); TEST_LPM_ASSERT(hop[1] == next_hop_add); @@ -408,7 +409,7 @@ test8(void) status = rte_lpm_lookup(lpm, ip1, &next_hop_return); TEST_LPM_ASSERT(status == -ENOENT); - ipx4 = _mm_set_epi32(ip1, ip1, ip2, ip2); + ipx4 = vect_set_epi32(ip1, ip1, ip2, ip2); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); if (depth != 1) { TEST_LPM_ASSERT(hop[0] == next_hop_add); @@ -850,7 +851,7 @@ test11(void) int32_t test12(void) { - __m128i ipx4; + xmm_t ipx4; uint16_t hop[4]; struct rte_lpm *lpm = NULL; uint32_t ip, i; @@ -872,7 +873,7 @@ test12(void) TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip, ip + 1, ip, ip - 1); + ipx4 = vect_set_epi32(ip, ip + 1, ip, ip - 1); rte_lpm_lookupx4(lpm, ipx4, hop, UINT16_MAX); TEST_LPM_ASSERT(hop[0] == UINT16_MAX); TEST_LPM_ASSERT(hop[1] == next_hop_add); @@ -1289,10 +1290,10 @@ perf_test(void) begin = rte_rdtsc(); for (j = 0; j < BATCH_SIZE; j += RTE_DIM(next_hops)) { unsigned k; - __m128i ipx4; + xmm_t ipx4; - ipx4 = _mm_loadu_si128((__m128i *)(ip_batch + j)); - ipx4 = *(__m128i *)(ip_batch + j); + ipx4 = vect_loadu_sil128((xmm_t *)(ip_batch + j)); + ipx4 = *(xmm_t *)(ip_batch + j); rte_lpm_lookupx4(lpm, ipx4, next_hops, UINT16_MAX); for (k = 0; k < RTE_DIM(next_hops); k++) if (unlikely(next_hops[k] == UINT16_MAX)) diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h new file mode 100644 index 0000000..c055912 --- /dev/null +++ b/app/test/test_xmmt_ops.h @@ -0,0 +1,47 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Cavium Networks. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Cavium Networks nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _TEST_XMMT_OPS_H_ +#define _TEST_XMMT_OPS_H_ + +#include <rte_vect.h> + +/* vect_* abstraction implementation using SSE */ + +/* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ +#define vect_loadu_sil128(p) _mm_loadu_si128(p) + +/* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ +#define vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) + +#endif /* _TEST_XMMT_OPS_H_ */ diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index 688cfc9..ce3a1d1 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -47,6 +47,8 @@ SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_lpm.c rte_lpm6.c # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_lpm.h rte_lpm6.h +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h + # this lib needs eal DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index c299ce2..dfe1378 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -381,97 +381,10 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t * ips, * if lookup would fail. */ static inline void -rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint16_t hop[4], - uint16_t defv) -{ - __m128i i24; - rte_xmm_t i8; - uint16_t tbl[4]; - uint64_t idx, pt; +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], + uint16_t defv); - const __m128i mask8 = - _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX); - - /* - * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries - * as one 64-bit value (0x0300030003000300). - */ - const uint64_t mask_xv = - ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); - - /* - * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries - * as one 64-bit value (0x0100010001000100). - */ - const uint64_t mask_v = - ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); - - /* get 4 indexes for tbl24[]. */ - i24 = _mm_srli_epi32(ip, CHAR_BIT); - - /* extract values from tbl24[] */ - idx = _mm_cvtsi128_si64(i24); - i24 = _mm_srli_si128(i24, sizeof(uint64_t)); - - tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; - tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; - - idx = _mm_cvtsi128_si64(i24); - - tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; - tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; - - /* get 4 indexes for tbl8[]. */ - i8.x = _mm_and_si128(ip, mask8); - - pt = (uint64_t)tbl[0] | - (uint64_t)tbl[1] << 16 | - (uint64_t)tbl[2] << 32 | - (uint64_t)tbl[3] << 48; - - /* search successfully finished for all 4 IP addresses. */ - if (likely((pt & mask_xv) == mask_v)) { - uintptr_t ph = (uintptr_t)hop; - *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; - return; - } - - if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[0] = i8.u32[0] + - (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; - } - if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[1] = i8.u32[1] + - (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; - } - if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[2] = i8.u32[2] + - (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; - } - if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[3] = i8.u32[3] + - (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; - } - - hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; - hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; - hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; - hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; -} +#include "rte_lpm_sse.h" #ifdef __cplusplus } diff --git a/lib/librte_lpm/rte_lpm_sse.h b/lib/librte_lpm/rte_lpm_sse.h new file mode 100644 index 0000000..2b7eeec --- /dev/null +++ b/lib/librte_lpm/rte_lpm_sse.h @@ -0,0 +1,143 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_LPM_SSE_H_ +#define _RTE_LPM_SSE_H_ + +#include <rte_branch_prediction.h> +#include <rte_byteorder.h> +#include <rte_common.h> +#include <rte_vect.h> + +#ifdef __cplusplus +extern "C" { +#endif + +static inline void +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], + uint16_t defv) +{ + __m128i i24; + rte_xmm_t i8; + uint16_t tbl[4]; + uint64_t idx, pt; + + const __m128i mask8 = + _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX); + + /* + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries + * as one 64-bit value (0x0300030003000300). + */ + const uint64_t mask_xv = + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); + + /* + * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries + * as one 64-bit value (0x0100010001000100). + */ + const uint64_t mask_v = + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); + + /* get 4 indexes for tbl24[]. */ + i24 = _mm_srli_epi32(ip, CHAR_BIT); + + /* extract values from tbl24[] */ + idx = _mm_cvtsi128_si64(i24); + i24 = _mm_srli_si128(i24, sizeof(uint64_t)); + + tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + idx = _mm_cvtsi128_si64(i24); + + tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + /* get 4 indexes for tbl8[]. */ + i8.x = _mm_and_si128(ip, mask8); + + pt = (uint64_t)tbl[0] | + (uint64_t)tbl[1] << 16 | + (uint64_t)tbl[2] << 32 | + (uint64_t)tbl[3] << 48; + + /* search successfully finished for all 4 IP addresses. */ + if (likely((pt & mask_xv) == mask_v)) { + uintptr_t ph = (uintptr_t)hop; + *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; + return; + } + + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[0] = i8.u32[0] + + (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; + } + if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[1] = i8.u32[1] + + (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; + } + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[2] = i8.u32[2] + + (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; + } + if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[3] = i8.u32[3] + + (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; + } + + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_LPM_SSE_H_ */ -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v4 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob @ 2016-03-01 17:42 ` Thomas Monjalon 2016-03-02 6:28 ` Jerin Jacob 0 siblings, 1 reply; 47+ messages in thread From: Thomas Monjalon @ 2016-03-01 17:42 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, viktorin 2016-02-12 17:58, Jerin Jacob: > -Used architecture agnostic xmm_t to represent 128 bit SIMD variable > > -Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4 > API in architecture agnostic way > > -Moved rte_lpm_lookupx4 SSE implementation to architecture specific > rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation > for a different architecture. > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> > --- > app/test/test_lpm.c | 21 ++++--- > app/test/test_xmmt_ops.h | 47 ++++++++++++++ > lib/librte_lpm/Makefile | 2 + > lib/librte_lpm/rte_lpm.h | 93 +--------------------------- > lib/librte_lpm/rte_lpm_sse.h | 143 +++++++++++++++++++++++++++++++++++++++++++ > 5 files changed, 206 insertions(+), 100 deletions(-) app/test/test_xmmt_ops.h must be added to LPM in MAINTAINERS file. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v4 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic 2016-03-01 17:42 ` Thomas Monjalon @ 2016-03-02 6:28 ` Jerin Jacob 0 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2016-03-02 6:28 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev, viktorin On Tue, Mar 01, 2016 at 06:42:35PM +0100, Thomas Monjalon wrote: > 2016-02-12 17:58, Jerin Jacob: > > -Used architecture agnostic xmm_t to represent 128 bit SIMD variable > > > > -Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4 > > API in architecture agnostic way > > > > -Moved rte_lpm_lookupx4 SSE implementation to architecture specific > > rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation > > for a different architecture. > > > > Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> > > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> > > --- > > app/test/test_lpm.c | 21 ++++--- > > app/test/test_xmmt_ops.h | 47 ++++++++++++++ > > lib/librte_lpm/Makefile | 2 + > > lib/librte_lpm/rte_lpm.h | 93 +--------------------------- > > lib/librte_lpm/rte_lpm_sse.h | 143 +++++++++++++++++++++++++++++++++++++++++++ > > 5 files changed, 206 insertions(+), 100 deletions(-) > > app/test/test_xmmt_ops.h must be added to LPM in MAINTAINERS file. OK. I will add into LPM section like below, --- a/MAINTAINERS +++ b/MAINTAINERS @@ -444,6 +444,7 @@ F: lib/librte_lpm/ F: doc/guides/prog_guide/lpm* F: app/test/test_lpm* F: app/test/test_func_reentrancy.c +F: app/test/test_xmmt_ops.h > > ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v4 2/3] lpm: add support for NEON 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 " Jerin Jacob 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob @ 2016-02-12 12:28 ` Jerin Jacob 2016-03-01 17:46 ` Thomas Monjalon 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob ` (2 subsequent siblings) 4 siblings, 1 reply; 47+ messages in thread From: Jerin Jacob @ 2016-02-12 12:28 UTC (permalink / raw) To: dev; +Cc: viktorin Enabled CONFIG_RTE_LIBRTE_LPM, CONFIG_RTE_LIBRTE_TABLE, CONFIG_RTE_LIBRTE_PIPELINE libraries for arm and arm64 TABLE, PIPELINE libraries were disabled due to LPM library dependency. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org> --- app/test/test_xmmt_ops.h | 20 ++++ config/defconfig_arm-armv7a-linuxapp-gcc | 3 - config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - lib/librte_lpm/Makefile | 4 + lib/librte_lpm/rte_lpm.h | 4 + lib/librte_lpm/rte_lpm_neon.h | 148 +++++++++++++++++++++++++++++ 6 files changed, 176 insertions(+), 6 deletions(-) create mode 100644 lib/librte_lpm/rte_lpm_neon.h diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h index c055912..de9c16f 100644 --- a/app/test/test_xmmt_ops.h +++ b/app/test/test_xmmt_ops.h @@ -36,6 +36,24 @@ #include <rte_vect.h> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) + +/* vect_* abstraction implementation using NEON */ + +/* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ +#define vect_loadu_sil128(p) vld1q_s32((const int32_t *)p) + +/* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ +static inline xmm_t __attribute__((always_inline)) +vect_set_epi32(int i3, int i2, int i1, int i0) +{ + int32_t data[4] = {i0, i1, i2, i3}; + + return vld1q_s32(data); +} + +#elif defined(RTE_ARCH_X86) + /* vect_* abstraction implementation using SSE */ /* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ @@ -44,4 +62,6 @@ /* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ #define vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) +#endif + #endif /* _TEST_XMMT_OPS_H_ */ diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc index cbebd64..efffa1f 100644 --- a/config/defconfig_arm-armv7a-linuxapp-gcc +++ b/config/defconfig_arm-armv7a-linuxapp-gcc @@ -53,9 +53,6 @@ CONFIG_RTE_LIBRTE_KNI=n CONFIG_RTE_EAL_IGB_UIO=n # fails to compile on ARM -CONFIG_RTE_LIBRTE_LPM=n -CONFIG_RTE_LIBRTE_TABLE=n -CONFIG_RTE_LIBRTE_PIPELINE=n CONFIG_RTE_SCHED_VECTOR=n # cannot use those on ARM diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc index eacd01c..52e0c97 100644 --- a/config/defconfig_arm64-armv8a-linuxapp-gcc +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc @@ -49,7 +49,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n CONFIG_RTE_LIBRTE_FM10K_PMD=n CONFIG_RTE_LIBRTE_I40E_PMD=n -CONFIG_RTE_LIBRTE_LPM=n -CONFIG_RTE_LIBRTE_TABLE=n -CONFIG_RTE_LIBRTE_PIPELINE=n CONFIG_RTE_SCHED_VECTOR=n diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index ce3a1d1..656ade2 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -47,7 +47,11 @@ SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_lpm.c rte_lpm6.c # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_lpm.h rte_lpm6.h +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_neon.h +else ifeq ($(CONFIG_RTE_ARCH_X86),y) SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h +endif # this lib needs eal DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index dfe1378..2c34a25 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -384,7 +384,11 @@ static inline void rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], uint16_t defv); +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) +#include "rte_lpm_neon.h" +#elif defined(RTE_ARCH_X86) #include "rte_lpm_sse.h" +#endif #ifdef __cplusplus } diff --git a/lib/librte_lpm/rte_lpm_neon.h b/lib/librte_lpm/rte_lpm_neon.h new file mode 100644 index 0000000..fcd2a8a --- /dev/null +++ b/lib/librte_lpm/rte_lpm_neon.h @@ -0,0 +1,148 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Cavium Networks. All rights reserved. + * All rights reserved. + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Derived rte_lpm_lookupx4 implementation from lib/librte_lpm/rte_lpm_sse.h + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Cavium Networks nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_LPM_NEON_H_ +#define _RTE_LPM_NEON_H_ + +#include <rte_branch_prediction.h> +#include <rte_byteorder.h> +#include <rte_memory.h> +#include <rte_common.h> +#include <rte_vect.h> + +#ifdef __cplusplus +extern "C" { +#endif + +static inline void +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint16_t hop[4], + uint16_t defv) +{ + uint32x4_t i24; + rte_xmm_t i8; + uint16_t tbl[4]; + uint64_t idx, pt; + + const uint32_t mask = UINT8_MAX; + const int32x4_t mask8 = vdupq_n_s32(mask); + + /* + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 4 LPM entries + * as one 64-bit value (0x0300030003000300). + */ + const uint64_t mask_xv = + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 16 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32 | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 48); + + /* + * RTE_LPM_LOOKUP_SUCCESS for 4 LPM entries + * as one 64-bit value (0x0100010001000100). + */ + const uint64_t mask_v = + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 16 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32 | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 48); + + /* get 4 indexes for tbl24[]. */ + i24 = vshrq_n_u32((uint32x4_t)ip, CHAR_BIT); + + /* extract values from tbl24[] */ + idx = vgetq_lane_u64((uint64x2_t)i24, 0); + + tbl[0] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[1] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + idx = vgetq_lane_u64((uint64x2_t)i24, 1); + + tbl[2] = *(const uint16_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[3] = *(const uint16_t *)&lpm->tbl24[idx >> 32]; + + /* get 4 indexes for tbl8[]. */ + i8.x = vandq_s32(ip, mask8); + + pt = (uint64_t)tbl[0] | + (uint64_t)tbl[1] << 16 | + (uint64_t)tbl[2] << 32 | + (uint64_t)tbl[3] << 48; + + /* search successfully finished for all 4 IP addresses. */ + if (likely((pt & mask_xv) == mask_v)) { + uintptr_t ph = (uintptr_t)hop; + *(uint64_t *)ph = pt & RTE_LPM_MASKX4_RES; + return; + } + + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[0] = i8.u32[0] + + (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[0] = *(const uint16_t *)&lpm->tbl8[i8.u32[0]]; + } + if (unlikely((pt >> 16 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[1] = i8.u32[1] + + (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[1] = *(const uint16_t *)&lpm->tbl8[i8.u32[1]]; + } + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[2] = i8.u32[2] + + (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[2] = *(const uint16_t *)&lpm->tbl8[i8.u32[2]]; + } + if (unlikely((pt >> 48 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[3] = i8.u32[3] + + (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + tbl[3] = *(const uint16_t *)&lpm->tbl8[i8.u32[3]]; + } + + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[0] : defv; + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[1] : defv; + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[2] : defv; + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? (uint8_t)tbl[3] : defv; +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_LPM_NEON_H_ */ -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/3] lpm: add support for NEON 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 2/3] lpm: add support for NEON Jerin Jacob @ 2016-03-01 17:46 ` Thomas Monjalon 2016-03-02 6:45 ` Jerin Jacob 0 siblings, 1 reply; 47+ messages in thread From: Thomas Monjalon @ 2016-03-01 17:46 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, viktorin 2016-02-12 17:58, Jerin Jacob: > # fails to compile on ARM > -CONFIG_RTE_LIBRTE_LPM=n > -CONFIG_RTE_LIBRTE_TABLE=n > -CONFIG_RTE_LIBRTE_PIPELINE=n The associated examples cannot compile. Maybe it's too early to enable them. What about updating the comment to state that only examples fail? > --- a/lib/librte_lpm/Makefile > +++ b/lib/librte_lpm/Makefile > +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) > +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_neon.h Simpler: ifneq ($(CONFIG_RTE_ARCH_ARM)$(CONFIG_RTE_ARCH_ARM64),nn) ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v4 2/3] lpm: add support for NEON 2016-03-01 17:46 ` Thomas Monjalon @ 2016-03-02 6:45 ` Jerin Jacob 0 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2016-03-02 6:45 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev, viktorin On Tue, Mar 01, 2016 at 06:46:04PM +0100, Thomas Monjalon wrote: > 2016-02-12 17:58, Jerin Jacob: > > # fails to compile on ARM > > -CONFIG_RTE_LIBRTE_LPM=n > > -CONFIG_RTE_LIBRTE_TABLE=n > > -CONFIG_RTE_LIBRTE_PIPELINE=n > > The associated examples cannot compile. > Maybe it's too early to enable them. > What about updating the comment to state that only examples fail? Not sure where to comment it though. The only l3fwd build is failing on arm64 due to insane use of SSE intrinsics with out proper abstraction in recent l3fwd rework. l3fwd was building earlier with a minor change; Now it looks like it needs reasonable cycles to fix it properly. > > > --- a/lib/librte_lpm/Makefile > > +++ b/lib/librte_lpm/Makefile > > +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) > > +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_neon.h > > Simpler: > ifneq ($(CONFIG_RTE_ARCH_ARM)$(CONFIG_RTE_ARCH_ARM64),nn) I will change it in next version ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v4 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 " Jerin Jacob 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 2/3] lpm: add support for NEON Jerin Jacob @ 2016-02-12 12:28 ` Jerin Jacob 2016-03-01 17:47 ` Thomas Monjalon 2016-02-16 13:27 ` [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON Kobylinski, MichalX 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 " Jerin Jacob 4 siblings, 1 reply; 47+ messages in thread From: Jerin Jacob @ 2016-02-12 12:28 UTC (permalink / raw) To: dev; +Cc: viktorin Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> --- MAINTAINERS | 3 +++ 1 file changed, 3 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index b90aeea..e3fab58 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -138,6 +138,9 @@ M: Jerin Jacob <jerin.jacob@caviumnetworks.com> M: Jianbo Liu <jianbo.liu@linaro.org> F: lib/librte_eal/common/include/arch/arm/*_64.h F: lib/librte_acl/acl_run_neon.* +F: lib/librte_lpm/rte_lpm_neon.h +F: lib/librte_hash/rte_crc_arm64.h +F: lib/librte_hash/rte_cmp_arm64.h EZchip TILE-Gx M: Zhigang Lu <zlu@ezchip.com> -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v4 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob @ 2016-03-01 17:47 ` Thomas Monjalon 2016-03-02 6:46 ` Jerin Jacob 0 siblings, 1 reply; 47+ messages in thread From: Thomas Monjalon @ 2016-03-01 17:47 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, viktorin 2016-02-12 17:58, Jerin Jacob: > +F: lib/librte_lpm/rte_lpm_neon.h This line should be in the previous patch. > +F: lib/librte_hash/rte_crc_arm64.h > +F: lib/librte_hash/rte_cmp_arm64.h Yes, hash for ARM was forgotten. Please add a Fixes: line to refer to the arm enablement of librte_hash. ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v4 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm 2016-03-01 17:47 ` Thomas Monjalon @ 2016-03-02 6:46 ` Jerin Jacob 0 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2016-03-02 6:46 UTC (permalink / raw) To: Thomas Monjalon; +Cc: dev, viktorin On Tue, Mar 01, 2016 at 06:47:46PM +0100, Thomas Monjalon wrote: > 2016-02-12 17:58, Jerin Jacob: > > +F: lib/librte_lpm/rte_lpm_neon.h > > This line should be in the previous patch. Will fix in v5 > > > +F: lib/librte_hash/rte_crc_arm64.h > > +F: lib/librte_hash/rte_cmp_arm64.h > > Yes, hash for ARM was forgotten. > Please add a Fixes: line to refer to the arm enablement of librte_hash. Will fix in v5 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 " Jerin Jacob ` (2 preceding siblings ...) 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob @ 2016-02-16 13:27 ` Kobylinski, MichalX 2016-02-16 16:44 ` Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 " Jerin Jacob 4 siblings, 1 reply; 47+ messages in thread From: Kobylinski, MichalX @ 2016-02-16 13:27 UTC (permalink / raw) To: Jerin Jacob, dev; +Cc: viktorin > -----Original Message----- > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob > Sent: Friday, February 12, 2016 1:29 PM > To: dev@dpdk.org > Cc: viktorin@rehivetech.com > Subject: [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON > > - This patch enables lpm for ARM > - Used architecture agnostic xmm_t to represent 128 bit SIMD variable in > rte_lpm_lookupx4 API definition > - Tested on Juno and Thunderx boards > - Tested and verified the changes with following DPDK unit test cases > --lpm_autotest > --lpm6_autotest > v1..v2 > - make rte_lpm_lookupx4 API definition architecture agnostic > - vect_* abstraction scope reduce to only app/test as this abstraction used only > to load/store and set vectors in test application which is the consumer of > rte_lpm_lookupx4 like API > - support for armv7 apart from armv8 > - taken changes from Jianbo's lpm patches > > v2..v3 > - add Acked-by for 0001-lpm-make-rte_lpm_lookupx4-API-definition- > architectur.patch > - re-based to DPDK 2.2 > -- fixed the conflict in config/defconfig_arm-armv7a-linuxapp-gcc and > MAINTAINERS file > > v3..v4 > -Instead of defaulting the lpm implementation to SSE, SSE implementation kept > under RTE_ARCH_X86 conditional compilation check as suggested by Thomas > > Jerin Jacob (3): > lpm: make rte_lpm_lookupx4 API definition architecture agnostic > lpm: add support for NEON > maintainers: claim responsibility for arm64 specific files of hash and > lpm > > MAINTAINERS | 3 + > app/test/test_lpm.c | 21 ++-- > app/test/test_xmmt_ops.h | 67 +++++++++++++ > config/defconfig_arm-armv7a-linuxapp-gcc | 3 - > config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > lib/librte_lpm/Makefile | 6 ++ > lib/librte_lpm/rte_lpm.h | 99 ++----------------- > lib/librte_lpm/rte_lpm_neon.h | 148 +++++++++++++++++++++++++++++ > lib/librte_lpm/rte_lpm_sse.h | 143 ++++++++++++++++++++++++++++ > 9 files changed, 386 insertions(+), 107 deletions(-) create mode 100644 > app/test/test_xmmt_ops.h create mode 100644 lib/librte_lpm/rte_lpm_neon.h > create mode 100644 lib/librte_lpm/rte_lpm_sse.h > > -- > 2.1.0 Hi Jerin, Are you planning increase next_hop field for ARM? I extended next_hop field from 8 bits to 24 bits and created structure to configure LPM for x86. Please look at my patchset with proposal increase next_hop field and structure to configure. http://patchwork.dpdk.org/dev/patchwork/patch/10249/ http://patchwork.dpdk.org/dev/patchwork/patch/10250/ Best Regards, Michal ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON 2016-02-16 13:27 ` [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON Kobylinski, MichalX @ 2016-02-16 16:44 ` Jerin Jacob 2016-02-18 10:26 ` Kobylinski, MichalX 0 siblings, 1 reply; 47+ messages in thread From: Jerin Jacob @ 2016-02-16 16:44 UTC (permalink / raw) To: Kobylinski, MichalX; +Cc: dev, viktorin On Tue, Feb 16, 2016 at 01:27:02PM +0000, Kobylinski, MichalX wrote: > > > > -----Original Message----- > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob > > Sent: Friday, February 12, 2016 1:29 PM > > To: dev@dpdk.org > > Cc: viktorin@rehivetech.com > > Subject: [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON > > [snip] > > > > Jerin Jacob (3): > > lpm: make rte_lpm_lookupx4 API definition architecture agnostic > > lpm: add support for NEON > > maintainers: claim responsibility for arm64 specific files of hash and > > lpm > > > > MAINTAINERS | 3 + > > app/test/test_lpm.c | 21 ++-- > > app/test/test_xmmt_ops.h | 67 +++++++++++++ > > config/defconfig_arm-armv7a-linuxapp-gcc | 3 - > > config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > > lib/librte_lpm/Makefile | 6 ++ > > lib/librte_lpm/rte_lpm.h | 99 ++----------------- > > lib/librte_lpm/rte_lpm_neon.h | 148 +++++++++++++++++++++++++++++ > > lib/librte_lpm/rte_lpm_sse.h | 143 ++++++++++++++++++++++++++++ > > 9 files changed, 386 insertions(+), 107 deletions(-) create mode 100644 > > app/test/test_xmmt_ops.h create mode 100644 lib/librte_lpm/rte_lpm_neon.h > > create mode 100644 lib/librte_lpm/rte_lpm_sse.h > > > > -- > > 2.1.0 > > Hi Jerin, Hi Michal, > Are you planning increase next_hop field for ARM? I extended next_hop field from 8 bits to 24 bits and created structure to configure LPM for x86. Yes, I am planning to increase next_hop field for ARM as a separate patch. Let this base patchset get merges. I will make ARM specific changes for your new feature in 'rte_lpm_lookupx4' as a separate patch on top of your series. So that in case if I want to go back to 8 bit then I can do it Jerin > Please look at my patchset with proposal increase next_hop field and structure to configure. > > http://patchwork.dpdk.org/dev/patchwork/patch/10249/ > http://patchwork.dpdk.org/dev/patchwork/patch/10250/ > > Best Regards, > Michal > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON 2016-02-16 16:44 ` Jerin Jacob @ 2016-02-18 10:26 ` Kobylinski, MichalX 2016-02-19 0:34 ` Jerin Jacob 0 siblings, 1 reply; 47+ messages in thread From: Kobylinski, MichalX @ 2016-02-18 10:26 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, viktorin > -----Original Message----- > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com] > Sent: Tuesday, February 16, 2016 5:44 PM > To: Kobylinski, MichalX <michalx.kobylinski@intel.com> > Cc: dev@dpdk.org; viktorin@rehivetech.com > Subject: Re: [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON > Importance: High > > On Tue, Feb 16, 2016 at 01:27:02PM +0000, Kobylinski, MichalX wrote: > > > > > > > -----Original Message----- > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob > > > Sent: Friday, February 12, 2016 1:29 PM > > > To: dev@dpdk.org > > > Cc: viktorin@rehivetech.com > > > Subject: [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON > > > > [snip] > > > > > > Jerin Jacob (3): > > > lpm: make rte_lpm_lookupx4 API definition architecture agnostic > > > lpm: add support for NEON > > > maintainers: claim responsibility for arm64 specific files of hash and > > > lpm > > > > > > MAINTAINERS | 3 + > > > app/test/test_lpm.c | 21 ++-- > > > app/test/test_xmmt_ops.h | 67 +++++++++++++ > > > config/defconfig_arm-armv7a-linuxapp-gcc | 3 - > > > config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > > > lib/librte_lpm/Makefile | 6 ++ > > > lib/librte_lpm/rte_lpm.h | 99 ++----------------- > > > lib/librte_lpm/rte_lpm_neon.h | 148 > +++++++++++++++++++++++++++++ > > > lib/librte_lpm/rte_lpm_sse.h | 143 > ++++++++++++++++++++++++++++ > > > 9 files changed, 386 insertions(+), 107 deletions(-) create mode > > > 100644 app/test/test_xmmt_ops.h create mode 100644 > > > lib/librte_lpm/rte_lpm_neon.h create mode 100644 > > > lib/librte_lpm/rte_lpm_sse.h > > > > > > -- > > > 2.1.0 > > > > Hi Jerin, > > Hi Michal, > > > Are you planning increase next_hop field for ARM? I extended next_hop field > from 8 bits to 24 bits and created structure to configure LPM for x86. > > Yes, I am planning to increase next_hop field for ARM as a separate patch. Let > this base patchset get merges. > > I will make ARM specific changes for your new feature in 'rte_lpm_lookupx4' as > a separate patch on top of your series. > So that in case if I want to go back to 8 bit then I can do it > > Jerin Thank you for your answer. Do you prepare separate patch with changes for ARM architecture on the top my series? If you want I can support you with prepare new patch. Michal > > > Please look at my patchset with proposal increase next_hop field and structure > to configure. > > > > http://patchwork.dpdk.org/dev/patchwork/patch/10249/ > > http://patchwork.dpdk.org/dev/patchwork/patch/10250/ > > > > Best Regards, > > Michal > > ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON 2016-02-18 10:26 ` Kobylinski, MichalX @ 2016-02-19 0:34 ` Jerin Jacob 0 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2016-02-19 0:34 UTC (permalink / raw) To: Kobylinski, MichalX; +Cc: dev, viktorin On Thu, Feb 18, 2016 at 10:26:44AM +0000, Kobylinski, MichalX wrote: > > > > -----Original Message----- > > From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com] > > Sent: Tuesday, February 16, 2016 5:44 PM > > To: Kobylinski, MichalX <michalx.kobylinski@intel.com> > > Cc: dev@dpdk.org; viktorin@rehivetech.com > > Subject: Re: [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON > > Importance: High > > > > On Tue, Feb 16, 2016 at 01:27:02PM +0000, Kobylinski, MichalX wrote: > > > > > > > > > > -----Original Message----- > > > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Jerin Jacob > > > > Sent: Friday, February 12, 2016 1:29 PM > > > > To: dev@dpdk.org > > > > Cc: viktorin@rehivetech.com > > > > Subject: [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON > > > > > > [snip] > > > > > > > > Jerin Jacob (3): > > > > lpm: make rte_lpm_lookupx4 API definition architecture agnostic > > > > lpm: add support for NEON > > > > maintainers: claim responsibility for arm64 specific files of hash and > > > > lpm > > > > > > > > MAINTAINERS | 3 + > > > > app/test/test_lpm.c | 21 ++-- > > > > app/test/test_xmmt_ops.h | 67 +++++++++++++ > > > > config/defconfig_arm-armv7a-linuxapp-gcc | 3 - > > > > config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - > > > > lib/librte_lpm/Makefile | 6 ++ > > > > lib/librte_lpm/rte_lpm.h | 99 ++----------------- > > > > lib/librte_lpm/rte_lpm_neon.h | 148 > > +++++++++++++++++++++++++++++ > > > > lib/librte_lpm/rte_lpm_sse.h | 143 > > ++++++++++++++++++++++++++++ > > > > 9 files changed, 386 insertions(+), 107 deletions(-) create mode > > > > 100644 app/test/test_xmmt_ops.h create mode 100644 > > > > lib/librte_lpm/rte_lpm_neon.h create mode 100644 > > > > lib/librte_lpm/rte_lpm_sse.h > > > > > > > > -- > > > > 2.1.0 > > > > > > Hi Jerin, > > > > Hi Michal, > > > > > Are you planning increase next_hop field for ARM? I extended next_hop field > > from 8 bits to 24 bits and created structure to configure LPM for x86. > > > > Yes, I am planning to increase next_hop field for ARM as a separate patch. Let > > this base patchset get merges. > > > > I will make ARM specific changes for your new feature in 'rte_lpm_lookupx4' as > > a separate patch on top of your series. > > So that in case if I want to go back to 8 bit then I can do it > > > > Jerin > > Thank you for your answer. > Do you prepare separate patch with changes for ARM architecture on the top my series? > If you want I can support you with prepare new patch. Yes, Can you rebase your patch with this patch(add lpm support for NEON). I can fill in ARM specific changes of 'rte_lpm_lookupx4' as a seperate patch on top it. Jerin > > Michal > > > > > > Please look at my patchset with proposal increase next_hop field and structure > > to configure. > > > > > > http://patchwork.dpdk.org/dev/patchwork/patch/10249/ > > > http://patchwork.dpdk.org/dev/patchwork/patch/10250/ > > > > > > Best Regards, > > > Michal > > > ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v5 0/3] add lpm support for NEON 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 " Jerin Jacob ` (3 preceding siblings ...) 2016-02-16 13:27 ` [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON Kobylinski, MichalX @ 2016-03-11 3:52 ` Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob ` (3 more replies) 4 siblings, 4 replies; 47+ messages in thread From: Jerin Jacob @ 2016-03-11 3:52 UTC (permalink / raw) To: dev; +Cc: viktorin - This patch enables lpm for ARM - Used architecture agnostic xmm_t to represent 128 bit SIMD variable in rte_lpm_lookupx4 API definition - Tested on Juno and Thunderx boards - Tested and verified the changes with following DPDK unit test cases --lpm_autotest --lpm6_autotest v1..v2 - make rte_lpm_lookupx4 API definition architecture agnostic - vect_* abstraction scope reduce to only app/test as this abstraction used only to load/store and set vectors in test application which is the consumer of rte_lpm_lookupx4 like API - support for armv7 apart from armv8 - taken changes from Jianbo's lpm patches v2..v3 - add Acked-by for 0001-lpm-make-rte_lpm_lookupx4-API-definition- architectur.patch - re-based to DPDK 2.2 -- fixed the conflict in config/defconfig_arm-armv7a-linuxapp-gcc and MAINTAINERS file v3..v4 -Instead of defaulting the lpm implementation to SSE, SSE implementation kept under RTE_ARCH_X86 conditional compilation check as suggested by Thomas v4..v5 - Rebase the series based on Michal's "Increased number of next hops for LPM IPv4" patch - Added the changes suggested by Thomas --http://dpdk.org/dev/patchwork/patch/10478/ --http://dpdk.org/dev/patchwork/patch/10480/ Jerin Jacob (3): lpm: make rte_lpm_lookupx4 API definition architecture agnostic lpm: add support for NEON Maintainers: claim responsibility for arm64 specific files of hash MAINTAINERS | 4 + app/test/test_lpm.c | 21 ++-- app/test/test_xmmt_ops.h | 67 +++++++++++++ config/defconfig_arm-armv7a-linuxapp-gcc | 3 - config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - lib/librte_lpm/Makefile | 6 ++ lib/librte_lpm/rte_lpm.h | 105 ++------------------ lib/librte_lpm/rte_lpm_neon.h | 153 +++++++++++++++++++++++++++++ lib/librte_lpm/rte_lpm_sse.h | 149 ++++++++++++++++++++++++++++ 9 files changed, 398 insertions(+), 113 deletions(-) create mode 100644 app/test/test_xmmt_ops.h create mode 100644 lib/librte_lpm/rte_lpm_neon.h create mode 100644 lib/librte_lpm/rte_lpm_sse.h -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v5 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 " Jerin Jacob @ 2016-03-11 3:52 ` Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 2/3] lpm: add support for NEON Jerin Jacob ` (2 subsequent siblings) 3 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2016-03-11 3:52 UTC (permalink / raw) To: dev; +Cc: viktorin -Used architecture agnostic xmm_t to represent 128 bit SIMD variable -Introduced vect_* API abstraction in app/test to test rte_lpm_lookupx4 API in architecture agnostic way -Moved rte_lpm_lookupx4 SSE implementation to architecture specific rte_lpm_sse.h file to accommodate new rte_lpm_lookupx4 implementation for a different architecture. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- MAINTAINERS | 1 + app/test/test_lpm.c | 21 +++--- app/test/test_xmmt_ops.h | 47 ++++++++++++++ lib/librte_lpm/Makefile | 1 + lib/librte_lpm/rte_lpm.h | 99 +--------------------------- lib/librte_lpm/rte_lpm_sse.h | 149 +++++++++++++++++++++++++++++++++++++++++++ 6 files changed, 212 insertions(+), 106 deletions(-) create mode 100644 app/test/test_xmmt_ops.h create mode 100644 lib/librte_lpm/rte_lpm_sse.h diff --git a/MAINTAINERS b/MAINTAINERS index 59e981f..fc03ce8 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -458,6 +458,7 @@ F: lib/librte_lpm/ F: doc/guides/prog_guide/lpm* F: app/test/test_lpm* F: app/test/test_func_reentrancy.c +F: app/test/test_xmmt_ops.h Traffic metering M: Cristian Dumitrescu <cristian.dumitrescu@intel.com> diff --git a/app/test/test_lpm.c b/app/test/test_lpm.c index aaf95ec..40fbbc6 100644 --- a/app/test/test_lpm.c +++ b/app/test/test_lpm.c @@ -49,6 +49,7 @@ #include "rte_lpm.h" #include "test_lpm_routes.h" +#include "test_xmmt_ops.h" #define TEST_LPM_ASSERT(cond) do { \ if (!(cond)) { \ @@ -345,7 +346,7 @@ test6(void) int32_t test7(void) { - __m128i ipx4; + xmm_t ipx4; uint32_t hop[4]; struct rte_lpm *lpm = NULL; struct rte_lpm_config config; @@ -366,7 +367,7 @@ test7(void) status = rte_lpm_lookup(lpm, ip, &next_hop_return); TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip, ip + 0x100, ip - 0x100, ip); + ipx4 = vect_set_epi32(ip, ip + 0x100, ip - 0x100, ip); rte_lpm_lookupx4(lpm, ipx4, hop, UINT32_MAX); TEST_LPM_ASSERT(hop[0] == next_hop_add); TEST_LPM_ASSERT(hop[1] == UINT32_MAX); @@ -396,7 +397,7 @@ test7(void) int32_t test8(void) { - __m128i ipx4; + xmm_t ipx4; uint32_t hop[4]; struct rte_lpm *lpm = NULL; struct rte_lpm_config config; @@ -428,7 +429,7 @@ test8(void) TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip2, ip1, ip2, ip1); + ipx4 = vect_set_epi32(ip2, ip1, ip2, ip1); rte_lpm_lookupx4(lpm, ipx4, hop, UINT32_MAX); TEST_LPM_ASSERT(hop[0] == UINT32_MAX); TEST_LPM_ASSERT(hop[1] == next_hop_add); @@ -455,7 +456,7 @@ test8(void) status = rte_lpm_lookup(lpm, ip1, &next_hop_return); TEST_LPM_ASSERT(status == -ENOENT); - ipx4 = _mm_set_epi32(ip1, ip1, ip2, ip2); + ipx4 = vect_set_epi32(ip1, ip1, ip2, ip2); rte_lpm_lookupx4(lpm, ipx4, hop, UINT32_MAX); if (depth != 1) { TEST_LPM_ASSERT(hop[0] == next_hop_add); @@ -912,7 +913,7 @@ test11(void) int32_t test12(void) { - __m128i ipx4; + xmm_t ipx4; uint32_t hop[4]; struct rte_lpm *lpm = NULL; struct rte_lpm_config config; @@ -939,7 +940,7 @@ test12(void) TEST_LPM_ASSERT((status == 0) && (next_hop_return == next_hop_add)); - ipx4 = _mm_set_epi32(ip, ip + 1, ip, ip - 1); + ipx4 = vect_set_epi32(ip, ip + 1, ip, ip - 1); rte_lpm_lookupx4(lpm, ipx4, hop, UINT32_MAX); TEST_LPM_ASSERT(hop[0] == UINT32_MAX); TEST_LPM_ASSERT(hop[1] == next_hop_add); @@ -1386,10 +1387,10 @@ perf_test(void) begin = rte_rdtsc(); for (j = 0; j < BATCH_SIZE; j += RTE_DIM(next_hops)) { unsigned k; - __m128i ipx4; + xmm_t ipx4; - ipx4 = _mm_loadu_si128((__m128i *)(ip_batch + j)); - ipx4 = *(__m128i *)(ip_batch + j); + ipx4 = vect_loadu_sil128((xmm_t *)(ip_batch + j)); + ipx4 = *(xmm_t *)(ip_batch + j); rte_lpm_lookupx4(lpm, ipx4, next_hops, UINT32_MAX); for (k = 0; k < RTE_DIM(next_hops); k++) if (unlikely(next_hops[k] == UINT32_MAX)) diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h new file mode 100644 index 0000000..c055912 --- /dev/null +++ b/app/test/test_xmmt_ops.h @@ -0,0 +1,47 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Cavium Networks. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Cavium Networks nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _TEST_XMMT_OPS_H_ +#define _TEST_XMMT_OPS_H_ + +#include <rte_vect.h> + +/* vect_* abstraction implementation using SSE */ + +/* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ +#define vect_loadu_sil128(p) _mm_loadu_si128(p) + +/* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ +#define vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) + +#endif /* _TEST_XMMT_OPS_H_ */ diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index 688cfc9..aa51fe4 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -46,6 +46,7 @@ SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_lpm.c rte_lpm6.c # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_lpm.h rte_lpm6.h +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h # this lib needs eal DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index c2b429f..cc55439 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -475,103 +475,10 @@ rte_lpm_lookup_bulk_func(const struct rte_lpm *lpm, const uint32_t *ips, * if lookup would fail. */ static inline void -rte_lpm_lookupx4(const struct rte_lpm *lpm, __m128i ip, uint32_t hop[4], - uint32_t defv) -{ - __m128i i24; - rte_xmm_t i8; - uint32_t tbl[4]; - uint64_t idx, pt, pt2; - const uint32_t *ptbl; +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], + uint32_t defv); - const __m128i mask8 = - _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX); - - /* - * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 2 LPM entries - * as one 64-bit value (0x0300000003000000). - */ - const uint64_t mask_xv = - ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | - (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32); - - /* - * RTE_LPM_LOOKUP_SUCCESS for 2 LPM entries - * as one 64-bit value (0x0100000001000000). - */ - const uint64_t mask_v = - ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | - (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32); - - /* get 4 indexes for tbl24[]. */ - i24 = _mm_srli_epi32(ip, CHAR_BIT); - - /* extract values from tbl24[] */ - idx = _mm_cvtsi128_si64(i24); - i24 = _mm_srli_si128(i24, sizeof(uint64_t)); - - ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx]; - tbl[0] = *ptbl; - ptbl = (const uint32_t *)&lpm->tbl24[idx >> 32]; - tbl[1] = *ptbl; - - idx = _mm_cvtsi128_si64(i24); - - ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx]; - tbl[2] = *ptbl; - ptbl = (const uint32_t *)&lpm->tbl24[idx >> 32]; - tbl[3] = *ptbl; - - /* get 4 indexes for tbl8[]. */ - i8.x = _mm_and_si128(ip, mask8); - - pt = (uint64_t)tbl[0] | - (uint64_t)tbl[1] << 32; - pt2 = (uint64_t)tbl[2] | - (uint64_t)tbl[3] << 32; - - /* search successfully finished for all 4 IP addresses. */ - if (likely((pt & mask_xv) == mask_v) && - likely((pt2 & mask_xv) == mask_v)) { - *(uint64_t *)hop = pt & RTE_LPM_MASKX4_RES; - *(uint64_t *)(hop + 2) = pt2 & RTE_LPM_MASKX4_RES; - return; - } - - if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[0] = i8.u32[0] + - (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[0]]; - tbl[0] = *ptbl; - } - if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[1] = i8.u32[1] + - (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[1]]; - tbl[1] = *ptbl; - } - if (unlikely((pt2 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[2] = i8.u32[2] + - (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[2]]; - tbl[2] = *ptbl; - } - if (unlikely((pt2 >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == - RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { - i8.u32[3] = i8.u32[3] + - (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; - ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[3]]; - tbl[3] = *ptbl; - } - - hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[0] & 0x00FFFFFF : defv; - hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[1] & 0x00FFFFFF : defv; - hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[2] & 0x00FFFFFF : defv; - hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[3] & 0x00FFFFFF : defv; -} +#include "rte_lpm_sse.h" #ifdef __cplusplus } diff --git a/lib/librte_lpm/rte_lpm_sse.h b/lib/librte_lpm/rte_lpm_sse.h new file mode 100644 index 0000000..da83099 --- /dev/null +++ b/lib/librte_lpm/rte_lpm_sse.h @@ -0,0 +1,149 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Intel Corporation nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_LPM_SSE_H_ +#define _RTE_LPM_SSE_H_ + +#include <rte_branch_prediction.h> +#include <rte_byteorder.h> +#include <rte_common.h> +#include <rte_vect.h> + +#ifdef __cplusplus +extern "C" { +#endif + +static inline void +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], + uint32_t defv) +{ + __m128i i24; + rte_xmm_t i8; + uint32_t tbl[4]; + uint64_t idx, pt, pt2; + const uint32_t *ptbl; + + const __m128i mask8 = + _mm_set_epi32(UINT8_MAX, UINT8_MAX, UINT8_MAX, UINT8_MAX); + + /* + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 2 LPM entries + * as one 64-bit value (0x0300000003000000). + */ + const uint64_t mask_xv = + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32); + + /* + * RTE_LPM_LOOKUP_SUCCESS for 2 LPM entries + * as one 64-bit value (0x0100000001000000). + */ + const uint64_t mask_v = + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32); + + /* get 4 indexes for tbl24[]. */ + i24 = _mm_srli_epi32(ip, CHAR_BIT); + + /* extract values from tbl24[] */ + idx = _mm_cvtsi128_si64(i24); + i24 = _mm_srli_si128(i24, sizeof(uint64_t)); + + ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[0] = *ptbl; + ptbl = (const uint32_t *)&lpm->tbl24[idx >> 32]; + tbl[1] = *ptbl; + + idx = _mm_cvtsi128_si64(i24); + + ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[2] = *ptbl; + ptbl = (const uint32_t *)&lpm->tbl24[idx >> 32]; + tbl[3] = *ptbl; + + /* get 4 indexes for tbl8[]. */ + i8.x = _mm_and_si128(ip, mask8); + + pt = (uint64_t)tbl[0] | + (uint64_t)tbl[1] << 32; + pt2 = (uint64_t)tbl[2] | + (uint64_t)tbl[3] << 32; + + /* search successfully finished for all 4 IP addresses. */ + if (likely((pt & mask_xv) == mask_v) && + likely((pt2 & mask_xv) == mask_v)) { + *(uint64_t *)hop = pt & RTE_LPM_MASKX4_RES; + *(uint64_t *)(hop + 2) = pt2 & RTE_LPM_MASKX4_RES; + return; + } + + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[0] = i8.u32[0] + + (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[0]]; + tbl[0] = *ptbl; + } + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[1] = i8.u32[1] + + (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[1]]; + tbl[1] = *ptbl; + } + if (unlikely((pt2 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[2] = i8.u32[2] + + (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[2]]; + tbl[2] = *ptbl; + } + if (unlikely((pt2 >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[3] = i8.u32[3] + + (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[3]]; + tbl[3] = *ptbl; + } + + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[0] & 0x00FFFFFF : defv; + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[1] & 0x00FFFFFF : defv; + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[2] & 0x00FFFFFF : defv; + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[3] & 0x00FFFFFF : defv; +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_LPM_SSE_H_ */ -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v5 2/3] lpm: add support for NEON 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 " Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob @ 2016-03-11 3:52 ` Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 3/3] Maintainers: claim responsibility for arm64 specific files of hash Jerin Jacob 2016-03-11 14:24 ` [dpdk-dev] [PATCH v5 0/3] add lpm support for NEON Thomas Monjalon 3 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2016-03-11 3:52 UTC (permalink / raw) To: dev; +Cc: viktorin Enabled CONFIG_RTE_LIBRTE_LPM, CONFIG_RTE_LIBRTE_TABLE, CONFIG_RTE_LIBRTE_PIPELINE libraries for arm and arm64 TABLE, PIPELINE libraries were disabled due to LPM library dependency. Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> Signed-off-by: Jianbo Liu <jianbo.liu@linaro.org> --- MAINTAINERS | 1 + app/test/test_xmmt_ops.h | 20 ++++ config/defconfig_arm-armv7a-linuxapp-gcc | 3 - config/defconfig_arm64-armv8a-linuxapp-gcc | 3 - lib/librte_lpm/Makefile | 5 + lib/librte_lpm/rte_lpm.h | 4 + lib/librte_lpm/rte_lpm_neon.h | 153 +++++++++++++++++++++++++++++ 7 files changed, 183 insertions(+), 6 deletions(-) create mode 100644 lib/librte_lpm/rte_lpm_neon.h diff --git a/MAINTAINERS b/MAINTAINERS index fc03ce8..578387b 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -139,6 +139,7 @@ M: Jerin Jacob <jerin.jacob@caviumnetworks.com> M: Jianbo Liu <jianbo.liu@linaro.org> F: lib/librte_eal/common/include/arch/arm/*_64.h F: lib/librte_acl/acl_run_neon.* +F: lib/librte_lpm/rte_lpm_neon.h EZchip TILE-Gx M: Zhigang Lu <zlu@ezchip.com> diff --git a/app/test/test_xmmt_ops.h b/app/test/test_xmmt_ops.h index c055912..de9c16f 100644 --- a/app/test/test_xmmt_ops.h +++ b/app/test/test_xmmt_ops.h @@ -36,6 +36,24 @@ #include <rte_vect.h> +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) + +/* vect_* abstraction implementation using NEON */ + +/* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ +#define vect_loadu_sil128(p) vld1q_s32((const int32_t *)p) + +/* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ +static inline xmm_t __attribute__((always_inline)) +vect_set_epi32(int i3, int i2, int i1, int i0) +{ + int32_t data[4] = {i0, i1, i2, i3}; + + return vld1q_s32(data); +} + +#elif defined(RTE_ARCH_X86) + /* vect_* abstraction implementation using SSE */ /* loads the xmm_t value from address p(does not need to be 16-byte aligned)*/ @@ -44,4 +62,6 @@ /* sets the 4 signed 32-bit integer values and returns the xmm_t variable */ #define vect_set_epi32(i3, i2, i1, i0) _mm_set_epi32(i3, i2, i1, i0) +#endif + #endif /* _TEST_XMMT_OPS_H_ */ diff --git a/config/defconfig_arm-armv7a-linuxapp-gcc b/config/defconfig_arm-armv7a-linuxapp-gcc index 23ba95e..b007ca7 100644 --- a/config/defconfig_arm-armv7a-linuxapp-gcc +++ b/config/defconfig_arm-armv7a-linuxapp-gcc @@ -54,9 +54,6 @@ CONFIG_RTE_LIBRTE_KNI=n CONFIG_RTE_EAL_IGB_UIO=n # fails to compile on ARM -CONFIG_RTE_LIBRTE_LPM=n -CONFIG_RTE_LIBRTE_TABLE=n -CONFIG_RTE_LIBRTE_PIPELINE=n CONFIG_RTE_SCHED_VECTOR=n # cannot use those on ARM diff --git a/config/defconfig_arm64-armv8a-linuxapp-gcc b/config/defconfig_arm64-armv8a-linuxapp-gcc index f6f5d18..b0b17cf 100644 --- a/config/defconfig_arm64-armv8a-linuxapp-gcc +++ b/config/defconfig_arm64-armv8a-linuxapp-gcc @@ -48,7 +48,4 @@ CONFIG_RTE_LIBRTE_IVSHMEM=n CONFIG_RTE_LIBRTE_FM10K_PMD=n CONFIG_RTE_LIBRTE_I40E_PMD=n -CONFIG_RTE_LIBRTE_LPM=n -CONFIG_RTE_LIBRTE_TABLE=n -CONFIG_RTE_LIBRTE_PIPELINE=n CONFIG_RTE_SCHED_VECTOR=n diff --git a/lib/librte_lpm/Makefile b/lib/librte_lpm/Makefile index aa51fe4..656ade2 100644 --- a/lib/librte_lpm/Makefile +++ b/lib/librte_lpm/Makefile @@ -46,7 +46,12 @@ SRCS-$(CONFIG_RTE_LIBRTE_LPM) := rte_lpm.c rte_lpm6.c # install this header file SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include := rte_lpm.h rte_lpm6.h + +ifneq ($(filter y,$(CONFIG_RTE_ARCH_ARM) $(CONFIG_RTE_ARCH_ARM64)),) +SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_neon.h +else ifeq ($(CONFIG_RTE_ARCH_X86),y) SYMLINK-$(CONFIG_RTE_LIBRTE_LPM)-include += rte_lpm_sse.h +endif # this lib needs eal DEPDIRS-$(CONFIG_RTE_LIBRTE_LPM) += lib/librte_eal diff --git a/lib/librte_lpm/rte_lpm.h b/lib/librte_lpm/rte_lpm.h index cc55439..2df1d67 100644 --- a/lib/librte_lpm/rte_lpm.h +++ b/lib/librte_lpm/rte_lpm.h @@ -478,7 +478,11 @@ static inline void rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], uint32_t defv); +#if defined(RTE_ARCH_ARM) || defined(RTE_ARCH_ARM64) +#include "rte_lpm_neon.h" +#else #include "rte_lpm_sse.h" +#endif #ifdef __cplusplus } diff --git a/lib/librte_lpm/rte_lpm_neon.h b/lib/librte_lpm/rte_lpm_neon.h new file mode 100644 index 0000000..7c64315 --- /dev/null +++ b/lib/librte_lpm/rte_lpm_neon.h @@ -0,0 +1,153 @@ +/*- + * BSD LICENSE + * + * Copyright(c) 2015 Cavium Networks. All rights reserved. + * All rights reserved. + * + * Copyright(c) 2010-2014 Intel Corporation. All rights reserved. + * All rights reserved. + * + * Derived rte_lpm_lookupx4 implementation from lib/librte_lpm/rte_lpm_sse.h + * + * Redistribution and use in source and binary forms, with or without + * modification, are permitted provided that the following conditions + * are met: + * + * * Redistributions of source code must retain the above copyright + * notice, this list of conditions and the following disclaimer. + * * Redistributions in binary form must reproduce the above copyright + * notice, this list of conditions and the following disclaimer in + * the documentation and/or other materials provided with the + * distribution. + * * Neither the name of Cavium Networks nor the names of its + * contributors may be used to endorse or promote products derived + * from this software without specific prior written permission. + * + * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + */ + +#ifndef _RTE_LPM_NEON_H_ +#define _RTE_LPM_NEON_H_ + +#include <rte_branch_prediction.h> +#include <rte_byteorder.h> +#include <rte_common.h> +#include <rte_vect.h> + +#ifdef __cplusplus +extern "C" { +#endif + +static inline void +rte_lpm_lookupx4(const struct rte_lpm *lpm, xmm_t ip, uint32_t hop[4], + uint32_t defv) +{ + uint32x4_t i24; + rte_xmm_t i8; + uint32_t tbl[4]; + uint64_t idx, pt, pt2; + const uint32_t *ptbl; + + const uint32_t mask = UINT8_MAX; + const int32x4_t mask8 = vdupq_n_s32(mask); + + /* + * RTE_LPM_VALID_EXT_ENTRY_BITMASK for 2 LPM entries + * as one 64-bit value (0x0300000003000000). + */ + const uint64_t mask_xv = + ((uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK | + (uint64_t)RTE_LPM_VALID_EXT_ENTRY_BITMASK << 32); + + /* + * RTE_LPM_LOOKUP_SUCCESS for 2 LPM entries + * as one 64-bit value (0x0100000001000000). + */ + const uint64_t mask_v = + ((uint64_t)RTE_LPM_LOOKUP_SUCCESS | + (uint64_t)RTE_LPM_LOOKUP_SUCCESS << 32); + + /* get 4 indexes for tbl24[]. */ + i24 = vshrq_n_u32((uint32x4_t)ip, CHAR_BIT); + + /* extract values from tbl24[] */ + idx = vgetq_lane_u64((uint64x2_t)i24, 0); + + ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[0] = *ptbl; + ptbl = (const uint32_t *)&lpm->tbl24[idx >> 32]; + tbl[1] = *ptbl; + + idx = vgetq_lane_u64((uint64x2_t)i24, 1); + + ptbl = (const uint32_t *)&lpm->tbl24[(uint32_t)idx]; + tbl[2] = *ptbl; + ptbl = (const uint32_t *)&lpm->tbl24[idx >> 32]; + tbl[3] = *ptbl; + + /* get 4 indexes for tbl8[]. */ + i8.x = vandq_s32(ip, mask8); + + pt = (uint64_t)tbl[0] | + (uint64_t)tbl[1] << 32; + pt2 = (uint64_t)tbl[2] | + (uint64_t)tbl[3] << 32; + + /* search successfully finished for all 4 IP addresses. */ + if (likely((pt & mask_xv) == mask_v) && + likely((pt2 & mask_xv) == mask_v)) { + *(uint64_t *)hop = pt & RTE_LPM_MASKX4_RES; + *(uint64_t *)(hop + 2) = pt2 & RTE_LPM_MASKX4_RES; + return; + } + + if (unlikely((pt & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[0] = i8.u32[0] + + (uint8_t)tbl[0] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[0]]; + tbl[0] = *ptbl; + } + if (unlikely((pt >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[1] = i8.u32[1] + + (uint8_t)tbl[1] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[1]]; + tbl[1] = *ptbl; + } + if (unlikely((pt2 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[2] = i8.u32[2] + + (uint8_t)tbl[2] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[2]]; + tbl[2] = *ptbl; + } + if (unlikely((pt2 >> 32 & RTE_LPM_VALID_EXT_ENTRY_BITMASK) == + RTE_LPM_VALID_EXT_ENTRY_BITMASK)) { + i8.u32[3] = i8.u32[3] + + (uint8_t)tbl[3] * RTE_LPM_TBL8_GROUP_NUM_ENTRIES; + ptbl = (const uint32_t *)&lpm->tbl8[i8.u32[3]]; + tbl[3] = *ptbl; + } + + hop[0] = (tbl[0] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[0] & 0x00FFFFFF : defv; + hop[1] = (tbl[1] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[1] & 0x00FFFFFF : defv; + hop[2] = (tbl[2] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[2] & 0x00FFFFFF : defv; + hop[3] = (tbl[3] & RTE_LPM_LOOKUP_SUCCESS) ? tbl[3] & 0x00FFFFFF : defv; +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_LPM_NEON_H_ */ -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* [dpdk-dev] [PATCH v5 3/3] Maintainers: claim responsibility for arm64 specific files of hash 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 " Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 2/3] lpm: add support for NEON Jerin Jacob @ 2016-03-11 3:52 ` Jerin Jacob 2016-03-11 14:24 ` [dpdk-dev] [PATCH v5 0/3] add lpm support for NEON Thomas Monjalon 3 siblings, 0 replies; 47+ messages in thread From: Jerin Jacob @ 2016-03-11 3:52 UTC (permalink / raw) To: dev; +Cc: viktorin Fixes: f123e3d2ca92 ("hash: replace libc memcmp with optimized functions for arm64") Signed-off-by: Jerin Jacob <jerin.jacob@caviumnetworks.com> --- MAINTAINERS | 2 ++ 1 file changed, 2 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 578387b..aa3aa65 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -140,6 +140,8 @@ M: Jianbo Liu <jianbo.liu@linaro.org> F: lib/librte_eal/common/include/arch/arm/*_64.h F: lib/librte_acl/acl_run_neon.* F: lib/librte_lpm/rte_lpm_neon.h +F: lib/librte_hash/rte_crc_arm64.h +F: lib/librte_hash/rte_cmp_arm64.h EZchip TILE-Gx M: Zhigang Lu <zlu@ezchip.com> -- 2.1.0 ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: [dpdk-dev] [PATCH v5 0/3] add lpm support for NEON 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 " Jerin Jacob ` (2 preceding siblings ...) 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 3/3] Maintainers: claim responsibility for arm64 specific files of hash Jerin Jacob @ 2016-03-11 14:24 ` Thomas Monjalon 3 siblings, 0 replies; 47+ messages in thread From: Thomas Monjalon @ 2016-03-11 14:24 UTC (permalink / raw) To: Jerin Jacob; +Cc: dev, viktorin > Jerin Jacob (3): > lpm: make rte_lpm_lookupx4 API definition architecture agnostic > lpm: add support for NEON > Maintainers: claim responsibility for arm64 specific files of hash Applied, thanks ^ permalink raw reply [flat|nested] 47+ messages in thread
end of thread, other threads:[~2016-03-11 14:26 UTC | newest] Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2015-11-30 17:24 [dpdk-dev] [PATCH 0/3] add lpm support for NEON Jerin Jacob 2015-11-30 17:24 ` [dpdk-dev] [PATCH 1/3] eal: introduce rte_vect_* abstractions Jerin Jacob 2015-12-02 13:43 ` Jan Viktorin 2015-12-02 14:51 ` Jerin Jacob 2015-11-30 17:24 ` [dpdk-dev] [PATCH 2/3] lpm: add support for NEON Jerin Jacob 2015-12-02 13:43 ` Jan Viktorin 2015-12-02 14:56 ` Jerin Jacob 2015-12-02 15:00 ` Jan Viktorin 2015-11-30 17:24 ` [dpdk-dev] [PATCH 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob 2015-12-02 13:43 ` Jan Viktorin 2015-12-02 14:57 ` Jerin Jacob 2015-12-02 13:43 ` [dpdk-dev] [PATCH 0/3] add lpm support for NEON Jan Viktorin 2015-12-02 14:41 ` Jerin Jacob 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 " Jerin Jacob 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob 2015-12-07 6:15 ` Jianbo Liu 2015-12-07 6:57 ` Jerin Jacob 2015-12-07 14:06 ` Ananyev, Konstantin 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 2/3] lpm: add support for NEON Jerin Jacob 2015-12-04 15:14 ` [dpdk-dev] [PATCH v2 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 2/3] lpm: add support for NEON Jerin Jacob 2016-02-11 11:46 ` Thomas Monjalon 2016-02-12 6:47 ` Jerin Jacob 2016-02-12 8:42 ` Thomas Monjalon 2016-01-29 4:10 ` [dpdk-dev] [PATCH v3 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob 2016-02-08 9:29 ` [dpdk-dev] [PATCH v3 0/3] add lpm support for NEON Jerin Jacob 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 " Jerin Jacob 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob 2016-03-01 17:42 ` Thomas Monjalon 2016-03-02 6:28 ` Jerin Jacob 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 2/3] lpm: add support for NEON Jerin Jacob 2016-03-01 17:46 ` Thomas Monjalon 2016-03-02 6:45 ` Jerin Jacob 2016-02-12 12:28 ` [dpdk-dev] [PATCH v4 3/3] maintainers: claim responsibility for arm64 specific files of hash and lpm Jerin Jacob 2016-03-01 17:47 ` Thomas Monjalon 2016-03-02 6:46 ` Jerin Jacob 2016-02-16 13:27 ` [dpdk-dev] [PATCH v4 0/3] add lpm support for NEON Kobylinski, MichalX 2016-02-16 16:44 ` Jerin Jacob 2016-02-18 10:26 ` Kobylinski, MichalX 2016-02-19 0:34 ` Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 " Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 1/3] lpm: make rte_lpm_lookupx4 API definition architecture agnostic Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 2/3] lpm: add support for NEON Jerin Jacob 2016-03-11 3:52 ` [dpdk-dev] [PATCH v5 3/3] Maintainers: claim responsibility for arm64 specific files of hash Jerin Jacob 2016-03-11 14:24 ` [dpdk-dev] [PATCH v5 0/3] add lpm support for NEON Thomas Monjalon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).