* [dpdk-dev] [PATCH v3 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC
@ 2020-09-29 15:35 Mairtin o Loingsigh
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-09-29 15:35 UTC
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, brendan.ryan, david.coyle, Mairtin o Loingsigh

This patchset makes two significant enhancements to the CRC modules of the
rte_net library:

1) Adds run-time selection of the optimal architecture-specific CRC path.
   Previously the selection was made solely at compile-time, meaning the
   library could only be built and run on the same generation of CPU.
   Adding run-time selection means it can be used from distro packages
   and/or DPDK can be compiled on an older CPU and run on a newer CPU.

2) Adds an optimized CRC implementation based on the AVX512 and VPCLMULQDQ
   instruction sets.

For further details, please see the commit messages of the individual
patches.

v2:
* Added support for run-time selection of optimal architecture-specific
  CRC, based on v1 review comment.
* Added full working AVX512/VPCLMULQDQ support for CRC32-Ethernet and
  CRC16-CCITT.

v1:
* Initial version, with incomplete AVX512/VPCLMULQDQ support for
  CRC32-Ethernet only.

Mairtin o Loingsigh (2):
  net: add run-time architecture specific CRC selection
  net: add support for AVX512/VPCLMULQDQ based CRC

 app/test/test_crc.c                               |  11 +-
 config/x86/meson.build                            |   6 +-
 doc/guides/rel_notes/release_20_11.rst            |   6 +
 lib/librte_net/meson.build                        |  89 ++++-
 lib/librte_net/net_crc.h                          |  45 +++
 lib/librte_net/net_crc_avx512.c                   | 424 ++++++++++++++++++++++
 lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  27 +-
 lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 +-
 lib/librte_net/rte_net_crc.c                      | 100 +++--
 lib/librte_net/rte_net_crc.h                      |   4 +-
 10 files changed, 674 insertions(+), 72 deletions(-)
 create mode 100644 lib/librte_net/net_crc.h
 create mode 100644 lib/librte_net/net_crc_avx512.c
 rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
 rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)

-- 
2.12.3

* [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection
  2020-09-29 15:35 [dpdk-dev] [PATCH v3 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
@ 2020-09-29 15:36 ` Mairtin o Loingsigh
  2020-10-02 15:17   ` Singh, Jasvinder
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
  2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh
  2 siblings, 1 reply; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-09-29 15:36 UTC
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, brendan.ryan, david.coyle, Mairtin o Loingsigh

This patch adds support for run-time selection of the optimal
architecture-specific CRC path, based on the supported instruction set(s)
of the CPU.

The compiler option checks have been moved from the C files to the meson
script. The rte_cpu_get_flag_enabled function is called automatically by
the library at process initialization time to determine which instructions
the CPU supports, with the most optimal supported CRC path ultimately
selected.

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
---
 doc/guides/rel_notes/release_20_11.rst            |  4 ++
 lib/librte_net/meson.build                        | 34 +++++++++++-
 lib/librte_net/net_crc.h                          | 34 ++++++++++++
 lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 27 +++------
 lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   | 34 ++++--------
 lib/librte_net/rte_net_crc.c                      | 67 ++++++++++++++---------
 6 files changed, 132 insertions(+), 68 deletions(-)
 create mode 100644 lib/librte_net/net_crc.h
 rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
 rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)

diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 4eb3224a7..6bd222dca 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Updated CRC modules of rte_net library.**
+
+  * Added run-time selection of the optimal architecture-specific CRC path.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index 24ed8253b..b6880bd85 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2017 Intel Corporation
+# Copyright(c) 2017-2020 Intel Corporation
 
 headers = files('rte_ip.h',
 	'rte_tcp.h',
@@ -20,3 +20,35 @@ headers = files('rte_ip.h',
 
 sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c')
 deps += ['mbuf']
+
+if dpdk_conf.has('RTE_ARCH_X86_64')
+	net_crc_sse42_cpu_support = \
+		cc.get_define('__PCLMUL__', args: machine_args) != ''
+	net_crc_sse42_cc_support = \
+		cc.has_argument('-mpclmul') and cc.has_argument('-maes')
+
+	build_static_net_crc_sse42_lib = 0
+
+	if net_crc_sse42_cpu_support == true
+		sources += files('net_crc_sse.c')
+		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+	elif net_crc_sse42_cc_support == true
+		build_static_net_crc_sse42_lib = 1
+		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
+		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+	endif
+
+	if build_static_net_crc_sse42_lib == 1
+		net_crc_sse42_lib = static_library(
+				'net_crc_sse42_lib',
+				'net_crc_sse.c',
+				dependencies: static_rte_eal,
+				c_args: [cflags,
+					net_crc_sse42_lib_cflags])
+		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
+	endif
+elif dpdk_conf.has('RTE_ARCH_ARM64') and \
+		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != ''
+	sources += files('net_crc_neon.c')
+	cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
+endif
diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
new file mode 100644
index 000000000..a1578a56c
--- /dev/null
+++ b/lib/librte_net/net_crc.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#ifndef _NET_CRC_H_
+#define _NET_CRC_H_
+
+/*
+ * Different implementations of CRC
+ */
+
+/* SSE4.2 */
+
+void
+rte_net_crc_sse42_init(void);
+
+uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
+
+/* NEON */
+
+void
+rte_net_crc_neon_init(void);
+
+uint32_t
+rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len);
+
+#endif /* _NET_CRC_H_ */
diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
similarity index 95%
rename from lib/librte_net/net_crc_neon.h
rename to lib/librte_net/net_crc_neon.c
index 63fa1d4a1..b79684ec2 100644
--- a/lib/librte_net/net_crc_neon.h
+++ b/lib/librte_net/net_crc_neon.c
@@ -1,18 +1,17 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Cavium, Inc
+ * Copyright(c) 2020 Intel Corporation
  */
 
-#ifndef _NET_CRC_NEON_H_
-#define _NET_CRC_NEON_H_
+#include <string.h>
 
+#include <rte_common.h>
 #include <rte_branch_prediction.h>
 #include <rte_net_crc.h>
 #include <rte_vect.h>
 #include <rte_cpuflags.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+#include "net_crc.h"
 
 /** PMULL CRC computation context structure */
 struct crc_pmull_ctx {
@@ -218,7 +217,7 @@ crc32_eth_calc_pmull(
 	return n;
 }
 
-static inline void
+void
 rte_net_crc_neon_init(void)
 {
 	/* Initialize CRC16 data */
@@ -242,9 +241,8 @@ rte_net_crc_neon_init(void)
 	crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8);
 }
 
-static inline uint32_t
-rte_crc16_ccitt_neon_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len)
 {
 	return (uint16_t)~crc32_eth_calc_pmull(data,
 		data_len,
@@ -252,18 +250,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data,
 		&crc16_ccitt_pmull);
 }
 
-static inline uint32_t
-rte_crc32_eth_neon_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len)
 {
 	return ~crc32_eth_calc_pmull(data,
 		data_len,
 		0xffffffffUL,
 		&crc32_eth_pmull);
 }
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _NET_CRC_NEON_H_ */
diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c
similarity index 94%
rename from lib/librte_net/net_crc_sse.h
rename to lib/librte_net/net_crc_sse.c
index 1c7b7a548..053b54b39 100644
--- a/lib/librte_net/net_crc_sse.h
+++ b/lib/librte_net/net_crc_sse.c
@@ -1,18 +1,16 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
-#ifndef _RTE_NET_CRC_SSE_H_
-#define _RTE_NET_CRC_SSE_H_
+#include <string.h>
 
+#include <rte_common.h>
 #include <rte_branch_prediction.h>
+#include <rte_cpuflags.h>
 
-#include <x86intrin.h>
-#include <cpuid.h>
+#include "net_crc.h"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+#include <x86intrin.h>
 
 /** PCLMULQDQ CRC computation context structure */
 struct crc_pclmulqdq_ctx {
@@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq(
 	return n;
 }
 
-
-static inline void
+void
 rte_net_crc_sse42_init(void)
 {
 	uint64_t k1, k2, k5, k6;
@@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void)
 	 * use other data types such as float, double, etc.
 	 */
 	_mm_empty();
-
 }
 
-static inline uint32_t
-rte_crc16_ccitt_sse42_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
 {
 	/** return 16-bit CRC value */
 	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
@@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data,
 		&crc16_ccitt_pclmulqdq);
 }
 
-static inline uint32_t
-rte_crc32_eth_sse42_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
 {
 	return ~crc32_eth_calc_pclmulqdq(data,
 		data_len,
 		0xffffffffUL,
 		&crc32_eth_pclmulqdq);
 }
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
index 4f5b9e828..83dccbfba 100644
--- a/lib/librte_net/rte_net_crc.c
+++ b/lib/librte_net/rte_net_crc.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #include <stddef.h>
@@ -10,17 +10,7 @@
 #include <rte_common.h>
 #include <rte_net_crc.h>
 
-#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__)
-#define X86_64_SSE42_PCLMULQDQ 1
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO)
-#define ARM64_NEON_PMULL 1
-#endif
-
-#ifdef X86_64_SSE42_PCLMULQDQ
-#include <net_crc_sse.h>
-#elif defined ARM64_NEON_PMULL
-#include <net_crc_neon.h>
-#endif
+#include "net_crc.h"
 
 /** CRC polynomials */
 #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
@@ -47,13 +37,13 @@ static rte_net_crc_handler handlers_scalar[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
 };
-
-#ifdef X86_64_SSE42_PCLMULQDQ
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
 static rte_net_crc_handler handlers_sse42[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
 };
-#elif defined ARM64_NEON_PMULL
+#endif
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
 static rte_net_crc_handler handlers_neon[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
@@ -142,22 +132,44 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
 		crc32_eth_lut);
 }
 
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+static uint8_t
+sse42_pclmulqdq_cpu_supported(void)
+{
+	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
+}
+#endif
+
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+static uint8_t
+neon_pmull_cpu_supported(void)
+{
+	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL);
+}
+#endif
+
 void
 rte_net_crc_set_alg(enum rte_net_crc_alg alg)
 {
 	switch (alg) {
-#ifdef X86_64_SSE42_PCLMULQDQ
+#ifdef RTE_ARCH_X86_64
 	case RTE_NET_CRC_SSE42:
-		handlers = handlers_sse42;
-		break;
-#elif defined ARM64_NEON_PMULL
-		/* fall-through */
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+		if (sse42_pclmulqdq_cpu_supported()) {
+			handlers = handlers_sse42;
+			break;
+		}
+#endif
+#endif /* RTE_ARCH_X86_64 */
+#ifdef RTE_ARCH_ARM64
 	case RTE_NET_CRC_NEON:
-		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+		if (neon_pmull_cpu_supported()) {
 			handlers = handlers_neon;
 			break;
 		}
 #endif
+#endif /* RTE_ARCH_ARM64 */
 		/* fall-through */
 	case RTE_NET_CRC_SCALAR:
 		/* fall-through */
@@ -188,11 +200,14 @@ RTE_INIT(rte_net_crc_init)
 
 	rte_net_crc_scalar_init();
 
-#ifdef X86_64_SSE42_PCLMULQDQ
-	alg = RTE_NET_CRC_SSE42;
-	rte_net_crc_sse42_init();
-#elif defined ARM64_NEON_PMULL
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+	if (sse42_pclmulqdq_cpu_supported()) {
+		alg = RTE_NET_CRC_SSE42;
+		rte_net_crc_sse42_init();
+	}
+#endif
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+	if (neon_pmull_cpu_supported()) {
 		alg = RTE_NET_CRC_NEON;
 		rte_net_crc_neon_init();
 	}
-- 
2.12.3

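The selection logic in rte_net_crc_set_alg above (try the requested
vector path, fall through to scalar when the CPU lacks the instructions)
can be sketched in isolation roughly as follows. This is a minimal
stand-alone illustration, not the DPDK code: every name in it (crc_alg,
crc_set_alg, crc_calc, simd_cpu_supported, crc32_eth_simd_standin) is
hypothetical, the "SIMD" handler is only a stand-in that reuses the
bitwise scalar routine, and the CPU probe uses the GCC/Clang
__builtin_cpu_supports builtin in place of rte_cpu_get_flag_enabled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these are DPDK symbols. */
enum crc_alg { CRC_ALG_SCALAR, CRC_ALG_SIMD };

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t data_len);

/* Bitwise CRC32-Ethernet: polynomial 0x04c11db7 in reflected form. */
static uint32_t
crc32_eth_scalar(const uint8_t *data, uint32_t data_len)
{
	uint32_t crc = 0xffffffffu;

	for (uint32_t i = 0; i < data_len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Stand-in for a vectorised handler; it only reuses the scalar code. */
static uint32_t
crc32_eth_simd_standin(const uint8_t *data, uint32_t data_len)
{
	return crc32_eth_scalar(data, data_len);
}

/* Run-time probe; assumption: GCC/Clang builtin instead of the EAL call. */
static int
simd_cpu_supported(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("pclmul") &&
		__builtin_cpu_supports("sse4.2");
#else
	return 0;
#endif
}

static crc_handler active_handler = crc32_eth_scalar;

/* Returns the algorithm actually selected, which may be the fallback. */
static enum crc_alg
crc_set_alg(enum crc_alg alg)
{
	switch (alg) {
	case CRC_ALG_SIMD:
		if (simd_cpu_supported()) {
			active_handler = crc32_eth_simd_standin;
			return CRC_ALG_SIMD;
		}
		/* fall-through */
	case CRC_ALG_SCALAR:
	default:
		active_handler = crc32_eth_scalar;
		return CRC_ALG_SCALAR;
	}
}

static uint32_t
crc_calc(const uint8_t *data, uint32_t data_len)
{
	assert(data != NULL || data_len == 0);
	return active_handler(data, data_len);
}
```

Whichever handler ends up selected, crc_calc returns the same CRC value;
only the code path differs. Returning the selected algorithm from
crc_set_alg is a choice made for this sketch so the fallback is visible
to the caller; the function in the patch returns void.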
* Re: [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
@ 2020-10-02 15:17   ` Singh, Jasvinder
  2020-10-06 16:38     ` O'loingsigh, Mairtin
  0 siblings, 1 reply; 23+ messages in thread
From: Singh, Jasvinder @ 2020-10-02 15:17 UTC
  To: O'loingsigh, Mairtin, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Ryan, Brendan, Coyle, David

> -----Original Message-----
> From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> Sent: Tuesday, September 29, 2020 4:36 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle, David
> <david.coyle@intel.com>; O'loingsigh, Mairtin
> <mairtin.oloingsigh@intel.com>
> Subject: [PATCH v3 1/2] net: add run-time architecture specific CRC selection
>
> This patch adds support for run-time selection of the optimal
> architecture-specific CRC path, based on the supported instruction set(s)
> of the CPU.
>
> The compiler option checks have been moved from the C files to the meson
> script. The rte_cpu_get_flag_enabled function is called automatically by
> the library at process initialization time to determine which instructions
> the CPU supports, with the most optimal supported CRC path ultimately
> selected.
>
> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> Signed-off-by: David Coyle <david.coyle@intel.com>
> ---

<snip>

> diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
> similarity index 95%
> rename from lib/librte_net/net_crc_neon.h
> rename to lib/librte_net/net_crc_neon.c
> index 63fa1d4a1..b79684ec2 100644
> --- a/lib/librte_net/net_crc_neon.h
> +++ b/lib/librte_net/net_crc_neon.c
> @@ -1,18 +1,17 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2017 Cavium, Inc
> + * Copyright(c) 2020 Intel Corporation
>   */

Could you please remove the Intel copyright, as there is no change in this
file?

<snip>

> --
> 2.12.3

Patch looks good to me except for the comment stated above.

* Re: [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection
  2020-10-02 15:17   ` Singh, Jasvinder
@ 2020-10-06 16:38     ` O'loingsigh, Mairtin
  0 siblings, 0 replies; 23+ messages in thread
From: O'loingsigh, Mairtin @ 2020-10-06 16:38 UTC
  To: Singh, Jasvinder, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Ryan, Brendan, Coyle, David

Hi,

> -----Original Message-----
> From: Singh, Jasvinder <jasvinder.singh@intel.com>
> Sent: Friday, October 2, 2020 4:18 PM
> To: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle, David
> <david.coyle@intel.com>
> Subject: RE: [PATCH v3 1/2] net: add run-time architecture specific CRC
> selection

<snip>

> > --- a/lib/librte_net/net_crc_neon.h
> > +++ b/lib/librte_net/net_crc_neon.c
> > @@ -1,18 +1,17 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2017 Cavium, Inc
> > + * Copyright(c) 2020 Intel Corporation
> >   */
>
> Could you please remove intel copyright as there is no change in this file?

<snip>

> Patch looks good to me except the one stated above.

The fix for the above comment on the copyright has been applied to the v4
patch, which was just submitted.

Regards,
Mairtin

* [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
  2020-09-29 15:35 [dpdk-dev] [PATCH v3 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
@ 2020-09-29 15:36 ` Mairtin o Loingsigh
  2020-10-05 13:20   ` De Lara Guarch, Pablo
  2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh
  2 siblings, 1 reply; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-09-29 15:36 UTC (permalink / raw)
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, brendan.ryan, david.coyle, Mairtin o Loingsigh

This patch enables the optimized calculation of CRC32-Ethernet and
CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
implementation is built if the compiler supports the required instruction
sets. It is selected at run-time if the host CPU, again, supports the
required instruction sets.

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> Signed-off-by: David Coyle <david.coyle@intel.com> --- app/test/test_crc.c | 11 +- config/x86/meson.build | 6 +- doc/guides/rel_notes/release_20_11.rst | 2 + lib/librte_net/meson.build | 55 +++++ lib/librte_net/net_crc.h | 11 + lib/librte_net/net_crc_avx512.c | 424 +++++++++++++++++++++++++++++++++ lib/librte_net/rte_net_crc.c | 33 +++ lib/librte_net/rte_net_crc.h | 4 +- 8 files changed, 542 insertions(+), 4 deletions(-) create mode 100644 lib/librte_net/net_crc_avx512.c diff --git a/app/test/test_crc.c b/app/test/test_crc.c index f8a74e04e..bf1d34435 100644 --- a/app/test/test_crc.c +++ b/app/test/test_crc.c @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2017 Intel Corporation + * Copyright(c) 2017-2020 Intel Corporation */ #include "test.h" @@ -149,6 +149,15 @@ test_crc(void) return ret; } + /* set CRC avx512 mode */ + rte_net_crc_set_alg(RTE_NET_CRC_AVX512); + + ret = test_crc_calc(); + if (ret < 0) { + printf("test crc (x86_64 AVX512): failed (%d)\n", ret); + return ret; + } + /* set CRC neon mode */ rte_net_crc_set_alg(RTE_NET_CRC_NEON); diff --git a/config/x86/meson.build b/config/x86/meson.build index fea4d5403..172b72b72 100644 --- a/config/x86/meson.build +++ b/config/x86/meson.build @@ -1,5 +1,5 @@ # SPDX-License-Identifier: BSD-3-Clause -# Copyright(c) 2017-2019 Intel Corporation +# Copyright(c) 2017-2020 Intel Corporation # get binutils version for the workaround of Bug 97 if not is_windows @@ -23,7 +23,9 @@ endforeach optional_flags = ['AES', 'PCLMUL', 'AVX', 'AVX2', 'AVX512F', - 'RDRND', 'RDSEED'] + 'RDRND', 'RDSEED', + 'AVX512BW', 'AVX512DQ', + 'AVX512VL', 'VPCLMULQDQ'] foreach f:optional_flags if cc.get_define('__@0@__'.format(f), args: machine_args) == '1' if f == 'PCLMUL' # special case flags with different defines diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index 6bd222dca..509749ebd 100644 --- 
a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -58,6 +58,8 @@ New Features * **Updated CRC modules of rte_net library.** * Added run-time selection of the optimal architecture-specific CRC path. + * Added optimized implementations of CRC32-Ethernet and CRC16-CCITT + using the AVX512 and VPCLMULQDQ instruction sets. * **Updated Cisco enic driver.** diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build index b6880bd85..eeae25bc1 100644 --- a/lib/librte_net/meson.build +++ b/lib/librte_net/meson.build @@ -24,18 +24,62 @@ deps += ['mbuf'] if dpdk_conf.has('RTE_ARCH_X86_64') net_crc_sse42_cpu_support = \ cc.get_define('__PCLMUL__', args: machine_args) != '' + net_crc_avx512_cpu_support = \ + cc.get_define('__AVX512F__', args: machine_args) != '' and \ + cc.get_define('__AVX512BW__', args: machine_args) != '' and \ + cc.get_define('__AVX512DQ__', args: machine_args) != '' and \ + cc.get_define('__AVX512VL__', args: machine_args) != '' and \ + cc.get_define('__VPCLMULQDQ__', args: machine_args) != '' + net_crc_sse42_cc_support = \ cc.has_argument('-mpclmul') and cc.has_argument('-maes') + net_crc_avx512_cc_support = \ + not machine_args.contains('-mno-avx512f') and \ + cc.has_argument('-mavx512f') and \ + cc.has_argument('-mavx512bw') and \ + cc.has_argument('-mavx512dq') and \ + cc.has_argument('-mavx512vl') and \ + cc.has_argument('-mvpclmulqdq') and \ + cc.has_argument('-mavx2') and \ + cc.has_argument('-mavx') build_static_net_crc_sse42_lib = 0 + build_static_net_crc_avx512_lib = 0 if net_crc_sse42_cpu_support == true sources += files('net_crc_sse.c') cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] + if net_crc_avx512_cpu_support == true + sources += files('net_crc_avx512.c') + cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] + elif net_crc_avx512_cc_support == true + build_static_net_crc_avx512_lib = 1 + net_crc_avx512_lib_cflags = ['-mavx512f', + '-mavx512bw', + '-mavx512dq', + '-mavx512vl', + 
'-mvpclmulqdq', + '-mavx2', + '-mavx'] + cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] + endif elif net_crc_sse42_cc_support == true build_static_net_crc_sse42_lib = 1 net_crc_sse42_lib_cflags = ['-mpclmul', '-maes'] cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] + if net_crc_avx512_cc_support == true + build_static_net_crc_avx512_lib = 1 + net_crc_avx512_lib_cflags = ['-mpclmul', + '-maes', + '-mavx512f', + '-mavx512bw', + '-mavx512dq', + '-mavx512vl', + '-mvpclmulqdq', + '-mavx2', + '-mavx'] + cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] + endif endif if build_static_net_crc_sse42_lib == 1 @@ -47,6 +91,17 @@ if dpdk_conf.has('RTE_ARCH_X86_64') net_crc_sse42_lib_cflags]) objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c') endif + + if build_static_net_crc_avx512_lib == 1 + net_crc_avx512_lib = static_library( + 'net_crc_avx512_lib', + 'net_crc_avx512.c', + dependencies: static_rte_eal, + c_args: [cflags, + net_crc_avx512_lib_cflags]) + objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c') + endif + elif dpdk_conf.has('RTE_ARCH_ARM64') and \ cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '' sources += files('net_crc_neon.c') diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h index a1578a56c..7a74d5406 100644 --- a/lib/librte_net/net_crc.h +++ b/lib/librte_net/net_crc.h @@ -20,6 +20,17 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len); uint32_t rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len); +/* AVX512 */ + +void +rte_net_crc_avx512_init(void); + +uint32_t +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len); + +uint32_t +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len); + /* NEON */ void diff --git a/lib/librte_net/net_crc_avx512.c b/lib/librte_net/net_crc_avx512.c new file mode 100644 index 000000000..81aac6349 --- /dev/null +++ b/lib/librte_net/net_crc_avx512.c @@ -0,0 +1,424 @@ +/* SPDX-License-Identifier: BSD-3-Clause + 
* Copyright(c) 2020 Intel Corporation + */ + +#include <string.h> + +#include <rte_common.h> +#include <rte_branch_prediction.h> +#include <rte_cpuflags.h> + +#include "net_crc.h" + +#include <x86intrin.h> + +/* VPCLMULQDQ CRC computation context structure */ +struct crc_vpclmulqdq_ctx { + __m512i rk1_rk2; + __m512i rk3_rk4; + __m512i fold_7x128b; + __m512i fold_3x128b; + __m128i rk5_rk6; + __m128i rk7_rk8; + __m128i fold_1x128b; +}; + +static struct crc_vpclmulqdq_ctx crc32_eth __rte_aligned(64); +static struct crc_vpclmulqdq_ctx crc16_ccitt __rte_aligned(64); + +static uint16_t byte_len_to_mask_table[] = { + 0x0000, 0x0001, 0x0003, 0x0007, + 0x000f, 0x001f, 0x003f, 0x007f, + 0x00ff, 0x01ff, 0x03ff, 0x07ff, + 0x0fff, 0x1fff, 0x3fff, 0x7fff, + 0xffff}; + +static const uint8_t shf_table[32] __rte_aligned(16) = { + 0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, + 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, + 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f +}; + +static const uint32_t mask[4] __rte_aligned(16) = { + 0xffffffff, 0xffffffff, 0x00000000, 0x00000000 +}; + +static const uint32_t mask2[4] __rte_aligned(16) = { + 0x00000000, 0xffffffff, 0xffffffff, 0xffffffff +}; + +static __rte_always_inline __m512i +crcr32_folding_round(__m512i data_block, __m512i precomp, __m512i fold) +{ + __m512i tmp0, tmp1; + + tmp0 = _mm512_clmulepi64_epi128(fold, precomp, 0x01); + tmp1 = _mm512_clmulepi64_epi128(fold, precomp, 0x10); + + return _mm512_ternarylogic_epi64(tmp0, tmp1, data_block, 0x96); +} + +static __rte_always_inline __m128i +crc32_fold_128(__m512i fold0, __m512i fold1, + const struct crc_vpclmulqdq_ctx *params) +{ + __m128i res, res2; + __m256i a; + __m512i tmp0, tmp1, tmp2, tmp3; + __m512i tmp4; + + tmp0 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x01); + tmp1 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x10); + + res = _mm512_extracti64x2_epi64(fold1, 3); + tmp4 = 
_mm512_maskz_broadcast_i32x4(0xF, res); + + tmp2 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x01); + tmp3 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x10); + + tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp1, tmp2, 0x96); + tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp3, tmp4, 0x96); + + tmp1 = _mm512_shuffle_i64x2(tmp0, tmp0, 0x4e); + + a = _mm256_xor_si256(*(__m256i *)&tmp1, *(__m256i *)&tmp0); + res = _mm256_extracti64x2_epi64(a, 1); + res2 = _mm_xor_si128(res, *(__m128i *)&a); + + return res2; +} + +static __rte_always_inline __m128i +last_two_xmm(const uint8_t *data, uint32_t data_len, uint32_t n, __m128i res, + const struct crc_vpclmulqdq_ctx *params) +{ + uint32_t offset; + __m128i res2, res3, res4, pshufb_shf; + + const uint32_t mask3[4] __rte_aligned(16) = { + 0x80808080, 0x80808080, 0x80808080, 0x80808080 + }; + + res2 = res; + offset = data_len - n; + res3 = _mm_loadu_si128((const __m128i *)&data[n+offset-16]); + + pshufb_shf = _mm_loadu_si128((const __m128i *) + (shf_table + (data_len-n))); + + res = _mm_shuffle_epi8(res, pshufb_shf); + pshufb_shf = _mm_xor_si128(pshufb_shf, + _mm_load_si128((const __m128i *) mask3)); + res2 = _mm_shuffle_epi8(res2, pshufb_shf); + + res2 = _mm_blendv_epi8(res2, res3, pshufb_shf); + + res4 = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x01); + res = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x10); + res = _mm_ternarylogic_epi64(res, res2, res4, 0x96); + + return res; +} + +static __rte_always_inline __m128i +done_128(__m128i res, const struct crc_vpclmulqdq_ctx *params) +{ + __m128i res1; + + res1 = res; + + res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x0); + res1 = _mm_srli_si128(res1, 8); + res = _mm_xor_si128(res, res1); + + res1 = res; + res = _mm_slli_si128(res, 4); + res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x10); + res = _mm_xor_si128(res, res1); + + return res; +} + +static __rte_always_inline uint32_t +barrett_reduction(__m128i data64, const struct crc_vpclmulqdq_ctx 
*params) +{ + __m128i tmp0, tmp1; + + data64 = _mm_and_si128(data64, *(const __m128i *)mask2); + tmp0 = data64; + tmp1 = data64; + + data64 = _mm_clmulepi64_si128(tmp0, params->rk7_rk8, 0x0); + data64 = _mm_ternarylogic_epi64(data64, tmp1, *(const __m128i *)mask, + 0x28); + + tmp1 = data64; + data64 = _mm_clmulepi64_si128(data64, params->rk7_rk8, 0x10); + data64 = _mm_ternarylogic_epi64(data64, tmp1, tmp0, 0x96); + + return _mm_extract_epi32(data64, 2); +} + +static __rte_always_inline void +reduction_loop(__m128i *fold, int *len, const uint8_t *data, uint32_t *n, + const struct crc_vpclmulqdq_ctx *params) +{ + __m128i tmp, tmp1; + + tmp = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x1); + *fold = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x10); + *fold = _mm_xor_si128(*fold, tmp); + tmp1 = _mm_loadu_si128((const __m128i *)&data[*n]); + *fold = _mm_xor_si128(*fold, tmp1); + *n += 16; + *len -= 16; +} + +static __rte_always_inline uint32_t +crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc, + const struct crc_vpclmulqdq_ctx *params) +{ + __m128i res, d; + __m256i b; + __m512i temp, k; + __m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3; + __m512i fold0, fold1, fold2, fold3; + __mmask16 mask; + uint32_t n = 0; + int reduction = 0; + + /* Get CRC init value */ + b = _mm256_insert_epi32(_mm256_setzero_si256(), crc, 0); + temp = _mm512_inserti32x8(_mm512_setzero_si512(), b, 0); + + if (data_len > 255) { + fold0 = _mm512_loadu_si512((const __m512i *)data); + fold1 = _mm512_loadu_si512((const __m512i *)(data+64)); + fold2 = _mm512_loadu_si512((const __m512i *)(data+128)); + fold3 = _mm512_loadu_si512((const __m512i *)(data+192)); + fold0 = _mm512_xor_si512(fold0, temp); + + /* Main folding loop */ + k = params->rk1_rk2; + for (n = 256; (n + 256) <= data_len; n += 256) { + qw0 = _mm512_loadu_si512((const __m512i *)&data[n]); + qw1 = _mm512_loadu_si512((const __m512i *) + &(data[n+64])); + qw2 = _mm512_loadu_si512((const __m512i *) 
+ &(data[n+128])); + qw3 = _mm512_loadu_si512((const __m512i *) + &(data[n+192])); + fold0 = crcr32_folding_round(qw0, k, fold0); + fold1 = crcr32_folding_round(qw1, k, fold1); + fold2 = crcr32_folding_round(qw2, k, fold2); + fold3 = crcr32_folding_round(qw3, k, fold3); + } + + /* 256 to 128 fold */ + k = params->rk3_rk4; + fold0 = crcr32_folding_round(fold2, k, fold0); + fold1 = crcr32_folding_round(fold3, k, fold1); + + res = crc32_fold_128(fold0, fold1, params); + + reduction = 240 - ((n+256)-data_len); + + while (reduction > 0) + reduction_loop(&res, &reduction, data, &n, + params); + + reduction += 16; + + if (n != data_len) + res = last_two_xmm(data, data_len, n, res, + params); + } else { + if (data_len > 31) { + res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0); + d = _mm_loadu_si128((const __m128i *)data); + res = _mm_xor_si128(res, d); + n += 16; + + reduction = 240 - ((n+256)-data_len); + + while (reduction > 0) + reduction_loop(&res, &reduction, data, &n, + params); + + if (n != data_len) + res = last_two_xmm(data, data_len, n, res, + params); + } else if (data_len > 16) { + res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0); + d = _mm_loadu_si128((const __m128i *)data); + res = _mm_xor_si128(res, d); + n += 16; + + if (n != data_len) + res = last_two_xmm(data, data_len, n, res, + params); + } else if (data_len == 16) { + res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0); + d = _mm_loadu_si128((const __m128i *)data); + res = _mm_xor_si128(res, d); + } else { + res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0); + mask = byte_len_to_mask_table[data_len]; + d = _mm_maskz_loadu_epi8(mask, data); + res = _mm_xor_si128(res, d); + + if (data_len > 3) { + d = _mm_loadu_si128((const __m128i *) + &shf_table[data_len]); + res = _mm_shuffle_epi8(res, d); + } else if (data_len > 2) { + res = _mm_slli_si128(res, 5); + goto do_barrett_reduction; + } else if (data_len > 1) { + res = _mm_slli_si128(res, 6); + goto do_barrett_reduction; + } else if (data_len > 0) 
{ + res = _mm_slli_si128(res, 7); + goto do_barrett_reduction; + } else { + /* zero length case */ + return crc; + } + } + } + + res = done_128(res, params); + +do_barrett_reduction: + n = barrett_reduction(res, params); + + return n; +} + +static void +crc32_load_init_constants(void) +{ + __m128i a; + /* fold constants */ + uint64_t c0 = 0x00000000e95c1271; + uint64_t c1 = 0x00000000ce3371cb; + uint64_t c2 = 0x00000000910eeec1; + uint64_t c3 = 0x0000000033fff533; + uint64_t c4 = 0x000000000cbec0ed; + uint64_t c5 = 0x0000000031f8303f; + uint64_t c6 = 0x0000000057c54819; + uint64_t c7 = 0x00000000df068dc2; + uint64_t c8 = 0x00000000ae0b5394; + uint64_t c9 = 0x000000001c279815; + uint64_t c10 = 0x000000001d9513d7; + uint64_t c11 = 0x000000008f352d95; + uint64_t c12 = 0x00000000af449247; + uint64_t c13 = 0x000000003db1ecdc; + uint64_t c14 = 0x0000000081256527; + uint64_t c15 = 0x00000000f1da05aa; + uint64_t c16 = 0x00000000ccaa009e; + uint64_t c17 = 0x00000000ae689191; + uint64_t c18 = 0x00000000ccaa009e; + uint64_t c19 = 0x00000000b8bc6765; + uint64_t c20 = 0x00000001f7011640; + uint64_t c21 = 0x00000001db710640; + + a = _mm_set_epi64x(c1, c0); + crc32_eth.rk1_rk2 = _mm512_broadcast_i32x4(a); + + a = _mm_set_epi64x(c3, c2); + crc32_eth.rk3_rk4 = _mm512_broadcast_i32x4(a); + + crc32_eth.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8, + c9, c10, c11); + crc32_eth.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15, + c16, c17, 0, 0); + crc32_eth.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16), + _mm_cvtsi64_m64(c17)); + + crc32_eth.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18), + _mm_cvtsi64_m64(c19)); + crc32_eth.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20), + _mm_cvtsi64_m64(c21)); +} + +static void +crc16_load_init_constants(void) +{ + __m128i a; + /* fold constants */ + uint64_t c0 = 0x0000000000009a19; + uint64_t c1 = 0x0000000000002df8; + uint64_t c2 = 0x00000000000068af; + uint64_t c3 = 0x000000000000b6c9; + uint64_t c4 = 0x000000000000c64f; + uint64_t c5 
= 0x000000000000cd95; + uint64_t c6 = 0x000000000000d341; + uint64_t c7 = 0x000000000000b8f2; + uint64_t c8 = 0x0000000000000842; + uint64_t c9 = 0x000000000000b072; + uint64_t c10 = 0x00000000000047e3; + uint64_t c11 = 0x000000000000922d; + uint64_t c12 = 0x0000000000000e3a; + uint64_t c13 = 0x0000000000004d7a; + uint64_t c14 = 0x0000000000005b44; + uint64_t c15 = 0x0000000000007762; + uint64_t c16 = 0x00000000000081bf; + uint64_t c17 = 0x0000000000008e10; + uint64_t c18 = 0x00000000000081bf; + uint64_t c19 = 0x0000000000001cbb; + uint64_t c20 = 0x000000011c581910; + uint64_t c21 = 0x0000000000010810; + + a = _mm_set_epi64x(c1, c0); + crc16_ccitt.rk1_rk2 = _mm512_broadcast_i32x4(a); + + a = _mm_set_epi64x(c3, c2); + crc16_ccitt.rk3_rk4 = _mm512_broadcast_i32x4(a); + + crc16_ccitt.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8, + c9, c10, c11); + crc16_ccitt.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15, + c16, c17, 0, 0); + crc16_ccitt.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16), + _mm_cvtsi64_m64(c17)); + + crc16_ccitt.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18), + _mm_cvtsi64_m64(c19)); + crc16_ccitt.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20), + _mm_cvtsi64_m64(c21)); +} + +void +rte_net_crc_avx512_init(void) +{ + crc32_load_init_constants(); + crc16_load_init_constants(); + + /* + * Reset the register as following calculation may + * use other data types such as float, double, etc. 
+ */ + _mm_empty(); +} + +uint32_t +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len) +{ + /* return 16-bit CRC value */ + return (uint16_t)~crc32_eth_calc_vpclmulqdq(data, + data_len, + 0xffff, + &crc16_ccitt); +} + +uint32_t +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len) +{ + /* return 32-bit CRC value */ + return ~crc32_eth_calc_vpclmulqdq(data, + data_len, + 0xffffffffUL, + &crc32_eth); +} diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c index 83dccbfba..fcf9cc0ef 100644 --- a/lib/librte_net/rte_net_crc.c +++ b/lib/librte_net/rte_net_crc.c @@ -37,6 +37,12 @@ static rte_net_crc_handler handlers_scalar[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler, [RTE_NET_CRC32_ETH] = rte_crc32_eth_handler, }; +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT +static rte_net_crc_handler handlers_avx512[] = { + [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler, + [RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler, +}; +#endif #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT static rte_net_crc_handler handlers_sse42[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler, @@ -132,6 +138,19 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len) crc32_eth_lut); } +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT +static uint8_t +avx512_vpclmulqdq_cpu_supported(void) +{ + return rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_VPCLMULQDQ); +} +#endif + #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT static uint8_t sse42_pclmulqdq_cpu_supported(void) @@ -153,6 +172,14 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg) { switch (alg) { #ifdef RTE_ARCH_X86_64 + case RTE_NET_CRC_AVX512: +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT + if (avx512_vpclmulqdq_cpu_supported()) { + 
handlers = handlers_avx512; + break; + } +#endif + /* fall-through */ case RTE_NET_CRC_SSE42: #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT if (sse42_pclmulqdq_cpu_supported()) { @@ -206,6 +233,12 @@ RTE_INIT(rte_net_crc_init) rte_net_crc_sse42_init(); } #endif +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT + if (avx512_vpclmulqdq_cpu_supported()) { + alg = RTE_NET_CRC_AVX512; + rte_net_crc_avx512_init(); + } +#endif #ifdef CC_ARM64_NEON_PMULL_SUPPORT if (neon_pmull_cpu_supported()) { alg = RTE_NET_CRC_NEON; diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h index 16e85ca97..72d3e10ff 100644 --- a/lib/librte_net/rte_net_crc.h +++ b/lib/librte_net/rte_net_crc.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2017 Intel Corporation + * Copyright(c) 2017-2020 Intel Corporation */ #ifndef _RTE_NET_CRC_H_ @@ -23,6 +23,7 @@ enum rte_net_crc_alg { RTE_NET_CRC_SCALAR = 0, RTE_NET_CRC_SSE42, RTE_NET_CRC_NEON, + RTE_NET_CRC_AVX512, }; /** @@ -35,6 +36,7 @@ enum rte_net_crc_alg { * - RTE_NET_CRC_SCALAR * - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic) * - RTE_NET_CRC_NEON (Use ARM Neon intrinsic) + * - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic) */ void rte_net_crc_set_alg(enum rte_net_crc_alg alg); -- 2.12.3 ^ permalink raw reply [flat|nested] 23+ messages in thread
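The folding and reduction helpers in the patch above pass the immediates 0x96 and 0x28 to the ternary-logic intrinsics. As a reading aid, here is a minimal scalar sketch of what those immediates select, assuming the usual convention that the bits of the first, second and third operand form a 3-bit index into the immediate (first operand most significant). The names `ternlog64` and `ternlog_self_check` are hypothetical and not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar model of one 64-bit lane of a three-input ternary-logic
 * operation. For every bit position, the bits of a, b and c form a
 * 3-bit index (a is the most significant bit), and the result bit is
 * the bit of imm8 found at that index.
 * Hypothetical helper for illustration only; not part of the patch.
 */
static uint64_t
ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
	uint64_t r = 0;
	int i;

	for (i = 0; i < 64; i++) {
		unsigned int idx = (unsigned int)((((a >> i) & 1) << 2) |
				(((b >> i) & 1) << 1) |
				((c >> i) & 1));

		r |= (uint64_t)((imm8 >> idx) & 1) << i;
	}

	return r;
}

/* Compare the two immediates used in the patch with plain C operators. */
static void
ternlog_self_check(void)
{
	const uint64_t a = 0x0123456789abcdefULL;
	const uint64_t b = 0xfedcba9876543210ULL;
	const uint64_t c = 0x00ff00ff0f0f3355ULL;

	/* 0x96 selects the three-way XOR used by the folding rounds */
	assert(ternlog64(a, b, c, 0x96) == (a ^ b ^ c));

	/* 0x28 selects (a XOR b) AND c, as used in the Barrett reduction */
	assert(ternlog64(a, b, c, 0x28) == ((a ^ b) & c));
}
```

Read this way, each folding round is two carry-less products XORed with the next data block in a single instruction, instead of two separate XORs.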
* Re: [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC 2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh @ 2020-10-05 13:20 ` De Lara Guarch, Pablo 2020-10-05 13:38 ` O'loingsigh, Mairtin 0 siblings, 1 reply; 23+ messages in thread From: De Lara Guarch, Pablo @ 2020-10-05 13:20 UTC (permalink / raw) To: O'loingsigh, Mairtin, Singh, Jasvinder, Richardson, Bruce Cc: dev, Ryan, Brendan, Coyle, David Hi Mairtin, > -----Original Message----- > From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com> > Sent: Tuesday, September 29, 2020 4:36 PM > To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce > <bruce.richardson@intel.com>; De Lara Guarch, Pablo > <pablo.de.lara.guarch@intel.com> > Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle, David > <david.coyle@intel.com>; O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com> > Subject: [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC > > This patch enables the optimized calculation of CRC32-Ethernet and CRC16- > CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC > implementation is built if the compiler supports the required instruction sets. It is > selected at run-time if the host CPU, again, supports the required instruction > sets. > > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> > Signed-off-by: David Coyle <david.coyle@intel.com> ... 
> +static __rte_always_inline uint32_t
> +crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i res, d;
> +	__m256i b;
> +	__m512i temp, k;
> +	__m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3;
> +	__m512i fold0, fold1, fold2, fold3;
> +	__mmask16 mask;
> +	uint32_t n = 0;
> +	int reduction = 0;
> +
> +	/* Get CRC init value */
> +	b = _mm256_insert_epi32(_mm256_setzero_si256(), crc, 0);
> +	temp = _mm512_inserti32x8(_mm512_setzero_si512(), b, 0);

You can replace this with the following, which produces fewer instructions
(b needs to be changed to __m128i):

b = _mm_cvtsi32_si128(crc);
temp = _mm512_castsi128_si512(b);

> +
> +	if (data_len > 255) {
> +		fold0 = _mm512_loadu_si512((const __m512i *)data);

...

> +	} else {
> +		if (data_len > 31) {
> +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);

Should work better with:

res = _mm_cvtsi32_si128(crc);

> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +			n += 16;
> +
> +			reduction = 240 - ((n+256)-data_len);
> +
> +			while (reduction > 0)
> +				reduction_loop(&res, &reduction, data, &n,
> +						params);
> +
> +			if (n != data_len)
> +				res = last_two_xmm(data, data_len, n, res,
> +						params);
> +		} else if (data_len > 16) {
> +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);

Same as above.

> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +			n += 16;
> +
> +			if (n != data_len)
> +				res = last_two_xmm(data, data_len, n, res,
> +						params);
> +		} else if (data_len == 16) {
> +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);

Same.

> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +		} else {
> +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);

Same.

> +			mask = byte_len_to_mask_table[data_len];
> +			d = _mm_maskz_loadu_epi8(mask, data);

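The last quoted lines above handle inputs shorter than 16 bytes: a 16-bit mask is looked up by length and passed to a zero-masking byte load, so only the first data_len bytes are read and the rest of the register is zeroed. A small sketch of that mapping follows, assuming the table contents from the patch. The helper `byte_len_to_mask` and the `const` qualifier on the table are illustrative additions, not something the patch contains.

```c
#include <assert.h>
#include <stdint.h>

/* Copy of the table from the patch; the index is a byte count, 0 to 16. */
static const uint16_t byte_len_to_mask_table[] = {
	0x0000, 0x0001, 0x0003, 0x0007,
	0x000f, 0x001f, 0x003f, 0x007f,
	0x00ff, 0x01ff, 0x03ff, 0x07ff,
	0x0fff, 0x1fff, 0x3fff, 0x7fff,
	0xffff};

/*
 * Hypothetical helper (not in the patch): the same mask computed
 * arithmetically. Bit i is set when byte i should be loaded, so a
 * length of len bytes gives the low len bits set. Valid for len <= 16.
 */
static uint16_t
byte_len_to_mask(uint32_t len)
{
	return (uint16_t)((1u << len) - 1u);
}

/* Check that every table entry equals the arithmetic form. */
static void
mask_table_self_check(void)
{
	uint32_t len;

	for (len = 0; len <= 16; len++)
		assert(byte_len_to_mask_table[len] == byte_len_to_mask(len));
}
```

Whether a lookup or the shift is preferable in the hot path is a separate question; the sketch only shows that the two agree for every valid length.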
* Re: [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC 2020-10-05 13:20 ` De Lara Guarch, Pablo @ 2020-10-05 13:38 ` O'loingsigh, Mairtin 0 siblings, 0 replies; 23+ messages in thread From: O'loingsigh, Mairtin @ 2020-10-05 13:38 UTC (permalink / raw) To: De Lara Guarch, Pablo, Singh, Jasvinder, Richardson, Bruce Cc: dev, Ryan, Brendan, Coyle, David Hi Pablo, > -----Original Message----- > From: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com> > Sent: Monday, October 5, 2020 2:20 PM > To: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>; Singh, Jasvinder > <jasvinder.singh@intel.com>; Richardson, Bruce > <bruce.richardson@intel.com> > Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle, David > <david.coyle@intel.com> > Subject: RE: [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ > based CRC > > Hi Mairtin, > > > -----Original Message----- > > From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com> > > Sent: Tuesday, September 29, 2020 4:36 PM > > To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce > > <bruce.richardson@intel.com>; De Lara Guarch, Pablo > > <pablo.de.lara.guarch@intel.com> > > Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle, > David > > <david.coyle@intel.com>; O'loingsigh, Mairtin > > <mairtin.oloingsigh@intel.com> > > Subject: [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based > > CRC > > > > This patch enables the optimized calculation of CRC32-Ethernet and > > CRC16- CCITT using the AVX512 and VPCLMULQDQ instruction sets. This > > CRC implementation is built if the compiler supports the required > > instruction sets. It is selected at run-time if the host CPU, again, > > supports the required instruction sets. > > > > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> > > Signed-off-by: David Coyle <david.coyle@intel.com> > > ... 
>
> > +static __rte_always_inline uint32_t
> > +crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc,
> > +	const struct crc_vpclmulqdq_ctx *params)
> > +{
> > +	__m128i res, d;
> > +	__m256i b;
> > +	__m512i temp, k;
> > +	__m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3;
> > +	__m512i fold0, fold1, fold2, fold3;
> > +	__mmask16 mask;
> > +	uint32_t n = 0;
> > +	int reduction = 0;
> > +
> > +	/* Get CRC init value */
> > +	b = _mm256_insert_epi32(_mm256_setzero_si256(), crc, 0);
> > +	temp = _mm512_inserti32x8(_mm512_setzero_si512(), b, 0);
>
> You can replace this with the following, which produces fewer instructions
> (b needs to be changed to __m128i):
>
> b = _mm_cvtsi32_si128(crc);
> temp = _mm512_castsi128_si512(b);
>
> > +
> > +	if (data_len > 255) {
> > +		fold0 = _mm512_loadu_si512((const __m512i *)data);
>
> ...
>
> > +	} else {
> > +		if (data_len > 31) {
> > +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
>
> Should work better with:
>
> res = _mm_cvtsi32_si128(crc);
>
> > +			d = _mm_loadu_si128((const __m128i *)data);
> > +			res = _mm_xor_si128(res, d);
> > +			n += 16;
> > +
> > +			reduction = 240 - ((n+256)-data_len);
> > +
> > +			while (reduction > 0)
> > +				reduction_loop(&res, &reduction, data, &n,
> > +						params);
> > +
> > +			if (n != data_len)
> > +				res = last_two_xmm(data, data_len, n, res,
> > +						params);
> > +		} else if (data_len > 16) {
> > +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
>
> Same as above.
>
> > +			d = _mm_loadu_si128((const __m128i *)data);
> > +			res = _mm_xor_si128(res, d);
> > +			n += 16;
> > +
> > +			if (n != data_len)
> > +				res = last_two_xmm(data, data_len, n, res,
> > +						params);
> > +		} else if (data_len == 16) {
> > +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
>
> Same.
>
> > +			d = _mm_loadu_si128((const __m128i *)data);
> > +			res = _mm_xor_si128(res, d);
> > +		} else {
> > +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
>
> Same.
>
> > +			mask = byte_len_to_mask_table[data_len];
> > +			d = _mm_maskz_loadu_epi8(mask, data);

Thanks for the feedback. I'll make these changes and submit a v4 patch.

Regards,
Mairtin

* [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC
  2020-09-29 15:35 [dpdk-dev] [PATCH v3 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
@ 2020-10-06 16:23 ` Mairtin o Loingsigh
  2020-10-06 16:23   ` [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
                     ` (3 more replies)
  2 siblings, 4 replies; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-10-06 16:23 UTC (permalink / raw)
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle

This patchset makes two significant enhancements to the CRC modules of the
rte_net library:

1) Adds run-time selection of the optimal architecture-specific CRC path.
   Previously the selection was solely made at compile-time, meaning it
   could only be built and run on the same generation of CPU. Adding
   run-time selection ability means this can be used from distro packages
   and/or DPDK can be compiled on an older CPU and run on a newer CPU.
2) Adds an optimized CRC implementation based on the AVX512 and VPCLMULQDQ
   instruction sets.

For further details, please see the commit messages of the individual
patches.

v4:
* Fixed build issue when older version of meson is used (0.47.1)
* Addressed review comments
* remove Intel copyright header from neon CRC file
* tidy-up of register initialisation

v3:
* Re-submitted v2 as encountered problems when originally submitting it.

v2:
* Added support for run-time selection of optimal architecture-specific
  CRC, based on v1 review comment.
* Added full working AVX512/VPCLMULQDQ support for CRC32-Ethernet and
  CRC16-CCITT.

v1: * Initial version, with incomplete AVX512/VPCLMULQDQ support for CRC32-Ethernet only. Mairtin o Loingsigh (2): net: add run-time architecture specific CRC selection net: add support for AVX512/VPCLMULQDQ based CRC app/test/test_crc.c | 11 +- config/x86/meson.build | 6 +- doc/guides/rel_notes/release_20_11.rst | 6 + lib/librte_net/meson.build | 89 ++++- lib/librte_net/net_crc.h | 45 +++ lib/librte_net/net_crc_avx512.c | 423 ++++++++++++++++++++++ lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +- lib/librte_net/{net_crc_sse.h => net_crc_sse.c} | 34 +- lib/librte_net/rte_net_crc.c | 100 +++-- lib/librte_net/rte_net_crc.h | 4 +- 10 files changed, 672 insertions(+), 72 deletions(-) create mode 100644 lib/librte_net/net_crc.h create mode 100644 lib/librte_net/net_crc_avx512.c rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%) rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%) -- 2.12.3 ^ permalink raw reply [flat|nested] 23+ messages in thread
* [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection 2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh @ 2020-10-06 16:23 ` Mairtin o Loingsigh 2020-10-07 14:59 ` Ananyev, Konstantin 2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh ` (2 subsequent siblings) 3 siblings, 1 reply; 23+ messages in thread From: Mairtin o Loingsigh @ 2020-10-06 16:23 UTC (permalink / raw) To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle This patch adds support for run-time selection of the optimal architecture-specific CRC path, based on the supported instruction set(s) of the CPU. The compiler option checks have been moved from the C files to the meson script. The rte_cpu_get_flag_enabled function is called automatically by the library at process initialization time to determine which instructions the CPU supports, with the most optimal supported CRC path ultimately selected. 
Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> Signed-off-by: David Coyle <david.coyle@intel.com> --- doc/guides/rel_notes/release_20_11.rst | 4 ++ lib/librte_net/meson.build | 34 +++++++++++- lib/librte_net/net_crc.h | 34 ++++++++++++ lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +++------ lib/librte_net/{net_crc_sse.h => net_crc_sse.c} | 34 ++++-------- lib/librte_net/rte_net_crc.c | 67 ++++++++++++++--------- 6 files changed, 131 insertions(+), 68 deletions(-) create mode 100644 lib/librte_net/net_crc.h rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%) rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%) diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index ca5ec7391..0f14e087d 100644 --- a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -55,6 +55,10 @@ New Features Also, make sure to start the actual text at the margin. ======================================================= +* **Updated CRC modules of rte_net library.** + + * Added run-time selection of the optimal architecture-specific CRC path. 
+ * **Updated Broadcom bnxt driver.** Updated the Broadcom bnxt driver with new features and improvements, including: diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build index 24ed8253b..fa439b9e5 100644 --- a/lib/librte_net/meson.build +++ b/lib/librte_net/meson.build @@ -1,5 +1,5 @@ # SPDX-License-Identifier: BSD-3-Clause -# Copyright(c) 2017 Intel Corporation +# Copyright(c) 2017-2020 Intel Corporation headers = files('rte_ip.h', 'rte_tcp.h', @@ -20,3 +20,35 @@ headers = files('rte_ip.h', sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c') deps += ['mbuf'] + +if dpdk_conf.has('RTE_ARCH_X86_64') + net_crc_sse42_cpu_support = ( + cc.get_define('__PCLMUL__', args: machine_args) != '') + net_crc_sse42_cc_support = ( + cc.has_argument('-mpclmul') and cc.has_argument('-maes')) + + build_static_net_crc_sse42_lib = 0 + + if net_crc_sse42_cpu_support == true + sources += files('net_crc_sse.c') + cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] + elif net_crc_sse42_cc_support == true + build_static_net_crc_sse42_lib = 1 + net_crc_sse42_lib_cflags = ['-mpclmul', '-maes'] + cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] + endif + + if build_static_net_crc_sse42_lib == 1 + net_crc_sse42_lib = static_library( + 'net_crc_sse42_lib', + 'net_crc_sse.c', + dependencies: static_rte_eal, + c_args: [cflags, + net_crc_sse42_lib_cflags]) + objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c') + endif +elif (dpdk_conf.has('RTE_ARCH_ARM64') and + cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '') + sources += files('net_crc_neon.c') + cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT'] +endif diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h new file mode 100644 index 000000000..a1578a56c --- /dev/null +++ b/lib/librte_net/net_crc.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#ifndef _NET_CRC_H_ +#define _NET_CRC_H_ + +/* + * Different implementations of 
CRC + */ + +/* SSE4.2 */ + +void +rte_net_crc_sse42_init(void); + +uint32_t +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len); + +uint32_t +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len); + +/* NEON */ + +void +rte_net_crc_neon_init(void); + +uint32_t +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len); + +uint32_t +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len); + +#endif /* _NET_CRC_H_ */ diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c similarity index 95% rename from lib/librte_net/net_crc_neon.h rename to lib/librte_net/net_crc_neon.c index 63fa1d4a1..f61d75a8c 100644 --- a/lib/librte_net/net_crc_neon.h +++ b/lib/librte_net/net_crc_neon.c @@ -2,17 +2,15 @@ * Copyright(c) 2017 Cavium, Inc */ -#ifndef _NET_CRC_NEON_H_ -#define _NET_CRC_NEON_H_ +#include <string.h> +#include <rte_common.h> #include <rte_branch_prediction.h> #include <rte_net_crc.h> #include <rte_vect.h> #include <rte_cpuflags.h> -#ifdef __cplusplus -extern "C" { -#endif +#include "net_crc.h" /** PMULL CRC computation context structure */ struct crc_pmull_ctx { @@ -218,7 +216,7 @@ crc32_eth_calc_pmull( return n; } -static inline void +void rte_net_crc_neon_init(void) { /* Initialize CRC16 data */ @@ -242,9 +240,8 @@ rte_net_crc_neon_init(void) crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8); } -static inline uint32_t -rte_crc16_ccitt_neon_handler(const uint8_t *data, - uint32_t data_len) +uint32_t +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len) { return (uint16_t)~crc32_eth_calc_pmull(data, data_len, @@ -252,18 +249,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data, &crc16_ccitt_pmull); } -static inline uint32_t -rte_crc32_eth_neon_handler(const uint8_t *data, - uint32_t data_len) +uint32_t +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len) { return ~crc32_eth_calc_pmull(data, data_len, 0xffffffffUL, &crc32_eth_pmull); } - -#ifdef __cplusplus -} 
-#endif - -#endif /* _NET_CRC_NEON_H_ */ diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c similarity index 94% rename from lib/librte_net/net_crc_sse.h rename to lib/librte_net/net_crc_sse.c index 1c7b7a548..053b54b39 100644 --- a/lib/librte_net/net_crc_sse.h +++ b/lib/librte_net/net_crc_sse.c @@ -1,18 +1,16 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2017 Intel Corporation + * Copyright(c) 2017-2020 Intel Corporation */ -#ifndef _RTE_NET_CRC_SSE_H_ -#define _RTE_NET_CRC_SSE_H_ +#include <string.h> +#include <rte_common.h> #include <rte_branch_prediction.h> +#include <rte_cpuflags.h> -#include <x86intrin.h> -#include <cpuid.h> +#include "net_crc.h" -#ifdef __cplusplus -extern "C" { -#endif +#include <x86intrin.h> /** PCLMULQDQ CRC computation context structure */ struct crc_pclmulqdq_ctx { @@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq( return n; } - -static inline void +void rte_net_crc_sse42_init(void) { uint64_t k1, k2, k5, k6; @@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void) * use other data types such as float, double, etc. 
*/ _mm_empty(); - } -static inline uint32_t -rte_crc16_ccitt_sse42_handler(const uint8_t *data, - uint32_t data_len) +uint32_t +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len) { /** return 16-bit CRC value */ return (uint16_t)~crc32_eth_calc_pclmulqdq(data, @@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, &crc16_ccitt_pclmulqdq); } -static inline uint32_t -rte_crc32_eth_sse42_handler(const uint8_t *data, - uint32_t data_len) +uint32_t +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len) { return ~crc32_eth_calc_pclmulqdq(data, data_len, 0xffffffffUL, &crc32_eth_pclmulqdq); } - -#ifdef __cplusplus -} -#endif - -#endif /* _RTE_NET_CRC_SSE_H_ */ diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c index 4f5b9e828..83dccbfba 100644 --- a/lib/librte_net/rte_net_crc.c +++ b/lib/librte_net/rte_net_crc.c @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2017 Intel Corporation + * Copyright(c) 2017-2020 Intel Corporation */ #include <stddef.h> @@ -10,17 +10,7 @@ #include <rte_common.h> #include <rte_net_crc.h> -#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__) -#define X86_64_SSE42_PCLMULQDQ 1 -#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO) -#define ARM64_NEON_PMULL 1 -#endif - -#ifdef X86_64_SSE42_PCLMULQDQ -#include <net_crc_sse.h> -#elif defined ARM64_NEON_PMULL -#include <net_crc_neon.h> -#endif +#include "net_crc.h" /** CRC polynomials */ #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL @@ -47,13 +37,13 @@ static rte_net_crc_handler handlers_scalar[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler, [RTE_NET_CRC32_ETH] = rte_crc32_eth_handler, }; - -#ifdef X86_64_SSE42_PCLMULQDQ +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT static rte_net_crc_handler handlers_sse42[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler, [RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler, }; -#elif defined ARM64_NEON_PMULL +#endif +#ifdef CC_ARM64_NEON_PMULL_SUPPORT 
static rte_net_crc_handler handlers_neon[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler, [RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler, @@ -142,22 +132,44 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len) crc32_eth_lut); } +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT +static uint8_t +sse42_pclmulqdq_cpu_supported(void) +{ + return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ); +} +#endif + +#ifdef CC_ARM64_NEON_PMULL_SUPPORT +static uint8_t +neon_pmull_cpu_supported(void) +{ + return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL); +} +#endif + void rte_net_crc_set_alg(enum rte_net_crc_alg alg) { switch (alg) { -#ifdef X86_64_SSE42_PCLMULQDQ +#ifdef RTE_ARCH_X86_64 case RTE_NET_CRC_SSE42: - handlers = handlers_sse42; - break; -#elif defined ARM64_NEON_PMULL - /* fall-through */ +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT + if (sse42_pclmulqdq_cpu_supported()) { + handlers = handlers_sse42; + break; + } +#endif +#endif /* RTE_ARCH_X86_64 */ +#ifdef RTE_ARCH_ARM64 case RTE_NET_CRC_NEON: - if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) { +#ifdef CC_ARM64_NEON_PMULL_SUPPORT + if (neon_pmull_cpu_supported()) { handlers = handlers_neon; break; } #endif +#endif /* RTE_ARCH_ARM64 */ /* fall-through */ case RTE_NET_CRC_SCALAR: /* fall-through */ @@ -188,11 +200,14 @@ RTE_INIT(rte_net_crc_init) rte_net_crc_scalar_init(); -#ifdef X86_64_SSE42_PCLMULQDQ - alg = RTE_NET_CRC_SSE42; - rte_net_crc_sse42_init(); -#elif defined ARM64_NEON_PMULL - if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) { +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT + if (sse42_pclmulqdq_cpu_supported()) { + alg = RTE_NET_CRC_SSE42; + rte_net_crc_sse42_init(); + } +#endif +#ifdef CC_ARM64_NEON_PMULL_SUPPORT + if (neon_pmull_cpu_supported()) { alg = RTE_NET_CRC_NEON; rte_net_crc_neon_init(); } -- 2.12.3 ^ permalink raw reply [flat|nested] 23+ messages in thread
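Editorial sketch of the run-time selection idea in the patch above: every path is compiled in, and a handler pointer is switched once based on what the CPU reports. All names here are hypothetical and only mirror the shape of the patch. `cpu_has_accel` stands in for the result of a CPU flag query such as rte_cpu_get_flag_enabled(), and `crc32_accel` is a placeholder that forwards to the scalar code rather than a real SIMD implementation.

```c
#include <stdint.h>

/* All names below are hypothetical; this is not the rte_net_crc code. */
typedef uint32_t (*crc_handler_t)(const uint8_t *data, uint32_t data_len);

enum crc_alg { CRC_ALG_SCALAR = 0, CRC_ALG_ACCEL };

/* Portable bitwise CRC-32 (reflected polynomial 0xEDB88320). */
uint32_t crc32_scalar(const uint8_t *data, uint32_t data_len)
{
	uint32_t crc = 0xffffffffu;
	uint32_t i;
	int bit;

	for (i = 0; i < data_len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Placeholder for an accelerated path. A real one would use SIMD intrinsics
 * and sit in its own file built with extra -m flags. */
uint32_t crc32_accel(const uint8_t *data, uint32_t data_len)
{
	return crc32_scalar(data, data_len);
}

/* Defaults to the path that works on every CPU. */
static crc_handler_t active_handler = crc32_scalar;

/* cpu_has_accel stands in for a CPU flag query made once at init time. */
enum crc_alg crc_select(int cpu_has_accel)
{
	if (cpu_has_accel) {
		active_handler = crc32_accel;
		return CRC_ALG_ACCEL;
	}
	active_handler = crc32_scalar;
	return CRC_ALG_SCALAR;
}

/* Callers never test CPU features; they go through the selected handler. */
uint32_t crc_calc(const uint8_t *data, uint32_t data_len)
{
	return active_handler(data, data_len);
}
```

Because the feature check happens once rather than per call, the hot path costs only an indirect call, and a binary built on an older machine can still pick the faster handler on a newer one.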
* Re: [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection 2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh @ 2020-10-07 14:59 ` Ananyev, Konstantin 2020-10-09 14:04 ` Coyle, David 0 siblings, 1 reply; 23+ messages in thread From: Ananyev, Konstantin @ 2020-10-07 14:59 UTC (permalink / raw) To: O'loingsigh, Mairtin, Singh, Jasvinder, Richardson, Bruce, De Lara Guarch, Pablo Cc: dev, Ryan, Brendan, O'loingsigh, Mairtin, Coyle, David > > This patch adds support for run-time selection of the optimal > architecture-specific CRC path, based on the supported instruction set(s) > of the CPU. > > The compiler option checks have been moved from the C files to the meson > script. The rte_cpu_get_flag_enabled function is called automatically by > the library at process initialization time to determine which > instructions the CPU supports, with the most optimal supported CRC path > ultimately selected. > > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> > Signed-off-by: David Coyle <david.coyle@intel.com> LGTM, just one nit see below. 
With that: Series acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> > --- > doc/guides/rel_notes/release_20_11.rst | 4 ++ > lib/librte_net/meson.build | 34 +++++++++++- > lib/librte_net/net_crc.h | 34 ++++++++++++ > lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +++------ > lib/librte_net/{net_crc_sse.h => net_crc_sse.c} | 34 ++++-------- > lib/librte_net/rte_net_crc.c | 67 ++++++++++++++--------- > 6 files changed, 131 insertions(+), 68 deletions(-) > create mode 100644 lib/librte_net/net_crc.h > rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%) > rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%) > > diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst > index ca5ec7391..0f14e087d 100644 > --- a/doc/guides/rel_notes/release_20_11.rst > +++ b/doc/guides/rel_notes/release_20_11.rst > @@ -55,6 +55,10 @@ New Features > Also, make sure to start the actual text at the margin. > ======================================================= > > +* **Updated CRC modules of rte_net library.** > + > + * Added run-time selection of the optimal architecture-specific CRC path. 
> + > * **Updated Broadcom bnxt driver.** > > Updated the Broadcom bnxt driver with new features and improvements, including: > diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build > index 24ed8253b..fa439b9e5 100644 > --- a/lib/librte_net/meson.build > +++ b/lib/librte_net/meson.build > @@ -1,5 +1,5 @@ > # SPDX-License-Identifier: BSD-3-Clause > -# Copyright(c) 2017 Intel Corporation > +# Copyright(c) 2017-2020 Intel Corporation > > headers = files('rte_ip.h', > 'rte_tcp.h', > @@ -20,3 +20,35 @@ headers = files('rte_ip.h', > > sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c') > deps += ['mbuf'] > + > +if dpdk_conf.has('RTE_ARCH_X86_64') > + net_crc_sse42_cpu_support = ( > + cc.get_define('__PCLMUL__', args: machine_args) != '') > + net_crc_sse42_cc_support = ( > + cc.has_argument('-mpclmul') and cc.has_argument('-maes')) > + > + build_static_net_crc_sse42_lib = 0 > + > + if net_crc_sse42_cpu_support == true > + sources += files('net_crc_sse.c') > + cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] > + elif net_crc_sse42_cc_support == true > + build_static_net_crc_sse42_lib = 1 > + net_crc_sse42_lib_cflags = ['-mpclmul', '-maes'] > + cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] > + endif > + > + if build_static_net_crc_sse42_lib == 1 > + net_crc_sse42_lib = static_library( > + 'net_crc_sse42_lib', > + 'net_crc_sse.c', > + dependencies: static_rte_eal, > + c_args: [cflags, > + net_crc_sse42_lib_cflags]) > + objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c') > + endif > +elif (dpdk_conf.has('RTE_ARCH_ARM64') and > + cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '') > + sources += files('net_crc_neon.c') > + cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT'] > +endif > diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h > new file mode 100644 > index 000000000..a1578a56c > --- /dev/null > +++ b/lib/librte_net/net_crc.h > @@ -0,0 +1,34 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * 
Copyright(c) 2020 Intel Corporation > + */ > + > +#ifndef _NET_CRC_H_ > +#define _NET_CRC_H_ > + > +/* > + * Different implementations of CRC > + */ > + > +/* SSE4.2 */ > + > +void > +rte_net_crc_sse42_init(void); > + > +uint32_t > +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len); > + > +uint32_t > +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len); > + > +/* NEON */ > + > +void > +rte_net_crc_neon_init(void); > + > +uint32_t > +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len); > + > +uint32_t > +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len); > + > +#endif /* _NET_CRC_H_ */ > diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c > similarity index 95% > rename from lib/librte_net/net_crc_neon.h > rename to lib/librte_net/net_crc_neon.c > index 63fa1d4a1..f61d75a8c 100644 > --- a/lib/librte_net/net_crc_neon.h > +++ b/lib/librte_net/net_crc_neon.c > @@ -2,17 +2,15 @@ > * Copyright(c) 2017 Cavium, Inc > */ > > -#ifndef _NET_CRC_NEON_H_ > -#define _NET_CRC_NEON_H_ > +#include <string.h> > > +#include <rte_common.h> > #include <rte_branch_prediction.h> > #include <rte_net_crc.h> > #include <rte_vect.h> > #include <rte_cpuflags.h> > > -#ifdef __cplusplus > -extern "C" { > -#endif > +#include "net_crc.h" > > /** PMULL CRC computation context structure */ > struct crc_pmull_ctx { > @@ -218,7 +216,7 @@ crc32_eth_calc_pmull( > return n; > } > > -static inline void > +void > rte_net_crc_neon_init(void) > { > /* Initialize CRC16 data */ > @@ -242,9 +240,8 @@ rte_net_crc_neon_init(void) > crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8); > } > > -static inline uint32_t > -rte_crc16_ccitt_neon_handler(const uint8_t *data, > - uint32_t data_len) > +uint32_t > +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len) > { > return (uint16_t)~crc32_eth_calc_pmull(data, > data_len, > @@ -252,18 +249,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data, > 
&crc16_ccitt_pmull); > } > > -static inline uint32_t > -rte_crc32_eth_neon_handler(const uint8_t *data, > - uint32_t data_len) > +uint32_t > +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len) > { > return ~crc32_eth_calc_pmull(data, > data_len, > 0xffffffffUL, > &crc32_eth_pmull); > } > - > -#ifdef __cplusplus > -} > -#endif > - > -#endif /* _NET_CRC_NEON_H_ */ > diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c > similarity index 94% > rename from lib/librte_net/net_crc_sse.h > rename to lib/librte_net/net_crc_sse.c > index 1c7b7a548..053b54b39 100644 > --- a/lib/librte_net/net_crc_sse.h > +++ b/lib/librte_net/net_crc_sse.c > @@ -1,18 +1,16 @@ > /* SPDX-License-Identifier: BSD-3-Clause > - * Copyright(c) 2017 Intel Corporation > + * Copyright(c) 2017-2020 Intel Corporation > */ > > -#ifndef _RTE_NET_CRC_SSE_H_ > -#define _RTE_NET_CRC_SSE_H_ > +#include <string.h> > > +#include <rte_common.h> > #include <rte_branch_prediction.h> > +#include <rte_cpuflags.h> > > -#include <x86intrin.h> > -#include <cpuid.h> > +#include "net_crc.h" > > -#ifdef __cplusplus > -extern "C" { > -#endif > +#include <x86intrin.h> > > /** PCLMULQDQ CRC computation context structure */ > struct crc_pclmulqdq_ctx { > @@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq( > return n; > } > > - > -static inline void > +void > rte_net_crc_sse42_init(void) > { > uint64_t k1, k2, k5, k6; > @@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void) > * use other data types such as float, double, etc. 
> */ > _mm_empty(); > - > } > > -static inline uint32_t > -rte_crc16_ccitt_sse42_handler(const uint8_t *data, > - uint32_t data_len) > +uint32_t > +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len) > { > /** return 16-bit CRC value */ > return (uint16_t)~crc32_eth_calc_pclmulqdq(data, > @@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, > &crc16_ccitt_pclmulqdq); > } > > -static inline uint32_t > -rte_crc32_eth_sse42_handler(const uint8_t *data, > - uint32_t data_len) > +uint32_t > +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len) > { > return ~crc32_eth_calc_pclmulqdq(data, > data_len, > 0xffffffffUL, > &crc32_eth_pclmulqdq); > } > - > -#ifdef __cplusplus > -} > -#endif > - > -#endif /* _RTE_NET_CRC_SSE_H_ */ > diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c > index 4f5b9e828..83dccbfba 100644 > --- a/lib/librte_net/rte_net_crc.c > +++ b/lib/librte_net/rte_net_crc.c > @@ -1,5 +1,5 @@ > /* SPDX-License-Identifier: BSD-3-Clause > - * Copyright(c) 2017 Intel Corporation > + * Copyright(c) 2017-2020 Intel Corporation > */ > > #include <stddef.h> > @@ -10,17 +10,7 @@ > #include <rte_common.h> > #include <rte_net_crc.h> > > -#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__) > -#define X86_64_SSE42_PCLMULQDQ 1 > -#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO) > -#define ARM64_NEON_PMULL 1 > -#endif > - > -#ifdef X86_64_SSE42_PCLMULQDQ > -#include <net_crc_sse.h> > -#elif defined ARM64_NEON_PMULL > -#include <net_crc_neon.h> > -#endif > +#include "net_crc.h" > > /** CRC polynomials */ > #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL > @@ -47,13 +37,13 @@ static rte_net_crc_handler handlers_scalar[] = { > [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler, > [RTE_NET_CRC32_ETH] = rte_crc32_eth_handler, > }; > - > -#ifdef X86_64_SSE42_PCLMULQDQ > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT > static rte_net_crc_handler handlers_sse42[] = { > [RTE_NET_CRC16_CCITT] = 
rte_crc16_ccitt_sse42_handler, > [RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler, > }; > -#elif defined ARM64_NEON_PMULL > +#endif > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT > static rte_net_crc_handler handlers_neon[] = { > [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler, > [RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler, > @@ -142,22 +132,44 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len) > crc32_eth_lut); > } > > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT > +static uint8_t > +sse42_pclmulqdq_cpu_supported(void) > +{ > + return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ); > +}

As a nit, I think it would be better to hide the #ifdef inside the function,
and return 0 when the define is not set.
Something like:

static int
sse42_pclmulqdq_cpu_supported(void)
{
#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
#else
	return 0;
#endif
}

Same for the other cpu_supported functions.
And then you can remove these ifdefs in set_alg and other places, i.e.:

void
rte_net_crc_set_alg(enum rte_net_crc_alg alg)
{
	switch (alg) {
#ifdef RTE_ARCH_X86_64
	case RTE_NET_CRC_AVX512:
		if (avx512_vpclmulqdq_cpu_supported()) {
			handlers = handlers_avx512;
			break;
		}
		/* fall-through */
	case RTE_NET_CRC_SSE42:
		if (sse42_pclmulqdq_cpu_supported()) {
			handlers = handlers_sse42;
			break;
		}
#endif
...

Same for rte_net_crc_init() > +#endif > + > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT > +static uint8_t > +neon_pmull_cpu_supported(void) > +{ > + return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL); > +} > +#endif > + > void > rte_net_crc_set_alg(enum rte_net_crc_alg alg) > { > switch (alg) { > -#ifdef X86_64_SSE42_PCLMULQDQ > +#ifdef RTE_ARCH_X86_64 > case RTE_NET_CRC_SSE42: > - handlers = handlers_sse42; > - break; > -#elif defined ARM64_NEON_PMULL > - /* fall-through */ > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT > + if (sse42_pclmulqdq_cpu_supported()) { > + handlers = handlers_sse42; > + break; > + } > +#endif > +#endif /* RTE_ARCH_X86_64 */ > +#ifdef RTE_ARCH_ARM64 > case RTE_NET_CRC_NEON: > - if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) { > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT > + if (neon_pmull_cpu_supported()) { > handlers = handlers_neon; > break; > } > #endif > +#endif /* RTE_ARCH_ARM64 */ > /* fall-through */ > case RTE_NET_CRC_SCALAR: > /* fall-through */ > @@ -188,11 +200,14 @@ RTE_INIT(rte_net_crc_init) > > rte_net_crc_scalar_init(); > > -#ifdef X86_64_SSE42_PCLMULQDQ > - alg = RTE_NET_CRC_SSE42; > - rte_net_crc_sse42_init(); > -#elif defined ARM64_NEON_PMULL > - if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) { > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT > + if (sse42_pclmulqdq_cpu_supported()) { > + alg = RTE_NET_CRC_SSE42; > + rte_net_crc_sse42_init(); > + } > +#endif > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT > + if (neon_pmull_cpu_supported()) { > alg = RTE_NET_CRC_NEON; > rte_net_crc_neon_init(); > } > -- > 2.12.3 ^ permalink raw reply [flat|nested] 23+ messages in thread
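Editorial sketch of the pattern suggested in the review above, written as a self-contained example rather than as the code that went into any later revision. The names are hypothetical: `CC_FAST_SUPPORT` plays the role of a -D flag that the build script adds only when the compiler can build a given path, `CC_FASTER_SUPPORT` is deliberately left undefined to show the other branch, and `cpu_flag_present` stands in for a run-time CPU flag query.

```c
/* Hypothetical names throughout; this only illustrates the review comment. */
#define CC_FAST_SUPPORT 1
/* CC_FASTER_SUPPORT is deliberately left undefined. */

enum alg { ALG_SCALAR = 0, ALG_FAST, ALG_FASTER };

/* Stand-in for a run-time CPU flag query; settable so both outcomes can be tried. */
static int cpu_flag_present = 1;

void set_cpu_flag(int present)
{
	cpu_flag_present = present;
}

/* The #ifdef lives inside the helper, so callers need no conditionals. */
int fast_cpu_supported(void)
{
#ifdef CC_FAST_SUPPORT
	return cpu_flag_present;
#else
	return 0;
#endif
}

/* Compiled-out path: always reports unsupported, whatever the CPU says. */
int faster_cpu_supported(void)
{
#ifdef CC_FASTER_SUPPORT
	return cpu_flag_present;
#else
	return 0;
#endif
}

/* Each case falls through to the next best path when its check fails. */
enum alg select_alg(enum alg requested)
{
	switch (requested) {
	case ALG_FASTER:
		if (faster_cpu_supported())
			return ALG_FASTER;
		/* fall-through */
	case ALG_FAST:
		if (fast_cpu_supported())
			return ALG_FAST;
		/* fall-through */
	case ALG_SCALAR:
	default:
		return ALG_SCALAR;
	}
}
```

The design point is that the switch stays free of preprocessor conditionals: a path that was not built simply reports "unsupported" and the request degrades to the next case.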
* Re: [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection 2020-10-07 14:59 ` Ananyev, Konstantin @ 2020-10-09 14:04 ` Coyle, David 2020-10-10 12:42 ` Ananyev, Konstantin 0 siblings, 1 reply; 23+ messages in thread From: Coyle, David @ 2020-10-09 14:04 UTC (permalink / raw) To: Ananyev, Konstantin, O'loingsigh, Mairtin, Singh, Jasvinder, Richardson, Bruce, De Lara Guarch, Pablo Cc: dev, Ryan, Brendan, O'loingsigh, Mairtin Hi Konstantin, thanks for your review > -----Original Message----- > From: Ananyev, Konstantin <konstantin.ananyev@intel.com> > Sent: Wednesday, October 7, 2020 3:59 PM <snip> > > > > > This patch adds support for run-time selection of the optimal > > architecture-specific CRC path, based on the supported instruction > > set(s) of the CPU. > > > > The compiler option checks have been moved from the C files to the > > meson script. The rte_cpu_get_flag_enabled function is called > > automatically by the library at process initialization time to > > determine which instructions the CPU supports, with the most optimal > > supported CRC path ultimately selected. > > > > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> > > Signed-off-by: David Coyle <david.coyle@intel.com> > > LGTM, just one nit see below. 
> With that: > Series acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> > > > --- > > doc/guides/rel_notes/release_20_11.rst | 4 ++ > > lib/librte_net/meson.build | 34 +++++++++++- > > lib/librte_net/net_crc.h | 34 ++++++++++++ > > lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +++------ > > lib/librte_net/{net_crc_sse.h => net_crc_sse.c} | 34 ++++-------- > > lib/librte_net/rte_net_crc.c | 67 ++++++++++++++--------- > > 6 files changed, 131 insertions(+), 68 deletions(-) create mode > > 100644 lib/librte_net/net_crc.h rename lib/librte_net/{net_crc_neon.h > > => net_crc_neon.c} (95%) rename lib/librte_net/{net_crc_sse.h => > > net_crc_sse.c} (94%) > > > > <snip> > > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT static uint8_t > > +sse42_pclmulqdq_cpu_supported(void) > > +{ > > + return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ); > > +} > > As a nit, I think it would be better to hide the #ifdef inside the function, and > return 0 when the define is not set. > Something like: > > static int > sse42_pclmulqdq_cpu_supported(void) > { > #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT > return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ); > #else > return 0; > #endif > } > > Same for the other cpu_supported functions. > And then you can remove these ifdefs in set_alg and other places, i.e.: > > void > rte_net_crc_set_alg(enum rte_net_crc_alg alg) { > switch (alg) { > #ifdef RTE_ARCH_X86_64 > case RTE_NET_CRC_AVX512: > if (avx512_vpclmulqdq_cpu_supported()) { > handlers = handlers_avx512; > break; > } > /* fall-through */ > case RTE_NET_CRC_SSE42: > if (sse42_pclmulqdq_cpu_supported()) { > handlers = handlers_sse42; > break; > } > #endif > ... > > Same for rte_net_crc_init()

[DC] I have reworked the ifdefs in this file based on your comments here and off-list discussions. The rework is available now in v5. All ifdefs have been moved out of the API function definitions and down into 'helper' type functions, which looks much cleaner now.
Your Ack has been carried through too to v5 as you mentioned > > > +#endif > > + > > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT > > +static uint8_t > > +neon_pmull_cpu_supported(void) > > +{ > > + return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL); > > +} > > +#endif > > + > > void > > rte_net_crc_set_alg(enum rte_net_crc_alg alg) { > > switch (alg) { > > -#ifdef X86_64_SSE42_PCLMULQDQ > > +#ifdef RTE_ARCH_X86_64 > > case RTE_NET_CRC_SSE42: > > - handlers = handlers_sse42; > > - break; > > -#elif defined ARM64_NEON_PMULL > > - /* fall-through */ > > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT > > + if (sse42_pclmulqdq_cpu_supported()) { > > + handlers = handlers_sse42; > > + break; > > + } > > +#endif > > +#endif /* RTE_ARCH_X86_64 */ > > +#ifdef RTE_ARCH_ARM64 > > case RTE_NET_CRC_NEON: > > - if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) { > > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT > > + if (neon_pmull_cpu_supported()) { > > handlers = handlers_neon; > > break; > > } > > #endif > > +#endif /* RTE_ARCH_ARM64 */ > > /* fall-through */ > > case RTE_NET_CRC_SCALAR: > > /* fall-through */ > > @@ -188,11 +200,14 @@ RTE_INIT(rte_net_crc_init) > > > > rte_net_crc_scalar_init(); > > > > -#ifdef X86_64_SSE42_PCLMULQDQ > > - alg = RTE_NET_CRC_SSE42; > > - rte_net_crc_sse42_init(); > > -#elif defined ARM64_NEON_PMULL > > - if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) { > > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT > > + if (sse42_pclmulqdq_cpu_supported()) { > > + alg = RTE_NET_CRC_SSE42; > > + rte_net_crc_sse42_init(); > > + } > > +#endif > > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT > > + if (neon_pmull_cpu_supported()) { > > alg = RTE_NET_CRC_NEON; > > rte_net_crc_neon_init(); > > } > > -- > > 2.12.3 ^ permalink raw reply [flat|nested] 23+ messages in thread
* Re: [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection 2020-10-09 14:04 ` Coyle, David @ 2020-10-10 12:42 ` Ananyev, Konstantin 0 siblings, 0 replies; 23+ messages in thread From: Ananyev, Konstantin @ 2020-10-10 12:42 UTC (permalink / raw) To: Coyle, David, O'loingsigh, Mairtin, Singh, Jasvinder, Richardson, Bruce, De Lara Guarch, Pablo Cc: dev, Ryan, Brendan, O'loingsigh, Mairtin Hi David, > > > This patch adds support for run-time selection of the optimal > > > architecture-specific CRC path, based on the supported instruction > > > set(s) of the CPU. > > > > > > The compiler option checks have been moved from the C files to the > > > meson script. The rte_cpu_get_flag_enabled function is called > > > automatically by the library at process initialization time to > > > determine which instructions the CPU supports, with the most optimal > > > supported CRC path ultimately selected. > > > > > > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> > > > Signed-off-by: David Coyle <david.coyle@intel.com> > > > > LGTM, just one nit see below. 
> > With that: > > Series acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> > > > > > --- > > > doc/guides/rel_notes/release_20_11.rst | 4 ++ > > > lib/librte_net/meson.build | 34 +++++++++++- > > > lib/librte_net/net_crc.h | 34 ++++++++++++ > > > lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +++------ > > > lib/librte_net/{net_crc_sse.h => net_crc_sse.c} | 34 ++++-------- > > > lib/librte_net/rte_net_crc.c | 67 ++++++++++++++--------- > > > 6 files changed, 131 insertions(+), 68 deletions(-) create mode > > > 100644 lib/librte_net/net_crc.h rename lib/librte_net/{net_crc_neon.h > > > => net_crc_neon.c} (95%) rename lib/librte_net/{net_crc_sse.h => > > > net_crc_sse.c} (94%) > > > > > > > > <snip> > > > > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT static uint8_t > > > +sse42_pclmulqdq_cpu_supported(void) > > > +{ > > > + return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ); > > > +} > > > > As a nit, I think it would be better to hide the #ifdef inside the function, and > > return 0 when the define is not set. > > Something like: > > > > static int > > sse42_pclmulqdq_cpu_supported(void) > > { > > #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT > > return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ); > > #else > > return 0; > > #endif > > } > > > > Same for the other cpu_supported functions. > > And then you can remove these ifdefs in set_alg and other places, i.e.: > > > > void > > rte_net_crc_set_alg(enum rte_net_crc_alg alg) { > > switch (alg) { > > #ifdef RTE_ARCH_X86_64 > > case RTE_NET_CRC_AVX512: > > if (avx512_vpclmulqdq_cpu_supported()) { > > handlers = handlers_avx512; > > break; > > } > > /* fall-through */ > > case RTE_NET_CRC_SSE42: > > if (sse42_pclmulqdq_cpu_supported()) { > > handlers = handlers_sse42; > > break; > > } > > #endif > > ... > > > > Same for rte_net_crc_init() > > [DC] I have reworked the ifdefs in this file based on your comments here and off-list discussions. > These are available now in the v5. 
> > All ifdefs have been removed from the API function definitions and moved down into 'helper' type > functions - looks much cleaner now. > > Your Ack has been carried through too to v5 as you mentioned LGTM, thanks. Konstantin ^ permalink raw reply [flat|nested] 23+ messages in thread
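The review point in this exchange, keeping the preprocessor guard inside the helper so that callers need no guards of their own, can be sketched in isolation. This is only an illustration and not the v5 code: cpu_has_pclmulqdq() is a hypothetical stand-in for rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ), pick_path() is an invented caller, and the macro name is borrowed from the patch for readability.

```c
/*
 * Hypothetical stand-in for rte_cpu_get_flag_enabled(); it always
 * reports the flag as present so the sketch needs no DPDK headers.
 */
int
cpu_has_pclmulqdq(void)
{
	return 1;
}

/*
 * The guard lives inside the helper. If the build did not enable the
 * SSE4.2/PCLMULQDQ path, the macro is undefined and the helper simply
 * reports "not supported".
 */
int
sse42_pclmulqdq_cpu_supported(void)
{
#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
	return cpu_has_pclmulqdq();
#else
	return 0;
#endif
}

/*
 * Hypothetical caller with no #ifdef of its own.
 * Returns 1 for the SSE4.2 path and 0 for the scalar path.
 */
int
pick_path(void)
{
	return sse42_pclmulqdq_cpu_supported() ? 1 : 0;
}
```

Built without -DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT, pick_path() returns 0; built with it, the result follows whatever the probe reports.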
* [dpdk-dev] [PATCH v4 2/2] net: add support for AVX512/VPCLMULQDQ based CRC 2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh 2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh @ 2020-10-06 16:23 ` Mairtin o Loingsigh 2020-10-07 9:26 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " David Marchand 2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh 3 siblings, 0 replies; 23+ messages in thread From: Mairtin o Loingsigh @ 2020-10-06 16:23 UTC (permalink / raw) To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle This patch enables the optimized calculation of CRC32-Ethernet and CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC implementation is built if the compiler supports the required instruction sets. It is selected at run-time if the host CPU, again, supports the required instruction sets. 
Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> Signed-off-by: David Coyle <david.coyle@intel.com> --- app/test/test_crc.c | 11 +- config/x86/meson.build | 6 +- doc/guides/rel_notes/release_20_11.rst | 2 + lib/librte_net/meson.build | 55 +++++ lib/librte_net/net_crc.h | 11 + lib/librte_net/net_crc_avx512.c | 423 +++++++++++++++++++++++++++++++++ lib/librte_net/rte_net_crc.c | 33 +++ lib/librte_net/rte_net_crc.h | 4 +- 8 files changed, 541 insertions(+), 4 deletions(-) create mode 100644 lib/librte_net/net_crc_avx512.c diff --git a/app/test/test_crc.c b/app/test/test_crc.c index f8a74e04e..bf1d34435 100644 --- a/app/test/test_crc.c +++ b/app/test/test_crc.c @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2017 Intel Corporation + * Copyright(c) 2017-2020 Intel Corporation */ #include "test.h" @@ -149,6 +149,15 @@ test_crc(void) return ret; } + /* set CRC avx512 mode */ + rte_net_crc_set_alg(RTE_NET_CRC_AVX512); + + ret = test_crc_calc(); + if (ret < 0) { + printf("test crc (x86_64 AVX512): failed (%d)\n", ret); + return ret; + } + /* set CRC neon mode */ rte_net_crc_set_alg(RTE_NET_CRC_NEON); diff --git a/config/x86/meson.build b/config/x86/meson.build index fea4d5403..172b72b72 100644 --- a/config/x86/meson.build +++ b/config/x86/meson.build @@ -1,5 +1,5 @@ # SPDX-License-Identifier: BSD-3-Clause -# Copyright(c) 2017-2019 Intel Corporation +# Copyright(c) 2017-2020 Intel Corporation # get binutils version for the workaround of Bug 97 if not is_windows @@ -23,7 +23,9 @@ endforeach optional_flags = ['AES', 'PCLMUL', 'AVX', 'AVX2', 'AVX512F', - 'RDRND', 'RDSEED'] + 'RDRND', 'RDSEED', + 'AVX512BW', 'AVX512DQ', + 'AVX512VL', 'VPCLMULQDQ'] foreach f:optional_flags if cc.get_define('__@0@__'.format(f), args: machine_args) == '1' if f == 'PCLMUL' # special case flags with different defines diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index 0f14e087d..af6e5e40c 100644 --- 
a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -58,6 +58,8 @@ New Features * **Updated CRC modules of rte_net library.** * Added run-time selection of the optimal architecture-specific CRC path. + * Added optimized implementations of CRC32-Ethernet and CRC16-CCITT + using the AVX512 and VPCLMULQDQ instruction sets. * **Updated Broadcom bnxt driver.** diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build index fa439b9e5..6c96b361a 100644 --- a/lib/librte_net/meson.build +++ b/lib/librte_net/meson.build @@ -24,18 +24,62 @@ deps += ['mbuf'] if dpdk_conf.has('RTE_ARCH_X86_64') net_crc_sse42_cpu_support = ( cc.get_define('__PCLMUL__', args: machine_args) != '') + net_crc_avx512_cpu_support = ( + cc.get_define('__AVX512F__', args: machine_args) != '' and + cc.get_define('__AVX512BW__', args: machine_args) != '' and + cc.get_define('__AVX512DQ__', args: machine_args) != '' and + cc.get_define('__AVX512VL__', args: machine_args) != '' and + cc.get_define('__VPCLMULQDQ__', args: machine_args) != '') + net_crc_sse42_cc_support = ( cc.has_argument('-mpclmul') and cc.has_argument('-maes')) + net_crc_avx512_cc_support = ( + not machine_args.contains('-mno-avx512f') and + cc.has_argument('-mavx512f') and + cc.has_argument('-mavx512bw') and + cc.has_argument('-mavx512dq') and + cc.has_argument('-mavx512vl') and + cc.has_argument('-mvpclmulqdq') and + cc.has_argument('-mavx2') and + cc.has_argument('-mavx')) build_static_net_crc_sse42_lib = 0 + build_static_net_crc_avx512_lib = 0 if net_crc_sse42_cpu_support == true sources += files('net_crc_sse.c') cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] + if net_crc_avx512_cpu_support == true + sources += files('net_crc_avx512.c') + cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] + elif net_crc_avx512_cc_support == true + build_static_net_crc_avx512_lib = 1 + net_crc_avx512_lib_cflags = ['-mavx512f', + '-mavx512bw', + '-mavx512dq', + '-mavx512vl', + '-mvpclmulqdq', + 
'-mavx2', + '-mavx'] + cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] + endif elif net_crc_sse42_cc_support == true build_static_net_crc_sse42_lib = 1 net_crc_sse42_lib_cflags = ['-mpclmul', '-maes'] cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] + if net_crc_avx512_cc_support == true + build_static_net_crc_avx512_lib = 1 + net_crc_avx512_lib_cflags = ['-mpclmul', + '-maes', + '-mavx512f', + '-mavx512bw', + '-mavx512dq', + '-mavx512vl', + '-mvpclmulqdq', + '-mavx2', + '-mavx'] + cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] + endif endif if build_static_net_crc_sse42_lib == 1 @@ -47,6 +91,17 @@ if dpdk_conf.has('RTE_ARCH_X86_64') net_crc_sse42_lib_cflags]) objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c') endif + + if build_static_net_crc_avx512_lib == 1 + net_crc_avx512_lib = static_library( + 'net_crc_avx512_lib', + 'net_crc_avx512.c', + dependencies: static_rte_eal, + c_args: [cflags, + net_crc_avx512_lib_cflags]) + objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c') + endif + elif (dpdk_conf.has('RTE_ARCH_ARM64') and cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '') sources += files('net_crc_neon.c') diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h index a1578a56c..7a74d5406 100644 --- a/lib/librte_net/net_crc.h +++ b/lib/librte_net/net_crc.h @@ -20,6 +20,17 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len); uint32_t rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len); +/* AVX512 */ + +void +rte_net_crc_avx512_init(void); + +uint32_t +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len); + +uint32_t +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len); + /* NEON */ void diff --git a/lib/librte_net/net_crc_avx512.c b/lib/librte_net/net_crc_avx512.c new file mode 100644 index 000000000..3740fe3c9 --- /dev/null +++ b/lib/librte_net/net_crc_avx512.c @@ -0,0 +1,423 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 
2020 Intel Corporation + */ + +#include <string.h> + +#include <rte_common.h> +#include <rte_branch_prediction.h> +#include <rte_cpuflags.h> + +#include "net_crc.h" + +#include <x86intrin.h> + +/* VPCLMULQDQ CRC computation context structure */ +struct crc_vpclmulqdq_ctx { + __m512i rk1_rk2; + __m512i rk3_rk4; + __m512i fold_7x128b; + __m512i fold_3x128b; + __m128i rk5_rk6; + __m128i rk7_rk8; + __m128i fold_1x128b; +}; + +static struct crc_vpclmulqdq_ctx crc32_eth __rte_aligned(64); +static struct crc_vpclmulqdq_ctx crc16_ccitt __rte_aligned(64); + +static uint16_t byte_len_to_mask_table[] = { + 0x0000, 0x0001, 0x0003, 0x0007, + 0x000f, 0x001f, 0x003f, 0x007f, + 0x00ff, 0x01ff, 0x03ff, 0x07ff, + 0x0fff, 0x1fff, 0x3fff, 0x7fff, + 0xffff}; + +static const uint8_t shf_table[32] __rte_aligned(16) = { + 0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, + 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, + 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f +}; + +static const uint32_t mask[4] __rte_aligned(16) = { + 0xffffffff, 0xffffffff, 0x00000000, 0x00000000 +}; + +static const uint32_t mask2[4] __rte_aligned(16) = { + 0x00000000, 0xffffffff, 0xffffffff, 0xffffffff +}; + +static __rte_always_inline __m512i +crcr32_folding_round(__m512i data_block, __m512i precomp, __m512i fold) +{ + __m512i tmp0, tmp1; + + tmp0 = _mm512_clmulepi64_epi128(fold, precomp, 0x01); + tmp1 = _mm512_clmulepi64_epi128(fold, precomp, 0x10); + + return _mm512_ternarylogic_epi64(tmp0, tmp1, data_block, 0x96); +} + +static __rte_always_inline __m128i +crc32_fold_128(__m512i fold0, __m512i fold1, + const struct crc_vpclmulqdq_ctx *params) +{ + __m128i res, res2; + __m256i a; + __m512i tmp0, tmp1, tmp2, tmp3; + __m512i tmp4; + + tmp0 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x01); + tmp1 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x10); + + res = _mm512_extracti64x2_epi64(fold1, 3); + tmp4 = _mm512_maskz_broadcast_i32x4(0xF, res); 
+ + tmp2 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x01); + tmp3 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x10); + + tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp1, tmp2, 0x96); + tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp3, tmp4, 0x96); + + tmp1 = _mm512_shuffle_i64x2(tmp0, tmp0, 0x4e); + + a = _mm256_xor_si256(*(__m256i *)&tmp1, *(__m256i *)&tmp0); + res = _mm256_extracti64x2_epi64(a, 1); + res2 = _mm_xor_si128(res, *(__m128i *)&a); + + return res2; +} + +static __rte_always_inline __m128i +last_two_xmm(const uint8_t *data, uint32_t data_len, uint32_t n, __m128i res, + const struct crc_vpclmulqdq_ctx *params) +{ + uint32_t offset; + __m128i res2, res3, res4, pshufb_shf; + + const uint32_t mask3[4] __rte_aligned(16) = { + 0x80808080, 0x80808080, 0x80808080, 0x80808080 + }; + + res2 = res; + offset = data_len - n; + res3 = _mm_loadu_si128((const __m128i *)&data[n+offset-16]); + + pshufb_shf = _mm_loadu_si128((const __m128i *) + (shf_table + (data_len-n))); + + res = _mm_shuffle_epi8(res, pshufb_shf); + pshufb_shf = _mm_xor_si128(pshufb_shf, + _mm_load_si128((const __m128i *) mask3)); + res2 = _mm_shuffle_epi8(res2, pshufb_shf); + + res2 = _mm_blendv_epi8(res2, res3, pshufb_shf); + + res4 = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x01); + res = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x10); + res = _mm_ternarylogic_epi64(res, res2, res4, 0x96); + + return res; +} + +static __rte_always_inline __m128i +done_128(__m128i res, const struct crc_vpclmulqdq_ctx *params) +{ + __m128i res1; + + res1 = res; + + res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x0); + res1 = _mm_srli_si128(res1, 8); + res = _mm_xor_si128(res, res1); + + res1 = res; + res = _mm_slli_si128(res, 4); + res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x10); + res = _mm_xor_si128(res, res1); + + return res; +} + +static __rte_always_inline uint32_t +barrett_reduction(__m128i data64, const struct crc_vpclmulqdq_ctx *params) +{ + __m128i tmp0, tmp1; + + data64 = 
_mm_and_si128(data64, *(const __m128i *)mask2); + tmp0 = data64; + tmp1 = data64; + + data64 = _mm_clmulepi64_si128(tmp0, params->rk7_rk8, 0x0); + data64 = _mm_ternarylogic_epi64(data64, tmp1, *(const __m128i *)mask, + 0x28); + + tmp1 = data64; + data64 = _mm_clmulepi64_si128(data64, params->rk7_rk8, 0x10); + data64 = _mm_ternarylogic_epi64(data64, tmp1, tmp0, 0x96); + + return _mm_extract_epi32(data64, 2); +} + +static __rte_always_inline void +reduction_loop(__m128i *fold, int *len, const uint8_t *data, uint32_t *n, + const struct crc_vpclmulqdq_ctx *params) +{ + __m128i tmp, tmp1; + + tmp = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x1); + *fold = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x10); + *fold = _mm_xor_si128(*fold, tmp); + tmp1 = _mm_loadu_si128((const __m128i *)&data[*n]); + *fold = _mm_xor_si128(*fold, tmp1); + *n += 16; + *len -= 16; +} + +static __rte_always_inline uint32_t +crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc, + const struct crc_vpclmulqdq_ctx *params) +{ + __m128i res, d, b; + __m512i temp, k; + __m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3; + __m512i fold0, fold1, fold2, fold3; + __mmask16 mask; + uint32_t n = 0; + int reduction = 0; + + /* Get CRC init value */ + b = _mm_cvtsi32_si128(crc); + temp = _mm512_castsi128_si512(b); + + if (data_len > 255) { + fold0 = _mm512_loadu_si512((const __m512i *)data); + fold1 = _mm512_loadu_si512((const __m512i *)(data+64)); + fold2 = _mm512_loadu_si512((const __m512i *)(data+128)); + fold3 = _mm512_loadu_si512((const __m512i *)(data+192)); + fold0 = _mm512_xor_si512(fold0, temp); + + /* Main folding loop */ + k = params->rk1_rk2; + for (n = 256; (n + 256) <= data_len; n += 256) { + qw0 = _mm512_loadu_si512((const __m512i *)&data[n]); + qw1 = _mm512_loadu_si512((const __m512i *) + &(data[n+64])); + qw2 = _mm512_loadu_si512((const __m512i *) + &(data[n+128])); + qw3 = _mm512_loadu_si512((const __m512i *) + &(data[n+192])); + fold0 = 
crcr32_folding_round(qw0, k, fold0); + fold1 = crcr32_folding_round(qw1, k, fold1); + fold2 = crcr32_folding_round(qw2, k, fold2); + fold3 = crcr32_folding_round(qw3, k, fold3); + } + + /* 256 to 128 fold */ + k = params->rk3_rk4; + fold0 = crcr32_folding_round(fold2, k, fold0); + fold1 = crcr32_folding_round(fold3, k, fold1); + + res = crc32_fold_128(fold0, fold1, params); + + reduction = 240 - ((n+256)-data_len); + + while (reduction > 0) + reduction_loop(&res, &reduction, data, &n, + params); + + reduction += 16; + + if (n != data_len) + res = last_two_xmm(data, data_len, n, res, + params); + } else { + if (data_len > 31) { + res = _mm_cvtsi32_si128(crc); + d = _mm_loadu_si128((const __m128i *)data); + res = _mm_xor_si128(res, d); + n += 16; + + reduction = 240 - ((n+256)-data_len); + + while (reduction > 0) + reduction_loop(&res, &reduction, data, &n, + params); + + if (n != data_len) + res = last_two_xmm(data, data_len, n, res, + params); + } else if (data_len > 16) { + res = _mm_cvtsi32_si128(crc); + d = _mm_loadu_si128((const __m128i *)data); + res = _mm_xor_si128(res, d); + n += 16; + + if (n != data_len) + res = last_two_xmm(data, data_len, n, res, + params); + } else if (data_len == 16) { + res = _mm_cvtsi32_si128(crc); + d = _mm_loadu_si128((const __m128i *)data); + res = _mm_xor_si128(res, d); + } else { + res = _mm_cvtsi32_si128(crc); + mask = byte_len_to_mask_table[data_len]; + d = _mm_maskz_loadu_epi8(mask, data); + res = _mm_xor_si128(res, d); + + if (data_len > 3) { + d = _mm_loadu_si128((const __m128i *) + &shf_table[data_len]); + res = _mm_shuffle_epi8(res, d); + } else if (data_len > 2) { + res = _mm_slli_si128(res, 5); + goto do_barrett_reduction; + } else if (data_len > 1) { + res = _mm_slli_si128(res, 6); + goto do_barrett_reduction; + } else if (data_len > 0) { + res = _mm_slli_si128(res, 7); + goto do_barrett_reduction; + } else { + /* zero length case */ + return crc; + } + } + } + + res = done_128(res, params); + +do_barrett_reduction: + 
n = barrett_reduction(res, params); + + return n; +} + +static void +crc32_load_init_constants(void) +{ + __m128i a; + /* fold constants */ + uint64_t c0 = 0x00000000e95c1271; + uint64_t c1 = 0x00000000ce3371cb; + uint64_t c2 = 0x00000000910eeec1; + uint64_t c3 = 0x0000000033fff533; + uint64_t c4 = 0x000000000cbec0ed; + uint64_t c5 = 0x0000000031f8303f; + uint64_t c6 = 0x0000000057c54819; + uint64_t c7 = 0x00000000df068dc2; + uint64_t c8 = 0x00000000ae0b5394; + uint64_t c9 = 0x000000001c279815; + uint64_t c10 = 0x000000001d9513d7; + uint64_t c11 = 0x000000008f352d95; + uint64_t c12 = 0x00000000af449247; + uint64_t c13 = 0x000000003db1ecdc; + uint64_t c14 = 0x0000000081256527; + uint64_t c15 = 0x00000000f1da05aa; + uint64_t c16 = 0x00000000ccaa009e; + uint64_t c17 = 0x00000000ae689191; + uint64_t c18 = 0x00000000ccaa009e; + uint64_t c19 = 0x00000000b8bc6765; + uint64_t c20 = 0x00000001f7011640; + uint64_t c21 = 0x00000001db710640; + + a = _mm_set_epi64x(c1, c0); + crc32_eth.rk1_rk2 = _mm512_broadcast_i32x4(a); + + a = _mm_set_epi64x(c3, c2); + crc32_eth.rk3_rk4 = _mm512_broadcast_i32x4(a); + + crc32_eth.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8, + c9, c10, c11); + crc32_eth.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15, + c16, c17, 0, 0); + crc32_eth.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16), + _mm_cvtsi64_m64(c17)); + + crc32_eth.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18), + _mm_cvtsi64_m64(c19)); + crc32_eth.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20), + _mm_cvtsi64_m64(c21)); +} + +static void +crc16_load_init_constants(void) +{ + __m128i a; + /* fold constants */ + uint64_t c0 = 0x0000000000009a19; + uint64_t c1 = 0x0000000000002df8; + uint64_t c2 = 0x00000000000068af; + uint64_t c3 = 0x000000000000b6c9; + uint64_t c4 = 0x000000000000c64f; + uint64_t c5 = 0x000000000000cd95; + uint64_t c6 = 0x000000000000d341; + uint64_t c7 = 0x000000000000b8f2; + uint64_t c8 = 0x0000000000000842; + uint64_t c9 = 0x000000000000b072; + uint64_t c10 = 
0x00000000000047e3; + uint64_t c11 = 0x000000000000922d; + uint64_t c12 = 0x0000000000000e3a; + uint64_t c13 = 0x0000000000004d7a; + uint64_t c14 = 0x0000000000005b44; + uint64_t c15 = 0x0000000000007762; + uint64_t c16 = 0x00000000000081bf; + uint64_t c17 = 0x0000000000008e10; + uint64_t c18 = 0x00000000000081bf; + uint64_t c19 = 0x0000000000001cbb; + uint64_t c20 = 0x000000011c581910; + uint64_t c21 = 0x0000000000010810; + + a = _mm_set_epi64x(c1, c0); + crc16_ccitt.rk1_rk2 = _mm512_broadcast_i32x4(a); + + a = _mm_set_epi64x(c3, c2); + crc16_ccitt.rk3_rk4 = _mm512_broadcast_i32x4(a); + + crc16_ccitt.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8, + c9, c10, c11); + crc16_ccitt.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15, + c16, c17, 0, 0); + crc16_ccitt.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16), + _mm_cvtsi64_m64(c17)); + + crc16_ccitt.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18), + _mm_cvtsi64_m64(c19)); + crc16_ccitt.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20), + _mm_cvtsi64_m64(c21)); +} + +void +rte_net_crc_avx512_init(void) +{ + crc32_load_init_constants(); + crc16_load_init_constants(); + + /* + * Reset the register as following calculation may + * use other data types such as float, double, etc. 
+ */ + _mm_empty(); +} + +uint32_t +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len) +{ + /* return 16-bit CRC value */ + return (uint16_t)~crc32_eth_calc_vpclmulqdq(data, + data_len, + 0xffff, + &crc16_ccitt); +} + +uint32_t +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len) +{ + /* return 32-bit CRC value */ + return ~crc32_eth_calc_vpclmulqdq(data, + data_len, + 0xffffffffUL, + &crc32_eth); +} diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c index 83dccbfba..fcf9cc0ef 100644 --- a/lib/librte_net/rte_net_crc.c +++ b/lib/librte_net/rte_net_crc.c @@ -37,6 +37,12 @@ static rte_net_crc_handler handlers_scalar[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler, [RTE_NET_CRC32_ETH] = rte_crc32_eth_handler, }; +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT +static rte_net_crc_handler handlers_avx512[] = { + [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler, + [RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler, +}; +#endif #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT static rte_net_crc_handler handlers_sse42[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler, @@ -132,6 +138,19 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len) crc32_eth_lut); } +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT +static uint8_t +avx512_vpclmulqdq_cpu_supported(void) +{ + return rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ) && + rte_cpu_get_flag_enabled(RTE_CPUFLAG_VPCLMULQDQ); +} +#endif + #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT static uint8_t sse42_pclmulqdq_cpu_supported(void) @@ -153,6 +172,14 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg) { switch (alg) { #ifdef RTE_ARCH_X86_64 + case RTE_NET_CRC_AVX512: +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT + if (avx512_vpclmulqdq_cpu_supported()) { + 
handlers = handlers_avx512; + break; + } +#endif + /* fall-through */ case RTE_NET_CRC_SSE42: #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT if (sse42_pclmulqdq_cpu_supported()) { @@ -206,6 +233,12 @@ RTE_INIT(rte_net_crc_init) rte_net_crc_sse42_init(); } #endif +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT + if (avx512_vpclmulqdq_cpu_supported()) { + alg = RTE_NET_CRC_AVX512; + rte_net_crc_avx512_init(); + } +#endif #ifdef CC_ARM64_NEON_PMULL_SUPPORT if (neon_pmull_cpu_supported()) { alg = RTE_NET_CRC_NEON; diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h index 16e85ca97..72d3e10ff 100644 --- a/lib/librte_net/rte_net_crc.h +++ b/lib/librte_net/rte_net_crc.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2017 Intel Corporation + * Copyright(c) 2017-2020 Intel Corporation */ #ifndef _RTE_NET_CRC_H_ @@ -23,6 +23,7 @@ enum rte_net_crc_alg { RTE_NET_CRC_SCALAR = 0, RTE_NET_CRC_SSE42, RTE_NET_CRC_NEON, + RTE_NET_CRC_AVX512, }; /** @@ -35,6 +36,7 @@ enum rte_net_crc_alg { * - RTE_NET_CRC_SCALAR * - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic) * - RTE_NET_CRC_NEON (Use ARM Neon intrinsic) + * - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic) */ void rte_net_crc_set_alg(enum rte_net_crc_alg alg); -- 2.12.3 ^ permalink raw reply [flat|nested] 23+ messages in thread
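When checking a vectorised CRC path such as the one in this patch, a slow bit-at-a-time reference is a convenient cross-check. The sketch below is not part of the patch. It assumes the parameters visible in the handlers above (all-ones initial value, final inversion) and the reflected bit order, so the polynomials 0x04c11db7 and 0x1021 appear in their bit-reversed forms 0xedb88320 and 0x8408. The function names are invented for the example.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time reflected CRC. poly_reflected is the bit-reversed
 * generator polynomial; crc carries the running remainder.
 */
static uint32_t
crc_reflected(const uint8_t *data, size_t len, uint32_t crc,
	      uint32_t poly_reflected)
{
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? poly_reflected : 0);
	}
	return crc;
}

/* Reference CRC32-Ethernet: init 0xffffffff, result inverted. */
uint32_t
crc32_eth_ref(const uint8_t *data, size_t len)
{
	return ~crc_reflected(data, len, 0xffffffffUL, 0xedb88320UL);
}

/* Reference CRC16-CCITT (reflected form): init 0xffff, result inverted. */
uint32_t
crc16_ccitt_ref(const uint8_t *data, size_t len)
{
	return (uint16_t)~crc_reflected(data, len, 0xffff, 0x8408);
}
```

A test could feed the same buffers to these functions and to the selected handlers and compare the results, including lengths around the 16, 32 and 256 byte boundaries where the vector code changes branch.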
* Re: [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC 2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh 2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh 2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh @ 2020-10-07 9:26 ` David Marchand 2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh 3 siblings, 0 replies; 23+ messages in thread From: David Marchand @ 2020-10-07 9:26 UTC (permalink / raw) To: Olivier Matz, Bruce Richardson, Ananyev, Konstantin, Jerin Jacob Kollanukkaran, Ruifeng Wang (Arm Technology China) Cc: Singh, Jasvinder, Pablo de Lara, dev, Ryan, Brendan, David Coyle, Mairtin o Loingsigh On Tue, Oct 6, 2020 at 6:23 PM Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> wrote: > > This patchset makes two significant enhancements to the CRC modules of > the rte_net library: > > 1) Adds run-time selection of the optimal architecture-specific CRC path. > Previously the selection was solely made at compile-time, meaning it > could only be built and run on the same generation of CPU. Adding > run-time selection ability means this can be used from distro packages > and/or DPDK can be compiled on an older CPU and run on a newer CPU. > 2) Adds an optimized CRC implementation based on the AVX512 and > VPCLMULQDQ instruction sets. > > For further details, please see the commit messages of the individual > patches. Reviews please? Thanks. -- David Marchand ^ permalink raw reply [flat|nested] 23+ messages in thread
* [dpdk-dev] [PATCH v5 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC 2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh ` (2 preceding siblings ...) 2020-10-07 9:26 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " David Marchand @ 2020-10-09 13:50 ` Mairtin o Loingsigh 2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh ` (3 more replies) 3 siblings, 4 replies; 23+ messages in thread From: Mairtin o Loingsigh @ 2020-10-09 13:50 UTC (permalink / raw) To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch, konstantin.ananyev Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle This patchset makes two significant enhancements to the CRC modules of the rte_net library: 1) Adds run-time selection of the optimal architecture-specific CRC path. Previously the selection was solely made at compile-time, meaning it could only be built and run on the same generation of CPU. Adding run-time selection ability means this can be used from distro packages and/or DPDK can be compiled on an older CPU and run on a newer CPU. 2) Adds an optimized CRC implementation based on the AVX512 and VPCLMULQDQ instruction sets. For further details, please see the commit messages of the individual patches. v5: * Tidied-up the ifdef checks for RTE_ARCH_* and compiler support of CRC paths, as per review comments: * All ifdef checks removed from API function definitions and moved into helper functions. v4: * Fixed build issue when older version of meson is used (0.47.1). * Addressed review comments: * remove Intel copyright header from neon CRC file. * tidy-up of register initialisation. v3: * Re-submitted v2 as encountered problems when originally submitting it. v2: * Added support for run-time selection of optimal architecture-specific CRC, based on v1 review comment. 
* Added full working AVX512/VPCLMULQDQ support for CRC32-Ethernet and CRC16-CCITT. v1: * Initial version, with incomplete AVX512/VPCLMULQDQ support for CRC32-Ethernet only. Mairtin o Loingsigh (2): net: add run-time architecture specific CRC selection net: add support for AVX512/VPCLMULQDQ based CRC app/test/test_crc.c | 11 +- config/x86/meson.build | 6 +- doc/guides/rel_notes/release_20_11.rst | 6 + lib/librte_net/meson.build | 89 ++++- lib/librte_net/net_crc.h | 45 +++ lib/librte_net/net_crc_avx512.c | 423 ++++++++++++++++++++++ lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +- lib/librte_net/{net_crc_sse.h => net_crc_sse.c} | 34 +- lib/librte_net/rte_net_crc.c | 162 +++++++-- lib/librte_net/rte_net_crc.h | 4 +- 10 files changed, 722 insertions(+), 84 deletions(-) create mode 100644 lib/librte_net/net_crc.h create mode 100644 lib/librte_net/net_crc_avx512.c rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%) rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%) -- 2.12.3 ^ permalink raw reply [flat|nested] 23+ messages in thread
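The run-time selection described in this cover letter amounts to resolving a requested algorithm to the best one the CPU can actually run, falling back from AVX512 to SSE4.2 to scalar. A minimal sketch of that fallback follows. It is not the library code: the enum, struct cpu_caps and both function names are hypothetical, and the capability flags are passed in by the caller instead of being read from the CPU.

```c
/* Hypothetical algorithm identifiers, ordered from least to most capable. */
enum crc_alg {
	CRC_ALG_SCALAR = 0,
	CRC_ALG_SSE42,
	CRC_ALG_AVX512,
};

/* Hypothetical capability flags; a real build would probe the CPU at start-up. */
struct cpu_caps {
	int sse42_pclmulqdq;
	int avx512_vpclmulqdq;
};

/*
 * Resolve a requested algorithm to one the CPU supports. An unsupported
 * request falls through to the next best path and ends at scalar.
 */
enum crc_alg
crc_resolve_alg(enum crc_alg requested, const struct cpu_caps *caps)
{
	switch (requested) {
	case CRC_ALG_AVX512:
		if (caps->avx512_vpclmulqdq)
			return CRC_ALG_AVX512;
		/* fall-through */
	case CRC_ALG_SSE42:
		if (caps->sse42_pclmulqdq)
			return CRC_ALG_SSE42;
		/* fall-through */
	case CRC_ALG_SCALAR:
	default:
		return CRC_ALG_SCALAR;
	}
}

/* Initialisation-time choice: ask for the most capable path and let it degrade. */
enum crc_alg
crc_best_alg(const struct cpu_caps *caps)
{
	return crc_resolve_alg(CRC_ALG_AVX512, caps);
}
```

With this shape, a binary built on one machine can carry every path the compiler supports and still pick a safe one on the machine where it runs, which is the distro-package case the cover letter mentions.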
* [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection 2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh @ 2020-10-09 13:50 ` Mairtin o Loingsigh 2020-10-09 16:22 ` Singh, Jasvinder ` (2 more replies) 2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh ` (2 subsequent siblings) 3 siblings, 3 replies; 23+ messages in thread From: Mairtin o Loingsigh @ 2020-10-09 13:50 UTC (permalink / raw) To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch, konstantin.ananyev Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle This patch adds support for run-time selection of the optimal architecture-specific CRC path, based on the supported instruction set(s) of the CPU. The compiler option checks have been moved from the C files to the meson script. The rte_cpu_get_flag_enabled function is called automatically by the library at process initialization time to determine which instructions the CPU supports, with the most optimal supported CRC path ultimately selected. 
Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> Signed-off-by: David Coyle <david.coyle@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- doc/guides/rel_notes/release_20_11.rst | 4 + lib/librte_net/meson.build | 34 ++++++- lib/librte_net/net_crc.h | 34 +++++++ lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 ++--- lib/librte_net/{net_crc_sse.h => net_crc_sse.c} | 34 ++----- lib/librte_net/rte_net_crc.c | 116 +++++++++++++++------- 6 files changed, 168 insertions(+), 80 deletions(-) create mode 100644 lib/librte_net/net_crc.h rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%) rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%) diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst index 808bdc4e5..b77297f7e 100644 --- a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -55,6 +55,10 @@ New Features Also, make sure to start the actual text at the margin. ======================================================= +* **Updated CRC modules of rte_net library.** + + * Added run-time selection of the optimal architecture-specific CRC path. 
+
 * **Updated Broadcom bnxt driver.**
 
   Updated the Broadcom bnxt driver with new features and improvements, including:
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index 24ed8253b..fa439b9e5 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2017 Intel Corporation
+# Copyright(c) 2017-2020 Intel Corporation
 
 headers = files('rte_ip.h',
 	'rte_tcp.h',
@@ -20,3 +20,35 @@ headers = files('rte_ip.h',
 
 sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c')
 deps += ['mbuf']
+
+if dpdk_conf.has('RTE_ARCH_X86_64')
+	net_crc_sse42_cpu_support = (
+		cc.get_define('__PCLMUL__', args: machine_args) != '')
+	net_crc_sse42_cc_support = (
+		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
+
+	build_static_net_crc_sse42_lib = 0
+
+	if net_crc_sse42_cpu_support == true
+		sources += files('net_crc_sse.c')
+		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+	elif net_crc_sse42_cc_support == true
+		build_static_net_crc_sse42_lib = 1
+		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
+		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+	endif
+
+	if build_static_net_crc_sse42_lib == 1
+		net_crc_sse42_lib = static_library(
+				'net_crc_sse42_lib',
+				'net_crc_sse.c',
+				dependencies: static_rte_eal,
+				c_args: [cflags,
+					net_crc_sse42_lib_cflags])
+		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
+	endif
+elif (dpdk_conf.has('RTE_ARCH_ARM64') and
+		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
+	sources += files('net_crc_neon.c')
+	cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
+endif
diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
new file mode 100644
index 000000000..a1578a56c
--- /dev/null
+++ b/lib/librte_net/net_crc.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#ifndef _NET_CRC_H_
+#define _NET_CRC_H_
+
+/*
+ * Different implementations of CRC
+ */
+
+/* SSE4.2 */
+
+void
+rte_net_crc_sse42_init(void);
+
+uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
+
+/* NEON */
+
+void
+rte_net_crc_neon_init(void);
+
+uint32_t
+rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len);
+
+#endif /* _NET_CRC_H_ */
diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
similarity index 95%
rename from lib/librte_net/net_crc_neon.h
rename to lib/librte_net/net_crc_neon.c
index 63fa1d4a1..f61d75a8c 100644
--- a/lib/librte_net/net_crc_neon.h
+++ b/lib/librte_net/net_crc_neon.c
@@ -2,17 +2,15 @@
  * Copyright(c) 2017 Cavium, Inc
  */
 
-#ifndef _NET_CRC_NEON_H_
-#define _NET_CRC_NEON_H_
+#include <string.h>
 
+#include <rte_common.h>
 #include <rte_branch_prediction.h>
 #include <rte_net_crc.h>
 #include <rte_vect.h>
 #include <rte_cpuflags.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+#include "net_crc.h"
 
 /** PMULL CRC computation context structure */
 struct crc_pmull_ctx {
@@ -218,7 +216,7 @@ crc32_eth_calc_pmull(
 	return n;
 }
 
-static inline void
+void
 rte_net_crc_neon_init(void)
 {
 	/* Initialize CRC16 data */
@@ -242,9 +240,8 @@ rte_net_crc_neon_init(void)
 	crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8);
 }
 
-static inline uint32_t
-rte_crc16_ccitt_neon_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len)
 {
 	return (uint16_t)~crc32_eth_calc_pmull(data,
 		data_len,
@@ -252,18 +249,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data,
 		&crc16_ccitt_pmull);
 }
 
-static inline uint32_t
-rte_crc32_eth_neon_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len)
 {
 	return ~crc32_eth_calc_pmull(data,
 		data_len,
 		0xffffffffUL,
 		&crc32_eth_pmull);
 }
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _NET_CRC_NEON_H_ */
diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c
similarity index 94%
rename from lib/librte_net/net_crc_sse.h
rename to lib/librte_net/net_crc_sse.c
index 1c7b7a548..053b54b39 100644
--- a/lib/librte_net/net_crc_sse.h
+++ b/lib/librte_net/net_crc_sse.c
@@ -1,18 +1,16 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
-#ifndef _RTE_NET_CRC_SSE_H_
-#define _RTE_NET_CRC_SSE_H_
+#include <string.h>
 
+#include <rte_common.h>
 #include <rte_branch_prediction.h>
+#include <rte_cpuflags.h>
 
-#include <x86intrin.h>
-#include <cpuid.h>
+#include "net_crc.h"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+#include <x86intrin.h>
 
 /** PCLMULQDQ CRC computation context structure */
 struct crc_pclmulqdq_ctx {
@@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq(
 	return n;
 }
 
-
-static inline void
+void
 rte_net_crc_sse42_init(void)
 {
 	uint64_t k1, k2, k5, k6;
@@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void)
 	 * use other data types such as float, double, etc.
 	 */
 	_mm_empty();
-
 }
 
-static inline uint32_t
-rte_crc16_ccitt_sse42_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
 {
 	/** return 16-bit CRC value */
 	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
@@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data,
 		&crc16_ccitt_pclmulqdq);
 }
 
-static inline uint32_t
-rte_crc32_eth_sse42_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
 {
 	return ~crc32_eth_calc_pclmulqdq(data,
 		data_len,
 		0xffffffffUL,
 		&crc32_eth_pclmulqdq);
 }
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
index 4f5b9e828..d271d5205 100644
--- a/lib/librte_net/rte_net_crc.c
+++ b/lib/librte_net/rte_net_crc.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #include <stddef.h>
@@ -10,17 +10,7 @@
 #include <rte_common.h>
 #include <rte_net_crc.h>
 
-#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__)
-#define X86_64_SSE42_PCLMULQDQ 1
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO)
-#define ARM64_NEON_PMULL 1
-#endif
-
-#ifdef X86_64_SSE42_PCLMULQDQ
-#include <net_crc_sse.h>
-#elif defined ARM64_NEON_PMULL
-#include <net_crc_neon.h>
-#endif
+#include "net_crc.h"
 
 /** CRC polynomials */
 #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
@@ -41,25 +31,27 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
 typedef uint32_t
 (*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
 
-static rte_net_crc_handler *handlers;
+static const rte_net_crc_handler *handlers;
 
-static rte_net_crc_handler handlers_scalar[] = {
+static const rte_net_crc_handler handlers_scalar[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
 };
-
-#ifdef X86_64_SSE42_PCLMULQDQ
-static rte_net_crc_handler handlers_sse42[] = {
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+static const rte_net_crc_handler handlers_sse42[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
 };
-#elif defined ARM64_NEON_PMULL
-static rte_net_crc_handler handlers_neon[] = {
+#endif
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+static const rte_net_crc_handler handlers_neon[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
 };
 #endif
 
+/* Scalar handling */
+
 /**
  * Reflect the bits about the middle
  *
@@ -142,29 +134,82 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
 		crc32_eth_lut);
 }
 
+/* SSE4.2/PCLMULQDQ handling */
+
+#define SSE42_PCLMULQDQ_CPU_SUPPORTED \
+	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ)
+
+static const rte_net_crc_handler *
+sse42_pclmulqdq_get_handlers(void)
+{
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+	if (SSE42_PCLMULQDQ_CPU_SUPPORTED)
+		return handlers_sse42;
+#endif
+	return NULL;
+}
+
+static uint8_t
+sse42_pclmulqdq_init(void)
+{
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+	if (SSE42_PCLMULQDQ_CPU_SUPPORTED) {
+		rte_net_crc_sse42_init();
+		return 1;
+	}
+#endif
+	return 0;
+}
+
+/* NEON/PMULL handling */
+
+#define NEON_PMULL_CPU_SUPPORTED \
+	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)
+
+static const rte_net_crc_handler *
+neon_pmull_get_handlers(void)
+{
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+	if (NEON_PMULL_CPU_SUPPORTED)
+		return handlers_neon;
+#endif
+	return NULL;
+}
+
+static uint8_t
+neon_pmull_init(void)
+{
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+	if (NEON_PMULL_CPU_SUPPORTED) {
+		rte_net_crc_neon_init();
+		return 1;
+	}
+#endif
+	return 0;
+}
+
+/* Public API */
+
 void
 rte_net_crc_set_alg(enum rte_net_crc_alg alg)
 {
+	handlers = NULL;
+
 	switch (alg) {
-#ifdef X86_64_SSE42_PCLMULQDQ
 	case RTE_NET_CRC_SSE42:
-		handlers = handlers_sse42;
-		break;
-#elif defined ARM64_NEON_PMULL
-		/* fall-through */
+		handlers = sse42_pclmulqdq_get_handlers();
+		break; /* for x86, always break here */
 	case RTE_NET_CRC_NEON:
-		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
-			handlers = handlers_neon;
-			break;
-		}
-#endif
+		handlers = neon_pmull_get_handlers();
 		/* fall-through */
 	case RTE_NET_CRC_SCALAR:
 		/* fall-through */
 	default:
-		handlers = handlers_scalar;
 		break;
 	}
+
+	if (handlers == NULL)
+		handlers = handlers_scalar;
 }
 
 uint32_t
@@ -188,15 +233,10 @@ RTE_INIT(rte_net_crc_init)
 
 	rte_net_crc_scalar_init();
 
-#ifdef X86_64_SSE42_PCLMULQDQ
-	alg = RTE_NET_CRC_SSE42;
-	rte_net_crc_sse42_init();
-#elif defined ARM64_NEON_PMULL
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
+	if (sse42_pclmulqdq_init())
+		alg = RTE_NET_CRC_SSE42;
+	if (neon_pmull_init())
 		alg = RTE_NET_CRC_NEON;
-		rte_net_crc_neon_init();
-	}
-#endif
 
 	rte_net_crc_set_alg(alg);
 }
-- 
2.12.3
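For readers following the thread, the selection logic in rte_net_crc.c can be sketched outside DPDK roughly as below. This is a minimal illustration under stated assumptions, not the library code: every name in it (crc_set_alg, crc_calc, accel_cpu_supported, the crc_* enums) is hypothetical, the global flag stands in for a run-time CPU query such as rte_cpu_get_flag_enabled(), and the "accelerated" handler simply reuses a bitwise scalar CRC-32 so the block builds on any machine. What it does mirror from the patch is the shape of the dispatch: the getter returns NULL when the fast path is unusable, and the setter falls back to the scalar table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; they only mirror the shape of the patch. */
enum crc_type { CRC32_ETH = 0, CRC_TYPE_MAX };
enum crc_alg { CRC_ALG_SCALAR = 0, CRC_ALG_ACCEL };

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t data_len);

/* Assumption: stand-in for a run-time CPU flag query; 0 means "not supported". */
int accel_cpu_supported;

/* Bitwise CRC-32 using the reflected form (0xedb88320) of polynomial 0x04c11db7. */
static uint32_t
crc32_eth_scalar(const uint8_t *data, uint32_t data_len)
{
	uint32_t crc = 0xffffffffUL;
	uint32_t i;
	int bit;

	for (i = 0; i < data_len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xedb88320UL & (0U - (crc & 1)));
	}
	return ~crc;
}

/* Placeholder for an architecture-specific handler; it reuses the scalar code. */
static uint32_t
crc32_eth_accel(const uint8_t *data, uint32_t data_len)
{
	return crc32_eth_scalar(data, data_len);
}

static const crc_handler handlers_scalar[CRC_TYPE_MAX] = {
	[CRC32_ETH] = crc32_eth_scalar,
};

static const crc_handler handlers_accel[CRC_TYPE_MAX] = {
	[CRC32_ETH] = crc32_eth_accel,
};

static const crc_handler *handlers = handlers_scalar;

/* Return the accelerated table only if the CPU supports it, else NULL. */
static const crc_handler *
accel_get_handlers(void)
{
	if (accel_cpu_supported)
		return handlers_accel;
	return NULL;
}

/* Select a handler table; any unusable choice falls back to scalar. */
void
crc_set_alg(enum crc_alg alg)
{
	handlers = NULL;

	switch (alg) {
	case CRC_ALG_ACCEL:
		handlers = accel_get_handlers();
		break;
	case CRC_ALG_SCALAR:
	default:
		break;
	}

	if (handlers == NULL)
		handlers = handlers_scalar;
}

/* Dispatch through whichever table is currently selected. */
uint32_t
crc_calc(const uint8_t *data, uint32_t data_len, enum crc_type type)
{
	assert(type < CRC_TYPE_MAX);
	return handlers[type](data, data_len);
}
```

With accel_cpu_supported left at 0, calling crc_set_alg(CRC_ALG_ACCEL) keeps the scalar table in place, and crc_calc() over the nine ASCII bytes "123456789" returns 0xCBF43926, the standard CRC-32 check value. Resetting the table pointer to NULL first and applying the fallback in one place after the switch is what lets each case stay free of #ifdef blocks.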
* Re: [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection
  2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
@ 2020-10-09 16:22   ` Singh, Jasvinder
  2020-10-10  9:34   ` Ruifeng Wang
  2020-10-13  9:07   ` Bruce Richardson
  2 siblings, 0 replies; 23+ messages in thread
From: Singh, Jasvinder @ 2020-10-09 16:22 UTC
To: O'loingsigh, Mairtin, Richardson, Bruce, De Lara Guarch, Pablo, Ananyev, Konstantin
Cc: dev, Ryan, Brendan, Coyle, David

> -----Original Message-----
> From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> Sent: Friday, October 9, 2020 2:51 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; O'loingsigh,
> Mairtin <mairtin.oloingsigh@intel.com>; Coyle, David
> <david.coyle@intel.com>
> Subject: [PATCH v5 1/2] net: add run-time architecture specific CRC selection
>
> This patch adds support for run-time selection of the optimal
> architecture-specific CRC path, based on the supported instruction set(s)
> of the CPU.
>
> The compiler option checks have been moved from the C files to the meson
> script. The rte_cpu_get_flag_enabled function is called automatically by the
> library at process initialization time to determine which instructions the CPU
> supports, with the most optimal supported CRC path ultimately selected.
>
> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> Signed-off-by: David Coyle <david.coyle@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

Reviewed-by: Jasvinder Singh <jasvinder.singh@intel.com>
* Re: [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection
  2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
  2020-10-09 16:22   ` Singh, Jasvinder
@ 2020-10-10  9:34   ` Ruifeng Wang
  2020-10-13  9:07   ` Bruce Richardson
  2 siblings, 0 replies; 23+ messages in thread
From: Ruifeng Wang @ 2020-10-10 9:34 UTC
To: Mairtin o Loingsigh, jasvinder.singh, bruce.richardson, pablo.de.lara.guarch, konstantin.ananyev
Cc: dev, brendan.ryan, david.coyle, nd

> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Mairtin o Loingsigh
> Sent: Friday, October 9, 2020 9:51 PM
> To: jasvinder.singh@intel.com; bruce.richardson@intel.com;
> pablo.de.lara.guarch@intel.com; konstantin.ananyev@intel.com
> Cc: dev@dpdk.org; brendan.ryan@intel.com; mairtin.oloingsigh@intel.com;
> david.coyle@intel.com
> Subject: [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific
> CRC selection
>
> This patch adds support for run-time selection of the optimal
> architecture-specific CRC path, based on the supported instruction set(s)
> of the CPU.
>
> The compiler option checks have been moved from the C files to the meson
> script. The rte_cpu_get_flag_enabled function is called automatically by the
> library at process initialization time to determine which instructions the CPU
> supports, with the most optimal supported CRC path ultimately selected.
>
> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> Signed-off-by: David Coyle <david.coyle@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

The change looks good to me.
Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>
* Re: [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection
  2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
  2020-10-09 16:22   ` Singh, Jasvinder
  2020-10-10  9:34   ` Ruifeng Wang
@ 2020-10-13  9:07   ` Bruce Richardson
  2 siblings, 0 replies; 23+ messages in thread
From: Bruce Richardson @ 2020-10-13 9:07 UTC (permalink / raw)
To: Mairtin o Loingsigh
Cc: jasvinder.singh, pablo.de.lara.guarch, konstantin.ananyev, dev, brendan.ryan, david.coyle

On Fri, Oct 09, 2020 at 02:50:44PM +0100, Mairtin o Loingsigh wrote:
> This patch adds support for run-time selection of the optimal
> architecture-specific CRC path, based on the supported instruction set(s)
> of the CPU.
>
> The compiler option checks have been moved from the C files to the meson
> script. The rte_cpu_get_flag_enabled function is called automatically by
> the library at process initialization time to determine which
> instructions the CPU supports, with the most optimal supported CRC path
> ultimately selected.
> > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> > Signed-off-by: David Coyle <david.coyle@intel.com> > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> > --- > doc/guides/rel_notes/release_20_11.rst | 4 + > lib/librte_net/meson.build | 34 ++++++- > lib/librte_net/net_crc.h | 34 +++++++ > lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 ++--- > lib/librte_net/{net_crc_sse.h => net_crc_sse.c} | 34 ++----- > lib/librte_net/rte_net_crc.c | 116 +++++++++++++++------- > 6 files changed, 168 insertions(+), 80 deletions(-) > create mode 100644 lib/librte_net/net_crc.h > rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%) > rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%) > > diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst > index 808bdc4e5..b77297f7e 100644 > --- a/doc/guides/rel_notes/release_20_11.rst > +++ b/doc/guides/rel_notes/release_20_11.rst > @@ -55,6 +55,10 @@ New Features > Also, make sure to start the actual text at the margin. > ======================================================= > > +* **Updated CRC modules of rte_net library.** > + > + * Added run-time selection of the optimal architecture-specific CRC path. 
> + > * **Updated Broadcom bnxt driver.** > > Updated the Broadcom bnxt driver with new features and improvements, including: > diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build > index 24ed8253b..fa439b9e5 100644 > --- a/lib/librte_net/meson.build > +++ b/lib/librte_net/meson.build > @@ -1,5 +1,5 @@ > # SPDX-License-Identifier: BSD-3-Clause > -# Copyright(c) 2017 Intel Corporation > +# Copyright(c) 2017-2020 Intel Corporation > > headers = files('rte_ip.h', > 'rte_tcp.h', > @@ -20,3 +20,35 @@ headers = files('rte_ip.h', > > sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c') > deps += ['mbuf'] > + > +if dpdk_conf.has('RTE_ARCH_X86_64') > + net_crc_sse42_cpu_support = ( > + cc.get_define('__PCLMUL__', args: machine_args) != '') > + net_crc_sse42_cc_support = ( > + cc.has_argument('-mpclmul') and cc.has_argument('-maes')) > + > + build_static_net_crc_sse42_lib = 0 > + > + if net_crc_sse42_cpu_support == true > + sources += files('net_crc_sse.c') > + cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] > + elif net_crc_sse42_cc_support == true > + build_static_net_crc_sse42_lib = 1 > + net_crc_sse42_lib_cflags = ['-mpclmul', '-maes'] > + cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] > + endif > + > + if build_static_net_crc_sse42_lib == 1 > + net_crc_sse42_lib = static_library( > + 'net_crc_sse42_lib', > + 'net_crc_sse.c', > + dependencies: static_rte_eal, > + c_args: [cflags, > + net_crc_sse42_lib_cflags]) > + objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c') > + endif > +elif (dpdk_conf.has('RTE_ARCH_ARM64') and > + cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '') > + sources += files('net_crc_neon.c') > + cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT'] > +endif This meson code looks ok to me. Not sure you needed the variable for "net_crc_sse42_lib_cflags", but generally looks good. Acked-by: Bruce Richardson <bruce.richadson@intel.com> ^ permalink raw reply [flat|nested] 23+ messages in thread
* [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC 2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh 2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh @ 2020-10-09 13:50 ` Mairtin o Loingsigh 2020-10-09 16:24 ` Singh, Jasvinder 2020-10-09 18:35 ` [dpdk-dev] [PATCH v5 0/2] net: add CRC run-time checks and " De Lara Guarch, Pablo 2020-10-13 18:47 ` David Marchand 3 siblings, 1 reply; 23+ messages in thread From: Mairtin o Loingsigh @ 2020-10-09 13:50 UTC (permalink / raw) To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch, konstantin.ananyev Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle This patch enables the optimized calculation of CRC32-Ethernet and CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC implementation is built if the compiler supports the required instruction sets. It is selected at run-time if the host CPU, again, supports the required instruction sets. 
Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> Signed-off-by: David Coyle <david.coyle@intel.com> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> --- app/test/test_crc.c | 11 +- config/x86/meson.build | 6 +- doc/guides/rel_notes/release_20_11.rst | 2 + lib/librte_net/meson.build | 55 +++++ lib/librte_net/net_crc.h | 11 + lib/librte_net/net_crc_avx512.c | 423 +++++++++++++++++++++++++++++++++ lib/librte_net/rte_net_crc.c | 46 ++++ lib/librte_net/rte_net_crc.h | 4 +- 8 files changed, 554 insertions(+), 4 deletions(-) create mode 100644 lib/librte_net/net_crc_avx512.c diff --git a/app/test/test_crc.c b/app/test/test_crc.c index f8a74e04e..bf1d34435 100644 --- a/app/test/test_crc.c +++ b/app/test/test_crc.c @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2017 Intel Corporation + * Copyright(c) 2017-2020 Intel Corporation */ #include "test.h" @@ -149,6 +149,15 @@ test_crc(void) return ret; } + /* set CRC avx512 mode */ + rte_net_crc_set_alg(RTE_NET_CRC_AVX512); + + ret = test_crc_calc(); + if (ret < 0) { + printf("test crc (x86_64 AVX512): failed (%d)\n", ret); + return ret; + } + /* set CRC neon mode */ rte_net_crc_set_alg(RTE_NET_CRC_NEON); diff --git a/config/x86/meson.build b/config/x86/meson.build index fea4d5403..172b72b72 100644 --- a/config/x86/meson.build +++ b/config/x86/meson.build @@ -1,5 +1,5 @@ # SPDX-License-Identifier: BSD-3-Clause -# Copyright(c) 2017-2019 Intel Corporation +# Copyright(c) 2017-2020 Intel Corporation # get binutils version for the workaround of Bug 97 if not is_windows @@ -23,7 +23,9 @@ endforeach optional_flags = ['AES', 'PCLMUL', 'AVX', 'AVX2', 'AVX512F', - 'RDRND', 'RDSEED'] + 'RDRND', 'RDSEED', + 'AVX512BW', 'AVX512DQ', + 'AVX512VL', 'VPCLMULQDQ'] foreach f:optional_flags if cc.get_define('__@0@__'.format(f), args: machine_args) == '1' if f == 'PCLMUL' # special case flags with different defines diff --git a/doc/guides/rel_notes/release_20_11.rst 
b/doc/guides/rel_notes/release_20_11.rst index b77297f7e..5eda680d5 100644 --- a/doc/guides/rel_notes/release_20_11.rst +++ b/doc/guides/rel_notes/release_20_11.rst @@ -58,6 +58,8 @@ New Features * **Updated CRC modules of rte_net library.** * Added run-time selection of the optimal architecture-specific CRC path. + * Added optimized implementations of CRC32-Ethernet and CRC16-CCITT + using the AVX512 and VPCLMULQDQ instruction sets. * **Updated Broadcom bnxt driver.** diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build index fa439b9e5..6c96b361a 100644 --- a/lib/librte_net/meson.build +++ b/lib/librte_net/meson.build @@ -24,18 +24,62 @@ deps += ['mbuf'] if dpdk_conf.has('RTE_ARCH_X86_64') net_crc_sse42_cpu_support = ( cc.get_define('__PCLMUL__', args: machine_args) != '') + net_crc_avx512_cpu_support = ( + cc.get_define('__AVX512F__', args: machine_args) != '' and + cc.get_define('__AVX512BW__', args: machine_args) != '' and + cc.get_define('__AVX512DQ__', args: machine_args) != '' and + cc.get_define('__AVX512VL__', args: machine_args) != '' and + cc.get_define('__VPCLMULQDQ__', args: machine_args) != '') + net_crc_sse42_cc_support = ( cc.has_argument('-mpclmul') and cc.has_argument('-maes')) + net_crc_avx512_cc_support = ( + not machine_args.contains('-mno-avx512f') and + cc.has_argument('-mavx512f') and + cc.has_argument('-mavx512bw') and + cc.has_argument('-mavx512dq') and + cc.has_argument('-mavx512vl') and + cc.has_argument('-mvpclmulqdq') and + cc.has_argument('-mavx2') and + cc.has_argument('-mavx')) build_static_net_crc_sse42_lib = 0 + build_static_net_crc_avx512_lib = 0 if net_crc_sse42_cpu_support == true sources += files('net_crc_sse.c') cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] + if net_crc_avx512_cpu_support == true + sources += files('net_crc_avx512.c') + cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] + elif net_crc_avx512_cc_support == true + build_static_net_crc_avx512_lib = 1 + net_crc_avx512_lib_cflags = 
['-mavx512f', + '-mavx512bw', + '-mavx512dq', + '-mavx512vl', + '-mvpclmulqdq', + '-mavx2', + '-mavx'] + cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] + endif elif net_crc_sse42_cc_support == true build_static_net_crc_sse42_lib = 1 net_crc_sse42_lib_cflags = ['-mpclmul', '-maes'] cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT'] + if net_crc_avx512_cc_support == true + build_static_net_crc_avx512_lib = 1 + net_crc_avx512_lib_cflags = ['-mpclmul', + '-maes', + '-mavx512f', + '-mavx512bw', + '-mavx512dq', + '-mavx512vl', + '-mvpclmulqdq', + '-mavx2', + '-mavx'] + cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT'] + endif endif if build_static_net_crc_sse42_lib == 1 @@ -47,6 +91,17 @@ if dpdk_conf.has('RTE_ARCH_X86_64') net_crc_sse42_lib_cflags]) objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c') endif + + if build_static_net_crc_avx512_lib == 1 + net_crc_avx512_lib = static_library( + 'net_crc_avx512_lib', + 'net_crc_avx512.c', + dependencies: static_rte_eal, + c_args: [cflags, + net_crc_avx512_lib_cflags]) + objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c') + endif + elif (dpdk_conf.has('RTE_ARCH_ARM64') and cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '') sources += files('net_crc_neon.c') diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h index a1578a56c..7a74d5406 100644 --- a/lib/librte_net/net_crc.h +++ b/lib/librte_net/net_crc.h @@ -20,6 +20,17 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len); uint32_t rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len); +/* AVX512 */ + +void +rte_net_crc_avx512_init(void); + +uint32_t +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len); + +uint32_t +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len); + /* NEON */ void diff --git a/lib/librte_net/net_crc_avx512.c b/lib/librte_net/net_crc_avx512.c new file mode 100644 index 000000000..3740fe3c9 --- /dev/null +++ 
b/lib/librte_net/net_crc_avx512.c @@ -0,0 +1,423 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2020 Intel Corporation + */ + +#include <string.h> + +#include <rte_common.h> +#include <rte_branch_prediction.h> +#include <rte_cpuflags.h> + +#include "net_crc.h" + +#include <x86intrin.h> + +/* VPCLMULQDQ CRC computation context structure */ +struct crc_vpclmulqdq_ctx { + __m512i rk1_rk2; + __m512i rk3_rk4; + __m512i fold_7x128b; + __m512i fold_3x128b; + __m128i rk5_rk6; + __m128i rk7_rk8; + __m128i fold_1x128b; +}; + +static struct crc_vpclmulqdq_ctx crc32_eth __rte_aligned(64); +static struct crc_vpclmulqdq_ctx crc16_ccitt __rte_aligned(64); + +static uint16_t byte_len_to_mask_table[] = { + 0x0000, 0x0001, 0x0003, 0x0007, + 0x000f, 0x001f, 0x003f, 0x007f, + 0x00ff, 0x01ff, 0x03ff, 0x07ff, + 0x0fff, 0x1fff, 0x3fff, 0x7fff, + 0xffff}; + +static const uint8_t shf_table[32] __rte_aligned(16) = { + 0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, + 0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, + 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, + 0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f +}; + +static const uint32_t mask[4] __rte_aligned(16) = { + 0xffffffff, 0xffffffff, 0x00000000, 0x00000000 +}; + +static const uint32_t mask2[4] __rte_aligned(16) = { + 0x00000000, 0xffffffff, 0xffffffff, 0xffffffff +}; + +static __rte_always_inline __m512i +crcr32_folding_round(__m512i data_block, __m512i precomp, __m512i fold) +{ + __m512i tmp0, tmp1; + + tmp0 = _mm512_clmulepi64_epi128(fold, precomp, 0x01); + tmp1 = _mm512_clmulepi64_epi128(fold, precomp, 0x10); + + return _mm512_ternarylogic_epi64(tmp0, tmp1, data_block, 0x96); +} + +static __rte_always_inline __m128i +crc32_fold_128(__m512i fold0, __m512i fold1, + const struct crc_vpclmulqdq_ctx *params) +{ + __m128i res, res2; + __m256i a; + __m512i tmp0, tmp1, tmp2, tmp3; + __m512i tmp4; + + tmp0 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x01); + tmp1 = _mm512_clmulepi64_epi128(fold0, 
params->fold_7x128b, 0x10); + + res = _mm512_extracti64x2_epi64(fold1, 3); + tmp4 = _mm512_maskz_broadcast_i32x4(0xF, res); + + tmp2 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x01); + tmp3 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x10); + + tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp1, tmp2, 0x96); + tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp3, tmp4, 0x96); + + tmp1 = _mm512_shuffle_i64x2(tmp0, tmp0, 0x4e); + + a = _mm256_xor_si256(*(__m256i *)&tmp1, *(__m256i *)&tmp0); + res = _mm256_extracti64x2_epi64(a, 1); + res2 = _mm_xor_si128(res, *(__m128i *)&a); + + return res2; +} + +static __rte_always_inline __m128i +last_two_xmm(const uint8_t *data, uint32_t data_len, uint32_t n, __m128i res, + const struct crc_vpclmulqdq_ctx *params) +{ + uint32_t offset; + __m128i res2, res3, res4, pshufb_shf; + + const uint32_t mask3[4] __rte_aligned(16) = { + 0x80808080, 0x80808080, 0x80808080, 0x80808080 + }; + + res2 = res; + offset = data_len - n; + res3 = _mm_loadu_si128((const __m128i *)&data[n+offset-16]); + + pshufb_shf = _mm_loadu_si128((const __m128i *) + (shf_table + (data_len-n))); + + res = _mm_shuffle_epi8(res, pshufb_shf); + pshufb_shf = _mm_xor_si128(pshufb_shf, + _mm_load_si128((const __m128i *) mask3)); + res2 = _mm_shuffle_epi8(res2, pshufb_shf); + + res2 = _mm_blendv_epi8(res2, res3, pshufb_shf); + + res4 = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x01); + res = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x10); + res = _mm_ternarylogic_epi64(res, res2, res4, 0x96); + + return res; +} + +static __rte_always_inline __m128i +done_128(__m128i res, const struct crc_vpclmulqdq_ctx *params) +{ + __m128i res1; + + res1 = res; + + res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x0); + res1 = _mm_srli_si128(res1, 8); + res = _mm_xor_si128(res, res1); + + res1 = res; + res = _mm_slli_si128(res, 4); + res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x10); + res = _mm_xor_si128(res, res1); + + return res; +} + +static __rte_always_inline 
uint32_t +barrett_reduction(__m128i data64, const struct crc_vpclmulqdq_ctx *params) +{ + __m128i tmp0, tmp1; + + data64 = _mm_and_si128(data64, *(const __m128i *)mask2); + tmp0 = data64; + tmp1 = data64; + + data64 = _mm_clmulepi64_si128(tmp0, params->rk7_rk8, 0x0); + data64 = _mm_ternarylogic_epi64(data64, tmp1, *(const __m128i *)mask, + 0x28); + + tmp1 = data64; + data64 = _mm_clmulepi64_si128(data64, params->rk7_rk8, 0x10); + data64 = _mm_ternarylogic_epi64(data64, tmp1, tmp0, 0x96); + + return _mm_extract_epi32(data64, 2); +} + +static __rte_always_inline void +reduction_loop(__m128i *fold, int *len, const uint8_t *data, uint32_t *n, + const struct crc_vpclmulqdq_ctx *params) +{ + __m128i tmp, tmp1; + + tmp = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x1); + *fold = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x10); + *fold = _mm_xor_si128(*fold, tmp); + tmp1 = _mm_loadu_si128((const __m128i *)&data[*n]); + *fold = _mm_xor_si128(*fold, tmp1); + *n += 16; + *len -= 16; +} + +static __rte_always_inline uint32_t +crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc, + const struct crc_vpclmulqdq_ctx *params) +{ + __m128i res, d, b; + __m512i temp, k; + __m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3; + __m512i fold0, fold1, fold2, fold3; + __mmask16 mask; + uint32_t n = 0; + int reduction = 0; + + /* Get CRC init value */ + b = _mm_cvtsi32_si128(crc); + temp = _mm512_castsi128_si512(b); + + if (data_len > 255) { + fold0 = _mm512_loadu_si512((const __m512i *)data); + fold1 = _mm512_loadu_si512((const __m512i *)(data+64)); + fold2 = _mm512_loadu_si512((const __m512i *)(data+128)); + fold3 = _mm512_loadu_si512((const __m512i *)(data+192)); + fold0 = _mm512_xor_si512(fold0, temp); + + /* Main folding loop */ + k = params->rk1_rk2; + for (n = 256; (n + 256) <= data_len; n += 256) { + qw0 = _mm512_loadu_si512((const __m512i *)&data[n]); + qw1 = _mm512_loadu_si512((const __m512i *) + &(data[n+64])); + qw2 = 
_mm512_loadu_si512((const __m512i *) + &(data[n+128])); + qw3 = _mm512_loadu_si512((const __m512i *) + &(data[n+192])); + fold0 = crcr32_folding_round(qw0, k, fold0); + fold1 = crcr32_folding_round(qw1, k, fold1); + fold2 = crcr32_folding_round(qw2, k, fold2); + fold3 = crcr32_folding_round(qw3, k, fold3); + } + + /* 256 to 128 fold */ + k = params->rk3_rk4; + fold0 = crcr32_folding_round(fold2, k, fold0); + fold1 = crcr32_folding_round(fold3, k, fold1); + + res = crc32_fold_128(fold0, fold1, params); + + reduction = 240 - ((n+256)-data_len); + + while (reduction > 0) + reduction_loop(&res, &reduction, data, &n, + params); + + reduction += 16; + + if (n != data_len) + res = last_two_xmm(data, data_len, n, res, + params); + } else { + if (data_len > 31) { + res = _mm_cvtsi32_si128(crc); + d = _mm_loadu_si128((const __m128i *)data); + res = _mm_xor_si128(res, d); + n += 16; + + reduction = 240 - ((n+256)-data_len); + + while (reduction > 0) + reduction_loop(&res, &reduction, data, &n, + params); + + if (n != data_len) + res = last_two_xmm(data, data_len, n, res, + params); + } else if (data_len > 16) { + res = _mm_cvtsi32_si128(crc); + d = _mm_loadu_si128((const __m128i *)data); + res = _mm_xor_si128(res, d); + n += 16; + + if (n != data_len) + res = last_two_xmm(data, data_len, n, res, + params); + } else if (data_len == 16) { + res = _mm_cvtsi32_si128(crc); + d = _mm_loadu_si128((const __m128i *)data); + res = _mm_xor_si128(res, d); + } else { + res = _mm_cvtsi32_si128(crc); + mask = byte_len_to_mask_table[data_len]; + d = _mm_maskz_loadu_epi8(mask, data); + res = _mm_xor_si128(res, d); + + if (data_len > 3) { + d = _mm_loadu_si128((const __m128i *) + &shf_table[data_len]); + res = _mm_shuffle_epi8(res, d); + } else if (data_len > 2) { + res = _mm_slli_si128(res, 5); + goto do_barrett_reduction; + } else if (data_len > 1) { + res = _mm_slli_si128(res, 6); + goto do_barrett_reduction; + } else if (data_len > 0) { + res = _mm_slli_si128(res, 7); + goto 
do_barrett_reduction; + } else { + /* zero length case */ + return crc; + } + } + } + + res = done_128(res, params); + +do_barrett_reduction: + n = barrett_reduction(res, params); + + return n; +} + +static void +crc32_load_init_constants(void) +{ + __m128i a; + /* fold constants */ + uint64_t c0 = 0x00000000e95c1271; + uint64_t c1 = 0x00000000ce3371cb; + uint64_t c2 = 0x00000000910eeec1; + uint64_t c3 = 0x0000000033fff533; + uint64_t c4 = 0x000000000cbec0ed; + uint64_t c5 = 0x0000000031f8303f; + uint64_t c6 = 0x0000000057c54819; + uint64_t c7 = 0x00000000df068dc2; + uint64_t c8 = 0x00000000ae0b5394; + uint64_t c9 = 0x000000001c279815; + uint64_t c10 = 0x000000001d9513d7; + uint64_t c11 = 0x000000008f352d95; + uint64_t c12 = 0x00000000af449247; + uint64_t c13 = 0x000000003db1ecdc; + uint64_t c14 = 0x0000000081256527; + uint64_t c15 = 0x00000000f1da05aa; + uint64_t c16 = 0x00000000ccaa009e; + uint64_t c17 = 0x00000000ae689191; + uint64_t c18 = 0x00000000ccaa009e; + uint64_t c19 = 0x00000000b8bc6765; + uint64_t c20 = 0x00000001f7011640; + uint64_t c21 = 0x00000001db710640; + + a = _mm_set_epi64x(c1, c0); + crc32_eth.rk1_rk2 = _mm512_broadcast_i32x4(a); + + a = _mm_set_epi64x(c3, c2); + crc32_eth.rk3_rk4 = _mm512_broadcast_i32x4(a); + + crc32_eth.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8, + c9, c10, c11); + crc32_eth.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15, + c16, c17, 0, 0); + crc32_eth.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16), + _mm_cvtsi64_m64(c17)); + + crc32_eth.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18), + _mm_cvtsi64_m64(c19)); + crc32_eth.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20), + _mm_cvtsi64_m64(c21)); +} + +static void +crc16_load_init_constants(void) +{ + __m128i a; + /* fold constants */ + uint64_t c0 = 0x0000000000009a19; + uint64_t c1 = 0x0000000000002df8; + uint64_t c2 = 0x00000000000068af; + uint64_t c3 = 0x000000000000b6c9; + uint64_t c4 = 0x000000000000c64f; + uint64_t c5 = 0x000000000000cd95; + uint64_t c6 = 
0x000000000000d341; + uint64_t c7 = 0x000000000000b8f2; + uint64_t c8 = 0x0000000000000842; + uint64_t c9 = 0x000000000000b072; + uint64_t c10 = 0x00000000000047e3; + uint64_t c11 = 0x000000000000922d; + uint64_t c12 = 0x0000000000000e3a; + uint64_t c13 = 0x0000000000004d7a; + uint64_t c14 = 0x0000000000005b44; + uint64_t c15 = 0x0000000000007762; + uint64_t c16 = 0x00000000000081bf; + uint64_t c17 = 0x0000000000008e10; + uint64_t c18 = 0x00000000000081bf; + uint64_t c19 = 0x0000000000001cbb; + uint64_t c20 = 0x000000011c581910; + uint64_t c21 = 0x0000000000010810; + + a = _mm_set_epi64x(c1, c0); + crc16_ccitt.rk1_rk2 = _mm512_broadcast_i32x4(a); + + a = _mm_set_epi64x(c3, c2); + crc16_ccitt.rk3_rk4 = _mm512_broadcast_i32x4(a); + + crc16_ccitt.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8, + c9, c10, c11); + crc16_ccitt.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15, + c16, c17, 0, 0); + crc16_ccitt.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16), + _mm_cvtsi64_m64(c17)); + + crc16_ccitt.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18), + _mm_cvtsi64_m64(c19)); + crc16_ccitt.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20), + _mm_cvtsi64_m64(c21)); +} + +void +rte_net_crc_avx512_init(void) +{ + crc32_load_init_constants(); + crc16_load_init_constants(); + + /* + * Reset the register as following calculation may + * use other data types such as float, double, etc. 
+ */ + _mm_empty(); +} + +uint32_t +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len) +{ + /* return 16-bit CRC value */ + return (uint16_t)~crc32_eth_calc_vpclmulqdq(data, + data_len, + 0xffff, + &crc16_ccitt); +} + +uint32_t +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len) +{ + /* return 32-bit CRC value */ + return ~crc32_eth_calc_vpclmulqdq(data, + data_len, + 0xffffffffUL, + &crc32_eth); +} diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c index d271d5205..32a366590 100644 --- a/lib/librte_net/rte_net_crc.c +++ b/lib/librte_net/rte_net_crc.c @@ -37,6 +37,12 @@ static const rte_net_crc_handler handlers_scalar[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler, [RTE_NET_CRC32_ETH] = rte_crc32_eth_handler, }; +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT +static const rte_net_crc_handler handlers_avx512[] = { + [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler, + [RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler, +}; +#endif #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT static const rte_net_crc_handler handlers_sse42[] = { [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler, @@ -134,6 +140,39 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len) crc32_eth_lut); } +/* AVX512/VPCLMULQDQ handling */ + +#define AVX512_VPCLMULQDQ_CPU_SUPPORTED ( \ + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) && \ + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) && \ + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) && \ + rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) && \ + rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ) && \ + rte_cpu_get_flag_enabled(RTE_CPUFLAG_VPCLMULQDQ) \ +) + +static const rte_net_crc_handler * +avx512_vpclmulqdq_get_handlers(void) +{ +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT + if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) + return handlers_avx512; +#endif + return NULL; +} + +static uint8_t +avx512_vpclmulqdq_init(void) +{ +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT + if 
(AVX512_VPCLMULQDQ_CPU_SUPPORTED) { + rte_net_crc_avx512_init(); + return 1; + } +#endif + return 0; +} + /* SSE4.2/PCLMULQDQ handling */ #define SSE42_PCLMULQDQ_CPU_SUPPORTED \ @@ -196,6 +235,11 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg) handlers = NULL; switch (alg) { + case RTE_NET_CRC_AVX512: + handlers = avx512_vpclmulqdq_get_handlers(); + if (handlers != NULL) + break; + /* fall-through */ case RTE_NET_CRC_SSE42: handlers = sse42_pclmulqdq_get_handlers(); break; /* for x86, always break here */ @@ -235,6 +279,8 @@ RTE_INIT(rte_net_crc_init) if (sse42_pclmulqdq_init()) alg = RTE_NET_CRC_SSE42; + if (avx512_vpclmulqdq_init()) + alg = RTE_NET_CRC_AVX512; if (neon_pmull_init()) alg = RTE_NET_CRC_NEON; diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h index 16e85ca97..72d3e10ff 100644 --- a/lib/librte_net/rte_net_crc.h +++ b/lib/librte_net/rte_net_crc.h @@ -1,5 +1,5 @@ /* SPDX-License-Identifier: BSD-3-Clause - * Copyright(c) 2017 Intel Corporation + * Copyright(c) 2017-2020 Intel Corporation */ #ifndef _RTE_NET_CRC_H_ @@ -23,6 +23,7 @@ enum rte_net_crc_alg { RTE_NET_CRC_SCALAR = 0, RTE_NET_CRC_SSE42, RTE_NET_CRC_NEON, + RTE_NET_CRC_AVX512, }; /** @@ -35,6 +36,7 @@ enum rte_net_crc_alg { * - RTE_NET_CRC_SCALAR * - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic) * - RTE_NET_CRC_NEON (Use ARM Neon intrinsic) + * - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic) */ void rte_net_crc_set_alg(enum rte_net_crc_alg alg); -- 2.12.3 ^ permalink raw reply [flat|nested] 23+ messages in thread
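The 0x96 and 0x28 immediates passed to the ternary-logic intrinsics in net_crc_avx512.c can be read as 8-entry truth tables. The scalar model below is only a sketch for checking that reading: ternlog64 and ternlog_selfcheck are hypothetical helpers, not intrinsics and not part of the patch. It assumes the usual imm8 indexing convention, in which the bit from the first operand is the most significant bit of the 3-bit index; under that assumption 0x96 is a three-way XOR and 0x28 is (a XOR b) AND c.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Scalar model of a three-input bitwise ternary-logic operation: for each
 * bit position, the bits of a, b and c form a 3-bit index (a is the most
 * significant bit) that selects one bit of imm8.
 */
static uint64_t
ternlog64(uint64_t a, uint64_t b, uint64_t c, uint8_t imm8)
{
	uint64_t out = 0;
	int i;

	for (i = 0; i < 64; i++) {
		unsigned int idx = (unsigned int)(((a >> i) & 1u) << 2 |
				((b >> i) & 1u) << 1 |
				((c >> i) & 1u));
		out |= (uint64_t)((imm8 >> idx) & 1u) << i;
	}
	return out;
}

/* Usage example: check the two immediates used by the CRC code. */
static void
ternlog_selfcheck(void)
{
	const uint64_t a = 0x0123456789abcdefULL;
	const uint64_t b = 0xfedcba9876543210ULL;
	const uint64_t c = 0x0f0f0f0ff0f0f0f0ULL;

	assert(ternlog64(a, b, c, 0x96) == (a ^ b ^ c));
	assert(ternlog64(a, b, c, 0x28) == ((a ^ b) & c));
}
```

Read this way, each folding round combines the two carry-less products and the next data block in a single three-input XOR, where the SSE path would need two separate XOR steps.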
* Re: [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
  2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
@ 2020-10-09 16:24   ` Singh, Jasvinder
  0 siblings, 0 replies; 23+ messages in thread
From: Singh, Jasvinder @ 2020-10-09 16:24 UTC (permalink / raw)
To: O'loingsigh, Mairtin, Richardson, Bruce, De Lara Guarch, Pablo, Ananyev, Konstantin
Cc: dev, Ryan, Brendan, Coyle, David

> -----Original Message-----
> From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> Sent: Friday, October 9, 2020 2:51 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; O'loingsigh,
> Mairtin <mairtin.oloingsigh@intel.com>; Coyle, David
> <david.coyle@intel.com>
> Subject: [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based
> CRC
>
> This patch enables the optimized calculation of CRC32-Ethernet and CRC16-
> CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
> implementation is built if the compiler supports the required instruction sets.
> It is selected at run-time if the host CPU, again, supports the required
> instruction sets.
> > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com> > Signed-off-by: David Coyle <david.coyle@intel.com> > Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com> > --- > app/test/test_crc.c | 11 +- > config/x86/meson.build | 6 +- > doc/guides/rel_notes/release_20_11.rst | 2 + > lib/librte_net/meson.build | 55 +++++ > lib/librte_net/net_crc.h | 11 + > lib/librte_net/net_crc_avx512.c | 423 > +++++++++++++++++++++++++++++++++ > lib/librte_net/rte_net_crc.c | 46 ++++ > lib/librte_net/rte_net_crc.h | 4 +- > 8 files changed, 554 insertions(+), 4 deletions(-) create mode 100644 > lib/librte_net/net_crc_avx512.c > > diff --git a/app/test/test_crc.c b/app/test/test_crc.c index > f8a74e04e..bf1d34435 100644 > --- a/app/test/test_crc.c > +++ b/app/test/test_crc.c > @@ -1,5 +1,5 @@ > /* SPDX-License-Identifier: BSD-3-Clause > - * Copyright(c) 2017 Intel Corporation > + * Copyright(c) 2017-2020 Intel Corporation > */ > > #include "test.h" > @@ -149,6 +149,15 @@ test_crc(void) > return ret; > } > > + /* set CRC avx512 mode */ > + rte_net_crc_set_alg(RTE_NET_CRC_AVX512); > + > + ret = test_crc_calc(); > + if (ret < 0) { > + printf("test crc (x86_64 AVX512): failed (%d)\n", ret); > + return ret; > + } > + > /* set CRC neon mode */ > rte_net_crc_set_alg(RTE_NET_CRC_NEON); > > diff --git a/config/x86/meson.build b/config/x86/meson.build index > fea4d5403..172b72b72 100644 > --- a/config/x86/meson.build > +++ b/config/x86/meson.build > @@ -1,5 +1,5 @@ > # SPDX-License-Identifier: BSD-3-Clause -# Copyright(c) 2017-2019 Intel > Corporation > +# Copyright(c) 2017-2020 Intel Corporation > > # get binutils version for the workaround of Bug 97 if not is_windows @@ - > 23,7 +23,9 @@ endforeach > > optional_flags = ['AES', 'PCLMUL', > 'AVX', 'AVX2', 'AVX512F', > - 'RDRND', 'RDSEED'] > + 'RDRND', 'RDSEED', > + 'AVX512BW', 'AVX512DQ', > + 'AVX512VL', 'VPCLMULQDQ'] > foreach f:optional_flags > if cc.get_define('__@0@__'.format(f), args: machine_args) == '1' > if f == 
'PCLMUL' # special case flags with different defines
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index b77297f7e..5eda680d5 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -58,6 +58,8 @@ New Features
>  * **Updated CRC modules of rte_net library.**
>
>    * Added run-time selection of the optimal architecture-specific CRC path.
> +  * Added optimized implementations of CRC32-Ethernet and CRC16-CCITT
> +    using the AVX512 and VPCLMULQDQ instruction sets.
>
>  * **Updated Broadcom bnxt driver.**
>
> diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
> index fa439b9e5..6c96b361a 100644
> --- a/lib/librte_net/meson.build
> +++ b/lib/librte_net/meson.build
> @@ -24,18 +24,62 @@ deps += ['mbuf']
>  if dpdk_conf.has('RTE_ARCH_X86_64')
>  	net_crc_sse42_cpu_support = (
>  		cc.get_define('__PCLMUL__', args: machine_args) != '')
> +	net_crc_avx512_cpu_support = (
> +		cc.get_define('__AVX512F__', args: machine_args) != '' and
> +		cc.get_define('__AVX512BW__', args: machine_args) != '' and
> +		cc.get_define('__AVX512DQ__', args: machine_args) != '' and
> +		cc.get_define('__AVX512VL__', args: machine_args) != '' and
> +		cc.get_define('__VPCLMULQDQ__', args: machine_args) != '')
> +
>  	net_crc_sse42_cc_support = (
>  		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
> +	net_crc_avx512_cc_support = (
> +		not machine_args.contains('-mno-avx512f') and
> +		cc.has_argument('-mavx512f') and
> +		cc.has_argument('-mavx512bw') and
> +		cc.has_argument('-mavx512dq') and
> +		cc.has_argument('-mavx512vl') and
> +		cc.has_argument('-mvpclmulqdq') and
> +		cc.has_argument('-mavx2') and
> +		cc.has_argument('-mavx'))
>
>  	build_static_net_crc_sse42_lib = 0
> +	build_static_net_crc_avx512_lib = 0
>
>  	if net_crc_sse42_cpu_support == true
>  		sources += files('net_crc_sse.c')
>  		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +		if net_crc_avx512_cpu_support == true
> +			sources += files('net_crc_avx512.c')
> +			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
> +		elif net_crc_avx512_cc_support == true
> +			build_static_net_crc_avx512_lib = 1
> +			net_crc_avx512_lib_cflags = ['-mavx512f',
> +							'-mavx512bw',
> +							'-mavx512dq',
> +							'-mavx512vl',
> +							'-mvpclmulqdq',
> +							'-mavx2',
> +							'-mavx']
> +			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
> +		endif
>  	elif net_crc_sse42_cc_support == true
>  		build_static_net_crc_sse42_lib = 1
>  		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
>  		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +		if net_crc_avx512_cc_support == true
> +			build_static_net_crc_avx512_lib = 1
> +			net_crc_avx512_lib_cflags = ['-mpclmul',
> +							'-maes',
> +							'-mavx512f',
> +							'-mavx512bw',
> +							'-mavx512dq',
> +							'-mavx512vl',
> +							'-mvpclmulqdq',
> +							'-mavx2',
> +							'-mavx']
> +			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
> +		endif
>  	endif
>
>  	if build_static_net_crc_sse42_lib == 1
> @@ -47,6 +91,17 @@ if dpdk_conf.has('RTE_ARCH_X86_64')
>  					net_crc_sse42_lib_cflags])
>  		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
>  	endif
> +
> +	if build_static_net_crc_avx512_lib == 1
> +		net_crc_avx512_lib = static_library(
> +					'net_crc_avx512_lib',
> +					'net_crc_avx512.c',
> +					dependencies: static_rte_eal,
> +					c_args: [cflags,
> +						net_crc_avx512_lib_cflags])
> +		objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c')
> +	endif
> +
>  elif (dpdk_conf.has('RTE_ARCH_ARM64') and
>  		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
>  	sources += files('net_crc_neon.c')
> diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
> index a1578a56c..7a74d5406 100644
> --- a/lib/librte_net/net_crc.h
> +++ b/lib/librte_net/net_crc.h
> @@ -20,6 +20,17 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
>  uint32_t
>  rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
>
> +/* AVX512 */
> +
> +void
> +rte_net_crc_avx512_init(void);
> +
> +uint32_t
> +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len);
> +
> +uint32_t
> +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len);
> +
>  /* NEON */
>
>  void
> diff --git a/lib/librte_net/net_crc_avx512.c b/lib/librte_net/net_crc_avx512.c
> new file mode 100644
> index 000000000..3740fe3c9
> --- /dev/null
> +++ b/lib/librte_net/net_crc_avx512.c
> @@ -0,0 +1,423 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include <string.h>
> +
> +#include <rte_common.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_cpuflags.h>
> +
> +#include "net_crc.h"
> +
> +#include <x86intrin.h>
> +
> +/* VPCLMULQDQ CRC computation context structure */
> +struct crc_vpclmulqdq_ctx {
> +	__m512i rk1_rk2;
> +	__m512i rk3_rk4;
> +	__m512i fold_7x128b;
> +	__m512i fold_3x128b;
> +	__m128i rk5_rk6;
> +	__m128i rk7_rk8;
> +	__m128i fold_1x128b;
> +};
> +
> +static struct crc_vpclmulqdq_ctx crc32_eth __rte_aligned(64);
> +static struct crc_vpclmulqdq_ctx crc16_ccitt __rte_aligned(64);
> +
> +static uint16_t byte_len_to_mask_table[] = {
> +	0x0000, 0x0001, 0x0003, 0x0007,
> +	0x000f, 0x001f, 0x003f, 0x007f,
> +	0x00ff, 0x01ff, 0x03ff, 0x07ff,
> +	0x0fff, 0x1fff, 0x3fff, 0x7fff,
> +	0xffff};
> +
> +static const uint8_t shf_table[32] __rte_aligned(16) = {
> +	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> +	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
> +	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> +	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
> +};
> +
> +static const uint32_t mask[4] __rte_aligned(16) = {
> +	0xffffffff, 0xffffffff, 0x00000000, 0x00000000
> +};
> +
> +static const uint32_t mask2[4] __rte_aligned(16) = {
> +	0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
> +};
> +
> +static __rte_always_inline __m512i
> +crcr32_folding_round(__m512i data_block, __m512i precomp, __m512i fold)
> +{
> +	__m512i tmp0, tmp1;
> +
> +	tmp0 = _mm512_clmulepi64_epi128(fold, precomp, 0x01);
> +	tmp1 = _mm512_clmulepi64_epi128(fold, precomp, 0x10);
> +
> +	return _mm512_ternarylogic_epi64(tmp0, tmp1, data_block, 0x96);
> +}
> +
> +static __rte_always_inline __m128i
> +crc32_fold_128(__m512i fold0, __m512i fold1,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i res, res2;
> +	__m256i a;
> +	__m512i tmp0, tmp1, tmp2, tmp3;
> +	__m512i tmp4;
> +
> +	tmp0 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x01);
> +	tmp1 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x10);
> +
> +	res = _mm512_extracti64x2_epi64(fold1, 3);
> +	tmp4 = _mm512_maskz_broadcast_i32x4(0xF, res);
> +
> +	tmp2 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x01);
> +	tmp3 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x10);
> +
> +	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp1, tmp2, 0x96);
> +	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp3, tmp4, 0x96);
> +
> +	tmp1 = _mm512_shuffle_i64x2(tmp0, tmp0, 0x4e);
> +
> +	a = _mm256_xor_si256(*(__m256i *)&tmp1, *(__m256i *)&tmp0);
> +	res = _mm256_extracti64x2_epi64(a, 1);
> +	res2 = _mm_xor_si128(res, *(__m128i *)&a);
> +
> +	return res2;
> +}
> +
> +static __rte_always_inline __m128i
> +last_two_xmm(const uint8_t *data, uint32_t data_len, uint32_t n, __m128i res,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	uint32_t offset;
> +	__m128i res2, res3, res4, pshufb_shf;
> +
> +	const uint32_t mask3[4] __rte_aligned(16) = {
> +		0x80808080, 0x80808080, 0x80808080, 0x80808080
> +	};
> +
> +	res2 = res;
> +	offset = data_len - n;
> +	res3 = _mm_loadu_si128((const __m128i *)&data[n+offset-16]);
> +
> +	pshufb_shf = _mm_loadu_si128((const __m128i *)
> +			(shf_table + (data_len-n)));
> +
> +	res = _mm_shuffle_epi8(res, pshufb_shf);
> +	pshufb_shf = _mm_xor_si128(pshufb_shf,
> +			_mm_load_si128((const __m128i *) mask3));
> +	res2 = _mm_shuffle_epi8(res2, pshufb_shf);
> +
> +	res2 = _mm_blendv_epi8(res2, res3, pshufb_shf);
> +
> +	res4 = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x01);
> +	res = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x10);
> +	res = _mm_ternarylogic_epi64(res, res2, res4, 0x96);
> +
> +	return res;
> +}
> +
> +static __rte_always_inline __m128i
> +done_128(__m128i res, const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i res1;
> +
> +	res1 = res;
> +
> +	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x0);
> +	res1 = _mm_srli_si128(res1, 8);
> +	res = _mm_xor_si128(res, res1);
> +
> +	res1 = res;
> +	res = _mm_slli_si128(res, 4);
> +	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x10);
> +	res = _mm_xor_si128(res, res1);
> +
> +	return res;
> +}
> +
> +static __rte_always_inline uint32_t
> +barrett_reduction(__m128i data64, const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i tmp0, tmp1;
> +
> +	data64 = _mm_and_si128(data64, *(const __m128i *)mask2);
> +	tmp0 = data64;
> +	tmp1 = data64;
> +
> +	data64 = _mm_clmulepi64_si128(tmp0, params->rk7_rk8, 0x0);
> +	data64 = _mm_ternarylogic_epi64(data64, tmp1, *(const __m128i *)mask,
> +			0x28);
> +
> +	tmp1 = data64;
> +	data64 = _mm_clmulepi64_si128(data64, params->rk7_rk8, 0x10);
> +	data64 = _mm_ternarylogic_epi64(data64, tmp1, tmp0, 0x96);
> +
> +	return _mm_extract_epi32(data64, 2);
> +}
> +
> +static __rte_always_inline void
> +reduction_loop(__m128i *fold, int *len, const uint8_t *data, uint32_t *n,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i tmp, tmp1;
> +
> +	tmp = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x1);
> +	*fold = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x10);
> +	*fold = _mm_xor_si128(*fold, tmp);
> +	tmp1 = _mm_loadu_si128((const __m128i *)&data[*n]);
> +	*fold = _mm_xor_si128(*fold, tmp1);
> +	*n += 16;
> +	*len -= 16;
> +}
> +
> +static __rte_always_inline uint32_t
> +crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i res, d, b;
> +	__m512i temp, k;
> +	__m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3;
> +	__m512i fold0, fold1, fold2, fold3;
> +	__mmask16 mask;
> +	uint32_t n = 0;
> +	int reduction = 0;
> +
> +	/* Get CRC init value */
> +	b = _mm_cvtsi32_si128(crc);
> +	temp = _mm512_castsi128_si512(b);
> +
> +	if (data_len > 255) {
> +		fold0 = _mm512_loadu_si512((const __m512i *)data);
> +		fold1 = _mm512_loadu_si512((const __m512i *)(data+64));
> +		fold2 = _mm512_loadu_si512((const __m512i *)(data+128));
> +		fold3 = _mm512_loadu_si512((const __m512i *)(data+192));
> +		fold0 = _mm512_xor_si512(fold0, temp);
> +
> +		/* Main folding loop */
> +		k = params->rk1_rk2;
> +		for (n = 256; (n + 256) <= data_len; n += 256) {
> +			qw0 = _mm512_loadu_si512((const __m512i *)&data[n]);
> +			qw1 = _mm512_loadu_si512((const __m512i *)
> +					&(data[n+64]));
> +			qw2 = _mm512_loadu_si512((const __m512i *)
> +					&(data[n+128]));
> +			qw3 = _mm512_loadu_si512((const __m512i *)
> +					&(data[n+192]));
> +			fold0 = crcr32_folding_round(qw0, k, fold0);
> +			fold1 = crcr32_folding_round(qw1, k, fold1);
> +			fold2 = crcr32_folding_round(qw2, k, fold2);
> +			fold3 = crcr32_folding_round(qw3, k, fold3);
> +		}
> +
> +		/* 256 to 128 fold */
> +		k = params->rk3_rk4;
> +		fold0 = crcr32_folding_round(fold2, k, fold0);
> +		fold1 = crcr32_folding_round(fold3, k, fold1);
> +
> +		res = crc32_fold_128(fold0, fold1, params);
> +
> +		reduction = 240 - ((n+256)-data_len);
> +
> +		while (reduction > 0)
> +			reduction_loop(&res, &reduction, data, &n,
> +					params);
> +
> +		reduction += 16;
> +
> +		if (n != data_len)
> +			res = last_two_xmm(data, data_len, n, res,
> +					params);
> +	} else {
> +		if (data_len > 31) {
> +			res = _mm_cvtsi32_si128(crc);
> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +			n += 16;
> +
> +			reduction = 240 - ((n+256)-data_len);
> +
> +			while (reduction > 0)
> +				reduction_loop(&res, &reduction, data, &n,
> +						params);
> +
> +			if (n != data_len)
> +				res = last_two_xmm(data, data_len, n, res,
> +						params);
> +		} else if (data_len > 16) {
> +			res = _mm_cvtsi32_si128(crc);
> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +			n += 16;
> +
> +			if (n != data_len)
> +				res = last_two_xmm(data, data_len, n, res,
> +						params);
> +		} else if (data_len == 16) {
> +			res = _mm_cvtsi32_si128(crc);
> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +		} else {
> +			res = _mm_cvtsi32_si128(crc);
> +			mask = byte_len_to_mask_table[data_len];
> +			d = _mm_maskz_loadu_epi8(mask, data);
> +			res = _mm_xor_si128(res, d);
> +
> +			if (data_len > 3) {
> +				d = _mm_loadu_si128((const __m128i *)
> +						&shf_table[data_len]);
> +				res = _mm_shuffle_epi8(res, d);
> +			} else if (data_len > 2) {
> +				res = _mm_slli_si128(res, 5);
> +				goto do_barrett_reduction;
> +			} else if (data_len > 1) {
> +				res = _mm_slli_si128(res, 6);
> +				goto do_barrett_reduction;
> +			} else if (data_len > 0) {
> +				res = _mm_slli_si128(res, 7);
> +				goto do_barrett_reduction;
> +			} else {
> +				/* zero length case */
> +				return crc;
> +			}
> +		}
> +	}
> +
> +	res = done_128(res, params);
> +
> +do_barrett_reduction:
> +	n = barrett_reduction(res, params);
> +
> +	return n;
> +}
> +
> +static void
> +crc32_load_init_constants(void)
> +{
> +	__m128i a;
> +	/* fold constants */
> +	uint64_t c0 = 0x00000000e95c1271;
> +	uint64_t c1 = 0x00000000ce3371cb;
> +	uint64_t c2 = 0x00000000910eeec1;
> +	uint64_t c3 = 0x0000000033fff533;
> +	uint64_t c4 = 0x000000000cbec0ed;
> +	uint64_t c5 = 0x0000000031f8303f;
> +	uint64_t c6 = 0x0000000057c54819;
> +	uint64_t c7 = 0x00000000df068dc2;
> +	uint64_t c8 = 0x00000000ae0b5394;
> +	uint64_t c9 = 0x000000001c279815;
> +	uint64_t c10 = 0x000000001d9513d7;
> +	uint64_t c11 = 0x000000008f352d95;
> +	uint64_t c12 = 0x00000000af449247;
> +	uint64_t c13 = 0x000000003db1ecdc;
> +	uint64_t c14 = 0x0000000081256527;
> +	uint64_t c15 = 0x00000000f1da05aa;
> +	uint64_t c16 = 0x00000000ccaa009e;
> +	uint64_t c17 = 0x00000000ae689191;
> +	uint64_t c18 = 0x00000000ccaa009e;
> +	uint64_t c19 = 0x00000000b8bc6765;
> +	uint64_t c20 = 0x00000001f7011640;
> +	uint64_t c21 = 0x00000001db710640;
> +
> +	a = _mm_set_epi64x(c1, c0);
> +	crc32_eth.rk1_rk2 = _mm512_broadcast_i32x4(a);
> +
> +	a = _mm_set_epi64x(c3, c2);
> +	crc32_eth.rk3_rk4 = _mm512_broadcast_i32x4(a);
> +
> +	crc32_eth.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
> +			c9, c10, c11);
> +	crc32_eth.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
> +			c16, c17, 0, 0);
> +	crc32_eth.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
> +			_mm_cvtsi64_m64(c17));
> +
> +	crc32_eth.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
> +			_mm_cvtsi64_m64(c19));
> +	crc32_eth.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
> +			_mm_cvtsi64_m64(c21));
> +}
> +
> +static void
> +crc16_load_init_constants(void)
> +{
> +	__m128i a;
> +	/* fold constants */
> +	uint64_t c0 = 0x0000000000009a19;
> +	uint64_t c1 = 0x0000000000002df8;
> +	uint64_t c2 = 0x00000000000068af;
> +	uint64_t c3 = 0x000000000000b6c9;
> +	uint64_t c4 = 0x000000000000c64f;
> +	uint64_t c5 = 0x000000000000cd95;
> +	uint64_t c6 = 0x000000000000d341;
> +	uint64_t c7 = 0x000000000000b8f2;
> +	uint64_t c8 = 0x0000000000000842;
> +	uint64_t c9 = 0x000000000000b072;
> +	uint64_t c10 = 0x00000000000047e3;
> +	uint64_t c11 = 0x000000000000922d;
> +	uint64_t c12 = 0x0000000000000e3a;
> +	uint64_t c13 = 0x0000000000004d7a;
> +	uint64_t c14 = 0x0000000000005b44;
> +	uint64_t c15 = 0x0000000000007762;
> +	uint64_t c16 = 0x00000000000081bf;
> +	uint64_t c17 = 0x0000000000008e10;
> +	uint64_t c18 = 0x00000000000081bf;
> +	uint64_t c19 = 0x0000000000001cbb;
> +	uint64_t c20 = 0x000000011c581910;
> +	uint64_t c21 = 0x0000000000010810;
> +
> +	a = _mm_set_epi64x(c1, c0);
> +	crc16_ccitt.rk1_rk2 = _mm512_broadcast_i32x4(a);
> +
> +	a = _mm_set_epi64x(c3, c2);
> +	crc16_ccitt.rk3_rk4 = _mm512_broadcast_i32x4(a);
> +
> +	crc16_ccitt.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
> +			c9, c10, c11);
> +	crc16_ccitt.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
> +			c16, c17, 0, 0);
> +	crc16_ccitt.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
> +			_mm_cvtsi64_m64(c17));
> +
> +	crc16_ccitt.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
> +			_mm_cvtsi64_m64(c19));
> +	crc16_ccitt.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
> +			_mm_cvtsi64_m64(c21));
> +}
> +
> +void
> +rte_net_crc_avx512_init(void)
> +{
> +	crc32_load_init_constants();
> +	crc16_load_init_constants();
> +
> +	/*
> +	 * Reset the register as following calculation may
> +	 * use other data types such as float, double, etc.
> +	 */
> +	_mm_empty();
> +}
> +
> +uint32_t
> +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len)
> +{
> +	/* return 16-bit CRC value */
> +	return (uint16_t)~crc32_eth_calc_vpclmulqdq(data,
> +		data_len,
> +		0xffff,
> +		&crc16_ccitt);
> +}
> +
> +uint32_t
> +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len)
> +{
> +	/* return 32-bit CRC value */
> +	return ~crc32_eth_calc_vpclmulqdq(data,
> +		data_len,
> +		0xffffffffUL,
> +		&crc32_eth);
> +}
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> index d271d5205..32a366590 100644
> --- a/lib/librte_net/rte_net_crc.c
> +++ b/lib/librte_net/rte_net_crc.c
> @@ -37,6 +37,12 @@ static const rte_net_crc_handler handlers_scalar[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
>  };
> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
> +static const rte_net_crc_handler handlers_avx512[] = {
> +	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler,
> +	[RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler,
> +};
> +#endif
>  #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
>  static const rte_net_crc_handler handlers_sse42[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
> @@ -134,6 +140,39 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
>  		crc32_eth_lut);
>  }
>
> +/* AVX512/VPCLMULQDQ handling */
> +
> +#define AVX512_VPCLMULQDQ_CPU_SUPPORTED ( \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_VPCLMULQDQ) \
> +)
> +
> +static const rte_net_crc_handler *
> +avx512_vpclmulqdq_get_handlers(void)
> +{
> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
> +	if (AVX512_VPCLMULQDQ_CPU_SUPPORTED)
> +		return handlers_avx512;
> +#endif
> +	return NULL;
> +}
> +
> +static uint8_t
> +avx512_vpclmulqdq_init(void)
> +{
> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
> +	if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) {
> +		rte_net_crc_avx512_init();
> +		return 1;
> +	}
> +#endif
> +	return 0;
> +}
> +
>  /* SSE4.2/PCLMULQDQ handling */
>
>  #define SSE42_PCLMULQDQ_CPU_SUPPORTED \
> @@ -196,6 +235,11 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg)
>  	handlers = NULL;
>
>  	switch (alg) {
> +	case RTE_NET_CRC_AVX512:
> +		handlers = avx512_vpclmulqdq_get_handlers();
> +		if (handlers != NULL)
> +			break;
> +		/* fall-through */
>  	case RTE_NET_CRC_SSE42:
>  		handlers = sse42_pclmulqdq_get_handlers();
>  		break; /* for x86, always break here */
> @@ -235,6 +279,8 @@ RTE_INIT(rte_net_crc_init)
>
>  	if (sse42_pclmulqdq_init())
>  		alg = RTE_NET_CRC_SSE42;
> +	if (avx512_vpclmulqdq_init())
> +		alg = RTE_NET_CRC_AVX512;
>  	if (neon_pmull_init())
>  		alg = RTE_NET_CRC_NEON;
>
> diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
> index 16e85ca97..72d3e10ff 100644
> --- a/lib/librte_net/rte_net_crc.h
> +++ b/lib/librte_net/rte_net_crc.h
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
>
>  #ifndef _RTE_NET_CRC_H_
> @@ -23,6 +23,7 @@ enum rte_net_crc_alg {
>  	RTE_NET_CRC_SCALAR = 0,
>  	RTE_NET_CRC_SSE42,
>  	RTE_NET_CRC_NEON,
> +	RTE_NET_CRC_AVX512,
>  };
>
>  /**
> @@ -35,6 +36,7 @@ enum rte_net_crc_alg {
>   *   - RTE_NET_CRC_SCALAR
>   *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
>   *   - RTE_NET_CRC_NEON (Use ARM Neon intrinsic)
> + *   - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic)
>   */
>  void
>  rte_net_crc_set_alg(enum rte_net_crc_alg alg);
> --
> 2.12.3

Reviewed-by: Jasvinder Singh <jasvinder.singh@intel.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread
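For readers following the rte_net_crc.c hunks above, the run-time selection with fall-through can be sketched in a standalone form. This is only an illustration under stated assumptions: every name below (crc_alg, select_handlers, the *_stub handlers, the cpu_has_* variables) is hypothetical and not part of DPDK, the CPU probes are plain variables instead of rte_cpu_get_flag_enabled(), and the final fall back to the scalar table is assumed here since it is not visible in the quoted hunk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the DPDK enums; not the real API. */
enum crc_alg { CRC_SCALAR = 0, CRC_SSE42, CRC_AVX512 };
enum crc_type { CRC16_CCITT = 0, CRC32_ETH, CRC_TYPE_MAX };

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t len);

/* Stub handlers return a tag (1, 2, 3) so the chosen path is observable. */
static uint32_t scalar_stub(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 1; }
static uint32_t sse42_stub(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 2; }
static uint32_t avx512_stub(const uint8_t *d, uint32_t n) { (void)d; (void)n; return 3; }

static const crc_handler handlers_scalar[CRC_TYPE_MAX] = { scalar_stub, scalar_stub };
static const crc_handler handlers_sse42[CRC_TYPE_MAX] = { sse42_stub, sse42_stub };
static const crc_handler handlers_avx512[CRC_TYPE_MAX] = { avx512_stub, avx512_stub };

/* Simulated CPU feature probes (assumption: real code would query CPUID). */
static int cpu_has_avx512_vpclmulqdq;
static int cpu_has_sse42_pclmulqdq;

static const crc_handler *avx512_get_handlers(void)
{
	return cpu_has_avx512_vpclmulqdq ? handlers_avx512 : NULL;
}

static const crc_handler *sse42_get_handlers(void)
{
	return cpu_has_sse42_pclmulqdq ? handlers_sse42 : NULL;
}

/* Pick the best available table: AVX512, then SSE4.2, then scalar. */
static const crc_handler *select_handlers(enum crc_alg alg)
{
	const crc_handler *handlers = NULL;

	switch (alg) {
	case CRC_AVX512:
		handlers = avx512_get_handlers();
		if (handlers != NULL)
			break;
		/* fall-through */
	case CRC_SSE42:
		handlers = sse42_get_handlers();
		break;
	case CRC_SCALAR:
	default:
		break;
	}

	/* Assumed final fall back when no accelerated path is usable. */
	if (handlers == NULL)
		handlers = handlers_scalar;
	return handlers;
}

static uint32_t crc_calc(enum crc_alg alg, enum crc_type type,
	const uint8_t *data, uint32_t len)
{
	assert(type < CRC_TYPE_MAX);
	return select_handlers(alg)[type](data, len);
}
```

With both simulated flags cleared, a request for CRC_AVX512 ends up on the scalar stub. Setting only cpu_has_sse42_pclmulqdq moves it to the SSE4.2 stub, which mirrors the "compiled on an older CPU, run on a newer CPU" case in the cover letter.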
* Re: [dpdk-dev] [PATCH v5 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC
  2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh
  2020-10-09 13:50   ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
  2020-10-09 13:50   ` [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
@ 2020-10-09 18:35   ` De Lara Guarch, Pablo
  2020-10-13 18:47   ` David Marchand
  3 siblings, 0 replies; 23+ messages in thread
From: De Lara Guarch, Pablo @ 2020-10-09 18:35 UTC (permalink / raw)
  To: O'loingsigh, Mairtin, Singh, Jasvinder, Richardson, Bruce, Ananyev, Konstantin
  Cc: dev, Ryan, Brendan, Coyle, David

> -----Original Message-----
> From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> Sent: Friday, October 9, 2020 2:51 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; O'loingsigh,
> Mairtin <mairtin.oloingsigh@intel.com>; Coyle, David <david.coyle@intel.com>
> Subject: [PATCH v5 0/2] net: add CRC run-time checks and
> AVX512/VPCLMULQDQ based CRC
>
> This patchset makes two significant enhancements to the CRC modules of the
> rte_net library:
>
> 1) Adds run-time selection of the optimal architecture-specific CRC path.
>    Previously the selection was solely made at compile-time, meaning it
>    could only be built and run on the same generation of CPU. Adding
>    run-time selection ability means this can be used from distro packages
>    and/or DPDK can be compiled on an older CPU and run on a newer CPU.
> 2) Adds an optimized CRC implementation based on the AVX512 and
>    VPCLMULQDQ instruction sets.
>
> For further details, please see the commit messages of the individual patches.
>
> v5:
> * Tidied-up the ifdef checks for RTE_ARCH_* and compiler support of CRC
>   paths, as per review comments:
>   * All ifdef checks removed from API function definitions and moved into
>     helper functions.
>
> v4:
> * Fixed build issue when older version of meson is used (0.47.1).
> * Addressed review comments:
>   * remove Intel copyright header from neon CRC file.
>   * tidy-up of register initialisation.
>
> v3:
> * Re-submitted v2 as encountered problems when originally submitting it.
>
> v2:
> * Added support for run-time selection of optimal architecture-specific
>   CRC, based on v1 review comment.
> * Added full working AVX512/VPCLMULQDQ support for CRC32-Ethernet and
>   CRC16-CCITT.
>
> v1:
> * Initial version, with incomplete AVX512/VPCLMULQDQ support for
>   CRC32-Ethernet only.
>
> Mairtin o Loingsigh (2):
>   net: add run-time architecture specific CRC selection
>   net: add support for AVX512/VPCLMULQDQ based CRC
>
>  app/test/test_crc.c                               |  11 +-
>  config/x86/meson.build                            |   6 +-
>  doc/guides/rel_notes/release_20_11.rst            |   6 +
>  lib/librte_net/meson.build                        |  89 ++++-
>  lib/librte_net/net_crc.h                          |  45 +++
>  lib/librte_net/net_crc_avx512.c                   | 423 ++++++++++++++++++++++
>  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  26 +-
>  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 +-
>  lib/librte_net/rte_net_crc.c                      | 162 +++++++--
>  lib/librte_net/rte_net_crc.h                      |   4 +-
>  10 files changed, 722 insertions(+), 84 deletions(-)
>  create mode 100644 lib/librte_net/net_crc.h
>  create mode 100644 lib/librte_net/net_crc_avx512.c
>  rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
>  rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)
>
> --
> 2.12.3

Series-Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread
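Since the series adds another accelerated path that must produce the same CRC32-Ethernet value as the existing ones, a bit-at-a-time reference can serve as a cross-check when reviewing or testing. The following is a minimal sketch, not code from the patch: crc32_eth_reference is a hypothetical name, and it assumes the usual reflected Ethernet polynomial 0xedb88320 together with the 0xffffffff initial value and final inversion that the handlers in this series use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference implementation (not part of DPDK):
 * reflected CRC32 with polynomial 0xedb88320, initial value 0xffffffff
 * and final bitwise inversion, processed one bit at a time.
 */
static uint32_t crc32_eth_reference(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xffffffffu;
	size_t i;
	int bit;

	assert(data != NULL || len == 0);

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++) {
			/* Shift right; XOR in the polynomial if the bit shifted out was 1. */
			crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
		}
	}

	return ~crc;
}
```

For the ASCII string "123456789" this returns 0xcbf43926, the widely published check value for this CRC, and a zero-length input returns 0, which matches the zero-length branch of the AVX512 handler after its final inversion. Comparing an accelerated handler against such a reference over a range of lengths (for example 0 to 600 bytes, to cover the 16, 32 and 256 byte thresholds in the patch) is one way to exercise every branch.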
* Re: [dpdk-dev] [PATCH v5 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC
  2020-10-09 13:50 ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh
                     ` (2 preceding siblings ...)
  2020-10-09 18:35   ` [dpdk-dev] [PATCH v5 0/2] net: add CRC run-time checks and " De Lara Guarch, Pablo
@ 2020-10-13 18:47   ` David Marchand
  3 siblings, 0 replies; 23+ messages in thread
From: David Marchand @ 2020-10-13 18:47 UTC (permalink / raw)
  To: Mairtin o Loingsigh
  Cc: Singh, Jasvinder, Bruce Richardson, Pablo de Lara, Ananyev, Konstantin, dev, Ryan, Brendan, David Coyle, Olivier Matz

On Fri, Oct 9, 2020 at 3:55 PM Mairtin o Loingsigh
<mairtin.oloingsigh@intel.com> wrote:
>
> This patchset makes two significant enhancements to the CRC modules of
> the rte_net library:
>
> 1) Adds run-time selection of the optimal architecture-specific CRC path.
>    Previously the selection was solely made at compile-time, meaning it
>    could only be built and run on the same generation of CPU. Adding
>    run-time selection ability means this can be used from distro packages
>    and/or DPDK can be compiled on an older CPU and run on a newer CPU.
> 2) Adds an optimized CRC implementation based on the AVX512 and
>    VPCLMULQDQ instruction sets.
>
> For further details, please see the commit messages of the individual
> patches.
>
> v5:
> * Tidied-up the ifdef checks for RTE_ARCH_* and compiler support of CRC
>   paths, as per review comments:
>   * All ifdef checks removed from API function definitions and moved into
>     helper functions.

Updated MAINTAINERS with renamed/added files.
Series applied.

--
David Marchand

^ permalink raw reply	[flat|nested] 23+ messages in thread