DPDK patches and discussions
* [dpdk-dev] [PATCH v3 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC
@ 2020-09-29 15:35 Mairtin o Loingsigh
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
                   ` (2 more replies)
  0 siblings, 3 replies; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-09-29 15:35 UTC
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, brendan.ryan, david.coyle, Mairtin o Loingsigh

This patchset makes two significant enhancements to the CRC modules of
the rte_net library:

1) Adds run-time selection of the optimal architecture-specific CRC path.
   Previously the selection was made solely at compile time, so the
   optimized path could only be used when DPDK was built and run on the
   same generation of CPU. With run-time selection, it can also be used
   from distro packages, and when DPDK is compiled on an older CPU and
   run on a newer one.
2) Adds an optimized CRC implementation based on the AVX512 and
   VPCLMULQDQ instruction sets.
   
For further details, please see the commit messages of the individual
patches.

v2:
* Added support for run-time selection of optimal architecture-specific
  CRC, based on v1 review comment.
* Added full working AVX512/VPCLMULQDQ support for CRC32-Ethernet and
  CRC16-CCITT.

v1:
* Initial version, with incomplete AVX512/VPCLMULQDQ support for
  CRC32-Ethernet only.

Mairtin o Loingsigh (2):
  net: add run-time architecture specific CRC selection
  net: add support for AVX512/VPCLMULQDQ based CRC

 app/test/test_crc.c                               |  11 +-
 config/x86/meson.build                            |   6 +-
 doc/guides/rel_notes/release_20_11.rst            |   6 +
 lib/librte_net/meson.build                        |  89 ++++-
 lib/librte_net/net_crc.h                          |  45 +++
 lib/librte_net/net_crc_avx512.c                   | 424 ++++++++++++++++++++++
 lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  27 +-
 lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 +-
 lib/librte_net/rte_net_crc.c                      | 100 +++--
 lib/librte_net/rte_net_crc.h                      |   4 +-
 10 files changed, 674 insertions(+), 72 deletions(-)
 create mode 100644 lib/librte_net/net_crc.h
 create mode 100644 lib/librte_net/net_crc_avx512.c
 rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
 rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)

-- 
2.12.3



* [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection
  2020-09-29 15:35 [dpdk-dev] [PATCH v3 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
@ 2020-09-29 15:36 ` Mairtin o Loingsigh
  2020-10-02 15:17   ` Singh, Jasvinder
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
  2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh
  2 siblings, 1 reply; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-09-29 15:36 UTC
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, brendan.ryan, david.coyle, Mairtin o Loingsigh

This patch adds support for run-time selection of the optimal
architecture-specific CRC path, based on the supported instruction set(s)
of the CPU.

The compiler option checks have been moved from the C files to the meson
script. The library calls rte_cpu_get_flag_enabled() automatically at
process initialization time to determine which instructions the CPU
supports, and then selects the best supported CRC path.
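
As a rough illustration of the selection pattern described above (not
the code from this patch), the sketch below picks a CRC handler from two
inputs: whether the optimized path was compiled in, and whether the CPU
reports the required instructions. The names crc_select,
crc32_eth_accel and crc32_eth_scalar are hypothetical, and the
"accelerated" handler is only a stand-in that reuses the scalar code so
the sketch runs on any machine. In the patch itself the probe is
rte_cpu_get_flag_enabled() and the compile-time gate is a -D flag set by
meson.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc_handler_t)(const uint8_t *data, uint32_t data_len);

/* Portable fallback: bit-at-a-time CRC32-Ethernet in reflected form. */
static uint32_t
crc32_eth_scalar(const uint8_t *data, uint32_t data_len)
{
	uint32_t crc = 0xffffffffU;
	uint32_t i;
	int bit;

	assert(data != NULL || data_len == 0);
	for (i = 0; i < data_len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xedb88320U & (0U - (crc & 1U)));
	}
	return ~crc;
}

/*
 * Hypothetical stand-in for a SIMD path. It reuses the scalar code so
 * that the sketch produces correct results on any CPU.
 */
static uint32_t
crc32_eth_accel(const uint8_t *data, uint32_t data_len)
{
	return crc32_eth_scalar(data, data_len);
}

/*
 * Pick a handler. built_in models the compile-time gate (assumption: a
 * -D flag set by the build script); cpu_ok models the result of the
 * run-time CPU flag probe.
 */
static crc_handler_t
crc_select(bool built_in, bool cpu_ok)
{
	if (built_in && cpu_ok)
		return crc32_eth_accel;
	return crc32_eth_scalar;
}
```

Only crc_select(true, true) returns the accelerated handler; every other
combination falls back to the scalar one, which mirrors the fall-through
to the scalar case in rte_net_crc_set_alg().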

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
---
 doc/guides/rel_notes/release_20_11.rst            |  4 ++
 lib/librte_net/meson.build                        | 34 +++++++++++-
 lib/librte_net/net_crc.h                          | 34 ++++++++++++
 lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 27 +++------
 lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   | 34 ++++--------
 lib/librte_net/rte_net_crc.c                      | 67 ++++++++++++++---------
 6 files changed, 132 insertions(+), 68 deletions(-)
 create mode 100644 lib/librte_net/net_crc.h
 rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
 rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)

diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 4eb3224a7..6bd222dca 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Updated CRC modules of rte_net library.**
+
+  * Added run-time selection of the optimal architecture-specific CRC path.
+
 * **Updated Cisco enic driver.**
 
   * Added support for VF representors with single-queue Tx/Rx and flow API
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index 24ed8253b..b6880bd85 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2017 Intel Corporation
+# Copyright(c) 2017-2020 Intel Corporation
 
 headers = files('rte_ip.h',
 	'rte_tcp.h',
@@ -20,3 +20,35 @@ headers = files('rte_ip.h',
 
 sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c')
 deps += ['mbuf']
+
+if dpdk_conf.has('RTE_ARCH_X86_64')
+	net_crc_sse42_cpu_support = \
+		cc.get_define('__PCLMUL__', args: machine_args) != ''
+	net_crc_sse42_cc_support = \
+		cc.has_argument('-mpclmul') and cc.has_argument('-maes')
+
+	build_static_net_crc_sse42_lib = 0
+
+	if net_crc_sse42_cpu_support == true
+		sources += files('net_crc_sse.c')
+		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+	elif net_crc_sse42_cc_support == true
+		build_static_net_crc_sse42_lib = 1
+		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
+		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+	endif
+
+	if build_static_net_crc_sse42_lib == 1
+		net_crc_sse42_lib = static_library(
+					'net_crc_sse42_lib',
+					'net_crc_sse.c',
+					dependencies: static_rte_eal,
+					c_args: [cflags,
+						net_crc_sse42_lib_cflags])
+		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
+	endif
+elif dpdk_conf.has('RTE_ARCH_ARM64') and \
+		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != ''
+	sources += files('net_crc_neon.c')
+	cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
+endif
diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
new file mode 100644
index 000000000..a1578a56c
--- /dev/null
+++ b/lib/librte_net/net_crc.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#ifndef _NET_CRC_H_
+#define _NET_CRC_H_
+
+/*
+ * Different implementations of CRC
+ */
+
+/* SSE4.2 */
+
+void
+rte_net_crc_sse42_init(void);
+
+uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
+
+/* NEON */
+
+void
+rte_net_crc_neon_init(void);
+
+uint32_t
+rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len);
+
+#endif /* _NET_CRC_H_ */
diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
similarity index 95%
rename from lib/librte_net/net_crc_neon.h
rename to lib/librte_net/net_crc_neon.c
index 63fa1d4a1..b79684ec2 100644
--- a/lib/librte_net/net_crc_neon.h
+++ b/lib/librte_net/net_crc_neon.c
@@ -1,18 +1,17 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Cavium, Inc
+ * Copyright(c) 2020 Intel Corporation
  */
 
-#ifndef _NET_CRC_NEON_H_
-#define _NET_CRC_NEON_H_
+#include <string.h>
 
+#include <rte_common.h>
 #include <rte_branch_prediction.h>
 #include <rte_net_crc.h>
 #include <rte_vect.h>
 #include <rte_cpuflags.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+#include "net_crc.h"
 
 /** PMULL CRC computation context structure */
 struct crc_pmull_ctx {
@@ -218,7 +217,7 @@ crc32_eth_calc_pmull(
 	return n;
 }
 
-static inline void
+void
 rte_net_crc_neon_init(void)
 {
 	/* Initialize CRC16 data */
@@ -242,9 +241,8 @@ rte_net_crc_neon_init(void)
 	crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8);
 }
 
-static inline uint32_t
-rte_crc16_ccitt_neon_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len)
 {
 	return (uint16_t)~crc32_eth_calc_pmull(data,
 		data_len,
@@ -252,18 +250,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data,
 		&crc16_ccitt_pmull);
 }
 
-static inline uint32_t
-rte_crc32_eth_neon_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len)
 {
 	return ~crc32_eth_calc_pmull(data,
 		data_len,
 		0xffffffffUL,
 		&crc32_eth_pmull);
 }
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _NET_CRC_NEON_H_ */
diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c
similarity index 94%
rename from lib/librte_net/net_crc_sse.h
rename to lib/librte_net/net_crc_sse.c
index 1c7b7a548..053b54b39 100644
--- a/lib/librte_net/net_crc_sse.h
+++ b/lib/librte_net/net_crc_sse.c
@@ -1,18 +1,16 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
-#ifndef _RTE_NET_CRC_SSE_H_
-#define _RTE_NET_CRC_SSE_H_
+#include <string.h>
 
+#include <rte_common.h>
 #include <rte_branch_prediction.h>
+#include <rte_cpuflags.h>
 
-#include <x86intrin.h>
-#include <cpuid.h>
+#include "net_crc.h"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+#include <x86intrin.h>
 
 /** PCLMULQDQ CRC computation context structure */
 struct crc_pclmulqdq_ctx {
@@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq(
 	return n;
 }
 
-
-static inline void
+void
 rte_net_crc_sse42_init(void)
 {
 	uint64_t k1, k2, k5, k6;
@@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void)
 	 * use other data types such as float, double, etc.
 	 */
 	_mm_empty();
-
 }
 
-static inline uint32_t
-rte_crc16_ccitt_sse42_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
 {
 	/** return 16-bit CRC value */
 	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
@@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data,
 		&crc16_ccitt_pclmulqdq);
 }
 
-static inline uint32_t
-rte_crc32_eth_sse42_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
 {
 	return ~crc32_eth_calc_pclmulqdq(data,
 		data_len,
 		0xffffffffUL,
 		&crc32_eth_pclmulqdq);
 }
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
index 4f5b9e828..83dccbfba 100644
--- a/lib/librte_net/rte_net_crc.c
+++ b/lib/librte_net/rte_net_crc.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #include <stddef.h>
@@ -10,17 +10,7 @@
 #include <rte_common.h>
 #include <rte_net_crc.h>
 
-#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__)
-#define X86_64_SSE42_PCLMULQDQ     1
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO)
-#define ARM64_NEON_PMULL           1
-#endif
-
-#ifdef X86_64_SSE42_PCLMULQDQ
-#include <net_crc_sse.h>
-#elif defined ARM64_NEON_PMULL
-#include <net_crc_neon.h>
-#endif
+#include "net_crc.h"
 
 /** CRC polynomials */
 #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
@@ -47,13 +37,13 @@ static rte_net_crc_handler handlers_scalar[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
 };
-
-#ifdef X86_64_SSE42_PCLMULQDQ
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
 static rte_net_crc_handler handlers_sse42[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
 };
-#elif defined ARM64_NEON_PMULL
+#endif
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
 static rte_net_crc_handler handlers_neon[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
@@ -142,22 +132,44 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
 		crc32_eth_lut);
 }
 
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+static uint8_t
+sse42_pclmulqdq_cpu_supported(void)
+{
+	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
+}
+#endif
+
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+static uint8_t
+neon_pmull_cpu_supported(void)
+{
+	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL);
+}
+#endif
+
 void
 rte_net_crc_set_alg(enum rte_net_crc_alg alg)
 {
 	switch (alg) {
-#ifdef X86_64_SSE42_PCLMULQDQ
+#ifdef RTE_ARCH_X86_64
 	case RTE_NET_CRC_SSE42:
-		handlers = handlers_sse42;
-		break;
-#elif defined ARM64_NEON_PMULL
-		/* fall-through */
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+		if (sse42_pclmulqdq_cpu_supported()) {
+			handlers = handlers_sse42;
+			break;
+		}
+#endif
+#endif /* RTE_ARCH_X86_64 */
+#ifdef RTE_ARCH_ARM64
 	case RTE_NET_CRC_NEON:
-		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+		if (neon_pmull_cpu_supported()) {
 			handlers = handlers_neon;
 			break;
 		}
 #endif
+#endif /* RTE_ARCH_ARM64 */
 		/* fall-through */
 	case RTE_NET_CRC_SCALAR:
 		/* fall-through */
@@ -188,11 +200,14 @@ RTE_INIT(rte_net_crc_init)
 
 	rte_net_crc_scalar_init();
 
-#ifdef X86_64_SSE42_PCLMULQDQ
-	alg = RTE_NET_CRC_SSE42;
-	rte_net_crc_sse42_init();
-#elif defined ARM64_NEON_PMULL
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+	if (sse42_pclmulqdq_cpu_supported()) {
+		alg = RTE_NET_CRC_SSE42;
+		rte_net_crc_sse42_init();
+	}
+#endif
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+	if (neon_pmull_cpu_supported()) {
 		alg = RTE_NET_CRC_NEON;
 		rte_net_crc_neon_init();
 	}
-- 
2.12.3



* [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
  2020-09-29 15:35 [dpdk-dev] [PATCH v3 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
@ 2020-09-29 15:36 ` Mairtin o Loingsigh
  2020-10-05 13:20   ` De Lara Guarch, Pablo
  2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh
  2 siblings, 1 reply; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-09-29 15:36 UTC
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, brendan.ryan, david.coyle, Mairtin o Loingsigh

This patch enables the optimized calculation of CRC32-Ethernet and
CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
implementation is built if the compiler supports the required instruction
sets, and is selected at run-time if the host CPU also supports them.
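
When checking an optimized path such as this one, a slow bit-at-a-time
reference can be handy. The sketch below is such a reference and is not
part of the patch. It assumes the usual reflected form of both CRCs
(bit-reversed polynomials 0xedb88320 and 0x8408), with the same seeds
(0xffffffff and 0xffff) and final inversion that the handlers in this
patch apply. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-at-a-time reflected CRC; poly is the bit-reversed polynomial. */
static uint32_t
crc_reflected_bitwise(const uint8_t *data, uint32_t data_len,
	uint32_t crc, uint32_t poly)
{
	uint32_t i;
	int bit;

	assert(data != NULL || data_len == 0);
	for (i = 0; i < data_len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (poly & (0U - (crc & 1U)));
	}
	return crc;
}

/* Reference CRC32-Ethernet: seed 0xffffffff, result inverted. */
static uint32_t
ref_crc32_eth(const uint8_t *data, uint32_t data_len)
{
	return ~crc_reflected_bitwise(data, data_len,
		0xffffffffU, 0xedb88320U);
}

/* Reference CRC16-CCITT: seed 0xffff, result inverted and truncated. */
static uint32_t
ref_crc16_ccitt(const uint8_t *data, uint32_t data_len)
{
	return (uint16_t)~crc_reflected_bitwise(data, data_len,
		0xffffU, 0x8408U);
}
```

With the common nine-byte check string "123456789", ref_crc32_eth()
gives 0xcbf43926 and ref_crc16_ccitt() gives 0x906e. Comparing these
against the output of an optimized handler over a range of lengths
(including lengths below 16 and around 256, where the AVX512 code
changes branch) is one way to exercise the different code paths.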

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
---
 app/test/test_crc.c                    |  11 +-
 config/x86/meson.build                 |   6 +-
 doc/guides/rel_notes/release_20_11.rst |   2 +
 lib/librte_net/meson.build             |  55 +++++
 lib/librte_net/net_crc.h               |  11 +
 lib/librte_net/net_crc_avx512.c        | 424 +++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_crc.c           |  33 +++
 lib/librte_net/rte_net_crc.h           |   4 +-
 8 files changed, 542 insertions(+), 4 deletions(-)
 create mode 100644 lib/librte_net/net_crc_avx512.c

diff --git a/app/test/test_crc.c b/app/test/test_crc.c
index f8a74e04e..bf1d34435 100644
--- a/app/test/test_crc.c
+++ b/app/test/test_crc.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #include "test.h"
@@ -149,6 +149,15 @@ test_crc(void)
 		return ret;
 	}
 
+	/* set CRC avx512 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_AVX512);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test crc (x86_64 AVX512): failed (%d)\n", ret);
+		return ret;
+	}
+
 	/* set CRC neon mode */
 	rte_net_crc_set_alg(RTE_NET_CRC_NEON);
 
diff --git a/config/x86/meson.build b/config/x86/meson.build
index fea4d5403..172b72b72 100644
--- a/config/x86/meson.build
+++ b/config/x86/meson.build
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2017-2019 Intel Corporation
+# Copyright(c) 2017-2020 Intel Corporation
 
 # get binutils version for the workaround of Bug 97
 if not is_windows
@@ -23,7 +23,9 @@ endforeach
 
 optional_flags = ['AES', 'PCLMUL',
 		'AVX', 'AVX2', 'AVX512F',
-		'RDRND', 'RDSEED']
+		'RDRND', 'RDSEED',
+		'AVX512BW', 'AVX512DQ',
+		'AVX512VL', 'VPCLMULQDQ']
 foreach f:optional_flags
 	if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
 		if f == 'PCLMUL' # special case flags with different defines
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 6bd222dca..509749ebd 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -58,6 +58,8 @@ New Features
 * **Updated CRC modules of rte_net library.**
 
   * Added run-time selection of the optimal architecture-specific CRC path.
+  * Added optimized implementations of CRC32-Ethernet and CRC16-CCITT
+    using the AVX512 and VPCLMULQDQ instruction sets.
 
 * **Updated Cisco enic driver.**
 
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index b6880bd85..eeae25bc1 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -24,18 +24,62 @@ deps += ['mbuf']
 if dpdk_conf.has('RTE_ARCH_X86_64')
 	net_crc_sse42_cpu_support = \
 		cc.get_define('__PCLMUL__', args: machine_args) != ''
+	net_crc_avx512_cpu_support = \
+		cc.get_define('__AVX512F__', args: machine_args) != '' and \
+		cc.get_define('__AVX512BW__', args: machine_args) != '' and \
+		cc.get_define('__AVX512DQ__', args: machine_args) != '' and \
+		cc.get_define('__AVX512VL__', args: machine_args) != '' and \
+		cc.get_define('__VPCLMULQDQ__', args: machine_args) != ''
+
 	net_crc_sse42_cc_support = \
 		cc.has_argument('-mpclmul') and cc.has_argument('-maes')
+	net_crc_avx512_cc_support = \
+		not machine_args.contains('-mno-avx512f') and \
+		cc.has_argument('-mavx512f') and \
+		cc.has_argument('-mavx512bw') and \
+		cc.has_argument('-mavx512dq') and \
+		cc.has_argument('-mavx512vl') and \
+		cc.has_argument('-mvpclmulqdq') and \
+		cc.has_argument('-mavx2') and \
+		cc.has_argument('-mavx')
 
 	build_static_net_crc_sse42_lib = 0
+	build_static_net_crc_avx512_lib = 0
 
 	if net_crc_sse42_cpu_support == true
 		sources += files('net_crc_sse.c')
 		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+		if net_crc_avx512_cpu_support == true
+			sources += files('net_crc_avx512.c')
+			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
+		elif net_crc_avx512_cc_support == true
+			build_static_net_crc_avx512_lib = 1
+			net_crc_avx512_lib_cflags = ['-mavx512f',
+							'-mavx512bw',
+							'-mavx512dq',
+							'-mavx512vl',
+							'-mvpclmulqdq',
+							'-mavx2',
+							'-mavx']
+			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
+		endif
 	elif net_crc_sse42_cc_support == true
 		build_static_net_crc_sse42_lib = 1
 		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
 		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+		if net_crc_avx512_cc_support == true
+			build_static_net_crc_avx512_lib = 1
+			net_crc_avx512_lib_cflags = ['-mpclmul',
+							'-maes',
+							'-mavx512f',
+							'-mavx512bw',
+							'-mavx512dq',
+							'-mavx512vl',
+							'-mvpclmulqdq',
+							'-mavx2',
+							'-mavx']
+			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
+		endif
 	endif
 
 	if build_static_net_crc_sse42_lib == 1
@@ -47,6 +91,17 @@ if dpdk_conf.has('RTE_ARCH_X86_64')
 						net_crc_sse42_lib_cflags])
 		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
 	endif
+
+	if build_static_net_crc_avx512_lib == 1
+		net_crc_avx512_lib = static_library(
+					'net_crc_avx512_lib',
+					'net_crc_avx512.c',
+					dependencies: static_rte_eal,
+					c_args: [cflags,
+						net_crc_avx512_lib_cflags])
+		objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c')
+	endif
+
 elif dpdk_conf.has('RTE_ARCH_ARM64') and \
 		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != ''
 	sources += files('net_crc_neon.c')
diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
index a1578a56c..7a74d5406 100644
--- a/lib/librte_net/net_crc.h
+++ b/lib/librte_net/net_crc.h
@@ -20,6 +20,17 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
 uint32_t
 rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
 
+/* AVX512 */
+
+void
+rte_net_crc_avx512_init(void);
+
+uint32_t
+rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len);
+
 /* NEON */
 
 void
diff --git a/lib/librte_net/net_crc_avx512.c b/lib/librte_net/net_crc_avx512.c
new file mode 100644
index 000000000..81aac6349
--- /dev/null
+++ b/lib/librte_net/net_crc_avx512.c
@@ -0,0 +1,424 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_branch_prediction.h>
+#include <rte_cpuflags.h>
+
+#include "net_crc.h"
+
+#include <x86intrin.h>
+
+/* VPCLMULQDQ CRC computation context structure */
+struct crc_vpclmulqdq_ctx {
+	__m512i rk1_rk2;
+	__m512i rk3_rk4;
+	__m512i fold_7x128b;
+	__m512i fold_3x128b;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+	__m128i fold_1x128b;
+};
+
+static struct crc_vpclmulqdq_ctx crc32_eth __rte_aligned(64);
+static struct crc_vpclmulqdq_ctx crc16_ccitt __rte_aligned(64);
+
+static uint16_t byte_len_to_mask_table[] = {
+	0x0000, 0x0001, 0x0003, 0x0007,
+	0x000f, 0x001f, 0x003f, 0x007f,
+	0x00ff, 0x01ff, 0x03ff, 0x07ff,
+	0x0fff, 0x1fff, 0x3fff, 0x7fff,
+	0xffff};
+
+static const uint8_t shf_table[32] __rte_aligned(16) = {
+	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+};
+
+static const uint32_t mask[4] __rte_aligned(16) = {
+	0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+};
+
+static const uint32_t mask2[4] __rte_aligned(16) = {
+	0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+};
+
+static __rte_always_inline __m512i
+crcr32_folding_round(__m512i data_block, __m512i precomp, __m512i fold)
+{
+	__m512i tmp0, tmp1;
+
+	tmp0 = _mm512_clmulepi64_epi128(fold, precomp, 0x01);
+	tmp1 = _mm512_clmulepi64_epi128(fold, precomp, 0x10);
+
+	return _mm512_ternarylogic_epi64(tmp0, tmp1, data_block, 0x96);
+}
+
+static __rte_always_inline __m128i
+crc32_fold_128(__m512i fold0, __m512i fold1,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i res, res2;
+	__m256i a;
+	__m512i tmp0, tmp1, tmp2, tmp3;
+	__m512i tmp4;
+
+	tmp0 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x01);
+	tmp1 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x10);
+
+	res = _mm512_extracti64x2_epi64(fold1, 3);
+	tmp4 = _mm512_maskz_broadcast_i32x4(0xF, res);
+
+	tmp2 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x01);
+	tmp3 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x10);
+
+	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp1, tmp2, 0x96);
+	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp3, tmp4, 0x96);
+
+	tmp1 = _mm512_shuffle_i64x2(tmp0, tmp0, 0x4e);
+
+	a = _mm256_xor_si256(*(__m256i *)&tmp1, *(__m256i *)&tmp0);
+	res = _mm256_extracti64x2_epi64(a, 1);
+	res2 = _mm_xor_si128(res, *(__m128i *)&a);
+
+	return res2;
+}
+
+static __rte_always_inline __m128i
+last_two_xmm(const uint8_t *data, uint32_t data_len, uint32_t n, __m128i res,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	uint32_t offset;
+	__m128i res2, res3, res4, pshufb_shf;
+
+	const uint32_t mask3[4] __rte_aligned(16) = {
+		   0x80808080, 0x80808080, 0x80808080, 0x80808080
+	};
+
+	res2 = res;
+	offset = data_len - n;
+	res3 = _mm_loadu_si128((const __m128i *)&data[n+offset-16]);
+
+	pshufb_shf = _mm_loadu_si128((const __m128i *)
+			(shf_table + (data_len-n)));
+
+	res = _mm_shuffle_epi8(res, pshufb_shf);
+	pshufb_shf = _mm_xor_si128(pshufb_shf,
+			_mm_load_si128((const __m128i *) mask3));
+	res2 = _mm_shuffle_epi8(res2, pshufb_shf);
+
+	res2 = _mm_blendv_epi8(res2, res3, pshufb_shf);
+
+	res4 = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x01);
+	res = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x10);
+	res = _mm_ternarylogic_epi64(res, res2, res4, 0x96);
+
+	return res;
+}
+
+static __rte_always_inline __m128i
+done_128(__m128i res, const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i res1;
+
+	res1 = res;
+
+	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x0);
+	res1 = _mm_srli_si128(res1, 8);
+	res = _mm_xor_si128(res, res1);
+
+	res1 = res;
+	res = _mm_slli_si128(res, 4);
+	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x10);
+	res = _mm_xor_si128(res, res1);
+
+	return res;
+}
+
+static __rte_always_inline uint32_t
+barrett_reduction(__m128i data64, const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i tmp0, tmp1;
+
+	data64 =  _mm_and_si128(data64, *(const __m128i *)mask2);
+	tmp0 = data64;
+	tmp1 = data64;
+
+	data64 = _mm_clmulepi64_si128(tmp0, params->rk7_rk8, 0x0);
+	data64 = _mm_ternarylogic_epi64(data64, tmp1, *(const __m128i *)mask,
+			0x28);
+
+	tmp1 = data64;
+	data64 = _mm_clmulepi64_si128(data64, params->rk7_rk8, 0x10);
+	data64 = _mm_ternarylogic_epi64(data64, tmp1, tmp0, 0x96);
+
+	return _mm_extract_epi32(data64, 2);
+}
+
+static __rte_always_inline void
+reduction_loop(__m128i *fold, int *len, const uint8_t *data, uint32_t *n,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i tmp, tmp1;
+
+	tmp = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x1);
+	*fold = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x10);
+	*fold = _mm_xor_si128(*fold, tmp);
+	tmp1 = _mm_loadu_si128((const __m128i *)&data[*n]);
+	*fold = _mm_xor_si128(*fold, tmp1);
+	*n += 16;
+	*len -= 16;
+}
+
+static __rte_always_inline uint32_t
+crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i res, d;
+	__m256i b;
+	__m512i temp, k;
+	__m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3;
+	__m512i fold0, fold1, fold2, fold3;
+	__mmask16 mask;
+	uint32_t n = 0;
+	int reduction = 0;
+
+	/* Get CRC init value */
+	b = _mm256_insert_epi32(_mm256_setzero_si256(), crc, 0);
+	temp = _mm512_inserti32x8(_mm512_setzero_si512(), b, 0);
+
+	if (data_len > 255) {
+		fold0 = _mm512_loadu_si512((const __m512i *)data);
+		fold1 = _mm512_loadu_si512((const __m512i *)(data+64));
+		fold2 = _mm512_loadu_si512((const __m512i *)(data+128));
+		fold3 = _mm512_loadu_si512((const __m512i *)(data+192));
+		fold0 = _mm512_xor_si512(fold0, temp);
+
+		/* Main folding loop */
+		k = params->rk1_rk2;
+		for (n = 256; (n + 256) <= data_len; n += 256) {
+			qw0 = _mm512_loadu_si512((const __m512i *)&data[n]);
+			qw1 = _mm512_loadu_si512((const __m512i *)
+					&(data[n+64]));
+			qw2 = _mm512_loadu_si512((const __m512i *)
+					&(data[n+128]));
+			qw3 = _mm512_loadu_si512((const __m512i *)
+					&(data[n+192]));
+			fold0 = crcr32_folding_round(qw0, k, fold0);
+			fold1 = crcr32_folding_round(qw1, k, fold1);
+			fold2 = crcr32_folding_round(qw2, k, fold2);
+			fold3 = crcr32_folding_round(qw3, k, fold3);
+		}
+
+		/* 256 to 128 fold */
+		k = params->rk3_rk4;
+		fold0 = crcr32_folding_round(fold2, k, fold0);
+		fold1 = crcr32_folding_round(fold3, k, fold1);
+
+		res = crc32_fold_128(fold0, fold1, params);
+
+		reduction = 240 - ((n+256)-data_len);
+
+		while (reduction > 0)
+			reduction_loop(&res, &reduction, data, &n,
+					params);
+
+		reduction += 16;
+
+		if (n != data_len)
+			res = last_two_xmm(data, data_len, n, res,
+					params);
+	} else {
+		if (data_len > 31) {
+			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+			d = _mm_loadu_si128((const __m128i *)data);
+			res = _mm_xor_si128(res, d);
+			n += 16;
+
+			reduction = 240 - ((n+256)-data_len);
+
+			while (reduction > 0)
+				reduction_loop(&res, &reduction, data, &n,
+						params);
+
+			if (n != data_len)
+				res = last_two_xmm(data, data_len, n, res,
+						params);
+		} else if (data_len > 16) {
+			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+			d = _mm_loadu_si128((const __m128i *)data);
+			res = _mm_xor_si128(res, d);
+			n += 16;
+
+			if (n != data_len)
+				res = last_two_xmm(data, data_len, n, res,
+						params);
+		} else if (data_len == 16) {
+			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+			d = _mm_loadu_si128((const __m128i *)data);
+			res = _mm_xor_si128(res, d);
+		} else {
+			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);
+			mask = byte_len_to_mask_table[data_len];
+			d = _mm_maskz_loadu_epi8(mask, data);
+			res = _mm_xor_si128(res, d);
+
+			if (data_len > 3) {
+				d = _mm_loadu_si128((const __m128i *)
+						&shf_table[data_len]);
+				res = _mm_shuffle_epi8(res, d);
+			} else if (data_len > 2) {
+				res = _mm_slli_si128(res, 5);
+				goto do_barrett_reduction;
+			} else if (data_len > 1) {
+				res = _mm_slli_si128(res, 6);
+				goto do_barrett_reduction;
+			} else if (data_len > 0) {
+				res = _mm_slli_si128(res, 7);
+				goto do_barrett_reduction;
+			} else {
+				/* zero length case */
+				return crc;
+			}
+		}
+	}
+
+	res = done_128(res, params);
+
+do_barrett_reduction:
+	n = barrett_reduction(res, params);
+
+	return n;
+}
+
+static void
+crc32_load_init_constants(void)
+{
+	__m128i a;
+	/* fold constants */
+	uint64_t c0 = 0x00000000e95c1271;
+	uint64_t c1 = 0x00000000ce3371cb;
+	uint64_t c2 = 0x00000000910eeec1;
+	uint64_t c3 = 0x0000000033fff533;
+	uint64_t c4 = 0x000000000cbec0ed;
+	uint64_t c5 = 0x0000000031f8303f;
+	uint64_t c6 = 0x0000000057c54819;
+	uint64_t c7 = 0x00000000df068dc2;
+	uint64_t c8 = 0x00000000ae0b5394;
+	uint64_t c9 = 0x000000001c279815;
+	uint64_t c10 = 0x000000001d9513d7;
+	uint64_t c11 = 0x000000008f352d95;
+	uint64_t c12 = 0x00000000af449247;
+	uint64_t c13 = 0x000000003db1ecdc;
+	uint64_t c14 = 0x0000000081256527;
+	uint64_t c15 = 0x00000000f1da05aa;
+	uint64_t c16 = 0x00000000ccaa009e;
+	uint64_t c17 = 0x00000000ae689191;
+	uint64_t c18 = 0x00000000ccaa009e;
+	uint64_t c19 = 0x00000000b8bc6765;
+	uint64_t c20 = 0x00000001f7011640;
+	uint64_t c21 = 0x00000001db710640;
+
+	a = _mm_set_epi64x(c1, c0);
+	crc32_eth.rk1_rk2 = _mm512_broadcast_i32x4(a);
+
+	a = _mm_set_epi64x(c3, c2);
+	crc32_eth.rk3_rk4 = _mm512_broadcast_i32x4(a);
+
+	crc32_eth.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
+			c9, c10, c11);
+	crc32_eth.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
+			c16, c17, 0, 0);
+	crc32_eth.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
+			_mm_cvtsi64_m64(c17));
+
+	crc32_eth.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
+			_mm_cvtsi64_m64(c19));
+	crc32_eth.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
+			_mm_cvtsi64_m64(c21));
+}
+
+static void
+crc16_load_init_constants(void)
+{
+	__m128i a;
+	/* fold constants */
+	uint64_t c0 = 0x0000000000009a19;
+	uint64_t c1 = 0x0000000000002df8;
+	uint64_t c2 = 0x00000000000068af;
+	uint64_t c3 = 0x000000000000b6c9;
+	uint64_t c4 = 0x000000000000c64f;
+	uint64_t c5 = 0x000000000000cd95;
+	uint64_t c6 = 0x000000000000d341;
+	uint64_t c7 = 0x000000000000b8f2;
+	uint64_t c8 = 0x0000000000000842;
+	uint64_t c9 = 0x000000000000b072;
+	uint64_t c10 = 0x00000000000047e3;
+	uint64_t c11 = 0x000000000000922d;
+	uint64_t c12 = 0x0000000000000e3a;
+	uint64_t c13 = 0x0000000000004d7a;
+	uint64_t c14 = 0x0000000000005b44;
+	uint64_t c15 = 0x0000000000007762;
+	uint64_t c16 = 0x00000000000081bf;
+	uint64_t c17 = 0x0000000000008e10;
+	uint64_t c18 = 0x00000000000081bf;
+	uint64_t c19 = 0x0000000000001cbb;
+	uint64_t c20 = 0x000000011c581910;
+	uint64_t c21 = 0x0000000000010810;
+
+	a = _mm_set_epi64x(c1, c0);
+	crc16_ccitt.rk1_rk2 = _mm512_broadcast_i32x4(a);
+
+	a = _mm_set_epi64x(c3, c2);
+	crc16_ccitt.rk3_rk4 = _mm512_broadcast_i32x4(a);
+
+	crc16_ccitt.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
+			c9, c10, c11);
+	crc16_ccitt.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
+			c16, c17, 0, 0);
+	crc16_ccitt.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
+			_mm_cvtsi64_m64(c17));
+
+	crc16_ccitt.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
+			_mm_cvtsi64_m64(c19));
+	crc16_ccitt.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
+			_mm_cvtsi64_m64(c21));
+}
+
+void
+rte_net_crc_avx512_init(void)
+{
+	crc32_load_init_constants();
+	crc16_load_init_constants();
+
+	/*
+	 * Reset the register as following calculation may
+	 * use other data types such as float, double, etc.
+	 */
+	_mm_empty();
+}
+
+uint32_t
+rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len)
+{
+	/* return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_vpclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt);
+}
+
+uint32_t
+rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len)
+{
+	/* return 32-bit CRC value */
+	return ~crc32_eth_calc_vpclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth);
+}
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
index 83dccbfba..fcf9cc0ef 100644
--- a/lib/librte_net/rte_net_crc.c
+++ b/lib/librte_net/rte_net_crc.c
@@ -37,6 +37,12 @@ static rte_net_crc_handler handlers_scalar[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
 };
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+static rte_net_crc_handler handlers_avx512[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler,
+};
+#endif
 #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
 static rte_net_crc_handler handlers_sse42[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
@@ -132,6 +138,19 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
 		crc32_eth_lut);
 }
 
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+static uint8_t
+avx512_vpclmulqdq_cpu_supported(void)
+{
+	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) &&
+		rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) &&
+		rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) &&
+		rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) &&
+		rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ) &&
+		rte_cpu_get_flag_enabled(RTE_CPUFLAG_VPCLMULQDQ);
+}
+#endif
+
 #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
 static uint8_t
 sse42_pclmulqdq_cpu_supported(void)
@@ -153,6 +172,14 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg)
 {
 	switch (alg) {
 #ifdef RTE_ARCH_X86_64
+	case RTE_NET_CRC_AVX512:
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+		if (avx512_vpclmulqdq_cpu_supported()) {
+			handlers = handlers_avx512;
+			break;
+		}
+#endif
+		/* fall-through */
 	case RTE_NET_CRC_SSE42:
 #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
 		if (sse42_pclmulqdq_cpu_supported()) {
@@ -206,6 +233,12 @@ RTE_INIT(rte_net_crc_init)
 		rte_net_crc_sse42_init();
 	}
 #endif
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+	if (avx512_vpclmulqdq_cpu_supported()) {
+		alg = RTE_NET_CRC_AVX512;
+		rte_net_crc_avx512_init();
+	}
+#endif
 #ifdef CC_ARM64_NEON_PMULL_SUPPORT
 	if (neon_pmull_cpu_supported()) {
 		alg = RTE_NET_CRC_NEON;
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
index 16e85ca97..72d3e10ff 100644
--- a/lib/librte_net/rte_net_crc.h
+++ b/lib/librte_net/rte_net_crc.h
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #ifndef _RTE_NET_CRC_H_
@@ -23,6 +23,7 @@ enum rte_net_crc_alg {
 	RTE_NET_CRC_SCALAR = 0,
 	RTE_NET_CRC_SSE42,
 	RTE_NET_CRC_NEON,
+	RTE_NET_CRC_AVX512,
 };
 
 /**
@@ -35,6 +36,7 @@ enum rte_net_crc_alg {
  *   - RTE_NET_CRC_SCALAR
  *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
  *   - RTE_NET_CRC_NEON (Use ARM Neon intrinsic)
+ *   - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic)
  */
 void
 rte_net_crc_set_alg(enum rte_net_crc_alg alg);
-- 
2.12.3



* Re: [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
@ 2020-10-02 15:17   ` Singh, Jasvinder
  2020-10-06 16:38     ` O'loingsigh, Mairtin
  0 siblings, 1 reply; 23+ messages in thread
From: Singh, Jasvinder @ 2020-10-02 15:17 UTC (permalink / raw)
  To: O'loingsigh, Mairtin, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Ryan, Brendan, Coyle, David



> -----Original Message-----
> From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> Sent: Tuesday, September 29, 2020 4:36 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle, David
> <david.coyle@intel.com>; O'loingsigh, Mairtin
> <mairtin.oloingsigh@intel.com>
> Subject: [PATCH v3 1/2] net: add run-time architecture specific CRC selection
> 
> This patch adds support for run-time selection of the optimal architecture-
> specific CRC path, based on the supported instruction set(s) of the CPU.
> 
> The compiler option checks have been moved from the C files to the meson
> script. The rte_cpu_get_flag_enabled function is called automatically by the
> library at process initialization time to determine which instructions the CPU
> supports, with the most optimal supported CRC path ultimately selected.
> 
> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> Signed-off-by: David Coyle <david.coyle@intel.com>
> ---
>  doc/guides/rel_notes/release_20_11.rst            |  4 ++
>  lib/librte_net/meson.build                        | 34 +++++++++++-
>  lib/librte_net/net_crc.h                          | 34 ++++++++++++
>  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 27 +++------
>  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   | 34 ++++--------
>  lib/librte_net/rte_net_crc.c                      | 67 ++++++++++++++---------
>  6 files changed, 132 insertions(+), 68 deletions(-)
>  create mode 100644 lib/librte_net/net_crc.h
>  rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
>  rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst
> b/doc/guides/rel_notes/release_20_11.rst
> index 4eb3224a7..6bd222dca 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -55,6 +55,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================

<snip>


_t *data, uint32_t data_len);
> +
> +#endif /* _NET_CRC_H_ */
> diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
> similarity index 95%
> rename from lib/librte_net/net_crc_neon.h
> rename to lib/librte_net/net_crc_neon.c
> index 63fa1d4a1..b79684ec2 100644
> --- a/lib/librte_net/net_crc_neon.h
> +++ b/lib/librte_net/net_crc_neon.c
> @@ -1,18 +1,17 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
>   * Copyright(c) 2017 Cavium, Inc
> + * Copyright(c) 2020 Intel Corporation
>   */

Could you please remove the Intel copyright, as there is no change in this file?

> -#ifndef _NET_CRC_NEON_H_
> -#define _NET_CRC_NEON_H_
> +#include <string.h>
> 
> +#include <rte_common.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_net_crc.h>
>  #include <rte_vect.h>
>  #include <rte_cpuflags.h>
> 
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> +#include "net_crc.h"
> 
>  /** PMULL CRC computation context structure */
>  struct crc_pmull_ctx {
> @@ -218,7 +217,7 @@ crc32_eth_calc_pmull(
>  	return n;
>  }
> 
> -static inline void
> +void
>  rte_net_crc_neon_init(void)
>  {
>  	/* Initialize CRC16 data */
> @@ -242,9 +241,8 @@ rte_net_crc_neon_init(void)
>  	crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8);
>  }
> 
> -static inline uint32_t
> -rte_crc16_ccitt_neon_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return (uint16_t)~crc32_eth_calc_pmull(data,
>  		data_len,
> @@ -252,18 +250,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data,
>  		&crc16_ccitt_pmull);
>  }
> 
> -static inline uint32_t
> -rte_crc32_eth_neon_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return ~crc32_eth_calc_pmull(data,
>  		data_len,
>  		0xffffffffUL,
>  		&crc32_eth_pmull);
>  }
> -
> -#ifdef __cplusplus
> -}
> -#endif
> -
> -#endif /* _NET_CRC_NEON_H_ */
> diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c
> similarity index 94%
> rename from lib/librte_net/net_crc_sse.h
> rename to lib/librte_net/net_crc_sse.c
> index 1c7b7a548..053b54b39 100644
> --- a/lib/librte_net/net_crc_sse.h
> +++ b/lib/librte_net/net_crc_sse.c
> @@ -1,18 +1,16 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
> 
> -#ifndef _RTE_NET_CRC_SSE_H_
> -#define _RTE_NET_CRC_SSE_H_
> +#include <string.h>
> 
> +#include <rte_common.h>
>  #include <rte_branch_prediction.h>
> +#include <rte_cpuflags.h>
> 
> -#include <x86intrin.h>
> -#include <cpuid.h>
> +#include "net_crc.h"
> 
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> +#include <x86intrin.h>
> 
>  /** PCLMULQDQ CRC computation context structure */
>  struct crc_pclmulqdq_ctx {
> @@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq(
>  	return n;
>  }
> 
> -
> -static inline void
> +void
>  rte_net_crc_sse42_init(void)
>  {
>  	uint64_t k1, k2, k5, k6;
> @@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void)
>  	 * use other data types such as float, double, etc.
>  	 */
>  	_mm_empty();
> -
>  }
> 
> -static inline uint32_t
> -rte_crc16_ccitt_sse42_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	/** return 16-bit CRC value */
>  	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
> @@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data,
>  		&crc16_ccitt_pclmulqdq);
>  }
> 
> -static inline uint32_t
> -rte_crc32_eth_sse42_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return ~crc32_eth_calc_pclmulqdq(data,
>  		data_len,
>  		0xffffffffUL,
>  		&crc32_eth_pclmulqdq);
>  }
> -
> -#ifdef __cplusplus
> -}
> -#endif
> -
> -#endif /* _RTE_NET_CRC_SSE_H_ */
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> index 4f5b9e828..83dccbfba 100644
> --- a/lib/librte_net/rte_net_crc.c
> +++ b/lib/librte_net/rte_net_crc.c
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
> 
>  #include <stddef.h>
> @@ -10,17 +10,7 @@
>  #include <rte_common.h>
>  #include <rte_net_crc.h>
> 
> -#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__)
> -#define X86_64_SSE42_PCLMULQDQ     1
> -#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO)
> -#define ARM64_NEON_PMULL           1
> -#endif
> -
> -#ifdef X86_64_SSE42_PCLMULQDQ
> -#include <net_crc_sse.h>
> -#elif defined ARM64_NEON_PMULL
> -#include <net_crc_neon.h>
> -#endif
> +#include "net_crc.h"
> 
>  /** CRC polynomials */
>  #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
> @@ -47,13 +37,13 @@ static rte_net_crc_handler handlers_scalar[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
>  };
> -
> -#ifdef X86_64_SSE42_PCLMULQDQ
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
>  static rte_net_crc_handler handlers_sse42[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
>  };
> -#elif defined ARM64_NEON_PMULL
> +#endif
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
>  static rte_net_crc_handler handlers_neon[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
> @@ -142,22 +132,44 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
>  		crc32_eth_lut);
>  }
> 
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +static uint8_t
> +sse42_pclmulqdq_cpu_supported(void)
> +{
> +	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
> +}
> +#endif
> +
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +static uint8_t
> +neon_pmull_cpu_supported(void)
> +{
> +	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL);
> +}
> +#endif
> +
>  void
>  rte_net_crc_set_alg(enum rte_net_crc_alg alg)
>  {
>  	switch (alg) {
> -#ifdef X86_64_SSE42_PCLMULQDQ
> +#ifdef RTE_ARCH_X86_64
>  	case RTE_NET_CRC_SSE42:
> -		handlers = handlers_sse42;
> -		break;
> -#elif defined ARM64_NEON_PMULL
> -		/* fall-through */
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +		if (sse42_pclmulqdq_cpu_supported()) {
> +			handlers = handlers_sse42;
> +			break;
> +		}
> +#endif
> +#endif /* RTE_ARCH_X86_64 */
> +#ifdef RTE_ARCH_ARM64
>  	case RTE_NET_CRC_NEON:
> -		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +		if (neon_pmull_cpu_supported()) {
>  			handlers = handlers_neon;
>  			break;
>  		}
>  #endif
> +#endif /* RTE_ARCH_ARM64 */
>  		/* fall-through */
>  	case RTE_NET_CRC_SCALAR:
>  		/* fall-through */
> @@ -188,11 +200,14 @@ RTE_INIT(rte_net_crc_init)
> 
>  	rte_net_crc_scalar_init();
> 
> -#ifdef X86_64_SSE42_PCLMULQDQ
> -	alg = RTE_NET_CRC_SSE42;
> -	rte_net_crc_sse42_init();
> -#elif defined ARM64_NEON_PMULL
> -	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +	if (sse42_pclmulqdq_cpu_supported()) {
> +		alg = RTE_NET_CRC_SSE42;
> +		rte_net_crc_sse42_init();
> +	}
> +#endif
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +	if (neon_pmull_cpu_supported()) {
>  		alg = RTE_NET_CRC_NEON;
>  		rte_net_crc_neon_init();
>  	}
> --
> 2.12.3

The patch looks good to me apart from the comment above.

 



* Re: [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
@ 2020-10-05 13:20   ` De Lara Guarch, Pablo
  2020-10-05 13:38     ` O'loingsigh, Mairtin
  0 siblings, 1 reply; 23+ messages in thread
From: De Lara Guarch, Pablo @ 2020-10-05 13:20 UTC (permalink / raw)
  To: O'loingsigh, Mairtin, Singh, Jasvinder, Richardson, Bruce
  Cc: dev, Ryan, Brendan, Coyle, David

Hi Mairtin,

> -----Original Message-----
> From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> Sent: Tuesday, September 29, 2020 4:36 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle, David
> <david.coyle@intel.com>; O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> Subject: [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
> 
> This patch enables the optimized calculation of CRC32-Ethernet and CRC16-
> CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
> implementation is built if the compiler supports the required instruction sets. It is
> selected at run-time if the host CPU, again, supports the required instruction
> sets.
> 
> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> Signed-off-by: David Coyle <david.coyle@intel.com>

...

> +static __rte_always_inline uint32_t
> +crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc,
> +	const struct crc_vpclmulqdq_ctx *params) {
> +	__m128i res, d;
> +	__m256i b;
> +	__m512i temp, k;
> +	__m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3;
> +	__m512i fold0, fold1, fold2, fold3;
> +	__mmask16 mask;
> +	uint32_t n = 0;
> +	int reduction = 0;
> +
> +	/* Get CRC init value */
> +	b = _mm256_insert_epi32(_mm256_setzero_si256(), crc, 0);
> +	temp = _mm512_inserti32x8(_mm512_setzero_si512(), b, 0);

You can replace this with the following, which produces fewer instructions
(b needs to be changed to __m128i):

        b = _mm_cvtsi32_si128(crc);
        temp = _mm512_castsi128_si512(b);

> +
> +	if (data_len > 255) {
> +		fold0 = _mm512_loadu_si512((const __m512i *)data);

...

> +	} else {
> +		if (data_len > 31) {
> +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);

Should work better with:

res = _mm_cvtsi32_si128(crc);

> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +			n += 16;
> +
> +			reduction = 240 - ((n+256)-data_len);
> +
> +			while (reduction > 0)
> +				reduction_loop(&res, &reduction, data, &n,
> +						params);
> +
> +			if (n != data_len)
> +				res = last_two_xmm(data, data_len, n, res,
> +						params);
> +		} else if (data_len > 16) {
> +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);

Same as above.

> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +			n += 16;
> +
> +			if (n != data_len)
> +				res = last_two_xmm(data, data_len, n, res,
> +						params);
> +		} else if (data_len == 16) {
> +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);

Same.

> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +		} else {
> +			res = _mm_insert_epi32(_mm_setzero_si128(), crc, 0);

Same.

> +			mask = byte_len_to_mask_table[data_len];
> +			d = _mm_maskz_loadu_epi8(mask, data);



* Re: [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
  2020-10-05 13:20   ` De Lara Guarch, Pablo
@ 2020-10-05 13:38     ` O'loingsigh, Mairtin
  0 siblings, 0 replies; 23+ messages in thread
From: O'loingsigh, Mairtin @ 2020-10-05 13:38 UTC (permalink / raw)
  To: De Lara Guarch, Pablo, Singh, Jasvinder, Richardson, Bruce
  Cc: dev, Ryan, Brendan, Coyle, David

Hi Pablo,

> -----Original Message-----
> From: De Lara Guarch, Pablo <pablo.de.lara.guarch@intel.com>
> Sent: Monday, October 5, 2020 2:20 PM
> To: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>; Singh, Jasvinder
> <jasvinder.singh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle, David
> <david.coyle@intel.com>
> Subject: RE: [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ
> based CRC
> 
> Hi Mairtin,
> 
> > -----Original Message-----
> > From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> > Sent: Tuesday, September 29, 2020 4:36 PM
> > To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>
> > Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle,
> David
> > <david.coyle@intel.com>; O'loingsigh, Mairtin
> > <mairtin.oloingsigh@intel.com>
> > Subject: [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based
> > CRC
> >
> > This patch enables the optimized calculation of CRC32-Ethernet and
> > CRC16- CCITT using the AVX512 and VPCLMULQDQ instruction sets. This
> > CRC implementation is built if the compiler supports the required
> > instruction sets. It is selected at run-time if the host CPU, again,
> > supports the required instruction sets.
> >
> > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> > Signed-off-by: David Coyle <david.coyle@intel.com>
> 
> ...
> 
> > +static __rte_always_inline uint32_t
> > +crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len,
> > +uint32_t
> > crc,
> > +	const struct crc_vpclmulqdq_ctx *params) {
> > +	__m128i res, d;
> > +	__m256i b;
> > +	__m512i temp, k;
> > +	__m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3;
> > +	__m512i fold0, fold1, fold2, fold3;
> > +	__mmask16 mask;
> > +	uint32_t n = 0;
> > +	int reduction = 0;
> > +
> > +	/* Get CRC init value */
> > +	b = _mm256_insert_epi32(_mm256_setzero_si256(), crc, 0);
> > +	temp = _mm512_inserti32x8(_mm512_setzero_si512(), b, 0);
> 
> You can replace this with the following, which produces less instructions (b
> needs to be changed to __m128i):
> 
>         b = _mm_cvtsi32_si128(crc);
>         temp = _mm512_castsi128_si512(b);
> 
> > +
> > +	if (data_len > 255) {
> > +		fold0 = _mm512_loadu_si512((const __m512i *)data);
> 
> ...
> 
> > +	} else {
> > +		if (data_len > 31) {
> > +			res = _mm_insert_epi32(_mm_setzero_si128(), crc,
> 0);
> 
> Should work better with:
> 
> res = _mm_cvtsi32_si128(crc);
> 
> > +			d = _mm_loadu_si128((const __m128i *)data);
> > +			res = _mm_xor_si128(res, d);
> > +			n += 16;
> > +
> > +			reduction = 240 - ((n+256)-data_len);
> > +
> > +			while (reduction > 0)
> > +				reduction_loop(&res, &reduction, data, &n,
> > +						params);
> > +
> > +			if (n != data_len)
> > +				res = last_two_xmm(data, data_len, n, res,
> > +						params);
> > +		} else if (data_len > 16) {
> > +			res = _mm_insert_epi32(_mm_setzero_si128(), crc,
> 0);
> 
> Same as above.
> 
> > +			d = _mm_loadu_si128((const __m128i *)data);
> > +			res = _mm_xor_si128(res, d);
> > +			n += 16;
> > +
> > +			if (n != data_len)
> > +				res = last_two_xmm(data, data_len, n, res,
> > +						params);
> > +		} else if (data_len == 16) {
> > +			res = _mm_insert_epi32(_mm_setzero_si128(), crc,
> 0);
> 
> Same.
> 
> > +			d = _mm_loadu_si128((const __m128i *)data);
> > +			res = _mm_xor_si128(res, d);
> > +		} else {
> > +			res = _mm_insert_epi32(_mm_setzero_si128(), crc,
> 0);
> 
> Same.
> 
> > +			mask = byte_len_to_mask_table[data_len];
> > +			d = _mm_maskz_loadu_epi8(mask, data);

Thanks for the feedback.
I'll make these changes and submit a v4 patch.

Regards,
Mairtin


* [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC
  2020-09-29 15:35 [dpdk-dev] [PATCH v3 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
  2020-09-29 15:36 ` [dpdk-dev] [PATCH v3 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
@ 2020-10-06 16:23 ` Mairtin o Loingsigh
  2020-10-06 16:23   ` [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
                     ` (3 more replies)
  2 siblings, 4 replies; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-10-06 16:23 UTC (permalink / raw)
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle

This patchset makes two significant enhancements to the CRC modules of
the rte_net library:

1) Adds run-time selection of the optimal architecture-specific CRC path.
   Previously the selection was solely made at compile-time, meaning it
   could only be built and run on the same generation of CPU. Adding
   run-time selection ability means this can be used from distro packages
   and/or DPDK can be compiled on an older CPU and run on a newer CPU.
2) Adds an optimized CRC implementation based on the AVX512 and
   VPCLMULQDQ instruction sets.
   
For further details, please see the commit messages of the individual
patches.
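
As a point of reference for the optimized paths, the two CRCs can be
written as a slow bit-at-a-time routine. This is only a sketch and not
code from the patchset: the function names are hypothetical, and the
reflected bit ordering and the 0x1021 CRC16-CCITT polynomial are
assumptions that should be checked against the scalar handlers in
rte_net_crc.c. The initial values (0xffffffff and 0xffff) and the final
inversion follow the handlers in the patches.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Bit-at-a-time CRC over reflected data. poly_reflected is the
 * bit-reversed generator polynomial (assumption: reflected ordering).
 */
uint32_t
ref_crc_reflected(const uint8_t *data, size_t len, uint32_t crc,
		uint32_t poly_reflected)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ ((crc & 1) ? poly_reflected : 0);
	}
	return crc;
}

/* CRC32-Ethernet: polynomial 0x04c11db7, i.e. 0xedb88320 reflected. */
uint32_t
ref_crc32_eth(const uint8_t *data, size_t len)
{
	return ~ref_crc_reflected(data, len, 0xffffffffu, 0xedb88320u);
}

/* CRC16-CCITT: polynomial 0x1021 (assumed), i.e. 0x8408 reflected. */
uint32_t
ref_crc16_ccitt(const uint8_t *data, size_t len)
{
	return (uint16_t)~ref_crc_reflected(data, len, 0xffffu, 0x8408u);
}

/* Check against the published check values for the ASCII string "123456789". */
void
ref_crc_self_check(void)
{
	static const uint8_t msg[] = "123456789";

	assert(ref_crc32_eth(msg, 9) == 0xcbf43926u);
	assert(ref_crc16_ccitt(msg, 9) == 0x906eu);
}
```

A reference like this is mainly useful for cross-checking a vectorized
handler over buffers of varying length, since the short-length branches
are where such implementations are most likely to differ.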

v4:
* Fixed build issue when older version of meson is used (0.47.1)
* Addressed review comments
  * remove Intel copyright header from neon CRC file
  * tidy-up of register initialisation

v3:
* Re-submitted v2, as problems were encountered when originally submitting it.

v2:
* Added support for run-time selection of optimal architecture-specific
  CRC, based on v1 review comment.
* Added full working AVX512/VPCLMULQDQ support for CRC32-Ethernet and
  CRC16-CCITT.

v1:
* Initial version, with incomplete AVX512/VPCLMULQDQ support for
  CRC32-Ethernet only.

Mairtin o Loingsigh (2):
  net: add run-time architecture specific CRC selection
  net: add support for AVX512/VPCLMULQDQ based CRC

 app/test/test_crc.c                               |  11 +-
 config/x86/meson.build                            |   6 +-
 doc/guides/rel_notes/release_20_11.rst            |   6 +
 lib/librte_net/meson.build                        |  89 ++++-
 lib/librte_net/net_crc.h                          |  45 +++
 lib/librte_net/net_crc_avx512.c                   | 423 ++++++++++++++++++++++
 lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  26 +-
 lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 +-
 lib/librte_net/rte_net_crc.c                      | 100 +++--
 lib/librte_net/rte_net_crc.h                      |   4 +-
 10 files changed, 672 insertions(+), 72 deletions(-)
 create mode 100644 lib/librte_net/net_crc.h
 create mode 100644 lib/librte_net/net_crc_avx512.c
 rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
 rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)

-- 
2.12.3



* [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection
  2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh
@ 2020-10-06 16:23   ` Mairtin o Loingsigh
  2020-10-07 14:59     ` Ananyev, Konstantin
  2020-10-06 16:23   ` [dpdk-dev] [PATCH v4 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-10-06 16:23 UTC (permalink / raw)
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle

This patch adds support for run-time selection of the optimal
architecture-specific CRC path, based on the supported instruction set(s)
of the CPU.

The compiler option checks have been moved from the C files to the meson
script. The rte_cpu_get_flag_enabled function is called automatically by
the library at process initialization time to determine which
instructions the CPU supports, with the most optimal supported CRC path
ultimately selected.
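
The selection logic can be sketched in isolation as below. This is an
illustration under assumptions, not the library code: struct cpu_caps
stands in for the results of rte_cpu_get_flag_enabled(), the enum and
function names are hypothetical, and the handlers are stubs that return
a tag instead of computing a CRC. It shows the intended behaviour: a
requested path is used only if the CPU reports support for it, otherwise
the choice falls through to the scalar path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names; the library's own enum is rte_net_crc_alg. */
enum crc_alg { CRC_ALG_SCALAR = 0, CRC_ALG_SSE42, CRC_ALG_NEON };

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t data_len);

/* Stand-in for the CPU flags queried at run time (assumption). */
struct cpu_caps {
	bool pclmulqdq;	/* x86 carry-less multiply */
	bool pmull;	/* Arm polynomial multiply */
};

/* Stub handlers: each returns its own tag so the choice is observable. */
static uint32_t
scalar_stub(const uint8_t *data, uint32_t data_len)
{
	(void)data; (void)data_len;
	return (uint32_t)CRC_ALG_SCALAR;
}

static uint32_t
sse42_stub(const uint8_t *data, uint32_t data_len)
{
	(void)data; (void)data_len;
	return (uint32_t)CRC_ALG_SSE42;
}

static uint32_t
neon_stub(const uint8_t *data, uint32_t data_len)
{
	(void)data; (void)data_len;
	return (uint32_t)CRC_ALG_NEON;
}

/*
 * Pick a handler for the requested algorithm. Unlike the real code,
 * there are no per-architecture #ifdefs here, so both cases are always
 * compiled in and the capability flags alone decide the outcome.
 */
crc_handler
select_crc_handler(enum crc_alg requested, const struct cpu_caps *caps)
{
	switch (requested) {
	case CRC_ALG_SSE42:
		if (caps->pclmulqdq)
			return sse42_stub;
		/* fall-through */
	case CRC_ALG_NEON:
		if (caps->pmull)
			return neon_stub;
		/* fall-through */
	case CRC_ALG_SCALAR:
	default:
		return scalar_stub;
	}
}

/* Default chosen at init time: the best path the CPU reports. */
enum crc_alg
default_crc_alg(const struct cpu_caps *caps)
{
	if (caps->pclmulqdq)
		return CRC_ALG_SSE42;
	if (caps->pmull)
		return CRC_ALG_NEON;
	return CRC_ALG_SCALAR;
}

void
select_crc_self_check(void)
{
	const struct cpu_caps none = { false, false };
	const struct cpu_caps x86 = { true, false };

	assert(select_crc_handler(CRC_ALG_SSE42, &x86)(NULL, 0) ==
			(uint32_t)CRC_ALG_SSE42);
	assert(select_crc_handler(CRC_ALG_SSE42, &none)(NULL, 0) ==
			(uint32_t)CRC_ALG_SCALAR);
	assert(default_crc_alg(&none) == CRC_ALG_SCALAR);
}
```

Keeping the capability check inside the selection function means a
caller can request any algorithm on any machine and still end up with a
handler that is safe to execute.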

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
---
 doc/guides/rel_notes/release_20_11.rst            |  4 ++
 lib/librte_net/meson.build                        | 34 +++++++++++-
 lib/librte_net/net_crc.h                          | 34 ++++++++++++
 lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +++------
 lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   | 34 ++++--------
 lib/librte_net/rte_net_crc.c                      | 67 ++++++++++++++---------
 6 files changed, 131 insertions(+), 68 deletions(-)
 create mode 100644 lib/librte_net/net_crc.h
 rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
 rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)

diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index ca5ec7391..0f14e087d 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Updated CRC modules of rte_net library.**
+
+  * Added run-time selection of the optimal architecture-specific CRC path.
+
 * **Updated Broadcom bnxt driver.**
 
   Updated the Broadcom bnxt driver with new features and improvements, including:
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index 24ed8253b..fa439b9e5 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2017 Intel Corporation
+# Copyright(c) 2017-2020 Intel Corporation
 
 headers = files('rte_ip.h',
 	'rte_tcp.h',
@@ -20,3 +20,35 @@ headers = files('rte_ip.h',
 
 sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c')
 deps += ['mbuf']
+
+if dpdk_conf.has('RTE_ARCH_X86_64')
+	net_crc_sse42_cpu_support = (
+		cc.get_define('__PCLMUL__', args: machine_args) != '')
+	net_crc_sse42_cc_support = (
+		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
+
+	build_static_net_crc_sse42_lib = 0
+
+	if net_crc_sse42_cpu_support == true
+		sources += files('net_crc_sse.c')
+		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+	elif net_crc_sse42_cc_support == true
+		build_static_net_crc_sse42_lib = 1
+		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
+		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+	endif
+
+	if build_static_net_crc_sse42_lib == 1
+		net_crc_sse42_lib = static_library(
+					'net_crc_sse42_lib',
+					'net_crc_sse.c',
+					dependencies: static_rte_eal,
+					c_args: [cflags,
+						net_crc_sse42_lib_cflags])
+		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
+	endif
+elif (dpdk_conf.has('RTE_ARCH_ARM64') and
+		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
+	sources += files('net_crc_neon.c')
+	cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
+endif
diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
new file mode 100644
index 000000000..a1578a56c
--- /dev/null
+++ b/lib/librte_net/net_crc.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#ifndef _NET_CRC_H_
+#define _NET_CRC_H_
+
+/*
+ * Different implementations of CRC
+ */
+
+/* SSE4.2 */
+
+void
+rte_net_crc_sse42_init(void);
+
+uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
+
+/* NEON */
+
+void
+rte_net_crc_neon_init(void);
+
+uint32_t
+rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len);
+
+#endif /* _NET_CRC_H_ */
diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
similarity index 95%
rename from lib/librte_net/net_crc_neon.h
rename to lib/librte_net/net_crc_neon.c
index 63fa1d4a1..f61d75a8c 100644
--- a/lib/librte_net/net_crc_neon.h
+++ b/lib/librte_net/net_crc_neon.c
@@ -2,17 +2,15 @@
  * Copyright(c) 2017 Cavium, Inc
  */
 
-#ifndef _NET_CRC_NEON_H_
-#define _NET_CRC_NEON_H_
+#include <string.h>
 
+#include <rte_common.h>
 #include <rte_branch_prediction.h>
 #include <rte_net_crc.h>
 #include <rte_vect.h>
 #include <rte_cpuflags.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+#include "net_crc.h"
 
 /** PMULL CRC computation context structure */
 struct crc_pmull_ctx {
@@ -218,7 +216,7 @@ crc32_eth_calc_pmull(
 	return n;
 }
 
-static inline void
+void
 rte_net_crc_neon_init(void)
 {
 	/* Initialize CRC16 data */
@@ -242,9 +240,8 @@ rte_net_crc_neon_init(void)
 	crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8);
 }
 
-static inline uint32_t
-rte_crc16_ccitt_neon_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len)
 {
 	return (uint16_t)~crc32_eth_calc_pmull(data,
 		data_len,
@@ -252,18 +249,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data,
 		&crc16_ccitt_pmull);
 }
 
-static inline uint32_t
-rte_crc32_eth_neon_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len)
 {
 	return ~crc32_eth_calc_pmull(data,
 		data_len,
 		0xffffffffUL,
 		&crc32_eth_pmull);
 }
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _NET_CRC_NEON_H_ */
diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c
similarity index 94%
rename from lib/librte_net/net_crc_sse.h
rename to lib/librte_net/net_crc_sse.c
index 1c7b7a548..053b54b39 100644
--- a/lib/librte_net/net_crc_sse.h
+++ b/lib/librte_net/net_crc_sse.c
@@ -1,18 +1,16 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
-#ifndef _RTE_NET_CRC_SSE_H_
-#define _RTE_NET_CRC_SSE_H_
+#include <string.h>
 
+#include <rte_common.h>
 #include <rte_branch_prediction.h>
+#include <rte_cpuflags.h>
 
-#include <x86intrin.h>
-#include <cpuid.h>
+#include "net_crc.h"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+#include <x86intrin.h>
 
 /** PCLMULQDQ CRC computation context structure */
 struct crc_pclmulqdq_ctx {
@@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq(
 	return n;
 }
 
-
-static inline void
+void
 rte_net_crc_sse42_init(void)
 {
 	uint64_t k1, k2, k5, k6;
@@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void)
 	 * use other data types such as float, double, etc.
 	 */
 	_mm_empty();
-
 }
 
-static inline uint32_t
-rte_crc16_ccitt_sse42_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
 {
 	/** return 16-bit CRC value */
 	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
@@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data,
 		&crc16_ccitt_pclmulqdq);
 }
 
-static inline uint32_t
-rte_crc32_eth_sse42_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
 {
 	return ~crc32_eth_calc_pclmulqdq(data,
 		data_len,
 		0xffffffffUL,
 		&crc32_eth_pclmulqdq);
 }
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
index 4f5b9e828..83dccbfba 100644
--- a/lib/librte_net/rte_net_crc.c
+++ b/lib/librte_net/rte_net_crc.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #include <stddef.h>
@@ -10,17 +10,7 @@
 #include <rte_common.h>
 #include <rte_net_crc.h>
 
-#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__)
-#define X86_64_SSE42_PCLMULQDQ     1
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO)
-#define ARM64_NEON_PMULL           1
-#endif
-
-#ifdef X86_64_SSE42_PCLMULQDQ
-#include <net_crc_sse.h>
-#elif defined ARM64_NEON_PMULL
-#include <net_crc_neon.h>
-#endif
+#include "net_crc.h"
 
 /** CRC polynomials */
 #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
@@ -47,13 +37,13 @@ static rte_net_crc_handler handlers_scalar[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
 };
-
-#ifdef X86_64_SSE42_PCLMULQDQ
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
 static rte_net_crc_handler handlers_sse42[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
 };
-#elif defined ARM64_NEON_PMULL
+#endif
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
 static rte_net_crc_handler handlers_neon[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
@@ -142,22 +132,44 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
 		crc32_eth_lut);
 }
 
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+static uint8_t
+sse42_pclmulqdq_cpu_supported(void)
+{
+	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
+}
+#endif
+
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+static uint8_t
+neon_pmull_cpu_supported(void)
+{
+	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL);
+}
+#endif
+
 void
 rte_net_crc_set_alg(enum rte_net_crc_alg alg)
 {
 	switch (alg) {
-#ifdef X86_64_SSE42_PCLMULQDQ
+#ifdef RTE_ARCH_X86_64
 	case RTE_NET_CRC_SSE42:
-		handlers = handlers_sse42;
-		break;
-#elif defined ARM64_NEON_PMULL
-		/* fall-through */
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+		if (sse42_pclmulqdq_cpu_supported()) {
+			handlers = handlers_sse42;
+			break;
+		}
+#endif
+#endif /* RTE_ARCH_X86_64 */
+#ifdef RTE_ARCH_ARM64
 	case RTE_NET_CRC_NEON:
-		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+		if (neon_pmull_cpu_supported()) {
 			handlers = handlers_neon;
 			break;
 		}
 #endif
+#endif /* RTE_ARCH_ARM64 */
 		/* fall-through */
 	case RTE_NET_CRC_SCALAR:
 		/* fall-through */
@@ -188,11 +200,14 @@ RTE_INIT(rte_net_crc_init)
 
 	rte_net_crc_scalar_init();
 
-#ifdef X86_64_SSE42_PCLMULQDQ
-	alg = RTE_NET_CRC_SSE42;
-	rte_net_crc_sse42_init();
-#elif defined ARM64_NEON_PMULL
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+	if (sse42_pclmulqdq_cpu_supported()) {
+		alg = RTE_NET_CRC_SSE42;
+		rte_net_crc_sse42_init();
+	}
+#endif
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+	if (neon_pmull_cpu_supported()) {
 		alg = RTE_NET_CRC_NEON;
 		rte_net_crc_neon_init();
 	}
-- 
2.12.3
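
For readers following the selection logic in rte_net_crc_set_alg() above,
the pattern can be reduced to a small stand-alone sketch. Everything
prefixed with demo_/DEMO_ is hypothetical and not part of DPDK: the
DEMO_CC_SIMD_SUPPORT macro plays the role of the build-time
CC_*_SUPPORT defines, and the demo_cpu_has_simd variable stands in for
the rte_cpu_get_flag_enabled() query. The handlers return a tag rather
than a CRC so the chosen path is visible.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: set by the build system when the compiler can emit the
 * SIMD path (the role CC_X86_64_SSE42_PCLMULQDQ_SUPPORT has in the patch). */
#define DEMO_CC_SIMD_SUPPORT 1

typedef uint32_t (*demo_crc_handler)(const uint8_t *data, uint32_t len);

enum demo_crc_alg { DEMO_CRC_SCALAR = 0, DEMO_CRC_SIMD };

/* Hypothetical handlers: each returns a tag identifying the path taken. */
static uint32_t
demo_scalar(const uint8_t *data, uint32_t len)
{
	(void)data;
	(void)len;
	return 1;
}

static uint32_t
demo_simd(const uint8_t *data, uint32_t len)
{
	(void)data;
	(void)len;
	return 2;
}

/* Hypothetical run-time probe result (a CPU flag query in real code). */
static int demo_cpu_has_simd;

static demo_crc_handler demo_active = demo_scalar;

static void
demo_crc_set_alg(enum demo_crc_alg alg)
{
	switch (alg) {
	case DEMO_CRC_SIMD:
#ifdef DEMO_CC_SIMD_SUPPORT
		if (demo_cpu_has_simd) {
			demo_active = demo_simd;
			break;
		}
#endif
		/* fall-through: not built, or CPU lacks the instructions */
	case DEMO_CRC_SCALAR:
	default:
		demo_active = demo_scalar;
		break;
	}
}

/* Requesting SIMD falls back to scalar unless the CPU probe succeeds. */
void
demo_self_check(void)
{
	demo_cpu_has_simd = 0;
	demo_crc_set_alg(DEMO_CRC_SIMD);
	assert(demo_active(NULL, 0) == 1);

	demo_cpu_has_simd = 1;
	demo_crc_set_alg(DEMO_CRC_SIMD);
	assert(demo_active(NULL, 0) == 2);
}
```

The point of the fall-through is that a request for an unavailable
algorithm degrades to the scalar handlers instead of failing.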



* [dpdk-dev] [PATCH v4 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
  2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh
  2020-10-06 16:23   ` [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
@ 2020-10-06 16:23   ` Mairtin o Loingsigh
  2020-10-07  9:26   ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " David Marchand
  2020-10-09 13:50   ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh
  3 siblings, 0 replies; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-10-06 16:23 UTC (permalink / raw)
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch
  Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle

This patch enables the optimized calculation of CRC32-Ethernet and
CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
implementation is built if the compiler supports the required instruction
sets. It is selected at run-time if the host CPU also supports them.

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
---
 app/test/test_crc.c                    |  11 +-
 config/x86/meson.build                 |   6 +-
 doc/guides/rel_notes/release_20_11.rst |   2 +
 lib/librte_net/meson.build             |  55 +++++
 lib/librte_net/net_crc.h               |  11 +
 lib/librte_net/net_crc_avx512.c        | 423 +++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_crc.c           |  33 +++
 lib/librte_net/rte_net_crc.h           |   4 +-
 8 files changed, 541 insertions(+), 4 deletions(-)
 create mode 100644 lib/librte_net/net_crc_avx512.c
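
When checking a vectorized handler such as the one added here, a
bit-at-a-time reference can be a useful cross-check. The following is a
minimal sketch, not code from this patch: the ref_ names are hypothetical,
and it assumes the usual CRC32-Ethernet parameters (reflected polynomial
0x04c11db7, initial value 0xffffffff, final inversion), which match the
initial value and inversion used by the handlers in this series.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit-reflected form of the CRC32-Ethernet polynomial 0x04c11db7. */
#define REF_CRC32_ETH_POLY_REFLECTED UINT32_C(0xedb88320)

/* Bitwise (slow) CRC32-Ethernet: init 0xffffffff, final inversion. */
uint32_t
ref_crc32_eth(const uint8_t *data, uint32_t data_len)
{
	uint32_t crc = UINT32_C(0xffffffff);
	uint32_t i;
	int bit;

	for (i = 0; i < data_len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++) {
			if (crc & 1)
				crc = (crc >> 1) ^ REF_CRC32_ETH_POLY_REFLECTED;
			else
				crc >>= 1;
		}
	}

	return ~crc;
}

/* The standard CRC-32 check value for the ASCII string "123456789". */
void
ref_crc32_eth_self_check(void)
{
	static const uint8_t msg[] = "123456789";

	assert(ref_crc32_eth(msg, 9) == UINT32_C(0xcbf43926));
}
```

Comparing such a reference against an optimized handler over a range of
lengths (for example 0 to a few hundred bytes) exercises each of the
length-dependent branches in the vectorized code.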

diff --git a/app/test/test_crc.c b/app/test/test_crc.c
index f8a74e04e..bf1d34435 100644
--- a/app/test/test_crc.c
+++ b/app/test/test_crc.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #include "test.h"
@@ -149,6 +149,15 @@ test_crc(void)
 		return ret;
 	}
 
+	/* set CRC avx512 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_AVX512);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test crc (x86_64 AVX512): failed (%d)\n", ret);
+		return ret;
+	}
+
 	/* set CRC neon mode */
 	rte_net_crc_set_alg(RTE_NET_CRC_NEON);
 
diff --git a/config/x86/meson.build b/config/x86/meson.build
index fea4d5403..172b72b72 100644
--- a/config/x86/meson.build
+++ b/config/x86/meson.build
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2017-2019 Intel Corporation
+# Copyright(c) 2017-2020 Intel Corporation
 
 # get binutils version for the workaround of Bug 97
 if not is_windows
@@ -23,7 +23,9 @@ endforeach
 
 optional_flags = ['AES', 'PCLMUL',
 		'AVX', 'AVX2', 'AVX512F',
-		'RDRND', 'RDSEED']
+		'RDRND', 'RDSEED',
+		'AVX512BW', 'AVX512DQ',
+		'AVX512VL', 'VPCLMULQDQ']
 foreach f:optional_flags
 	if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
 		if f == 'PCLMUL' # special case flags with different defines
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 0f14e087d..af6e5e40c 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -58,6 +58,8 @@ New Features
 * **Updated CRC modules of rte_net library.**
 
   * Added run-time selection of the optimal architecture-specific CRC path.
+  * Added optimized implementations of CRC32-Ethernet and CRC16-CCITT
+    using the AVX512 and VPCLMULQDQ instruction sets.
 
 * **Updated Broadcom bnxt driver.**
 
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index fa439b9e5..6c96b361a 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -24,18 +24,62 @@ deps += ['mbuf']
 if dpdk_conf.has('RTE_ARCH_X86_64')
 	net_crc_sse42_cpu_support = (
 		cc.get_define('__PCLMUL__', args: machine_args) != '')
+	net_crc_avx512_cpu_support = (
+		cc.get_define('__AVX512F__', args: machine_args) != '' and
+		cc.get_define('__AVX512BW__', args: machine_args) != '' and
+		cc.get_define('__AVX512DQ__', args: machine_args) != '' and
+		cc.get_define('__AVX512VL__', args: machine_args) != '' and
+		cc.get_define('__VPCLMULQDQ__', args: machine_args) != '')
+
 	net_crc_sse42_cc_support = (
 		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
+	net_crc_avx512_cc_support = (
+		not machine_args.contains('-mno-avx512f') and
+		cc.has_argument('-mavx512f') and
+		cc.has_argument('-mavx512bw') and
+		cc.has_argument('-mavx512dq') and
+		cc.has_argument('-mavx512vl') and
+		cc.has_argument('-mvpclmulqdq') and
+		cc.has_argument('-mavx2') and
+		cc.has_argument('-mavx'))
 
 	build_static_net_crc_sse42_lib = 0
+	build_static_net_crc_avx512_lib = 0
 
 	if net_crc_sse42_cpu_support == true
 		sources += files('net_crc_sse.c')
 		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+		if net_crc_avx512_cpu_support == true
+			sources += files('net_crc_avx512.c')
+			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
+		elif net_crc_avx512_cc_support == true
+			build_static_net_crc_avx512_lib = 1
+			net_crc_avx512_lib_cflags = ['-mavx512f',
+							'-mavx512bw',
+							'-mavx512dq',
+							'-mavx512vl',
+							'-mvpclmulqdq',
+							'-mavx2',
+							'-mavx']
+			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
+		endif
 	elif net_crc_sse42_cc_support == true
 		build_static_net_crc_sse42_lib = 1
 		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
 		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+		if net_crc_avx512_cc_support == true
+			build_static_net_crc_avx512_lib = 1
+			net_crc_avx512_lib_cflags = ['-mpclmul',
+							'-maes',
+							'-mavx512f',
+							'-mavx512bw',
+							'-mavx512dq',
+							'-mavx512vl',
+							'-mvpclmulqdq',
+							'-mavx2',
+							'-mavx']
+			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
+		endif
 	endif
 
 	if build_static_net_crc_sse42_lib == 1
@@ -47,6 +91,17 @@ if dpdk_conf.has('RTE_ARCH_X86_64')
 						net_crc_sse42_lib_cflags])
 		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
 	endif
+
+	if build_static_net_crc_avx512_lib == 1
+		net_crc_avx512_lib = static_library(
+					'net_crc_avx512_lib',
+					'net_crc_avx512.c',
+					dependencies: static_rte_eal,
+					c_args: [cflags,
+						net_crc_avx512_lib_cflags])
+		objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c')
+	endif
+
 elif (dpdk_conf.has('RTE_ARCH_ARM64') and
 		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
 	sources += files('net_crc_neon.c')
diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
index a1578a56c..7a74d5406 100644
--- a/lib/librte_net/net_crc.h
+++ b/lib/librte_net/net_crc.h
@@ -20,6 +20,17 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
 uint32_t
 rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
 
+/* AVX512 */
+
+void
+rte_net_crc_avx512_init(void);
+
+uint32_t
+rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len);
+
 /* NEON */
 
 void
diff --git a/lib/librte_net/net_crc_avx512.c b/lib/librte_net/net_crc_avx512.c
new file mode 100644
index 000000000..3740fe3c9
--- /dev/null
+++ b/lib/librte_net/net_crc_avx512.c
@@ -0,0 +1,423 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_branch_prediction.h>
+#include <rte_cpuflags.h>
+
+#include "net_crc.h"
+
+#include <x86intrin.h>
+
+/* VPCLMULQDQ CRC computation context structure */
+struct crc_vpclmulqdq_ctx {
+	__m512i rk1_rk2;
+	__m512i rk3_rk4;
+	__m512i fold_7x128b;
+	__m512i fold_3x128b;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+	__m128i fold_1x128b;
+};
+
+static struct crc_vpclmulqdq_ctx crc32_eth __rte_aligned(64);
+static struct crc_vpclmulqdq_ctx crc16_ccitt __rte_aligned(64);
+
+static uint16_t byte_len_to_mask_table[] = {
+	0x0000, 0x0001, 0x0003, 0x0007,
+	0x000f, 0x001f, 0x003f, 0x007f,
+	0x00ff, 0x01ff, 0x03ff, 0x07ff,
+	0x0fff, 0x1fff, 0x3fff, 0x7fff,
+	0xffff};
+
+static const uint8_t shf_table[32] __rte_aligned(16) = {
+	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+};
+
+static const uint32_t mask[4] __rte_aligned(16) = {
+	0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+};
+
+static const uint32_t mask2[4] __rte_aligned(16) = {
+	0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+};
+
+static __rte_always_inline __m512i
+crcr32_folding_round(__m512i data_block, __m512i precomp, __m512i fold)
+{
+	__m512i tmp0, tmp1;
+
+	tmp0 = _mm512_clmulepi64_epi128(fold, precomp, 0x01);
+	tmp1 = _mm512_clmulepi64_epi128(fold, precomp, 0x10);
+
+	return _mm512_ternarylogic_epi64(tmp0, tmp1, data_block, 0x96);
+}
+
+static __rte_always_inline __m128i
+crc32_fold_128(__m512i fold0, __m512i fold1,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i res, res2;
+	__m256i a;
+	__m512i tmp0, tmp1, tmp2, tmp3;
+	__m512i tmp4;
+
+	tmp0 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x01);
+	tmp1 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x10);
+
+	res = _mm512_extracti64x2_epi64(fold1, 3);
+	tmp4 = _mm512_maskz_broadcast_i32x4(0xF, res);
+
+	tmp2 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x01);
+	tmp3 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x10);
+
+	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp1, tmp2, 0x96);
+	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp3, tmp4, 0x96);
+
+	tmp1 = _mm512_shuffle_i64x2(tmp0, tmp0, 0x4e);
+
+	a = _mm256_xor_si256(*(__m256i *)&tmp1, *(__m256i *)&tmp0);
+	res = _mm256_extracti64x2_epi64(a, 1);
+	res2 = _mm_xor_si128(res, *(__m128i *)&a);
+
+	return res2;
+}
+
+static __rte_always_inline __m128i
+last_two_xmm(const uint8_t *data, uint32_t data_len, uint32_t n, __m128i res,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	uint32_t offset;
+	__m128i res2, res3, res4, pshufb_shf;
+
+	const uint32_t mask3[4] __rte_aligned(16) = {
+		   0x80808080, 0x80808080, 0x80808080, 0x80808080
+	};
+
+	res2 = res;
+	offset = data_len - n;
+	res3 = _mm_loadu_si128((const __m128i *)&data[n+offset-16]);
+
+	pshufb_shf = _mm_loadu_si128((const __m128i *)
+			(shf_table + (data_len-n)));
+
+	res = _mm_shuffle_epi8(res, pshufb_shf);
+	pshufb_shf = _mm_xor_si128(pshufb_shf,
+			_mm_load_si128((const __m128i *) mask3));
+	res2 = _mm_shuffle_epi8(res2, pshufb_shf);
+
+	res2 = _mm_blendv_epi8(res2, res3, pshufb_shf);
+
+	res4 = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x01);
+	res = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x10);
+	res = _mm_ternarylogic_epi64(res, res2, res4, 0x96);
+
+	return res;
+}
+
+static __rte_always_inline __m128i
+done_128(__m128i res, const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i res1;
+
+	res1 = res;
+
+	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x0);
+	res1 = _mm_srli_si128(res1, 8);
+	res = _mm_xor_si128(res, res1);
+
+	res1 = res;
+	res = _mm_slli_si128(res, 4);
+	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x10);
+	res = _mm_xor_si128(res, res1);
+
+	return res;
+}
+
+static __rte_always_inline uint32_t
+barrett_reduction(__m128i data64, const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i tmp0, tmp1;
+
+	data64 =  _mm_and_si128(data64, *(const __m128i *)mask2);
+	tmp0 = data64;
+	tmp1 = data64;
+
+	data64 = _mm_clmulepi64_si128(tmp0, params->rk7_rk8, 0x0);
+	data64 = _mm_ternarylogic_epi64(data64, tmp1, *(const __m128i *)mask,
+			0x28);
+
+	tmp1 = data64;
+	data64 = _mm_clmulepi64_si128(data64, params->rk7_rk8, 0x10);
+	data64 = _mm_ternarylogic_epi64(data64, tmp1, tmp0, 0x96);
+
+	return _mm_extract_epi32(data64, 2);
+}
+
+static __rte_always_inline void
+reduction_loop(__m128i *fold, int *len, const uint8_t *data, uint32_t *n,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i tmp, tmp1;
+
+	tmp = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x1);
+	*fold = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x10);
+	*fold = _mm_xor_si128(*fold, tmp);
+	tmp1 = _mm_loadu_si128((const __m128i *)&data[*n]);
+	*fold = _mm_xor_si128(*fold, tmp1);
+	*n += 16;
+	*len -= 16;
+}
+
+static __rte_always_inline uint32_t
+crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i res, d, b;
+	__m512i temp, k;
+	__m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3;
+	__m512i fold0, fold1, fold2, fold3;
+	__mmask16 mask;
+	uint32_t n = 0;
+	int reduction = 0;
+
+	/* Get CRC init value */
+	b = _mm_cvtsi32_si128(crc);
+	temp = _mm512_castsi128_si512(b);
+
+	if (data_len > 255) {
+		fold0 = _mm512_loadu_si512((const __m512i *)data);
+		fold1 = _mm512_loadu_si512((const __m512i *)(data+64));
+		fold2 = _mm512_loadu_si512((const __m512i *)(data+128));
+		fold3 = _mm512_loadu_si512((const __m512i *)(data+192));
+		fold0 = _mm512_xor_si512(fold0, temp);
+
+		/* Main folding loop */
+		k = params->rk1_rk2;
+		for (n = 256; (n + 256) <= data_len; n += 256) {
+			qw0 = _mm512_loadu_si512((const __m512i *)&data[n]);
+			qw1 = _mm512_loadu_si512((const __m512i *)
+					&(data[n+64]));
+			qw2 = _mm512_loadu_si512((const __m512i *)
+					&(data[n+128]));
+			qw3 = _mm512_loadu_si512((const __m512i *)
+					&(data[n+192]));
+			fold0 = crcr32_folding_round(qw0, k, fold0);
+			fold1 = crcr32_folding_round(qw1, k, fold1);
+			fold2 = crcr32_folding_round(qw2, k, fold2);
+			fold3 = crcr32_folding_round(qw3, k, fold3);
+		}
+
+		/* 256 to 128 fold */
+		k = params->rk3_rk4;
+		fold0 = crcr32_folding_round(fold2, k, fold0);
+		fold1 = crcr32_folding_round(fold3, k, fold1);
+
+		res = crc32_fold_128(fold0, fold1, params);
+
+		reduction = 240 - ((n+256)-data_len);
+
+		while (reduction > 0)
+			reduction_loop(&res, &reduction, data, &n,
+					params);
+
+		reduction += 16;
+
+		if (n != data_len)
+			res = last_two_xmm(data, data_len, n, res,
+					params);
+	} else {
+		if (data_len > 31) {
+			res = _mm_cvtsi32_si128(crc);
+			d = _mm_loadu_si128((const __m128i *)data);
+			res = _mm_xor_si128(res, d);
+			n += 16;
+
+			reduction = 240 - ((n+256)-data_len);
+
+			while (reduction > 0)
+				reduction_loop(&res, &reduction, data, &n,
+						params);
+
+			if (n != data_len)
+				res = last_two_xmm(data, data_len, n, res,
+						params);
+		} else if (data_len > 16) {
+			res = _mm_cvtsi32_si128(crc);
+			d = _mm_loadu_si128((const __m128i *)data);
+			res = _mm_xor_si128(res, d);
+			n += 16;
+
+			if (n != data_len)
+				res = last_two_xmm(data, data_len, n, res,
+						params);
+		} else if (data_len == 16) {
+			res = _mm_cvtsi32_si128(crc);
+			d = _mm_loadu_si128((const __m128i *)data);
+			res = _mm_xor_si128(res, d);
+		} else {
+			res = _mm_cvtsi32_si128(crc);
+			mask = byte_len_to_mask_table[data_len];
+			d = _mm_maskz_loadu_epi8(mask, data);
+			res = _mm_xor_si128(res, d);
+
+			if (data_len > 3) {
+				d = _mm_loadu_si128((const __m128i *)
+						&shf_table[data_len]);
+				res = _mm_shuffle_epi8(res, d);
+			} else if (data_len > 2) {
+				res = _mm_slli_si128(res, 5);
+				goto do_barrett_reduction;
+			} else if (data_len > 1) {
+				res = _mm_slli_si128(res, 6);
+				goto do_barrett_reduction;
+			} else if (data_len > 0) {
+				res = _mm_slli_si128(res, 7);
+				goto do_barrett_reduction;
+			} else {
+				/* zero length case */
+				return crc;
+			}
+		}
+	}
+
+	res = done_128(res, params);
+
+do_barrett_reduction:
+	n = barrett_reduction(res, params);
+
+	return n;
+}
+
+static void
+crc32_load_init_constants(void)
+{
+	__m128i a;
+	/* fold constants */
+	uint64_t c0 = 0x00000000e95c1271;
+	uint64_t c1 = 0x00000000ce3371cb;
+	uint64_t c2 = 0x00000000910eeec1;
+	uint64_t c3 = 0x0000000033fff533;
+	uint64_t c4 = 0x000000000cbec0ed;
+	uint64_t c5 = 0x0000000031f8303f;
+	uint64_t c6 = 0x0000000057c54819;
+	uint64_t c7 = 0x00000000df068dc2;
+	uint64_t c8 = 0x00000000ae0b5394;
+	uint64_t c9 = 0x000000001c279815;
+	uint64_t c10 = 0x000000001d9513d7;
+	uint64_t c11 = 0x000000008f352d95;
+	uint64_t c12 = 0x00000000af449247;
+	uint64_t c13 = 0x000000003db1ecdc;
+	uint64_t c14 = 0x0000000081256527;
+	uint64_t c15 = 0x00000000f1da05aa;
+	uint64_t c16 = 0x00000000ccaa009e;
+	uint64_t c17 = 0x00000000ae689191;
+	uint64_t c18 = 0x00000000ccaa009e;
+	uint64_t c19 = 0x00000000b8bc6765;
+	uint64_t c20 = 0x00000001f7011640;
+	uint64_t c21 = 0x00000001db710640;
+
+	a = _mm_set_epi64x(c1, c0);
+	crc32_eth.rk1_rk2 = _mm512_broadcast_i32x4(a);
+
+	a = _mm_set_epi64x(c3, c2);
+	crc32_eth.rk3_rk4 = _mm512_broadcast_i32x4(a);
+
+	crc32_eth.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
+			c9, c10, c11);
+	crc32_eth.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
+			c16, c17, 0, 0);
+	crc32_eth.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
+			_mm_cvtsi64_m64(c17));
+
+	crc32_eth.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
+			_mm_cvtsi64_m64(c19));
+	crc32_eth.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
+			_mm_cvtsi64_m64(c21));
+}
+
+static void
+crc16_load_init_constants(void)
+{
+	__m128i a;
+	/* fold constants */
+	uint64_t c0 = 0x0000000000009a19;
+	uint64_t c1 = 0x0000000000002df8;
+	uint64_t c2 = 0x00000000000068af;
+	uint64_t c3 = 0x000000000000b6c9;
+	uint64_t c4 = 0x000000000000c64f;
+	uint64_t c5 = 0x000000000000cd95;
+	uint64_t c6 = 0x000000000000d341;
+	uint64_t c7 = 0x000000000000b8f2;
+	uint64_t c8 = 0x0000000000000842;
+	uint64_t c9 = 0x000000000000b072;
+	uint64_t c10 = 0x00000000000047e3;
+	uint64_t c11 = 0x000000000000922d;
+	uint64_t c12 = 0x0000000000000e3a;
+	uint64_t c13 = 0x0000000000004d7a;
+	uint64_t c14 = 0x0000000000005b44;
+	uint64_t c15 = 0x0000000000007762;
+	uint64_t c16 = 0x00000000000081bf;
+	uint64_t c17 = 0x0000000000008e10;
+	uint64_t c18 = 0x00000000000081bf;
+	uint64_t c19 = 0x0000000000001cbb;
+	uint64_t c20 = 0x000000011c581910;
+	uint64_t c21 = 0x0000000000010810;
+
+	a = _mm_set_epi64x(c1, c0);
+	crc16_ccitt.rk1_rk2 = _mm512_broadcast_i32x4(a);
+
+	a = _mm_set_epi64x(c3, c2);
+	crc16_ccitt.rk3_rk4 = _mm512_broadcast_i32x4(a);
+
+	crc16_ccitt.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
+			c9, c10, c11);
+	crc16_ccitt.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
+			c16, c17, 0, 0);
+	crc16_ccitt.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
+			_mm_cvtsi64_m64(c17));
+
+	crc16_ccitt.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
+			_mm_cvtsi64_m64(c19));
+	crc16_ccitt.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
+			_mm_cvtsi64_m64(c21));
+}
+
+void
+rte_net_crc_avx512_init(void)
+{
+	crc32_load_init_constants();
+	crc16_load_init_constants();
+
+	/*
+	 * Reset the register as following calculation may
+	 * use other data types such as float, double, etc.
+	 */
+	_mm_empty();
+}
+
+uint32_t
+rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len)
+{
+	/* return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_vpclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt);
+}
+
+uint32_t
+rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len)
+{
+	/* return 32-bit CRC value */
+	return ~crc32_eth_calc_vpclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth);
+}
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
index 83dccbfba..fcf9cc0ef 100644
--- a/lib/librte_net/rte_net_crc.c
+++ b/lib/librte_net/rte_net_crc.c
@@ -37,6 +37,12 @@ static rte_net_crc_handler handlers_scalar[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
 };
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+static rte_net_crc_handler handlers_avx512[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler,
+};
+#endif
 #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
 static rte_net_crc_handler handlers_sse42[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
@@ -132,6 +138,19 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
 		crc32_eth_lut);
 }
 
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+static uint8_t
+avx512_vpclmulqdq_cpu_supported(void)
+{
+	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) &&
+		rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) &&
+		rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) &&
+		rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) &&
+		rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ) &&
+		rte_cpu_get_flag_enabled(RTE_CPUFLAG_VPCLMULQDQ);
+}
+#endif
+
 #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
 static uint8_t
 sse42_pclmulqdq_cpu_supported(void)
@@ -153,6 +172,14 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg)
 {
 	switch (alg) {
 #ifdef RTE_ARCH_X86_64
+	case RTE_NET_CRC_AVX512:
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+		if (avx512_vpclmulqdq_cpu_supported()) {
+			handlers = handlers_avx512;
+			break;
+		}
+#endif
+		/* fall-through */
 	case RTE_NET_CRC_SSE42:
 #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
 		if (sse42_pclmulqdq_cpu_supported()) {
@@ -206,6 +233,12 @@ RTE_INIT(rte_net_crc_init)
 		rte_net_crc_sse42_init();
 	}
 #endif
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+	if (avx512_vpclmulqdq_cpu_supported()) {
+		alg = RTE_NET_CRC_AVX512;
+		rte_net_crc_avx512_init();
+	}
+#endif
 #ifdef CC_ARM64_NEON_PMULL_SUPPORT
 	if (neon_pmull_cpu_supported()) {
 		alg = RTE_NET_CRC_NEON;
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
index 16e85ca97..72d3e10ff 100644
--- a/lib/librte_net/rte_net_crc.h
+++ b/lib/librte_net/rte_net_crc.h
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #ifndef _RTE_NET_CRC_H_
@@ -23,6 +23,7 @@ enum rte_net_crc_alg {
 	RTE_NET_CRC_SCALAR = 0,
 	RTE_NET_CRC_SSE42,
 	RTE_NET_CRC_NEON,
+	RTE_NET_CRC_AVX512,
 };
 
 /**
@@ -35,6 +36,7 @@ enum rte_net_crc_alg {
  *   - RTE_NET_CRC_SCALAR
  *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
  *   - RTE_NET_CRC_NEON (Use ARM Neon intrinsic)
+ *   - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic)
  */
 void
 rte_net_crc_set_alg(enum rte_net_crc_alg alg);
-- 
2.12.3



* Re: [dpdk-dev] [PATCH v3 1/2] net: add run-time architecture specific CRC selection
  2020-10-02 15:17   ` Singh, Jasvinder
@ 2020-10-06 16:38     ` O'loingsigh, Mairtin
  0 siblings, 0 replies; 23+ messages in thread
From: O'loingsigh, Mairtin @ 2020-10-06 16:38 UTC (permalink / raw)
  To: Singh, Jasvinder, Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Ryan, Brendan, Coyle, David

Hi,

> -----Original Message-----
> From: Singh, Jasvinder <jasvinder.singh@intel.com>
> Sent: Friday, October 2, 2020 4:18 PM
> To: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle, David
> <david.coyle@intel.com>
> Subject: RE: [PATCH v3 1/2] net: add run-time architecture specific CRC
> selection
> 
> 
> 
> > -----Original Message-----
> > From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> > Sent: Tuesday, September 29, 2020 4:36 PM
> > To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> > <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> > <pablo.de.lara.guarch@intel.com>
> > Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; Coyle, David
> > <david.coyle@intel.com>; O'loingsigh, Mairtin
> > <mairtin.oloingsigh@intel.com>
> > Subject: [PATCH v3 1/2] net: add run-time architecture specific CRC
> > selection
> >
> > This patch adds support for run-time selection of the optimal
> > architecture-specific CRC path, based on the supported instruction
> > set(s) of the CPU.
> >
> > The compiler option checks have been moved from the C files to the
> > meson script. The rte_cpu_get_flag_enabled function is called
> > automatically by the library at process initialization time to
> > determine which instructions the CPU supports, with the most optimal
> > supported CRC path ultimately selected.
> >
> > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> > Signed-off-by: David Coyle <david.coyle@intel.com>
> > ---
> >  doc/guides/rel_notes/release_20_11.rst            |  4 ++
> >  lib/librte_net/meson.build                        | 34 +++++++++++-
> >  lib/librte_net/net_crc.h                          | 34 ++++++++++++
> >  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 27 +++------
> >  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   | 34 ++++--------
> >  lib/librte_net/rte_net_crc.c                      | 67 ++++++++++++++---------
> >  6 files changed, 132 insertions(+), 68 deletions(-)
> >  create mode 100644 lib/librte_net/net_crc.h
> >  rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
> >  rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)
> >
> > diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> > index 4eb3224a7..6bd222dca 100644
> > --- a/doc/guides/rel_notes/release_20_11.rst
> > +++ b/doc/guides/rel_notes/release_20_11.rst
> > @@ -55,6 +55,10 @@ New Features
> >       Also, make sure to start the actual text at the margin.
> >       =======================================================
> 
> <snip>
> 
> 
> _t *data, uint32_t data_len);
> > +
> > +#endif /* _NET_CRC_H_ */
> > diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
> > similarity index 95%
> > rename from lib/librte_net/net_crc_neon.h
> > rename to lib/librte_net/net_crc_neon.c
> > index 63fa1d4a1..b79684ec2 100644
> > --- a/lib/librte_net/net_crc_neon.h
> > +++ b/lib/librte_net/net_crc_neon.c
> > @@ -1,18 +1,17 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> >   * Copyright(c) 2017 Cavium, Inc
> > + * Copyright(c) 2020 Intel Corporation
> >   */
> 
> Could you please remove the Intel copyright, as there is no change in this file?
> 
> > -#ifndef _NET_CRC_NEON_H_
> > -#define _NET_CRC_NEON_H_
> > +#include <string.h>
> >
> > +#include <rte_common.h>
> >  #include <rte_branch_prediction.h>
> >  #include <rte_net_crc.h>
> >  #include <rte_vect.h>
> >  #include <rte_cpuflags.h>
> >
> > -#ifdef __cplusplus
> > -extern "C" {
> > -#endif
> > +#include "net_crc.h"
> >
> >  /** PMULL CRC computation context structure */
> >  struct crc_pmull_ctx {
> > @@ -218,7 +217,7 @@ crc32_eth_calc_pmull(
> >  	return n;
> >  }
> >
> > -static inline void
> > +void
> >  rte_net_crc_neon_init(void)
> >  {
> >  	/* Initialize CRC16 data */
> > @@ -242,9 +241,8 @@ rte_net_crc_neon_init(void)
> >  	crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8);
> >  }
> >
> > -static inline uint32_t
> > -rte_crc16_ccitt_neon_handler(const uint8_t *data,
> > -	uint32_t data_len)
> > +uint32_t
> > +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len)
> >  {
> >  	return (uint16_t)~crc32_eth_calc_pmull(data,
> >  		data_len,
> > @@ -252,18 +250,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data,
> >  		&crc16_ccitt_pmull);
> >  }
> >
> > -static inline uint32_t
> > -rte_crc32_eth_neon_handler(const uint8_t *data,
> > -	uint32_t data_len)
> > +uint32_t
> > +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len)
> >  {
> >  	return ~crc32_eth_calc_pmull(data,
> >  		data_len,
> >  		0xffffffffUL,
> >  		&crc32_eth_pmull);
> >  }
> > -
> > -#ifdef __cplusplus
> > -}
> > -#endif
> > -
> > -#endif /* _NET_CRC_NEON_H_ */
> > diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c
> > similarity index 94%
> > rename from lib/librte_net/net_crc_sse.h
> > rename to lib/librte_net/net_crc_sse.c
> > index 1c7b7a548..053b54b39 100644
> > --- a/lib/librte_net/net_crc_sse.h
> > +++ b/lib/librte_net/net_crc_sse.c
> > @@ -1,18 +1,16 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> > - * Copyright(c) 2017 Intel Corporation
> > + * Copyright(c) 2017-2020 Intel Corporation
> >   */
> >
> > -#ifndef _RTE_NET_CRC_SSE_H_
> > -#define _RTE_NET_CRC_SSE_H_
> > +#include <string.h>
> >
> > +#include <rte_common.h>
> >  #include <rte_branch_prediction.h>
> > +#include <rte_cpuflags.h>
> >
> > -#include <x86intrin.h>
> > -#include <cpuid.h>
> > +#include "net_crc.h"
> >
> > -#ifdef __cplusplus
> > -extern "C" {
> > -#endif
> > +#include <x86intrin.h>
> >
> >  /** PCLMULQDQ CRC computation context structure */
> >  struct crc_pclmulqdq_ctx {
> > @@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq(
> >  	return n;
> >  }
> >
> > -
> > -static inline void
> > +void
> >  rte_net_crc_sse42_init(void)
> >  {
> >  	uint64_t k1, k2, k5, k6;
> > @@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void)
> >  	 * use other data types such as float, double, etc.
> >  	 */
> >  	_mm_empty();
> > -
> >  }
> >
> > -static inline uint32_t
> > -rte_crc16_ccitt_sse42_handler(const uint8_t *data,
> > -	uint32_t data_len)
> > +uint32_t
> > +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
> >  {
> >  	/** return 16-bit CRC value */
> >  	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
> > @@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data,
> >  		&crc16_ccitt_pclmulqdq);
> >  }
> >
> > -static inline uint32_t
> > -rte_crc32_eth_sse42_handler(const uint8_t *data,
> > -	uint32_t data_len)
> > +uint32_t
> > +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
> >  {
> >  	return ~crc32_eth_calc_pclmulqdq(data,
> >  		data_len,
> >  		0xffffffffUL,
> >  		&crc32_eth_pclmulqdq);
> >  }
> > -
> > -#ifdef __cplusplus
> > -}
> > -#endif
> > -
> > -#endif /* _RTE_NET_CRC_SSE_H_ */
> > diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> > index 4f5b9e828..83dccbfba 100644
> > --- a/lib/librte_net/rte_net_crc.c
> > +++ b/lib/librte_net/rte_net_crc.c
> > @@ -1,5 +1,5 @@
> >  /* SPDX-License-Identifier: BSD-3-Clause
> > - * Copyright(c) 2017 Intel Corporation
> > + * Copyright(c) 2017-2020 Intel Corporation
> >   */
> >
> >  #include <stddef.h>
> > @@ -10,17 +10,7 @@
> >  #include <rte_common.h>
> >  #include <rte_net_crc.h>
> >
> > -#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__)
> > -#define X86_64_SSE42_PCLMULQDQ     1
> > -#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO)
> > -#define ARM64_NEON_PMULL           1
> > -#endif
> > -
> > -#ifdef X86_64_SSE42_PCLMULQDQ
> > -#include <net_crc_sse.h>
> > -#elif defined ARM64_NEON_PMULL
> > -#include <net_crc_neon.h>
> > -#endif
> > +#include "net_crc.h"
> >
> >  /** CRC polynomials */
> >  #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
> > @@ -47,13 +37,13 @@ static rte_net_crc_handler handlers_scalar[] = {
> >  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
> >  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
> >  };
> > -
> > -#ifdef X86_64_SSE42_PCLMULQDQ
> > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> >  static rte_net_crc_handler handlers_sse42[] = {
> >  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
> >  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
> >  };
> > -#elif defined ARM64_NEON_PMULL
> > +#endif
> > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> >  static rte_net_crc_handler handlers_neon[] = {
> >  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler,
> >  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
> > @@ -142,22 +132,44 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
> >  		crc32_eth_lut);
> >  }
> >
> > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> > +static uint8_t
> > +sse42_pclmulqdq_cpu_supported(void)
> > +{
> > +	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
> > +}
> > +#endif
> > +
> > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> > +static uint8_t
> > +neon_pmull_cpu_supported(void)
> > +{
> > +	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL);
> > +}
> > +#endif
> > +
> >  void
> >  rte_net_crc_set_alg(enum rte_net_crc_alg alg)
> >  {
> >  	switch (alg) {
> > -#ifdef X86_64_SSE42_PCLMULQDQ
> > +#ifdef RTE_ARCH_X86_64
> >  	case RTE_NET_CRC_SSE42:
> > -		handlers = handlers_sse42;
> > -		break;
> > -#elif defined ARM64_NEON_PMULL
> > -		/* fall-through */
> > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> > +		if (sse42_pclmulqdq_cpu_supported()) {
> > +			handlers = handlers_sse42;
> > +			break;
> > +		}
> > +#endif
> > +#endif /* RTE_ARCH_X86_64 */
> > +#ifdef RTE_ARCH_ARM64
> >  	case RTE_NET_CRC_NEON:
> > -		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> > +		if (neon_pmull_cpu_supported()) {
> >  			handlers = handlers_neon;
> >  			break;
> >  		}
> >  #endif
> > +#endif /* RTE_ARCH_ARM64 */
> >  		/* fall-through */
> >  	case RTE_NET_CRC_SCALAR:
> >  		/* fall-through */
> > @@ -188,11 +200,14 @@ RTE_INIT(rte_net_crc_init)
> >
> >  	rte_net_crc_scalar_init();
> >
> > -#ifdef X86_64_SSE42_PCLMULQDQ
> > -	alg = RTE_NET_CRC_SSE42;
> > -	rte_net_crc_sse42_init();
> > -#elif defined ARM64_NEON_PMULL
> > -	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> > +	if (sse42_pclmulqdq_cpu_supported()) {
> > +		alg = RTE_NET_CRC_SSE42;
> > +		rte_net_crc_sse42_init();
> > +	}
> > +#endif
> > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> > +	if (neon_pmull_cpu_supported()) {
> >  		alg = RTE_NET_CRC_NEON;
> >  		rte_net_crc_neon_init();
> >  	}
> > --
> > 2.12.3
> 
> Patch looks good to me except the one stated above.
> 
> 
The fix for the copyright comment above has been applied in the v4 patch, which was just submitted.

Regards,
Mairtin


* Re: [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC
  2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh
  2020-10-06 16:23   ` [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
  2020-10-06 16:23   ` [dpdk-dev] [PATCH v4 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
@ 2020-10-07  9:26   ` David Marchand
  2020-10-09 13:50   ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh
  3 siblings, 0 replies; 23+ messages in thread
From: David Marchand @ 2020-10-07  9:26 UTC (permalink / raw)
  To: Olivier Matz, Bruce Richardson, Ananyev, Konstantin,
	Jerin Jacob Kollanukkaran, Ruifeng Wang (Arm Technology China)
  Cc: Singh, Jasvinder, Pablo de Lara, dev, Ryan, Brendan, David Coyle,
	Mairtin o Loingsigh

On Tue, Oct 6, 2020 at 6:23 PM Mairtin o Loingsigh
<mairtin.oloingsigh@intel.com> wrote:
>
> This patchset makes two significant enhancements to the CRC modules of
> the rte_net library:
>
> 1) Adds run-time selection of the optimal architecture-specific CRC path.
>    Previously the selection was solely made at compile-time, meaning it
>    could only be built and run on the same generation of CPU. Adding
>    run-time selection ability means this can be used from distro packages
>    and/or DPDK can be compiled on an older CPU and run on a newer CPU.
> 2) Adds an optimized CRC implementation based on the AVX512 and
>    VPCLMULQDQ instruction sets.
>
> For further details, please see the commit messages of the individual
> patches.

Reviews please?
Thanks.


-- 
David Marchand



* Re: [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection
  2020-10-06 16:23   ` [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
@ 2020-10-07 14:59     ` Ananyev, Konstantin
  2020-10-09 14:04       ` Coyle, David
  0 siblings, 1 reply; 23+ messages in thread
From: Ananyev, Konstantin @ 2020-10-07 14:59 UTC (permalink / raw)
  To: O'loingsigh, Mairtin, Singh, Jasvinder, Richardson, Bruce,
	De Lara Guarch, Pablo
  Cc: dev, Ryan, Brendan, O'loingsigh, Mairtin, Coyle, David


> 
> This patch adds support for run-time selection of the optimal
> architecture-specific CRC path, based on the supported instruction set(s)
> of the CPU.
> 
> The compiler option checks have been moved from the C files to the meson
> script. The rte_cpu_get_flag_enabled function is called automatically by
> the library at process initialization time to determine which
> instructions the CPU supports, with the most optimal supported CRC path
> ultimately selected.
> 
> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> Signed-off-by: David Coyle <david.coyle@intel.com>

LGTM, just one nit, see below.
With that:
Series acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>

> ---
>  doc/guides/rel_notes/release_20_11.rst            |  4 ++
>  lib/librte_net/meson.build                        | 34 +++++++++++-
>  lib/librte_net/net_crc.h                          | 34 ++++++++++++
>  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +++------
>  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   | 34 ++++--------
>  lib/librte_net/rte_net_crc.c                      | 67 ++++++++++++++---------
>  6 files changed, 131 insertions(+), 68 deletions(-)
>  create mode 100644 lib/librte_net/net_crc.h
>  rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
>  rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index ca5ec7391..0f14e087d 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -55,6 +55,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
> 
> +* **Updated CRC modules of rte_net library.**
> +
> +  * Added run-time selection of the optimal architecture-specific CRC path.
> +
>  * **Updated Broadcom bnxt driver.**
> 
>    Updated the Broadcom bnxt driver with new features and improvements, including:
> diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
> index 24ed8253b..fa439b9e5 100644
> --- a/lib/librte_net/meson.build
> +++ b/lib/librte_net/meson.build
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: BSD-3-Clause
> -# Copyright(c) 2017 Intel Corporation
> +# Copyright(c) 2017-2020 Intel Corporation
> 
>  headers = files('rte_ip.h',
>  	'rte_tcp.h',
> @@ -20,3 +20,35 @@ headers = files('rte_ip.h',
> 
>  sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c')
>  deps += ['mbuf']
> +
> +if dpdk_conf.has('RTE_ARCH_X86_64')
> +	net_crc_sse42_cpu_support = (
> +		cc.get_define('__PCLMUL__', args: machine_args) != '')
> +	net_crc_sse42_cc_support = (
> +		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
> +
> +	build_static_net_crc_sse42_lib = 0
> +
> +	if net_crc_sse42_cpu_support == true
> +		sources += files('net_crc_sse.c')
> +		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +	elif net_crc_sse42_cc_support == true
> +		build_static_net_crc_sse42_lib = 1
> +		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
> +		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +	endif
> +
> +	if build_static_net_crc_sse42_lib == 1
> +		net_crc_sse42_lib = static_library(
> +					'net_crc_sse42_lib',
> +					'net_crc_sse.c',
> +					dependencies: static_rte_eal,
> +					c_args: [cflags,
> +						net_crc_sse42_lib_cflags])
> +		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
> +	endif
> +elif (dpdk_conf.has('RTE_ARCH_ARM64') and
> +		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
> +	sources += files('net_crc_neon.c')
> +	cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
> +endif
> diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
> new file mode 100644
> index 000000000..a1578a56c
> --- /dev/null
> +++ b/lib/librte_net/net_crc.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#ifndef _NET_CRC_H_
> +#define _NET_CRC_H_
> +
> +/*
> + * Different implementations of CRC
> + */
> +
> +/* SSE4.2 */
> +
> +void
> +rte_net_crc_sse42_init(void);
> +
> +uint32_t
> +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
> +
> +uint32_t
> +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
> +
> +/* NEON */
> +
> +void
> +rte_net_crc_neon_init(void);
> +
> +uint32_t
> +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len);
> +
> +uint32_t
> +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len);
> +
> +#endif /* _NET_CRC_H_ */
> diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
> similarity index 95%
> rename from lib/librte_net/net_crc_neon.h
> rename to lib/librte_net/net_crc_neon.c
> index 63fa1d4a1..f61d75a8c 100644
> --- a/lib/librte_net/net_crc_neon.h
> +++ b/lib/librte_net/net_crc_neon.c
> @@ -2,17 +2,15 @@
>   * Copyright(c) 2017 Cavium, Inc
>   */
> 
> -#ifndef _NET_CRC_NEON_H_
> -#define _NET_CRC_NEON_H_
> +#include <string.h>
> 
> +#include <rte_common.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_net_crc.h>
>  #include <rte_vect.h>
>  #include <rte_cpuflags.h>
> 
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> +#include "net_crc.h"
> 
>  /** PMULL CRC computation context structure */
>  struct crc_pmull_ctx {
> @@ -218,7 +216,7 @@ crc32_eth_calc_pmull(
>  	return n;
>  }
> 
> -static inline void
> +void
>  rte_net_crc_neon_init(void)
>  {
>  	/* Initialize CRC16 data */
> @@ -242,9 +240,8 @@ rte_net_crc_neon_init(void)
>  	crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8);
>  }
> 
> -static inline uint32_t
> -rte_crc16_ccitt_neon_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return (uint16_t)~crc32_eth_calc_pmull(data,
>  		data_len,
> @@ -252,18 +249,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data,
>  		&crc16_ccitt_pmull);
>  }
> 
> -static inline uint32_t
> -rte_crc32_eth_neon_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return ~crc32_eth_calc_pmull(data,
>  		data_len,
>  		0xffffffffUL,
>  		&crc32_eth_pmull);
>  }
> -
> -#ifdef __cplusplus
> -}
> -#endif
> -
> -#endif /* _NET_CRC_NEON_H_ */
> diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c
> similarity index 94%
> rename from lib/librte_net/net_crc_sse.h
> rename to lib/librte_net/net_crc_sse.c
> index 1c7b7a548..053b54b39 100644
> --- a/lib/librte_net/net_crc_sse.h
> +++ b/lib/librte_net/net_crc_sse.c
> @@ -1,18 +1,16 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
> 
> -#ifndef _RTE_NET_CRC_SSE_H_
> -#define _RTE_NET_CRC_SSE_H_
> +#include <string.h>
> 
> +#include <rte_common.h>
>  #include <rte_branch_prediction.h>
> +#include <rte_cpuflags.h>
> 
> -#include <x86intrin.h>
> -#include <cpuid.h>
> +#include "net_crc.h"
> 
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> +#include <x86intrin.h>
> 
>  /** PCLMULQDQ CRC computation context structure */
>  struct crc_pclmulqdq_ctx {
> @@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq(
>  	return n;
>  }
> 
> -
> -static inline void
> +void
>  rte_net_crc_sse42_init(void)
>  {
>  	uint64_t k1, k2, k5, k6;
> @@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void)
>  	 * use other data types such as float, double, etc.
>  	 */
>  	_mm_empty();
> -
>  }
> 
> -static inline uint32_t
> -rte_crc16_ccitt_sse42_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	/** return 16-bit CRC value */
>  	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
> @@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data,
>  		&crc16_ccitt_pclmulqdq);
>  }
> 
> -static inline uint32_t
> -rte_crc32_eth_sse42_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return ~crc32_eth_calc_pclmulqdq(data,
>  		data_len,
>  		0xffffffffUL,
>  		&crc32_eth_pclmulqdq);
>  }
> -
> -#ifdef __cplusplus
> -}
> -#endif
> -
> -#endif /* _RTE_NET_CRC_SSE_H_ */
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> index 4f5b9e828..83dccbfba 100644
> --- a/lib/librte_net/rte_net_crc.c
> +++ b/lib/librte_net/rte_net_crc.c
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
> 
>  #include <stddef.h>
> @@ -10,17 +10,7 @@
>  #include <rte_common.h>
>  #include <rte_net_crc.h>
> 
> -#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__)
> -#define X86_64_SSE42_PCLMULQDQ     1
> -#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO)
> -#define ARM64_NEON_PMULL           1
> -#endif
> -
> -#ifdef X86_64_SSE42_PCLMULQDQ
> -#include <net_crc_sse.h>
> -#elif defined ARM64_NEON_PMULL
> -#include <net_crc_neon.h>
> -#endif
> +#include "net_crc.h"
> 
>  /** CRC polynomials */
>  #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
> @@ -47,13 +37,13 @@ static rte_net_crc_handler handlers_scalar[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
>  };
> -
> -#ifdef X86_64_SSE42_PCLMULQDQ
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
>  static rte_net_crc_handler handlers_sse42[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
>  };
> -#elif defined ARM64_NEON_PMULL
> +#endif
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
>  static rte_net_crc_handler handlers_neon[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
> @@ -142,22 +132,44 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
>  		crc32_eth_lut);
>  }
> 
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +static uint8_t
> +sse42_pclmulqdq_cpu_supported(void)
> +{
> +	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
> +}

As a nit, I think it would be better to hide the #ifdef inside the function,
and return 0 when the define is not set.
Something like:

static int
sse42_pclmulqdq_cpu_supported(void)
{
#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
#else
	return 0;
#endif
}

Same for other cpu_supported functions.
And then you can remove these ifdefs in set_alg and other places, i.e.:

void
rte_net_crc_set_alg(enum rte_net_crc_alg alg)
{
        switch (alg) {
#ifdef RTE_ARCH_X86_64
        case RTE_NET_CRC_AVX512:
                if (avx512_vpclmulqdq_cpu_supported()) {
                        handlers = handlers_avx512;
                        break;
                }
                /* fall-through */
        case RTE_NET_CRC_SSE42:
                if (sse42_pclmulqdq_cpu_supported()) {
                        handlers = handlers_sse42;
                        break;
                }
#endif
...

Same for rte_net_crc_init()
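
To show the whole pattern in one place, here is a minimal standalone
sketch. It is an illustration only, not code from the patch: every demo_*
and DEMO_* name is a hypothetical stand-in, the demo_cpu struct replaces
the real rte_cpu_get_flag_enabled() query, and the DEMO_CC_* macros play
the role of the -DCC_*_SUPPORT flags that meson would pass. The AVX512
build switch is deliberately left undefined, so that helper reports 0 even
though the pretend CPU has the instruction.

```c
/* Hypothetical stand-ins for the rte_net_crc_alg values. */
enum demo_crc_alg { DEMO_CRC_SCALAR, DEMO_CRC_SSE42, DEMO_CRC_AVX512 };

/* Pretend run-time CPU capabilities (assumption: replaces the
 * rte_cpu_get_flag_enabled() query used by the real library). */
struct demo_cpu { int pclmulqdq; int vpclmulqdq; };
static struct demo_cpu demo_cpu = { 1, 1 };

/* Pretend build-time switch, normally supplied by the build system.
 * DEMO_CC_AVX512_SUPPORT is intentionally NOT defined. */
#define DEMO_CC_SSE42_SUPPORT 1

/* The #ifdef lives inside the helper; callers need no conditionals. */
static int
demo_avx512_cpu_supported(void)
{
#ifdef DEMO_CC_AVX512_SUPPORT
	return demo_cpu.vpclmulqdq;
#else
	return 0;
#endif
}

static int
demo_sse42_cpu_supported(void)
{
#ifdef DEMO_CC_SSE42_SUPPORT
	return demo_cpu.pclmulqdq;
#else
	return 0;
#endif
}

/* Mirrors the shape of set_alg: try the requested path and fall
 * through to the next best one. Returns the path actually chosen. */
enum demo_crc_alg
demo_crc_select(enum demo_crc_alg requested)
{
	switch (requested) {
	case DEMO_CRC_AVX512:
		if (demo_avx512_cpu_supported())
			return DEMO_CRC_AVX512;
		/* fall-through */
	case DEMO_CRC_SSE42:
		if (demo_sse42_cpu_supported())
			return DEMO_CRC_SSE42;
		/* fall-through */
	case DEMO_CRC_SCALAR:
	default:
		return DEMO_CRC_SCALAR;
	}
}

/* Mirrors the shape of the init function: start from scalar and
 * upgrade to the best path that is both compiled in and supported. */
enum demo_crc_alg
demo_crc_init(void)
{
	enum demo_crc_alg alg = DEMO_CRC_SCALAR;

	if (demo_sse42_cpu_supported())
		alg = DEMO_CRC_SSE42;
	if (demo_avx512_cpu_supported())
		alg = DEMO_CRC_AVX512;
	return alg;
}
```

With these settings, requesting DEMO_CRC_AVX512 ends up on the SSE42 path,
because the AVX512 helper is compiled out and returns 0. The callers stay
free of ifdefs, which is the point of the suggestion.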

> +#endif
> +
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +static uint8_t
> +neon_pmull_cpu_supported(void)
> +{
> +	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL);
> +}
> +#endif
> +
>  void
>  rte_net_crc_set_alg(enum rte_net_crc_alg alg)
>  {
>  	switch (alg) {
> -#ifdef X86_64_SSE42_PCLMULQDQ
> +#ifdef RTE_ARCH_X86_64
>  	case RTE_NET_CRC_SSE42:
> -		handlers = handlers_sse42;
> -		break;
> -#elif defined ARM64_NEON_PMULL
> -		/* fall-through */
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +		if (sse42_pclmulqdq_cpu_supported()) {
> +			handlers = handlers_sse42;
> +			break;
> +		}
> +#endif
> +#endif /* RTE_ARCH_X86_64 */
> +#ifdef RTE_ARCH_ARM64
>  	case RTE_NET_CRC_NEON:
> -		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +		if (neon_pmull_cpu_supported()) {
>  			handlers = handlers_neon;
>  			break;
>  		}
>  #endif
> +#endif /* RTE_ARCH_ARM64 */
>  		/* fall-through */
>  	case RTE_NET_CRC_SCALAR:
>  		/* fall-through */
> @@ -188,11 +200,14 @@ RTE_INIT(rte_net_crc_init)
> 
>  	rte_net_crc_scalar_init();
> 
> -#ifdef X86_64_SSE42_PCLMULQDQ
> -	alg = RTE_NET_CRC_SSE42;
> -	rte_net_crc_sse42_init();
> -#elif defined ARM64_NEON_PMULL
> -	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +	if (sse42_pclmulqdq_cpu_supported()) {
> +		alg = RTE_NET_CRC_SSE42;
> +		rte_net_crc_sse42_init();
> +	}
> +#endif
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +	if (neon_pmull_cpu_supported()) {
>  		alg = RTE_NET_CRC_NEON;
>  		rte_net_crc_neon_init();
>  	}
> --
> 2.12.3



* [dpdk-dev] [PATCH v5 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC
  2020-10-06 16:23 ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " Mairtin o Loingsigh
                     ` (2 preceding siblings ...)
  2020-10-07  9:26   ` [dpdk-dev] [PATCH v4 0/2] net: add CRC run-time checks and " David Marchand
@ 2020-10-09 13:50   ` Mairtin o Loingsigh
  2020-10-09 13:50     ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
                       ` (3 more replies)
  3 siblings, 4 replies; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-10-09 13:50 UTC (permalink / raw)
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch,
	konstantin.ananyev
  Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle

This patchset makes two significant enhancements to the CRC modules of
the rte_net library:

1) Adds run-time selection of the optimal architecture-specific CRC path.
   Previously the selection was solely made at compile-time, meaning it
   could only be built and run on the same generation of CPU. Adding
   run-time selection ability means this can be used from distro packages
   and/or DPDK can be compiled on an older CPU and run on a newer CPU.
2) Adds an optimized CRC implementation based on the AVX512 and
   VPCLMULQDQ instruction sets.
   
For further details, please see the commit messages of the individual
patches.

v5:
* Tidied-up the ifdef checks for RTE_ARCH_* and compiler support of CRC 
  paths, as per review comments:
  * All ifdef checks removed from API function definitions and moved into
    helper functions.
    
v4:
* Fixed build issue when older version of meson is used (0.47.1).
* Addressed review comments:
  * remove Intel copyright header from neon CRC file.
  * tidy-up of register initialisation.

v3:
* Re-submitted v2 as encountered problems when originally submitting it.

v2:
* Added support for run-time selection of optimal architecture-specific
  CRC, based on v1 review comment.
* Added full working AVX512/VPCLMULQDQ support for CRC32-Ethernet and
  CRC16-CCITT.

v1:
* Initial version, with incomplete AVX512/VPCLMULQDQ support for
  CRC32-Ethernet only.

Mairtin o Loingsigh (2):
  net: add run-time architecture specific CRC selection
  net: add support for AVX512/VPCLMULQDQ based CRC

 app/test/test_crc.c                               |  11 +-
 config/x86/meson.build                            |   6 +-
 doc/guides/rel_notes/release_20_11.rst            |   6 +
 lib/librte_net/meson.build                        |  89 ++++-
 lib/librte_net/net_crc.h                          |  45 +++
 lib/librte_net/net_crc_avx512.c                   | 423 ++++++++++++++++++++++
 lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  26 +-
 lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 +-
 lib/librte_net/rte_net_crc.c                      | 162 +++++++--
 lib/librte_net/rte_net_crc.h                      |   4 +-
 10 files changed, 722 insertions(+), 84 deletions(-)
 create mode 100644 lib/librte_net/net_crc.h
 create mode 100644 lib/librte_net/net_crc_avx512.c
 rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
 rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)

-- 
2.12.3



* [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection
  2020-10-09 13:50   ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh
@ 2020-10-09 13:50     ` Mairtin o Loingsigh
  2020-10-09 16:22       ` Singh, Jasvinder
                         ` (2 more replies)
  2020-10-09 13:50     ` [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
                       ` (2 subsequent siblings)
  3 siblings, 3 replies; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-10-09 13:50 UTC (permalink / raw)
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch,
	konstantin.ananyev
  Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle

This patch adds support for run-time selection of the optimal
architecture-specific CRC path, based on the supported instruction set(s)
of the CPU.

The compiler option checks have been moved from the C files to the meson
script. The rte_cpu_get_flag_enabled function is called automatically by
the library at process initialization time to determine which
instructions the CPU supports, with the most optimal supported CRC path
ultimately selected.

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 doc/guides/rel_notes/release_20_11.rst            |   4 +
 lib/librte_net/meson.build                        |  34 ++++++-
 lib/librte_net/net_crc.h                          |  34 +++++++
 lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  26 ++---
 lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 ++-----
 lib/librte_net/rte_net_crc.c                      | 116 +++++++++++++++-------
 6 files changed, 168 insertions(+), 80 deletions(-)
 create mode 100644 lib/librte_net/net_crc.h
 rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
 rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)

diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index 808bdc4e5..b77297f7e 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -55,6 +55,10 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Updated CRC modules of rte_net library.**
+
+  * Added run-time selection of the optimal architecture-specific CRC path.
+
 * **Updated Broadcom bnxt driver.**
 
   Updated the Broadcom bnxt driver with new features and improvements, including:
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index 24ed8253b..fa439b9e5 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2017 Intel Corporation
+# Copyright(c) 2017-2020 Intel Corporation
 
 headers = files('rte_ip.h',
 	'rte_tcp.h',
@@ -20,3 +20,35 @@ headers = files('rte_ip.h',
 
 sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c')
 deps += ['mbuf']
+
+if dpdk_conf.has('RTE_ARCH_X86_64')
+	net_crc_sse42_cpu_support = (
+		cc.get_define('__PCLMUL__', args: machine_args) != '')
+	net_crc_sse42_cc_support = (
+		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
+
+	build_static_net_crc_sse42_lib = 0
+
+	if net_crc_sse42_cpu_support == true
+		sources += files('net_crc_sse.c')
+		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+	elif net_crc_sse42_cc_support == true
+		build_static_net_crc_sse42_lib = 1
+		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
+		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+	endif
+
+	if build_static_net_crc_sse42_lib == 1
+		net_crc_sse42_lib = static_library(
+					'net_crc_sse42_lib',
+					'net_crc_sse.c',
+					dependencies: static_rte_eal,
+					c_args: [cflags,
+						net_crc_sse42_lib_cflags])
+		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
+	endif
+elif (dpdk_conf.has('RTE_ARCH_ARM64') and
+		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
+	sources += files('net_crc_neon.c')
+	cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
+endif
diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
new file mode 100644
index 000000000..a1578a56c
--- /dev/null
+++ b/lib/librte_net/net_crc.h
@@ -0,0 +1,34 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#ifndef _NET_CRC_H_
+#define _NET_CRC_H_
+
+/*
+ * Different implementations of CRC
+ */
+
+/* SSE4.2 */
+
+void
+rte_net_crc_sse42_init(void);
+
+uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
+
+/* NEON */
+
+void
+rte_net_crc_neon_init(void);
+
+uint32_t
+rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len);
+
+#endif /* _NET_CRC_H_ */
diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
similarity index 95%
rename from lib/librte_net/net_crc_neon.h
rename to lib/librte_net/net_crc_neon.c
index 63fa1d4a1..f61d75a8c 100644
--- a/lib/librte_net/net_crc_neon.h
+++ b/lib/librte_net/net_crc_neon.c
@@ -2,17 +2,15 @@
  * Copyright(c) 2017 Cavium, Inc
  */
 
-#ifndef _NET_CRC_NEON_H_
-#define _NET_CRC_NEON_H_
+#include <string.h>
 
+#include <rte_common.h>
 #include <rte_branch_prediction.h>
 #include <rte_net_crc.h>
 #include <rte_vect.h>
 #include <rte_cpuflags.h>
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+#include "net_crc.h"
 
 /** PMULL CRC computation context structure */
 struct crc_pmull_ctx {
@@ -218,7 +216,7 @@ crc32_eth_calc_pmull(
 	return n;
 }
 
-static inline void
+void
 rte_net_crc_neon_init(void)
 {
 	/* Initialize CRC16 data */
@@ -242,9 +240,8 @@ rte_net_crc_neon_init(void)
 	crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8);
 }
 
-static inline uint32_t
-rte_crc16_ccitt_neon_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len)
 {
 	return (uint16_t)~crc32_eth_calc_pmull(data,
 		data_len,
@@ -252,18 +249,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data,
 		&crc16_ccitt_pmull);
 }
 
-static inline uint32_t
-rte_crc32_eth_neon_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len)
 {
 	return ~crc32_eth_calc_pmull(data,
 		data_len,
 		0xffffffffUL,
 		&crc32_eth_pmull);
 }
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _NET_CRC_NEON_H_ */
diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c
similarity index 94%
rename from lib/librte_net/net_crc_sse.h
rename to lib/librte_net/net_crc_sse.c
index 1c7b7a548..053b54b39 100644
--- a/lib/librte_net/net_crc_sse.h
+++ b/lib/librte_net/net_crc_sse.c
@@ -1,18 +1,16 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
-#ifndef _RTE_NET_CRC_SSE_H_
-#define _RTE_NET_CRC_SSE_H_
+#include <string.h>
 
+#include <rte_common.h>
 #include <rte_branch_prediction.h>
+#include <rte_cpuflags.h>
 
-#include <x86intrin.h>
-#include <cpuid.h>
+#include "net_crc.h"
 
-#ifdef __cplusplus
-extern "C" {
-#endif
+#include <x86intrin.h>
 
 /** PCLMULQDQ CRC computation context structure */
 struct crc_pclmulqdq_ctx {
@@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq(
 	return n;
 }
 
-
-static inline void
+void
 rte_net_crc_sse42_init(void)
 {
 	uint64_t k1, k2, k5, k6;
@@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void)
 	 * use other data types such as float, double, etc.
 	 */
 	_mm_empty();
-
 }
 
-static inline uint32_t
-rte_crc16_ccitt_sse42_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
 {
 	/** return 16-bit CRC value */
 	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
@@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data,
 		&crc16_ccitt_pclmulqdq);
 }
 
-static inline uint32_t
-rte_crc32_eth_sse42_handler(const uint8_t *data,
-	uint32_t data_len)
+uint32_t
+rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
 {
 	return ~crc32_eth_calc_pclmulqdq(data,
 		data_len,
 		0xffffffffUL,
 		&crc32_eth_pclmulqdq);
 }
-
-#ifdef __cplusplus
-}
-#endif
-
-#endif /* _RTE_NET_CRC_SSE_H_ */
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
index 4f5b9e828..d271d5205 100644
--- a/lib/librte_net/rte_net_crc.c
+++ b/lib/librte_net/rte_net_crc.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #include <stddef.h>
@@ -10,17 +10,7 @@
 #include <rte_common.h>
 #include <rte_net_crc.h>
 
-#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__)
-#define X86_64_SSE42_PCLMULQDQ     1
-#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO)
-#define ARM64_NEON_PMULL           1
-#endif
-
-#ifdef X86_64_SSE42_PCLMULQDQ
-#include <net_crc_sse.h>
-#elif defined ARM64_NEON_PMULL
-#include <net_crc_neon.h>
-#endif
+#include "net_crc.h"
 
 /** CRC polynomials */
 #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
@@ -41,25 +31,27 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
 typedef uint32_t
 (*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
 
-static rte_net_crc_handler *handlers;
+static const rte_net_crc_handler *handlers;
 
-static rte_net_crc_handler handlers_scalar[] = {
+static const rte_net_crc_handler handlers_scalar[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
 };
-
-#ifdef X86_64_SSE42_PCLMULQDQ
-static rte_net_crc_handler handlers_sse42[] = {
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+static const rte_net_crc_handler handlers_sse42[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
 };
-#elif defined ARM64_NEON_PMULL
-static rte_net_crc_handler handlers_neon[] = {
+#endif
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+static const rte_net_crc_handler handlers_neon[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
 };
 #endif
 
+/* Scalar handling */
+
 /**
  * Reflect the bits about the middle
  *
@@ -142,29 +134,82 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
 		crc32_eth_lut);
 }
 
+/* SSE4.2/PCLMULQDQ handling */
+
+#define SSE42_PCLMULQDQ_CPU_SUPPORTED \
+	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ)
+
+static const rte_net_crc_handler *
+sse42_pclmulqdq_get_handlers(void)
+{
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+	if (SSE42_PCLMULQDQ_CPU_SUPPORTED)
+		return handlers_sse42;
+#endif
+	return NULL;
+}
+
+static uint8_t
+sse42_pclmulqdq_init(void)
+{
+#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
+	if (SSE42_PCLMULQDQ_CPU_SUPPORTED) {
+		rte_net_crc_sse42_init();
+		return 1;
+	}
+#endif
+	return 0;
+}
+
+/* NEON/PMULL handling */
+
+#define NEON_PMULL_CPU_SUPPORTED \
+	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)
+
+static const rte_net_crc_handler *
+neon_pmull_get_handlers(void)
+{
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+	if (NEON_PMULL_CPU_SUPPORTED)
+		return handlers_neon;
+#endif
+	return NULL;
+}
+
+static uint8_t
+neon_pmull_init(void)
+{
+#ifdef CC_ARM64_NEON_PMULL_SUPPORT
+	if (NEON_PMULL_CPU_SUPPORTED) {
+		rte_net_crc_neon_init();
+		return 1;
+	}
+#endif
+	return 0;
+}
+
+/* Public API */
+
 void
 rte_net_crc_set_alg(enum rte_net_crc_alg alg)
 {
+	handlers = NULL;
+
 	switch (alg) {
-#ifdef X86_64_SSE42_PCLMULQDQ
 	case RTE_NET_CRC_SSE42:
-		handlers = handlers_sse42;
-		break;
-#elif defined ARM64_NEON_PMULL
-		/* fall-through */
+		handlers = sse42_pclmulqdq_get_handlers();
+		break; /* for x86, always break here */
 	case RTE_NET_CRC_NEON:
-		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
-			handlers = handlers_neon;
-			break;
-		}
-#endif
+		handlers = neon_pmull_get_handlers();
 		/* fall-through */
 	case RTE_NET_CRC_SCALAR:
 		/* fall-through */
 	default:
-		handlers = handlers_scalar;
 		break;
 	}
+
+	if (handlers == NULL)
+		handlers = handlers_scalar;
 }
 
 uint32_t
@@ -188,15 +233,10 @@ RTE_INIT(rte_net_crc_init)
 
 	rte_net_crc_scalar_init();
 
-#ifdef X86_64_SSE42_PCLMULQDQ
-	alg = RTE_NET_CRC_SSE42;
-	rte_net_crc_sse42_init();
-#elif defined ARM64_NEON_PMULL
-	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
+	if (sse42_pclmulqdq_init())
+		alg = RTE_NET_CRC_SSE42;
+	if (neon_pmull_init())
 		alg = RTE_NET_CRC_NEON;
-		rte_net_crc_neon_init();
-	}
-#endif
 
 	rte_net_crc_set_alg(alg);
 }
-- 
2.12.3



* [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
  2020-10-09 13:50   ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh
  2020-10-09 13:50     ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
@ 2020-10-09 13:50     ` Mairtin o Loingsigh
  2020-10-09 16:24       ` Singh, Jasvinder
  2020-10-09 18:35     ` [dpdk-dev] [PATCH v5 0/2] net: add CRC run-time checks and " De Lara Guarch, Pablo
  2020-10-13 18:47     ` David Marchand
  3 siblings, 1 reply; 23+ messages in thread
From: Mairtin o Loingsigh @ 2020-10-09 13:50 UTC (permalink / raw)
  To: jasvinder.singh, bruce.richardson, pablo.de.lara.guarch,
	konstantin.ananyev
  Cc: dev, brendan.ryan, mairtin.oloingsigh, david.coyle

This patch enables the optimized calculation of CRC32-Ethernet and
CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
implementation is built only if the compiler supports the required
instruction sets, and it is selected at run-time only if the host CPU
supports them as well.
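
For anyone wanting to sanity-check an optimized handler outside the unit
test, a slow bit-at-a-time reference can be useful. The sketch below is not
part of the patch. It assumes the conventional reflected parameters for the
two CRCs (polynomials 0x04c11db7 and 0x1021 in reflected form, all-ones
initial value, inverted result), and the names crc_bitwise_ref,
crc32_eth_ref and crc16_ccitt_ref are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reflected forms of the polynomials 0x04c11db7 and 0x1021 (assumed). */
#define CRC32_ETH_POLY_REFLECTED   0xedb88320u
#define CRC16_CCITT_POLY_REFLECTED 0x8408u

/* Bit-at-a-time reflected CRC; slow, intended only as a cross-check. */
static uint32_t
crc_bitwise_ref(const uint8_t *data, size_t len, uint32_t init, uint32_t poly)
{
	uint32_t crc = init;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc & 1u) ? (crc >> 1) ^ poly : crc >> 1;
	}
	return crc;
}

/* Hypothetical reference for the 32-bit Ethernet CRC. */
static uint32_t
crc32_eth_ref(const uint8_t *data, size_t len)
{
	return ~crc_bitwise_ref(data, len, 0xffffffffu,
			CRC32_ETH_POLY_REFLECTED);
}

/* Hypothetical reference for the 16-bit CCITT CRC (result in low 16 bits). */
static uint32_t
crc16_ccitt_ref(const uint8_t *data, size_t len)
{
	return (uint16_t)~crc_bitwise_ref(data, len, 0xffffu,
			CRC16_CCITT_POLY_REFLECTED);
}
```

Comparing these results against rte_net_crc_calc() for a range of buffer
lengths (below 16 bytes, 16 to 31, 32 to 255, and 256 or more) would
exercise each length branch of the new AVX512 code path.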

Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
Signed-off-by: David Coyle <david.coyle@intel.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
---
 app/test/test_crc.c                    |  11 +-
 config/x86/meson.build                 |   6 +-
 doc/guides/rel_notes/release_20_11.rst |   2 +
 lib/librte_net/meson.build             |  55 +++++
 lib/librte_net/net_crc.h               |  11 +
 lib/librte_net/net_crc_avx512.c        | 423 +++++++++++++++++++++++++++++++++
 lib/librte_net/rte_net_crc.c           |  46 ++++
 lib/librte_net/rte_net_crc.h           |   4 +-
 8 files changed, 554 insertions(+), 4 deletions(-)
 create mode 100644 lib/librte_net/net_crc_avx512.c

diff --git a/app/test/test_crc.c b/app/test/test_crc.c
index f8a74e04e..bf1d34435 100644
--- a/app/test/test_crc.c
+++ b/app/test/test_crc.c
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #include "test.h"
@@ -149,6 +149,15 @@ test_crc(void)
 		return ret;
 	}
 
+	/* set CRC avx512 mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_AVX512);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test crc (x86_64 AVX512): failed (%d)\n", ret);
+		return ret;
+	}
+
 	/* set CRC neon mode */
 	rte_net_crc_set_alg(RTE_NET_CRC_NEON);
 
diff --git a/config/x86/meson.build b/config/x86/meson.build
index fea4d5403..172b72b72 100644
--- a/config/x86/meson.build
+++ b/config/x86/meson.build
@@ -1,5 +1,5 @@
 # SPDX-License-Identifier: BSD-3-Clause
-# Copyright(c) 2017-2019 Intel Corporation
+# Copyright(c) 2017-2020 Intel Corporation
 
 # get binutils version for the workaround of Bug 97
 if not is_windows
@@ -23,7 +23,9 @@ endforeach
 
 optional_flags = ['AES', 'PCLMUL',
 		'AVX', 'AVX2', 'AVX512F',
-		'RDRND', 'RDSEED']
+		'RDRND', 'RDSEED',
+		'AVX512BW', 'AVX512DQ',
+		'AVX512VL', 'VPCLMULQDQ']
 foreach f:optional_flags
 	if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
 		if f == 'PCLMUL' # special case flags with different defines
diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
index b77297f7e..5eda680d5 100644
--- a/doc/guides/rel_notes/release_20_11.rst
+++ b/doc/guides/rel_notes/release_20_11.rst
@@ -58,6 +58,8 @@ New Features
 * **Updated CRC modules of rte_net library.**
 
   * Added run-time selection of the optimal architecture-specific CRC path.
+  * Added optimized implementations of CRC32-Ethernet and CRC16-CCITT
+    using the AVX512 and VPCLMULQDQ instruction sets.
 
 * **Updated Broadcom bnxt driver.**
 
diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
index fa439b9e5..6c96b361a 100644
--- a/lib/librte_net/meson.build
+++ b/lib/librte_net/meson.build
@@ -24,18 +24,62 @@ deps += ['mbuf']
 if dpdk_conf.has('RTE_ARCH_X86_64')
 	net_crc_sse42_cpu_support = (
 		cc.get_define('__PCLMUL__', args: machine_args) != '')
+	net_crc_avx512_cpu_support = (
+		cc.get_define('__AVX512F__', args: machine_args) != '' and
+		cc.get_define('__AVX512BW__', args: machine_args) != '' and
+		cc.get_define('__AVX512DQ__', args: machine_args) != '' and
+		cc.get_define('__AVX512VL__', args: machine_args) != '' and
+		cc.get_define('__VPCLMULQDQ__', args: machine_args) != '')
+
 	net_crc_sse42_cc_support = (
 		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
+	net_crc_avx512_cc_support = (
+		not machine_args.contains('-mno-avx512f') and
+		cc.has_argument('-mavx512f') and
+		cc.has_argument('-mavx512bw') and
+		cc.has_argument('-mavx512dq') and
+		cc.has_argument('-mavx512vl') and
+		cc.has_argument('-mvpclmulqdq') and
+		cc.has_argument('-mavx2') and
+		cc.has_argument('-mavx'))
 
 	build_static_net_crc_sse42_lib = 0
+	build_static_net_crc_avx512_lib = 0
 
 	if net_crc_sse42_cpu_support == true
 		sources += files('net_crc_sse.c')
 		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+		if net_crc_avx512_cpu_support == true
+			sources += files('net_crc_avx512.c')
+			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
+		elif net_crc_avx512_cc_support == true
+			build_static_net_crc_avx512_lib = 1
+			net_crc_avx512_lib_cflags = ['-mavx512f',
+							'-mavx512bw',
+							'-mavx512dq',
+							'-mavx512vl',
+							'-mvpclmulqdq',
+							'-mavx2',
+							'-mavx']
+			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
+		endif
 	elif net_crc_sse42_cc_support == true
 		build_static_net_crc_sse42_lib = 1
 		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
 		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
+		if net_crc_avx512_cc_support == true
+			build_static_net_crc_avx512_lib = 1
+			net_crc_avx512_lib_cflags = ['-mpclmul',
+							'-maes',
+							'-mavx512f',
+							'-mavx512bw',
+							'-mavx512dq',
+							'-mavx512vl',
+							'-mvpclmulqdq',
+							'-mavx2',
+							'-mavx']
+			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
+		endif
 	endif
 
 	if build_static_net_crc_sse42_lib == 1
@@ -47,6 +91,17 @@ if dpdk_conf.has('RTE_ARCH_X86_64')
 						net_crc_sse42_lib_cflags])
 		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
 	endif
+
+	if build_static_net_crc_avx512_lib == 1
+		net_crc_avx512_lib = static_library(
+					'net_crc_avx512_lib',
+					'net_crc_avx512.c',
+					dependencies: static_rte_eal,
+					c_args: [cflags,
+						net_crc_avx512_lib_cflags])
+		objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c')
+	endif
+
 elif (dpdk_conf.has('RTE_ARCH_ARM64') and
 		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
 	sources += files('net_crc_neon.c')
diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
index a1578a56c..7a74d5406 100644
--- a/lib/librte_net/net_crc.h
+++ b/lib/librte_net/net_crc.h
@@ -20,6 +20,17 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
 uint32_t
 rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
 
+/* AVX512 */
+
+void
+rte_net_crc_avx512_init(void);
+
+uint32_t
+rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len);
+
 /* NEON */
 
 void
diff --git a/lib/librte_net/net_crc_avx512.c b/lib/librte_net/net_crc_avx512.c
new file mode 100644
index 000000000..3740fe3c9
--- /dev/null
+++ b/lib/librte_net/net_crc_avx512.c
@@ -0,0 +1,423 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) 2020 Intel Corporation
+ */
+
+#include <string.h>
+
+#include <rte_common.h>
+#include <rte_branch_prediction.h>
+#include <rte_cpuflags.h>
+
+#include "net_crc.h"
+
+#include <x86intrin.h>
+
+/* VPCLMULQDQ CRC computation context structure */
+struct crc_vpclmulqdq_ctx {
+	__m512i rk1_rk2;
+	__m512i rk3_rk4;
+	__m512i fold_7x128b;
+	__m512i fold_3x128b;
+	__m128i rk5_rk6;
+	__m128i rk7_rk8;
+	__m128i fold_1x128b;
+};
+
+static struct crc_vpclmulqdq_ctx crc32_eth __rte_aligned(64);
+static struct crc_vpclmulqdq_ctx crc16_ccitt __rte_aligned(64);
+
+static uint16_t byte_len_to_mask_table[] = {
+	0x0000, 0x0001, 0x0003, 0x0007,
+	0x000f, 0x001f, 0x003f, 0x007f,
+	0x00ff, 0x01ff, 0x03ff, 0x07ff,
+	0x0fff, 0x1fff, 0x3fff, 0x7fff,
+	0xffff};
+
+static const uint8_t shf_table[32] __rte_aligned(16) = {
+	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
+	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f
+};
+
+static const uint32_t mask[4] __rte_aligned(16) = {
+	0xffffffff, 0xffffffff, 0x00000000, 0x00000000
+};
+
+static const uint32_t mask2[4] __rte_aligned(16) = {
+	0x00000000, 0xffffffff, 0xffffffff, 0xffffffff
+};
+
+static __rte_always_inline __m512i
+crcr32_folding_round(__m512i data_block, __m512i precomp, __m512i fold)
+{
+	__m512i tmp0, tmp1;
+
+	tmp0 = _mm512_clmulepi64_epi128(fold, precomp, 0x01);
+	tmp1 = _mm512_clmulepi64_epi128(fold, precomp, 0x10);
+
+	return _mm512_ternarylogic_epi64(tmp0, tmp1, data_block, 0x96);
+}
+
+static __rte_always_inline __m128i
+crc32_fold_128(__m512i fold0, __m512i fold1,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i res, res2;
+	__m256i a;
+	__m512i tmp0, tmp1, tmp2, tmp3;
+	__m512i tmp4;
+
+	tmp0 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x01);
+	tmp1 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x10);
+
+	res = _mm512_extracti64x2_epi64(fold1, 3);
+	tmp4 = _mm512_maskz_broadcast_i32x4(0xF, res);
+
+	tmp2 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x01);
+	tmp3 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x10);
+
+	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp1, tmp2, 0x96);
+	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp3, tmp4, 0x96);
+
+	tmp1 = _mm512_shuffle_i64x2(tmp0, tmp0, 0x4e);
+
+	a = _mm256_xor_si256(*(__m256i *)&tmp1, *(__m256i *)&tmp0);
+	res = _mm256_extracti64x2_epi64(a, 1);
+	res2 = _mm_xor_si128(res, *(__m128i *)&a);
+
+	return res2;
+}
+
+static __rte_always_inline __m128i
+last_two_xmm(const uint8_t *data, uint32_t data_len, uint32_t n, __m128i res,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	uint32_t offset;
+	__m128i res2, res3, res4, pshufb_shf;
+
+	const uint32_t mask3[4] __rte_aligned(16) = {
+		   0x80808080, 0x80808080, 0x80808080, 0x80808080
+	};
+
+	res2 = res;
+	offset = data_len - n;
+	res3 = _mm_loadu_si128((const __m128i *)&data[n+offset-16]);
+
+	pshufb_shf = _mm_loadu_si128((const __m128i *)
+			(shf_table + (data_len-n)));
+
+	res = _mm_shuffle_epi8(res, pshufb_shf);
+	pshufb_shf = _mm_xor_si128(pshufb_shf,
+			_mm_load_si128((const __m128i *) mask3));
+	res2 = _mm_shuffle_epi8(res2, pshufb_shf);
+
+	res2 = _mm_blendv_epi8(res2, res3, pshufb_shf);
+
+	res4 = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x01);
+	res = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x10);
+	res = _mm_ternarylogic_epi64(res, res2, res4, 0x96);
+
+	return res;
+}
+
+static __rte_always_inline __m128i
+done_128(__m128i res, const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i res1;
+
+	res1 = res;
+
+	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x0);
+	res1 = _mm_srli_si128(res1, 8);
+	res = _mm_xor_si128(res, res1);
+
+	res1 = res;
+	res = _mm_slli_si128(res, 4);
+	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x10);
+	res = _mm_xor_si128(res, res1);
+
+	return res;
+}
+
+static __rte_always_inline uint32_t
+barrett_reduction(__m128i data64, const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i tmp0, tmp1;
+
+	data64 =  _mm_and_si128(data64, *(const __m128i *)mask2);
+	tmp0 = data64;
+	tmp1 = data64;
+
+	data64 = _mm_clmulepi64_si128(tmp0, params->rk7_rk8, 0x0);
+	data64 = _mm_ternarylogic_epi64(data64, tmp1, *(const __m128i *)mask,
+			0x28);
+
+	tmp1 = data64;
+	data64 = _mm_clmulepi64_si128(data64, params->rk7_rk8, 0x10);
+	data64 = _mm_ternarylogic_epi64(data64, tmp1, tmp0, 0x96);
+
+	return _mm_extract_epi32(data64, 2);
+}
+
+static __rte_always_inline void
+reduction_loop(__m128i *fold, int *len, const uint8_t *data, uint32_t *n,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i tmp, tmp1;
+
+	tmp = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x1);
+	*fold = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x10);
+	*fold = _mm_xor_si128(*fold, tmp);
+	tmp1 = _mm_loadu_si128((const __m128i *)&data[*n]);
+	*fold = _mm_xor_si128(*fold, tmp1);
+	*n += 16;
+	*len -= 16;
+}
+
+static __rte_always_inline uint32_t
+crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc,
+	const struct crc_vpclmulqdq_ctx *params)
+{
+	__m128i res, d, b;
+	__m512i temp, k;
+	__m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3;
+	__m512i fold0, fold1, fold2, fold3;
+	__mmask16 mask;
+	uint32_t n = 0;
+	int reduction = 0;
+
+	/* Get CRC init value */
+	b = _mm_cvtsi32_si128(crc);
+	temp = _mm512_castsi128_si512(b);
+
+	if (data_len > 255) {
+		fold0 = _mm512_loadu_si512((const __m512i *)data);
+		fold1 = _mm512_loadu_si512((const __m512i *)(data+64));
+		fold2 = _mm512_loadu_si512((const __m512i *)(data+128));
+		fold3 = _mm512_loadu_si512((const __m512i *)(data+192));
+		fold0 = _mm512_xor_si512(fold0, temp);
+
+		/* Main folding loop */
+		k = params->rk1_rk2;
+		for (n = 256; (n + 256) <= data_len; n += 256) {
+			qw0 = _mm512_loadu_si512((const __m512i *)&data[n]);
+			qw1 = _mm512_loadu_si512((const __m512i *)
+					&(data[n+64]));
+			qw2 = _mm512_loadu_si512((const __m512i *)
+					&(data[n+128]));
+			qw3 = _mm512_loadu_si512((const __m512i *)
+					&(data[n+192]));
+			fold0 = crcr32_folding_round(qw0, k, fold0);
+			fold1 = crcr32_folding_round(qw1, k, fold1);
+			fold2 = crcr32_folding_round(qw2, k, fold2);
+			fold3 = crcr32_folding_round(qw3, k, fold3);
+		}
+
+		/* 256 to 128 fold */
+		k = params->rk3_rk4;
+		fold0 = crcr32_folding_round(fold2, k, fold0);
+		fold1 = crcr32_folding_round(fold3, k, fold1);
+
+		res = crc32_fold_128(fold0, fold1, params);
+
+		reduction = 240 - ((n+256)-data_len);
+
+		while (reduction > 0)
+			reduction_loop(&res, &reduction, data, &n,
+					params);
+
+		reduction += 16;
+
+		if (n != data_len)
+			res = last_two_xmm(data, data_len, n, res,
+					params);
+	} else {
+		if (data_len > 31) {
+			res = _mm_cvtsi32_si128(crc);
+			d = _mm_loadu_si128((const __m128i *)data);
+			res = _mm_xor_si128(res, d);
+			n += 16;
+
+			reduction = 240 - ((n+256)-data_len);
+
+			while (reduction > 0)
+				reduction_loop(&res, &reduction, data, &n,
+						params);
+
+			if (n != data_len)
+				res = last_two_xmm(data, data_len, n, res,
+						params);
+		} else if (data_len > 16) {
+			res = _mm_cvtsi32_si128(crc);
+			d = _mm_loadu_si128((const __m128i *)data);
+			res = _mm_xor_si128(res, d);
+			n += 16;
+
+			if (n != data_len)
+				res = last_two_xmm(data, data_len, n, res,
+						params);
+		} else if (data_len == 16) {
+			res = _mm_cvtsi32_si128(crc);
+			d = _mm_loadu_si128((const __m128i *)data);
+			res = _mm_xor_si128(res, d);
+		} else {
+			res = _mm_cvtsi32_si128(crc);
+			mask = byte_len_to_mask_table[data_len];
+			d = _mm_maskz_loadu_epi8(mask, data);
+			res = _mm_xor_si128(res, d);
+
+			if (data_len > 3) {
+				d = _mm_loadu_si128((const __m128i *)
+						&shf_table[data_len]);
+				res = _mm_shuffle_epi8(res, d);
+			} else if (data_len > 2) {
+				res = _mm_slli_si128(res, 5);
+				goto do_barrett_reduction;
+			} else if (data_len > 1) {
+				res = _mm_slli_si128(res, 6);
+				goto do_barrett_reduction;
+			} else if (data_len > 0) {
+				res = _mm_slli_si128(res, 7);
+				goto do_barrett_reduction;
+			} else {
+				/* zero length case */
+				return crc;
+			}
+		}
+	}
+
+	res = done_128(res, params);
+
+do_barrett_reduction:
+	n = barrett_reduction(res, params);
+
+	return n;
+}
+
+static void
+crc32_load_init_constants(void)
+{
+	__m128i a;
+	/* fold constants */
+	uint64_t c0 = 0x00000000e95c1271;
+	uint64_t c1 = 0x00000000ce3371cb;
+	uint64_t c2 = 0x00000000910eeec1;
+	uint64_t c3 = 0x0000000033fff533;
+	uint64_t c4 = 0x000000000cbec0ed;
+	uint64_t c5 = 0x0000000031f8303f;
+	uint64_t c6 = 0x0000000057c54819;
+	uint64_t c7 = 0x00000000df068dc2;
+	uint64_t c8 = 0x00000000ae0b5394;
+	uint64_t c9 = 0x000000001c279815;
+	uint64_t c10 = 0x000000001d9513d7;
+	uint64_t c11 = 0x000000008f352d95;
+	uint64_t c12 = 0x00000000af449247;
+	uint64_t c13 = 0x000000003db1ecdc;
+	uint64_t c14 = 0x0000000081256527;
+	uint64_t c15 = 0x00000000f1da05aa;
+	uint64_t c16 = 0x00000000ccaa009e;
+	uint64_t c17 = 0x00000000ae689191;
+	uint64_t c18 = 0x00000000ccaa009e;
+	uint64_t c19 = 0x00000000b8bc6765;
+	uint64_t c20 = 0x00000001f7011640;
+	uint64_t c21 = 0x00000001db710640;
+
+	a = _mm_set_epi64x(c1, c0);
+	crc32_eth.rk1_rk2 = _mm512_broadcast_i32x4(a);
+
+	a = _mm_set_epi64x(c3, c2);
+	crc32_eth.rk3_rk4 = _mm512_broadcast_i32x4(a);
+
+	crc32_eth.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
+			c9, c10, c11);
+	crc32_eth.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
+			c16, c17, 0, 0);
+	crc32_eth.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
+			_mm_cvtsi64_m64(c17));
+
+	crc32_eth.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
+			_mm_cvtsi64_m64(c19));
+	crc32_eth.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
+			_mm_cvtsi64_m64(c21));
+}
+
+static void
+crc16_load_init_constants(void)
+{
+	__m128i a;
+	/* fold constants */
+	uint64_t c0 = 0x0000000000009a19;
+	uint64_t c1 = 0x0000000000002df8;
+	uint64_t c2 = 0x00000000000068af;
+	uint64_t c3 = 0x000000000000b6c9;
+	uint64_t c4 = 0x000000000000c64f;
+	uint64_t c5 = 0x000000000000cd95;
+	uint64_t c6 = 0x000000000000d341;
+	uint64_t c7 = 0x000000000000b8f2;
+	uint64_t c8 = 0x0000000000000842;
+	uint64_t c9 = 0x000000000000b072;
+	uint64_t c10 = 0x00000000000047e3;
+	uint64_t c11 = 0x000000000000922d;
+	uint64_t c12 = 0x0000000000000e3a;
+	uint64_t c13 = 0x0000000000004d7a;
+	uint64_t c14 = 0x0000000000005b44;
+	uint64_t c15 = 0x0000000000007762;
+	uint64_t c16 = 0x00000000000081bf;
+	uint64_t c17 = 0x0000000000008e10;
+	uint64_t c18 = 0x00000000000081bf;
+	uint64_t c19 = 0x0000000000001cbb;
+	uint64_t c20 = 0x000000011c581910;
+	uint64_t c21 = 0x0000000000010810;
+
+	a = _mm_set_epi64x(c1, c0);
+	crc16_ccitt.rk1_rk2 = _mm512_broadcast_i32x4(a);
+
+	a = _mm_set_epi64x(c3, c2);
+	crc16_ccitt.rk3_rk4 = _mm512_broadcast_i32x4(a);
+
+	crc16_ccitt.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
+			c9, c10, c11);
+	crc16_ccitt.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
+			c16, c17, 0, 0);
+	crc16_ccitt.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
+			_mm_cvtsi64_m64(c17));
+
+	crc16_ccitt.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
+			_mm_cvtsi64_m64(c19));
+	crc16_ccitt.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
+			_mm_cvtsi64_m64(c21));
+}
+
+void
+rte_net_crc_avx512_init(void)
+{
+	crc32_load_init_constants();
+	crc16_load_init_constants();
+
+	/*
+	 * Reset the register as following calculation may
+	 * use other data types such as float, double, etc.
+	 */
+	_mm_empty();
+}
+
+uint32_t
+rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len)
+{
+	/* return 16-bit CRC value */
+	return (uint16_t)~crc32_eth_calc_vpclmulqdq(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt);
+}
+
+uint32_t
+rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len)
+{
+	/* return 32-bit CRC value */
+	return ~crc32_eth_calc_vpclmulqdq(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth);
+}
diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
index d271d5205..32a366590 100644
--- a/lib/librte_net/rte_net_crc.c
+++ b/lib/librte_net/rte_net_crc.c
@@ -37,6 +37,12 @@ static const rte_net_crc_handler handlers_scalar[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
 };
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+static const rte_net_crc_handler handlers_avx512[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler,
+};
+#endif
 #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
 static const rte_net_crc_handler handlers_sse42[] = {
 	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
@@ -134,6 +140,39 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
 		crc32_eth_lut);
 }
 
+/* AVX512/VPCLMULQDQ handling */
+
+#define AVX512_VPCLMULQDQ_CPU_SUPPORTED ( \
+	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) && \
+	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) && \
+	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) && \
+	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) && \
+	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ) && \
+	rte_cpu_get_flag_enabled(RTE_CPUFLAG_VPCLMULQDQ) \
+)
+
+static const rte_net_crc_handler *
+avx512_vpclmulqdq_get_handlers(void)
+{
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+	if (AVX512_VPCLMULQDQ_CPU_SUPPORTED)
+		return handlers_avx512;
+#endif
+	return NULL;
+}
+
+static uint8_t
+avx512_vpclmulqdq_init(void)
+{
+#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
+	if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) {
+		rte_net_crc_avx512_init();
+		return 1;
+	}
+#endif
+	return 0;
+}
+
 /* SSE4.2/PCLMULQDQ handling */
 
 #define SSE42_PCLMULQDQ_CPU_SUPPORTED \
@@ -196,6 +235,11 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg)
 	handlers = NULL;
 
 	switch (alg) {
+	case RTE_NET_CRC_AVX512:
+		handlers = avx512_vpclmulqdq_get_handlers();
+		if (handlers != NULL)
+			break;
+		/* fall-through */
 	case RTE_NET_CRC_SSE42:
 		handlers = sse42_pclmulqdq_get_handlers();
 		break; /* for x86, always break here */
@@ -235,6 +279,8 @@ RTE_INIT(rte_net_crc_init)
 
 	if (sse42_pclmulqdq_init())
 		alg = RTE_NET_CRC_SSE42;
+	if (avx512_vpclmulqdq_init())
+		alg = RTE_NET_CRC_AVX512;
 	if (neon_pmull_init())
 		alg = RTE_NET_CRC_NEON;
 
diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
index 16e85ca97..72d3e10ff 100644
--- a/lib/librte_net/rte_net_crc.h
+++ b/lib/librte_net/rte_net_crc.h
@@ -1,5 +1,5 @@
 /* SPDX-License-Identifier: BSD-3-Clause
- * Copyright(c) 2017 Intel Corporation
+ * Copyright(c) 2017-2020 Intel Corporation
  */
 
 #ifndef _RTE_NET_CRC_H_
@@ -23,6 +23,7 @@ enum rte_net_crc_alg {
 	RTE_NET_CRC_SCALAR = 0,
 	RTE_NET_CRC_SSE42,
 	RTE_NET_CRC_NEON,
+	RTE_NET_CRC_AVX512,
 };
 
 /**
@@ -35,6 +36,7 @@ enum rte_net_crc_alg {
  *   - RTE_NET_CRC_SCALAR
  *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
  *   - RTE_NET_CRC_NEON (Use ARM Neon intrinsic)
+ *   - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic)
  */
 void
 rte_net_crc_set_alg(enum rte_net_crc_alg alg);
-- 
2.12.3



* Re: [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection
  2020-10-07 14:59     ` Ananyev, Konstantin
@ 2020-10-09 14:04       ` Coyle, David
  2020-10-10 12:42         ` Ananyev, Konstantin
  0 siblings, 1 reply; 23+ messages in thread
From: Coyle, David @ 2020-10-09 14:04 UTC (permalink / raw)
  To: Ananyev, Konstantin, O'loingsigh, Mairtin, Singh, Jasvinder,
	Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Ryan, Brendan, O'loingsigh, Mairtin

Hi Konstantin, thanks for your review

> -----Original Message-----
> From: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Sent: Wednesday, October 7, 2020 3:59 PM

<snip>

> 
> >
> > This patch adds support for run-time selection of the optimal
> > architecture-specific CRC path, based on the supported instruction
> > set(s) of the CPU.
> >
> > The compiler option checks have been moved from the C files to the
> > meson script. The rte_cpu_get_flag_enabled function is called
> > automatically by the library at process initialization time to
> > determine which instructions the CPU supports, with the most optimal
> > supported CRC path ultimately selected.
> >
> > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> > Signed-off-by: David Coyle <david.coyle@intel.com>
> 
> LGTM, just one nit see below.
> With that:
> Series acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> 
> > ---
> >  doc/guides/rel_notes/release_20_11.rst            |  4 ++
> >  lib/librte_net/meson.build                        | 34 +++++++++++-
> >  lib/librte_net/net_crc.h                          | 34 ++++++++++++
> >  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +++------
> >  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   | 34 ++++--------
> >  lib/librte_net/rte_net_crc.c                      | 67 ++++++++++++++---------
> >  6 files changed, 131 insertions(+), 68 deletions(-)
> >  create mode 100644 lib/librte_net/net_crc.h
> >  rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
> >  rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)
> >
> >

<snip>

> > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> > +static uint8_t
> > +sse42_pclmulqdq_cpu_supported(void)
> > +{
> > +	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
> > +}
> 
> As a nit, I think it would be better to hide the #ifdef inside the function,
> and return 0 when the define is not set.
> Something like:
> 
> static int
> sse42_pclmulqdq_cpu_supported(void)
> {
> #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> 	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
> #else
> 	return 0;
> #endif
> }
> 
> Same for other cpu_supported functions.
> And then you can remove these ifdefs in set_alg and other places, e.g.:
> 
> void
> rte_net_crc_set_alg(enum rte_net_crc_alg alg)
> {
>         switch (alg) {
> #ifdef RTE_ARCH_X86_64
>         case RTE_NET_CRC_AVX512:
>                 if (avx512_vpclmulqdq_cpu_supported()) {
>                         handlers = handlers_avx512;
>                         break;
>                 }
>                 /* fall-through */
>         case RTE_NET_CRC_SSE42:
>                 if (sse42_pclmulqdq_cpu_supported()) {
>                         handlers = handlers_sse42;
>                         break;
>                 }
> #endif
> ...
> 
> Same for rte_net_crc_init()

[DC] I have reworked the ifdefs in this file based on your comments here and on off-list discussions.
These changes are now available in v5.

All ifdefs have been removed from the API function definitions and moved down into 'helper' type
functions, which looks much cleaner now.
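
For anyone skimming the thread, the reworked layout can be sketched roughly as
below. This is only an illustrative, self-contained model and not the code from
the patch: HAVE_FAST_PATH, cpu_has_fast_path() and the handler names are
hypothetical stand-ins for the CC_*_SUPPORT defines, rte_cpu_get_flag_enabled()
and the real handler tables, and a byte sum stands in for the CRC itself.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t data_len);

/* Hypothetical scalar handler: a byte sum stands in for a real CRC. */
static uint32_t
scalar_handler(const uint8_t *data, uint32_t data_len)
{
	uint32_t sum = 0;
	uint32_t i;

	for (i = 0; i < data_len; i++)
		sum += data[i];
	return sum;
}

#ifdef HAVE_FAST_PATH
/* Hypothetical accelerated handler, only built when the toolchain allows. */
static uint32_t
fast_handler(const uint8_t *data, uint32_t data_len)
{
	return scalar_handler(data, data_len);
}

/* Hypothetical run-time CPU check. */
static int
cpu_has_fast_path(void)
{
	return 1;
}
#endif

/* Helper: the only place that knows about the build-time define. */
static crc_handler
fast_path_get_handler(void)
{
#ifdef HAVE_FAST_PATH
	if (cpu_has_fast_path())
		return fast_handler;
#endif
	return NULL;
}

/* API-level code: no ifdefs, just a NULL check and a scalar fallback. */
static crc_handler
select_handler(void)
{
	crc_handler h = fast_path_get_handler();

	return (h != NULL) ? h : scalar_handler;
}
```

With HAVE_FAST_PATH undefined the helper simply returns NULL, so the caller
ends up on the scalar handler without needing any conditional compilation of
its own.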

Your Ack has been carried through to v5 too, as you mentioned.

> 
> > +#endif
> > +
> > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> > +static uint8_t
> > +neon_pmull_cpu_supported(void)
> > +{
> > +	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL);
> > +}
> > +#endif
> > +
> >  void
> >  rte_net_crc_set_alg(enum rte_net_crc_alg alg)
> >  {
> >  	switch (alg) {
> > -#ifdef X86_64_SSE42_PCLMULQDQ
> > +#ifdef RTE_ARCH_X86_64
> >  	case RTE_NET_CRC_SSE42:
> > -		handlers = handlers_sse42;
> > -		break;
> > -#elif defined ARM64_NEON_PMULL
> > -		/* fall-through */
> > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> > +		if (sse42_pclmulqdq_cpu_supported()) {
> > +			handlers = handlers_sse42;
> > +			break;
> > +		}
> > +#endif
> > +#endif /* RTE_ARCH_X86_64 */
> > +#ifdef RTE_ARCH_ARM64
> >  	case RTE_NET_CRC_NEON:
> > -		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> > +		if (neon_pmull_cpu_supported()) {
> >  			handlers = handlers_neon;
> >  			break;
> >  		}
> >  #endif
> > +#endif /* RTE_ARCH_ARM64 */
> >  		/* fall-through */
> >  	case RTE_NET_CRC_SCALAR:
> >  		/* fall-through */
> > @@ -188,11 +200,14 @@ RTE_INIT(rte_net_crc_init)
> >
> >  	rte_net_crc_scalar_init();
> >
> > -#ifdef X86_64_SSE42_PCLMULQDQ
> > -	alg = RTE_NET_CRC_SSE42;
> > -	rte_net_crc_sse42_init();
> > -#elif defined ARM64_NEON_PMULL
> > -	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> > +	if (sse42_pclmulqdq_cpu_supported()) {
> > +		alg = RTE_NET_CRC_SSE42;
> > +		rte_net_crc_sse42_init();
> > +	}
> > +#endif
> > +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> > +	if (neon_pmull_cpu_supported()) {
> >  		alg = RTE_NET_CRC_NEON;
> >  		rte_net_crc_neon_init();
> >  	}
> > --
> > 2.12.3


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection
  2020-10-09 13:50     ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
@ 2020-10-09 16:22       ` Singh, Jasvinder
  2020-10-10  9:34       ` Ruifeng Wang
  2020-10-13  9:07       ` Bruce Richardson
  2 siblings, 0 replies; 23+ messages in thread
From: Singh, Jasvinder @ 2020-10-09 16:22 UTC (permalink / raw)
  To: O'loingsigh, Mairtin, Richardson, Bruce, De Lara Guarch,
	Pablo, Ananyev, Konstantin
  Cc: dev, Ryan, Brendan, Coyle, David



> -----Original Message-----
> From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> Sent: Friday, October 9, 2020 2:51 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; O'loingsigh,
> Mairtin <mairtin.oloingsigh@intel.com>; Coyle, David
> <david.coyle@intel.com>
> Subject: [PATCH v5 1/2] net: add run-time architecture specific CRC selection
> 
> This patch adds support for run-time selection of the optimal
> architecture-specific CRC path, based on the supported instruction set(s)
> of the CPU.
> 
> The compiler option checks have been moved from the C files to the meson
> script. At process initialization time the library automatically calls
> rte_cpu_get_flag_enabled() to determine which instructions the CPU
> supports, and then selects the best supported CRC path.
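
As a rough illustration of the init-time selection described above, here is a
minimal sketch. It is not the DPDK code: it assumes GCC or Clang, uses the
compiler's __builtin_cpu_supports() in place of rte_cpu_get_flag_enabled(),
uses a plain constructor in place of RTE_INIT, and all function and variable
names are hypothetical. The "CRC" is a placeholder byte sum.

```c
#include <stdint.h>

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t data_len);

/* Placeholder scalar path: a byte sum stands in for the CRC. */
static uint32_t
crc_scalar(const uint8_t *data, uint32_t data_len)
{
	uint32_t sum = 0;
	uint32_t i;

	for (i = 0; i < data_len; i++)
		sum += data[i];
	return sum;
}

/* Placeholder accelerated path; must return the same value as scalar. */
static uint32_t
crc_accel(const uint8_t *data, uint32_t data_len)
{
	return crc_scalar(data, data_len);
}

/* Default to the scalar path until the constructor has run. */
static crc_handler active_handler = crc_scalar;
static int accel_selected;

/* Run-time CPU check; reports "not supported" on non-x86 builds. */
static int
cpu_supports_pclmul(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	__builtin_cpu_init();
	return __builtin_cpu_supports("pclmul") != 0;
#else
	return 0;
#endif
}

/* Runs before main(), much like an RTE_INIT constructor would. */
__attribute__((constructor)) static void
crc_select_at_startup(void)
{
	if (cpu_supports_pclmul()) {
		active_handler = crc_accel;
		accel_selected = 1;
	}
}
```

The point of the pattern is that callers only ever go through active_handler,
so the same binary can run on CPUs with and without the instruction.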
> 
> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> Signed-off-by: David Coyle <david.coyle@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  doc/guides/rel_notes/release_20_11.rst            |   4 +
>  lib/librte_net/meson.build                        |  34 ++++++-
>  lib/librte_net/net_crc.h                          |  34 +++++++
>  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  26 ++---
>  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 ++-----
>  lib/librte_net/rte_net_crc.c                      | 116 +++++++++++++++-------
>  6 files changed, 168 insertions(+), 80 deletions(-)
>  create mode 100644 lib/librte_net/net_crc.h
>  rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
>  rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index 808bdc4e5..b77297f7e 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -55,6 +55,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
> 
> +* **Updated CRC modules of rte_net library.**
> +
> +  * Added run-time selection of the optimal architecture-specific CRC path.
> +
>  * **Updated Broadcom bnxt driver.**
> 
>    Updated the Broadcom bnxt driver with new features and improvements, including:
> diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
> index 24ed8253b..fa439b9e5 100644
> --- a/lib/librte_net/meson.build
> +++ b/lib/librte_net/meson.build
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: BSD-3-Clause
> -# Copyright(c) 2017 Intel Corporation
> +# Copyright(c) 2017-2020 Intel Corporation
> 
>  headers = files('rte_ip.h',
>  	'rte_tcp.h',
> @@ -20,3 +20,35 @@ headers = files('rte_ip.h',
> 
>  sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c')
>  deps += ['mbuf']
> +
> +if dpdk_conf.has('RTE_ARCH_X86_64')
> +	net_crc_sse42_cpu_support = (
> +		cc.get_define('__PCLMUL__', args: machine_args) != '')
> +	net_crc_sse42_cc_support = (
> +		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
> +
> +	build_static_net_crc_sse42_lib = 0
> +
> +	if net_crc_sse42_cpu_support == true
> +		sources += files('net_crc_sse.c')
> +		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +	elif net_crc_sse42_cc_support == true
> +		build_static_net_crc_sse42_lib = 1
> +		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
> +		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +	endif
> +
> +	if build_static_net_crc_sse42_lib == 1
> +		net_crc_sse42_lib = static_library(
> +					'net_crc_sse42_lib',
> +					'net_crc_sse.c',
> +					dependencies: static_rte_eal,
> +					c_args: [cflags,
> +						net_crc_sse42_lib_cflags])
> +		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
> +	endif
> +elif (dpdk_conf.has('RTE_ARCH_ARM64') and
> +		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
> +	sources += files('net_crc_neon.c')
> +	cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
> +endif
> diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
> new file mode 100644
> index 000000000..a1578a56c
> --- /dev/null
> +++ b/lib/librte_net/net_crc.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#ifndef _NET_CRC_H_
> +#define _NET_CRC_H_
> +
> +/*
> + * Different implementations of CRC
> + */
> +
> +/* SSE4.2 */
> +
> +void
> +rte_net_crc_sse42_init(void);
> +
> +uint32_t
> +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
> +
> +uint32_t
> +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
> +
> +/* NEON */
> +
> +void
> +rte_net_crc_neon_init(void);
> +
> +uint32_t
> +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len);
> +
> +uint32_t
> +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len);
> +
> +#endif /* _NET_CRC_H_ */
> diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
> similarity index 95%
> rename from lib/librte_net/net_crc_neon.h
> rename to lib/librte_net/net_crc_neon.c
> index 63fa1d4a1..f61d75a8c 100644
> --- a/lib/librte_net/net_crc_neon.h
> +++ b/lib/librte_net/net_crc_neon.c
> @@ -2,17 +2,15 @@
>   * Copyright(c) 2017 Cavium, Inc
>   */
> 
> -#ifndef _NET_CRC_NEON_H_
> -#define _NET_CRC_NEON_H_
> +#include <string.h>
> 
> +#include <rte_common.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_net_crc.h>
>  #include <rte_vect.h>
>  #include <rte_cpuflags.h>
> 
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> +#include "net_crc.h"
> 
>  /** PMULL CRC computation context structure */
>  struct crc_pmull_ctx {
> @@ -218,7 +216,7 @@ crc32_eth_calc_pmull(
>  	return n;
>  }
> 
> -static inline void
> +void
>  rte_net_crc_neon_init(void)
>  {
>  	/* Initialize CRC16 data */
> @@ -242,9 +240,8 @@ rte_net_crc_neon_init(void)
>  	crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8);
>  }
> 
> -static inline uint32_t
> -rte_crc16_ccitt_neon_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return (uint16_t)~crc32_eth_calc_pmull(data,
>  		data_len,
> @@ -252,18 +249,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data,
>  		&crc16_ccitt_pmull);
>  }
> 
> -static inline uint32_t
> -rte_crc32_eth_neon_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return ~crc32_eth_calc_pmull(data,
>  		data_len,
>  		0xffffffffUL,
>  		&crc32_eth_pmull);
>  }
> -
> -#ifdef __cplusplus
> -}
> -#endif
> -
> -#endif /* _NET_CRC_NEON_H_ */
> diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c
> similarity index 94%
> rename from lib/librte_net/net_crc_sse.h
> rename to lib/librte_net/net_crc_sse.c
> index 1c7b7a548..053b54b39 100644
> --- a/lib/librte_net/net_crc_sse.h
> +++ b/lib/librte_net/net_crc_sse.c
> @@ -1,18 +1,16 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
> 
> -#ifndef _RTE_NET_CRC_SSE_H_
> -#define _RTE_NET_CRC_SSE_H_
> +#include <string.h>
> 
> +#include <rte_common.h>
>  #include <rte_branch_prediction.h>
> +#include <rte_cpuflags.h>
> 
> -#include <x86intrin.h>
> -#include <cpuid.h>
> +#include "net_crc.h"
> 
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> +#include <x86intrin.h>
> 
>  /** PCLMULQDQ CRC computation context structure */
>  struct crc_pclmulqdq_ctx {
> @@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq(
>  	return n;
>  }
> 
> -
> -static inline void
> +void
>  rte_net_crc_sse42_init(void)
>  {
>  	uint64_t k1, k2, k5, k6;
> @@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void)
>  	 * use other data types such as float, double, etc.
>  	 */
>  	_mm_empty();
> -
>  }
> 
> -static inline uint32_t
> -rte_crc16_ccitt_sse42_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	/** return 16-bit CRC value */
>  	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
> @@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data,
>  		&crc16_ccitt_pclmulqdq);
>  }
> 
> -static inline uint32_t
> -rte_crc32_eth_sse42_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return ~crc32_eth_calc_pclmulqdq(data,
>  		data_len,
>  		0xffffffffUL,
>  		&crc32_eth_pclmulqdq);
>  }
> -
> -#ifdef __cplusplus
> -}
> -#endif
> -
> -#endif /* _RTE_NET_CRC_SSE_H_ */
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> index 4f5b9e828..d271d5205 100644
> --- a/lib/librte_net/rte_net_crc.c
> +++ b/lib/librte_net/rte_net_crc.c
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
> 
>  #include <stddef.h>
> @@ -10,17 +10,7 @@
>  #include <rte_common.h>
>  #include <rte_net_crc.h>
> 
> -#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__)
> -#define X86_64_SSE42_PCLMULQDQ     1
> -#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO)
> -#define ARM64_NEON_PMULL           1
> -#endif
> -
> -#ifdef X86_64_SSE42_PCLMULQDQ
> -#include <net_crc_sse.h>
> -#elif defined ARM64_NEON_PMULL
> -#include <net_crc_neon.h>
> -#endif
> +#include "net_crc.h"
> 
>  /** CRC polynomials */
>  #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
> @@ -41,25 +31,27 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
>  typedef uint32_t
>  (*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
> 
> -static rte_net_crc_handler *handlers;
> +static const rte_net_crc_handler *handlers;
> 
> -static rte_net_crc_handler handlers_scalar[] = {
> +static const rte_net_crc_handler handlers_scalar[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
>  };
> -
> -#ifdef X86_64_SSE42_PCLMULQDQ
> -static rte_net_crc_handler handlers_sse42[] = {
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +static const rte_net_crc_handler handlers_sse42[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
>  };
> -#elif defined ARM64_NEON_PMULL
> -static rte_net_crc_handler handlers_neon[] = {
> +#endif
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +static const rte_net_crc_handler handlers_neon[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
>  };
>  #endif
> 
> +/* Scalar handling */
> +
>  /**
>   * Reflect the bits about the middle
>   *
> @@ -142,29 +134,82 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
>  		crc32_eth_lut);
>  }
> 
> +/* SSE4.2/PCLMULQDQ handling */
> +
> +#define SSE42_PCLMULQDQ_CPU_SUPPORTED \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ)
> +
> +static const rte_net_crc_handler *
> +sse42_pclmulqdq_get_handlers(void)
> +{
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +	if (SSE42_PCLMULQDQ_CPU_SUPPORTED)
> +		return handlers_sse42;
> +#endif
> +	return NULL;
> +}
> +
> +static uint8_t
> +sse42_pclmulqdq_init(void)
> +{
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +	if (SSE42_PCLMULQDQ_CPU_SUPPORTED) {
> +		rte_net_crc_sse42_init();
> +		return 1;
> +	}
> +#endif
> +	return 0;
> +}
> +
> +/* NEON/PMULL handling */
> +
> +#define NEON_PMULL_CPU_SUPPORTED \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)
> +
> +static const rte_net_crc_handler *
> +neon_pmull_get_handlers(void)
> +{
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +	if (NEON_PMULL_CPU_SUPPORTED)
> +		return handlers_neon;
> +#endif
> +	return NULL;
> +}
> +
> +static uint8_t
> +neon_pmull_init(void)
> +{
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +	if (NEON_PMULL_CPU_SUPPORTED) {
> +		rte_net_crc_neon_init();
> +		return 1;
> +	}
> +#endif
> +	return 0;
> +}
> +
> +/* Public API */
> +
>  void
>  rte_net_crc_set_alg(enum rte_net_crc_alg alg)
>  {
> +	handlers = NULL;
> +
>  	switch (alg) {
> -#ifdef X86_64_SSE42_PCLMULQDQ
>  	case RTE_NET_CRC_SSE42:
> -		handlers = handlers_sse42;
> -		break;
> -#elif defined ARM64_NEON_PMULL
> -		/* fall-through */
> +		handlers = sse42_pclmulqdq_get_handlers();
> +		break; /* for x86, always break here */
>  	case RTE_NET_CRC_NEON:
> -		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> -			handlers = handlers_neon;
> -			break;
> -		}
> -#endif
> +		handlers = neon_pmull_get_handlers();
>  		/* fall-through */
>  	case RTE_NET_CRC_SCALAR:
>  		/* fall-through */
>  	default:
> -		handlers = handlers_scalar;
>  		break;
>  	}
> +
> +	if (handlers == NULL)
> +		handlers = handlers_scalar;
>  }
> 
>  uint32_t
> @@ -188,15 +233,10 @@ RTE_INIT(rte_net_crc_init)
> 
>  	rte_net_crc_scalar_init();
> 
> -#ifdef X86_64_SSE42_PCLMULQDQ
> -	alg = RTE_NET_CRC_SSE42;
> -	rte_net_crc_sse42_init();
> -#elif defined ARM64_NEON_PMULL
> -	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> +	if (sse42_pclmulqdq_init())
> +		alg = RTE_NET_CRC_SSE42;
> +	if (neon_pmull_init())
>  		alg = RTE_NET_CRC_NEON;
> -		rte_net_crc_neon_init();
> -	}
> -#endif
> 
>  	rte_net_crc_set_alg(alg);
>  }
> --
> 2.12.3


Reviewed-by: Jasvinder Singh <jasvinder.singh@intel.com>


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC
  2020-10-09 13:50     ` [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
@ 2020-10-09 16:24       ` Singh, Jasvinder
  0 siblings, 0 replies; 23+ messages in thread
From: Singh, Jasvinder @ 2020-10-09 16:24 UTC (permalink / raw)
  To: O'loingsigh, Mairtin, Richardson, Bruce, De Lara Guarch,
	Pablo, Ananyev, Konstantin
  Cc: dev, Ryan, Brendan, Coyle, David



> -----Original Message-----
> From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> Sent: Friday, October 9, 2020 2:51 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; O'loingsigh,
> Mairtin <mairtin.oloingsigh@intel.com>; Coyle, David
> <david.coyle@intel.com>
> Subject: [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based
> CRC
> 
> This patch enables the optimized calculation of CRC32-Ethernet and
> CRC16-CCITT using the AVX512 and VPCLMULQDQ instruction sets. This CRC
> implementation is built if the compiler supports the required instruction
> sets, and it is selected at run-time if the host CPU also supports them.
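
The "built if the compiler supports it, selected if the CPU supports it" rule
can be modelled with a small, self-contained sketch. This is a hypothetical
model and not the patch's rte_net_crc_set_alg(): the enum, the capability bits
and resolve_alg() are made-up names, and the AVX512 -> SSE4.2 -> scalar
fall-through order is an assumption borrowed from the switch suggested earlier
in the thread.

```c
/* Hypothetical algorithm identifiers, lowest to highest preference. */
enum crc_alg { CRC_SCALAR = 0, CRC_SSE42, CRC_AVX512 };

/* Hypothetical capability bits, used for both build and CPU support. */
#define CAP_SSE42  (1u << 0)
#define CAP_AVX512 (1u << 1)

/*
 * Resolve the requested algorithm to one that is actually usable.
 * 'built' says which paths were compiled in, 'cpu' says which paths the
 * host CPU can execute; a path is usable only if both bits are set.
 */
static enum crc_alg
resolve_alg(enum crc_alg requested, unsigned int built, unsigned int cpu)
{
	unsigned int usable = built & cpu;

	switch (requested) {
	case CRC_AVX512:
		if (usable & CAP_AVX512)
			return CRC_AVX512;
		/* fall-through */
	case CRC_SSE42:
		if (usable & CAP_SSE42)
			return CRC_SSE42;
		/* fall-through */
	case CRC_SCALAR:
	default:
		return CRC_SCALAR;
	}
}
```

For example, a binary built with both paths but running on a CPU without
VPCLMULQDQ would resolve a CRC_AVX512 request to CRC_SSE42, and a request for
any accelerated path on a build without them ends up on CRC_SCALAR.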
> 
> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> Signed-off-by: David Coyle <david.coyle@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  app/test/test_crc.c                    |  11 +-
>  config/x86/meson.build                 |   6 +-
>  doc/guides/rel_notes/release_20_11.rst |   2 +
>  lib/librte_net/meson.build             |  55 +++++
>  lib/librte_net/net_crc.h               |  11 +
>  lib/librte_net/net_crc_avx512.c        | 423 +++++++++++++++++++++++++++++++++
>  lib/librte_net/rte_net_crc.c           |  46 ++++
>  lib/librte_net/rte_net_crc.h           |   4 +-
>  8 files changed, 554 insertions(+), 4 deletions(-)
>  create mode 100644 lib/librte_net/net_crc_avx512.c
> 
> diff --git a/app/test/test_crc.c b/app/test/test_crc.c
> index f8a74e04e..bf1d34435 100644
> --- a/app/test/test_crc.c
> +++ b/app/test/test_crc.c
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
> 
>  #include "test.h"
> @@ -149,6 +149,15 @@ test_crc(void)
>  		return ret;
>  	}
> 
> +	/* set CRC avx512 mode */
> +	rte_net_crc_set_alg(RTE_NET_CRC_AVX512);
> +
> +	ret = test_crc_calc();
> +	if (ret < 0) {
> +		printf("test crc (x86_64 AVX512): failed (%d)\n", ret);
> +		return ret;
> +	}
> +
>  	/* set CRC neon mode */
>  	rte_net_crc_set_alg(RTE_NET_CRC_NEON);
> 
> diff --git a/config/x86/meson.build b/config/x86/meson.build
> index fea4d5403..172b72b72 100644
> --- a/config/x86/meson.build
> +++ b/config/x86/meson.build
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: BSD-3-Clause
> -# Copyright(c) 2017-2019 Intel Corporation
> +# Copyright(c) 2017-2020 Intel Corporation
> 
>  # get binutils version for the workaround of Bug 97
>  if not is_windows
> @@ -23,7 +23,9 @@ endforeach
> 
>  optional_flags = ['AES', 'PCLMUL',
>  		'AVX', 'AVX2', 'AVX512F',
> -		'RDRND', 'RDSEED']
> +		'RDRND', 'RDSEED',
> +		'AVX512BW', 'AVX512DQ',
> +		'AVX512VL', 'VPCLMULQDQ']
>  foreach f:optional_flags
>  	if cc.get_define('__@0@__'.format(f), args: machine_args) == '1'
>  		if f == 'PCLMUL' # special case flags with different defines
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index b77297f7e..5eda680d5 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -58,6 +58,8 @@ New Features
>  * **Updated CRC modules of rte_net library.**
> 
>    * Added run-time selection of the optimal architecture-specific CRC path.
> +  * Added optimized implementations of CRC32-Ethernet and CRC16-CCITT
> +    using the AVX512 and VPCLMULQDQ instruction sets.
> 
>  * **Updated Broadcom bnxt driver.**
> 
> diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
> index fa439b9e5..6c96b361a 100644
> --- a/lib/librte_net/meson.build
> +++ b/lib/librte_net/meson.build
> @@ -24,18 +24,62 @@ deps += ['mbuf']
>  if dpdk_conf.has('RTE_ARCH_X86_64')
>  	net_crc_sse42_cpu_support = (
>  		cc.get_define('__PCLMUL__', args: machine_args) != '')
> +	net_crc_avx512_cpu_support = (
> +		cc.get_define('__AVX512F__', args: machine_args) != '' and
> +		cc.get_define('__AVX512BW__', args: machine_args) != '' and
> +		cc.get_define('__AVX512DQ__', args: machine_args) != '' and
> +		cc.get_define('__AVX512VL__', args: machine_args) != '' and
> +		cc.get_define('__VPCLMULQDQ__', args: machine_args) != '')
> +
>  	net_crc_sse42_cc_support = (
>  		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
> +	net_crc_avx512_cc_support = (
> +		not machine_args.contains('-mno-avx512f') and
> +		cc.has_argument('-mavx512f') and
> +		cc.has_argument('-mavx512bw') and
> +		cc.has_argument('-mavx512dq') and
> +		cc.has_argument('-mavx512vl') and
> +		cc.has_argument('-mvpclmulqdq') and
> +		cc.has_argument('-mavx2') and
> +		cc.has_argument('-mavx'))
> 
>  	build_static_net_crc_sse42_lib = 0
> +	build_static_net_crc_avx512_lib = 0
> 
>  	if net_crc_sse42_cpu_support == true
>  		sources += files('net_crc_sse.c')
>  		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +		if net_crc_avx512_cpu_support == true
> +			sources += files('net_crc_avx512.c')
> +			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
> +		elif net_crc_avx512_cc_support == true
> +			build_static_net_crc_avx512_lib = 1
> +			net_crc_avx512_lib_cflags = ['-mavx512f',
> +							'-mavx512bw',
> +							'-mavx512dq',
> +							'-mavx512vl',
> +							'-mvpclmulqdq',
> +							'-mavx2',
> +							'-mavx']
> +			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
> +		endif
>  	elif net_crc_sse42_cc_support == true
>  		build_static_net_crc_sse42_lib = 1
>  		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
>  		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +		if net_crc_avx512_cc_support == true
> +			build_static_net_crc_avx512_lib = 1
> +			net_crc_avx512_lib_cflags = ['-mpclmul',
> +							'-maes',
> +							'-mavx512f',
> +							'-mavx512bw',
> +							'-mavx512dq',
> +							'-mavx512vl',
> +							'-mvpclmulqdq',
> +							'-mavx2',
> +							'-mavx']
> +			cflags += ['-DCC_X86_64_AVX512_VPCLMULQDQ_SUPPORT']
> +		endif
>  	endif
> 
>  	if build_static_net_crc_sse42_lib == 1
> @@ -47,6 +91,17 @@ if dpdk_conf.has('RTE_ARCH_X86_64')
>  						net_crc_sse42_lib_cflags])
>  		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
>  	endif
> +
> +	if build_static_net_crc_avx512_lib == 1
> +		net_crc_avx512_lib = static_library(
> +					'net_crc_avx512_lib',
> +					'net_crc_avx512.c',
> +					dependencies: static_rte_eal,
> +					c_args: [cflags,
> +						net_crc_avx512_lib_cflags])
> +		objs += net_crc_avx512_lib.extract_objects('net_crc_avx512.c')
> +	endif
> +
>  elif (dpdk_conf.has('RTE_ARCH_ARM64') and
>  		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
>  	sources += files('net_crc_neon.c')
> diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
> index a1578a56c..7a74d5406 100644
> --- a/lib/librte_net/net_crc.h
> +++ b/lib/librte_net/net_crc.h
> @@ -20,6 +20,17 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
>  uint32_t
>  rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
> 
> +/* AVX512 */
> +
> +void
> +rte_net_crc_avx512_init(void);
> +
> +uint32_t
> +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len);
> +
> +uint32_t
> +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len);
> +
>  /* NEON */
> 
>  void
> diff --git a/lib/librte_net/net_crc_avx512.c b/lib/librte_net/net_crc_avx512.c
> new file mode 100644
> index 000000000..3740fe3c9
> --- /dev/null
> +++ b/lib/librte_net/net_crc_avx512.c
> @@ -0,0 +1,423 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#include <string.h>
> +
> +#include <rte_common.h>
> +#include <rte_branch_prediction.h>
> +#include <rte_cpuflags.h>
> +
> +#include "net_crc.h"
> +
> +#include <x86intrin.h>
> +
> +/* VPCLMULQDQ CRC computation context structure */
> +struct crc_vpclmulqdq_ctx {
> +	__m512i rk1_rk2;
> +	__m512i rk3_rk4;
> +	__m512i fold_7x128b;
> +	__m512i fold_3x128b;
> +	__m128i rk5_rk6;
> +	__m128i rk7_rk8;
> +	__m128i fold_1x128b;
> +};
> +
> +static struct crc_vpclmulqdq_ctx crc32_eth __rte_aligned(64);
> +static struct crc_vpclmulqdq_ctx crc16_ccitt __rte_aligned(64);
> +
> +static uint16_t byte_len_to_mask_table[] = {
> +	0x0000, 0x0001, 0x0003, 0x0007,
> +	0x000f, 0x001f, 0x003f, 0x007f,
> +	0x00ff, 0x01ff, 0x03ff, 0x07ff,
> +	0x0fff, 0x1fff, 0x3fff, 0x7fff,
> +	0xffff};
> +
> +static const uint8_t shf_table[32] __rte_aligned(16) = {
> +	0x00, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
> +	0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f,
> +	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
> +	0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f };
> +
> +static const uint32_t mask[4] __rte_aligned(16) = {
> +	0xffffffff, 0xffffffff, 0x00000000, 0x00000000 };
> +
> +static const uint32_t mask2[4] __rte_aligned(16) = {
> +	0x00000000, 0xffffffff, 0xffffffff, 0xffffffff };
> +
> +static __rte_always_inline __m512i
> +crcr32_folding_round(__m512i data_block, __m512i precomp, __m512i fold)
> +{
> +	__m512i tmp0, tmp1;
> +
> +	tmp0 = _mm512_clmulepi64_epi128(fold, precomp, 0x01);
> +	tmp1 = _mm512_clmulepi64_epi128(fold, precomp, 0x10);
> +
> +	return _mm512_ternarylogic_epi64(tmp0, tmp1, data_block, 0x96);
> +}
> +
> +static __rte_always_inline __m128i
> +crc32_fold_128(__m512i fold0, __m512i fold1,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i res, res2;
> +	__m256i a;
> +	__m512i tmp0, tmp1, tmp2, tmp3;
> +	__m512i tmp4;
> +
> +	tmp0 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x01);
> +	tmp1 = _mm512_clmulepi64_epi128(fold0, params->fold_7x128b, 0x10);
> +
> +	res = _mm512_extracti64x2_epi64(fold1, 3);
> +	tmp4 = _mm512_maskz_broadcast_i32x4(0xF, res);
> +
> +	tmp2 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x01);
> +	tmp3 = _mm512_clmulepi64_epi128(fold1, params->fold_3x128b, 0x10);
> +
> +	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp1, tmp2, 0x96);
> +	tmp0 = _mm512_ternarylogic_epi64(tmp0, tmp3, tmp4, 0x96);
> +
> +	tmp1 = _mm512_shuffle_i64x2(tmp0, tmp0, 0x4e);
> +
> +	a = _mm256_xor_si256(*(__m256i *)&tmp1, *(__m256i *)&tmp0);
> +	res = _mm256_extracti64x2_epi64(a, 1);
> +	res2 = _mm_xor_si128(res, *(__m128i *)&a);
> +
> +	return res2;
> +}
> +
> +static __rte_always_inline __m128i
> +last_two_xmm(const uint8_t *data, uint32_t data_len, uint32_t n, __m128i res,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	uint32_t offset;
> +	__m128i res2, res3, res4, pshufb_shf;
> +
> +	const uint32_t mask3[4] __rte_aligned(16) = {
> +		   0x80808080, 0x80808080, 0x80808080, 0x80808080
> +	};
> +
> +	res2 = res;
> +	offset = data_len - n;
> +	res3 = _mm_loadu_si128((const __m128i *)&data[n+offset-16]);
> +
> +	pshufb_shf = _mm_loadu_si128((const __m128i *)
> +			(shf_table + (data_len-n)));
> +
> +	res = _mm_shuffle_epi8(res, pshufb_shf);
> +	pshufb_shf = _mm_xor_si128(pshufb_shf,
> +			_mm_load_si128((const __m128i *) mask3));
> +	res2 = _mm_shuffle_epi8(res2, pshufb_shf);
> +
> +	res2 = _mm_blendv_epi8(res2, res3, pshufb_shf);
> +
> +	res4 = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x01);
> +	res = _mm_clmulepi64_si128(res, params->fold_1x128b, 0x10);
> +	res = _mm_ternarylogic_epi64(res, res2, res4, 0x96);
> +
> +	return res;
> +}
> +
> +static __rte_always_inline __m128i
> +done_128(__m128i res, const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i res1;
> +
> +	res1 = res;
> +
> +	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x0);
> +	res1 = _mm_srli_si128(res1, 8);
> +	res = _mm_xor_si128(res, res1);
> +
> +	res1 = res;
> +	res = _mm_slli_si128(res, 4);
> +	res = _mm_clmulepi64_si128(res, params->rk5_rk6, 0x10);
> +	res = _mm_xor_si128(res, res1);
> +
> +	return res;
> +}
> +
> +static __rte_always_inline uint32_t
> +barrett_reduction(__m128i data64, const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i tmp0, tmp1;
> +
> +	data64 =  _mm_and_si128(data64, *(const __m128i *)mask2);
> +	tmp0 = data64;
> +	tmp1 = data64;
> +
> +	data64 = _mm_clmulepi64_si128(tmp0, params->rk7_rk8, 0x0);
> +	data64 = _mm_ternarylogic_epi64(data64, tmp1, *(const __m128i *)mask,
> +			0x28);
> +
> +	tmp1 = data64;
> +	data64 = _mm_clmulepi64_si128(data64, params->rk7_rk8, 0x10);
> +	data64 = _mm_ternarylogic_epi64(data64, tmp1, tmp0, 0x96);
> +
> +	return _mm_extract_epi32(data64, 2);
> +}
> +
> +static __rte_always_inline void
> +reduction_loop(__m128i *fold, int *len, const uint8_t *data, uint32_t *n,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i tmp, tmp1;
> +
> +	tmp = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x1);
> +	*fold = _mm_clmulepi64_si128(*fold, params->fold_1x128b, 0x10);
> +	*fold = _mm_xor_si128(*fold, tmp);
> +	tmp1 = _mm_loadu_si128((const __m128i *)&data[*n]);
> +	*fold = _mm_xor_si128(*fold, tmp1);
> +	*n += 16;
> +	*len -= 16;
> +}
> +
> +static __rte_always_inline uint32_t
> +crc32_eth_calc_vpclmulqdq(const uint8_t *data, uint32_t data_len, uint32_t crc,
> +	const struct crc_vpclmulqdq_ctx *params)
> +{
> +	__m128i res, d, b;
> +	__m512i temp, k;
> +	__m512i qw0 = _mm512_set1_epi64(0), qw1, qw2, qw3;
> +	__m512i fold0, fold1, fold2, fold3;
> +	__mmask16 mask;
> +	uint32_t n = 0;
> +	int reduction = 0;
> +
> +	/* Get CRC init value */
> +	b = _mm_cvtsi32_si128(crc);
> +	temp = _mm512_castsi128_si512(b);
> +
> +	if (data_len > 255) {
> +		fold0 = _mm512_loadu_si512((const __m512i *)data);
> +		fold1 = _mm512_loadu_si512((const __m512i *)(data+64));
> +		fold2 = _mm512_loadu_si512((const __m512i *)(data+128));
> +		fold3 = _mm512_loadu_si512((const __m512i *)(data+192));
> +		fold0 = _mm512_xor_si512(fold0, temp);
> +
> +		/* Main folding loop */
> +		k = params->rk1_rk2;
> +		for (n = 256; (n + 256) <= data_len; n += 256) {
> +			qw0 = _mm512_loadu_si512((const __m512i *)&data[n]);
> +			qw1 = _mm512_loadu_si512((const __m512i *)
> +					&(data[n+64]));
> +			qw2 = _mm512_loadu_si512((const __m512i *)
> +					&(data[n+128]));
> +			qw3 = _mm512_loadu_si512((const __m512i *)
> +					&(data[n+192]));
> +			fold0 = crcr32_folding_round(qw0, k, fold0);
> +			fold1 = crcr32_folding_round(qw1, k, fold1);
> +			fold2 = crcr32_folding_round(qw2, k, fold2);
> +			fold3 = crcr32_folding_round(qw3, k, fold3);
> +		}
> +
> +		/* 256 to 128 fold */
> +		k = params->rk3_rk4;
> +		fold0 = crcr32_folding_round(fold2, k, fold0);
> +		fold1 = crcr32_folding_round(fold3, k, fold1);
> +
> +		res = crc32_fold_128(fold0, fold1, params);
> +
> +		reduction = 240 - ((n+256)-data_len);
> +
> +		while (reduction > 0)
> +			reduction_loop(&res, &reduction, data, &n,
> +					params);
> +
> +		reduction += 16;
> +
> +		if (n != data_len)
> +			res = last_two_xmm(data, data_len, n, res,
> +					params);
> +	} else {
> +		if (data_len > 31) {
> +			res = _mm_cvtsi32_si128(crc);
> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +			n += 16;
> +
> +			reduction = 240 - ((n+256)-data_len);
> +
> +			while (reduction > 0)
> +				reduction_loop(&res, &reduction, data, &n,
> +						params);
> +
> +			if (n != data_len)
> +				res = last_two_xmm(data, data_len, n, res,
> +						params);
> +		} else if (data_len > 16) {
> +			res = _mm_cvtsi32_si128(crc);
> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +			n += 16;
> +
> +			if (n != data_len)
> +				res = last_two_xmm(data, data_len, n, res,
> +						params);
> +		} else if (data_len == 16) {
> +			res = _mm_cvtsi32_si128(crc);
> +			d = _mm_loadu_si128((const __m128i *)data);
> +			res = _mm_xor_si128(res, d);
> +		} else {
> +			res = _mm_cvtsi32_si128(crc);
> +			mask = byte_len_to_mask_table[data_len];
> +			d = _mm_maskz_loadu_epi8(mask, data);
> +			res = _mm_xor_si128(res, d);
> +
> +			if (data_len > 3) {
> +				d = _mm_loadu_si128((const __m128i *)
> +						&shf_table[data_len]);
> +				res = _mm_shuffle_epi8(res, d);
> +			} else if (data_len > 2) {
> +				res = _mm_slli_si128(res, 5);
> +				goto do_barrett_reduction;
> +			} else if (data_len > 1) {
> +				res = _mm_slli_si128(res, 6);
> +				goto do_barrett_reduction;
> +			} else if (data_len > 0) {
> +				res = _mm_slli_si128(res, 7);
> +				goto do_barrett_reduction;
> +			} else {
> +				/* zero length case */
> +				return crc;
> +			}
> +		}
> +	}
> +
> +	res = done_128(res, params);
> +
> +do_barrett_reduction:
> +	n = barrett_reduction(res, params);
> +
> +	return n;
> +}
> +
> +static void
> +crc32_load_init_constants(void)
> +{
> +	__m128i a;
> +	/* fold constants */
> +	uint64_t c0 = 0x00000000e95c1271;
> +	uint64_t c1 = 0x00000000ce3371cb;
> +	uint64_t c2 = 0x00000000910eeec1;
> +	uint64_t c3 = 0x0000000033fff533;
> +	uint64_t c4 = 0x000000000cbec0ed;
> +	uint64_t c5 = 0x0000000031f8303f;
> +	uint64_t c6 = 0x0000000057c54819;
> +	uint64_t c7 = 0x00000000df068dc2;
> +	uint64_t c8 = 0x00000000ae0b5394;
> +	uint64_t c9 = 0x000000001c279815;
> +	uint64_t c10 = 0x000000001d9513d7;
> +	uint64_t c11 = 0x000000008f352d95;
> +	uint64_t c12 = 0x00000000af449247;
> +	uint64_t c13 = 0x000000003db1ecdc;
> +	uint64_t c14 = 0x0000000081256527;
> +	uint64_t c15 = 0x00000000f1da05aa;
> +	uint64_t c16 = 0x00000000ccaa009e;
> +	uint64_t c17 = 0x00000000ae689191;
> +	uint64_t c18 = 0x00000000ccaa009e;
> +	uint64_t c19 = 0x00000000b8bc6765;
> +	uint64_t c20 = 0x00000001f7011640;
> +	uint64_t c21 = 0x00000001db710640;
> +
> +	a = _mm_set_epi64x(c1, c0);
> +	crc32_eth.rk1_rk2 = _mm512_broadcast_i32x4(a);
> +
> +	a = _mm_set_epi64x(c3, c2);
> +	crc32_eth.rk3_rk4 = _mm512_broadcast_i32x4(a);
> +
> +	crc32_eth.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
> +			c9, c10, c11);
> +	crc32_eth.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
> +			c16, c17, 0, 0);
> +	crc32_eth.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
> +			_mm_cvtsi64_m64(c17));
> +
> +	crc32_eth.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
> +			_mm_cvtsi64_m64(c19));
> +	crc32_eth.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
> +			_mm_cvtsi64_m64(c21));
> +}
> +
> +static void
> +crc16_load_init_constants(void)
> +{
> +	__m128i a;
> +	/* fold constants */
> +	uint64_t c0 = 0x0000000000009a19;
> +	uint64_t c1 = 0x0000000000002df8;
> +	uint64_t c2 = 0x00000000000068af;
> +	uint64_t c3 = 0x000000000000b6c9;
> +	uint64_t c4 = 0x000000000000c64f;
> +	uint64_t c5 = 0x000000000000cd95;
> +	uint64_t c6 = 0x000000000000d341;
> +	uint64_t c7 = 0x000000000000b8f2;
> +	uint64_t c8 = 0x0000000000000842;
> +	uint64_t c9 = 0x000000000000b072;
> +	uint64_t c10 = 0x00000000000047e3;
> +	uint64_t c11 = 0x000000000000922d;
> +	uint64_t c12 = 0x0000000000000e3a;
> +	uint64_t c13 = 0x0000000000004d7a;
> +	uint64_t c14 = 0x0000000000005b44;
> +	uint64_t c15 = 0x0000000000007762;
> +	uint64_t c16 = 0x00000000000081bf;
> +	uint64_t c17 = 0x0000000000008e10;
> +	uint64_t c18 = 0x00000000000081bf;
> +	uint64_t c19 = 0x0000000000001cbb;
> +	uint64_t c20 = 0x000000011c581910;
> +	uint64_t c21 = 0x0000000000010810;
> +
> +	a = _mm_set_epi64x(c1, c0);
> +	crc16_ccitt.rk1_rk2 = _mm512_broadcast_i32x4(a);
> +
> +	a = _mm_set_epi64x(c3, c2);
> +	crc16_ccitt.rk3_rk4 = _mm512_broadcast_i32x4(a);
> +
> +	crc16_ccitt.fold_7x128b = _mm512_setr_epi64(c4, c5, c6, c7, c8,
> +			c9, c10, c11);
> +	crc16_ccitt.fold_3x128b = _mm512_setr_epi64(c12, c13, c14, c15,
> +			c16, c17, 0, 0);
> +	crc16_ccitt.fold_1x128b = _mm_setr_epi64(_mm_cvtsi64_m64(c16),
> +			_mm_cvtsi64_m64(c17));
> +
> +	crc16_ccitt.rk5_rk6 = _mm_setr_epi64(_mm_cvtsi64_m64(c18),
> +			_mm_cvtsi64_m64(c19));
> +	crc16_ccitt.rk7_rk8 = _mm_setr_epi64(_mm_cvtsi64_m64(c20),
> +			_mm_cvtsi64_m64(c21));
> +}
> +
> +void
> +rte_net_crc_avx512_init(void)
> +{
> +	crc32_load_init_constants();
> +	crc16_load_init_constants();
> +
> +	/*
> +	 * Reset the register as following calculation may
> +	 * use other data types such as float, double, etc.
> +	 */
> +	_mm_empty();
> +}
> +
> +uint32_t
> +rte_crc16_ccitt_avx512_handler(const uint8_t *data, uint32_t data_len)
> +{
> +	/* return 16-bit CRC value */
> +	return (uint16_t)~crc32_eth_calc_vpclmulqdq(data,
> +		data_len,
> +		0xffff,
> +		&crc16_ccitt);
> +}
> +
> +uint32_t
> +rte_crc32_eth_avx512_handler(const uint8_t *data, uint32_t data_len)
> +{
> +	/* return 32-bit CRC value */
> +	return ~crc32_eth_calc_vpclmulqdq(data,
> +		data_len,
> +		0xffffffffUL,
> +		&crc32_eth);
> +}
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> index d271d5205..32a366590 100644
> --- a/lib/librte_net/rte_net_crc.c
> +++ b/lib/librte_net/rte_net_crc.c
> @@ -37,6 +37,12 @@ static const rte_net_crc_handler handlers_scalar[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
>  };
> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
> +static const rte_net_crc_handler handlers_avx512[] = {
> +	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_avx512_handler,
> +	[RTE_NET_CRC32_ETH] = rte_crc32_eth_avx512_handler,
> +};
> +#endif
>  #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
>  static const rte_net_crc_handler handlers_sse42[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
> @@ -134,6 +140,39 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
>  		crc32_eth_lut);
>  }
> 
> +/* AVX512/VPCLMULQDQ handling */
> +
> +#define AVX512_VPCLMULQDQ_CPU_SUPPORTED ( \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512F) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512BW) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512DQ) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_AVX512VL) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ) && \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_VPCLMULQDQ) \
> +)
> +
> +static const rte_net_crc_handler *
> +avx512_vpclmulqdq_get_handlers(void)
> +{
> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
> +	if (AVX512_VPCLMULQDQ_CPU_SUPPORTED)
> +		return handlers_avx512;
> +#endif
> +	return NULL;
> +}
> +
> +static uint8_t
> +avx512_vpclmulqdq_init(void)
> +{
> +#ifdef CC_X86_64_AVX512_VPCLMULQDQ_SUPPORT
> +	if (AVX512_VPCLMULQDQ_CPU_SUPPORTED) {
> +		rte_net_crc_avx512_init();
> +		return 1;
> +	}
> +#endif
> +	return 0;
> +}
> +
>  /* SSE4.2/PCLMULQDQ handling */
> 
>  #define SSE42_PCLMULQDQ_CPU_SUPPORTED \
> @@ -196,6 +235,11 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg)
>  	handlers = NULL;
> 
>  	switch (alg) {
> +	case RTE_NET_CRC_AVX512:
> +		handlers = avx512_vpclmulqdq_get_handlers();
> +		if (handlers != NULL)
> +			break;
> +		/* fall-through */
>  	case RTE_NET_CRC_SSE42:
>  		handlers = sse42_pclmulqdq_get_handlers();
>  		break; /* for x86, always break here */
> @@ -235,6 +279,8 @@ RTE_INIT(rte_net_crc_init)
> 
>  	if (sse42_pclmulqdq_init())
>  		alg = RTE_NET_CRC_SSE42;
> +	if (avx512_vpclmulqdq_init())
> +		alg = RTE_NET_CRC_AVX512;
>  	if (neon_pmull_init())
>  		alg = RTE_NET_CRC_NEON;
> 
> diff --git a/lib/librte_net/rte_net_crc.h b/lib/librte_net/rte_net_crc.h
> index 16e85ca97..72d3e10ff 100644
> --- a/lib/librte_net/rte_net_crc.h
> +++ b/lib/librte_net/rte_net_crc.h
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
> 
>  #ifndef _RTE_NET_CRC_H_
> @@ -23,6 +23,7 @@ enum rte_net_crc_alg {
>  	RTE_NET_CRC_SCALAR = 0,
>  	RTE_NET_CRC_SSE42,
>  	RTE_NET_CRC_NEON,
> +	RTE_NET_CRC_AVX512,
>  };
> 
>  /**
> @@ -35,6 +36,7 @@ enum rte_net_crc_alg {
>   *   - RTE_NET_CRC_SCALAR
>   *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
>   *   - RTE_NET_CRC_NEON (Use ARM Neon intrinsic)
> + *   - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic)
>   */
>  void
>  rte_net_crc_set_alg(enum rte_net_crc_alg alg);
> --
> 2.12.3

Reviewed-by: Jasvinder Singh <jasvinder.singh@intel.com>
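
When cross-checking the AVX512 handlers quoted above against known test vectors, a scalar reference can be useful. The following is a minimal bit-at-a-time sketch of the reflected CRC32-Ethernet computation, assuming the same parameters the patch passes to its handler (initial value 0xffffffff, final inversion) and the reflected form 0xedb88320 of the polynomial 0x04c11db7. The name `crc32_eth_ref` is hypothetical and not part of DPDK.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical reference helper (not a DPDK function):
 * bit-at-a-time reflected CRC32, init 0xffffffff, final inversion.
 */
static uint32_t
crc32_eth_ref(const uint8_t *data, size_t data_len)
{
	uint32_t crc = 0xffffffffu;
	size_t i;
	int bit;

	for (i = 0; i < data_len; i++) {
		crc ^= data[i];
		for (bit = 0; bit < 8; bit++) {
			/* Shift right; XOR in the polynomial if the low bit was set. */
			crc = (crc >> 1) ^ (0xedb88320u & (0u - (crc & 1u)));
		}
	}
	return ~crc;
}
```

For the ASCII string "123456789" this yields the standard CRC-32 check value 0xcbf43926, and for a zero-length buffer it yields 0, which agrees with the zero-length branch in the patch (return the seed, then invert in the handler).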


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v5 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC
  2020-10-09 13:50   ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh
  2020-10-09 13:50     ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
  2020-10-09 13:50     ` [dpdk-dev] [PATCH v5 2/2] net: add support for AVX512/VPCLMULQDQ based CRC Mairtin o Loingsigh
@ 2020-10-09 18:35     ` De Lara Guarch, Pablo
  2020-10-13 18:47     ` David Marchand
  3 siblings, 0 replies; 23+ messages in thread
From: De Lara Guarch, Pablo @ 2020-10-09 18:35 UTC (permalink / raw)
  To: O'loingsigh, Mairtin, Singh, Jasvinder, Richardson, Bruce,
	Ananyev, Konstantin
  Cc: dev, Ryan, Brendan, Coyle, David



> -----Original Message-----
> From: O'loingsigh, Mairtin <mairtin.oloingsigh@intel.com>
> Sent: Friday, October 9, 2020 2:51 PM
> To: Singh, Jasvinder <jasvinder.singh@intel.com>; Richardson, Bruce
> <bruce.richardson@intel.com>; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; Ananyev, Konstantin
> <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org; Ryan, Brendan <brendan.ryan@intel.com>; O'loingsigh,
> Mairtin <mairtin.oloingsigh@intel.com>; Coyle, David <david.coyle@intel.com>
> Subject: [PATCH v5 0/2] net: add CRC run-time checks and
> AVX512/VPCLMULQDQ based CRC
> 
> This patchset makes two significant enhancements to the CRC modules of the
> rte_net library:
> 
> 1) Adds run-time selection of the optimal architecture-specific CRC path.
>    Previously the selection was solely made at compile-time, meaning it
>    could only be built and run on the same generation of CPU. Adding
>    run-time selection ability means this can be used from distro packages
>    and/or DPDK can be compiled on an older CPU and run on a newer CPU.
> 2) Adds an optimized CRC implementation based on the AVX512 and
>    VPCLMULQDQ instruction sets.
> 
> For further details, please see the commit messages of the individual patches.
> 
> v5:
> * Tidied-up the ifdef checks for RTE_ARCH_* and compiler support of CRC
>   paths, as per review comments:
>   * All ifdef checks removed from API function definitions and moved into
>     helper functions.
> 
> v4:
> * Fixed build issue when older version of meson is used (0.47.1).
> * Addressed review comments:
>   * remove Intel copyright header from neon CRC file.
>   * tidy-up of register initialisation.
> 
> v3:
> * Re-submitted v2 as encountered problems when originally submitting it.
> 
> v2:
> * Added support for run-time selection of optimal architecture-specific
>   CRC, based on v1 review comment.
> * Added full working AVX512/VPCLMULQDQ support for CRC32-Ethernet and
>   CRC16-CCITT.
> 
> v1:
> * Initial version, with incomplete AVX512/VPCLMULQDQ support for
>   CRC32-Ethernet only.
> 
> Mairtin o Loingsigh (2):
>   net: add run-time architecture specific CRC selection
>   net: add support for AVX512/VPCLMULQDQ based CRC
> 
>  app/test/test_crc.c                               |  11 +-
>  config/x86/meson.build                            |   6 +-
>  doc/guides/rel_notes/release_20_11.rst            |   6 +
>  lib/librte_net/meson.build                        |  89 ++++-
>  lib/librte_net/net_crc.h                          |  45 +++
>  lib/librte_net/net_crc_avx512.c                   | 423 ++++++++++++++++++++++
>  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  26 +-
>  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 +-
>  lib/librte_net/rte_net_crc.c                      | 162 +++++++--
>  lib/librte_net/rte_net_crc.h                      |   4 +-
>  10 files changed, 722 insertions(+), 84 deletions(-)
>  create mode 100644 lib/librte_net/net_crc.h
>  create mode 100644 lib/librte_net/net_crc_avx512.c
>  rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
>  rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)
> 
> --
> 2.12.3

Series-Reviewed-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection
  2020-10-09 13:50     ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
  2020-10-09 16:22       ` Singh, Jasvinder
@ 2020-10-10  9:34       ` Ruifeng Wang
  2020-10-13  9:07       ` Bruce Richardson
  2 siblings, 0 replies; 23+ messages in thread
From: Ruifeng Wang @ 2020-10-10  9:34 UTC (permalink / raw)
  To: Mairtin o Loingsigh, jasvinder.singh, bruce.richardson,
	pablo.de.lara.guarch, konstantin.ananyev
  Cc: dev, brendan.ryan, david.coyle, nd


> -----Original Message-----
> From: dev <dev-bounces@dpdk.org> On Behalf Of Mairtin o Loingsigh
> Sent: Friday, October 9, 2020 9:51 PM
> To: jasvinder.singh@intel.com; bruce.richardson@intel.com;
> pablo.de.lara.guarch@intel.com; konstantin.ananyev@intel.com
> Cc: dev@dpdk.org; brendan.ryan@intel.com; mairtin.oloingsigh@intel.com;
> david.coyle@intel.com
> Subject: [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific
> CRC selection
> 
> This patch adds support for run-time selection of the optimal architecture-
> specific CRC path, based on the supported instruction set(s) of the CPU.
> 
> The compiler option checks have been moved from the C files to the meson
> script. The rte_cpu_get_flag_enabled function is called automatically by the
> library at process initialization time to determine which instructions the CPU
> supports, with the most optimal supported CRC path ultimately selected.
> 
> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> Signed-off-by: David Coyle <david.coyle@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  doc/guides/rel_notes/release_20_11.rst            |   4 +
>  lib/librte_net/meson.build                        |  34 ++++++-
>  lib/librte_net/net_crc.h                          |  34 +++++++
>  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  26 ++---
>  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 ++-----
>  lib/librte_net/rte_net_crc.c                      | 116 +++++++++++++++-------
>  6 files changed, 168 insertions(+), 80 deletions(-)
>  create mode 100644 lib/librte_net/net_crc.h
>  rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
>  rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index 808bdc4e5..b77297f7e 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -55,6 +55,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
> 
> +* **Updated CRC modules of rte_net library.**
> +
> +  * Added run-time selection of the optimal architecture-specific CRC path.
> +
>  * **Updated Broadcom bnxt driver.**
> 
>    Updated the Broadcom bnxt driver with new features and improvements, including:
> diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
> index 24ed8253b..fa439b9e5 100644
> --- a/lib/librte_net/meson.build
> +++ b/lib/librte_net/meson.build
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: BSD-3-Clause
> -# Copyright(c) 2017 Intel Corporation
> +# Copyright(c) 2017-2020 Intel Corporation
> 
>  headers = files('rte_ip.h',
>  	'rte_tcp.h',
> @@ -20,3 +20,35 @@ headers = files('rte_ip.h',
> 
>  sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c')
>  deps += ['mbuf']
> +
> +if dpdk_conf.has('RTE_ARCH_X86_64')
> +	net_crc_sse42_cpu_support = (
> +		cc.get_define('__PCLMUL__', args: machine_args) != '')
> +	net_crc_sse42_cc_support = (
> +		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
> +
> +	build_static_net_crc_sse42_lib = 0
> +
> +	if net_crc_sse42_cpu_support == true
> +		sources += files('net_crc_sse.c')
> +		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +	elif net_crc_sse42_cc_support == true
> +		build_static_net_crc_sse42_lib = 1
> +		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
> +		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +	endif
> +
> +	if build_static_net_crc_sse42_lib == 1
> +		net_crc_sse42_lib = static_library(
> +					'net_crc_sse42_lib',
> +					'net_crc_sse.c',
> +					dependencies: static_rte_eal,
> +					c_args: [cflags,
> +						net_crc_sse42_lib_cflags])
> +		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
> +	endif
> +elif (dpdk_conf.has('RTE_ARCH_ARM64') and
> +		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
> +	sources += files('net_crc_neon.c')
> +	cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
> +endif
> diff --git a/lib/librte_net/net_crc.h b/lib/librte_net/net_crc.h
> new file mode 100644
> index 000000000..a1578a56c
> --- /dev/null
> +++ b/lib/librte_net/net_crc.h
> @@ -0,0 +1,34 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#ifndef _NET_CRC_H_
> +#define _NET_CRC_H_
> +
> +/*
> + * Different implementations of CRC
> + */
> +
> +/* SSE4.2 */
> +
> +void
> +rte_net_crc_sse42_init(void);
> +
> +uint32_t
> +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len);
> +
> +uint32_t
> +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len);
> +
> +/* NEON */
> +
> +void
> +rte_net_crc_neon_init(void);
> +
> +uint32_t
> +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len);
> +
> +uint32_t
> +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len);
> +
> +#endif /* _NET_CRC_H_ */
> diff --git a/lib/librte_net/net_crc_neon.h b/lib/librte_net/net_crc_neon.c
> similarity index 95%
> rename from lib/librte_net/net_crc_neon.h
> rename to lib/librte_net/net_crc_neon.c
> index 63fa1d4a1..f61d75a8c 100644
> --- a/lib/librte_net/net_crc_neon.h
> +++ b/lib/librte_net/net_crc_neon.c
> @@ -2,17 +2,15 @@
>   * Copyright(c) 2017 Cavium, Inc
>   */
> 
> -#ifndef _NET_CRC_NEON_H_
> -#define _NET_CRC_NEON_H_
> +#include <string.h>
> 
> +#include <rte_common.h>
>  #include <rte_branch_prediction.h>
>  #include <rte_net_crc.h>
>  #include <rte_vect.h>
>  #include <rte_cpuflags.h>
> 
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> +#include "net_crc.h"
> 
>  /** PMULL CRC computation context structure */
>  struct crc_pmull_ctx {
> @@ -218,7 +216,7 @@ crc32_eth_calc_pmull(
>  	return n;
>  }
> 
> -static inline void
> +void
>  rte_net_crc_neon_init(void)
>  {
>  	/* Initialize CRC16 data */
> @@ -242,9 +240,8 @@ rte_net_crc_neon_init(void)
>  	crc32_eth_pmull.rk7_rk8 = vld1q_u64(eth_k7_k8);
>  }
> 
> -static inline uint32_t
> -rte_crc16_ccitt_neon_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return (uint16_t)~crc32_eth_calc_pmull(data,
>  		data_len,
> @@ -252,18 +249,11 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data,
>  		&crc16_ccitt_pmull);
>  }
> 
> -static inline uint32_t
> -rte_crc32_eth_neon_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return ~crc32_eth_calc_pmull(data,
>  		data_len,
>  		0xffffffffUL,
>  		&crc32_eth_pmull);
>  }
> -
> -#ifdef __cplusplus
> -}
> -#endif
> -
> -#endif /* _NET_CRC_NEON_H_ */
> diff --git a/lib/librte_net/net_crc_sse.h b/lib/librte_net/net_crc_sse.c
> similarity index 94%
> rename from lib/librte_net/net_crc_sse.h
> rename to lib/librte_net/net_crc_sse.c
> index 1c7b7a548..053b54b39 100644
> --- a/lib/librte_net/net_crc_sse.h
> +++ b/lib/librte_net/net_crc_sse.c
> @@ -1,18 +1,16 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
> 
> -#ifndef _RTE_NET_CRC_SSE_H_
> -#define _RTE_NET_CRC_SSE_H_
> +#include <string.h>
> 
> +#include <rte_common.h>
>  #include <rte_branch_prediction.h>
> +#include <rte_cpuflags.h>
> 
> -#include <x86intrin.h>
> -#include <cpuid.h>
> +#include "net_crc.h"
> 
> -#ifdef __cplusplus
> -extern "C" {
> -#endif
> +#include <x86intrin.h>
> 
>  /** PCLMULQDQ CRC computation context structure */
>  struct crc_pclmulqdq_ctx {
> @@ -259,8 +257,7 @@ crc32_eth_calc_pclmulqdq(
>  	return n;
>  }
> 
> -
> -static inline void
> +void
>  rte_net_crc_sse42_init(void)
>  {
>  	uint64_t k1, k2, k5, k6;
> @@ -303,12 +300,10 @@ rte_net_crc_sse42_init(void)
>  	 * use other data types such as float, double, etc.
>  	 */
>  	_mm_empty();
> -
>  }
> 
> -static inline uint32_t
> -rte_crc16_ccitt_sse42_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc16_ccitt_sse42_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	/** return 16-bit CRC value */
>  	return (uint16_t)~crc32_eth_calc_pclmulqdq(data,
> @@ -317,18 +312,11 @@ rte_crc16_ccitt_sse42_handler(const uint8_t *data,
>  		&crc16_ccitt_pclmulqdq);
>  }
> 
> -static inline uint32_t
> -rte_crc32_eth_sse42_handler(const uint8_t *data,
> -	uint32_t data_len)
> +uint32_t
> +rte_crc32_eth_sse42_handler(const uint8_t *data, uint32_t data_len)
>  {
>  	return ~crc32_eth_calc_pclmulqdq(data,
>  		data_len,
>  		0xffffffffUL,
>  		&crc32_eth_pclmulqdq);
>  }
> -
> -#ifdef __cplusplus
> -}
> -#endif
> -
> -#endif /* _RTE_NET_CRC_SSE_H_ */
> diff --git a/lib/librte_net/rte_net_crc.c b/lib/librte_net/rte_net_crc.c
> index 4f5b9e828..d271d5205 100644
> --- a/lib/librte_net/rte_net_crc.c
> +++ b/lib/librte_net/rte_net_crc.c
> @@ -1,5 +1,5 @@
>  /* SPDX-License-Identifier: BSD-3-Clause
> - * Copyright(c) 2017 Intel Corporation
> + * Copyright(c) 2017-2020 Intel Corporation
>   */
> 
>  #include <stddef.h>
> @@ -10,17 +10,7 @@
>  #include <rte_common.h>
>  #include <rte_net_crc.h>
> 
> -#if defined(RTE_ARCH_X86_64) && defined(__PCLMUL__)
> -#define X86_64_SSE42_PCLMULQDQ     1
> -#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRYPTO)
> -#define ARM64_NEON_PMULL           1
> -#endif
> -
> -#ifdef X86_64_SSE42_PCLMULQDQ
> -#include <net_crc_sse.h>
> -#elif defined ARM64_NEON_PMULL
> -#include <net_crc_neon.h>
> -#endif
> +#include "net_crc.h"
> 
>  /** CRC polynomials */
>  #define CRC32_ETH_POLYNOMIAL 0x04c11db7UL
> @@ -41,25 +31,27 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len);
>  typedef uint32_t
>  (*rte_net_crc_handler)(const uint8_t *data, uint32_t data_len);
> 
> -static rte_net_crc_handler *handlers;
> +static const rte_net_crc_handler *handlers;
> 
> -static rte_net_crc_handler handlers_scalar[] = {
> +static const rte_net_crc_handler handlers_scalar[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_handler,
>  };
> -
> -#ifdef X86_64_SSE42_PCLMULQDQ
> -static rte_net_crc_handler handlers_sse42[] = {
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +static const rte_net_crc_handler handlers_sse42[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_sse42_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_sse42_handler,
>  };
> -#elif defined ARM64_NEON_PMULL
> -static rte_net_crc_handler handlers_neon[] = {
> +#endif
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +static const rte_net_crc_handler handlers_neon[] = {
>  	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_neon_handler,
>  	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
>  };
>  #endif
> 
> +/* Scalar handling */
> +
>  /**
>   * Reflect the bits about the middle
>   *
> @@ -142,29 +134,82 @@ rte_crc32_eth_handler(const uint8_t *data, uint32_t data_len)
>  		crc32_eth_lut);
>  }
> 
> +/* SSE4.2/PCLMULQDQ handling */
> +
> +#define SSE42_PCLMULQDQ_CPU_SUPPORTED \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ)
> +
> +static const rte_net_crc_handler *
> +sse42_pclmulqdq_get_handlers(void)
> +{
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +	if (SSE42_PCLMULQDQ_CPU_SUPPORTED)
> +		return handlers_sse42;
> +#endif
> +	return NULL;
> +}
> +
> +static uint8_t
> +sse42_pclmulqdq_init(void)
> +{
> +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> +	if (SSE42_PCLMULQDQ_CPU_SUPPORTED) {
> +		rte_net_crc_sse42_init();
> +		return 1;
> +	}
> +#endif
> +	return 0;
> +}
> +
> +/* NEON/PMULL handling */
> +
> +#define NEON_PMULL_CPU_SUPPORTED \
> +	rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)
> +
> +static const rte_net_crc_handler *
> +neon_pmull_get_handlers(void)
> +{
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +	if (NEON_PMULL_CPU_SUPPORTED)
> +		return handlers_neon;
> +#endif
> +	return NULL;
> +}
> +
> +static uint8_t
> +neon_pmull_init(void)
> +{
> +#ifdef CC_ARM64_NEON_PMULL_SUPPORT
> +	if (NEON_PMULL_CPU_SUPPORTED) {
> +		rte_net_crc_neon_init();
> +		return 1;
> +	}
> +#endif
> +	return 0;
> +}
> +
> +/* Public API */
> +
>  void
>  rte_net_crc_set_alg(enum rte_net_crc_alg alg)
>  {
> +	handlers = NULL;
> +
>  	switch (alg) {
> -#ifdef X86_64_SSE42_PCLMULQDQ
>  	case RTE_NET_CRC_SSE42:
> -		handlers = handlers_sse42;
> -		break;
> -#elif defined ARM64_NEON_PMULL
> -		/* fall-through */
> +		handlers = sse42_pclmulqdq_get_handlers();
> +		break; /* for x86, always break here */
>  	case RTE_NET_CRC_NEON:
> -		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> -			handlers = handlers_neon;
> -			break;
> -		}
> -#endif
> +		handlers = neon_pmull_get_handlers();
>  		/* fall-through */
>  	case RTE_NET_CRC_SCALAR:
>  		/* fall-through */
>  	default:
> -		handlers = handlers_scalar;
>  		break;
>  	}
> +
> +	if (handlers == NULL)
> +		handlers = handlers_scalar;
>  }
> 
>  uint32_t
> @@ -188,15 +233,10 @@ RTE_INIT(rte_net_crc_init)
> 
>  	rte_net_crc_scalar_init();
> 
> -#ifdef X86_64_SSE42_PCLMULQDQ
> -	alg = RTE_NET_CRC_SSE42;
> -	rte_net_crc_sse42_init();
> -#elif defined ARM64_NEON_PMULL
> -	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_PMULL)) {
> +	if (sse42_pclmulqdq_init())
> +		alg = RTE_NET_CRC_SSE42;
> +	if (neon_pmull_init())
>  		alg = RTE_NET_CRC_NEON;
> -		rte_net_crc_neon_init();
> -	}
> -#endif
> 
>  	rte_net_crc_set_alg(alg);
>  }
> --
> 2.12.3
The change looks good to me.

Reviewed-by: Ruifeng Wang <ruifeng.wang@arm.com>

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v4 1/2] net: add run-time architecture specific CRC selection
  2020-10-09 14:04       ` Coyle, David
@ 2020-10-10 12:42         ` Ananyev, Konstantin
  0 siblings, 0 replies; 23+ messages in thread
From: Ananyev, Konstantin @ 2020-10-10 12:42 UTC (permalink / raw)
  To: Coyle, David, O'loingsigh, Mairtin, Singh, Jasvinder,
	Richardson, Bruce, De Lara Guarch, Pablo
  Cc: dev, Ryan, Brendan, O'loingsigh, Mairtin



Hi David,

> > > This patch adds support for run-time selection of the optimal
> > > architecture-specific CRC path, based on the supported instruction
> > > set(s) of the CPU.
> > >
> > > The compiler option checks have been moved from the C files to the
> > > meson script. The rte_cpu_get_flag_enabled function is called
> > > automatically by the library at process initialization time to
> > > determine which instructions the CPU supports, with the most optimal
> > > supported CRC path ultimately selected.
> > >
> > > Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> > > Signed-off-by: David Coyle <david.coyle@intel.com>
> >
> > LGTM, just one nit see below.
> > With that:
> > Series acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> >
> > > ---
> > >  doc/guides/rel_notes/release_20_11.rst            |  4 ++
> > >  lib/librte_net/meson.build                        | 34 +++++++++++-
> > >  lib/librte_net/net_crc.h                          | 34 ++++++++++++
> > >  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} | 26 +++------
> > >  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   | 34 ++++--------
> > >  lib/librte_net/rte_net_crc.c                      | 67 ++++++++++++++---------
> > >  6 files changed, 131 insertions(+), 68 deletions(-)
> > >  create mode 100644 lib/librte_net/net_crc.h
> > >  rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
> > >  rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)
> > >
> > >
> 
> <snip>
> 
> > > +#ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> > > +static uint8_t
> > > +sse42_pclmulqdq_cpu_supported(void)
> > > +{
> > > +	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
> > > +}
> >
> > As a nit, I think it would be better to hide the #ifdef inside the function, and
> > return 0 when the define is not set.
> > Something like:
> >
> > static int
> > sse42_pclmulqdq_cpu_supported(void)
> > {
> > #ifdef CC_X86_64_SSE42_PCLMULQDQ_SUPPORT
> > 	return rte_cpu_get_flag_enabled(RTE_CPUFLAG_PCLMULQDQ);
> > #else
> > 	return 0;
> > #endif
> > }
> >
> > Same for other cpu_supported functions.
> > And then you can remove these ifdefs in set_alg and other places, i.e.:
> >
> > void
> > rte_net_crc_set_alg(enum rte_net_crc_alg alg)
> > {
> >         switch (alg) {
> > #ifdef RTE_ARCH_X86_64
> >         case RTE_NET_CRC_AVX512:
> >                 if (avx512_vpclmulqdq_cpu_supported()) {
> >                         handlers = handlers_avx512;
> >                         break;
> >                 }
> >                 /* fall-through */
> >         case RTE_NET_CRC_SSE42:
> >                 if (sse42_pclmulqdq_cpu_supported()) {
> >                         handlers = handlers_sse42;
> >                         break;
> >                 }
> > #endif
> > ...
> >
> > Same for rte_net_crc_init()
> 
> [DC] I have reworked the ifdefs in this file based on your comments here and off-list discussions.
> These are available now in the v5.
> 
> All ifdefs have been moved out of the API function definitions and down into 'helper' type
> functions - looks much cleaner now.
>
> Your Ack has been carried through too to v5 as you mentioned

LGTM, thanks.
Konstantin
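
The shape agreed in this exchange can be sketched in isolation: compiler-support ifdefs are confined to helper functions, a NULL return means "path unavailable", and the public setter falls back to scalar. This is a standalone illustration under stated assumptions, not the DPDK source. `HAVE_FAST_PATH`, `cpu_has_feature()`, `set_alg()` and the handler names are placeholders; `HAVE_FAST_PATH` is deliberately left undefined here, so the fallback branch is the one exercised.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc_handler)(const uint8_t *data, uint32_t len);

enum crc_alg { ALG_SCALAR = 0, ALG_FAST };

/* Placeholder scalar handler; a real one would compute a CRC. */
static uint32_t
scalar_handler(const uint8_t *data, uint32_t len)
{
	(void)data;
	(void)len;
	return 1;
}

#ifdef HAVE_FAST_PATH
/* Placeholder for a run-time CPU flag query. */
static int
cpu_has_feature(void)
{
	return 1;
}

/* Placeholder optimized handler, only built when the compiler supports it. */
static uint32_t
fast_handler(const uint8_t *data, uint32_t len)
{
	(void)data;
	(void)len;
	return 2;
}
#endif

/* The ifdef lives in the helper, so callers need no conditional code. */
static crc_handler
fast_get_handler(void)
{
#ifdef HAVE_FAST_PATH
	if (cpu_has_feature())
		return fast_handler;
#endif
	return NULL;
}

static crc_handler handler;

static void
set_alg(enum crc_alg alg)
{
	handler = NULL;

	switch (alg) {
	case ALG_FAST:
		handler = fast_get_handler();
		break;
	case ALG_SCALAR:
		/* fall-through */
	default:
		break;
	}

	/* Any unavailable or unknown choice ends up on the scalar path. */
	if (handler == NULL)
		handler = scalar_handler;
}
```

Calling `set_alg(ALG_FAST)` in this build leaves `handler` pointing at `scalar_handler`, because the fast path was not compiled in. Building with `-DHAVE_FAST_PATH` would select `fast_handler` instead, without any change to `set_alg()`.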

 

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection
  2020-10-09 13:50     ` [dpdk-dev] [PATCH v5 1/2] net: add run-time architecture specific CRC selection Mairtin o Loingsigh
  2020-10-09 16:22       ` Singh, Jasvinder
  2020-10-10  9:34       ` Ruifeng Wang
@ 2020-10-13  9:07       ` Bruce Richardson
  2 siblings, 0 replies; 23+ messages in thread
From: Bruce Richardson @ 2020-10-13  9:07 UTC (permalink / raw)
  To: Mairtin o Loingsigh
  Cc: jasvinder.singh, pablo.de.lara.guarch, konstantin.ananyev, dev,
	brendan.ryan, david.coyle

On Fri, Oct 09, 2020 at 02:50:44PM +0100, Mairtin o Loingsigh wrote:
> This patch adds support for run-time selection of the optimal
> architecture-specific CRC path, based on the supported instruction set(s)
> of the CPU.
> 
> The compiler option checks have been moved from the C files to the meson
> script. The rte_cpu_get_flag_enabled function is called automatically by
> the library at process initialization time to determine which
> instructions the CPU supports, with the most optimal supported CRC path
> ultimately selected.
> 
> Signed-off-by: Mairtin o Loingsigh <mairtin.oloingsigh@intel.com>
> Signed-off-by: David Coyle <david.coyle@intel.com>
> Acked-by: Konstantin Ananyev <konstantin.ananyev@intel.com>
> ---
>  doc/guides/rel_notes/release_20_11.rst            |   4 +
>  lib/librte_net/meson.build                        |  34 ++++++-
>  lib/librte_net/net_crc.h                          |  34 +++++++
>  lib/librte_net/{net_crc_neon.h => net_crc_neon.c} |  26 ++---
>  lib/librte_net/{net_crc_sse.h => net_crc_sse.c}   |  34 ++-----
>  lib/librte_net/rte_net_crc.c                      | 116 +++++++++++++++-------
>  6 files changed, 168 insertions(+), 80 deletions(-)
>  create mode 100644 lib/librte_net/net_crc.h
>  rename lib/librte_net/{net_crc_neon.h => net_crc_neon.c} (95%)
>  rename lib/librte_net/{net_crc_sse.h => net_crc_sse.c} (94%)
> 
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index 808bdc4e5..b77297f7e 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -55,6 +55,10 @@ New Features
>       Also, make sure to start the actual text at the margin.
>       =======================================================
>  
> +* **Updated CRC modules of rte_net library.**
> +
> +  * Added run-time selection of the optimal architecture-specific CRC path.
> +
>  * **Updated Broadcom bnxt driver.**
>  
>    Updated the Broadcom bnxt driver with new features and improvements, including:
> diff --git a/lib/librte_net/meson.build b/lib/librte_net/meson.build
> index 24ed8253b..fa439b9e5 100644
> --- a/lib/librte_net/meson.build
> +++ b/lib/librte_net/meson.build
> @@ -1,5 +1,5 @@
>  # SPDX-License-Identifier: BSD-3-Clause
> -# Copyright(c) 2017 Intel Corporation
> +# Copyright(c) 2017-2020 Intel Corporation
>  
>  headers = files('rte_ip.h',
>  	'rte_tcp.h',
> @@ -20,3 +20,35 @@ headers = files('rte_ip.h',
>  
>  sources = files('rte_arp.c', 'rte_ether.c', 'rte_net.c', 'rte_net_crc.c')
>  deps += ['mbuf']
> +
> +if dpdk_conf.has('RTE_ARCH_X86_64')
> +	net_crc_sse42_cpu_support = (
> +		cc.get_define('__PCLMUL__', args: machine_args) != '')
> +	net_crc_sse42_cc_support = (
> +		cc.has_argument('-mpclmul') and cc.has_argument('-maes'))
> +
> +	build_static_net_crc_sse42_lib = 0
> +
> +	if net_crc_sse42_cpu_support == true
> +		sources += files('net_crc_sse.c')
> +		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +	elif net_crc_sse42_cc_support == true
> +		build_static_net_crc_sse42_lib = 1
> +		net_crc_sse42_lib_cflags = ['-mpclmul', '-maes']
> +		cflags += ['-DCC_X86_64_SSE42_PCLMULQDQ_SUPPORT']
> +	endif
> +
> +	if build_static_net_crc_sse42_lib == 1
> +		net_crc_sse42_lib = static_library(
> +					'net_crc_sse42_lib',
> +					'net_crc_sse.c',
> +					dependencies: static_rte_eal,
> +					c_args: [cflags,
> +						net_crc_sse42_lib_cflags])
> +		objs += net_crc_sse42_lib.extract_objects('net_crc_sse.c')
> +	endif
> +elif (dpdk_conf.has('RTE_ARCH_ARM64') and
> +		cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
> +	sources += files('net_crc_neon.c')
> +	cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
> +endif

This meson code looks OK to me. I'm not sure the "net_crc_sse42_lib_cflags"
variable is needed, but overall it looks good.

Acked-by: Bruce Richardson <bruce.richardson@intel.com>
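
For readers skimming the thread, the run-time selection described in the
quoted commit message can be sketched in plain C roughly as follows. This is
a minimal illustration under stated assumptions, not the code from the patch:
crc32_handler_t, crc32_eth_scalar, crc32_eth_accel, cpu_has_accel,
crc32_select_handler and crc32_calc are hypothetical names, the compiler
builtin __builtin_cpu_supports stands in for rte_cpu_get_flag_enabled, and
the "accelerated" handler simply reuses the scalar code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical handler type: one function per CRC implementation. */
typedef uint32_t (*crc32_handler_t)(const uint8_t *data, size_t len);

/* Portable bitwise CRC32-Ethernet (reflected polynomial 0xEDB88320). */
static uint32_t crc32_eth_scalar(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Stand-in for an accelerated path; a real one would use PCLMULQDQ. */
static uint32_t crc32_eth_accel(const uint8_t *data, size_t len)
{
	return crc32_eth_scalar(data, len);
}

/* Assumed CPU probe; DPDK itself would call rte_cpu_get_flag_enabled(). */
static bool cpu_has_accel(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
	return __builtin_cpu_supports("pclmul") != 0;
#else
	return false;
#endif
}

/* Defaults to the scalar path until selection has run. */
static crc32_handler_t crc32_handler = crc32_eth_scalar;

/* Pick the best supported handler once, e.g. at process start-up. */
static void crc32_select_handler(void)
{
	crc32_handler = cpu_has_accel() ? crc32_eth_accel : crc32_eth_scalar;
}

/* Callers never test CPU flags; they go through the selected pointer. */
static uint32_t crc32_calc(const uint8_t *data, size_t len)
{
	assert(data != NULL || len == 0);
	return crc32_handler(data, len);
}
```

Calling crc32_select_handler() once and then crc32_calc() on the nine ASCII
bytes "123456789" should give the standard CRC-32 check value 0xCBF43926,
whichever handler ends up selected.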

* Re: [dpdk-dev] [PATCH v5 0/2] net: add CRC run-time checks and AVX512/VPCLMULQDQ based CRC
  2020-10-09 13:50   ` [dpdk-dev] [PATCH v5 " Mairtin o Loingsigh
                       ` (2 preceding siblings ...)
  2020-10-09 18:35     ` [dpdk-dev] [PATCH v5 0/2] net: add CRC run-time checks and " De Lara Guarch, Pablo
@ 2020-10-13 18:47     ` David Marchand
  3 siblings, 0 replies; 23+ messages in thread
From: David Marchand @ 2020-10-13 18:47 UTC (permalink / raw)
  To: Mairtin o Loingsigh
  Cc: Singh, Jasvinder, Bruce Richardson, Pablo de Lara, Ananyev,
	Konstantin, dev, Ryan, Brendan, David Coyle, Olivier Matz

On Fri, Oct 9, 2020 at 3:55 PM Mairtin o Loingsigh
<mairtin.oloingsigh@intel.com> wrote:
>
> This patchset makes two significant enhancements to the CRC modules of
> the rte_net library:
>
> 1) Adds run-time selection of the optimal architecture-specific CRC path.
>    Previously the selection was solely made at compile-time, meaning it
>    could only be built and run on the same generation of CPU. Adding
>    run-time selection ability means this can be used from distro packages
>    and/or DPDK can be compiled on an older CPU and run on a newer CPU.
> 2) Adds an optimized CRC implementation based on the AVX512 and
>    VPCLMULQDQ instruction sets.
>
> For further details, please see the commit messages of the individual
> patches.
>
> v5:
> * Tidied-up the ifdef checks for RTE_ARCH_* and compiler support of CRC
>   paths, as per review comments:
>   * All ifdef checks removed from API function definitions and moved into
>     helper functions.

Updated MAINTAINERS with renamed/added files.
Series applied.


-- 
David Marchand
