DPDK patches and discussions
* [PATCH 0/5] riscv: implement accelerated crc using zbc
@ 2024-06-18 17:41 Daniel Gregory
  2024-06-18 17:41 ` [PATCH 1/5] config/riscv: add flag for using Zbc extension Daniel Gregory
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
  To: Stanislaw Kardach
  Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
	Daniel Gregory

The RISC-V Zbc extension adds instructions for carry-less multiplication
we can use to implement CRC in hardware. This patchset contains two new
implementations:

- one in lib/hash/rte_crc_riscv64.h that uses a Barrett reduction to
  implement the four rte_hash_crc_* functions
- one in lib/net/net_crc_zbc.c that uses repeated single-folds to reduce
  the buffer until it is small enough for a Barrett reduction to
  implement rte_crc16_ccitt_zbc_handler and rte_crc32_eth_zbc_handler
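
For readers without Zbc hardware, here is a minimal portable sketch of what
the two carry-less multiply instructions compute on 64-bit operands. The
helper names (clmul_lo, clmul_hi, clmul_self_check) are made up for this
illustration and are not part of the patchset, which uses the compiler
intrinsics instead.

```c
#include <assert.h>
#include <stdint.h>

/* Low 64 bits of the carry-less product (what clmul returns on RV64). */
static uint64_t clmul_lo(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a << i;	/* XOR instead of add: no carries */
	return r;
}

/* High 64 bits of the carry-less product (what clmulh returns on RV64). */
static uint64_t clmul_hi(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 1; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a >> (64 - i);
	return r;
}

/* Small hand-checkable cases. */
static void clmul_self_check(void)
{
	assert(clmul_lo(5, 3) == 15);	/* 0b101 x 0b11 = 0b1111 */
	assert(clmul_lo(3, 3) == 5);	/* 0b11 x 0b11 = 0b101, not 9 */
	assert(clmul_hi(1ULL << 63, 2) == 1);	/* top bit spills into high half */
}
```

The second case shows the difference from integer multiplication: partial
products are combined with XOR, so 3 x 3 gives 5 rather than 9.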

My approach is largely based on Intel's white paper "Fast CRC Computation
for Generic Polynomials Using PCLMULQDQ Instruction"
https://www.researchgate.net/publication/263424619_Fast_CRC_computation
and a post about "Optimizing CRC32 for small payload sizes on x86"
https://mary.rs/lab/crc32/

These implementations are behind a new flag, RTE_RISCV_ZBC. Due to use
of bitmanip compiler intrinsics, a modern version of GCC (14+) or Clang
(18+) is required to compile with this flag enabled.

I have carried out some performance comparisons between the generic
table implementations and the new hardware implementations. Listed below
is the number of cycles it takes to compute the CRC hash for buffers of
various sizes (as reported by rte_get_timer_cycles()). These results
were collected on a Kendryte K230 and averaged over 20 samples:

|Buffer    | CRC32-ETH (lib/net) | CRC32C (lib/hash)   |
|Size (MB) | Table    | Hardware | Table    | Hardware |
|----------|----------|----------|----------|----------|
|        1 |   155168 |    11610 |    73026 |    18385 |
|        2 |   311203 |    22998 |   145586 |    35886 |
|        3 |   466744 |    34370 |   218536 |    53939 |
|        4 |   621843 |    45536 |   291574 |    71944 |
|        5 |   777908 |    56989 |   364152 |    89706 |
|        6 |   932736 |    68023 |   437016 |   107726 |
|        7 |  1088756 |    79236 |   510197 |   125426 |
|        8 |  1243794 |    90467 |   583231 |   143614 |

These results suggest a speed-up of roughly thirteen times for lib/net,
and of roughly four times for lib/hash.
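
The ratios can be checked directly from the table. The sketch below only
redoes the arithmetic for the 1 MB and 8 MB rows. The helper name speedup
is hypothetical, and the cycle counts are copied from the table above.

```c
#include <assert.h>

/* Ratio of table cycles to hardware cycles; above 1 means hardware wins. */
static double speedup(double table_cycles, double hw_cycles)
{
	assert(hw_cycles > 0.0);
	return table_cycles / hw_cycles;
}

/* Rows copied from the results table (cycles). */
static const double net_1mb[2]  = { 155168.0, 11610.0 };
static const double net_8mb[2]  = { 1243794.0, 90467.0 };
static const double hash_1mb[2] = { 73026.0, 18385.0 };
static const double hash_8mb[2] = { 583231.0, 143614.0 };
```

For example, speedup(net_1mb[0], net_1mb[1]) is about 13.4 and
speedup(hash_1mb[0], hash_1mb[1]) is about 4.0, and the 8 MB rows give
similar ratios.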

Daniel Gregory (5):
  config/riscv: add flag for using Zbc extension
  hash: implement crc using riscv carryless multiply
  net: implement crc using riscv carryless multiply
  examples/l3fwd: use accelerated crc on riscv
  ipfrag: use accelerated crc on riscv

 MAINTAINERS                    |   2 +
 app/test/test_crc.c            |   9 ++
 app/test/test_hash.c           |   7 ++
 config/riscv/meson.build       |   7 ++
 examples/l3fwd/l3fwd_em.c      |   2 +-
 lib/hash/meson.build           |   1 +
 lib/hash/rte_crc_riscv64.h     |  89 +++++++++++++++
 lib/hash/rte_hash_crc.c        |  12 +-
 lib/hash/rte_hash_crc.h        |   6 +-
 lib/ip_frag/ip_frag_internal.c |   6 +-
 lib/net/meson.build            |   4 +
 lib/net/net_crc.h              |  11 ++
 lib/net/net_crc_zbc.c          | 202 +++++++++++++++++++++++++++++++++
 lib/net/rte_net_crc.c          |  35 ++++++
 lib/net/rte_net_crc.h          |   2 +
 15 files changed, 389 insertions(+), 6 deletions(-)
 create mode 100644 lib/hash/rte_crc_riscv64.h
 create mode 100644 lib/net/net_crc_zbc.c

-- 
2.39.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/5] config/riscv: add flag for using Zbc extension
  2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
@ 2024-06-18 17:41 ` Daniel Gregory
  2024-06-18 20:03   ` Stephen Hemminger
  2024-06-18 17:41 ` [PATCH 2/5] hash: implement crc using riscv carryless multiply Daniel Gregory
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
  To: Stanislaw Kardach, Bruce Richardson
  Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
	Daniel Gregory

The RISC-V Zbc extension adds carry-less multiply instructions we can
use to implement more efficient CRC hashing algorithms.

Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
---
 config/riscv/meson.build | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/config/riscv/meson.build b/config/riscv/meson.build
index 07d7d9da23..4bda4089bd 100644
--- a/config/riscv/meson.build
+++ b/config/riscv/meson.build
@@ -26,6 +26,13 @@ flags_common = [
     # read from /proc/device-tree/cpus/timebase-frequency. This property is
     # guaranteed on Linux, as riscv time_init() requires it.
     ['RTE_RISCV_TIME_FREQ', 0],
+
+    # Use RISC-V Carry-less multiplication extension (Zbc) for hardware
+    # implementations of CRC-32C (lib/hash/rte_crc_riscv64.h), CRC-32 and CRC-16
+    # (lib/net/net_crc_zbc.c). Requires intrinsics available in GCC 14.1.0+ and
+    # Clang 18.1.0+
+    # Make sure to add '_zbc' to your target's -march below
+    ['RTE_RISCV_ZBC', false],
 ]
 
 ## SoC-specific options.
-- 
2.39.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/5] hash: implement crc using riscv carryless multiply
  2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
  2024-06-18 17:41 ` [PATCH 1/5] config/riscv: add flag for using Zbc extension Daniel Gregory
@ 2024-06-18 17:41 ` Daniel Gregory
  2024-06-18 17:41 ` [PATCH 3/5] net: " Daniel Gregory
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
  To: Thomas Monjalon, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
	Vladimir Medvedkin, Stanislaw Kardach
  Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
	Daniel Gregory

Using carryless multiply instructions from RISC-V's Zbc extension,
implement a Barrett reduction that calculates CRC-32C checksums.

Based on the approach described by Intel's whitepaper on "Fast CRC
Computation for Generic Polynomials Using PCLMULQDQ Instruction", which
is also described here
(https://web.archive.org/web/20240111232520/https://mary.rs/lab/crc32/)
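
To make the reduction easier to follow off-target, here is a portable sketch
of the same two-multiply Barrett step. It is an illustration under stated
assumptions, not the code in this patch. soft_clmul/soft_clmulh stand in for
the Zbc instructions, derive_mu recomputes the reflected floor(x^96 / P) by
long division rather than hard-coding it (the patch hard-codes mu as
0x4869EC38DEA713F1), and crc32c_bitwise is a plain reference to compare
against. All four names are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-ins for the 64-bit clmul and clmulh instructions. */
static uint64_t soft_clmul(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a << i;
	return r;
}

static uint64_t soft_clmulh(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 1; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a >> (64 - i);
	return r;
}

/* Bit-reflected floor(x^96 / P) for P = 0x11EDC6F41, by long division. */
static uint64_t derive_mu(void)
{
	const uint64_t poly = 0x11EDC6F41ULL;
	uint64_t rem = 1, mu = 0;	/* rem starts as the x^96 term */

	for (int i = 95; i >= 1; i--) {
		rem <<= 1;		/* bring down the (zero) x^i term */
		if (rem & (1ULL << 32)) {
			rem ^= poly;
			mu |= 1ULL << (64 - i);	/* quotient x^i, reflected */
		}
	}
	return mu;			/* the x^0 quotient term is dropped */
}

/* Barrett reduction of an 8, 16, 32 or 64-bit value, as in the patch. */
static uint32_t crc32c_barrett(uint64_t data, uint32_t init, uint32_t bits)
{
	const uint64_t p = 0x105EC76F1ULL;	/* P reflected */
	uint64_t crc;

	assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
	crc = (data ^ init) << (64 - bits);
	crc = soft_clmul(crc, derive_mu());	/* quotient estimate */
	crc = soft_clmulh(crc, p);		/* quotient times P */
	if (bits < 32)
		crc ^= init >> bits;		/* untouched part of init */
	return (uint32_t)crc;
}

/* Bit-at-a-time reflected CRC-32C, no initial or final inversion. */
static uint32_t crc32c_bitwise(const uint8_t *buf, size_t len, uint32_t crc)
{
	for (size_t i = 0; i < len; i++) {
		crc ^= buf[i];
		for (int b = 0; b < 8; b++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return crc;
}
```

Feeding the same little-endian bytes to both functions should give the same
value for each of the four operand widths.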

Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
---
 MAINTAINERS                |  1 +
 app/test/test_hash.c       |  7 +++
 lib/hash/meson.build       |  1 +
 lib/hash/rte_crc_riscv64.h | 89 ++++++++++++++++++++++++++++++++++++++
 lib/hash/rte_hash_crc.c    | 12 ++++-
 lib/hash/rte_hash_crc.h    |  6 ++-
 6 files changed, 114 insertions(+), 2 deletions(-)
 create mode 100644 lib/hash/rte_crc_riscv64.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 472713124c..48800f39c4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -318,6 +318,7 @@ M: Stanislaw Kardach <stanislaw.kardach@gmail.com>
 F: config/riscv/
 F: doc/guides/linux_gsg/cross_build_dpdk_for_riscv.rst
 F: lib/eal/riscv/
+F: lib/hash/rte_crc_riscv64.h
 
 Intel x86
 M: Bruce Richardson <bruce.richardson@intel.com>
diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 24d3b547ad..c8c4197ad8 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -205,6 +205,13 @@ test_crc32_hash_alg_equiv(void)
 			printf("Failed checking CRC32_SW against CRC32_ARM64\n");
 			break;
 		}
+
+		/* Check against 8-byte-operand RISCV64 CRC32 if available */
+		rte_hash_crc_set_alg(CRC32_RISCV64);
+		if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+			printf("Failed checking CRC32_SW against CRC32_RISCV64\n");
+			break;
+		}
 	}
 
 	/* Resetting to best available algorithm */
diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index 277eb9fa93..8355869a80 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -12,6 +12,7 @@ headers = files(
 indirect_headers += files(
         'rte_crc_arm64.h',
         'rte_crc_generic.h',
+        'rte_crc_riscv64.h',
         'rte_crc_sw.h',
         'rte_crc_x86.h',
         'rte_thash_x86_gfni.h',
diff --git a/lib/hash/rte_crc_riscv64.h b/lib/hash/rte_crc_riscv64.h
new file mode 100644
index 0000000000..94f6857c69
--- /dev/null
+++ b/lib/hash/rte_crc_riscv64.h
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) ByteDance 2024
+ */
+
+#include <assert.h>
+#include <stdint.h>
+
+#include <riscv_bitmanip.h>
+
+#ifndef _RTE_CRC_RISCV64_H_
+#define _RTE_CRC_RISCV64_H_
+
+/*
+ * CRC-32C takes a reflected input (bit 7 is the lsb) and produces a reflected
+ * output. As reflecting the value we're checksumming is expensive, we instead
+ * reflect the polynomial P (0x11EDC6F41) and mu and our CRC32 algorithm.
+ *
+ * The mu constant is used for a Barrett reduction. It's 2^96 / P (0x11F91CAF6)
+ * reflected. Picking 2^96 rather than 2^64 means we can calculate a 64-bit crc
+ * using only two multiplications (https://mary.rs/lab/crc32/)
+ */
+static const uint64_t p = 0x105EC76F1;
+static const uint64_t mu = 0x4869EC38DEA713F1UL;
+
+/* Calculate the CRC32C checksum using a Barrett reduction */
+static inline uint32_t
+crc32c_riscv64(uint64_t data, uint32_t init_val, uint32_t bits)
+{
+	assert((bits == 64) || (bits == 32) || (bits == 16) || (bits == 8));
+
+	/* Combine data with the initial value */
+	uint64_t crc = (uint64_t)(data ^ init_val) << (64 - bits);
+
+	/*
+	 * Multiply by mu, which is 2^96 / P. Division by 2^96 occurs by taking
+	 * the lower 64 bits of the result (remember we're inverted)
+	 */
+	crc = __riscv_clmul_64(crc, mu);
+	/* Multiply by P */
+	crc = __riscv_clmulh_64(crc, p);
+
+	/* Subtract from original (only needed for smaller sizes) */
+	if (bits == 16 || bits == 8)
+		crc ^= init_val >> bits;
+
+	return crc;
+}
+
+/*
+ * Use carryless multiply to perform hash on a value, falling back on the
+ * software in case the Zbc extension is not supported
+ */
+static inline uint32_t
+rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
+{
+	if (likely(rte_hash_crc32_alg & CRC32_RISCV64))
+		return crc32c_riscv64(data, init_val, 8);
+
+	return crc32c_1byte(data, init_val);
+}
+
+static inline uint32_t
+rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
+{
+	if (likely(rte_hash_crc32_alg & CRC32_RISCV64))
+		return crc32c_riscv64(data, init_val, 16);
+
+	return crc32c_2bytes(data, init_val);
+}
+
+static inline uint32_t
+rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
+{
+	if (likely(rte_hash_crc32_alg & CRC32_RISCV64))
+		return crc32c_riscv64(data, init_val, 32);
+
+	return crc32c_1word(data, init_val);
+}
+
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+	if (likely(rte_hash_crc32_alg & CRC32_RISCV64))
+		return crc32c_riscv64(data, init_val, 64);
+
+	return crc32c_2words(data, init_val);
+}
+
+#endif /* _RTE_CRC_RISCV64_H_ */
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
index c037cdb0f0..ece1a84b29 100644
--- a/lib/hash/rte_hash_crc.c
+++ b/lib/hash/rte_hash_crc.c
@@ -15,7 +15,7 @@ RTE_LOG_REGISTER_SUFFIX(hash_crc_logtype, crc, INFO);
 uint8_t rte_hash_crc32_alg = CRC32_SW;
 
 /**
- * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * Allow or disallow use of SSE4.2/ARMv8/RISC-V intrinsics for CRC32 hash
  * calculation.
  *
  * @param alg
@@ -24,6 +24,7 @@ uint8_t rte_hash_crc32_alg = CRC32_SW;
  *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
  *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *   - (CRC32_RISCV64) Use RISCV64 Zbc extension if available
  *
  */
 void
@@ -52,6 +53,13 @@ rte_hash_crc_set_alg(uint8_t alg)
 		rte_hash_crc32_alg = CRC32_ARM64;
 #endif
 
+#if defined(RTE_ARCH_RISCV) && defined(RTE_RISCV_ZBC)
+	if (!(alg & CRC32_RISCV64))
+		HASH_CRC_LOG(WARNING,
+			"Unsupported CRC32 algorithm requested using CRC32_RISCV64");
+	rte_hash_crc32_alg = CRC32_RISCV64;
+#endif
+
 	if (rte_hash_crc32_alg == CRC32_SW)
 		HASH_CRC_LOG(WARNING,
 			"Unsupported CRC32 algorithm requested using CRC32_SW");
@@ -64,6 +72,8 @@ RTE_INIT(rte_hash_crc_init_alg)
 	rte_hash_crc_set_alg(CRC32_SSE42_x64);
 #elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
 	rte_hash_crc_set_alg(CRC32_ARM64);
+#elif defined(RTE_ARCH_RISCV) && defined(RTE_RISCV_ZBC)
+	rte_hash_crc_set_alg(CRC32_RISCV64);
 #else
 	rte_hash_crc_set_alg(CRC32_SW);
 #endif
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 8ad2422ec3..2be433fa21 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -28,6 +28,7 @@ extern "C" {
 #define CRC32_x64           (1U << 2)
 #define CRC32_SSE42_x64     (CRC32_x64|CRC32_SSE42)
 #define CRC32_ARM64         (1U << 3)
+#define CRC32_RISCV64       (1U << 4)
 
 extern uint8_t rte_hash_crc32_alg;
 
@@ -35,12 +36,14 @@ extern uint8_t rte_hash_crc32_alg;
 #include "rte_crc_arm64.h"
 #elif defined(RTE_ARCH_X86)
 #include "rte_crc_x86.h"
+#elif defined(RTE_ARCH_RISCV) && defined(RTE_RISCV_ZBC)
+#include "rte_crc_riscv64.h"
 #else
 #include "rte_crc_generic.h"
 #endif
 
 /**
- * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * Allow or disallow use of SSE4.2/ARMv8/RISC-V intrinsics for CRC32 hash
  * calculation.
  *
  * @param alg
@@ -49,6 +52,7 @@ extern uint8_t rte_hash_crc32_alg;
  *   - (CRC32_SSE42) Use SSE4.2 intrinsics if available
  *   - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
  *   - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ *   - (CRC32_RISCV64) Use RISC-V Carry-less multiply if available (default rv64gc_zbc)
  */
 void
 rte_hash_crc_set_alg(uint8_t alg);
-- 
2.39.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 3/5] net: implement crc using riscv carryless multiply
  2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
  2024-06-18 17:41 ` [PATCH 1/5] config/riscv: add flag for using Zbc extension Daniel Gregory
  2024-06-18 17:41 ` [PATCH 2/5] hash: implement crc using riscv carryless multiply Daniel Gregory
@ 2024-06-18 17:41 ` Daniel Gregory
  2024-06-18 17:41 ` [PATCH 4/5] examples/l3fwd: use accelerated crc on riscv Daniel Gregory
  2024-06-18 17:41 ` [PATCH 5/5] ipfrag: " Daniel Gregory
  4 siblings, 0 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
  To: Thomas Monjalon, Jasvinder Singh, Stanislaw Kardach
  Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
	Daniel Gregory

Using carryless multiply instructions (clmul) from RISC-V's Zbc
extension, implement CRC-32 and CRC-16 calculations on buffers.

Based on the approach described in Intel's whitepaper on "Fast CRC
Computation for Generic Polynomials Using PCLMULQDQ Instruction", we
perform repeated folds-by-1 whilst the buffer is still big enough, then
perform Barrett's reductions on the rest.

Add a case to the crc_autotest suite that tests this implementation.

This implementation is enabled by setting the RTE_RISCV_ZBC flag
(see config/riscv/meson.build).
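
As a cross-check when reviewing, the sketch below is a bit-at-a-time
reference for the two checksums. It assumes the usual reflected conventions
implied by this patch's constants (Pr is the reflected polynomial shifted
left by one with the low bit set, the initial value is all ones, and the
result is inverted). The names crc_reflected, crc32_eth_ref and
crc16_ccitt_ref are invented for the example and are not DPDK functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Generic reflected CRC, one bit per step. */
static uint32_t crc_reflected(const uint8_t *data, size_t len,
	uint32_t poly_reflected, uint32_t crc)
{
	assert(data != NULL || len == 0);

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int b = 0; b < 8; b++)
			crc = (crc >> 1) ^ (poly_reflected & (0u - (crc & 1u)));
	}
	return crc;
}

/* CRC-32 (Ethernet): polynomial 0x04C11DB7, reflected 0xEDB88320. */
static uint32_t crc32_eth_ref(const uint8_t *data, size_t len)
{
	return ~crc_reflected(data, len, 0xEDB88320u, 0xffffffffu);
}

/* CRC-16 CCITT: polynomial 0x1021, reflected 0x8408. */
static uint32_t crc16_ccitt_ref(const uint8_t *data, size_t len)
{
	return (uint16_t)~crc_reflected(data, len, 0x8408u, 0xffffu);
}
```

For the ASCII string "123456789" these return the standard check values
0xCBF43926 and 0x906E, which an accelerated handler following the same
conventions would be expected to reproduce.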

Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
---
 MAINTAINERS           |   1 +
 app/test/test_crc.c   |   9 ++
 lib/net/meson.build   |   4 +
 lib/net/net_crc.h     |  11 +++
 lib/net/net_crc_zbc.c | 202 ++++++++++++++++++++++++++++++++++++++++++
 lib/net/rte_net_crc.c |  35 ++++++++
 lib/net/rte_net_crc.h |   2 +
 7 files changed, 264 insertions(+)
 create mode 100644 lib/net/net_crc_zbc.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 48800f39c4..6562e62779 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -319,6 +319,7 @@ F: config/riscv/
 F: doc/guides/linux_gsg/cross_build_dpdk_for_riscv.rst
 F: lib/eal/riscv/
 F: lib/hash/rte_crc_riscv64.h
+F: lib/net/net_crc_zbc.c
 
 Intel x86
 M: Bruce Richardson <bruce.richardson@intel.com>
diff --git a/app/test/test_crc.c b/app/test/test_crc.c
index b85fca35fe..fa91557cf5 100644
--- a/app/test/test_crc.c
+++ b/app/test/test_crc.c
@@ -168,6 +168,15 @@ test_crc(void)
 		return ret;
 	}
 
+	/* set CRC riscv mode */
+	rte_net_crc_set_alg(RTE_NET_CRC_ZBC);
+
+	ret = test_crc_calc();
+	if (ret < 0) {
+		printf("test crc (riscv64 zbc clmul): failed (%d)\n", ret);
+		return ret;
+	}
+
 	return 0;
 }
 
diff --git a/lib/net/meson.build b/lib/net/meson.build
index 0b69138949..f2ae019bea 100644
--- a/lib/net/meson.build
+++ b/lib/net/meson.build
@@ -125,4 +125,8 @@ elif (dpdk_conf.has('RTE_ARCH_ARM64') and
         cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
     sources += files('net_crc_neon.c')
     cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
+elif (dpdk_conf.has('RTE_ARCH_RISCV') and dpdk_conf.has('RTE_RISCV_ZBC') and
+	dpdk_conf.get('RTE_RISCV_ZBC'))
+    sources += files('net_crc_zbc.c')
+    cflags += ['-DCC_RISCV64_ZBC_CLMUL_SUPPORT']
 endif
diff --git a/lib/net/net_crc.h b/lib/net/net_crc.h
index 7a74d5406c..06ae113b47 100644
--- a/lib/net/net_crc.h
+++ b/lib/net/net_crc.h
@@ -42,4 +42,15 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len);
 uint32_t
 rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len);
 
+/* RISCV64 Zbc */
+void
+rte_net_crc_zbc_init(void);
+
+uint32_t
+rte_crc16_ccitt_zbc_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_zbc_handler(const uint8_t *data, uint32_t data_len);
+
+
 #endif /* _NET_CRC_H_ */
diff --git a/lib/net/net_crc_zbc.c b/lib/net/net_crc_zbc.c
new file mode 100644
index 0000000000..5907d69471
--- /dev/null
+++ b/lib/net/net_crc_zbc.c
@@ -0,0 +1,202 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) ByteDance 2024
+ */
+
+#include <riscv_bitmanip.h>
+#include <stdint.h>
+
+#include <rte_common.h>
+#include <rte_net_crc.h>
+
+#include "net_crc.h"
+
+/* CLMUL CRC computation context structure */
+struct crc_clmul_ctx {
+	uint64_t Pr;
+	uint64_t mu;
+	uint64_t k3;
+	uint64_t k4;
+	uint64_t k5;
+};
+
+struct crc_clmul_ctx crc32_eth_clmul;
+struct crc_clmul_ctx crc16_ccitt_clmul;
+
+/* Perform Barrett's reduction on 8, 16, 32 or 64-bit value */
+static inline uint32_t
+crc32_barrett_zbc(
+	const uint64_t data,
+	uint32_t crc,
+	uint32_t bits,
+	const struct crc_clmul_ctx *params)
+{
+	assert((bits == 64) || (bits == 32) || (bits == 16) || (bits == 8));
+
+	/* Combine data with the initial value */
+	uint64_t temp = (uint64_t)(data ^ crc) << (64 - bits);
+
+	/*
+	 * Multiply by mu, which is 2^96 / P. Division by 2^96 occurs by taking
+	 * the lower 64 bits of the result (remember we're inverted)
+	 */
+	temp = __riscv_clmul_64(temp, params->mu);
+	/* Multiply by P */
+	temp = __riscv_clmulh_64(temp, params->Pr);
+
+	/* Subtract from original (only needed for smaller sizes) */
+	if (bits == 16 || bits == 8)
+		temp ^= crc >> bits;
+
+	return temp;
+}
+
+/* Repeat Barrett's reduction for short buffer sizes */
+static inline uint32_t
+crc32_repeated_barrett_zbc(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_clmul_ctx *params)
+{
+	while (data_len >= 8) {
+		crc = crc32_barrett_zbc(*(const uint64_t *)data, crc, 64, params);
+		data += 8;
+		data_len -= 8;
+	}
+	if (data_len >= 4) {
+		crc = crc32_barrett_zbc(*(const uint32_t *)data, crc, 32, params);
+		data += 4;
+		data_len -= 4;
+	}
+	if (data_len >= 2) {
+		crc = crc32_barrett_zbc(*(const uint16_t *)data, crc, 16, params);
+		data += 2;
+		data_len -= 2;
+	}
+	if (data_len >= 1)
+		crc = crc32_barrett_zbc(*(const uint8_t *)data, crc, 8, params);
+
+	return crc;
+}
+
+/*
+ * Perform repeated reductions-by-1 over a buffer
+ * Reduces a buffer of uint64_t (naturally aligned) of even length to 128 bits
+ * Returns the upper and lower 64-bits in arguments
+ */
+static inline void
+crc32_reduction_zbc(
+	const uint64_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_clmul_ctx *params,
+	uint64_t *high,
+	uint64_t *low)
+{
+	*high = *(data++) ^ crc;
+	*low = *(data++);
+	data_len--;
+
+	for (; data_len >= 2; data_len -= 2) {
+		uint64_t highh = __riscv_clmulh_64(params->k3, *high);
+		uint64_t highl = __riscv_clmul_64(params->k3, *high);
+		uint64_t lowh = __riscv_clmulh_64(params->k4, *low);
+		uint64_t lowl = __riscv_clmul_64(params->k4, *low);
+
+		*high = highl ^ lowl;
+		*low = highh ^ lowh;
+
+		*high ^= *(data++);
+		*low ^= *(data++);
+	}
+}
+
+static inline uint32_t
+crc32_eth_calc_zbc(
+	const uint8_t *data,
+	uint32_t data_len,
+	uint32_t crc,
+	const struct crc_clmul_ctx *params)
+{
+	/* Barrett reduce until buffer aligned to 8-byte word */
+	uint32_t misalign = (8 - ((size_t)data & 7)) & 7;
+	if (misalign != 0 && misalign <= data_len) {
+		crc = crc32_repeated_barrett_zbc(data, misalign, crc, params);
+		data += misalign;
+		data_len -= misalign;
+	}
+
+	/* Minimum length we can do reduction-by-1 over */
+	const uint32_t min_len = 16;
+	if (data_len < min_len)
+		return crc32_repeated_barrett_zbc(data, data_len, crc, params);
+
+	/* Fold buffer into two 8-byte words */
+	/* Length is calculated by dividing by 8 then rounding down until even */
+	const uint64_t *data_word = (const uint64_t *)data;
+	uint32_t data_word_len = (data_len >> 3) & ~1;
+	uint32_t excess = data_len & 15;
+	uint64_t high, low;
+	crc32_reduction_zbc(data_word, data_word_len, crc, params, &high, &low);
+	data += data_word_len << 3;
+
+	/* Fold last 128 bits into 96 */
+	low = __riscv_clmul_64(params->k4, high) ^ low;
+	high = __riscv_clmulh_64(params->k4, high);
+	/* Upper 32 bits of high are now zero */
+	high = (low >> 32) | (high << 32);
+
+	/* Fold last 96 bits into 64 */
+	uint64_t temp = __riscv_clmul_64(low & 0xffffffff, params->k5);
+	temp ^= high;
+
+	/* Barrett reduction of last 64 bits */
+	uint64_t orig = temp;
+	temp = __riscv_clmul_64(temp, params->mu);
+	temp &= 0xffffffff;
+	temp = __riscv_clmul_64(temp, params->Pr);
+	crc = (temp ^ orig) >> 32;
+
+	/* Combine crc with any excess */
+	crc = crc32_repeated_barrett_zbc(data, excess, crc, params);
+
+	return crc;
+}
+
+void
+rte_net_crc_zbc_init(void)
+{
+	/* Initialise CRC32 data */
+	crc32_eth_clmul.Pr = 0x1db710641LL; /* polynomial P reversed */
+	crc32_eth_clmul.mu = 0xb4e5b025f7011641LL; /* (2 ^ 64 / P) reversed */
+	crc32_eth_clmul.k3 = 0x1751997d0LL; /* (x^(128+32) mod P << 32) reversed << 1 */
+	crc32_eth_clmul.k4 = 0x0ccaa009eLL; /* (x^(128-32) mod P << 32) reversed << 1 */
+	crc32_eth_clmul.k5 = 0x163cd6124LL; /* (x^64 mod P << 32) reversed << 1 */
+
+	/* Initialise CRC16 data */
+	/* Same calculations as above, with polynomial << 16 */
+	crc16_ccitt_clmul.Pr = 0x10811LL;
+	crc16_ccitt_clmul.mu = 0x859b040b1c581911LL;
+	crc16_ccitt_clmul.k3 = 0x8e10LL;
+	crc16_ccitt_clmul.k4 = 0x189aeLL;
+	crc16_ccitt_clmul.k5 = 0x114aaLL;
+}
+
+uint32_t
+rte_crc16_ccitt_zbc_handler(const uint8_t *data, uint32_t data_len)
+{
+	/* Negate the crc, which is present in the lower 16-bits */
+	return (uint16_t)~crc32_eth_calc_zbc(data,
+		data_len,
+		0xffff,
+		&crc16_ccitt_clmul);
+}
+
+uint32_t
+rte_crc32_eth_zbc_handler(const uint8_t *data, uint32_t data_len)
+{
+	return ~crc32_eth_calc_zbc(data,
+		data_len,
+		0xffffffffUL,
+		&crc32_eth_clmul);
+}
diff --git a/lib/net/rte_net_crc.c b/lib/net/rte_net_crc.c
index 346c285c15..1d03501267 100644
--- a/lib/net/rte_net_crc.c
+++ b/lib/net/rte_net_crc.c
@@ -67,6 +67,12 @@ static const rte_net_crc_handler handlers_neon[] = {
 	[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
 };
 #endif
+#ifdef CC_RISCV64_ZBC_CLMUL_SUPPORT
+static const rte_net_crc_handler handlers_zbc[] = {
+	[RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_zbc_handler,
+	[RTE_NET_CRC32_ETH] = rte_crc32_eth_zbc_handler,
+};
+#endif
 
 static uint16_t max_simd_bitwidth;
 
@@ -244,6 +250,26 @@ neon_pmull_init(void)
 #endif
 }
 
+/* ZBC/CLMUL handling */
+
+static const rte_net_crc_handler *
+zbc_clmul_get_handlers(void)
+{
+#ifdef CC_RISCV64_ZBC_CLMUL_SUPPORT
+	return handlers_zbc;
+#endif
+	NET_LOG(INFO, "Requirements not met, can't use Zbc");
+	return NULL;
+}
+
+static void
+zbc_clmul_init(void)
+{
+#ifdef CC_RISCV64_ZBC_CLMUL_SUPPORT
+	rte_net_crc_zbc_init();
+#endif
+}
+
 /* Default handling */
 
 static uint32_t
@@ -260,6 +286,9 @@ rte_crc16_ccitt_default_handler(const uint8_t *data, uint32_t data_len)
 	if (handlers != NULL)
 		return handlers[RTE_NET_CRC16_CCITT](data, data_len);
 	handlers = neon_pmull_get_handlers();
+	if (handlers != NULL)
+		return handlers[RTE_NET_CRC16_CCITT](data, data_len);
+	handlers = zbc_clmul_get_handlers();
 	if (handlers != NULL)
 		return handlers[RTE_NET_CRC16_CCITT](data, data_len);
 	handlers = handlers_scalar;
@@ -282,6 +311,9 @@ rte_crc32_eth_default_handler(const uint8_t *data, uint32_t data_len)
 	handlers = neon_pmull_get_handlers();
 	if (handlers != NULL)
 		return handlers[RTE_NET_CRC32_ETH](data, data_len);
+	handlers = zbc_clmul_get_handlers();
+	if (handlers != NULL)
+		return handlers[RTE_NET_CRC32_ETH](data, data_len);
 	handlers = handlers_scalar;
 	return handlers[RTE_NET_CRC32_ETH](data, data_len);
 }
@@ -306,6 +337,9 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg)
 		break; /* for x86, always break here */
 	case RTE_NET_CRC_NEON:
 		handlers = neon_pmull_get_handlers();
+		break;
+	case RTE_NET_CRC_ZBC:
+		handlers = zbc_clmul_get_handlers();
 		/* fall-through */
 	case RTE_NET_CRC_SCALAR:
 		/* fall-through */
@@ -338,4 +372,5 @@ RTE_INIT(rte_net_crc_init)
 	sse42_pclmulqdq_init();
 	avx512_vpclmulqdq_init();
 	neon_pmull_init();
+	zbc_clmul_init();
 }
diff --git a/lib/net/rte_net_crc.h b/lib/net/rte_net_crc.h
index 72d3e10ff6..12fa6a8a02 100644
--- a/lib/net/rte_net_crc.h
+++ b/lib/net/rte_net_crc.h
@@ -24,6 +24,7 @@ enum rte_net_crc_alg {
 	RTE_NET_CRC_SSE42,
 	RTE_NET_CRC_NEON,
 	RTE_NET_CRC_AVX512,
+	RTE_NET_CRC_ZBC,
 };
 
 /**
@@ -37,6 +38,7 @@ enum rte_net_crc_alg {
  *   - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
  *   - RTE_NET_CRC_NEON (Use ARM Neon intrinsic)
  *   - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic)
+ *   - RTE_NET_CRC_ZBC (Use RISC-V Zbc extension)
  */
 void
 rte_net_crc_set_alg(enum rte_net_crc_alg alg);
-- 
2.39.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/5] examples/l3fwd: use accelerated crc on riscv
  2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
                   ` (2 preceding siblings ...)
  2024-06-18 17:41 ` [PATCH 3/5] net: " Daniel Gregory
@ 2024-06-18 17:41 ` Daniel Gregory
  2024-06-18 17:41 ` [PATCH 5/5] ipfrag: " Daniel Gregory
  4 siblings, 0 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
  Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
	Daniel Gregory

When the RISC-V Zbc (carryless multiplication) extension is present, an
implementation of CRC hashing using hardware instructions is available.
Use it rather than jhash.

Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
---
 examples/l3fwd/l3fwd_em.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index d98e66ea2c..4cec2dc6a9 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -29,7 +29,7 @@
 #include "l3fwd_event.h"
 #include "em_route_parse.c"
 
-#if defined(RTE_ARCH_X86) || defined(__ARM_FEATURE_CRC32)
+#if defined(RTE_ARCH_X86) || defined(__ARM_FEATURE_CRC32) || defined(RTE_RISCV_ZBC)
 #define EM_HASH_CRC 1
 #endif
 
-- 
2.39.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 5/5] ipfrag: use accelerated crc on riscv
  2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
                   ` (3 preceding siblings ...)
  2024-06-18 17:41 ` [PATCH 4/5] examples/l3fwd: use accelerated crc on riscv Daniel Gregory
@ 2024-06-18 17:41 ` Daniel Gregory
  4 siblings, 0 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
  To: Konstantin Ananyev
  Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
	Daniel Gregory

When the RISC-V Zbc (carryless multiplication) extension is present, an
implementation of CRC hashing using hardware instructions is available.
Use it rather than jhash.

Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
---
 lib/ip_frag/ip_frag_internal.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/lib/ip_frag/ip_frag_internal.c b/lib/ip_frag/ip_frag_internal.c
index 7cbef647df..7806264078 100644
--- a/lib/ip_frag/ip_frag_internal.c
+++ b/lib/ip_frag/ip_frag_internal.c
@@ -45,14 +45,14 @@ ipv4_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
 
 	p = (const uint32_t *)&key->src_dst;
 
-#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64)
+#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64) || defined(RTE_RISCV_ZBC)
 	v = rte_hash_crc_4byte(p[0], PRIME_VALUE);
 	v = rte_hash_crc_4byte(p[1], v);
 	v = rte_hash_crc_4byte(key->id, v);
 #else
 
 	v = rte_jhash_3words(p[0], p[1], key->id, PRIME_VALUE);
-#endif /* RTE_ARCH_X86 */
+#endif /* RTE_ARCH_X86 || RTE_ARCH_ARM64 || RTE_RISCV_ZBC */
 
 	*v1 =  v;
 	*v2 = (v << 7) + (v >> 14);
@@ -66,7 +66,7 @@ ipv6_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
 
 	p = (const uint32_t *) &key->src_dst;
 
-#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64)
+#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64) || defined(RTE_RISCV_ZBC)
 	v = rte_hash_crc_4byte(p[0], PRIME_VALUE);
 	v = rte_hash_crc_4byte(p[1], v);
 	v = rte_hash_crc_4byte(p[2], v);
-- 
2.39.2


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/5] config/riscv: add flag for using Zbc extension
  2024-06-18 17:41 ` [PATCH 1/5] config/riscv: add flag for using Zbc extension Daniel Gregory
@ 2024-06-18 20:03   ` Stephen Hemminger
  2024-06-19  7:08     ` Morten Brørup
  0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2024-06-18 20:03 UTC (permalink / raw)
  To: Daniel Gregory
  Cc: Stanislaw Kardach, Bruce Richardson, dev, Liang Ma,
	Punit Agrawal, Pengcheng Wang, Chunsong Feng

On Tue, 18 Jun 2024 18:41:29 +0100
Daniel Gregory <daniel.gregory@bytedance.com> wrote:

> diff --git a/config/riscv/meson.build b/config/riscv/meson.build
> index 07d7d9da23..4bda4089bd 100644
> --- a/config/riscv/meson.build
> +++ b/config/riscv/meson.build
> @@ -26,6 +26,13 @@ flags_common = [
>      # read from /proc/device-tree/cpus/timebase-frequency. This property is
>      # guaranteed on Linux, as riscv time_init() requires it.
>      ['RTE_RISCV_TIME_FREQ', 0],
> +
> +    # Use RISC-V Carry-less multiplication extension (Zbc) for hardware
> +    # implementations of CRC-32C (lib/hash/rte_crc_riscv64.h), CRC-32 and CRC-16
> +    # (lib/net/net_crc_zbc.c). Requires intrinsics available in GCC 14.1.0+ and
> +    # Clang 18.1.0+
> +    # Make sure to add '_zbc' to your target's -march below
> +    ['RTE_RISCV_ZBC', false],
>  ]

Please do not add more config options via compile flags.
It makes it impossible for distros to ship one version.

Instead, detect at compile or runtime
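
One possible reading of the compile-time half of this suggestion, as a
hedged sketch: compilers targeting RISC-V define __riscv_zbc when Zbc is
part of -march, so the code could key off that macro instead of a separate
config flag. HAVE_ZBC_AT_BUILD, enum crc_backend and select_backend are
hypothetical names for the example, and the runtime_has_zbc argument is
assumed to come from some platform-specific probe that is not shown here.

```c
#include <assert.h>

/* Set by the toolchain when -march includes Zbc; no DPDK flag involved. */
#if defined(__riscv) && defined(__riscv_zbc)
#define HAVE_ZBC_AT_BUILD 1
#else
#define HAVE_ZBC_AT_BUILD 0
#endif

enum crc_backend { BACKEND_TABLE, BACKEND_ZBC };

/*
 * Pick the Zbc path only if it was compiled in and the (hypothetical)
 * runtime probe also reports the extension.
 */
static enum crc_backend select_backend(int runtime_has_zbc)
{
	assert(runtime_has_zbc == 0 || runtime_has_zbc == 1);

	if (HAVE_ZBC_AT_BUILD && runtime_has_zbc)
		return BACKEND_ZBC;
	return BACKEND_TABLE;
}
```

On a build without Zbc in -march this always falls back to the table
implementation, whatever the runtime probe reports.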

^ permalink raw reply	[flat|nested] 10+ messages in thread

* RE: [PATCH 1/5] config/riscv: add flag for using Zbc extension
  2024-06-18 20:03   ` Stephen Hemminger
@ 2024-06-19  7:08     ` Morten Brørup
  2024-06-19 14:49       ` Stephen Hemminger
  2024-06-19 16:41       ` Daniel Gregory
  0 siblings, 2 replies; 10+ messages in thread
From: Morten Brørup @ 2024-06-19  7:08 UTC (permalink / raw)
  To: Stephen Hemminger, Daniel Gregory
  Cc: Stanislaw Kardach, Bruce Richardson, dev, Liang Ma,
	Punit Agrawal, Pengcheng Wang, Chunsong Feng

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> 
> On Tue, 18 Jun 2024 18:41:29 +0100
> Daniel Gregory <daniel.gregory@bytedance.com> wrote:
> 
> > diff --git a/config/riscv/meson.build b/config/riscv/meson.build
> > index 07d7d9da23..4bda4089bd 100644
> > --- a/config/riscv/meson.build
> > +++ b/config/riscv/meson.build
> > @@ -26,6 +26,13 @@ flags_common = [
> >      # read from /proc/device-tree/cpus/timebase-frequency. This property is
> >      # guaranteed on Linux, as riscv time_init() requires it.
> >      ['RTE_RISCV_TIME_FREQ', 0],
> > +
> > +    # Use RISC-V Carry-less multiplication extension (Zbc) for hardware
> > +    # implementations of CRC-32C (lib/hash/rte_crc_riscv64.h), CRC-32 and CRC-16
> > +    # (lib/net/net_crc_zbc.c). Requires intrinsics available in GCC 14.1.0+ and
> > +    # Clang 18.1.0+
> > +    # Make sure to add '_zbc' to your target's -march below
> > +    ['RTE_RISCV_ZBC', false],
> >  ]
> 
> Please do not add more config options via compile flags.
> It makes it impossible for distros to ship one version.
> 
> Instead, detect at compile or runtime

Build-time detection is not possible for cross builds.



* Re: [PATCH 1/5] config/riscv: add flag for using Zbc extension
  2024-06-19  7:08     ` Morten Brørup
@ 2024-06-19 14:49       ` Stephen Hemminger
  2024-06-19 16:41       ` Daniel Gregory
  1 sibling, 0 replies; 10+ messages in thread
From: Stephen Hemminger @ 2024-06-19 14:49 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Daniel Gregory, Stanislaw Kardach, Bruce Richardson, dev,
	Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng

On Wed, 19 Jun 2024 09:08:14 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > 
> > Please do not add more config options via compile flags.
> > It makes it impossible for distros to ship one version.
> > 
> > Instead, detect at compile or runtime  
> 
> Build time detection is not possible for cross builds.

I was thinking of some mechanism like the ARM configs.


* RE: [PATCH 1/5] config/riscv: add flag for using Zbc extension
  2024-06-19  7:08     ` Morten Brørup
  2024-06-19 14:49       ` Stephen Hemminger
@ 2024-06-19 16:41       ` Daniel Gregory
  1 sibling, 0 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-19 16:41 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger
  Cc: Stanislaw Kardach, Bruce Richardson, dev, Liang Ma,
	Punit Agrawal, Pengcheng Wang, Chunsong Feng

On Wed, Jun 19, 2024 at 09:08:14AM +0200, Morten Brørup wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > 
> > On Tue, 18 Jun 2024 18:41:29 +0100
> > Daniel Gregory <daniel.gregory@bytedance.com> wrote:
> > 
> > > diff --git a/config/riscv/meson.build b/config/riscv/meson.build
> > > index 07d7d9da23..4bda4089bd 100644
> > > --- a/config/riscv/meson.build
> > > +++ b/config/riscv/meson.build
> > > @@ -26,6 +26,13 @@ flags_common = [
> > >      # read from /proc/device-tree/cpus/timebase-frequency. This property is
> > >      # guaranteed on Linux, as riscv time_init() requires it.
> > >      ['RTE_RISCV_TIME_FREQ', 0],
> > > +
> > > +    # Use RISC-V Carry-less multiplication extension (Zbc) for hardware
> > > +    # implementations of CRC-32C (lib/hash/rte_crc_riscv64.h), CRC-32 and CRC-16
> > > +    # (lib/net/net_crc_zbc.c). Requires intrinsics available in GCC 14.1.0+ and
> > > +    # Clang 18.1.0+
> > > +    # Make sure to add '_zbc' to your target's -march below
> > > +    ['RTE_RISCV_ZBC', false],
> > >  ]
> > 
> > Please do not add more config options via compile flags.
> > It makes it impossible for distros to ship one version.
> > 
> > Instead, detect at compile or runtime
> 
> Build time detection is not possible for cross builds.
> 

How about build time detection based on the target's configured
instruction set (either specified by cross-file or passed in through
-Dinstruction_set)? We could have a map from extensions present in the
ISA string to compile flags that should be enabled.
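
To make the idea concrete, here is a rough sketch of such a map, written
in C only so that it can be compiled and tested on its own; in practice
the logic would live in config/riscv/meson.build. The table, the helper
names and the RTE_RISCV_ZAWRS flag name are assumptions for illustration.
Only RTE_RISCV_ZBC comes from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table: multi-letter ISA extension -> build flag to enable. */
struct ext_flag {
	const char *ext;   /* extension name as written in the ISA string */
	const char *flag;  /* flag to define when the extension is present */
};

static const struct ext_flag ext_flags[] = {
	{ "zbc",   "RTE_RISCV_ZBC" },
	{ "zawrs", "RTE_RISCV_ZAWRS" },  /* assumed flag name */
};

/*
 * Return true if the '_'-separated part of an ISA string such as
 * "rv64gc_zba_zbc" contains exactly the extension `ext`.
 * Version suffixes (e.g. "zbc1p0") are not handled in this sketch.
 */
static bool
isa_has_ext(const char *isa, const char *ext)
{
	assert(isa != NULL && ext != NULL);

	size_t ext_len = strlen(ext);
	const char *p = strchr(isa, '_');

	while (p != NULL) {
		const char *start = p + 1;
		const char *end = strchr(start, '_');
		size_t len = end != NULL ? (size_t)(end - start) : strlen(start);

		if (len == ext_len && strncmp(start, ext, len) == 0)
			return true;
		p = end;
	}
	return false;
}

/* Return the flag of table entry `i` if `isa` enables it, else NULL. */
static const char *
flag_for_isa(const char *isa, size_t i)
{
	if (i >= sizeof(ext_flags) / sizeof(ext_flags[0]))
		return NULL;
	return isa_has_ext(isa, ext_flags[i].ext) ? ext_flags[i].flag : NULL;
}
```

With this, flag_for_isa("rv64gc_zbc", 0) yields "RTE_RISCV_ZBC", while a
plain "rv64gc" target enables nothing. Because the decision depends only
on the configured ISA string, it also works for cross builds.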

I suggested this whilst discussing a previous patch adding support for
the Zawrs extension, but haven't heard back from Stanisław yet:
https://lore.kernel.org/dpdk-dev/20240520094854.GA3672529@ste-uk-lab-gw/

As for runtime detection, newer kernel versions have a hardware probing
interface for detecting the presence of extensions. Could support for it
be added to rte_cpuflags.c?
https://docs.kernel.org/arch/riscv/hwprobe.html
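
As a minimal sketch of what such a probe might look like, assuming the
kernel UAPI header <asm/hwprobe.h> is installed and defines
RISCV_HWPROBE_EXT_ZBC: riscv_has_zbc() is a hypothetical helper name, not
an existing DPDK function, and on any other platform or with older
headers it simply reports that Zbc is absent.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE  /* for syscall() in <unistd.h> */
#endif
#include <stdbool.h>
#include <stddef.h>

#if defined(__riscv) && defined(__linux__) && defined(__has_include)
#if __has_include(<asm/hwprobe.h>)
#include <asm/hwprobe.h>
#include <sys/syscall.h>
#include <unistd.h>
#define HAVE_RISCV_HWPROBE 1
#endif
#endif

/*
 * Hypothetical helper: return true if the kernel reports Zbc on all
 * online CPUs, false if it does not or if hwprobe is unavailable.
 */
static bool
riscv_has_zbc(void)
{
#if defined(HAVE_RISCV_HWPROBE) && defined(RISCV_HWPROBE_EXT_ZBC) && \
	defined(__NR_riscv_hwprobe)
	struct riscv_hwprobe pair = { .key = RISCV_HWPROBE_KEY_IMA_EXT_0 };

	/* One pair; cpusetsize 0 with a NULL cpu set means all online CPUs. */
	if (syscall(__NR_riscv_hwprobe, &pair, 1, 0, NULL, 0) != 0)
		return false;
	/* The kernel rewrites an unrecognised key to -1. */
	if (pair.key != RISCV_HWPROBE_KEY_IMA_EXT_0)
		return false;
	return (pair.value & RISCV_HWPROBE_EXT_ZBC) != 0;
#else
	return false;
#endif
}
```

Treating a failed or missing syscall as "not present" keeps older kernels
on the generic code path rather than failing at startup.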

In combination, distros on newer kernels could ship a version that has
these optimisations baked in and falls back to a generic implementation
when the extension is not detected at runtime, while systems without the
latest GCC/Clang could still compile by specifying a target ISA that
doesn't include "_zbc".
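
The fallback could be sketched roughly as below. This is illustrative
only: crc32_eth_select(), crc32_eth_handler and the crc32_fn type are
made-up names, and a bitwise CRC-32 stands in for the generic table
implementation. The Zbc handler would come from a file compiled with Zbc
enabled, and is passed as NULL when the toolchain could not build it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*crc32_fn)(const uint8_t *data, size_t len);

/* Generic bitwise CRC-32 (Ethernet polynomial, reflected: 0xEDB88320). */
static uint32_t
crc32_eth_generic(const uint8_t *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Currently selected handler; defaults to the generic implementation. */
static crc32_fn crc32_eth_handler = crc32_eth_generic;

/*
 * Hypothetical init hook: `has_zbc` is the result of a runtime probe,
 * `zbc_impl` is the accelerated handler or NULL if it was not built.
 */
static void
crc32_eth_select(bool has_zbc, crc32_fn zbc_impl)
{
	if (has_zbc && zbc_impl != NULL)
		crc32_eth_handler = zbc_impl;
	else
		crc32_eth_handler = crc32_eth_generic;
}
```

Callers always go through crc32_eth_handler, so one binary runs on
hardware with and without Zbc, and the only cost is an indirect call.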


Thread overview: 10+ messages
2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
2024-06-18 17:41 ` [PATCH 1/5] config/riscv: add flag for using Zbc extension Daniel Gregory
2024-06-18 20:03   ` Stephen Hemminger
2024-06-19  7:08     ` Morten Brørup
2024-06-19 14:49       ` Stephen Hemminger
2024-06-19 16:41       ` Daniel Gregory
2024-06-18 17:41 ` [PATCH 2/5] hash: implement crc using riscv carryless multiply Daniel Gregory
2024-06-18 17:41 ` [PATCH 3/5] net: " Daniel Gregory
2024-06-18 17:41 ` [PATCH 4/5] examples/l3fwd: use accelerated crc on riscv Daniel Gregory
2024-06-18 17:41 ` [PATCH 5/5] ipfrag: " Daniel Gregory
