* [PATCH 1/5] config/riscv: add flag for using Zbc extension
2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
@ 2024-06-18 17:41 ` Daniel Gregory
2024-06-18 20:03 ` Stephen Hemminger
2024-06-18 17:41 ` [PATCH 2/5] hash: implement crc using riscv carryless multiply Daniel Gregory
` (3 subsequent siblings)
4 siblings, 1 reply; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
To: Stanislaw Kardach, Bruce Richardson
Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
Daniel Gregory
The RISC-V Zbc extension adds carry-less multiply instructions we can
use to implement more efficient CRC hashing algorithms.
Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
---
config/riscv/meson.build | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/config/riscv/meson.build b/config/riscv/meson.build
index 07d7d9da23..4bda4089bd 100644
--- a/config/riscv/meson.build
+++ b/config/riscv/meson.build
@@ -26,6 +26,13 @@ flags_common = [
# read from /proc/device-tree/cpus/timebase-frequency. This property is
# guaranteed on Linux, as riscv time_init() requires it.
['RTE_RISCV_TIME_FREQ', 0],
+
+ # Use RISC-V Carry-less multiplication extension (Zbc) for hardware
+ # implementations of CRC-32C (lib/hash/rte_crc_riscv64.h), CRC-32 and CRC-16
+ # (lib/net/net_crc_zbc.c). Requires intrinsics available in GCC 14.1.0+ and
+ # Clang 18.1.0+
+ # Make sure to add '_zbc' to your target's -march below
+ ['RTE_RISCV_ZBC', false],
]
## SoC-specific options.
--
2.39.2
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 1/5] config/riscv: add flag for using Zbc extension
2024-06-18 17:41 ` [PATCH 1/5] config/riscv: add flag for using Zbc extension Daniel Gregory
@ 2024-06-18 20:03 ` Stephen Hemminger
2024-06-19 7:08 ` Morten Brørup
0 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2024-06-18 20:03 UTC (permalink / raw)
To: Daniel Gregory
Cc: Stanislaw Kardach, Bruce Richardson, dev, Liang Ma,
Punit Agrawal, Pengcheng Wang, Chunsong Feng
On Tue, 18 Jun 2024 18:41:29 +0100
Daniel Gregory <daniel.gregory@bytedance.com> wrote:
> diff --git a/config/riscv/meson.build b/config/riscv/meson.build
> index 07d7d9da23..4bda4089bd 100644
> --- a/config/riscv/meson.build
> +++ b/config/riscv/meson.build
> @@ -26,6 +26,13 @@ flags_common = [
> # read from /proc/device-tree/cpus/timebase-frequency. This property is
> # guaranteed on Linux, as riscv time_init() requires it.
> ['RTE_RISCV_TIME_FREQ', 0],
> +
> + # Use RISC-V Carry-less multiplication extension (Zbc) for hardware
> + # implementations of CRC-32C (lib/hash/rte_crc_riscv64.h), CRC-32 and CRC-16
> + # (lib/net/net_crc_zbc.c). Requires intrinsics available in GCC 14.1.0+ and
> + # Clang 18.1.0+
> + # Make sure to add '_zbc' to your target's -march below
> + ['RTE_RISCV_ZBC', false],
> ]
Please do not add more config options via compile flags.
It makes it impossible for distros to ship one version.
Instead, detect at compile or runtime
* RE: [PATCH 1/5] config/riscv: add flag for using Zbc extension
2024-06-18 20:03 ` Stephen Hemminger
@ 2024-06-19 7:08 ` Morten Brørup
2024-06-19 14:49 ` Stephen Hemminger
2024-06-19 16:41 ` Daniel Gregory
0 siblings, 2 replies; 10+ messages in thread
From: Morten Brørup @ 2024-06-19 7:08 UTC (permalink / raw)
To: Stephen Hemminger, Daniel Gregory
Cc: Stanislaw Kardach, Bruce Richardson, dev, Liang Ma,
Punit Agrawal, Pengcheng Wang, Chunsong Feng
> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Subject: Re: [PATCH 1/5] config/riscv: add flag for using Zbc extension
>
> On Tue, 18 Jun 2024 18:41:29 +0100
> Daniel Gregory <daniel.gregory@bytedance.com> wrote:
>
> > diff --git a/config/riscv/meson.build b/config/riscv/meson.build
> > index 07d7d9da23..4bda4089bd 100644
> > --- a/config/riscv/meson.build
> > +++ b/config/riscv/meson.build
> > @@ -26,6 +26,13 @@ flags_common = [
> > # read from /proc/device-tree/cpus/timebase-frequency. This property is
> > # guaranteed on Linux, as riscv time_init() requires it.
> > ['RTE_RISCV_TIME_FREQ', 0],
> > +
> > + # Use RISC-V Carry-less multiplication extension (Zbc) for hardware
> > + # implementations of CRC-32C (lib/hash/rte_crc_riscv64.h), CRC-32 and CRC-16
> > + # (lib/net/net_crc_zbc.c). Requires intrinsics available in GCC 14.1.0+ and
> > + # Clang 18.1.0+
> > + # Make sure to add '_zbc' to your target's -march below
> > + ['RTE_RISCV_ZBC', false],
> > ]
>
> Please do not add more config options via compile flags.
> It makes it impossible for distros to ship one version.
>
> Instead, detect at compile or runtime
Build time detection is not possible for cross builds.
* Re: [PATCH 1/5] config/riscv: add flag for using Zbc extension
2024-06-19 7:08 ` Morten Brørup
@ 2024-06-19 14:49 ` Stephen Hemminger
2024-06-19 16:41 ` Daniel Gregory
1 sibling, 0 replies; 10+ messages in thread
From: Stephen Hemminger @ 2024-06-19 14:49 UTC (permalink / raw)
To: Morten Brørup
Cc: Daniel Gregory, Stanislaw Kardach, Bruce Richardson, dev,
Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng
On Wed, 19 Jun 2024 09:08:14 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:
> >
> > Please do not add more config options via compile flags.
> > It makes it impossible for distros to ship one version.
> >
> > Instead, detect at compile or runtime
>
> Build time detection is not possible for cross builds.
I was thinking some mechanism like the ARM configs.
* RE: [PATCH 1/5] config/riscv: add flag for using Zbc extension
2024-06-19 7:08 ` Morten Brørup
2024-06-19 14:49 ` Stephen Hemminger
@ 2024-06-19 16:41 ` Daniel Gregory
1 sibling, 0 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-19 16:41 UTC (permalink / raw)
To: Morten Brørup, Stephen Hemminger
Cc: Stanislaw Kardach, Bruce Richardson, dev, Liang Ma,
Punit Agrawal, Pengcheng Wang, Chunsong Feng
On Wed, Jun 19, 2024 at 09:08:14AM +0200, Morten Brørup wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Subject: Re: [PATCH 1/5] config/riscv: add flag for using Zbc extension
> >
> > On Tue, 18 Jun 2024 18:41:29 +0100
> > Daniel Gregory <daniel.gregory@bytedance.com> wrote:
> >
> > > diff --git a/config/riscv/meson.build b/config/riscv/meson.build
> > > index 07d7d9da23..4bda4089bd 100644
> > > --- a/config/riscv/meson.build
> > > +++ b/config/riscv/meson.build
> > > @@ -26,6 +26,13 @@ flags_common = [
> > > # read from /proc/device-tree/cpus/timebase-frequency. This property is
> > > # guaranteed on Linux, as riscv time_init() requires it.
> > > ['RTE_RISCV_TIME_FREQ', 0],
> > > +
> > > + # Use RISC-V Carry-less multiplication extension (Zbc) for hardware
> > > + # implementations of CRC-32C (lib/hash/rte_crc_riscv64.h), CRC-32 and CRC-16
> > > + # (lib/net/net_crc_zbc.c). Requires intrinsics available in GCC 14.1.0+ and
> > > + # Clang 18.1.0+
> > > + # Make sure to add '_zbc' to your target's -march below
> > > + ['RTE_RISCV_ZBC', false],
> > > ]
> >
> > Please do not add more config options via compile flags.
> > It makes it impossible for distros to ship one version.
> >
> > Instead, detect at compile or runtime
>
> Build time detection is not possible for cross builds.
>
How about build time detection based on the target's configured
instruction set (either specified by cross-file or passed in through
-Dinstruction_set)? We could have a map from extensions present in the
ISA string to compile flags that should be enabled.
I suggested this whilst discussing a previous patch adding support for
the Zawrs extension, but haven't heard back from Stanisław yet:
https://lore.kernel.org/dpdk-dev/20240520094854.GA3672529@ste-uk-lab-gw/
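As a hypothetical sketch of that mapping (the option name, variable names, and placement in config/riscv/meson.build are illustrative assumptions, not an actual posted patch), the ISA string could drive the flags like so:

```meson
# Hypothetical sketch: derive RTE_RISCV_* flags from the extensions present
# in the target ISA string rather than hand-editing flags_common.
extension_flags = {
    'zbc': 'RTE_RISCV_ZBC',
    'zawrs': 'RTE_RISCV_ZAWRS',
}

isa_string = get_option('instruction_set')  # e.g. 'rv64gc_zbc'
foreach ext, flag : extension_flags
    if isa_string.contains('_' + ext)
        dpdk_conf.set(flag, 1)
    endif
endforeach
```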
As for runtime detection, newer kernel versions have a hardware probing
interface for detecting the presence of extensions; support could be
added to rte_cpuflags.c?
https://docs.kernel.org/arch/riscv/hwprobe.html
In combination, distros on newer kernels could ship a version that has
these optimisations baked in that falls back to a generic implementation
when the extension is detected to not be present, and systems without
the latest GCC/Clang can still compile by specifying a target ISA that
doesn't include "_zbc".
* [PATCH 2/5] hash: implement crc using riscv carryless multiply
2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
2024-06-18 17:41 ` [PATCH 1/5] config/riscv: add flag for using Zbc extension Daniel Gregory
@ 2024-06-18 17:41 ` Daniel Gregory
2024-06-18 17:41 ` [PATCH 3/5] net: " Daniel Gregory
` (2 subsequent siblings)
4 siblings, 0 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
To: Thomas Monjalon, Yipeng Wang, Sameh Gobriel, Bruce Richardson,
Vladimir Medvedkin, Stanislaw Kardach
Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
Daniel Gregory
Using carryless multiply instructions from RISC-V's Zbc extension,
implement a Barrett reduction that calculates CRC-32C checksums.
Based on the approach described by Intel's whitepaper on "Fast CRC
Computation for Generic Polynomials Using PCLMULQDQ Instruction", which
is also described here
(https://web.archive.org/web/20240111232520/https://mary.rs/lab/crc32/)
Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
---
MAINTAINERS | 1 +
app/test/test_hash.c | 7 +++
lib/hash/meson.build | 1 +
lib/hash/rte_crc_riscv64.h | 89 ++++++++++++++++++++++++++++++++++++++
lib/hash/rte_hash_crc.c | 12 ++++-
lib/hash/rte_hash_crc.h | 6 ++-
6 files changed, 114 insertions(+), 2 deletions(-)
create mode 100644 lib/hash/rte_crc_riscv64.h
diff --git a/MAINTAINERS b/MAINTAINERS
index 472713124c..48800f39c4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -318,6 +318,7 @@ M: Stanislaw Kardach <stanislaw.kardach@gmail.com>
F: config/riscv/
F: doc/guides/linux_gsg/cross_build_dpdk_for_riscv.rst
F: lib/eal/riscv/
+F: lib/hash/rte_crc_riscv64.h
Intel x86
M: Bruce Richardson <bruce.richardson@intel.com>
diff --git a/app/test/test_hash.c b/app/test/test_hash.c
index 24d3b547ad..c8c4197ad8 100644
--- a/app/test/test_hash.c
+++ b/app/test/test_hash.c
@@ -205,6 +205,13 @@ test_crc32_hash_alg_equiv(void)
printf("Failed checking CRC32_SW against CRC32_ARM64\n");
break;
}
+
+ /* Check against 8-byte-operand RISCV64 CRC32 if available */
+ rte_hash_crc_set_alg(CRC32_RISCV64);
+ if (hash_val != rte_hash_crc(data64, data_len, init_val)) {
+ printf("Failed checking CRC32_SW against CRC32_RISCV64\n");
+ break;
+ }
}
/* Resetting to best available algorithm */
diff --git a/lib/hash/meson.build b/lib/hash/meson.build
index 277eb9fa93..8355869a80 100644
--- a/lib/hash/meson.build
+++ b/lib/hash/meson.build
@@ -12,6 +12,7 @@ headers = files(
indirect_headers += files(
'rte_crc_arm64.h',
'rte_crc_generic.h',
+ 'rte_crc_riscv64.h',
'rte_crc_sw.h',
'rte_crc_x86.h',
'rte_thash_x86_gfni.h',
diff --git a/lib/hash/rte_crc_riscv64.h b/lib/hash/rte_crc_riscv64.h
new file mode 100644
index 0000000000..94f6857c69
--- /dev/null
+++ b/lib/hash/rte_crc_riscv64.h
@@ -0,0 +1,89 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) ByteDance 2024
+ */
+
+#include <assert.h>
+#include <stdint.h>
+
+#include <riscv_bitmanip.h>
+
+#ifndef _RTE_CRC_RISCV64_H_
+#define _RTE_CRC_RISCV64_H_
+
+/*
+ * CRC-32C takes a reflected input (bit 7 is the lsb) and produces a reflected
+ * output. As reflecting the value we're checksumming is expensive, we instead
+ * reflect the polynomial P (0x11EDC6F41), mu, and our CRC32 algorithm.
+ *
+ * The mu constant is used for a Barrett reduction. It's 2^96 / P (0x11F91CAF6)
+ * reflected. Picking 2^96 rather than 2^64 means we can calculate a 64-bit crc
+ * using only two multiplications (https://mary.rs/lab/crc32/)
+ */
+static const uint64_t p = 0x105EC76F1;
+static const uint64_t mu = 0x4869EC38DEA713F1UL;
+
+/* Calculate the CRC32C checksum using a Barrett reduction */
+static inline uint32_t
+crc32c_riscv64(uint64_t data, uint32_t init_val, uint32_t bits)
+{
+ assert((bits == 64) || (bits == 32) || (bits == 16) || (bits == 8));
+
+ /* Combine data with the initial value */
+ uint64_t crc = (uint64_t)(data ^ init_val) << (64 - bits);
+
+ /*
+ * Multiply by mu, which is 2^96 / P. Division by 2^96 occurs by taking
+ * the lower 64 bits of the result (remember we're inverted)
+ */
+ crc = __riscv_clmul_64(crc, mu);
+ /* Multiply by P */
+ crc = __riscv_clmulh_64(crc, p);
+
+ /* Subtract from original (only needed for smaller sizes) */
+ if (bits == 16 || bits == 8)
+ crc ^= init_val >> bits;
+
+ return crc;
+}
+
+/*
+ * Use carryless multiply to perform hash on a value, falling back on the
+ * software in case the Zbc extension is not supported
+ */
+static inline uint32_t
+rte_hash_crc_1byte(uint8_t data, uint32_t init_val)
+{
+ if (likely(rte_hash_crc32_alg & CRC32_RISCV64))
+ return crc32c_riscv64(data, init_val, 8);
+
+ return crc32c_1byte(data, init_val);
+}
+
+static inline uint32_t
+rte_hash_crc_2byte(uint16_t data, uint32_t init_val)
+{
+ if (likely(rte_hash_crc32_alg & CRC32_RISCV64))
+ return crc32c_riscv64(data, init_val, 16);
+
+ return crc32c_2bytes(data, init_val);
+}
+
+static inline uint32_t
+rte_hash_crc_4byte(uint32_t data, uint32_t init_val)
+{
+ if (likely(rte_hash_crc32_alg & CRC32_RISCV64))
+ return crc32c_riscv64(data, init_val, 32);
+
+ return crc32c_1word(data, init_val);
+}
+
+static inline uint32_t
+rte_hash_crc_8byte(uint64_t data, uint32_t init_val)
+{
+ if (likely(rte_hash_crc32_alg & CRC32_RISCV64))
+ return crc32c_riscv64(data, init_val, 64);
+
+ return crc32c_2words(data, init_val);
+}
+
+#endif /* _RTE_CRC_RISCV64_H_ */
diff --git a/lib/hash/rte_hash_crc.c b/lib/hash/rte_hash_crc.c
index c037cdb0f0..ece1a84b29 100644
--- a/lib/hash/rte_hash_crc.c
+++ b/lib/hash/rte_hash_crc.c
@@ -15,7 +15,7 @@ RTE_LOG_REGISTER_SUFFIX(hash_crc_logtype, crc, INFO);
uint8_t rte_hash_crc32_alg = CRC32_SW;
/**
- * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * Allow or disallow use of SSE4.2/ARMv8/RISC-V intrinsics for CRC32 hash
* calculation.
*
* @param alg
@@ -24,6 +24,7 @@ uint8_t rte_hash_crc32_alg = CRC32_SW;
* - (CRC32_SSE42) Use SSE4.2 intrinsics if available
* - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
* - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ * - (CRC32_RISCV64) Use RISCV64 Zbc extension if available
*
*/
void
@@ -52,6 +53,13 @@ rte_hash_crc_set_alg(uint8_t alg)
rte_hash_crc32_alg = CRC32_ARM64;
#endif
+#if defined(RTE_ARCH_RISCV) && defined(RTE_RISCV_ZBC)
+ if (!(alg & CRC32_RISCV64))
+ HASH_CRC_LOG(WARNING,
+ "Unsupported CRC32 algorithm requested using CRC32_RISCV64");
+ rte_hash_crc32_alg = CRC32_RISCV64;
+#endif
+
if (rte_hash_crc32_alg == CRC32_SW)
HASH_CRC_LOG(WARNING,
"Unsupported CRC32 algorithm requested using CRC32_SW");
@@ -64,6 +72,8 @@ RTE_INIT(rte_hash_crc_init_alg)
rte_hash_crc_set_alg(CRC32_SSE42_x64);
#elif defined(RTE_ARCH_ARM64) && defined(__ARM_FEATURE_CRC32)
rte_hash_crc_set_alg(CRC32_ARM64);
+#elif defined(RTE_ARCH_RISCV) && defined(RTE_RISCV_ZBC)
+ rte_hash_crc_set_alg(CRC32_RISCV64);
#else
rte_hash_crc_set_alg(CRC32_SW);
#endif
diff --git a/lib/hash/rte_hash_crc.h b/lib/hash/rte_hash_crc.h
index 8ad2422ec3..2be433fa21 100644
--- a/lib/hash/rte_hash_crc.h
+++ b/lib/hash/rte_hash_crc.h
@@ -28,6 +28,7 @@ extern "C" {
#define CRC32_x64 (1U << 2)
#define CRC32_SSE42_x64 (CRC32_x64|CRC32_SSE42)
#define CRC32_ARM64 (1U << 3)
+#define CRC32_RISCV64 (1U << 4)
extern uint8_t rte_hash_crc32_alg;
@@ -35,12 +36,14 @@ extern uint8_t rte_hash_crc32_alg;
#include "rte_crc_arm64.h"
#elif defined(RTE_ARCH_X86)
#include "rte_crc_x86.h"
+#elif defined(RTE_ARCH_RISCV) && defined(RTE_RISCV_ZBC)
+#include "rte_crc_riscv64.h"
#else
#include "rte_crc_generic.h"
#endif
/**
- * Allow or disallow use of SSE4.2/ARMv8 intrinsics for CRC32 hash
+ * Allow or disallow use of SSE4.2/ARMv8/RISC-V intrinsics for CRC32 hash
* calculation.
*
* @param alg
@@ -49,6 +52,7 @@ extern uint8_t rte_hash_crc32_alg;
* - (CRC32_SSE42) Use SSE4.2 intrinsics if available
* - (CRC32_SSE42_x64) Use 64-bit SSE4.2 intrinsic if available (default x86)
* - (CRC32_ARM64) Use ARMv8 CRC intrinsic if available (default ARMv8)
+ * - (CRC32_RISCV64) Use RISC-V Carry-less multiply if available (default rv64gc_zbc)
*/
void
rte_hash_crc_set_alg(uint8_t alg);
--
2.39.2
* [PATCH 3/5] net: implement crc using riscv carryless multiply
2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
2024-06-18 17:41 ` [PATCH 1/5] config/riscv: add flag for using Zbc extension Daniel Gregory
2024-06-18 17:41 ` [PATCH 2/5] hash: implement crc using riscv carryless multiply Daniel Gregory
@ 2024-06-18 17:41 ` Daniel Gregory
2024-06-18 17:41 ` [PATCH 4/5] examples/l3fwd: use accelerated crc on riscv Daniel Gregory
2024-06-18 17:41 ` [PATCH 5/5] ipfrag: " Daniel Gregory
4 siblings, 0 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
To: Thomas Monjalon, Jasvinder Singh, Stanislaw Kardach
Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
Daniel Gregory
Using carryless multiply instructions (clmul) from RISC-V's Zbc
extension, implement CRC-32 and CRC-16 calculations on buffers.
Based on the approach described in Intel's whitepaper on "Fast CRC
Computation for Generic Polynomials Using PCLMULQDQ Instruction", we
perform repeated folds-by-1 whilst the buffer is still big enough, then
perform Barrett's reductions on the rest.
Add a case to the crc_autotest suite that tests this implementation.
This implementation is enabled by setting the RTE_RISCV_ZBC flag
(see config/riscv/meson.build).
Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
---
MAINTAINERS | 1 +
app/test/test_crc.c | 9 ++
lib/net/meson.build | 4 +
lib/net/net_crc.h | 11 +++
lib/net/net_crc_zbc.c | 202 ++++++++++++++++++++++++++++++++++++++++++
lib/net/rte_net_crc.c | 35 ++++++++
lib/net/rte_net_crc.h | 2 +
7 files changed, 264 insertions(+)
create mode 100644 lib/net/net_crc_zbc.c
diff --git a/MAINTAINERS b/MAINTAINERS
index 48800f39c4..6562e62779 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -319,6 +319,7 @@ F: config/riscv/
F: doc/guides/linux_gsg/cross_build_dpdk_for_riscv.rst
F: lib/eal/riscv/
F: lib/hash/rte_crc_riscv64.h
+F: lib/net/net_crc_zbc.c
Intel x86
M: Bruce Richardson <bruce.richardson@intel.com>
diff --git a/app/test/test_crc.c b/app/test/test_crc.c
index b85fca35fe..fa91557cf5 100644
--- a/app/test/test_crc.c
+++ b/app/test/test_crc.c
@@ -168,6 +168,15 @@ test_crc(void)
return ret;
}
+ /* set CRC riscv mode */
+ rte_net_crc_set_alg(RTE_NET_CRC_ZBC);
+
+ ret = test_crc_calc();
+ if (ret < 0) {
+ printf("test crc (riscv64 zbc clmul): failed (%d)\n", ret);
+ return ret;
+ }
+
return 0;
}
diff --git a/lib/net/meson.build b/lib/net/meson.build
index 0b69138949..f2ae019bea 100644
--- a/lib/net/meson.build
+++ b/lib/net/meson.build
@@ -125,4 +125,8 @@ elif (dpdk_conf.has('RTE_ARCH_ARM64') and
cc.get_define('__ARM_FEATURE_CRYPTO', args: machine_args) != '')
sources += files('net_crc_neon.c')
cflags += ['-DCC_ARM64_NEON_PMULL_SUPPORT']
+elif (dpdk_conf.has('RTE_ARCH_RISCV') and dpdk_conf.has('RTE_RISCV_ZBC') and
+ dpdk_conf.get('RTE_RISCV_ZBC'))
+ sources += files('net_crc_zbc.c')
+ cflags += ['-DCC_RISCV64_ZBC_CLMUL_SUPPORT']
endif
diff --git a/lib/net/net_crc.h b/lib/net/net_crc.h
index 7a74d5406c..06ae113b47 100644
--- a/lib/net/net_crc.h
+++ b/lib/net/net_crc.h
@@ -42,4 +42,15 @@ rte_crc16_ccitt_neon_handler(const uint8_t *data, uint32_t data_len);
uint32_t
rte_crc32_eth_neon_handler(const uint8_t *data, uint32_t data_len);
+/* RISCV64 Zbc */
+void
+rte_net_crc_zbc_init(void);
+
+uint32_t
+rte_crc16_ccitt_zbc_handler(const uint8_t *data, uint32_t data_len);
+
+uint32_t
+rte_crc32_eth_zbc_handler(const uint8_t *data, uint32_t data_len);
+
+
#endif /* _NET_CRC_H_ */
diff --git a/lib/net/net_crc_zbc.c b/lib/net/net_crc_zbc.c
new file mode 100644
index 0000000000..5907d69471
--- /dev/null
+++ b/lib/net/net_crc_zbc.c
@@ -0,0 +1,202 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright(c) ByteDance 2024
+ */
+
+#include <riscv_bitmanip.h>
+#include <stdint.h>
+
+#include <rte_common.h>
+#include <rte_net_crc.h>
+
+#include "net_crc.h"
+
+/* CLMUL CRC computation context structure */
+struct crc_clmul_ctx {
+ uint64_t Pr;
+ uint64_t mu;
+ uint64_t k3;
+ uint64_t k4;
+ uint64_t k5;
+};
+
+struct crc_clmul_ctx crc32_eth_clmul;
+struct crc_clmul_ctx crc16_ccitt_clmul;
+
+/* Perform Barrett's reduction on 8, 16, 32 or 64-bit value */
+static inline uint32_t
+crc32_barrett_zbc(
+ const uint64_t data,
+ uint32_t crc,
+ uint32_t bits,
+ const struct crc_clmul_ctx *params)
+{
+ assert((bits == 64) || (bits == 32) || (bits == 16) || (bits == 8));
+
+ /* Combine data with the initial value */
+ uint64_t temp = (uint64_t)(data ^ crc) << (64 - bits);
+
+ /*
+ * Multiply by mu, which is 2^96 / P. Division by 2^96 occurs by taking
+ * the lower 64 bits of the result (remember we're inverted)
+ */
+ temp = __riscv_clmul_64(temp, params->mu);
+ /* Multiply by P */
+ temp = __riscv_clmulh_64(temp, params->Pr);
+
+ /* Subtract from original (only needed for smaller sizes) */
+ if (bits == 16 || bits == 8)
+ temp ^= crc >> bits;
+
+ return temp;
+}
+
+/* Repeat Barrett's reduction for short buffer sizes */
+static inline uint32_t
+crc32_repeated_barrett_zbc(
+ const uint8_t *data,
+ uint32_t data_len,
+ uint32_t crc,
+ const struct crc_clmul_ctx *params)
+{
+ while (data_len >= 8) {
+ crc = crc32_barrett_zbc(*(const uint64_t *)data, crc, 64, params);
+ data += 8;
+ data_len -= 8;
+ }
+ if (data_len >= 4) {
+ crc = crc32_barrett_zbc(*(const uint32_t *)data, crc, 32, params);
+ data += 4;
+ data_len -= 4;
+ }
+ if (data_len >= 2) {
+ crc = crc32_barrett_zbc(*(const uint16_t *)data, crc, 16, params);
+ data += 2;
+ data_len -= 2;
+ }
+ if (data_len >= 1)
+ crc = crc32_barrett_zbc(*(const uint8_t *)data, crc, 8, params);
+
+ return crc;
+}
+
+/*
+ * Perform repeated reductions-by-1 over a buffer
+ * Reduces a buffer of uint64_t (naturally aligned) of even length to 128 bits
+ * Returns the upper and lower 64-bits in arguments
+ */
+static inline void
+crc32_reduction_zbc(
+ const uint64_t *data,
+ uint32_t data_len,
+ uint32_t crc,
+ const struct crc_clmul_ctx *params,
+ uint64_t *high,
+ uint64_t *low)
+{
+ *high = *(data++) ^ crc;
+ *low = *(data++);
+ data_len--;
+
+ for (; data_len >= 2; data_len -= 2) {
+ uint64_t highh = __riscv_clmulh_64(params->k3, *high);
+ uint64_t highl = __riscv_clmul_64(params->k3, *high);
+ uint64_t lowh = __riscv_clmulh_64(params->k4, *low);
+ uint64_t lowl = __riscv_clmul_64(params->k4, *low);
+
+ *high = highl ^ lowl;
+ *low = highh ^ lowh;
+
+ *high ^= *(data++);
+ *low ^= *(data++);
+ }
+}
+
+static inline uint32_t
+crc32_eth_calc_zbc(
+ const uint8_t *data,
+ uint32_t data_len,
+ uint32_t crc,
+ const struct crc_clmul_ctx *params)
+{
+ /* Barrett reduce until the buffer is aligned to an 8-byte word */
+ uint32_t misalign = (size_t)data & 7;
+ if (misalign != 0) {
+ crc = crc32_repeated_barrett_zbc(data, misalign, crc, params);
+ data += misalign;
+ data_len -= misalign;
+ }
+
+ /* Minimum length we can do reduction-by-1 over */
+ const uint32_t min_len = 16;
+ if (data_len < min_len)
+ return crc32_repeated_barrett_zbc(data, data_len, crc, params);
+
+ /* Fold buffer into two 8-byte words */
+ /* Length is calculated by dividing by 8, then rounding down until even */
+ const uint64_t *data_word = (const uint64_t *)data;
+ uint32_t data_word_len = (data_len >> 3) & ~1;
+ uint32_t excess = data_len & 15;
+ uint64_t high, low;
+ crc32_reduction_zbc(data_word, data_word_len, crc, params, &high, &low);
+ data += data_word_len << 3;
+
+ /* Fold last 128 bits into 96 */
+ low = __riscv_clmul_64(params->k4, high) ^ low;
+ high = __riscv_clmulh_64(params->k4, high);
+ /* Upper 32 bits of high are now zero */
+ high = (low >> 32) | (high << 32);
+
+ /* Fold last 96 bits into 64 */
+ uint64_t temp = __riscv_clmul_64(low & 0xffffffff, params->k5);
+ temp ^= high;
+
+ /* Barrett reduction of last 64 bits */
+ uint64_t orig = temp;
+ temp = __riscv_clmul_64(temp, params->mu);
+ temp &= 0xffffffff;
+ temp = __riscv_clmul_64(temp, params->Pr);
+ crc = (temp ^ orig) >> 32;
+
+ /* Combine crc with any excess */
+ crc = crc32_repeated_barrett_zbc(data, excess, crc, params);
+
+ return crc;
+}
+
+void
+rte_net_crc_zbc_init(void)
+{
+ /* Initialise CRC32 data */
+ crc32_eth_clmul.Pr = 0x1db710641LL; /* polynomial P reversed */
+ crc32_eth_clmul.mu = 0xb4e5b025f7011641LL; /* (2 ^ 64 / P) reversed */
+ crc32_eth_clmul.k3 = 0x1751997d0LL; /* (x^(128+32) mod P << 32) reversed << 1 */
+ crc32_eth_clmul.k4 = 0x0ccaa009eLL; /* (x^(128-32) mod P << 32) reversed << 1 */
+ crc32_eth_clmul.k5 = 0x163cd6124LL; /* (x^64 mod P << 32) reversed << 1 */
+
+ /* Initialise CRC16 data */
+ /* Same calculations as above, with polynomial << 16 */
+ crc16_ccitt_clmul.Pr = 0x10811LL;
+ crc16_ccitt_clmul.mu = 0x859b040b1c581911LL;
+ crc16_ccitt_clmul.k3 = 0x8e10LL;
+ crc16_ccitt_clmul.k4 = 0x189aeLL;
+ crc16_ccitt_clmul.k5 = 0x114aaLL;
+}
+
+uint32_t
+rte_crc16_ccitt_zbc_handler(const uint8_t *data, uint32_t data_len)
+{
+ /* Invert the CRC, which is present in the lower 16 bits */
+ return (uint16_t)~crc32_eth_calc_zbc(data,
+ data_len,
+ 0xffff,
+ &crc16_ccitt_clmul);
+}
+
+uint32_t
+rte_crc32_eth_zbc_handler(const uint8_t *data, uint32_t data_len)
+{
+ return ~crc32_eth_calc_zbc(data,
+ data_len,
+ 0xffffffffUL,
+ &crc32_eth_clmul);
+}
diff --git a/lib/net/rte_net_crc.c b/lib/net/rte_net_crc.c
index 346c285c15..1d03501267 100644
--- a/lib/net/rte_net_crc.c
+++ b/lib/net/rte_net_crc.c
@@ -67,6 +67,12 @@ static const rte_net_crc_handler handlers_neon[] = {
[RTE_NET_CRC32_ETH] = rte_crc32_eth_neon_handler,
};
#endif
+#ifdef CC_RISCV64_ZBC_CLMUL_SUPPORT
+static const rte_net_crc_handler handlers_zbc[] = {
+ [RTE_NET_CRC16_CCITT] = rte_crc16_ccitt_zbc_handler,
+ [RTE_NET_CRC32_ETH] = rte_crc32_eth_zbc_handler,
+};
+#endif
static uint16_t max_simd_bitwidth;
@@ -244,6 +250,26 @@ neon_pmull_init(void)
#endif
}
+/* ZBC/CLMUL handling */
+
+static const rte_net_crc_handler *
+zbc_clmul_get_handlers(void)
+{
+#ifdef CC_RISCV64_ZBC_CLMUL_SUPPORT
+ return handlers_zbc;
+#endif
+ NET_LOG(INFO, "Requirements not met, can't use Zbc");
+ return NULL;
+}
+
+static void
+zbc_clmul_init(void)
+{
+#ifdef CC_RISCV64_ZBC_CLMUL_SUPPORT
+ rte_net_crc_zbc_init();
+#endif
+}
+
/* Default handling */
static uint32_t
@@ -260,6 +286,9 @@ rte_crc16_ccitt_default_handler(const uint8_t *data, uint32_t data_len)
if (handlers != NULL)
return handlers[RTE_NET_CRC16_CCITT](data, data_len);
handlers = neon_pmull_get_handlers();
+ if (handlers != NULL)
+ return handlers[RTE_NET_CRC16_CCITT](data, data_len);
+ handlers = zbc_clmul_get_handlers();
if (handlers != NULL)
return handlers[RTE_NET_CRC16_CCITT](data, data_len);
handlers = handlers_scalar;
@@ -282,6 +311,8 @@ rte_crc32_eth_default_handler(const uint8_t *data, uint32_t data_len)
handlers = neon_pmull_get_handlers();
if (handlers != NULL)
return handlers[RTE_NET_CRC32_ETH](data, data_len);
handlers = zbc_clmul_get_handlers();
if (handlers != NULL)
return handlers[RTE_NET_CRC32_ETH](data, data_len);
handlers = handlers_scalar;
return handlers[RTE_NET_CRC32_ETH](data, data_len);
}
@@ -306,6 +337,9 @@ rte_net_crc_set_alg(enum rte_net_crc_alg alg)
break; /* for x86, always break here */
case RTE_NET_CRC_NEON:
handlers = neon_pmull_get_handlers();
+ break;
+ case RTE_NET_CRC_ZBC:
+ handlers = zbc_clmul_get_handlers();
/* fall-through */
case RTE_NET_CRC_SCALAR:
/* fall-through */
@@ -338,4 +372,5 @@ RTE_INIT(rte_net_crc_init)
sse42_pclmulqdq_init();
avx512_vpclmulqdq_init();
neon_pmull_init();
+ zbc_clmul_init();
}
diff --git a/lib/net/rte_net_crc.h b/lib/net/rte_net_crc.h
index 72d3e10ff6..12fa6a8a02 100644
--- a/lib/net/rte_net_crc.h
+++ b/lib/net/rte_net_crc.h
@@ -24,6 +24,7 @@ enum rte_net_crc_alg {
RTE_NET_CRC_SSE42,
RTE_NET_CRC_NEON,
RTE_NET_CRC_AVX512,
+ RTE_NET_CRC_ZBC,
};
/**
@@ -37,6 +38,7 @@ enum rte_net_crc_alg {
* - RTE_NET_CRC_SSE42 (Use 64-bit SSE4.2 intrinsic)
* - RTE_NET_CRC_NEON (Use ARM Neon intrinsic)
* - RTE_NET_CRC_AVX512 (Use 512-bit AVX intrinsic)
+ * - RTE_NET_CRC_ZBC (Use RISC-V Zbc extension)
*/
void
rte_net_crc_set_alg(enum rte_net_crc_alg alg);
--
2.39.2
* [PATCH 4/5] examples/l3fwd: use accelerated crc on riscv
2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
` (2 preceding siblings ...)
2024-06-18 17:41 ` [PATCH 3/5] net: " Daniel Gregory
@ 2024-06-18 17:41 ` Daniel Gregory
2024-06-18 17:41 ` [PATCH 5/5] ipfrag: " Daniel Gregory
4 siblings, 0 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
Daniel Gregory
When the RISC-V Zbc (carryless multiplication) extension is present, an
implementation of CRC hashing using hardware instructions is available.
Use it rather than jhash.
Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
---
examples/l3fwd/l3fwd_em.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/examples/l3fwd/l3fwd_em.c b/examples/l3fwd/l3fwd_em.c
index d98e66ea2c..4cec2dc6a9 100644
--- a/examples/l3fwd/l3fwd_em.c
+++ b/examples/l3fwd/l3fwd_em.c
@@ -29,7 +29,7 @@
#include "l3fwd_event.h"
#include "em_route_parse.c"
-#if defined(RTE_ARCH_X86) || defined(__ARM_FEATURE_CRC32)
+#if defined(RTE_ARCH_X86) || defined(__ARM_FEATURE_CRC32) || defined(RTE_RISCV_ZBC)
#define EM_HASH_CRC 1
#endif
--
2.39.2
* [PATCH 5/5] ipfrag: use accelerated crc on riscv
2024-06-18 17:41 [PATCH 0/5] riscv: implement accelerated crc using zbc Daniel Gregory
` (3 preceding siblings ...)
2024-06-18 17:41 ` [PATCH 4/5] examples/l3fwd: use accelerated crc on riscv Daniel Gregory
@ 2024-06-18 17:41 ` Daniel Gregory
4 siblings, 0 replies; 10+ messages in thread
From: Daniel Gregory @ 2024-06-18 17:41 UTC (permalink / raw)
To: Konstantin Ananyev
Cc: dev, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng,
Daniel Gregory
When the RISC-V Zbc (carryless multiplication) extension is present, an
implementation of CRC hashing using hardware instructions is available.
Use it rather than jhash.
Signed-off-by: Daniel Gregory <daniel.gregory@bytedance.com>
---
lib/ip_frag/ip_frag_internal.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/ip_frag/ip_frag_internal.c b/lib/ip_frag/ip_frag_internal.c
index 7cbef647df..7806264078 100644
--- a/lib/ip_frag/ip_frag_internal.c
+++ b/lib/ip_frag/ip_frag_internal.c
@@ -45,14 +45,14 @@ ipv4_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
p = (const uint32_t *)&key->src_dst;
-#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64)
+#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64) || defined(RTE_RISCV_ZBC)
v = rte_hash_crc_4byte(p[0], PRIME_VALUE);
v = rte_hash_crc_4byte(p[1], v);
v = rte_hash_crc_4byte(key->id, v);
#else
v = rte_jhash_3words(p[0], p[1], key->id, PRIME_VALUE);
-#endif /* RTE_ARCH_X86 */
+#endif /* RTE_ARCH_X86 || RTE_ARCH_ARM64 || RTE_RISCV_ZBC */
*v1 = v;
*v2 = (v << 7) + (v >> 14);
@@ -66,7 +66,7 @@ ipv6_frag_hash(const struct ip_frag_key *key, uint32_t *v1, uint32_t *v2)
p = (const uint32_t *) &key->src_dst;
-#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64)
+#if defined(RTE_ARCH_X86) || defined(RTE_ARCH_ARM64) || defined(RTE_RISCV_ZBC)
v = rte_hash_crc_4byte(p[0], PRIME_VALUE);
v = rte_hash_crc_4byte(p[1], v);
v = rte_hash_crc_4byte(p[2], v);
--
2.39.2