From: Daniel Gregory
To: Stanislaw Kardach
Cc: dev@dpdk.org, Liang Ma, Punit Agrawal, Pengcheng Wang, Chunsong Feng, Daniel Gregory
Subject: [PATCH 0/5] riscv: implement accelerated crc using zbc
Date: Tue, 18 Jun 2024 18:41:28 +0100
Message-Id: <20240618174133.33457-1-daniel.gregory@bytedance.com>
List-Id: DPDK patches and discussions

The RISC-V Zbc extension adds instructions for carry-less multiplication we can use to implement CRC in hardware.
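
For readers unfamiliar with Zbc, the following is a minimal software model of its three instructions (clmul, clmulh, clmulr) for XLEN=64. It is only a sketch of the arithmetic: the function names clmul64, clmulh64 and clmulr64 are hypothetical, and the loops stand in for what the hardware does in a single instruction.

```c
#include <stdint.h>

/* Low 64 bits of the 128-bit carry-less product (models `clmul`). */
static uint64_t clmul64(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a << i;
	return r;
}

/* Bits 127..64 of the carry-less product (models `clmulh`). */
static uint64_t clmulh64(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	/* i starts at 1: bit 0 of b cannot reach the high half. */
	for (int i = 1; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a >> (64 - i);
	return r;
}

/* Bits 126..63 of the carry-less product (models `clmulr`). */
static uint64_t clmulr64(uint64_t a, uint64_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 64; i++)
		if ((b >> i) & 1)
			r ^= a >> (63 - i);
	return r;
}
```

Carry-less multiplication is ordinary shift-and-add multiplication with XOR in place of addition, which is polynomial multiplication over GF(2). For example, clmul64(3, 3) is 5, because (x + 1)^2 = x^2 + 1 when coefficients are reduced modulo 2.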
This patchset contains two new implementations:

- one in lib/hash/rte_crc_riscv64.h that uses a Barrett reduction to
  implement the four rte_hash_crc_* functions
- one in lib/net/net_crc_zbc.c that implements
  rte_crc16_ccitt_zbc_handler and rte_crc32_eth_zbc_handler by
  repeatedly single-folding the buffer until it is small enough for a
  Barrett reduction

My approach is largely based on Intel's "Fast CRC Computation Using
PCLMULQDQ Instruction" white paper
https://www.researchgate.net/publication/263424619_Fast_CRC_computation
and a post about "Optimizing CRC32 for small payload sizes on x86"
https://mary.rs/lab/crc32/

These implementations are behind a new flag, RTE_RISCV_ZBC. Due to the use
of bitmanip compiler intrinsics, a modern version of GCC (14+) or Clang
(18+) is required to compile with this flag enabled.

I have carried out some performance comparisons between the generic table
implementations and the new hardware implementations. Listed below is the
number of cycles it takes to compute the CRC hash for buffers of various
sizes (as reported by rte_get_timer_cycles()). These results were
collected on a Kendryte K230 and averaged over 20 samples:

|Buffer    | CRC32-ETH (lib/net) | CRC32C (lib/hash)   |
|Size (MB) | Table    | Hardware | Table    | Hardware |
|----------|----------|----------|----------|----------|
|        1 |   155168 |    11610 |    73026 |    18385 |
|        2 |   311203 |    22998 |   145586 |    35886 |
|        3 |   466744 |    34370 |   218536 |    53939 |
|        4 |   621843 |    45536 |   291574 |    71944 |
|        5 |   777908 |    56989 |   364152 |    89706 |
|        6 |   932736 |    68023 |   437016 |   107726 |
|        7 |  1088756 |    79236 |   510197 |   125426 |
|        8 |  1243794 |    90467 |   583231 |   143614 |

These results suggest a speed-up of roughly thirteen times for lib/net
and roughly four times for lib/hash.
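
As an illustration of the Barrett reduction mentioned above, here is a minimal, portable sketch for reflected CRC-32C that consumes 32 bits per step. It is not the code from this patchset: clmul32 is a software stand-in for a Zbc carry-less multiply, all function names are hypothetical, the Barrett constant is computed at run time instead of being hard-coded, and the 32-bit chunk size and bitwise tail handling are assumptions made to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define CRC32C_POLY     0x1EDC6F41u /* CRC-32C polynomial P, x^32 term implicit */
#define CRC32C_POLY_REF 0x82F63B78u /* the same polynomial, bit-reflected */

/* Software stand-in for a carry-less multiply: 32 x 32 -> 64 bits. */
static uint64_t clmul32(uint32_t a, uint32_t b)
{
	uint64_t r = 0;

	for (int i = 0; i < 32; i++)
		if ((b >> i) & 1)
			r ^= (uint64_t)a << i;
	return r;
}

static uint32_t reflect32(uint32_t v)
{
	uint32_t r = 0;

	for (int i = 0; i < 32; i++)
		r |= ((v >> i) & 1u) << (31 - i);
	return r;
}

/* Low 32 bits of mu = floor(x^64 / P), reflected; mu's x^32 term is implicit. */
static uint32_t barrett_mu_ref(void)
{
	const uint64_t p = (1ULL << 32) | CRC32C_POLY;
	uint64_t rem = 1, quot = 0; /* rem starts as the x^64 coefficient */

	for (int i = 0; i < 64; i++) { /* polynomial long division */
		rem <<= 1;
		quot <<= 1;
		if (rem >> 32) {
			rem ^= p;
			quot |= 1;
		}
	}
	return reflect32((uint32_t)quot);
}

/* (x * x^32) mod P in the reflected domain, using two multiplies. */
static uint32_t barrett_reduce(uint32_t x, uint32_t mu_ref)
{
	/* q = floor(x * mu / x^32): the quotient of x * x^32 divided by P */
	uint32_t q = x ^ (uint32_t)(clmul32(x, mu_ref) << 1);

	/* remainder = low 32 coefficients of q * P */
	return (uint32_t)(clmul32(q, CRC32C_POLY_REF) >> 31);
}

/* One byte of the classic bit-at-a-time reflected CRC update. */
static uint32_t crc32c_byte(uint32_t crc, uint8_t b)
{
	crc ^= b;
	for (int k = 0; k < 8; k++)
		crc = (crc >> 1) ^ (CRC32C_POLY_REF & (0u - (crc & 1u)));
	return crc;
}

/* Bitwise reference implementation, used to check the Barrett version. */
static uint32_t crc32c_bitwise(uint32_t crc, const uint8_t *buf, size_t len)
{
	crc = ~crc;
	for (size_t i = 0; i < len; i++)
		crc = crc32c_byte(crc, buf[i]);
	return ~crc;
}

static uint32_t crc32c_barrett(uint32_t crc, const uint8_t *buf, size_t len)
{
	const uint32_t mu_ref = barrett_mu_ref();

	crc = ~crc;
	for (; len >= 4; buf += 4, len -= 4) {
		/* little-endian load, independent of host byte order */
		uint32_t w = (uint32_t)buf[0] | (uint32_t)buf[1] << 8 |
			     (uint32_t)buf[2] << 16 | (uint32_t)buf[3] << 24;
		crc = barrett_reduce(crc ^ w, mu_ref);
	}
	for (; len > 0; buf++, len--) /* tail bytes, bit at a time */
		crc = crc32c_byte(crc, *buf);
	return ~crc;
}
```

Called as crc32c_barrett(0, buf, len), this should agree with crc32c_bitwise for any buffer. The point of the technique is that each 32-bit chunk costs two carry-less multiplies instead of 32 shift-and-XOR steps or several table lookups.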
Daniel Gregory (5):
  config/riscv: add flag for using Zbc extension
  hash: implement crc using riscv carryless multiply
  net: implement crc using riscv carryless multiply
  examples/l3fwd: use accelerated crc on riscv
  ipfrag: use accelerated crc on riscv

 MAINTAINERS                    |   2 +
 app/test/test_crc.c            |   9 ++
 app/test/test_hash.c           |   7 ++
 config/riscv/meson.build       |   7 ++
 examples/l3fwd/l3fwd_em.c      |   2 +-
 lib/hash/meson.build           |   1 +
 lib/hash/rte_crc_riscv64.h     |  89 +++++++++++++++
 lib/hash/rte_hash_crc.c        |  12 +-
 lib/hash/rte_hash_crc.h        |   6 +-
 lib/ip_frag/ip_frag_internal.c |   6 +-
 lib/net/meson.build            |   4 +
 lib/net/net_crc.h              |  11 ++
 lib/net/net_crc_zbc.c          | 202 +++++++++++++++++++++++++++++++++
 lib/net/rte_net_crc.c          |  35 ++++++
 lib/net/rte_net_crc.h          |   2 +
 15 files changed, 389 insertions(+), 6 deletions(-)
 create mode 100644 lib/hash/rte_crc_riscv64.h
 create mode 100644 lib/net/net_crc_zbc.c

-- 
2.39.2