DPDK patches and discussions
 help / color / mirror / Atom feed
* [PATCH 0/6] Clean up many __builtin_* in drivers.
@ 2024-10-24 12:05 David Marchand
  2024-10-24 12:05 ` [PATCH 1/6] devtools: handle multiple pattern for skipping files David Marchand
                   ` (5 more replies)
  0 siblings, 6 replies; 16+ messages in thread
From: David Marchand @ 2024-10-24 12:05 UTC (permalink / raw)
  To: dev; +Cc: thomas, ferruh.yigit, stephen, mattias.ronnblom

Here is a series cleaning up most of uses of __builtin_* helpers in
drivers when they have a direct replacement in EAL.

checkpatch is extended to forbid new additions.

-- 
David Marchand

David Marchand (6):
  devtools: handle multiple pattern for skipping files
  devtools: forbid use of builtin helpers
  common/dpaax: use prefetch macros
  crypto/openssl: fix 3DES-CTR with big endian CPUs
  drivers: use branch prediction macros
  drivers: use bitops API instead of compiler builtins

 devtools/check-forbidden-tokens.awk      | 21 +++++++++++++++------
 devtools/checkpatches.sh                 |  7 ++++---
 drivers/bus/fslmc/qbman/include/compat.h |  6 ++----
 drivers/common/dpaax/compat.h            |  6 ++++--
 drivers/common/nfp/nfp_platform.h        |  4 +++-
 drivers/crypto/openssl/rte_openssl_pmd.c |  5 +++--
 drivers/dma/hisilicon/hisi_dmadev.h      |  3 ++-
 drivers/ml/cnxk/cn10k_ml_ocm.c           |  7 ++++---
 drivers/net/bnxt/bnxt_rxtx_vec_neon.c    |  4 ++--
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c    |  6 ++++--
 drivers/net/bnxt/tf_ulp/ulp_gen_hash.c   |  4 +++-
 drivers/net/bonding/rte_eth_bond_pmd.c   |  3 ++-
 drivers/net/cpfl/cpfl_flow_engine_fxp.c  |  5 ++++-
 drivers/net/enetfec/enet_ethdev.c        |  5 +++--
 drivers/net/enetfec/enet_ethdev.h        |  6 ------
 drivers/net/hns3/hns3_rxtx_vec_neon.h    |  4 +++-
 drivers/net/i40e/i40e_rxtx_vec_neon.c    |  4 +++-
 drivers/net/iavf/iavf_rxtx_vec_neon.c    |  4 +++-
 drivers/net/mlx5/hws/mlx5dr_definer.c    |  8 +++++---
 drivers/net/mlx5/mlx5_flow_dv.c          |  3 ++-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h    | 12 ++++++------
 drivers/net/mlx5/mlx5_tx.c               |  2 +-
 drivers/net/qede/base/bcm_osal.h         |  4 +++-
 drivers/net/vmxnet3/base/vmxnet3_osdep.h |  4 +++-
 24 files changed, 84 insertions(+), 53 deletions(-)

-- 
2.46.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 1/6] devtools: handle multiple pattern for skipping files
  2024-10-24 12:05 [PATCH 0/6] Clean up many __builtin_* in drivers David Marchand
@ 2024-10-24 12:05 ` David Marchand
  2024-10-24 12:05 ` [PATCH 2/6] devtools: forbid use of builtin helpers David Marchand
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 16+ messages in thread
From: David Marchand @ 2024-10-24 12:05 UTC (permalink / raw)
  To: dev; +Cc: thomas, ferruh.yigit, stephen, mattias.ronnblom

We may want to skip multiple patterns when forbidding use of some
expression.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 devtools/check-forbidden-tokens.awk | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/devtools/check-forbidden-tokens.awk b/devtools/check-forbidden-tokens.awk
index 59b1121090..28d32fc086 100755
--- a/devtools/check-forbidden-tokens.awk
+++ b/devtools/check-forbidden-tokens.awk
@@ -10,6 +10,7 @@
 BEGIN {
 	split(FOLDERS,deny_folders," ");
 	split(EXPRESSIONS,deny_expr," ");
+	split(SKIP_FILES,skip_files," ");
 	in_file=0;
 	in_comment=0;
 	count=0;
@@ -56,14 +57,22 @@ BEGIN {
 	}
 	count = 0
 	for (i in deny_folders) {
-		re = "^\\+\\+\\+ b/" deny_folders[i];
-		if ($0 ~ re) {
-			# Check only if the files are not part of SKIP_FILES
-			if (!(length(SKIP_FILES) && ($re ~ SKIP_FILES))) {
-				in_file = 1
-				last_file = $0
+		if (!($0 ~ "^\\+\\+\\+ b/" deny_folders[i])) {
+			continue
+		}
+		skip = 0
+		for (j in skip_files) {
+			if (!($0 ~ "^\\+\\+\\+ b/" skip_files[j])) {
+				continue
 			}
+			skip = 1
+			break
+		}
+		if (skip == 0) {
+			in_file = 1
+			last_file = $0
 		}
+		break
 	}
 }
 END {
-- 
2.46.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 2/6] devtools: forbid use of builtin helpers
  2024-10-24 12:05 [PATCH 0/6] Clean up many __builtin_* in drivers David Marchand
  2024-10-24 12:05 ` [PATCH 1/6] devtools: handle multiple pattern for skipping files David Marchand
@ 2024-10-24 12:05 ` David Marchand
  2024-10-24 16:40   ` Stephen Hemminger
  2024-10-24 12:05 ` [PATCH 3/6] common/dpaax: use prefetch macros David Marchand
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: David Marchand @ 2024-10-24 12:05 UTC (permalink / raw)
  To: dev; +Cc: thomas, ferruh.yigit, stephen, mattias.ronnblom

Warn on use of any __builtin helpers, but leave it open for EAL (as it
is where the abstractions for OS and compiler differences are), and
some drivers base code.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 devtools/checkpatches.sh | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/devtools/checkpatches.sh b/devtools/checkpatches.sh
index d860f19045..4a8591be22 100755
--- a/devtools/checkpatches.sh
+++ b/devtools/checkpatches.sh
@@ -177,11 +177,12 @@ check_forbidden_additions() { # <patch>
 		-f $(dirname $(readlink -f $0))/check-forbidden-tokens.awk \
 		"$1" || res=1
 
-	# forbid use of non abstracted bit count operations
+	# forbid use of compiler __builtin_*
 	awk -v FOLDERS="lib drivers app examples" \
-		-v EXPRESSIONS='\\<__builtin_(clz|ctz|ffs|popcount)(ll)?\\>' \
+		-v SKIP_FILES='lib/eal/ drivers/.*/base/ drivers/.*osdep.h$' \
+		-v EXPRESSIONS='\\<__builtin_' \
 		-v RET_ON_FAIL=1 \
-		-v MESSAGE='Using __builtin helpers for bit count operations' \
+		-v MESSAGE='Using __builtin helpers, prefer EAL macros' \
 		-f $(dirname $(readlink -f $0))/check-forbidden-tokens.awk \
 		"$1" || res=1
 
-- 
2.46.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 3/6] common/dpaax: use prefetch macros
  2024-10-24 12:05 [PATCH 0/6] Clean up many __builtin_* in drivers David Marchand
  2024-10-24 12:05 ` [PATCH 1/6] devtools: handle multiple pattern for skipping files David Marchand
  2024-10-24 12:05 ` [PATCH 2/6] devtools: forbid use of builtin helpers David Marchand
@ 2024-10-24 12:05 ` David Marchand
  2024-10-24 16:40   ` Stephen Hemminger
  2024-10-24 12:05 ` [PATCH 4/6] crypto/openssl: fix 3DES-CTR with big endian CPUs David Marchand
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 16+ messages in thread
From: David Marchand @ 2024-10-24 12:05 UTC (permalink / raw)
  To: dev
  Cc: thomas, ferruh.yigit, stephen, mattias.ronnblom, Hemant Agrawal,
	Sachin Saxena

Prefer EAL macros over __builtin_ helpers.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/common/dpaax/compat.h | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/common/dpaax/compat.h b/drivers/common/dpaax/compat.h
index cbabc1588b..7c8d82c2b2 100644
--- a/drivers/common/dpaax/compat.h
+++ b/drivers/common/dpaax/compat.h
@@ -30,6 +30,7 @@
 #include <assert.h>
 #include <dirent.h>
 #include <inttypes.h>
+
 #include <rte_byteorder.h>
 #include <rte_atomic.h>
 #include <rte_spinlock.h>
@@ -37,6 +38,7 @@
 #include <rte_debug.h>
 #include <rte_cycles.h>
 #include <rte_malloc.h>
+#include <rte_prefetch.h>
 
 /* The following definitions are primarily to allow the single-source driver
  * interfaces to be included by arbitrary program code. Ie. for interfaces that
@@ -142,8 +144,8 @@ static inline void out_be32(volatile void *__p, u32 val)
 #define hwsync() rte_rmb()
 #define lwsync() rte_wmb()
 
-#define dcbt_ro(p) __builtin_prefetch(p, 0)
-#define dcbt_rw(p) __builtin_prefetch(p, 1)
+#define dcbt_ro(p) rte_prefetch0(p)
+#define dcbt_rw(p) rte_prefetch0_write(p)
 
 #if defined(RTE_ARCH_ARM)
 #if defined(RTE_ARCH_64)
-- 
2.46.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 4/6] crypto/openssl: fix 3DES-CTR with big endian CPUs
  2024-10-24 12:05 [PATCH 0/6] Clean up many __builtin_* in drivers David Marchand
                   ` (2 preceding siblings ...)
  2024-10-24 12:05 ` [PATCH 3/6] common/dpaax: use prefetch macros David Marchand
@ 2024-10-24 12:05 ` David Marchand
  2024-10-24 12:54   ` Morten Brørup
  2024-10-24 12:05 ` [PATCH 5/6] drivers: use branch prediction macros David Marchand
  2024-10-24 12:05 ` [PATCH 6/6] drivers: use bitops API instead of compiler builtins David Marchand
  5 siblings, 1 reply; 16+ messages in thread
From: David Marchand @ 2024-10-24 12:05 UTC (permalink / raw)
  To: dev
  Cc: thomas, ferruh.yigit, stephen, mattias.ronnblom, stable, Kai Ji,
	Slawomir Mrozowicz, Tomasz Kulasek, Daniel Mrzyglod,
	Pablo de Lara, Michal Kobylinski

Caught by code review.

Don't byte swap unconditionally (assuming that CPU is little endian is
wrong). Instead, convert from big endian to cpu and vice versa.

Fixes: d61f70b4c918 ("crypto/libcrypto: add driver for OpenSSL library")
Cc: stable@dpdk.org

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/crypto/openssl/rte_openssl_pmd.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/openssl/rte_openssl_pmd.c b/drivers/crypto/openssl/rte_openssl_pmd.c
index 9657b70c7a..9f5f3cda7d 100644
--- a/drivers/crypto/openssl/rte_openssl_pmd.c
+++ b/drivers/crypto/openssl/rte_openssl_pmd.c
@@ -2,6 +2,7 @@
  * Copyright(c) 2016-2017 Intel Corporation
  */
 
+#include <rte_byteorder.h>
 #include <rte_common.h>
 #include <rte_hexdump.h>
 #include <rte_cryptodev.h>
@@ -110,9 +111,9 @@ ctr_inc(uint8_t *ctr)
 {
 	uint64_t *ctr64 = (uint64_t *)ctr;
 
-	*ctr64 = __builtin_bswap64(*ctr64);
+	*ctr64 = rte_be_to_cpu_64(*ctr64);
 	(*ctr64)++;
-	*ctr64 = __builtin_bswap64(*ctr64);
+	*ctr64 = rte_cpu_to_be_64(*ctr64);
 }
 
 /*
-- 
2.46.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 5/6] drivers: use branch prediction macros
  2024-10-24 12:05 [PATCH 0/6] Clean up many __builtin_* in drivers David Marchand
                   ` (3 preceding siblings ...)
  2024-10-24 12:05 ` [PATCH 4/6] crypto/openssl: fix 3DES-CTR with big endian CPUs David Marchand
@ 2024-10-24 12:05 ` David Marchand
  2024-10-24 16:41   ` Stephen Hemminger
  2024-10-24 12:05 ` [PATCH 6/6] drivers: use bitops API instead of compiler builtins David Marchand
  5 siblings, 1 reply; 16+ messages in thread
From: David Marchand @ 2024-10-24 12:05 UTC (permalink / raw)
  To: dev
  Cc: thomas, ferruh.yigit, stephen, mattias.ronnblom, Hemant Agrawal,
	Sachin Saxena, Devendra Singh Rawat, Alok Prasad, Jochen Behrens

Prefer EAL macros over __builtin_ helpers.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/bus/fslmc/qbman/include/compat.h | 6 ++----
 drivers/net/qede/base/bcm_osal.h         | 4 +++-
 drivers/net/vmxnet3/base/vmxnet3_osdep.h | 4 +++-
 3 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/bus/fslmc/qbman/include/compat.h b/drivers/bus/fslmc/qbman/include/compat.h
index ece5da5906..4ac3254bc7 100644
--- a/drivers/bus/fslmc/qbman/include/compat.h
+++ b/drivers/bus/fslmc/qbman/include/compat.h
@@ -16,7 +16,9 @@
 #include <malloc.h>
 #include <unistd.h>
 #include <linux/types.h>
+
 #include <rte_atomic.h>
+#include <rte_branch_prediction.h>
 
 /* The following definitions are primarily to allow the single-source driver
  * interfaces to be included by arbitrary program code. Ie. for interfaces that
@@ -24,10 +26,6 @@
  * with certain attributes and types used in those interfaces.
  */
 
-/* Required compiler attributes */
-#define likely(x)	__builtin_expect(!!(x), 1)
-#define unlikely(x)	__builtin_expect(!!(x), 0)
-
 /* Required types */
 typedef uint64_t	dma_addr_t;
 
diff --git a/drivers/net/qede/base/bcm_osal.h b/drivers/net/qede/base/bcm_osal.h
index 7869103c63..357981f63d 100644
--- a/drivers/net/qede/base/bcm_osal.h
+++ b/drivers/net/qede/base/bcm_osal.h
@@ -10,7 +10,9 @@
 #include <stdbool.h>
 #include <pthread.h>
 #include <time.h>
+
 #include <rte_bitops.h>
+#include <rte_branch_prediction.h>
 #include <rte_byteorder.h>
 #include <rte_spinlock.h>
 #include <rte_malloc.h>
@@ -442,7 +444,7 @@ u32 qede_osal_log2(u32);
 #define OSAL_CACHE_LINE_SIZE RTE_CACHE_LINE_SIZE
 #define OSAL_IOMEM volatile
 #define OSAL_UNUSED    __rte_unused
-#define OSAL_UNLIKELY(x)  __builtin_expect(!!(x), 0)
+#define OSAL_UNLIKELY(x) unlikely(x)
 #define OSAL_MIN_T(type, __min1, __min2) RTE_MIN_T(__min1, __min2, type)
 #define OSAL_MAX_T(type, __max1, __max2) RTE_MAX_T(__max1, __max2, type)
 
diff --git a/drivers/net/vmxnet3/base/vmxnet3_osdep.h b/drivers/net/vmxnet3/base/vmxnet3_osdep.h
index 381a68db69..b1cd9ed056 100644
--- a/drivers/net/vmxnet3/base/vmxnet3_osdep.h
+++ b/drivers/net/vmxnet3/base/vmxnet3_osdep.h
@@ -7,13 +7,15 @@
 
 #include <stdbool.h>
 
+#include <rte_branch_prediction.h>
+
 typedef uint64_t	uint64;
 typedef uint32_t	uint32;
 typedef uint16_t	uint16;
 typedef uint8_t		uint8;
 
 #ifndef UNLIKELY
-#define UNLIKELY(x)  __builtin_expect((x),0)
+#define UNLIKELY(x)  unlikely(x)
 #endif /* unlikely */
 
 #endif /* _VMXNET3_OSDEP_H */
-- 
2.46.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* [PATCH 6/6] drivers: use bitops API instead of compiler builtins
  2024-10-24 12:05 [PATCH 0/6] Clean up many __builtin_* in drivers David Marchand
                   ` (4 preceding siblings ...)
  2024-10-24 12:05 ` [PATCH 5/6] drivers: use branch prediction macros David Marchand
@ 2024-10-24 12:05 ` David Marchand
  2024-10-24 12:25   ` Morten Brørup
  5 siblings, 1 reply; 16+ messages in thread
From: David Marchand @ 2024-10-24 12:05 UTC (permalink / raw)
  To: dev
  Cc: thomas, ferruh.yigit, stephen, mattias.ronnblom, Chaoyong He,
	Chengwen Feng, Srikanth Yalavarthi, Wathsala Vithanage,
	Ajit Khaparde, Somnath Kotur, Chas Williams, Min Hu (Connor),
	Apeksha Gupta, Sachin Saxena, Jie Hai, Jingjing Wu,
	Dariusz Sosnowski, Viacheslav Ovsiienko, Bing Zhao, Ori Kam,
	Suanming Mou, Matan Azrad

Stop using directly __builtin_ bit operations,
prefer existing DPDK wrappers.

Note: this is a brute sed all over drivers (skipping base drivers)
for __builtin_* that have a direct replacement in EAL bitops.
There is more work to do, like adding some missing macros inspired from
kernel (FIELD_*) macros but this is left for later.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 drivers/common/nfp/nfp_platform.h       |  4 +++-
 drivers/dma/hisilicon/hisi_dmadev.h     |  3 ++-
 drivers/ml/cnxk/cn10k_ml_ocm.c          |  7 ++++---
 drivers/net/bnxt/bnxt_rxtx_vec_neon.c   |  4 ++--
 drivers/net/bnxt/tf_ulp/ulp_flow_db.c   |  6 ++++--
 drivers/net/bnxt/tf_ulp/ulp_gen_hash.c  |  4 +++-
 drivers/net/bonding/rte_eth_bond_pmd.c  |  3 ++-
 drivers/net/cpfl/cpfl_flow_engine_fxp.c |  5 ++++-
 drivers/net/enetfec/enet_ethdev.c       |  5 +++--
 drivers/net/enetfec/enet_ethdev.h       |  6 ------
 drivers/net/hns3/hns3_rxtx_vec_neon.h   |  4 +++-
 drivers/net/i40e/i40e_rxtx_vec_neon.c   |  4 +++-
 drivers/net/iavf/iavf_rxtx_vec_neon.c   |  4 +++-
 drivers/net/mlx5/hws/mlx5dr_definer.c   |  8 +++++---
 drivers/net/mlx5/mlx5_flow_dv.c         |  3 ++-
 drivers/net/mlx5/mlx5_rxtx_vec_neon.h   | 12 ++++++------
 drivers/net/mlx5/mlx5_tx.c              |  2 +-
 17 files changed, 50 insertions(+), 34 deletions(-)

diff --git a/drivers/common/nfp/nfp_platform.h b/drivers/common/nfp/nfp_platform.h
index 1687942e41..0b02fcf1e8 100644
--- a/drivers/common/nfp/nfp_platform.h
+++ b/drivers/common/nfp/nfp_platform.h
@@ -8,6 +8,8 @@
 
 #include <stdint.h>
 
+#include <rte_bitops.h>
+
 #define DIV_ROUND_UP(n, d)             (((n) + (d) - 1) / (d))
 
 #define DMA_BIT_MASK(n)    ((1ULL << (n)) - 1)
@@ -21,7 +23,7 @@
 #define GENMASK_ULL(h, l) \
 	((~0ULL << (l)) & (~0ULL >> (BITS_PER_LONG_LONG - (h) - 1)))
 
-#define __bf_shf(x) (__builtin_ffsll(x) - 1)
+#define __bf_shf(x) rte_bsf64(x)
 
 #define FIELD_GET(_mask, _reg) \
 	(__extension__ ({ \
diff --git a/drivers/dma/hisilicon/hisi_dmadev.h b/drivers/dma/hisilicon/hisi_dmadev.h
index a57b5c759a..786fe3cc0e 100644
--- a/drivers/dma/hisilicon/hisi_dmadev.h
+++ b/drivers/dma/hisilicon/hisi_dmadev.h
@@ -5,6 +5,7 @@
 #ifndef HISI_DMADEV_H
 #define HISI_DMADEV_H
 
+#include <rte_bitops.h>
 #include <rte_byteorder.h>
 #include <rte_common.h>
 #include <rte_memzone.h>
@@ -14,7 +15,7 @@
 #define BITS_PER_LONG	(__SIZEOF_LONG__ * 8)
 #define GENMASK(h, l) \
 		(((~0UL) << (l)) & (~0UL >> (BITS_PER_LONG - 1 - (h))))
-#define BF_SHF(x) (__builtin_ffsll(x) - 1)
+#define BF_SHF(x) rte_bsf64(x)
 #define FIELD_GET(mask, reg) \
 		((typeof(mask))(((reg) & (mask)) >> BF_SHF(mask)))
 
diff --git a/drivers/ml/cnxk/cn10k_ml_ocm.c b/drivers/ml/cnxk/cn10k_ml_ocm.c
index 749ddeb344..0032fe82da 100644
--- a/drivers/ml/cnxk/cn10k_ml_ocm.c
+++ b/drivers/ml/cnxk/cn10k_ml_ocm.c
@@ -2,6 +2,7 @@
  * Copyright (c) 2022 Marvell.
  */
 
+#include <rte_bitops.h>
 #include <rte_mldev_pmd.h>
 
 #include <roc_api.h>
@@ -203,11 +204,11 @@ cn10k_ml_ocm_tilecount(uint64_t tilemask, int *start, int *end)
 
 	PLT_ASSERT(tilemask != 0);
 
-	*start = __builtin_ctzl(tilemask);
-	*end = 64 - __builtin_clzl(tilemask) - 1;
+	*start = rte_ctz64(tilemask);
+	*end = 64 - rte_clz64(tilemask) - 1;
 	count = *end - *start + 1;
 
-	PLT_ASSERT(count == __builtin_popcountl(tilemask));
+	PLT_ASSERT(count == rte_popcount64(tilemask));
 	return count;
 }
 
diff --git a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
index 355d41bbd3..840b21cef9 100644
--- a/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
+++ b/drivers/net/bnxt/bnxt_rxtx_vec_neon.c
@@ -4,7 +4,7 @@
 #include <inttypes.h>
 #include <stdbool.h>
 
-#include <rte_bitmap.h>
+#include <rte_bitops.h>
 #include <rte_byteorder.h>
 #include <rte_malloc.h>
 #include <rte_memory.h>
@@ -290,7 +290,7 @@ recv_burst_vec_neon(void *rx_queue, struct rte_mbuf **rx_pkts, uint16_t nb_pkts)
 		if (valid == 0)
 			num_valid = 4;
 		else
-			num_valid = __builtin_ctzl(valid) / 16;
+			num_valid = rte_ctz64(valid) / 16;
 
 		if (num_valid == 0)
 			break;
diff --git a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
index 2e6ea43ac1..aac974a970 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_flow_db.c
@@ -3,7 +3,9 @@
  * All rights reserved.
  */
 
+#include <rte_bitops.h>
 #include <rte_malloc.h>
+
 #include "bnxt.h"
 #include "bnxt_tf_common.h"
 #include "ulp_utils.h"
@@ -938,7 +940,7 @@ ulp_flow_db_next_entry_get(struct bnxt_ulp_flow_db *flow_db,
 		 */
 		if (s_idx == idx)
 			bs &= (-1UL >> mod_fid);
-		lfid = (idx * ULP_INDEX_BITMAP_SIZE) + __builtin_clzl(bs);
+		lfid = (idx * ULP_INDEX_BITMAP_SIZE) + rte_clz64(bs);
 		if (*fid >= lfid) {
 			BNXT_TF_DBG(ERR, "Flow Database is corrupt\n");
 			return -ENOENT;
@@ -1480,7 +1482,7 @@ ulp_flow_db_parent_child_flow_next_entry_get(struct bnxt_ulp_flow_db *flow_db,
 		 */
 		if (s_idx == idx)
 			bs &= (-1UL >> mod_fid);
-		next_fid = (idx * ULP_INDEX_BITMAP_SIZE) + __builtin_clzl(bs);
+		next_fid = (idx * ULP_INDEX_BITMAP_SIZE) + rte_clz64(bs);
 		if (*child_fid >= next_fid) {
 			BNXT_TF_DBG(ERR, "Parent Child Database is corrupt\n");
 			return -ENOENT;
diff --git a/drivers/net/bnxt/tf_ulp/ulp_gen_hash.c b/drivers/net/bnxt/tf_ulp/ulp_gen_hash.c
index d746fbbd4e..9f27b56334 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_gen_hash.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_gen_hash.c
@@ -3,8 +3,10 @@
  * All rights reserved.
  */
 
+#include <rte_bitops.h>
 #include <rte_log.h>
 #include <rte_malloc.h>
+
 #include "bnxt_tf_common.h"
 #include "ulp_gen_hash.h"
 #include "ulp_utils.h"
@@ -25,7 +27,7 @@ int32_t ulp_bit_alloc_list_alloc(struct bit_alloc_list *blist,
 
 	if (idx <= bsize_64) {
 		if (bentry)
-			jdx = __builtin_clzl(~bentry);
+			jdx = rte_clz64(~bentry);
 		*index = ((idx - 1) * ULP_INDEX_BITMAP_SIZE) + jdx;
 		ULP_INDEX_BITMAP_SET(blist->bdata[(idx - 1)], jdx);
 		return 0;
diff --git a/drivers/net/bonding/rte_eth_bond_pmd.c b/drivers/net/bonding/rte_eth_bond_pmd.c
index cda1c37124..91bf2c2345 100644
--- a/drivers/net/bonding/rte_eth_bond_pmd.c
+++ b/drivers/net/bonding/rte_eth_bond_pmd.c
@@ -5,6 +5,7 @@
 #include <stdbool.h>
 #include <netinet/in.h>
 
+#include <rte_bitops.h>
 #include <rte_mbuf.h>
 #include <rte_malloc.h>
 #include <ethdev_driver.h>
@@ -3982,7 +3983,7 @@ bond_ethdev_configure(struct rte_eth_dev *dev)
 		 * Two '1' in binary of 'link_speeds': bit0 and a unique
 		 * speed bit.
 		 */
-		if (__builtin_popcountl(link_speeds) != 2) {
+		if (rte_popcount64(link_speeds) != 2) {
 			RTE_BOND_LOG(ERR, "please set a unique speed.");
 			return -EINVAL;
 		}
diff --git a/drivers/net/cpfl/cpfl_flow_engine_fxp.c b/drivers/net/cpfl/cpfl_flow_engine_fxp.c
index 2c75ea6577..0101c30911 100644
--- a/drivers/net/cpfl/cpfl_flow_engine_fxp.c
+++ b/drivers/net/cpfl/cpfl_flow_engine_fxp.c
@@ -10,6 +10,8 @@
 #include <unistd.h>
 #include <stdarg.h>
 #include <math.h>
+
+#include <rte_bitops.h>
 #include <rte_debug.h>
 #include <rte_ether.h>
 #include <rte_log.h>
@@ -20,6 +22,7 @@
 #include <rte_flow.h>
 #include <rte_bitmap.h>
 #include <ethdev_driver.h>
+
 #include "cpfl_rules.h"
 #include "cpfl_logs.h"
 #include "cpfl_ethdev.h"
@@ -608,7 +611,7 @@ cpfl_fxp_mod_idx_alloc(struct cpfl_adapter_ext *ad)
 	if (!rte_bitmap_scan(ad->mod_bm, &pos, &slab))
 		return CPFL_MAX_MOD_CONTENT_INDEX;
 
-	pos += __builtin_ffsll(slab) - 1;
+	pos += rte_bsf64(slab);
 	rte_bitmap_clear(ad->mod_bm, pos);
 
 	return pos;
diff --git a/drivers/net/enetfec/enet_ethdev.c b/drivers/net/enetfec/enet_ethdev.c
index 8c7067fbb5..4151d7fca3 100644
--- a/drivers/net/enetfec/enet_ethdev.c
+++ b/drivers/net/enetfec/enet_ethdev.c
@@ -6,6 +6,7 @@
 
 #include <ethdev_vdev.h>
 #include <ethdev_driver.h>
+#include <rte_bitops.h>
 #include <rte_io.h>
 
 #include "enet_pmd_logs.h"
@@ -374,7 +375,7 @@ enetfec_tx_queue_setup(struct rte_eth_dev *dev,
 	unsigned int size;
 	unsigned int dsize = fep->bufdesc_ex ? sizeof(struct bufdesc_ex) :
 		sizeof(struct bufdesc);
-	unsigned int dsize_log2 = fls64(dsize);
+	unsigned int dsize_log2 = rte_fls_u64(dsize);
 
 	/* Tx deferred start is not supported */
 	if (tx_conf->tx_deferred_start) {
@@ -453,7 +454,7 @@ enetfec_rx_queue_setup(struct rte_eth_dev *dev,
 	unsigned int size;
 	unsigned int dsize = fep->bufdesc_ex ? sizeof(struct bufdesc_ex) :
 			sizeof(struct bufdesc);
-	unsigned int dsize_log2 = fls64(dsize);
+	unsigned int dsize_log2 = rte_fls_u64(dsize);
 
 	/* Rx deferred start is not supported */
 	if (rx_conf->rx_deferred_start) {
diff --git a/drivers/net/enetfec/enet_ethdev.h b/drivers/net/enetfec/enet_ethdev.h
index 02a3397890..4e196b8552 100644
--- a/drivers/net/enetfec/enet_ethdev.h
+++ b/drivers/net/enetfec/enet_ethdev.h
@@ -125,12 +125,6 @@ bufdesc *enet_get_nextdesc(struct bufdesc *bdp, struct bufdesc_prop *bd)
 		: (struct bufdesc *)(((uintptr_t)bdp) + bd->d_size);
 }
 
-static inline int
-fls64(unsigned long word)
-{
-	return (64 - __builtin_clzl(word)) - 1;
-}
-
 static inline struct
 bufdesc *enet_get_prevdesc(struct bufdesc *bdp, struct bufdesc_prop *bd)
 {
diff --git a/drivers/net/hns3/hns3_rxtx_vec_neon.h b/drivers/net/hns3/hns3_rxtx_vec_neon.h
index 0dc6b9f0a2..bbb5478015 100644
--- a/drivers/net/hns3/hns3_rxtx_vec_neon.h
+++ b/drivers/net/hns3/hns3_rxtx_vec_neon.h
@@ -5,6 +5,8 @@
 #ifndef HNS3_RXTX_VEC_NEON_H
 #define HNS3_RXTX_VEC_NEON_H
 
+#include <rte_bitops.h>
+
 #include <arm_neon.h>
 
 #pragma GCC diagnostic ignored "-Wcast-qual"
@@ -189,7 +191,7 @@ hns3_recv_burst_vec(struct hns3_rx_queue *__restrict rxq,
 		if (likely(stat == 0))
 			bd_valid_num = HNS3_DEFAULT_DESCS_PER_LOOP;
 		else
-			bd_valid_num = __builtin_ctzl(stat) / HNS3_UINT16_BIT;
+			bd_valid_num = rte_ctz64(stat) / HNS3_UINT16_BIT;
 		if (bd_valid_num == 0)
 			break;
 
diff --git a/drivers/net/i40e/i40e_rxtx_vec_neon.c b/drivers/net/i40e/i40e_rxtx_vec_neon.c
index 3a99137b5e..e1c5c7041b 100644
--- a/drivers/net/i40e/i40e_rxtx_vec_neon.c
+++ b/drivers/net/i40e/i40e_rxtx_vec_neon.c
@@ -4,7 +4,9 @@
  */
 
 #include <stdint.h>
+
 #include <ethdev_driver.h>
+#include <rte_bitops.h>
 #include <rte_malloc.h>
 #include <rte_vect.h>
 
@@ -558,7 +560,7 @@ _recv_raw_pkts_vec(struct i40e_rx_queue *__rte_restrict rxq,
 		if (unlikely(stat == 0)) {
 			nb_pkts_recd += RTE_I40E_DESCS_PER_LOOP;
 		} else {
-			nb_pkts_recd += __builtin_ctzl(stat) / I40E_UINT16_BIT;
+			nb_pkts_recd += rte_ctz64(stat) / I40E_UINT16_BIT;
 			break;
 		}
 	}
diff --git a/drivers/net/iavf/iavf_rxtx_vec_neon.c b/drivers/net/iavf/iavf_rxtx_vec_neon.c
index 20b656e899..04be574683 100644
--- a/drivers/net/iavf/iavf_rxtx_vec_neon.c
+++ b/drivers/net/iavf/iavf_rxtx_vec_neon.c
@@ -4,7 +4,9 @@
  */
 
 #include <stdint.h>
+
 #include <ethdev_driver.h>
+#include <rte_bitops.h>
 #include <rte_malloc.h>
 #include <rte_vect.h>
 
@@ -366,7 +368,7 @@ _recv_raw_pkts_vec(struct iavf_rx_queue *__rte_restrict rxq,
 		if (unlikely(stat == 0)) {
 			nb_pkts_recd += IAVF_VPMD_DESCS_PER_LOOP;
 		} else {
-			nb_pkts_recd += __builtin_ctzl(stat) / IAVF_UINT16_BIT;
+			nb_pkts_recd += rte_ctz64(stat) / IAVF_UINT16_BIT;
 			break;
 		}
 	}
diff --git a/drivers/net/mlx5/hws/mlx5dr_definer.c b/drivers/net/mlx5/hws/mlx5dr_definer.c
index a9fa5d06ed..5c2e889444 100644
--- a/drivers/net/mlx5/hws/mlx5dr_definer.c
+++ b/drivers/net/mlx5/hws/mlx5dr_definer.c
@@ -2,6 +2,8 @@
  * Copyright (c) 2022 NVIDIA Corporation & Affiliates
  */
 
+#include <rte_bitops.h>
+
 #include "mlx5dr_internal.h"
 
 #define GTP_PDU_SC	0x85
@@ -1548,7 +1550,7 @@ mlx5dr_definer_conv_item_port(struct mlx5dr_definer_conv_data *cd,
 		fc->tag_set = &mlx5dr_definer_vport_set;
 		fc->tag_mask_set = &mlx5dr_definer_ones_set;
 		DR_CALC_SET_HDR(fc, registers, register_c_0);
-		fc->bit_off = __builtin_ctz(caps->wire_regc_mask);
+		fc->bit_off = rte_ctz32(caps->wire_regc_mask);
 		fc->bit_mask = caps->wire_regc_mask >> fc->bit_off;
 		fc->dr_ctx = cd->ctx;
 	} else {
@@ -2666,8 +2668,8 @@ mlx5dr_definer_conv_item_geneve_opt(struct mlx5dr_definer_conv_data *cd,
 		fc->item_idx = item_idx;
 		fc->tag_set = &mlx5dr_definer_ones_set;
 		fc->byte_off = hl_ok_bit->dw_offset * DW_SIZE +
-				__builtin_clz(hl_ok_bit->dw_mask) / 8;
-		fc->bit_off = __builtin_ctz(hl_ok_bit->dw_mask);
+				rte_clz32(hl_ok_bit->dw_mask) / 8;
+		fc->bit_off = rte_ctz32(hl_ok_bit->dw_mask);
 		fc->bit_mask = 0x1;
 	}
 
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 201e215e4b..040727f2e8 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -8,6 +8,7 @@
 #include <string.h>
 #include <unistd.h>
 
+#include <rte_bitops.h>
 #include <rte_common.h>
 #include <rte_ether.h>
 #include <ethdev_driver.h>
@@ -9068,7 +9069,7 @@ flow_dv_validate(struct rte_eth_dev *dev, const struct rte_flow_attr *attr,
 		    !(non_shared_age && count) &&
 		    (attr->group || (attr->transfer && priv->fdb_def_rule)) &&
 		    priv->sh->flow_hit_aso_en);
-	if (__builtin_popcountl(aso_mask) > 1)
+	if (rte_popcount64(aso_mask) > 1)
 		return rte_flow_error_set(error, ENOTSUP, RTE_FLOW_ERROR_TYPE_ACTION,
 					  NULL, "unsupported combining AGE, METER, CT ASO actions in a single rule");
 	/*
diff --git a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
index 510f60b25d..0ce9827ed9 100644
--- a/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
+++ b/drivers/net/mlx5/mlx5_rxtx_vec_neon.h
@@ -11,6 +11,7 @@
 #include <stdlib.h>
 #include <arm_neon.h>
 
+#include <rte_bitops.h>
 #include <rte_mbuf.h>
 #include <rte_mempool.h>
 #include <rte_prefetch.h>
@@ -620,7 +621,7 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 
 	/*
 	 * Note that vectors have reverse order - {v3, v2, v1, v0}, because
-	 * there's no instruction to count trailing zeros. __builtin_clzl() is
+	 * there's no instruction to count trailing zeros. rte_clz64() is
 	 * used instead.
 	 *
 	 * A. copy 4 mbuf pointers from elts ring to returning pkts.
@@ -808,13 +809,12 @@ rxq_cq_process_v(struct mlx5_rxq_data *rxq, volatile struct mlx5_cqe *cq,
 		/* E.2 mask out invalid entries. */
 		comp_mask = vbic_u16(comp_mask, invalid_mask);
 		/* E.3 get the first compressed CQE. */
-		comp_idx = __builtin_clzl(vget_lane_u64(vreinterpret_u64_u16(
-					  comp_mask), 0)) /
-					  (sizeof(uint16_t) * 8);
+		comp_idx = rte_clz64(vget_lane_u64(vreinterpret_u64_u16(comp_mask), 0)) /
+			(sizeof(uint16_t) * 8);
 		invalid_mask = vorr_u16(invalid_mask, comp_mask);
 		/* D.7 count non-compressed valid CQEs. */
-		n = __builtin_clzl(vget_lane_u64(vreinterpret_u64_u16(
-				   invalid_mask), 0)) / (sizeof(uint16_t) * 8);
+		n = rte_clz64(vget_lane_u64(vreinterpret_u64_u16(invalid_mask), 0)) /
+			(sizeof(uint16_t) * 8);
 		nocmp_n += n;
 		/*
 		 * D.2 mask out entries after the compressed CQE.
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 04f80bb9bd..fc105970a3 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -619,7 +619,7 @@ mlx5_select_tx_function(struct rte_eth_dev *dev)
 		 * Check whether it has minimal amount
 		 * of not requested offloads.
 		 */
-		tmp = __builtin_popcountl(tmp & ~olx);
+		tmp = rte_popcount64(tmp & ~olx);
 		if (m >= RTE_DIM(txoff_func) || tmp < diff) {
 			/* First or better match, save and continue. */
 			m = i;
-- 
2.46.2


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 6/6] drivers: use bitops API instead of compiler builtins
  2024-10-24 12:05 ` [PATCH 6/6] drivers: use bitops API instead of compiler builtins David Marchand
@ 2024-10-24 12:25   ` Morten Brørup
  0 siblings, 0 replies; 16+ messages in thread
From: Morten Brørup @ 2024-10-24 12:25 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, ferruh.yigit, stephen, mattias.ronnblom, Chaoyong He,
	Chengwen Feng, Srikanth Yalavarthi, Wathsala Vithanage,
	Ajit Khaparde, Somnath Kotur, Chas Williams, Min Hu (Connor),
	Apeksha Gupta, Sachin Saxena, Jie Hai, Jingjing Wu,
	Dariusz Sosnowski, Viacheslav Ovsiienko, Bing Zhao, Ori Kam,
	Suanming Mou, Matan Azrad

> From: David Marchand [mailto:david.marchand@redhat.com]
> Sent: Thursday, 24 October 2024 14.06
> 
> Stop using directly __builtin_ bit operations,
> prefer existing DPDK wrappers.
> 
> Note: this is a brute sed all over drivers (skipping base drivers)
> for __builtin_* that have a direct replacement in EAL bitops.

There could be differences between 32 and 64 bit architectures.
Simple search-replace might not cut it.
Didn't review in detail, just speculating.

> There is more work to do, like adding some missing macros inspired from
> kernel (FIELD_*) macros but this is left for later.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>
> ---

^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 4/6] crypto/openssl: fix 3DES-CTR with big endian CPUs
  2024-10-24 12:05 ` [PATCH 4/6] crypto/openssl: fix 3DES-CTR with big endian CPUs David Marchand
@ 2024-10-24 12:54   ` Morten Brørup
  2024-10-24 13:10     ` David Marchand
  0 siblings, 1 reply; 16+ messages in thread
From: Morten Brørup @ 2024-10-24 12:54 UTC (permalink / raw)
  To: David Marchand, dev
  Cc: thomas, ferruh.yigit, stephen, mattias.ronnblom, stable, Kai Ji,
	Slawomir Mrozowicz, Tomasz Kulasek, Daniel Mrzyglod,
	Pablo de Lara, Michal Kobylinski

> From: David Marchand [mailto:david.marchand@redhat.com]
> Sent: Thursday, 24 October 2024 14.06
> 
> Caught by code review.
> 
> Don't byte swap unconditionally (assuming that CPU is little endian is
> wrong). Instead, convert from big endian to cpu and vice versa.

Yes looks like a bug.
I wonder if this PMD has more similar bugs...
grep bswap drivers/crypto/openssl/* says no.

> @@ -110,9 +111,9 @@ ctr_inc(uint8_t *ctr)
>  {
>  	uint64_t *ctr64 = (uint64_t *)ctr;
> 
> -	*ctr64 = __builtin_bswap64(*ctr64);
> +	*ctr64 = rte_be_to_cpu_64(*ctr64);
>  	(*ctr64)++;
> -	*ctr64 = __builtin_bswap64(*ctr64);
> +	*ctr64 = rte_cpu_to_be_64(*ctr64);
>  }

But that's not all.

There may be an alignment bug too; the way it is used in process_openssl_cipher_des3ctr(), "ctr" is not guaranteed to be uint64_t aligned.

How about this instead:

ctr_inc(void *ctr)
{
	uint64_t ctr64 = rte_be_to_cpu_64(*(unaligned_uint64_t *)ctr);
	ctr64++;
	*(unaligned_uint64_t *)ctr = rte_cpu_to_be_64(ctr64);
}

Or this:

ctr_inc(void *ctr)
{
	uint64_t ctr64;

	memcpy(&ctr64, ctr, sizeof(uint64_t));
	ctr64 = rte_be_to_cpu_64(ctr64);
	ctr64++;
	ctr64 = rte_cpu_to_be_64(ctr64);
	memcpy(ctr, &ctr64, sizeof(uint64_t));
}

Or use a union in process_openssl_cipher_des3ctr() to ensure it's uint64_t aligned.


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/6] crypto/openssl: fix 3DES-CTR with big endian CPUs
  2024-10-24 12:54   ` Morten Brørup
@ 2024-10-24 13:10     ` David Marchand
  2024-10-24 13:17       ` David Marchand
  0 siblings, 1 reply; 16+ messages in thread
From: David Marchand @ 2024-10-24 13:10 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, thomas, ferruh.yigit, stephen, mattias.ronnblom, stable,
	Kai Ji, Slawomir Mrozowicz, Tomasz Kulasek, Daniel Mrzyglod,
	Pablo de Lara, Michal Kobylinski

On Thu, Oct 24, 2024 at 2:55 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: David Marchand [mailto:david.marchand@redhat.com]
> > Sent: Thursday, 24 October 2024 14.06
> >
> > Caught by code review.
> >
> > Don't byte swap unconditionally (assuming that CPU is little endian is
> > wrong). Instead, convert from big endian to cpu and vice versa.
>
> Yes looks like a bug.
> I wonder if this PMD has more similar bugs...
> grep bswap drivers/crypto/openssl/* says no.
>
> > @@ -110,9 +111,9 @@ ctr_inc(uint8_t *ctr)
> >  {
> >       uint64_t *ctr64 = (uint64_t *)ctr;
> >
> > -     *ctr64 = __builtin_bswap64(*ctr64);
> > +     *ctr64 = rte_be_to_cpu_64(*ctr64);
> >       (*ctr64)++;
> > -     *ctr64 = __builtin_bswap64(*ctr64);
> > +     *ctr64 = rte_cpu_to_be_64(*ctr64);
> >  }
>
> But that's not all.
>
> There may be an alignment bug too; the way it is used in process_openssl_cipher_des3ctr(), "ctr" is not guaranteed to be uint64_t aligned.
>
> How about this instead:
>
> ctr_inc(void *ctr)
> {
>         uint64_t ctr64 = rte_be_to_cpu_64(*(unaligned_uint64_t *)ctr);
>         ctr64++;
>         *(unaligned_uint64_t *)ctr = rte_cpu_to_be_64(ctr64);
> }
>
> Or this:
>
> ctr_inc(void *ctr)
> {
>         uint64_t ctr64;
>
>         memcpy(&ctr64, ctr, sizeof(uint64_t));
>         ctr64 = rte_be_to_cpu_64(ctr64);
>         ctr64++;
>         ctr64 = rte_cpu_to_be_64(ctr64);
>         memcpy(ctr, &ctr64, sizeof(uint64_t));
> }
>
> Or use a union in process_openssl_cipher_des3ctr() to ensure it's uint64_t aligned.

Or declare ctr as a uint64_t in process_openssl_cipher_des3ctr
directly, and remove this casting.


-- 
David Marchand


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/6] crypto/openssl: fix 3DES-CTR with big endian CPUs
  2024-10-24 13:10     ` David Marchand
@ 2024-10-24 13:17       ` David Marchand
  2024-10-24 13:30         ` David Marchand
  0 siblings, 1 reply; 16+ messages in thread
From: David Marchand @ 2024-10-24 13:17 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, thomas, ferruh.yigit, stephen, mattias.ronnblom, stable,
	Kai Ji, Slawomir Mrozowicz, Tomasz Kulasek, Daniel Mrzyglod,
	Pablo de Lara, Michal Kobylinski

On Thu, Oct 24, 2024 at 3:10 PM David Marchand
<david.marchand@redhat.com> wrote:
> > There may be an alignment bug too; the way it is used in process_openssl_cipher_des3ctr(), "ctr" is not guaranteed to be uint64_t aligned.
> >
> > How about this instead:
> >
> > ctr_inc(void *ctr)
> > {
> >         uint64_t ctr64 = rte_be_to_cpu_64(*(unaligned_uint64_t *)ctr);
> >         ctr64++;
> >         *(unaligned_uint64_t *)ctr = rte_cpu_to_be_64(ctr64);
> > }
> >
> > Or this:
> >
> > ctr_inc(void *ctr)
> > {
> >         uint64_t ctr64;
> >
> >         memcpy(&ctr64, ctr, sizeof(uint64_t));
> >         ctr64 = rte_be_to_cpu_64(ctr64);
> >         ctr64++;
> >         ctr64 = rte_cpu_to_be_64(ctr64);
> >         memcpy(ctr, &ctr64, sizeof(uint64_t));
> > }
> >
> > Or use a union in process_openssl_cipher_des3ctr() to ensure it's uint64_t aligned.
>
> Or declare ctr as a uint64_t in process_openssl_cipher_des3ctr
> directly, and remove this casting.

Like:

diff --git a/drivers/crypto/openssl/rte_openssl_pmd.c
b/drivers/crypto/openssl/rte_openssl_pmd.c
index 9657b70c7a..8e193759b7 100644
--- a/drivers/crypto/openssl/rte_openssl_pmd.c
+++ b/drivers/crypto/openssl/rte_openssl_pmd.c
@@ -99,22 +99,6 @@ digest_name_get(enum rte_crypto_auth_algorithm algo)

 static int cryptodev_openssl_remove(struct rte_vdev_device *vdev);

-/*----------------------------------------------------------------------------*/
-
-/**
- * Increment counter by 1
- * Counter is 64 bit array, big-endian
- */
-static void
-ctr_inc(uint8_t *ctr)
-{
-       uint64_t *ctr64 = (uint64_t *)ctr;
-
-       *ctr64 = __builtin_bswap64(*ctr64);
-       (*ctr64)++;
-       *ctr64 = __builtin_bswap64(*ctr64);
-}
-
 /*
  *------------------------------------------------------------------------------
  * Session Prepare
@@ -1192,7 +1176,9 @@ static int
 process_openssl_cipher_des3ctr(struct rte_mbuf *mbuf_src, uint8_t *dst,
                int offset, uint8_t *iv, int srclen, EVP_CIPHER_CTX *ctx)
 {
-       uint8_t ebuf[8], ctr[8];
+       uint8_t ebuf[8];
+       uint64_t host_ctr;
+       uint64_t ctr;
        int unused, n;
        struct rte_mbuf *m;
        uint8_t *src;
@@ -1209,14 +1195,16 @@ process_openssl_cipher_des3ctr(struct rte_mbuf
*mbuf_src, uint8_t *dst,
        l = rte_pktmbuf_data_len(m) - offset;

        memcpy(ctr, iv, 8);
+       host_ctr = rte_be_64_to_cpu(ctr);

        for (n = 0; n < srclen; n++) {
                if (n % 8 == 0) {
+                       ctr = rte_cpu_to_be_64(host_ctr);
                        if (EVP_EncryptUpdate(ctx,
                                        (unsigned char *)&ebuf, &unused,
                                        (const unsigned char *)&ctr, 8) <= 0)
                                goto process_cipher_des3ctr_err;
-                       ctr_inc(ctr);
+                       host_ctr++;
                }
                dst[n] = *(src++) ^ ebuf[n % 8];


-- 
David Marchand


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 4/6] crypto/openssl: fix 3DES-CTR with big endian CPUs
  2024-10-24 13:17       ` David Marchand
@ 2024-10-24 13:30         ` David Marchand
  2024-10-24 14:21           ` Morten Brørup
  0 siblings, 1 reply; 16+ messages in thread
From: David Marchand @ 2024-10-24 13:30 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, thomas, ferruh.yigit, stephen, mattias.ronnblom, stable,
	Kai Ji, Slawomir Mrozowicz, Tomasz Kulasek, Daniel Mrzyglod,
	Pablo de Lara, Michal Kobylinski

On Thu, Oct 24, 2024 at 3:17 PM David Marchand
<david.marchand@redhat.com> wrote:
> @@ -1209,14 +1195,16 @@ process_openssl_cipher_des3ctr(struct rte_mbuf
> *mbuf_src, uint8_t *dst,
>         l = rte_pktmbuf_data_len(m) - offset;
>
>         memcpy(ctr, iv, 8);
> +       host_ctr = rte_be_64_to_cpu(ctr);
>
>         for (n = 0; n < srclen; n++) {
>                 if (n % 8 == 0) {
> +                       ctr = rte_cpu_to_be_64(host_ctr);

Moving this here adds one uneeded extra conversion on the first iteration.
So I would keep the conversion around the host_ctr variable increment,
if you get the idea.


>                         if (EVP_EncryptUpdate(ctx,
>                                         (unsigned char *)&ebuf, &unused,
>                                         (const unsigned char *)&ctr, 8) <= 0)
>                                 goto process_cipher_des3ctr_err;
> -                       ctr_inc(ctr);
> +                       host_ctr++;
>                 }
>                 dst[n] = *(src++) ^ ebuf[n % 8];

-- 
David Marchand


^ permalink raw reply	[flat|nested] 16+ messages in thread

* RE: [PATCH 4/6] crypto/openssl: fix 3DES-CTR with big endian CPUs
  2024-10-24 13:30         ` David Marchand
@ 2024-10-24 14:21           ` Morten Brørup
  0 siblings, 0 replies; 16+ messages in thread
From: Morten Brørup @ 2024-10-24 14:21 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, thomas, ferruh.yigit, stephen, mattias.ronnblom, stable,
	Kai Ji, Slawomir Mrozowicz, Tomasz Kulasek, Daniel Mrzyglod,
	Pablo de Lara, Michal Kobylinski

> From: David Marchand [mailto:david.marchand@redhat.com]
> Sent: Thursday, 24 October 2024 15.30
> 
> On Thu, Oct 24, 2024 at 3:17 PM David Marchand
> <david.marchand@redhat.com> wrote:
> > @@ -1209,14 +1195,16 @@ process_openssl_cipher_des3ctr(struct
> rte_mbuf
> > *mbuf_src, uint8_t *dst,
> >         l = rte_pktmbuf_data_len(m) - offset;
> >
> >         memcpy(ctr, iv, 8);
> > +       host_ctr = rte_be_64_to_cpu(ctr);
> >
> >         for (n = 0; n < srclen; n++) {
> >                 if (n % 8 == 0) {
> > +                       ctr = rte_cpu_to_be_64(host_ctr);
> 
> Moving this here adds one uneeded extra conversion on the first
> iteration.
> So I would keep the conversion around the host_ctr variable increment,
> if you get the idea.
> 
> 
> >                         if (EVP_EncryptUpdate(ctx,
> >                                         (unsigned char *)&ebuf,
> &unused,
> >                                         (const unsigned char *)&ctr,
> 8) <= 0)
> >                                 goto process_cipher_des3ctr_err;
> > -                       ctr_inc(ctr);
> > +                       host_ctr++;
> >                 }
> >                 dst[n] = *(src++) ^ ebuf[n % 8];
> 
> --
> David Marchand

LGTM.
For the next version,
Acked-by: Morten Brørup <mb@smartsharesystems.com>


^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 2/6] devtools: forbid use of builtin helpers
  2024-10-24 12:05 ` [PATCH 2/6] devtools: forbid use of builtin helpers David Marchand
@ 2024-10-24 16:40   ` Stephen Hemminger
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2024-10-24 16:40 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, thomas, ferruh.yigit, mattias.ronnblom

On Thu, 24 Oct 2024 14:05:31 +0200
David Marchand <david.marchand@redhat.com> wrote:

> Warn on use of any __builtin helpers, but leave it open for EAL (as it
> is where the abstractions for OS and compiler differences are), and
> some drivers base code.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 3/6] common/dpaax: use prefetch macros
  2024-10-24 12:05 ` [PATCH 3/6] common/dpaax: use prefetch macros David Marchand
@ 2024-10-24 16:40   ` Stephen Hemminger
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2024-10-24 16:40 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, thomas, ferruh.yigit, mattias.ronnblom, Hemant Agrawal,
	Sachin Saxena

On Thu, 24 Oct 2024 14:05:32 +0200
David Marchand <david.marchand@redhat.com> wrote:

> Prefer EAL macros over __builtin_ helpers.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>


Acked-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: [PATCH 5/6] drivers: use branch prediction macros
  2024-10-24 12:05 ` [PATCH 5/6] drivers: use branch prediction macros David Marchand
@ 2024-10-24 16:41   ` Stephen Hemminger
  0 siblings, 0 replies; 16+ messages in thread
From: Stephen Hemminger @ 2024-10-24 16:41 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, thomas, ferruh.yigit, mattias.ronnblom, Hemant Agrawal,
	Sachin Saxena, Devendra Singh Rawat, Alok Prasad, Jochen Behrens

On Thu, 24 Oct 2024 14:05:34 +0200
David Marchand <david.marchand@redhat.com> wrote:

> Prefer EAL macros over __builtin_ helpers.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2024-10-24 16:50 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-24 12:05 [PATCH 0/6] Clean up many __builtin_* in drivers David Marchand
2024-10-24 12:05 ` [PATCH 1/6] devtools: handle multiple pattern for skipping files David Marchand
2024-10-24 12:05 ` [PATCH 2/6] devtools: forbid use of builtin helpers David Marchand
2024-10-24 16:40   ` Stephen Hemminger
2024-10-24 12:05 ` [PATCH 3/6] common/dpaax: use prefetch macros David Marchand
2024-10-24 16:40   ` Stephen Hemminger
2024-10-24 12:05 ` [PATCH 4/6] crypto/openssl: fix 3DES-CTR with big endian CPUs David Marchand
2024-10-24 12:54   ` Morten Brørup
2024-10-24 13:10     ` David Marchand
2024-10-24 13:17       ` David Marchand
2024-10-24 13:30         ` David Marchand
2024-10-24 14:21           ` Morten Brørup
2024-10-24 12:05 ` [PATCH 5/6] drivers: use branch prediction macros David Marchand
2024-10-24 16:41   ` Stephen Hemminger
2024-10-24 12:05 ` [PATCH 6/6] drivers: use bitops API instead of compiler builtins David Marchand
2024-10-24 12:25   ` Morten Brørup

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).