* [PATCH 1/2] config/arm: adds Arm Neoverse N3 SoC
@ 2024-06-04  4:44 Wathsala Vithanage
  2024-06-04  4:44 ` [PATCH 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
                   ` (2 more replies)
  0 siblings, 3 replies; 32+ messages in thread
From: Wathsala Vithanage @ 2024-06-04  4:44 UTC (permalink / raw)
  To: Ruifeng Wang, Bruce Richardson
  Cc: dev, nd, Wathsala Vithanage, Dhruv Tripathi, Jack Bond-Preston
Add Arm Neoverse N3 part number to build configuration.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
Reviewed-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>
---
 config/arm/meson.build | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/config/arm/meson.build b/config/arm/meson.build
index a45aa9e466..ec7588f055 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -116,6 +116,18 @@ part_number_config_arm = {
             ['RTE_MAX_LCORE', 144],
             ['RTE_MAX_NUMA_NODES', 2]
         ]
+    },
+    '0xd8e': {
+        'march': 'armv8.7-a',
+        'march_features': ['sve2'],
+        'fallback_march': 'armv8.5-a',
+        'flags': [
+            ['RTE_MACHINE', '"neoverse-n3"'],
+            ['RTE_ARM_FEATURE_ATOMICS', true],
+            ['RTE_ARM_FEATURE_WFXT', true],
+            ['RTE_MAX_LCORE', 192],
+            ['RTE_MAX_NUMA_NODES', 2]
+        ]
     }
 }
 implementer_arm = {
@@ -555,6 +567,13 @@ soc_n2 = {
     'numa': false
 }
 
+soc_n3 = {
+    'description': 'Arm Neoverse N3',
+    'implementer': '0x41',
+    'part_number': '0xd8e',
+    'numa': false
+}
+
 soc_odyssey = {
     'description': 'Marvell Odyssey',
     'implementer': '0x41',
@@ -680,6 +699,7 @@ socs = {
     'kunpeng930': soc_kunpeng930,
     'n1sdp': soc_n1sdp,
     'n2': soc_n2,
+    'n3': soc_n3,
     'odyssey' : soc_odyssey,
     'stingray': soc_stingray,
     'thunderx2': soc_thunderx2,
@@ -833,7 +853,7 @@ if update_flags
         if part_number_config.get('force_march', false)
             candidate_march = part_number_config['march']
         else
-            supported_marchs = ['armv9-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
+            supported_marchs = ['armv9-a', 'armv8.7-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
                                 'armv8.2-a', 'armv8.1-a', 'armv8-a']
             check_compiler_support = false
             foreach supported_march: supported_marchs
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH 2/2] eal: add Arm WFET in power management intrinsics
  2024-06-04  4:44 [PATCH 1/2] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
@ 2024-06-04  4:44 ` Wathsala Vithanage
  2024-06-04 15:41   ` Stephen Hemminger
  2024-06-19  6:45   ` [PATCH v2 1/2] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
  2024-07-15 22:53 ` [PATCH v3 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
  2024-07-26 17:15 ` [PATCH v4 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
  2 siblings, 2 replies; 32+ messages in thread
From: Wathsala Vithanage @ 2024-06-04  4:44 UTC (permalink / raw)
  To: Thomas Monjalon, Tyler Retzlaff, Ruifeng Wang
  Cc: dev, nd, Wathsala Vithanage, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna
Wait for event with timeout (WFET) puts the CPU in a low power
mode and stays there until an event is signalled (SEV), loss of
an exclusive monitor or a timeout.
WFET is enabled selectively by checking FEAT_WFxT in Linux
auxiliary vector. If FEAT_WFxT is not available power management
will fallback to WFE.
RTE_ARM_USE_WFE macro is not required to enable WFE feature for
PMD power monitoring.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>
Reviewed-by: Nick Connolly <nick.connolly@arm.com>
Reviewed-by: Vinod Krishna <vinod.krishna@arm.com>
---
 .mailmap                                   |  4 +-
 app/test/test_cpuflags.c                   |  3 ++
 lib/eal/arm/include/rte_cpuflags_64.h      |  1 +
 lib/eal/arm/include/rte_pause_64.h         | 21 ++++++--
 lib/eal/arm/include/rte_power_intrinsics.h |  6 +++
 lib/eal/arm/rte_cpuflags.c                 |  3 +-
 lib/eal/arm/rte_power_intrinsics.c         | 56 ++++++++++++----------
 7 files changed, 62 insertions(+), 32 deletions(-)
diff --git a/.mailmap b/.mailmap
index 3843868716..eefcce4935 100644
--- a/.mailmap
+++ b/.mailmap
@@ -332,6 +332,7 @@ Dexia Li <dexia.li@jaguarmicro.com>
 Dexuan Cui <decui@microsoft.com>
 Dharmik Thakkar <dharmikjayesh.thakkar@arm.com> <dharmik.thakkar@arm.com>
 Dheemanth Mallikarjun <dheemanthm@vmware.com>
+Dhruv Tripathi <dhruv.tripathi@arm.com>
 Diana Wang <na.wang@corigine.com>
 Didier Pallard <didier.pallard@6wind.com>
 Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
@@ -1027,7 +1028,7 @@ Netanel Belgazal <netanel@amazon.com>
 Netanel Gonen <netanelg@mellanox.com>
 Niall Power <niall.power@intel.com>
 Nicholas Pratte <npratte@iol.unh.edu>
-Nick Connolly <nick.connolly@mayadata.io>
+Nick Connolly <nick.connolly@arm.com> <Nick.x.Connolly@gmail.com>
 Nick Nunley <nicholas.d.nunley@intel.com>
 Niclas Storm <niclas.storm@ericsson.com>
 Nicolas Chautru <nicolas.chautru@intel.com>
@@ -1516,6 +1517,7 @@ Vincent Jardin <vincent.jardin@6wind.com>
 Vincent Li <vincent.mc.li@gmail.com>
 Vincent S. Cojot <vcojot@redhat.com>
 Vinh Tran <vinh.t.tran10@gmail.com>
+Vinod Krishna <vinod.krishna@arm.com>
 Vipin Varghese <vipin.varghese@amd.com> <vipin.varghese@intel.com>
 Vipul Ashri <vipul.ashri@oracle.com>
 Visa Hankala <visa@hankala.org>
diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c
index a0ff74720c..22ab4dff0a 100644
--- a/app/test/test_cpuflags.c
+++ b/app/test/test_cpuflags.c
@@ -156,6 +156,9 @@ test_cpuflags(void)
 
 	printf("Check for SVEBF16:\t");
 	CHECK_FOR_FLAG(RTE_CPUFLAG_SVEBF16);
+
+	printf("Check for WFXT:\t");
+	CHECK_FOR_FLAG(RTE_CPUFLAG_WFXT);
 #endif
 
 #if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index afe70209c3..db61539db2 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -35,6 +35,7 @@ enum rte_cpu_flag_t {
 	RTE_CPUFLAG_SVEF32MM,
 	RTE_CPUFLAG_SVEF64MM,
 	RTE_CPUFLAG_SVEBF16,
+	RTE_CPUFLAG_WFXT,
 	RTE_CPUFLAG_AARCH64,
 };
 
diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
index 5cb8b59056..f732407425 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Cavium, Inc
- * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Arm Limited
  */
 
 #ifndef _RTE_PAUSE_ARM64_H_
@@ -23,17 +23,28 @@ static inline void rte_pause(void)
 	asm volatile("yield" ::: "memory");
 }
 
-#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
 
-/* Send a local event to quit WFE. */
+/* Send a local event to quit WFE/WFxT. */
 #define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); }
 
-/* Send a global event to quit WFE for all cores. */
+/* Send a global event to quit WFE/WFxT for all cores. */
 #define __RTE_ARM_SEV() { asm volatile("sev" : : : "memory"); }
 
 /* Put processor into low power WFE(Wait For Event) state. */
 #define __RTE_ARM_WFE() { asm volatile("wfe" : : : "memory"); }
 
+/* Put processor into low power WFET (WFE with Timeout) state. */
+#ifdef RTE_ARM_FEATURE_WFXT
+#define __RTE_ARM_WFET(t) {                              \
+	asm volatile("wfet %x[to]"                        \
+			:                                 \
+			: [to] "r" (t)                    \
+			: "memory");                      \
+	}
+#else
+#define __RTE_ARM_WFET(t) { RTE_SET_USED(t); }
+#endif
+
 /*
  * Atomic exclusive load from addr, it returns the 8-bit content of
  * *addr while making it 'monitored', when it is written by someone
@@ -147,6 +158,8 @@ static inline void rte_pause(void)
 		__RTE_ARM_LOAD_EXC_128(src, dst, memorder) \
 }
 
+#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
+
 static __rte_always_inline void
 rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
 		int memorder)
diff --git a/lib/eal/arm/include/rte_power_intrinsics.h b/lib/eal/arm/include/rte_power_intrinsics.h
index 9e498e9ebf..dc80fca60b 100644
--- a/lib/eal/arm/include/rte_power_intrinsics.h
+++ b/lib/eal/arm/include/rte_power_intrinsics.h
@@ -13,6 +13,12 @@ extern "C" {
 
 #include "generic/rte_power_intrinsics.h"
 
+struct rte_power_monitor_info {
+	uint16_t init_done:1,   /* Initialization status bit */
+		wfet_en:1,	/* FEAT_WFET enabled bit */
+		reserved:14;	/* Reserved */
+};
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
index 7ba4f8ba97..ad76f2448b 100644
--- a/lib/eal/arm/rte_cpuflags.c
+++ b/lib/eal/arm/rte_cpuflags.c
@@ -115,6 +115,7 @@ const struct feature_entry rte_cpu_feature_table[] = {
 	FEAT_DEF(SVEF32MM,	REG_HWCAP2,   10)
 	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
 	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
+	FEAT_DEF(WFXT,		REG_HWCAP2,   31)
 	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
 };
 #endif /* RTE_ARCH */
@@ -163,7 +164,5 @@ void
 rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics)
 {
 	memset(intrinsics, 0, sizeof(*intrinsics));
-#ifdef RTE_ARM_USE_WFE
 	intrinsics->power_monitor = 1;
-#endif
 }
diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c
index f54cf59e80..4c858192f2 100644
--- a/lib/eal/arm/rte_power_intrinsics.c
+++ b/lib/eal/arm/rte_power_intrinsics.c
@@ -4,20 +4,28 @@
 
 #include <errno.h>
 
+#include <rte_per_lcore.h>
+
+#include "rte_cpuflags.h"
 #include "rte_power_intrinsics.h"
 
 /**
- * This function uses WFE instruction to make lcore suspend
+ * Per core rte_power_monitor_info struct.
+ */
+RTE_DEFINE_PER_LCORE(struct rte_power_monitor_info, pm_info) = {
+	.init_done = 0,
+	.wfet_en = 0,
+};
+
+/**
+ * This function uses WFE/WFET instruction to make lcore suspend
  * execution on ARM.
- * Note that timestamp based timeout is not supported yet.
  */
 int
 rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 		const uint64_t tsc_timestamp)
 {
-	RTE_SET_USED(tsc_timestamp);
-
-#ifdef RTE_ARM_USE_WFE
+	struct rte_power_monitor_info *pminfo;
 	const unsigned int lcore_id = rte_lcore_id();
 	uint64_t cur_value;
 
@@ -31,33 +39,37 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	if (pmc->fn == NULL)
 		return -EINVAL;
 
+	pminfo = &RTE_PER_LCORE(pm_info);
+
+	if (unlikely(!(pminfo->init_done))) {
+		if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_WFXT))
+			pminfo->wfet_en = 1;
+		pminfo->init_done = 1;
+	}
+
 	switch (pmc->size) {
 	case sizeof(uint8_t):
-		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint16_t):
-		__RTE_ARM_LOAD_EXC_16(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_16(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint32_t):
-		__RTE_ARM_LOAD_EXC_32(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_32(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint64_t):
-		__RTE_ARM_LOAD_EXC_64(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_64(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	default:
 		return -EINVAL; /* unexpected size */
 	}
 
-	return 0;
-#else
-	RTE_SET_USED(pmc);
+	if (pminfo->wfet_en)
+		__RTE_ARM_WFET(tsc_timestamp)
+	else
+		__RTE_ARM_WFE()
 
-	return -ENOTSUP;
-#endif
+	return 0;
 }
 
 /**
@@ -80,14 +92,8 @@ int
 rte_power_monitor_wakeup(const unsigned int lcore_id)
 {
 	RTE_SET_USED(lcore_id);
-
-#ifdef RTE_ARM_USE_WFE
-	__RTE_ARM_SEV()
-
+	__RTE_ARM_SEV();
 	return 0;
-#else
-	return -ENOTSUP;
-#endif
 }
 
 int
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH 2/2] eal: add Arm WFET in power management intrinsics
  2024-06-04  4:44 ` [PATCH 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
@ 2024-06-04 15:41   ` Stephen Hemminger
  2024-06-19  6:45   ` [PATCH v2 1/2] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
  1 sibling, 0 replies; 32+ messages in thread
From: Stephen Hemminger @ 2024-06-04 15:41 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Thomas Monjalon, Tyler Retzlaff, Ruifeng Wang, dev, nd,
	Dhruv Tripathi, Honnappa Nagarahalli, Jack Bond-Preston,
	Nick Connolly, Vinod Krishna
On Tue,  4 Jun 2024 04:44:01 +0000
Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
> --- a/lib/eal/arm/include/rte_cpuflags_64.h
> +++ b/lib/eal/arm/include/rte_cpuflags_64.h
> @@ -35,6 +35,7 @@ enum rte_cpu_flag_t {
>  	RTE_CPUFLAG_SVEF32MM,
>  	RTE_CPUFLAG_SVEF64MM,
>  	RTE_CPUFLAG_SVEBF16,
> +	RTE_CPUFLAG_WFXT,
>  	RTE_CPUFLAG_AARCH64,
>  };
Adding new entry in middle of enum will cause ABI to change.
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH v2 1/2] config/arm: adds Arm Neoverse N3 SoC
  2024-06-04  4:44 ` [PATCH 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
  2024-06-04 15:41   ` Stephen Hemminger
@ 2024-06-19  6:45   ` Wathsala Vithanage
  2024-06-19  6:45     ` [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
  1 sibling, 1 reply; 32+ messages in thread
From: Wathsala Vithanage @ 2024-06-19  6:45 UTC (permalink / raw)
  To: Ruifeng Wang, Bruce Richardson
  Cc: dev, nd, Wathsala Vithanage, Dhruv Tripathi, Jack Bond-Preston
Add Arm Neoverse N3 part number to build configuration.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
Reviewed-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>
---
 config/arm/meson.build | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/config/arm/meson.build b/config/arm/meson.build
index a45aa9e466..ec7588f055 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -116,6 +116,18 @@ part_number_config_arm = {
             ['RTE_MAX_LCORE', 144],
             ['RTE_MAX_NUMA_NODES', 2]
         ]
+    },
+    '0xd8e': {
+        'march': 'armv8.7-a',
+        'march_features': ['sve2'],
+        'fallback_march': 'armv8.5-a',
+        'flags': [
+            ['RTE_MACHINE', '"neoverse-n3"'],
+            ['RTE_ARM_FEATURE_ATOMICS', true],
+            ['RTE_ARM_FEATURE_WFXT', true],
+            ['RTE_MAX_LCORE', 192],
+            ['RTE_MAX_NUMA_NODES', 2]
+        ]
     }
 }
 implementer_arm = {
@@ -555,6 +567,13 @@ soc_n2 = {
     'numa': false
 }
 
+soc_n3 = {
+    'description': 'Arm Neoverse N3',
+    'implementer': '0x41',
+    'part_number': '0xd8e',
+    'numa': false
+}
+
 soc_odyssey = {
     'description': 'Marvell Odyssey',
     'implementer': '0x41',
@@ -680,6 +699,7 @@ socs = {
     'kunpeng930': soc_kunpeng930,
     'n1sdp': soc_n1sdp,
     'n2': soc_n2,
+    'n3': soc_n3,
     'odyssey' : soc_odyssey,
     'stingray': soc_stingray,
     'thunderx2': soc_thunderx2,
@@ -833,7 +853,7 @@ if update_flags
         if part_number_config.get('force_march', false)
             candidate_march = part_number_config['march']
         else
-            supported_marchs = ['armv9-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
+            supported_marchs = ['armv9-a', 'armv8.7-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
                                 'armv8.2-a', 'armv8.1-a', 'armv8-a']
             check_compiler_support = false
             foreach supported_march: supported_marchs
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-06-19  6:45   ` [PATCH v2 1/2] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
@ 2024-06-19  6:45     ` Wathsala Vithanage
  2024-06-27 15:30       ` Thomas Monjalon
  0 siblings, 1 reply; 32+ messages in thread
From: Wathsala Vithanage @ 2024-06-19  6:45 UTC (permalink / raw)
  To: Thomas Monjalon, Tyler Retzlaff, Ruifeng Wang
  Cc: dev, nd, Wathsala Vithanage, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna
Wait for event with timeout (WFET) puts the CPU in a low power
mode and stays there until an event is signalled (SEV), loss of
an exclusive monitor or a timeout.
WFET is enabled selectively by checking FEAT_WFxT in Linux
auxiliary vector. If FEAT_WFxT is not available power management
will fallback to WFE.
RTE_ARM_USE_WFE macro is not required to enable WFE feature for
PMD power monitoring.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>
Reviewed-by: Nick Connolly <nick.connolly@arm.com>
Reviewed-by: Vinod Krishna <vinod.krishna@arm.com>
---
 .mailmap                              |  2 ++
 app/test/test_cpuflags.c              |  3 ++
 lib/eal/arm/include/rte_cpuflags_64.h |  1 +
 lib/eal/arm/include/rte_pause_64.h    | 21 ++++++++++---
 lib/eal/arm/rte_cpuflags.c            |  3 +-
 lib/eal/arm/rte_power_intrinsics.c    | 45 +++++++++++++++++----------
 6 files changed, 52 insertions(+), 23 deletions(-)
diff --git a/.mailmap b/.mailmap
index 3843868716..31995d492d 100644
--- a/.mailmap
+++ b/.mailmap
@@ -332,6 +332,7 @@ Dexia Li <dexia.li@jaguarmicro.com>
 Dexuan Cui <decui@microsoft.com>
 Dharmik Thakkar <dharmikjayesh.thakkar@arm.com> <dharmik.thakkar@arm.com>
 Dheemanth Mallikarjun <dheemanthm@vmware.com>
+Dhruv Tripathi <dhruv.tripathi@arm.com>
 Diana Wang <na.wang@corigine.com>
 Didier Pallard <didier.pallard@6wind.com>
 Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
@@ -1516,6 +1517,7 @@ Vincent Jardin <vincent.jardin@6wind.com>
 Vincent Li <vincent.mc.li@gmail.com>
 Vincent S. Cojot <vcojot@redhat.com>
 Vinh Tran <vinh.t.tran10@gmail.com>
+Vinod Krishna <vinod.krishna@arm.com>
 Vipin Varghese <vipin.varghese@amd.com> <vipin.varghese@intel.com>
 Vipul Ashri <vipul.ashri@oracle.com>
 Visa Hankala <visa@hankala.org>
diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c
index a0ff74720c..22ab4dff0a 100644
--- a/app/test/test_cpuflags.c
+++ b/app/test/test_cpuflags.c
@@ -156,6 +156,9 @@ test_cpuflags(void)
 
 	printf("Check for SVEBF16:\t");
 	CHECK_FOR_FLAG(RTE_CPUFLAG_SVEBF16);
+
+	printf("Check for WFXT:\t");
+	CHECK_FOR_FLAG(RTE_CPUFLAG_WFXT);
 #endif
 
 #if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index afe70209c3..1945a97ca1 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -36,6 +36,7 @@ enum rte_cpu_flag_t {
 	RTE_CPUFLAG_SVEF64MM,
 	RTE_CPUFLAG_SVEBF16,
 	RTE_CPUFLAG_AARCH64,
+	RTE_CPUFLAG_WFXT,
 };
 
 #include "generic/rte_cpuflags.h"
diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
index 5cb8b59056..f732407425 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -1,6 +1,6 @@
 /* SPDX-License-Identifier: BSD-3-Clause
  * Copyright(c) 2017 Cavium, Inc
- * Copyright(c) 2019 Arm Limited
+ * Copyright(c) 2024 Arm Limited
  */
 
 #ifndef _RTE_PAUSE_ARM64_H_
@@ -23,17 +23,28 @@ static inline void rte_pause(void)
 	asm volatile("yield" ::: "memory");
 }
 
-#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
 
-/* Send a local event to quit WFE. */
+/* Send a local event to quit WFE/WFxT. */
 #define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); }
 
-/* Send a global event to quit WFE for all cores. */
+/* Send a global event to quit WFE/WFxT for all cores. */
 #define __RTE_ARM_SEV() { asm volatile("sev" : : : "memory"); }
 
 /* Put processor into low power WFE(Wait For Event) state. */
 #define __RTE_ARM_WFE() { asm volatile("wfe" : : : "memory"); }
 
+/* Put processor into low power WFET (WFE with Timeout) state. */
+#ifdef RTE_ARM_FEATURE_WFXT
+#define __RTE_ARM_WFET(t) {                              \
+	asm volatile("wfet %x[to]"                        \
+			:                                 \
+			: [to] "r" (t)                    \
+			: "memory");                      \
+	}
+#else
+#define __RTE_ARM_WFET(t) { RTE_SET_USED(t); }
+#endif
+
 /*
  * Atomic exclusive load from addr, it returns the 8-bit content of
  * *addr while making it 'monitored', when it is written by someone
@@ -147,6 +158,8 @@ static inline void rte_pause(void)
 		__RTE_ARM_LOAD_EXC_128(src, dst, memorder) \
 }
 
+#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
+
 static __rte_always_inline void
 rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
 		int memorder)
diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
index 7ba4f8ba97..ad76f2448b 100644
--- a/lib/eal/arm/rte_cpuflags.c
+++ b/lib/eal/arm/rte_cpuflags.c
@@ -115,6 +115,7 @@ const struct feature_entry rte_cpu_feature_table[] = {
 	FEAT_DEF(SVEF32MM,	REG_HWCAP2,   10)
 	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
 	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
+	FEAT_DEF(WFXT,		REG_HWCAP2,   31)
 	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
 };
 #endif /* RTE_ARCH */
@@ -163,7 +164,5 @@ void
 rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics)
 {
 	memset(intrinsics, 0, sizeof(*intrinsics));
-#ifdef RTE_ARM_USE_WFE
 	intrinsics->power_monitor = 1;
-#endif
 }
diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c
index f54cf59e80..fc7a0c61f0 100644
--- a/lib/eal/arm/rte_power_intrinsics.c
+++ b/lib/eal/arm/rte_power_intrinsics.c
@@ -4,20 +4,31 @@
 
 #include <errno.h>
 
+#include "rte_cpuflags.h"
 #include "rte_power_intrinsics.h"
 
 /**
- * This function uses WFE instruction to make lcore suspend
+ *  Set wfet_en if WFET is supported
+ */
+uint8_t wfet_en;
+
+RTE_INIT(rte_power_intrinsics_init)
+{
+#ifdef RTE_ARCH_64
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_WFXT))
+		wfet_en = 1;
+#endif
+}
+
+/**
+ * This function uses WFE/WFET instruction to make lcore suspend
  * execution on ARM.
- * Note that timestamp based timeout is not supported yet.
  */
 int
 rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 		const uint64_t tsc_timestamp)
 {
-	RTE_SET_USED(tsc_timestamp);
-
-#ifdef RTE_ARM_USE_WFE
+#ifdef RTE_ARCH_64
 	const unsigned int lcore_id = rte_lcore_id();
 	uint64_t cur_value;
 
@@ -33,28 +44,30 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 
 	switch (pmc->size) {
 	case sizeof(uint8_t):
-		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint16_t):
-		__RTE_ARM_LOAD_EXC_16(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_16(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint32_t):
-		__RTE_ARM_LOAD_EXC_32(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_32(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint64_t):
-		__RTE_ARM_LOAD_EXC_64(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_64(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	default:
 		return -EINVAL; /* unexpected size */
 	}
 
+	if (wfet_en)
+		__RTE_ARM_WFET(tsc_timestamp)
+	else
+		__RTE_ARM_WFE()
+
 	return 0;
 #else
 	RTE_SET_USED(pmc);
+	RTE_SET_USED(tsc_timestamp);
 
 	return -ENOTSUP;
 #endif
@@ -80,10 +93,8 @@ int
 rte_power_monitor_wakeup(const unsigned int lcore_id)
 {
 	RTE_SET_USED(lcore_id);
-
-#ifdef RTE_ARM_USE_WFE
-	__RTE_ARM_SEV()
-
+#ifdef RTE_ARCH_64
+	__RTE_ARM_SEV();
 	return 0;
 #else
 	return -ENOTSUP;
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-06-19  6:45     ` [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
@ 2024-06-27 15:30       ` Thomas Monjalon
  2024-07-01 21:34         ` Wathsala Wathawana Vithanage
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Monjalon @ 2024-06-27 15:30 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand
19/06/2024 08:45, Wathsala Vithanage:
> --- a/lib/eal/arm/include/rte_cpuflags_64.h
> +++ b/lib/eal/arm/include/rte_cpuflags_64.h
> @@ -36,6 +36,7 @@ enum rte_cpu_flag_t {
>  	RTE_CPUFLAG_SVEF64MM,
>  	RTE_CPUFLAG_SVEBF16,
>  	RTE_CPUFLAG_AARCH64,
> +	RTE_CPUFLAG_WFXT,
>  };
It may be useful to add comments explaining each flag.
May be a separate patch in this series?
> - * Copyright(c) 2019 Arm Limited
> + * Copyright(c) 2024 Arm Limited
No, it's wrong to remove initial date,
and no, you don't need to update dates at all.
> -#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
Why removing this #ifdef?
> -/* Send a local event to quit WFE. */
> +/* Send a local event to quit WFE/WFxT. */
>  #define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); }
>  
> -/* Send a global event to quit WFE for all cores. */
> +/* Send a global event to quit WFE/WFxT for all cores. */
>  #define __RTE_ARM_SEV() { asm volatile("sev" : : : "memory"); }
>  
>  /* Put processor into low power WFE(Wait For Event) state. */
>  #define __RTE_ARM_WFE() { asm volatile("wfe" : : : "memory"); }
>  
> +/* Put processor into low power WFET (WFE with Timeout) state. */
> +#ifdef RTE_ARM_FEATURE_WFXT
> +#define __RTE_ARM_WFET(t) {                              \
> +	asm volatile("wfet %x[to]"                        \
> +			:                                 \
> +			: [to] "r" (t)                    \
> +			: "memory");                      \
> +	}
Is there any intrinsic function available?
[...]
> --- a/lib/eal/arm/rte_cpuflags.c
> +++ b/lib/eal/arm/rte_cpuflags.c
> @@ -115,6 +115,7 @@ const struct feature_entry rte_cpu_feature_table[] = {
>  	FEAT_DEF(SVEF32MM,	REG_HWCAP2,   10)
>  	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
>  	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
> +	FEAT_DEF(WFXT,		REG_HWCAP2,   31)
>  	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
Are you sure of alignment? (looks wrong in my email client)
[...]
>  rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics)
>  {
>  	memset(intrinsics, 0, sizeof(*intrinsics));
> -#ifdef RTE_ARM_USE_WFE
>  	intrinsics->power_monitor = 1;
> -#endif
Why removing this #ifdef?
> +uint8_t wfet_en;
It should be made static probably.
This variable will be unused in some cases, needs #ifdef.
> +
> +RTE_INIT(rte_power_intrinsics_init)
> +{
> +#ifdef RTE_ARCH_64
> +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_WFXT))
> +		wfet_en = 1;
> +#endif
> +}
> +
> +/**
> + * This function uses WFE/WFET instruction to make lcore suspend
>   * execution on ARM.
> - * Note that timestamp based timeout is not supported yet.
>   */
>  int
>  rte_power_monitor(const struct rte_power_monitor_cond *pmc,
>  		const uint64_t tsc_timestamp)
>  {
> -	RTE_SET_USED(tsc_timestamp);
> -
> -#ifdef RTE_ARM_USE_WFE
> +#ifdef RTE_ARCH_64
It looks wrong.
If RTE_ARM_USE_WFE is disabled, you should not call __RTE_ARM_WFE().
>  	const unsigned int lcore_id = rte_lcore_id();
>  	uint64_t cur_value;
>  
> @@ -33,28 +44,30 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
>  
>  	switch (pmc->size) {
>  	case sizeof(uint8_t):
> -		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value, rte_memory_order_relaxed)
> -		__RTE_ARM_WFE()
> +		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value, rte_memory_order_relaxed);
^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-06-27 15:30       ` Thomas Monjalon
@ 2024-07-01 21:34         ` Wathsala Wathawana Vithanage
  2024-07-02  8:29           ` Thomas Monjalon
  0 siblings, 1 reply; 32+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-07-01 21:34 UTC (permalink / raw)
  To: thomas
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand, nd
> 19/06/2024 08:45, Wathsala Vithanage:
> > --- a/lib/eal/arm/include/rte_cpuflags_64.h
> > +++ b/lib/eal/arm/include/rte_cpuflags_64.h
> > @@ -36,6 +36,7 @@ enum rte_cpu_flag_t {
> >  	RTE_CPUFLAG_SVEF64MM,
> >  	RTE_CPUFLAG_SVEBF16,
> >  	RTE_CPUFLAG_AARCH64,
> > +	RTE_CPUFLAG_WFXT,
> >  };
> 
> It may be useful to add comments explaining each flag.
> May be a separate patch in this series?
> 
+1
> 
> > - * Copyright(c) 2019 Arm Limited
> > + * Copyright(c) 2024 Arm Limited
> 
> No, it's wrong to remove initial date,
> and no, you don't need to update dates at all.
> 
> 
> > -#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
> 
> Why removing this #ifdef?
It's not removed, it's moved further down and just above rte_wait_until_equal_X functions.
Use of SEV, and WFE are not limited to  rte_wait_until_equal_X functions, 
PMDs should be able to use them for power management. 
> 
> > -/* Send a local event to quit WFE. */
> > +/* Send a local event to quit WFE/WFxT. */
> >  #define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); }
> >
> > -/* Send a global event to quit WFE for all cores. */
> > +/* Send a global event to quit WFE/WFxT for all cores. */
> >  #define __RTE_ARM_SEV() { asm volatile("sev" : : : "memory"); }
> >
> >  /* Put processor into low power WFE(Wait For Event) state. */
> > #define __RTE_ARM_WFE() { asm volatile("wfe" : : : "memory"); }
> >
> > +/* Put processor into low power WFET (WFE with Timeout) state. */
> > +#ifdef RTE_ARM_FEATURE_WFXT
> > +#define __RTE_ARM_WFET(t) {                              \
> > +	asm volatile("wfet %x[to]"                        \
> > +			:                                 \
> > +			: [to] "r" (t)                    \
> > +			: "memory");                      \
> > +	}
> 
> Is there any intrinsic function available?
> 
We don't have an intrinsic at the moment.
> 
> [...]
> > --- a/lib/eal/arm/rte_cpuflags.c
> > +++ b/lib/eal/arm/rte_cpuflags.c
> > @@ -115,6 +115,7 @@ const struct feature_entry rte_cpu_feature_table[] =
> {
> >  	FEAT_DEF(SVEF32MM,	REG_HWCAP2,   10)
> >  	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
> >  	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
> > +	FEAT_DEF(WFXT,		REG_HWCAP2,   31)
> >  	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
> 
> Are you sure of alignment? (looks wrong in my email client)
Didn't see this before, I will check.
> 
> 
> [...]
> >  rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics)
> > {
> >  	memset(intrinsics, 0, sizeof(*intrinsics)); -#ifdef RTE_ARM_USE_WFE
> >  	intrinsics->power_monitor = 1;
> > -#endif
> 
> Why removing this #ifdef?
WFE is available in all the Arm platforms DPDK currently supports. Therefore, an explicit flag is not
required at build time. 
The purpose of RTE_ARM_USE_WFE seems to be controlling the use of WFE in rte_wait_until_equal
functions by defining RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED macro  only when RTE_ARM_USE_WFE
is defined. (eal/arm/include/rte_pause_64.h:15)
From what I'm hearing this was done due to a performance issue rte_wait_until_equal_X functions had
when using WFE.
However, that design is flawed because it made other users of WFE dependent on the choice of
the use of WFE in rte_wait_until_equal functions as __RTE_ARM_WFE was wrapped in an
RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED #ifdef block.
This patch fixes this issue by moving __RTE_ARM_WFE out of RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
block.
Perhaps we should change RTE_ARM_USE_WFE to something like 
RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL ?
> 
> 
> > +uint8_t wfet_en;
> 
> It should be made static probably.
> This variable will be unused in some cases, needs #ifdef.
> 
This variable is used in all cases. It's 1 when WFET is available, 0 when it's not.
> > +
> > +RTE_INIT(rte_power_intrinsics_init)
> > +{
> > +#ifdef RTE_ARCH_64
> > +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_WFXT))
> > +		wfet_en = 1;
> > +#endif
> > +}
> > +
> > +/**
> > + * This function uses WFE/WFET instruction to make lcore suspend
> >   * execution on ARM.
> > - * Note that timestamp based timeout is not supported yet.
> >   */
> >  int
> >  rte_power_monitor(const struct rte_power_monitor_cond *pmc,
> >  		const uint64_t tsc_timestamp)
> >  {
> > -	RTE_SET_USED(tsc_timestamp);
> > -
> > -#ifdef RTE_ARM_USE_WFE
> > +#ifdef RTE_ARCH_64
> 
> It looks wrong.
> If RTE_ARM_USE_WFE is disabled, you should not call __RTE_ARM_WFE().
> 
RTE_ARM_USE_WFE should be renamed to reflect its actual use. It's safe to assume that
WFE is available universally in Arm DPDK context.
> >  	const unsigned int lcore_id = rte_lcore_id();
> >  	uint64_t cur_value;
> >
> > @@ -33,28 +44,30 @@ rte_power_monitor(const struct
> > rte_power_monitor_cond *pmc,
> >
> >  	switch (pmc->size) {
> >  	case sizeof(uint8_t):
> > -		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value,
> rte_memory_order_relaxed)
> > -		__RTE_ARM_WFE()
> > +		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value,
> > +rte_memory_order_relaxed);
> 
> 
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-01 21:34         ` Wathsala Wathawana Vithanage
@ 2024-07-02  8:29           ` Thomas Monjalon
  2024-07-03 13:27             ` Wathsala Wathawana Vithanage
  2024-07-03 16:19             ` Wathsala Wathawana Vithanage
  0 siblings, 2 replies; 32+ messages in thread
From: Thomas Monjalon @ 2024-07-02  8:29 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand, nd
01/07/2024 23:34, Wathsala Wathawana Vithanage:
> > 19/06/2024 08:45, Wathsala Vithanage:
> > >  rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics)
> > > {
> > >  	memset(intrinsics, 0, sizeof(*intrinsics)); -#ifdef RTE_ARM_USE_WFE
> > >  	intrinsics->power_monitor = 1;
> > > -#endif
> > 
> > Why removing this #ifdef?
> 
> WFE is available in all the Arm platforms DPDK currently supports. Therefore, an explicit flag is not
> required at build time. 
> 
> The purpose of RTE_ARM_USE_WFE seems to be controlling the use of WFE in rte_wait_until_equal
> functions by defining RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED macro  only when RTE_ARM_USE_WFE
> is defined. (eal/arm/include/rte_pause_64.h:15)
> From what I'm hearing this was done due to a performance issue rte_wait_until_equal_X functions had
> when using WFE.
> However, that design is flawed because it made other users of WFE dependent on the choice of
> the use of WFE in rte_wait_until_equal functions as __RTE_ARM_WFE was wrapped in an
> RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED #ifdef block.
> This patch fixes this issue by moving __RTE_ARM_WFE out of RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
> block.
> 
> Perhaps we should change RTE_ARM_USE_WFE to something like 
> RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL ?
Yes perhaps.
And more importantly, you should do such change in a separate patch.
Don't hide it in WFET patch.
> > > +uint8_t wfet_en;
> > 
> > It should be made static probably.
> > This variable will be unused in some cases, needs #ifdef.
> > 
> 
> This variable is used in all cases. It's 1 when WFET is available, 0 when it's not.
It would be 0 if you make it static yes.
> > > +
> > > +RTE_INIT(rte_power_intrinsics_init)
> > > +{
> > > +#ifdef RTE_ARCH_64
> > > +	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_WFXT))
> > > +		wfet_en = 1;
> > > +#endif
> > > +}
> > > +
> > > +/**
> > > + * This function uses WFE/WFET instruction to make lcore suspend
> > >   * execution on ARM.
> > > - * Note that timestamp based timeout is not supported yet.
> > >   */
> > >  int
> > >  rte_power_monitor(const struct rte_power_monitor_cond *pmc,
> > >  		const uint64_t tsc_timestamp)
> > >  {
> > > -	RTE_SET_USED(tsc_timestamp);
> > > -
> > > -#ifdef RTE_ARM_USE_WFE
> > > +#ifdef RTE_ARCH_64
> > 
> > It looks wrong.
> > If RTE_ARM_USE_WFE is disabled, you should not call __RTE_ARM_WFE().
> > 
> 
> RTE_ARM_USE_WFE should be renamed to reflect its actual use. It's safe to assume that
> WFE is available universally in Arm DPDK context.
^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-02  8:29           ` Thomas Monjalon
@ 2024-07-03 13:27             ` Wathsala Wathawana Vithanage
  2024-07-03 13:33               ` Thomas Monjalon
  2024-07-03 16:19             ` Wathsala Wathawana Vithanage
  1 sibling, 1 reply; 32+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-07-03 13:27 UTC (permalink / raw)
  To: thomas
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand, nd, nd
> RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED #ifdef block.
> > This patch fixes this issue by moving __RTE_ARM_WFE out of
> > RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED block.
> >
> > Perhaps we should change RTE_ARM_USE_WFE to something like
> > RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL ?
> 
> Yes perhaps.
RTE_ARM_USE_WFE is already used in drivers/event/cnxk/cn10k_worker.h
therefore RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL is not suitable.
I wouldn't mind keeping RTE_ARM_USE_WFE because "USE_WFE" sounds like an
instruction to use WFE rather than an indication of availability of the WFE instruction. 
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-03 13:27             ` Wathsala Wathawana Vithanage
@ 2024-07-03 13:33               ` Thomas Monjalon
  2024-07-03 16:58                 ` Wathsala Wathawana Vithanage
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Monjalon @ 2024-07-03 13:33 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand
03/07/2024 15:27, Wathsala Wathawana Vithanage:
> > RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED #ifdef block.
> > > This patch fixes this issue by moving __RTE_ARM_WFE out of
> > > RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED block.
> > >
> > > Perhaps we should change RTE_ARM_USE_WFE to something like
> > > RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL ?
> > 
> > Yes perhaps.
> RTE_ARM_USE_WFE is already used in drivers/event/cnxk/cn10k_worker.h
> therefore RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL is not suitable.
> I wouldn't mind keeping RTE_ARM_USE_WFE because "USE_WFE" sounds like an
> instruction to use WFE rather than an indication of availability of the WFE instruction. 
The problem is that the definition of this flag is not clear.
What is it doing?
If it's really disabling WFE, keep the #ifdef to not use it.
For now, it is a nack of this patch for all reasons described before.
^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-02  8:29           ` Thomas Monjalon
  2024-07-03 13:27             ` Wathsala Wathawana Vithanage
@ 2024-07-03 16:19             ` Wathsala Wathawana Vithanage
  1 sibling, 0 replies; 32+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-07-03 16:19 UTC (permalink / raw)
  To: thomas
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand, nd, nd
> > > > +uint8_t wfet_en;
> > >
> > > It should be made static probably.
> > > This variable will be unused in some cases, needs #ifdef.
> > >
> >
> > This variable is used in all cases. It's 1 when WFET is available, 0 when it's not.
> 
> It would be 0 if you make it static yes.
No need to make it static to set it to 0, it will be placed in BSS section as is hence 
will always be initialized to 0.
Static only limits its scope to this file, has nothing to do with initialization to 0.
^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-03 13:33               ` Thomas Monjalon
@ 2024-07-03 16:58                 ` Wathsala Wathawana Vithanage
  2024-07-04 10:55                   ` Pavan Nikhilesh Bhagavatula
  0 siblings, 1 reply; 32+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-07-03 16:58 UTC (permalink / raw)
  To: thomas, Pavan Nikhilesh Bhagavatula
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand, nd
> 03/07/2024 15:27, Wathsala Wathawana Vithanage:
> > > RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED #ifdef block.
> > > > This patch fixes this issue by moving __RTE_ARM_WFE out of
> > > > RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED block.
> > > >
> > > > Perhaps we should change RTE_ARM_USE_WFE to something like
> > > > RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL ?
> > >
> > > Yes perhaps.
> > RTE_ARM_USE_WFE is already used in drivers/event/cnxk/cn10k_worker.h
> > therefore RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL is not suitable.
> > I wouldn't mind keeping RTE_ARM_USE_WFE because "USE_WFE" sounds
> like
> > an instruction to use WFE rather than an indication of availability of the WFE
> instruction.
> 
> The problem is that the definition of this flag is not clear.
> What is it doing?
> If it's really disabling WFE, keep the #ifdef to not use it.
> 
> For now, it is a nack of this patch for all reasons described before.
> 
Only other place where this flag is used is drivers/event/cnxk/cn10k_worker.h
b8dbcbe8a57 (Pavan Nikhilesh      2024-02-27 13:41:53 +0530 284) #if defined(RTE_ARM_USE_WFE)
Let’s ask Pavan why this flag is used in cn10k driver. 
From our perspective, WFE is available on all the supported arm platforms in DPDK.
Therefore, RTE_ARM_USE_WFE should be treated as a flag to choose between WFE
and non-WFE code paths due to performance reasons rather than as a flag that indicates
the availability of the instruction on the target CPU.
 
 
^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-03 16:58                 ` Wathsala Wathawana Vithanage
@ 2024-07-04 10:55                   ` Pavan Nikhilesh Bhagavatula
  2024-07-04 14:14                     ` Thomas Monjalon
  2024-07-05 16:01                     ` Wathsala Wathawana Vithanage
  0 siblings, 2 replies; 32+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2024-07-04 10:55 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage, thomas
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand, nd
> > 03/07/2024 15:27, Wathsala Wathawana Vithanage:
> > > > RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED #ifdef block.
> > > > > This patch fixes this issue by moving __RTE_ARM_WFE out of
> > > > > RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED block.
> > > > >
> > > > > Perhaps we should change RTE_ARM_USE_WFE to something like
> > > > > RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL ?
> > > >
> > > > Yes perhaps.
> > > RTE_ARM_USE_WFE is already used in drivers/event/cnxk/cn10k_worker.h
> > > therefore RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL is not suitable.
> > > I wouldn't mind keeping RTE_ARM_USE_WFE because "USE_WFE" sounds
> > like
> > > an instruction to use WFE rather than an indication of availability of the
> WFE
> > instruction.
> >
> > The problem is that the definition of this flag is not clear.
> > What is it doing?
> > If it's really disabling WFE, keep the #ifdef to not use it.
> >
> > For now, it is a nack of this patch for all reasons described before.
> >
> 
> Only other place where this flag is used is drivers/event/cnxk/cn10k_worker.h
> 
> b8dbcbe8a57 (Pavan Nikhilesh      2024-02-27 13:41:53 +0530 284) #if
> defined(RTE_ARM_USE_WFE)
> 
> Let’s ask Pavan why this flag is used in cn10k driver.
> 
> From our perspective, WFE is available on all the supported arm platforms in
> DPDK.
> Therefore, RTE_ARM_USE_WFE should be treated as a flag to choose between
> WFE
> and non-WFE code paths due to performance reasons rather than as a flag
> that indicates
> the availability of the instruction on the target CPU.
> 
We are using this flag to allow application to choose between WFE and non-WFE code path.
The non-WFE path performs slightly better.
> 
> 
> 
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-04 10:55                   ` Pavan Nikhilesh Bhagavatula
@ 2024-07-04 14:14                     ` Thomas Monjalon
  2024-07-04 14:55                       ` Stephen Hemminger
  2024-07-05 16:01                     ` Wathsala Wathawana Vithanage
  1 sibling, 1 reply; 32+ messages in thread
From: Thomas Monjalon @ 2024-07-04 14:14 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula
  Cc: Wathsala Wathawana Vithanage, Tyler Retzlaff, Ruifeng Wang, dev,
	nd, Dhruv Tripathi, Honnappa Nagarahalli, Jack Bond-Preston,
	Nick Connolly, Vinod Krishna, david.marchand, nd
04/07/2024 12:55, Pavan Nikhilesh Bhagavatula:
> > > 03/07/2024 15:27, Wathsala Wathawana Vithanage:
> > > > > RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED #ifdef block.
> > > > > > This patch fixes this issue by moving __RTE_ARM_WFE out of
> > > > > > RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED block.
> > > > > >
> > > > > > Perhaps we should change RTE_ARM_USE_WFE to something like
> > > > > > RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL ?
> > > > >
> > > > > Yes perhaps.
> > > > RTE_ARM_USE_WFE is already used in drivers/event/cnxk/cn10k_worker.h
> > > > therefore RTE_ARM_USE_WFE_IN_WAIT_UNTIL_EQUAL is not suitable.
> > > > I wouldn't mind keeping RTE_ARM_USE_WFE because "USE_WFE" sounds
> > > like
> > > > an instruction to use WFE rather than an indication of availability of the
> > WFE
> > > instruction.
> > >
> > > The problem is that the definition of this flag is not clear.
> > > What is it doing?
> > > If it's really disabling WFE, keep the #ifdef to not use it.
> > >
> > > For now, it is a nack of this patch for all reasons described before.
> > >
> > 
> > Only other place where this flag is used is drivers/event/cnxk/cn10k_worker.h
> > 
> > b8dbcbe8a57 (Pavan Nikhilesh      2024-02-27 13:41:53 +0530 284) #if
> > defined(RTE_ARM_USE_WFE)
> > 
> > Let’s ask Pavan why this flag is used in cn10k driver.
> > 
> > From our perspective, WFE is available on all the supported arm platforms in
> > DPDK.
> > Therefore, RTE_ARM_USE_WFE should be treated as a flag to choose between
> > WFE
> > and non-WFE code paths due to performance reasons rather than as a flag
> > that indicates
> > the availability of the instruction on the target CPU.
> > 
> 
> We are using this flag to allow application to choose between WFE and non-WFE code path.
> The non-WFE path performs slightly better.
What's the benefit of the WFE path then?
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-04 14:14                     ` Thomas Monjalon
@ 2024-07-04 14:55                       ` Stephen Hemminger
  2024-07-04 18:59                         ` Thomas Monjalon
  0 siblings, 1 reply; 32+ messages in thread
From: Stephen Hemminger @ 2024-07-04 14:55 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: Pavan Nikhilesh Bhagavatula, Wathsala Wathawana Vithanage,
	Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand
On Thu, 04 Jul 2024 16:14:42 +0200
Thomas Monjalon <thomas@monjalon.net> wrote:
> > > Let’s ask Pavan why this flag is used in cn10k driver.
> > > 
> > > From our perspective, WFE is available on all the supported arm platforms in
> > > DPDK.
> > > Therefore, RTE_ARM_USE_WFE should be treated as a flag to choose between
> > > WFE
> > > and non-WFE code paths due to performance reasons rather than as a flag
> > > that indicates
> > > the availability of the instruction on the target CPU.
> > >   
> > 
> > We are using this flag to allow application to choose between WFE and non-WFE code path.
> > The non-WFE path performs slightly better.  
> 
> What's the benefit of the WFE path then?
WFE saves power at the expense of latency.
Maybe some form of hybrid approach would work best and could
be always used.
For example, many implementations of mutex do a short spin poll
then fall back to a waiting primitive (like futex).
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-04 14:55                       ` Stephen Hemminger
@ 2024-07-04 18:59                         ` Thomas Monjalon
  2024-07-05 16:10                           ` [EXTERNAL] " Pavan Nikhilesh Bhagavatula
  0 siblings, 1 reply; 32+ messages in thread
From: Thomas Monjalon @ 2024-07-04 18:59 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Pavan Nikhilesh Bhagavatula, Wathsala Wathawana Vithanage,
	Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand
04/07/2024 16:55, Stephen Hemminger:
> On Thu, 04 Jul 2024 16:14:42 +0200
> Thomas Monjalon <thomas@monjalon.net> wrote:
> 
> > > > Let’s ask Pavan why this flag is used in cn10k driver.
> > > > 
> > > > From our perspective, WFE is available on all the supported arm platforms in
> > > > DPDK.
> > > > Therefore, RTE_ARM_USE_WFE should be treated as a flag to choose between
> > > > WFE
> > > > and non-WFE code paths due to performance reasons rather than as a flag
> > > > that indicates
> > > > the availability of the instruction on the target CPU.
> > > >   
> > > 
> > > We are using this flag to allow application to choose between WFE and non-WFE code path.
> > > The non-WFE path performs slightly better.  
> > 
> > What's the benefit of the WFE path then?
> 
> WFE saves power at the expense of latency.
Yes maybe there is a misunderstanding.
Pavan can you confirm you were saying "throughput is better on non-WFE"?
but "power consumption is lower on WFE path"?
> Maybe some form of hybrid approach would work best and could
> be always used.
> 
> For example, many implementations of mutex do a short spin poll
> then fall back to a waiting primitive (like futex).
^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-04 10:55                   ` Pavan Nikhilesh Bhagavatula
  2024-07-04 14:14                     ` Thomas Monjalon
@ 2024-07-05 16:01                     ` Wathsala Wathawana Vithanage
  2024-07-05 16:11                       ` Pavan Nikhilesh Bhagavatula
  1 sibling, 1 reply; 32+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-07-05 16:01 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, thomas
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand, nd, nd
> 
> We are using this flag to allow application to choose between WFE and non-
> WFE code path.
> The non-WFE path performs slightly better.
> 
Is it correct if I assume that this flag is not used to indicate the availability 
of WFE instruction in any of your platforms?
If so, then its usage is to choose between WFE and non-WFE implementations
for performance reasons.
^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: [EXTERNAL] Re: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-04 18:59                         ` Thomas Monjalon
@ 2024-07-05 16:10                           ` Pavan Nikhilesh Bhagavatula
  2024-07-07 17:37                             ` [EXTERNAL] " Honnappa Nagarahalli
  0 siblings, 1 reply; 32+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2024-07-05 16:10 UTC (permalink / raw)
  To: Thomas Monjalon, Stephen Hemminger
  Cc: Wathsala Wathawana Vithanage, Tyler Retzlaff, Ruifeng Wang, dev,
	nd, Dhruv Tripathi, Honnappa Nagarahalli, Jack Bond-Preston,
	Nick Connolly, Vinod Krishna, david.marchand
> 04/07/2024 16:55, Stephen Hemminger:
> > On Thu, 04 Jul 2024 16:14:42 +0200
> > Thomas Monjalon <thomas@monjalon.net> wrote:
> >
> > > > > Let’s ask Pavan why this flag is used in cn10k driver.
> > > > >
> > > > > From our perspective, WFE is available on all the supported arm
> platforms in
> > > > > DPDK.
> > > > > Therefore, RTE_ARM_USE_WFE should be treated as a flag to choose
> between
> > > > > WFE
> > > > > and non-WFE code paths due to performance reasons rather than as a
> flag
> > > > > that indicates
> > > > > the availability of the instruction on the target CPU.
> > > > >
> > > >
> > > > We are using this flag to allow application to choose between WFE and
> non-WFE code path.
> > > > The non-WFE path performs slightly better.
> > >
> > > What's the benefit of the WFE path then?
> >
> > WFE saves power at the expense of latency.
> 
> Yes maybe there is a misunderstanding.
> Pavan can you confirm you were saying "throughput is better on non-WFE"?
> but "power consumption is lower on WFE path"?
> 
Yes, throughput is better on non-WFE and power consumption is lower on WFE path.
But the statement cant be generalized for all use-cases, it depends on lot of factors.
So, we use RTE_ARM_USE_WFE to allow applications to decide what they want. 
> > Maybe some form of hybrid approach would work best and could
> > be always used.
> >
> > For example, many implementations of mutex do a short spin poll
> > then fall back to a waiting primitive (like futex).
This is already done across cnxk drivers and common layer I believe.
> 
^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-05 16:01                     ` Wathsala Wathawana Vithanage
@ 2024-07-05 16:11                       ` Pavan Nikhilesh Bhagavatula
  2024-07-05 16:25                         ` Wathsala Wathawana Vithanage
  0 siblings, 1 reply; 32+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2024-07-05 16:11 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage, thomas
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand, nd, nd
> -----Original Message-----
> From: Wathsala Wathawana Vithanage <wathsala.vithanage@arm.com>
> Sent: Friday, July 5, 2024 9:32 PM
> To: Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com>;
> thomas@monjalon.net
> Cc: Tyler Retzlaff <roretzla@linux.microsoft.com>; Ruifeng Wang
> <Ruifeng.Wang@arm.com>; dev@dpdk.org; nd <nd@arm.com>; Dhruv
> Tripathi <Dhruv.Tripathi@arm.com>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Jack Bond-Preston <jack.bond-
> preston@foss.arm.com>; Nick Connolly <Nick.Connolly@arm.com>; Vinod
> Krishna <Vinod.Krishna@arm.com>; david.marchand@redhat.com; nd
> <nd@arm.com>; nd <nd@arm.com>
> Subject: [EXTERNAL] RE: [PATCH v2 2/2] eal: add Arm WFET in power
> management intrinsics
> 
> > > We are using this flag to allow application to choose between WFE and
> non- > WFE code path. > The non-WFE path performs slightly better. > Is it
> correct if I assume that this flag is not used to indicate the availability
> 
> >
> > We are using this flag to allow application to choose between WFE and non-
> > WFE code path.
> > The non-WFE path performs slightly better.
> >
> 
> Is it correct if I assume that this flag is not used to indicate the availability
> of WFE instruction in any of your platforms?
> If so, then its usage is to choose between WFE and non-WFE implementations
> for performance reasons.
> 
Yes, its purely for performance reasons.
> 
> 
^ permalink raw reply	[flat|nested] 32+ messages in thread
* RE: [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-05 16:11                       ` Pavan Nikhilesh Bhagavatula
@ 2024-07-05 16:25                         ` Wathsala Wathawana Vithanage
  0 siblings, 0 replies; 32+ messages in thread
From: Wathsala Wathawana Vithanage @ 2024-07-05 16:25 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, thomas
  Cc: Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Honnappa Nagarahalli, Jack Bond-Preston, Nick Connolly,
	Vinod Krishna, david.marchand, nd, nd, nd
> > > > We are using this flag to allow application to choose between WFE
> > > > and
> > non- > WFE code path. > The non-WFE path performs slightly better. >
> > Is it correct if I assume that this flag is not used to indicate the
> > availability
> >
> > >
> > > We are using this flag to allow application to choose between WFE
> > > and non- WFE code path.
> > > The non-WFE path performs slightly better.
> > >
> >
> > Is it correct if I assume that this flag is not used to indicate the
> > availability of WFE instruction in any of your platforms?
> > If so, then its usage is to choose between WFE and non-WFE
> > implementations for performance reasons.
> >
> 
> Yes, its purely for performance reasons.
> 
So, in that sense I don't mind keeping the name RTE_ARM_USE_WFE
for the purpose of using WFE instruction on a code path based on
power/performance bias. This has been the case with
rte_wait_until_equal_N functions as well.
If the name RTE_ARM_USE_WFE is confusing, please suggest a better
name. We should also document the intended use of this flag.
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [EXTERNAL] [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics
  2024-07-05 16:10                           ` [EXTERNAL] " Pavan Nikhilesh Bhagavatula
@ 2024-07-07 17:37                             ` Honnappa Nagarahalli
  0 siblings, 0 replies; 32+ messages in thread
From: Honnappa Nagarahalli @ 2024-07-07 17:37 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula
  Cc: thomas, Stephen Hemminger, Wathsala Wathawana Vithanage,
	Tyler Retzlaff, Ruifeng Wang, dev, nd, Dhruv Tripathi,
	Jack Bond-Preston, Nick Connolly, Vinod Krishna, david.marchand
> On Jul 5, 2024, at 5:10 PM, Pavan Nikhilesh Bhagavatula <pbhagavatula@marvell.com> wrote:
> 
>> 04/07/2024 16:55, Stephen Hemminger:
>>> On Thu, 04 Jul 2024 16:14:42 +0200
>>> Thomas Monjalon <thomas@monjalon.net> wrote:
>>> 
>>>>>> Let’s ask Pavan why this flag is used in cn10k driver.
>>>>>> 
>>>>>> From our perspective, WFE is available on all the supported arm
>> platforms in
>>>>>> DPDK.
>>>>>> Therefore, RTE_ARM_USE_WFE should be treated as a flag to choose
>> between
>>>>>> WFE
>>>>>> and non-WFE code paths due to performance reasons rather than as a
>> flag
>>>>>> that indicates
>>>>>> the availability of the instruction on the target CPU.
>>>>>> 
>>>>> 
>>>>> We are using this flag to allow application to choose between WFE and
>> non-WFE code path.
>>>>> The non-WFE path performs slightly better.
>>>> 
>>>> What's the benefit of the WFE path then?
>>> 
>>> WFE saves power at the expense of latency.
>> 
>> Yes maybe there is a misunderstanding.
>> Pavan can you confirm you were saying "throughput is better on non-WFE"?
>> but "power consumption is lower on WFE path"?
>> 
> 
> Yes, throughput is better on non-WFE and power consumption is lower on WFE path.
> 
> But the statement cant be generalized for all use-cases, it depends on lot of factors.
> So, we use RTE_ARM_USE_WFE to allow applications to decide what they want.
When WFE was enabled in DPDK, it was introduced in spinlock, ticket lock, ring etc. We ran the relevant micro-benchmarks and realized that with WFE the performance was lower. Hence it was added under a flag to allow the user to choose the feature (not as a way to say that the feature is present in the CPU).
IMO, we should not use this flag for PMD power savings. In PMD, use of WFE is purely for power savings and not performance. IIRC, there is already code and enough configurable parameters available that control when the PMD calls WFE (equivalent in other architectures). So, there is no need of a compile time flag for this. 
> 
>>> Maybe some form of hybrid approach would work best and could
>>> be always used.
>>> 
>>> For example, many implementations of mutex do a short spin poll
>>> then fall back to a waiting primitive (like futex).
> 
> This is already done across cnxk drivers and common layer I believe.
> 
>> 
> 
> 
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH v3 1/4] eal: expand the availability of WFE and related instructions
  2024-06-04  4:44 [PATCH 1/2] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
  2024-06-04  4:44 ` [PATCH 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
@ 2024-07-15 22:53 ` Wathsala Vithanage
  2024-07-15 22:53   ` [PATCH v3 2/4] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
                     ` (2 more replies)
  2024-07-26 17:15 ` [PATCH v4 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
  2 siblings, 3 replies; 32+ messages in thread
From: Wathsala Vithanage @ 2024-07-15 22:53 UTC (permalink / raw)
  To: dev, Ruifeng Wang; +Cc: nd, Wathsala Vithanage, Dhruv Tripathi
The availability of __RTE_ARM_WFE, __RTE_ARM_SEV, __RTE_ARM_SEVL,
and  __RTE_ARM_LOAD_EXC_* macros for other applications, such as
PMD power management, should not depend on the choice of use of
these instructions in rte_wait_until_equal_N functions.
Therefore, this patch moves these macros out of control of the
RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED macro.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 lib/eal/arm/include/rte_pause_64.h | 4 ++--
 lib/eal/arm/rte_cpuflags.c         | 4 ++--
 lib/eal/arm/rte_power_intrinsics.c | 9 ++++-----
 3 files changed, 8 insertions(+), 9 deletions(-)
diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
index 9e2dbf3531..8224f09ba7 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -24,8 +24,6 @@ static inline void rte_pause(void)
 	asm volatile("yield" ::: "memory");
 }
 
-#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
-
 /* Send a local event to quit WFE. */
 #define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); }
 
@@ -148,6 +146,8 @@ static inline void rte_pause(void)
 		__RTE_ARM_LOAD_EXC_128(src, dst, memorder) \
 }
 
+#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
+
 static __rte_always_inline void
 rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
 		rte_memory_order memorder)
diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
index 7ba4f8ba97..29884c285f 100644
--- a/lib/eal/arm/rte_cpuflags.c
+++ b/lib/eal/arm/rte_cpuflags.c
@@ -163,7 +163,7 @@ void
 rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics)
 {
 	memset(intrinsics, 0, sizeof(*intrinsics));
-#ifdef RTE_ARM_USE_WFE
+#ifdef RTE_ARCH_64
 	intrinsics->power_monitor = 1;
-#endif
+#endif /* RTE_ARCH_64 */
 }
diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c
index f54cf59e80..b0056cce8b 100644
--- a/lib/eal/arm/rte_power_intrinsics.c
+++ b/lib/eal/arm/rte_power_intrinsics.c
@@ -17,7 +17,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 {
 	RTE_SET_USED(tsc_timestamp);
 
-#ifdef RTE_ARM_USE_WFE
+#ifdef RTE_ARCH_64
 	const unsigned int lcore_id = rte_lcore_id();
 	uint64_t cur_value;
 
@@ -57,7 +57,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	RTE_SET_USED(pmc);
 
 	return -ENOTSUP;
-#endif
+#endif /* RTE_ARCH_64 */
 }
 
 /**
@@ -81,13 +81,12 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 {
 	RTE_SET_USED(lcore_id);
 
-#ifdef RTE_ARM_USE_WFE
+#ifdef RTE_ARCH_64
 	__RTE_ARM_SEV()
-
 	return 0;
 #else
 	return -ENOTSUP;
-#endif
+#endif /* RTE_ARCH_64 */
 }
 
 int
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH v3 2/4] config/arm: adds Arm Neoverse N3 SoC
  2024-07-15 22:53 ` [PATCH v3 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
@ 2024-07-15 22:53   ` Wathsala Vithanage
  2024-07-16  1:52     ` Honnappa Nagarahalli
  2024-07-15 22:53   ` [PATCH v3 3/4] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
  2024-07-15 22:53   ` [PATCH v3 4/4] eal: describe Arm CPU features including WFXT Wathsala Vithanage
  2 siblings, 1 reply; 32+ messages in thread
From: Wathsala Vithanage @ 2024-07-15 22:53 UTC (permalink / raw)
  To: dev, Ruifeng Wang, Bruce Richardson; +Cc: nd, Wathsala Vithanage
Add Arm Neoverse N3 part number to build configuration.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
---
 config/arm/meson.build | 22 +++++++++++++++++++++-
 1 file changed, 21 insertions(+), 1 deletion(-)
diff --git a/config/arm/meson.build b/config/arm/meson.build
index 012935d5d7..8018218b76 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -116,6 +116,18 @@ part_number_config_arm = {
             ['RTE_MAX_LCORE', 144],
             ['RTE_MAX_NUMA_NODES', 2]
         ]
+    },
+    '0xd8e': {
+        'march': 'armv8.7-a',
+        'march_features': ['sve2'],
+        'fallback_march': 'armv8.5-a',
+        'flags': [
+            ['RTE_MACHINE', '"neoverse-n3"'],
+            ['RTE_ARM_FEATURE_ATOMICS', true],
+            ['RTE_ARM_FEATURE_WFXT', true],
+            ['RTE_MAX_LCORE', 192],
+            ['RTE_MAX_NUMA_NODES', 2]
+        ]
     }
 }
 implementer_arm = {
@@ -572,6 +584,13 @@ soc_n2 = {
     'numa': false
 }
 
+soc_n3 = {
+    'description': 'Arm Neoverse N3',
+    'implementer': '0x41',
+    'part_number': '0xd8e',
+    'numa': false
+}
+
 soc_odyssey = {
     'description': 'Marvell Odyssey',
     'implementer': '0x41',
@@ -699,6 +718,7 @@ socs = {
     'kunpeng930': soc_kunpeng930,
     'n1sdp': soc_n1sdp,
     'n2': soc_n2,
+    'n3': soc_n3,
     'odyssey' : soc_odyssey,
     'stingray': soc_stingray,
     'thunderx2': soc_thunderx2,
@@ -852,7 +872,7 @@ if update_flags
         if part_number_config.get('force_march', false)
             candidate_march = part_number_config['march']
         else
-            supported_marchs = ['armv9-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
+            supported_marchs = ['armv9-a', 'armv8.7-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
                                 'armv8.2-a', 'armv8.1-a', 'armv8-a']
             check_compiler_support = false
             foreach supported_march: supported_marchs
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH v3 3/4] eal: add Arm WFET in power management intrinsics
  2024-07-15 22:53 ` [PATCH v3 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
  2024-07-15 22:53   ` [PATCH v3 2/4] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
@ 2024-07-15 22:53   ` Wathsala Vithanage
  2024-07-15 22:53   ` [PATCH v3 4/4] eal: describe Arm CPU features including WFXT Wathsala Vithanage
  2 siblings, 0 replies; 32+ messages in thread
From: Wathsala Vithanage @ 2024-07-15 22:53 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Tyler Retzlaff, Ruifeng Wang
  Cc: nd, Wathsala Vithanage, Dhruv Tripathi, Honnappa Nagarahalli,
	Jack Bond-Preston, Nick Connolly, Vinod Krishna
Wait for event with timeout (WFET) puts the CPU in a low power
mode and stays there until an event is signalled (SEV), loss of
an exclusive monitor or a timeout.
WFET is enabled selectively by checking FEAT_WFxT in Linux
auxiliary vector. If FEAT_WFxT is not available power management
will fallback to WFE.
WFE is available on all the Arm platforms supported by DPDK.
Therefore, the RTE_ARM_USE_WFE macro is not required to enable
the WFE feature for PMD power monitoring. 
RTE_ARM_USE_WFE is used at the build time to use the WFE instruction
where applicable in the code at the developer's discretion rather
than as an indicator of the instruction's availability.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>
Reviewed-by: Nick Connolly <nick.connolly@arm.com>
Reviewed-by: Vinod Krishna <vinod.krishna@arm.com>
---
 .mailmap                              |  2 ++
 app/test/test_cpuflags.c              |  3 +++
 lib/eal/arm/include/rte_cpuflags_64.h |  3 +++
 lib/eal/arm/include/rte_pause_64.h    | 16 +++++++++--
 lib/eal/arm/rte_cpuflags.c            |  1 +
 lib/eal/arm/rte_power_intrinsics.c    | 39 ++++++++++++++++++---------
 6 files changed, 50 insertions(+), 14 deletions(-)
diff --git a/.mailmap b/.mailmap
index f1e64286a1..a5c49d3702 100644
--- a/.mailmap
+++ b/.mailmap
@@ -338,6 +338,7 @@ Dexia Li <dexia.li@jaguarmicro.com>
 Dexuan Cui <decui@microsoft.com>
 Dharmik Thakkar <dharmikjayesh.thakkar@arm.com> <dharmik.thakkar@arm.com>
 Dheemanth Mallikarjun <dheemanthm@vmware.com>
+Dhruv Tripathi <dhruv.tripathi@arm.com>
 Diana Wang <na.wang@corigine.com>
 Didier Pallard <didier.pallard@6wind.com>
 Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
@@ -1539,6 +1540,7 @@ Vincent Li <vincent.mc.li@gmail.com>
 Vincent S. Cojot <vcojot@redhat.com>
 Vinh Tran <vinh.t.tran10@gmail.com>
 Vipin Padmam Ramesh <vipinp@vmware.com>
+Vinod Krishna <vinod.krishna@arm.com>
 Vipin Varghese <vipin.varghese@amd.com> <vipin.varghese@intel.com>
 Vipul Ashri <vipul.ashri@oracle.com>
 Visa Hankala <visa@hankala.org>
diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c
index a0ff74720c..22ab4dff0a 100644
--- a/app/test/test_cpuflags.c
+++ b/app/test/test_cpuflags.c
@@ -156,6 +156,9 @@ test_cpuflags(void)
 
 	printf("Check for SVEBF16:\t");
 	CHECK_FOR_FLAG(RTE_CPUFLAG_SVEBF16);
+
+	printf("Check for WFXT:\t");
+	CHECK_FOR_FLAG(RTE_CPUFLAG_WFXT);
 #endif
 
 #if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index afe70209c3..993d980a02 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -36,6 +36,9 @@ enum rte_cpu_flag_t {
 	RTE_CPUFLAG_SVEF64MM,
 	RTE_CPUFLAG_SVEBF16,
 	RTE_CPUFLAG_AARCH64,
+
+	/* WFET and WFIT instructions */
+	RTE_CPUFLAG_WFXT,
 };
 
 #include "generic/rte_cpuflags.h"
diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
index 8224f09ba7..809403bffa 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -24,15 +24,27 @@ static inline void rte_pause(void)
 	asm volatile("yield" ::: "memory");
 }
 
-/* Send a local event to quit WFE. */
+/* Send a local event to quit WFE/WFxT. */
 #define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); }
 
-/* Send a global event to quit WFE for all cores. */
+/* Send a global event to quit WFE/WFxT for all cores. */
 #define __RTE_ARM_SEV() { asm volatile("sev" : : : "memory"); }
 
 /* Put processor into low power WFE(Wait For Event) state. */
 #define __RTE_ARM_WFE() { asm volatile("wfe" : : : "memory"); }
 
+/* Put processor into low power WFET (WFE with Timeout) state. */
+#ifdef RTE_ARM_FEATURE_WFXT
+#define __RTE_ARM_WFET(t) {                              \
+	asm volatile("wfet %x[to]"                        \
+			:                                 \
+			: [to] "r" (t)                    \
+			: "memory");                      \
+	}
+#else
+#define __RTE_ARM_WFET(t) { RTE_SET_USED(t); }
+#endif
+
 /*
  * Atomic exclusive load from addr, it returns the 8-bit content of
  * *addr while making it 'monitored', when it is written by someone
diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
index 29884c285f..88e10c6da0 100644
--- a/lib/eal/arm/rte_cpuflags.c
+++ b/lib/eal/arm/rte_cpuflags.c
@@ -115,6 +115,7 @@ const struct feature_entry rte_cpu_feature_table[] = {
 	FEAT_DEF(SVEF32MM,	REG_HWCAP2,   10)
 	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
 	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
+	FEAT_DEF(WFXT,		REG_HWCAP2,   31)
 	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
 };
 #endif /* RTE_ARCH */
diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c
index b0056cce8b..6475bbca04 100644
--- a/lib/eal/arm/rte_power_intrinsics.c
+++ b/lib/eal/arm/rte_power_intrinsics.c
@@ -4,19 +4,32 @@
 
 #include <errno.h>
 
+#include "rte_cpuflags.h"
 #include "rte_power_intrinsics.h"
 
 /**
- * This function uses WFE instruction to make lcore suspend
+ *  Set wfet_en if WFET is supported
+ */
+#ifdef RTE_ARCH_64
+static uint8_t wfet_en;
+#endif /* RTE_ARCH_64 */
+
+RTE_INIT(rte_power_intrinsics_init)
+{
+#ifdef RTE_ARCH_64
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_WFXT))
+		wfet_en = 1;
+#endif /* RTE_ARCH_64 */
+}
+
+/**
+ * This function uses WFE/WFET instruction to make lcore suspend
  * execution on ARM.
- * Note that timestamp based timeout is not supported yet.
  */
 int
 rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 		const uint64_t tsc_timestamp)
 {
-	RTE_SET_USED(tsc_timestamp);
-
 #ifdef RTE_ARCH_64
 	const unsigned int lcore_id = rte_lcore_id();
 	uint64_t cur_value;
@@ -33,28 +46,30 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 
 	switch (pmc->size) {
 	case sizeof(uint8_t):
-		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint16_t):
-		__RTE_ARM_LOAD_EXC_16(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_16(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint32_t):
-		__RTE_ARM_LOAD_EXC_32(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_32(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint64_t):
-		__RTE_ARM_LOAD_EXC_64(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_64(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	default:
 		return -EINVAL; /* unexpected size */
 	}
 
+	if (wfet_en)
+		__RTE_ARM_WFET(tsc_timestamp)
+	else
+		__RTE_ARM_WFE()
+
 	return 0;
 #else
 	RTE_SET_USED(pmc);
+	RTE_SET_USED(tsc_timestamp);
 
 	return -ENOTSUP;
 #endif /* RTE_ARCH_64 */
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH v3 4/4] eal: describe Arm CPU features including WFXT
  2024-07-15 22:53 ` [PATCH v3 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
  2024-07-15 22:53   ` [PATCH v3 2/4] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
  2024-07-15 22:53   ` [PATCH v3 3/4] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
@ 2024-07-15 22:53   ` Wathsala Vithanage
  2024-07-16  1:02     ` Honnappa Nagarahalli
  2 siblings, 1 reply; 32+ messages in thread
From: Wathsala Vithanage @ 2024-07-15 22:53 UTC (permalink / raw)
  To: dev, Ruifeng Wang; +Cc: nd, Wathsala Vithanage, Dhruv Tripathi
RTE_CPUFALG_WFXT indicates the availability of WFET and WFIT
instructions. To preserve consistency across the rte_cpu_flag_t
enumeration, add descriptive comments to each Arm feature listed.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 lib/eal/arm/include/rte_cpuflags_64.h | 48 +++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index 993d980a02..eed67bf6ec 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -13,28 +13,76 @@ extern "C" {
  * Enumeration of all CPU features supported
  */
 enum rte_cpu_flag_t {
+	/* Floating point capability */
 	RTE_CPUFLAG_FP = 0,
+
+	/* Arm Neon extension */
 	RTE_CPUFLAG_NEON,
+
+	/* Generic timer event stream */
 	RTE_CPUFLAG_EVTSTRM,
+
+	/* AES instructions */
 	RTE_CPUFLAG_AES,
+
+	/* Polynomial multiply long instruction */
 	RTE_CPUFLAG_PMULL,
+
+	/* SHA1 instructions */
 	RTE_CPUFLAG_SHA1,
+
+	/* SHA2 instructions */
 	RTE_CPUFLAG_SHA2,
+
+	/* CRC32 instruction */
 	RTE_CPUFLAG_CRC32,
+
+	/*
+	 * LDADD, LDCLR, LDEOR, LDSET, LDSMAX, LDSMIN, LDUMAX, LDUMIN, CAS,
+	 * CASP, and SWP instructions
+	 */
 	RTE_CPUFLAG_ATOMICS,
+
+	/* Arm SVE extension */
 	RTE_CPUFLAG_SVE,
+
+	/* Arm SVE2 extension */
 	RTE_CPUFLAG_SVE2,
+
+	/* SVE-AES instructions */
 	RTE_CPUFLAG_SVEAES,
+
+	/* SVE-PMULL instruction */
 	RTE_CPUFLAG_SVEPMULL,
+
+	/* SVE bit permute instructions */
 	RTE_CPUFLAG_SVEBITPERM,
+
+	/* SVE-SHA3 instructions */
 	RTE_CPUFLAG_SVESHA3,
+
+	/* SVE-SM4 instructions */
 	RTE_CPUFLAG_SVESM4,
+
+	/* CFINV, RMIF, SETF16, SETF8, AXFLAG, and XAFLAG instructions */
 	RTE_CPUFLAG_FLAGM2,
+
+	/* FRINT32Z, FRINT32X, FRINT64Z, and FRINT64X instructions */
 	RTE_CPUFLAG_FRINT,
+
+	/* SVE Int8 matrix multiplication instructions */
 	RTE_CPUFLAG_SVEI8MM,
+
+	/* SVE FP32 floating-point matrix multiplication instructions */
 	RTE_CPUFLAG_SVEF32MM,
+
+	/* SVE FP64 floating-point matrix multiplication instructions */
 	RTE_CPUFLAG_SVEF64MM,
+
+	/* SVE BFloat16 instructions */
 	RTE_CPUFLAG_SVEBF16,
+
+	/* 64 bit execution state of the Arm architecture */
 	RTE_CPUFLAG_AARCH64,
 
 	/* WFET and WFIT instructions */
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH v3 4/4] eal: describe Arm CPU features including WFXT
  2024-07-15 22:53   ` [PATCH v3 4/4] eal: describe Arm CPU features including WFXT Wathsala Vithanage
@ 2024-07-16  1:02     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 32+ messages in thread
From: Honnappa Nagarahalli @ 2024-07-16  1:02 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage; +Cc: dev, Ruifeng Wang, nd, Dhruv Tripathi
> On Jul 15, 2024, at 5:53 PM, Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
> 
> RTE_CPUFALG_WFXT indicates the availability of WFET and WFIT
> instructions. To preserve consistency across the rte_cpu_flag_t
> enumeration, add descriptive comments to each Arm feature listed.
IMO, above can be simpler. “Add descriptive comments to each Arm feature listed in rte_cpu_flag_t"
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
Otherwise, looks good.
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> ---
> lib/eal/arm/include/rte_cpuflags_64.h | 48 +++++++++++++++++++++++++++
> 1 file changed, 48 insertions(+)
> 
> diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
> index 993d980a02..eed67bf6ec 100644
> --- a/lib/eal/arm/include/rte_cpuflags_64.h
> +++ b/lib/eal/arm/include/rte_cpuflags_64.h
> @@ -13,28 +13,76 @@ extern "C" {
>  * Enumeration of all CPU features supported
>  */
> enum rte_cpu_flag_t {
> + /* Floating point capability */
> RTE_CPUFLAG_FP = 0,
> +
> + /* Arm Neon extension */
> RTE_CPUFLAG_NEON,
> +
> + /* Generic timer event stream */
> RTE_CPUFLAG_EVTSTRM,
> +
> + /* AES instructions */
> RTE_CPUFLAG_AES,
> +
> + /* Polynomial multiply long instruction */
> RTE_CPUFLAG_PMULL,
> +
> + /* SHA1 instructions */
> RTE_CPUFLAG_SHA1,
> +
> + /* SHA2 instructions */
> RTE_CPUFLAG_SHA2,
> +
> + /* CRC32 instruction */
> RTE_CPUFLAG_CRC32,
> +
> + /*
> + * LDADD, LDCLR, LDEOR, LDSET, LDSMAX, LDSMIN, LDUMAX, LDUMIN, CAS,
> + * CASP, and SWP instructions
> + */
> RTE_CPUFLAG_ATOMICS,
> +
> + /* Arm SVE extension */
> RTE_CPUFLAG_SVE,
> +
> + /* Arm SVE2 extension */
> RTE_CPUFLAG_SVE2,
> +
> + /* SVE-AES instructions */
> RTE_CPUFLAG_SVEAES,
> +
> + /* SVE-PMULL instruction */
> RTE_CPUFLAG_SVEPMULL,
> +
> + /* SVE bit permute instructions */
> RTE_CPUFLAG_SVEBITPERM,
> +
> + /* SVE-SHA3 instructions */
> RTE_CPUFLAG_SVESHA3,
> +
> + /* SVE-SM4 instructions */
> RTE_CPUFLAG_SVESM4,
> +
> + /* CFINV, RMIF, SETF16, SETF8, AXFLAG, and XAFLAG instructions */
> RTE_CPUFLAG_FLAGM2,
> +
> + /* FRINT32Z, FRINT32X, FRINT64Z, and FRINT64X instructions */
> RTE_CPUFLAG_FRINT,
> +
> + /* SVE Int8 matrix multiplication instructions */
> RTE_CPUFLAG_SVEI8MM,
> +
> + /* SVE FP32 floating-point matrix multiplication instructions */
> RTE_CPUFLAG_SVEF32MM,
> +
> + /* SVE FP64 floating-point matrix multiplication instructions */
> RTE_CPUFLAG_SVEF64MM,
> +
> + /* SVE BFloat16 instructions */
> RTE_CPUFLAG_SVEBF16,
> +
> + /* 64 bit execution state of the Arm architecture */
> RTE_CPUFLAG_AARCH64,
> 
> /* WFET and WFIT instructions */
> -- 
> 2.34.1
> 
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH v3 2/4] config/arm: adds Arm Neoverse N3 SoC
  2024-07-15 22:53   ` [PATCH v3 2/4] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
@ 2024-07-16  1:52     ` Honnappa Nagarahalli
  0 siblings, 0 replies; 32+ messages in thread
From: Honnappa Nagarahalli @ 2024-07-16  1:52 UTC (permalink / raw)
  To: Wathsala Wathawana Vithanage; +Cc: dev, Ruifeng Wang, Bruce Richardson, nd
> On Jul 15, 2024, at 5:53 PM, Wathsala Vithanage <wathsala.vithanage@arm.com> wrote:
> 
> Add Arm Neoverse N3 part number to build configuration.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> ---
> config/arm/meson.build | 22 +++++++++++++++++++++-
> 1 file changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 012935d5d7..8018218b76 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -116,6 +116,18 @@ part_number_config_arm = {
>             ['RTE_MAX_LCORE', 144],
>             ['RTE_MAX_NUMA_NODES', 2]
>         ]
> +    },
> +    '0xd8e': {
> +        'march': 'armv8.7-a’,
Should the above be ‘armv9-a’?
> +        'march_features': ['sve2'],
> +        'fallback_march': 'armv8.5-a',
> +        'flags': [
> +            ['RTE_MACHINE', '"neoverse-n3"'],
> +            ['RTE_ARM_FEATURE_ATOMICS', true],
> +            ['RTE_ARM_FEATURE_WFXT', true],
> +            ['RTE_MAX_LCORE', 192],
> +            ['RTE_MAX_NUMA_NODES', 2]
> +        ]
>     }
> }
> implementer_arm = {
> @@ -572,6 +584,13 @@ soc_n2 = {
>     'numa': false
> }
> 
> +soc_n3 = {
> +    'description': 'Arm Neoverse N3',
> +    'implementer': '0x41',
> +    'part_number': '0xd8e',
> +    'numa': false
> +}
> +
> soc_odyssey = {
>     'description': 'Marvell Odyssey',
>     'implementer': '0x41',
> @@ -699,6 +718,7 @@ socs = {
>     'kunpeng930': soc_kunpeng930,
>     'n1sdp': soc_n1sdp,
>     'n2': soc_n2,
> +    'n3': soc_n3,
>     'odyssey' : soc_odyssey,
>     'stingray': soc_stingray,
>     'thunderx2': soc_thunderx2,
> @@ -852,7 +872,7 @@ if update_flags
>         if part_number_config.get('force_march', false)
>             candidate_march = part_number_config['march']
>         else
> -            supported_marchs = ['armv9-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
> +            supported_marchs = ['armv9-a', 'armv8.7-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
>                                 'armv8.2-a', 'armv8.1-a', 'armv8-a']
>             check_compiler_support = false
>             foreach supported_march: supported_marchs
> -- 
> 2.34.1
> 
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH v4 1/4] eal: expand the availability of WFE and related instructions
  2024-06-04  4:44 [PATCH 1/2] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
  2024-06-04  4:44 ` [PATCH 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
  2024-07-15 22:53 ` [PATCH v3 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
@ 2024-07-26 17:15 ` Wathsala Vithanage
  2024-07-26 17:15   ` [PATCH v4 2/4] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
                     ` (2 more replies)
  2 siblings, 3 replies; 32+ messages in thread
From: Wathsala Vithanage @ 2024-07-26 17:15 UTC (permalink / raw)
  To: Thomas Monjalon, Ruifeng Wang
  Cc: honnappa.nagarahalli, dev, Wathsala Vithanage, Dhruv Tripathi
The availability of __RTE_ARM_WFE, __RTE_ARM_SEV, __RTE_ARM_SEVL,
and  __RTE_ARM_LOAD_EXC_* macros for other applications, such as
PMD power management, should not depend on the choice of use of
these instructions in rte_wait_until_equal_N functions.
Therefore, this patch moves these macros out of control of the
RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED macro.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 .mailmap                           | 1 +
 lib/eal/arm/include/rte_pause_64.h | 4 ++--
 lib/eal/arm/rte_cpuflags.c         | 4 ++--
 lib/eal/arm/rte_power_intrinsics.c | 9 ++++-----
 4 files changed, 9 insertions(+), 9 deletions(-)
diff --git a/.mailmap b/.mailmap
index f1e64286a1..9c28b74655 100644
--- a/.mailmap
+++ b/.mailmap
@@ -338,6 +338,7 @@ Dexia Li <dexia.li@jaguarmicro.com>
 Dexuan Cui <decui@microsoft.com>
 Dharmik Thakkar <dharmikjayesh.thakkar@arm.com> <dharmik.thakkar@arm.com>
 Dheemanth Mallikarjun <dheemanthm@vmware.com>
+Dhruv Tripathi <dhruv.tripathi@arm.com>
 Diana Wang <na.wang@corigine.com>
 Didier Pallard <didier.pallard@6wind.com>
 Dilshod Urazov <dilshod.urazov@oktetlabs.ru>
diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
index 9e2dbf3531..8224f09ba7 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -24,8 +24,6 @@ static inline void rte_pause(void)
 	asm volatile("yield" ::: "memory");
 }
 
-#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
-
 /* Send a local event to quit WFE. */
 #define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); }
 
@@ -148,6 +146,8 @@ static inline void rte_pause(void)
 		__RTE_ARM_LOAD_EXC_128(src, dst, memorder) \
 }
 
+#ifdef RTE_WAIT_UNTIL_EQUAL_ARCH_DEFINED
+
 static __rte_always_inline void
 rte_wait_until_equal_16(volatile uint16_t *addr, uint16_t expected,
 		rte_memory_order memorder)
diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
index 7ba4f8ba97..29884c285f 100644
--- a/lib/eal/arm/rte_cpuflags.c
+++ b/lib/eal/arm/rte_cpuflags.c
@@ -163,7 +163,7 @@ void
 rte_cpu_get_intrinsics_support(struct rte_cpu_intrinsics *intrinsics)
 {
 	memset(intrinsics, 0, sizeof(*intrinsics));
-#ifdef RTE_ARM_USE_WFE
+#ifdef RTE_ARCH_64
 	intrinsics->power_monitor = 1;
-#endif
+#endif /* RTE_ARCH_64 */
 }
diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c
index f54cf59e80..b0056cce8b 100644
--- a/lib/eal/arm/rte_power_intrinsics.c
+++ b/lib/eal/arm/rte_power_intrinsics.c
@@ -17,7 +17,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 {
 	RTE_SET_USED(tsc_timestamp);
 
-#ifdef RTE_ARM_USE_WFE
+#ifdef RTE_ARCH_64
 	const unsigned int lcore_id = rte_lcore_id();
 	uint64_t cur_value;
 
@@ -57,7 +57,7 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 	RTE_SET_USED(pmc);
 
 	return -ENOTSUP;
-#endif
+#endif /* RTE_ARCH_64 */
 }
 
 /**
@@ -81,13 +81,12 @@ rte_power_monitor_wakeup(const unsigned int lcore_id)
 {
 	RTE_SET_USED(lcore_id);
 
-#ifdef RTE_ARM_USE_WFE
+#ifdef RTE_ARCH_64
 	__RTE_ARM_SEV()
-
 	return 0;
 #else
 	return -ENOTSUP;
-#endif
+#endif /* RTE_ARCH_64 */
 }
 
 int
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH v4 2/4] config/arm: adds Arm Neoverse N3 SoC
  2024-07-26 17:15 ` [PATCH v4 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
@ 2024-07-26 17:15   ` Wathsala Vithanage
  2024-07-26 17:15   ` [PATCH v4 3/4] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
  2024-07-26 17:15   ` [PATCH v4 4/4] eal: describe Arm CPU features including WFXT Wathsala Vithanage
  2 siblings, 0 replies; 32+ messages in thread
From: Wathsala Vithanage @ 2024-07-26 17:15 UTC (permalink / raw)
  To: Ruifeng Wang, Bruce Richardson
  Cc: honnappa.nagarahalli, dev, Wathsala Vithanage, Dhruv Tripathi
Add Arm Neoverse N3 part number to build configuration.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 config/arm/meson.build | 31 ++++++++++++++++++++++++++++++-
 1 file changed, 30 insertions(+), 1 deletion(-)
diff --git a/config/arm/meson.build b/config/arm/meson.build
index 012935d5d7..acf8e933ab 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -116,6 +116,27 @@ part_number_config_arm = {
             ['RTE_MAX_LCORE', 144],
             ['RTE_MAX_NUMA_NODES', 2]
         ]
+    },
+    '0xd8e': {
+# Only when -march=armv9-a+wfxt is used will the WFET
+# feature be compiled with armv9 instructions.
+# However, +wfxt is not supported by GCC at the moment.
+# Although armv9-a is the fitting version of Arm ISA for
+# Neoverse N3, it cannot be used when enabling wfxt for
+# the above reasons.
+# The workaround for this is to use armv8.7-a, which
+# doesn't require +wfxt for binutils version 2.36 or
+# greater.
+        'march': 'armv8.7-a',
+        'march_features': ['sve2'],
+        'fallback_march': 'armv8.5-a',
+        'flags': [
+            ['RTE_MACHINE', '"neoverse-n3"'],
+            ['RTE_ARM_FEATURE_ATOMICS', true],
+            ['RTE_ARM_FEATURE_WFXT', true],
+            ['RTE_MAX_LCORE', 192],
+            ['RTE_MAX_NUMA_NODES', 2]
+        ]
     }
 }
 implementer_arm = {
@@ -572,6 +593,13 @@ soc_n2 = {
     'numa': false
 }
 
+soc_n3 = {
+    'description': 'Arm Neoverse N3',
+    'implementer': '0x41',
+    'part_number': '0xd8e',
+    'numa': false
+}
+
 soc_odyssey = {
     'description': 'Marvell Odyssey',
     'implementer': '0x41',
@@ -699,6 +727,7 @@ socs = {
     'kunpeng930': soc_kunpeng930,
     'n1sdp': soc_n1sdp,
     'n2': soc_n2,
+    'n3': soc_n3,
     'odyssey' : soc_odyssey,
     'stingray': soc_stingray,
     'thunderx2': soc_thunderx2,
@@ -852,7 +881,7 @@ if update_flags
         if part_number_config.get('force_march', false)
             candidate_march = part_number_config['march']
         else
-            supported_marchs = ['armv9-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
+            supported_marchs = ['armv9-a', 'armv8.7-a', 'armv8.6-a', 'armv8.5-a', 'armv8.4-a', 'armv8.3-a',
                                 'armv8.2-a', 'armv8.1-a', 'armv8-a']
             check_compiler_support = false
             foreach supported_march: supported_marchs
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH v4 3/4] eal: add Arm WFET in power management intrinsics
  2024-07-26 17:15 ` [PATCH v4 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
  2024-07-26 17:15   ` [PATCH v4 2/4] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
@ 2024-07-26 17:15   ` Wathsala Vithanage
  2024-10-10 17:01     ` Thomas Monjalon
  2024-07-26 17:15   ` [PATCH v4 4/4] eal: describe Arm CPU features including WFXT Wathsala Vithanage
  2 siblings, 1 reply; 32+ messages in thread
From: Wathsala Vithanage @ 2024-07-26 17:15 UTC (permalink / raw)
  To: Thomas Monjalon, Tyler Retzlaff, Ruifeng Wang
  Cc: honnappa.nagarahalli, dev, Wathsala Vithanage, Dhruv Tripathi,
	Jack Bond-Preston, Nick Connolly, Vinod Krishna
Wait for event with timeout (WFET) puts the CPU in a low power
mode and stays there until an event is signalled (SEV), loss of
an exclusive monitor or a timeout.
WFET is enabled selectively by checking FEAT_WFxT in Linux
auxiliary vector. If FEAT_WFxT is not available power management
will fallback to WFE.
WFE is available on all the Arm platforms supported by DPDK.
Therefore, the RTE_ARM_USE_WFE macro is not required to enable
the WFE feature for PMD power monitoring. 
RTE_ARM_USE_WFE is used at the build time to use the WFE instruction
where applicable in the code at the developer's discretion rather
than as an indicator of the instruction's availability.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>
Reviewed-by: Nick Connolly <nick.connolly@arm.com>
Reviewed-by: Vinod Krishna <vinod.krishna@arm.com>
---
 .mailmap                              |  1 +
 app/test/test_cpuflags.c              |  3 +++
 lib/eal/arm/include/rte_cpuflags_64.h |  3 +++
 lib/eal/arm/include/rte_pause_64.h    | 16 +++++++++--
 lib/eal/arm/rte_cpuflags.c            |  1 +
 lib/eal/arm/rte_power_intrinsics.c    | 39 ++++++++++++++++++---------
 6 files changed, 49 insertions(+), 14 deletions(-)
diff --git a/.mailmap b/.mailmap
index 9c28b74655..a5c49d3702 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1540,6 +1540,7 @@ Vincent Li <vincent.mc.li@gmail.com>
 Vincent S. Cojot <vcojot@redhat.com>
 Vinh Tran <vinh.t.tran10@gmail.com>
 Vipin Padmam Ramesh <vipinp@vmware.com>
+Vinod Krishna <vinod.krishna@arm.com>
 Vipin Varghese <vipin.varghese@amd.com> <vipin.varghese@intel.com>
 Vipul Ashri <vipul.ashri@oracle.com>
 Visa Hankala <visa@hankala.org>
diff --git a/app/test/test_cpuflags.c b/app/test/test_cpuflags.c
index a0ff74720c..22ab4dff0a 100644
--- a/app/test/test_cpuflags.c
+++ b/app/test/test_cpuflags.c
@@ -156,6 +156,9 @@ test_cpuflags(void)
 
 	printf("Check for SVEBF16:\t");
 	CHECK_FOR_FLAG(RTE_CPUFLAG_SVEBF16);
+
+	printf("Check for WFXT:\t");
+	CHECK_FOR_FLAG(RTE_CPUFLAG_WFXT);
 #endif
 
 #if defined(RTE_ARCH_X86_64) || defined(RTE_ARCH_I686)
diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index afe70209c3..993d980a02 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -36,6 +36,9 @@ enum rte_cpu_flag_t {
 	RTE_CPUFLAG_SVEF64MM,
 	RTE_CPUFLAG_SVEBF16,
 	RTE_CPUFLAG_AARCH64,
+
+	/* WFET and WFIT instructions */
+	RTE_CPUFLAG_WFXT,
 };
 
 #include "generic/rte_cpuflags.h"
diff --git a/lib/eal/arm/include/rte_pause_64.h b/lib/eal/arm/include/rte_pause_64.h
index 8224f09ba7..809403bffa 100644
--- a/lib/eal/arm/include/rte_pause_64.h
+++ b/lib/eal/arm/include/rte_pause_64.h
@@ -24,15 +24,27 @@ static inline void rte_pause(void)
 	asm volatile("yield" ::: "memory");
 }
 
-/* Send a local event to quit WFE. */
+/* Send a local event to quit WFE/WFxT. */
 #define __RTE_ARM_SEVL() { asm volatile("sevl" : : : "memory"); }
 
-/* Send a global event to quit WFE for all cores. */
+/* Send a global event to quit WFE/WFxT for all cores. */
 #define __RTE_ARM_SEV() { asm volatile("sev" : : : "memory"); }
 
 /* Put processor into low power WFE(Wait For Event) state. */
 #define __RTE_ARM_WFE() { asm volatile("wfe" : : : "memory"); }
 
+/* Put processor into low power WFET (WFE with Timeout) state. */
+#ifdef RTE_ARM_FEATURE_WFXT
+#define __RTE_ARM_WFET(t) {                              \
+	asm volatile("wfet %x[to]"                        \
+			:                                 \
+			: [to] "r" (t)                    \
+			: "memory");                      \
+	}
+#else
+#define __RTE_ARM_WFET(t) { RTE_SET_USED(t); }
+#endif
+
 /*
  * Atomic exclusive load from addr, it returns the 8-bit content of
  * *addr while making it 'monitored', when it is written by someone
diff --git a/lib/eal/arm/rte_cpuflags.c b/lib/eal/arm/rte_cpuflags.c
index 29884c285f..88e10c6da0 100644
--- a/lib/eal/arm/rte_cpuflags.c
+++ b/lib/eal/arm/rte_cpuflags.c
@@ -115,6 +115,7 @@ const struct feature_entry rte_cpu_feature_table[] = {
 	FEAT_DEF(SVEF32MM,	REG_HWCAP2,   10)
 	FEAT_DEF(SVEF64MM,	REG_HWCAP2,   11)
 	FEAT_DEF(SVEBF16,	REG_HWCAP2,   12)
+	FEAT_DEF(WFXT,		REG_HWCAP2,   31)
 	FEAT_DEF(AARCH64,	REG_PLATFORM,  0)
 };
 #endif /* RTE_ARCH */
diff --git a/lib/eal/arm/rte_power_intrinsics.c b/lib/eal/arm/rte_power_intrinsics.c
index b0056cce8b..6475bbca04 100644
--- a/lib/eal/arm/rte_power_intrinsics.c
+++ b/lib/eal/arm/rte_power_intrinsics.c
@@ -4,19 +4,32 @@
 
 #include <errno.h>
 
+#include "rte_cpuflags.h"
 #include "rte_power_intrinsics.h"
 
 /**
- * This function uses WFE instruction to make lcore suspend
+ *  Set wfet_en if WFET is supported
+ */
+#ifdef RTE_ARCH_64
+static uint8_t wfet_en;
+#endif /* RTE_ARCH_64 */
+
+RTE_INIT(rte_power_intrinsics_init)
+{
+#ifdef RTE_ARCH_64
+	if (rte_cpu_get_flag_enabled(RTE_CPUFLAG_WFXT))
+		wfet_en = 1;
+#endif /* RTE_ARCH_64 */
+}
+
+/**
+ * This function uses WFE/WFET instruction to make lcore suspend
  * execution on ARM.
- * Note that timestamp based timeout is not supported yet.
  */
 int
 rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 		const uint64_t tsc_timestamp)
 {
-	RTE_SET_USED(tsc_timestamp);
-
 #ifdef RTE_ARCH_64
 	const unsigned int lcore_id = rte_lcore_id();
 	uint64_t cur_value;
@@ -33,28 +46,30 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc,
 
 	switch (pmc->size) {
 	case sizeof(uint8_t):
-		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_8(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint16_t):
-		__RTE_ARM_LOAD_EXC_16(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_16(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint32_t):
-		__RTE_ARM_LOAD_EXC_32(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_32(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	case sizeof(uint64_t):
-		__RTE_ARM_LOAD_EXC_64(pmc->addr, cur_value, rte_memory_order_relaxed)
-		__RTE_ARM_WFE()
+		__RTE_ARM_LOAD_EXC_64(pmc->addr, cur_value, rte_memory_order_relaxed);
 		break;
 	default:
 		return -EINVAL; /* unexpected size */
 	}
 
+	if (wfet_en)
+		__RTE_ARM_WFET(tsc_timestamp)
+	else
+		__RTE_ARM_WFE()
+
 	return 0;
 #else
 	RTE_SET_USED(pmc);
+	RTE_SET_USED(tsc_timestamp);
 
 	return -ENOTSUP;
 #endif /* RTE_ARCH_64 */
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* [PATCH v4 4/4] eal: describe Arm CPU features including WFXT
  2024-07-26 17:15 ` [PATCH v4 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
  2024-07-26 17:15   ` [PATCH v4 2/4] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
  2024-07-26 17:15   ` [PATCH v4 3/4] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
@ 2024-07-26 17:15   ` Wathsala Vithanage
  2 siblings, 0 replies; 32+ messages in thread
From: Wathsala Vithanage @ 2024-07-26 17:15 UTC (permalink / raw)
  To: Ruifeng Wang
  Cc: honnappa.nagarahalli, dev, Wathsala Vithanage, Dhruv Tripathi
Add descriptive comments to each Arm feature listed in rte_cpu_flag_t.
Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
---
 lib/eal/arm/include/rte_cpuflags_64.h | 48 +++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)
diff --git a/lib/eal/arm/include/rte_cpuflags_64.h b/lib/eal/arm/include/rte_cpuflags_64.h
index 993d980a02..eed67bf6ec 100644
--- a/lib/eal/arm/include/rte_cpuflags_64.h
+++ b/lib/eal/arm/include/rte_cpuflags_64.h
@@ -13,28 +13,76 @@ extern "C" {
  * Enumeration of all CPU features supported
  */
 enum rte_cpu_flag_t {
+	/* Floating point capability */
 	RTE_CPUFLAG_FP = 0,
+
+	/* Arm Neon extension */
 	RTE_CPUFLAG_NEON,
+
+	/* Generic timer event stream */
 	RTE_CPUFLAG_EVTSTRM,
+
+	/* AES instructions */
 	RTE_CPUFLAG_AES,
+
+	/* Polynomial multiply long instruction */
 	RTE_CPUFLAG_PMULL,
+
+	/* SHA1 instructions */
 	RTE_CPUFLAG_SHA1,
+
+	/* SHA2 instructions */
 	RTE_CPUFLAG_SHA2,
+
+	/* CRC32 instruction */
 	RTE_CPUFLAG_CRC32,
+
+	/*
+	 * LDADD, LDCLR, LDEOR, LDSET, LDSMAX, LDSMIN, LDUMAX, LDUMIN, CAS,
+	 * CASP, and SWP instructions
+	 */
 	RTE_CPUFLAG_ATOMICS,
+
+	/* Arm SVE extension */
 	RTE_CPUFLAG_SVE,
+
+	/* Arm SVE2 extension */
 	RTE_CPUFLAG_SVE2,
+
+	/* SVE-AES instructions */
 	RTE_CPUFLAG_SVEAES,
+
+	/* SVE-PMULL instruction */
 	RTE_CPUFLAG_SVEPMULL,
+
+	/* SVE bit permute instructions */
 	RTE_CPUFLAG_SVEBITPERM,
+
+	/* SVE-SHA3 instructions */
 	RTE_CPUFLAG_SVESHA3,
+
+	/* SVE-SM4 instructions */
 	RTE_CPUFLAG_SVESM4,
+
+	/* CFINV, RMIF, SETF16, SETF8, AXFLAG, and XAFLAG instructions */
 	RTE_CPUFLAG_FLAGM2,
+
+	/* FRINT32Z, FRINT32X, FRINT64Z, and FRINT64X instructions */
 	RTE_CPUFLAG_FRINT,
+
+	/* SVE Int8 matrix multiplication instructions */
 	RTE_CPUFLAG_SVEI8MM,
+
+	/* SVE FP32 floating-point matrix multiplication instructions */
 	RTE_CPUFLAG_SVEF32MM,
+
+	/* SVE FP64 floating-point matrix multiplication instructions */
 	RTE_CPUFLAG_SVEF64MM,
+
+	/* SVE BFloat16 instructions */
 	RTE_CPUFLAG_SVEBF16,
+
+	/* 64 bit execution state of the Arm architecture */
 	RTE_CPUFLAG_AARCH64,
 
 	/* WFET and WFIT instructions */
-- 
2.34.1
^ permalink raw reply	[flat|nested] 32+ messages in thread
* Re: [PATCH v4 3/4] eal: add Arm WFET in power management intrinsics
  2024-07-26 17:15   ` [PATCH v4 3/4] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
@ 2024-10-10 17:01     ` Thomas Monjalon
  0 siblings, 0 replies; 32+ messages in thread
From: Thomas Monjalon @ 2024-10-10 17:01 UTC (permalink / raw)
  To: Wathsala Vithanage
  Cc: Tyler Retzlaff, Ruifeng Wang, honnappa.nagarahalli, dev,
	Wathsala Vithanage, Dhruv Tripathi, Jack Bond-Preston,
	Nick Connolly, Vinod Krishna
26/07/2024 19:15, Wathsala Vithanage:
> Wait for event with timeout (WFET) puts the CPU in a low power
> mode and stays there until an event is signalled (SEV), loss of
> an exclusive monitor or a timeout.
> WFET is enabled selectively by checking FEAT_WFxT in Linux
> auxiliary vector. If FEAT_WFxT is not available power management
> will fallback to WFE.
> WFE is available on all the Arm platforms supported by DPDK.
> Therefore, the RTE_ARM_USE_WFE macro is not required to enable
> the WFE feature for PMD power monitoring. 
> RTE_ARM_USE_WFE is used at the build time to use the WFE instruction
> where applicable in the code at the developer's discretion rather
> than as an indicator of the instruction's availability.
> 
> Signed-off-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> Reviewed-by: Dhruv Tripathi <dhruv.tripathi@arm.com>
> Reviewed-by: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>
> Reviewed-by: Jack Bond-Preston <jack.bond-preston@foss.arm.com>
> Reviewed-by: Nick Connolly <nick.connolly@arm.com>
> Reviewed-by: Vinod Krishna <vinod.krishna@arm.com>
Series applied, thanks.
^ permalink raw reply	[flat|nested] 32+ messages in thread
end of thread, other threads:[~2024-10-10 17:02 UTC | newest]
Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-06-04  4:44 [PATCH 1/2] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
2024-06-04  4:44 ` [PATCH 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
2024-06-04 15:41   ` Stephen Hemminger
2024-06-19  6:45   ` [PATCH v2 1/2] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
2024-06-19  6:45     ` [PATCH v2 2/2] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
2024-06-27 15:30       ` Thomas Monjalon
2024-07-01 21:34         ` Wathsala Wathawana Vithanage
2024-07-02  8:29           ` Thomas Monjalon
2024-07-03 13:27             ` Wathsala Wathawana Vithanage
2024-07-03 13:33               ` Thomas Monjalon
2024-07-03 16:58                 ` Wathsala Wathawana Vithanage
2024-07-04 10:55                   ` Pavan Nikhilesh Bhagavatula
2024-07-04 14:14                     ` Thomas Monjalon
2024-07-04 14:55                       ` Stephen Hemminger
2024-07-04 18:59                         ` Thomas Monjalon
2024-07-05 16:10                           ` [EXTERNAL] " Pavan Nikhilesh Bhagavatula
2024-07-07 17:37                             ` [EXTERNAL] " Honnappa Nagarahalli
2024-07-05 16:01                     ` Wathsala Wathawana Vithanage
2024-07-05 16:11                       ` Pavan Nikhilesh Bhagavatula
2024-07-05 16:25                         ` Wathsala Wathawana Vithanage
2024-07-03 16:19             ` Wathsala Wathawana Vithanage
2024-07-15 22:53 ` [PATCH v3 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
2024-07-15 22:53   ` [PATCH v3 2/4] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
2024-07-16  1:52     ` Honnappa Nagarahalli
2024-07-15 22:53   ` [PATCH v3 3/4] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
2024-07-15 22:53   ` [PATCH v3 4/4] eal: describe Arm CPU features including WFXT Wathsala Vithanage
2024-07-16  1:02     ` Honnappa Nagarahalli
2024-07-26 17:15 ` [PATCH v4 1/4] eal: expand the availability of WFE and related instructions Wathsala Vithanage
2024-07-26 17:15   ` [PATCH v4 2/4] config/arm: adds Arm Neoverse N3 SoC Wathsala Vithanage
2024-07-26 17:15   ` [PATCH v4 3/4] eal: add Arm WFET in power management intrinsics Wathsala Vithanage
2024-10-10 17:01     ` Thomas Monjalon
2024-07-26 17:15   ` [PATCH v4 4/4] eal: describe Arm CPU features including WFXT Wathsala Vithanage
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).