DPDK patches and discussions
* [PATCH] event/cnxk: use WFE LDP loop for getwork routine
@ 2024-01-04 19:36 pbhagavatula
  2024-01-09  7:56 ` Jerin Jacob
  2024-01-17 14:25 ` [PATCH v2 1/2] config/arm: allow WFE to be enabled config time pbhagavatula
  0 siblings, 2 replies; 12+ messages in thread
From: pbhagavatula @ 2024-01-04 19:36 UTC (permalink / raw)
  To: jerinj, Pavan Nikhilesh, Shijith Thotton; +Cc: dev

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use WFE LDP loop while polling for GETWORK completion for better
power savings.
Disabled by default and can be enabled by setting
`RTE_ARM_USE_WFE` to `true` in `config/arm/meson.build`

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 doc/guides/eventdevs/cnxk.rst     |  9 ++++++
 drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
index cccb8a0304..d62c143c77 100644
--- a/doc/guides/eventdevs/cnxk.rst
+++ b/doc/guides/eventdevs/cnxk.rst
@@ -198,6 +198,15 @@ Runtime Config Options
 
     -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
 
+Power Savings on CN10K
+----------------------
+
+ARM cores can additionally use WFE when polling for transactions on the SSO bus
+to save power, i.e., in the event dequeue call the ARM core can enter WFE and
+exit when either work has been scheduled or the dequeue timeout has expired.
+This can be enabled by setting ``RTE_ARM_USE_WFE`` to ``true`` in
+``config/arm/meson.build``.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 8aa916fa12..92d5190842 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -250,23 +250,57 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev,
 
 	gw.get_work = ws->gw_wdata;
 #if defined(RTE_ARCH_ARM64)
-#if !defined(__clang__)
-	asm volatile(
-		PLT_CPU_FEATURE_PREAMBLE
-		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
-		: [wdata] "+r"(gw.get_work)
-		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
-		: "memory");
-#else
+#if defined(__clang__)
 	register uint64_t x0 __asm("x0") = (uint64_t)gw.u64[0];
 	register uint64_t x1 __asm("x1") = (uint64_t)gw.u64[1];
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldp %[x0], %[x1], [%[tag_loc]]	\n"
+		     "		tbz %[x0], %[pend_gw], done%=	\n"
+		     "		sevl					\n"
+		     "rty%=:	wfe					\n"
+		     "		ldp %[x0], %[x1], [%[tag_loc]]	\n"
+		     "		tbnz %[x0], %[pend_gw], rty%=	\n"
+		     "done%=:						\n"
+		     "		dmb ld					\n"
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
 	asm volatile(".arch armv8-a+lse\n"
 		     "caspal %[x0], %[x1], %[x0], %[x1], [%[dst]]\n"
-		     : [x0] "+r"(x0), [x1] "+r"(x1)
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
 		     : [dst] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
 		     : "memory");
+#endif
 	gw.u64[0] = x0;
 	gw.u64[1] = x1;
+#else
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldp %[wdata], %H[wdata], [%[tag_loc]]	\n"
+		     "		tbz %[wdata], %[pend_gw], done%=	\n"
+		     "		sevl					\n"
+		     "rty%=:	wfe					\n"
+		     "		ldp %[wdata], %H[wdata], [%[tag_loc]]	\n"
+		     "		tbnz %[wdata], %[pend_gw], rty%=	\n"
+		     "done%=:						\n"
+		     "		dmb ld					\n"
+		     : [wdata] "=&r"(gw.get_work)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
+	asm volatile(
+		PLT_CPU_FEATURE_PREAMBLE
+		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
+		: [wdata] "+r"(gw.get_work)
+		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
+		: "memory");
+#endif
 #endif
 #else
 	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
-- 
2.25.1
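
Read as C, the WFE path in the hunk above loads the two result words, and while
the pending bit is still set it sleeps in WFE and reloads. The sketch below only
illustrates that control flow and is not the driver code: `PEND_GET_WORK_BIT`
(bit 63 is a placeholder for SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT), `read_pair_fn`,
`poll_get_work()` and `struct fake_wqe0` are hypothetical names, and the register
read is replaced by a callback so the loop can run on any host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT (assumed value). */
#define PEND_GET_WORK_BIT 63
#define PEND_GET_WORK_MASK (1ULL << PEND_GET_WORK_BIT)

/* Hypothetical stand-in for the 128-bit load from the WQE0 location. */
typedef void (*read_pair_fn)(void *ctx, uint64_t out[2]);

static inline void local_event_set(void)
{
#if defined(__aarch64__) && defined(RTE_ARM_USE_WFE)
	__asm__ volatile("sevl" ::: "memory"); /* make the first wfe fall through */
#endif
}

static inline void wait_for_event(void)
{
#if defined(__aarch64__) && defined(RTE_ARM_USE_WFE)
	__asm__ volatile("wfe" ::: "memory");
#endif
}

/* Returns how many times the loop waited before the pending bit cleared. */
static unsigned int poll_get_work(read_pair_fn rd, void *ctx, uint64_t out[2])
{
	unsigned int waits = 0;

	assert(rd != NULL);
	rd(ctx, out);                             /* ldp                    */
	if (out[0] & PEND_GET_WORK_MASK) {        /* tbz ..., done          */
		local_event_set();                /* sevl                   */
		do {
			wait_for_event();         /* rty: wfe               */
			waits++;
			rd(ctx, out);             /* ldp                    */
		} while (out[0] & PEND_GET_WORK_MASK); /* tbnz ..., rty     */
	}
	/* Rough portable analogue of the trailing "dmb ld". */
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	return waits;
}

/* Fake register for experimenting: reports "pending" for a few reads. */
struct fake_wqe0 {
	unsigned int pending_reads;
	uint64_t tag;
	uint64_t wqp;
};

static void fake_read(void *ctx, uint64_t out[2])
{
	struct fake_wqe0 *r = ctx;

	out[0] = r->tag;
	if (r->pending_reads > 0) {
		r->pending_reads--;
		out[0] |= PEND_GET_WORK_MASK;
	}
	out[1] = r->wqp;
}
```

Calling `poll_get_work(fake_read, &reg, out)` with `reg.pending_reads = 3` goes
through three wait/reload rounds before returning the tag and pointer words.
Without RTE_ARM_USE_WFE the two helpers compile to nothing, so the sketch
degrades to a plain busy poll.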



* Re: [PATCH] event/cnxk: use WFE LDP loop for getwork routine
  2024-01-04 19:36 [PATCH] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
@ 2024-01-09  7:56 ` Jerin Jacob
  2024-01-17 14:25 ` [PATCH v2 1/2] config/arm: allow WFE to be enabled config time pbhagavatula
  1 sibling, 0 replies; 12+ messages in thread
From: Jerin Jacob @ 2024-01-09  7:56 UTC (permalink / raw)
  To: pbhagavatula, Ruifeng Wang (Arm Technology China), Honnappa Nagarahalli
  Cc: jerinj, Shijith Thotton, dev

On Fri, Jan 5, 2024 at 9:24 AM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Use WFE LDP loop while polling for GETWORK completion for better
> power savings.
> Disabled by default and can be enabled by setting
> `RTE_ARM_USE_WFE` to `true` in `config/arm/meson.build`
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>  doc/guides/eventdevs/cnxk.rst     |  9 ++++++
>  drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
>  2 files changed, 52 insertions(+), 9 deletions(-)
>
> diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
> index cccb8a0304..d62c143c77 100644
> --- a/doc/guides/eventdevs/cnxk.rst
> +++ b/doc/guides/eventdevs/cnxk.rst
> @@ -198,6 +198,15 @@ Runtime Config Options
>
>      -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
>
> +Power Savings on CN10K
> +----------------------
> +
> +ARM cores can additionally use WFE when polling for transactions on the SSO bus
> +to save power, i.e., in the event dequeue call the ARM core can enter WFE and
> +exit when either work has been scheduled or the dequeue timeout has expired.
> +This can be enabled by setting ``RTE_ARM_USE_WFE`` to ``true`` in
> +``config/arm/meson.build``.

+ ARM maintainers

IMO, enabling RTE_ARM_USE_WFE by editing config/arm/meson.build
needs to be improved.
Could you push a patch for enabling it via -D... or via -Dc_args=...?
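
For context, a define passed through c_args reaches the driver as an ordinary
preprocessor macro, and the worker code only tests whether it is defined. A
minimal sketch of that gating follows; `enum getwork_poll` and
`selected_getwork_poll()` are hypothetical names used for illustration and do
not exist in DPDK.

```c
/* Hypothetical names; only RTE_ARM_USE_WFE comes from the patch. */
enum getwork_poll {
	GETWORK_POLL_CASPAL,  /* default path */
	GETWORK_POLL_WFE_LDP, /* path selected by RTE_ARM_USE_WFE */
};

static inline enum getwork_poll selected_getwork_poll(void)
{
#if defined(RTE_ARM_USE_WFE)
	/* Taken for -DRTE_ARM_USE_WFE, and also for -DRTE_ARM_USE_WFE=0,
	 * because defined() ignores the macro's value. */
	return GETWORK_POLL_WFE_LDP;
#else
	return GETWORK_POLL_CASPAL;
#endif
}
```

Because the check is `defined()`, the macro has to be left undefined to keep
the default path; defining it to 0 on the command line would still select the
WFE loop.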


* [PATCH v2 1/2] config/arm: allow WFE to be enabled config time
  2024-01-04 19:36 [PATCH] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
  2024-01-09  7:56 ` Jerin Jacob
@ 2024-01-17 14:25 ` pbhagavatula
  2024-01-17 14:26   ` [PATCH v2 2/2] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
                     ` (2 more replies)
  1 sibling, 3 replies; 12+ messages in thread
From: pbhagavatula @ 2024-01-17 14:25 UTC (permalink / raw)
  To: jerinj, Ruifeng Wang, Bruce Richardson; +Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Allow RTE_ARM_USE_WFE to be enabled at meson configuration
time by passing it via c_args instead of modifying
`config/arm/meson.build`.

Example usage:
 meson build -Dc_args='-DRTE_ARM_USE_WFE' \
	--cross-file config/arm/arm64_cn10k_linux_gcc

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 config/arm/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 36f21d2259..a63711e986 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -17,7 +17,6 @@ flags_common = [
         #    ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
         #    ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],
 
-        ['RTE_ARM_USE_WFE', false],
         ['RTE_ARCH_ARM64', true],
         ['RTE_CACHE_LINE_SIZE', 128]
 ]
-- 
2.25.1



* [PATCH v2 2/2] event/cnxk: use WFE LDP loop for getwork routine
  2024-01-17 14:25 ` [PATCH v2 1/2] config/arm: allow WFE to be enabled config time pbhagavatula
@ 2024-01-17 14:26   ` pbhagavatula
  2024-01-18  1:52   ` [PATCH v2 1/2] config/arm: allow WFE to be enabled config time Ruifeng Wang
  2024-01-21 15:21   ` [PATCH v3 " pbhagavatula
  2 siblings, 0 replies; 12+ messages in thread
From: pbhagavatula @ 2024-01-17 14:26 UTC (permalink / raw)
  To: jerinj, Pavan Nikhilesh, Shijith Thotton; +Cc: dev

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use WFE LDP loop while polling for GETWORK completion for better
power savings.
Disabled by default and can be enabled by configuring meson with
-Dc_args='-DRTE_ARM_USE_WFE'.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 doc/guides/eventdevs/cnxk.rst     |  9 ++++++
 drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
index cccb8a0304..04f5b5025b 100644
--- a/doc/guides/eventdevs/cnxk.rst
+++ b/doc/guides/eventdevs/cnxk.rst
@@ -198,6 +198,15 @@ Runtime Config Options
 
     -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
 
+Power Savings on CN10K
+----------------------
+
+ARM cores can additionally use WFE when polling for transactions on the SSO bus
+to save power, i.e., in the event dequeue call the ARM core can enter WFE and
+exit when either work has been scheduled or the dequeue timeout has expired.
+This can be enabled by configuring meson with the following option
+``-Dc_args='-DRTE_ARM_USE_WFE'``.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 8aa916fa12..92d5190842 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -250,23 +250,57 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev,
 
 	gw.get_work = ws->gw_wdata;
 #if defined(RTE_ARCH_ARM64)
-#if !defined(__clang__)
-	asm volatile(
-		PLT_CPU_FEATURE_PREAMBLE
-		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
-		: [wdata] "+r"(gw.get_work)
-		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
-		: "memory");
-#else
+#if defined(__clang__)
 	register uint64_t x0 __asm("x0") = (uint64_t)gw.u64[0];
 	register uint64_t x1 __asm("x1") = (uint64_t)gw.u64[1];
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldp %[x0], %[x1], [%[tag_loc]]	\n"
+		     "		tbz %[x0], %[pend_gw], done%=	\n"
+		     "		sevl					\n"
+		     "rty%=:	wfe					\n"
+		     "		ldp %[x0], %[x1], [%[tag_loc]]	\n"
+		     "		tbnz %[x0], %[pend_gw], rty%=	\n"
+		     "done%=:						\n"
+		     "		dmb ld					\n"
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
 	asm volatile(".arch armv8-a+lse\n"
 		     "caspal %[x0], %[x1], %[x0], %[x1], [%[dst]]\n"
-		     : [x0] "+r"(x0), [x1] "+r"(x1)
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
 		     : [dst] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
 		     : "memory");
+#endif
 	gw.u64[0] = x0;
 	gw.u64[1] = x1;
+#else
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldp %[wdata], %H[wdata], [%[tag_loc]]	\n"
+		     "		tbz %[wdata], %[pend_gw], done%=	\n"
+		     "		sevl					\n"
+		     "rty%=:	wfe					\n"
+		     "		ldp %[wdata], %H[wdata], [%[tag_loc]]	\n"
+		     "		tbnz %[wdata], %[pend_gw], rty%=	\n"
+		     "done%=:						\n"
+		     "		dmb ld					\n"
+		     : [wdata] "=&r"(gw.get_work)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
+	asm volatile(
+		PLT_CPU_FEATURE_PREAMBLE
+		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
+		: [wdata] "+r"(gw.get_work)
+		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
+		: "memory");
+#endif
 #endif
 #else
 	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
-- 
2.25.1



* Re: [PATCH v2 1/2] config/arm: allow WFE to be enabled config time
  2024-01-17 14:25 ` [PATCH v2 1/2] config/arm: allow WFE to be enabled config time pbhagavatula
  2024-01-17 14:26   ` [PATCH v2 2/2] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
@ 2024-01-18  1:52   ` Ruifeng Wang
  2024-01-21 15:21   ` [PATCH v3 " pbhagavatula
  2 siblings, 0 replies; 12+ messages in thread
From: Ruifeng Wang @ 2024-01-18  1:52 UTC (permalink / raw)
  To: pbhagavatula, jerinj, Bruce Richardson; +Cc: dev, nd


On 2024/1/17 10:25 PM, pbhagavatula@marvell.com wrote:
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> Allow RTE_ARM_USE_WFE to be enabled at meson configuration
> time by passing it via c_args instead of modifying
> `config/arm/meson.build`.
> 
> Example usage:
>   meson build -Dc_args='-DRTE_ARM_USE_WFE' \
> 	--cross-file config/arm/arm64_cn10k_linux_gcc
> 
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>   config/arm/meson.build | 1 -
>   1 file changed, 1 deletion(-)
> 
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 36f21d2259..a63711e986 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -17,7 +17,6 @@ flags_common = [
>           #    ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
>           #    ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],
>   
> -        ['RTE_ARM_USE_WFE', false],

What about commenting this line out instead?
It will be easier to track the configurables.

>           ['RTE_ARCH_ARM64', true],
>           ['RTE_CACHE_LINE_SIZE', 128]
>   ]


* [PATCH v3 1/2] config/arm: allow WFE to be enabled config time
  2024-01-17 14:25 ` [PATCH v2 1/2] config/arm: allow WFE to be enabled config time pbhagavatula
  2024-01-17 14:26   ` [PATCH v2 2/2] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
  2024-01-18  1:52   ` [PATCH v2 1/2] config/arm: allow WFE to be enabled config time Ruifeng Wang
@ 2024-01-21 15:21   ` pbhagavatula
  2024-01-21 15:21     ` [PATCH v3 2/2] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
                       ` (3 more replies)
  2 siblings, 4 replies; 12+ messages in thread
From: pbhagavatula @ 2024-01-21 15:21 UTC (permalink / raw)
  To: jerinj, Ruifeng.Wang, nd, Ruifeng Wang, Bruce Richardson
  Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Allow RTE_ARM_USE_WFE to be enabled at meson configuration
time by passing it via c_args instead of modifying
`config/arm/meson.build`.

Example usage:
 meson build -Dc_args='-DRTE_ARM_USE_WFE' \
	--cross-file config/arm/arm64_cn10k_linux_gcc

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 v3 Changes:
 - Comment the meson option instead of removing it.

 config/arm/meson.build | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 36f21d2259..89e1de312b 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -17,7 +17,9 @@ flags_common = [
         #    ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
         #    ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],

-        ['RTE_ARM_USE_WFE', false],
+        # Enable use of ARM wait for event instruction.
+        # ['RTE_ARM_USE_WFE', false],
+
         ['RTE_ARCH_ARM64', true],
         ['RTE_CACHE_LINE_SIZE', 128]
 ]
--
2.25.1



* [PATCH v3 2/2] event/cnxk: use WFE LDP loop for getwork routine
  2024-01-21 15:21   ` [PATCH v3 " pbhagavatula
@ 2024-01-21 15:21     ` pbhagavatula
  2024-01-22  6:37     ` [PATCH v3 1/2] config/arm: allow WFE to be enabled config time fengchengwen
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 12+ messages in thread
From: pbhagavatula @ 2024-01-21 15:21 UTC (permalink / raw)
  To: jerinj, Ruifeng.Wang, nd, Pavan Nikhilesh, Shijith Thotton; +Cc: dev

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use WFE LDP loop while polling for GETWORK completion for better
power savings.
Disabled by default and can be enabled by configuring meson with
-Dc_args='-DRTE_ARM_USE_WFE'.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 doc/guides/eventdevs/cnxk.rst     |  9 ++++++
 drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
index cccb8a0304..04f5b5025b 100644
--- a/doc/guides/eventdevs/cnxk.rst
+++ b/doc/guides/eventdevs/cnxk.rst
@@ -198,6 +198,15 @@ Runtime Config Options
 
     -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
 
+Power Savings on CN10K
+----------------------
+
+ARM cores can additionally use WFE when polling for transactions on the SSO bus
+to save power, i.e., in the event dequeue call the ARM core can enter WFE and
+exit when either work has been scheduled or the dequeue timeout has expired.
+This can be enabled by configuring meson with the following option
+``-Dc_args='-DRTE_ARM_USE_WFE'``.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 8aa916fa12..92d5190842 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -250,23 +250,57 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev,
 
 	gw.get_work = ws->gw_wdata;
 #if defined(RTE_ARCH_ARM64)
-#if !defined(__clang__)
-	asm volatile(
-		PLT_CPU_FEATURE_PREAMBLE
-		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
-		: [wdata] "+r"(gw.get_work)
-		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
-		: "memory");
-#else
+#if defined(__clang__)
 	register uint64_t x0 __asm("x0") = (uint64_t)gw.u64[0];
 	register uint64_t x1 __asm("x1") = (uint64_t)gw.u64[1];
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldp %[x0], %[x1], [%[tag_loc]]	\n"
+		     "		tbz %[x0], %[pend_gw], done%=	\n"
+		     "		sevl					\n"
+		     "rty%=:	wfe					\n"
+		     "		ldp %[x0], %[x1], [%[tag_loc]]	\n"
+		     "		tbnz %[x0], %[pend_gw], rty%=	\n"
+		     "done%=:						\n"
+		     "		dmb ld					\n"
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
 	asm volatile(".arch armv8-a+lse\n"
 		     "caspal %[x0], %[x1], %[x0], %[x1], [%[dst]]\n"
-		     : [x0] "+r"(x0), [x1] "+r"(x1)
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
 		     : [dst] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
 		     : "memory");
+#endif
 	gw.u64[0] = x0;
 	gw.u64[1] = x1;
+#else
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldp %[wdata], %H[wdata], [%[tag_loc]]	\n"
+		     "		tbz %[wdata], %[pend_gw], done%=	\n"
+		     "		sevl					\n"
+		     "rty%=:	wfe					\n"
+		     "		ldp %[wdata], %H[wdata], [%[tag_loc]]	\n"
+		     "		tbnz %[wdata], %[pend_gw], rty%=	\n"
+		     "done%=:						\n"
+		     "		dmb ld					\n"
+		     : [wdata] "=&r"(gw.get_work)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
+	asm volatile(
+		PLT_CPU_FEATURE_PREAMBLE
+		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
+		: [wdata] "+r"(gw.get_work)
+		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
+		: "memory");
+#endif
 #endif
 #else
 	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
-- 
2.25.1



* Re: [PATCH v3 1/2] config/arm: allow WFE to be enabled config time
  2024-01-21 15:21   ` [PATCH v3 " pbhagavatula
  2024-01-21 15:21     ` [PATCH v3 2/2] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
@ 2024-01-22  6:37     ` fengchengwen
  2024-01-22  6:43     ` Ruifeng Wang
  2024-02-01 22:03     ` [PATCH v4] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
  3 siblings, 0 replies; 12+ messages in thread
From: fengchengwen @ 2024-01-22  6:37 UTC (permalink / raw)
  To: pbhagavatula, jerinj, Ruifeng.Wang, nd, Bruce Richardson; +Cc: dev

Acked-by: Chengwen Feng <fengchengwen@huawei.com>

On 2024/1/21 23:21, pbhagavatula@marvell.com wrote:
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> 
> Allow RTE_ARM_USE_WFE to be enabled at meson configuration
> time by passing it via c_args instead of modifying
> `config/arm/meson.build`.
> 
> Example usage:
>  meson build -Dc_args='-DRTE_ARM_USE_WFE' \
> 	--cross-file config/arm/arm64_cn10k_linux_gcc
> 
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>  v3 Changes:
>  - Comment the meson option instead of removing it.
> 
>  config/arm/meson.build | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 36f21d2259..89e1de312b 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -17,7 +17,9 @@ flags_common = [
>          #    ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
>          #    ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],
> 
> -        ['RTE_ARM_USE_WFE', false],
> +        # Enable use of ARM wait for event instruction.
> +        # ['RTE_ARM_USE_WFE', false],
> +
>          ['RTE_ARCH_ARM64', true],
>          ['RTE_CACHE_LINE_SIZE', 128]
>  ]
> --
> 2.25.1
> 
> .
> 


* Re: [PATCH v3 1/2] config/arm: allow WFE to be enabled config time
  2024-01-21 15:21   ` [PATCH v3 " pbhagavatula
  2024-01-21 15:21     ` [PATCH v3 2/2] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
  2024-01-22  6:37     ` [PATCH v3 1/2] config/arm: allow WFE to be enabled config time fengchengwen
@ 2024-01-22  6:43     ` Ruifeng Wang
  2024-02-01 16:37       ` Jerin Jacob
  2024-02-01 22:03     ` [PATCH v4] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
  3 siblings, 1 reply; 12+ messages in thread
From: Ruifeng Wang @ 2024-01-22  6:43 UTC (permalink / raw)
  To: pbhagavatula, jerinj, nd, Bruce Richardson; +Cc: dev


On 2024/1/21 11:21 PM, pbhagavatula@marvell.com wrote:
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Allow RTE_ARM_USE_WFE to be enabled at meson configuration
> time by passing it via c_args instead of modifying
> `config/arm/meson.build`.
>
> Example usage:
>   meson build -Dc_args='-DRTE_ARM_USE_WFE' \
>       --cross-file config/arm/arm64_cn10k_linux_gcc
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>   v3 Changes:
>   - Comment the meson option instead of removing it.
>
>   config/arm/meson.build | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 36f21d2259..89e1de312b 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -17,7 +17,9 @@ flags_common = [
>           #    ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
>           #    ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],
>
> -        ['RTE_ARM_USE_WFE', false],
> +        # Enable use of ARM wait for event instruction.
> +        # ['RTE_ARM_USE_WFE', false],
> +
>           ['RTE_ARCH_ARM64', true],
>           ['RTE_CACHE_LINE_SIZE', 128]
>   ]
> --
> 2.25.1
>
Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>


* Re: [PATCH v3 1/2] config/arm: allow WFE to be enabled config time
  2024-01-22  6:43     ` Ruifeng Wang
@ 2024-02-01 16:37       ` Jerin Jacob
  0 siblings, 0 replies; 12+ messages in thread
From: Jerin Jacob @ 2024-02-01 16:37 UTC (permalink / raw)
  To: Ruifeng Wang; +Cc: pbhagavatula, jerinj, nd, Bruce Richardson, dev

On Mon, Jan 22, 2024 at 12:13 PM Ruifeng Wang <Ruifeng.Wang@arm.com> wrote:
>
>
> On 2024/1/21 11:21 PM, pbhagavatula@marvell.com wrote:
> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >
> > Allow RTE_ARM_USE_WFE to be enabled at meson configuration
> > time by passing it via c_args instead of modifying
> > `config/arm/meson.build`.
> >
> > Example usage:
> >   meson build -Dc_args='-DRTE_ARM_USE_WFE' \
> >       --cross-file config/arm/arm64_cn10k_linux_gcc
> >
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

Could you split this series and resend it as two separate patches, as this
patch needs to go through the main tree?


* [PATCH v4] event/cnxk: use WFE LDP loop for getwork routine
  2024-01-21 15:21   ` [PATCH v3 " pbhagavatula
                       ` (2 preceding siblings ...)
  2024-01-22  6:43     ` Ruifeng Wang
@ 2024-02-01 22:03     ` pbhagavatula
  2024-02-25 15:20       ` Jerin Jacob
  3 siblings, 1 reply; 12+ messages in thread
From: pbhagavatula @ 2024-02-01 22:03 UTC (permalink / raw)
  To: jerinj, Pavan Nikhilesh, Shijith Thotton; +Cc: dev

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use WFE LDP loop while polling for GETWORK completion for better
power savings.
Disabled by default and can be enabled by configuring meson with
-Dc_args='-DRTE_ARM_USE_WFE'.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 v4 Changes:
 - Split patches

 doc/guides/eventdevs/cnxk.rst     |  9 ++++++
 drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
index cccb8a0304..04f5b5025b 100644
--- a/doc/guides/eventdevs/cnxk.rst
+++ b/doc/guides/eventdevs/cnxk.rst
@@ -198,6 +198,15 @@ Runtime Config Options

     -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0

+Power Savings on CN10K
+----------------------
+
+ARM cores can additionally use WFE when polling for transactions on the SSO bus
+to save power, i.e., in the event dequeue call the ARM core can enter WFE and
+exit when either work has been scheduled or the dequeue timeout has expired.
+This can be enabled by configuring meson with the following option
+``-Dc_args='-DRTE_ARM_USE_WFE'``.
+
 Debugging Options
 -----------------

diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 8aa916fa12..92d5190842 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -250,23 +250,57 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev,

 	gw.get_work = ws->gw_wdata;
 #if defined(RTE_ARCH_ARM64)
-#if !defined(__clang__)
-	asm volatile(
-		PLT_CPU_FEATURE_PREAMBLE
-		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
-		: [wdata] "+r"(gw.get_work)
-		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
-		: "memory");
-#else
+#if defined(__clang__)
 	register uint64_t x0 __asm("x0") = (uint64_t)gw.u64[0];
 	register uint64_t x1 __asm("x1") = (uint64_t)gw.u64[1];
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldp %[x0], %[x1], [%[tag_loc]]	\n"
+		     "		tbz %[x0], %[pend_gw], done%=	\n"
+		     "		sevl					\n"
+		     "rty%=:	wfe					\n"
+		     "		ldp %[x0], %[x1], [%[tag_loc]]	\n"
+		     "		tbnz %[x0], %[pend_gw], rty%=	\n"
+		     "done%=:						\n"
+		     "		dmb ld					\n"
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
 	asm volatile(".arch armv8-a+lse\n"
 		     "caspal %[x0], %[x1], %[x0], %[x1], [%[dst]]\n"
-		     : [x0] "+r"(x0), [x1] "+r"(x1)
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
 		     : [dst] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
 		     : "memory");
+#endif
 	gw.u64[0] = x0;
 	gw.u64[1] = x1;
+#else
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     "		ldp %[wdata], %H[wdata], [%[tag_loc]]	\n"
+		     "		tbz %[wdata], %[pend_gw], done%=	\n"
+		     "		sevl					\n"
+		     "rty%=:	wfe					\n"
+		     "		ldp %[wdata], %H[wdata], [%[tag_loc]]	\n"
+		     "		tbnz %[wdata], %[pend_gw], rty%=	\n"
+		     "done%=:						\n"
+		     "		dmb ld					\n"
+		     : [wdata] "=&r"(gw.get_work)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
+	asm volatile(
+		PLT_CPU_FEATURE_PREAMBLE
+		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
+		: [wdata] "+r"(gw.get_work)
+		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
+		: "memory");
+#endif
 #endif
 #else
 	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
--
2.25.1
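
The two compiler branches in the hunk above address the same 128-bit value in
different ways: the GCC branch passes `gw.get_work` as one operand and names its
halves as `%[wdata]` and `%H[wdata]`, while the clang branch copies `gw.u64[0]`
and `gw.u64[1]` through variables pinned to x0 and x1. A minimal sketch of a
union with that shape is shown below; the type name `union gw_words` and the
helper `gw_from_pair()` are hypothetical, and the layout is an assumption
inferred from the member names used in the patch.

```c
#include <stdint.h>

/* Assumed shape of the local `gw` variable (inferred from gw.get_work and
 * gw.u64[] in the patch; the real definition lives in the driver). */
union gw_words {
	__uint128_t get_work; /* single 128-bit operand, as in the GCC branch */
	uint64_t u64[2];      /* two 64-bit halves, as in the clang branch */
};

/* Hypothetical helper: build the union from the two loaded words. */
static inline union gw_words gw_from_pair(uint64_t w0, uint64_t w1)
{
	union gw_words gw;

	gw.u64[0] = w0; /* low 64 bits on a little-endian host */
	gw.u64[1] = w1; /* high 64 bits on a little-endian host */
	return gw;
}
```

On a little-endian host, `gw_from_pair(tag, wqp).get_work` carries `tag` in its
low half and `wqp` in its high half, which is why both branches can end up with
the same `gw` contents.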



* Re: [PATCH v4] event/cnxk: use WFE LDP loop for getwork routine
  2024-02-01 22:03     ` [PATCH v4] event/cnxk: use WFE LDP loop for getwork routine pbhagavatula
@ 2024-02-25 15:20       ` Jerin Jacob
  0 siblings, 0 replies; 12+ messages in thread
From: Jerin Jacob @ 2024-02-25 15:20 UTC (permalink / raw)
  To: pbhagavatula; +Cc: jerinj, Shijith Thotton, dev

On Fri, Feb 2, 2024 at 5:59 AM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Use WFE LDP loop while polling for GETWORK completion for better
> power savings.

> Disabled by default and can be enabled by configuring meson with
> -Dc_args='-DRTE_ARM_USE_WFE'.

Since this part is not yet merged, we can remove it from the commit log.

>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>  v4 Changes:
>  - Split patches
>
>  doc/guides/eventdevs/cnxk.rst     |  9 ++++++

Please update the release notes for this PMD feature.


>  drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
>  2 files changed, 52 insertions(+), 9 deletions(-)
>
> diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
> index cccb8a0304..04f5b5025b 100644
> --- a/doc/guides/eventdevs/cnxk.rst
> +++ b/doc/guides/eventdevs/cnxk.rst
> @@ -198,6 +198,15 @@ Runtime Config Options
>
>      -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
>
> +Power Savings on CN10K
> +----------------------
> +
> +ARM cores can additionally use WFE when polling for transactions on the SSO bus
> +to save power, i.e., in the event dequeue call the ARM core can enter WFE and
> +exit when either work has been scheduled or the dequeue timeout has expired.
> +This can be enabled by configuring meson with the following option
> +``-Dc_args='-DRTE_ARM_USE_WFE'``.

The last sentence can be made generic, as the other patches are not merged,
i.e. "This can be enabled by selecting RTE_ARM_USE_WFE" or so.

