* [PATCH] event/cnxk: use WFE LDP loop for getwork routine
From: pbhagavatula @ 2024-01-04 19:36 UTC
To: jerinj, Pavan Nikhilesh, Shijith Thotton; +Cc: dev

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use WFE LDP loop while polling for GETWORK completion for better
power savings.
Disabled by default and can be enabled by setting
`RTE_ARM_USE_WFE` to `true` in `config/arm/meson.build`

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 doc/guides/eventdevs/cnxk.rst     |  9 ++++++
 drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
index cccb8a0304..d62c143c77 100644
--- a/doc/guides/eventdevs/cnxk.rst
+++ b/doc/guides/eventdevs/cnxk.rst
@@ -198,6 +198,15 @@ Runtime Config Options
 
     -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
 
+Power Savings on CN10K
+----------------------
+
+ARM cores can additionally use WFE when polling for transactions on SSO bus
+to save power i.e., in the event dequeue call ARM core can enter WFE and exit
+when either work has been scheduled or dequeue timeout has reached.
+This can be enabled by setting ``RTE_ARM_USE_WFE`` to ``true`` in
+``config/arm/meson.build``.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 8aa916fa12..92d5190842 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -250,23 +250,57 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev,
 
 	gw.get_work = ws->gw_wdata;
 #if defined(RTE_ARCH_ARM64)
-#if !defined(__clang__)
-	asm volatile(
-		PLT_CPU_FEATURE_PREAMBLE
-		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
-		: [wdata] "+r"(gw.get_work)
-		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
-		: "memory");
-#else
+#if defined(__clang__)
 	register uint64_t x0 __asm("x0") = (uint64_t)gw.u64[0];
 	register uint64_t x1 __asm("x1") = (uint64_t)gw.u64[1];
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     " ldp %[x0], %[x1], [%[tag_loc]] \n"
+		     " tbz %[x0], %[pend_gw], done%= \n"
+		     " sevl \n"
+		     "rty%=: wfe \n"
+		     " ldp %[x0], %[x1], [%[tag_loc]] \n"
+		     " tbnz %[x0], %[pend_gw], rty%= \n"
+		     "done%=: \n"
+		     " dmb ld \n"
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
 	asm volatile(".arch armv8-a+lse\n"
 		     "caspal %[x0], %[x1], %[x0], %[x1], [%[dst]]\n"
-		     : [x0] "+r"(x0), [x1] "+r"(x1)
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
 		     : [dst] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
 		     : "memory");
+#endif
 	gw.u64[0] = x0;
 	gw.u64[1] = x1;
+#else
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     " ldp %[wdata], %H[wdata], [%[tag_loc]] \n"
+		     " tbz %[wdata], %[pend_gw], done%= \n"
+		     " sevl \n"
+		     "rty%=: wfe \n"
+		     " ldp %[wdata], %H[wdata], [%[tag_loc]] \n"
+		     " tbnz %[wdata], %[pend_gw], rty%= \n"
+		     "done%=: \n"
+		     " dmb ld \n"
+		     : [wdata] "=&r"(gw.get_work)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
+	asm volatile(
+		PLT_CPU_FEATURE_PREAMBLE
+		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
+		: [wdata] "+r"(gw.get_work)
+		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
+		: "memory");
+#endif
 #endif
 #else
 	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
-- 
2.25.1

* Re: [PATCH] event/cnxk: use WFE LDP loop for getwork routine
From: Jerin Jacob @ 2024-01-09 7:56 UTC
To: pbhagavatula, Ruifeng Wang (Arm Technology China), Honnappa Nagarahalli
Cc: jerinj, Shijith Thotton, dev

On Fri, Jan 5, 2024 at 9:24 AM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Use WFE LDP loop while polling for GETWORK completion for better
> power savings.
> Disabled by default and can be enabled by setting
> `RTE_ARM_USE_WFE` to `true` in `config/arm/meson.build`
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>  doc/guides/eventdevs/cnxk.rst     |  9 ++++++
>  drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
>  2 files changed, 52 insertions(+), 9 deletions(-)
>
> diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
> index cccb8a0304..d62c143c77 100644
> --- a/doc/guides/eventdevs/cnxk.rst
> +++ b/doc/guides/eventdevs/cnxk.rst
> @@ -198,6 +198,15 @@ Runtime Config Options
>
>      -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
>
> +Power Savings on CN10K
> +----------------------
> +
> +ARM cores can additionally use WFE when polling for transactions on SSO bus
> +to save power i.e., in the event dequeue call ARM core can enter WFE and exit
> +when either work has been scheduled or dequeue timeout has reached.
> +This can be enabled by setting ``RTE_ARM_USE_WFE`` to ``true`` in
> +``config/arm/meson.build``.

+ ARM maintainers

IMO, editing config/arm/meson.build to enable RTE_ARM_USE_WFE needs to
be improved. Could you push a patch for enabling it via -D... or via
-Dc_args=...?

* [PATCH v2 1/2] config/arm: allow WFE to be enabled config time
From: pbhagavatula @ 2024-01-17 14:25 UTC
To: jerinj, Ruifeng Wang, Bruce Richardson; +Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Allow RTE_ARM_USE_WFE to be enabled at meson configuration
time by passing it via c_args instead of modifying
`config/arm/meson.build`.

Example usage:
	meson build -Dc_args='-DRTE_ARM_USE_WFE' \
	--cross-file config/arm/arm64_cn10k_linux_gcc

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 config/arm/meson.build | 1 -
 1 file changed, 1 deletion(-)

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 36f21d2259..a63711e986 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -17,7 +17,6 @@ flags_common = [
     # ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
     # ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],
 
-    ['RTE_ARM_USE_WFE', false],
     ['RTE_ARCH_ARM64', true],
     ['RTE_CACHE_LINE_SIZE', 128]
 ]
-- 
2.25.1
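
For illustration only, here is a minimal C sketch of how a macro passed through `-Dc_args='-DRTE_ARM_USE_WFE'` ends up selecting a code path at compile time. The enum and the helpers `select_getwork_strategy()` and `getwork_strategy_name()` are hypothetical names that exist only in this sketch; the driver itself gates inline assembly with `#if defined(RTE_ARM_USE_WFE)` instead of returning a value.

```c
#include <assert.h>

/* Hypothetical labels for the two GETWORK paths discussed in this thread. */
enum getwork_strategy {
	GETWORK_CASPAL,  /* default: CASPAL-based GETWORK */
	GETWORK_WFE_LDP  /* opt-in: WFE + LDP polling loop */
};

/*
 * The choice is made by the preprocessor, so it is fixed when the
 * translation unit is compiled with or without -DRTE_ARM_USE_WFE.
 */
static inline enum getwork_strategy select_getwork_strategy(void)
{
#if defined(RTE_ARM_USE_WFE)
	return GETWORK_WFE_LDP;
#else
	return GETWORK_CASPAL;
#endif
}

/* Human-readable name, handy for a startup log line. */
static inline const char *getwork_strategy_name(enum getwork_strategy s)
{
	assert(s == GETWORK_CASPAL || s == GETWORK_WFE_LDP);
	return s == GETWORK_WFE_LDP ? "wfe-ldp" : "caspal";
}
```

Because the macro is only tested with `defined()`, passing `-DRTE_ARM_USE_WFE` without a value is enough to switch paths.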

* [PATCH v2 2/2] event/cnxk: use WFE LDP loop for getwork routine
From: pbhagavatula @ 2024-01-17 14:26 UTC
To: jerinj, Pavan Nikhilesh, Shijith Thotton; +Cc: dev

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use WFE LDP loop while polling for GETWORK completion for better
power savings.
Disabled by default and can be enabled by configuring meson with
-Dc_args='-DRTE_ARM_USE_WFE'.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 doc/guides/eventdevs/cnxk.rst     |  9 ++++++
 drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
index cccb8a0304..04f5b5025b 100644
--- a/doc/guides/eventdevs/cnxk.rst
+++ b/doc/guides/eventdevs/cnxk.rst
@@ -198,6 +198,15 @@ Runtime Config Options
 
     -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
 
+Power Savings on CN10K
+----------------------
+
+ARM cores can additionally use WFE when polling for transactions on SSO bus
+to save power i.e., in the event dequeue call ARM core can enter WFE and exit
+when either work has been scheduled or dequeue timeout has reached.
+This can be enabled by configuring meson with the following option
+``-Dc_args='-DRTE_ARM_USE_WFE'``.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 8aa916fa12..92d5190842 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -250,23 +250,57 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev,
 
 	gw.get_work = ws->gw_wdata;
 #if defined(RTE_ARCH_ARM64)
-#if !defined(__clang__)
-	asm volatile(
-		PLT_CPU_FEATURE_PREAMBLE
-		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
-		: [wdata] "+r"(gw.get_work)
-		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
-		: "memory");
-#else
+#if defined(__clang__)
 	register uint64_t x0 __asm("x0") = (uint64_t)gw.u64[0];
 	register uint64_t x1 __asm("x1") = (uint64_t)gw.u64[1];
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     " ldp %[x0], %[x1], [%[tag_loc]] \n"
+		     " tbz %[x0], %[pend_gw], done%= \n"
+		     " sevl \n"
+		     "rty%=: wfe \n"
+		     " ldp %[x0], %[x1], [%[tag_loc]] \n"
+		     " tbnz %[x0], %[pend_gw], rty%= \n"
+		     "done%=: \n"
+		     " dmb ld \n"
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
 	asm volatile(".arch armv8-a+lse\n"
 		     "caspal %[x0], %[x1], %[x0], %[x1], [%[dst]]\n"
-		     : [x0] "+r"(x0), [x1] "+r"(x1)
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
 		     : [dst] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
 		     : "memory");
+#endif
 	gw.u64[0] = x0;
 	gw.u64[1] = x1;
+#else
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     " ldp %[wdata], %H[wdata], [%[tag_loc]] \n"
+		     " tbz %[wdata], %[pend_gw], done%= \n"
+		     " sevl \n"
+		     "rty%=: wfe \n"
+		     " ldp %[wdata], %H[wdata], [%[tag_loc]] \n"
+		     " tbnz %[wdata], %[pend_gw], rty%= \n"
+		     "done%=: \n"
+		     " dmb ld \n"
+		     : [wdata] "=&r"(gw.get_work)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
+	asm volatile(
+		PLT_CPU_FEATURE_PREAMBLE
+		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
+		: [wdata] "+r"(gw.get_work)
+		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
+		: "memory");
+#endif
 #endif
 #else
 	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
-- 
2.25.1

* Re: [PATCH v2 1/2] config/arm: allow WFE to be enabled config time
From: Ruifeng Wang @ 2024-01-18 1:52 UTC
To: pbhagavatula, jerinj, Bruce Richardson; +Cc: dev, nd

On 2024/1/17 10:25 PM, pbhagavatula@marvell.com wrote:
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Allow RTE_ARM_USE_WFE to be enabled at meson configuration
> time by passing it via c_args instead of modifying
> `config/arm/meson.build`.
>
> Example usage:
> meson build -Dc_args='-DRTE_ARM_USE_WFE' \
> --cross-file config/arm/arm64_cn10k_linux_gcc
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
>  config/arm/meson.build | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 36f21d2259..a63711e986 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -17,7 +17,6 @@ flags_common = [
>      # ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
>      # ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],
>
> -    ['RTE_ARM_USE_WFE', false],

What about commenting this line out instead? It will be easier to track
the configurables.

>      ['RTE_ARCH_ARM64', true],
>      ['RTE_CACHE_LINE_SIZE', 128]
>  ]

* [PATCH v3 1/2] config/arm: allow WFE to be enabled config time
From: pbhagavatula @ 2024-01-21 15:21 UTC
To: jerinj, Ruifeng.Wang, nd, Ruifeng Wang, Bruce Richardson
Cc: dev, Pavan Nikhilesh

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Allow RTE_ARM_USE_WFE to be enabled at meson configuration
time by passing it via c_args instead of modifying
`config/arm/meson.build`.

Example usage:
	meson build -Dc_args='-DRTE_ARM_USE_WFE' \
	--cross-file config/arm/arm64_cn10k_linux_gcc

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
v3 Changes:
- Comment the meson option instead of removing it.

 config/arm/meson.build | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/config/arm/meson.build b/config/arm/meson.build
index 36f21d2259..89e1de312b 100644
--- a/config/arm/meson.build
+++ b/config/arm/meson.build
@@ -17,7 +17,9 @@ flags_common = [
     # ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
     # ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],
 
-    ['RTE_ARM_USE_WFE', false],
+    # Enable use of ARM wait for event instruction.
+    # ['RTE_ARM_USE_WFE', false],
+
     ['RTE_ARCH_ARM64', true],
     ['RTE_CACHE_LINE_SIZE', 128]
 ]
-- 
2.25.1

* [PATCH v3 2/2] event/cnxk: use WFE LDP loop for getwork routine
From: pbhagavatula @ 2024-01-21 15:21 UTC
To: jerinj, Ruifeng.Wang, nd, Pavan Nikhilesh, Shijith Thotton; +Cc: dev

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use WFE LDP loop while polling for GETWORK completion for better
power savings.
Disabled by default and can be enabled by configuring meson with
-Dc_args='-DRTE_ARM_USE_WFE'.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
 doc/guides/eventdevs/cnxk.rst     |  9 ++++++
 drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
index cccb8a0304..04f5b5025b 100644
--- a/doc/guides/eventdevs/cnxk.rst
+++ b/doc/guides/eventdevs/cnxk.rst
@@ -198,6 +198,15 @@ Runtime Config Options
 
     -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
 
+Power Savings on CN10K
+----------------------
+
+ARM cores can additionally use WFE when polling for transactions on SSO bus
+to save power i.e., in the event dequeue call ARM core can enter WFE and exit
+when either work has been scheduled or dequeue timeout has reached.
+This can be enabled by configuring meson with the following option
+``-Dc_args='-DRTE_ARM_USE_WFE'``.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 8aa916fa12..92d5190842 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -250,23 +250,57 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev,
 
 	gw.get_work = ws->gw_wdata;
 #if defined(RTE_ARCH_ARM64)
-#if !defined(__clang__)
-	asm volatile(
-		PLT_CPU_FEATURE_PREAMBLE
-		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
-		: [wdata] "+r"(gw.get_work)
-		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
-		: "memory");
-#else
+#if defined(__clang__)
 	register uint64_t x0 __asm("x0") = (uint64_t)gw.u64[0];
 	register uint64_t x1 __asm("x1") = (uint64_t)gw.u64[1];
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     " ldp %[x0], %[x1], [%[tag_loc]] \n"
+		     " tbz %[x0], %[pend_gw], done%= \n"
+		     " sevl \n"
+		     "rty%=: wfe \n"
+		     " ldp %[x0], %[x1], [%[tag_loc]] \n"
+		     " tbnz %[x0], %[pend_gw], rty%= \n"
+		     "done%=: \n"
+		     " dmb ld \n"
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
 	asm volatile(".arch armv8-a+lse\n"
 		     "caspal %[x0], %[x1], %[x0], %[x1], [%[dst]]\n"
-		     : [x0] "+r"(x0), [x1] "+r"(x1)
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
 		     : [dst] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
 		     : "memory");
+#endif
 	gw.u64[0] = x0;
 	gw.u64[1] = x1;
+#else
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     " ldp %[wdata], %H[wdata], [%[tag_loc]] \n"
+		     " tbz %[wdata], %[pend_gw], done%= \n"
+		     " sevl \n"
+		     "rty%=: wfe \n"
+		     " ldp %[wdata], %H[wdata], [%[tag_loc]] \n"
+		     " tbnz %[wdata], %[pend_gw], rty%= \n"
+		     "done%=: \n"
+		     " dmb ld \n"
+		     : [wdata] "=&r"(gw.get_work)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
+	asm volatile(
+		PLT_CPU_FEATURE_PREAMBLE
+		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
+		: [wdata] "+r"(gw.get_work)
+		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
+		: "memory");
+#endif
 #endif
 #else
 	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
-- 
2.25.1

* Re: [PATCH v3 1/2] config/arm: allow WFE to be enabled config time
From: fengchengwen @ 2024-01-22 6:37 UTC
To: pbhagavatula, jerinj, Ruifeng.Wang, nd, Bruce Richardson; +Cc: dev

Acked-by: Chengwen Feng <fengchengwen@huawei.com>

On 2024/1/21 23:21, pbhagavatula@marvell.com wrote:
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Allow RTE_ARM_USE_WFE to be enabled at meson configuration
> time by passing it via c_args instead of modifying
> `config/arm/meson.build`.
>
> Example usage:
> meson build -Dc_args='-DRTE_ARM_USE_WFE' \
> --cross-file config/arm/arm64_cn10k_linux_gcc
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
> v3 Changes:
> - Comment the meson option instead of removing it.
>
>  config/arm/meson.build | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 36f21d2259..89e1de312b 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -17,7 +17,9 @@ flags_common = [
>      # ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
>      # ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],
>
> -    ['RTE_ARM_USE_WFE', false],
> +    # Enable use of ARM wait for event instruction.
> +    # ['RTE_ARM_USE_WFE', false],
> +
>      ['RTE_ARCH_ARM64', true],
>      ['RTE_CACHE_LINE_SIZE', 128]
>  ]
> --
> 2.25.1
>
> .
>

* Re: [PATCH v3 1/2] config/arm: allow WFE to be enabled config time
From: Ruifeng Wang @ 2024-01-22 6:43 UTC
To: pbhagavatula, jerinj, nd, Bruce Richardson; +Cc: dev

On 2024/1/21 11:21 PM, pbhagavatula@marvell.com wrote:
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Allow RTE_ARM_USE_WFE to be enabled at meson configuration
> time by passing it via c_args instead of modifying
> `config/arm/meson.build`.
>
> Example usage:
> meson build -Dc_args='-DRTE_ARM_USE_WFE' \
> --cross-file config/arm/arm64_cn10k_linux_gcc
>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
> v3 Changes:
> - Comment the meson option instead of removing it.
>
>  config/arm/meson.build | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/config/arm/meson.build b/config/arm/meson.build
> index 36f21d2259..89e1de312b 100644
> --- a/config/arm/meson.build
> +++ b/config/arm/meson.build
> @@ -17,7 +17,9 @@ flags_common = [
>      # ['RTE_ARM64_MEMCPY_ALIGN_MASK', 0xF],
>      # ['RTE_ARM64_MEMCPY_STRICT_ALIGN', false],
>
> -    ['RTE_ARM_USE_WFE', false],
> +    # Enable use of ARM wait for event instruction.
> +    # ['RTE_ARM_USE_WFE', false],
> +
>      ['RTE_ARCH_ARM64', true],
>      ['RTE_CACHE_LINE_SIZE', 128]
>  ]
> --
> 2.25.1
>

Acked-by: Ruifeng Wang <ruifeng.wang@arm.com>

* Re: [PATCH v3 1/2] config/arm: allow WFE to be enabled config time
From: Jerin Jacob @ 2024-02-01 16:37 UTC
To: Ruifeng Wang; +Cc: pbhagavatula, jerinj, nd, Bruce Richardson, dev

On Mon, Jan 22, 2024 at 12:13 PM Ruifeng Wang <Ruifeng.Wang@arm.com> wrote:
>
>
> On 2024/1/21 11:21 PM, pbhagavatula@marvell.com wrote:
> > From: Pavan Nikhilesh <pbhagavatula@marvell.com>
> >
> > Allow RTE_ARM_USE_WFE to be enabled at meson configuration
> > time by passing it via c_args instead of modifying
> > `config/arm/meson.build`.
> >
> > Example usage:
> > meson build -Dc_args='-DRTE_ARM_USE_WFE' \
> > --cross-file config/arm/arm64_cn10k_linux_gcc
> >
> > Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>

Could you split this series and resend it as two separate patches? This
patch needs to go through the main tree.

* [PATCH v4] event/cnxk: use WFE LDP loop for getwork routine
From: pbhagavatula @ 2024-02-01 22:03 UTC
To: jerinj, Pavan Nikhilesh, Shijith Thotton; +Cc: dev

From: Pavan Nikhilesh <pbhagavatula@marvell.com>

Use WFE LDP loop while polling for GETWORK completion for better
power savings.
Disabled by default and can be enabled by configuring meson with
-Dc_args='-DRTE_ARM_USE_WFE'.

Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
---
v4 Changes:
- Split patches

 doc/guides/eventdevs/cnxk.rst     |  9 ++++++
 drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
 2 files changed, 52 insertions(+), 9 deletions(-)

diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
index cccb8a0304..04f5b5025b 100644
--- a/doc/guides/eventdevs/cnxk.rst
+++ b/doc/guides/eventdevs/cnxk.rst
@@ -198,6 +198,15 @@ Runtime Config Options
 
     -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
 
+Power Savings on CN10K
+----------------------
+
+ARM cores can additionally use WFE when polling for transactions on SSO bus
+to save power i.e., in the event dequeue call ARM core can enter WFE and exit
+when either work has been scheduled or dequeue timeout has reached.
+This can be enabled by configuring meson with the following option
+``-Dc_args='-DRTE_ARM_USE_WFE'``.
+
 Debugging Options
 -----------------
 
diff --git a/drivers/event/cnxk/cn10k_worker.h b/drivers/event/cnxk/cn10k_worker.h
index 8aa916fa12..92d5190842 100644
--- a/drivers/event/cnxk/cn10k_worker.h
+++ b/drivers/event/cnxk/cn10k_worker.h
@@ -250,23 +250,57 @@ cn10k_sso_hws_get_work(struct cn10k_sso_hws *ws, struct rte_event *ev,
 
 	gw.get_work = ws->gw_wdata;
 #if defined(RTE_ARCH_ARM64)
-#if !defined(__clang__)
-	asm volatile(
-		PLT_CPU_FEATURE_PREAMBLE
-		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
-		: [wdata] "+r"(gw.get_work)
-		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
-		: "memory");
-#else
+#if defined(__clang__)
 	register uint64_t x0 __asm("x0") = (uint64_t)gw.u64[0];
 	register uint64_t x1 __asm("x1") = (uint64_t)gw.u64[1];
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     " ldp %[x0], %[x1], [%[tag_loc]] \n"
+		     " tbz %[x0], %[pend_gw], done%= \n"
+		     " sevl \n"
+		     "rty%=: wfe \n"
+		     " ldp %[x0], %[x1], [%[tag_loc]] \n"
+		     " tbnz %[x0], %[pend_gw], rty%= \n"
+		     "done%=: \n"
+		     " dmb ld \n"
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
 	asm volatile(".arch armv8-a+lse\n"
 		     "caspal %[x0], %[x1], %[x0], %[x1], [%[dst]]\n"
-		     : [x0] "+r"(x0), [x1] "+r"(x1)
+		     : [x0] "+r" (x0), [x1] "+r" (x1)
 		     : [dst] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
 		     : "memory");
+#endif
 	gw.u64[0] = x0;
 	gw.u64[1] = x1;
+#else
+#if defined(RTE_ARM_USE_WFE)
+	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
+	asm volatile(PLT_CPU_FEATURE_PREAMBLE
+		     " ldp %[wdata], %H[wdata], [%[tag_loc]] \n"
+		     " tbz %[wdata], %[pend_gw], done%= \n"
+		     " sevl \n"
+		     "rty%=: wfe \n"
+		     " ldp %[wdata], %H[wdata], [%[tag_loc]] \n"
+		     " tbnz %[wdata], %[pend_gw], rty%= \n"
+		     "done%=: \n"
+		     " dmb ld \n"
+		     : [wdata] "=&r"(gw.get_work)
+		     : [tag_loc] "r"(ws->base + SSOW_LF_GWS_WQE0),
+		       [pend_gw] "i"(SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT)
+		     : "memory");
+#else
+	asm volatile(
+		PLT_CPU_FEATURE_PREAMBLE
+		"caspal %[wdata], %H[wdata], %[wdata], %H[wdata], [%[gw_loc]]\n"
+		: [wdata] "+r"(gw.get_work)
+		: [gw_loc] "r"(ws->base + SSOW_LF_GWS_OP_GET_WORK0)
+		: "memory");
+#endif
 #endif
 #else
 	plt_write64(gw.u64[0], ws->base + SSOW_LF_GWS_OP_GET_WORK0);
-- 
2.25.1

* Re: [PATCH v4] event/cnxk: use WFE LDP loop for getwork routine
From: Jerin Jacob @ 2024-02-25 15:20 UTC
To: pbhagavatula; +Cc: jerinj, Shijith Thotton, dev

On Fri, Feb 2, 2024 at 5:59 AM <pbhagavatula@marvell.com> wrote:
>
> From: Pavan Nikhilesh <pbhagavatula@marvell.com>
>
> Use WFE LDP loop while polling for GETWORK completion for better
> power savings.
> Disabled by default and can be enabled by configuring meson with
> -Dc_args='-DRTE_ARM_USE_WFE'.

Since the config patch is not yet merged, we can remove this part from
the commit log.

>
> Signed-off-by: Pavan Nikhilesh <pbhagavatula@marvell.com>
> ---
> v4 Changes:
> - Split patches
>
>  doc/guides/eventdevs/cnxk.rst     |  9 ++++++

Please update the release notes for this PMD feature.

>  drivers/event/cnxk/cn10k_worker.h | 52 +++++++++++++++++++++++++------
>  2 files changed, 52 insertions(+), 9 deletions(-)
>
> diff --git a/doc/guides/eventdevs/cnxk.rst b/doc/guides/eventdevs/cnxk.rst
> index cccb8a0304..04f5b5025b 100644
> --- a/doc/guides/eventdevs/cnxk.rst
> +++ b/doc/guides/eventdevs/cnxk.rst
> @@ -198,6 +198,15 @@ Runtime Config Options
>
>      -a 0002:0e:00.0,tim_eclk_freq=122880000-1000000000-0
>
> +Power Savings on CN10K
> +----------------------
> +
> +ARM cores can additionally use WFE when polling for transactions on SSO bus
> +to save power i.e., in the event dequeue call ARM core can enter WFE and exit
> +when either work has been scheduled or dequeue timeout has reached.
> +This can be enabled by configuring meson with the following option
> +``-Dc_args='-DRTE_ARM_USE_WFE'``.

The last part can be made generic, as the other patches are not merged,
i.e., "This can be enabled by selecting RTE_ARM_USE_WFE" or so.
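
For readers who do not read AArch64 assembly, the polling pattern discussed in this thread (issue GETWORK, then wait until the pending bit in the tag word clears) can be approximated in portable C. This is a rough sketch, not the driver code: `gw_regs`, `fake_hw`, `fake_wait`, `poll_get_work` and `PEND_GET_WORK_BIT` are hypothetical names, the bit position 63 is an assumption, and the `wait` callback merely stands in for `wfe`. The real loop reads both words with a single `ldp` and finishes with `dmb ld`, neither of which plain C loads reproduce.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit position; the driver uses SSOW_LF_GWS_TAG_PEND_GET_WORK_BIT. */
#define PEND_GET_WORK_BIT 63

/* Hypothetical stand-in for the tag/WQE register pair read by LDP. */
struct gw_regs {
	volatile uint64_t tag;
	volatile uint64_t wqp;
};

/* Callback standing in for WFE: sleep until "something may have changed". */
typedef void (*wait_fn)(void *ctx);

/*
 * Read the pair once; while the pending bit is set, wait and re-read.
 * Returns the final tag word and stores the work pointer in *wqp.
 */
static uint64_t poll_get_work(struct gw_regs *regs, wait_fn wait, void *ctx,
			      uint64_t *wqp)
{
	assert(regs != NULL && wait != NULL && wqp != NULL);

	uint64_t tag = regs->tag;
	uint64_t ptr = regs->wqp;

	while (tag & (1ULL << PEND_GET_WORK_BIT)) {
		wait(ctx);          /* "wfe" in the real loop */
		tag = regs->tag;    /* "ldp" re-reads both words */
		ptr = regs->wqp;
	}
	*wqp = ptr;
	return tag;
}

/* Fake hardware that completes GETWORK after a fixed number of waits. */
struct fake_hw {
	struct gw_regs regs;
	unsigned int polls_left;
	uint64_t final_wqp;
};

static void fake_wait(void *ctx)
{
	struct fake_hw *hw = ctx;

	assert(hw->polls_left > 0);
	if (--hw->polls_left == 0) {
		/* Completion: clear the pending bit and publish the pointer. */
		hw->regs.tag &= ~(1ULL << PEND_GET_WORK_BIT);
		hw->regs.wqp = hw->final_wqp;
	}
}
```

A caller would set the pending bit in a `fake_hw`, pick a `polls_left` count, and pass `fake_wait` as the wait hook. When the pending bit is already clear on the first read, the hook is never invoked, matching the early `tbz ... done` exit in the assembly.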