* Re: [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE
2025-10-15 13:39 [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE Sivaprasad Tummala
@ 2025-10-28 14:53 ` Thomas Monjalon
2025-11-07 17:59 ` Dariusz Sosnowski
2025-11-11 3:40 ` [PATCH v2] " Sivaprasad Tummala
2 siblings, 0 replies; 5+ messages in thread
From: Thomas Monjalon @ 2025-10-28 14:53 UTC (permalink / raw)
To: Sivaprasad Tummala
Cc: jerinj, kirankumark, ndabilpuram, yanzhirun_163, david.marchand,
ktraynor, konstantin.ananyev, konstantin.v.ananyev,
bruce.richardson, maxime.coquelin, anatoly.burakov, aconole, dev,
akozyrev, stable, Dariusz Sosnowski, Viacheslav Ovsiienko,
Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad, Maayan Kashani
Cc mlx5 maintainers
15/10/2025 15:39, Sivaprasad Tummala:
> Previously, the PMD used a common monitor callback to determine
> CQE ownership for power-aware polling. However, when a CQE contained
> an invalid opcode(MLX5_CQE_INVALID), ownership bit was not reliable.
> As a result, the monitor condition could falsely indicate CQE
> availability and cause the CPU to wake up unnecessarily during
> low traffic periods.
>
> This resulted in spurious wakeups in monitor-wait mode and reduced
> the expected power savings, as cores exited the sleep state even
> when no valid CQEs were available.
>
> This patch introduces a dedicated callback that skips invalid CQEs
> and optimizes power efficiency by preventing false wakeups caused
> by hardware-owned or invalid entries.
>
> Fixes: a8f0df6bf98d ("net/mlx5: support power monitoring")
> Cc: akozyrev@nvidia.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
> drivers/net/mlx5/mlx5_rx.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index 420a03068d..2765b4b730 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -295,6 +295,20 @@ mlx5_monitor_callback(const uint64_t value,
> return (value & m) == v ? -1 : 0;
> }
>
> +static int
> +mlx5_monitor_cqe_own_callback(const uint64_t value,
> + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
> +{
> + const uint64_t m = opaque[CLB_MSK_IDX];
> + const uint64_t v = opaque[CLB_VAL_IDX];
> + const uint64_t match = ((value & m) == v);
> + const uint64_t opcode = MLX5_CQE_OPCODE(value);
> + const uint64_t valid_op = (opcode ^ MLX5_CQE_INVALID);
> +
> + /* ownership bit is not valid for invalid opcode; CQE is HW owned */
> + return -(match & valid_op);
> +}
> +
> int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
> {
> struct mlx5_rxq_data *rxq = rx_queue;
> @@ -312,12 +326,13 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
> pmc->addr = &cqe->validity_iteration_count;
> pmc->opaque[CLB_VAL_IDX] = vic;
> pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_VIC_INIT;
> + pmc->fn = mlx5_monitor_callback;
> } else {
> pmc->addr = &cqe->op_own;
> pmc->opaque[CLB_VAL_IDX] = !!idx;
> pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
> + pmc->fn = mlx5_monitor_cqe_own_callback;
> }
> - pmc->fn = mlx5_monitor_callback;
> pmc->size = sizeof(uint8_t);
> return 0;
> }
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE
2025-10-15 13:39 [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE Sivaprasad Tummala
2025-10-28 14:53 ` Thomas Monjalon
@ 2025-11-07 17:59 ` Dariusz Sosnowski
2025-11-10 9:29 ` Tummala, Sivaprasad
2025-11-11 3:40 ` [PATCH v2] " Sivaprasad Tummala
2 siblings, 1 reply; 5+ messages in thread
From: Dariusz Sosnowski @ 2025-11-07 17:59 UTC (permalink / raw)
To: Sivaprasad Tummala, Alexander Kozyrev, Viacheslav Ovsiienko
Cc: jerinj, kirankumark, ndabilpuram, yanzhirun_163, david.marchand,
ktraynor, thomas, konstantin.ananyev, konstantin.v.ananyev,
bruce.richardson, maxime.coquelin, anatoly.burakov, aconole, dev,
stable
Hi,
Thank you for the contribution. Please see comments below.
On Wed, Oct 15, 2025 at 01:39:57PM +0000, Sivaprasad Tummala wrote:
> Previously, the PMD used a common monitor callback to determine
> CQE ownership for power-aware polling. However, when a CQE contained
> an invalid opcode(MLX5_CQE_INVALID), ownership bit was not reliable.
> As a result, the monitor condition could falsely indicate CQE
> availability and cause the CPU to wake up unnecessarily during
> low traffic periods.
>
> This resulted in spurious wakeups in monitor-wait mode and reduced
> the expected power savings, as cores exited the sleep state even
> when no valid CQEs were available.
>
> This patch introduces a dedicated callback that skips invalid CQEs
> and optimizes power efficiency by preventing false wakeups caused
> by hardware-owned or invalid entries.
>
> Fixes: a8f0df6bf98d ("net/mlx5: support power monitoring")
> Cc: akozyrev@nvidia.com
> Cc: stable@dpdk.org
>
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
> drivers/net/mlx5/mlx5_rx.c | 17 ++++++++++++++++-
> 1 file changed, 16 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index 420a03068d..2765b4b730 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -295,6 +295,20 @@ mlx5_monitor_callback(const uint64_t value,
> return (value & m) == v ? -1 : 0;
> }
>
> +static int
> +mlx5_monitor_cqe_own_callback(const uint64_t value,
> + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
> +{
> + const uint64_t m = opaque[CLB_MSK_IDX];
> + const uint64_t v = opaque[CLB_VAL_IDX];
> + const uint64_t match = ((value & m) == v);
Could you please rename "match" variable to "sw_owned"?
This name would better relay the meaning of the checked condition that
CQE owner bit value signifies that CQE is SW owned.
> + const uint64_t opcode = MLX5_CQE_OPCODE(value);
> + const uint64_t valid_op = (opcode ^ MLX5_CQE_INVALID);
IMO the usage of bit operations here (although logic is correct) is a bit confusing.
Could you rewrite it in terms of logical operations so it's easier to
follow? For example like this:
const uint64_t valid_op = opcode != MLX5_CQE_INVALID
return (sw_owned && valid_op) ? -1 : 0;
This also would properly describe in code the required condition:
CQE can be parsed by SW if and only if owner bit is "SW owned" and CQE
opcode is valid.
> +
> + /* ownership bit is not valid for invalid opcode; CQE is HW owned */
> + return -(match & valid_op);
> +}
> +
> int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
> {
> struct mlx5_rxq_data *rxq = rx_queue;
> @@ -312,12 +326,13 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
> pmc->addr = &cqe->validity_iteration_count;
> pmc->opaque[CLB_VAL_IDX] = vic;
> pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_VIC_INIT;
> + pmc->fn = mlx5_monitor_callback;
Alex, Slava: Just to double check - in case of enhanced CQE compression
layout, should both CQE opcode and vic be checked?
Right now only vic is checked in power monitor callback for that case.
In Rx datapath both are checked to determine CQE ownership:
https://github.com/DPDK/dpdk/blob/main/drivers/common/mlx5/mlx5_common.h#L277
> } else {
> pmc->addr = &cqe->op_own;
> pmc->opaque[CLB_VAL_IDX] = !!idx;
> pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
> + pmc->fn = mlx5_monitor_cqe_own_callback;
> }
> - pmc->fn = mlx5_monitor_callback;
> pmc->size = sizeof(uint8_t);
> return 0;
> }
> --
> 2.43.0
>
Best regards,
Dariusz Sosnowski
^ permalink raw reply [flat|nested] 5+ messages in thread* Re: [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE
2025-11-07 17:59 ` Dariusz Sosnowski
@ 2025-11-10 9:29 ` Tummala, Sivaprasad
0 siblings, 0 replies; 5+ messages in thread
From: Tummala, Sivaprasad @ 2025-11-10 9:29 UTC (permalink / raw)
To: Dariusz Sosnowski, Alexander Kozyrev, Viacheslav Ovsiienko
Cc: jerinj, kirankumark, ndabilpuram, yanzhirun_163, david.marchand,
ktraynor, thomas, konstantin.ananyev, konstantin.v.ananyev,
bruce.richardson, maxime.coquelin, anatoly.burakov, aconole, dev,
stable
[-- Attachment #1: Type: text/plain, Size: 5375 bytes --]
[AMD Official Use Only - AMD Internal Distribution Only]
Hi Dariusz,
________________________________
From: Dariusz Sosnowski <dsosnowski@nvidia.com>
Sent: Friday, November 07, 2025 11:29 PM
To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>; Alexander Kozyrev <akozyrev@nvidia.com>; Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Cc: jerinj@marvell.com <jerinj@marvell.com>; kirankumark@marvell.com <kirankumark@marvell.com>; ndabilpuram@marvell.com <ndabilpuram@marvell.com>; yanzhirun_163@163.com <yanzhirun_163@163.com>; david.marchand@redhat.com <david.marchand@redhat.com>; ktraynor@redhat.com <ktraynor@redhat.com>; thomas@monjalon.net <thomas@monjalon.net>; konstantin.ananyev@huawei.com <konstantin.ananyev@huawei.com>; konstantin.v.ananyev@yandex.ru <konstantin.v.ananyev@yandex.ru>; bruce.richardson@intel.com <bruce.richardson@intel.com>; maxime.coquelin@redhat.com <maxime.coquelin@redhat.com>; anatoly.burakov@intel.com <anatoly.burakov@intel.com>; aconole@redhat.com <aconole@redhat.com>; dev@dpdk.org <dev@dpdk.org>; stable@dpdk.org <stable@dpdk.org>
Subject: Re: [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> Hi,
>
> Thank you for the contribution. Please see comments below.
>
> On Wed, Oct 15, 2025 at 01:39:57PM +0000, Sivaprasad Tummala wrote:
>> Previously, the PMD used a common monitor callback to determine
>> CQE ownership for power-aware polling. However, when a CQE contained
>> an invalid opcode(MLX5_CQE_INVALID), ownership bit was not reliable.
>> As a result, the monitor condition could falsely indicate CQE
>> availability and cause the CPU to wake up unnecessarily during
>> low traffic periods.
>>
>> This resulted in spurious wakeups in monitor-wait mode and reduced
>> the expected power savings, as cores exited the sleep state even
>> when no valid CQEs were available.
>>
>> This patch introduces a dedicated callback that skips invalid CQEs
>> and optimizes power efficiency by preventing false wakeups caused
>> by hardware-owned or invalid entries.
>>
>> Fixes: a8f0df6bf98d ("net/mlx5: support power monitoring")
>> Cc: akozyrev@nvidia.com
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>> ---
>> drivers/net/mlx5/mlx5_rx.c | 17 ++++++++++++++++-
>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
>> index 420a03068d..2765b4b730 100644
>> --- a/drivers/net/mlx5/mlx5_rx.c
>> +++ b/drivers/net/mlx5/mlx5_rx.c
>> @@ -295,6 +295,20 @@ mlx5_monitor_callback(const uint64_t value,
>> return (value & m) == v ? -1 : 0;
>> }
>>
>> +static int
>> +mlx5_monitor_cqe_own_callback(const uint64_t value,
>> + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
>> +{
>> + const uint64_t m = opaque[CLB_MSK_IDX];
>> + const uint64_t v = opaque[CLB_VAL_IDX];
>> + const uint64_t match = ((value & m) == v);
>
> Could you please rename "match" variable to "sw_owned"?
> This name would better relay the meaning of the checked condition that
> CQE owner bit value signifies that CQE is SW owned.
ACK! Will update this in v2.
>
>> + const uint64_t opcode = MLX5_CQE_OPCODE(value);
>> + const uint64_t valid_op = (opcode ^ MLX5_CQE_INVALID);
>
>IMO the usage of bit operations here (although logic is correct) is a bit confusing.
>Could you rewrite it in terms of logical operations so it's easier to
>follow? For example like this:
>
> const uint64_t valid_op = opcode != MLX5_CQE_INVALID
>
> return (sw_owned && valid_op) ? -1 : 0;
>
>This also would properly describe in code the required condition:
>CQE can be parsed by SW if and only if owner bit is "SW owned" and CQE
>opcode is valid.
ACK! Will update this in v2.
>
>> +
>> + /* ownership bit is not valid for invalid opcode; CQE is HW owned */
>> + return -(match & valid_op);
>> +}
>> +
>> int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>> {
>> struct mlx5_rxq_data *rxq = rx_queue;
>> @@ -312,12 +326,13 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>> pmc->addr = &cqe->validity_iteration_count;
>> pmc->opaque[CLB_VAL_IDX] = vic;
>> pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_VIC_INIT;
>> + pmc->fn = mlx5_monitor_callback;
>
>Alex, Slava: Just to double check - in case of enhanced CQE compression
>layout, should both CQE opcode and vic be checked?
>Right now only vic is checked in power monitor callback for that case.
>In Rx datapath both are checked to determine CQE ownership:
>https://github.com/DPDK/dpdk/blob/main/drivers/common/mlx5/mlx5_common.h#L277
>
>> } else {
>> pmc->addr = &cqe->op_own;
>> pmc->opaque[CLB_VAL_IDX] = !!idx;
>> pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
>> + pmc->fn = mlx5_monitor_cqe_own_callback;
>> }
>> - pmc->fn = mlx5_monitor_callback;
>> pmc->size = sizeof(uint8_t);
>> return 0;
>> }
>> --
>> 2.43.0
>>
>
>Best regards,
>Dariusz Sosnowski
Thanks & Regards,
Sivaprasad
[-- Attachment #2: Type: text/html, Size: 9093 bytes --]
^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH v2] net/mlx5: fix spurious CPU wakeups caused by invalid CQE
2025-10-15 13:39 [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE Sivaprasad Tummala
2025-10-28 14:53 ` Thomas Monjalon
2025-11-07 17:59 ` Dariusz Sosnowski
@ 2025-11-11 3:40 ` Sivaprasad Tummala
2 siblings, 0 replies; 5+ messages in thread
From: Sivaprasad Tummala @ 2025-11-11 3:40 UTC (permalink / raw)
To: Dariusz Sosnowski, Viacheslav Ovsiienko, Bing Zhao, Ori Kam,
Suanming Mou, Matan Azrad, Alexander Kozyrev
Cc: dev, stable
Previously, the PMD used a common monitor callback to determine
CQE ownership for power-aware polling. However, when a CQE contained
an invalid opcode(MLX5_CQE_INVALID), ownership bit was not reliable.
As a result, the monitor condition could falsely indicate CQE
availability and cause the CPU to wake up unnecessarily during
low traffic periods.
This resulted in spurious wakeups in monitor-wait mode and reduced
the expected power savings, as cores exited the sleep state even
when no valid CQEs were available.
This patch introduces a dedicated callback that skips invalid CQEs
and optimizes power efficiency by preventing false wakeups caused
by hardware-owned or invalid entries.
Fixes: a8f0df6bf98d ("net/mlx5: support power monitoring")
Cc: akozyrev@nvidia.com
Cc: stable@dpdk.org
Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
v2:
- Updated CQE opcode check logic — replaced XOR with comparison
- Renamed variable match to sw_owned for clarity
- Updated CQE ownership check order
---
drivers/net/mlx5/mlx5_rx.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 420a03068d..ac663a978e 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -295,6 +295,20 @@ mlx5_monitor_callback(const uint64_t value,
return (value & m) == v ? -1 : 0;
}
+static int
+mlx5_monitor_cqe_own_callback(const uint64_t value,
+ const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+ const uint64_t m = opaque[CLB_MSK_IDX];
+ const uint64_t v = opaque[CLB_VAL_IDX];
+ const uint64_t sw_owned = ((value & m) == v);
+ const uint64_t opcode = MLX5_CQE_OPCODE(value);
+ const uint64_t valid_op = (opcode != MLX5_CQE_INVALID);
+
+ /* ownership bit is not valid for invalid opcode; CQE is HW owned */
+ return -(valid_op & sw_owned);
+}
+
int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
{
struct mlx5_rxq_data *rxq = rx_queue;
@@ -312,12 +326,13 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
pmc->addr = &cqe->validity_iteration_count;
pmc->opaque[CLB_VAL_IDX] = vic;
pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_VIC_INIT;
+ pmc->fn = mlx5_monitor_callback;
} else {
pmc->addr = &cqe->op_own;
pmc->opaque[CLB_VAL_IDX] = !!idx;
pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+ pmc->fn = mlx5_monitor_cqe_own_callback;
}
- pmc->fn = mlx5_monitor_callback;
pmc->size = sizeof(uint8_t);
return 0;
}
--
2.43.0
^ permalink raw reply [flat|nested] 5+ messages in thread