patches for DPDK stable branches
 help / color / mirror / Atom feed
* [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE
@ 2025-10-15 13:39 Sivaprasad Tummala
  2025-10-28 14:53 ` Thomas Monjalon
  2025-11-07 17:59 ` Dariusz Sosnowski
  0 siblings, 2 replies; 3+ messages in thread
From: Sivaprasad Tummala @ 2025-10-15 13:39 UTC (permalink / raw)
  To: jerinj, kirankumark, ndabilpuram, yanzhirun_163, david.marchand,
	ktraynor, thomas, konstantin.ananyev, konstantin.v.ananyev,
	bruce.richardson, maxime.coquelin, anatoly.burakov, aconole
  Cc: dev, akozyrev, stable

Previously, the PMD used a common monitor callback to determine
CQE ownership for power-aware polling. However, when a CQE contained
an invalid opcode(MLX5_CQE_INVALID), ownership bit was not reliable.
As a result, the monitor condition could falsely indicate CQE
availability and cause the CPU to wake up unnecessarily during
low traffic periods.

This resulted in spurious wakeups in monitor-wait mode and reduced
the expected power savings, as cores exited the sleep state even
when no valid CQEs were available.

This patch introduces a dedicated callback that skips invalid CQEs
and optimizes power efficiency by preventing false wakeups caused
by hardware-owned or invalid entries.

Fixes: a8f0df6bf98d ("net/mlx5: support power monitoring")
Cc: akozyrev@nvidia.com
Cc: stable@dpdk.org

Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
---
 drivers/net/mlx5/mlx5_rx.c | 17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
index 420a03068d..2765b4b730 100644
--- a/drivers/net/mlx5/mlx5_rx.c
+++ b/drivers/net/mlx5/mlx5_rx.c
@@ -295,6 +295,20 @@ mlx5_monitor_callback(const uint64_t value,
 	return (value & m) == v ? -1 : 0;
 }
 
+static int
+mlx5_monitor_cqe_own_callback(const uint64_t value,
+		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
+{
+	const uint64_t m = opaque[CLB_MSK_IDX];
+	const uint64_t v = opaque[CLB_VAL_IDX];
+	const uint64_t match = ((value & m) == v);
+	const uint64_t opcode = MLX5_CQE_OPCODE(value);
+	const uint64_t valid_op = (opcode ^ MLX5_CQE_INVALID);
+
+	/* ownership bit is not valid for invalid opcode; CQE is HW owned */
+	return -(match & valid_op);
+}
+
 int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 {
 	struct mlx5_rxq_data *rxq = rx_queue;
@@ -312,12 +326,13 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
 		pmc->addr = &cqe->validity_iteration_count;
 		pmc->opaque[CLB_VAL_IDX] = vic;
 		pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_VIC_INIT;
+		pmc->fn = mlx5_monitor_callback;
 	} else {
 		pmc->addr = &cqe->op_own;
 		pmc->opaque[CLB_VAL_IDX] = !!idx;
 		pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
+		pmc->fn = mlx5_monitor_cqe_own_callback;
 	}
-	pmc->fn = mlx5_monitor_callback;
 	pmc->size = sizeof(uint8_t);
 	return 0;
 }
-- 
2.43.0


^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE
  2025-10-15 13:39 [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE Sivaprasad Tummala
@ 2025-10-28 14:53 ` Thomas Monjalon
  2025-11-07 17:59 ` Dariusz Sosnowski
  1 sibling, 0 replies; 3+ messages in thread
From: Thomas Monjalon @ 2025-10-28 14:53 UTC (permalink / raw)
  To: Sivaprasad Tummala
  Cc: jerinj, kirankumark, ndabilpuram, yanzhirun_163, david.marchand,
	ktraynor, konstantin.ananyev, konstantin.v.ananyev,
	bruce.richardson, maxime.coquelin, anatoly.burakov, aconole, dev,
	akozyrev, stable, Dariusz Sosnowski, Viacheslav Ovsiienko,
	Bing Zhao, Ori Kam, Suanming Mou, Matan Azrad, Maayan Kashani

Cc mlx5 maintainers

15/10/2025 15:39, Sivaprasad Tummala:
> Previously, the PMD used a common monitor callback to determine
> CQE ownership for power-aware polling. However, when a CQE contained
> an invalid opcode(MLX5_CQE_INVALID), ownership bit was not reliable.
> As a result, the monitor condition could falsely indicate CQE
> availability and cause the CPU to wake up unnecessarily during
> low traffic periods.
> 
> This resulted in spurious wakeups in monitor-wait mode and reduced
> the expected power savings, as cores exited the sleep state even
> when no valid CQEs were available.
> 
> This patch introduces a dedicated callback that skips invalid CQEs
> and optimizes power efficiency by preventing false wakeups caused
> by hardware-owned or invalid entries.
> 
> Fixes: a8f0df6bf98d ("net/mlx5: support power monitoring")
> Cc: akozyrev@nvidia.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
>  drivers/net/mlx5/mlx5_rx.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index 420a03068d..2765b4b730 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -295,6 +295,20 @@ mlx5_monitor_callback(const uint64_t value,
>  	return (value & m) == v ? -1 : 0;
>  }
>  
> +static int
> +mlx5_monitor_cqe_own_callback(const uint64_t value,
> +		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
> +{
> +	const uint64_t m = opaque[CLB_MSK_IDX];
> +	const uint64_t v = opaque[CLB_VAL_IDX];
> +	const uint64_t match = ((value & m) == v);
> +	const uint64_t opcode = MLX5_CQE_OPCODE(value);
> +	const uint64_t valid_op = (opcode ^ MLX5_CQE_INVALID);
> +
> +	/* ownership bit is not valid for invalid opcode; CQE is HW owned */
> +	return -(match & valid_op);
> +}
> +
>  int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  {
>  	struct mlx5_rxq_data *rxq = rx_queue;
> @@ -312,12 +326,13 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  		pmc->addr = &cqe->validity_iteration_count;
>  		pmc->opaque[CLB_VAL_IDX] = vic;
>  		pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_VIC_INIT;
> +		pmc->fn = mlx5_monitor_callback;
>  	} else {
>  		pmc->addr = &cqe->op_own;
>  		pmc->opaque[CLB_VAL_IDX] = !!idx;
>  		pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
> +		pmc->fn = mlx5_monitor_cqe_own_callback;
>  	}
> -	pmc->fn = mlx5_monitor_callback;
>  	pmc->size = sizeof(uint8_t);
>  	return 0;
>  }




^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE
  2025-10-15 13:39 [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE Sivaprasad Tummala
  2025-10-28 14:53 ` Thomas Monjalon
@ 2025-11-07 17:59 ` Dariusz Sosnowski
  1 sibling, 0 replies; 3+ messages in thread
From: Dariusz Sosnowski @ 2025-11-07 17:59 UTC (permalink / raw)
  To: Sivaprasad Tummala, Alexander Kozyrev, Viacheslav Ovsiienko
  Cc: jerinj, kirankumark, ndabilpuram, yanzhirun_163, david.marchand,
	ktraynor, thomas, konstantin.ananyev, konstantin.v.ananyev,
	bruce.richardson, maxime.coquelin, anatoly.burakov, aconole, dev,
	stable

Hi,

Thank you for the contribution. Please see comments below.

On Wed, Oct 15, 2025 at 01:39:57PM +0000, Sivaprasad Tummala wrote:
> Previously, the PMD used a common monitor callback to determine
> CQE ownership for power-aware polling. However, when a CQE contained
> an invalid opcode(MLX5_CQE_INVALID), ownership bit was not reliable.
> As a result, the monitor condition could falsely indicate CQE
> availability and cause the CPU to wake up unnecessarily during
> low traffic periods.
> 
> This resulted in spurious wakeups in monitor-wait mode and reduced
> the expected power savings, as cores exited the sleep state even
> when no valid CQEs were available.
> 
> This patch introduces a dedicated callback that skips invalid CQEs
> and optimizes power efficiency by preventing false wakeups caused
> by hardware-owned or invalid entries.
> 
> Fixes: a8f0df6bf98d ("net/mlx5: support power monitoring")
> Cc: akozyrev@nvidia.com
> Cc: stable@dpdk.org
> 
> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
> ---
>  drivers/net/mlx5/mlx5_rx.c | 17 ++++++++++++++++-
>  1 file changed, 16 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
> index 420a03068d..2765b4b730 100644
> --- a/drivers/net/mlx5/mlx5_rx.c
> +++ b/drivers/net/mlx5/mlx5_rx.c
> @@ -295,6 +295,20 @@ mlx5_monitor_callback(const uint64_t value,
>  	return (value & m) == v ? -1 : 0;
>  }
>  
> +static int
> +mlx5_monitor_cqe_own_callback(const uint64_t value,
> +		const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
> +{
> +	const uint64_t m = opaque[CLB_MSK_IDX];
> +	const uint64_t v = opaque[CLB_VAL_IDX];
> +	const uint64_t match = ((value & m) == v);

Could you please rename "match" variable to "sw_owned"?
This name would better relay the meaning of the checked condition that
CQE owner bit value signifies that CQE is SW owned.

> +	const uint64_t opcode = MLX5_CQE_OPCODE(value);
> +	const uint64_t valid_op = (opcode ^ MLX5_CQE_INVALID);

IMO the usage of bit operations here (although logic is correct) is a bit confusing.
Could you rewrite it in terms of logical operations so it's easier to
follow? For example like this:

	const uint64_t valid_op = opcode != MLX5_CQE_INVALID

	return (sw_owned && valid_op) ? -1 : 0;

This also would properly describe in code the required condition:
CQE can be parsed by SW if and only if owner bit is "SW owned" and CQE
opcode is valid.

> +
> +	/* ownership bit is not valid for invalid opcode; CQE is HW owned */
> +	return -(match & valid_op);
> +}
> +
>  int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  {
>  	struct mlx5_rxq_data *rxq = rx_queue;
> @@ -312,12 +326,13 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>  		pmc->addr = &cqe->validity_iteration_count;
>  		pmc->opaque[CLB_VAL_IDX] = vic;
>  		pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_VIC_INIT;
> +		pmc->fn = mlx5_monitor_callback;

Alex, Slava: Just to double check - in case of enhanced CQE compression
layout, should both CQE opcode and vic be checked?
Right now only vic is checked in power monitor callback for that case.
In Rx datapath both are checked to determine CQE ownership:
https://github.com/DPDK/dpdk/blob/main/drivers/common/mlx5/mlx5_common.h#L277

>  	} else {
>  		pmc->addr = &cqe->op_own;
>  		pmc->opaque[CLB_VAL_IDX] = !!idx;
>  		pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
> +		pmc->fn = mlx5_monitor_cqe_own_callback;
>  	}
> -	pmc->fn = mlx5_monitor_callback;
>  	pmc->size = sizeof(uint8_t);
>  	return 0;
>  }
> -- 
> 2.43.0
> 

Best regards,
Dariusz Sosnowski

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2025-11-07 17:59 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-10-15 13:39 [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE Sivaprasad Tummala
2025-10-28 14:53 ` Thomas Monjalon
2025-11-07 17:59 ` Dariusz Sosnowski

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).