From: "Tummala, Sivaprasad" <Sivaprasad.Tummala@amd.com>
To: Dariusz Sosnowski <dsosnowski@nvidia.com>,
Alexander Kozyrev <akozyrev@nvidia.com>,
Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Cc: "jerinj@marvell.com" <jerinj@marvell.com>,
"kirankumark@marvell.com" <kirankumark@marvell.com>,
"ndabilpuram@marvell.com" <ndabilpuram@marvell.com>,
"yanzhirun_163@163.com" <yanzhirun_163@163.com>,
"david.marchand@redhat.com" <david.marchand@redhat.com>,
"ktraynor@redhat.com" <ktraynor@redhat.com>,
"thomas@monjalon.net" <thomas@monjalon.net>,
"konstantin.ananyev@huawei.com" <konstantin.ananyev@huawei.com>,
"konstantin.v.ananyev@yandex.ru" <konstantin.v.ananyev@yandex.ru>,
"bruce.richardson@intel.com" <bruce.richardson@intel.com>,
"maxime.coquelin@redhat.com" <maxime.coquelin@redhat.com>,
"anatoly.burakov@intel.com" <anatoly.burakov@intel.com>,
"aconole@redhat.com" <aconole@redhat.com>,
"dev@dpdk.org" <dev@dpdk.org>,
"stable@dpdk.org" <stable@dpdk.org>
Subject: Re: [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE
Date: Mon, 10 Nov 2025 09:29:13 +0000 [thread overview]
Message-ID: <CH3PR12MB8233444042288DA4196F222A86CEA@CH3PR12MB8233.namprd12.prod.outlook.com> (raw)
In-Reply-To: <20251107175936.4yr2quzngd5gnp2l@ds-vm-debian.local>
[-- Attachment #1: Type: text/plain, Size: 5375 bytes --]
[AMD Official Use Only - AMD Internal Distribution Only]
Hi Dariusz,
________________________________
From: Dariusz Sosnowski <dsosnowski@nvidia.com>
Sent: Friday, November 07, 2025 11:29 PM
To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>; Alexander Kozyrev <akozyrev@nvidia.com>; Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Cc: jerinj@marvell.com <jerinj@marvell.com>; kirankumark@marvell.com <kirankumark@marvell.com>; ndabilpuram@marvell.com <ndabilpuram@marvell.com>; yanzhirun_163@163.com <yanzhirun_163@163.com>; david.marchand@redhat.com <david.marchand@redhat.com>; ktraynor@redhat.com <ktraynor@redhat.com>; thomas@monjalon.net <thomas@monjalon.net>; konstantin.ananyev@huawei.com <konstantin.ananyev@huawei.com>; konstantin.v.ananyev@yandex.ru <konstantin.v.ananyev@yandex.ru>; bruce.richardson@intel.com <bruce.richardson@intel.com>; maxime.coquelin@redhat.com <maxime.coquelin@redhat.com>; anatoly.burakov@intel.com <anatoly.burakov@intel.com>; aconole@redhat.com <aconole@redhat.com>; dev@dpdk.org <dev@dpdk.org>; stable@dpdk.org <stable@dpdk.org>
Subject: Re: [PATCH] net/mlx5: fix spurious CPU wakeups caused by invalid CQE
Caution: This message originated from an External Source. Use proper caution when opening attachments, clicking links, or responding.
> Hi,
>
> Thank you for the contribution. Please see comments below.
>
> On Wed, Oct 15, 2025 at 01:39:57PM +0000, Sivaprasad Tummala wrote:
>> Previously, the PMD used a common monitor callback to determine
>> CQE ownership for power-aware polling. However, when a CQE contained
>> an invalid opcode(MLX5_CQE_INVALID), ownership bit was not reliable.
>> As a result, the monitor condition could falsely indicate CQE
>> availability and cause the CPU to wake up unnecessarily during
>> low traffic periods.
>>
>> This resulted in spurious wakeups in monitor-wait mode and reduced
>> the expected power savings, as cores exited the sleep state even
>> when no valid CQEs were available.
>>
>> This patch introduces a dedicated callback that skips invalid CQEs
>> and optimizes power efficiency by preventing false wakeups caused
>> by hardware-owned or invalid entries.
>>
>> Fixes: a8f0df6bf98d ("net/mlx5: support power monitoring")
>> Cc: akozyrev@nvidia.com
>> Cc: stable@dpdk.org
>>
>> Signed-off-by: Sivaprasad Tummala <sivaprasad.tummala@amd.com>
>> ---
>> drivers/net/mlx5/mlx5_rx.c | 17 ++++++++++++++++-
>> 1 file changed, 16 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/net/mlx5/mlx5_rx.c b/drivers/net/mlx5/mlx5_rx.c
>> index 420a03068d..2765b4b730 100644
>> --- a/drivers/net/mlx5/mlx5_rx.c
>> +++ b/drivers/net/mlx5/mlx5_rx.c
>> @@ -295,6 +295,20 @@ mlx5_monitor_callback(const uint64_t value,
>> return (value & m) == v ? -1 : 0;
>> }
>>
>> +static int
>> +mlx5_monitor_cqe_own_callback(const uint64_t value,
>> + const uint64_t opaque[RTE_POWER_MONITOR_OPAQUE_SZ])
>> +{
>> + const uint64_t m = opaque[CLB_MSK_IDX];
>> + const uint64_t v = opaque[CLB_VAL_IDX];
>> + const uint64_t match = ((value & m) == v);
>
> Could you please rename "match" variable to "sw_owned"?
> This name would better relay the meaning of the checked condition that
> CQE owner bit value signifies that CQE is SW owned.
ACK! Will update this in v2.
>
>> + const uint64_t opcode = MLX5_CQE_OPCODE(value);
>> + const uint64_t valid_op = (opcode ^ MLX5_CQE_INVALID);
>
>IMO the usage of bit operations here (although logic is correct) is a bit confusing.
>Could you rewrite it in terms of logical operations so it's easier to
>follow? For example like this:
>
> const uint64_t valid_op = opcode != MLX5_CQE_INVALID
>
> return (sw_owned && valid_op) ? -1 : 0;
>
>This also would properly describe in code the required condition:
>CQE can be parsed by SW if and only if owner bit is "SW owned" and CQE
>opcode is valid.
ACK! Will update this in v2.
>
>> +
>> + /* ownership bit is not valid for invalid opcode; CQE is HW owned */
>> + return -(match & valid_op);
>> +}
>> +
>> int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>> {
>> struct mlx5_rxq_data *rxq = rx_queue;
>> @@ -312,12 +326,13 @@ int mlx5_get_monitor_addr(void *rx_queue, struct rte_power_monitor_cond *pmc)
>> pmc->addr = &cqe->validity_iteration_count;
>> pmc->opaque[CLB_VAL_IDX] = vic;
>> pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_VIC_INIT;
>> + pmc->fn = mlx5_monitor_callback;
>
>Alex, Slava: Just to double check - in case of enhanced CQE compression
>layout, should both CQE opcode and vic be checked?
>Right now only vic is checked in power monitor callback for that case.
>In Rx datapath both are checked to determine CQE ownership:
>https://github.com/DPDK/dpdk/blob/main/drivers/common/mlx5/mlx5_common.h#L277
>
>> } else {
>> pmc->addr = &cqe->op_own;
>> pmc->opaque[CLB_VAL_IDX] = !!idx;
>> pmc->opaque[CLB_MSK_IDX] = MLX5_CQE_OWNER_MASK;
>> + pmc->fn = mlx5_monitor_cqe_own_callback;
>> }
>> - pmc->fn = mlx5_monitor_callback;
>> pmc->size = sizeof(uint8_t);
>> return 0;
>> }
>> --
>> 2.43.0
>>
>
>Best regards,
>Dariusz Sosnowski
Thanks & Regards,
Sivaprasad
[-- Attachment #2: Type: text/html, Size: 9093 bytes --]
next prev parent reply other threads:[~2025-11-10 9:29 UTC|newest]
Thread overview: 5+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-10-15 13:39 Sivaprasad Tummala
2025-10-28 14:53 ` Thomas Monjalon
2025-11-07 17:59 ` Dariusz Sosnowski
2025-11-10 9:29 ` Tummala, Sivaprasad [this message]
2025-11-11 3:40 ` [PATCH v2] " Sivaprasad Tummala
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CH3PR12MB8233444042288DA4196F222A86CEA@CH3PR12MB8233.namprd12.prod.outlook.com \
--to=sivaprasad.tummala@amd.com \
--cc=aconole@redhat.com \
--cc=akozyrev@nvidia.com \
--cc=anatoly.burakov@intel.com \
--cc=bruce.richardson@intel.com \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=dsosnowski@nvidia.com \
--cc=jerinj@marvell.com \
--cc=kirankumark@marvell.com \
--cc=konstantin.ananyev@huawei.com \
--cc=konstantin.v.ananyev@yandex.ru \
--cc=ktraynor@redhat.com \
--cc=maxime.coquelin@redhat.com \
--cc=ndabilpuram@marvell.com \
--cc=stable@dpdk.org \
--cc=thomas@monjalon.net \
--cc=viacheslavo@nvidia.com \
--cc=yanzhirun_163@163.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).