From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 3165E45B40;
	Tue, 15 Oct 2024 09:47:31 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id 09217402E3;
	Tue, 15 Oct 2024 09:47:31 +0200 (CEST)
Received: from szxga01-in.huawei.com (szxga01-in.huawei.com [45.249.212.187])
 by mails.dpdk.org (Postfix) with ESMTP id 6D883400EF
 for <dev@dpdk.org>; Tue, 15 Oct 2024 09:47:29 +0200 (CEST)
Received: from mail.maildlp.com (unknown [172.19.162.254])
 by szxga01-in.huawei.com (SkyGuard) with ESMTP id 4XSR3x1BjGzyTBK;
 Tue, 15 Oct 2024 15:46:05 +0800 (CST)
Received: from kwepemm600004.china.huawei.com (unknown [7.193.23.242])
 by mail.maildlp.com (Postfix) with ESMTPS id CEDDF18010F;
 Tue, 15 Oct 2024 15:47:27 +0800 (CST)
Received: from [10.67.121.59] (10.67.121.59) by kwepemm600004.china.huawei.com
 (7.193.23.242) with Microsoft SMTP Server (version=TLS1_2,
 cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Tue, 15 Oct
 2024 15:47:27 +0800
Message-ID: <7309a620-a463-5bfb-0b8e-ed95c4290d52@huawei.com>
Date: Tue, 15 Oct 2024 15:47:22 +0800
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101
 Thunderbird/91.2.0
Subject: Re: [PATCH v10 1/2] power: introduce PM QoS API on CPU wide
To: Konstantin Ananyev <konstantin.ananyev@huawei.com>, "dev@dpdk.org"
 <dev@dpdk.org>
CC: "mb@smartsharesystems.com" <mb@smartsharesystems.com>,
 "thomas@monjalon.net" <thomas@monjalon.net>, "ferruh.yigit@amd.com"
 <ferruh.yigit@amd.com>, "anatoly.burakov@intel.com"
 <anatoly.burakov@intel.com>, "david.hunt@intel.com" <david.hunt@intel.com>,
 "sivaprasad.tummala@amd.com" <sivaprasad.tummala@amd.com>,
 "stephen@networkplumber.org" <stephen@networkplumber.org>,
 "david.marchand@redhat.com" <david.marchand@redhat.com>, Fengchengwen
 <fengchengwen@huawei.com>, liuyonglong <liuyonglong@huawei.com>
References: <20240320105529.5626-1-lihuisong@huawei.com>
 <20240912023812.30885-1-lihuisong@huawei.com>
 <20240912023812.30885-2-lihuisong@huawei.com>
 <773b9cf3df354a168e42aecb627b0b2c@huawei.com>
From: "lihuisong (C)" <lihuisong@huawei.com>
In-Reply-To: <773b9cf3df354a168e42aecb627b0b2c@huawei.com>
Content-Type: text/plain; charset="UTF-8"; format=flowed
Content-Transfer-Encoding: 8bit
X-Originating-IP: [10.67.121.59]
X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To
 kwepemm600004.china.huawei.com (7.193.23.242)
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org

Hi Konstantin Ananyev,

Thanks for your review.

On 2024/10/14 16:29, Konstantin Ananyev wrote:
>> The deeper the idle state, the lower the power consumption, but the longer
>> the resume time. Some services are delay sensitive and expect a very low
>> resume time, such as interrupt packet receiving mode.
>>
>> The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
>> interface lets userspace set and get the resume latency limit on cpuX.
>> Each cpuidle governor in Linux selects which idle state to enter based on
>> this CPU resume latency in its idle task.
>>
>> The per-CPU PM QoS API can be used to control this CPU's idle state
>> selection. Setting a strict resume latency (zero value) restricts the CPU
>> to the shallowest idle state, which lowers the delay after sleep.
>>
>> Signed-off-by: Huisong Li <lihuisong@huawei.com>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> ---
>>   doc/guides/prog_guide/power_man.rst    |  24 ++++++
>>   doc/guides/rel_notes/release_24_11.rst |   5 ++
>>   lib/power/meson.build                  |   2 +
>>   lib/power/rte_power_qos.c              | 111 +++++++++++++++++++++++++
>>   lib/power/rte_power_qos.h              |  73 ++++++++++++++++
>>   lib/power/version.map                  |   4 +
>>   6 files changed, 219 insertions(+)
>>   create mode 100644 lib/power/rte_power_qos.c
>>   create mode 100644 lib/power/rte_power_qos.h
>>
>> diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
>> index f6674efe2d..faa32b4320 100644
>> --- a/doc/guides/prog_guide/power_man.rst
>> +++ b/doc/guides/prog_guide/power_man.rst
>> @@ -249,6 +249,30 @@ Get Num Pkgs
>>   Get Num Dies
>>     Get the number of die's on a given package.
>>
>> +
>> +PM QoS
>> +------
>> +
>> +The deeper the idle state, the lower the power consumption, but the longer
>> +the resume time. Some services are delay sensitive and expect a very low
>> +resume time, such as interrupt packet receiving mode.
>> +
>> +The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
>> +interface lets userspace set and get the resume latency limit on cpuX.
>> +Each cpuidle governor in Linux selects which idle state to enter based on
>> +this CPU resume latency in its idle task.
>> +
>> +The per-CPU PM QoS API can be used to set and get the CPU resume latency based
>> +on this sysfs interface.
>> +
>> +The ``rte_power_qos_set_cpu_resume_latency()`` function can control the CPU's
>> +idle state selection in Linux. Setting a strict resume latency (zero value)
>> +restricts the CPU to the shallowest idle state, which lowers the delay of
>> +resuming service after sleeping.
>> +
>> +The ``rte_power_qos_get_cpu_resume_latency()`` function gets the resume
>> +latency on the specified CPU.
>> +
>>   References
>>   ----------
>>
>> diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
>> index 0ff70d9057..bd72d0a595 100644
>> --- a/doc/guides/rel_notes/release_24_11.rst
>> +++ b/doc/guides/rel_notes/release_24_11.rst
>> @@ -55,6 +55,11 @@ New Features
>>        Also, make sure to start the actual text at the margin.
>>        =======================================================
>>
>> +* **Introduce per-CPU PM QoS interface.**
>> +
>> +  * Add per-CPU PM QoS interface to lower the delay after sleep by controlling
>> +    CPU idle state selection.
>> +
>>
>>   Removed Items
>>   -------------
>> diff --git a/lib/power/meson.build b/lib/power/meson.build
>> index b8426589b2..8222e178b0 100644
>> --- a/lib/power/meson.build
>> +++ b/lib/power/meson.build
>> @@ -23,12 +23,14 @@ sources = files(
>>           'rte_power.c',
>>           'rte_power_uncore.c',
>>           'rte_power_pmd_mgmt.c',
>> +        'rte_power_qos.c',
>>   )
>>   headers = files(
>>           'rte_power.h',
>>           'rte_power_guest_channel.h',
>>           'rte_power_pmd_mgmt.h',
>>           'rte_power_uncore.h',
>> +        'rte_power_qos.h',
>>   )
>>   if cc.has_argument('-Wno-cast-qual')
>>       cflags += '-Wno-cast-qual'
>> diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
>> new file mode 100644
>> index 0000000000..8eb26cd41a
>> --- /dev/null
>> +++ b/lib/power/rte_power_qos.c
>> @@ -0,0 +1,111 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2024 HiSilicon Limited
>> + */
>> +
>> +#include <errno.h>
>> +#include <stdlib.h>
>> +#include <string.h>
>> +
>> +#include <rte_lcore.h>
>> +#include <rte_log.h>
>> +
>> +#include "power_common.h"
>> +#include "rte_power_qos.h"
>> +
>> +#define PM_QOS_SYSFILE_RESUME_LATENCY_US	\
>> +	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
>> +
>> +#define PM_QOS_CPU_RESUME_LATENCY_BUF_LEN	32
>> +
>> +int
>> +rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
>> +{
>> +	char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
>> +	FILE *f;
>> +	int ret;
>> +
>> +	if (!rte_lcore_is_enabled(lcore_id)) {
>> +		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (latency < 0) {
>> +		POWER_LOG(ERR, "latency should be greater than or equal to 0");
>> +		return -EINVAL;
>> +	}
>> +
>> +	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
> That was already brought up by Morten:
> lcore_id is not always equal to cpu_core_id (cpu affinity).
Yes, Morten also mentioned it.
I tried to answer him there; please see our previous discussion, thanks.
I think it's OK😁
> Looking through power library it is not specific to that particular patch,
> but sort of common limitation (bug?) in rte_power lib.
Yes, it is very common in the power lib.
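
To make the lcore/CPU distinction concrete, here is a minimal standalone
sketch (not the code from this patch) that writes a resume latency limit to
the per-CPU sysfs file. The helper names qos_build_path() and
qos_write_latency() and the sysfs_root parameter are hypothetical, introduced
only for illustration. The CPU id is passed explicitly, so the caller has to
decide how an lcore maps to a CPU. The sketch writes the decimal value as-is
and does not handle any special values.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper: build the path of the per-CPU resume latency file.
 * sysfs_root is normally "/sys/devices/system/cpu"; it is a parameter only
 * so the sketch can be exercised without touching the real sysfs.
 */
static int
qos_build_path(char *buf, size_t len, const char *sysfs_root,
	       unsigned int cpu_id)
{
	int n = snprintf(buf, len, "%s/cpu%u/power/pm_qos_resume_latency_us",
			 sysfs_root, cpu_id);

	/* snprintf() reports truncation by returning >= len. */
	if (n < 0 || (size_t)n >= len)
		return -ENAMETOOLONG;
	return 0;
}

/*
 * Hypothetical helper: write a latency limit (in microseconds) for the
 * given CPU id. Note that cpu_id is a physical CPU number, which is not
 * necessarily the same as a DPDK lcore id.
 */
static int
qos_write_latency(const char *sysfs_root, unsigned int cpu_id, int latency)
{
	char path[256];
	FILE *f;
	int ret;

	if (latency < 0)
		return -EINVAL;

	ret = qos_build_path(path, sizeof(path), sysfs_root, cpu_id);
	if (ret != 0)
		return ret;

	f = fopen(path, "w");
	if (f == NULL)
		return -errno;

	ret = fprintf(f, "%d", latency) < 0 ? -EIO : 0;
	/* Buffered write errors may only show up when the file is closed. */
	if (fclose(f) != 0 && ret == 0)
		ret = -EIO;
	return ret;
}
```

With such a helper, the caller would first translate the lcore to a CPU id
and then pass that CPU id, rather than formatting the lcore id directly into
the path.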
>   
>
>> +	if (ret != 0) {
>> +		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US, lcore_id);
>> +		return ret;
>> +	}
>> +
<...>