From: "lihuisong (C)"
Date: Tue, 22 Oct 2024 17:41:21 +0800
Subject: Re: [PATCH v11 1/2] power: introduce PM QoS API on CPU wide
To: Konstantin Ananyev, "dev@dpdk.org"
CC: "mb@smartsharesystems.com", "thomas@monjalon.net",
 "ferruh.yigit@amd.com", "anatoly.burakov@intel.com",
 "david.hunt@intel.com", "sivaprasad.tummala@amd.com",
 "stephen@networkplumber.org", "david.marchand@redhat.com",
 Fengchengwen, liuyonglong
Message-ID: <4baf28f3-ac47-e74e-09a1-fbd7665a1cd0@huawei.com>
In-Reply-To: <3c3297d000ea40cc8813b156173e1ff4@huawei.com>
References: <20240320105529.5626-1-lihuisong@huawei.com>
 <20241021114253.31216-1-lihuisong@huawei.com>
 <20241021114253.31216-2-lihuisong@huawei.com>
 <3c3297d000ea40cc8813b156173e1ff4@huawei.com>
List-Id: DPDK patches and discussions

On 2024/10/22 17:08, Konstantin Ananyev wrote:
>
>> The deeper the idle state, the lower the power consumption, but the longer
>> the resume time. Some service are delay sensitive and very except the low
>> resume time, like interrupt packet receiving mode.
>>
>> And the "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
>> interface is used to set and get the resume latency limit on the cpuX for
>> userspace. Each cpuidle governor in Linux select which idle state to enter
>> based on this CPU resume latency in their idle task.
>>
>> The per-CPU PM QoS API can be used to control this CPU's idle state
>> selection and limit just enter the shallowest idle state to low the delay
>> when wake up from by setting strict resume latency (zero value).
>>
>> Signed-off-by: Huisong Li
>> Acked-by: Morten Brørup
> LGTM overall, few nits, see below.
> Acked-by: Konstantin Ananyev
>
>> ---
>>  doc/guides/prog_guide/power_man.rst    |  19 ++++
>>  doc/guides/rel_notes/release_24_11.rst |   5 +
>>  lib/power/meson.build                  |   2 +
>>  lib/power/rte_power_qos.c              | 123 +++++++++++++++++++++++++
>>  lib/power/rte_power_qos.h              |  73 +++++++++++++++
>>  lib/power/version.map                  |   4 +
>>  6 files changed, 226 insertions(+)
>>  create mode 100644 lib/power/rte_power_qos.c
>>  create mode 100644 lib/power/rte_power_qos.h
>>
>> diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
>> index f6674efe2d..91358b04f3 100644
>> --- a/doc/guides/prog_guide/power_man.rst
>> +++ b/doc/guides/prog_guide/power_man.rst
>> @@ -107,6 +107,25 @@ User Cases
>>  The power management mechanism is used to save power when performing L3 forwarding.
>>
>>
>> +PM QoS
>> +------
>> +
>> +The "/sys/devices/system/cpu/cpuX/power/pm_qos_resume_latency_us" sysfs
>> +interface is used to set and get the resume latency limit on the cpuX for
>> +userspace. Each cpuidle governor in Linux select which idle state to enter
>> +based on this CPU resume latency in their idle task.
>> +
>> +The deeper the idle state, the lower the power consumption, but the longer
>> +the resume time. Some service are latency sensitive and very except the low
>> +resume time, like interrupt packet receiving mode.
>> +
>> +Applications can set and get the CPU resume latency by the
>> +``rte_power_qos_set_cpu_resume_latency()`` and ``rte_power_qos_get_cpu_resume_latency()``
>> +respectively. Applications can set a strict resume latency (zero value) by
>> +the ``rte_power_qos_set_cpu_resume_latency()`` to low the resume latency and
>> +get better performance (instead, the power consumption of platform may increase).
>> +
>> +
>>  Ethernet PMD Power Management API
>>  ---------------------------------
>>
>> diff --git a/doc/guides/rel_notes/release_24_11.rst b/doc/guides/rel_notes/release_24_11.rst
>> index fa4822d928..d9e268274b 100644
>> --- a/doc/guides/rel_notes/release_24_11.rst
>> +++ b/doc/guides/rel_notes/release_24_11.rst
>> @@ -237,6 +237,11 @@ New Features
>>    This field is used to pass an extra configuration settings such as ability
>>    to lookup IPv4 addresses in network byte order.
>>
>> +* **Introduce per-CPU PM QoS interface.**
>> +
>> +  * Add per-CPU PM QoS interface to low the resume latency when wake up from
>> +    idle state.
>> +
>>  * **Added new API to register telemetry endpoint callbacks with private arguments.**
>>
>>    A new ``rte_telemetry_register_cmd_arg`` function is available to pass an opaque value to
>> diff --git a/lib/power/meson.build b/lib/power/meson.build
>> index 2f0f3d26e9..9b5d3e8315 100644
>> --- a/lib/power/meson.build
>> +++ b/lib/power/meson.build
>> @@ -23,12 +23,14 @@ sources = files(
>>          'rte_power.c',
>>          'rte_power_uncore.c',
>>          'rte_power_pmd_mgmt.c',
>> +        'rte_power_qos.c',
>>  )
>>  headers = files(
>>          'rte_power.h',
>>          'rte_power_guest_channel.h',
>>          'rte_power_pmd_mgmt.h',
>>          'rte_power_uncore.h',
>> +        'rte_power_qos.h',
>>  )
>>
>>  deps += ['timer', 'ethdev']
>> diff --git a/lib/power/rte_power_qos.c b/lib/power/rte_power_qos.c
>> new file mode 100644
>> index 0000000000..09692b2161
>> --- /dev/null
>> +++ b/lib/power/rte_power_qos.c
>> @@ -0,0 +1,123 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2024 HiSilicon Limited
>> + */
>> +
>> +#include
>> +#include
>> +#include
>> +
>> +#include
>> +#include
>> +
>> +#include "power_common.h"
>> +#include "rte_power_qos.h"
>> +
>> +#define PM_QOS_SYSFILE_RESUME_LATENCY_US \
>> +	"/sys/devices/system/cpu/cpu%u/power/pm_qos_resume_latency_us"
>> +
>> +#define PM_QOS_CPU_RESUME_LATENCY_BUF_LEN	32
>> +
>> +int
>> +rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency)
>> +{
>> +	char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
>> +	uint32_t cpu_id;
>> +	FILE *f;
>> +	int ret;
>> +
>> +	if (!rte_lcore_is_enabled(lcore_id)) {
>> +		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
>> +		return -EINVAL;
>> +	}
>> +	ret = power_get_lcore_mapped_cpu_id(lcore_id, &cpu_id);
>> +	if (ret != 0)
>> +		return ret;
>> +
>> +	if (latency < 0) {
>> +		POWER_LOG(ERR, "latency should be greater than and equal to 0");
>> +		return -EINVAL;
>> +	}
>> +
>> +	ret = open_core_sysfs_file(&f, "w", PM_QOS_SYSFILE_RESUME_LATENCY_US, cpu_id);
>> +	if (ret != 0) {
>> +		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
>> +			  cpu_id, strerror(errno));
>> +		return ret;
>> +	}
>> +
>> +	/*
>> +	 * Based on the sysfs interface pm_qos_resume_latency_us under
>> +	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, their meaning
>> +	 * is as follows for different input string.
>> +	 * 1> the resume latency is 0 if the input is "n/a".
>> +	 * 2> the resume latency is no constraint if the input is "0".
>> +	 * 3> the resume latency is the actual value to be set.
>> +	 */
>> +	if (latency == 0)
>
> Why not to use your own macro:
> RTE_POWER_QOS_STRICT_LATENCY_VALUE
> Instead of hard-coded constant here?
You are right. Will fix it in the next version.
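
To make the latency-to-string mapping concrete, here is a minimal standalone sketch using the named macros. The helper name qos_latency_to_sysfs_str() is hypothetical and not part of the patch; the macro values are copied from rte_power_qos.h in this patch, with INT32_MAX standing in for ((int)(UINT32_MAX >> 1)), which has the same value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Values as defined in rte_power_qos.h of this patch. */
#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT INT32_MAX

/*
 * Hypothetical helper (not in the patch): build the string that would be
 * written to pm_qos_resume_latency_us for a latency given in microseconds.
 * Returns 0 on success, -1 if latency is negative or buf is too small.
 */
static int
qos_latency_to_sysfs_str(int latency, char *buf, size_t len)
{
	int n;

	assert(buf != NULL && len > 0);

	if (latency < 0)
		return -1;

	if (latency == RTE_POWER_QOS_STRICT_LATENCY_VALUE)
		n = snprintf(buf, len, "%s", "n/a");   /* "n/a": resume latency of 0 */
	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
		n = snprintf(buf, len, "%d", 0);       /* "0": no constraint */
	else
		n = snprintf(buf, len, "%d", latency); /* plain value, int so %d */

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Keeping the mapping in one small function means the three sysfs cases can be checked without touching /sys.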
>
>> +		snprintf(buf, sizeof(buf), "%s", "n/a");
>> +	else if (latency == RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT)
>> +		snprintf(buf, sizeof(buf), "%u", 0);
>> +	else
>> +		snprintf(buf, sizeof(buf), "%u", latency);
>> +
>> +	ret = write_core_sysfs_s(f, buf);
>> +	if (ret != 0)
>> +		POWER_LOG(ERR, "Failed to write "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
>> +			  cpu_id, strerror(errno));
>> +
>> +	fclose(f);
>> +
>> +	return ret;
>> +}
>> +
>> +int
>> +rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id)
>> +{
>> +	char buf[PM_QOS_CPU_RESUME_LATENCY_BUF_LEN];
>> +	int latency = -1;
>> +	uint32_t cpu_id;
>> +	FILE *f;
>> +	int ret;
>> +
>> +	if (!rte_lcore_is_enabled(lcore_id)) {
>> +		POWER_LOG(ERR, "lcore id %u is not enabled", lcore_id);
>> +		return -EINVAL;
>> +	}
>> +	ret = power_get_lcore_mapped_cpu_id(lcore_id, &cpu_id);
>> +	if (ret != 0)
>> +		return ret;
>> +
>> +	ret = open_core_sysfs_file(&f, "r", PM_QOS_SYSFILE_RESUME_LATENCY_US, cpu_id);
>> +	if (ret != 0) {
>> +		POWER_LOG(ERR, "Failed to open "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
>> +			  cpu_id, strerror(errno));
>> +		return ret;
>> +	}
>> +
>> +	ret = read_core_sysfs_s(f, buf, sizeof(buf));
>> +	if (ret != 0) {
>> +		POWER_LOG(ERR, "Failed to read "PM_QOS_SYSFILE_RESUME_LATENCY_US" : %s",
>> +			  cpu_id, strerror(errno));
>> +		goto out;
>> +	}
>> +
>> +	/*
>> +	 * Based on the sysfs interface pm_qos_resume_latency_us under
>> +	 * @PM_QOS_SYSFILE_RESUME_LATENCY_US directory in kernel, their meaning
>> +	 * is as follows for different output string.
>> +	 * 1> the resume latency is 0 if the output is "n/a".
>> +	 * 2> the resume latency is no constraint if the output is "0".
>> +	 * 3> the resume latency is the actual value in used for other string.
>> +	 */
>> +	if (strcmp(buf, "n/a") == 0)
>> +		latency = 0;
>
> RTE_POWER_QOS_STRICT_LATENCY_VALUE
Ack
> ?
>
>> +	else {
>> +		latency = strtoul(buf, NULL, 10);
>> +		latency = latency == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : latency;
>> +	}
>> +
>> +out:
>> +	fclose(f);
>> +
>> +	return latency != -1 ? latency : ret;
>> +}
>> diff --git a/lib/power/rte_power_qos.h b/lib/power/rte_power_qos.h
>> new file mode 100644
>> index 0000000000..990c488373
>> --- /dev/null
>> +++ b/lib/power/rte_power_qos.h
>> @@ -0,0 +1,73 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2024 HiSilicon Limited
>> + */
>> +
>> +#ifndef RTE_POWER_QOS_H
>> +#define RTE_POWER_QOS_H
>> +
>> +#include
>> +
>> +#include
>> +
>> +#ifdef __cplusplus
>> +extern "C" {
>> +#endif
>> +
>> +/**
>> + * @file rte_power_qos.h
>> + *
>> + * PM QoS API.
>> + *
>> + * The CPU-wide resume latency limit has a positive impact on this CPU's idle
>> + * state selection in each cpuidle governor.
>> + * Please see the PM QoS on CPU wide in the following link:
>> + * https://www.kernel.org/doc/html/latest/admin-guide/abi-testing.html?highlight=pm_qos_resume_latency_us#abi-sys-devices-power-pm-qos-resume-latency-us
>> + *
>> + * The deeper the idle state, the lower the power consumption, but the
>> + * longer the resume time. Some service are delay sensitive and very except the
>> + * low resume time, like interrupt packet receiving mode.
>> + *
>> + * In these case, per-CPU PM QoS API can be used to control this CPU's idle
>> + * state selection and limit just enter the shallowest idle state to low the
>> + * delay after sleep by setting strict resume latency (zero value).
>> + */
>> +
>> +#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
>> +#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT ((int)(UINT32_MAX >> 1))
> Isn't it just INT32_MAX?
Will fix it.
>
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * @param lcore_id
>> + *   target logical core id
>> + *
>> + * @param latency
>> + *   The latency should be greater than and equal to zero in microseconds unit.
>> + *
>> + * @return
>> + *   0 on success. Otherwise negative value is returned.
>> + */
>> +__rte_experimental
>> +int rte_power_qos_set_cpu_resume_latency(uint16_t lcore_id, int latency);
>> +
>> +/**
>> + * @warning
>> + * @b EXPERIMENTAL: this API may change without prior notice.
>> + *
>> + * Get the current resume latency of this logical core.
>> + * The default value in kernel is @see RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT
>> + * if don't set it.
>> + *
>> + * @return
>> + *   Negative value on failure.
>> + *   >= 0 means the actual resume latency limit on this core.
>> + */
>> +__rte_experimental
>> +int rte_power_qos_get_cpu_resume_latency(uint16_t lcore_id);
>> +
>> +#ifdef __cplusplus
>> +}
>> +#endif
>> +
>> +#endif /* RTE_POWER_QOS_H */
>> diff --git a/lib/power/version.map b/lib/power/version.map
>> index c9a226614e..08f178a39d 100644
>> --- a/lib/power/version.map
>> +++ b/lib/power/version.map
>> @@ -51,4 +51,8 @@ EXPERIMENTAL {
>>  	rte_power_set_uncore_env;
>>  	rte_power_uncore_freqs;
>>  	rte_power_unset_uncore_env;
>> +
>> +	# added in 24.11
>> +	rte_power_qos_get_cpu_resume_latency;
>> +	rte_power_qos_set_cpu_resume_latency;
>>  };
>> --
>> 2.22.0
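
The string handling on the get path quoted above can also be exercised without sysfs. Below is a minimal sketch under a few assumptions: the helper name qos_sysfs_str_to_latency() is hypothetical and not part of the patch, the input is assumed to have had any trailing newline removed already, and the end-pointer and range checks around strtoul() are an addition of this sketch rather than something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Values as defined in rte_power_qos.h of this patch (INT32_MAX form). */
#define RTE_POWER_QOS_STRICT_LATENCY_VALUE 0
#define RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT INT32_MAX

/*
 * Hypothetical helper (not in the patch): convert the string read from
 * pm_qos_resume_latency_us into the value the get API would return.
 * Returns the latency (>= 0) on success, -EINVAL on a malformed string.
 */
static int
qos_sysfs_str_to_latency(const char *s)
{
	unsigned long val;
	char *end;

	assert(s != NULL);

	/* "n/a" means a resume latency of 0, i.e. the strict value. */
	if (strcmp(s, "n/a") == 0)
		return RTE_POWER_QOS_STRICT_LATENCY_VALUE;

	errno = 0;
	val = strtoul(s, &end, 10);
	/* Reject empty input, trailing garbage and out-of-range values. */
	if (errno != 0 || end == s || *end != '\0' || val > INT32_MAX)
		return -EINVAL;

	/* "0" means no constraint; anything else is the limit in microseconds. */
	return val == 0 ? RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT : (int)val;
}
```

With this shape, qos_sysfs_str_to_latency("n/a") yields RTE_POWER_QOS_STRICT_LATENCY_VALUE and qos_sysfs_str_to_latency("0") yields RTE_POWER_QOS_RESUME_LATENCY_NO_CONSTRAINT, matching the three cases listed in the code comment of the patch.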