From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 2EDB0459C8; Thu, 19 Sep 2024 05:37:37 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id D6A6E40295; Thu, 19 Sep 2024 05:37:36 +0200 (CEST) Received: from szxga02-in.huawei.com (szxga02-in.huawei.com [45.249.212.188]) by mails.dpdk.org (Postfix) with ESMTP id 155E740156 for ; Thu, 19 Sep 2024 05:37:33 +0200 (CEST) Received: from mail.maildlp.com (unknown [172.19.88.105]) by szxga02-in.huawei.com (SkyGuard) with ESMTP id 4X8LkY0KDqzfcKv; Thu, 19 Sep 2024 11:35:17 +0800 (CST) Received: from kwepemm600004.china.huawei.com (unknown [7.193.23.242]) by mail.maildlp.com (Postfix) with ESMTPS id 37640140393; Thu, 19 Sep 2024 11:37:31 +0800 (CST) Received: from [10.67.121.59] (10.67.121.59) by kwepemm600004.china.huawei.com (7.193.23.242) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.1.2507.39; Thu, 19 Sep 2024 11:37:30 +0800 Message-ID: <42b29014-6b34-585c-9d3d-a10284d8c149@huawei.com> Date: Thu, 19 Sep 2024 11:37:29 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:91.0) Gecko/20100101 Thunderbird/91.2.0 Subject: Re: [PATCH v2 1/4] power: refactor core power management library To: "Tummala, Sivaprasad" CC: "dev@dpdk.org" , "david.hunt@intel.com" , "anatoly.burakov@intel.com" , "radu.nicolau@intel.com" , "cristian.dumitrescu@intel.com" , "jerinj@marvell.com" , "konstantin.ananyev@huawei.com" , "Yigit, Ferruh" , "gakhil@marvell.com" References: <20240720165030.246294-1-sivaprasad.tummala@amd.com> <20240826130650.320130-1-sivaprasad.tummala@amd.com> <20240826130650.320130-2-sivaprasad.tummala@amd.com> <375e3ab9-bf84-1844-787d-ec89230ae739@huawei.com> From: "lihuisong (C)" In-Reply-To: Content-Type: text/plain; charset="UTF-8"; format=flowed Content-Transfer-Encoding: 8bit X-Originating-IP: [10.67.121.59] X-ClientProxiedBy: dggems703-chm.china.huawei.com (10.3.19.180) To kwepemm600004.china.huawei.com (7.193.23.242) X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org 在 2024/9/18 16:37, Tummala, Sivaprasad 写道: > [AMD Official Use Only - AMD Internal Distribution Only] > >> -----Original Message----- >> From: lihuisong (C) >> Sent: Friday, September 13, 2024 1:05 PM >> To: Tummala, Sivaprasad >> Cc: dev@dpdk.org; david.hunt@intel.com; anatoly.burakov@intel.com; >> radu.nicolau@intel.com; cristian.dumitrescu@intel.com; jerinj@marvell.com; >> konstantin.ananyev@huawei.com; Yigit, Ferruh ; >> gakhil@marvell.com >> Subject: Re: [PATCH v2 1/4] power: refactor core power management library >> >> Caution: This message originated from an External Source. Use proper caution >> when opening attachments, clicking links, or responding. >> >> >> 在 2024/9/12 19:17, Tummala, Sivaprasad 写道: >>> [AMD Official Use Only - AMD Internal Distribution Only] >>> >>> Hi Huisong, >>> >>> Please find my response inline. >>> >>>> -----Original Message----- >>>> From: lihuisong (C) >>>> Sent: Tuesday, August 27, 2024 1:51 PM >>>> To: Tummala, Sivaprasad >>>> Cc: dev@dpdk.org; david.hunt@intel.com; anatoly.burakov@intel.com; >>>> radu.nicolau@intel.com; cristian.dumitrescu@intel.com; >>>> jerinj@marvell.com; konstantin.ananyev@huawei.com; Yigit, Ferruh >>>> ; gakhil@marvell.com >>>> Subject: Re: [PATCH v2 1/4] power: refactor core power management >>>> library >>>> >>>> Caution: This message originated from an External Source. Use proper >>>> caution when opening attachments, clicking links, or responding. >>>> >>>> >>>> Hi Sivaprasa, >>>> >>>> Some comments inline. >>>> >>>> /Huisong >>>> >>>> 在 2024/8/26 21:06, Sivaprasad Tummala 写道: >>>>> This patch introduces a comprehensive refactor to the core power >>>>> management library. The primary focus is on improving modularity and >>>>> organization by relocating specific driver implementations from the >>>>> 'lib/power' directory to dedicated directories within >>>>> 'drivers/power/core/*'. The adjustment of meson.build files enables >>>>> the selective activation of individual drivers. >>>>> These changes contribute to a significant enhancement in code >>>>> organization, providing a clearer structure for driver implementations. >>>>> The refactor aims to improve overall code clarity and boost >>>>> maintainability. Additionally, it establishes a foundation for >>>>> future development, allowing for more focused work on individual >>>>> drivers and seamless integration of forthcoming enhancements. >>>>> >>>>> v2: >>>>> - added NULL check for global_core_ops in rte_power_get_core_ops >>>>> >>>>> Signed-off-by: Sivaprasad Tummala >>>>> --- >>>>> drivers/meson.build | 1 + >>>>> .../power/acpi/acpi_cpufreq.c | 22 +- >>>>> .../power/acpi/acpi_cpufreq.h | 6 +- >>>>> drivers/power/acpi/meson.build | 10 + >>>>> .../power/amd_pstate/amd_pstate_cpufreq.c | 24 +- >>>>> .../power/amd_pstate/amd_pstate_cpufreq.h | 8 +- >>>>> drivers/power/amd_pstate/meson.build | 10 + >>>>> .../power/cppc/cppc_cpufreq.c | 22 +- >>>>> .../power/cppc/cppc_cpufreq.h | 8 +- >>>>> drivers/power/cppc/meson.build | 10 + >>>>> .../power/kvm_vm}/guest_channel.c | 0 >>>>> .../power/kvm_vm}/guest_channel.h | 0 >>>>> .../power/kvm_vm/kvm_vm.c | 22 +- >>>>> .../power/kvm_vm/kvm_vm.h | 6 +- >>>>> drivers/power/kvm_vm/meson.build | 16 + >>>>> drivers/power/meson.build | 12 + >>>>> drivers/power/pstate/meson.build | 10 + >>>>> .../power/pstate/pstate_cpufreq.c | 22 +- >>>>> .../power/pstate/pstate_cpufreq.h | 6 +- >>>>> lib/power/meson.build | 7 +- >>>>> lib/power/power_common.c | 2 +- >>>>> lib/power/power_common.h | 16 +- >>>>> lib/power/rte_power.c | 291 ++++++------------ >>>>> lib/power/rte_power.h | 139 ++++++--- >>>>> lib/power/rte_power_core_ops.h | 208 +++++++++++++ >>>>> lib/power/version.map | 14 + >>>>> 26 files changed, 621 insertions(+), 271 deletions(-) >>>>> rename lib/power/power_acpi_cpufreq.c => >>>>> drivers/power/acpi/acpi_cpufreq.c >>>> (95%) >>>>> rename lib/power/power_acpi_cpufreq.h => >>>>> drivers/power/acpi/acpi_cpufreq.h >>>> (98%) >>>>> create mode 100644 drivers/power/acpi/meson.build >>>>> rename lib/power/power_amd_pstate_cpufreq.c => >>>> drivers/power/amd_pstate/amd_pstate_cpufreq.c (95%) >>>>> rename lib/power/power_amd_pstate_cpufreq.h => >>>> drivers/power/amd_pstate/amd_pstate_cpufreq.h (97%) >>>>> create mode 100644 drivers/power/amd_pstate/meson.build >>>>> rename lib/power/power_cppc_cpufreq.c => >>>>> drivers/power/cppc/cppc_cpufreq.c >>>> (95%) >>>>> rename lib/power/power_cppc_cpufreq.h => >>>>> drivers/power/cppc/cppc_cpufreq.h >>>> (97%) >>>>> create mode 100644 drivers/power/cppc/meson.build >>>>> rename {lib/power => drivers/power/kvm_vm}/guest_channel.c (100%) >>>>> rename {lib/power => drivers/power/kvm_vm}/guest_channel.h (100%) >>>>> rename lib/power/power_kvm_vm.c => drivers/power/kvm_vm/kvm_vm.c >> (82%) >>>>> rename lib/power/power_kvm_vm.h => drivers/power/kvm_vm/kvm_vm.h >> (98%) >>>>> create mode 100644 drivers/power/kvm_vm/meson.build >>>>> create mode 100644 drivers/power/meson.build >>>>> create mode 100644 drivers/power/pstate/meson.build >>>>> rename lib/power/power_pstate_cpufreq.c => >>>> drivers/power/pstate/pstate_cpufreq.c (96%) >>>>> rename lib/power/power_pstate_cpufreq.h => >>>> drivers/power/pstate/pstate_cpufreq.h (98%) >>>>> create mode 100644 lib/power/rte_power_core_ops.h >>>> How about use the following directory structure? >>>> *For power libs* >>>> lib/power/power_common.* >>>> lib/power/rte_power_pmd_mgmt.* >>>> lib/power/rte_power_cpufreq_api.* (replacing rte_power.c file maybe simple for >> us. >>>> but I'm not sure if we can put the init of core, uncore and pmd mgmt >>>> to rte_power_init.c in rte_power.c.) >>>> lib/power/rte_power_uncore_freq_api.* >>> Yes, renaming rte_power.c is definitely a possible incremental change that could >> be considered later. >>> However, for the time being, our focus will be on refactoring the cpufreq drivers >> only. >> The rte_power.c just works for the initialization of cpufreq driver. Now that you are >> reworking core and uncore power library and rearrange the directory under power. >> I think renaming this file name should be more appropriate in this series. >>>> *And has directories under drivers/power:* >>>> 1> For core dvfs driver: >>>> drivers/power/cpufreq/acpi_cpufreq.c >>>> drivers/power/cpufreq/cppc_cpufreq.c >>>> drivers/power/cpufreq/amd_pstate_cpufreq.c >>>> drivers/power/cpufreq/intel_pstate_cpufreq.c >>>> drivers/power/cpufreq/kvm_cpufreq.c >>>> The code of each cpufreq driver is not too much and doesn't probably >>>> increase. So don't need to use a directory for it. >>>> >>>> 2> For uncore dvfs driver: >>>> drivers/power/uncorefreq/intel_uncore.* >>>>> diff --git a/drivers/meson.build b/drivers/meson.build index >>>>> 66931d4241..9d77e0deab 100644 >>>>> --- a/drivers/meson.build >>>>> +++ b/drivers/meson.build >>>>> @@ -29,6 +29,7 @@ subdirs = [ >>>>> 'event', # depends on common, bus, mempool and net. >>>>> 'baseband', # depends on common and bus. >>>>> 'gpu', # depends on common and bus. >>>>> + 'power', # depends on common (in future). >>>>> ] >>>>> >>>>> if meson.is_cross_build() >>>>> diff --git a/lib/power/power_acpi_cpufreq.c >>>>> b/drivers/power/acpi/acpi_cpufreq.c >>>>> similarity index 95% >>>>> rename from lib/power/power_acpi_cpufreq.c rename to >>>>> drivers/power/acpi/acpi_cpufreq.c >>>> do not suggest to create one directory for each cpufreq driver. >>>> Because pstate drivers also comply with ACPI spec, right? >>>> In addition, the code of each cpufreq drivers are not too much. >>>> There is just one file under one directory which is not good. >>> One of our objectives for the refactoring is to selectively disable non-essential >> drivers using Meson build options. >>> However, by rearranging the driver structure, we risk disrupting this capability. >> I get your purpose. >> The cpufreq library has the feature and interface to detect which driver to use, right? >> So it is not necessary for cpufreq library to introduce the Meson build options, which >> probably makes it complicate. > In Meson, you can reduce code size by disabling specific drivers or components through build options, > allowing you to exclude unnecessary features. At runtime, the library will automatically detect the available driver, > and if it's not present in the build, initialization will fail. I still cannot understand why you want to do this. The reducing code size is not a good reason. If all libraries or drivers want to do it like this, need to add more meson options. Unless there is a situation where multiple drivers on a platform can be used, and the automatic detection mechanism is not enough to determine which driver to use. Of course, you also can see the opinion of other reviewers. > We're not introducing any new complexities; rather, we aim to ensure that the drivers in drivers/power/* > are consistent with the other drivers. What do the other drivers stand for? Anyway, we need to make the directory hierarchy under drivers/power/ and lib/power/ clear. >>>>> index 81996e1c13..8637c69703 100644 >>>>> --- a/lib/power/power_acpi_cpufreq.c >>>>> +++ b/drivers/power/acpi/acpi_cpufreq.c >>>>> @@ -10,7 +10,7 @@ >>>>> #include >>>>> #include >>>>> >>>>> -#include "power_acpi_cpufreq.h" >>>>> +#include "acpi_cpufreq.h" >>>>> #include "power_common.h" >>>>> >>>> <...> >>>>> +if not is_linux >>>>> + build = false >>>>> + reason = 'only supported on Linux' >>>>> +endif >>>>> +sources = files('pstate_cpufreq.c') >>>>> + >>>>> +deps += ['power'] >>>>> diff --git a/lib/power/power_pstate_cpufreq.c >>>>> b/drivers/power/pstate/pstate_cpufreq.c >>>>> similarity index 96% >>>>> rename from lib/power/power_pstate_cpufreq.c rename to >>>>> drivers/power/pstate/pstate_cpufreq.c >>>> pstate_cpufreq.c is actually intel_pstate cpufreq driver, right? >>>> So how about modify this file name to intel_pstate_cpufreq.c? >>> Yes, will fix this in next version. >>>>> index 2343121621..c32b1adabc 100644 >>>>> --- a/lib/power/power_pstate_cpufreq.c >>>>> +++ b/drivers/power/pstate/pstate_cpufreq.c >>>>> @@ -15,7 +15,7 @@ >>>>> #include >>>>> >>>>> #include "rte_power_pmd_mgmt.h" >>>>> -#include "power_pstate_cpufreq.h" >>>>> +#include "pstate_cpufreq.h" >>>>> #include "power_common.h" >>>>> >>>>> /* macros used for rounding frequency to nearest 100000 */ @@ >>>>> -888,3 >>>>> +888,23 @@ int power_pstate_get_capabilities(unsigned int lcore_id, >>>>> >>>>> return 0; >>>>> } >>>>> + >>>> <...> >>>>> diff --git a/lib/power/power_common.c b/lib/power/power_common.c >>>>> index 590986d5ef..6c06411e8b 100644 >>>>> --- a/lib/power/power_common.c >>>>> +++ b/lib/power/power_common.c >>>>> @@ -12,7 +12,7 @@ >>>>> >>>>> #include "power_common.h" >>>>> >>>>> -RTE_LOG_REGISTER_DEFAULT(power_logtype, INFO); >>>>> +RTE_LOG_REGISTER_DEFAULT(rte_power_logtype, INFO); >>>>> >>>>> #define POWER_SYSFILE_SCALING_DRIVER \ >>>>> "/sys/devices/system/cpu/cpu%u/cpufreq/scaling_driver" >>>>> diff --git a/lib/power/power_common.h b/lib/power/power_common.h >>>>> index >>>>> 83f742f42a..767686ee12 100644 >>>>> --- a/lib/power/power_common.h >>>>> +++ b/lib/power/power_common.h >>>>> @@ -6,12 +6,13 @@ >>>>> #define _POWER_COMMON_H_ >>>>> >>>>> #include >>>>> +#include >>>>> #include >>>>> >>>>> #define RTE_POWER_INVALID_FREQ_INDEX (~0) >>>>> >>>>> -extern int power_logtype; >>>>> -#define RTE_LOGTYPE_POWER power_logtype >>>>> +extern int rte_power_logtype; >>>>> +#define RTE_LOGTYPE_POWER rte_power_logtype >>>>> #define POWER_LOG(level, ...) \ >>>>> RTE_LOG_LINE(level, POWER, "" __VA_ARGS__) >>>>> >>>>> @@ -23,13 +24,24 @@ extern int power_logtype; >>>>> #endif >>>>> >>>>> /* check if scaling driver matches one we want */ >>>>> +__rte_internal >>>>> int cpufreq_check_scaling_driver(const char *driver); >>>>> + >>>>> +__rte_internal >>>>> int power_set_governor(unsigned int lcore_id, const char *new_governor, >>>>> char *orig_governor, size_t orig_governor_len); >>>> suggest that move cpufreq interfaces like this to the >>>> rte_power_cpufreq_api.* I proposed above. >>> This is an internal API and isn’t intended for direct use by applications. >>> By moving it to rte_power_*, we risk exposing it inadvertently. >> we don't expose these to applications. application do not include this header file. >> power_set_governor() and cpufreq_check_scaling_driver() is just used by cpufreq >> driver. So they just can be seen by cpufreq lib or module, right? >> But if these interface are in power_common.h, pmd_mgmt and uncore driver also >> include this header file and can see them. This is not good. >> AFAIS, the power_common.h should just contain the kind of interfaces that are used >> by all power libs or sub-modules, like cpufreq, uncore, pmd_mgmt and so on. > OK., Will move this internal APIs from power_common.h to a separate header file. >>>> The interfaces in power_comm.* can be used by all power modules, like >>>> core/uncore/pmd mgmt. >>>>> + >>>>> +__rte_internal >>>>> int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...) >>>>> __rte_format_printf(3, 4); >>>>> + >>>>> +__rte_internal >>>>> int read_core_sysfs_u32(FILE *f, uint32_t *val); >>>>> + >>>>> +__rte_internal >>>>> int read_core_sysfs_s(FILE *f, char *buf, unsigned int len); >>>>> + >>>>> +__rte_internal >>>>> int write_core_sysfs_s(FILE *f, const char *str); >>>>> >>>>> #endif /* _POWER_COMMON_H_ */ >>>>> diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c >>>> The name of the rte_power.c file is impropriate now. The context in >>>> this file is just for cpufreq, right? >>>> So I suggest that we need to rename this file as the >>>> rte_power_cpufreq_api.c >>> Yes, renaming rte_power.c to rte_power_cpufreq.c is definitely a >>> possible incremental change and will fix this as a separate patch. >>> . >>> >>>>> index 36c3f3da98..2bf6d40517 100644 >>>>> --- a/lib/power/rte_power.c >>>>> +++ b/lib/power/rte_power.c >>>>> @@ -8,153 +8,86 @@ >>>>> #include >>>>> >>>>> #include "rte_power.h" >>>>> -#include "power_acpi_cpufreq.h" >>>>> -#include "power_cppc_cpufreq.h" >>>>> #include "power_common.h" >>>>> -#include "power_kvm_vm.h" >>>>> -#include "power_pstate_cpufreq.h" >>>>> -#include "power_amd_pstate_cpufreq.h" >>>>> >>>>> -enum power_management_env global_default_env = PM_ENV_NOT_SET; >>>>> +static enum power_management_env global_default_env = >>>> PM_ENV_NOT_SET; >>>>> +static struct rte_power_core_ops *global_power_core_ops; >>>>> >>>>> static rte_spinlock_t global_env_cfg_lock = >>>>> RTE_SPINLOCK_INITIALIZER; >>>>> +static RTE_TAILQ_HEAD(, rte_power_core_ops) core_ops_list = >>>>> + TAILQ_HEAD_INITIALIZER(core_ops_list); >>>>> >>>>> -/* function pointers */ >>>>> -rte_power_freqs_t rte_power_freqs = NULL; -rte_power_get_freq_t >>>>> rte_power_get_freq = NULL; -rte_power_set_freq_t rte_power_set_freq >>>>> = NULL; -rte_power_freq_change_t rte_power_freq_up = NULL; >>>>> -rte_power_freq_change_t rte_power_freq_down = NULL; >>>>> -rte_power_freq_change_t rte_power_freq_max = NULL; >>>>> -rte_power_freq_change_t rte_power_freq_min = NULL; >>>>> -rte_power_freq_change_t rte_power_turbo_status; >>>>> -rte_power_freq_change_t rte_power_freq_enable_turbo; >>>>> -rte_power_freq_change_t rte_power_freq_disable_turbo; >>>>> -rte_power_get_capabilities_t rte_power_get_capabilities; >>>>> - >>>>> -static void >>>>> -reset_power_function_ptrs(void) >>>>> + >>>>> +const char *power_env_str[] = { >>>>> + "not set", >>>>> + "acpi", >>>>> + "kvm-vm", >>>>> + "pstate", >>>>> + "cppc", >>>>> + "amd-pstate" >>>>> +}; >>>>> + >>>>> +/* register the ops struct in rte_power_core_ops, return 0 on >>>>> +success. */ int rte_power_register_ops(struct rte_power_core_ops >>>>> +*driver_ops) >>>>> { >>>>> - rte_power_freqs = NULL; >>>>> - rte_power_get_freq = NULL; >>>>> - rte_power_set_freq = NULL; >>>>> - rte_power_freq_up = NULL; >>>>> - rte_power_freq_down = NULL; >>>>> - rte_power_freq_max = NULL; >>>>> - rte_power_freq_min = NULL; >>>>> - rte_power_turbo_status = NULL; >>>>> - rte_power_freq_enable_turbo = NULL; >>>>> - rte_power_freq_disable_turbo = NULL; >>>>> - rte_power_get_capabilities = NULL; >>>>> + if (!driver_ops->init || !driver_ops->exit || >>>>> + !driver_ops->check_env_support || !driver_ops->get_avail_freqs || >>>>> + !driver_ops->get_freq || !driver_ops->set_freq || >>>>> + !driver_ops->freq_up || !driver_ops->freq_down || >>>>> + !driver_ops->freq_max || !driver_ops->freq_min || >>>>> + !driver_ops->turbo_status || !driver_ops->enable_turbo || >>>>> + !driver_ops->disable_turbo || !driver_ops->get_caps) { >>>>> + POWER_LOG(ERR, "Missing callbacks while registering >>>>> + power ops"); >>>> turbo_status(), enable_turbo() and disable turbo() are not necessary, right? >>> Nope, this is required to get the current status unlike the capability API >> (get_caps()). >> ok >>>> These depand on the capabilities from get_caps(). >>>>> + return -EINVAL; >>>>> + } >>>>> + >>>>> + TAILQ_INSERT_TAIL(&core_ops_list, driver_ops, next); >>>>> + >>>>> + return 0; >>>>> } >>>>> >>>>> int >>>>> rte_power_check_env_supported(enum power_management_env env) >>>>> { >>>>> - switch (env) { >>>>> - case PM_ENV_ACPI_CPUFREQ: >>>>> - return power_acpi_cpufreq_check_supported(); >>>>> - case PM_ENV_PSTATE_CPUFREQ: >>>>> - return power_pstate_cpufreq_check_supported(); >>>>> - case PM_ENV_KVM_VM: >>>>> - return power_kvm_vm_check_supported(); >>>>> - case PM_ENV_CPPC_CPUFREQ: >>>>> - return power_cppc_cpufreq_check_supported(); >>>>> - case PM_ENV_AMD_PSTATE_CPUFREQ: >>>>> - return power_amd_pstate_cpufreq_check_supported(); >>>>> - default: >>>>> - rte_errno = EINVAL; >>>>> - return -1; >>>>> - } >>>>> + struct rte_power_core_ops *ops; >>>>> + >>>>> + if (env >= RTE_DIM(power_env_str)) >>>>> + return 0; >>>>> + >>>>> + RTE_TAILQ_FOREACH(ops, &core_ops_list, next) >>>>> + if (strncmp(ops->name, power_env_str[env], >>>>> + RTE_POWER_DRIVER_NAMESZ) == 0) >>>>> + return ops->check_env_support(); >>>>> + >>>>> + return 0; >>>>> } >>>>> >>>>> int >>>>> rte_power_set_env(enum power_management_env env) >>>>> { >>>>> + struct rte_power_core_ops *ops; >>>>> + int ret = -1; >>>>> + >>>>> rte_spinlock_lock(&global_env_cfg_lock); >>>>> >>>>> if (global_default_env != PM_ENV_NOT_SET) { >>>>> POWER_LOG(ERR, "Power Management Environment already >> set."); >>>>> - rte_spinlock_unlock(&global_env_cfg_lock); >>>>> - return -1; >>>>> - } >>>>> - >>>> <...> >>>>> - if (ret == 0) >>>>> - global_default_env = env; >>>>> - else { >>>>> - global_default_env = PM_ENV_NOT_SET; >>>>> - reset_power_function_ptrs(); >>>>> - } >>>>> + RTE_TAILQ_FOREACH(ops, &core_ops_list, next) >>>>> + if (strncmp(ops->name, power_env_str[env], >>>>> + RTE_POWER_DRIVER_NAMESZ) == 0) { >>>>> + global_power_core_ops = ops; >>>>> + global_default_env = env; >>>>> + ret = 0; >>>>> + goto out; >>>>> + } >>>>> + POWER_LOG(ERR, "Invalid Power Management Environment(%d) set", >>>>> + env); >>>>> >>>>> +out: >>>>> rte_spinlock_unlock(&global_env_cfg_lock); >>>>> return ret; >>>>> } >>>>> @@ -164,94 +97,66 @@ rte_power_unset_env(void) >>>>> { >>>>> rte_spinlock_lock(&global_env_cfg_lock); >>>>> global_default_env = PM_ENV_NOT_SET; >>>>> - reset_power_function_ptrs(); >>>>> + global_power_core_ops = NULL; >>>>> rte_spinlock_unlock(&global_env_cfg_lock); >>>>> } >>>>> >>>>> enum power_management_env >>>>> -rte_power_get_env(void) { >>>>> +rte_power_get_env(void) >>>>> +{ >>>>> return global_default_env; >>>>> } >>>>> >>>>> -int >>>>> -rte_power_init(unsigned int lcore_id) >>>>> +struct rte_power_core_ops * >>>>> +rte_power_get_core_ops(void) >>>>> { >>>>> - int ret = -1; >>>>> + RTE_ASSERT(global_power_core_ops != NULL); >>>>> >>>>> - switch (global_default_env) { >>>>> - case PM_ENV_ACPI_CPUFREQ: >>>>> - return power_acpi_cpufreq_init(lcore_id); >>>>> - case PM_ENV_KVM_VM: >>>>> - return power_kvm_vm_init(lcore_id); >>>>> - case PM_ENV_PSTATE_CPUFREQ: >>>>> - return power_pstate_cpufreq_init(lcore_id); >>>>> - case PM_ENV_CPPC_CPUFREQ: >>>>> - return power_cppc_cpufreq_init(lcore_id); >>>>> - case PM_ENV_AMD_PSTATE_CPUFREQ: >>>>> - return power_amd_pstate_cpufreq_init(lcore_id); >>>>> - default: >>>>> - POWER_LOG(INFO, "Env isn't set yet!"); >>>>> - } >>>>> + return global_power_core_ops; >>>>> +} >>>>> >>>>> - /* Auto detect Environment */ >>>>> - POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq power >>>> management..."); >>>>> - ret = power_acpi_cpufreq_init(lcore_id); >>>>> - if (ret == 0) { >>>>> - rte_power_set_env(PM_ENV_ACPI_CPUFREQ); >>>>> - goto out; >>>>> - } >>>>> +int >>>>> +rte_power_init(unsigned int lcore_id) { >>>>> + struct rte_power_core_ops *ops; >>>>> + uint8_t env; >>>>> >>>>> - POWER_LOG(INFO, "Attempting to initialise PSTAT power >> management..."); >>>>> - ret = power_pstate_cpufreq_init(lcore_id); >>>>> - if (ret == 0) { >>>>> - rte_power_set_env(PM_ENV_PSTATE_CPUFREQ); >>>>> - goto out; >>>>> - } >>>>> + if (global_default_env != PM_ENV_NOT_SET) >>>>> + return global_power_core_ops->init(lcore_id); >>>>> >>>>> - POWER_LOG(INFO, "Attempting to initialise AMD PSTATE power >>>> management..."); >>>>> - ret = power_amd_pstate_cpufreq_init(lcore_id); >>>>> - if (ret == 0) { >>>>> - rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ); >>>>> - goto out; >>>>> - } >>>>> + POWER_LOG(INFO, "Env isn't set yet!"); >>>> remove this log? >>>>> - POWER_LOG(INFO, "Attempting to initialise CPPC power >> management..."); >>>>> - ret = power_cppc_cpufreq_init(lcore_id); >>>>> - if (ret == 0) { >>>>> - rte_power_set_env(PM_ENV_CPPC_CPUFREQ); >>>>> - goto out; >>>>> - } >>>>> + /* Auto detect Environment */ >>>>> + RTE_TAILQ_FOREACH(ops, &core_ops_list, next) >>>>> + if (ops) { >>>>> + POWER_LOG(INFO, >>>>> + "Attempting to initialise %s cpufreq power management...", >>>>> + ops->name); >>>>> + if (ops->init(lcore_id) == 0) { >>>>> + for (env = 0; env < RTE_DIM(power_env_str); env++) >>>>> + if (strncmp(ops->name, power_env_str[env], >>>>> + RTE_POWER_DRIVER_NAMESZ) == 0) { >>>>> + rte_power_set_env(env); >>>>> + return 0; >>>>> + } >>>>> + } >>>>> + } >>>> Can we change the logic of rte_power_set_env()? like: >>>> RTE_TAILQ_FOREACH(ops, &core_ops_list, next) { >>>> for (env = 0; env < RTE_DIM(power_env_str); env++) { >>>> if (strncmp(ops->name, power_env_str[env], >>>> RTE_POWER_DRIVER_NAMESZ) == 0 && >>>> ops->init(lcore_id) == 0) { >>>> global_power_core_ops = ops; >>>> global_default_env = env; >>>> } >>>> } >>>> } >>>> That is easier to follow code. >>> Yes, will fix in next version. >>> >>>>> + >>>>> + POWER_LOG(ERR, >>>>> + "Unable to set Power Management Environment for lcore %u", >>>>> + lcore_id); >>>>> >>>>> - POWER_LOG(INFO, "Attempting to initialise VM power management..."); >>>>> - ret = power_kvm_vm_init(lcore_id); >>>>> - if (ret == 0) { >>>>> - rte_power_set_env(PM_ENV_KVM_VM); >>>>> - goto out; >>>>> - } >>>>> - POWER_LOG(ERR, "Unable to set Power Management Environment for >> lcore >>>> " >>>>> - "%u", lcore_id); >>>>> -out: >>>>> - return ret; >>>>> + return -1; >>>>> } >>>>> >>>>> int >>>>> rte_power_exit(unsigned int lcore_id) >>>>> { >>>>> - switch (global_default_env) { >>>>> - case PM_ENV_ACPI_CPUFREQ: >>>>> - return power_acpi_cpufreq_exit(lcore_id); >>>>> - case PM_ENV_KVM_VM: >>>>> - return power_kvm_vm_exit(lcore_id); >>>>> - case PM_ENV_PSTATE_CPUFREQ: >>>>> - return power_pstate_cpufreq_exit(lcore_id); >>>>> - case PM_ENV_CPPC_CPUFREQ: >>>>> - return power_cppc_cpufreq_exit(lcore_id); >>>>> - case PM_ENV_AMD_PSTATE_CPUFREQ: >>>>> - return power_amd_pstate_cpufreq_exit(lcore_id); >>>>> - default: >>>>> - POWER_LOG(ERR, "Environment has not been set, unable to exit >>>> gracefully"); >>>>> + if (global_default_env != PM_ENV_NOT_SET) >>>>> + return global_power_core_ops->exit(lcore_id); >>>>> >>>>> - } >>>>> - return -1; >>>>> + POWER_LOG(ERR, >>>>> + "Environment has not been set, unable to exit >>>>> + gracefully"); >>>>> >>>>> + return -1; >>>>> } >>>>> diff --git a/lib/power/rte_power.h b/lib/power/rte_power.h index >>>>> 4fa4afe399..5e4aacf08b 100644 >>>>> --- a/lib/power/rte_power.h >>>>> +++ b/lib/power/rte_power.h >>>>> @@ -1,5 +1,6 @@ >>>>> /* SPDX-License-Identifier: BSD-3-Clause >>>>> * Copyright(c) 2010-2014 Intel Corporation >>>>> + * Copyright(c) 2024 Advanced Micro Devices, Inc. >>>>> */ >>>>> >>>>> #ifndef _RTE_POWER_H >>>>> @@ -14,14 +15,21 @@ >>>>> #include >>>>> #include >>>>> >>>>> +#include "rte_power_core_ops.h" >>>>> + >>>>> #ifdef __cplusplus >>>>> extern "C" { >>>>> #endif >>>>> >>>>> /* Power Management Environment State */ -enum >>>>> power_management_env {PM_ENV_NOT_SET, PM_ENV_ACPI_CPUFREQ, >> PM_ENV_KVM_VM, >>>>> - PM_ENV_PSTATE_CPUFREQ, PM_ENV_CPPC_CPUFREQ, >>>>> - PM_ENV_AMD_PSTATE_CPUFREQ}; >>>>> +enum power_management_env { >>>>> + PM_ENV_NOT_SET = 0, >>>>> + PM_ENV_ACPI_CPUFREQ, >>>>> + PM_ENV_KVM_VM, >>>>> + PM_ENV_PSTATE_CPUFREQ, >>>>> + PM_ENV_CPPC_CPUFREQ, >>>>> + PM_ENV_AMD_PSTATE_CPUFREQ >>>>> +}; >>>>> >>>>> /** >>>>> * Check if a specific power management environment type is >>>>> supported on a @@ -66,6 +74,15 @@ void rte_power_unset_env(void); >>>>> */ >>>>> enum power_management_env rte_power_get_env(void); >>>> I'd like to let user not know used which cpufreq driver, which is friendly to user. >>>> >>>> So we can rethink if this API is necessary. >>> For any API changes, could we handle this as a separate RFC for discussion? >>> It’s important that these changes are not included within the scope of this patch. >> Agreed. >> Can you post a separate RFC to disscuss this improvement later? >>>>> +/** >>>>> + * @internal Get the power ops struct from its index. >>>>> + * >>>>> + * @return >>>>> + * The pointer to the ops struct in the table if registered. >>>>> + */ >>>>> +struct rte_power_core_ops * >>>>> +rte_power_get_core_ops(void); >>>>> + >>>>> /** >>>>> * Initialize power management for a specific lcore. If rte_power_set_env() >> has >>>>> * not been called then an auto-detect of the environment will >>>>> start and @@ -108,10 +125,13 @@ int rte_power_exit(unsigned int lcore_id); >>>>> * @return >>>>> * The number of available frequencies. >>>>> */ >>>>> -typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs, >>>>> - uint32_t num); >>>>> +static inline uint32_t >>>>> +rte_power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> >>>>> -extern rte_power_freqs_t rte_power_freqs; >>>>> + return ops->get_avail_freqs(lcore_id, freqs, n); } >>>>> >>>>> /** >>>>> * Return the current index of available frequencies of a specific lcore. >>>>> @@ -124,9 +144,13 @@ extern rte_power_freqs_t rte_power_freqs; >>>>> * @return >>>>> * The current index of available frequencies. >>>>> */ >>>>> -typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id); >>>>> +static inline uint32_t >>>>> +rte_power_get_freq(unsigned int lcore_id) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> >>>>> -extern rte_power_get_freq_t rte_power_get_freq; >>>>> + return ops->get_freq(lcore_id); } >>>>> >>>>> /** >>>>> * Set the new frequency for a specific lcore by indicating the >>>>> index of @@ -144,82 +168,101 @@ extern rte_power_get_freq_t >>>> rte_power_get_freq; >>>>> * - 0 on success without frequency changed. >>>>> * - Negative on error. >>>>> */ >>>>> -typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t >>>>> index); >>>>> - >>>>> -extern rte_power_set_freq_t rte_power_set_freq; >>>>> +static inline uint32_t >>>>> +rte_power_set_freq(unsigned int lcore_id, uint32_t index) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> >>>>> -/** >>>>> - * Function pointer definition for generic frequency change >>>>> functions. Review >>>>> - * each environments specific documentation for usage. >>>>> - * >>>>> - * @param lcore_id >>>>> - * lcore id. >>>>> - * >>>>> - * @return >>>>> - * - 1 on success with frequency changed. >>>>> - * - 0 on success without frequency changed. >>>>> - * - Negative on error. >>>>> - */ >>>>> -typedef int (*rte_power_freq_change_t)(unsigned int lcore_id); >>>>> + return ops->set_freq(lcore_id, index); } >>>>> >>>>> /** >>>>> * Scale up the frequency of a specific lcore according to the available >>>>> * frequencies. >>>>> * Review each environments specific documentation for usage. >>>>> */ >>>>> -extern rte_power_freq_change_t rte_power_freq_up; >>>>> +static inline int >>>>> +rte_power_freq_up(unsigned int lcore_id) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> + >>>>> + return ops->freq_up(lcore_id); } >>>>> >>>>> /** >>>>> * Scale down the frequency of a specific lcore according to the available >>>>> * frequencies. >>>>> * Review each environments specific documentation for usage. >>>>> */ >>>>> -extern rte_power_freq_change_t rte_power_freq_down; >>>>> +static inline int >>>>> +rte_power_freq_down(unsigned int lcore_id) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> + >>>>> + return ops->freq_down(lcore_id); } >>>>> >>>>> /** >>>>> * Scale up the frequency of a specific lcore to the highest according to the >>>>> * available frequencies. >>>>> * Review each environments specific documentation for usage. >>>>> */ >>>>> -extern rte_power_freq_change_t rte_power_freq_max; >>>>> +static inline int >>>>> +rte_power_freq_max(unsigned int lcore_id) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> + >>>>> + return ops->freq_max(lcore_id); } >>>>> >>>>> /** >>>>> * Scale down the frequency of a specific lcore to the lowest according to the >>>>> * available frequencies. >>>>> * Review each environments specific documentation for usage.. >>>>> */ >>>>> -extern rte_power_freq_change_t rte_power_freq_min; >>>>> +static inline int >>>>> +rte_power_freq_min(unsigned int lcore_id) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> + >>>>> + return ops->freq_min(lcore_id); } >>>>> >>>>> /** >>>>> * Query the Turbo Boost status of a specific lcore. >>>>> * Review each environments specific documentation for usage.. >>>>> */ >>>>> -extern rte_power_freq_change_t rte_power_turbo_status; >>>>> +static inline int >>>>> +rte_power_turbo_status(unsigned int lcore_id) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> + >>>>> + return ops->turbo_status(lcore_id); } >>>>> >>>>> /** >>>>> * Enable Turbo Boost for this lcore. >>>>> * Review each environments specific documentation for usage.. >>>>> */ >>>>> -extern rte_power_freq_change_t rte_power_freq_enable_turbo; >>>>> +static inline int >>>>> +rte_power_freq_enable_turbo(unsigned int lcore_id) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> + >>>>> + return ops->enable_turbo(lcore_id); } >>>>> >>>>> /** >>>>> * Disable Turbo Boost for this lcore. >>>>> * Review each environments specific documentation for usage.. >>>>> */ >>>>> -extern rte_power_freq_change_t rte_power_freq_disable_turbo; >>>>> +static inline int >>>>> +rte_power_freq_disable_turbo(unsigned int lcore_id) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> >>>>> -/** >>>>> - * Power capabilities summary. >>>>> - */ >>>>> -struct rte_power_core_capabilities { >>>>> - union { >>>>> - uint64_t capabilities; >>>>> - struct { >>>>> - uint64_t turbo:1; /**< Turbo can be enabled. */ >>>>> - uint64_t priority:1; /**< SST-BF high freq core */ >>>>> - }; >>>>> - }; >>>>> -}; >>>>> + return ops->disable_turbo(lcore_id); } >>>>> >>>>> /** >>>>> * Returns power capabilities for a specific lcore. >>>>> @@ -235,10 +278,14 @@ struct rte_power_core_capabilities { >>>>> * - 0 on success. >>>>> * - Negative on error. >>>>> */ >>>>> -typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id, >>>>> - struct rte_power_core_capabilities *caps); >>>>> +static inline int >>>>> +rte_power_get_capabilities(unsigned int lcore_id, >>>>> + struct rte_power_core_capabilities *caps) { >>>>> + struct rte_power_core_ops *ops = rte_power_get_core_ops(); >>>>> >>>>> -extern rte_power_get_capabilities_t rte_power_get_capabilities; >>>>> + return ops->get_caps(lcore_id, caps); } >>>>> >>>>> #ifdef __cplusplus >>>>> } >>>>> diff --git a/lib/power/rte_power_core_ops.h >>>>> b/lib/power/rte_power_core_ops.h new file mode 100644 index >>>>> 0000000000..356a64df79 >>>>> --- /dev/null >>>>> +++ b/lib/power/rte_power_core_ops.h >>>>> @@ -0,0 +1,208 @@ >>>>> +/* SPDX-License-Identifier: BSD-3-Clause >>>>> + * Copyright(c) 2010-2014 Intel Corporation >>>>> + * Copyright(c) 2024 Advanced Micro Devices, Inc. >>>>> + */ >>>>> + >>>>> +#ifndef _RTE_POWER_CORE_OPS_H >>>>> +#define _RTE_POWER_CORE_OPS_H >>>>> + >>>> suggest rename the file as rte_power_cpufreq_api.h. >>>> If so, the role of this file is more clearly. >>>>> +__rte_internal >>>>> +int rte_power_register_ops(struct rte_power_core_ops *ops); >>>>> + >>>>> +/** >>>>> + * Macro to statically register the ops of a cpufreq driver. >>>>> + */ >>>>> +#define RTE_POWER_REGISTER_OPS(ops) \ >>>>> + RTE_INIT(power_hdlr_init_##ops) \ >>>>> + { \ >>>>> + rte_power_register_ops(&ops); \ >>>>> + } >>>>> + >>>>> +/** >>>>> + * @internal Get the power ops struct from its index. >>>>> + * >>>>> + * @return >>>>> + * The pointer to the ops struct in the table if registered. >>>>> + */ >>>>> +struct rte_power_core_ops * >>>>> +rte_power_get_core_ops(void); >>>>> + >>>>> +#ifdef __cplusplus >>>>> +} >>>>> +#endif >>>>> + >>>>> +#endif >>>>> diff --git a/lib/power/version.map b/lib/power/version.map index >>>>> c9a226614e..bd64e0828f 100644 >>>>> --- a/lib/power/version.map >>>>> +++ b/lib/power/version.map >>>>> @@ -51,4 +51,18 @@ EXPERIMENTAL { >>>>> rte_power_set_uncore_env; >>>>> rte_power_uncore_freqs; >>>>> rte_power_unset_uncore_env; >>>>> + # added in 24.07 >>>> 24.07-->24.11? >>>>> + rte_power_logtype; >>>>> +}; >>>>> + >>>>> +INTERNAL { >>>>> + global: >>>>> + >>>>> + rte_power_register_ops; >>>>> + cpufreq_check_scaling_driver; >>>>> + power_set_governor; >>>>> + open_core_sysfs_file; >>>>> + read_core_sysfs_u32; >>>>> + read_core_sysfs_s; >>>>> + write_core_sysfs_s; >>>>> };