From: "Zhang, Helin"
To: Matthew Hall, "dev@dpdk.org"
Date: Tue, 12 Jan 2016 15:17:21 +0000
Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
References: <20151206000839.GA23450@mhcomputing.net> <5688D2EE.5010700@mhcomputing.net>
In-Reply-To: <5688D2EE.5010700@mhcomputing.net>
List-Id: patches and discussions about DPDK

Hi Matthew,

Yes, you have pointed out the key issue: the power management module has
changed or been upgraded.
Could you try the legacy one to see if it still works, as indicated in
your link?
Taking control of the governor from the kernel into user space might need
one more check before that.
But it is actually not a big issue, as the user can switch it back to
anything via 'echo'.

Yes, it seems that librte_power has been out of date for a while. It is
not easy to track all the kernel versions.
Now we have a good chance to do that, as you have reported issues. Let's
have a look at the new power management mechanism and then see if we can
do something.

Thank you very much for your questions!

Regards,
Helin

> -----Original Message-----
> From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Matthew Hall
> Sent: Sunday, January 3, 2016 3:51 PM
> To: dev@dpdk.org
> Subject: Re: [dpdk-dev] librte_power w/ intel_pstate cpufreq governor
>
> Hello,
>
> In about one month, I never received any response about all these major
> issues I was finding with librte_power and the intel_pstate based CPU
> clockrate control driver used in all the new Linux kernels.
>
> From what I can tell, none of this librte_power code ever worked right
> in the first place on Sandy Bridge and newer, because the chip secretly
> ignores clockrate adjustments from outside.
>
> Can anyone who is more expert about Intel Power Management please help
> me check this and point me to some documentation which explains how this
> is supposed to work?
>
> I am kind of blocked on doing performance / production quality
> improvements on my code, without some kind of basic help understanding
> how this librte_power stuff should work.
>
> Thanks,
> Matthew.
>
> On 12/5/15 4:08 PM, Matthew Hall wrote:
> > Hello all,
> >
> > I wanted to ask some questions about librte_power and the great
> > adaptive polling / IRQ mode example in l3fwd-power.
> >
> > I am very interested in getting this to work in my project because it
> > will make it much friendlier to attract new community developers if I
> > am as cooperative as possible with system resources.
> >
> > Let's discuss the init process for a moment. It has some problems on
> > my system, and I need some help to figure out how to handle this right.
> >
> > 1. Begins with the call to rte_power_init.
> >
> > 2. Attempts to init ACPI cpufreq mode.
> >
> > 2.1. Sets lcore cpufreq governor to userspace mode.
> >
> > 2.2. Function power_get_available_freqs checks lcore CPU frequencies
> > from:
> >
> > /sys/devices/system/cpu/cpuX/cpufreq/scaling_available_frequencies
> >
> > 2.3. This fails with (cryptic) error "POWER: ERR: File not openned". I
> > am planning to write a patch for this error a bit later.
> >
> > My kernel is using the intel_pstate driver, so
> > scaling_available_frequencies does not exist:
> >
> > http://askubuntu.com/questions/544266/why-are-missing-the-frequency-options-on-cpufreq-utils-indicator
> >
> > 3. When power_get_available_freqs fails, rte_power_acpi_cpufreq_init
> > fails.
> >
> > 4. rte_power_init will try rte_power_kvm_vm_init. That will fail
> > because it's a physical Skylake system, not some kind of VM.
> >
> > 5. Now rte_power_init totally fails, with error "POWER: ERR: Unable to
> > set Power Management Environment for lcore 0".
> >
> > So, I have a couple of questions to figure out from here:
> >
> > 1. It seems bad to switch the governor into userspace before verifying
> > the frequencies available in scaling_available_frequencies. If there
> > are no frequencies available, it seems like it should not be trying to
> > take over control of an effectively uncontrollable value.
> >
> > 2. If the governor is switched to userspace, and then no governing is
> > done, it seems like the clockrate will necessarily always be wrong
> > also because nothing will be configuring it anymore, neither kernel,
> > nor failed DPDK userspace code, since rte_power_freq_up / down
> > function pointers will always be NULL. Is this true? This seems bad if
> > so.
> >
> > It seems that the librte_power code is basically out of date, as
> > pstate has been present since Sandy Bridge, which is quite old by now
> > for network processing. I am not sure how to make this work right now.
> > So far I see a couple of options but I really don't know much about
> > this stuff:
> >
> > 1) skip rte_power_init completely, and let intel_pstate handle it
> > using HWP mode
> >
> > 2) disable intel_pstate, switch to the legacy ACPI cpufreq (but people
> > warned this old driver is mostly a no-op and the CPU ignores its
> > frequency requests).
> >
> > The Internet advice says it's possible, but not a very good idea, to
> > switch from the modern intel_pstate driver to the legacy ACPI mode.
> > The kernel docs (below) state that it's better to use HWP (Hardware P
> > State) mode:
> >
> > https://www.kernel.org/doc/Documentation/cpu-freq/intel-pstate.txt
> >
> > If none of this rte_power_init stuff works, are the other CPU
> > conservation measures inside the l3fwd-power example enough to work
> > right with HWP all by themselves with nothing additional?
> >
> > Thanks,
> > Matthew.
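
On question 1 in the quoted message (the governor is switched to userspace
before the frequency list is verified), here is a minimal sketch of the
check-first ordering in C. It is not the librte_power code:
count_available_freqs and may_take_over_governor are hypothetical names,
and passing the cpufreq directory as an argument is an assumption made so
the check can be pointed at any path. It only reads
scaling_available_frequencies and never touches the governor.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper, not part of librte_power: count the entries in
 * <cpufreq_dir>/scaling_available_frequencies. Returns 0 if the file is
 * missing (as reported with intel_pstate) or lists no frequencies. */
int count_available_freqs(const char *cpufreq_dir)
{
    char path[512];
    char buf[1024];
    int count = 0;
    FILE *f;

    snprintf(path, sizeof(path), "%s/scaling_available_frequencies",
             cpufreq_dir);
    f = fopen(path, "r");
    if (f == NULL)
        return 0;

    /* The file holds one line of space-separated frequencies in kHz. */
    if (fgets(buf, sizeof(buf), f) != NULL) {
        char *p = buf;
        char *end;

        for (;;) {
            unsigned long khz = strtoul(p, &end, 10);

            if (end == p)   /* no further number on the line */
                break;
            if (khz > 0)
                count++;
            p = end;
        }
    }
    fclose(f);
    return count;
}

/* Hypothetical policy check: only take the governor away from the kernel
 * when there is at least one frequency that userspace could select. */
int may_take_over_governor(const char *cpufreq_dir)
{
    return count_available_freqs(cpufreq_dir) > 0;
}
```

A caller would pass a directory such as
/sys/devices/system/cpu/cpu0/cpufreq and write "userspace" to
scaling_governor only when may_take_over_governor returns non-zero, so a
failed init leaves the kernel's governor in place.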