From: "Hunt, David"
To: Liang Ma
Cc: dev@dpdk.org, lei.a.yao@intel.com, ktraynor@redhat.com, marko.kovacevic@intel.com
Date: Tue, 2 Oct 2018 15:22:47 +0100
Subject: Re: [dpdk-dev] [PATCH v10 1/4] lib/librte_power: traffic pattern aware power control
In-Reply-To: <1538488107-7181-1-git-send-email-liang.j.ma@intel.com>
References: <1538146714-30973-1-git-send-email-liang.j.ma@intel.com> <1538488107-7181-1-git-send-email-liang.j.ma@intel.com>
List-Id: DPDK patches and discussions

On 2/10/2018 2:48 PM, Liang Ma wrote:
> 1. Abstract
>
> For packet processing workloads such as DPDK, polling is continuous.
> This means CPU cores always show 100% busy, independent of how much work
> those cores are doing. It is critical to accurately determine how busy
> a core is, for the following reasons:
>
> * No indication of overload conditions.
>
> * User does not know how much real load is on a system, resulting
>   in wasted energy as no power management is utilized.
>
> Compared to the original l3fwd-power design, instead of going to sleep
> after detecting an empty poll, the new mechanism just lowers the core
> frequency. As a result, the application does not stop polling the device,
> which leads to improved handling of bursts of traffic.
>
> When the system becomes busy, the empty poll mechanism can also increase
> the core frequency (including turbo) to do best effort for intensive
> traffic. This gives us more flexible and balanced traffic awareness over
> the standard l3fwd-power application.
>
> 2. Proposed solution
>
> The proposed solution focuses on how many times empty polls are executed.
> A low number of empty polls means the core is busy processing its
> workload, so a higher frequency is needed. A high number of empty polls
> indicates the core is not doing any real work, so we can lower the
> frequency to save power.
>
> In the current implementation, each core has 1 empty-poll counter, which
> assumes 1 core is dedicated to 1 queue. This will need to be expanded in
> the future to support multiple queues per core.
>
> 2.1 Power state definition:
>
> LOW: Not currently used, reserved for future use.
>
> MED: the frequency used to process a modest traffic workload.
>
> HIGH: the frequency used to process a busy traffic workload.
>
> 2.2 There are two phases to establish the power management system:
>
> a. Initialization/Training phase. The training phase is necessary
>    in order to figure out the system polling baseline numbers from
>    idle to busy. The highest poll count will be during idle, where
>    all polls are empty. These poll counts will be different between
>    systems due to the many possible processor micro-arch, cache
>    and device configurations, hence the training phase.
>    In the training phase, traffic is blocked so the training
>    algorithm can average the empty-poll numbers for the LOW, MED and
>    HIGH power states in order to create a baseline.
>    Each core's counters are collected every 10ms, and the training
>    phase takes 2 seconds.
>    Training is disabled in the default configuration, and the default
>    parameters are applied. The sample app can still trigger training
>    if needed. Once the training phase has been executed once on
>    a system, the application can then be started with the relevant
>    thresholds provided on the command line, allowing the application
>    to start passing traffic immediately.
>
> b. Normal phase. Traffic starts immediately based on the default
>    thresholds, or based on the user supplied thresholds via the
>    command line parameters. The run-time poll counts are compared with
>    the baseline and the decision will be taken to move to MED power
>    state or HIGH power state. The counters are calculated every 10ms.
>
> 3. Proposed API
>
> 1. rte_power_empty_poll_stat_init(struct ep_params **eptr,
>    uint8_t *freq_tlb, struct ep_policy *policy);
>    which is used to initialize the power management system.
>
> 2. rte_power_empty_poll_stat_free(void);
>    which is used to free the resources held by the power management
>    system.
>
> 3. rte_power_empty_poll_stat_update(unsigned int lcore_id);
>    which is used to update a specific core's empty poll counter;
>    not thread safe.
>
> 4. rte_power_poll_stat_update(unsigned int lcore_id, uint8_t nb_pkt);
>    which is used to update a specific core's valid poll counter;
>    not thread safe.
>
> 5. rte_power_empty_poll_stat_fetch(unsigned int lcore_id);
>    which is used to get a specific core's empty poll counter.
>
> 6. rte_power_poll_stat_fetch(unsigned int lcore_id);
>    which is used to get a specific core's valid poll counter.
>
> 7. rte_empty_poll_detection(struct rte_timer *tim, void *arg);
>    which is used to detect empty poll state changes and then take action.
>
> ChangeLog:
> v2: fix some coding style issues.
> v3: rename the filename, API name.
> v4: no change.
> v5: no change.
> v6: re-work the code layout, update API.
> v7: fix minor typo and lift node num limit.
> v8: disable training as default option.
> v9: minor git log update.
> v10: update due to the code review comments.
>
> Signed-off-by: Liang Ma
>
> Reviewed-by: Lei Yao
> ---

Acked-by: David Hunt
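
For readers following the thread, the normal-phase decision described in section 2.2 (compare the per-interval empty-poll count against the baseline, then move to MED or HIGH) can be sketched roughly as below. This is only an illustration and is not code from the patch: the type and function names, the two-threshold layout, and the choice to keep the current state between the thresholds are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; none of these come from the patch. */
enum sketch_freq_state { SKETCH_MED, SKETCH_HIGH };

/* Assumed threshold layout, e.g. derived from a training run. */
struct sketch_thresholds {
	uint64_t to_high_below; /* fewer empty polls than this: core is busy */
	uint64_t to_med_above;  /* more empty polls than this: core is mostly idle */
};

/*
 * Pick the next power state from the number of empty polls counted
 * in one sampling interval (10ms in the commit message).
 */
static enum sketch_freq_state
sketch_next_state(enum sketch_freq_state cur, uint64_t empty_polls,
		  const struct sketch_thresholds *t)
{
	assert(t != NULL);

	if (empty_polls < t->to_high_below)
		return SKETCH_HIGH; /* few empty polls: raise the frequency */
	if (empty_polls > t->to_med_above)
		return SKETCH_MED;  /* many empty polls: lower the frequency */
	return cur;                 /* in between: keep the current state */
}
```

With assumed thresholds of 1000 and 5000, an interval with 200 empty polls would select HIGH, one with 9000 would select MED, and one with 3000 would leave the state unchanged. Keeping a gap between the two thresholds is a design choice for this sketch, intended to avoid flapping between states on every interval.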