From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mga02.intel.com (mga02.intel.com [134.134.136.20]) by dpdk.org (Postfix) with ESMTP id 7CFED1B4DE for ; Fri, 28 Sep 2018 16:58:50 +0200 (CEST) X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga007.jf.intel.com ([10.7.209.58]) by orsmga101.jf.intel.com with ESMTP/TLS/DHE-RSA-AES256-GCM-SHA384; 28 Sep 2018 07:58:49 -0700 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="5.54,315,1534834800"; d="scan'208";a="76883601" Received: from irvmail001.ir.intel.com ([163.33.26.43]) by orsmga007.jf.intel.com with ESMTP; 28 Sep 2018 07:58:48 -0700 Received: from sivswdev01.ir.intel.com (sivswdev01.ir.intel.com [10.237.217.45]) by irvmail001.ir.intel.com (8.14.3/8.13.6/MailSET/Hub) with ESMTP id w8SEwlrw016661; Fri, 28 Sep 2018 15:58:47 +0100 Received: from sivswdev01.ir.intel.com (localhost [127.0.0.1]) by sivswdev01.ir.intel.com with ESMTP id w8SEwlTs031235; Fri, 28 Sep 2018 15:58:47 +0100 Received: (from lma25@localhost) by sivswdev01.ir.intel.com with LOCAL id w8SEwlV5031231; Fri, 28 Sep 2018 15:58:47 +0100 From: Liang Ma To: david.hunt@intel.com Cc: dev@dpdk.org, lei.a.yao@intel.com, ktraynor@redhat.com, marko.kovacevic@intel.com, Liang Ma Date: Fri, 28 Sep 2018 15:58:33 +0100 Message-Id: <1538146714-30973-3-git-send-email-liang.j.ma@intel.com> X-Mailer: git-send-email 1.7.7.4 In-Reply-To: <1538146714-30973-1-git-send-email-liang.j.ma@intel.com> References: <1537191016-26330-1-git-send-email-liang.j.ma@intel.com> <1538146714-30973-1-git-send-email-liang.j.ma@intel.com> Subject: [dpdk-dev] [PATCH v9 3/4] doc/guides/pro_guide/power-man: update the power API X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 28 Sep 2018 14:58:51 -0000 Update the document for empty poll API. Change Logs: v9: minor changes for syntax. Update document. Signed-off-by: Liang Ma --- doc/guides/prog_guide/power_man.rst | 86 +++++++++++++++++++++++++++++++++++++ 1 file changed, 86 insertions(+) diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst index eba1cc6..68b7e8b 100644 --- a/doc/guides/prog_guide/power_man.rst +++ b/doc/guides/prog_guide/power_man.rst @@ -106,6 +106,92 @@ User Cases The power management mechanism is used to save power when performing L3 forwarding. + +Empty Poll API +-------------- + +Abstract +~~~~~~~~ + +For packet processing workloads such as DPDK polling is continuous. +This means CPU cores always show 100% busy independent of how much work +those cores are doing. It is critical to accurately determine how busy +a core is hugely important for the following reasons: + + * No indication of overload conditions + * User does not know how much real load is on a system, resulting + in wasted energy as no power management is utilized + +Compared to the original l3fwd-power design, instead of going to sleep +after detecting an empty poll, the new mechanism just lowers the core frequency. +As a result, the application does not stop polling the device, which leads +to improved handling of bursts of traffic. + +When the system become busy, the empty poll mechanism can also increase the core +frequency (including turbo) to do best effort for intensive traffic. This gives +us more flexible and balanced traffic awareness over the standard l3fwd-power +application. + + +Proposed Solution +~~~~~~~~~~~~~~~~~ +The proposed solution focuses on how many times empty polls are executed. +The less the number of empty polls, means current core is busy with processing +workload, therefore, the higher frequency is needed. The high empty poll number +indicates the current core not doing any real work therefore, we can lower the +frequency to safe power. + +In the current implementation, each core has 1 empty-poll counter which assume +1 core is dedicated to 1 queue. This will need to be expanded in the future to +support multiple queues per core. + +Power state definition: +^^^^^^^^^^^^^^^^^^^^^^^ + +* LOW: Not currently used, reserved for future use. + +* MED: the frequency is used to process modest traffic workload. + +* HIGH: the frequency is used to process busy traffic workload. + +There are two phases to establish the power management system: +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ +* Training phase. This phase is used to measure the optimal frequency + change thresholds for a given system. The thresholds will differ from + system to system due to differences in processor micro-architecture, + cache and device configurations. + In this phase, the user must ensure that no traffic can enter the + system so that counts can be measured for empty polls at low, medium + and high frequencies. Each frequency is measured for two seconds. + Once the training phase is complete, the threshold numbers are + displayed, and normal mode resumes, and traffic can be allowed into + the system. These threshold number can be used on the command line + when starting the application in normal mode to avoid re-training + every time. + +* Normal phase. Every 10ms the run-time counters are compared + to the supplied threshold values, and the decision will be made + whether to move to a different power state (by adjusting the + frequency). + +API Overview for Empty Poll Power Management +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ +* **State Init**: initialize the power management system. + +* **State Free**: free the resource hold by power management system. + +* **Update Empty Poll Counter**: update the empty poll counter. + +* **Update Valid Poll Counter**: update the valid poll counter. + +* **Set the Fequence Index**: update the power state/frequency mapping. + +* **Detect empty poll state change**: empty poll state change detection algorithm then take action. + +User Cases +---------- +The mechanism can applied to any device which is based on polling. e.g. NIC, FPGA. + References ---------- -- 2.7.5