From: Pablo de Lara
To: dev@dpdk.org
Date: Tue, 9 Dec 2014 12:02:08 +0000
Message-Id: <1418126528-22287-4-git-send-email-pablo.de.lara.guarch@intel.com>
In-Reply-To: <1418126528-22287-1-git-send-email-pablo.de.lara.guarch@intel.com>
References: <1417193202-23972-1-git-send-email-pablo.de.lara.guarch@intel.com> <1418126528-22287-1-git-send-email-pablo.de.lara.guarch@intel.com>
Subject: [dpdk-dev] [PATCH v3 3/3] doc: add VM power mgmt app

Added new section in sample app UG for the new VM power management app.
Signed-off-by: Alan Carew
Signed-off-by: Pablo de Lara
---
 doc/guides/rel_notes/rel_description.rst         |    2 +
 doc/guides/sample_app_ug/index.rst               |    5 +
 doc/guides/sample_app_ug/vm_power_management.rst |  361 ++++++++++++++++++++++
 3 files changed, 368 insertions(+), 0 deletions(-)
 create mode 100644 doc/guides/sample_app_ug/vm_power_management.rst

diff --git a/doc/guides/rel_notes/rel_description.rst b/doc/guides/rel_notes/rel_description.rst
index 07c897b..d159b3c 100644
--- a/doc/guides/rel_notes/rel_description.rst
+++ b/doc/guides/rel_notes/rel_description.rst
@@ -149,6 +149,8 @@ The following is a list of Intel® DPDK documents in the suggested reading order

     * Kernel NIC Interface (KNI)

+    * VM Power Management
+
 In addition, there are some other applications that are built when the libraries are created.
 The source for these applications is in the DPDK/app directory and are called:

diff --git a/doc/guides/sample_app_ug/index.rst b/doc/guides/sample_app_ug/index.rst
index db88b0d..c3b50e2 100644
--- a/doc/guides/sample_app_ug/index.rst
+++ b/doc/guides/sample_app_ug/index.rst
@@ -101,6 +101,7 @@ Copyright © 2012 - 2014, Intel Corporation. All rights reserved.
     internet_proto_ip_pipeline
     test_pipeline
     dist_app
+    vm_power_management

 **Figures**

@@ -152,6 +153,10 @@ Copyright © 2012 - 2014, Intel Corporation. All rights reserved.

 :ref:`Figure 23.Distributor Sample Application Layout `

+:ref:`Figure 24.High level Solution <figure_24>`
+
+:ref:`Figure 25.VM request to scale frequency <figure_25>`
+
 **Tables**

 :ref:`Table 1.Output Traffic Marking `
diff --git a/doc/guides/sample_app_ug/vm_power_management.rst b/doc/guides/sample_app_ug/vm_power_management.rst
new file mode 100644
index 0000000..f5b5200
--- /dev/null
+++ b/doc/guides/sample_app_ug/vm_power_management.rst
@@ -0,0 +1,361 @@
+..  BSD LICENSE
+    Copyright(c) 2010-2014 Intel Corporation. All rights reserved.
+    All rights reserved.
+
+    Redistribution and use in source and binary forms, with or without
+    modification, are permitted provided that the following conditions
+    are met:
+
+    * Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in
+      the documentation and/or other materials provided with the
+      distribution.
+    * Neither the name of Intel Corporation nor the names of its
+      contributors may be used to endorse or promote products derived
+      from this software without specific prior written permission.
+
+    THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+    "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+    LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+    A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+    OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+    SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+    LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+    DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+    THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+    (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+    OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+VM Power Management Application
+===============================
+
+Introduction
+------------
+
+Applications running in Virtual Environments have an abstract view of
+the underlying hardware on the Host; in particular, applications cannot see
+the binding of virtual to physical hardware.
+When looking at CPU resourcing, the pinning of Virtual CPUs (vCPUs) to
+Host Physical CPUs (pCPUs) is not apparent to an application
+and this pinning may change over time.
+Furthermore, Operating Systems on virtual machines do not have the ability
+to govern their own power policy; the Model Specific Registers (MSRs)
+for enabling P-State transitions are not exposed to Operating Systems
+running on Virtual Machines (VMs).
+
+The Virtual Machine Power Management solution shows an example of
+how a DPDK application can indicate its processing requirements, using only
+VM-local information (vCPU/lcore), to a Host-based Monitor. The Monitor is responsible
+for accepting requests for frequency changes for a vCPU, translating the vCPU
+to a pCPU via libvirt and effecting the change in frequency.
+
+The solution comprises two high-level components:
+
+#. Example Host Application
+
+   A Command Line Interface (CLI) for managing the VM->Host communication channels
+   allows adding channels to the Monitor, setting and querying the vCPU to pCPU pinning,
+   and inspecting and manually changing the frequency for each CPU.
+   The CLI runs on a single lcore while the thread responsible for managing
+   VM requests runs on a second lcore.
+
+   VM requests arriving on a channel for frequency changes are passed
+   to the librte_power ACPI cpufreq sysfs based library.
+   The Host Application relies on both qemu-kvm and libvirt to function.
+
+#. librte_power for Virtual Machines
+
+   Using an alternate implementation for the librte_power API, requests for
+   frequency changes are forwarded to the host monitor rather than
+   the ACPI cpufreq sysfs interface used on the host.
+
+   The l3fwd-power application will use this implementation when deployed on a VM
+   (see Chapter 11 "L3 Forwarding with Power Management Application").
+
+.. _figure_24:
+
+**Figure 24. High level Solution**
+
+|vm_power_mgr_highlevel|
+
+Overview
+--------
+
+VM Power Management employs qemu-kvm to provide communications channels
+between the host and VMs in the form of Virtio-Serial, which appears as
+a paravirtualized serial device on a VM and can be configured to use
+various backends on the host. For this example, each Virtio-Serial endpoint
+on the host is configured as an AF_UNIX file socket, supporting poll/select
+and epoll for event notification.
+In this example, each channel endpoint on the host is monitored via
+epoll for EPOLLIN events.
+Each channel is specified as qemu-kvm arguments or as libvirt XML for each VM,
+where each VM can have a number of channels up to a maximum of 64 per VM.
+In this example, each DPDK lcore on a VM has exclusive access to a channel.
+
+To enable frequency changes from within a VM, a request via the librte_power interface
+is forwarded via Virtio-Serial to the host. Each request contains the vCPU
+and power command (scale up/down/min/max).
+The API for host and guest librte_power is consistent across environments,
+with the selection of VM or Host implementation determined automatically
+at runtime based on the environment.
+
+Upon receiving a request, the host translates the vCPU to a pCPU via
+the libvirt API before forwarding to the host librte_power.
+
+.. _figure_25:
+
+**Figure 25. VM request to scale frequency**
+
+|vm_power_mgr_vm_request_seq|
+
+Performance Considerations
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While the Haswell microarchitecture allows for independent power control for each core,
+earlier microarchitectures do not offer such fine-grained control.
+When deployed on pre-Haswell platforms, greater care must be taken in selecting
+which cores are assigned to a VM; for instance, a core will not scale down
+until its sibling is similarly scaled.
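The request flow described in the Overview can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the sample application's code: the 8-byte request layout (`REQUEST`), the command codes (`COMMANDS`), the helper names and the fixed pinning table are all hypothetical, and a socket pair stands in for the Virtio-Serial channel and its AF_UNIX endpoint on the host. In the actual solution the vCPU to pCPU translation comes from libvirt and the frequency change is applied through librte_power.

```python
import select
import socket
import struct

# Hypothetical wire format for illustration: two unsigned 32-bit integers,
# the vCPU number followed by a command code. The sample application's real
# packet layout is not described in this guide.
REQUEST = struct.Struct("=II")

# Illustrative command codes; the names mirror the scale up/down/min/max
# commands, the numeric values are assumptions.
COMMANDS = {1: "up", 2: "down", 3: "min", 4: "max"}


def mask_to_cores(mask):
    """Expand a pCPU bit mask into a sorted list of core numbers."""
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]


def handle_request(payload, vcpu_pinning):
    """Translate one guest request into (pCPU list, command name)."""
    vcpu, code = REQUEST.unpack(payload)
    return mask_to_cores(vcpu_pinning[vcpu]), COMMANDS[code]


def poll_channels(epoll, channels, vcpu_pinning, timeout=1.0):
    """Wait for EPOLLIN on the host endpoints and decode pending requests."""
    decoded = []
    for fd, events in epoll.poll(timeout):
        if events & select.EPOLLIN:
            payload = channels[fd].recv(REQUEST.size)
            if len(payload) == REQUEST.size:
                decoded.append(handle_request(payload, vcpu_pinning))
    return decoded


def demo():
    # A socket pair stands in for the Virtio-Serial channel: one end plays
    # the host AF_UNIX endpoint, the other the guest serial device.
    host_end, guest_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    epoll = select.epoll()
    epoll.register(host_end.fileno(), select.EPOLLIN)
    try:
        # In the real solution the pinning is queried from libvirt; here it
        # is a fixed table: vCPU 0 -> pCPU 0, vCPU 1 -> pCPU 2.
        pinning = {0: 0b0001, 1: 0b0100}
        guest_end.sendall(REQUEST.pack(1, 1))  # vCPU 1 asks to scale up
        return poll_channels(epoll, {host_end.fileno(): host_end}, pinning)
    finally:
        epoll.unregister(host_end.fileno())
        epoll.close()
        host_end.close()
        guest_end.close()


if __name__ == "__main__":
    print(demo())
```

Calling `demo()` on a Linux host, where `select.epoll` is available, returns `[([2], 'up')]`: the request for vCPU 1 is resolved to pCPU 2 together with the scale-up command. A real monitor would loop on `poll_channels()` and hand each result to the host power library.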
+
+Configuration
+-------------
+
+BIOS
+~~~~
+
+Enhanced Intel SpeedStep® Technology must be enabled in the platform BIOS
+if the power management feature of DPDK is to be used.
+Otherwise, the sysfs directory /sys/devices/system/cpu/cpu0/cpufreq will not exist,
+and CPU frequency-based power management cannot be used.
+Consult the relevant BIOS documentation to determine how these settings
+can be accessed.
+
+Host Operating System
+~~~~~~~~~~~~~~~~~~~~~
+
+The Host OS must also have the *acpi_cpufreq* module installed. In some cases
+the *intel_pstate* driver may be the default Power Management environment.
+To enable *acpi_cpufreq* and disable *intel_pstate*, add the following
+to the grub Linux command line:
+
+.. code-block:: console
+
+    intel_pstate=disable
+
+Upon rebooting, load the *acpi_cpufreq* module:
+
+.. code-block:: console
+
+    modprobe acpi_cpufreq
+
+Hypervisor Channel Configuration
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Virtio-Serial channels are configured via libvirt XML:
+
+
+.. code-block:: xml
+
+    <name>{vm_name}</name>
+    <controller type='virtio-serial' index='0'>
+      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
+    </controller>
+    <channel type='unix'>
+      <source mode='bind' path='/tmp/powermonitor/{vm_name}.{channel_num}'/>
+      <target type='virtio' name='virtio.serial.port.poweragent.{vm_channel_num}'/>
+      <address type='virtio-serial' controller='0' bus='0' port='{N}'/>
+    </channel>
+
+Here, a single controller of type *virtio-serial* is created. Up to 32 channels
+can be associated with a single controller, and multiple controllers can be specified.
+The convention is to use the name of the VM in the host path *{vm_name}* and
+to increment *{channel_num}* for each channel; likewise, the port value *{N}*
+must be incremented for each channel.
+
+Each channel on the host will appear in *path*. The directory */tmp/powermonitor/*
+must first be created and given qemu permissions:
+
+.. code-block:: console
+
+    mkdir /tmp/powermonitor/
+    chown qemu:qemu /tmp/powermonitor
+
+Note that files and directories within /tmp are generally removed upon
+rebooting the host, so the above steps may need to be carried out after each reboot.
+
+The serial device as it appears on a VM is configured with the *target* element attribute *name*
+and must be in the form of *virtio.serial.port.poweragent.{vm_channel_num}*,
+where *vm_channel_num* is typically the lcore channel to be used in DPDK VM applications.
+
+Each channel on a VM will be present at */dev/virtio-ports/virtio.serial.port.poweragent.{vm_channel_num}*
+
+Compiling and Running the Host Application
+------------------------------------------
+
+Compiling
+~~~~~~~~~
+
+#. export RTE_SDK=/path/to/rte_sdk
+#. cd ${RTE_SDK}/examples/vm_power_manager
+#. make
+
+Running
+~~~~~~~
+
+The application does not have any specific command line options other than *EAL*:
+
+.. code-block:: console
+
+    ./build/vm_power_mgr [EAL options]
+
+The application requires exactly two cores to run: one core is dedicated to the CLI,
+while the other is dedicated to the channel endpoint monitor. For example, to run
+on cores 0 & 1 on a system with 4 memory channels:
+
+.. code-block:: console
+
+    ./build/vm_power_mgr -c 0x3 -n 4
+
+After successful initialisation the user is presented with the VM Power Manager CLI:
+
+.. code-block:: console
+
+    vm_power>
+
+Virtual Machines can now be added to the VM Power Manager:
+
+.. code-block:: console
+
+    vm_power> add_vm {vm_name}
+
+When a {vm_name} is specified with the *add_vm* command, a lookup is performed
+with libvirt to ensure that the VM exists. {vm_name} is used as a unique identifier
+to associate channels with a particular VM and for executing operations on a VM within the CLI.
+VMs do not have to be running in order to be added.
+
+A number of commands can be issued via the CLI in relation to VMs:
+
+  Remove a Virtual Machine identified by {vm_name} from the VM Power Manager.
+
+  .. code-block:: console
+
+    rm_vm {vm_name}
+
+  Add communication channels for the specified VM. The virtio channels must be enabled
+  in the VM configuration (qemu/libvirt) and the associated VM must be active.
+  {list} is a comma-separated list of channel numbers to add; using the keyword 'all'
+  will attempt to add all channels for the VM:
+
+  .. code-block:: console
+
+    add_channels {vm_name} {list}|all
+
+  Enable or disable the communication channels in {list} (comma-separated)
+  for the specified VM; alternatively, {list} can be replaced with the keyword 'all'.
+  Disabled channels will still receive packets on the host; however, the commands
+  they specify will be ignored. Set status to 'enabled' to begin processing requests again:
+
+  .. code-block:: console
+
+    set_channel_status {vm_name} {list}|all enabled|disabled
+
+  Print to the CLI the information on the specified VM. The information
+  lists the number of vCPUs, the pinning to pCPU(s) as a bit mask, and
+  any communication channels associated with the VM, along with the status of each channel:
+
+  .. code-block:: console
+
+    show_vm {vm_name}
+
+  Set the binding of Virtual CPU on VM with name {vm_name} to the Physical CPU mask:
+
+  .. code-block:: console
+
+    set_pcpu_mask {vm_name} {vcpu} {pcpu}
+
+  Set the binding of Virtual CPU on VM to the Physical CPU:
+
+  .. code-block:: console
+
+    set_pcpu {vm_name} {vcpu} {pcpu}
+
+Manual control and inspection can also be carried out in relation to CPU frequency scaling:
+
+  Get the current frequency for each core specified in the mask:
+
+  .. code-block:: console
+
+    show_cpu_freq_mask {mask}
+
+  Set the current frequency for the cores specified in {core_mask} by scaling each up/down/min/max:
+
+  .. code-block:: console
+
+    set_cpu_freq_mask {core_mask} up|down|min|max
+
+  Get the current frequency for the specified core:
+
+  .. code-block:: console
+
+    show_cpu_freq {core_num}
+
+  Set the current frequency for the specified core by scaling up/down/min/max:
+
+  .. code-block:: console
+
+    set_cpu_freq {core_num} up|down|min|max
+
+Compiling and Running the Guest Applications
+--------------------------------------------
+
+For compiling and running l3fwd-power, see Chapter 11 "L3 Forwarding with Power Management Application".
+
+A guest CLI is also provided for validating the setup.
+
+For both l3fwd-power and the guest CLI, the channels for the VM must be monitored by the
+host application using the *add_channels* command on the host.
+
+Compiling
+~~~~~~~~~
+
+#. export RTE_SDK=/path/to/rte_sdk
+#. cd ${RTE_SDK}/examples/vm_power_manager/guest_cli
+#. make
+
+Running
+~~~~~~~
+
+The application does not have any specific command line options other than *EAL*:
+
+.. code-block:: console
+
+    ./build/guest_vm_power_mgr [EAL options]
+
+For example purposes, the application uses a channel for each enabled lcore.
+For example, to run on cores 0,1,2,3 on a system with 4 memory channels:
+
+.. code-block:: console
+
+    ./build/guest_vm_power_mgr -c 0xf -n 4
+
+
+After successful initialisation the user is presented with the VM Power Manager Guest CLI:
+
+.. code-block:: console
+
+    vm_power(guest)>
+
+To change the frequency of an lcore, use the set_cpu_freq command,
+where {core_num} is the lcore and channel whose frequency is changed by scaling up/down/min/max:
+
+.. code-block:: console
+
+    set_cpu_freq {core_num} up|down|min|max
+
+.. |vm_power_mgr_highlevel| image:: img/vm_power_mgr_highlevel.svg
+
+.. |vm_power_mgr_vm_request_seq| image:: img/vm_power_mgr_vm_request_seq.svg
--
1.7.4.1