From: Thomas Monjalon
To: "Carew, Alan"
Cc: dev@dpdk.org
Date: Tue, 14 Oct 2014 17:03:41 +0200
Organization: 6WIND
Subject: Re: [dpdk-dev] [PATCH v4 00/10] VM Power Management
Message-ID: <3349663.LNtcecTXb3@xps13>
In-Reply-To: <0E29434AEE0C3A4180987AB476A6F6306D28093B@IRSMSX109.ger.corp.intel.com>
References: <1412003903-9061-1-git-send-email-alan.carew@intel.com> <3264386.kAdiTFhMft@xps13> <0E29434AEE0C3A4180987AB476A6F6306D28093B@IRSMSX109.ger.corp.intel.com>

2014-10-14 12:37, Carew, Alan:
> > > The following patches add two DPDK sample applications and an alternate
> > > implementation of librte_power for use in virtualized environments.
> > > The idea is to provide librte_power functionality from within a VM to address
> > > the lack of MSRs to facilitate frequency changes from within a VM.
> > > It is ideally suited for Haswell which provides per core frequency scaling.
> > >
> > > The current librte_power affects frequency changes via the acpi-cpufreq
> > > 'userspace' power governor, accessed via sysfs.
> >
> > Something was preventing me from looking deeper in this big codebase,
> > but I didn't know what sounds weird.
> > Now I realize: the real problem is that virtualization transparency is
> > broken for power management. So the right thing to do is to fix it in
> > KVM. I think all this patchset is a huge workaround.
> >
> > Did you try to fix it with Qemu/KVM?
>
> When looking at the libvirt API it would seem to be a natural fit to have
> power management sitting there, so in essence I would agree.
>
> However with a DPDK solution it would be possible to re-use the message bus
> to pass information like device stats, application state, D-state requests
> etc. to the host and allow for a management layer (e.g. OpenStack) to make
> informed decisions.

I think that management information should be transmitted in a management
channel. Such a solution should exist in OpenStack.

> Also, the scope of adding power management to qemu/KVM would be huge;
> while the easier path is not always the best and the problem of power
> management in VMs is both a DPDK problem (given that librte_power only
> worked on the host) and a general virtualization problem that would be
> better solved by those with direct knowledge of Qemu/KVM architecture
> and influence on the direction of the Qemu project.

Being a huge effort is not an argument.
Please check with the Qemu community, they'll welcome it.

> As it stands, the host backend is simply an example application that can
> be replaced by a VMM or Orchestration layer, by using Virtio-Serial it has
> obvious leanings to Qemu, but even this could be easily swapped out for
> XenBus, IVSHMEM, IP etc.
>
> If power management is to be eventually supported by Hypervisors directly
> then we could also enable the option to switch to that environment, currently
> the librte_power implementations (VM or Host) can be selected dynamically
> (environment auto-detection) or explicitly via rte_power_set_env(), adding
> an arbitrary number of environments is relatively easy.

Yes, you are adding a new layer to work around what the hypervisor lacks.
And this layer will handle native support when it exists.
But if you implement native support now, we don't need this extra layer.

> I hope this helps to clarify the approach.

Thanks for your explanation.
-- 
Thomas