From mboxrd@z Thu Jan  1 00:00:00 1970
From: Anatoly Burakov
To: dev@dpdk.org
Cc: david.hunt@intel.com, liang.j.ma@intel.com
Date: Wed, 27 May 2020 18:02:00 +0100
X-Mailer: git-send-email 2.17.1
Subject: [dpdk-dev] [RFC 0/6] Power-optimized RX for Ethernet devices
List-Id: DPDK patches and discussions

This patchset proposes a simple API for Ethernet drivers to cause the
CPU to enter a power-optimized state while waiting for packets to
arrive, along with a set of (hopefully generic) intrinsics that
facilitate that.

This is achieved through cooperation with the NIC driver, which lets us
know the address of the next NIC RX ring packet descriptor and wait for
writes to it. On IA, this is done using the UMONITOR/UMWAIT
instructions. They are used in their raw opcode form because there is no
widespread compiler support for them yet. Still, the API is made generic
enough to hopefully support other architectures, if they happen to
implement similar instructions.

To achieve power savings, a very simple mechanism is used: we count
empty polls, and if a certain threshold is reached, we get the address
of the next RX ring descriptor from the NIC driver, arm the monitoring
hardware, and enter a power-optimized state. We then wake up when either
a timeout expires or a write happens (or generally whenever the CPU
feels like waking up; this is platform-specific), and proceed as normal.
The empty poll counter is reset whenever we actually get packets, so we
only go to sleep when we know nothing is going on.

Why are we putting it into ethdev as opposed to leaving this up to the
application? Our customers specifically requested a way to do it with
minimal changes to the application code. The current approach allows
them to just flip a switch and automagically have power savings.

There are certain limitations in this patchset right now:
- Currently, only 1:1 core to queue mapping is supported, meaning that
  each lcore must handle RX on at most a single queue
- Currently, power management is enabled per-port, not per-queue
- There is potential to greatly increase TX latency if we are buffering
  things, and go to sleep before sending packets
- The API is not perfect and could use some improvement and discussion
- The API doesn't extend to other device types
- The intrinsics are platform-specific, so ethdev has some
  platform-specific code in it
- Support was only implemented for devices using net/ixgbe, net/i40e and
  net/ice drivers

Hopefully this will generate enough feedback to clear a path forward!

Anatoly Burakov (6):
  eal: add power management intrinsics
  ethdev: add simple power management API
  net/ixgbe: implement power management API
  net/i40e: implement power management API
  net/ice: implement power management API
  app/testpmd: add command for power management on a port

 app/test-pmd/cmdline.c                        |  48 +++++++
 drivers/net/i40e/i40e_ethdev.c                |   1 +
 drivers/net/i40e/i40e_rxtx.c                  |  23 +++
 drivers/net/i40e/i40e_rxtx.h                  |   2 +
 drivers/net/ice/ice_ethdev.c                  |   1 +
 drivers/net/ice/ice_rxtx.c                    |  23 +++
 drivers/net/ice/ice_rxtx.h                    |   2 +
 drivers/net/ixgbe/ixgbe_ethdev.c              |   1 +
 drivers/net/ixgbe/ixgbe_rxtx.c                |  22 +++
 drivers/net/ixgbe/ixgbe_rxtx.h                |   2 +
 .../include/generic/rte_power_intrinsics.h    |  64 +++++++++
 lib/librte_eal/include/meson.build            |   1 +
 lib/librte_eal/x86/include/meson.build        |   1 +
 lib/librte_eal/x86/include/rte_cpuflags.h     |   1 +
 .../x86/include/rte_power_intrinsics.h        | 134 ++++++++++++++++++
 lib/librte_eal/x86/rte_cpuflags.c             |   2 +
 lib/librte_ethdev/rte_ethdev.c                |  39 +++++
 lib/librte_ethdev/rte_ethdev.h                |  70 +++++++++
 lib/librte_ethdev/rte_ethdev_core.h           |  41 +++++-
 lib/librte_ethdev/rte_ethdev_version.map      |   4 +
 20 files changed, 480 insertions(+), 2 deletions(-)
 create mode 100644 lib/librte_eal/include/generic/rte_power_intrinsics.h
 create mode 100644 lib/librte_eal/x86/include/rte_power_intrinsics.h

-- 
2.17.1