From: "Ananyev, Konstantin"
To: "Burakov, Anatoly", "dev@dpdk.org"
CC: "Ma, Liang J", "Hunt, David", Ray Kinsella, Neil Horman, "jerinjacobk@gmail.com", "Richardson, Bruce", "thomas@monjalon.net", "McDaniel, Timothy", "Eads, Gage", "Macnamara, Chris"
Date: Wed, 14 Oct 2020 18:41:07 +0000
References: <532f45c5d79b4c30a919553d322bb66e91534466.1602258833.git.anatoly.burakov@intel.com>
Subject: Re: [dpdk-dev] [PATCH v6 05/10] power: add PMD power management API and callback

> From: Liang Ma
>
> Add a simple on/off switch that will enable saving power when no
> packets are arriving. It is based on counting the number of empty
> polls and, when the number reaches a certain threshold, entering an
> architecture-defined optimized power state that will either wait
> until a TSC timestamp expires, or until packets arrive.
>
> This API mandates a core-to-single-queue mapping (that is, multiple
> queues per device are supported, but they have to be polled on different
> cores).
>
> This design uses PMD RX callbacks.
>
> 1. UMWAIT/UMONITOR:
>
> When a certain threshold of empty polls is reached, the core will go
> into a power optimized sleep while waiting on the address of the next RX
> descriptor to be written to.
>
> 2. Pause instruction
>
> Instead of moving the core into a deeper C state, this method uses the
> pause instruction to avoid busy polling.
>
> 3. Frequency scaling
> Reuse the existing DPDK power library to scale the core frequency up/down
> depending on traffic volume.
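
Side note for readers following the thread: as I read the patch, the intended
usage is roughly the sketch below - pick a mode, enable the callback for each
lcore/Rx-queue pair before entering the polling loop, and disable it on
teardown. Illustrative only; port_id/queue_id are assumed to be configured and
started elsewhere, and error handling is minimal.

	unsigned int lcore_id = rte_lcore_id();

	if (rte_power_pmd_mgmt_queue_enable(lcore_id, port_id, queue_id,
			RTE_POWER_MGMT_TYPE_WAIT) < 0)
		rte_exit(EXIT_FAILURE, "cannot enable PMD power management\n");

	/* the usual rte_eth_rx_burst() polling loop runs unchanged; the RX
	 * callback only kicks in after EMPTYPOLL_MAX consecutive empty polls */

	/* on teardown */
	rte_power_pmd_mgmt_queue_disable(lcore_id, port_id, queue_id);
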
>
> Signed-off-by: Liang Ma
> Signed-off-by: Anatoly Burakov
> ---
>
> Notes:
>     v6:
>     - Added wakeup mechanism for UMWAIT
>     - Removed memory allocation (everything is now allocated statically)
>     - Fixed various typos and comments
>     - Check for invalid queue ID
>     - Moved release notes to this patch
>
>     v5:
>     - Make error checking more robust
>     - Prevent initializing scaling if ACPI or PSTATE env wasn't set
>     - Prevent initializing UMWAIT path if PMD doesn't support get_wake_addr
>     - Add some debug logging
>     - Replace x86-specific code path with a generic path using the intrinsic check
>
>  doc/guides/rel_notes/release_20_11.rst |  11 +
>  lib/librte_power/meson.build           |   5 +-
>  lib/librte_power/rte_power_pmd_mgmt.c  | 300 +++++++++++++++++++++++++
>  lib/librte_power/rte_power_pmd_mgmt.h  |  92 ++++++++
>  lib/librte_power/rte_power_version.map |   4 +
>  5 files changed, 410 insertions(+), 2 deletions(-)
>  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
>  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
>
> diff --git a/doc/guides/rel_notes/release_20_11.rst b/doc/guides/rel_notes/release_20_11.rst
> index ca4f43f7f9..06b822aa36 100644
> --- a/doc/guides/rel_notes/release_20_11.rst
> +++ b/doc/guides/rel_notes/release_20_11.rst
> @@ -197,6 +197,17 @@ New Features
>    * Added new ``RTE_ACL_CLASSIFY_AVX512X32`` vector implementation,
>      which can process up to 32 flows in parallel. Requires AVX512 support.
>
> +* **Add PMD power management mechanism**
> +
> +  Three new Ethernet PMD power management mechanisms are added through the
> +  existing RX callback infrastructure.
> +
> +  * Add power saving scheme based on UMWAIT instruction (x86 only)
> +  * Add power saving scheme based on ``rte_pause()``
> +  * Add power saving scheme based on frequency scaling through the power library
> +  * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_enable()``
> +  * Add new EXPERIMENTAL API ``rte_power_pmd_mgmt_queue_disable()``
> +
>
>  Removed Items
>  -------------
> diff --git a/lib/librte_power/meson.build b/lib/librte_power/meson.build
> index 78c031c943..cc3c7a8646 100644
> --- a/lib/librte_power/meson.build
> +++ b/lib/librte_power/meson.build
> @@ -9,6 +9,7 @@ sources = files('rte_power.c', 'power_acpi_cpufreq.c',
>  		'power_kvm_vm.c', 'guest_channel.c',
>  		'rte_power_empty_poll.c',
>  		'power_pstate_cpufreq.c',
> +		'rte_power_pmd_mgmt.c',
>  		'power_common.c')
> -headers = files('rte_power.h','rte_power_empty_poll.h')
> -deps += ['timer']
> +headers = files('rte_power.h','rte_power_empty_poll.h','rte_power_pmd_mgmt.h')
> +deps += ['timer', 'ethdev']
> diff --git a/lib/librte_power/rte_power_pmd_mgmt.c b/lib/librte_power/rte_power_pmd_mgmt.c
> new file mode 100644
> index 0000000000..2b7d2a1a46
> --- /dev/null
> +++ b/lib/librte_power/rte_power_pmd_mgmt.c
> @@ -0,0 +1,300 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2020 Intel Corporation
> + */
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#include "rte_power_pmd_mgmt.h"
> +
> +#define EMPTYPOLL_MAX 512
> +
> +/**
> + * Possible power management states of an ethdev port.
> + */
> +enum pmd_mgmt_state {
> +	/** Device power management is disabled. */
> +	PMD_MGMT_DISABLED = 0,
> +	/** Device power management is enabled. */
> +	PMD_MGMT_ENABLED,
> +};
> +
> +struct pmd_queue_cfg {
> +	enum pmd_mgmt_state pwr_mgmt_state;
> +	/**< State of power management for this queue */
> +	enum rte_power_pmd_mgmt_type cb_mode;
> +	/**< Callback mode for this queue */
> +	const struct rte_eth_rxtx_callback *cur_cb;
> +	/**< Callback instance */
> +	rte_spinlock_t umwait_lock;
> +	/**< Per-queue status lock - used only for UMWAIT mode */
> +	volatile void *wait_addr;
> +	/**< UMWAIT wakeup address */
> +	uint64_t empty_poll_stats;
> +	/**< Number of empty polls */
> +} __rte_cache_aligned;
> +
> +static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> +
> +/* trigger a write to the cache line we're waiting on */
> +static void umwait_wakeup(volatile void *addr)
> +{
> +	uint64_t val;
> +
> +	val = __atomic_load_n((volatile uint64_t *)addr, __ATOMIC_RELAXED);
> +	__atomic_compare_exchange_n((volatile uint64_t *)addr, &val, val, 0,
> +			__ATOMIC_RELAXED, __ATOMIC_RELAXED);
> +}
> +
> +static uint16_t
> +clb_umwait(uint16_t port_id, uint16_t qidx,
> +		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> +		uint16_t max_pkts __rte_unused, void *addr __rte_unused)
> +{
> +
> +	struct pmd_queue_cfg *q_conf;
> +
> +	q_conf = &port_cfg[port_id][qidx];
> +
> +	if (unlikely(nb_rx == 0)) {
> +		q_conf->empty_poll_stats++;
> +		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
> +			volatile void *target_addr;
> +			uint64_t expected, mask;
> +			uint16_t ret;
> +			uint8_t data_sz;
> +
> +			/*
> +			 * get address of next descriptor in the RX
> +			 * ring for this queue, as well as expected
> +			 * value and a mask.
> +			 */
> +			ret = rte_eth_get_wake_addr(port_id, qidx,
> +					&target_addr, &expected, &mask,
> +					&data_sz);
> +			if (ret == 0) {
> +				/*
> +				 * we need to ensure we can wake up by another
> +				 * thread triggering a write, so we need the
> +				 * address to always be up to date.
> +				 */
> +				rte_spinlock_lock(&q_conf->umwait_lock);

I think you need to check the state here, and _disable() has to set the
state with the lock grabbed. Otherwise this lock wouldn't protect you
from race conditions. As an example:

CP@T0:
	rte_spinlock_lock(&queue_cfg->umwait_lock);
	if (queue_cfg->wait_addr != NULL)	// wait_addr == NULL, fall through
	rte_spinlock_unlock(&queue_cfg->umwait_lock);

DP@T1:
	rte_spinlock_lock(&queue_cfg->umwait_lock);
	queue_cfg->wait_addr = target_addr;
	monitor_sync(...);	// DP was put to sleep

CP@T2:
	queue_cfg->cur_cb = NULL;
	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
	ret = 0;

rte_power_pmd_mgmt_queue_disable() finished with success, but the DP
core was never woken up.

To be more specific:

clb_umwait(...)
{
	...
	lock(&qcfg->lck);
	if (qcfg->state == ENABLED) {
		qcfg->wake_addr = addr;
		monitor_sync(addr, ..., &qcfg->lck);
	}
	unlock(&qcfg->lck);
	...
}

_disable(...)
{
	...
	lock(&qcfg->lck);
	qcfg->state = DISABLED;
	if (qcfg->wake_addr != NULL)
		monitor_wakeup(qcfg->wake_addr);
	unlock(&qcfg->lck);
	...
}
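
Spelled out with the names used in this patch, the callback side of that
suggestion would look roughly like the sketch below (illustrative only,
not tested):

	if (ret == 0) {
		rte_spinlock_lock(&q_conf->umwait_lock);
		/* re-check state under the lock, so a concurrent _disable()
		 * either sees wait_addr set and wakes us, or we see the
		 * DISABLED state and skip the sleep entirely */
		if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
			q_conf->wait_addr = target_addr;
			rte_power_monitor_sync(target_addr, expected,
					mask, -1ULL, data_sz,
					&q_conf->umwait_lock);
			q_conf->wait_addr = NULL;
		}
		rte_spinlock_unlock(&q_conf->umwait_lock);
	}

with _disable() setting pwr_mgmt_state to PMD_MGMT_DISABLED and calling
umwait_wakeup() while still holding umwait_lock.
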
> +				q_conf->wait_addr = target_addr;
> +				/* -1ULL is maximum value for TSC */
> +				rte_power_monitor_sync(target_addr, expected,
> +						mask, -1ULL, data_sz,
> +						&q_conf->umwait_lock);
> +				/* erase the address */
> +				q_conf->wait_addr = NULL;
> +				rte_spinlock_unlock(&q_conf->umwait_lock);
> +			}
> +		}
> +	} else
> +		q_conf->empty_poll_stats = 0;
> +
> +	return nb_rx;
> +}
> +
> +static uint16_t
> +clb_pause(uint16_t port_id, uint16_t qidx,
> +		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> +		uint16_t max_pkts __rte_unused, void *addr __rte_unused)
> +{
> +	struct pmd_queue_cfg *q_conf;
> +
> +	q_conf = &port_cfg[port_id][qidx];
> +
> +	if (unlikely(nb_rx == 0)) {
> +		q_conf->empty_poll_stats++;
> +		/* sleep for 1 microsecond */
> +		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
> +			rte_delay_us(1);
> +	} else
> +		q_conf->empty_poll_stats = 0;
> +
> +	return nb_rx;
> +}
> +
> +static uint16_t
> +clb_scale_freq(uint16_t port_id, uint16_t qidx,
> +		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> +		uint16_t max_pkts __rte_unused, void *_ __rte_unused)
> +{
> +	struct pmd_queue_cfg *q_conf;
> +
> +	q_conf = &port_cfg[port_id][qidx];
> +
> +	if (unlikely(nb_rx == 0)) {
> +		q_conf->empty_poll_stats++;
> +		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
> +			/* scale down freq */
> +			rte_power_freq_min(rte_lcore_id());
> +	} else {
> +		q_conf->empty_poll_stats = 0;
> +		/* scale up freq */
> +		rte_power_freq_max(rte_lcore_id());
> +	}
> +
> +	return nb_rx;
> +}
> +
> +int
> +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id,
> +		uint16_t port_id, uint16_t queue_id,
> +		enum rte_power_pmd_mgmt_type mode)
> +{
> +	struct rte_eth_dev *dev;
> +	struct pmd_queue_cfg *queue_cfg;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +	dev = &rte_eth_devices[port_id];
> +
> +	/* check if queue id is valid */
> +	if (queue_id >= dev->data->nb_rx_queues ||
> +			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> +		return -EINVAL;
> +	}
> +
> +	queue_cfg = &port_cfg[port_id][queue_id];
> +
> +	if (queue_cfg->pwr_mgmt_state == PMD_MGMT_ENABLED) {
> +		ret = -EINVAL;
> +		goto end;
> +	}
> +
> +	switch (mode) {
> +	case RTE_POWER_MGMT_TYPE_WAIT:
> +	{
> +		/* check if rte_power_monitor is supported */
> +		uint64_t dummy_expected, dummy_mask;
> +		struct rte_cpu_intrinsics i;
> +		volatile void *dummy_addr;
> +		uint8_t dummy_sz;
> +
> +		rte_cpu_get_intrinsics_support(&i);
> +
> +		if (!i.power_monitor) {
> +			RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
> +			ret = -ENOTSUP;
> +			goto end;
> +		}
> +
> +		/* check if the device supports the necessary PMD API */
> +		if (rte_eth_get_wake_addr(port_id, queue_id,
> +				&dummy_addr, &dummy_expected,
> +				&dummy_mask, &dummy_sz) == -ENOTSUP) {
> +			RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_rxq_ring_addr_get\n");
> +			ret = -ENOTSUP;
> +			goto end;
> +		}
> +		/* initialize UMWAIT spinlock */
> +		rte_spinlock_init(&queue_cfg->umwait_lock);

I don't think you need to do that; it is supposed to be in a valid state
already (otherwise you are probably in trouble anyway).

> +
> +		/* initialize data before enabling the callback */
> +		queue_cfg->empty_poll_stats = 0;
> +		queue_cfg->cb_mode = mode;
> +		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> +
> +		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
> +				clb_umwait, NULL);

It would be a bit cleaner/nicer to move add_rx_callback() out of the
switch () {}, as you have to do it always anyway.
Same thought for disable() and remove_rx_callback().
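
For example, something of this shape (just a sketch reusing the callbacks
from this patch, not tested):

	rte_rx_callback_fn clb;

	switch (mode) {
	case RTE_POWER_MGMT_TYPE_WAIT:
		/* ... monitor/wake-addr capability checks as above ... */
		clb = clb_umwait;
		break;
	case RTE_POWER_MGMT_TYPE_SCALE:
		/* ... power library env checks and init as below ... */
		clb = clb_scale_freq;
		break;
	case RTE_POWER_MGMT_TYPE_PAUSE:
		clb = clb_pause;
		break;
	default:
		ret = -EINVAL;
		goto end;
	}

	/* common bookkeeping, done once for every mode */
	queue_cfg->empty_poll_stats = 0;
	queue_cfg->cb_mode = mode;
	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
			clb, NULL);

That way each case only does its own capability checks and picks the
callback, and the shared bookkeeping is written once.
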
> +		break;
> +	}
> +	case RTE_POWER_MGMT_TYPE_SCALE:
> +	{
> +		enum power_management_env env;
> +		/* only PSTATE and ACPI modes are supported */
> +		if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
> +				!rte_power_check_env_supported(
> +					PM_ENV_PSTATE_CPUFREQ)) {
> +			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
> +			ret = -ENOTSUP;
> +			goto end;
> +		}
> +		/* ensure we could initialize the power library */
> +		if (rte_power_init(lcore_id)) {
> +			ret = -EINVAL;
> +			goto end;
> +		}
> +		/* ensure we initialized the correct env */
> +		env = rte_power_get_env();
> +		if (env != PM_ENV_ACPI_CPUFREQ &&
> +				env != PM_ENV_PSTATE_CPUFREQ) {
> +			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
> +			ret = -ENOTSUP;
> +			goto end;
> +		}
> +		/* initialize data before enabling the callback */
> +		queue_cfg->empty_poll_stats = 0;
> +		queue_cfg->cb_mode = mode;
> +		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> +
> +		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id,
> +				queue_id, clb_scale_freq, NULL);
> +		break;
> +	}
> +	case RTE_POWER_MGMT_TYPE_PAUSE:
> +		/* initialize data before enabling the callback */
> +		queue_cfg->empty_poll_stats = 0;
> +		queue_cfg->cb_mode = mode;
> +		queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> +
> +		queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
> +				clb_pause, NULL);
> +		break;
> +	}
> +	ret = 0;
> +
> +end:
> +	return ret;
> +}
> +
> +int
> +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
> +		uint16_t port_id, uint16_t queue_id)
> +{
> +	struct pmd_queue_cfg *queue_cfg;
> +	int ret;
> +
> +	queue_cfg = &port_cfg[port_id][queue_id];
> +
> +	if (queue_cfg->pwr_mgmt_state == PMD_MGMT_DISABLED) {
> +		ret = -EINVAL;
> +		goto end;
> +	}
> +
> +	switch (queue_cfg->cb_mode) {
> +	case RTE_POWER_MGMT_TYPE_WAIT:
> +		rte_spinlock_lock(&queue_cfg->umwait_lock);
> +
> +		/* wake up the core from UMWAIT sleep, if any */
> +		if (queue_cfg->wait_addr != NULL)
> +			umwait_wakeup(queue_cfg->wait_addr);
> +
> +		rte_spinlock_unlock(&queue_cfg->umwait_lock);
> +		/* fall-through */
> +	case RTE_POWER_MGMT_TYPE_PAUSE:
> +		rte_eth_remove_rx_callback(port_id, queue_id,
> +				queue_cfg->cur_cb);
> +		break;
> +	case RTE_POWER_MGMT_TYPE_SCALE:
> +		rte_power_freq_max(lcore_id);
> +		rte_eth_remove_rx_callback(port_id, queue_id,
> +				queue_cfg->cur_cb);
> +		rte_power_exit(lcore_id);
> +		break;
> +	}
> +	/*
> +	 * we don't free the RX callback here because it is unsafe to do so
> +	 * unless we know for a fact that all data plane threads have stopped.
> +	 */
> +	queue_cfg->cur_cb = NULL;
> +	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
> +	ret = 0;
> +end:
> +	return ret;
> +}
> diff --git a/lib/librte_power/rte_power_pmd_mgmt.h b/lib/librte_power/rte_power_pmd_mgmt.h
> new file mode 100644
> index 0000000000..a7a3f98268
> --- /dev/null
> +++ b/lib/librte_power/rte_power_pmd_mgmt.h
> @@ -0,0 +1,92 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2010-2020 Intel Corporation
> + */
> +
> +#ifndef _RTE_POWER_PMD_MGMT_H
> +#define _RTE_POWER_PMD_MGMT_H
> +
> +/**
> + * @file
> + * RTE PMD Power Management
> + */
> +#include
> +#include
> +
> +#include
> +#include
> +#include
> +#include
> +#include
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +/**
> + * PMD Power Management Type
> + */
> +enum rte_power_pmd_mgmt_type {
> +	/** WAIT callback mode. */
> +	RTE_POWER_MGMT_TYPE_WAIT = 1,
> +	/** PAUSE callback mode. */
> +	RTE_POWER_MGMT_TYPE_PAUSE,
> +	/** Freq Scaling callback mode. */
> +	RTE_POWER_MGMT_TYPE_SCALE,
> +};
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Setup per-queue power management callback.
> + *
> + * @note This function is not thread-safe.
> + *
> + * @param lcore_id
> + *   lcore_id.
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The queue identifier of the Ethernet device.
> + * @param mode
> + *   The power management callback function type.
> + *
> + * @return
> + *   0 on success
> + *   <0 on error
> + */
> +__rte_experimental
> +int
> +rte_power_pmd_mgmt_queue_enable(unsigned int lcore_id,
> +				uint16_t port_id,
> +				uint16_t queue_id,
> +				enum rte_power_pmd_mgmt_type mode);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Remove per-queue power management callback.
> + *
> + * @note This function is not thread-safe.
> + *
> + * @param lcore_id
> + *   lcore_id.
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The queue identifier of the Ethernet device.
> + * @return
> + *   0 on success
> + *   <0 on error
> + */
> +__rte_experimental
> +int
> +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
> +				uint16_t port_id,
> +				uint16_t queue_id);
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif
> diff --git a/lib/librte_power/rte_power_version.map b/lib/librte_power/rte_power_version.map
> index 69ca9af616..3f2f6cd6f6 100644
> --- a/lib/librte_power/rte_power_version.map
> +++ b/lib/librte_power/rte_power_version.map
> @@ -34,4 +34,8 @@ EXPERIMENTAL {
> 	rte_power_guest_channel_receive_msg;
> 	rte_power_poll_stat_fetch;
> 	rte_power_poll_stat_update;
> +	# added in 20.11
> +	rte_power_pmd_mgmt_queue_enable;
> +	rte_power_pmd_mgmt_queue_disable;
> +
>  };
> --
> 2.17.1