From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Ananyev, Konstantin"
To: "Burakov, Anatoly", "dev@dpdk.org", "Hunt, David"
CC: "Loftus, Ciara"
Date: Wed, 30 Jun 2021 11:04:57 +0000
Subject: Re: [dpdk-dev] [PATCH v5 5/7] power: support callbacks for multiple Rx queues
In-Reply-To: <8f5d030a77aa2f0e95e9680cb911b4e8db30c879.1624981670.git.anatoly.burakov@intel.com>
References: <8f5d030a77aa2f0e95e9680cb911b4e8db30c879.1624981670.git.anatoly.burakov@intel.com>
List-Id: DPDK patches and discussions
> Currently, there is a hard limitation on the PMD power management
> support that only allows it to support a single queue per lcore. This is
> not ideal as most DPDK use cases will poll multiple queues per core.
>
> The PMD power management mechanism relies on ethdev Rx callbacks, so it
> is very difficult to implement such support because callbacks are
> effectively stateless and have no visibility into what the other ethdev
> devices are doing.
> This places limitations on what we can do within the
> framework of Rx callbacks, but the basics of this implementation are as
> follows:
>
> - Replace per-queue structures with per-lcore ones, so that any device
>   polled from the same lcore can share data
> - Any queue that is going to be polled from a specific lcore has to be
>   added to the list of queues to poll, so that the callback is aware of
>   other queues being polled by the same lcore
> - Both the empty poll counter and the actual power saving mechanism are
>   shared between all queues polled on a particular lcore, and are only
>   activated when all queues in the list were polled and were determined
>   to have no traffic.
> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>   is incapable of monitoring more than one address.
>
> Also, while we're at it, update and improve the docs.
>
> Signed-off-by: Anatoly Burakov
> ---
>
> Notes:
>     v5:
>     - Remove the "power save queue" API and replace it with the mechanism
>       suggested by Konstantin
>
>     v3:
>     - Move the list of supported NICs to NIC feature table
>
>     v2:
>     - Use a TAILQ for queues instead of a static array
>     - Address feedback from Konstantin
>     - Add additional checks for stopped queues
>
>  doc/guides/nics/features.rst           |  10 +
>  doc/guides/prog_guide/power_man.rst    |  65 ++--
>  doc/guides/rel_notes/release_21_08.rst |   3 +
>  lib/power/rte_power_pmd_mgmt.c         | 431 ++++++++++++++++++-------
>  4 files changed, 373 insertions(+), 136 deletions(-)
>
> diff --git a/doc/guides/nics/features.rst b/doc/guides/nics/features.rst
> index 403c2b03a3..a96e12d155 100644
> --- a/doc/guides/nics/features.rst
> +++ b/doc/guides/nics/features.rst
> @@ -912,6 +912,16 @@ Supports to get Rx/Tx packet burst mode information.
>  * **[implements] eth_dev_ops**: ``rx_burst_mode_get``, ``tx_burst_mode_get``.
>  * **[related] API**: ``rte_eth_rx_burst_mode_get()``, ``rte_eth_tx_burst_mode_get()``.
>
> +.. _nic_features_get_monitor_addr:
> +
> +PMD power management using monitor addresses
> +--------------------------------------------
> +
> +Supports getting a monitoring condition to use together with Ethernet PMD power
> +management (see :doc:`../prog_guide/power_man` for more details).
> +
> +* **[implements] eth_dev_ops**: ``get_monitor_addr``
> +
>  .. _nic_features_other:
>
>  Other dev ops not represented by a Feature
> diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
> index c70ae128ac..ec04a72108 100644
> --- a/doc/guides/prog_guide/power_man.rst
> +++ b/doc/guides/prog_guide/power_man.rst
> @@ -198,34 +198,41 @@ Ethernet PMD Power Management API
>  Abstract
>  ~~~~~~~~
>
> -Existing power management mechanisms require developers
> -to change application design or change code to make use of it.
> -The PMD power management API provides a convenient alternative
> -by utilizing Ethernet PMD RX callbacks,
> -and triggering power saving whenever empty poll count reaches a certain number.
> -
> -Monitor
> -   This power saving scheme will put the CPU into optimized power state
> -   and use the ``rte_power_monitor()`` function
> -   to monitor the Ethernet PMD RX descriptor address,
> -   and wake the CPU up whenever there's new traffic.
> -
> -Pause
> -   This power saving scheme will avoid busy polling
> -   by either entering power-optimized sleep state
> -   with ``rte_power_pause()`` function,
> -   or, if it's not available, use ``rte_pause()``.
> -
> -Frequency scaling
> -   This power saving scheme will use ``librte_power`` library
> -   functionality to scale the core frequency up/down
> -   depending on traffic volume.
> -
> -.. note::
> -
> -   Currently, this power management API is limited to mandatory mapping
> -   of 1 queue to 1 core (multiple queues are supported,
> -   but they must be polled from different cores).
> +Existing power management mechanisms require developers to change application
> +design or change code to make use of it. The PMD power management API provides a
> +convenient alternative by utilizing Ethernet PMD RX callbacks, and triggering
> +power saving whenever empty poll count reaches a certain number.
> +
> +* Monitor
> +   This power saving scheme will put the CPU into optimized power state and
> +   monitor the Ethernet PMD RX descriptor address, waking the CPU up whenever
> +   there's new traffic. Support for this scheme may not be available on all
> +   platforms, and further limitations may apply (see below).
> +
> +* Pause
> +   This power saving scheme will avoid busy polling by either entering
> +   power-optimized sleep state with ``rte_power_pause()`` function, or, if it's
> +   not supported by the underlying platform, use ``rte_pause()``.
> +
> +* Frequency scaling
> +   This power saving scheme will use ``librte_power`` library functionality to
> +   scale the core frequency up/down depending on traffic volume.
> +
> +The "monitor" mode is only supported in the following configurations and scenarios:
> +
> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that
> +  ``rte_power_monitor()`` is supported by the platform, then monitoring will be
> +  limited to a mapping of 1 core 1 queue (thus, each Rx queue will have to be
> +  monitored from a different lcore).
> +
> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that the
> +  ``rte_power_monitor()`` function is not supported, then monitor mode will not
> +  be supported.
> +
> +* Not all Ethernet drivers support monitoring, even if the underlying
> +  platform may support the necessary CPU instructions. Please refer to
> +  :doc:`../nics/overview` for more information.
> +
>
>  API Overview for Ethernet PMD Power Management
>  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> @@ -242,3 +249,5 @@ References
>
>  * The :doc:`../sample_app_ug/vm_power_management`
>    chapter in the :doc:`../sample_app_ug/index` section.
> +
> +* The :doc:`../nics/overview` chapter in the :doc:`../nics/index` section
> diff --git a/doc/guides/rel_notes/release_21_08.rst b/doc/guides/rel_notes/release_21_08.rst
> index f015c509fc..3926d45ef8 100644
> --- a/doc/guides/rel_notes/release_21_08.rst
> +++ b/doc/guides/rel_notes/release_21_08.rst
> @@ -57,6 +57,9 @@ New Features
>
>  * eal: added ``rte_power_monitor_multi`` to support waiting for multiple events.
>
> +* rte_power: The experimental PMD power management API now supports managing
> +  multiple Ethernet Rx queues per lcore.
> +
>
>  Removed Items
>  -------------
> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
> index 9b95cf1794..fccfd236c2 100644
> --- a/lib/power/rte_power_pmd_mgmt.c
> +++ b/lib/power/rte_power_pmd_mgmt.c
> @@ -33,18 +33,96 @@ enum pmd_mgmt_state {
>  	PMD_MGMT_ENABLED
>  };
>
> -struct pmd_queue_cfg {
> +union queue {
> +	uint32_t val;
> +	struct {
> +		uint16_t portid;
> +		uint16_t qid;
> +	};
> +};
> +
> +struct queue_list_entry {
> +	TAILQ_ENTRY(queue_list_entry) next;
> +	union queue queue;
> +	uint64_t n_empty_polls;
> +	const struct rte_eth_rxtx_callback *cb;
> +};
> +
> +struct pmd_core_cfg {
> +	TAILQ_HEAD(queue_list_head, queue_list_entry) head;
> +	/**< List of queues associated with this lcore */
> +	size_t n_queues;
> +	/**< How many queues are in the list? */
>  	volatile enum pmd_mgmt_state pwr_mgmt_state;
>  	/**< State of power management for this queue */
>  	enum rte_power_pmd_mgmt_type cb_mode;
>  	/**< Callback mode for this queue */
> -	const struct rte_eth_rxtx_callback *cur_cb;
> -	/**< Callback instance */
> -	uint64_t empty_poll_stats;
> -	/**< Number of empty polls */
> +	uint64_t n_queues_ready_to_sleep;
> +	/**< Number of queues ready to enter power optimized state */
>  } __rte_cache_aligned;
> +static struct pmd_core_cfg lcore_cfgs[RTE_MAX_LCORE];
>
> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
> +static inline bool
> +queue_equal(const union queue *l, const union queue *r)
> +{
> +	return l->val == r->val;
> +}
> +
> +static inline void
> +queue_copy(union queue *dst, const union queue *src)
> +{
> +	dst->val = src->val;
> +}
> +
> +static struct queue_list_entry *
> +queue_list_find(const struct pmd_core_cfg *cfg, const union queue *q)
> +{
> +	struct queue_list_entry *cur;
> +
> +	TAILQ_FOREACH(cur, &cfg->head, next) {
> +		if (queue_equal(&cur->queue, q))
> +			return cur;
> +	}
> +	return NULL;
> +}
> +
> +static int
> +queue_list_add(struct pmd_core_cfg *cfg, const union queue *q)
> +{
> +	struct queue_list_entry *qle;
> +
> +	/* is it already in the list? */
> +	if (queue_list_find(cfg, q) != NULL)
> +		return -EEXIST;
> +
> +	qle = malloc(sizeof(*qle));
> +	if (qle == NULL)
> +		return -ENOMEM;
> +	memset(qle, 0, sizeof(*qle));
> +
> +	queue_copy(&qle->queue, q);
> +	TAILQ_INSERT_TAIL(&cfg->head, qle, next);
> +	cfg->n_queues++;
> +	qle->n_empty_polls = 0;
> +
> +	return 0;
> +}
> +
> +static struct queue_list_entry *
> +queue_list_take(struct pmd_core_cfg *cfg, const union queue *q)
> +{
> +	struct queue_list_entry *found;
> +
> +	found = queue_list_find(cfg, q);
> +	if (found == NULL)
> +		return NULL;
> +
> +	TAILQ_REMOVE(&cfg->head, found, next);
> +	cfg->n_queues--;
> +
> +	/* freeing is responsibility of the caller */
> +	return found;
> +}
>
>  static void
>  calc_tsc(void)
> @@ -74,21 +152,56 @@ calc_tsc(void)
>  	}
>  }
>
> +static inline void
> +queue_reset(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
> +{
> +	/* reset empty poll counter for this queue */
> +	qcfg->n_empty_polls = 0;
> +	/* reset the sleep counter too */
> +	cfg->n_queues_ready_to_sleep = 0;
> +}
> +
> +static inline bool
> +queue_can_sleep(struct pmd_core_cfg *cfg, struct queue_list_entry *qcfg)
> +{
> +	/* this function is called - that means we have an empty poll */
> +	qcfg->n_empty_polls++;
> +
> +	/* if we haven't reached threshold for empty polls, we can't sleep */
> +	if (qcfg->n_empty_polls <= EMPTYPOLL_MAX)
> +		return false;
> +
> +	/* we're ready to sleep */
> +	cfg->n_queues_ready_to_sleep++;
> +
> +	return true;
> +}
> +
> +static inline bool
> +lcore_can_sleep(struct pmd_core_cfg *cfg)
> +{
> +	/* are all queues ready to sleep? */
> +	if (cfg->n_queues_ready_to_sleep != cfg->n_queues)
> +		return false;
> +
> +	/* we've reached an iteration where we can sleep, reset sleep counter */
> +	cfg->n_queues_ready_to_sleep = 0;
> +
> +	return true;
> +}

As I can see, this is a slightly modified version of what was discussed.
I understand that it seems simpler, but I think there are some problems with it:
- each queue can be counted more than once in lcore_cfg->n_queues_ready_to_sleep;
- each queue's n_empty_polls is not reset after sleep().

To illustrate the problem, let's say we have 2 queues, and at some moment we have:

q0.n_empty_polls == EMPTYPOLL_MAX + 1
q1.n_empty_polls == EMPTYPOLL_MAX + 1
cfg->n_queues_ready_to_sleep == 2

So lcore_can_sleep() returns 'true' and sets:

cfg->n_queues_ready_to_sleep == 0

Now, after sleep():

q0.n_empty_polls == EMPTYPOLL_MAX + 1
q1.n_empty_polls == EMPTYPOLL_MAX + 1

So after:

queue_can_sleep(q0);
queue_can_sleep(q1);

we will have:

cfg->n_queues_ready_to_sleep == 2

again, and we'll go to another sleep after just one rx_burst() attempt for each queue.

> +
>  static uint16_t
>  clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
> -		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
> -		void *addr __rte_unused)
> +		uint16_t nb_rx, uint16_t max_pkts __rte_unused, void *arg)
>  {
> +	struct queue_list_entry *queue_conf = arg;
>
> -	struct pmd_queue_cfg *q_conf;
> -
> -	q_conf = &port_cfg[port_id][qidx];
> -
> +	/* this callback can't do more than one queue, omit multiqueue logic */
>  	if (unlikely(nb_rx == 0)) {
> -		q_conf->empty_poll_stats++;
> -		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
> +		queue_conf->n_empty_polls++;
> +		if (unlikely(queue_conf->n_empty_polls > EMPTYPOLL_MAX)) {
>  			struct rte_power_monitor_cond pmc;
> -			uint16_t ret;
> +			int ret;
>
>  			/* use monitoring condition to sleep */
>  			ret = rte_eth_get_monitor_addr(port_id, qidx,
> @@ -97,60 +210,77 @@ clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>  				rte_power_monitor(&pmc, UINT64_MAX);
>  		}
>  	} else
> -		q_conf->empty_poll_stats = 0;
> +		queue_conf->n_empty_polls = 0;
>
>  	return nb_rx;
>  }
>
>  static uint16_t
> -clb_pause(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
> -		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
> -		void *addr __rte_unused)
> +clb_pause(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
> +		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> +		uint16_t max_pkts __rte_unused, void *arg)
>  {
> -	struct pmd_queue_cfg *q_conf;
> +	const unsigned int lcore = rte_lcore_id();
> +	struct queue_list_entry *queue_conf = arg;
> +	struct pmd_core_cfg *lcore_conf;
> +	const bool empty = nb_rx == 0;
>
> -	q_conf = &port_cfg[port_id][qidx];
> +	lcore_conf = &lcore_cfgs[lcore];
>
> -	if (unlikely(nb_rx == 0)) {
> -		q_conf->empty_poll_stats++;
> -		/* sleep for 1 microsecond */
> -		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
> -			/* use tpause if we have it */
> -			if (global_data.intrinsics_support.power_pause) {
> -				const uint64_t cur = rte_rdtsc();
> -				const uint64_t wait_tsc =
> -						cur + global_data.tsc_per_us;
> -				rte_power_pause(wait_tsc);
> -			} else {
> -				uint64_t i;
> -				for (i = 0; i < global_data.pause_per_us; i++)
> -					rte_pause();
> -			}
> +	if (likely(!empty))
> +		/* early exit */
> +		queue_reset(lcore_conf, queue_conf);
> +	else {
> +		/* can this queue sleep? */
> +		if (!queue_can_sleep(lcore_conf, queue_conf))
> +			return nb_rx;
> +
> +		/* can this lcore sleep? */
> +		if (!lcore_can_sleep(lcore_conf))
> +			return nb_rx;
> +
> +		/* sleep for 1 microsecond, use tpause if we have it */
> +		if (global_data.intrinsics_support.power_pause) {
> +			const uint64_t cur = rte_rdtsc();
> +			const uint64_t wait_tsc =
> +					cur + global_data.tsc_per_us;
> +			rte_power_pause(wait_tsc);
> +		} else {
> +			uint64_t i;
> +			for (i = 0; i < global_data.pause_per_us; i++)
> +				rte_pause();
>  		}
> -	} else
> -		q_conf->empty_poll_stats = 0;
> +	}
>
>  	return nb_rx;
>  }
>
>  static uint16_t
> -clb_scale_freq(uint16_t port_id, uint16_t qidx,
> +clb_scale_freq(uint16_t port_id __rte_unused, uint16_t qidx __rte_unused,
>  		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
> -		uint16_t max_pkts __rte_unused, void *_ __rte_unused)
> +		uint16_t max_pkts __rte_unused, void *arg)
>  {
> -	struct pmd_queue_cfg *q_conf;
> +	const unsigned int lcore = rte_lcore_id();
> +	const bool empty = nb_rx == 0;
> +	struct pmd_core_cfg *lcore_conf = &lcore_cfgs[lcore];
> +	struct queue_list_entry *queue_conf = arg;
>
> -	q_conf = &port_cfg[port_id][qidx];
> +	if (likely(!empty)) {
> +		/* early exit */
> +		queue_reset(lcore_conf, queue_conf);
>
> -	if (unlikely(nb_rx == 0)) {
> -		q_conf->empty_poll_stats++;
> -		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX))
> -			/* scale down freq */
> -			rte_power_freq_min(rte_lcore_id());
> -	} else {
> -		q_conf->empty_poll_stats = 0;
> -		/* scale up freq */
> +		/* scale up freq immediately */
>  		rte_power_freq_max(rte_lcore_id());
> +	} else {
> +		/* can this queue sleep? */
> +		if (!queue_can_sleep(lcore_conf, queue_conf))
> +			return nb_rx;
> +
> +		/* can this lcore sleep? */
> +		if (!lcore_can_sleep(lcore_conf))
> +			return nb_rx;
> +
> +		rte_power_freq_min(rte_lcore_id());
>  	}
>
>  	return nb_rx;
> @@ -167,11 +297,80 @@ queue_stopped(const uint16_t port_id, const uint16_t queue_id)
>  	return qinfo.queue_state == RTE_ETH_QUEUE_STATE_STOPPED;
>  }
>
> +static int
> +cfg_queues_stopped(struct pmd_core_cfg *queue_cfg)
> +{
> +	const struct queue_list_entry *entry;
> +
> +	TAILQ_FOREACH(entry, &queue_cfg->head, next) {
> +		const union queue *q = &entry->queue;
> +		int ret = queue_stopped(q->portid, q->qid);
> +		if (ret != 1)
> +			return ret;
> +	}
> +	return 1;
> +}
> +
> +static int
> +check_scale(unsigned int lcore)
> +{
> +	enum power_management_env env;
> +
> +	/* only PSTATE and ACPI modes are supported */
> +	if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
> +			!rte_power_check_env_supported(PM_ENV_PSTATE_CPUFREQ)) {
> +		RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
> +		return -ENOTSUP;
> +	}
> +	/* ensure we could initialize the power library */
> +	if (rte_power_init(lcore))
> +		return -EINVAL;
> +
> +	/* ensure we initialized the correct env */
> +	env = rte_power_get_env();
> +	if (env != PM_ENV_ACPI_CPUFREQ && env != PM_ENV_PSTATE_CPUFREQ) {
> +		RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
> +		return -ENOTSUP;
> +	}
> +
> +	/* we're done */
> +	return 0;
> +}
> +
> +static int
> +check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
> +{
> +	struct rte_power_monitor_cond dummy;
> +
> +	/* check if rte_power_monitor is supported */
> +	if (!global_data.intrinsics_support.power_monitor) {
> +		RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
> +		return -ENOTSUP;
> +	}
> +
> +	if (cfg->n_queues > 0) {
> +		RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not supported\n");
> +		return -ENOTSUP;
> +	}
> +
> +	/* check if the device supports the necessary PMD API */
> +	if (rte_eth_get_monitor_addr(qdata->portid, qdata->qid,
> +			&dummy) == -ENOTSUP) {
> +		RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n");
> +		return -ENOTSUP;
> +	}
> +
> +	/* we're done */
> +	return 0;
> +}
> +
>  int
>  rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>  		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
>  {
> -	struct pmd_queue_cfg *queue_cfg;
> +	const union queue qdata = {.portid = port_id, .qid = queue_id};
> +	struct pmd_core_cfg *lcore_cfg;
> +	struct queue_list_entry *queue_cfg;
>  	struct rte_eth_dev_info info;
>  	rte_rx_callback_fn clb;
>  	int ret;
> @@ -202,9 +401,19 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>  		goto end;
>  	}
>
> -	queue_cfg = &port_cfg[port_id][queue_id];
> +	lcore_cfg = &lcore_cfgs[lcore_id];
>
> -	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED) {
> +	/* check if other queues are stopped as well */
> +	ret = cfg_queues_stopped(lcore_cfg);
> +	if (ret != 1) {
> +		/* error means invalid queue, 0 means queue wasn't stopped */
> +		ret = ret < 0 ? -EINVAL : -EBUSY;
> +		goto end;
> +	}
> +
> +	/* if callback was already enabled, check current callback type */
> +	if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_DISABLED &&
> +			lcore_cfg->cb_mode != mode) {
>  		ret = -EINVAL;
>  		goto end;
>  	}
> @@ -214,53 +423,20 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>
>  	switch (mode) {
>  	case RTE_POWER_MGMT_TYPE_MONITOR:
> -	{
> -		struct rte_power_monitor_cond dummy;
> -
> -		/* check if rte_power_monitor is supported */
> -		if (!global_data.intrinsics_support.power_monitor) {
> -			RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
> -			ret = -ENOTSUP;
> +		/* check if we can add a new queue */
> +		ret = check_monitor(lcore_cfg, &qdata);
> +		if (ret < 0)
>  			goto end;
> -		}
>
> -		/* check if the device supports the necessary PMD API */
> -		if (rte_eth_get_monitor_addr(port_id, queue_id,
> -				&dummy) == -ENOTSUP) {
> -			RTE_LOG(DEBUG, POWER, "The device does not support rte_eth_get_monitor_addr\n");
> -			ret = -ENOTSUP;
> -			goto end;
> -		}
>  		clb = clb_umwait;
>  		break;
> -	}
>  	case RTE_POWER_MGMT_TYPE_SCALE:
> -	{
> -		enum power_management_env env;
> -		/* only PSTATE and ACPI modes are supported */
> -		if (!rte_power_check_env_supported(PM_ENV_ACPI_CPUFREQ) &&
> -				!rte_power_check_env_supported(
> -					PM_ENV_PSTATE_CPUFREQ)) {
> -			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes are supported\n");
> -			ret = -ENOTSUP;
> +		/* check if we can add a new queue */
> +		ret = check_scale(lcore_id);
> +		if (ret < 0)
>  			goto end;
> -		}
> -		/* ensure we could initialize the power library */
> -		if (rte_power_init(lcore_id)) {
> -			ret = -EINVAL;
> -			goto end;
> -		}
> -		/* ensure we initialized the correct env */
> -		env = rte_power_get_env();
> -		if (env != PM_ENV_ACPI_CPUFREQ &&
> -				env != PM_ENV_PSTATE_CPUFREQ) {
> -			RTE_LOG(DEBUG, POWER, "Neither ACPI nor PSTATE modes were initialized\n");
> -			ret = -ENOTSUP;
> -			goto end;
> -		}
>  		clb = clb_scale_freq;
>  		break;
> -	}
>  	case RTE_POWER_MGMT_TYPE_PAUSE:
>  		/* figure out various time-to-tsc conversions */
>  		if (global_data.tsc_per_us == 0)
> @@ -273,13 +449,23 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>  		ret = -EINVAL;
>  		goto end;
>  	}
> +	/* add this queue to the list */
> +	ret = queue_list_add(lcore_cfg, &qdata);
> +	if (ret < 0) {
> +		RTE_LOG(DEBUG, POWER, "Failed to add queue to list: %s\n",
> +				strerror(-ret));
> +		goto end;
> +	}
> +	/* new queue is always added last */
> +	queue_cfg = TAILQ_LAST(&lcore_cfgs->head, queue_list_head);
>
>  	/* initialize data before enabling the callback */
> -	queue_cfg->empty_poll_stats = 0;
> -	queue_cfg->cb_mode = mode;
> -	queue_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> -	queue_cfg->cur_cb = rte_eth_add_rx_callback(port_id, queue_id,
> -			clb, NULL);
> +	if (lcore_cfg->n_queues == 1) {
> +		lcore_cfg->cb_mode = mode;
> +		lcore_cfg->pwr_mgmt_state = PMD_MGMT_ENABLED;
> +	}
> +	queue_cfg->cb = rte_eth_add_rx_callback(port_id, queue_id,
> +			clb, queue_cfg);
>
>  	ret = 0;
>  end:
> @@ -290,7 +476,9 @@ int
>  rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
>  		uint16_t port_id, uint16_t queue_id)
>  {
> -	struct pmd_queue_cfg *queue_cfg;
> +	const union queue qdata = {.portid = port_id, .qid = queue_id};
> +	struct pmd_core_cfg *lcore_cfg;
> +	struct queue_list_entry *queue_cfg;
>  	int ret;
>
>  	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> @@ -306,24 +494,40 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
>  	}
>
>  	/* no need to check queue id as wrong queue id would not be enabled */
> -	queue_cfg = &port_cfg[port_id][queue_id];
> +	lcore_cfg = &lcore_cfgs[lcore_id];
>
> -	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
> +	/* check if other queues are stopped as well */
> +	ret = cfg_queues_stopped(lcore_cfg);
> +	if (ret != 1) {
> +		/* error means invalid queue, 0 means queue wasn't stopped */
> +		return ret < 0 ? -EINVAL : -EBUSY;
> +	}
> +
> +	if (lcore_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
>  		return -EINVAL;
>
> -	/* stop any callbacks from progressing */
> -	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
> +	/*
> +	 * There is no good/easy way to do this without race conditions, so we
> +	 * are just going to throw our hands in the air and hope that the user
> +	 * has read the documentation and has ensured that ports are stopped at
> +	 * the time we enter the API functions.
> +	 */
> +	queue_cfg = queue_list_take(lcore_cfg, &qdata);
> +	if (queue_cfg == NULL)
> +		return -ENOENT;
>
> -	switch (queue_cfg->cb_mode) {
> +	/* if we've removed all queues from the lists, set state to disabled */
> +	if (lcore_cfg->n_queues == 0)
> +		lcore_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
> +
> +	switch (lcore_cfg->cb_mode) {
>  	case RTE_POWER_MGMT_TYPE_MONITOR: /* fall-through */
>  	case RTE_POWER_MGMT_TYPE_PAUSE:
> -		rte_eth_remove_rx_callback(port_id, queue_id,
> -				queue_cfg->cur_cb);
> +		rte_eth_remove_rx_callback(port_id, queue_id, queue_cfg->cb);
>  		break;
>  	case RTE_POWER_MGMT_TYPE_SCALE:
>  		rte_power_freq_max(lcore_id);
> -		rte_eth_remove_rx_callback(port_id, queue_id,
> -				queue_cfg->cur_cb);
> +		rte_eth_remove_rx_callback(port_id, queue_id, queue_cfg->cb);
>  		rte_power_exit(lcore_id);
>  		break;
>  	}
> @@ -332,7 +536,18 @@ rte_power_ethdev_pmgmt_queue_disable(unsigned int lcore_id,
>  	 * ports before calling any of these API's, so we can assume that the
>  	 * callbacks can be freed. we're intentionally casting away const-ness.
>  	 */
> -	rte_free((void *)queue_cfg->cur_cb);
> +	rte_free((void *)queue_cfg->cb);
> +	free(queue_cfg);
>
>  	return 0;
>  }
> +
> +RTE_INIT(rte_power_ethdev_pmgmt_init) {
> +	size_t i;
> +
> +	/* initialize all tailqs */
> +	for (i = 0; i < RTE_DIM(lcore_cfgs); i++) {
> +		struct pmd_core_cfg *cfg = &lcore_cfgs[i];
> +		TAILQ_INIT(&cfg->head);
> +	}
> +}
> --
> 2.25.1