From: "Ananyev, Konstantin"
To: "Burakov, Anatoly", "dev@dpdk.org"
CC: Thomas Monjalon, "Yigit, Ferruh", Andrew Rybchenko, "Ray Kinsella", Neil Horman, "Hunt, David", "Ma, Liang J"
Date: Thu, 28 May 2020 12:15:03 +0000
Subject: Re: [dpdk-dev] [RFC 2/6] ethdev: add simple power management API
References: <4b726f267b9d8c0d7bb1579fe6b2f4d640d75675.1590598121.git.anatoly.burakov@intel.com>
In-Reply-To: <4b726f267b9d8c0d7bb1579fe6b2f4d640d75675.1590598121.git.anatoly.burakov@intel.com>
List-Id: DPDK patches and discussions

>
> Add a simple on/off switch that will enable saving power when no
> packets are arriving.
> It is based on counting the number of empty
> polls and, when the number reaches a certain threshold, entering an
> architecture-defined optimized power state that will either wait
> until a TSC timestamp expires, or when packets arrive.
>
> This API is limited to 1 core 1 queue use case as there is no
> coordination between queues/cores in ethdev.
>
> The TSC timestamp is automatically calculated using current link
> speed and RX descriptor ring size, such that the sleep time is
> not longer than it would take for a NIC to fill its entire RX
> descriptor ring.
>
> Signed-off-by: Liang J. Ma
> Signed-off-by: Anatoly Burakov
> ---
>  lib/librte_ethdev/rte_ethdev.c           | 39 +++++++++++++
>  lib/librte_ethdev/rte_ethdev.h           | 70 ++++++++++++++++++++++++
>  lib/librte_ethdev/rte_ethdev_core.h      | 41 +++++++++++++-
>  lib/librte_ethdev/rte_ethdev_version.map |  4 ++
>  4 files changed, 152 insertions(+), 2 deletions(-)
>
> diff --git a/lib/librte_ethdev/rte_ethdev.c b/lib/librte_ethdev/rte_ethdev.c
> index 8e10a6fc36..0be5ecfc11 100644
> --- a/lib/librte_ethdev/rte_ethdev.c
> +++ b/lib/librte_ethdev/rte_ethdev.c
> @@ -16,6 +16,7 @@
>  #include
>
>  #include
> +#include
>  #include
>  #include
>  #include
> @@ -5053,6 +5054,44 @@ rte_eth_dev_pool_ops_supported(uint16_t port_id, const char *pool)
>  	return (*dev->dev_ops->pool_ops_supported)(dev, pool);
>  }
>
> +int
> +rte_eth_dev_power_mgmt_enable(uint16_t port_id)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (!rte_cpu_get_flag_enabled(RTE_CPUFLAG_WAITPKG))
> +		return -ENOTSUP;
> +
> +	/* allocate memory for empty poll stats */
> +	dev->empty_poll_stats = rte_malloc_socket(NULL,
> +		sizeof(struct rte_eth_ep_stat) * RTE_MAX_QUEUES_PER_PORT,
> +		0, dev->data->numa_node);
> +
> +	if (dev->empty_poll_stats == NULL)
> +		return -ENOMEM;
> +
> +	dev->pwr_mgmt_state = RTE_ETH_DEV_POWER_MGMT_ENABLED;
> +	return 0;
> +}
> +
> +int
> +rte_eth_dev_power_mgmt_disable(uint16_t port_id)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
> +	dev = &rte_eth_devices[port_id];
> +
> +	/* rte_free ignores NULL so safe to call without checks */
> +	rte_free(dev->empty_poll_stats);
> +
> +	dev->pwr_mgmt_state = RTE_ETH_DEV_POWER_MGMT_DISABLED;
> +	return 0;
> +}
> +
>  /**
>   * A set of values to describe the possible states of a switch domain.
>   */
> diff --git a/lib/librte_ethdev/rte_ethdev.h b/lib/librte_ethdev/rte_ethdev.h
> index a49242bcd2..b8318f7e91 100644
> --- a/lib/librte_ethdev/rte_ethdev.h
> +++ b/lib/librte_ethdev/rte_ethdev.h
> @@ -157,6 +157,7 @@ extern "C" {
>  #include
>  #include
>  #include
> +#include
>
>  #include "rte_ethdev_trace_fp.h"
>  #include "rte_dev_info.h"
> @@ -666,6 +667,7 @@ rte_eth_rss_hf_refine(uint64_t rss_hf)
>  /** Maximum nb. of vlan per mirror rule */
>  #define ETH_MIRROR_MAX_VLANS       64
>
> +#define ETH_EMPTYPOLL_MAX          512 /**< Empty poll number threshlold */
>  #define ETH_MIRROR_VIRTUAL_POOL_UP     0x01  /**< Virtual Pool uplink Mirroring. */
>  #define ETH_MIRROR_UPLINK_PORT         0x02  /**< Uplink Port Mirroring. */
>  #define ETH_MIRROR_DOWNLINK_PORT       0x04  /**< Downlink Port Mirroring. */
> @@ -1490,6 +1492,16 @@ enum rte_eth_dev_state {
>  	RTE_ETH_DEV_REMOVED,
>  };
>
> +/**
> + * Possible power managment states of an ethdev port.
> + */
> +enum rte_eth_dev_power_mgmt_state {
> +	/** Device power management is disabled. */
> +	RTE_ETH_DEV_POWER_MGMT_DISABLED = 0,
> +	/** Device power management is enabled. */
> +	RTE_ETH_DEV_POWER_MGMT_ENABLED
> +};
> +
>  struct rte_eth_dev_sriov {
>  	uint8_t active;               /**< SRIOV is active with 16, 32 or 64 pools */
>  	uint8_t nb_q_per_pool;        /**< rx queue number per pool */
> @@ -4302,6 +4314,38 @@ __rte_experimental
>  int rte_eth_dev_hairpin_capability_get(uint16_t port_id,
>  				       struct rte_eth_hairpin_cap *cap);
>
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Enable device power management.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + *
> + * @return
> + *   0 on success
> + *   <0 on error
> + */
> +__rte_experimental
> +int rte_eth_dev_power_mgmt_enable(uint16_t port_id);
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
> + *
> + * Disable device power management.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + *
> + * @return
> + *   0 on success
> + *   <0 on error
> + */
> +__rte_experimental
> +int rte_eth_dev_power_mgmt_disable(uint16_t port_id);
> +
>  #include
>
>  /**
> @@ -4417,6 +4461,32 @@ rte_eth_rx_burst(uint16_t port_id, uint16_t queue_id,
>  		} while (cb != NULL);
>  	}
>  #endif
> +	if (dev->pwr_mgmt_state == RTE_ETH_DEV_POWER_MGMT_ENABLED) {
> +		if (unlikely(nb_rx == 0)) {
> +			dev->empty_poll_stats[queue_id].num++;
> +			if (unlikely(dev->empty_poll_stats[queue_id].num >
> +					ETH_EMPTYPOLL_MAX)) {
> +				volatile void *target_addr;
> +				uint64_t expected, mask;
> +				int ret;
> +
> +				/*
> +				 * get address of next descriptor in the RX
> +				 * ring for this queue, as well as expected
> +				 * value and a mask.
> +				 */
> +				ret = (*dev->dev_ops->next_rx_desc)
> +					(dev->data->rx_queues[queue_id],
> +					 &target_addr, &expected, &mask);

That will crash for every PMD that doesn't implement the next_rx_desc op,
since the call goes through a NULL function pointer.
One simple way to avoid it: check in rte_eth_dev_power_mgmt_enable() that
the PMD does implement ops->next_rx_desc.
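A minimal sketch of that check, using simplified stand-in types rather than
the real ethdev structures. All mock_* names are hypothetical and exist only
for illustration; only the idea of rejecting a missing op at enable time
comes from the comment above.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for eth_next_rx_desc_t. */
typedef int (*mock_next_rx_desc_t)(void *rxq, volatile void **tail_desc_addr,
		uint64_t *expected, uint64_t *mask);

/* Hypothetical stand-ins for struct eth_dev_ops and struct rte_eth_dev. */
struct mock_dev_ops {
	mock_next_rx_desc_t next_rx_desc; /* NULL when the PMD lacks the op */
};

struct mock_eth_dev {
	const struct mock_dev_ops *dev_ops;
	int pwr_mgmt_enabled;
};

/* Example op, so that a PMD which does support it can be modelled. */
static int
mock_next_rx_desc(void *rxq, volatile void **tail_desc_addr,
		uint64_t *expected, uint64_t *mask)
{
	(void)rxq;
	*tail_desc_addr = NULL;
	*expected = 0;
	*mask = 0;
	return 0;
}

/*
 * Refuse to enable power management when the op is missing, so that the
 * RX path never calls through a NULL function pointer.
 */
static int
mock_power_mgmt_enable(struct mock_eth_dev *dev)
{
	if (dev == NULL || dev->dev_ops == NULL)
		return -EINVAL;
	if (dev->dev_ops->next_rx_desc == NULL)
		return -ENOTSUP;

	dev->pwr_mgmt_enabled = 1;
	return 0;
}
```

With this shape the failure is reported once, at enable time, and the
per-burst fast path needs no extra NULL test.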
Though I don't think introducing such a new op is the best approach, as it
implies that the PMD has its HW RX descriptors mapped into WB-type memory,
and it dictates to the PMD what it should sleep on.
Depending on HW/SW capabilities and implementation, a PMD might choose to
sleep on something different (HW doorbell, SW cond var, etc.).

Another thing: I doubt it is a good idea to pollute the generic RX function
with power-specific code (again, as I said above, it probably wouldn't be
that generic for all possible PMDs).

From my perspective we have 2 alternatives to implement such functionality:
1. Keep rte_eth_dev_power_mgmt_enable/disable(port, queue) and move the
   actual *wait_on* code into the PMD RX implementations (we can probably
   still have some common logic about the allowed number of empty polls,
   max timeout to sleep, etc.).
2. Drop rte_eth_dev_power_mgmt_enable/disable and introduce an explicit
   rte_eth_dev_wait_for_packet(port, queue, timeout) API function.

In both cases the PMD will have full freedom to implement the
*wait_on_packet* functionality in the most convenient way.
For 2) the user would have to do some extra work himself (count the number
of consecutive empty polls, call the *wait_on_packet* function explicitly).
Though I think it can be easily hidden inside some wrapper API on top of
rte_eth_rx_burst()/rte_eth_dev_wait_for_packet(), something like
rte_eth_rx_burst_wait() or so.
We can have the logic about the allowed number of empty polls, and maybe
some other conditions, in that top-level function.
In that case changes in the user app will still be minimal.
On the other hand, 2) gives the user explicit control over where and when
to sleep, so from my perspective it seems more straightforward and flexible.
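To illustrate 2), a rough sketch of what such a wrapper could look like.
Everything here is an assumption for illustration: struct rx_wait_ctx, the
callback types and the fake_* helpers are hypothetical stand-ins for
rte_eth_rx_burst() and the proposed rte_eth_dev_wait_for_packet(), and the
threshold simply mirrors ETH_EMPTYPOLL_MAX from the patch.

```c
#include <stdint.h>

#define EMPTY_POLL_MAX 512 /* mirrors ETH_EMPTYPOLL_MAX in the patch */

/* Hypothetical callback types for the RX burst and wait functions. */
typedef uint16_t (*rx_burst_fn)(uint16_t port, uint16_t queue,
		void **pkts, uint16_t nb_pkts);
typedef int (*wait_for_packet_fn)(uint16_t port, uint16_t queue,
		uint64_t timeout);

/* Hypothetical per-queue state kept by the application or wrapper. */
struct rx_wait_ctx {
	rx_burst_fn rx_burst;
	wait_for_packet_fn wait_for_packet;
	uint32_t empty_polls; /* consecutive empty polls seen so far */
	uint64_t timeout;     /* passed through to wait_for_packet */
};

/* Fake helpers for trying the wrapper without a NIC. */
static unsigned int fake_wait_calls;

static uint16_t
fake_rx_empty(uint16_t port, uint16_t queue, void **pkts, uint16_t nb_pkts)
{
	(void)port; (void)queue; (void)pkts; (void)nb_pkts;
	return 0; /* always an empty poll */
}

static int
fake_wait(uint16_t port, uint16_t queue, uint64_t timeout)
{
	(void)port; (void)queue; (void)timeout;
	fake_wait_calls++;
	return 0;
}

/*
 * Poll once; after more than EMPTY_POLL_MAX consecutive empty polls,
 * let the driver-specific wait function decide how to sleep.
 */
static uint16_t
rx_burst_wait(struct rx_wait_ctx *ctx, uint16_t port, uint16_t queue,
		void **pkts, uint16_t nb_pkts)
{
	uint16_t nb_rx = ctx->rx_burst(port, queue, pkts, nb_pkts);

	if (nb_rx != 0) {
		ctx->empty_polls = 0;
		return nb_rx;
	}
	if (++ctx->empty_polls > EMPTY_POLL_MAX)
		ctx->wait_for_packet(port, queue, ctx->timeout);
	return 0;
}
```

The application would then call rx_burst_wait() in place of the plain burst
call, and the generic RX function stays free of power-specific code.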

> +				if (ret == 0)
> +					/* -1ULL is maximum value for TSC */
> +					rte_power_monitor(target_addr,
> +							  expected, mask,
> +							  0, -1ULL);
> +			}
> +		} else
> +			dev->empty_poll_stats[queue_id].num = 0;
> +	}
>
>  	rte_ethdev_trace_rx_burst(port_id, queue_id, (void **)rx_pkts, nb_rx);
>  	return nb_rx;
> diff --git a/lib/librte_ethdev/rte_ethdev_core.h b/lib/librte_ethdev/rte_ethdev_core.h
> index 32407dd418..4e23d465f0 100644
> --- a/lib/librte_ethdev/rte_ethdev_core.h
> +++ b/lib/librte_ethdev/rte_ethdev_core.h
> @@ -603,6 +603,27 @@ typedef int (*eth_tx_hairpin_queue_setup_t)
>  	 uint16_t nb_tx_desc,
>  	 const struct rte_eth_hairpin_conf *hairpin_conf);
>
> +/**
> + * @internal
> + * Get the next RX ring descriptor address.
> + *
> + * @param rxq
> + *   ethdev queue pointer.
> + * @param tail_desc_addr
> + *   the pointer point to descriptor address var.
> + *
> + * @return
> + *   Negative errno value on error, 0 on success.
> + *
> + * @retval 0
> + *   Success.
> + * @retval -EINVAL
> + *   Failed to get descriptor address.
> + */
> +typedef int (*eth_next_rx_desc_t)
> +	(void *rxq, volatile void **tail_desc_addr,
> +	 uint64_t *expected, uint64_t *mask);
> +
>  /**
>   * @internal A structure containing the functions exported by an Ethernet driver.
>   */
> @@ -752,6 +773,8 @@ struct eth_dev_ops {
>  	/**< Set up device RX hairpin queue. */
>  	eth_tx_hairpin_queue_setup_t tx_hairpin_queue_setup;
>  	/**< Set up device TX hairpin queue. */
> +	eth_next_rx_desc_t next_rx_desc;
> +	/**< Get next RX ring descriptor address. */
>  };
>
>  /**
> @@ -768,6 +791,14 @@ struct rte_eth_rxtx_callback {
>  	void *param;
>  };
>
> +/**
> + * @internal
> + * Structure used to hold counters for empty poll
> + */
> +struct rte_eth_ep_stat {
> +	uint64_t num;
> +} __rte_cache_aligned;
> +
>  /**
>   * @internal
>   * The generic data structure associated with each ethernet device.
> @@ -807,8 +838,14 @@ struct rte_eth_dev {
>  	enum rte_eth_dev_state state; /**< Flag indicating the port state */
>  	void *security_ctx; /**< Context for security ops */
>
> -	uint64_t reserved_64s[4]; /**< Reserved for future fields */
> -	void *reserved_ptrs[4];   /**< Reserved for future fields */
> +	/**< Empty poll number */
> +	enum rte_eth_dev_power_mgmt_state pwr_mgmt_state;
> +	uint32_t reserved_32;
> +	uint64_t reserved_64s[3]; /**< Reserved for future fields */
> +
> +	/**< Flag indicating the port power state */
> +	struct rte_eth_ep_stat *empty_poll_stats;
> +	void *reserved_ptrs[3];   /**< Reserved for future fields */
>  } __rte_cache_aligned;
>
>  struct rte_eth_dev_sriov;
> diff --git a/lib/librte_ethdev/rte_ethdev_version.map b/lib/librte_ethdev/rte_ethdev_version.map
> index 7155056045..141361823d 100644
> --- a/lib/librte_ethdev/rte_ethdev_version.map
> +++ b/lib/librte_ethdev/rte_ethdev_version.map
> @@ -241,4 +241,8 @@ EXPERIMENTAL {
>  	__rte_ethdev_trace_rx_burst;
>  	__rte_ethdev_trace_tx_burst;
>  	rte_flow_get_aged_flows;
> +
> +	# added in 20.08
> +	rte_eth_dev_power_mgmt_disable;
> +	rte_eth_dev_power_mgmt_enable;
>  };
> --
> 2.17.1