From mboxrd@z Thu Jan 1 00:00:00 1970
To: "Ananyev, Konstantin" , "dev@dpdk.org"
Cc: "Ma, Liang J" , "Hunt, David" , Ray Kinsella , Neil Horman ,
 "thomas@monjalon.net" , "McDaniel, Timothy" , "Richardson, Bruce" ,
 "Macnamara, Chris"
References: <7618b351d6f3be9eba0f6cc862440fb126f934d2.1610473000.git.anatoly.burakov@intel.com>
From: "Burakov, Anatoly"
Message-ID: <5044aa4a-ed24-df3e-b0b2-cf6e05d839a8@intel.com>
Date: Wed, 13 Jan 2021 17:29:04 +0000
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Subject: Re: [dpdk-dev] [PATCH v16 07/11] power: add PMD power management API
 and callback
List-Id: DPDK patches and discussions
Sender: "dev" <dev-bounces@dpdk.org>

On 13-Jan-21 12:58 PM, Ananyev, Konstantin wrote:
>
>
>> -----Original Message-----
>> From: Burakov, Anatoly
>> Sent: Tuesday, January 12, 2021 5:37 PM
>> To: dev@dpdk.org
>> Cc: Ma, Liang J ; Hunt, David ; Ray Kinsella ; Neil Horman ;
>> thomas@monjalon.net; Ananyev, Konstantin ; McDaniel, Timothy ;
>> Richardson, Bruce ; Macnamara, Chris
>> Subject: [PATCH v16 07/11] power: add PMD power management API and callback
>>
>> From: Liang Ma
>>
>> Add a simple on/off switch that will enable saving power when no
>> packets are arriving. It is based on counting the number of empty
>> polls and, when the number reaches a certain threshold, entering an
>> architecture-defined optimized power state that will either wait
>> until a TSC timestamp expires, or until packets arrive.
>>
>> This API mandates a core-to-single-queue mapping (that is, multiple
>> queues per device are supported, but they have to be polled on
>> different cores).
>>
>> This design uses PMD RX callbacks.
>>
>> 1. UMWAIT/UMONITOR:
>>
>> When a certain threshold of empty polls is reached, the core will go
>> into a power-optimized sleep while waiting on the address of the next
>> RX descriptor to be written to.
>>
>> 2. TPAUSE/Pause instruction
>>
>> This method uses the pause (or TPAUSE, if available) instruction to
>> avoid busy polling.
>>
>> 3. Frequency scaling
>>
>> Reuse the existing DPDK power library to scale core frequency up/down
>> depending on traffic volume.
>>
>> Signed-off-by: Liang Ma
>> Signed-off-by: Anatoly Burakov
>> ---
>>
>> Notes:
>>     v15:
>>     - Fix check in UMWAIT callback
>>
>>     v13:
>>     - Rework the synchronization mechanism to not require locking
>>     - Add more parameter checking
>>     - Rework n_rx_queues access to not go through internal PMD
>>       structures and use public API instead
>>
>>  doc/guides/prog_guide/power_man.rst    |  44 +++
>>  doc/guides/rel_notes/release_21_02.rst |  10 +
>>  lib/librte_power/meson.build           |   5 +-
>>  lib/librte_power/rte_power_pmd_mgmt.c  | 359 +++++++++++++++++++++++++
>>  lib/librte_power/rte_power_pmd_mgmt.h  |  90 +++++++
>>  lib/librte_power/version.map           |   5 +
>>  6 files changed, 511 insertions(+), 2 deletions(-)
>>  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
>>  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
>>
>
> ...
>
>> +
>> +static uint16_t
>> +clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>> +		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
>> +		void *addr __rte_unused)
>> +{
>> +	struct pmd_queue_cfg *q_conf;
>> +
>> +	q_conf = &port_cfg[port_id][qidx];
>> +
>> +	if (unlikely(nb_rx == 0)) {
>> +		q_conf->empty_poll_stats++;
>> +		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
>> +			struct rte_power_monitor_cond pmc;
>> +			uint16_t ret;
>> +
>> +			/*
>> +			 * we might get a cancellation request while being
>> +			 * inside the callback, in which case the wakeup
>> +			 * wouldn't work because it would've arrived too early.
>> +			 *
>> +			 * to get around this, we notify the other thread that
>> +			 * we're sleeping, so that it can spin until we're done.
>> +			 * unsolicited wakeups are perfectly safe.
>> +			 */
>> +			q_conf->umwait_in_progress = true;
>
> This write and the subsequent read can be reordered by the CPU.
> I think you need rte_atomic_thread_fence(__ATOMIC_SEQ_CST) here and
> in the disable() code-path below.
>
>> +
>> +			/* check if we need to cancel sleep */
>> +			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
>> +				/* use monitoring condition to sleep */
>> +				ret = rte_eth_get_monitor_addr(port_id, qidx,
>> +						&pmc);
>> +				if (ret == 0)
>> +					rte_power_monitor(&pmc, -1ULL);
>> +			}
>> +			q_conf->umwait_in_progress = false;
>> +		}
>> +	} else
>> +		q_conf->empty_poll_stats = 0;
>> +
>> +	return nb_rx;
>> +}
>> +
>
> ...
>
>> +
>> +int
>> +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
>> +		uint16_t port_id, uint16_t queue_id)
>> +{
>> +	struct pmd_queue_cfg *queue_cfg;
>> +
>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>> +
>> +	if (lcore_id >= RTE_MAX_LCORE || queue_id >= RTE_MAX_QUEUES_PER_PORT)
>> +		return -EINVAL;
>> +
>> +	/* no need to check queue id as wrong queue id would not be enabled */
>> +	queue_cfg = &port_cfg[port_id][queue_id];
>> +
>> +	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
>> +		return -EINVAL;
>> +
>> +	/* let the callback know we're shutting down */
>> +	queue_cfg->pwr_mgmt_state = PMD_MGMT_BUSY;
>
> Same as above - the write to pwr_mgmt_state and the read from
> umwait_in_progress could be reordered by the CPU.
> Need to insert rte_atomic_thread_fence(__ATOMIC_SEQ_CST) between them.
>
> BTW, out of curiosity - why do you need this intermediate
> state (PMD_MGMT_BUSY) at all?
> Why not directly:
> 	queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
> ?

Thanks for the suggestions, I'll add those.

The goal of the "intermediate" step is to prevent the Rx callback from
sleeping in the first place. We can't "wake up" the callback earlier than
it goes to sleep, but we may get a request to disable power management
while we're at the beginning of the callback and haven't yet entered the
rte_power_monitor code.
In this case, setting it to "BUSY" will prevent the callback from ever
sleeping in the first place (see the check at rte_power_pmd_mgmt.c:108),
and will unset "umwait in progress" if there was any.

So, we have three situations to handle:

1) wake up during umwait
2) "wake up" during the callback, after we've set the "umwait in
   progress" flag but before the actual umwait happens - we don't want
   to exit before we're sure there's nothing sleeping there
3) "wake up" during the callback, before we set the "umwait in progress"
   flag

1) is handled by the rte_power_monitor_wakeup() call, so that's taken
care of. 2) is handled by the other thread waiting on "umwait in
progress" becoming false. 3) is handled by having the BUSY check in the
umwait thread.

Hope that made sense!

-- 
Thanks,
Anatoly