From: "Burakov, Anatoly"
To: "Ananyev, Konstantin", "dev@dpdk.org"
Cc: "Ma, Liang J", "Hunt, David", Ray Kinsella, Neil Horman, "thomas@monjalon.net", "McDaniel, Timothy", "Richardson, Bruce", "Macnamara, Chris"
Date: Thu, 14 Jan 2021 13:00:07 +0000
Subject: Re:
[dpdk-dev] [PATCH v16 07/11] power: add PMD power management API and callback

On 13-Jan-21 5:29 PM, Burakov, Anatoly wrote:
> On 13-Jan-21 12:58 PM, Ananyev, Konstantin wrote:
>>
>>> -----Original Message-----
>>> From: Burakov, Anatoly
>>> Sent: Tuesday, January 12, 2021 5:37 PM
>>> To: dev@dpdk.org
>>> Cc: Ma, Liang J; Hunt, David; Ray Kinsella; Neil Horman;
>>> thomas@monjalon.net; Ananyev, Konstantin; McDaniel, Timothy;
>>> Richardson, Bruce; Macnamara, Chris
>>> Subject: [PATCH v16 07/11] power: add PMD power management API and
>>> callback
>>>
>>> From: Liang Ma
>>>
>>> Add a simple on/off switch that will enable saving power when no
>>> packets are arriving. It is based on counting the number of empty
>>> polls and, when the number reaches a certain threshold, entering an
>>> architecture-defined optimized power state that will either wait
>>> until a TSC timestamp expires, or until packets arrive.
>>>
>>> This API mandates a core-to-single-queue mapping (that is, multiple
>>> queues per device are supported, but they have to be polled on
>>> different cores).
>>>
>>> This design uses PMD RX callbacks.
>>>
>>> 1. UMWAIT/UMONITOR:
>>>
>>>    When a certain threshold of empty polls is reached, the core will
>>>    go into a power-optimized sleep while waiting on the address of the
>>>    next RX descriptor to be written to.
>>>
>>> 2. TPAUSE/Pause instruction:
>>>
>>>    This method uses the PAUSE (or TPAUSE, if available) instruction
>>>    to avoid busy polling.
>>>
>>> 3. Frequency scaling:
>>>
>>>    Reuse the existing DPDK power library to scale the core frequency
>>>    up/down depending on traffic volume.
>>>
>>> Signed-off-by: Liang Ma
>>> Signed-off-by: Anatoly Burakov
>>> ---
>>>
>>> Notes:
>>>     v15:
>>>     - Fix check in UMWAIT callback
>>>
>>>     v13:
>>>     - Rework the synchronization mechanism to not require locking
>>>     - Add more parameter checking
>>>     - Rework n_rx_queues access to not go through internal PMD
>>>       structures and use public API instead
>>>
>>>  doc/guides/prog_guide/power_man.rst    |  44 +++
>>>  doc/guides/rel_notes/release_21_02.rst |  10 +
>>>  lib/librte_power/meson.build           |   5 +-
>>>  lib/librte_power/rte_power_pmd_mgmt.c  | 359 +++++++++++++++++++++++++
>>>  lib/librte_power/rte_power_pmd_mgmt.h  |  90 +++++++
>>>  lib/librte_power/version.map           |   5 +
>>>  6 files changed, 511 insertions(+), 2 deletions(-)
>>>  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.c
>>>  create mode 100644 lib/librte_power/rte_power_pmd_mgmt.h
>>>
>>
>> ...
>>
>>> +
>>> +static uint16_t
>>> +clb_umwait(uint16_t port_id, uint16_t qidx,
>>> +		struct rte_mbuf **pkts __rte_unused,
>>> +		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
>>> +		void *addr __rte_unused)
>>> +{
>>> +	struct pmd_queue_cfg *q_conf;
>>> +
>>> +	q_conf = &port_cfg[port_id][qidx];
>>> +
>>> +	if (unlikely(nb_rx == 0)) {
>>> +		q_conf->empty_poll_stats++;
>>> +		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
>>> +			struct rte_power_monitor_cond pmc;
>>> +			uint16_t ret;
>>> +
>>> +			/*
>>> +			 * we might get a cancellation request while being
>>> +			 * inside the callback, in which case the wakeup
>>> +			 * wouldn't work because it would've arrived too
>>> +			 * early.
>>> +			 *
>>> +			 * to get around this, we notify the other thread
>>> +			 * that we're sleeping, so that it can spin until
>>> +			 * we're done.
>>> +			 * unsolicited wakeups are perfectly safe.
>>> +			 */
>>> +			q_conf->umwait_in_progress = true;
>>
>> This write and the subsequent read can be reordered by the CPU.
>> I think you need rte_atomic_thread_fence(__ATOMIC_SEQ_CST) here and
>> in the disable() code path below.
>>
>>> +
>>> +			/* check if we need to cancel sleep */
>>> +			if (q_conf->pwr_mgmt_state == PMD_MGMT_ENABLED) {
>>> +				/* use monitoring condition to sleep */
>>> +				ret = rte_eth_get_monitor_addr(port_id,
>>> +						qidx, &pmc);
>>> +				if (ret == 0)
>>> +					rte_power_monitor(&pmc, -1ULL);
>>> +			}
>>> +			q_conf->umwait_in_progress = false;
>>> +		}
>>> +	} else
>>> +		q_conf->empty_poll_stats = 0;
>>> +
>>> +	return nb_rx;
>>> +}
>>> +
>>
>> ...
>>
>>> +
>>> +int
>>> +rte_power_pmd_mgmt_queue_disable(unsigned int lcore_id,
>>> +		uint16_t port_id, uint16_t queue_id)
>>> +{
>>> +	struct pmd_queue_cfg *queue_cfg;
>>> +
>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -EINVAL);
>>> +
>>> +	if (lcore_id >= RTE_MAX_LCORE ||
>>> +			queue_id >= RTE_MAX_QUEUES_PER_PORT)
>>> +		return -EINVAL;
>>> +
>>> +	/* no need to check queue id as wrong queue id would not be enabled */
>>> +	queue_cfg = &port_cfg[port_id][queue_id];
>>> +
>>> +	if (queue_cfg->pwr_mgmt_state != PMD_MGMT_ENABLED)
>>> +		return -EINVAL;
>>> +
>>> +	/* let the callback know we're shutting down */
>>> +	queue_cfg->pwr_mgmt_state = PMD_MGMT_BUSY;
>>
>> Same as above - the write to pwr_mgmt_state and the read from
>> umwait_in_progress could be reordered by the CPU.
>> Need to insert rte_atomic_thread_fence(__ATOMIC_SEQ_CST) between them.
>>
>> BTW, out of curiosity - why do you need this intermediate
>> state (PMD_MGMT_BUSY) at all?
>> Why not directly:
>> queue_cfg->pwr_mgmt_state = PMD_MGMT_DISABLED;
>> ?
>>
>
> Thanks for the suggestions, I'll add those.
>
> The goal of the "intermediate" step is to prevent the Rx callback from
> sleeping in the first place.
> We can't "wake up" the callback earlier than it goes to sleep, but we
> may get a request to disable power management while we're at the
> beginning of the callback and haven't yet entered the
> rte_power_monitor code.
>
> In this case, setting the state to "BUSY" will prevent the callback
> from ever sleeping in the first place (see the rte_power_pmd_mgmt:108
> check), and will unset the "umwait in progress" flag if there was any.
>
> So, we have three situations to handle:
>
> 1) wake up during umwait
> 2) "wake up" during the callback after we've set the "umwait in
>    progress" flag but before the actual umwait happens - we don't want
>    to exit before we're sure there's nothing sleeping there
> 3) "wake up" during the callback before we set the "umwait in
>    progress" flag
>
> 1) is handled by the rte_power_monitor_wakeup() call, so that's taken
> care of. 2) is handled by the other thread waiting on "umwait in
> progress" becoming false. 3) is handled by having the BUSY check in
> the umwait thread.
>
> Hope that made sense!
>

On further thought, the "BUSY" thing relies on a hidden assumption that
enabling/disabling power management per queue is supposed to be thread
safe. If we let go of this assumption, we can get by with just
enabled/disabled states, so I think I'll just document the thread safety
requirements and leave out the "BUSY" part.

--
Thanks,
Anatoly