Subject: Re: [dpdk-dev] [PATCH v3 6/7] power: support monitoring multiple Rx queues
From: "Burakov, Anatoly"
To: "Ananyev, Konstantin", dev@dpdk.org, "Hunt, David"
Cc: "Loftus, Ciara"
Date: Mon, 28 Jun 2021 15:09:03 +0100

On 28-Jun-21 2:29 PM, Ananyev, Konstantin wrote:
>
>
>> Use the new multi-monitor intrinsic to allow monitoring multiple ethdev
>> Rx queues while entering the energy efficient power state. The multi
>> version will be used unconditionally if supported, and the UMWAIT one
>> will only be used when multi-monitor is not supported by the hardware.
>>
>> Signed-off-by: Anatoly Burakov
>> ---
>>  doc/guides/prog_guide/power_man.rst |  9 ++--
>>  lib/power/rte_power_pmd_mgmt.c      | 76 ++++++++++++++++++++++++++++-
>>  2 files changed, 80 insertions(+), 5 deletions(-)
>>
>> diff --git a/doc/guides/prog_guide/power_man.rst b/doc/guides/prog_guide/power_man.rst
>> index fac2c19516..3245a5ebed 100644
>> --- a/doc/guides/prog_guide/power_man.rst
>> +++ b/doc/guides/prog_guide/power_man.rst
>> @@ -221,13 +221,16 @@ power saving whenever empty poll count reaches a certain number.
>>  The "monitor" mode is only supported in the following configurations and scenarios:
>>
>>  * If ``rte_cpu_get_intrinsics_support()`` function indicates that
>> +  ``rte_power_monitor_multi()`` function is supported by the platform, then
>> +  monitoring multiple Ethernet Rx queues for traffic will be supported.
>> +
>> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that only
>>    ``rte_power_monitor()`` is supported by the platform, then monitoring will be
>>    limited to a mapping of 1 core 1 queue (thus, each Rx queue will have to be
>>    monitored from a different lcore).
>>
>> -* If ``rte_cpu_get_intrinsics_support()`` function indicates that the
>> -  ``rte_power_monitor()`` function is not supported, then monitor mode will not
>> -  be supported.
>> +* If ``rte_cpu_get_intrinsics_support()`` function indicates that neither of the
>> +  two monitoring functions are supported, then monitor mode will not be supported.
>>
>>  * Not all Ethernet devices support monitoring, even if the underlying
>>    platform may support the necessary CPU instructions. Please refer to
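As a side note for anyone following the thread: with the above doc change, the
application-side capability check would look roughly like the sketch below.
The 'power_monitor_multi' flag is the one added to struct rte_cpu_intrinsics
earlier in this series, so treat the exact spelling as tentative until the
series is merged:

	#include <rte_cpuflags.h>

	static void
	pick_monitor_strategy(void)
	{
		struct rte_cpu_intrinsics intr;

		rte_cpu_get_intrinsics_support(&intr);

		if (intr.power_monitor_multi) {
			/* multiple Rx queues can be monitored from one lcore */
		} else if (intr.power_monitor) {
			/* monitoring is limited to a 1 core 1 queue mapping */
		} else {
			/* monitor mode is not supported on this platform */
		}
	}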
>> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
>> index 7762cd39b8..aab2d4f1ee 100644
>> --- a/lib/power/rte_power_pmd_mgmt.c
>> +++ b/lib/power/rte_power_pmd_mgmt.c
>> @@ -155,6 +155,24 @@ queue_list_remove(struct pmd_core_cfg *cfg, const union queue *q)
>>  	return 0;
>>  }
>>
>> +static inline int
>> +get_monitor_addresses(struct pmd_core_cfg *cfg,
>> +		struct rte_power_monitor_cond *pmc)
>> +{
>> +	const struct queue_list_entry *qle;
>> +	size_t i = 0;
>> +	int ret;
>> +
>> +	TAILQ_FOREACH(qle, &cfg->head, next) {
>> +		struct rte_power_monitor_cond *cur = &pmc[i];
>
> Looks like you never increment 'i' value inside that function.
> Also it probably will be safer to add 'num' parameter to check that
> we will never over-run pmc[] boundaries.

Will fix in v4, good catch!
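For the archives, the fix will look roughly like the below (a sketch only -
the 'num' parameter name and the exact error code are placeholders until v4
is actually out):

	static inline int
	get_monitor_addresses(struct pmd_core_cfg *cfg,
			struct rte_power_monitor_cond *pmc, size_t num)
	{
		const struct queue_list_entry *qle;
		size_t i = 0;
		int ret;

		TAILQ_FOREACH(qle, &cfg->head, next) {
			const union queue *q = &qle->queue;
			struct rte_power_monitor_cond *cur;

			/* never overrun the pmc[] boundaries */
			if (i >= num)
				return -ERANGE;

			/* increment the index for every queue we visit */
			cur = &pmc[i++];
			ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
			if (ret < 0)
				return ret;
		}
		return 0;
	}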
>
>> +		const union queue *q = &qle->queue;
>> +		ret = rte_eth_get_monitor_addr(q->portid, q->qid, cur);
>> +		if (ret < 0)
>> +			return ret;
>> +	}
>> +	return 0;
>> +}
>> +
>>  static void
>>  calc_tsc(void)
>>  {
>> @@ -183,6 +201,48 @@ calc_tsc(void)
>>  	}
>>  }
>>
>> +static uint16_t
>> +clb_multiwait(uint16_t port_id, uint16_t qidx,
>> +		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
>> +		uint16_t max_pkts __rte_unused, void *addr __rte_unused)
>> +{
>> +	const unsigned int lcore = rte_lcore_id();
>> +	const union queue q = {.portid = port_id, .qid = qidx};
>> +	const bool empty = nb_rx == 0;
>> +	struct pmd_core_cfg *q_conf;
>> +
>> +	q_conf = &lcore_cfg[lcore];
>> +
>> +	/* early exit */
>> +	if (likely(!empty)) {
>> +		q_conf->empty_poll_stats = 0;
>> +	} else {
>> +		/* do we care about this particular queue? */
>> +		if (!queue_is_power_save(q_conf, &q))
>> +			return nb_rx;
>
> I still don't understand the need of 'special' power_save queue here...
> Why we can't just have a function:
>
> get_number_of_queues_whose_sequential_empty_polls_less_then_threshold(struct pmd_core_cfg *lcore_cfg),
> and then just:
>
> /* all queues have at least EMPTYPOLL_MAX sequential empty polls */
> if (get_number_of_queues_whose_sequential_empty_polls_less_then_threshold(q_conf) == 0) {
> 	/* go into power-save mode here */
> }

Okay, let's go through this step by step :)

Let's suppose we have three queues - q0, q1 and q2. We want to sleep
whenever there's no traffic on *all of them*; however, we cannot know
that until we have checked all of them.

So, let's suppose that q0, q1 and q2 were empty all this time, but now
some traffic arrived at q2 while we're still checking q0. We see that
q0 is empty, and all of the queues were empty for the last N polls, so
we think it is safe to sleep at q0, despite the fact that traffic has
just arrived at q2.

This is not an issue with MONITOR mode, because there we are able to
see whether the current Rx ring descriptor is busy via the NIC
callback, *but this is not possible* with PAUSE and SCALE modes,
because they don't have the sneaky lookahead function of MONITOR!

So, with PAUSE and SCALE modes, it is possible to end up in a situation
where you *think* you don't have any traffic, but you actually do, you
just haven't checked the relevant queue yet. To prevent this from
happening, we do not sleep on every queue; instead, we sleep *once* per
loop. That is, we check q0, check q1, check q2, and only then decide
whether we want to sleep or not.

Of course, with such a scheme it is still possible to e.g. sleep at q2
while there's traffic waiting in q0, but the worst case is less bad
with this scheme, because we'll do at most 1 extra sleep. Whereas with
what you're suggesting, if we had e.g. 10 queues to poll, and we
checked q1 just as traffic arrived at q0, we'd sleep at q1, then at q2,
then at q3, then at q4, then at q5... and 9 sleeps later we'd finally
reach q0 and find out, after all this time, that we shouldn't have
slept in the first place.

Hopefully you get the point now :) So, the idea here is: for any N
queues, sleep only once, not N times.

>
>> +
>> +		/*
>> +		 * we can increment unconditionally here because if there were
>> +		 * non-empty polls in other queues assigned to this core, we
>> +		 * dropped the counter to zero anyway.
>> +		 */
>> +		q_conf->empty_poll_stats++;
>> +		if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
>> +			struct rte_power_monitor_cond pmc[RTE_MAX_ETHPORTS];
>
> I think you need here:
> struct rte_power_monitor_cond pmc[q_conf->n_queues];

I think VLAs are generally agreed to be unsafe, so I'm avoiding them here.

>
>
>> +			uint16_t ret;
>> +
>> +			/* gather all monitoring conditions */
>> +			ret = get_monitor_addresses(q_conf, pmc);
>> +
>> +			if (ret == 0)
>> +				rte_power_monitor_multi(pmc,
>> +						q_conf->n_queues, UINT64_MAX);
>> +		}
>> +	}
>> +
>> +	return nb_rx;
>> +}
>> +
>>  static uint16_t
>>  clb_umwait(uint16_t port_id, uint16_t qidx, struct rte_mbuf **pkts __rte_unused,
>>  		uint16_t nb_rx, uint16_t max_pkts __rte_unused,
>> @@ -348,14 +408,19 @@ static int
>>  check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
>>  {
>>  	struct rte_power_monitor_cond dummy;
>> +	bool multimonitor_supported;
>>
>>  	/* check if rte_power_monitor is supported */
>>  	if (!global_data.intrinsics_support.power_monitor) {
>>  		RTE_LOG(DEBUG, POWER, "Monitoring intrinsics are not supported\n");
>>  		return -ENOTSUP;
>>  	}
>> +	/* check if multi-monitor is supported */
>> +	multimonitor_supported =
>> +			global_data.intrinsics_support.power_monitor_multi;
>>
>> -	if (cfg->n_queues > 0) {
>> +	/* if we're adding a new queue, do we support multiple queues? */
>> +	if (cfg->n_queues > 0 && !multimonitor_supported) {
>>  		RTE_LOG(DEBUG, POWER, "Monitoring multiple queues is not supported\n");
>>  		return -ENOTSUP;
>>  	}
>> @@ -371,6 +436,13 @@ check_monitor(struct pmd_core_cfg *cfg, const union queue *qdata)
>>  	return 0;
>>  }
>>
>> +static inline rte_rx_callback_fn
>> +get_monitor_callback(void)
>> +{
>> +	return global_data.intrinsics_support.power_monitor_multi ?
>> +			clb_multiwait : clb_umwait;
>> +}
>> +
>>  int
>>  rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>>  		uint16_t queue_id, enum rte_power_pmd_mgmt_type mode)
>> @@ -434,7 +506,7 @@ rte_power_ethdev_pmgmt_queue_enable(unsigned int lcore_id, uint16_t port_id,
>>  		if (ret < 0)
>>  			goto end;
>>
>> -		clb = clb_umwait;
>> +		clb = get_monitor_callback();
>>  		break;
>>  	case RTE_POWER_MGMT_TYPE_SCALE:
>>  		/* check if we can add a new queue */
>> --
>> 2.25.1
>

--
Thanks,
Anatoly