From: "Burakov, Anatoly"
To: "Ananyev, Konstantin", "dev@dpdk.org", "Hunt, David", Ray Kinsella, Neil Horman
Cc: "Loftus, Ciara"
Date: Wed, 23 Jun 2021 10:36:09 +0100
Subject: Re: [dpdk-dev] [PATCH v1 5/7] power: support callbacks for multiple Rx queues

On 22-Jun-21 10:41 AM, Ananyev, Konstantin wrote:
>
>> Currently, there is a hard limitation on the PMD power management
>> support that only allows it to support a single queue per lcore. This is
>> not ideal, as most DPDK use cases will poll multiple queues per core.
>>
>> The PMD power management mechanism relies on ethdev Rx callbacks, so it
>> is very difficult to implement such support because callbacks are
>> effectively stateless and have no visibility into what the other ethdev
>> devices are doing. This places limitations on what we can do within the
>> framework of Rx callbacks, but the basics of this implementation are as
>> follows:
>>
>> - Replace per-queue structures with per-lcore ones, so that any device
>>   polled from the same lcore can share data
>> - Any queue that is going to be polled from a specific lcore has to be
>>   added to the list of queues to poll, so that the callback is aware of
>>   other queues being polled by the same lcore
>> - Both the empty poll counter and the actual power saving mechanism are
>>   shared between all queues polled on a particular lcore, and are only
>>   activated when a specially designated "power saving" queue is polled. To
>>   put it another way, we have no idea which queue the user will poll in
>>   what order, so we rely on them telling us that queue X is the last one
>>   in the polling loop, so any power management should happen there.
>> - A new API is added to mark a specific Rx queue as "power saving".
>
> Honestly, I don't understand the logic behind that new function.
> I understand that depending on HW we can monitor either one or multiple queues.
> That's ok, but why do we now need to mark one queue as a 'very special' one?

Because we don't know which of the queues we are supposed to sleep on.

Imagine a situation where you have 3 queues. What usually happens is that you
poll them in a loop, so q0, q1, q2, q0, q1, q2... etc. We only want to enter
the power-optimized state after polling q2, because otherwise we risk going
into the power-optimized state while q1 or q2 still have traffic.

Worst case scenario: we enter sleep after polling q0, then traffic arrives at
q2, we wake up, and then attempt to go to sleep on q1 instead of skipping it.
Essentially, we would be attempting to sleep at every queue, instead of once
per loop. This *might* be OK for multi-monitor, because we would abort the
sleep when the sleep condition check fails, but for modes like the
rte_pause()/rte_power_pause()-based sleep we would enter sleep unconditionally,
and would risk sleeping at q1 while there is traffic at q2.

So, we need this mechanism to be activated once every *loop*, not per queue.
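To illustrate, here is a rough sketch of the intended usage. The queue enable
call is the existing rte_power_ethdev_pmgmt_queue_enable() API; the "power
save" marking call is just a placeholder name for the new API from this patch,
not necessarily its final name or signature:

#include <stdbool.h>
#include <rte_ethdev.h>
#include <rte_lcore.h>
#include <rte_power_pmd_mgmt.h>

#define NUM_QUEUES 3
#define BURST_SIZE 32

static volatile bool stop;

/* three Rx queues of port 0, all polled from the same lcore */
static int
poll_loop(void *arg __rte_unused)
{
	const uint16_t port = 0;
	const unsigned int lcore = rte_lcore_id();
	struct rte_mbuf *pkts[BURST_SIZE];
	uint16_t q;

	/* every queue polled from this lcore has to be registered */
	for (q = 0; q < NUM_QUEUES; q++)
		rte_power_ethdev_pmgmt_queue_enable(lcore, port, q,
				RTE_POWER_MGMT_TYPE_PAUSE);

	/* placeholder name: mark q2 as the "power saving" queue,
	 * because it is the last one polled in the loop below */
	rte_power_ethdev_pmgmt_queue_set_power_save(lcore, port,
			NUM_QUEUES - 1);

	while (!stop) {
		/* sleep can only be triggered from q2's callback, i.e.
		 * once per loop iteration, after all queues were seen
		 * to be empty */
		for (q = 0; q < NUM_QUEUES; q++) {
			uint16_t nb_rx = rte_eth_rx_burst(port, q, pkts,
					BURST_SIZE);
			/* ... process and free nb_rx packets ... */
			RTE_SET_USED(nb_rx);
		}
	}
	return 0;
}

With only one queue per lcore the marking step is redundant, which is why
existing single-queue users keep working without calling the new API.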
> Why can't rte_power_ethdev_pmgmt_queue_enable() just:
> check if the number of monitored queues exceeds HW/SW capabilities,
> and if so then just return a failure.
> Otherwise add the queue to the list and treat them all equally, i.e.:
> go to power save mode when the number of sequential empty polls on
> all monitored queues exceeds the EMPTYPOLL_MAX threshold?
>
>> Failing to call this API will result in no power management; however,
>> when having only one queue per core it is obvious which queue is the
>> "power saving" one, so things will still work without this new API for
>> use cases that were previously working without it.
>> - The limitation on UMWAIT-based polling is not removed because UMWAIT
>>   is incapable of monitoring more than one address.
>>
>> Signed-off-by: Anatoly Burakov
>> ---
>>  lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
>>  lib/power/rte_power_pmd_mgmt.h |  34 ++++
>>  lib/power/version.map          |   3 +
>>  3 files changed, 306 insertions(+), 66 deletions(-)
>>
>> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
>> index 0707c60a4f..60dd21a19c 100644
>> --- a/lib/power/rte_power_pmd_mgmt.c
>> +++ b/lib/power/rte_power_pmd_mgmt.c
>> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
>>  	PMD_MGMT_ENABLED
>>  };
>>
>> -struct pmd_queue_cfg {
>> +struct queue {
>> +	uint16_t portid;
>> +	uint16_t qid;
>> +};
>
> Just a thought: if that would help somehow, it can be changed to:
>
> union queue {
> 	uint32_t raw;
> 	struct {
> 		uint16_t portid, qid;
> 	};
> };
>
> That way, in the queue find/cmp functions below, you can operate on single raw 32-bit values.
> Probably not that important, as all these functions are on the slow path, but it might look nicer.

Sure, that can work. We actually do comparisons with the power save queue on
the fast path, so maybe that'll help.
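Something along these lines, I guess (just a sketch of your suggestion, not
tested):

#include <stdbool.h>
#include <stdint.h>

union queue {
	uint32_t raw;
	struct {
		uint16_t portid;
		uint16_t qid;
	};
};

static inline bool
queue_equal(const union queue *l, const union queue *r)
{
	/* one 32-bit compare instead of two 16-bit compares */
	return l->raw == r->raw;
}

The power-save queue check on the hot path then becomes a single 32-bit
compare as well.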
>
>> +struct pmd_core_cfg {
>> +	struct queue queues[RTE_MAX_ETHPORTS];
>
> If we'll have the ability to monitor multiple queues per lcore, would that always be enough?
> On the other hand, it is updated on the control path only.
> Wouldn't a normal list with malloc()/rte_malloc() be more suitable here?

You're right, it should be dynamically allocated.

>
>> +	/**< Which port-queue pairs are associated with this lcore? */
>> +	struct queue power_save_queue;
>> +	/**< When polling multiple queues, all but this one will be ignored */
>> +	bool power_save_queue_set;
>> +	/**< When polling multiple queues, power save queue must be set */
>> +	size_t n_queues;
>> +	/**< How many queues are in the list? */
>>  	volatile enum pmd_mgmt_state pwr_mgmt_state;
>>  	/**< State of power management for this queue */
>>  	enum rte_power_pmd_mgmt_type cb_mode;
>> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
>>  	uint64_t empty_poll_stats;
>>  	/**< Number of empty polls */
>>  } __rte_cache_aligned;
>> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
>>
>> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
>> +static inline bool
>> +queue_equal(const struct queue *l, const struct queue *r)
>> +{
>> +	return l->portid == r->portid && l->qid == r->qid;
>> +}
>> +
>> +static inline void
>> +queue_copy(struct queue *dst, const struct queue *src)
>> +{
>> +	dst->portid = src->portid;
>> +	dst->qid = src->qid;
>> +}
>> +
>> +static inline bool
>> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {
>
> Here and in other places - any reason why the standard DPDK coding style is not used?

Just accidental :)

>
>> +	const struct queue *pwrsave = &cfg->power_save_queue;
>> +
>> +	/* if there's only single queue, no need to check anything */
>> +	if (cfg->n_queues == 1)
>> +		return true;
>> +	return cfg->power_save_queue_set && queue_equal(q, pwrsave);
>> +}
>> +

-- 
Thanks,
Anatoly