From: "Burakov, Anatoly"
To: "Ananyev, Konstantin", "dev@dpdk.org", "Hunt, David", Ray Kinsella, Neil Horman
Cc: "Loftus, Ciara"
Date: Wed, 23 Jun 2021 10:56:49 +0100
Message-ID: <10918197-9793-0c46-4021-34145092b868@intel.com>
Subject: Re: [dpdk-dev] [PATCH v1 5/7] power: support callbacks for multiple Rx queues

On 23-Jun-21 10:49 AM, Ananyev, Konstantin wrote:
>
>>
>> On 22-Jun-21 10:41 AM, Ananyev, Konstantin wrote:
>>>
>>>> Currently, there is a hard limitation on the PMD power management
>>>> support that only allows it to support a single queue per lcore. This
>>>> is not ideal, as most DPDK use cases will poll multiple queues per
>>>> core.
>>>>
>>>> The PMD power management mechanism relies on ethdev Rx callbacks, so
>>>> it is very difficult to implement such support because callbacks are
>>>> effectively stateless and have no visibility into what the other
>>>> ethdev devices are doing. This places limitations on what we can do
>>>> within the framework of Rx callbacks, but the basics of this
>>>> implementation are as follows:
>>>>
>>>> - Replace per-queue structures with per-lcore ones, so that any device
>>>>   polled from the same lcore can share data
>>>> - Any queue that is going to be polled from a specific lcore has to be
>>>>   added to the list of queues to poll, so that the callback is aware
>>>>   of other queues being polled by the same lcore
>>>> - Both the empty poll counter and the actual power saving mechanism
>>>>   are shared between all queues polled on a particular lcore, and are
>>>>   only activated when a specially designated "power saving" queue is
>>>>   polled. To put it another way, we have no idea which queue the user
>>>>   will poll in what order, so we rely on them telling us that queue X
>>>>   is the last one in the polling loop, so any power management should
>>>>   happen there.
>>>> - A new API is added to mark a specific Rx queue as "power saving".
>>>
>>> Honestly, I don't understand the logic behind that new function.
>>> I understand that depending on HW we can monitor either one or
>>> multiple queues. That's OK, but why do we now need to mark one queue
>>> as a 'very special' one?
>>
>> Because we don't know which of the queues we are supposed to sleep on.
>>
>> Imagine a situation where you have 3 queues. What usually happens is
>> you poll them in a loop, so q0, q1, q2, q0, q1, q2... etc. We only want
>> to enter a power-optimized state on polling q2, because otherwise we're
>> risking going into a power-optimized state while q1 or q2 have traffic.
>
> That's why before going to sleep we need to make sure that for *all*
> queues we have at least EMPTYPOLL_MAX empty polls.
> Then the order of queue checking wouldn't matter.
> With your example it should be:
> if (q0.empty_polls > EMPTYPOLL_MAX && q1.empty_polls > EMPTYPOLL_MAX &&
>     q2.empty_polls > EMPTYPOLL_MAX)
>         goto_sleep;
>
> Don't get me wrong, I am not suggesting making *precisely* those checks
> in the actual code (it could be time-consuming if the number of checks
> is big), but the logic needs to remain.

The empty poll counter is *per core*, not *per queue*. All the shared
data is per core. We only increment the empty poll counter on the last
queue, but we drop it to 0 on any queue that has received traffic. That
way, we can avoid checking/incrementing empty poll counters for multiple
queues.

In other words, this is effectively achieving what you're suggesting,
but without per-queue checks. Of course, I could make it per-queue like
before, but then we just end up doing way more checks on every callback
and basically need to have the same logic anyway, so why bother?
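To make that concrete, the fast path boils down to something like this
(a simplified, untested sketch of the idea, not the literal patch code;
should_sleep() and the threshold value are made up for illustration):

#include <stdbool.h>
#include <stdint.h>

#define EMPTYPOLL_MAX 512 /* illustrative threshold, not the real value */

struct queue {
	uint16_t portid;
	uint16_t qid;
};

struct pmd_core_cfg {
	struct queue power_save_queue; /* the designated "last" queue */
	uint64_t empty_poll_stats;     /* shared by all queues on this lcore */
};

/*
 * Called from each queue's Rx callback with the number of packets just
 * received; returns true when it is safe to enter a power-optimized state.
 */
static inline bool
should_sleep(struct pmd_core_cfg *cfg, const struct queue *q, uint16_t nb_rx)
{
	if (nb_rx != 0) {
		/* traffic on *any* queue resets the shared per-lcore counter */
		cfg->empty_poll_stats = 0;
		return false;
	}
	/* empty polls are only counted at the designated power save queue */
	if (q->portid != cfg->power_save_queue.portid ||
			q->qid != cfg->power_save_queue.qid)
		return false;
	return ++cfg->empty_poll_stats > EMPTYPOLL_MAX;
}

So the per-queue checks collapse into one counter reset plus one
port/queue comparison per callback.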
>>
>> Worst case scenario, we enter sleep after polling q0, then traffic
>> arrives at q2, we wake up, and then attempt to go to sleep on q1
>> instead of skipping it. Essentially, we will be attempting to sleep at
>> every queue, instead of once per loop. This *might* be OK for
>> multi-monitor, because we'll be aborting sleep due to a sleep condition
>> check failure, but for modes like rte_pause()/rte_power_pause()-based
>> sleep, we will be entering sleep unconditionally, and will risk
>> sleeping at q1 while there's traffic at q2.
>>
>> So, we need this mechanism to be activated once every *loop*, not per
>> queue.
>>
>>> Why can't rte_power_ethdev_pmgmt_queue_enable() just:
>>> Check if the number of monitored queues exceeds HW/SW capabilities,
>>> and if so, then just return a failure.
>>> Otherwise, add the queue to the list and treat them all equally, i.e.:
>>> go to power save mode when the number of sequential empty polls on
>>> all monitored queues exceeds the EMPTYPOLL_MAX threshold?
>>>
>>>> Failing to call this API will result in no power management; however,
>>>> when having only one queue per core it is obvious which queue is the
>>>> "power saving" one, so things will still work without this new API
>>>> for use cases that were previously working without it.
>>>> - The limitation on UMWAIT-based polling is not removed, because
>>>>   UMWAIT is incapable of monitoring more than one address.
>>>>
>>>> Signed-off-by: Anatoly Burakov
>>>> ---
>>>>  lib/power/rte_power_pmd_mgmt.c | 335 ++++++++++++++++++++++++++-------
>>>>  lib/power/rte_power_pmd_mgmt.h |  34 ++++
>>>>  lib/power/version.map          |   3 +
>>>>  3 files changed, 306 insertions(+), 66 deletions(-)
>>>>
>>>> diff --git a/lib/power/rte_power_pmd_mgmt.c b/lib/power/rte_power_pmd_mgmt.c
>>>> index 0707c60a4f..60dd21a19c 100644
>>>> --- a/lib/power/rte_power_pmd_mgmt.c
>>>> +++ b/lib/power/rte_power_pmd_mgmt.c
>>>> @@ -33,7 +33,19 @@ enum pmd_mgmt_state {
>>>>  	PMD_MGMT_ENABLED
>>>>  };
>>>>
>>>> -struct pmd_queue_cfg {
>>>> +struct queue {
>>>> +	uint16_t portid;
>>>> +	uint16_t qid;
>>>> +};
>>>
>>> Just a thought: if that would help somehow, it can be changed to:
>>> union queue {
>>>     uint32_t raw;
>>>     struct { uint16_t portid, qid; };
>>> };
>>>
>>> That way, in the queue find/cmp functions below, you can operate on
>>> single raw 32-bit values. Probably not that important, as all these
>>> functions are on the slow path, but it might look nicer.
>>
>> Sure, that can work. We actually do comparisons with the power save
>> queue on the fast path, so maybe that'll help.
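E.g. the fast-path power save queue check would then become a single
32-bit compare, along these lines (untested sketch):

#include <stdbool.h>
#include <stdint.h>

union queue {
	uint32_t raw;
	struct {
		uint16_t portid;
		uint16_t qid;
	};
};

static inline bool
queue_equal(const union queue *l, const union queue *r)
{
	/* compare the port/queue pair as one 32-bit value */
	return l->raw == r->raw;
}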
>>
>>>
>>>> +struct pmd_core_cfg {
>>>> +	struct queue queues[RTE_MAX_ETHPORTS];
>>>
>>> If we'll have the ability to monitor multiple queues per lcore, would
>>> it always be enough?
>>> On the other hand, it is updated on the control path only.
>>> Wouldn't a normal list with malloc()/rte_malloc() be more suitable
>>> here?
>>
>> You're right, it should be dynamically allocated.
>>
>>>> +	/**< Which port-queue pairs are associated with this lcore? */
>>>> +	struct queue power_save_queue;
>>>> +	/**< When polling multiple queues, all but this one will be ignored */
>>>> +	bool power_save_queue_set;
>>>> +	/**< When polling multiple queues, power save queue must be set */
>>>> +	size_t n_queues;
>>>> +	/**< How many queues are in the list? */
>>>>  	volatile enum pmd_mgmt_state pwr_mgmt_state;
>>>>  	/**< State of power management for this queue */
>>>>  	enum rte_power_pmd_mgmt_type cb_mode;
>>>> @@ -43,8 +55,97 @@ struct pmd_queue_cfg {
>>>>  	uint64_t empty_poll_stats;
>>>>  	/**< Number of empty polls */
>>>>  } __rte_cache_aligned;
>>>> +static struct pmd_core_cfg lcore_cfg[RTE_MAX_LCORE];
>>>>
>>>> -static struct pmd_queue_cfg port_cfg[RTE_MAX_ETHPORTS][RTE_MAX_QUEUES_PER_PORT];
>>>> +static inline bool
>>>> +queue_equal(const struct queue *l, const struct queue *r)
>>>> +{
>>>> +	return l->portid == r->portid && l->qid == r->qid;
>>>> +}
>>>> +
>>>> +static inline void
>>>> +queue_copy(struct queue *dst, const struct queue *src)
>>>> +{
>>>> +	dst->portid = src->portid;
>>>> +	dst->qid = src->qid;
>>>> +}
>>>> +
>>>> +static inline bool
>>>> +queue_is_power_save(const struct pmd_core_cfg *cfg, const struct queue *q) {
>>>
>>> Here and in other places - any reason why the standard DPDK coding
>>> style is not used?
>>
>> Just accidental :)
>>
>>>> +	const struct queue *pwrsave = &cfg->power_save_queue;
>>>> +
>>>> +	/* if there's only single queue, no need to check anything */
>>>> +	if (cfg->n_queues == 1)
>>>> +		return true;
>>>> +	return cfg->power_save_queue_set && queue_equal(q, pwrsave);
>>>> +}
>>>> +

--
Thanks,
Anatoly