Subject: Re: [dpdk-dev] [PATCH v3 6/7] power: support monitoring multiple Rx queues
From: "Burakov, Anatoly"
To: "Ananyev, Konstantin", "dev@dpdk.org", "Hunt, David"
Cc: "Loftus, Ciara"
Date: Tue, 29 Jun 2021 12:05:08 +0100

On 29-Jun-21 1:07 AM, Ananyev, Konstantin wrote:
>>>> +static uint16_t
>>>> +clb_multiwait(uint16_t port_id, uint16_t qidx,
>>>> +		struct rte_mbuf **pkts __rte_unused, uint16_t nb_rx,
>>>> +		uint16_t max_pkts __rte_unused, void *addr __rte_unused)
>>>> +{
>>>> +	const unsigned int lcore = rte_lcore_id();
>>>> +	const union queue q = {.portid = port_id, .qid = qidx};
>>>> +	const bool empty = nb_rx == 0;
>>>> +	struct pmd_core_cfg *q_conf;
>>>> +
>>>> +	q_conf = &lcore_cfg[lcore];
>>>> +
>>>> +	/* early exit */
>>>> +	if (likely(!empty)) {
>>>> +		q_conf->empty_poll_stats = 0;
>>>> +	} else {
>>>> +		/* do we care about this particular queue? */
>>>> +		if (!queue_is_power_save(q_conf, &q))
>>>> +			return nb_rx;
>>>
>>> I still don't understand the need of 'special' power_save queue here...
>>> Why we can't just have a function:
>>>
>>> get_number_of_queues_whose_sequential_empty_polls_less_then_threshold(struct pmd_core_cfg *lcore_cfg),
>>> and then just:
>>>
>>> /* all queues have at least EMPTYPOLL_MAX sequential empty polls */
>>> if (get_number_of_queues_whose_sequential_empty_polls_less_then_threshold(q_conf) == 0) {
>>> 	/* go into power-save mode here */
>>> }
>>
>> Okay, let's go through this step by step :)
>>
>> Let's suppose we have three queues - q0, q1 and q2. We want to sleep
>> whenever there's no traffic on *all of them*, however we cannot know
>> that until we have checked all of them.
>>
>> So, let's suppose that q0, q1 and q2 were empty all this time, but now
>> some traffic arrived at q2 while we're still checking q0.
>> We see that q0
>> is empty, and all of the queues were empty for the last N polls, so we
>> think we will be safe to sleep at q0 despite the fact that traffic has
>> just arrived at q2.
>> This is not an issue with MONITOR mode because we will be able to see if
>> current Rx ring descriptor is busy or not via the NIC callback, *but
>> this is not possible* with PAUSE and SCALE modes, because they don't
>> have the sneaky lookahead function of MONITOR! So, with PAUSE and SCALE
>> modes, it is possible to end up in a situation where you *think* you
>> don't have any traffic, but you actually do, you just haven't checked
>> the relevant queue yet.
>
> I think such situation is unavoidable.
> Yes, traffic can arrive to *any* queue at *any* time.
> With your example above - user choose q2 as 'special' queue, but
> traffic actually arrives on q0 or q1.
> And yes, if user choose PAUSE or SCALE methods he *can* miss the traffic,
> because as you said for these methods there is no notification mechanisms.
> I think there are just unavoidable limitations with these power-save methods.
>
>> In order to prevent this from happening, we do not sleep on every queue,
>> instead we sleep *once* per loop.
>
> Yes, totally agree we shouldn't sleep on *every* queue.
> We need to go to sleep when there is no traffic on *any* of queues we monitor.
>
>> That is, we check q0, check q1, check
>> q2, and only then we decide whether we want to sleep or not.
>
>> Of course, with such scheme it is still possible to e.g. sleep in q2
>> while there's traffic waiting in q0,
>
> Yes, exactly.
>
>> but worst case is less bad with
>> this scheme, because we'll be doing at worst 1 extra sleep.
>
> Hmm, I think it would be one extra sleep anyway.
>
>> Whereas with what you're suggesting, if we had e.g. 10 queues to poll,
>> and we checked q1 but traffic has just arrived at q0, we'll be sleeping
>> at q1, then we'll be sleeping at q2, then we'll be sleeping at q3, then
>> we'll be sleeping at q4, then we'll be sleeping at q5.... and 9 sleeps
>> later we finally reach q0 and find out after all this time that we
>> shouldn't have slept in the first place.
>
> Ah ok, I think I understand now what you are saying.
> Sure, to avoid such situation, we'll need to maintain extra counters and
> update them properly when we go to sleep.
> I should state it clearly at the beginning.
> It might be easier to explain what I meant by code snippet:
>
> lcore_conf needs 2 counters:
> uint64_t nb_queues_ready_to_sleep;
> uint64_t nb_sleeps;
>
> Plus each queue needs 2 counters:
> uint64_t nb_empty_polls;
> uint64_t nb_sleeps;
>
> Now, at rx_callback():
>
> /* check did sleep happen since previous call,
>    if yes, then reset queue counters */
> if (queue->nb_sleeps != lcore_conf->nb_sleeps) {
> 	queue->nb_sleeps = lcore_conf->nb_sleeps;
> 	queue->nb_empty_polls = 0;
> }
>
> /* packet arrived, reset counters */
> if (nb_rx != 0) {
> 	/* queue is not 'ready_to_sleep' any more */
> 	if (queue->nb_empty_polls > EMPTYPOLL_MAX)
> 		lcore_conf->nb_queues_ready_to_sleep--;
> 	queue->nb_empty_polls = 0;
>
> /* empty poll */
> } else {
> 	/* queue reaches EMPTYPOLL_MAX threshold, mark it as 'ready_to_sleep' */
> 	if (queue->nb_empty_polls == EMPTYPOLL_MAX)
> 		lcore_conf->nb_queues_ready_to_sleep++;
> 	queue->nb_empty_polls++;
> }
>
> /* no traffic on any queue for at least EMPTYPOLL_MAX iterations */
> if (lcore_conf->nb_queues_ready_to_sleep == lcore_conf->n_queues) {
> 	/* update counters and sleep */
> 	lcore_conf->nb_sleeps++;
> 	lcore_conf->nb_queues_ready_to_sleep = 0;
> 	goto_sleep();
> }

OK, this sounds like it is actually doable :)
I'll prototype and see if it works.

>
>> Hopefully you get the point now :)
>>
>> So, the idea here is, for any N queues, sleep only once, not N times.
>>
>>>
>>>> +
>>>> +	/*
>>>> +	 * we can increment unconditionally here because if there were
>>>> +	 * non-empty polls in other queues assigned to this core, we
>>>> +	 * dropped the counter to zero anyway.
>>>> +	 */
>>>> +	q_conf->empty_poll_stats++;
>>>> +	if (unlikely(q_conf->empty_poll_stats > EMPTYPOLL_MAX)) {
>>>> +		struct rte_power_monitor_cond pmc[RTE_MAX_ETHPORTS];
>>>
>>> I think you need here:
>>> struct rte_power_monitor_cond pmc[q_conf->n_queues];
>>
>> I think VLAs are generally agreed upon to be unsafe, so I'm avoiding
>> them here.
>
> Wonder why?
> These days DPDK uses VLA in dozens of places...

Well, if that's the case, I can use it here also :)
Realistically, the n_queues value will be very small, so it shouldn't be
a big issue.

-- 
Thanks,
Anatoly
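
[Editor's sketch] Below is a minimal, self-contained C sketch of the counter-based
scheme Konstantin describes in the snippet above, assuming hypothetical names: the
structs (pmd_queue_cfg, pmd_core_cfg), helpers (queue_poll_update, lcore_can_sleep),
and the EMPTYPOLL_MAX value are illustrative placeholders, not the code that ended up
in the patch.

/*
 * Hypothetical sketch of the per-queue / per-lcore counter scheme.
 * Names and the threshold value are placeholders for illustration.
 */
#include <stdint.h>
#include <stdbool.h>

#define EMPTYPOLL_MAX 512 /* illustrative threshold for consecutive empty polls */

struct pmd_queue_cfg {
	uint64_t nb_empty_polls; /* consecutive empty polls on this queue */
	uint64_t nb_sleeps;      /* lcore sleep count seen at last poll */
};

struct pmd_core_cfg {
	uint64_t nb_queues;                /* queues polled by this lcore */
	uint64_t nb_queues_ready_to_sleep; /* queues past the empty-poll threshold */
	uint64_t nb_sleeps;                /* sleeps taken by this lcore so far */
};

/* called from the Rx callback on every poll of every queue */
static inline void
queue_poll_update(struct pmd_core_cfg *lcore, struct pmd_queue_cfg *q,
		uint16_t nb_rx)
{
	/* a sleep happened since this queue was last polled: resync counters */
	if (q->nb_sleeps != lcore->nb_sleeps) {
		q->nb_sleeps = lcore->nb_sleeps;
		q->nb_empty_polls = 0;
	}

	if (nb_rx != 0) {
		/* traffic arrived: queue is no longer ready to sleep */
		if (q->nb_empty_polls > EMPTYPOLL_MAX)
			lcore->nb_queues_ready_to_sleep--;
		q->nb_empty_polls = 0;
	} else {
		/* crossing the threshold marks the queue as ready to sleep */
		if (q->nb_empty_polls == EMPTYPOLL_MAX)
			lcore->nb_queues_ready_to_sleep++;
		q->nb_empty_polls++;
	}
}

/* returns true when the lcore should enter its power-save state now */
static inline bool
lcore_can_sleep(struct pmd_core_cfg *lcore)
{
	/* sleep only when every queue handled by this lcore is ready */
	if (lcore->nb_queues_ready_to_sleep != lcore->nb_queues)
		return false;

	lcore->nb_sleeps++;
	lcore->nb_queues_ready_to_sleep = 0;
	return true;
}

The resync check at the top of queue_poll_update() is what keeps the bookkeeping O(1)
per poll: after a sleep only the lcore-wide nb_sleeps counter changes, and each queue
lazily zeroes its own nb_empty_polls the next time it is polled, so the sleeping path
never has to walk all queues.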