From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id E6554A0C3F; Mon, 28 Jun 2021 15:29:28 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 6EB4040692; Mon, 28 Jun 2021 15:29:28 +0200 (CEST) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by mails.dpdk.org (Postfix) with ESMTP id 75B6B4068A for ; Mon, 28 Jun 2021 15:29:27 +0200 (CEST) X-IronPort-AV: E=McAfee;i="6200,9189,10028"; a="206126678" X-IronPort-AV: E=Sophos;i="5.83,306,1616482800"; d="scan'208";a="206126678" Received: from orsmga007.jf.intel.com ([10.7.209.58]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2021 06:29:26 -0700 X-IronPort-AV: E=Sophos;i="5.83,306,1616482800"; d="scan'208";a="446565824" Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.213.249.202]) ([10.213.249.202]) by orsmga007-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 28 Jun 2021 06:29:23 -0700 To: "Ananyev, Konstantin" , "dev@dpdk.org" , Jerin Jacob , Ruifeng Wang , Jan Viktorin , David Christensen , Ray Kinsella , Neil Horman , "Richardson, Bruce" Cc: "Hunt, David" , "Loftus, Ciara" References: <6c48562add728df4e2f22b3a41170d624c63e233.1624629506.git.anatoly.burakov@intel.com> <6a932db8-1c6c-8400-c274-ea40918ea023@intel.com> From: "Burakov, Anatoly" Message-ID: Date: Mon, 28 Jun 2021 14:29:20 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v2 3/7] eal: add power monitor for multiple events X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 28-Jun-21 1:58 PM, Ananyev, Konstantin wrote: > >> On 28-Jun-21 1:37 PM, Ananyev, Konstantin wrote: >>> >>>> Use RTM and WAITPKG instructions to perform a wait-for-writes similar to >>>> what UMWAIT does, but without the limitation of having to listen for >>>> just one event. This works because the optimized power state used by the >>>> TPAUSE instruction will cause a wake up on RTM transaction abort, so if >>>> we add the addresses we're interested in to the read-set, any write to >>>> those addresses will wake us up. >>>> >>>> Signed-off-by: Konstantin Ananyev >>>> Signed-off-by: Anatoly Burakov >>>> --- >>>> >>>> Notes: >>>> v2: >>>> - Adapt to callback mechanism >>>> >>>> doc/guides/rel_notes/release_21_08.rst | 2 + >>>> lib/eal/arm/rte_power_intrinsics.c | 11 +++ >>>> lib/eal/include/generic/rte_cpuflags.h | 2 + >>>> .../include/generic/rte_power_intrinsics.h | 35 ++++++++++ >>>> lib/eal/ppc/rte_power_intrinsics.c | 11 +++ >>>> lib/eal/version.map | 3 + >>>> lib/eal/x86/rte_cpuflags.c | 2 + >>>> lib/eal/x86/rte_power_intrinsics.c | 69 +++++++++++++++++++ >>>> 8 files changed, 135 insertions(+) >>>> >>> ... >>> >>>> diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c >>>> index 3c5c9ce7ad..3fc6f62ef5 100644 >>>> --- a/lib/eal/x86/rte_power_intrinsics.c >>>> +++ b/lib/eal/x86/rte_power_intrinsics.c >>>> @@ -4,6 +4,7 @@ >>>> >>>> #include >>>> #include >>>> +#include >>>> #include >>>> >>>> #include "rte_power_intrinsics.h" >>>> @@ -28,6 +29,7 @@ __umwait_wakeup(volatile void *addr) >>>> } >>>> >>>> static bool wait_supported; >>>> +static bool wait_multi_supported; >>>> >>>> static inline uint64_t >>>> __get_umwait_val(const volatile void *p, const uint8_t sz) >>>> @@ -164,6 +166,8 @@ RTE_INIT(rte_power_intrinsics_init) { >>>> >>>> if (i.power_monitor && i.power_pause) >>>> wait_supported = 1; >>>> + if (i.power_monitor_multi) >>>> + wait_multi_supported = 1; >>>> } >>>> >>>> int >>>> @@ -202,6 +206,9 @@ rte_power_monitor_wakeup(const unsigned int lcore_id) >>>> * In this case, since we've already woken up, the "wakeup" was >>>> * unneeded, and since T1 is still waiting on T2 releasing the lock, the >>>> * wakeup address is still valid so it's perfectly safe to write it. >>>> + * >>>> + * For multi-monitor case, the act of locking will in itself trigger the >>>> + * wakeup, so no additional writes necessary. >>>> */ >>>> rte_spinlock_lock(&s->lock); >>>> if (s->monitor_addr != NULL) >>>> @@ -210,3 +217,65 @@ rte_power_monitor_wakeup(const unsigned int lcore_id) >>>> >>>> return 0; >>>> } >>>> + >>>> +int >>>> +rte_power_monitor_multi(const struct rte_power_monitor_cond pmc[], >>>> + const uint32_t num, const uint64_t tsc_timestamp) >>>> +{ >>>> + const unsigned int lcore_id = rte_lcore_id(); >>>> + struct power_wait_status *s = &wait_status[lcore_id]; >>>> + uint32_t i, rc; >>>> + >>>> + /* check if supported */ >>>> + if (!wait_multi_supported) >>>> + return -ENOTSUP; >>>> + >>>> + if (pmc == NULL || num == 0) >>>> + return -EINVAL; >>>> + >>>> + /* we are already inside transaction region, return */ >>>> + if (rte_xtest() != 0) >>>> + return 0; >>>> + >>>> + /* start new transaction region */ >>>> + rc = rte_xbegin(); >>>> + >>>> + /* transaction abort, possible write to one of wait addresses */ >>>> + if (rc != RTE_XBEGIN_STARTED) >>>> + return 0; >>>> + >>>> + /* >>>> + * the mere act of reading the lock status here adds the lock to >>>> + * the read set. This means that when we trigger a wakeup from another >>>> + * thread, even if we don't have a defined wakeup address and thus don't >>>> + * actually cause any writes, the act of locking our lock will itself >>>> + * trigger the wakeup and abort the transaction. >>>> + */ >>>> + rte_spinlock_is_locked(&s->lock); >>>> + >>>> + /* >>>> + * add all addresses to wait on into transaction read-set and check if >>>> + * any of wakeup conditions are already met. >>>> + */ >>>> + for (i = 0; i < num; i++) { >>>> + const struct rte_power_monitor_cond *c = &pmc[i]; >>>> + >>>> + if (pmc->fn == NULL) >>> >>> Should be c->fn, I believe. >> >> Yep, will fix. >> >>> >>>> + continue; >>> >>> Actually that way, if c->fn == NULL, we'll never add our c->addr to monitored addresses. >>> Is that what we really want? >>> My thought was, that if callback is not set, we'll just go to power-save state without extra checking, no? >>> Something like that: >>> >>> const struct rte_power_monitor_cond *c = &pmc[i]; >>> const uint64_t val = __get_umwait_val(c->addr, c->size); >>> >>> if (c->fn && c->fn(val, c->opaque) != 0) >>> break; >> >> This is consistent with previous behavior of rte_power_monitor where if >> mask wasn't set we entered power save mode without any checks. If we do >> a break, that means the check condition has failed somewhere and we have >> to abort the sleep. Continue keeps the sleep. > > Ok, so what is current intention? > If pmc->fn == NULL what does it mean: > 1) pmc->addr shouldn't be monitored at all? > 2) pmc->addr should be monitored unconditionally > 3) pmc->fn should never be NULL and monitor should return an error > 3) something else? > > For me 1) looks really strange, if user doesn't want to sleep on that address, > he can just not add this addr to pmc[]. Ah, i see what you mean now. While we can skip the comparison, we still want to read the address. So, i would guess 2) should be true. > > 2) is probably ok... but is that really needed? > User can just provide NOP as a callback and it would be the same. > > 3) seems like a most sane to be. In that case, we have to do the same in the singular power monitor function - fn set to NULL should return an error. Will fix in v4 (v3 went out before i noticed your comments :/ ). -- Thanks, Anatoly