From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 758D7A0C41; Wed, 23 Jun 2021 14:12:43 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 113884003F; Wed, 23 Jun 2021 14:12:43 +0200 (CEST) Received: from mga06.intel.com (mga06.intel.com [134.134.136.31]) by mails.dpdk.org (Postfix) with ESMTP id F08834003E for ; Wed, 23 Jun 2021 14:12:40 +0200 (CEST) IronPort-SDR: 8M9DOqdnl5WCkNgWoN61ol2bs6Ar0noJCWf74viHp2I0SVwmAY0HXMzVmjVYOuDVPPF1L0gVyk tSySoww24vJw== X-IronPort-AV: E=McAfee;i="6200,9189,10023"; a="268387455" X-IronPort-AV: E=Sophos;i="5.83,293,1616482800"; d="scan'208";a="268387455" Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga104.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Jun 2021 05:12:36 -0700 IronPort-SDR: dsnGizm7Q9KAfz6iPFdZbCZiGiPBf3ixpIh8t24FQ+z141x7xadnypXP82hb0bf2IIdYjb68p4 UH4GNg35Wb3w== X-IronPort-AV: E=Sophos;i="5.83,293,1616482800"; d="scan'208";a="556125419" Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.213.203.112]) ([10.213.203.112]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 23 Jun 2021 05:12:34 -0700 To: "Ananyev, Konstantin" , "dev@dpdk.org" , "Richardson, Bruce" Cc: "Loftus, Ciara" , "Hunt, David" References: <8007029ea9e5129ea43f0c11708169406a16727f.1622548381.git.anatoly.burakov@intel.com> <53ac7ee0-e4bc-099a-2ddd-0d74949eea9d@intel.com> <249f6957-0f10-926f-9f0e-5cb10fc4d2c3@intel.com> From: "Burakov, Anatoly" Message-ID: <22ddddf1-6523-8219-dcef-86205e662756@intel.com> Date: Wed, 23 Jun 2021 13:12:30 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 23-Jun-21 12:00 PM, Ananyev, Konstantin wrote: > >>>>> >>>>>> Previously, the semantics of power monitor were such that we were >>>>>> checking current value against the expected value, and if they matched, >>>>>> then the sleep was aborted. This is somewhat inflexible, because it only >>>>>> allowed us to check for a specific value. >>>>>> >>>>>> This commit adds an option to reverse the check, so that we can have >>>>>> monitor sleep aborted if the expected value *doesn't* match what's in >>>>>> memory. This allows us to both implement all currently implemented >>>>>> driver code, as well as support more use cases which don't easily map to >>>>>> previous semantics (such as waiting on writes to AF_XDP counter value). >>>>>> >>>>>> Since the old behavior is the default, no need to adjust existing >>>>>> implementations. >>>>>> >>>>>> Signed-off-by: Anatoly Burakov >>>>>> --- >>>>>> lib/eal/include/generic/rte_power_intrinsics.h | 4 ++++ >>>>>> lib/eal/x86/rte_power_intrinsics.c | 5 ++++- >>>>>> 2 files changed, 8 insertions(+), 1 deletion(-) >>>>>> >>>>>> diff --git a/lib/eal/include/generic/rte_power_intrinsics.h b/lib/eal/include/generic/rte_power_intrinsics.h >>>>>> index dddca3d41c..1006c2edfc 100644 >>>>>> --- a/lib/eal/include/generic/rte_power_intrinsics.h >>>>>> +++ b/lib/eal/include/generic/rte_power_intrinsics.h >>>>>> @@ -31,6 +31,10 @@ struct rte_power_monitor_cond { >>>>>> * 4, or 8. Supplying any other value will result in >>>>>> * an error. >>>>>> */ >>>>>> + uint8_t invert; /**< Invert check for expected value (e.g. instead of >>>>>> + * checking if `val` matches something, check if >>>>>> + * `val` *doesn't* match a particular value) >>>>>> + */ >>>>>> }; >>>>>> >>>>>> /** >>>>>> diff --git a/lib/eal/x86/rte_power_intrinsics.c b/lib/eal/x86/rte_power_intrinsics.c >>>>>> index 39ea9fdecd..5d944e9aa4 100644 >>>>>> --- a/lib/eal/x86/rte_power_intrinsics.c >>>>>> +++ b/lib/eal/x86/rte_power_intrinsics.c >>>>>> @@ -117,7 +117,10 @@ rte_power_monitor(const struct rte_power_monitor_cond *pmc, >>>>>> const uint64_t masked = cur_value & pmc->mask; >>>>>> >>>>>> /* if the masked value is already matching, abort */ >>>>>> - if (masked == pmc->val) >>>>>> + if (!pmc->invert && masked == pmc->val) >>>>>> + goto end; >>>>>> + /* same, but for inverse check */ >>>>>> + if (pmc->invert && masked != pmc->val) >>>>>> goto end; >>>>>> } >>>>>> >>>>> >>>>> Hmm..., such approach looks too 'patchy'... >>>>> Can we at least replace 'inver' with something like: >>>>> enum rte_power_monitor_cond_op { >>>>> _EQ, NEQ,... >>>>> }; >>>>> Then at least new comparions ops can be added in future. >>>>> Even better I think would be to just leave to PMD to provide a comparison callback. >>>>> Will make things really simple and generic: >>>>> struct rte_power_monitor_cond { >>>>> volatile void *addr; >>>>> int (*cmp)(uint64_t val); >>>>> uint8_t size; >>>>> }; >>>>> And then in rte_power_monitor(...): >>>>> .... >>>>> const uint64_t cur_value = __get_umwait_val(pmc->addr, pmc->size); >>>>> if (pmc->cmp(cur_value) != 0) >>>>> goto end; >>>>> .... >>>>> >>>> >>>> I like the idea of a callback, but these are supposed to be >>>> intrinsic-like functions, so putting too much into them is contrary to >>>> their goal, and it's going to make the API hard to use in simpler cases >>>> (e.g. when we're explicitly calling rte_power_monitor as opposed to >>>> letting the RX callback do it for us). For example, event/dlb code calls >>>> rte_power_monitor explicitly. >>> >>> Good point, I didn't know that. >>> Would be interesting to see how do they use it. >> >> To be fair, it should be possible to rewrite their code using a >> callback. Perhaps adding a (void *) parameter for any custom data >> related to the callback (because C doesn't have closures...), but >> otherwise it should be doable, so the question isn't that it's >> impossible to rewrite event/dlb code to use callbacks, it's more of an >> issue with complicating usage of already-not-quite-straightforward API >> even more. >> >>> >>>> >>>> It's going to be especially "fun" to do these indirect function calls >>>> from inside transactional region on call to multi-monitor. >>> >>> But the callback is not supposed to do any memory reads/writes. >>> Just mask/compare of the provided value with some constant. >> >> Yeah, but with callbacks we can't really control that, can we? I mean i >> guess a *sane* implementation wouldn't do that, but still, it's >> theoretically possible to perform more complex checks and even touch >> some unrelated data in the process. > > Yep, PMD developer can ignore recommendations and do whatever > he wants in the call-back. We can't control it. > If he touches some memory in it - probably there will be more spurious wakeups and less power saves. > In principle it is the same with all other PMD dev-ops - we have to trust that they are > doing what they have to. I did a quick prototype for this, and i don't think it is going to work. Callbacks with just "current value" as argument will be pretty limited and will only really work for cases where we know what we are expecting. However, for cases like event/dlb or net/mlx5, the expected value is (or appears to be) dependent upon some internal device data, and is not constant like in case of net/ixgbe for example. This can be fixed by passing an opaque pointer, either by storing it in the monitor condition, or by passing it directly to rte_power_monitor at invocation time. The latter doesn't work well because when we call rte_power_monitor from inside the rte_power library, we lack the context necessary to get said opaque pointer. The former doesn't work either, because the only place where we can get this argument is inside get_monitor_addr, but the opaque pointer must persist after we exit that function in order to avoid use-after-free - which means that it either has to be statically allocated (which means it's not thread-safe for a non-trivial case), or dynamically allocated (which a big no-no on a hotpath). Any other suggestions? :) > >> >>> >>>> I'm not >>>> opposed to having a callback here, but maybe others have more thoughts >>>> on this? >>>> >>>> -- >>>> Thanks, >>>> Anatoly >> >> >> -- >> Thanks, >> Anatoly -- Thanks, Anatoly