From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id DD3F1A0547; Thu, 24 Jun 2021 17:04:54 +0200 (CEST) Received: from [217.70.189.124] (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 5660F40040; Thu, 24 Jun 2021 17:04:54 +0200 (CEST) Received: from mga04.intel.com (mga04.intel.com [192.55.52.120]) by mails.dpdk.org (Postfix) with ESMTP id F2FC44003C for ; Thu, 24 Jun 2021 17:04:51 +0200 (CEST) IronPort-SDR: G841pNyyM4r4dAQI+DnohcHNUBy+OAXrnV4U5PlOssB6W96gMoNX4epxKNlvuy0XQINnUEC6vS H7WzNddPp4hA== X-IronPort-AV: E=McAfee;i="6200,9189,10025"; a="205659994" X-IronPort-AV: E=Sophos;i="5.83,296,1616482800"; d="scan'208";a="205659994" Received: from fmsmga001.fm.intel.com ([10.253.24.23]) by fmsmga104.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jun 2021 08:04:50 -0700 IronPort-SDR: WiytjPUqqozM4dWeMi9A+eP9ahJ8th0m8rz0xTqZ8uTlXVTCIxTfkUGsA69jl9BROz5vTFRzT2 fEr/IswdO1Jg== X-IronPort-AV: E=Sophos;i="5.83,296,1616482800"; d="scan'208";a="557358115" Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.213.250.19]) ([10.213.250.19]) by fmsmga001-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 24 Jun 2021 08:04:49 -0700 To: "Ananyev, Konstantin" , "dev@dpdk.org" , "Richardson, Bruce" Cc: "Loftus, Ciara" , "Hunt, David" References: <8007029ea9e5129ea43f0c11708169406a16727f.1622548381.git.anatoly.burakov@intel.com> <53ac7ee0-e4bc-099a-2ddd-0d74949eea9d@intel.com> <249f6957-0f10-926f-9f0e-5cb10fc4d2c3@intel.com> <22ddddf1-6523-8219-dcef-86205e662756@intel.com> <084a42d9-1422-f4a5-f57e-53b4667c1f44@intel.com> From: "Burakov, Anatoly" Message-ID: <882380b2-20b9-f081-9a5a-199c38094b1b@intel.com> Date: Thu, 24 Jun 2021 16:04:45 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101 Thunderbird/78.11.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v1 1/7] power_intrinsics: allow monitor checks inversion X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 24-Jun-21 3:57 PM, Ananyev, Konstantin wrote: > > >>>>>> I did a quick prototype for this, and i don't think it is going to work. >>>>>> >>>>>> Callbacks with just "current value" as argument will be pretty limited >>>>>> and will only really work for cases where we know what we are expecting. >>>>>> However, for cases like event/dlb or net/mlx5, the expected value is (or >>>>>> appears to be) dependent upon some internal device data, and is not >>>>>> constant like in case of net/ixgbe for example. >>>>>> >>>>>> This can be fixed by passing an opaque pointer, either by storing it in >>>>>> the monitor condition, or by passing it directly to rte_power_monitor at >>>>>> invocation time. >>>>>> >>>>>> The latter doesn't work well because when we call rte_power_monitor from >>>>>> inside the rte_power library, we lack the context necessary to get said >>>>>> opaque pointer. >>>>>> >>>>>> The former doesn't work either, because the only place where we can get >>>>>> this argument is inside get_monitor_addr, but the opaque pointer must >>>>>> persist after we exit that function in order to avoid use-after-free - >>>>>> which means that it either has to be statically allocated (which means >>>>>> it's not thread-safe for a non-trivial case), or dynamically allocated >>>>>> (which a big no-no on a hotpath). >>>>> >>>>> If I get you right, expected_value (and probably mask) can be variable ones. >>>>> So for callback approach to work we need to pass all this as parameters >>>>> to PMD comparison callback: >>>>> int pmc_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask) >>>>> Correct? >>>> >>>> If we have both expected value, mask, and current value, then what's the >>>> point of the callback? The point of the callback would be to pass just >>>> the current value, and let the callback decide what's the expected value >>>> and how to compare it. >>> >>> For me the main point of callback is to hide PMD specific comparison semantics. >>> Basically they provide us with some values in struct rte_power_monitor_cond, >>> and then it is up to them how to interpret them in their comparison function. >>> All we'll do for them: will read the value at address provided. >>> I understand that it looks like an overkill, as majority of these comparison functions >>> will be like: >>> int cmp_callback(uint64_t real_val, uint64_t expected_val, uint64_t mask) >>> { >>> return ((real_val & mask) == expected_val); >>> } >>> Though qsort() and bsearch() work in a similar manner, and everyone seems ok with it. >>> >>>> >>>> So, we can either let callback handle expected values itself by having >>>> an opaque callback-specific argument (which means it has to persist >>>> between .get_monitor_addr() and rte_power_monitor() calls), >>> >>> But that's what we doing already - PMD fills rte_power_monitor_cond values >>> for us, we store them somewhere and then use them to decide should we go to sleep or not. >>> All callback does - moves actual values interpretation back to PMD: >>> Right now: >>> PMD: provide PMC values >>> POWER: store PMC values somewhere >>> read the value at address provided in PMC >>> interpret PMC values and newly read value and make the decision >>> >>> With callback: >>> PMD: provide PMC values >>> POWER: store PMC values somewhere >>> read the value at address provided in PMC >>> PMD: interpret PMC values and newly read value and make the decision >>> >>> Or did you mean something different here? >>> >>>> or we do the >>>> comparisons inside rte_power_monitor(), and store the expected/mask >>>> values in the monitor condition, and *don't* have any callbacks at all. >>>> Are you suggesting an alternative to the above two options? >>> >>> As I said in my first mail - we can just replace 'inverse' with 'op'. >>> That at least will make this API extendable, if someone will need >>> something different in future. >>> >>> Another option is >> >> Right, so the idea is store the PMD-specific data in the monitor >> condition, and leave it to the callback to interpret it. >> >> The obvious question then is, how many values is enough? Two? Three? >> Four? This option doesn't really solve the basic issue, it just kicks >> the can down the road. I guess three values should be enough for >> everyone (tm) ? :D >> >> I don't like the 'op' thing because if the goal is to be flexible, it's >> unnecessarily limiting *and* makes the API even more complex to use. I >> would rather have a number of PMD-specific values and leave it up to the >> callback to interpret them, because at least that way we're not limited >> to predefined operations on the monitor condition data. > > Just to make sure we are talking about the same, does what you propose > looks like that: > > struct rte_power_monitor_cond { > volatile void *addr; /**< Address to monitor for changes */ > uint8_t size; /**< Data size (in bytes) that will be used to compare > * expected value (`val`) with data read from the > * monitored memory location (`addr`). Can be 1, 2, > * 4, or 8. Supplying any other value will result in > * an error. > */ > int (*cmp)(uint64_t real_value, const uint64_t opaque[4]); > uint64_t opaque[4]; /*PMD specific data, used by comparison call-back below */ > }; > > And then in rte_power_monitor(): > ... > uint64_t cur_value = __get_umwait_val(pmc->addr, pmc->size); > if (pmc->cmp(cur_value, pmc->opaque) != 0) { > /* goto sleep */ > } > > ? > Something like that, yes. -- Thanks, Anatoly