From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id 9A309A04B6; Mon, 12 Oct 2020 12:36:54 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 6EBC82B82; Mon, 12 Oct 2020 12:36:53 +0200 (CEST) Received: from mga17.intel.com (mga17.intel.com [192.55.52.151]) by dpdk.org (Postfix) with ESMTP id A620F29D6 for ; Mon, 12 Oct 2020 12:36:50 +0200 (CEST) IronPort-SDR: aK7pzAQGNl71YJnNnG58raxPXkZ6vQO4SfEkW6JT38XBNeZEBGo8cUC+o4YxW3z9CcHs/LDvmL EQKxJ/MX3P7g== X-IronPort-AV: E=McAfee;i="6000,8403,9771"; a="145584765" X-IronPort-AV: E=Sophos;i="5.77,366,1596524400"; d="scan'208";a="145584765" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from fmsmga004.fm.intel.com ([10.253.24.48]) by fmsmga107.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Oct 2020 03:36:48 -0700 IronPort-SDR: rY5Rzc1uW66KBSosECyfP8h6txQhq/LwRbuQLsALjPwSVitD0lO7rwJhqk3bku0JADL5QJQ1M+ 5htSRkqE6FhA== X-IronPort-AV: E=Sophos;i="5.77,366,1596524400"; d="scan'208";a="344830915" Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.213.195.67]) ([10.213.195.67]) by fmsmga004-auth.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 12 Oct 2020 03:36:47 -0700 From: "Burakov, Anatoly" To: "Ananyev, Konstantin" , "Ma, Liang J" , "dev@dpdk.org" Cc: "Hunt, David" , "stephen@networkplumber.org" References: <1599214740-3927-1-git-send-email-liang.j.ma@intel.com> <1601647919-25312-1-git-send-email-liang.j.ma@intel.com> <1601647919-25312-2-git-send-email-liang.j.ma@intel.com> <665bcb31-dcf0-553b-bae1-054e5f50e77f@intel.com> <3609c5b3-f431-3954-6350-cb2de77b72a7@intel.com> <7e3bedc9-db3d-262e-c0ab-62b53d60fc7c@intel.com> Message-ID: Date: Mon, 12 Oct 2020 11:36:46 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v4 02/10] eal: add power management intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 12-Oct-20 11:35 AM, Burakov, Anatoly wrote: > On 10-Oct-20 2:19 PM, Ananyev, Konstantin wrote: >> >> >>>>>>>>> Add two new power management intrinsics, and provide an >>>>>>>>> implementation >>>>>>>>> in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions >>>>>>>>> are implemented as raw byte opcodes because there is not yet >>>>>>>>> widespread >>>>>>>>> compiler support for these instructions. >>>>>>>>> >>>>>>>>> The power management instructions provide an architecture-specific >>>>>>>>> function to either wait until a specified TSC timestamp is >>>>>>>>> reached, or >>>>>>>>> optionally wait until either a TSC timestamp is reached or a >>>>>>>>> memory >>>>>>>>> location is written to. The monitor function also provides an >>>>>>>>> optional >>>>>>>>> comparison, to avoid sleeping when the expected write has already >>>>>>>>> happened, and no more writes are expected. >>>>>>>> >>>>>>>> I think what this API is missing - a function to wakeup sleeping >>>>>>>> core. >>>>>>>> If user can/should use some system call to achieve that, then at >>>>>>>> least >>>>>>>> it has to be clearly documented, even better some wrapper provided. >>>>>>> >>>>>>> I don't think it's possible to do that without severely >>>>>>> overcomplicating >>>>>>> the intrinsic and its usage, because AFAIK the only way to wake up a >>>>>>> sleeping core would be to send some kind of interrupt to the >>>>>>> core, or >>>>>>> trigger a write to the cache-line in question. >>>>>>> >>>>>> >>>>>> Yes, I think we either need a syscall that would do an IPI for us >>>>>> (on top of my head - membarrier() does that, might be there are >>>>>> some other syscalls too), >>>>>> or something hand-made. For hand-made, I wonder would something >>>>>> like that >>>>>> be safe and sufficient: >>>>>> uint64_t val = atomic_load(addr); >>>>>> CAS(addr, val, &val); >>>>>> ? >>>>>> Anyway, one way or another - I think ability to wakeup core we put >>>>>> to sleep >>>>>> have to be an essential part of this feature. >>>>>> As I understand linux kernel will limit max amount of sleep time >>>>>> for these instructions: >>>>>> https://lwn.net/Articles/790920/ >>>>>> But relying just on that, seems too vague for me: >>>>>> - user can adjust that value >>>>>> - wouldn't apply to older kernels and non-linux cases >>>>>> Konstantin >>>>>> >>>>> >>>>> This implies knowing the value the core is sleeping on. >>>> >>>> You don't the value to wait for, you just need an address. >>>> And you can make wakeup function to accept address as a parameter, >>>> same as monitor() does. >>> >>> Sorry, i meant the address. We don't know the address we're sleeping on. >>> >>>> >>>>> That's not >>>>> always the case - with this particular PMD power management scheme, we >>>>> get the address from the PMD and it stays inside the callback. >>>> >>>> That's fine - you can store address inside you callback metadata >>>> and do wakeup as part of _disable_ function. >>>> >>> >>> The address may be different, and by the time we access the address it >>> may become stale, so i don't see how that would help unless you're >>> suggesting to have some kind of synchronization mechanism there. >> >> Yes, we'll need something to sync here for sure. >> Sorry, I should say it straightway, to avoid further misunderstanding. >> Let say, associate a spin_lock with monitor(), by analogy with >> pthread_cond_wait(). >> Konstantin >> > > The idea was to provide an intrinsic-like function - as in, raw > instruction call, without anything extra. We even added the masks/values > etc. only because there's no race-less way to combine UMONITOR/UMWAIT > without those. > > Perhaps we can provide a synchronize-able wrapper around it to avoid > adding overhead to calls that function but doesn't need the sync mechanism? > Also, how would having a spinlock help to synchronize? Are you suggesting we do UMWAIT on a spinlock address, or something to that effect? -- Thanks, Anatoly