From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from dpdk.org (dpdk.org [92.243.14.124]) by inbox.dpdk.org (Postfix) with ESMTP id C4FA8A04BC; Fri, 9 Oct 2020 18:59:40 +0200 (CEST) Received: from [92.243.14.124] (localhost [127.0.0.1]) by dpdk.org (Postfix) with ESMTP id 00FCC1D163; Fri, 9 Oct 2020 18:59:26 +0200 (CEST) Received: from mga18.intel.com (mga18.intel.com [134.134.136.126]) by dpdk.org (Postfix) with ESMTP id D8A381C2BC for ; Fri, 9 Oct 2020 18:59:22 +0200 (CEST) IronPort-SDR: l7TiU0gUXaVRKfYiZU3q54jDJCAuEaZqwHgGoyZOIpgeJHQFaf3bQcwjq38xr5JiBJ37Ey+E0g 6a08CmM3BOcg== X-IronPort-AV: E=McAfee;i="6000,8403,9769"; a="153343789" X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="153343789" X-Amp-Result: SKIPPED(no attachment in message) X-Amp-File-Uploaded: False Received: from orsmga004.jf.intel.com ([10.7.209.38]) by orsmga106.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 09:59:21 -0700 IronPort-SDR: 41Lcv7XeyRoh3bdyU45ihbOGVg5lBp/wAqspdT/5Q4nehkba/TJ0v1wQ//KGpi4vfKzpYVXcyq jgHSuB1sBbKQ== X-IronPort-AV: E=Sophos;i="5.77,355,1596524400"; d="scan'208";a="462255129" Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.213.3.170]) ([10.213.3.170]) by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 09 Oct 2020 09:59:19 -0700 To: "Ananyev, Konstantin" , "Ma, Liang J" , "dev@dpdk.org" Cc: "Hunt, David" , "stephen@networkplumber.org" References: <1599214740-3927-1-git-send-email-liang.j.ma@intel.com> <1601647919-25312-1-git-send-email-liang.j.ma@intel.com> <1601647919-25312-2-git-send-email-liang.j.ma@intel.com> <665bcb31-dcf0-553b-bae1-054e5f50e77f@intel.com> <3609c5b3-f431-3954-6350-cb2de77b72a7@intel.com> From: "Burakov, Anatoly" Message-ID: <7e3bedc9-db3d-262e-c0ab-62b53d60fc7c@intel.com> Date: Fri, 9 Oct 2020 17:59:17 +0100 User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Thunderbird/68.12.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit Subject: Re: [dpdk-dev] [PATCH v4 02/10] eal: add power management intrinsics X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org Sender: "dev" On 09-Oct-20 5:56 PM, Ananyev, Konstantin wrote: > >> >> On 09-Oct-20 4:39 PM, Ananyev, Konstantin wrote: >>> >>>> On 08-Oct-20 6:15 PM, Ananyev, Konstantin wrote: >>>>>> >>>>>> Add two new power management intrinsics, and provide an implementation >>>>>> in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions >>>>>> are implemented as raw byte opcodes because there is not yet widespread >>>>>> compiler support for these instructions. >>>>>> >>>>>> The power management instructions provide an architecture-specific >>>>>> function to either wait until a specified TSC timestamp is reached, or >>>>>> optionally wait until either a TSC timestamp is reached or a memory >>>>>> location is written to. The monitor function also provides an optional >>>>>> comparison, to avoid sleeping when the expected write has already >>>>>> happened, and no more writes are expected. >>>>> >>>>> I think what this API is missing - a function to wakeup sleeping core. >>>>> If user can/should use some system call to achieve that, then at least >>>>> it has to be clearly documented, even better some wrapper provided. >>>> >>>> I don't think it's possible to do that without severely overcomplicating >>>> the intrinsic and its usage, because AFAIK the only way to wake up a >>>> sleeping core would be to send some kind of interrupt to the core, or >>>> trigger a write to the cache-line in question. >>>> >>> >>> Yes, I think we either need a syscall that would do an IPI for us >>> (on top of my head - membarrier() does that, might be there are some other syscalls too), >>> or something hand-made. For hand-made, I wonder would something like that >>> be safe and sufficient: >>> uint64_t val = atomic_load(addr); >>> CAS(addr, val, &val); >>> ? >>> Anyway, one way or another - I think ability to wakeup core we put to sleep >>> have to be an essential part of this feature. >>> As I understand linux kernel will limit max amount of sleep time for these instructions: >>> https://lwn.net/Articles/790920/ >>> But relying just on that, seems too vague for me: >>> - user can adjust that value >>> - wouldn't apply to older kernels and non-linux cases >>> Konstantin >>> >> >> This implies knowing the value the core is sleeping on. > > You don't the value to wait for, you just need an address. > And you can make wakeup function to accept address as a parameter, > same as monitor() does. Sorry, i meant the address. We don't know the address we're sleeping on. > >> That's not >> always the case - with this particular PMD power management scheme, we >> get the address from the PMD and it stays inside the callback. > > That's fine - you can store address inside you callback metadata > and do wakeup as part of _disable_ function. > The address may be different, and by the time we access the address it may become stale, so i don't see how that would help unless you're suggesting to have some kind of synchronization mechanism there. -- Thanks, Anatoly