To: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>, "Ma, Liang J" <liang.j.ma@intel.com>, "dev@dpdk.org" <dev@dpdk.org>
Cc: "Hunt, David" <david.hunt@intel.com>, "stephen@networkplumber.org" <stephen@networkplumber.org>
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Date: Mon, 12 Oct 2020 11:35:21 +0100
Subject: Re: [dpdk-dev] [PATCH v4 02/10] eal: add power management intrinsics
Message-ID: <a5a4612b-26cb-18de-bb77-339e7f0954ab@intel.com>

On 10-Oct-20 2:19 PM, Ananyev, Konstantin wrote:
>
>
>>>>>>>> Add two new power management intrinsics, and provide an implementation
>>>>>>>> in eal/x86 based on UMONITOR/UMWAIT instructions. The instructions
>>>>>>>> are implemented as raw byte opcodes because there is not yet widespread
>>>>>>>> compiler support for these instructions.
>>>>>>>>
>>>>>>>> The power management instructions provide an architecture-specific
>>>>>>>> function to either wait until a specified TSC timestamp is reached, or
>>>>>>>> optionally wait until either a TSC timestamp is reached or a memory
>>>>>>>> location is written to.
>>>>>>>> The monitor function also provides an optional
>>>>>>>> comparison, to avoid sleeping when the expected write has already
>>>>>>>> happened, and no more writes are expected.
>>>>>>>
>>>>>>> I think what this API is missing is a function to wake up a sleeping core.
>>>>>>> If the user can/should use some system call to achieve that, then at least
>>>>>>> it has to be clearly documented; even better, some wrapper should be provided.
>>>>>>
>>>>>> I don't think it's possible to do that without severely overcomplicating
>>>>>> the intrinsic and its usage, because AFAIK the only way to wake up a
>>>>>> sleeping core would be to send some kind of interrupt to the core, or
>>>>>> trigger a write to the cache line in question.
>>>>>>
>>>>>
>>>>> Yes, I think we either need a syscall that would do an IPI for us
>>>>> (off the top of my head, membarrier() does that; maybe there are some other syscalls too),
>>>>> or something hand-made. For hand-made, I wonder whether something like this
>>>>> would be safe and sufficient:
>>>>> uint64_t val = atomic_load(addr);
>>>>> CAS(addr, val, &val);
>>>>> ?
>>>>> Anyway, one way or another, I think the ability to wake up a core we put to sleep
>>>>> has to be an essential part of this feature.
>>>>> As I understand it, the Linux kernel will limit the maximum sleep time for these instructions:
>>>>> https://lwn.net/Articles/790920/
>>>>> But relying on that alone seems too vague to me:
>>>>> - the user can adjust that value
>>>>> - it wouldn't apply to older kernels and non-Linux cases
>>>>> Konstantin
>>>>>
>>>>
>>>> This implies knowing the value the core is sleeping on.
>>>
>>> You don't need the value to wait for, you just need an address.
>>> And you can make the wakeup function accept the address as a parameter,
>>> same as monitor() does.
>>
>> Sorry, I meant the address. We don't know the address we're sleeping on.
>>
>>>
>>>> That's not
>>>> always the case - with this particular PMD power management scheme, we
>>>> get the address from the PMD and it stays inside the callback.
>>>
>>> That's fine - you can store the address inside your callback metadata
>>> and do the wakeup as part of the _disable_ function.
>>>
>>
>> The address may be different, and by the time we access the address it
>> may become stale, so I don't see how that would help unless you're
>> suggesting to have some kind of synchronization mechanism there.
>
> Yes, we'll need something to sync here for sure.
> Sorry, I should have said it straight away, to avoid further misunderstanding.
> Let's say, associate a spin_lock with monitor(), by analogy with pthread_cond_wait().
> Konstantin
>

The idea was to provide an intrinsic-like function, as in a raw instruction
call without anything extra. Even the masks/values etc. were added only
because there's no race-free way to combine UMONITOR/UMWAIT without them.

Perhaps we can provide a synchronizable wrapper around it, to avoid adding
overhead to callers that use the function but don't need the sync mechanism?

--
Thanks,
Anatoly
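
A minimal sketch of the synchronized wrapper discussed above, assuming the lock-plus-address pairing Konstantin describes (analogous to the mutex passed to `pthread_cond_wait()`). Everything here is hypothetical: `struct sync_monitor` and the `sync_monitor_*()` functions are illustrative names, a test-and-set flag stands in for `rte_spinlock_t`, and the actual UMONITOR/UMWAIT calls appear only as comments. What it shows is that the waker dereferences the address only while holding the lock, and the sleeper retracts it under the same lock, so the waker never touches a stale pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical context: a lock paired with the address being monitored. */
struct sync_monitor {
    bool lock;               /* test-and-set flag, stand-in for rte_spinlock_t */
    volatile uint64_t *addr; /* NULL while no core is sleeping on it */
};

static void sm_lock(struct sync_monitor *m)
{
    while (__atomic_test_and_set(&m->lock, __ATOMIC_ACQUIRE))
        ; /* spin until the flag was previously clear */
}

static void sm_unlock(struct sync_monitor *m)
{
    __atomic_clear(&m->lock, __ATOMIC_RELEASE);
}

static void sync_monitor_init(struct sync_monitor *m)
{
    m->lock = false;
    m->addr = NULL;
}

/* Sleeper, before sleeping: publish the address under the lock. */
static void sync_monitor_enter(struct sync_monitor *m, volatile uint64_t *addr)
{
    assert(addr != NULL);
    sm_lock(m);
    m->addr = addr;
    /*
     * The raw intrinsic would arm the monitor (UMONITOR) here, before the
     * unlock, so a wakeup issued right after the unlock is expected to
     * register as a write to the monitored line rather than being lost.
     */
    sm_unlock(m);
    /* ...the caller then sleeps (UMWAIT) without holding the lock. */
}

/* Sleeper, after waking: retract the address so wakers can't use it. */
static void sync_monitor_exit(struct sync_monitor *m)
{
    sm_lock(m);
    m->addr = NULL;
    sm_unlock(m);
}

/* Waker: rewrite the published address, if any. True if one was found. */
static bool sync_monitor_wake(struct sync_monitor *m)
{
    bool found = false;

    sm_lock(m);
    if (m->addr != NULL) {
        uint64_t val = __atomic_load_n(m->addr, __ATOMIC_RELAXED);
        /* Same-value CAS: touches the line without changing its contents. */
        __atomic_compare_exchange_n(m->addr, &val, val, false,
                                    __ATOMIC_SEQ_CST, __ATOMIC_RELAXED);
        found = true;
    }
    sm_unlock(m);
    return found;
}
```

Callers that never need an explicit wakeup can keep using the raw intrinsic directly, so only users of the wrapper pay for the lock.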