From: "Ananyev, Konstantin"
To: "Burakov, Anatoly", "dev@dpdk.org"
CC: "Richardson, Bruce", "Hunt, David", "Ma, Liang J", "Honnappa.Nagarahalli@arm.com"
Date: Thu, 28 May 2020 11:39:55 +0000
In-Reply-To: <2772eb151ccba5cc17186e6161d8834176924753.1590598121.git.anatoly.burakov@intel.com>
Subject: Re: [dpdk-dev] [RFC 1/6] eal: add power management intrinsics

Hi Anatoly,

>
> Add two new power management intrinsics, and provide an implementation
> in eal/x86 based on UMONITOR/UMWAIT
> instructions. The instructions
> are implemented as raw byte opcodes because there is not yet widespread
> compiler support for these instructions.
>
> The power management instructions provide an architecture-specific
> function to either wait until a specified TSC timestamp is reached, or
> optionally wait until either a TSC timestamp is reached or a memory
> location is written to. The monitor function also provides an optional
> comparison, to avoid sleeping when the expected write has already
> happened, and no more writes are expected.

Recently the ARM guys introduced a new generic API for similar purposes
(as I understand it): rte_wait_until_equal_(16|32|64).
It would probably make sense to unite both APIs into something common
and HW-transparent.

Konstantin

>
> Signed-off-by: Liang J. Ma
> Signed-off-by: Anatoly Burakov
> ---
>  .../include/generic/rte_power_intrinsics.h |  64 +++++++++
>  lib/librte_eal/include/meson.build         |   1 +
>  lib/librte_eal/x86/include/meson.build     |   1 +
>  lib/librte_eal/x86/include/rte_cpuflags.h  |   1 +
>  .../x86/include/rte_power_intrinsics.h     | 134 ++++++++++++++++++
>  lib/librte_eal/x86/rte_cpuflags.c          |   2 +
>  6 files changed, 203 insertions(+)
>  create mode 100644 lib/librte_eal/include/generic/rte_power_intrinsics.h
>  create mode 100644 lib/librte_eal/x86/include/rte_power_intrinsics.h
>
> diff --git a/lib/librte_eal/include/generic/rte_power_intrinsics.h b/lib/librte_eal/include/generic/rte_power_intrinsics.h
> new file mode 100644
> index 0000000000..8646c4ac16
> --- /dev/null
> +++ b/lib/librte_eal/include/generic/rte_power_intrinsics.h
> @@ -0,0 +1,64 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#ifndef _RTE_POWER_INTRINSIC_H_
> +#define _RTE_POWER_INTRINSIC_H_
> +
> +#include
> +
> +/**
> + * @file
> + * Advanced power management operations.
> + *
> + * This file defines APIs for advanced power management,
> + * which are architecture-dependent.
> + */
> +
> +/**
> + * Monitor specific address for changes. This will cause the CPU to enter an
> + * architecture-defined optimized power state until either the specified
> + * memory address is written to, or a certain TSC timestamp is reached.
> + *
> + * Additionally, an `expected` 64-bit value and 64-bit mask are provided. If
> + * mask is non-zero, the current value pointed to by the `p` pointer will be
> + * checked against the expected value, and if they match, the entering of
> + * optimized power state may be aborted.
> + *
> + * @param p
> + *   Address to monitor for changes. Must be aligned on an 8-byte boundary.
> + * @param expected_value
> + *   Before attempting the monitoring, the `p` address may be read and compared
> + *   against this value. If `value_mask` is zero, this step will be skipped.
> + * @param value_mask
> + *   The 64-bit mask to use to extract current value from `p`.
> + * @param state
> + *   Architecture-dependent optimized power state number
> + * @param tsc_timestamp
> + *   Maximum TSC timestamp to wait for. Note that the wait behavior is
> + *   architecture-dependent.
> + *
> + * @return
> + *   Architecture-dependent return value.
> + */
> +static inline int rte_power_monitor(const volatile void *p,
> +		const uint64_t expected_value, const uint64_t value_mask,
> +		const uint32_t state, const uint64_t tsc_timestamp);
> +
> +/**
> + * Enter an architecture-defined optimized power state until a certain TSC
> + * timestamp is reached.
> + *
> + * @param state
> + *   Architecture-dependent optimized power state number
> + * @param tsc_timestamp
> + *   Maximum TSC timestamp to wait for. Note that the wait behavior is
> + *   architecture-dependent.
> + *
> + * @return
> + *   Architecture-dependent return value.
> + */
> +static inline int rte_power_pause(const uint32_t state,
> +		const uint64_t tsc_timestamp);
> +
> +#endif /* _RTE_POWER_INTRINSIC_H_ */
> diff --git a/lib/librte_eal/include/meson.build b/lib/librte_eal/include/meson.build
> index bc73ec2c5c..b54a2be4f6 100644
> --- a/lib/librte_eal/include/meson.build
> +++ b/lib/librte_eal/include/meson.build
> @@ -59,6 +59,7 @@ generic_headers = files(
>  	'generic/rte_memcpy.h',
>  	'generic/rte_pause.h',
>  	'generic/rte_prefetch.h',
> +	'generic/rte_power_intrinsics.h',
>  	'generic/rte_rwlock.h',
>  	'generic/rte_spinlock.h',
>  	'generic/rte_ticketlock.h',
> diff --git a/lib/librte_eal/x86/include/meson.build b/lib/librte_eal/x86/include/meson.build
> index f0e998c2fe..494a8142a2 100644
> --- a/lib/librte_eal/x86/include/meson.build
> +++ b/lib/librte_eal/x86/include/meson.build
> @@ -13,6 +13,7 @@ arch_headers = files(
>  	'rte_io.h',
>  	'rte_memcpy.h',
>  	'rte_prefetch.h',
> +	'rte_power_intrinsics.h',
>  	'rte_pause.h',
>  	'rte_rtm.h',
>  	'rte_rwlock.h',
> diff --git a/lib/librte_eal/x86/include/rte_cpuflags.h b/lib/librte_eal/x86/include/rte_cpuflags.h
> index c1d20364d1..94d6a43763 100644
> --- a/lib/librte_eal/x86/include/rte_cpuflags.h
> +++ b/lib/librte_eal/x86/include/rte_cpuflags.h
> @@ -110,6 +110,7 @@ enum rte_cpu_flag_t {
>  	RTE_CPUFLAG_RDTSCP,                 /**< RDTSCP */
>  	RTE_CPUFLAG_EM64T,                  /**< EM64T */
>
> +	RTE_CPUFLAG_WAITPKG,                /**< UMONITOR/UMWAIT/TPAUSE */
>  	/* (EAX 80000007h) EDX features */
>  	RTE_CPUFLAG_INVTSC,                 /**< INVTSC */
>
> diff --git a/lib/librte_eal/x86/include/rte_power_intrinsics.h b/lib/librte_eal/x86/include/rte_power_intrinsics.h
> new file mode 100644
> index 0000000000..a0522400fb
> --- /dev/null
> +++ b/lib/librte_eal/x86/include/rte_power_intrinsics.h
> @@ -0,0 +1,134 @@
> +/* SPDX-License-Identifier: BSD-3-Clause
> + * Copyright(c) 2020 Intel Corporation
> + */
> +
> +#ifndef _RTE_POWER_INTRINSIC_X86_64_H_
> +#define _RTE_POWER_INTRINSIC_X86_64_H_
> +
> +#ifdef __cplusplus
> +extern "C" {
> +#endif
> +
> +#include
> +#include
> +
> +#include "generic/rte_power_intrinsics.h"
> +
> +/**
> + * Monitor specific address for changes. This will cause the CPU to enter an
> + * architecture-defined optimized power state until either the specified
> + * memory address is written to, or a certain TSC timestamp is reached.
> + *
> + * Additionally, an `expected` 64-bit value and 64-bit mask are provided. If
> + * mask is non-zero, the current value pointed to by the `p` pointer will be
> + * checked against the expected value, and if they match, the entering of
> + * optimized power state may be aborted.
> + *
> + * This function uses UMONITOR/UMWAIT instructions. For more information about
> + * their usage, please refer to Intel(R) 64 and IA-32 Architectures Software
> + * Developer's Manual.
> + *
> + * @param p
> + *   Address to monitor for changes. Must be aligned on an 8-byte boundary.
> + * @param expected_value
> + *   Before attempting the monitoring, the `p` address may be read and compared
> + *   against this value. If `value_mask` is zero, this step will be skipped.
> + * @param value_mask
> + *   The 64-bit mask to use to extract current value from `p`.
> + * @param state
> + *   Architecture-dependent optimized power state number. Can be 0 (C0.2) or
> + *   1 (C0.1).
> + * @param tsc_timestamp
> + *   Maximum TSC timestamp to wait for.
> + *
> + * @return
> + *   - 1 if wakeup was due to TSC timeout expiration.
> + *   - 0 if wakeup was due to memory write or other reasons.
> + */
> +static inline int rte_power_monitor(const volatile void *p,
> +		const uint64_t expected_value, const uint64_t value_mask,
> +		const uint32_t state, const uint64_t tsc_timestamp)
> +{
> +	const uint32_t tsc_l = (uint32_t)tsc_timestamp;
> +	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
> +	uint64_t rflags;
> +
> +	/*
> +	 * we're using raw byte codes for now as only the newest compiler
> +	 * versions support this instruction natively.
> +	 */
> +
> +	/* set address for UMONITOR */
> +	asm volatile(".byte 0xf3, 0x0f, 0xae, 0xf7;"
> +			:
> +			: "D"(p));
> +	rte_mb();
> +	if (value_mask) {
> +		const uint64_t cur_value = *(const volatile uint64_t *)p;
> +		const uint64_t masked = cur_value & value_mask;
> +		/* if the masked value is already matching, abort */
> +		if (masked == expected_value)
> +			return 0;
> +	}
> +	/* execute UMWAIT */
> +	asm volatile(".byte 0xf2, 0x0f, 0xae, 0xf7;\n"
> +			/*
> +			 * UMWAIT sets CF flag in RFLAGS, so PUSHF to push them
> +			 * onto the stack, then pop them back into `rflags` so that
> +			 * we can read it.
> +			 */
> +			"pushf;\n"
> +			"pop %0;\n"
> +			: "=r"(rflags)
> +			: "D"(state), "a"(tsc_l), "d"(tsc_h));
> +
> +	/* we're interested in the first bit (the carry flag) */
> +	return rflags & 0x1;
> +}
> +
> +/**
> + * Enter an architecture-defined optimized power state until a certain TSC
> + * timestamp is reached.
> + *
> + * This function uses TPAUSE instruction. For more information about its usage,
> + * please refer to Intel(R) 64 and IA-32 Architectures Software Developer's
> + * Manual.
> + *
> + * @param state
> + *   Architecture-dependent optimized power state number. Can be 0 (C0.2) or
> + *   1 (C0.1).
> + * @param tsc_timestamp
> + *   Maximum TSC timestamp to wait for.
> + *
> + * @return
> + *   - 1 if wakeup was due to TSC timeout expiration.
> + *   - 0 if wakeup was due to other reasons.
> + */
> +static inline int rte_power_pause(const uint32_t state,
> +		const uint64_t tsc_timestamp)
> +{
> +	const uint32_t tsc_l = (uint32_t)tsc_timestamp;
> +	const uint32_t tsc_h = (uint32_t)(tsc_timestamp >> 32);
> +	uint64_t rflags;
> +
> +	/* execute TPAUSE */
> +	asm volatile(".byte 0x66, 0x0f, 0xae, 0xf7;\n"
> +			/*
> +			 * TPAUSE sets CF flag in RFLAGS, so PUSHF to push them
> +			 * onto the stack, then pop them back into `rflags` so that
> +			 * we can read it.
> +			 */
> +			"pushf;\n"
> +			"pop %0;\n"
> +			: "=r"(rflags)
> +			: "D"(state), "a"(tsc_l), "d"(tsc_h));
> +
> +	/* we're interested in the first bit (the carry flag) */
> +	return rflags & 0x1;
> +}
> +
> +#ifdef __cplusplus
> +}
> +#endif
> +
> +#endif /* _RTE_POWER_INTRINSIC_X86_64_H_ */
> diff --git a/lib/librte_eal/x86/rte_cpuflags.c b/lib/librte_eal/x86/rte_cpuflags.c
> index 30439e7951..0325c4b93b 100644
> --- a/lib/librte_eal/x86/rte_cpuflags.c
> +++ b/lib/librte_eal/x86/rte_cpuflags.c
> @@ -110,6 +110,8 @@ const struct feature_entry rte_cpu_feature_table[] = {
>  	FEAT_DEF(AVX512F, 0x00000007, 0, RTE_REG_EBX, 16)
>  	FEAT_DEF(RDSEED, 0x00000007, 0, RTE_REG_EBX, 18)
>
> +	FEAT_DEF(WAITPKG, 0x00000007, 0, RTE_REG_ECX, 5)
> +
>  	FEAT_DEF(LAHF_SAHF, 0x80000001, 0, RTE_REG_ECX, 0)
>  	FEAT_DEF(LZCNT, 0x80000001, 0, RTE_REG_ECX, 4)
>
> --
> 2.17.1
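To make the suggestion about uniting rte_power_monitor() with rte_wait_until_equal_(16|32|64) a bit more concrete, here is a minimal, portable sketch of what a common "wait until the masked value equals `expected`, or a deadline passes" primitive might look like. Everything in it is hypothetical: `wait_until_masked_equal_64()`, `masked_match()`, `now_ns()` and the `WAIT_*` codes are illustrative names, not DPDK APIs. It assumes GCC or Clang (for `__atomic_load_n`) and POSIX `clock_gettime()`, and it uses a monotonic-clock deadline in nanoseconds instead of a raw TSC timestamp purely to stay portable. It borrows the mask/expected check and the "1 on timeout" return convention from the quoted x86 rte_power_monitor(), and the `volatile uint64_t *` address shape from rte_wait_until_equal_64(); the spin loop marks where an arch-specific back end (e.g. UMONITOR/UMWAIT on WAITPKG-capable x86) could be plugged in.

```c
/*
 * Hypothetical, portable sketch of a unified "wait until masked value
 * matches, or deadline" primitive. NOT a DPDK API; all names here are
 * illustrative. Assumes GCC/Clang (__atomic builtins) and POSIX
 * clock_gettime().
 */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <stdint.h>
#include <time.h>

/* Return convention borrowed from the RFC's x86 rte_power_monitor():
 * 1 = woke up because the deadline expired, 0 = condition satisfied. */
#define WAIT_OK      0
#define WAIT_TIMEOUT 1

/* Monotonic time in nanoseconds (a portable stand-in for reading the TSC). */
static uint64_t now_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Same test as the RFC's early-abort check: only the masked current value
 * is compared, so 'expected' must already be masked to ever match. */
static int masked_match(uint64_t cur, uint64_t expected, uint64_t mask)
{
	return (cur & mask) == expected;
}

/* Spin until (*addr & mask) == expected, or until timeout_ns has elapsed. */
static int wait_until_masked_equal_64(volatile uint64_t *addr,
		uint64_t expected, uint64_t mask, uint64_t timeout_ns)
{
	const uint64_t deadline = now_ns() + timeout_ns;

	for (;;) {
		/* Acquire load; rte_wait_until_equal_64() instead takes the
		 * memory order as a parameter. */
		uint64_t cur = __atomic_load_n(addr, __ATOMIC_ACQUIRE);

		if (masked_match(cur, expected, mask))
			return WAIT_OK;
		if (now_ns() >= deadline)
			return WAIT_TIMEOUT;
		/*
		 * An arch-specific back end could go here instead of a plain
		 * spin, e.g. UMONITOR(addr) + UMWAIT(deadline) on x86 CPUs
		 * that report WAITPKG.
		 */
	}
}
```

For example, with `*addr == 0x1234` and `mask == 0xff`, `expected == 0x34` returns 0 immediately, while `expected == 0x1234` can only ever time out, because the bits outside the mask never take part in the comparison. The quoted x86 rte_power_monitor() behaves the same way in its early-abort check, so it may be worth stating in the API doc that `expected_value` is compared after masking.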