From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from EUR01-DB5-obe.outbound.protection.outlook.com (mail-eopbgr150070.outbound.protection.outlook.com [40.107.15.70]) by dpdk.org (Postfix) with ESMTP id D1586271 for ; Thu, 24 Jan 2019 06:21:44 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=armh.onmicrosoft.com; s=selector1-arm-com; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=/X+Z1nrFREoWOcKOmYae9whTaphKfRkUngXpAK5Jzu4=; b=nGgYynqIgk7DrLqd4PJBYDa+0iw80iCgGhGkoh9Rzor49mi5bGg5OnY2Xmk51FK8lVWb+liiS36W85SGtX9ojXx42ptzpnESLkXkXF9FxCztFQKxHGyHaf9FfJF1El+yhkABzvOJFx+COJgEdEHyEcFdeOZHPKQ0sT1EVl4mpWA= Received: from AM6PR08MB3672.eurprd08.prod.outlook.com (20.177.115.76) by AM6PR08MB3494.eurprd08.prod.outlook.com (20.177.112.149) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.1537.27; Thu, 24 Jan 2019 05:21:43 +0000 Received: from AM6PR08MB3672.eurprd08.prod.outlook.com ([fe80::9120:87d6:b17c:fadd]) by AM6PR08MB3672.eurprd08.prod.outlook.com ([fe80::9120:87d6:b17c:fadd%3]) with mapi id 15.20.1558.016; Thu, 24 Jan 2019 05:21:43 +0000 From: Honnappa Nagarahalli To: "Eads, Gage" , "dev@dpdk.org" CC: "olivier.matz@6wind.com" , "arybchenko@solarflare.com" , "Richardson, Bruce" , "Ananyev, Konstantin" , nd , "chaozhu@linux.vnet.ibm.com" , "jerinj@marvell.com" , "hemant.agrawal@nxp.com" , nd Thread-Topic: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only) Thread-Index: AQHUra7wYmJ+w7UMKEWkA1HXhAc3iqWy9AnAgAD6ulCAAI2KgIABGauAgAYqIlCAABqPMIABPZsQ Date: Thu, 24 Jan 2019 05:21:43 +0000 Message-ID: References: <20190115223232.31866-1-gage.eads@intel.com> <20190116151835.22424-1-gage.eads@intel.com> <20190116151835.22424-2-gage.eads@intel.com> <9184057F7FC11744A2107296B6B8EB1E541C8BCA@FMSMSX108.amr.corp.intel.com> <9184057F7FC11744A2107296B6B8EB1E541C94D9@FMSMSX108.amr.corp.intel.com> <9184057F7FC11744A2107296B6B8EB1E541CA4CE@FMSMSX108.amr.corp.intel.com> In-Reply-To: <9184057F7FC11744A2107296B6B8EB1E541CA4CE@FMSMSX108.amr.corp.intel.com> Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=Honnappa.Nagarahalli@arm.com; x-originating-ip: [217.140.111.135] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1; AM6PR08MB3494; 6:iUrmMAntiNZY+lGF/zyrD6cfGdnAFIT2n7EpnlROGr44DXhLESSOnYa+ITRwCdxM8kjGKgOXRVU08ScbyjKbm/xcXfe+fj8dhfKQxfM7/DisL4+PJg2HZUDZCmuXtMrJ0+xGtqUqPZIpj+O/16PjqRCN8TRM8IwG1ESL6GlPl4g8gv6P3E1xSgY/XKPp42AVoCvKnLZ3laM4yyIiExLrvAX2dZBcVXVDfvNG9tjG4UVwAW2/+lMjIzgVTWaoRCUegyx19vnZQCO9cANGLqwy189ZWfU1BkQqJyVDgz/h6HkuCy+8M0meUEsrX9et8T2ILTij2pX/kNVh3clTPdEjNiJDuZJdoQH+DIV+bqfW3CEYyqeuGiTFr+lo3mDbM7ZYUYD1r11prmIbYu0zqNM6drbYIsIsvlM1l2128NErNpeL3EpzYqkjDEcUtrFGvh4qkPF4ycDAee+ZIDiicEZEVQ==; 5:o6WzJ8k1TxMco5thKMw3MVMjZqpGpz9XLbvUi98HLqiDR4UxzpL5N7ZeQxNxijtgf4YI3hjznDVSGlV/JP9LxUWTW0ix6xZSgQyoNhVGEYIB9RT0x5DwlN79Ca59FpuvI+HFGIZfUEwkDPna526bpX+LJqJxWajEvKjC388NAPwUSVOAqHmV7/P8o0m7yUYB81iFYUGy6pvqPzAJeBsLCQ==; 7:BYogiDnpsPxtXyJpfkJyJ3B2v0fwapkO8YtZj1oR0bJiXzTnB6suhWgc2e0cCx6E2Vb9f1f/rjWOocVPqvzDv0RehXBYChM/+mRDqJ3U3/mMp2SRJtA59gSFUfluPX85jmBFzhEtXtCe5BJh8HET2w== x-ms-exchange-antispam-srfa-diagnostics: SOS; x-ms-office365-filtering-correlation-id: 090dd94e-c1c1-41a5-0e24-08d681bbd408 x-ms-office365-filtering-ht: Tenant x-microsoft-antispam: BCL:0; PCL:0; RULEID:(2390118)(7020095)(4652040)(8989299)(4534185)(4627221)(201703031133081)(201702281549075)(8990200)(5600110)(711020)(4605077)(4618075)(2017052603328)(7153060)(7193020); SRVR:AM6PR08MB3494; x-ms-traffictypediagnostic: AM6PR08MB3494: x-ld-processed: f34e5979-57d9-4aaa-ad4d-b122a662184d,ExtAddr nodisclaimer: True x-microsoft-antispam-prvs: x-forefront-prvs: 0927AA37C7 x-forefront-antispam-report: SFV:NSPM; SFS:(10009020)(396003)(376002)(346002)(366004)(39860400002)(136003)(189003)(199004)(305945005)(7736002)(14444005)(256004)(93886005)(229853002)(81166006)(8936002)(8676002)(81156014)(110136005)(14454004)(99286004)(55016002)(86362001)(316002)(54906003)(45080400002)(478600001)(72206003)(966005)(66066001)(6436002)(486006)(2501003)(106356001)(33656002)(105586002)(4326008)(102836004)(97736004)(9686003)(25786009)(6306002)(76176011)(6506007)(71200400001)(186003)(71190400001)(476003)(26005)(74316002)(446003)(7696005)(3846002)(11346002)(53936002)(68736007)(6116002)(2906002)(6246003); DIR:OUT; SFP:1101; SCL:1; SRVR:AM6PR08MB3494; H:AM6PR08MB3672.eurprd08.prod.outlook.com; FPR:; SPF:None; LANG:en; PTR:InfoNoRecords; MX:1; A:1; received-spf: None (protection.outlook.com: arm.com does not designate permitted sender hosts) x-ms-exchange-senderadcheck: 1 x-microsoft-antispam-message-info: jgVnnbNXf78jKGsDUMrtoImd+A5Y37jBUb7xIOBWqKFjD0+FxQIvq+xJXe6ft3x7bngHuaRb2S3t5cZQdhyl95eaFlnCCTouS3/bto8zRAc0TXgfl3GuKV78KCnpqBC1nEZ60txzZC45xC8ISltImg2Vkwj2eNdaqE05bm7M+S+snDLQ/+PtCRjm9wHVUvHJClFMZcuD83zN1Op2L8+Mcko1unmuEUmuuH1KQJ5Qbr0rnf/2ztL/xa8aywif+9vwaiH6yyPdu3I6xKuydV2u6xHencdQtFGww6iXXe2a7soT0zehtwlVf2FGbNDR+XbrHEN1rmCloRFSV8pD8efLY9pLLtuhoayM7Ub9gQjGHP1OlKOtNHWHPLBo4cvyoZKzleM8HhCFT4N3SW4HQ+qdGvQfKPyrvodZaADv3A6Vd9M= spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-OriginatorOrg: arm.com X-MS-Exchange-CrossTenant-Network-Message-Id: 090dd94e-c1c1-41a5-0e24-08d681bbd408 X-MS-Exchange-CrossTenant-originalarrivaltime: 24 Jan 2019 05:21:43.4575 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: f34e5979-57d9-4aaa-ad4d-b122a662184d X-MS-Exchange-Transport-CrossTenantHeadersStamped: AM6PR08MB3494 Subject: Re: [dpdk-dev] [PATCH v3 1/2] eal: add 128-bit cmpset (x86-64 only) X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 24 Jan 2019 05:21:45 -0000 > > > > Added other platform owners. > > > > > > > > > @@ -208,4 +209,25 @@ static inline void > > > > > > > rte_atomic64_clear(rte_atomic64_t > > > > > > > *v) } #endif > > > > > > > > > > > > > > +static inline int > > > > > > > +rte_atomic128_cmpset(volatile uint64_t *dst, uint64_t *exp, > > > > > > > +uint64_t > > > > > > > +*src) { > > > > > > The API name suggests it is a 128b operation. 'dst', 'exp' and = 'src' > > > > > > should be pointers to 128b (__int128)? Or we could define our > > > > > > own data > > > > > type. > > > > > > > > > > I agree, I'm not a big fan of the 64b pointers here. I avoided > > > > > __int128 originally because it fails to compile with -pedantic, > > > > > but on second thought (and with your suggestion of a separate > > > > > data type), we can resolve that with this typedef: > > > > > > > > > > typedef struct { > > > > > RTE_STD_C11 __int128 val; } rte_int128_t; > > > > ok > > > > > > > > > > > > > > > Since, it is a new API, can we define it with memory orderings > > > > > > which will be more conducive to relaxed memory ordering based > > > architectures? > > > > > > You can refer to [1] and [2] for guidance. > > > > > > > > > > I certainly see the value in controlling the operation's memory > > > > > ordering, like in the __atomic intrinsics, but I'm not sure this > > > > > patchset is the right place to address that. I see that work > > > > > going a couple > > > > ways: > > > > > 1. Expand the existing rte_atomicN_* interfaces with additional > > > > > arguments. In that case, I'd prefer this be done in a separate > > > > > patchset that addresses all the atomic operations, not just > > > > > cmpset, so the interface changes are chosen according to the > > > > > needs of the full set of atomic operations. If this approach is > > > > > taken then there's no need to solve this while > > > > > rte_atomic128_cmpset is experimental, since all the > > > > other functions are non-experimental anyway. > > > > > > > > > > - Or - > > > > > > > > > > 2. Don't modify the existing rte_atomicN_* interfaces (or their > > > > > strongly ordered behavior), and instead create new versions of > > > > > them that take additional arguments. In this case, we can > > > > > implement > > > > > rte_atomic128_cmpset() as is and create a more flexible version > > > > > in a later > > > > patchset. > > > > > > > > > > Either way, I think the current interface (w.r.t. memory > > > > > ordering > > > > > options) can work and still leaves us in a good position for > > > > > future > > > > changes/improvements. > > > > > > > > > I do not see the need to modify/extend the existing rte_atomicN_* > > > > APIs as the corresponding __atomic intrinsics serve as replacements= . > > > > I expect that at some point, DPDK code base will not be using > > > rte_atomicN_* APIs. > > > > However, __atomic intrinsics do not support 128b wide parameters. > > > > Hence > > > > > > I don't think that's correct. From the GCC docs: > > > > > > "16-byte integral types are also allowed if `__int128' (see > > > __int128) is supported by the architecture." > > > > > > This works with x86 -64 -- I assume aarch64 also, but haven't confirm= ed. > > > > > > Source: > > > https://gcc.gnu.org/onlinedocs/gcc-4.7.0/gcc/_005f_005fatomic- > > > Builtins.html > > (following is based on my reading, not based on experiments) My > > understanding is that the __atomic_load_n/store_n were implemented > > using a compare-and- swap HW instruction (due to lack of HW 128b > > atomic load and store instructions). This introduced the side effect > > or store/load respectively. Where as the user does not expect such > > side effects. As suggested in [1], it looks like GCC delegated the > > implementation to libatomic which 'it seems' uses locks to implement > > 128b __atomic intrinsics (needs to be verified) > > > > If __atomic intrinsics, for any of the supported platforms, do not > > have an optimal implementation, I prefer to add a DPDK API as an > > abstraction. A given platform can choose to implement such an API > > using __atomic intrinsics if it wants. The DPDK API can be similar to > __atomic_compare_exchange_n. > > >=20 > Certainly. From the linked discussions, I see how this would affect the d= esign > of (hypothetical functions) rte_atomic128_read() and rte_atomic128_set(), > but I don't see anything that suggests (for the architectures being discu= ssed) > that __atomic_compare_exchange_n is suboptimal. I wrote some code and generated assembly to verify what is happening. On aa= rch64, this call is delegated to libatomic and libatomic needs to be linked= . In the generated assembly, it is clear that it uses locks (pthread mutex = lock) to provide atomicity for. For 32b and 64b the compiler generates the = expected inline assembly. Both, ' __atomic_always_lock_free' and ' __atomic= _is_lock_free' return 0 to indicate that 128b __atomic intrinsics are not l= ock free. (gcc - 8.2) Out of curiosity, I also did similar experiments on x86 (CPU E5-2660 v4). E= ven after using -mcx16 flag the call is delegated to libatomic. I see the '= lock cmpxchg16b' in the generated assembly. But, ' __atomic_always_lock_fre= e' and ' __atomic_is_lock_free' return 0 to indicate that 128b __atomic int= rinsics are NOT lock free with gcc version 7.3.0. However with gcc version = 5.4.0, ' __atomic_is_lock_free' returns 1. I found more discussion at [3]. = However, I am not an expert on x86. [3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=3D80878 These problems do not exist with 32b and 64b. >=20 > > [1] https://patchwork.ozlabs.org/patch/721686/ > > [2] https://gcc.gnu.org/ml/gcc/2017-01/msg00167.html > > > > > > > > > DPDK needs to write its own. Since this is the first API in that > > > > regard, I prefer that we start with a signature that resembles > > > > __atomic intrinsics which have been proven to provide best > > > > flexibility for > > > all the platforms supported by DPDK.