From: "Van Haaren, Harry" <harry.van.haaren@intel.com>
To: Phil Yang, thomas@monjalon.net, "Ananyev, Konstantin",
 stephen@networkplumber.org, maxime.coquelin@redhat.com, dev@dpdk.org
CC: david.marchand@redhat.com, jerinj@marvell.com, hemant.agrawal@nxp.com,
 Honnappa.Nagarahalli@arm.com, gavin.hu@arm.com, ruifeng.wang@arm.com,
 joyce.kong@arm.com, nd@arm.com
Thread-Topic: [PATCH v3 11/12] service: optimize with c11 one-way barrier
Date: Fri, 3 Apr 2020 11:58:23 +0000
References: <1583999071-22872-1-git-send-email-phil.yang@arm.com>
 <1584407863-774-1-git-send-email-phil.yang@arm.com>
 <1584407863-774-12-git-send-email-phil.yang@arm.com>
In-Reply-To: <1584407863-774-12-git-send-email-phil.yang@arm.com>
Subject: Re: [dpdk-dev] [PATCH v3 11/12] service: optimize with c11 one-way barrier
List-Id: DPDK patches and discussions

> -----Original Message-----
> From: Phil Yang
> Sent: Tuesday, March 17, 2020 1:18 AM
> To: thomas@monjalon.net; Van Haaren, Harry; Ananyev, Konstantin;
> stephen@networkplumber.org;
> maxime.coquelin@redhat.com; dev@dpdk.org
> Cc: david.marchand@redhat.com; jerinj@marvell.com; hemant.agrawal@nxp.com;
> Honnappa.Nagarahalli@arm.com; gavin.hu@arm.com; ruifeng.wang@arm.com;
> joyce.kong@arm.com; nd@arm.com
> Subject: [PATCH v3 11/12] service: optimize with c11 one-way barrier
>
> The num_mapped_cores and execute_lock variables are synchronized with the
> rte_atomic_XX APIs, which use a full barrier (DMB) on aarch64. This patch
> optimizes them with C11 atomic one-way barriers.
>
> Signed-off-by: Phil Yang
> Reviewed-by: Ruifeng Wang
> Reviewed-by: Gavin Hu
> Reviewed-by: Honnappa Nagarahalli

Based on the on-list discussion, it seems the consensus is to not use GCC
builtins directly, but instead to use the C11 atomics APIs "proper"? If my
conclusion is correct, the v+1 of this patchset would require updating to
that style of API.

Inline comments for context below, -Harry

> ---
>  lib/librte_eal/common/rte_service.c | 50 ++++++++++++++++++++++-----------
>  1 file changed, 35 insertions(+), 15 deletions(-)
>
> diff --git a/lib/librte_eal/common/rte_service.c
> b/lib/librte_eal/common/rte_service.c
> index 0843c3c..c033224 100644
> --- a/lib/librte_eal/common/rte_service.c
> +++ b/lib/librte_eal/common/rte_service.c
> @@ -42,7 +42,7 @@ struct rte_service_spec_impl {
>  	 * running this service callback. When not set, a core may take the
>  	 * lock and then run the service callback.
>  	 */
> -	rte_atomic32_t execute_lock;
> +	uint32_t execute_lock;
>
>  	/* API set/get-able variables */
>  	int8_t app_runstate;
> @@ -54,7 +54,7 @@ struct rte_service_spec_impl {
>  	 * It does not indicate the number of cores the service is running
>  	 * on currently.
>  	 */
> -	rte_atomic32_t num_mapped_cores;
> +	int32_t num_mapped_cores;

Any reason why "int32_t" or "uint32_t" is used over the other?
execute_lock is a uint32_t above, while num_mapped_cores is an int32_t?
>  	uint64_t calls;
>  	uint64_t cycles_spent;
>  } __rte_cache_aligned;
> @@ -332,7 +332,8 @@ rte_service_runstate_get(uint32_t id)
>  	rte_smp_rmb();
>
>  	int check_disabled = !(s->internal_flags & SERVICE_F_START_CHECK);
> -	int lcore_mapped = (rte_atomic32_read(&s->num_mapped_cores) > 0);
> +	int lcore_mapped = (__atomic_load_n(&s->num_mapped_cores,
> +			__ATOMIC_RELAXED) > 0);
>
>  	return (s->app_runstate == RUNSTATE_RUNNING) &&
>  		(s->comp_runstate == RUNSTATE_RUNNING) &&
> @@ -375,11 +376,20 @@ service_run(uint32_t i, struct core_state *cs, uint64_t
> service_mask,
>  	cs->service_active_on_lcore[i] = 1;
>
>  	if ((service_mt_safe(s) == 0) && (serialize_mt_unsafe == 1)) {
> -		if (!rte_atomic32_cmpset((uint32_t *)&s->execute_lock, 0, 1))
> +		uint32_t expected = 0;
> +		/* ACQUIRE ordering here is to prevent the callback
> +		 * function from hoisting up before the execute_lock
> +		 * setting.
> +		 */
> +		if (!__atomic_compare_exchange_n(&s->execute_lock, &expected, 1,
> +				0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
>  			return -EBUSY;

Let's try to improve the magic "1" and "0" constants. I believe the "1"
here is the desired "new value on success", and the 0 is "bool weak",
where our 0/false constant implies a strongly ordered compare exchange?

"Weak is true for weak compare_exchange, which may fail spuriously, and
false for the strong variation, which never fails spuriously." -- from
https://gcc.gnu.org/onlinedocs/gcc/_005f_005fatomic-Builtins.html

const uint32_t on_success_value = 1;
const bool weak = 0;
__atomic_compare_exchange_n(&s->execute_lock, &expected, on_success_value,
		weak, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);

Although a bit more verbose, I feel this documents usage a lot better,
particularly for those who aren't as familiar with the C11 function
argument order. Admittedly, with the API change to not use __builtins,
perhaps this comment is moot.
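To make the named-constant suggestion concrete, here is a minimal,
self-contained sketch of the try-lock idiom above. The function names
(try_acquire, release) and the standalone lock variable are illustrative
only, not the DPDK API, and it assumes the GCC builtins remain in use:

```c
#include <stdbool.h>
#include <stdint.h>

static uint32_t execute_lock; /* 0 = free, 1 = held; illustrative only */

static bool
try_acquire(uint32_t *lock)
{
	uint32_t expected = 0;               /* succeed only if currently free */
	const uint32_t on_success_value = 1; /* value stored when lock is taken */
	const bool weak = false;             /* strong CAS: no spurious failure */

	/* ACQUIRE on success orders the critical section after the lock
	 * acquisition; RELAXED on failure since nothing was acquired.
	 */
	return __atomic_compare_exchange_n(lock, &expected, on_success_value,
			weak, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static void
release(uint32_t *lock)
{
	/* RELEASE pairs with the ACQUIRE above to give lock semantics. */
	__atomic_store_n(lock, 0, __ATOMIC_RELEASE);
}
```

A second try_acquire() on a held lock simply returns false without
blocking, which is the -EBUSY path in the patch.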
>
>  		service_runner_do_callback(s, cs, i);
> -		rte_atomic32_clear(&s->execute_lock);
> +		/* RELEASE ordering here is used to pair with ACQUIRE
> +		 * above to achieve lock semantic.
> +		 */
> +		__atomic_store_n(&s->execute_lock, 0, __ATOMIC_RELEASE);
>  	} else
>  		service_runner_do_callback(s, cs, i);
>
> @@ -415,11 +425,11 @@ rte_service_run_iter_on_app_lcore(uint32_t id, uint32_t
> serialize_mt_unsafe)
>  	/* Increment num_mapped_cores to indicate that the service
>  	 * is running on a core.
>  	 */
> -	rte_atomic32_inc(&s->num_mapped_cores);
> +	__atomic_add_fetch(&s->num_mapped_cores, 1, __ATOMIC_ACQUIRE);
>
>  	int ret = service_run(id, cs, UINT64_MAX, s, serialize_mt_unsafe);
>
> -	rte_atomic32_dec(&s->num_mapped_cores);
> +	__atomic_sub_fetch(&s->num_mapped_cores, 1, __ATOMIC_RELEASE);
>
>  	return ret;
>  }
> @@ -552,24 +562,32 @@ service_update(uint32_t sid, uint32_t lcore,
>
>  	uint64_t sid_mask = UINT64_C(1) << sid;
>  	if (set) {
> -		uint64_t lcore_mapped = lcore_states[lcore].service_mask &
> -			sid_mask;
> +		/* When multiple threads try to update the same lcore
> +		 * service concurrently, e.g. set lcore map followed
> +		 * by clear lcore map, the unsynchronized service_mask
> +		 * values have issues on the num_mapped_cores value
> +		 * consistency. So we use ACQUIRE ordering to pair with
> +		 * the RELEASE ordering to synchronize the service_mask.
> +		 */
> +		uint64_t lcore_mapped = __atomic_load_n(
> +				&lcore_states[lcore].service_mask,
> +				__ATOMIC_ACQUIRE) & sid_mask;

Thanks for the comment - it helps me understand things a bit better.
Some questions/theories to validate:

1) The service_mask ACQUIRE avoids other loads being hoisted above it,
correct?

2) There are non-atomic stores to service_mask. Is it correct that the
stores themselves aren't the issue, but rather the relative visibility of
service_mask stores vs num_mapped_cores?
(Detail in (3) below.)

>  		if (*set && !lcore_mapped) {
>  			lcore_states[lcore].service_mask |= sid_mask;
> -			rte_atomic32_inc(&rte_services[sid].num_mapped_cores);
> +			__atomic_add_fetch(&rte_services[sid].num_mapped_cores,
> +					1, __ATOMIC_RELEASE);
>  		}
>  		if (!*set && lcore_mapped) {
>  			lcore_states[lcore].service_mask &= ~(sid_mask);
> -			rte_atomic32_dec(&rte_services[sid].num_mapped_cores);
> +			__atomic_sub_fetch(&rte_services[sid].num_mapped_cores,
> +					1, __ATOMIC_RELEASE);
>  		}

3) Here we update the core-local service_mask, and then update
num_mapped_cores with an ATOMIC_RELEASE. The RELEASE ensures that the
previous store to service_mask is guaranteed to be visible on all cores
if this store is visible. Why do we care about this property? The
service_mask is core-local anyway.

4) Even with the load-ACQ of service_mask and the REL store to
num_mapped_cores, is there not still a possible race condition where two
lcores simultaneously load-ACQ the service_mask, and then both do atomic
add/sub_fetch with REL?

5) Assuming the race in (4) is real, it raises the real question: the
service-cores control APIs are not designed to be multi-thread-safe.
Orchestration of service/lcore mappings is not meant to be done by
multiple threads at the same time. Documenting this loudly may help; I'm
happy to send a patch to do so if we're agreed on the above.
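The interleaving suspected in (4) can be replayed deterministically in a
single thread; the sketch below mirrors the patch's set-path but is a
hypothetical stand-alone reproduction, not the DPDK implementation. Two
"control threads" T0 and T1 both enable the same service on the same
lcore: each loads service_mask first, and because the check and the
update are not one atomic step, the service ends up mapped once but
counted twice.

```c
#include <stdint.h>

static uint64_t service_mask;    /* per-lcore mask of enabled services */
static int32_t num_mapped_cores; /* per-service count of mapped cores */

static int32_t
replay_set_set_race(uint32_t sid)
{
	const uint64_t sid_mask = UINT64_C(1) << sid;

	/* Step 1: both threads load service_mask; neither sees the bit set. */
	uint64_t t0_mapped = __atomic_load_n(&service_mask,
			__ATOMIC_ACQUIRE) & sid_mask;
	uint64_t t1_mapped = __atomic_load_n(&service_mask,
			__ATOMIC_ACQUIRE) & sid_mask;

	/* Step 2: both threads therefore take the "set && !lcore_mapped"
	 * branch and increment the counter.
	 */
	if (!t0_mapped) {
		service_mask |= sid_mask;
		__atomic_add_fetch(&num_mapped_cores, 1, __ATOMIC_RELEASE);
	}
	if (!t1_mapped) {
		service_mask |= sid_mask;
		__atomic_add_fetch(&num_mapped_cores, 1, __ATOMIC_RELEASE);
	}

	/* The service is mapped once, but counted twice. */
	return num_mapped_cores;
}
```

No choice of memory ordering on the individual load and add closes this
window; only serializing the whole check-and-update (a lock, or the
documented single-orchestrator requirement from (5)) would.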
>  	}
>
>  	if (enabled)
>  		*enabled = !!(lcore_states[lcore].service_mask & (sid_mask));
>
> -	rte_smp_wmb();
> -
>  	return 0;
>  }
>
> @@ -625,7 +643,8 @@ rte_service_lcore_reset_all(void)
>  		}
>  	}
>  	for (i = 0; i < RTE_SERVICE_NUM_MAX; i++)
> -		rte_atomic32_set(&rte_services[i].num_mapped_cores, 0);
> +		__atomic_store_n(&rte_services[i].num_mapped_cores, 0,
> +				__ATOMIC_RELAXED);
>
>  	rte_smp_wmb();
>
> @@ -708,7 +727,8 @@ rte_service_lcore_stop(uint32_t lcore)
>  		int32_t enabled = service_mask & (UINT64_C(1) << i);
>  		int32_t service_running = rte_service_runstate_get(i);
>  		int32_t only_core = (1 ==
> -			rte_atomic32_read(&rte_services[i].num_mapped_cores));
> +			__atomic_load_n(&rte_services[i].num_mapped_cores,
> +				__ATOMIC_RELAXED));
>
>  		/* if the core is mapped, and the service is running, and this
>  		 * is the only core that is mapped, the service would cease to
> --
> 2.7.4