From: Phil Yang
To: thomas@monjalon.net, harry.van.haaren@intel.com,
	konstantin.ananyev@intel.com, stephen@networkplumber.org,
	maxime.coquelin@redhat.com, dev@dpdk.org
Cc: david.marchand@redhat.com, jerinj@marvell.com, hemant.agrawal@nxp.com,
	Honnappa.Nagarahalli@arm.com, gavin.hu@arm.com, ruifeng.wang@arm.com,
	joyce.kong@arm.com, nd@arm.com
Date: Wed, 11 Mar 2020 01:49:10 +0800
Message-Id: <1583862551-2049-10-git-send-email-phil.yang@arm.com>
X-Mailer: git-send-email 2.7.4
In-Reply-To: <1583862551-2049-1-git-send-email-phil.yang@arm.com>
References: <1583862551-2049-1-git-send-email-phil.yang@arm.com>
Subject: [dpdk-dev] [PATCH 09/10] service: optimize with c11 one-way barrier

The num_mapped_cores and execute_lock variables are synchronized with the
rte_atomic_XX APIs, which issue a full barrier (DMB) on aarch64. This patch
optimizes them with C11 atomic one-way barriers (acquire/release ordering).

Signed-off-by: Phil Yang
Reviewed-by: Ruifeng Wang
Reviewed-by: Gavin Hu
Reviewed-by: Honnappa Nagarahalli
---
 lib/librte_eal/common/rte_service.c | 50 ++++++++++++++++++++++++++-----------
 1 file changed, 35 insertions(+), 15 deletions(-)

diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
index 0186024..efb3c9f 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -42,7 +42,7 @@ struct rte_service_spec_impl {
	 * running this service callback. When not set, a core may take the
	 * lock and then run the service callback.
	 */
-	rte_atomic32_t execute_lock;
+	uint32_t execute_lock;

	/* API set/get-able variables */
	int8_t app_runstate;
@@ -54,7 +54,7 @@ struct rte_service_spec_impl {
	 * It does not indicate the number of cores the service is running
	 * on currently.
	 */
-	rte_atomic32_t num_mapped_cores;
+	int32_t num_mapped_cores;
	uint64_t calls;
	uint64_t cycles_spent;
 } __rte_cache_aligned;
@@ -329,7 +329,8 @@ rte_service_runstate_get(uint32_t id)
	rte_smp_rmb();

	int check_disabled = !(s->internal_flags & SERVICE_F_START_CHECK);
-	int lcore_mapped = (rte_atomic32_read(&s->num_mapped_cores) > 0);
+	int lcore_mapped = (__atomic_load_n(&s->num_mapped_cores,
+		__ATOMIC_RELAXED) > 0);

	return (s->app_runstate == RUNSTATE_RUNNING) &&
		(s->comp_runstate == RUNSTATE_RUNNING) &&
@@ -372,11 +373,20 @@ service_run(uint32_t i, struct core_state *cs, uint64_t service_mask,
	cs->service_active_on_lcore[i] = 1;

	if ((service_mt_safe(s) == 0) && (serialize_mt_unsafe == 1)) {
-		if (!rte_atomic32_cmpset((uint32_t *)&s->execute_lock, 0, 1))
+		uint32_t expected = 0;
+		/* ACQUIRE ordering here is to prevent the callback
+		 * function from hoisting up before the execute_lock
+		 * setting.
+		 */
+		if (!__atomic_compare_exchange_n(&s->execute_lock, &expected, 1,
+			0, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED))
			return -EBUSY;

		service_runner_do_callback(s, cs, i);
-		rte_atomic32_clear(&s->execute_lock);
+		/* RELEASE ordering here is used to pair with ACQUIRE
+		 * above to achieve lock semantic.
+		 */
+		__atomic_store_n(&s->execute_lock, 0, __ATOMIC_RELEASE);
	} else
		service_runner_do_callback(s, cs, i);

@@ -412,11 +422,11 @@ rte_service_run_iter_on_app_lcore(uint32_t id, uint32_t serialize_mt_unsafe)
	/* Increment num_mapped_cores to indicate that the service is
	 * is running on a core.
	 */
-	rte_atomic32_inc(&s->num_mapped_cores);
+	__atomic_add_fetch(&s->num_mapped_cores, 1, __ATOMIC_ACQUIRE);

	int ret = service_run(id, cs, UINT64_MAX, s, serialize_mt_unsafe);

-	rte_atomic32_dec(&s->num_mapped_cores);
+	__atomic_sub_fetch(&s->num_mapped_cores, 1, __ATOMIC_RELEASE);

	return ret;
 }
@@ -549,24 +559,32 @@ service_update(uint32_t sid, uint32_t lcore,

	uint64_t sid_mask = UINT64_C(1) << sid;
	if (set) {
-		uint64_t lcore_mapped = lcore_states[lcore].service_mask &
-			sid_mask;
+		/* When multiple threads try to update the same lcore
+		 * service concurrently, e.g. set lcore map followed
+		 * by clear lcore map, the unsynchronized service_mask
+		 * values have issues on the num_mapped_cores value
+		 * consistency. So we use ACQUIRE ordering to pair with
+		 * the RELEASE ordering to synchronize the service_mask.
+		 */
+		uint64_t lcore_mapped = __atomic_load_n(
+					&lcore_states[lcore].service_mask,
+					__ATOMIC_ACQUIRE) & sid_mask;

		if (*set && !lcore_mapped) {
			lcore_states[lcore].service_mask |= sid_mask;
-			rte_atomic32_inc(&rte_services[sid].num_mapped_cores);
+			__atomic_add_fetch(&rte_services[sid].num_mapped_cores,
+				1, __ATOMIC_RELEASE);
		}
		if (!*set && lcore_mapped) {
			lcore_states[lcore].service_mask &= ~(sid_mask);
-			rte_atomic32_dec(&rte_services[sid].num_mapped_cores);
+			__atomic_sub_fetch(&rte_services[sid].num_mapped_cores,
+				1, __ATOMIC_RELEASE);
		}
	}

	if (enabled)
		*enabled = !!(lcore_states[lcore].service_mask & (sid_mask));

-	rte_smp_wmb();
-
	return 0;
 }
@@ -622,7 +640,8 @@ rte_service_lcore_reset_all(void)
		}
	}
	for (i = 0; i < RTE_SERVICE_NUM_MAX; i++)
-		rte_atomic32_set(&rte_services[i].num_mapped_cores, 0);
+		__atomic_store_n(&rte_services[i].num_mapped_cores, 0,
+			__ATOMIC_RELAXED);

	rte_smp_wmb();

@@ -705,7 +724,8 @@ rte_service_lcore_stop(uint32_t lcore)
		int32_t enabled = service_mask & (UINT64_C(1) << i);
		int32_t service_running = rte_service_runstate_get(i);
		int32_t only_core = (1 ==
-			rte_atomic32_read(&rte_services[i].num_mapped_cores));
+			__atomic_load_n(&rte_services[i].num_mapped_cores,
+				__ATOMIC_RELAXED));

		/* if the core is mapped, and the service is running, and this
		 * is the only core that is mapped, the service would cease to
-- 
2.7.4
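
For readers unfamiliar with the GCC __atomic built-ins used above: the
execute_lock handling amounts to a minimal try-lock built from an acquire
CAS paired with a release store. The sketch below is illustrative only and
is not part of the patch; the names (struct try_lock, try_lock_acquire,
try_lock_release) are hypothetical and independent of the DPDK service code.

/*
 * Minimal try-lock sketch using the same one-way barriers as the patch.
 * The acquire CAS keeps the critical section from being hoisted above
 * lock acquisition; the release store keeps it from sinking below lock
 * release. No full barrier (DMB on aarch64) is required on either side.
 */
#include <stdint.h>

struct try_lock {
	uint32_t locked; /* 0 = free, 1 = held */
};

static inline int
try_lock_acquire(struct try_lock *l)
{
	uint32_t expected = 0;

	/* Returns nonzero only if the lock was free and is now held. */
	return __atomic_compare_exchange_n(&l->locked, &expected, 1,
			0 /* strong CAS */, __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

static inline void
try_lock_release(struct try_lock *l)
{
	/* Pairs with the acquire CAS above to give lock semantics. */
	__atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
}

Calling try_lock_acquire() before a callback and try_lock_release() after it
mirrors the execute_lock usage in service_run() above.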