Date: Wed, 1 Nov 2017 17:09:09 +0000
From: Bruce Richardson
To: Harry van Haaren
Cc: dev@dpdk.org, pbhagavatula@caviumnetworks.com, thomas@monjalon.net
Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore function
Message-ID: <20171101170909.GA23264@bricha3-MOBL3.ger.corp.intel.com>
In-Reply-To: <1509450542-123626-1-git-send-email-harry.van.haaren@intel.com>
Organization: Intel Research and Development Ireland Ltd.
User-Agent: Mutt/1.9.1 (2017-09-22)

On Tue, Oct 31, 2017 at 11:49:02AM +0000, Harry van Haaren wrote:
> This commit fixes a possible race condition if an application
> uses the service-cores infrastructure and the function to run
> a service on an application lcore at the same time.
>
> The fix is to change the num_mapped_cores variable to an atomic
> variable. Concurrent accesses by multiple threads through
> rte_service_run_iter_on_app_lcore() can then detect whether
> another core is currently mapped to the service, and refuse to
> run if the service is not multi-thread safe.
>
> No performance impact is expected, as the mappings for the
> service cores change at control-path frequency; hence the change
> from an ordinary write to an atomic write will not have any
> significant impact.
>
> Two unit tests were added to verify the behaviour of the
> function to run a service on an application core, testing both
> a multi-thread safe service and a multi-thread unsafe service.
>
> The doxygen API documentation for the function has been updated
> to reflect the current and correct behaviour.
>
> Fixes: e9139a32f6e8 ("service: add function to run on app lcore")
>
> Signed-off-by: Harry van Haaren
>
> ---
> @@ -381,8 +381,28 @@ service_run(uint32_t i, struct core_state *cs, uint64_t service_mask)
>  int32_t rte_service_run_iter_on_app_lcore(uint32_t id)
>  {
>  	/* run service on calling core, using all-ones as the service mask */
> +	if (!service_valid(id))
> +		return -EINVAL;
> +
>  	struct core_state *cs = &lcore_states[rte_lcore_id()];
> -	return service_run(id, cs, UINT64_MAX);
> +	struct rte_service_spec_impl *s = &rte_services[id];
> +
> +	/* Atomically add this core to the mapped cores first, then examine if
> +	 * we can run the service. This avoids a race condition between
> +	 * checking the value, and atomically adding to the mapped count.
> +	 */
> +	rte_atomic32_inc(&s->num_mapped_cores);
> +
> +	if (service_mt_safe(s) == 0 &&
> +			rte_atomic32_read(&s->num_mapped_cores) > 1) {
> +		rte_atomic32_dec(&s->num_mapped_cores);
> +		return -EBUSY;
> +	}
> +
> +	int ret = service_run(id, cs, UINT64_MAX);
> +	rte_atomic32_dec(&s->num_mapped_cores);
> +
> +	return ret;
>  }
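To see why the patch increments before checking: a check-then-increment
ordering leaves a window in which two lcores both pass the test before
either one has raised the count. A minimal sketch of that racy ordering,
reusing the helpers visible in the patch above (illustrative only, not
code from the submission):

	/* Racy alternative: check-then-act. Two lcores can both read a
	 * count of 0 before either one increments, so both would enter a
	 * multi-thread-unsafe service at the same time.
	 */
	if (service_mt_safe(s) == 0 &&
			rte_atomic32_read(&s->num_mapped_cores) > 0)
		return -EBUSY;
	rte_atomic32_inc(&s->num_mapped_cores); /* too late to exclude the peer */

With the patch's inc-first ordering, whichever caller increments second is
guaranteed to read a value greater than 1 and back off, so an MT-unsafe
service can never be entered by two callers concurrently.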
Do we really need to do an atomic inc and dec in this function? If we
check that there are no service cores mapped, would that not be enough
for safety? If an app core is calling a service, the control plane is
unlikely to decide to start spawning off a service core for that same
service simultaneously. Manipulating the count is safer, yes, but unlike
the other changes in this patch, this one will affect performance, so I
think we can go without it.

Similarly, for multiple data-plane threads calling the same service
simultaneously: everything else in the data plane is done without locks,
so I think this should be too.

/Bruce
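To make the alternative concrete, here is a minimal sketch of the
check-only variant Bruce appears to be suggesting (an interpretation of
his comment, not code posted in the thread): drop the inc/dec from the
fast path and refuse only when another core is already mapped, accepting
the small control-plane window he describes as tolerable.

	int32_t rte_service_run_iter_on_app_lcore(uint32_t id)
	{
		if (!service_valid(id))
			return -EINVAL;

		struct core_state *cs = &lcore_states[rte_lcore_id()];
		struct rte_service_spec_impl *s = &rte_services[id];

		/* Plain read, no inc/dec: cheaper on the fast path, but
		 * leaves a window if a service core is mapped to this
		 * service at the same moment, the trade-off argued above.
		 */
		if (service_mt_safe(s) == 0 &&
				rte_atomic32_read(&s->num_mapped_cores) > 0)
			return -EBUSY;

		return service_run(id, cs, UINT64_MAX);
	}

Note that this variant also gives up mutual exclusion between multiple
application lcores calling the function at the same time, which is
exactly the second point raised in the reply.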