From: Bruce Richardson
To: "Van Haaren, Harry"
Cc: dev@dpdk.org, pbhagavatula@caviumnetworks.com, thomas@monjalon.net
Date: Thu, 2 Nov 2017 09:10:20 +0000
Message-ID: <20171102091019.GA19712@bricha3-MOBL3.ger.corp.intel.com>
References: <1509450542-123626-1-git-send-email-harry.van.haaren@intel.com>
 <20171101170909.GA23264@bricha3-MOBL3.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore function

On Wed, Nov 01, 2017 at 05:59:51PM +0000, Van Haaren, Harry wrote:
> > From: Richardson, Bruce
> > Sent: Wednesday, November 1, 2017 5:09 PM
> > To: Van Haaren, Harry
> > Cc: dev@dpdk.org; pbhagavatula@caviumnetworks.com; thomas@monjalon.net
> > Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore function
> >
> > On Tue, Oct 31, 2017 at 11:49:02AM +0000, Harry van Haaren wrote:
> > > This commit fixes a possible race condition when an application
> > > uses the service-cores infrastructure and, at the same time, the
> > > function to run a service on an application lcore.
> > >
> > > The fix is to change the num_mapped_cores variable to an atomic
> > > variable. Concurrent calls to rte_service_run_iter_on_app_lcore()
> > > from multiple threads can then detect whether another core is
> > > currently mapped to the service, and the function refuses to run
> > > a service that is not multi-thread safe.
> > >
> > > No performance impact is expected: the service-core mappings
> > > change only at control-path frequency, so the change from an
> > > ordinary write to an atomic write will not have any significant
> > > impact.
> > >
> > > Two unit tests were added to verify the behaviour of the function
> > > to run a service on an application core, testing both a
> > > multi-thread safe service and a multi-thread unsafe service.
> > >
> > > The doxygen API documentation for the function has been updated
> > > to reflect the current and correct behaviour.
> > >
> > > Fixes: e9139a32f6e8 ("service: add function to run on app lcore")
> > >
> > > Signed-off-by: Harry van Haaren
> > >
> > > ---
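
(For illustration: a minimal, self-contained sketch of the check-then-act
race described above, and of the increment-first ordering the patch uses
to close it. The names and the use of C11 atomics are hypothetical
stand-ins for the DPDK internals, not the actual implementation.)

#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-service state. */
struct service {
	atomic_int num_mapped_cores;
	int mt_safe;	/* non-zero: service may run on many threads */
};

/* Racy ordering: two threads can both pass the check before either
 * one increments, so a multi-thread unsafe service ends up running
 * on two cores at once. The individual operations being atomic does
 * not help; the check-then-increment ordering is the problem. */
static int run_iter_racy(struct service *s)
{
	if (!s->mt_safe && atomic_load(&s->num_mapped_cores) > 0)
		return -EBUSY;
	atomic_fetch_add(&s->num_mapped_cores, 1);	/* too late */
	/* ... run one iteration of the service ... */
	atomic_fetch_sub(&s->num_mapped_cores, 1);
	return 0;
}

/* Fixed ordering, mirroring the patch: increment first, then check.
 * Whatever the interleaving, at most one thread can observe a count
 * of exactly 1, so an MT-unsafe service never runs concurrently. */
static int run_iter_fixed(struct service *s)
{
	atomic_fetch_add(&s->num_mapped_cores, 1);
	if (!s->mt_safe && atomic_load(&s->num_mapped_cores) > 1) {
		atomic_fetch_sub(&s->num_mapped_cores, 1);
		return -EBUSY;
	}
	/* ... run one iteration of the service ... */
	atomic_fetch_sub(&s->num_mapped_cores, 1);
	return 0;
}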
> > > @@ -381,8 +381,28 @@ service_run(uint32_t i, struct core_state *cs, uint64_t service_mask)
> > >  int32_t rte_service_run_iter_on_app_lcore(uint32_t id)
> > >  {
> > >  	/* run service on calling core, using all-ones as the service mask */
> > > +	if (!service_valid(id))
> > > +		return -EINVAL;
> > > +
> > >  	struct core_state *cs = &lcore_states[rte_lcore_id()];
> > > -	return service_run(id, cs, UINT64_MAX);
> > > +	struct rte_service_spec_impl *s = &rte_services[id];
> > > +
> > > +	/* Atomically add this core to the mapped cores first, then examine if
> > > +	 * we can run the service. This avoids a race condition between
> > > +	 * checking the value, and atomically adding to the mapped count.
> > > +	 */
> > > +	rte_atomic32_inc(&s->num_mapped_cores);
> > > +
> > > +	if (service_mt_safe(s) == 0 &&
> > > +			rte_atomic32_read(&s->num_mapped_cores) > 1) {
> > > +		rte_atomic32_dec(&s->num_mapped_cores);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	int ret = service_run(id, cs, UINT64_MAX);
> > > +	rte_atomic32_dec(&s->num_mapped_cores);
> > > +
> > > +	return ret;
> > >  }
> >
> > Do we really need to do an atomic inc and dec in this function? If we
> > check that there are no service cores mapped, would that not be enough
> > for safety? If an app core is calling a service, the control plane is
> > unlikely to decide to start spawning off a service core for that
> > service simultaneously. Manipulating the count is safer, yes, but
> > unlike the other changes in this patch, this one will affect
> > performance, so I think we can go without. Similarly, for multiple
> > data-plane threads calling the same service simultaneously: everything
> > else on the data plane is done without locks, so I think this should
> > be too.
>
> I think the inc-dec is required with the current code; what if *two*
> application cores are calling the same service? If we don't have the
> atomic inc-dec of num_mapped_cores, both app cores could simultaneously
> run a multi-thread unsafe service.
>
> The other option is to demand that the *app* must serialize access to a
> particular service when using this function - but in my eyes that limits
> the usefulness of this function, because it pushes the problem up a
> level instead of solving it.
>
> I guess an application "in the know" about how its cores are acting
> would prefer not to have the atomics at the service-cores library level.
> As a compromise, the use of these atomics could be made a parameter to
> the rte_service_run_iter_on_app_lcore() function.
>
> That would allow the application to choose whether it wants to avail of
> the serialization in the service-cores library, or whether it takes the
> responsibility for serializing access to multi-thread unsafe services
> itself.
>
> I'll spin up a v2 with a parameter to the function.

Ok, that sounds reasonable enough. Just so long as apps that don't need
the lock don't have to use it.

/Bruce
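
(For reference: a sketch of how the compromise proposed above might look
from the application's side. The second parameter and its name,
serialize_mt_unsafe, are assumptions made for illustration; the real
signature is whatever the v2 patch ends up defining.)

#include <errno.h>
#include <stdint.h>

/* Hypothetical v2 prototype: the flag selects whether the library's
 * atomics serialize access to MT-unsafe services, or the application
 * takes responsibility for serializing callers itself. */
int32_t rte_service_run_iter_on_app_lcore(uint32_t id,
		uint32_t serialize_mt_unsafe);

static void poll_service(uint32_t id)
{
	/* Let the library detect concurrent callers (pays for the atomics). */
	int32_t ret = rte_service_run_iter_on_app_lcore(id, 1);
	if (ret == -EBUSY) {
		/* Another lcore is running this MT-unsafe service right now. */
	}

	/* An app that guarantees a single caller can skip the atomics. */
	ret = rte_service_run_iter_on_app_lcore(id, 0);
	(void)ret;
}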