Date: Wed, 1 Nov 2017 17:09:09 +0000
From: Bruce Richardson
To: Harry van Haaren
Cc: dev@dpdk.org, pbhagavatula@caviumnetworks.com, thomas@monjalon.net
Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore function
Message-ID: <20171101170909.GA23264@bricha3-MOBL3.ger.corp.intel.com>
In-Reply-To: <1509450542-123626-1-git-send-email-harry.van.haaren@intel.com>
Organization: Intel Research and Development Ireland Ltd.
User-Agent: Mutt/1.9.1 (2017-09-22)

On Tue, Oct 31, 2017 at 11:49:02AM +0000, Harry van Haaren wrote:
> This commit fixes a possible race condition if an application
> uses the service-cores infrastructure and the function to run
> a service on an application lcore at the same time.
>
> The fix is to change the num_mapped_cores variable to an atomic
> variable. Concurrent accesses by multiple threads through
> rte_service_run_iter_on_app_lcore() can then detect whether
> another core is currently mapped to the service, and refuse to
> run if the service is not multi-thread safe.
>
> No performance impact is expected, as the mappings for the
> service cores change at control-path frequency; hence the change
> from an ordinary write to an atomic write will not have any
> significant impact.
>
> Two unit tests were added to verify the behaviour of the
> function to run a service on an application core, testing both
> a multi-thread safe service and a multi-thread unsafe service.
>
> The doxygen API documentation for the function has been updated
> to reflect the current and correct behaviour.
>
> Fixes: e9139a32f6e8 ("service: add function to run on app lcore")
>
> Signed-off-by: Harry van Haaren
>
> ---
> @@ -381,8 +381,28 @@ service_run(uint32_t i, struct core_state *cs, uint64_t service_mask)
>  int32_t rte_service_run_iter_on_app_lcore(uint32_t id)
>  {
>  	/* run service on calling core, using all-ones as the service mask */
> +	if (!service_valid(id))
> +		return -EINVAL;
> +
>  	struct core_state *cs = &lcore_states[rte_lcore_id()];
> -	return service_run(id, cs, UINT64_MAX);
> +	struct rte_service_spec_impl *s = &rte_services[id];
> +
> +	/* Atomically add this core to the mapped cores first, then examine if
> +	 * we can run the service. This avoids a race condition between
> +	 * checking the value, and atomically adding to the mapped count.
> +	 */
> +	rte_atomic32_inc(&s->num_mapped_cores);
> +
> +	if (service_mt_safe(s) == 0 &&
> +			rte_atomic32_read(&s->num_mapped_cores) > 1) {
> +		rte_atomic32_dec(&s->num_mapped_cores);
> +		return -EBUSY;
> +	}
> +
> +	int ret = service_run(id, cs, UINT64_MAX);
> +	rte_atomic32_dec(&s->num_mapped_cores);
> +
> +	return ret;
>  }
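To see why the patch increments before checking: a check-then-increment
ordering leaves a window in which two lcores both pass the test before
either one has raised the count. A minimal sketch of that racy ordering,
reusing the helpers visible in the patch above (illustrative only, not
code from the submission):

	/* Racy alternative: check-then-act. Two lcores can both read a
	 * count of 0 before either one increments, so both would enter a
	 * multi-thread-unsafe service at the same time.
	 */
	if (service_mt_safe(s) == 0 &&
			rte_atomic32_read(&s->num_mapped_cores) > 0)
		return -EBUSY;
	rte_atomic32_inc(&s->num_mapped_cores); /* too late to exclude the peer */

With the patch's inc-first ordering, whichever caller increments second is
guaranteed to read a value greater than 1 and back off, so an MT-unsafe
service can never be entered by two callers concurrently.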
Do we really need to do an atomic inc and dec in this function? If we
check that there are no service cores mapped, would that not be enough
for safety? If an app core is calling a service, the control plane is
unlikely to decide to start spawning off a service core for that same
service simultaneously. Manipulating the count is safer, yes, but unlike
the other changes in this patch, this one will affect performance, so I
think we can go without it.

Similarly, for multiple data-plane threads calling the same service
simultaneously: everything else in the data plane is done without locks,
so I think this should be too.

/Bruce
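To make the alternative concrete, here is a minimal sketch of the
check-only variant Bruce appears to be suggesting (an interpretation of
his comment, not code posted in the thread): drop the inc/dec from the
fast path and refuse only when another core is already mapped, accepting
the small control-plane window he describes as tolerable.

	int32_t rte_service_run_iter_on_app_lcore(uint32_t id)
	{
		if (!service_valid(id))
			return -EINVAL;

		struct core_state *cs = &lcore_states[rte_lcore_id()];
		struct rte_service_spec_impl *s = &rte_services[id];

		/* Plain read, no inc/dec: cheaper on the fast path, but
		 * leaves a window if a service core is mapped to this
		 * service at the same moment, the trade-off argued above.
		 */
		if (service_mt_safe(s) == 0 &&
				rte_atomic32_read(&s->num_mapped_cores) > 0)
			return -EBUSY;

		return service_run(id, cs, UINT64_MAX);
	}

Note that this variant also gives up mutual exclusion between multiple
application lcores calling the function at the same time, which is
exactly the second point raised in the reply.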