From: Bruce Richardson
To: "Van Haaren, Harry"
Cc: dev@dpdk.org, pbhagavatula@caviumnetworks.com, thomas@monjalon.net
Date: Thu, 2 Nov 2017 09:10:20 +0000
Message-ID: <20171102091019.GA19712@bricha3-MOBL3.ger.corp.intel.com>
References: <1509450542-123626-1-git-send-email-harry.van.haaren@intel.com>
 <20171101170909.GA23264@bricha3-MOBL3.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore function

On Wed, Nov 01, 2017 at 05:59:51PM +0000, Van Haaren, Harry wrote:
> > From: Richardson, Bruce
> > Sent: Wednesday, November 1, 2017 5:09 PM
> > To: Van Haaren, Harry
> > Cc: dev@dpdk.org; pbhagavatula@caviumnetworks.com; thomas@monjalon.net
> > Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore function
> >
> > On Tue, Oct 31, 2017 at 11:49:02AM +0000, Harry van Haaren wrote:
> > > This commit fixes a possible race condition when an application
> > > uses the service-cores infrastructure and, at the same time, the
> > > function to run a service on an application lcore.
> > >
> > > The fix is to change the num_mapped_cores variable to an atomic
> > > variable. Concurrent calls to rte_service_run_iter_on_app_lcore()
> > > from multiple threads can then detect whether another core is
> > > currently mapped to the service, and the function refuses to run
> > > a service that is not multi-thread safe.
> > >
> > > No performance impact is expected: the service-core mappings
> > > change only at control-path frequency, so the change from an
> > > ordinary write to an atomic write will not have any significant
> > > impact.
> > >
> > > Two unit tests were added to verify the behaviour of the function
> > > to run a service on an application core, testing both a
> > > multi-thread safe service and a multi-thread unsafe service.
> > >
> > > The doxygen API documentation for the function has been updated
> > > to reflect the current and correct behaviour.
> > >
> > > Fixes: e9139a32f6e8 ("service: add function to run on app lcore")
> > >
> > > Signed-off-by: Harry van Haaren
> > >
> > > ---
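
(For illustration: a minimal, self-contained sketch of the check-then-act
race described above, and of the increment-first ordering the patch uses
to close it. The names and the use of C11 atomics are hypothetical
stand-ins for the DPDK internals, not the actual implementation.)

#include <errno.h>
#include <stdatomic.h>

/* Hypothetical stand-in for the per-service state. */
struct service {
	atomic_int num_mapped_cores;
	int mt_safe;	/* non-zero: service may run on many threads */
};

/* Racy ordering: two threads can both pass the check before either
 * one increments, so a multi-thread unsafe service ends up running
 * on two cores at once. The individual operations being atomic does
 * not help; the check-then-increment ordering is the problem. */
static int run_iter_racy(struct service *s)
{
	if (!s->mt_safe && atomic_load(&s->num_mapped_cores) > 0)
		return -EBUSY;
	atomic_fetch_add(&s->num_mapped_cores, 1);	/* too late */
	/* ... run one iteration of the service ... */
	atomic_fetch_sub(&s->num_mapped_cores, 1);
	return 0;
}

/* Fixed ordering, mirroring the patch: increment first, then check.
 * Whatever the interleaving, at most one thread can observe a count
 * of exactly 1, so an MT-unsafe service never runs concurrently. */
static int run_iter_fixed(struct service *s)
{
	atomic_fetch_add(&s->num_mapped_cores, 1);
	if (!s->mt_safe && atomic_load(&s->num_mapped_cores) > 1) {
		atomic_fetch_sub(&s->num_mapped_cores, 1);
		return -EBUSY;
	}
	/* ... run one iteration of the service ... */
	atomic_fetch_sub(&s->num_mapped_cores, 1);
	return 0;
}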
> > > @@ -381,8 +381,28 @@ service_run(uint32_t i, struct core_state *cs, uint64_t service_mask)
> > >  int32_t rte_service_run_iter_on_app_lcore(uint32_t id)
> > >  {
> > >  	/* run service on calling core, using all-ones as the service mask */
> > > +	if (!service_valid(id))
> > > +		return -EINVAL;
> > > +
> > >  	struct core_state *cs = &lcore_states[rte_lcore_id()];
> > > -	return service_run(id, cs, UINT64_MAX);
> > > +	struct rte_service_spec_impl *s = &rte_services[id];
> > > +
> > > +	/* Atomically add this core to the mapped cores first, then examine if
> > > +	 * we can run the service. This avoids a race condition between
> > > +	 * checking the value, and atomically adding to the mapped count.
> > > +	 */
> > > +	rte_atomic32_inc(&s->num_mapped_cores);
> > > +
> > > +	if (service_mt_safe(s) == 0 &&
> > > +			rte_atomic32_read(&s->num_mapped_cores) > 1) {
> > > +		rte_atomic32_dec(&s->num_mapped_cores);
> > > +		return -EBUSY;
> > > +	}
> > > +
> > > +	int ret = service_run(id, cs, UINT64_MAX);
> > > +	rte_atomic32_dec(&s->num_mapped_cores);
> > > +
> > > +	return ret;
> > >  }
> >
> > Do we really need to do an atomic inc and dec in this function? If we
> > check that there are no service cores mapped, would that not be enough
> > for safety? If an app core is calling a service, the control plane is
> > unlikely to decide to start spawning off a service core for that
> > service simultaneously. Manipulating the count is safer, yes, but
> > unlike the other changes in this patch, this one will affect
> > performance, so I think we can go without. Similarly, for multiple
> > data-plane threads calling the same service simultaneously: everything
> > else on the data plane is done without locks, so I think this should
> > be too.
>
> I think the inc-dec is required with the current code; what if *two*
> application cores are calling the same service? If we don't have the
> atomic inc-dec of num_mapped_cores, both app cores could simultaneously
> run a multi-thread unsafe service.
>
> The other option is to demand that the *app* must serialize access to a
> particular service when using this function - but in my eyes that limits
> the usefulness of this function, because it pushes the problem up a
> level instead of solving it.
>
> I guess an application "in the know" about how its cores are acting
> would prefer not to have the atomics at the service-cores library level.
> As a compromise, the use of these atomics could be made a parameter to
> the rte_service_run_iter_on_app_lcore() function.
>
> That would allow the application to choose whether it wants to avail of
> the serialization in the service-cores library, or whether it takes the
> responsibility for serializing access to multi-thread unsafe services
> itself.
>
> I'll spin up a v2 with a parameter to the function.

Ok, that sounds reasonable enough. Just so long as apps that don't need
the lock don't have to use it.

/Bruce
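
(For reference: a sketch of how the compromise proposed above might look
from the application's side. The second parameter and its name,
serialize_mt_unsafe, are assumptions made for illustration; the real
signature is whatever the v2 patch ends up defining.)

#include <errno.h>
#include <stdint.h>

/* Hypothetical v2 prototype: the flag selects whether the library's
 * atomics serialize access to MT-unsafe services, or the application
 * takes responsibility for serializing callers itself. */
int32_t rte_service_run_iter_on_app_lcore(uint32_t id,
		uint32_t serialize_mt_unsafe);

static void poll_service(uint32_t id)
{
	/* Let the library detect concurrent callers (pays for the atomics). */
	int32_t ret = rte_service_run_iter_on_app_lcore(id, 1);
	if (ret == -EBUSY) {
		/* Another lcore is running this MT-unsafe service right now. */
	}

	/* An app that guarantees a single caller can skip the atomics. */
	ret = rte_service_run_iter_on_app_lcore(id, 0);
	(void)ret;
}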