From: "Van Haaren, Harry"
To: "Richardson, Bruce"
CC: "dev@dpdk.org", "pbhagavatula@caviumnetworks.com", "thomas@monjalon.net"
Date: Wed, 1 Nov 2017 17:59:51 +0000
In-Reply-To: <20171101170909.GA23264@bricha3-MOBL3.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore function

> From: Richardson, Bruce
> Sent: Wednesday, November 1, 2017 5:09 PM
> To: Van Haaren, Harry
> Cc: dev@dpdk.org; pbhagavatula@caviumnetworks.com; thomas@monjalon.net
> Subject: Re: [dpdk-dev] [PATCH] service: fix race in service on app lcore
> function
>
> On Tue, Oct 31, 2017 at 11:49:02AM +0000, Harry van Haaren wrote:
> > This commit fixes a possible race condition when an application
> > uses the service-cores infrastructure and the function to run
> > a service on an application lcore at the same time.
> >
> > The fix is to change the num_mapped_cores variable to an
> > atomic variable. This allows rte_service_run_iter_on_app_lcore()
> > to detect whether another core is currently mapped to the
> > service, and to refuse to run if the service is not
> > multi-thread safe.
> >
> > No performance impact is expected, as the mappings for the
> > service cores change at control-path frequency; the change
> > from an ordinary write to an atomic write will therefore not
> > have any significant impact.
> >
> > Two unit tests were added to verify the behaviour of the
> > function to run a service on an application core, testing both
> > a multi-thread safe service and a multi-thread unsafe service.
> >
> > The doxygen API documentation for the function has been updated
> > to reflect the current and correct behaviour.
> >
> > Fixes: e9139a32f6e8 ("service: add function to run on app lcore")
> >
> > Signed-off-by: Harry van Haaren
> >
> > ---
> >
> > @@ -381,8 +381,28 @@ service_run(uint32_t i, struct core_state *cs, uint64_t service_mask)
> >  int32_t rte_service_run_iter_on_app_lcore(uint32_t id)
> >  {
> >  	/* run service on calling core, using all-ones as the service mask */
> > +	if (!service_valid(id))
> > +		return -EINVAL;
> > +
> >  	struct core_state *cs = &lcore_states[rte_lcore_id()];
> > -	return service_run(id, cs, UINT64_MAX);
> > +	struct rte_service_spec_impl *s = &rte_services[id];
> > +
> > +	/* Atomically add this core to the mapped cores first, then examine if
> > +	 * we can run the service. This avoids a race condition between
> > +	 * checking the value, and atomically adding to the mapped count.
> > +	 */
> > +	rte_atomic32_inc(&s->num_mapped_cores);
> > +
> > +	if (service_mt_safe(s) == 0 &&
> > +			rte_atomic32_read(&s->num_mapped_cores) > 1) {
> > +		rte_atomic32_dec(&s->num_mapped_cores);
> > +		return -EBUSY;
> > +	}
> > +
> > +	int ret = service_run(id, cs, UINT64_MAX);
> > +	rte_atomic32_dec(&s->num_mapped_cores);
> > +
> > +	return ret;
> >  }
>
> Do we really need to do an atomic inc and dec in this function? If we
> check that there are no service cores mapped, would that not be enough
> for safety? If an app core is calling a service, the control plane is
> unlikely to decide to start spawning off a service core for that
> service simultaneously. Manipulating the count is safer, yes, but unlike
> the other changes in this patch, this one will affect performance, so I
> think we can go without. Similarly, for multiple data plane threads
> calling the same service simultaneously: everything else dataplane is
> done without locks so I think this should be too.

I think the inc-dec is required with the current code; what if *two* application cores are calling the same service?
If we don't have the atomic inc-dec of num_mapped_cores, both app cores could simultaneously run a multi-thread unsafe service.

The other option is to demand that the *app* must serialize access to a particular service when using this function - but in my eyes that limits the usefulness of the function, because it pushes the problem up a level instead of solving it.

I guess an application "in the know" about how its cores are acting would prefer not to have the atomics at the service-cores library level. As a compromise, the use of these atomics could be a parameter to the rte_service_run_iter_on_app_lcore() function. That would allow the application to choose whether it wants to avail of the functionality in the service-cores library, or whether it takes responsibility for serializing access to multi-thread unsafe services itself.

I'll spin up a v2 with a parameter to the function.

Thanks for review, -Harry