From: "Van Haaren, Harry"
To: Jerin Jacob
CC: Thomas Monjalon , "dev@dpdk.org" , "Wiles, Keith" , "Richardson, Bruce"
Date: Fri, 30 Jun 2017 13:16:44 +0000
Subject: Re: [dpdk-dev] Service lcores and Application lcores

> From: Jerin Jacob [mailto:jerin.jacob@caviumnetworks.com]
> Sent: Friday, June 30, 2017 2:04 PM
> To: Van Haaren, Harry
> Cc: Thomas Monjalon ; dev@dpdk.org; Wiles, Keith
> ; Richardson, Bruce
> Subject: Re: Service lcores and Application lcores
>
> -----Original Message-----
> > Date: Fri, 30 Jun 2017 11:14:39 +0000
> > From: "Van Haaren, Harry"
> > To: Thomas Monjalon
> > CC: "dev@dpdk.org" , 'Jerin Jacob'
> > , "Wiles, Keith" ,
> > "Richardson, Bruce"
> > Subject: RE: Service lcores and Application lcores
> >
> > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > Sent: Friday, June 30, 2017 11:39 AM
> > > To: Van Haaren, Harry
> > > Cc: dev@dpdk.org; 'Jerin Jacob' ; Wiles, Keith
> > > ; Richardson, Bruce
> > > Subject: Re: Service lcores and Application lcores
> > >
> > > 30/06/2017 12:18, Van Haaren, Harry:
> > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > 30/06/2017 10:52, Van Haaren, Harry:
> > > > > > From: Thomas Monjalon [mailto:thomas@monjalon.net]
> > > > > > > 29/06/2017 18:35, Van Haaren, Harry:
> > > > > > > > 3) The problem;
> > > > > > > > If a service core runs the SW PMD schedule() function (option 2) *AND*
> > > > > > > > the application lcore runs schedule() func (option 1), the result is that
> > > > > > > > two threads are concurrently running a multi-thread unsafe function.
> > > > > > >
> > > > > > > Which function is multi-thread unsafe?
> > > > > >
> > > > > > With the current design, the service-callback does not have to be multi-thread safe.
> > > > > > For example, the eventdev SW PMD is not multi-thread safe.
> > > > > >
> > > > > > The service library handles serializing access to the service-callback if multiple cores
> > > > > > are mapped to that service.
> > > > > > This keeps the atomic complexity in one place, and keeps
> > > > > > services as light-weight to implement as possible.
> > > > > >
> > > > > > (We could consider forcing all service-callbacks to be multi-thread safe by using atomics,
> > > > > > but we would not be able to optimize away the atomic cmpset if it is not required.
> > > > > > This feels heavy-handed, and would cause useless atomic ops to execute.)
> > > > >
> > > > > OK, thank you for the detailed explanation.
> > > > >
> > > > > > > Why would the same function be run by the service and by the scheduler?
> > > > > >
> > > > > > The same function can be run concurrently by the application and by a service core.
> > > > > > The root cause is that an application can *think* it is the only one running
> > > > > > threads, but in reality one or more service-cores may be running in the background.
> > > > > >
> > > > > > The coexistence of service lcores and application lcores, each without knowledge of
> > > > > > the other's behavior, is the cause of the concurrent running of the multi-thread unsafe
> > > > > > service function.
> > > > >
> > > > > That's the part I still don't understand.
> > > > > Why would an application run a function on its own core if it is already
> > > > > run as a service? Can we just have a check that the service API exists
> > > > > and that the service is running?
> > > >
> > > > The point is that really it is an application / service core mismatch.
> > > > The application should never run a PMD that it knows also has a service core running it.
> > >
> > > Yes
> > >
> > > > However, porting applications to the service-core API has an overlap period where an
> > > > application on 17.05 will be required to call e.g. rte_eventdev_schedule() itself, and
> > > > depending on startup EAL flags for service-cores, it may-or-may-not have to call
> > > > schedule() manually.
> > >
> > > Yes, service cores may be unavailable, depending on user configuration.
> > > That's why it must be possible to query the service core API
> > > to know whether a service is running or not.
> >
> > Yep - an application can check if a service is running by calling
> > rte_service_is_running(struct service_spec*);
> > It returns true if a service-core is running, mapped to the service, and the service is
> > start()-ed.
>
> If I understand it correctly, the driver should check whether the _required_
> service is running or not? Not the _application_. Right?

I think the PMD should check if a service core is mapped, and it can print a warning if not.

In the case of eventdev, eventdev_start() is the function where service_is_running() is checked, and if not, we inform the user that no service-core is ready to run the service.

From the application POV, it could use e.g. rte_service_iterate()* to run that service - so the PMD should not fail to start(), just warn that at the time of starting there was no core available to it. The application itself must still check whether it should call rte_eventdev_schedule() itself, based on rte_version.h as Thomas mentioned.

The ideal end goal is, in my opinion, something like this:
Service cores are used to run services by 95+% of apps, to abstract away SW/HW core-requirement differences.
Advanced applications can utilize rte_service_iterate() to run specific services on application lcores if they wish.

* See the other "branch" of this thread about rte_service_iterate():
http://dpdk.org/ml/archives/dev/2017-June/069540.html

> > > When porting an application to service core, you just have to run this
> > > check, which is known to be available for DPDK 17.08 (check rte_version.h).
> >
> > OK, so as part of porting to service-cores, applications are expected to sanity-check
> > the services against their own lcore config.
> > If there's no disagreement, I will add it to the release notes of the V+1 service-cores
> > patchset.
> >
> > There is still a need for the rte_service_iterate() function as discussed in the other
> > branch of this thread.
> > I'll wait for consensus on that and post the next revision then.
> >
> > Thanks for the questions / input!
> > > >
> > > > This is pretty error-prone, and misconfiguration would cause A) deadlock due to no CPU
> > > > cycles, B) segfault due to two cores.
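The two checks discussed in this mail, and the two misconfiguration outcomes just quoted, can be sketched as follows. This is a hypothetical model in plain C, not DPDK code: svc_state, svc_is_running(), pmd_dev_start(), check_sched_cfg() and app_should_call_schedule() are illustrative names. svc_is_running() only mimics the proposed rte_service_is_running() semantics (service start()-ed and a running service core mapped to it). The PMD's start() succeeds either way and only warns, while the application decides at startup whether to call schedule() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical model of the proposed "is running" query. */
struct svc_state {
    bool started;                      /* the service has been start()-ed */
    unsigned int running_cores_mapped; /* running service cores mapped to it */
};

static bool svc_is_running(const struct svc_state *s)
{
    return s->started && s->running_cores_mapped > 0;
}

/* PMD side: start() does not fail when no service core runs the service,
 * because the application may run it itself; it only warns.
 * Returns 0 (success) and reports via *warned whether it warned. */
static int pmd_dev_start(const struct svc_state *s, bool *warned)
{
    *warned = !svc_is_running(s);
    if (*warned)
        fprintf(stderr, "warning: no service core is running the service\n");
    return 0;
}

/* Application side: who ends up running the MT-unsafe service? */
enum sched_cfg {
    SCHED_CFG_OK_SERVICE_CORE, /* only a service core runs it */
    SCHED_CFG_OK_APP_LCORE,    /* only an application lcore runs it */
    SCHED_CFG_NO_RUNNER,       /* nobody runs it: deadlock, no CPU cycles */
    SCHED_CFG_TWO_RUNNERS,     /* both run it: concurrent MT-unsafe calls */
};

static enum sched_cfg check_sched_cfg(bool service_running, bool app_calls_schedule)
{
    if (service_running && app_calls_schedule)
        return SCHED_CFG_TWO_RUNNERS;
    if (!service_running && !app_calls_schedule)
        return SCHED_CFG_NO_RUNNER;
    return service_running ? SCHED_CFG_OK_SERVICE_CORE : SCHED_CFG_OK_APP_LCORE;
}

/* The rule from this thread: the application calls schedule() itself
 * only when no service core is running the service. */
static bool app_should_call_schedule(const struct svc_state *s)
{
    return !svc_is_running(s);
}
```

Deriving the application's choice from the same query the PMD uses always lands in one of the two OK states, which is the point of doing the sanity check at startup instead of assuming the EAL service-core flags match the application's lcore plan.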