From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ananyev, Konstantin"
To: "Liang, Cunming", Stephen Hemminger, "Richardson, Bruce"
Cc: "dev@dpdk.org"
Date: Fri, 9 Jan 2015 11:52:53 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258213D3B9F@irsmsx105.ger.corp.intel.com>
Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> -----Original Message-----
> From: Liang, Cunming
> Sent: Friday, January 09, 2015 9:41 AM
> To: Ananyev, Konstantin; Stephen Hemminger; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
>
>
>
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Friday, January 09, 2015 1:06 AM
> > To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> >
> >
> > Hi Steve,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Liang, Cunming
> > > Sent: Tuesday, December 23, 2014 9:52 AM
> > > To: Stephen Hemminger; Richardson, Bruce
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > Sent: Tuesday, December 23, 2014 2:29 AM
> > > > To: Richardson, Bruce
> > > > Cc: Liang, Cunming; dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-phtread per lcore
> > > >
> > > > On Mon, 22 Dec 2014 09:46:03 +0000
> > > > Bruce Richardson wrote:
> > > >
> > > > > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > > > > ...
> > > > > > > I'm conflicted on this one. However, I think far more applications would be
> > > > > > > broken by having to start using thread_id in place of an lcore_id than would be
> > > > > > > broken by having the lcore_id no longer actually correspond to a core.
> > > > > > > I'm actually struggling to come up with a large number of scenarios where it's
> > > > > > > important to an app to determine the cpu it's running on, compared to the large
> > > > > > > number of cases where you need to have a data-structure per thread. In DPDK
> > > > > > > libs alone, you see the assumption that lcore_id == thread_id a large number
> > > > > > > of times.
> > > > > > >
> > > > > > > Despite the slight logical inconsistency, I think it's better to avoid introducing
> > > > > > > a thread-id and continue having lcore_id represent a unique thread.
> > > > > > >
> > > > > > > /Bruce
> > > > > >
> > > > > > Ok, I understand it.
> > > > > > Here is the implicit meaning if lcore_id represents the unique thread:
> > > > > > 1). When lcore_id is less than RTE_MAX_LCORE, it still represents the logical core id.
> > > > > > 2). When lcore_id is greater than or equal to RTE_MAX_LCORE, it represents a unique id for the thread.
> > > > > > 3). Most of the APIs (except rte_lcore_id()) in rte_lcore.h should be used only in CASE 1).
> > > > > > 4). rte_lcore_id() can be used in CASE 2), but the return value no longer represents a logical core id.
> > > > > >
> > > > > > If most of us feel it's acceptable, I'll prepare the RFC v2 based on this conclusion.
> > > > > >
> > > > > > /Cunming
> > > > >
> > > > > Sorry, I don't like that suggestion either, as having lcore_id values greater
> > > > > than RTE_MAX_LCORE is terrible: how will people know how to dimension arrays
> > > > > to be indexed by lcore id? Given the choice, if we are not going to just use
> > > > > lcore_id as a generic thread id, which is always between 0 and RTE_MAX_LCORE,
> > > > > we can look to define a new thread_id variable to hold that. However, it should
> > > > > have a bounded range.
> > > > > From an ease-of-porting perspective, I still think that the simplest option is to
> > > > > use the existing lcore_id and accept the fact that it's now a thread id rather
> > > > > than an actual physical lcore. The question is, would that cause us lots of issues
> > > > > in the future?
> > > > >
> > > > > /Bruce
> > > >
> > > > The current rte_lcore_id() has a different meaning than the thread. Your proposal will
> > > > break code that uses lcore_id to do per-cpu statistics and the lcore_config
> > > > code in the samples.
> > >
> > > [Liang, Cunming] +1.
> >
> > Few more thoughts on that subject:
> >
> > Actually there is one more place in the lib where lcore_id is used (and it should be
> > unique): rte_spinlock_recursive_lock() / rte_spinlock_recursive_trylock().
> > So if we are going to replace lcore_id with thread_id as a unique thread index, then
> > these functions have to be updated too.
> [Liang, Cunming] You're right, if deciding to use thread_id, we have to check and replace
> rte_lcore_id()/RTE_PER_LCORE(_lcore_id) in all the impacted places.
> Now I'm buying the proposal to keep using rte_lcore_id() to return the
> unique id. Meanwhile I think it's necessary to have the real cpu id.
> It's helpful in NUMA socket checking.
> I will provide a new API rte_curr_cpu() to return the runtime cpu, no matter whether
> the thread is running on a coremasked or non-coremasked cpu.
> So the socket info stored in lcore_config is still useful to choose the local socket.
>
> > About maintaining our own unique thread_id inside shared memory
> > (_get_linear_tid()/_put_linear_tid()):
> > There is one thing that worries me with that approach:
> > In case of abnormal process termination, TIDs used by that process will remain
> > 'reserved', and there is no way to know which TIDs were used by the terminated process.
> > So there could be a situation with the DPDK multi-process model,
> > when after a secondary process terminates abnormally, it wouldn't be possible to
> > restart it - we would just run out of 'free' TIDs.
> [Liang, Cunming] That's a good point I think. I think it's not only for the thread id but
> for all the dynamically allocated resources (e.g. memzone, mempool).
> We don't have a garbage collection or heartbeat to handle the secondary's abnormal exit.

Of course some dynamically allocated memory could remain unreclaimed in that case.
But right now, at least you can restart the child process.
What I am saying is that we should probably avoid managing our own TIDs dynamically at all.

>
> > Which makes me think there is probably no need to introduce a new globally
> > unique 'thread_id'?
> > Might be just lcore_id is enough?
> > As Mirek and Bruce suggested, we can treat it as a sort of 'unique thread id' inside EAL.
> [Liang, Cunming] I think we'd better have two, one for the 'unique thread id', one for
> the real cpu id, no matter which of them is named lcore_id/thread_id/cpu_id etc.

As I understand, the goal is to be able to run multiple EAL threads on multiple physical cpus.
So each thread could run on multiple cpus, i.e. there would be no one-to-one match
between lcore_id (thread_id) and cpu_id.
That's why I think we need to:

Introduce rte_lcore_get_affinity(lcore_id) - that would return the cpuset for a given lcore.

Update rte_lcore_to_socket_id(lcore_id) - it would check whether all cpus that the lcore
is eligible to run on belong to the same socket. If yes, that socket_id will be returned;
if not, SOCKET_ID_ANY.

> For the cpu id, we need to check/get the NUMA info.
> A pthread may migrate from one core to another, so the thread's 'socket id' may change.
> We have the per-cpu socket info in lcore_config.
>
> > Or as a 'virtual' core id that can run on a set of physical cpus, and these subsets for
> > different 'virtual' cores can intersect.
> > Then basically we can keep the legacy behaviour with '-c <coremask>', where each
> > lcore_id matches one to one with a physical cpu, and introduce a new one,
> > something like:
> > --lcores='(<lcore_set>)=(<cpu_set>),...,(<lcore_set>)=(<cpu_set>)'.
> > So let's say: --lcores='(0-7)=(0,2-4),(10)=(7),(8)=(all)' would mean:
> > Create 10 EAL threads, bind the threads with lcore_id=[0-7] to cpuset <0,2,3,4>,
> > the thread with lcore_id=10 is bound to cpu 7, and lcore_id=8 is allowed to run on
> > any cpu in the system.
> > Of course '-c' and '--lcores' would be mutually exclusive, and we would need to
> > update rte_lcore_to_socket_id()
> > and introduce: rte_lcore_(set|get)_affinity().
> >
> > Does it make sense to you?
> [Liang, Cunming] If lcore_id is assigned on the command line, the user has to handle
> the conflict between '-c' and '--lcores'.
> In this case, if lcore_id 0~10 is occupied, does the coremasked thread start from 11?

As I said above: "Of course '-c' and '--lcores' would be mutually exclusive".

> In case the application creates a new pthread during runtime:
> as there's no lcore id belonging to the new thread mentioned on the command line,
> it then still falls back to dynamic allocation.
> I mean, on startup, the user may have no idea how many pthreads they will run.

I think you are mixing 2 different tasks here:
1. Allow EAL threads (lcores) to run on a set of physical cpus (not just one), and these
subsets for different lcores can intersect.
2. Allow dynamically created threads to call EAL functions (rte_mempool, rte_recursive_lock,
rte_timer, etc).

My understanding was that our goal here is task #1.
For #1 - I think what I proposed above is enough.

Though, if our goal is #2 - it is a different story.
In that case, I think we shouldn't manage a unique TID ourselves.
We are not an OS, and at the app level it would be quite complicated to implement
it in a robust way with the current DPDK multi-process model.
Another thing - with the proposed implementation we are still limiting the number of
allowed threads. Instead of RTE_MAX_LCORES we just introduce RTE_MAX_THREADS.
So all the problems with rte_mempool caches and rte_timers will remain.

If we really need #2, then what we probably can do instead:
1. Rely on the OS unique TID (linux gettid()).
2. Assign by default __lcore_id = -1, and set it up to the proper value only for EAL (lcore) threads.
3. Revise all usages of __lcore_id inside the lib and for each case:
  A) either change it to use the system-wide unique TID (rte_recursive_spinlock)
  B) or update the code, so it can handle the situation where __lcore_id == -1

As far as I can see, right now the following code inside the RTE libs uses rte_lcore_id():

1. lib/librte_eal/common/eal_common_log.c
Uses the rte_lcore_id() return value as an index into the static log_cur_msg[].

2. lib/librte_eal/common/include/generic/rte_spinlock.h
Uses the rte_lcore_id() return value as rte_spinlock_recursive.user.
The value -1 (LCORE_ID_ANY) is reserved to mark the lock as unused.

3. lib/librte_mempool/rte_mempool.h
Uses the rte_lcore_id() return value as an index into rte_mempool.local_cache[] and
inside rte_mempool.stats[].

4. lib/librte_timer/rte_timer.c
Uses the rte_lcore_id() return value as an index into the static struct priv_timer priv_timer[].
Also uses it as the 16-bit owner field inside union rte_timer_status.
Again -1 is a reserved value, for RTE_TIMER_NO_OWNER.

5. lib/librte_eal/common/include/rte_lcore.h
Inside rte_socket_id(), uses the rte_lcore_id() return value as an index into lcore_config[].

6. lib/librte_ring/rte_ring.h
Uses the rte_lcore_id() return value as an index into rte_ring.stats[].

Case 2 is A), so I think we can use the gettid() returned value instead of the __lcore_id
value here.

All other cases look like B) to me.
The easiest thing (at least as a first step) is just to add a check that
__lcore_id < MAX_LCORE_ID:
case 3: avoid mempool caching if __lcore_id >= MAX_LCORE_ID.

case 4: Allow timers to be set up only for EAL (lcore) threads (__lcore_id < MAX_LCORE_ID).
E.g. a dynamically created thread will be able to start/stop a timer for an lcore thread,
but it will not be allowed to set up a timer for itself or another non-lcore thread.
rte_timer_manage() for a non-lcore thread would simply do nothing and return straight away.

case 5: just return SOCKET_ID_ANY if __lcore_id >= MAX_LCORE_ID.

case 6: avoid the stats[] update if __lcore_id >= MAX_LCORE_ID.

That way the user can create as many threads as he wants dynamically and should still
be able to use EAL functions inside them.
Of course for that, the problem that Olivier mentioned with thread pre-emption in the
middle of a ring enqueue/dequeue
(http://dpdk.org/ml/archives/dev/2014-December/010342.html)
needs to be fixed somehow. Otherwise performance might be really poor.

Though I suppose that needs to be done for task #1 anyway.

Konstantin

>
> 'rte_pthread_assign_lcore' does the same things as 'rte_lcore_(set|get)_affinity()'.
> If we keep using lcore_id, I like the name you proposed.
>
> I'll send my code update next Monday.
>
> > BTW, one more thing: while we are at it - it is probably a good time to do
> > something with our interrupt thread?
> > It is a bit strange that we can't use rte_pktmbuf_free() or
> > rte_spinlock_recursive_lock() from our own interrupt/alarm handlers.
> >
> > Konstantin