From: "Ananyev, Konstantin" <konstantin.ananyev@intel.com>
To: "Liang, Cunming" <cunming.liang@intel.com>, Stephen Hemminger
 <stephen@networkplumber.org>, "Richardson, Bruce"
 <bruce.richardson@intel.com>
Date: Fri, 9 Jan 2015 11:52:53 +0000
Cc: "dev@dpdk.org" <dev@dpdk.org>
Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore



> -----Original Message-----
> From: Liang, Cunming
> Sent: Friday, January 09, 2015 9:41 AM
> To: Ananyev, Konstantin; Stephen Hemminger; Richardson, Bruce
> Cc: dev@dpdk.org
> Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore
>
>
>
> > -----Original Message-----
> > From: Ananyev, Konstantin
> > Sent: Friday, January 09, 2015 1:06 AM
> > To: Liang, Cunming; Stephen Hemminger; Richardson, Bruce
> > Cc: dev@dpdk.org
> > Subject: RE: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore
> >
> >
> > Hi Steve,
> >
> > > -----Original Message-----
> > > From: dev [mailto:dev-bounces@dpdk.org] On Behalf Of Liang, Cunming
> > > Sent: Tuesday, December 23, 2014 9:52 AM
> > > To: Stephen Hemminger; Richardson, Bruce
> > > Cc: dev@dpdk.org
> > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > Sent: Tuesday, December 23, 2014 2:29 AM
> > > > To: Richardson, Bruce
> > > > Cc: Liang, Cunming; dev@dpdk.org
> > > > Subject: Re: [dpdk-dev] [RFC PATCH 0/7] support multi-pthread per lcore
> > > >
> > > > On Mon, 22 Dec 2014 09:46:03 +0000
> > > > Bruce Richardson <bruce.richardson@intel.com> wrote:
> > > >
> > > > > On Mon, Dec 22, 2014 at 01:51:27AM +0000, Liang, Cunming wrote:
> > > > > > ...
> > > > > > > I'm conflicted on this one. However, I think far more
> > > > > > > applications would be broken by having to start using thread_id
> > > > > > > in place of an lcore_id than would be broken by having the
> > > > > > > lcore_id no longer actually correspond to a core.
> > > > > > > I'm actually struggling to come up with a large number of
> > > > > > > scenarios where it's important for an app to determine the cpu
> > > > > > > it's running on, compared to the large number of cases where
> > > > > > > you need to have a data-structure per thread. In the DPDK libs
> > > > > > > alone, you see this assumption that lcore_id == thread_id a
> > > > > > > large number of times.
> > > > > > >
> > > > > > > Despite the slight logical inconsistency, I think it's better
> > > > > > > to avoid introducing a thread-id and to continue having
> > > > > > > lcore_id represent a unique thread.
> > > > > > >
> > > > > > > /Bruce
> > > > > >
> > > > > > Ok, I understand it.
> > > > > > I'll list the implicit meanings of using lcore_id to represent
> > > > > > the unique thread:
> > > > > > 1) When lcore_id is less than RTE_MAX_LCORE, it still represents
> > > > > > the logical core id.
> > > > > > 2) When lcore_id is greater than or equal to RTE_MAX_LCORE, it
> > > > > > represents a unique id for a thread.
> > > > > > 3) Most APIs (except rte_lcore_id()) in rte_lcore.h are meant to
> > > > > > be used only in CASE 1).
> > > > > > 4) rte_lcore_id() can be used in CASE 2), but the return value
> > > > > > no longer represents a logical core id.
> > > > > >
> > > > > > If most of us feel it's acceptable, I'll prepare the RFC v2
> > > > > > based on this conclusion.
> > > > > >
> > > > > > /Cunming
> > > > >
> > > > > Sorry, I don't like that suggestion either, as having lcore_id
> > > > > values greater than RTE_MAX_LCORE is terrible - how will people
> > > > > know how to dimension arrays to be indexed by lcore id? Given the
> > > > > choice, if we are not going to just use lcore_id as a generic
> > > > > thread id, which is always between 0 and RTE_MAX_LCORE, we can
> > > > > look to define a new thread_id variable to hold that. However, it
> > > > > should have a bounded range.
> > > > > From an ease-of-porting perspective, I still think that the
> > > > > simplest option is to use the existing lcore_id and accept the
> > > > > fact that it's now a thread id rather than an actual physical
> > > > > lcore. The question is, would that cause us lots of issues in the
> > > > > future?
> > > > >
> > > > > /Bruce
> > > >
> > > > The current rte_lcore_id() has a different meaning than a thread
> > > > id. Your proposal will break code that uses lcore_id to do per-cpu
> > > > statistics, and the lcore_config code in the samples.
> > > [Liang, Cunming] +1.
> >
> > A few more thoughts on that subject:
> >
> > Actually, there is one more place in the lib where lcore_id is used
> > (and it should be unique):
> > rte_spinlock_recursive_lock() / rte_spinlock_recursive_trylock().
> > So if we are going to replace lcore_id with thread_id as the unique
> > thread index, then these functions have to be updated too.
> [Liang, Cunming] You're right - if we decide to use thread_id, we have
> to check and replace rte_lcore_id()/RTE_PER_LCORE(_lcore_id) in all the
> impacted places.
> Now I'm buying the proposal to keep using rte_lcore_id() to return the
> unique id. Meanwhile, I think it's necessary to also have the real cpu
> id; it's helpful for NUMA socket checking.
> I will provide a new API, rte_curr_cpu(), to return the cpu the thread
> is currently running on, no matter whether the thread runs on a
> coremasked or non-coremasked cpu.
> That way the per-cpu socket info stored in lcore_config is still useful
> for choosing the local socket.
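
For the 'real cpu id' part, Linux already exposes this directly; here is
a possible shape for the proposed rte_curr_cpu() (the name is from
Cunming's mail, the implementation below is my sketch):

#define _GNU_SOURCE
#include <sched.h>

/* Ask the kernel which cpu the calling thread is on right now. Note the
 * answer can be stale as soon as the call returns if the thread is not
 * pinned to a single cpu. */
static inline int
rte_curr_cpu(void)
{
    return sched_getcpu();
}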
> >
> > About maintaining our own unique thread_id inside shared memory
> > (_get_linear_tid()/_put_linear_tid()):
> > there is one thing that worries me with that approach.
> > In case of abnormal process termination, the TIDs used by that process
> > will remain 'reserved', and there is no way to know which TIDs were
> > used by the terminated process.
> > So with the DPDK multi-process model there could be a situation where,
> > after a secondary process terminates abnormally, it wouldn't be
> > possible to restart it - we just run out of 'free' TIDs.
> [Liang, Cunming] That's a good point, I think. It applies not only to
> the thread id but to all dynamically allocated resources (e.g. memzone,
> mempool); we don't have a garbage collection or heartbeat mechanism to
> handle an abnormal exit of a secondary process.

Of course some dynamically allocated memory could remain unreclaimed in
that case. But right now, at least, you can restart the child process.
What I am saying is that we are probably better off not managing our own
TIDs dynamically at all.
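
To make the failure mode concrete, here is a minimal sketch of a linear
TID allocator of the kind _get_linear_tid()/_put_linear_tid() suggest
(the flag-array layout is my assumption, not the RFC's actual code):

#include <stdint.h>

#define RTE_MAX_THREADS 256

/* Stand-in for an array placed in shared memory (e.g. a rte_memzone)
 * visible to the primary and all secondary processes. */
static volatile uint8_t tid_in_use[RTE_MAX_THREADS];

static int
_get_linear_tid(void)
{
    int i;

    for (i = 0; i < RTE_MAX_THREADS; i++) {
        /* Atomically claim the first free slot. */
        if (__sync_bool_compare_and_swap(&tid_in_use[i], 0, 1))
            return i;
    }
    return -1; /* out of 'free' TIDs */
}

static void
_put_linear_tid(int tid)
{
    /* Only runs on a clean shutdown. Nothing records *which* process
     * owns a slot, so a crashed secondary never releases its TIDs and
     * they stay 'reserved' forever. */
    tid_in_use[tid] = 0;
}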

>
> >
> > Which makes me think there is probably no need to introduce a new
> > globally unique 'thread_id'?
> > Maybe lcore_id alone is enough?
> > As Mirek and Bruce suggested, we can treat it as a sort of 'unique
> > thread id' inside EAL.
> [Liang, Cunming] I think we'd better have two: one for the 'unique
> thread id' and one for the real cpu id, no matter which of them ends up
> named lcore_id/thread_id/cpu_id, etc.

As I understand it, the goal is to be able to run multiple EAL threads on
multiple physical cpus. So each thread could run on multiple cpus, i.e.
there would be no one-to-one match between lcore_id (thread_id) and
cpu_id.
That's why I think we need to:
Introduce rte_lcore_get_affinity(lcore_id) - it would return the cpuset
for the given lcore.
Update rte_lcore_to_socket_id(lcore_id) - it would check whether all cpus
that the lcore is eligible to run on belong to the same socket.
If yes, that socket_id is returned; if not, SOCKET_ID_ANY.
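
For illustration, a sketch of that rte_lcore_to_socket_id() behaviour
(lcore_cpuset[] and cpu_socket_id() are hypothetical stand-ins for the
real per-lcore config):

#define _GNU_SOURCE
#include <sched.h>

#define SOCKET_ID_ANY (-1)

extern cpu_set_t lcore_cpuset[];        /* hypothetical per-lcore cpuset */
extern int cpu_socket_id(unsigned cpu); /* hypothetical NUMA lookup */

int
lcore_to_socket_id(unsigned lcore_id)
{
    cpu_set_t *set = &lcore_cpuset[lcore_id];
    int socket = SOCKET_ID_ANY;
    unsigned cpu;

    for (cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (!CPU_ISSET(cpu, set))
            continue;
        if (socket == SOCKET_ID_ANY)
            socket = cpu_socket_id(cpu); /* first cpu in the set */
        else if (socket != cpu_socket_id(cpu))
            return SOCKET_ID_ANY;        /* cpuset spans sockets */
    }
    return socket;
}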

> For the cpu id, we need it to check/get the NUMA info.
> A pthread may migrate from one core to another, so the thread's 'socket
> id' may change; the per-cpu socket info we already have in lcore_config.
>
> > Or treat it as a 'virtual' core id that can run on a set of physical
> > cpus, where these subsets for different 'virtual' cores can intersect.
> > Then basically we can keep the legacy behaviour with '-c <lcores_mask>',
> > where each lcore_id matches one-to-one with a physical cpu, and
> > introduce a new option, something like:
> > --lcores='(<lcore_set1>)=(<phys_cpu_set1>),..(<lcore_setN>)=(<phys_cpu_setN>)'.
> > So, let's say, --lcores='(0-7)=(0,2-4),(10)=(7),(8)=(all)' would mean:
> > create 10 EAL threads, bind the threads with lcore_id=[0-7] to cpuset
> > <0,2,3,4>, bind the thread with lcore_id=10 to cpu 7, and allow
> > lcore_id=8 to run on any cpu in the system.
> > Of course '-c' and '--lcores' would be mutually exclusive, and we
> > would need to update rte_lcore_to_socket_id() and introduce
> > rte_lcore_(set|get)_affinity().
> >
> > Does it make sense to you?
> [Liang, Cunming] If lcore_id is assigned on the command line, the user
> has to handle the conflict between '-c' and '--lcores'.
> In that case, if lcore_id 0~10 is occupied, do the coremasked threads
> start from 11?

As I said above: "Of course '-c' and '--lcores' would be mutually
exclusive".

> There is also the case where the application creates a new pthread at
> runtime. As no lcore id for that new thread was mentioned on the
> command line, it still falls back to dynamic allocation.
> I mean that at startup the user may have no idea how many pthreads they
> will run.

I think you are mixing 2 different tasks here:
1. Allow EAL threads (lcores) to run on a set of physical cpus (not just
one), where these subsets for different lcores may intersect.
2. Allow dynamically created threads to call EAL functions (rte_mempool,
rte_spinlock_recursive, rte_timer, etc).

My understanding was that our goal here is task #1.
For #1 - I think what I proposed above is enough.

Though, if our goal is #2, it is a different story.
In that case, I think we shouldn't manage unique TIDs ourselves.
We are not an OS, and at the app level it would be quite complicated to
implement that in a robust way with the current DPDK multi-process model.
Another thing: with the proposed implementation we are still limiting the
number of allowed threads - instead of RTE_MAX_LCORE we just introduce
RTE_MAX_THREADS.
So all the problems with rte_mempool caches and rte_timers will remain.

If we really need #2, then here is what we probably can do instead:

1. Rely on the OS-unique TID (Linux gettid()).
2. Assign __lcore_id = -1 by default, and set it to the proper value only
for EAL (lcore) threads.
3. Revise all usages of __lcore_id inside the lib and, for each case,
  A) either change it to use the system-wide unique TID
  (rte_spinlock_recursive),
  B) or update the code so it can handle the situation where
  __lcore_id == -1.
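
A minimal sketch of points 1. and 2. above, assuming Linux (glibc of this
era has no gettid() wrapper, so it goes through syscall()):

#define _GNU_SOURCE
#include <unistd.h>
#include <sys/syscall.h>

/* 1. System-wide unique thread id, managed by the kernel - nothing for
 * DPDK to allocate, and nothing to leak on abnormal process termination. */
static inline pid_t
rte_sys_gettid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* 2. -1 by default; EAL would set the real value only in the threads it
 * creates (mirroring RTE_PER_LCORE(_lcore_id)), so a dynamically created
 * thread is always recognisable by __lcore_id == (unsigned)-1. */
static __thread unsigned __lcore_id = (unsigned)-1;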

As far as I can see, right now the following code inside the RTE libs
uses rte_lcore_id():

1.	lib/librte_eal/common/eal_common_log.c
Uses the rte_lcore_id() return value as an index into the static
log_cur_msg[].

2.	lib/librte_eal/common/include/generic/rte_spinlock.h
Uses the rte_lcore_id() return value as rte_spinlock_recursive.user.
The value -1 (LCORE_ID_ANY) is reserved to mark the lock as unused.

3.	lib/librte_mempool/rte_mempool.h
Uses the rte_lcore_id() return value as an index into
rte_mempool.local_cache[] and inside rte_mempool.stats[].

4.	lib/librte_timer/rte_timer.c
Uses the rte_lcore_id() return value as an index into the static struct
priv_timer priv_timer[].
Also uses it as the 16-bit owner field inside union rte_timer_status.
Again, -1 is the reserved value (RTE_TIMER_NO_OWNER).

5.	lib/librte_eal/common/include/rte_lcore.h
Inside rte_socket_id(), uses the rte_lcore_id() return value as an index
into lcore_config[].

6.	lib/librte_ring/rte_ring.h
Uses the rte_lcore_id() return value as an index into rte_ring.stats[].

Case 2 is A), so I think we can use the gettid() return value instead of
the __lcore_id value here.
All the other cases look like B) to me.
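
A sketch of case 2 handled as A), i.e. the recursive lock storing the
system-wide TID instead of __lcore_id (types simplified; field names
follow rte_spinlock.h):

#include <sys/types.h>

typedef struct { volatile int locked; } rte_spinlock_t; /* simplified */
extern void rte_spinlock_lock(rte_spinlock_t *sl);
extern pid_t rte_sys_gettid(void); /* from the sketch above */

typedef struct {
    rte_spinlock_t sl;
    volatile int user;  /* TID of the owner, -1 when the lock is unused */
    volatile int count; /* recursion depth */
} rte_spinlock_recursive_t;

static inline void
rte_spinlock_recursive_lock(rte_spinlock_recursive_t *slr)
{
    int id = (int)rte_sys_gettid(); /* was: rte_lcore_id() */

    if (slr->user != id) {
        rte_spinlock_lock(&slr->sl);
        slr->user = id;
    }
    slr->count++;
}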

The easiest thing (at least as a first step) is just to add a check that
__lcore_id < RTE_MAX_LCORE:
case 3: skip mempool caching if __lcore_id >= RTE_MAX_LCORE.
case 4: allow timers to be set up only for EAL (lcore) threads
(__lcore_id < RTE_MAX_LCORE).
E.g. a dynamically created thread would be able to start/stop a timer for
an lcore thread, but would not be allowed to set up a timer for itself or
for another non-lcore thread.
rte_timer_manage() for a non-lcore thread would simply do nothing and
return straight away.
case 5: just return SOCKET_ID_ANY if __lcore_id >= RTE_MAX_LCORE.
case 6: skip the stats[] update if __lcore_id >= RTE_MAX_LCORE.
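
A minimal sketch of what the case 3 and case 5 guards could look like
(struct and field names are simplified stand-ins for the real
rte_mempool / lcore_config definitions):

#define RTE_MAX_LCORE 128
#define SOCKET_ID_ANY (-1)

struct rte_mempool_cache { unsigned len; }; /* simplified */
struct rte_mempool {
    struct rte_mempool_cache local_cache[RTE_MAX_LCORE];
};
extern struct { unsigned socket_id; } lcore_config[RTE_MAX_LCORE];

/* case 3: a non-lcore thread gets no per-lcore cache and falls back to
 * the shared ring. */
static inline struct rte_mempool_cache *
mempool_cache_of(struct rte_mempool *mp, unsigned lcore_id)
{
    if (lcore_id >= RTE_MAX_LCORE)
        return NULL;
    return &mp->local_cache[lcore_id];
}

/* case 5: rte_socket_id() for a non-lcore thread. */
static inline int
socket_id_checked(unsigned lcore_id)
{
    if (lcore_id >= RTE_MAX_LCORE)
        return SOCKET_ID_ANY;
    return (int)lcore_config[lcore_id].socket_id;
}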

That way the user can create as many threads as they want dynamically and
should still be able to use EAL functions inside them.
Of course, for that, the problem Olivier mentioned with thread
pre-emption in the middle of a ring enqueue/dequeue
(http://dpdk.org/ml/archives/dev/2014-December/010342.html)
needs to be fixed somehow.
Otherwise performance might be really poor.
Though I suppose that needs to be done for task #1 anyway.

Konstantin

>
> 'rte_pthread_assign_lcore' does the same thing as
> 'rte_lcore_(set|get)_affinity()'.
> If we keep using lcore_id, I like the name you proposed.
>
> I'll send my code update next Monday.
>
> >
> > BTW, one more thing while we are on it: it is probably a good time to
> > do something with our interrupt thread?
> > It is a bit strange that we can't use rte_pktmbuf_free() or
> > rte_spinlock_recursive_lock() from our own interrupt/alarm handlers.
> >
> > Konstantin