From: "Ananyev, Konstantin"
To: Honnappa Nagarahalli, dev@dpdk.org
CC: nd, Dharmik Thakkar, Malvika Gupta, "Gavin Hu (Arm Technology China)", nd
Subject: Re: [dpdk-dev] [RFC 2/3] tqs: add thread quiescent state library
Date: Wed, 12 Dec 2018 09:29:25 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258010D8B8CEE@IRSMSX106.ger.corp.intel.com>
References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20181122033055.3431-3-honnappa.nagarahalli@arm.com> <2601191342CEEE43887BDE71AB977258010CEBBBA9@IRSMSX106.ger.corp.intel.com> <2601191342CEEE43887BDE71AB977258010CEBD5D4@IRSMSX106.ger.corp.intel.com>
List-Id: DPDK patches and discussions
> > > > > +
> > > > > +/* Add a reader thread, running on an lcore, to the list of threads
> > > > > + * reporting their quiescent state on a TQS variable.
> > > > > + */
> > > > > +int __rte_experimental
> > > > > +rte_tqs_register_lcore(struct rte_tqs *v, unsigned int lcore_id)
> > > > > +{
> > > > > +	TQS_RETURN_IF_TRUE((v == NULL || lcore_id >= RTE_TQS_MAX_LCORE),
> > > > > +			-EINVAL);
> > > >
> > > > It is not very good practice to make a function return different
> > > > values and behave in a different way in debug/non-debug mode.
> > > > I'd say that for slow-path (functions in .c) it is always good to
> > > > check input parameters.
> > > > For fast-path (functions in .h) we sometimes skip such checking, but
> > > > debug mode can probably use RTE_ASSERT() or so.
> > > Makes sense, I will change this in the next version.
> > > >
> > > > lcore_id >= RTE_TQS_MAX_LCORE
> > > >
> > > > Is this limitation really necessary?
> > > I added this limitation because currently a DPDK application cannot
> > > take a mask more than 64 bits wide. Otherwise, this should be as big
> > > as RTE_MAX_LCORE.
> > > I see that in the case of the '-lcores' option, the number of lcores
> > > can be more than the number of PEs. In this case, we still need a MAX
> > > limit (but it can be bigger than 64).
> > >
> > > > First, it means that only lcores can use that API (at least the
> > > > data-path part); second, even today many machines have more than 64
> > > > cores.
> > > > I think you can easily avoid such a limitation if, instead of
> > > > requiring lcore_id as an input parameter, you just make it return
> > > > the index of the next available entry in w[].
> > > > Then tqs_update() can take that index as an input parameter.
> > > I had thought about a similar approach based on IDs. I was concerned
> > > that an ID would be one more thing for the application to manage. But
> > > I see the limitations of the current approach now. I will change it to
> > > be allocation based. This will support even non-EAL pthreads as well.
> >
> > Yes, with such an approach non-lcore threads will be able to use it also.
> >
> I realized that rte_tqs_register_lcore/rte_tqs_unregister_lcore need to be
> efficient, as they can be called from the worker's packet processing loop
> (rte_event_dequeue_burst allows blocking, so the worker thread needs to
> call rte_tqs_unregister_lcore before calling rte_event_dequeue_burst and
> rte_tqs_register_lcore before starting packet processing). Allocating the
> thread ID in these functions will make them more complex.
>
> I suggest that we change the argument 'lcore_id' to 'thread_id'. The
> application could use 'lcore_id' as 'thread_id' if threads are mapped to
> physical cores 1:1.
>
> If the threads are not mapped 1:1 to physical cores, the threads need to
> use a thread_id in the range of 0 - RTE_TQS_MAX_THREADS. I do not see that
> DPDK has a thread_id concept. For TQS, the thread IDs are global (i.e. not
> per TQS variable). I could provide APIs to do the thread ID allocation,
> but I think the thread ID allocation should not be part of this library.
> Such a thread ID might be useful for other libraries.

I don't think there is any point in introducing a new thread_id concept
just for that library.
After all, we already have the concept of lcore_id, which pretty much
serves the same purpose.
I still think that we need to either:
a) make register/unregister work with any valid lcore_id (<= RTE_MAX_LCORE)
b) make register/unregister return an index in w[]
For a) we will need a mask bigger than 64 bits.
b) would allow non-lcore threads to use the data-path API too; plus w[]
would occupy less space, and check() might be faster.
Though yes, as a drawback, for b) register/unregister would probably need
an extra 'while(CAS(...));' loop.
I suppose the question here is: do you foresee a lot of concurrent
register/unregister on the data-path?

>
> >
> > > > > +
> > > > > +	while (lcore_mask) {
> > > > > +		l = __builtin_ctz(lcore_mask);
> > > > > +		if (v->w[l].cnt != t)
> > > > > +			break;
> > > >
> > > > As I understand, that makes the control-path function's progress
> > > > dependent on simultaneous invocation of the data-path functions.
> > > I agree that the control-path function's progress (for ex: how long to
> > > wait before freeing the memory) depends on invocation of the data-path
> > > functions. The separation of 'start' and 'check', and the option not
> > > to block in 'check', provide the flexibility for the control-path to
> > > do some other work if it chooses to.
> > >
> > > > In some cases that might cause the control-path to hang.
> > > > Let's say the data-path function isn't called, or the user invokes
> > > > control-path and data-path functions from the same thread.
> > > I agree with the case of the data-path function not getting called. I
> > > would consider that a programming error. I can document that warning
> > > in the rte_tqs_check API.
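To make option b) concrete, here is a minimal standalone sketch of an allocation-based register/unregister using a 64-bit bitmap and the 'while(CAS(...));' loop mentioned above. This is not the RFC code: the names (tqs_sketch, MAX_READERS, tqs_register, tqs_unregister) are invented for illustration, and a real version would need per-variable sizing and memory-ordering choices.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative sketch only: register() hands out the index of a free
 * slot in w[] instead of taking an lcore_id, so non-lcore threads can
 * use it too. */
#define MAX_READERS 64

struct tqs_sketch {
	_Atomic uint64_t reg_mask;       /* bit i set => slot i in use */
	_Atomic uint64_t w[MAX_READERS]; /* per-reader counters (unused here) */
};

/* Claim a free slot with a CAS loop; returns the slot index, or -1 if
 * all MAX_READERS slots are taken. */
static int
tqs_register(struct tqs_sketch *v)
{
	uint64_t mask = atomic_load(&v->reg_mask);

	while (mask != UINT64_MAX) {
		int idx = __builtin_ctzll(~mask); /* lowest clear bit */
		if (atomic_compare_exchange_strong(&v->reg_mask, &mask,
						   mask | (1ULL << idx)))
			return idx;
		/* CAS failed: 'mask' was reloaded, retry with new value */
	}
	return -1;
}

/* Release a previously claimed slot. */
static void
tqs_unregister(struct tqs_sketch *v, int idx)
{
	atomic_fetch_and(&v->reg_mask, ~(1ULL << idx));
}
```

The CAS retry loop is exactly the drawback raised above: it only costs extra work when registrations race, so the question of how much concurrent register/unregister happens on the data-path decides whether it matters.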
> >
> > Sure, it can be documented.
> > Though that means that each data-path thread would have to do an
> > explicit update() call for every tqs it might use.
> > I just think that it would complicate things and might limit usage of
> > the library quite significantly.
> Each data-path thread has to report its quiescent state. Hence, each
> data-path thread has to call update() (similar to how rte_timer_manage()
> has to be called periodically on the worker thread).

I understand that.
Though that means that each data-path thread has to know explicitly which
rcu vars it accesses.
It would be hard to adopt such an API with rcu vars used inside some
library.
But OK, as I understand people do use the QSBR approach in their apps and
find it useful.

> Do you have any particular use case in mind where this fails?

Let's say it means that the library can't be used to add/del RX/TX ethdev
callbacks in a safe manner.

BTW, two side questions:
1) As I understand, what you propose is very similar to the QSBR main
concept.
Wouldn't it be better to name it accordingly, to avoid confusion (or at
least document it somewhere)?
I think someone else already raised that question.
2) Would QSBR be the only technique in that lib?
Any plans to add something similar to the GP one too (with MBs at the
reader side)?

>
> > > >
> > > In the case of the same thread calling both control-path and data-path
> > > functions, it would depend on the sequence of the calls. The following
> > > sequence should not cause any hangs:
> > > Worker thread
> > > 1) 'deletes' an entry from a lock-free data structure
> > > 2) rte_tqs_start
> > > 3) rte_tqs_update
> > > 4) rte_tqs_check (wait == 1 or wait == 0)
> > > 5) 'free' the entry deleted in 1)
> >
> > That's an interesting idea, and it should help, I think.
> > Probably worth having the {2,3,4} sequence as a new high-level function.
> >
> Yes, this is a good idea. Such a function would be applicable only in the
> worker thread.
> I would prefer to leave it to the application to take care.

Yes, it would be applicable only to the worker thread, but why can't we
have a function for it?
Let's say it could be 2 different functions: one doing {2,3,4} - for worker
threads, and a second doing just {2,4} - for control threads.
Or it could be just one function that takes an extra parameter: lcore_id/
w[] index. If it is some predefined invalid value (-1 or so), step #3 will
be skipped.

Konstantin
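[For illustration, the single-function variant discussed above - one helper where a negative index skips step #3 - can be sketched as a toy, single-threaded model. Everything here (qsbr_sketch, qsbr_check, qsbr_start_check) is invented for the example; the real thing would use the rte_tqs API and atomics.]

```c
#include <stdint.h>

/* Toy single-threaded model of the {start, update, check} sequence.
 * Pass the caller's own reader index to get the worker-thread {2,3,4}
 * sequence, or -1 to skip the self-update (control-thread {2,4}). */
#define MAX_READERS 4

struct qsbr_sketch {
	uint64_t token;          /* global grace-period counter */
	uint64_t w[MAX_READERS]; /* last token each reader reported */
	uint64_t reg_mask;       /* which reader slots are active */
};

/* Non-blocking check: have all registered readers reported 'token'? */
static int
qsbr_check(struct qsbr_sketch *v, uint64_t token)
{
	uint64_t mask = v->reg_mask;

	while (mask) {
		int i = __builtin_ctzll(mask);
		if (v->w[i] < token)
			return 0;   /* reader i not quiescent yet */
		mask &= mask - 1;   /* clear lowest set bit */
	}
	return 1;
}

/* Combined start(+update)+check; 'self' < 0 skips the self-report. */
static int
qsbr_start_check(struct qsbr_sketch *v, int self)
{
	uint64_t token = ++v->token;  /* start: open a new grace period */

	if (self >= 0)
		v->w[self] = token;   /* update: report own quiescent state */
	return qsbr_check(v, token);  /* check (wait == 0) */
}
```

With only the calling worker registered, qsbr_start_check(v, self) succeeds immediately; called with -1 from a control thread, it returns 0 until the reader catches up - matching the two use cases above.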