From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ananyev, Konstantin"
To: 'Honnappa Nagarahalli' , "dev@dpdk.org"
CC: "nd@arm.com" , "dharmik.thakkar@arm.com" , "malvika.gupta@arm.com" , "gavin.hu@arm.com"
Thread-Topic: [dpdk-dev] [RFC 2/3] tqs: add thread quiescent state library
Date: Sat, 24 Nov 2018 12:18:00 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258010CEBBBA9@IRSMSX106.ger.corp.intel.com>
References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20181122033055.3431-3-honnappa.nagarahalli@arm.com>
In-Reply-To: <20181122033055.3431-3-honnappa.nagarahalli@arm.com>
Subject: Re: [dpdk-dev] [RFC 2/3] tqs: add thread quiescent state library

Hi Honnappa,

> +
> +/* Allocate a new TQS variable with the name *name* in memory.
> + */
> +struct rte_tqs * __rte_experimental
> +rte_tqs_alloc(const char *name, int socket_id, uint64_t lcore_mask)
> +{
> +	char tqs_name[RTE_TQS_NAMESIZE];
> +	struct rte_tailq_entry *te, *tmp_te;
> +	struct rte_tqs_list *tqs_list;
> +	struct rte_tqs *v, *tmp_v;
> +	int ret;
> +
> +	if (name == NULL) {
> +		RTE_LOG(ERR, TQS, "Invalid input parameters\n");
> +		rte_errno = -EINVAL;
> +		return NULL;
> +	}
> +
> +	te = rte_zmalloc("TQS_TAILQ_ENTRY", sizeof(*te), 0);
> +	if (te == NULL) {
> +		RTE_LOG(ERR, TQS, "Cannot reserve memory for tailq\n");
> +		rte_errno = -ENOMEM;
> +		return NULL;
> +	}
> +
> +	snprintf(tqs_name, sizeof(tqs_name), "%s", name);
> +	v = rte_zmalloc_socket(tqs_name, sizeof(struct rte_tqs),
> +				RTE_CACHE_LINE_SIZE, socket_id);
> +	if (v == NULL) {
> +		RTE_LOG(ERR, TQS, "Cannot reserve memory for TQS variable\n");
> +		rte_errno = -ENOMEM;
> +		goto alloc_error;
> +	}
> +
> +	ret = snprintf(v->name, sizeof(v->name), "%s", name);
> +	if (ret < 0 || ret >= (int)sizeof(v->name)) {
> +		rte_errno = -ENAMETOOLONG;
> +		goto alloc_error;
> +	}
> +
> +	te->data = (void *) v;
> +	v->lcore_mask = lcore_mask;
> +
> +	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> +
> +	tqs_list = RTE_TAILQ_CAST(rte_tqs_tailq.head, rte_tqs_list);
> +
> +	/* Search if a TQS variable with the same name exists already */
> +	TAILQ_FOREACH(tmp_te, tqs_list, next) {
> +		tmp_v = (struct rte_tqs *) tmp_te->data;
> +		if (strncmp(name, tmp_v->name, RTE_TQS_NAMESIZE) == 0)
> +			break;
> +	}
> +
> +	if (tmp_te != NULL) {
> +		rte_errno = -EEXIST;
> +		goto tqs_exist;
> +	}
> +
> +	TAILQ_INSERT_TAIL(tqs_list, te, next);
> +
> +	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> +
> +	return v;
> +
> +tqs_exist:
> +	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> +
> +alloc_error:
> +	rte_free(te);
> +	rte_free(v);
> +	return NULL;
> +}

That seems like quite a heavy-weight function just to allocate a sync variable.
As the size of struct rte_tqs is constant and known to the user, it might be better
to just provide rte_tqs_init(struct rte_tqs *tqs, ...) and let the user allocate/free
the memory for it themselves.

> +
> +/* Add a reader thread, running on an lcore, to the list of threads
> + * reporting their quiescent state on a TQS variable.
> + */
> +int __rte_experimental
> +rte_tqs_register_lcore(struct rte_tqs *v, unsigned int lcore_id)
> +{
> +	TQS_RETURN_IF_TRUE((v == NULL || lcore_id >= RTE_TQS_MAX_LCORE),
> +				-EINVAL);

It is not very good practice to make a function return different values and behave
in a different way in debug and non-debug mode.
I'd say that for slow-path (functions in .c) it is always good to check input parameters.
For fast-path (functions in .h) we sometimes skip such checking,
but debug mode can probably use RTE_ASSERT() or so.

lcore_id >= RTE_TQS_MAX_LCORE

Is this limitation really necessary?
First, it means that only lcores can use that API (at least the data-path part);
second, even today many machines have more than 64 cores.
I think you can easily avoid such a limitation if, instead of requiring lcore_id
as an input parameter, you just make it return the index of the next available entry in w[].
Then tqs_update() can take that index as an input parameter.

> +
> +	/* Worker thread has to count the quiescent states
> +	 * only from the current value of token.
> +	 */
> +	v->w[lcore_id].cnt = v->token;

Wonder what would happen if a new reader calls register() after the writer
calls start()? Looks like a race condition.
Or is such a pattern simply not supported?

> +
> +	/* Release the store to initial TQS count so that workers
> +	 * can use it immediately after this function returns.
> +	 */
> +	__atomic_fetch_or(&v->lcore_mask, (1UL << lcore_id), __ATOMIC_RELEASE);
> +
> +	return 0;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Trigger the worker threads to report the quiescent state
> + * status.
> + *
> + * This is implemented as a lock-free function. It is multi-thread
> + * safe and can be called from the worker threads as well.
> + *
> + * @param v
> + *   TQS variable
> + * @param n
> + *   Expected number of times the quiescent state is entered
> + * @param t
> + *   - If successful, this is the token for this call of the API.
> + *     This should be passed to rte_tqs_check API.
> + * @return
> + *   - -EINVAL if the parameters are invalid (debug mode compilation only).
> + *   - 0 Otherwise and always (non-debug mode compilation).
> + */
> +static __rte_always_inline int __rte_experimental
> +rte_tqs_start(struct rte_tqs *v, unsigned int n, uint32_t *t)
> +{
> +	TQS_RETURN_IF_TRUE((v == NULL || t == NULL), -EINVAL);
> +
> +	/* This store release will ensure that changes to any data
> +	 * structure are visible to the workers before the token
> +	 * update is visible.
> +	 */
> +	*t = __atomic_add_fetch(&v->token, n, __ATOMIC_RELEASE);
> +
> +	return 0;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Update quiescent state for the worker thread on a lcore.
> + *
> + * This is implemented as a lock-free function. It is multi-thread safe.
> + * All the worker threads registered to report their quiescent state
> + * on the TQS variable must call this API.
> + *
> + * @param v
> + *   TQS variable
> + */
> +static __rte_always_inline void __rte_experimental
> +rte_tqs_update(struct rte_tqs *v, unsigned int lcore_id)
> +{
> +	uint32_t t;
> +
> +	TQS_ERR_LOG_IF_TRUE(v == NULL || lcore_id >= RTE_TQS_MAX_LCORE);
> +
> +	/* Load the token before the worker thread loads any other
> +	 * (lock-free) data structure. This ensures that updates
> +	 * to the data structures are visible if the update
> +	 * to token is visible.
> +	 */
> +	t = __atomic_load_n(&v->token, __ATOMIC_ACQUIRE);

Hmm, I am not very familiar with the C11 model, but it looks like a race
condition to me: as I understand, update() is supposed to be called at the
end of the reader's critical section, correct?
But ACQUIRE is only a hoist barrier, which means the compiler and cpu are
free to move earlier reads (and writes) after it.
It probably needs to be a full ACQ_REL here.

> +	if (v->w[lcore_id].cnt != t)
> +		v->w[lcore_id].cnt++;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Checks if all the worker threads have entered the quiescent state
> + * 'n' number of times. 'n' is provided in rte_tqs_start API.
> + *
> + * This is implemented as a lock-free function. It is multi-thread
> + * safe and can be called from the worker threads as well.
> + *
> + * @param v
> + *   TQS variable
> + * @param t
> + *   Token returned by rte_tqs_start API
> + * @param wait
> + *   If true, block till all the worker threads have completed entering
> + *   the quiescent state 'n' number of times
> + * @return
> + *   - 0 if all worker threads have NOT passed through specified number
> + *     of quiescent states.
> + *   - 1 if all worker threads have passed through specified number
> + *     of quiescent states.
> + *   - -EINVAL if the parameters are invalid (debug mode compilation only).
> + */
> +static __rte_always_inline int __rte_experimental
> +rte_tqs_check(struct rte_tqs *v, uint32_t t, bool wait)
> +{
> +	uint64_t l;
> +	uint64_t lcore_mask;
> +
> +	TQS_RETURN_IF_TRUE((v == NULL), -EINVAL);
> +
> +	do {
> +		/* Load the current lcore_mask before loading the
> +		 * worker thread quiescent state counters.
> +		 */
> +		lcore_mask = __atomic_load_n(&v->lcore_mask, __ATOMIC_ACQUIRE);

What would happen if a reader calls unregister() simultaneously with check()
and updates lcore_mask straight after that load?
As I understand it, check() might hang in such a case.
> +
> +		while (lcore_mask) {
> +			l = __builtin_ctz(lcore_mask);
> +			if (v->w[l].cnt != t)
> +				break;

As I understand, that makes the progress of a control-path function dependent
on simultaneous invocation of data-path functions.
In some cases that might cause the control-path to hang.
Say, if a data-path function isn't being called, or the user invokes
control-path and data-path functions from the same thread.

> +
> +			lcore_mask &= ~(1UL << l);
> +		}
> +
> +		if (lcore_mask == 0)
> +			return 1;
> +
> +		rte_pause();
> +	} while (wait);
> +
> +	return 0;
> +}
> +