From mboxrd@z Thu Jan 1 00:00:00 1970
From: "Ananyev, Konstantin"
To: 'Honnappa Nagarahalli' , "dev@dpdk.org"
CC: "nd@arm.com" , "dharmik.thakkar@arm.com" , "malvika.gupta@arm.com" , "gavin.hu@arm.com"
Thread-Topic: [dpdk-dev] [RFC 2/3] tqs: add thread quiescent state library
Date: Sat, 24 Nov 2018 12:18:00 +0000
Message-ID: <2601191342CEEE43887BDE71AB977258010CEBBBA9@IRSMSX106.ger.corp.intel.com>
References: <20181122033055.3431-1-honnappa.nagarahalli@arm.com> <20181122033055.3431-3-honnappa.nagarahalli@arm.com>
In-Reply-To: <20181122033055.3431-3-honnappa.nagarahalli@arm.com>
Subject: Re: [dpdk-dev] [RFC 2/3] tqs: add thread quiescent state library

Hi Honnappa,

> +
> +/* Allocate a new TQS variable with the name *name* in memory.
> + */
> +struct rte_tqs * __rte_experimental
> +rte_tqs_alloc(const char *name, int socket_id, uint64_t lcore_mask)
> +{
> +	char tqs_name[RTE_TQS_NAMESIZE];
> +	struct rte_tailq_entry *te, *tmp_te;
> +	struct rte_tqs_list *tqs_list;
> +	struct rte_tqs *v, *tmp_v;
> +	int ret;
> +
> +	if (name == NULL) {
> +		RTE_LOG(ERR, TQS, "Invalid input parameters\n");
> +		rte_errno = -EINVAL;
> +		return NULL;
> +	}
> +
> +	te = rte_zmalloc("TQS_TAILQ_ENTRY", sizeof(*te), 0);
> +	if (te == NULL) {
> +		RTE_LOG(ERR, TQS, "Cannot reserve memory for tailq\n");
> +		rte_errno = -ENOMEM;
> +		return NULL;
> +	}
> +
> +	snprintf(tqs_name, sizeof(tqs_name), "%s", name);
> +	v = rte_zmalloc_socket(tqs_name, sizeof(struct rte_tqs),
> +				RTE_CACHE_LINE_SIZE, socket_id);
> +	if (v == NULL) {
> +		RTE_LOG(ERR, TQS, "Cannot reserve memory for TQS variable\n");
> +		rte_errno = -ENOMEM;
> +		goto alloc_error;
> +	}
> +
> +	ret = snprintf(v->name, sizeof(v->name), "%s", name);
> +	if (ret < 0 || ret >= (int)sizeof(v->name)) {
> +		rte_errno = -ENAMETOOLONG;
> +		goto alloc_error;
> +	}
> +
> +	te->data = (void *) v;
> +	v->lcore_mask = lcore_mask;
> +
> +	rte_rwlock_write_lock(RTE_EAL_TAILQ_RWLOCK);
> +
> +	tqs_list = RTE_TAILQ_CAST(rte_tqs_tailq.head, rte_tqs_list);
> +
> +	/* Search if a TQS variable with the same name exists already */
> +	TAILQ_FOREACH(tmp_te, tqs_list, next) {
> +		tmp_v = (struct rte_tqs *) tmp_te->data;
> +		if (strncmp(name, tmp_v->name, RTE_TQS_NAMESIZE) == 0)
> +			break;
> +	}
> +
> +	if (tmp_te != NULL) {
> +		rte_errno = -EEXIST;
> +		goto tqs_exist;
> +	}
> +
> +	TAILQ_INSERT_TAIL(tqs_list, te, next);
> +
> +	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> +
> +	return v;
> +
> +tqs_exist:
> +	rte_rwlock_write_unlock(RTE_EAL_TAILQ_RWLOCK);
> +
> +alloc_error:
> +	rte_free(te);
> +	rte_free(v);
> +	return NULL;
> +}

That seems like quite a heavy-weight function just to allocate a sync variable.
As the size of struct rte_tqs is constant and known to the user, it might be better
to just provide rte_tqs_init(struct rte_tqs *tqs, ...) and let the user allocate/free
the memory for it themselves.

> +
> +/* Add a reader thread, running on an lcore, to the list of threads
> + * reporting their quiescent state on a TQS variable.
> + */
> +int __rte_experimental
> +rte_tqs_register_lcore(struct rte_tqs *v, unsigned int lcore_id)
> +{
> +	TQS_RETURN_IF_TRUE((v == NULL || lcore_id >= RTE_TQS_MAX_LCORE),
> +				-EINVAL);

It is not very good practice to make a function return different values and behave
in a different way in debug and non-debug mode.
I'd say that for slow-path (functions in .c) it is always good to check input parameters.
For fast-path (functions in .h) we sometimes skip such checking,
but debug mode can probably use RTE_ASSERT() or so.

lcore_id >= RTE_TQS_MAX_LCORE

Is this limitation really necessary?
First, it means that only lcores can use that API (at least the data-path part);
second, even today many machines have more than 64 cores.
I think you can easily avoid such a limitation if, instead of requiring lcore_id
as an input parameter, you just make it return the index of the next available entry in w[].
Then tqs_update() can take that index as an input parameter.

> +
> +	/* Worker thread has to count the quiescent states
> +	 * only from the current value of token.
> +	 */
> +	v->w[lcore_id].cnt = v->token;

Wonder what would happen if a new reader calls register() after the writer
calls start()? Looks like a race condition.
Or is such a pattern simply not supported?

> +
> +	/* Release the store to initial TQS count so that workers
> +	 * can use it immediately after this function returns.
> +	 */
> +	__atomic_fetch_or(&v->lcore_mask, (1UL << lcore_id), __ATOMIC_RELEASE);
> +
> +	return 0;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Trigger the worker threads to report the quiescent state
> + * status.
> + *
> + * This is implemented as a lock-free function. It is multi-thread
> + * safe and can be called from the worker threads as well.
> + *
> + * @param v
> + *   TQS variable
> + * @param n
> + *   Expected number of times the quiescent state is entered
> + * @param t
> + *   - If successful, this is the token for this call of the API.
> + *     This should be passed to rte_tqs_check API.
> + * @return
> + *   - -EINVAL if the parameters are invalid (debug mode compilation only).
> + *   - 0 Otherwise and always (non-debug mode compilation).
> + */
> +static __rte_always_inline int __rte_experimental
> +rte_tqs_start(struct rte_tqs *v, unsigned int n, uint32_t *t)
> +{
> +	TQS_RETURN_IF_TRUE((v == NULL || t == NULL), -EINVAL);
> +
> +	/* This store release will ensure that changes to any data
> +	 * structure are visible to the workers before the token
> +	 * update is visible.
> +	 */
> +	*t = __atomic_add_fetch(&v->token, n, __ATOMIC_RELEASE);
> +
> +	return 0;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Update quiescent state for the worker thread on a lcore.
> + *
> + * This is implemented as a lock-free function. It is multi-thread safe.
> + * All the worker threads registered to report their quiescent state
> + * on the TQS variable must call this API.
> + *
> + * @param v
> + *   TQS variable
> + */
> +static __rte_always_inline void __rte_experimental
> +rte_tqs_update(struct rte_tqs *v, unsigned int lcore_id)
> +{
> +	uint32_t t;
> +
> +	TQS_ERR_LOG_IF_TRUE(v == NULL || lcore_id >= RTE_TQS_MAX_LCORE);
> +
> +	/* Load the token before the worker thread loads any other
> +	 * (lock-free) data structure. This ensures that updates
> +	 * to the data structures are visible if the update
> +	 * to token is visible.
> +	 */
> +	t = __atomic_load_n(&v->token, __ATOMIC_ACQUIRE);

Hmm, I am not very familiar with the C11 model, but it looks like a race
condition to me: as I understand, update() is supposed to be called at the
end of the reader's critical section, correct?
But ACQUIRE is only a hoist barrier, which means the compiler and cpu are
free to move earlier reads (and writes) after it.
It probably needs to be a full ACQ_REL here.

> +	if (v->w[lcore_id].cnt != t)
> +		v->w[lcore_id].cnt++;
> +}
> +
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change without prior notice
> + *
> + * Checks if all the worker threads have entered the quiescent state
> + * 'n' number of times. 'n' is provided in rte_tqs_start API.
> + *
> + * This is implemented as a lock-free function. It is multi-thread
> + * safe and can be called from the worker threads as well.
> + *
> + * @param v
> + *   TQS variable
> + * @param t
> + *   Token returned by rte_tqs_start API
> + * @param wait
> + *   If true, block till all the worker threads have completed entering
> + *   the quiescent state 'n' number of times
> + * @return
> + *   - 0 if all worker threads have NOT passed through specified number
> + *     of quiescent states.
> + *   - 1 if all worker threads have passed through specified number
> + *     of quiescent states.
> + *   - -EINVAL if the parameters are invalid (debug mode compilation only).
> + */
> +static __rte_always_inline int __rte_experimental
> +rte_tqs_check(struct rte_tqs *v, uint32_t t, bool wait)
> +{
> +	uint64_t l;
> +	uint64_t lcore_mask;
> +
> +	TQS_RETURN_IF_TRUE((v == NULL), -EINVAL);
> +
> +	do {
> +		/* Load the current lcore_mask before loading the
> +		 * worker thread quiescent state counters.
> +		 */
> +		lcore_mask = __atomic_load_n(&v->lcore_mask, __ATOMIC_ACQUIRE);

What would happen if a reader calls unregister() simultaneously with check()
and updates lcore_mask straight after that load?
As I understand it, check() might hang in such a case.
> +
> +		while (lcore_mask) {
> +			l = __builtin_ctz(lcore_mask);
> +			if (v->w[l].cnt != t)
> +				break;

As I understand, that makes the progress of a control-path function dependent
on simultaneous invocation of data-path functions.
In some cases that might cause the control-path to hang.
Say, if a data-path function isn't being called, or the user invokes
control-path and data-path functions from the same thread.

> +
> +			lcore_mask &= ~(1UL << l);
> +		}
> +
> +		if (lcore_mask == 0)
> +			return 1;
> +
> +		rte_pause();
> +	} while (wait);
> +
> +	return 0;
> +}
> +