Date: Fri, 25 Mar 2022 05:11:26 -0700
From: Tyler Retzlaff
To: David Marchand
Cc: dev, Thomas Monjalon, Bruce Richardson, Dmitry Kozlyuk, Narcisa Ana Maria Vasile, Dmitry Malloy, Pallavi Kadam
Subject: Re: [PATCH] eal: factorize lcore main loop
Message-ID: <20220325121126.GA6378@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>
References: <20220323093001.20618-1-david.marchand@redhat.com> <20220324083107.GA28494@linuxonhyperv3.guj3yctzbm1etfxqx2vob5hsef.xx.internal.cloudapp.net>

On Thu, Mar 24, 2022 at 03:44:23PM +0100,
David Marchand wrote:
> On Thu, Mar 24, 2022 at 9:31 AM Tyler Retzlaff wrote:
> > > diff --git a/lib/eal/common/eal_common_thread.c b/lib/eal/common/eal_common_thread.c
> > > index 684bea166c..256de91abc 100644
> > > --- a/lib/eal/common/eal_common_thread.c
> > > +++ b/lib/eal/common/eal_common_thread.c
> > > @@ -9,6 +9,7 @@
> > >  #include
> > >  #include
> > >
> > > +#include
> > >  #include
> > >  #include
> > >  #include
> > > @@ -163,6 +164,77 @@ __rte_thread_uninit(void)
> > >  	RTE_PER_LCORE(_lcore_id) = LCORE_ID_ANY;
> > >  }
> > >
> > > +/* main loop of threads */
> > > +__rte_noreturn void *
> > > +eal_thread_loop(__rte_unused void *arg)
> > > +{
> > > +	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
> > > +	pthread_t thread_id = pthread_self();
> > > +	unsigned int lcore_id;
> > > +	int ret;
> > > +
> > > +	/* retrieve our lcore_id from the configuration structure */
> > > +	RTE_LCORE_FOREACH_WORKER(lcore_id) {
> > > +		if (thread_id == lcore_config[lcore_id].thread_id)
> >                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >
> > i can see that in practice this isn't a problem since the linux
> > implementation of pthread_create(3) stores to pthread_t *thread before
> > executing start_routine.
> >
> > but strictly speaking i don't think the pthread_create api contractually
> > guarantees that the thread id is stored before start_routine runs. so this
> > is relying on an internal implementation detail.
> >
> > https://pubs.opengroup.org/onlinepubs/9699919799/functions/pthread_create.html
> >
> > "Upon successful completion, pthread_create() shall store the ID of the
> > created thread in the location referenced by thread."
> >
> > https://man7.org/linux/man-pages/man3/pthread_create.3.html
> >
> > "Before returning, a successful call to pthread_create() stores
> > the ID of the new thread in the buffer pointed to by thread; this
> > identifier is used to refer to the thread in subsequent calls to
> > other pthreads functions."
> >
> > it doesn't really say when it does this in relation to start_routine running.
> > depends how hair splitty you want to be about it. but since you're revamping
> > the code you might be interested in addressing it.
>
> I had wondered about this part too in the past.
>
> I don't see a reason to keep this loop (even considering baremetal,
> since this code is within the linux implementation of EAL).
> And this comment seems a good reason to cleanup the code (like simply
> pass lcore_id via arg).
>
> Something like:
>
> Author: David Marchand
> Date:   Thu Mar 24 11:29:46 2022 +0100
>
>     eal: cleanup lcore hand-over from main thread
>
>     As noted by Tyler, there is nothing in the pthread API that strictly
>     guarantees that the new thread won't start running eal_thread_loop
>     before pthread_create writes to &lcore_config[xx].thread_id.
>
>     Rather than rely on thread id, the main thread can directly pass the
>     worker thread lcore.
>
>     Signed-off-by: David Marchand
>
> diff --git a/lib/eal/common/eal_common_thread.c
> b/lib/eal/common/eal_common_thread.c
> index 256de91abc..962b7e9ac4 100644
> --- a/lib/eal/common/eal_common_thread.c
> +++ b/lib/eal/common/eal_common_thread.c
> @@ -166,26 +166,17 @@ __rte_thread_uninit(void)
>
>  /* main loop of threads */
>  __rte_noreturn void *
> -eal_thread_loop(__rte_unused void *arg)
> +eal_thread_loop(void *arg)
>  {
> +	unsigned int lcore_id = (uintptr_t)arg;
>  	char cpuset[RTE_CPU_AFFINITY_STR_LEN];
> -	pthread_t thread_id = pthread_self();
> -	unsigned int lcore_id;
>  	int ret;
>
> -	/* retrieve our lcore_id from the configuration structure */
> -	RTE_LCORE_FOREACH_WORKER(lcore_id) {
> -		if (thread_id == lcore_config[lcore_id].thread_id)
> -			break;
> -	}
> -	if (lcore_id == RTE_MAX_LCORE)
> -		rte_panic("cannot retrieve lcore id\n");
> -
>  	__rte_thread_init(lcore_id, &lcore_config[lcore_id].cpuset);
>
>  	ret = eal_thread_dump_current_affinity(cpuset, sizeof(cpuset));
>  	RTE_LOG(DEBUG, EAL, "lcore %u is ready (tid=%zx;cpuset=[%s%s])\n",
> -		lcore_id, (uintptr_t)thread_id, cpuset,
> +		lcore_id, (uintptr_t)pthread_self(), cpuset,
>  		ret == 0 ? "" : "...");
>
>  	rte_eal_trace_thread_lcore_ready(lcore_id, cpuset);
> diff --git a/lib/eal/common/eal_thread.h b/lib/eal/common/eal_thread.h
> index b08dcf34b5..0fde33e70c 100644
> --- a/lib/eal/common/eal_thread.h
> +++ b/lib/eal/common/eal_thread.h
> @@ -11,7 +11,7 @@
>   * basic loop of thread, called for each thread by eal_init().
>   *
>   * @param arg
> - *   opaque pointer
> + *   The lcore_id (passed as an integer) of this worker thread.
>   */
>  __rte_noreturn void *eal_thread_loop(void *arg);
>
> diff --git a/lib/eal/freebsd/eal.c b/lib/eal/freebsd/eal.c
> index 80bc3d25e0..a6b20960f2 100644
> --- a/lib/eal/freebsd/eal.c
> +++ b/lib/eal/freebsd/eal.c
> @@ -810,7 +810,7 @@ rte_eal_init(int argc, char **argv)
>
>  		/* create a thread for each lcore */
>  		ret = pthread_create(&lcore_config[i].thread_id, NULL,
> -				     eal_thread_loop, NULL);
> +				     eal_thread_loop, (void *)(uintptr_t)i);
>  		if (ret != 0)
>  			rte_panic("Cannot create thread\n");
>
> diff --git a/lib/eal/linux/eal.c b/lib/eal/linux/eal.c
> index 8a405d1d59..1ef263434a 100644
> --- a/lib/eal/linux/eal.c
> +++ b/lib/eal/linux/eal.c
> @@ -1145,7 +1145,7 @@ rte_eal_init(int argc, char **argv)
>
>  		/* create a thread for each lcore */
>  		ret = pthread_create(&lcore_config[i].thread_id, NULL,
> -				     eal_thread_loop, NULL);
> +				     eal_thread_loop, (void *)(uintptr_t)i);
>  		if (ret != 0)
>  			rte_panic("Cannot create thread\n");
>
> diff --git a/lib/eal/windows/eal.c b/lib/eal/windows/eal.c
> index ca3c41aaa7..1874f9f6d7 100644
> --- a/lib/eal/windows/eal.c
> +++ b/lib/eal/windows/eal.c
> @@ -420,7 +420,7 @@ rte_eal_init(int argc, char **argv)
>  		lcore_config[i].state = WAIT;
>
>  		/* create a thread for each lcore */
> -		if (eal_thread_create(&lcore_config[i].thread_id) != 0)
> +		if (eal_thread_create(&lcore_config[i].thread_id, i) != 0)
>  			rte_panic("Cannot create thread\n");
>  		ret = pthread_setaffinity_np(lcore_config[i].thread_id,
>  			sizeof(rte_cpuset_t), &lcore_config[i].cpuset);
> diff --git a/lib/eal/windows/eal_thread.c b/lib/eal/windows/eal_thread.c
> index de1c0078a5..704781a83c 100644
> --- a/lib/eal/windows/eal_thread.c
> +++ b/lib/eal/windows/eal_thread.c
> @@ -71,13 +71,14 @@ eal_thread_ack_command(void)
>
>  /* function to create threads */
>  int
> -eal_thread_create(pthread_t *thread)
> +eal_thread_create(pthread_t *thread, unsigned int lcore_id)
>  {
>  	HANDLE th;
>
>  	th = CreateThread(NULL, 0,
>  		(LPTHREAD_START_ROUTINE)(ULONG_PTR)eal_thread_loop,
> -		NULL, 0, (LPDWORD)thread);
> +		(LPVOID)(uintptr_t)lcore_id, 0,
> +		(LPDWORD)thread);
>  	if (!th)
>  		return -1;
>
>
>
> But seeing how this code has been there from day 1, I would not
> request a backport.

this looks better to me

it ends up being a bit less code and it solves the problem in a general
fashion for platforms including windows.

on windows the implementation does run the start_routine before
assigning thread which was addressed with this patch. (still not merged)

http://patchwork.dpdk.org/project/dpdk/list/?series=22094

it's likely your patch will be merged before mine so when that happens
i'll just quietly abandon mine. however if some desire exists for a
backport the simpler patch i provided could be used.

on our downstream UT pipelines the bug was causing intermittent failure
of around 30% of the tests. i'm surprised the bug hasn't had a more
negative impact on the dpdk CI pipelines.
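
For readers following the thread, the hand-over discussed above can be sketched outside of EAL. This is only an illustration under stated assumptions, not the DPDK code: `worker_slot`, `slots`, `worker_main`, `spawn_workers` and `NUM_WORKERS` are hypothetical names standing in for `lcore_config[]` and `eal_thread_loop()`. The point it shows is that the worker takes its index from `arg` and never reads the `thread_id` field that `pthread_create()` may still be writing.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define NUM_WORKERS 4 /* hypothetical worker count for this sketch */

/* Hypothetical per-worker slot, standing in for lcore_config[]. */
struct worker_slot {
	pthread_t thread_id;  /* written by pthread_create() in the parent */
	unsigned int seen_id; /* written by the worker from its own argument */
};

static struct worker_slot slots[NUM_WORKERS];

/* Worker entry point: the id comes from arg, never from slots[].thread_id. */
static void *
worker_main(void *arg)
{
	unsigned int id = (unsigned int)(uintptr_t)arg;

	assert(id < NUM_WORKERS);
	slots[id].seen_id = id + 1; /* +1 so that 0 means "never ran" */
	return NULL;
}

/* Create and join all workers; returns 0 on success, -1 on failure. */
static int
spawn_workers(void)
{
	unsigned int i;

	for (i = 0; i < NUM_WORKERS; i++) {
		/* The index travels in the pointer-sized argument. */
		if (pthread_create(&slots[i].thread_id, NULL, worker_main,
				(void *)(uintptr_t)i) != 0)
			return -1;
	}
	for (i = 0; i < NUM_WORKERS; i++) {
		if (pthread_join(slots[i].thread_id, NULL) != 0)
			return -1;
	}
	return 0;
}
```

Calling `spawn_workers()` from a `main()` and then checking `slots[i].seen_id == i + 1` confirms that each worker learned its index without looking up its own thread id. The sketch does not clean up already-created threads if a later `pthread_create()` fails.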