From: "Van Haaren, Harry" <harry.van.haaren@intel.com>
To: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>,
"dev@dpdk.org" <dev@dpdk.org>
Cc: "david.marchand@redhat.com" <david.marchand@redhat.com>,
"igor.romanov@oktetlabs.ru" <igor.romanov@oktetlabs.ru>,
"honnappa.nagarahalli@arm.com" <honnappa.nagarahalli@arm.com>,
"Yigit, Ferruh" <ferruh.yigit@intel.com>,
"nd@arm.com" <nd@arm.com>,
"aconole@redhat.com" <aconole@redhat.com>
Subject: Re: [dpdk-dev] [PATCH] service: fix stop API to wait for service thread
Date: Mon, 20 Jul 2020 14:20:22 +0000
Message-ID: <BYAPR11MB31434ED0846748546D905F86D77B0@BYAPR11MB3143.namprd11.prod.outlook.com>
In-Reply-To: <fbd9c539-24c1-8cd8-988f-56dd423f892d@partner.samsung.com>
> -----Original Message-----
> From: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Sent: Monday, July 20, 2020 1:52 PM
> To: Van Haaren, Harry <harry.van.haaren@intel.com>; dev@dpdk.org
> Cc: david.marchand@redhat.com; igor.romanov@oktetlabs.ru;
> honnappa.nagarahalli@arm.com; Yigit, Ferruh <ferruh.yigit@intel.com>;
> nd@arm.com; aconole@redhat.com
> Subject: Re: [PATCH] service: fix stop API to wait for service thread
>
>
> On 20.07.2020 at 14:09, Harry van Haaren wrote:
> > This commit improves the service_lcore_stop() implementation,
> > waiting for the service core in question to return. The service
> > thread itself now has a variable to indicate if its thread is
> > active. When zero the service thread has completed its service,
> > and has returned from the service_runner_func() function.
> >
> > This fixes a race condition observed in the DPDK CI, where the
> > statistics of the service were not consistent with the expectation
> > due to the service thread still running, and incrementing a stat
> > after stop was called.
> >
> > Signed-off-by: Harry van Haaren <harry.van.haaren@intel.com>
> >
> > ---
> >
> > This is one possible solution, that avoids a class of race-conditions
> > based on stop() api and following behaviours. Without a change in
> > implementation of the service core thread, we could not detect when
> > the thread was actually finished. This is now possible, and the stop
> > api makes use of it to wait for 1000x one millisecond, or log a warning
> > that a service core didn't return quickly.
> >
> > Thanks for the discussion/debug on list - I'm not sure how to add
> > reported-by/suggested-by etc tags: but I'll resend a V2 (or can add
> > on apply).
> >
> > ---
> > lib/librte_eal/common/rte_service.c | 24 ++++++++++++++++++++++++
> > 1 file changed, 24 insertions(+)
> >
> > diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
> > index 6a0e0ff65..d2255587d 100644
> > --- a/lib/librte_eal/common/rte_service.c
> > +++ b/lib/librte_eal/common/rte_service.c
> > @@ -65,6 +65,7 @@ struct core_state {
> > /* map of services IDs are run on this core */
> > uint64_t service_mask;
> > uint8_t runstate; /* running or stopped */
> > + uint8_t thread_active; /* indicates when the thread is in service_run() */
> > uint8_t is_service_core; /* set if core is currently a service core */
> > uint8_t service_active_on_lcore[RTE_SERVICE_NUM_MAX];
> > uint64_t loops;
> > @@ -457,6 +458,8 @@ service_runner_func(void *arg)
> > const int lcore = rte_lcore_id();
> > struct core_state *cs = &lcore_states[lcore];
> >
> > + __atomic_store_n(&cs->thread_active, 1, __ATOMIC_RELAXED);
> > +
> > /* runstate act as the guard variable. Use load-acquire
> > * memory order here to synchronize with store-release
> > * in runstate update functions.
> > @@ -475,6 +478,7 @@ service_runner_func(void *arg)
> > cs->loops++;
> > }
> >
> > + __atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELAXED);
> > return 0;
> > }
> >
> > @@ -765,6 +769,26 @@ rte_service_lcore_stop(uint32_t lcore)
> > __atomic_store_n(&lcore_states[lcore].runstate, RUNSTATE_STOPPED,
> > __ATOMIC_RELEASE);
> >
> > + /* wait for service lcore to return */
> > + i = 0;
> > + uint8_t active;
> > + uint64_t start = rte_rdtsc();
> > + do {
> > + active = __atomic_load_n(&lcore_states[lcore].thread_active,
> > + __ATOMIC_RELAXED);
> > + if (active == 0)
> > + break;
> > + rte_delay_ms(1);
> > + i++;
> > + } while (i < 1000);
> > +
> > + if (active != 0) {
> > + uint64_t end = rte_rdtsc();
> > + RTE_LOG(WARNING, EAL,
> > + "service lcore stop() failed, waited for %ld cycles\n",
> > + end - start);
> > + }
> > +
> > return 0;
> > }
> >
> I don't like the idea of putting this polling loop inside the API call,
> and I don't like hard-coding a 1000-iteration limit.
> How about keeping the thread_active flag, but moving the check of
> this flag into a separate function? That way the user of the API would be
> able to write their own loop.
> They might want a custom loop because they:
> * need to wait for several cores
> * would like to wait longer
> * would like to check less often whether the service has finished
Agree - good feedback, thanks. v2 on the way, with this approach.
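
A rough sketch of the shape this could take, not the actual v2 code. It is
a standalone model using pthreads rather than DPDK, and every name in it
(lcore_stop, lcore_may_be_active, wait_inactive, demo) is hypothetical. It
also assumes two things the v1 patch does not do: the launcher sets
thread_active before the thread starts, and the flag is cleared with
release ordering so the final loops count is visible to the waiter.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stand-in for the per-lcore state in rte_service.c. */
struct core_state {
	uint8_t runstate;      /* 1 = running, 0 = stopped */
	uint8_t thread_active; /* 1 while the runner may still be executing */
	uint64_t loops;        /* written only by the runner thread */
};

/* Runner thread, same outline as service_runner_func() in the patch. */
static void *runner(void *arg)
{
	struct core_state *cs = arg;

	while (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) == 1)
		cs->loops++;

	/* Assumption: release, so the last loops++ is visible to a waiter
	 * that observes thread_active == 0 with an acquire load. */
	__atomic_store_n(&cs->thread_active, 0, __ATOMIC_RELEASE);
	return NULL;
}

/* stop() only requests the stop; it no longer polls. */
static void lcore_stop(struct core_state *cs)
{
	__atomic_store_n(&cs->runstate, 0, __ATOMIC_RELEASE);
}

/* Separate query: non-zero while the runner may still be executing. */
static int lcore_may_be_active(struct core_state *cs)
{
	return __atomic_load_n(&cs->thread_active, __ATOMIC_ACQUIRE);
}

/* Caller-owned wait policy: poll every poll_ms, at most max_polls times.
 * Returns 0 once the runner has finished, -1 on timeout. */
static int wait_inactive(struct core_state *cs, unsigned int poll_ms,
		unsigned int max_polls)
{
	struct timespec ts = {
		.tv_sec = poll_ms / 1000,
		.tv_nsec = (long)(poll_ms % 1000) * 1000000L,
	};
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (!lcore_may_be_active(cs))
			return 0;
		nanosleep(&ts, NULL);
	}
	return lcore_may_be_active(cs) ? -1 : 0;
}

/* Usage: stop, wait with a caller-chosen budget, then read the stats. */
static int demo(void)
{
	/* Assumption: the launcher sets thread_active before starting. */
	struct core_state cs = { .runstate = 1, .thread_active = 1 };
	pthread_t t;
	uint64_t seen;

	if (pthread_create(&t, NULL, runner, &cs) != 0)
		return -1;

	lcore_stop(&cs);
	if (wait_inactive(&cs, 1, 5000) != 0) {
		pthread_join(t, NULL);
		return -1;
	}

	seen = cs.loops;            /* stable: the runner has finished */
	pthread_join(t, NULL);
	assert(cs.loops == seen);   /* no increment after "inactive" */
	return 0;
}
```

With this split the poll interval, the timeout and the number of cores
waited on are all decided by the caller, which covers the three cases
listed above.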