From: Phil Yang <Phil.Yang@arm.com>
To: Aaron Conole <aconole@redhat.com>,
Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
Cc: David Marchand <david.marchand@redhat.com>,
Van Haaren Harry <harry.van.haaren@intel.com>,
Igor Romanov <igor.romanov@oktetlabs.ru>,
Honnappa Nagarahalli <Honnappa.Nagarahalli@arm.com>,
dev <dev@dpdk.org>, Ferruh Yigit <ferruh.yigit@intel.com>,
nd <nd@arm.com>, nd <nd@arm.com>
Subject: Re: [dpdk-dev] Random failure in service_autotest
Date: Sat, 18 Jul 2020 08:34:06 +0000 [thread overview]
Message-ID: <VE1PR08MB4640AE63D9D825EBCE6729E5E97D0@VE1PR08MB4640.eurprd08.prod.outlook.com> (raw)
In-Reply-To: <f7ttuy5lqbw.fsf@dhcp-25.97.bos.redhat.com>
> -----Original Message-----
> From: Aaron Conole <aconole@redhat.com>
> Sent: Saturday, July 18, 2020 6:39 AM
> To: Lukasz Wojciechowski <l.wojciechow@partner.samsung.com>
> Cc: David Marchand <david.marchand@redhat.com>; Van Haaren Harry
> <harry.van.haaren@intel.com>; Igor Romanov
> <igor.romanov@oktetlabs.ru>; Honnappa Nagarahalli
> <Honnappa.Nagarahalli@arm.com>; Phil Yang <Phil.Yang@arm.com>; dev
> <dev@dpdk.org>; Ferruh Yigit <ferruh.yigit@intel.com>
> Subject: Re: [dpdk-dev] Random failure in service_autotest
>
> Lukasz Wojciechowski <l.wojciechow@partner.samsung.com> writes:
>
> > W dniu 17.07.2020 o 17:19, David Marchand pisze:
> >> On Fri, Jul 17, 2020 at 10:56 AM David Marchand
> >> <david.marchand@redhat.com> wrote:
> >>> On Wed, Jul 15, 2020 at 12:41 PM Ferruh Yigit <ferruh.yigit@intel.com>
> wrote:
> >>>> On 7/15/2020 11:14 AM, David Marchand wrote:
> >>>>> Hello Harry and guys who touched the service code recently :-)
> >>>>>
> >>>>> I spotted a failure for the service UT in Travis:
> >>>>> https://travis-ci.com/github/ovsrobot/dpdk/jobs/361097992#L18697
> >>>>>
> >>>>> I found only a single instance of this failure and tried to reproduce
> >>>>> it with my usual "brute" active loop with no success so far.
> >>>> +1, I didn't able to reproduce it in my environment but observed it in
> the
> >>>> Travis CI.
> >>>>
> >>>>> Any chance it could be due to recent changes?
> >>>>> https://protect2.fireeye.com/url?k=70a801b3-2d7b5aa7-70a98afc-
> 0cc47a31ce4e-
> 231dc7b8ee6eb8a9&q=1&u=https%3A%2F%2Fgit.dpdk.org%2Fdpdk%2Fcom
> mit%2F%3Fid%3Df3c256b621262e581d3edcca383df83875ab7ebe
> >>>>> https://protect2.fireeye.com/url?k=21dbcfd3-7c0894c7-21da449c-
> 0cc47a31ce4e-
> d8c6abfb03bf67f1&q=1&u=https%3A%2F%2Fgit.dpdk.org%2Fdpdk%2Fcomm
> it%2F%3Fid%3D048db4b6dcccaee9277ce5b4fbb2fe684b212e22
> >>> I can see more occurrences of the issue in the CI.
> >>> I just applied the patch changing the log level for test assert, in
> >>> the hope it will help.
> >> And... we just got one with logs:
> >> https://travis-ci.com/github/ovsrobot/dpdk/jobs/362109882#L18948
> >>
> >> EAL: Test assert service_lcore_attr_get line 396 failed:
> >> lcore_attr_get() didn't get correct loop count (zero)
> >>
> >> It looks like a race between the service core still running and the
> >> core resetting the loops attr.
> >>
> > Yes, it seems to be just lack of patience of the test. It should wait a
> > bit for lcore to stop before resetting attrs.
> > Something like this should help:
> > @@ -384,6 +384,9 @@ service_lcore_attr_get(void)
> >
> > rte_service_lcore_stop(slcore_id);
> >
> > + /* wait for the service lcore to stop */
> > + rte_delay_ms(200);
> > +
> > TEST_ASSERT_EQUAL(0, rte_service_lcore_attr_reset_all(slcore_id),
> > "Valid lcore_attr_reset_all() didn't return
> > success");
>
> Would an rte_eal_wait_lcore make sense? Overall, I really dislike
> sleeps because they can hide racy synchronization points.
The rte_service_lcore_stop() operation changes the status to RUNSTATE_STOPED.
However, it will not terminate the service_run() procedure (there is a spinlock in service_run() MT_UNSAFE path).
So the 'cs->loop' might increase after calling rte_service_lcore_attr_reset_all(), which leads to this failure.
I think if we move the loop counter update operation before the service_run() procedure, it can avoid this conflict.
I cannot reproduce this issue on my testbed, so not sure something like this can help or not.
Please check the code below.
diff --git a/lib/librte_eal/common/rte_service.c b/lib/librte_eal/common/rte_service.c
index 6a0e0ff..7b703dd 100644
--- a/lib/librte_eal/common/rte_service.c
+++ b/lib/librte_eal/common/rte_service.c
@@ -464,6 +464,7 @@ service_runner_func(void *arg)
while (__atomic_load_n(&cs->runstate, __ATOMIC_ACQUIRE) ==
RUNSTATE_RUNNING) {
const uint64_t service_mask = cs->service_mask;
+ cs->loops++;
for (i = 0; i < RTE_SERVICE_NUM_MAX; i++) {
if (!service_valid(i))
@@ -471,8 +472,6 @@ service_runner_func(void *arg)
/* return value ignored as no change to code flow */
service_run(i, cs, service_mask, service_get(i), 1);
}
-
- cs->loops++;
}
return 0;
Thanks,
Phil
next prev parent reply other threads:[~2020-07-18 8:34 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-15 10:14 David Marchand
2020-07-15 10:41 ` Ferruh Yigit
2020-07-17 8:56 ` David Marchand
2020-07-17 15:19 ` David Marchand
2020-07-17 20:31 ` Lukasz Wojciechowski
2020-07-17 22:38 ` Aaron Conole
2020-07-17 22:43 ` Honnappa Nagarahalli
2020-07-18 8:34 ` Phil Yang [this message]
2020-07-20 12:09 ` Van Haaren, Harry
2020-07-20 12:47 ` Lukasz Wojciechowski
2020-07-21 5:39 ` Honnappa Nagarahalli
2020-07-21 8:01 ` Van Haaren, Harry
2020-07-21 8:07 ` David Marchand
2020-07-21 8:16 ` Lukasz Wojciechowski
2020-07-21 15:09 ` Honnappa Nagarahalli
2020-07-21 15:38 ` Van Haaren, Harry
2020-07-21 16:21 ` Honnappa Nagarahalli
2020-07-15 12:56 ` Aaron Conole
2020-07-15 13:02 ` David Marchand
2020-07-15 13:09 ` Lukasz Wojciechowski
2020-07-15 13:28 ` David Marchand
2020-07-15 13:39 ` Aaron Conole
2020-07-15 20:26 ` Honnappa Nagarahalli
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=VE1PR08MB4640AE63D9D825EBCE6729E5E97D0@VE1PR08MB4640.eurprd08.prod.outlook.com \
--to=phil.yang@arm.com \
--cc=Honnappa.Nagarahalli@arm.com \
--cc=aconole@redhat.com \
--cc=david.marchand@redhat.com \
--cc=dev@dpdk.org \
--cc=ferruh.yigit@intel.com \
--cc=harry.van.haaren@intel.com \
--cc=igor.romanov@oktetlabs.ru \
--cc=l.wojciechow@partner.samsung.com \
--cc=nd@arm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).