From: "Mattias Rönnblom" <mattias.ronnblom@ericsson.com>
To: Thomas Monjalon <thomas@monjalon.net>,
Van Haaren Harry <harry.van.haaren@intel.com>
Cc: "David Marchand" <david.marchand@redhat.com>,
dpdklab <dpdklab@iol.unh.edu>, "ci@dpdk.org" <ci@dpdk.org>,
"Honnappa Nagarahalli" <Honnappa.Nagarahalli@arm.com>,
"Morten Brørup" <mb@smartsharesystems.com>,
"Aaron Conole" <aconole@redhat.com>, dev <dev@dpdk.org>
Subject: Re: rte_service unit test failing randomly
Date: Wed, 5 Oct 2022 21:33:56 +0000
Message-ID: <e4aca5cb-4805-9391-c73e-6ba8b8d5982a@ericsson.com>
In-Reply-To: <3000673.mvXUDI8C0e@thomas>
On 2022-10-05 22:52, Thomas Monjalon wrote:
> 05/10/2022 22:33, Mattias Rönnblom:
>> On 2022-10-05 21:14, David Marchand wrote:
>>> Hello,
>>>
>>> The service_autotest unit test has been failing randomly.
>>> This is not something new.
>>> We have been fixing this unit test and the service code, here and there.
>>> For some time we were "fine": the failures were rare.
>>>
>>> But recently (for the last two weeks at least), it started failing more
>>> frequently in UNH lab.
>>>
>>> The symptoms are linked to places where the unit test code is "waiting
>>> for some time":
>>>
>>> - service_lcore_attr_get:
>>> + TestCase [ 5] : service_lcore_attr_get failed
>>> EAL: Test assert service_lcore_attr_get line 422 failed: Service lcore
>>> not stopped after waiting.
>>>
>>>
>>> - service_may_be_active:
>>> + TestCase [15] : service_may_be_active failed
>>> ...
>>> EAL: Test assert service_may_be_active line 960 failed: Error: Service
>>> not stopped after 100ms
>>>
>>> Ideas?
>>>
>>>
>>> Thanks.
>>
>> Do you run the test suite in a controlled environment? I.e., one where
>> you can trust that the lcore threads aren't interrupted for long periods
>> of time.
>>
>> 100 ms is not a long time if a SCHED_OTHER lcore thread competes for the
>> CPU with other threads.
>
> You mean the tests cannot be interrupted?

I just took a very quick look, but it seems the main thread can be
interrupted, while the worker lcore thread cannot be interrupted for
anything close to 100 ms, or you risk a test failure.

> Then it looks very fragile.

Tests like this are by their very nature racy. If a test thread sends a
request to another thread, there is no way for it to decide when a
non-response should result in a test failure, unless the scheduling
latency of the receiving thread has an upper bound.

If you grep for "sleep" or "delay" in app/test/test_*.c, you will get
a lot of matches. I bet there are more tests like the service core one,
but they allow for longer interruptions.

That said, 100 ms sounds very short. I don't see why this can't be a
lot longer.
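
As a rough sketch of what a more tolerant wait could look like (plain
POSIX C, not the actual test code; wait_until(), now_ms() and
flag_is_set() are made-up names, and the 1 ms poll interval is an
arbitrary assumption), the test can poll against a generous deadline
instead of checking once after a fixed delay:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical helper (not a DPDK API): monotonic time in milliseconds. */
static uint64_t now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

/*
 * Hypothetical helper: poll cond(arg) roughly once per millisecond
 * until it returns true or timeout_ms has elapsed. Returns true as
 * soon as the condition holds, false if the deadline passes first.
 */
static bool wait_until(bool (*cond)(void *), void *arg, uint64_t timeout_ms)
{
	const struct timespec poll_interval = { .tv_sec = 0, .tv_nsec = 1000000 };
	const uint64_t deadline = now_ms() + timeout_ms;

	assert(cond != NULL);

	do {
		if (cond(arg))
			return true;
		nanosleep(&poll_interval, NULL);
	} while (now_ms() < deadline);

	/* One last check, in case the condition became true during the final sleep. */
	return cond(arg);
}

/* Example condition: an atomic flag set by some other thread. */
static bool flag_is_set(void *arg)
{
	return atomic_load((atomic_bool *)arg);
}
```

A test would then call something like
wait_until(flag_is_set, &stopped, 1000), where "stopped" is a
hypothetical flag set by the worker, and fail only on a false return.
Since the call returns as soon as the condition holds, the longer
timeout only costs time when the worker really is delayed.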

...and that said, I would argue you still need a reasonably controlled
environment for the autotests. If you have a server that is arbitrarily
overloaded, maybe also with high memory pressure (and associated
instruction page faults and god-knows-what), the real-world worst-case
interruptions could be very long indeed. Seconds. Designing inherently
racy tests for that kind of environment will make them have very long
run times.

> Could you please help make it more robust?
>
I can send a patch, if Harry can't.