Hi Thomas,

This was fixed as of yesterday. The failure was caused by a commit to the SPDK repository that changed how it pulls in its dependencies, in a way that is not compatible with Docker. The team created a workaround, so that case is fixed, but there is always a risk that other commits of that type could cause failures in the containers.

I asked Brandon to change the scripts that run the tests in the containers to try to catch Docker failures separately, so they can be flagged as infrastructure failures rather than build failures.

I'm also very surprised that this was not raised during the CI meeting or by anyone else. I wonder if this is because the actual error logs are somewhat abstracted from the emails, i.e. they are a link and a zip file away from the email text, so maybe folks are not looking at the output as closely as they should. Is this something we could improve by including more detail in the email text, so that issues are caught more quickly?

Cheers,
Lincoln

On Sun, May 24, 2020 at 5:50 AM Thomas Monjalon <thomas@monjalon.net> wrote:
Hi all,

I think we have a CI reliability issue in general.
Perhaps we lack some alert mechanism warning test platform maintainers
when too many tests are failing.

Recent example: the community lab compilation test is failing on
Fedora 31 for at least 2 weeks, and I don't see any action to fix it:
        https://lab.dpdk.org/results/dashboard/patchsets/11040/

Because of such recurring errors, the whole CI becomes irrelevant.
Please, we need to take action to avoid such issues in the near future.

--
Lincoln Lavoie
Senior Engineer, Broadband Technologies
21 Madbury Rd., Ste. 100, Durham, NH 03824
+1-603-674-2755 (m)