From: Patrick Robb <probb@iol.unh.edu>
Date: Wed, 16 Aug 2023 16:38:49 -0400
Subject: Re: [PATCH v5 04/10] app/test: build using per-file dependency matrix
To: David Marchand <david.marchand@redhat.com>
Cc: Bruce Richardson, dev@dpdk.org, ci@dpdk.org, Morten Brørup, Honnappa Nagarahalli, "Ruifeng Wang (Arm Technology China)", Thomas Monjalon, Aaron Conole
References: <20230721115125.55137-1-bruce.richardson@intel.com> <20230815151053.996469-1-bruce.richardson@intel.com> <20230815151053.996469-5-bruce.richardson@intel.com>
List-Id: DPDK CI discussions

On Wed, Aug 16, 2023 at 3:26 PM David Marchand <david.marchand@redhat.com> wrote:

> On Wed, Aug 16, 2023 at 8:30 PM Patrick Robb <probb@iol.unh.edu> wrote:
> > On Wed, Aug 16, 2023 at 10:40 AM David Marchand <david.marchand@redhat.com> wrote:
> >>
> >> Patrick, Bruce,
> >>
> >> If it was reported, I either missed it or forgot about it, sorry.
> >> Can you (re)share the context?
> >>
> >>
> >> >
> >> > Does the test suite pass if the mlx5 driver is disabled in the build? That
> >> > could confirm or refute the suspicion of where the issue is, and also
> >> > provide a temporary workaround while this set is merged (possibly including
> >> > support for disabling specific tests, as I suggested in my other email).
> >>
> >> Or disabling the driver as Bruce proposes.
> >
> > Okay, we ran the test with the mlx5 driver disabled, and it still
> > fails. So, this might be more of an ARM architecture issue. Ruifeng,
> > are you still seeing this on your test bed?
> >
> > @David you didn't miss anything, we had a unicast with ARM when
> > setting up the new arm container runners for unit testing a few
> > months back. Ruifeng also noticed the same issue and speculated about
> > mlx5 memory leaks. He raised the possibility of disabling the mlx5
> > driver too, but that option isn't great since we want to have a
> > uniform build process (as much as possible) for our unit testing.
> > Anyways, now we know that that isn't relevant. I'll forward the
> > thread to you in any case - let me know if you have any ideas.
>
> The mention of "memtest1" in the mails rings a bell.
> I will need more detailed logs, or ideally an env where it is reproduced.

I shared the meson-logs/ directory for a unit test run that includes
eal_flags_file_prefix_autotest with you via Slack. I also shared the
meson test summary, but of course it's the detailed testlog.txt you care
about.

> One thing bothers me.. why are we not seeing this failure with ARM for
> Bruce v6 series?
> Just looking at patchwork, I would think that I can merge Bruce series as is.
> https://patchwork.dpdk.org/project/dpdk/patch/20230816153439.551501-12-bruce.richardson@intel.com/

So, this is a niche edge case. For the v6 series we fail to apply the
fast-test filtering script in our jenkinsfile script, so we exit without
doing any unit testing and don't save or report any results. Almost
always, if we fail while doing "unh jenkins script" stuff, it's an infra
failure, not a problem with a patch, and we don't want to report a false
failure result there. It does further exemplify the danger in our
current process, of course. I'll be glad to not have to do this anymore.
I did try to make this point above, but I don't think I explained it too
well.

The only other thing I'll add is that we are going to change our
reporting process soon: we will begin each pipeline run on a
test/environment combo by reporting a "pending" result for that
test/environment, then overwrite it with a PASS or FAIL at the end. This
helps protect us from situations like this. For instance, the way this
would have played out is that you would have had a label
(iol-unit-arm64-testing) with the initial "PENDING" result reported to
it, but it never would have been updated from pending. So, you would
know the CI results were incomplete.
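To make the "pending, then overwrite" idea concrete, here is a rough sketch of the flow described above. It is not our actual Jenkins code: the `run_pipeline` function, the `InfraError` exception, and the plain dict standing in for the results dashboard are all made up for illustration. Only the label name and the PENDING/PASS/FAIL states come from the description above.

```python
from enum import Enum


class Status(Enum):
    PENDING = "PENDING"
    PASS = "PASS"
    FAIL = "FAIL"


class InfraError(Exception):
    """Hypothetical: CI plumbing broke before any test could run."""


def run_pipeline(results, label, prepare, run_tests):
    """Report PENDING up front, then overwrite it with PASS or FAIL.

    `results` is a plain dict standing in for the results dashboard.
    """
    results[label] = Status.PENDING
    try:
        prepare()  # e.g. applying the fast-test filtering script
    except InfraError:
        # Infra failure: don't blame the patch, but leave PENDING
        # visible so readers can tell the results are incomplete.
        return results[label]
    results[label] = Status.PASS if run_tests() else Status.FAIL
    return results[label]


def broken_prepare():
    # Simulates the filtering step failing, as in the case above.
    raise InfraError("could not apply fast-test filter")


results = {}
run_pipeline(results, "iol-unit-arm64-testing", broken_prepare, lambda: True)
print(results)
```

With the failing preparation step, the label is still reported and simply stays at PENDING, rather than being absent from the results altogether.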