From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ci-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 0391242D09
	for <public@inbox.dpdk.org>; Tue, 20 Jun 2023 16:02:45 +0200 (CEST)
Received: from mails.dpdk.org (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id D0E754068E;
	Tue, 20 Jun 2023 16:02:45 +0200 (CEST)
Received: from us-smtp-delivery-124.mimecast.com
 (us-smtp-delivery-124.mimecast.com [170.10.129.124])
 by mails.dpdk.org (Postfix) with ESMTP id 883DC400D6
 for <ci@dpdk.org>; Tue, 20 Jun 2023 16:02:44 +0200 (CEST)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com;
 s=mimecast20190719; t=1687269764;
 h=from:from:reply-to:subject:subject:date:date:message-id:message-id:
 to:to:cc:cc:mime-version:mime-version:content-type:content-type:
 content-transfer-encoding:content-transfer-encoding:
 in-reply-to:in-reply-to:references:references;
 bh=zBtni08mvfYX6oesLE7MjKVeYhE1y4dJ3je/M5ZyPzc=;
 b=OIRItiVzn2vEhhFlMcAxeR9FCgetrcHmLBoA2mL1P05H7Cw3avpFrCt9TwGw529EGED55c
 7cEW9cx0Vi7Mxj5UuO7iiqGaVl+HSh75yie6e5NB/Er3/vMrddVlNkth9DUcQzEvGWHVCy
 CBgkQdR6KIdis13IWUO+hukvgQlcrJs=
Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com
 [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS
 (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id
 us-mta-157-YoJOcwfkNUq9G2JJ7G3tLg-1; Tue, 20 Jun 2023 10:02:37 -0400
X-MC-Unique: YoJOcwfkNUq9G2JJ7G3tLg-1
Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com
 [10.11.54.7])
 (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits))
 (No client certificate requested)
 by mimecast-mx02.redhat.com (Postfix) with ESMTPS id C9E78887121;
 Tue, 20 Jun 2023 14:01:37 +0000 (UTC)
Received: from RHTPC1VM0NT (unknown [10.22.34.71])
 by smtp.corp.redhat.com (Postfix) with ESMTPS id BB1C41402C06;
 Tue, 20 Jun 2023 14:01:36 +0000 (UTC)
From: Aaron Conole <aconole@redhat.com>
To: Patrick Robb <probb@iol.unh.edu>
Cc: Ferruh Yigit <ferruh.yigit@amd.com>,  ci@dpdk.org,  "Tu, Lijuan"
 <lijuan.tu@intel.com>,  zhoumin <zhoumin@loongson.cn>,  Michael Santana
 <maicolgabriel@hotmail.com>,  Lincoln Lavoie <lylavoie@iol.unh.edu>
Subject: Re: Email Based Re-Testing Framework
In-Reply-To: <CAJvnSUAT9P7EtcWQxUPjFfhqrvJO86_+6g2Y=na8_iFYEjav0Q@mail.gmail.com>
 (Patrick Robb's message of "Tue, 13 Jun 2023 09:28:00 -0400")
References: <CAJvnSUBc79Y+yA1gQRsifjH7BPYQKGxNaXr43x-HmFnPrQcOag@mail.gmail.com>
 <3fa6546b-8152-e317-30f0-30d5118b9fc4@amd.com>
 <CAJvnSUAbhwRqSw5jHRuP4Dwpa5eC5wCKMP+kFfCJE5NqePGVCw@mail.gmail.com>
 <f7tbkhr4enp.fsf@redhat.com>
 <CAJvnSUBgM5Q0Rxm2rNa3MkP+5euAL4J5+H2qCnkXgt4o7z0usg@mail.gmail.com>
 <f7tjzw83esi.fsf@redhat.com>
 <CAJvnSUAT9P7EtcWQxUPjFfhqrvJO86_+6g2Y=na8_iFYEjav0Q@mail.gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Date: Tue, 20 Jun 2023 10:01:30 -0400
Message-ID: <f7t4jn2w7ut.fsf@redhat.com>
MIME-Version: 1.0
X-Scanned-By: MIMEDefang 3.1 on 10.11.54.7
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
X-BeenThere: ci@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK CI discussions <ci.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/ci>,
 <mailto:ci-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/ci/>
List-Post: <mailto:ci@dpdk.org>
List-Help: <mailto:ci-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/ci>,
 <mailto:ci-request@dpdk.org?subject=subscribe>
Errors-To: ci-bounces@dpdk.org

Patrick Robb <probb@iol.unh.edu> writes:

>  1. Ephemeral lab/network failures, or flaky unit tests that sometimes
>     fail.  In this case, we probably just want to re-run the tree as-is.
>
>  2. Failing tree before apply.  In this case, we have applied the series
>     to a tree, but the tree isn't in a good state and will fail
>     regardless of the series being applied.
>
> I had originally thought of this only as rerunning the tree as-is,
> though if the community wants the 2nd option to be available that works
> too. It does increase the scope of the task and complexity of the
> request text format to be understood for submitters. Personally, where I
> fall is that there isn't enough additional value to justify doing more
> than rerunning as-is. Plus, if we do end up with a bad tree, submitters
> can resubmit their patches when the tree is in a good state again,
> right? Or alternatively, lab managers may be aware of this situation and
> be able to order a rerun for the last x days (duration of bad tree) on
> the most up to date tree.

Re: submitters resubmitting, that is currently one way to get an
automatic "retest", right?  The thought behind (2) is to avoid cluttering
patchwork with even more series, but I can see that it might not make
sense just yet.  It's probably fine to start with the retest "as-is" and
add the re-apply feature later.

> On Mon, Jun 12, 2023 at 11:01 AM Aaron Conole <aconole@redhat.com> wrote:
>
>  Patrick Robb <probb@iol.unh.edu> writes:
>
>  > On Wed, Jun 7, 2023 at 8:53 AM Aaron Conole <aconole@redhat.com> wrote:
>  >
>  >  Patrick Robb <probb@iol.unh.edu> writes:
>  >
>  >  >  Also it can be useful to run daily sub-tree testing by request, if
>  >  >  possible.
>  >  >
>  >  > That wouldn't be too difficult. I'll make a ticket for this. Although,
>  >  > since testing on the daily sub-trees is UNH-IOL specific, that
>  >  > wouldn't necessarily have to be done via an email based testing
>  >  > request framework. We could also just add a button to our dashboard
>  >  > which triggers a sub-tree CI run. That would help keep the scope of
>  >  > the email based retesting framework narrow. But either email or a
>  >  > dashboard button would work.
>  >
>  >  We had discussed this long ago - including agreeing on a format, IIRC.
>  >
>  >  See the thread starting here:
>  >    https://mails.dpdk.org/archives/ci/2021-May/001189.html
>  >
>  >  The idea was to have a line like:
>  >
>  >  Recheck-request: <test names>
>  >
>  > I like using this simpler format, which is easier to parse. As Thomas
>  > pointed out, specifying labs does not really provide extra information
>  > if we are already going to request by label/context, which already
>  > specifies the lab.
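
A minimal sketch of how the "Recheck-request: <test names>" line could be
split into individual contexts, assuming the names are comma separated as
suggested further down in this thread. The function name
parse_recheck_contexts and the context names in the usage note are
hypothetical and are not part of pw-ci.

```shell
# Hypothetical helper (not part of pw-ci): print one requested context per
# line from a "Recheck-request: a, b" comment line.
# Returns non-zero and prints nothing for any other line.
parse_recheck_contexts() {
    local line="$1"
    local prefix="Recheck-request: "

    # Only lines that start with the agreed prefix count as requests.
    case "$line" in
        "$prefix"*) ;;
        *) return 1 ;;
    esac

    # Drop the prefix, split on commas, trim blanks, skip empty entries.
    printf '%s\n' "${line#"$prefix"}" | tr ',' '\n' |
        sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

For example, parse_recheck_contexts "Recheck-request: ctx-one, ctx-two"
would print ctx-one and ctx-two on separate lines, which a caller could
then match against the check labels it knows about.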
>
>  One thing we haven't discussed or determined is if we should have the
>  ability to re-apply the series or simply to rerun the patches based on
>  the original sha sum.  There are two cases I can think of:
>
>  1. Ephemeral lab/network failures, or flaky unit tests that sometimes
>     fail.  In this case, we probably just want to re-run the tree as-is.
>
>  2. Failing tree before apply.  In this case, we have applied the series
>     to a tree, but the tree isn't in a good state and will fail
>     regardless of the series being applied.
>
>  WDYT?  Does (2) case warrant any consideration as a possible reason to
>  retest?  If so, what is the right way of handling that situation?
>
>  >  where <test names> was the tests in the check labels.  In fact, what
>  >  started the discussion was a patch for the pw-ci scripts that
>  >  implemented part of it.
>  >
>  >  I don't see how to make your proposal as easily parsed.
>  >
>  >  WDYT?  Can you re-read that thread and come up with comments?
>  >
>  >  Will do. And thanks, this thread is very informative.
>  >
>  >  It is important to use the 'msgid' field to distinguish recheck
>  >  requests.  Otherwise, we will continuously reparse the same
>  >  recheck request and loop forever.  Additionally, we've discussed
>  >  using a counter to limit the recheck requests to a single 'recheck'
>  >  per test name.
>  >
>  > We can track message ids to avoid considering a single retest request
>  > twice. Perhaps we can accomplish the same thing by tracking retested
>  > patchseries ids and their total number of requested retests (which
>  > could be 1 retest per patchseries).
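
A rough sketch of combining both ideas: remember every msgid already
handled, and also cap the number of retests per series. The state file
paths, the MAX_RETESTS knob, and the function name should_retest are all
assumptions made for illustration. A real implementation would likely
keep this state wherever the lab already tracks series.

```shell
# Hypothetical state files; a real deployment would pick its own storage.
SEEN_MSGIDS="${SEEN_MSGIDS:-./recheck_seen_msgids.txt}"
SERIES_RETESTS="${SERIES_RETESTS:-./recheck_series.txt}"
MAX_RETESTS="${MAX_RETESTS:-1}"

# Succeeds (exit 0) only when this request should trigger a retest:
# the message id is new and the series is still under its retest limit.
should_retest() {
    local msgid="$1" series_id="$2" count

    touch "$SEEN_MSGIDS" "$SERIES_RETESTS"

    # Already handled this exact comment: never act on it twice.
    if grep -qxF -e "$msgid" "$SEEN_MSGIDS"; then
        return 1
    fi
    printf '%s\n' "$msgid" >> "$SEEN_MSGIDS"

    # Count earlier retests recorded for this series. grep -c exits
    # non-zero on a zero count, hence the "|| true".
    count=$(grep -cxF -e "$series_id" "$SERIES_RETESTS" || true)
    if [ "$count" -ge "$MAX_RETESTS" ]; then
        return 1
    fi
    printf '%s\n' "$series_id" >> "$SERIES_RETESTS"
    return 0
}
```

The msgid is recorded even when the series limit rejects the request, so
a rejected comment is not reparsed on the next polling pass.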
>  >
>  > +function check_series_needs_retest() {
>  > +    local pw_instance="$1"
>  > +
>  > +    series_get_active_branches "$pw_instance" | while IFS=\| read -r series_id project url repo branchname; do
>  > +        local patch_comments_url=$(curl -s "$userpw" "$url" | jq -rc '.comments')
>  > +        if [ "Xnull" != "X$patch_comments_url" ]; then
>  > +            local comments_json=$(curl -s "$userpw" "$patch_comments_url")
>  > +            local seq_end=$(echo "$comments_json" | jq -rc 'length')
>  > +            if [ "$seq_end" -a $seq_end -gt 0 ]; then
>  > +                seq_end=$((seq_end-1))
>  > +                for comment_id in $(seq 0 $seq_end); do
>  > +                    local recheck_requested=$(echo "$comments_json" | jq -rc ".[$comment_id].content" | grep "^Recheck-request: ")
>  > +                    if [ "X$recheck_requested" != "X" ]; then
>  > +                        local msgid=$(echo "$comments_json" | jq -rc ".[$comment_id].msgid")
>  > +                        run_recheck "$pw_instance" "$series_id" "$project" "$url" "$repo" "$branchname" "$recheck_requested" "$msgid"
>  > +                    fi
>  > +                done
>  > +            fi
>  > +        fi
>  > +    done
>  > +}
>  > This is already a superior approach to what I had in mind for
>  > acquiring comments. Unless you're opposed, I think at the community
>  > lab we can experiment based on this starting point to verify the
>  > process is sound, but I don't see any problems here.
>  >
>  >  I think that if we're able to specify multiple contexts, then there's
>  >  not really any reason to run multiple rechecks per patchset.
>  >
>  > Agreed.
>  >
>  >  There was also an ask on filtering requesters (only maintainers and
>  >  patch authors can ask for a recheck).
>  >
>  > If we can use the maintainers file as a single source of truth, that
>  > is convenient and stable as the list of maintainers changes. But I
>  > also think retesting request permission should be extended to the
>  > submitter. They may want to initiate a re-run without engaging a
>  > maintainer. It's not likely to cause a big increase in test load for
>  > us or other labs, so there's no harm there.
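
A sketch of the requester filter described above, assuming the
maintainers file lists people as "M: Name <email>" lines and that the
caller already knows the submitter's address for the series. The
MAINTAINERS_FILE path and the function name may_request_recheck are
hypothetical.

```shell
# Hypothetical path; point this at the maintainers file in the checkout.
MAINTAINERS_FILE="${MAINTAINERS_FILE:-./MAINTAINERS}"

# Succeeds when the requester is the series submitter or appears as an
# "M: Name <email>" entry in the maintainers file (assumed format).
may_request_recheck() {
    local requester="$1" submitter="$2"

    # Submitters may always ask for a retest of their own series.
    if [ "$requester" = "$submitter" ]; then
        return 0
    fi

    # Otherwise the address must appear on a maintainer line.
    grep '^M:' "$MAINTAINERS_FILE" 2>/dev/null |
        grep -qiF -e "<$requester>"
}
```

Reading the file on every check keeps it the single source of truth, so
no separate allow-list has to be maintained as maintainers change.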
>  >
>  >  No, an explicit list is actually better.
>  >
>  >  When a new check is added, for someone looking at the mails (maybe 2/3
>  >  weeks later), and reading just "all", he would have to know what
>  >  checks were available at the time.
>  >
>  > Context/Labels rarely change, so I don't think this concern is too
>  > serious. But, if people don't mind comma separating an entire list of
>  > contexts, that's fine.
>  >
>  > Thanks,
>  > Patrick