1. Ephemeral lab/network failures, or flaky unit tests that sometimes
   fail.  In this case, we probably just want to re-run the tree as-is.

2. Failing tree before apply.  In this case, we have applied the series
   to a tree, but the tree isn't in a good state and will fail
   regardless of the series being applied.

I had originally thought of this only as rerunning the tree as-is, though if the community wants the 2nd option to be available, that works too. It does increase the scope of the task, and the complexity of the request text format that submitters have to understand. Personally, I don't think there is enough additional value to justify doing more than rerunning as-is. If we do end up with a bad tree, submitters can resubmit their patches once the tree is in a good state again. Alternatively, lab managers who are aware of the situation could order a rerun, on the most up to date tree, for everything submitted in the last x days (the duration of the bad tree).
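To make the rerun-as-is option concrete, here is a rough sketch of a lab recording the sha a series was originally applied at and rerunning against exactly that tree. The state file name and the rerun_tests hook are made up for illustration; nothing here is an agreed interface.

```shell
# Hypothetical sketch: re-run a series exactly as originally tested by
# checking out the sha recorded at first-apply time, rather than
# re-applying the series to the current tree.
state_file="retest-state.txt"   # one "series_id|original_sha" per line

rerun_series_as_is() {
    series_id="$1"
    sha=$(grep "^${series_id}|" "$state_file" | cut -d'|' -f2)
    if [ -z "$sha" ]; then
        echo "no recorded sha for series $series_id" >&2
        return 1
    fi
    echo "would rerun tests at $sha"
    # git checkout "$sha" && rerun_tests   # real lab hook would go here
}
```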

On Mon, Jun 12, 2023 at 11:01 AM Aaron Conole <aconole@redhat.com> wrote:
Patrick Robb <probb@iol.unh.edu> writes:

> On Wed, Jun 7, 2023 at 8:53 AM Aaron Conole <aconole@redhat.com> wrote:
>
>  Patrick Robb <probb@iol.unh.edu> writes:
>
>  >  Also it can be useful to run daily sub-tree testing by request, if possible.
>  >
> That wouldn't be too difficult. I'll make a ticket for this. Although, for testing on the
> daily sub-trees, since that's UNH-IOL specific, that wouldn't necessarily have to be done
> via an email based testing request framework. We could also just add a button to our
> dashboard which triggers a sub-tree ci run. That would help narrow the scope of what the
> email based retesting framework is for. But, either email or a dashboard button would work.
>
>  We had discussed this long ago - including agreeing on a format, IIRC.
>
>  See the thread starting here:
>    https://mails.dpdk.org/archives/ci/2021-May/001189.html
>
>  The idea was to have a line like:
>
>  Recheck-request: <test names>
>
> I like using this simpler format, which is easier to parse. As Thomas pointed out, specifying
> labs does not really provide extra information if we are already going to request by
> label/context, which already specifies the lab.
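For what it's worth, a minimal sketch of pulling the test names out of a "Recheck-request:" line could look like the following. This is not the agreed parser, and it assumes comma-separated contexts.

```shell
# Print one requested context per line, or nothing if the comment line
# is not a recheck request.  Whitespace around names is trimmed.
parse_recheck_line() {
    printf '%s\n' "$1" | sed -n 's/^Recheck-request:[[:space:]]*//p' \
        | tr ',' '\n' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
}
```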

One thing we haven't discussed or determined is if we should have the
ability to re-apply the series or simply to rerun the patches based on
the original sha sum.  There are two cases I can think of:

1. Ephemeral lab/network failures, or flaky unit tests that sometimes
   fail.  In this case, we probably just want to re-run the tree as-is.

2. Failing tree before apply.  In this case, we have applied the series
   to a tree, but the tree isn't in a good state and will fail
   regardless of the series being applied.

WDYT?  Does case (2) warrant any consideration as a possible reason to
retest?  If so, what is the right way of handling that situation?
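Purely hypothetically, if case (2) were supported, the request line could carry a mode marker, e.g. "rebase". Nothing below is an agreed format; it only illustrates how the two cases could be dispatched.

```shell
# Hypothetical dispatch on the request line's mode marker.
recheck_mode() {
    case "$1" in
        Recheck-request:\ rebase,*) echo "reapply" ;;  # case 2: re-apply to current tree
        Recheck-request:*)          echo "as-is"   ;;  # case 1: rerun the original sha
        *)                          echo "none"    ;;
    esac
}
```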

>  where <test names> was the tests in the check labels.  In fact, what
>  started the discussion was a patch for the pw-ci scripts that
>  implemented part of it.
>
>  I don't see how to make your proposal as easily parsed.
>
>  WDYT?  Can you re-read that thread and come up with comments?
>
>  Will do. And thanks, this thread is very informative.
>
>  It is important to use the 'msgid' field to distinguish recheck
>  requests.  Otherwise, we will continuously reparse the same
>  recheck request and loop forever.  Additionally, we've discussed using a
>  counter to limit the recheck requests to a single 'recheck' per test
>  name.
>
> We can track message ids to avoid considering a single retest request twice. Perhaps we can accomplish the
> same thing by tracking retested patchseries ids and their total number of requested retests (which could be 1
> retest per patchseries).
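A minimal sketch of that bookkeeping, where the file layout and the one-recheck-per-series limit are assumptions for illustration:

```shell
# Track which comment msgids we have acted on, and how many rechecks
# each series has consumed, so a request is never processed twice and
# a series cannot be rechecked more than max_rechecks times.
seen_msgids="seen-msgids.txt"
recheck_counts="recheck-counts.txt"   # one "series_id count" per line
max_rechecks=1

should_run_recheck() {
    series_id="$1"; msgid="$2"
    touch "$seen_msgids" "$recheck_counts"
    # Never act on the same comment twice.
    grep -qxF "$msgid" "$seen_msgids" && return 1
    echo "$msgid" >> "$seen_msgids"
    count=$(awk -v s="$series_id" '$1 == s { print $2 }' "$recheck_counts")
    count=${count:-0}
    [ "$count" -ge "$max_rechecks" ] && return 1
    # Record the new count (rewrite-in-place kept simple for the sketch).
    { awk -v s="$series_id" '$1 != s' "$recheck_counts"
      echo "$series_id $((count + 1))"; } > "$recheck_counts.tmp"
    mv "$recheck_counts.tmp" "$recheck_counts"
    return 0
}
```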
>
> +function check_series_needs_retest() {
> +    local pw_instance="$1"
> +
> +    series_get_active_branches "$pw_instance" | while IFS=\| read -r series_id project url repo branchname; do
> +        local patch_comments_url=$(curl -s "$userpw" "$url" | jq -rc '.comments')
> +        if [ "Xnull" != "X$patch_comments_url" ]; then
> +            local comments_json=$(curl -s "$userpw" "$patch_comments_url")
> +            local seq_end=$(echo "$comments_json" | jq -rc 'length')
> +            if [ -n "$seq_end" ] && [ "$seq_end" -gt 0 ]; then
> +                seq_end=$((seq_end-1))
> +                for comment_id in $(seq 0 $seq_end); do
> +                    local recheck_requested=$(echo "$comments_json" | jq -rc ".[$comment_id].content" | grep "^Recheck-request: ")
> +                    if [ "X$recheck_requested" != "X" ]; then
> +                        local msgid=$(echo "$comments_json" | jq -rc ".[$comment_id].msgid")
> +                        run_recheck "$pw_instance" "$series_id" "$project" "$url" "$repo" "$branchname" "$recheck_requested" "$msgid"
> +                    fi
> +                done
> +            fi
> +        fi
> +    done
> +}
>
> This is already a superior approach to what I had in mind for acquiring comments. Unless
> you're opposed, I think at the community lab we can experiment from this starting point to
> verify the process is sound; I don't see any problems here.
>
>  I think that if we're able to specify multiple contexts, then there's not really any reason to run multiple
>  rechecks per patchset.
>
> Agreed.
>
>  There was also an ask on filtering requesters (only maintainers and patch authors can ask
>  for a recheck).
>
> If we can use the maintainers file as a single source of truth, that is convenient and stays
> accurate as the list of maintainers changes. But I also think retesting request permission
> should be extended to the submitter. They may want to initiate a re-run without engaging a
> maintainer, and it's not likely to cause a big increase in test load for us or other labs,
> so there's no harm there.
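A rough sketch of that filter, assuming DPDK-style "M: Name <email>" lines in a MAINTAINERS file; the file path and the submitter argument are assumptions, not an agreed interface.

```shell
# Allow a recheck request if the requester is the series submitter or
# appears as a maintainer ("M:" line) in the given MAINTAINERS file.
may_request_recheck() {
    requester="$1"; submitter="$2"; maintainers_file="$3"
    [ "$requester" = "$submitter" ] && return 0
    grep -qi "^M:.*<$requester>" "$maintainers_file"
}
```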
>
>  No, an explicit list is actually better.
>
>  When a new check is added, someone looking at the mails (maybe 2/3 weeks later) and reading
>  just "all" would have to know what checks were available at the time.
>
> Contexts/labels rarely change, so I don't think this concern is too serious. But if people
> don't mind comma separating an entire list of contexts, that's fine.
>
> Thanks,
> Patrick



--
Patrick Robb
Technical Service Manager
UNH InterOperability Laboratory
21 Madbury Rd, Suite 100, Durham, NH 03824
www.iol.unh.edu