DPDK CI discussions
From: Aaron Conole <aconole@redhat.com>
To: jspewock@iol.unh.edu
Cc: ci@dpdk.org,  alialnu@nvidia.com,  probb@iol.unh.edu,
	 Adam Hassick <ahassick@iol.unh.edu>
Subject: Re: [PATCH 1/1] tools: add get_reruns script
Date: Thu, 07 Sep 2023 08:56:00 -0400
Message-ID: <f7tfs3qm9b3.fsf@redhat.com>
In-Reply-To: <20230905222317.25821-4-jspewock@iol.unh.edu> (jspewock@iol.unh.edu's message of "Tue, 5 Sep 2023 18:13:03 -0400")

Hi Jeremy,

jspewock@iol.unh.edu writes:

> From: Jeremy Spewock <jspewock@iol.unh.edu>
>
> This script is used to interact with the DPDK Patchwork API to collect a
> list of retests from comments on patches based on a desired list of
> contexts to retest. The script uses regex to scan all of the comments
> since a timestamp that is passed into the script through the CLI for
> any comment that is requesting a retest. These requests are then filtered
> based on the desired contexts that you pass into the script through the
> CLI and then aggregated based on the patch series ID of the series that
> the comment came from. This aggregated list is then outputted to a JSON
> file with a timestamp of the most recent comment on patchworks.
>
> Signed-off-by: Jeremy Spewock <jspewock@iol.unh.edu>
> Signed-off-by: Adam Hassick <ahassick@iol.unh.edu>
> ---

Thanks for the tool.

>  tools/get_reruns.py | 219 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 219 insertions(+)
>  create mode 100755 tools/get_reruns.py
>
> diff --git a/tools/get_reruns.py b/tools/get_reruns.py
> new file mode 100755
> index 0000000..159ff6e
> --- /dev/null
> +++ b/tools/get_reruns.py
> @@ -0,0 +1,219 @@
> +#!/usr/bin/env python3
> +# -*- coding: utf-8 -*-
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2023 University of New Hampshire
> +
> +import argparse
> +import datetime
> +import json
> +import re
> +from json import JSONEncoder
> +from typing import Dict, List, Set, Optional
> +
> +import requests

I think this block should be cleaned up a bit.

The imports should be in alphabetical order.  The block shouldn't have
extra spaces.

> +
> +
> +class JSONSetEncoder(JSONEncoder):
> +    """Custom JSON encoder to handle sets.
> +
> +    Pythons json module cannot serialize sets so this custom encoder converts
> +    them into lists.
> +
> +    Args:
> +        JSONEncoder: JSON encoder from the json python module.
> +    """
> +
> +    def default(self, input_object):
> +        if isinstance(input_object, set):
> +            return list(input_object)
> +        return input_object
> +
> +
> +class RerunProcessor:
> +    """Class for finding reruns inside an email using the patchworks events
> +    API.
> +
> +    The idea of this class is to use regex to find certain patterns that
> +    represent desired contexts to rerun.
> +
> +    Arguments:
> +        desired_contexts: List of all contexts to search for in the bodies of
> +            the comments
> +        time_since: Get all comments since this timestamp
> +
> +    Attributes:
> +        collection_of_retests: A dictionary that maps patch series IDs to the
> +            set of contexts to be retested for that patch series.
> +        regex: regex used for collecting the contexts from the comment body.
> +        last_comment_timestamp: timestamp of the most recent comment that was
> +            processed
> +    """
> +
> +    _desired_contexts: List[str]
> +    _time_since: str
> +    collection_of_retests: Dict[str, Dict[str, Set]] = {}
> +    last_comment_timestamp: Optional[str] = None
> +    # ^ is start of line
> +    # ((?:[a-zA-Z-]+(?:, ?\n?)?)+) is a capture group that gets all test
> +    #   labels after "Recheck-request: "
> +    #   (?:[a-zA-Z-]+(?:, ?\n?)?)+ means 1 or more of the first match group
> +    #       [a-zA-Z0-9-_]+ means 1 more more of any character in the ranges a-z,
> +    #           A-Z, 0-9, or the characters '-' or '_'
> +    #       (?:, ?\n?)? means 1 or none of this match group which expects
> +    #           exactly 1 comma followed by 1 or no spaces followed by
> +    #           1 or no newlines.

This comment might not be needed.  After all, we can see the regex group,
and you are just documenting the Python regex tool.  Instead, maybe we
should just reiterate the understanding around Recheck-request.  For
example, the tag we look for must appear at the start of a line, and it
is a case-sensitive tag.
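
As a rough sketch of that behaviour: the regex below is copied from the
patch as-is, while find_labels() and the sample comment bodies are made
up here purely for illustration.

```python
import re

# Regex copied from the patch; everything else in this sketch is hypothetical.
REGEX = "^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"


def find_labels(body: str) -> list:
    """Return the labels from the first Recheck-request tag, if any."""
    # re.MULTILINE lets ^ match at the start of every line in the body.
    matches = re.findall(REGEX, body, re.MULTILINE)
    if not matches:
        return []
    return [label.strip() for label in matches[0].split(",") if label.strip()]


# Tag at the start of a line, exact case: picked up.
print(find_labels("Looks good.\nRecheck-request: iol-unit-testing\nThanks"))
# Wrong case, or quoted in a reply (so not at line start): ignored.
print(find_labels("recheck-request: iol-unit-testing"))
print(find_labels("> Recheck-request: iol-unit-testing"))
```

A short comment stating those two rules would probably tell the reader
more than the token-by-token walk through the expression.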

> +    # VALID MATCHES:
> +    #   Recheck-request: iol-unit-testing, iol-something-else, iol-one-more,
> +    #   Recheck-request: iol-unit-testing,iol-something-else, iol-one-more
> +    #   Recheck-request: iol-unit-testing, iol-example, iol-another-example,
> +    #   more-intel-testing
> +    # INVALID MATCHES:
> +    #   Recheck-request: iol-unit-testing,  intel-example-testing
> +    #   Recheck-request: iol-unit-testing iol-something-else,iol-one-more,
> +    #   Recheck-request: iol-unit-testing,iol-something-else,iol-one-more,
> +    #
> +    #   more-intel-testing
> +    regex: str = "^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"
> +
> +    def __init__(self, desired_contexts: List[str], time_since: str) -> None:
> +        self._desired_contexts = desired_contexts
> +        self._time_since = time_since
> +
> +    def process_reruns(self) -> None:
> +        patchwork_url = f"http://patches.dpdk.org/api/events/?since={self._time_since}"

On the off-chance this API URL ever changes, we should make this
configurable.
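
Something along these lines might do, as a minimal sketch.  The
"--api-url" option, DEFAULT_API_URL, build_events_url() and
make_parser() are all hypothetical names; only the default URL itself
comes from the patch.

```python
import argparse

# Default taken from the URL that is hard-coded in the patch.
DEFAULT_API_URL = "http://patches.dpdk.org/api"


def build_events_url(api_url: str, time_since: str, category: str) -> str:
    """Build the events query URL from a configurable API base."""
    # Strip a trailing slash so both ".../api" and ".../api/" work.
    return f"{api_url.rstrip('/')}/events/?since={time_since}&category={category}"


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of a configurable API URL")
    # "--api-url" is a hypothetical option name, not something the patch defines.
    parser.add_argument(
        "--api-url",
        dest="api_url",
        default=DEFAULT_API_URL,
        help="Base URL of the Patchwork API",
    )
    return parser


args = make_parser().parse_args(["--api-url", "https://example.org/api/"])
print(build_events_url(args.api_url, "2023-09-01", "patch-comment-created"))
```

Keeping the current URL as the default means existing invocations would
not need to change.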

> +        comment_request_info = []
> +        for item in [
> +            "&category=cover-comment-created",
> +            "&category=patch-comment-created",
> +        ]:
> +            response = requests.get(patchwork_url + item)
> +            response.raise_for_status()
> +            comment_request_info.extend(response.json())
> +        rerun_processor.process_comment_info(comment_request_info)
> +
> +    def process_comment_info(self, list_of_comment_blobs: List[Dict]) -> None:
> +        """Takes the list of json blobs of comment information and associates
> +        them with their patches.
> +
> +        Collects retest labels from a list of comments on patches represented
> +        inlist_of_comment_blobs and creates a dictionary that associates them
> +        with their corresponding patch series ID. The labels that need to be
> +        retested are collected by passing the comments body into
> +        get_test_names() method. This method also updates the current UTC
> +        timestamp for the processor to the current time.
> +
> +        Args:
> +            list_of_comment_blobs: a list of JSON blobs that represent comment
> +            information
> +        """
> +
> +        list_of_comment_blobs = sorted(
> +            list_of_comment_blobs,
> +            key=lambda x: datetime.datetime.fromisoformat(x["date"]),
> +            reverse=True,
> +        )
> +
> +        if list_of_comment_blobs:
> +            most_recent_timestamp = datetime.datetime.fromisoformat(
> +                list_of_comment_blobs[0]["date"]
> +            )
> +            # exclude the most recent
> +            most_recent_timestamp = most_recent_timestamp + datetime.timedelta(
> +                microseconds=1
> +            )
> +            self.last_comment_timestamp = most_recent_timestamp.isoformat()
> +
> +        for comment in list_of_comment_blobs:
> +            # before we do any parsing we want to make sure that we are dealing
> +            # with a comment that is associated with a patch series
> +            payload_key = "cover"
> +            if comment["category"] == "patch-comment-created":
> +                payload_key = "patch"
> +            patch_series_arr = requests.get(
> +                comment["payload"][payload_key]["url"]
> +            ).json()["series"]
> +            if not patch_series_arr:
> +                continue
> +            patch_id = patch_series_arr[0]["id"]
> +
> +            comment_info = requests.get(comment["payload"]["comment"]["url"])
> +            comment_info.raise_for_status()
> +            content = comment_info.json()["content"]
> +
> +            labels_to_rerun = self.get_test_names(content)
> +
> +            # appending to the list if it already exists, or creating it if it
> +            # doesn't
> +            if labels_to_rerun:
> +                self.collection_of_retests[patch_id] = self.collection_of_retests.get(
> +                    patch_id, {"contexts": set()}
> +                )
> +                self.collection_of_retests[patch_id]["contexts"].update(labels_to_rerun)
> +
> +    def get_test_names(self, email_body: str) -> Set[str]:
> +        """Uses the regex in the class to get the information from the email.
> +
> +        When it gets the test names from the email, it will all be in one
> +        capture group. We expect a comma separated list of patchwork labels
> +        to be retested.
> +
> +        Returns:
> +            A set of contexts found in the email that match your list of
> +            desired contexts to capture. We use a set here to avoid duplicate
> +            contexts.
> +        """
> +        rerun_section = re.findall(self.regex, email_body, re.MULTILINE)
> +        if not rerun_section:
> +            return set()
> +        rerun_list = list(map(str.strip, rerun_section[0].split(",")))
> +        return set(filter(lambda x: x and x in self._desired_contexts, rerun_list))
> +
> +    def write_to_output_file(self, file_name: str) -> None:
> +        """Write class information to a JSON file.
> +
> +        Takes the collection_of_retests and last_comment_timestamp and outputs
> +        them into a json file.
> +
> +        Args:
> +            file_name: Name of the file to write the output to.
> +        """

It might also be friendly to write to stdout when the filename is "-",
so that the tool can be used in a script pipeline.
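
One possible shape for that, as a sketch only: dump_output() and
write_output() are hypothetical helpers, and the real method would
presumably keep passing cls=JSONSetEncoder so the sets still serialize.

```python
import json
import sys
from typing import IO


def dump_output(output_dict: dict, stream: IO[str]) -> None:
    """Serialize the results as indented JSON onto an open text stream."""
    json.dump(output_dict, stream, indent=4)
    stream.write("\n")


def write_output(output_dict: dict, file_name: str) -> None:
    """Write to stdout when file_name is "-", otherwise to the named file."""
    if file_name == "-":
        dump_output(output_dict, sys.stdout)
    else:
        with open(file_name, "w") as file:
            dump_output(output_dict, file)


# With "-" the JSON goes to stdout and can be piped into another tool.
write_output({"retests": {}, "last_comment_timestamp": None}, "-")
```

Treating "-" as stdout follows the usual command-line convention, so
"-o -" would slot into a pipeline without a temporary file.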

> +        output_dict = {
> +            "retests": self.collection_of_retests,
> +            "last_comment_timestamp": self.last_comment_timestamp,
> +        }
> +        with open(file_name, "w") as file:
> +            file.write(json.dumps(output_dict, indent=4, cls=JSONSetEncoder))
> +
> +
> +if __name__ == "__main__":
> +    parser = argparse.ArgumentParser(description="Help text for getting reruns")
> +    parser.add_argument(
> +        "-ts",
> +        "--time-since",
> +        dest="time_since",
> +        required=True,
> +        help="Get all patches since this many days ago (default: 5)",
> +    )
> +    parser.add_argument(
> +        "--contexts",
> +        dest="contexts_to_capture",
> +        nargs="*",
> +        required=True,
> +        help="List of patchwork contexts you would like to capture",
> +    )
> +    parser.add_argument(
> +        "-o",
> +        "--out-file",
> +        dest="out_file",
> +        help=(
> +            "Output file where the list of reruns and the timestamp of the"
> +            "last comment in the list of comments"
> +            "(default: rerun_requests.json)."
> +        ),
> +        default="rerun_requests.json",
> +    )
> +    args = parser.parse_args()
> +    rerun_processor = RerunProcessor(args.contexts_to_capture, args.time_since)
> +    rerun_processor.process_reruns()
> +    rerun_processor.write_to_output_file(args.out_file)


Thread overview:
2023-09-05 22:13 [PATCH 0/1] tools: Add script for getting rerun requests jspewock
2023-09-05 22:13 ` [PATCH 1/1] tools: add get_reruns script jspewock
2023-09-07 12:56   ` Aaron Conole [this message]
