From: Aaron Conole <aconole@redhat.com>
To: jspewock@iol.unh.edu
Cc: ci@dpdk.org, alialnu@nvidia.com, probb@iol.unh.edu, ahassick@iol.unh.edu
Subject: Re: [PATCH v2 1/1] tools: add get_reruns script
Date: Mon, 25 Sep 2023 12:06:04 -0400
Message-ID: <f7tfs32xmoj.fsf@redhat.com>
In-Reply-To: <20230907205551.19066-3-jspewock@iol.unh.edu> (jspewock@iol.unh.edu's message of "Thu, 7 Sep 2023 16:45:55 -0400")
jspewock@iol.unh.edu writes:
> From: Jeremy Spewock <jspewock@iol.unh.edu>
>
> This script is used to interact with the DPDK Patchwork API to collect a
> list of retests from comments on patches based on a desired list of
> contexts to retest. The script uses regex to scan all of the comments
> since a timestamp that is passed into the script through the CLI for
> any comment that is requesting a retest. These requests are then filtered
> based on the desired contexts that you pass into the script through the
> CLI and then aggregated based on the patch series ID of the series that
> the comment came from. This aggregated list is then written either to
> a JSON file or stdout, along with the timestamp of the most recent comment
> on Patchwork.
>
> Signed-off-by: Jeremy Spewock <jspewock@iol.unh.edu>
> Signed-off-by: Adam Hassick <ahassick@iol.unh.edu>
> ---
Thanks Jeremy - I'll take a look this week. Just returning from PTO.
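
To make sure I read the commit message the same way you do, here is a
rough sketch of the aggregation and output shape it describes. This is
not the patch's code, only an illustration under assumptions: the series
IDs, the "iol-abi-testing" context, the timestamp value and the helper
names (SetEncoder, aggregate) are all made up for the example.

```python
import json


class SetEncoder(json.JSONEncoder):
    """Serialize sets as sorted lists; defer everything else to the base class."""

    def default(self, obj):
        if isinstance(obj, set):
            return sorted(obj)
        return super().default(obj)


def aggregate(requests_seen):
    """Group requested contexts by patch series ID.

    requests_seen is an iterable of (series_id, contexts) pairs, one pair per
    comment that asked for a retest (a hypothetical intermediate format).
    """
    retests = {}
    for series_id, contexts in requests_seen:
        entry = retests.setdefault(series_id, {"contexts": set()})
        entry["contexts"].update(contexts)
    return retests


# Hypothetical sample data: two comments on series 101, one on series 102.
sample = [
    (101, {"iol-unit-testing"}),
    (102, {"iol-abi-testing"}),
    (101, {"iol-abi-testing"}),
]
retests = aggregate(sample)
output = json.dumps(
    {
        "retests": retests,
        # Made-up value standing in for the most recent comment's timestamp.
        "last_comment_timestamp": "2023-09-07T20:45:55.000001",
    },
    indent=4,
    cls=SetEncoder,
)
print(output)
```

Note that json.dumps turns the integer series IDs into string keys, so
whatever consumes the file should expect strings there.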
> tools/get_reruns.py | 218 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 218 insertions(+)
> create mode 100755 tools/get_reruns.py
>
> diff --git a/tools/get_reruns.py b/tools/get_reruns.py
> new file mode 100755
> index 0000000..832da62
> --- /dev/null
> +++ b/tools/get_reruns.py
> @@ -0,0 +1,218 @@
> +#!/usr/bin/env python3
> +# -*- coding: utf-8 -*-
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2023 University of New Hampshire
> +
> +import argparse
> +import datetime
> +import json
> +import re
> +import requests
> +from typing import Dict, List, Optional, Set
> +
> +PATCHWORK_EVENTS_API_URL = "http://patches.dpdk.org/api/events/"
> +
> +
> +class JSONSetEncoder(json.JSONEncoder):
> +    """Custom JSON encoder to handle sets.
> +
> +    Python's json module cannot serialize sets so this custom encoder converts
> +    them into lists.
> +    """
> +
> +    def default(self, input_object):
> +        if isinstance(input_object, set):
> +            return list(input_object)
> +        return super().default(input_object)
> +
> +
> +class RerunProcessor:
> +    """Class for finding reruns inside an email using the Patchwork events
> +    API.
> +
> +    The idea of this class is to use regex to find certain patterns that
> +    represent desired contexts to rerun.
> +
> +    Arguments:
> +        desired_contexts: List of all contexts to search for in the bodies of
> +            the comments
> +        time_since: Get all comments since this timestamp
> +
> +    Attributes:
> +        collection_of_retests: A dictionary that maps patch series IDs to the
> +            set of contexts to be retested for that patch series.
> +        regex: regex used for collecting the contexts from the comment body.
> +        last_comment_timestamp: timestamp of the most recent comment that was
> +            processed
> +    """
> +
> +    _desired_contexts: List[str]
> +    _time_since: str
> +    collection_of_retests: Dict[str, Dict[str, Set]] = {}
> +    last_comment_timestamp: Optional[str] = None
> +    # The tag we search for in comments must appear at the start of the line
> +    # and is case sensitive. After this tag we expect a comma separated list
> +    # of valid DPDK patchwork contexts.
> +    #
> +    # VALID MATCHES:
> +    # Recheck-request: iol-unit-testing, iol-something-else, iol-one-more,
> +    # Recheck-request: iol-unit-testing,iol-something-else, iol-one-more
> +    # Recheck-request: iol-unit-testing, iol-example, iol-another-example,
> +    # more-intel-testing
> +    # INVALID MATCHES:
> +    # Recheck-request: iol-unit-testing, intel-example-testing
> +    # Recheck-request: iol-unit-testing iol-something-else,iol-one-more,
> +    # Recheck-request: iol-unit-testing,iol-something-else,iol-one-more,
> +    #
> +    # more-intel-testing
> +    regex: str = "^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"
> +
> +    def __init__(self, desired_contexts: List[str], time_since: str) -> None:
> +        self._desired_contexts = desired_contexts
> +        self._time_since = time_since
> +
> +    def process_reruns(self) -> None:
> +        patchwork_url = f"{PATCHWORK_EVENTS_API_URL}?since={self._time_since}"
> +        comment_request_info = []
> +        for item in [
> +            "&category=cover-comment-created",
> +            "&category=patch-comment-created",
> +        ]:
> +            response = requests.get(patchwork_url + item)
> +            response.raise_for_status()
> +            comment_request_info.extend(response.json())
> +        self.process_comment_info(comment_request_info)
> +
> +    def process_comment_info(self, list_of_comment_blobs: List[Dict]) -> None:
> +        """Takes the list of json blobs of comment information and associates
> +        them with their patches.
> +
> +        Collects retest labels from a list of comments on patches represented
> +        in list_of_comment_blobs and creates a dictionary that associates them
> +        with their corresponding patch series ID. The labels that need to be
> +        retested are collected by passing the comment's body into the
> +        get_test_names() method. This method also sets last_comment_timestamp
> +        to one microsecond after the date of the most recent comment.
> +
> +        Args:
> +            list_of_comment_blobs: a list of JSON blobs that represent comment
> +                information
> +        """
> +
> +        list_of_comment_blobs = sorted(
> +            list_of_comment_blobs,
> +            key=lambda x: datetime.datetime.fromisoformat(x["date"]),
> +            reverse=True,
> +        )
> +
> +        if list_of_comment_blobs:
> +            most_recent_timestamp = datetime.datetime.fromisoformat(
> +                list_of_comment_blobs[0]["date"]
> +            )
> +            # add one microsecond so the next query excludes the most recent
> +            most_recent_timestamp = most_recent_timestamp + datetime.timedelta(
> +                microseconds=1
> +            )
> +            self.last_comment_timestamp = most_recent_timestamp.isoformat()
> +
> +        for comment in list_of_comment_blobs:
> +            # before we do any parsing we want to make sure that we are dealing
> +            # with a comment that is associated with a patch series
> +            payload_key = "cover"
> +            if comment["category"] == "patch-comment-created":
> +                payload_key = "patch"
> +            patch_info = requests.get(comment["payload"][payload_key]["url"])
> +            patch_info.raise_for_status()
> +            patch_series_arr = patch_info.json()["series"]
> +            if not patch_series_arr:
> +                continue
> +            patch_id = patch_series_arr[0]["id"]
> +
> +            comment_info = requests.get(comment["payload"]["comment"]["url"])
> +            comment_info.raise_for_status()
> +            content = comment_info.json()["content"]
> +
> +            labels_to_rerun = self.get_test_names(content)
> +
> +            # appending to the list if it already exists, or creating it if it
> +            # doesn't
> +            if labels_to_rerun:
> +                self.collection_of_retests[patch_id] = self.collection_of_retests.get(
> +                    patch_id, {"contexts": set()}
> +                )
> +                self.collection_of_retests[patch_id]["contexts"].update(labels_to_rerun)
> +
> +    def get_test_names(self, email_body: str) -> Set[str]:
> +        """Uses the regex in the class to get the information from the email.
> +
> +        When it gets the test names from the email, it will all be in one
> +        capture group. We expect a comma separated list of patchwork labels
> +        to be retested.
> +
> +        Returns:
> +            A set of contexts found in the email that match your list of
> +            desired contexts to capture. We use a set here to avoid duplicate
> +            contexts.
> +        """
> +        rerun_section = re.findall(self.regex, email_body, re.MULTILINE)
> +        if not rerun_section:
> +            return set()
> +        rerun_list = list(map(str.strip, rerun_section[0].split(",")))
> +        return set(filter(lambda x: x and x in self._desired_contexts, rerun_list))
> +
> +    def write_output(self, file_name: str) -> None:
> +        """Output class information.
> +
> +        Takes the collection_of_retests and last_comment_timestamp and outputs
> +        them into either a json file or stdout.
> +
> +        Args:
> +            file_name: Name of the file to write the output to. If this is set
> +                to "-" then it will output to stdout.
> +        """
> +
> +        output_dict = {
> +            "retests": self.collection_of_retests,
> +            "last_comment_timestamp": self.last_comment_timestamp,
> +        }
> +        if file_name == "-":
> +            print(json.dumps(output_dict, indent=4, cls=JSONSetEncoder))
> +        else:
> +            with open(file_name, "w") as file:
> +                file.write(json.dumps(output_dict, indent=4, cls=JSONSetEncoder))
> +
> +
> +if __name__ == "__main__":
> +    parser = argparse.ArgumentParser(description="Help text for getting reruns")
> +    parser.add_argument(
> +        "-ts",
> +        "--time-since",
> +        dest="time_since",
> +        required=True,
> +        help='Get all comments since this timestamp (yyyy-mm-ddThh:mm:ss.SSSSSS).',
> +    )
> +    parser.add_argument(
> +        "--contexts",
> +        dest="contexts_to_capture",
> +        nargs="*",
> +        required=True,
> +        help='List of patchwork contexts you would like to capture.',
> +    )
> +    parser.add_argument(
> +        "-o",
> +        "--out-file",
> +        dest="out_file",
> +        help=(
> +            'Output file where the list of reruns and the timestamp of the '
> +            'last comment in the list of comments is sent. If this is set '
> +            'to "-" then it will output to stdout (default: -).'
> +        ),
> +        default="-",
> +    )
> +    args = parser.parse_args()
> +    rerun_processor = RerunProcessor(args.contexts_to_capture, args.time_since)
> +    rerun_processor.process_reruns()
> +    rerun_processor.write_output(args.out_file)
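
As a quick way to exercise the Recheck-request parsing without touching
the network, something along these lines might be handy. It is only a
sketch: the regex and filtering logic are copied from the patch, but the
standalone get_test_names() function, the RECHECK_REGEX name and the
sample comment bodies are made up for the example.

```python
import re

# Same pattern as RerunProcessor.regex in the patch.
RECHECK_REGEX = "^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"


def get_test_names(email_body, desired_contexts):
    """Standalone copy of the patch's parsing logic, for experimentation."""
    matches = re.findall(RECHECK_REGEX, email_body, re.MULTILINE)
    if not matches:
        return set()
    # Only the first Recheck-request tag in the body is considered.
    names = [name.strip() for name in matches[0].split(",")]
    return {name for name in names if name and name in desired_contexts}


desired = ["iol-unit-testing", "iol-something-else"]

# Tag at the start of a line, comma separated list on one line.
single_line = (
    "Looks flaky to me.\n"
    "Recheck-request: iol-unit-testing, iol-something-else, iol-one-more\n"
)
# List continued on the next line after a trailing comma.
wrapped = "Recheck-request: iol-unit-testing,\niol-something-else\n"
# Tag not at the start of the line, and tag with the wrong case.
indented = "  Recheck-request: iol-unit-testing\n"
lowercase = "recheck-request: iol-unit-testing\n"

for body in (single_line, wrapped, indented, lowercase):
    print(sorted(get_test_names(body, desired)))
```

The first two bodies should yield both desired contexts, with
iol-one-more dropped because it is not in the desired list, and the last
two should yield nothing since the tag has to start the line and is case
sensitive.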