From: Aaron Conole
To: jspewock@iol.unh.edu
Cc: ci@dpdk.org, alialnu@nvidia.com, probb@iol.unh.edu, Adam Hassick
Subject: Re: [PATCH 1/1] tools: add get_reruns script
Date: Thu, 07 Sep 2023 08:56:00 -0400
In-Reply-To: <20230905222317.25821-4-jspewock@iol.unh.edu> (jspewock@iol.unh.edu's message of "Tue, 5 Sep 2023 18:13:03 -0400")
References: <20230905222317.25821-2-jspewock@iol.unh.edu> <20230905222317.25821-4-jspewock@iol.unh.edu>

Hi Jeremy,

jspewock@iol.unh.edu writes:

> From: Jeremy Spewock
>
> This script is used to interact with the DPDK Patchwork API to collect a
> list of retests from comments on patches based on a desired list of
> contexts to retest. The script uses regex to scan all of the comments
> since a timestamp that is passed into the script through the CLI for
> any comment that is requesting a retest. These requests are then filtered
> based on the desired contexts that you pass into the script through the
> CLI and then aggregated based on the patch series ID of the series that
> the comment came from. This aggregated list is then outputted to a JSON
> file with a timestamp of the most recent comment on patchworks.
>
> Signed-off-by: Jeremy Spewock
> Signed-off-by: Adam Hassick
> ---

Thanks for the tool.
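Just to check that I'm reading the flow right: a comment containing a
line like

    Recheck-request: iol-unit-testing, iol-compile-testing

should end up aggregated in the output JSON roughly as below (the
series id and timestamp are made up, and the context names are only
illustrative):

    {
        "retests": {
            "12345": {
                "contexts": ["iol-unit-testing", "iol-compile-testing"]
            }
        },
        "last_comment_timestamp": "2023-09-07T12:00:00"
    }

If that matches your intent, great.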
> tools/get_reruns.py | 219 ++++++++++++++++++++++++++++++++++++++++++++
> 1 file changed, 219 insertions(+)
> create mode 100755 tools/get_reruns.py
>
> diff --git a/tools/get_reruns.py b/tools/get_reruns.py
> new file mode 100755
> index 0000000..159ff6e
> --- /dev/null
> +++ b/tools/get_reruns.py
> @@ -0,0 +1,219 @@
> +#!/usr/bin/env python3
> +# -*- coding: utf-8 -*-
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2023 University of New Hampshire
> +
> +import argparse
> +import datetime
> +import json
> +import re
> +from json import JSONEncoder
> +from typing import Dict, List, Set, Optional
> +
> +import requests

I think this block should be cleaned up a bit.  The imports should be in
alphabetical order, and the block shouldn't have extra spaces.

> +
> +
> +class JSONSetEncoder(JSONEncoder):
> +    """Custom JSON encoder to handle sets.
> +
> +    Pythons json module cannot serialize sets so this custom encoder converts
> +    them into lists.
> +
> +    Args:
> +        JSONEncoder: JSON encoder from the json python module.
> +    """
> +
> +    def default(self, input_object):
> +        if isinstance(input_object, set):
> +            return list(input_object)
> +        return input_object
> +
> +
> +class RerunProcessor:
> +    """Class for finding reruns inside an email using the patchworks events
> +    API.
> +
> +    The idea of this class is to use regex to find certain patterns that
> +    represent desired contexts to rerun.
> +
> +    Arguments:
> +        desired_contexts: List of all contexts to search for in the bodies of
> +            the comments
> +        time_since: Get all comments since this timestamp
> +
> +    Attributes:
> +        collection_of_retests: A dictionary that maps patch series IDs to the
> +            set of contexts to be retested for that patch series.
> +        regex: regex used for collecting the contexts from the comment body.
> +        last_comment_timestamp: timestamp of the most recent comment that was
> +            processed
> +    """
> +
> +    _desired_contexts: List[str]
> +    _time_since: str
> +    collection_of_retests: Dict[str, Dict[str, Set]] = {}
> +    last_comment_timestamp: Optional[str] = None
> +    # ^ is start of line
> +    # ((?:[a-zA-Z-]+(?:, ?\n?)?)+) is a capture group that gets all test
> +    #   labels after "Recheck-request: "
> +    # (?:[a-zA-Z-]+(?:, ?\n?)?)+ means 1 or more of the first match group
> +    # [a-zA-Z0-9-_]+ means 1 more more of any character in the ranges a-z,
> +    #   A-Z, 0-9, or the characters '-' or '_'
> +    # (?:, ?\n?)? means 1 or none of this match group which expects
> +    #   exactly 1 comma followed by 1 or no spaces followed by
> +    #   1 or no newlines.

This comment might not be needed.  After all, we can see the regex group
and you are just documenting Python regex syntax.  Instead, maybe we
should just reiterate the understanding around recheck-request.  For
example, the comment we look for must appear at the start of a line, it
is a case-sensitive tag, and it takes a comma-separated list of
contexts.
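Something like the following is the behaviour I understand it to have
(untested, and the context names are made up):

    import re

    regex = r"^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"
    body = ("Looks good otherwise.\n"
            "Recheck-request: iol-unit-testing, iol-compile-testing\n")

    # re.MULTILINE lets ^ match at the start of every line of the comment
    # body, so the tag only counts when it begins a line.
    print(re.findall(regex, body, re.MULTILINE))
    # -> ['iol-unit-testing, iol-compile-testing']

Which lines up with the examples you already list below: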
> +    # VALID MATCHES:
> +    # Recheck-request: iol-unit-testing, iol-something-else, iol-one-more,
> +    # Recheck-request: iol-unit-testing,iol-something-else, iol-one-more
> +    # Recheck-request: iol-unit-testing, iol-example, iol-another-example,
> +    #   more-intel-testing
> +    # INVALID MATCHES:
> +    # Recheck-request: iol-unit-testing, intel-example-testing
> +    # Recheck-request: iol-unit-testing iol-something-else,iol-one-more,
> +    # Recheck-request: iol-unit-testing,iol-something-else,iol-one-more,
> +    #
> +    #   more-intel-testing
> +    regex: str = "^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"
> +
> +    def __init__(self, desired_contexts: List[str], time_since: str) -> None:
> +        self._desired_contexts = desired_contexts
> +        self._time_since = time_since
> +
> +    def process_reruns(self) -> None:
> +        patchwork_url = f"http://patches.dpdk.org/api/events/?since={self._time_since}"

On the off-chance this API URL ever changes, we should make this
configurable.
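Maybe just expose it as an option with the current value as the
default, roughly like this (the option and attribute names are only a
suggestion):

    parser.add_argument(
        "--patchwork-api-url",
        dest="patchwork_api_url",
        default="http://patches.dpdk.org/api",
        help="Base URL of the Patchwork API",
    )

and then pass it into RerunProcessor alongside the contexts and the
timestamp instead of hard-coding it here in process_reruns().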
> +        comment_request_info = []
> +        for item in [
> +            "&category=cover-comment-created",
> +            "&category=patch-comment-created",
> +        ]:
> +            response = requests.get(patchwork_url + item)
> +            response.raise_for_status()
> +            comment_request_info.extend(response.json())
> +        rerun_processor.process_comment_info(comment_request_info)
> +
> +    def process_comment_info(self, list_of_comment_blobs: List[Dict]) -> None:
> +        """Takes the list of json blobs of comment information and associates
> +        them with their patches.
> +
> +        Collects retest labels from a list of comments on patches represented
> +        in list_of_comment_blobs and creates a dictionary that associates them
> +        with their corresponding patch series ID. The labels that need to be
> +        retested are collected by passing the comments body into
> +        get_test_names() method. This method also updates the current UTC
> +        timestamp for the processor to the current time.
> +
> +        Args:
> +            list_of_comment_blobs: a list of JSON blobs that represent comment
> +                information
> +        """
> +
> +        list_of_comment_blobs = sorted(
> +            list_of_comment_blobs,
> +            key=lambda x: datetime.datetime.fromisoformat(x["date"]),
> +            reverse=True,
> +        )
> +
> +        if list_of_comment_blobs:
> +            most_recent_timestamp = datetime.datetime.fromisoformat(
> +                list_of_comment_blobs[0]["date"]
> +            )
> +            # exclude the most recent
> +            most_recent_timestamp = most_recent_timestamp + datetime.timedelta(
> +                microseconds=1
> +            )
> +            self.last_comment_timestamp = most_recent_timestamp.isoformat()
> +
> +        for comment in list_of_comment_blobs:
> +            # before we do any parsing we want to make sure that we are dealing
> +            # with a comment that is associated with a patch series
> +            payload_key = "cover"
> +            if comment["category"] == "patch-comment-created":
> +                payload_key = "patch"
> +            patch_series_arr = requests.get(
> +                comment["payload"][payload_key]["url"]
> +            ).json()["series"]
> +            if not patch_series_arr:
> +                continue
> +            patch_id = patch_series_arr[0]["id"]
> +
> +            comment_info = requests.get(comment["payload"]["comment"]["url"])
> +            comment_info.raise_for_status()
> +            content = comment_info.json()["content"]
> +
> +            labels_to_rerun = self.get_test_names(content)
> +
> +            # appending to the list if it already exists, or creating it if it
> +            # doesn't
> +            if labels_to_rerun:
> +                self.collection_of_retests[patch_id] = self.collection_of_retests.get(
> +                    patch_id, {"contexts": set()}
> +                )
> +                self.collection_of_retests[patch_id]["contexts"].update(labels_to_rerun)
> +
> +    def get_test_names(self, email_body: str) -> Set[str]:
> +        """Uses the regex in the class to get the information from the email.
> +
> +        When it gets the test names from the email, it will all be in one
> +        capture group. We expect a comma separated list of patchwork labels
> +        to be retested.
> +
> +        Returns:
> +            A set of contexts found in the email that match your list of
> +            desired contexts to capture. We use a set here to avoid duplicate
> +            contexts.
> +        """
> +        rerun_section = re.findall(self.regex, email_body, re.MULTILINE)
> +        if not rerun_section:
> +            return set()
> +        rerun_list = list(map(str.strip, rerun_section[0].split(",")))
> +        return set(filter(lambda x: x and x in self._desired_contexts, rerun_list))
> +
> +    def write_to_output_file(self, file_name: str) -> None:
> +        """Write class information to a JSON file.
> +
> +        Takes the collection_of_retests and last_comment_timestamp and outputs
> +        them into a json file.
> +
> +        Args:
> +            file_name: Name of the file to write the output to.
> +        """

Maybe it would also be friendly to support writing to stdout with a
filename like "-" so that we can use it in a script pipeline.
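Untested, but assuming an 'import sys' at the top of the file, the tail
of write_to_output_file() could look something like:

    if file_name == "-":
        json.dump(output_dict, sys.stdout, indent=4, cls=JSONSetEncoder)
    else:
        with open(file_name, "w") as file:
            json.dump(output_dict, file, indent=4, cls=JSONSetEncoder)

keeping the output_dict you build below unchanged.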
> +        output_dict = {
> +            "retests": self.collection_of_retests,
> +            "last_comment_timestamp": self.last_comment_timestamp,
> +        }
> +        with open(file_name, "w") as file:
> +            file.write(json.dumps(output_dict, indent=4, cls=JSONSetEncoder))
> +
> +
> +if __name__ == "__main__":
> +    parser = argparse.ArgumentParser(description="Help text for getting reruns")
> +    parser.add_argument(
> +        "-ts",
> +        "--time-since",
> +        dest="time_since",
> +        required=True,
> +        help="Get all patches since this many days ago (default: 5)",
> +    )
> +    parser.add_argument(
> +        "--contexts",
> +        dest="contexts_to_capture",
> +        nargs="*",
> +        required=True,
> +        help="List of patchwork contexts you would like to capture",
> +    )
> +    parser.add_argument(
> +        "-o",
> +        "--out-file",
> +        dest="out_file",
> +        help=(
> +            "Output file where the list of reruns and the timestamp of the"
> +            "last comment in the list of comments"
> +            "(default: rerun_requests.json)."
> +        ),
> +        default="rerun_requests.json",
> +    )
> +    args = parser.parse_args()
> +    rerun_processor = RerunProcessor(args.contexts_to_capture, args.time_since)
> +    rerun_processor.process_reruns()
> +    rerun_processor.write_to_output_file(args.out_file)
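For reference, this is roughly how I would expect to run it (contexts
picked arbitrarily, and assuming --time-since takes a timestamp, since
it is passed straight through to the events API's "since" parameter):

    ./tools/get_reruns.py \
        --time-since 2023-09-01T00:00:00 \
        --contexts iol-unit-testing iol-compile-testing \
        -o rerun_requests.json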