From: Aaron Conole
To: jspewock@iol.unh.edu
Cc: ci@dpdk.org, alialnu@nvidia.com, probb@iol.unh.edu, ahassick@iol.unh.edu
Subject: Re: [PATCH v2 1/1] tools: add get_reruns script
References: <20230907205551.19066-2-jspewock@iol.unh.edu> <20230907205551.19066-3-jspewock@iol.unh.edu>
Date: Mon, 25 Sep 2023 12:06:04 -0400
In-Reply-To: <20230907205551.19066-3-jspewock@iol.unh.edu> (jspewock@iol.unh.edu's message of "Thu, 7 Sep 2023 16:45:55 -0400")
List-Id: DPDK CI discussions

jspewock@iol.unh.edu writes:

> From: Jeremy Spewock
>
> This script is used to interact with the DPDK Patchwork API to collect a
> list of retests from comments on patches based on a desired list of
> contexts to retest. The script uses regex to scan all of the comments
> since a timestamp that is passed into the script through the CLI for
> any comment that is requesting a retest. These requests are then filtered
> based on the desired contexts that you pass into the script through the
> CLI and then aggregated based on the patch series ID of the series that
> the comment came from. This aggregated list is then outputted either to
> a JSON file or stdout with a timestamp of the most recent comment on
> patchworks.
>
> Signed-off-by: Jeremy Spewock
> Signed-off-by: Adam Hassick
> ---

Thanks Jeremy - I'll take a look this week.  Just returning from PTO.
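
For readers skimming the thread, the scan-and-filter step that the commit message describes can be sketched roughly as follows. This is only an illustration under stated assumptions: the pattern is copied from the quoted patch, while the helper name `extract_contexts` and the sample comment text are made up here and are not part of the submission.

```python
import re
from typing import List, Set

# Pattern copied from the quoted patch; the rest of this block is illustrative.
RECHECK_REGEX = r"^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"


def extract_contexts(comment_body: str, desired_contexts: List[str]) -> Set[str]:
    """Return the contexts requested in a comment that are also desired.

    Hypothetical helper: it mirrors the filtering the commit message
    describes, but it is not the patch's own code.
    """
    matches = re.findall(RECHECK_REGEX, comment_body, re.MULTILINE)
    if not matches:
        return set()
    # Only the first Recheck-request tag in the comment is considered.
    requested = [name.strip() for name in matches[0].split(",")]
    return {name for name in requested if name and name in desired_contexts}


# Made-up comment body, for illustration only.
sample_comment = "Please rerun.\nRecheck-request: iol-unit-testing, iol-something-else\n"
found = extract_contexts(sample_comment, ["iol-unit-testing"])
```

With the sample above, only the context that appears in both the comment and the desired list survives the filter.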
>  tools/get_reruns.py | 218 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 218 insertions(+)
>  create mode 100755 tools/get_reruns.py
>
> diff --git a/tools/get_reruns.py b/tools/get_reruns.py
> new file mode 100755
> index 0000000..832da62
> --- /dev/null
> +++ b/tools/get_reruns.py
> @@ -0,0 +1,218 @@
> +#!/usr/bin/env python3
> +# -*- coding: utf-8 -*-
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2023 University of New Hampshire
> +
> +import argparse
> +import datetime
> +import json
> +import re
> +import requests
> +from typing import Dict, List, Optional, Set
> +
> +PATCHWORK_EVENTS_API_URL = "http://patches.dpdk.org/api/events/"
> +
> +
> +class JSONSetEncoder(json.JSONEncoder):
> +    """Custom JSON encoder to handle sets.
> +
> +    Pythons json module cannot serialize sets so this custom encoder converts
> +    them into lists.
> +    """
> +
> +    def default(self, input_object):
> +        if isinstance(input_object, set):
> +            return list(input_object)
> +        return input_object
> +
> +
> +class RerunProcessor:
> +    """Class for finding reruns inside an email using the patchworks events
> +    API.
> +
> +    The idea of this class is to use regex to find certain patterns that
> +    represent desired contexts to rerun.
> +
> +    Arguments:
> +        desired_contexts: List of all contexts to search for in the bodies of
> +            the comments
> +        time_since: Get all comments since this timestamp
> +
> +    Attributes:
> +        collection_of_retests: A dictionary that maps patch series IDs to the
> +            set of contexts to be retested for that patch series.
> +        regex: regex used for collecting the contexts from the comment body.
> +        last_comment_timestamp: timestamp of the most recent comment that was
> +            processed
> +    """
> +
> +    _desired_contexts: List[str]
> +    _time_since: str
> +    collection_of_retests: Dict[str, Dict[str, Set]] = {}
> +    last_comment_timestamp: Optional[str] = None
> +    # The tag we search for in comments must appear at the start of the line
> +    # and is case sensitive. After this tag we expect a comma separated list
> +    # of valid DPDK patchwork contexts.
> +    #
> +    # VALID MATCHES:
> +    #   Recheck-request: iol-unit-testing, iol-something-else, iol-one-more,
> +    #   Recheck-request: iol-unit-testing,iol-something-else, iol-one-more
> +    #   Recheck-request: iol-unit-testing, iol-example, iol-another-example,
> +    #   more-intel-testing
> +    # INVALID MATCHES:
> +    #   Recheck-request: iol-unit-testing, intel-example-testing
> +    #   Recheck-request: iol-unit-testing iol-something-else,iol-one-more,
> +    #   Recheck-request: iol-unit-testing,iol-something-else,iol-one-more,
> +    #
> +    #   more-intel-testing
> +    regex: str = "^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"
> +
> +    def __init__(self, desired_contexts: List[str], time_since: str) -> None:
> +        self._desired_contexts = desired_contexts
> +        self._time_since = time_since
> +
> +    def process_reruns(self) -> None:
> +        patchwork_url = f"{PATCHWORK_EVENTS_API_URL}?since={self._time_since}"
> +        comment_request_info = []
> +        for item in [
> +            "&category=cover-comment-created",
> +            "&category=patch-comment-created",
> +        ]:
> +            response = requests.get(patchwork_url + item)
> +            response.raise_for_status()
> +            comment_request_info.extend(response.json())
> +        rerun_processor.process_comment_info(comment_request_info)
> +
> +    def process_comment_info(self, list_of_comment_blobs: List[Dict]) -> None:
> +        """Takes the list of json blobs of comment information and associates
> +        them with their patches.
> +
> +        Collects retest labels from a list of comments on patches represented
> +        inlist_of_comment_blobs and creates a dictionary that associates them
> +        with their corresponding patch series ID. The labels that need to be
> +        retested are collected by passing the comments body into
> +        get_test_names() method. This method also updates the current UTC
> +        timestamp for the processor to the current time.
> +
> +        Args:
> +            list_of_comment_blobs: a list of JSON blobs that represent comment
> +            information
> +        """
> +
> +        list_of_comment_blobs = sorted(
> +            list_of_comment_blobs,
> +            key=lambda x: datetime.datetime.fromisoformat(x["date"]),
> +            reverse=True,
> +        )
> +
> +        if list_of_comment_blobs:
> +            most_recent_timestamp = datetime.datetime.fromisoformat(
> +                list_of_comment_blobs[0]["date"]
> +            )
> +            # exclude the most recent
> +            most_recent_timestamp = most_recent_timestamp + datetime.timedelta(
> +                microseconds=1
> +            )
> +            self.last_comment_timestamp = most_recent_timestamp.isoformat()
> +
> +        for comment in list_of_comment_blobs:
> +            # before we do any parsing we want to make sure that we are dealing
> +            # with a comment that is associated with a patch series
> +            payload_key = "cover"
> +            if comment["category"] == "patch-comment-created":
> +                payload_key = "patch"
> +            patch_series_arr = requests.get(
> +                comment["payload"][payload_key]["url"]
> +            ).json()["series"]
> +            if not patch_series_arr:
> +                continue
> +            patch_id = patch_series_arr[0]["id"]
> +
> +            comment_info = requests.get(comment["payload"]["comment"]["url"])
> +            comment_info.raise_for_status()
> +            content = comment_info.json()["content"]
> +
> +            labels_to_rerun = self.get_test_names(content)
> +
> +            # appending to the list if it already exists, or creating it if it
> +            # doesn't
> +            if labels_to_rerun:
> +                self.collection_of_retests[patch_id] = self.collection_of_retests.get(
> +                    patch_id, {"contexts": set()}
> +                )
> +                self.collection_of_retests[patch_id]["contexts"].update(labels_to_rerun)
> +
> +    def get_test_names(self, email_body: str) -> Set[str]:
> +        """Uses the regex in the class to get the information from the email.
> +
> +        When it gets the test names from the email, it will all be in one
> +        capture group. We expect a comma separated list of patchwork labels
> +        to be retested.
> +
> +        Returns:
> +            A set of contexts found in the email that match your list of
> +            desired contexts to capture. We use a set here to avoid duplicate
> +            contexts.
> +        """
> +        rerun_section = re.findall(self.regex, email_body, re.MULTILINE)
> +        if not rerun_section:
> +            return set()
> +        rerun_list = list(map(str.strip, rerun_section[0].split(",")))
> +        return set(filter(lambda x: x and x in self._desired_contexts, rerun_list))
> +
> +    def write_output(self, file_name: str) -> None:
> +        """Output class information.
> +
> +        Takes the collection_of_retests and last_comment_timestamp and outputs
> +        them into either a json file or stdout.
> +
> +        Args:
> +            file_name: Name of the file to write the output to. If this is set
> +            to "-" then it will output to stdout.
> +        """
> +
> +        output_dict = {
> +            "retests": self.collection_of_retests,
> +            "last_comment_timestamp": self.last_comment_timestamp,
> +        }
> +        if file_name == "-":
> +            print(json.dumps(output_dict, indent=4, cls=JSONSetEncoder))
> +        else:
> +            with open(file_name, "w") as file:
> +                file.write(json.dumps(output_dict, indent=4, cls=JSONSetEncoder))
> +
> +
> +if __name__ == "__main__":
> +    parser = argparse.ArgumentParser(description="Help text for getting reruns")
> +    parser.add_argument(
> +        "-ts",
> +        "--time-since",
> +        dest="time_since",
> +        required=True,
> +        help='Get all patches since this timestamp (yyyy-mm-ddThh:mm:ss.SSSSSS).',
> +    )
> +    parser.add_argument(
> +        "--contexts",
> +        dest="contexts_to_capture",
> +        nargs="*",
> +        required=True,
> +        help='List of patchwork contexts you would like to capture.',
> +    )
> +    parser.add_argument(
> +        "-o",
> +        "--out-file",
> +        dest="out_file",
> +        help=(
> +            'Output file where the list of reruns and the timestamp of the'
> +            'last comment in the list of comments is sent. If this is set'
> +            'to "-" then it will output to stdout (default: -).'
> +        ),
> +        default="-",
> +    )
> +    args = parser.parse_args()
> +    rerun_processor = RerunProcessor(args.contexts_to_capture, args.time_since)
> +    rerun_processor.process_reruns()
> +    rerun_processor.write_output(args.out_file)
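
As a point of reference for the output step, here is a minimal sketch of serializing sets to JSON, assuming the same output shape as the patch (a "retests" mapping plus "last_comment_timestamp"). The names `SetEncoder` and `render_output` and the sample data are hypothetical. Unlike the quoted encoder, this variant defers to `super().default()` for anything that is not a set, so unsupported types raise `TypeError`, and it sorts the set so the output is stable between runs.

```python
import json
from typing import Dict, Optional


class SetEncoder(json.JSONEncoder):
    """Illustrative encoder (not the patch's class): sets become sorted lists."""

    def default(self, o):
        if isinstance(o, set):
            # Sorting gives a deterministic order; assumes comparable members.
            return sorted(o)
        # Let the base class raise TypeError for unsupported types.
        return super().default(o)


def render_output(retests: Dict, last_comment_timestamp: Optional[str]) -> str:
    """Build the JSON document in the shape the patch writes out."""
    output = {
        "retests": retests,
        "last_comment_timestamp": last_comment_timestamp,
    }
    return json.dumps(output, indent=4, cls=SetEncoder)


# Made-up series ID, contexts, and timestamp, for illustration only.
sample = render_output(
    {123: {"contexts": {"iol-unit-testing", "iol-another-example"}}},
    "2023-09-25T16:06:04.000001",
)
parsed = json.loads(sample)
```

Note that `json.dumps` converts the integer series ID to a string key, so consumers of the file should expect string keys under "retests".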