[PATCH 0/1] tools: Add script for getting rerun requests

DPDK CI discussions

* [PATCH 0/1] tools: Add script for getting rerun requests
@ 2023-09-05 22:13 jspewock
  2023-09-05 22:13 ` [PATCH 1/1] tools: add get_reruns script jspewock
  0 siblings, 1 reply; 3+ messages in thread
From: jspewock @ 2023-09-05 22:13 UTC (permalink / raw)
  To: ci; +Cc: aconole, alialnu, probb, Jeremy Spewock

From: Jeremy Spewock <jspewock@iol.unh.edu>

As the community suggested, there should be a way for users to retest
any contexts on a given patch without needing to contact a maintainer or
resubmit the patch. The UNH-IOL Community Lab has implemented a way
to do this using the get_reruns.py script. The script is generic: anyone
can pass it a list of contexts to filter by and a timestamp marking how
far back to consider comments, and it produces an automation-friendly
JSON file containing all contexts to be rerun, grouped by the ID of the
patch series they should be run on. The output JSON file also includes
the timestamp of the most recent comment that was considered, so that
you don't process the same data repeatedly.

The idea behind upstreaming this script is so that other labs can have
one common generic script to get the rerun information and then process
that data internally as they see fit.
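
A minimal sketch of how a lab might consume that output, assuming the
layout described above: a "retests" map keyed by series ID, each entry
holding a "contexts" list, plus a "last_comment_timestamp" field. The
helper name load_rerun_requests is hypothetical and not part of the
patch:

```python
import json
from typing import Dict, List, Optional, Tuple


def load_rerun_requests(path: str) -> Tuple[Dict[str, List[str]], Optional[str]]:
    """Read the script's JSON output.

    Returns a ({series_id: [contexts]}, last_comment_timestamp) pair.
    JSON object keys are always strings, so series IDs come back as str.
    """
    with open(path) as handle:
        data = json.load(handle)
    retests = {
        series_id: sorted(entry["contexts"])
        for series_id, entry in data.get("retests", {}).items()
    }
    # The timestamp is null when no comments were seen in the window.
    return retests, data.get("last_comment_timestamp")
```

The returned timestamp is what a lab would pass back in as --time-since
on its next poll, so that already-processed comments are skipped.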

Jeremy Spewock (1):
  tools: add get_reruns script

 tools/get_reruns.py | 219 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100755 tools/get_reruns.py

-- 
2.41.0


* [PATCH 1/1] tools: add get_reruns script
  2023-09-05 22:13 [PATCH 0/1] tools: Add script for getting rerun requests jspewock
@ 2023-09-05 22:13 ` jspewock
  2023-09-07 12:56   ` Aaron Conole
  0 siblings, 1 reply; 3+ messages in thread
From: jspewock @ 2023-09-05 22:13 UTC (permalink / raw)
  To: ci; +Cc: aconole, alialnu, probb, Jeremy Spewock, Adam Hassick

From: Jeremy Spewock <jspewock@iol.unh.edu>

This script interacts with the DPDK Patchwork API to collect a list of
retest requests from comments on patches, based on a desired list of
contexts to retest. The script uses a regex to scan all of the comments
made since a timestamp passed in through the CLI, looking for any
comment that requests a retest. These requests are then filtered by the
desired contexts, also passed in through the CLI, and aggregated by the
ID of the patch series that the comment came from. The aggregated list
is then written to a JSON file along with a timestamp of the most recent
comment that was processed.

Signed-off-by: Jeremy Spewock <jspewock@iol.unh.edu>
Signed-off-by: Adam Hassick <ahassick@iol.unh.edu>
---
 tools/get_reruns.py | 219 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100755 tools/get_reruns.py
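
The scan, filter, and aggregate steps described in the commit message
can be tried offline with the sketch below. It reuses the regex from the
patch (written as a raw string) but takes comment bodies directly
instead of fetching them from Patchwork. The aggregate() helper and the
series IDs in the usage note are hypothetical:

```python
import re
from typing import Dict, Iterable, List, Set, Tuple

# Same pattern as in the patch, written as a raw string.
RECHECK_REGEX = r"^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"


def get_test_names(body: str, desired: Iterable[str]) -> Set[str]:
    """Return the requested labels in body that are also in desired."""
    matches = re.findall(RECHECK_REGEX, body, re.MULTILINE)
    if not matches:
        return set()
    wanted = set(desired)
    # Only the first Recheck-request tag in a comment is considered,
    # mirroring the patch.
    labels = (label.strip() for label in matches[0].split(","))
    return {label for label in labels if label in wanted}


def aggregate(
    comments: List[Tuple[int, str]], desired: Iterable[str]
) -> Dict[int, Set[str]]:
    """Group requested contexts by series ID.

    comments is a list of (series_id, comment_body) pairs.
    """
    wanted = set(desired)
    retests: Dict[int, Set[str]] = {}
    for series_id, body in comments:
        labels = get_test_names(body, wanted)
        if labels:
            retests.setdefault(series_id, set()).update(labels)
    return retests
```

For example, aggregate([(1, "Recheck-request: iol-unit-testing\n")],
["iol-unit-testing"]) groups that one context under series 1, while a
comment that uses a lowercase "recheck-request:" tag is ignored.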

diff --git a/tools/get_reruns.py b/tools/get_reruns.py
new file mode 100755
index 0000000..159ff6e
--- /dev/null
+++ b/tools/get_reruns.py
@@ -0,0 +1,219 @@
+#!/usr/bin/env python3
+# -*- coding: utf-8 -*-
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 University of New Hampshire
+
+import argparse
+import datetime
+import json
+import re
+from json import JSONEncoder
+from typing import Dict, List, Set, Optional
+
+import requests
+
+
+class JSONSetEncoder(JSONEncoder):
+    """Custom JSON encoder to handle sets.
+
+    Python's json module cannot serialize sets so this custom encoder converts
+    them into lists.
+
+    Args:
+        JSONEncoder: JSON encoder from the json python module.
+    """
+
+    def default(self, input_object):
+        if isinstance(input_object, set):
+            return list(input_object)
+        return super().default(input_object)
+
+
+class RerunProcessor:
+    """Class for finding reruns inside an email using the patchworks events
+    API.
+
+    The idea of this class is to use regex to find certain patterns that
+    represent desired contexts to rerun.
+
+    Arguments:
+        desired_contexts: List of all contexts to search for in the bodies of
+            the comments
+        time_since: Get all comments since this timestamp
+
+    Attributes:
+        collection_of_retests: A dictionary that maps patch series IDs to the
+            set of contexts to be retested for that patch series.
+        regex: regex used for collecting the contexts from the comment body.
+        last_comment_timestamp: timestamp of the most recent comment that was
+            processed
+    """
+
+    _desired_contexts: List[str]
+    _time_since: str
+    collection_of_retests: Dict[str, Dict[str, Set]] = {}
+    last_comment_timestamp: Optional[str] = None
+    # ^ is start of line
+    # ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+) is a capture group that gets all test
+    #   labels after "Recheck-request: "
+    #   (?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+ means 1 or more of the first match group
+    #       [a-zA-Z0-9-_]+ means 1 or more of any character in the ranges a-z,
+    #           A-Z, 0-9, or the characters '-' or '_'
+    #       (?:, ?\n?)? means 1 or none of this match group which expects
+    #           exactly 1 comma followed by 1 or no spaces followed by
+    #           1 or no newlines.
+    # VALID MATCHES:
+    #   Recheck-request: iol-unit-testing, iol-something-else, iol-one-more,
+    #   Recheck-request: iol-unit-testing,iol-something-else, iol-one-more
+    #   Recheck-request: iol-unit-testing, iol-example, iol-another-example,
+    #   more-intel-testing
+    # INVALID MATCHES:
+    #   Recheck-request: iol-unit-testing,  intel-example-testing
+    #   Recheck-request: iol-unit-testing iol-something-else,iol-one-more,
+    #   Recheck-request: iol-unit-testing,iol-something-else,iol-one-more,
+    #
+    #   more-intel-testing
+    regex: str = "^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"
+
+    def __init__(self, desired_contexts: List[str], time_since: str) -> None:
+        self._desired_contexts = desired_contexts
+        self._time_since = time_since
+
+    def process_reruns(self) -> None:
+        patchwork_url = f"http://patches.dpdk.org/api/events/?since={self._time_since}"
+        comment_request_info = []
+        for item in [
+            "&category=cover-comment-created",
+            "&category=patch-comment-created",
+        ]:
+            response = requests.get(patchwork_url + item)
+            response.raise_for_status()
+            comment_request_info.extend(response.json())
+        self.process_comment_info(comment_request_info)
+
+    def process_comment_info(self, list_of_comment_blobs: List[Dict]) -> None:
+        """Takes the list of json blobs of comment information and associates
+        them with their patches.
+
+        Collects retest labels from a list of comments on patches represented
+        in list_of_comment_blobs and creates a dictionary that associates them
+        with their corresponding patch series ID. The labels that need to be
+        retested are collected by passing the comment's body into the
+        get_test_names() method. This method also sets last_comment_timestamp
+        to just after the timestamp of the most recent comment.
+
+        Args:
+            list_of_comment_blobs: a list of JSON blobs that represent comment
+            information
+        """
+
+        list_of_comment_blobs = sorted(
+            list_of_comment_blobs,
+            key=lambda x: datetime.datetime.fromisoformat(x["date"]),
+            reverse=True,
+        )
+
+        if list_of_comment_blobs:
+            most_recent_timestamp = datetime.datetime.fromisoformat(
+                list_of_comment_blobs[0]["date"]
+            )
+            # exclude the most recent
+            most_recent_timestamp = most_recent_timestamp + datetime.timedelta(
+                microseconds=1
+            )
+            self.last_comment_timestamp = most_recent_timestamp.isoformat()
+
+        for comment in list_of_comment_blobs:
+            # before we do any parsing we want to make sure that we are dealing
+            # with a comment that is associated with a patch series
+            payload_key = "cover"
+            if comment["category"] == "patch-comment-created":
+                payload_key = "patch"
+            patch_series_arr = requests.get(
+                comment["payload"][payload_key]["url"]
+            ).json()["series"]
+            if not patch_series_arr:
+                continue
+            patch_id = patch_series_arr[0]["id"]
+
+            comment_info = requests.get(comment["payload"]["comment"]["url"])
+            comment_info.raise_for_status()
+            content = comment_info.json()["content"]
+
+            labels_to_rerun = self.get_test_names(content)
+
+            # appending to the list if it already exists, or creating it if it
+            # doesn't
+            if labels_to_rerun:
+                self.collection_of_retests[patch_id] = self.collection_of_retests.get(
+                    patch_id, {"contexts": set()}
+                )
+                self.collection_of_retests[patch_id]["contexts"].update(labels_to_rerun)
+
+    def get_test_names(self, email_body: str) -> Set[str]:
+        """Uses the regex in the class to get the information from the email.
+
+        When it gets the test names from the email, it will all be in one
+        capture group. We expect a comma separated list of patchwork labels
+        to be retested.
+
+        Returns:
+            A set of contexts found in the email that match your list of
+            desired contexts to capture. We use a set here to avoid duplicate
+            contexts.
+        """
+        rerun_section = re.findall(self.regex, email_body, re.MULTILINE)
+        if not rerun_section:
+            return set()
+        rerun_list = list(map(str.strip, rerun_section[0].split(",")))
+        return set(filter(lambda x: x and x in self._desired_contexts, rerun_list))
+
+    def write_to_output_file(self, file_name: str) -> None:
+        """Write class information to a JSON file.
+
+        Takes the collection_of_retests and last_comment_timestamp and outputs
+        them into a json file.
+
+        Args:
+            file_name: Name of the file to write the output to.
+        """
+
+        output_dict = {
+            "retests": self.collection_of_retests,
+            "last_comment_timestamp": self.last_comment_timestamp,
+        }
+        with open(file_name, "w") as file:
+            file.write(json.dumps(output_dict, indent=4, cls=JSONSetEncoder))
+
+
+if __name__ == "__main__":
+    parser = argparse.ArgumentParser(description="Help text for getting reruns")
+    parser.add_argument(
+        "-ts",
+        "--time-since",
+        dest="time_since",
+        required=True,
+        help="Get all comments since this timestamp (Patchwork 'since' filter)",
+    )
+    parser.add_argument(
+        "--contexts",
+        dest="contexts_to_capture",
+        nargs="*",
+        required=True,
+        help="List of patchwork contexts you would like to capture",
+    )
+    parser.add_argument(
+        "-o",
+        "--out-file",
+        dest="out_file",
+        help=(
+            "Output file where the list of reruns and the timestamp of the "
+            "last comment in the list of comments are written "
+            "(default: rerun_requests.json)."
+        ),
+        default="rerun_requests.json",
+    )
+    args = parser.parse_args()
+    rerun_processor = RerunProcessor(args.contexts_to_capture, args.time_since)
+    rerun_processor.process_reruns()
+    rerun_processor.write_to_output_file(args.out_file)
-- 
2.41.0



* Re: [PATCH 1/1] tools: add get_reruns script
  2023-09-05 22:13 ` [PATCH 1/1] tools: add get_reruns script jspewock
@ 2023-09-07 12:56   ` Aaron Conole
  0 siblings, 0 replies; 3+ messages in thread
From: Aaron Conole @ 2023-09-07 12:56 UTC (permalink / raw)
  To: jspewock; +Cc: ci, alialnu, probb, Adam Hassick

Hi Jeremy,

jspewock@iol.unh.edu writes:

> From: Jeremy Spewock <jspewock@iol.unh.edu>
>
> This script is used to interact with the DPDK Patchwork API to collect a
> list of retests from comments on patches based on a desired list of
> contexts to retest. The script uses regex to scan all of the comments
> since a timestamp that is passed into the script through the CLI for
> any comment that is requesting a retest. These requests are then filtered
> based on the desired contexts that you pass into the script through the
> CLI and then aggregated based on the patch series ID of the series that
> the comment came from. This aggregated list is then outputted to a JSON
> file with a timestamp of the most recent comment on patchworks.
>
> Signed-off-by: Jeremy Spewock <jspewock@iol.unh.edu>
> Signed-off-by: Adam Hassick <ahassick@iol.unh.edu>
> ---

Thanks for the tool.

>  tools/get_reruns.py | 219 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 219 insertions(+)
>  create mode 100755 tools/get_reruns.py
>
> diff --git a/tools/get_reruns.py b/tools/get_reruns.py
> new file mode 100755
> index 0000000..159ff6e
> --- /dev/null
> +++ b/tools/get_reruns.py
> @@ -0,0 +1,219 @@
> +#!/usr/bin/env python3
> +# -*- coding: utf-8 -*-
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2023 University of New Hampshire
> +
> +import argparse
> +import datetime
> +import json
> +import re
> +from json import JSONEncoder
> +from typing import Dict, List, Set, Optional
> +
> +import requests

I think this block should be cleaned up a bit.

The imports should be in alphabetical order.  The block shouldn't have
extra spaces.

> +
> +
> +class JSONSetEncoder(JSONEncoder):
> +    """Custom JSON encoder to handle sets.
> +
> +    Pythons json module cannot serialize sets so this custom encoder converts
> +    them into lists.
> +
> +    Args:
> +        JSONEncoder: JSON encoder from the json python module.
> +    """
> +
> +    def default(self, input_object):
> +        if isinstance(input_object, set):
> +            return list(input_object)
> +        return input_object
> +
> +
> +class RerunProcessor:
> +    """Class for finding reruns inside an email using the patchworks events
> +    API.
> +
> +    The idea of this class is to use regex to find certain patterns that
> +    represent desired contexts to rerun.
> +
> +    Arguments:
> +        desired_contexts: List of all contexts to search for in the bodies of
> +            the comments
> +        time_since: Get all comments since this timestamp
> +
> +    Attributes:
> +        collection_of_retests: A dictionary that maps patch series IDs to the
> +            set of contexts to be retested for that patch series.
> +        regex: regex used for collecting the contexts from the comment body.
> +        last_comment_timestamp: timestamp of the most recent comment that was
> +            processed
> +    """
> +
> +    _desired_contexts: List[str]
> +    _time_since: str
> +    collection_of_retests: Dict[str, Dict[str, Set]] = {}
> +    last_comment_timestamp: Optional[str] = None
> +    # ^ is start of line
> +    # ((?:[a-zA-Z-]+(?:, ?\n?)?)+) is a capture group that gets all test
> +    #   labels after "Recheck-request: "
> +    #   (?:[a-zA-Z-]+(?:, ?\n?)?)+ means 1 or more of the first match group
> +    #       [a-zA-Z0-9-_]+ means 1 more more of any character in the ranges a-z,
> +    #           A-Z, 0-9, or the characters '-' or '_'
> +    #       (?:, ?\n?)? means 1 or none of this match group which expects
> +    #           exactly 1 comma followed by 1 or no spaces followed by
> +    #           1 or no newlines.

This comment might not be needed.  After all, we can see the regex group
and you are just documenting the Python regex tool.  Instead, maybe we
should just reiterate the understanding around recheck-request.  For
example, the comment we look for must appear at the start of a line, it
is a case-sensitive tag, and

> +    # VALID MATCHES:
> +    #   Recheck-request: iol-unit-testing, iol-something-else, iol-one-more,
> +    #   Recheck-request: iol-unit-testing,iol-something-else, iol-one-more
> +    #   Recheck-request: iol-unit-testing, iol-example, iol-another-example,
> +    #   more-intel-testing
> +    # INVALID MATCHES:
> +    #   Recheck-request: iol-unit-testing,  intel-example-testing
> +    #   Recheck-request: iol-unit-testing iol-something-else,iol-one-more,
> +    #   Recheck-request: iol-unit-testing,iol-something-else,iol-one-more,
> +    #
> +    #   more-intel-testing
> +    regex: str = "^Recheck-request: ((?:[a-zA-Z0-9-_]+(?:, ?\n?)?)+)"
> +
> +    def __init__(self, desired_contexts: List[str], time_since: str) -> None:
> +        self._desired_contexts = desired_contexts
> +        self._time_since = time_since
> +
> +    def process_reruns(self) -> None:
> +        patchwork_url = f"http://patches.dpdk.org/api/events/?since={self._time_since}"

On the off-chance this API URL ever changes, we should make this
configurable.
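
One way this could look, as a rough sketch only: the --patchwork-url
flag, the DEFAULT_EVENTS_URL constant, and the build_events_url() helper
are all hypothetical names, with the URL from the patch kept as the
default.

```python
import argparse
from urllib.parse import urlencode

# Default taken from the patch; overridable on the command line.
DEFAULT_EVENTS_URL = "http://patches.dpdk.org/api/events/"


def build_events_url(base_url: str, since: str, category: str) -> str:
    """Build an events query URL, letting urlencode handle the escaping."""
    query = urlencode({"since": since, "category": category})
    return f"{base_url}?{query}"


def make_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Collect rerun requests")
    parser.add_argument(
        "--patchwork-url",  # hypothetical flag, not in the patch
        dest="patchwork_url",
        default=DEFAULT_EVENTS_URL,
        help="Base URL of the Patchwork events API",
    )
    return parser
```

process_reruns() would then take the base URL from the parsed arguments
instead of the hard-coded string.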

> +        comment_request_info = []
> +        for item in [
> +            "&category=cover-comment-created",
> +            "&category=patch-comment-created",
> +        ]:
> +            response = requests.get(patchwork_url + item)
> +            response.raise_for_status()
> +            comment_request_info.extend(response.json())
> +        rerun_processor.process_comment_info(comment_request_info)
> +
> +    def process_comment_info(self, list_of_comment_blobs: List[Dict]) -> None:
> +        """Takes the list of json blobs of comment information and associates
> +        them with their patches.
> +
> +        Collects retest labels from a list of comments on patches represented
> +        inlist_of_comment_blobs and creates a dictionary that associates them
> +        with their corresponding patch series ID. The labels that need to be
> +        retested are collected by passing the comments body into
> +        get_test_names() method. This method also updates the current UTC
> +        timestamp for the processor to the current time.
> +
> +        Args:
> +            list_of_comment_blobs: a list of JSON blobs that represent comment
> +            information
> +        """
> +
> +        list_of_comment_blobs = sorted(
> +            list_of_comment_blobs,
> +            key=lambda x: datetime.datetime.fromisoformat(x["date"]),
> +            reverse=True,
> +        )
> +
> +        if list_of_comment_blobs:
> +            most_recent_timestamp = datetime.datetime.fromisoformat(
> +                list_of_comment_blobs[0]["date"]
> +            )
> +            # exclude the most recent
> +            most_recent_timestamp = most_recent_timestamp + datetime.timedelta(
> +                microseconds=1
> +            )
> +            self.last_comment_timestamp = most_recent_timestamp.isoformat()
> +
> +        for comment in list_of_comment_blobs:
> +            # before we do any parsing we want to make sure that we are dealing
> +            # with a comment that is associated with a patch series
> +            payload_key = "cover"
> +            if comment["category"] == "patch-comment-created":
> +                payload_key = "patch"
> +            patch_series_arr = requests.get(
> +                comment["payload"][payload_key]["url"]
> +            ).json()["series"]
> +            if not patch_series_arr:
> +                continue
> +            patch_id = patch_series_arr[0]["id"]
> +
> +            comment_info = requests.get(comment["payload"]["comment"]["url"])
> +            comment_info.raise_for_status()
> +            content = comment_info.json()["content"]
> +
> +            labels_to_rerun = self.get_test_names(content)
> +
> +            # appending to the list if it already exists, or creating it if it
> +            # doesn't
> +            if labels_to_rerun:
> +                self.collection_of_retests[patch_id] = self.collection_of_retests.get(
> +                    patch_id, {"contexts": set()}
> +                )
> +                self.collection_of_retests[patch_id]["contexts"].update(labels_to_rerun)
> +
> +    def get_test_names(self, email_body: str) -> Set[str]:
> +        """Uses the regex in the class to get the information from the email.
> +
> +        When it gets the test names from the email, it will all be in one
> +        capture group. We expect a comma separated list of patchwork labels
> +        to be retested.
> +
> +        Returns:
> +            A set of contexts found in the email that match your list of
> +            desired contexts to capture. We use a set here to avoid duplicate
> +            contexts.
> +        """
> +        rerun_section = re.findall(self.regex, email_body, re.MULTILINE)
> +        if not rerun_section:
> +            return set()
> +        rerun_list = list(map(str.strip, rerun_section[0].split(",")))
> +        return set(filter(lambda x: x and x in self._desired_contexts, rerun_list))
> +
> +    def write_to_output_file(self, file_name: str) -> None:
> +        """Write class information to a JSON file.
> +
> +        Takes the collection_of_retests and last_comment_timestamp and outputs
> +        them into a json file.
> +
> +        Args:
> +            file_name: Name of the file to write the output to.
> +        """

It might also be friendly to write to stdout when given a filename like
"-", so that we can use it in a script pipeline.
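
Something along these lines, as a sketch: write_output() is a
hypothetical standalone helper, and it assumes the sets have already
been converted to lists (the patch does that with JSONSetEncoder).

```python
import json
import sys
from typing import Any, Dict


def write_output(output: Dict[str, Any], file_name: str) -> None:
    """Write output as JSON to file_name, or to stdout if file_name is "-"."""
    text = json.dumps(output, indent=4)
    if file_name == "-":
        sys.stdout.write(text + "\n")
        return
    with open(file_name, "w") as handle:
        handle.write(text + "\n")
```

With that in place, passing "-o -" would let the result be piped
straight into another tool.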

> +        output_dict = {
> +            "retests": self.collection_of_retests,
> +            "last_comment_timestamp": self.last_comment_timestamp,
> +        }
> +        with open(file_name, "w") as file:
> +            file.write(json.dumps(output_dict, indent=4, cls=JSONSetEncoder))
> +
> +
> +if __name__ == "__main__":
> +    parser = argparse.ArgumentParser(description="Help text for getting reruns")
> +    parser.add_argument(
> +        "-ts",
> +        "--time-since",
> +        dest="time_since",
> +        required=True,
> +        help="Get all patches since this many days ago (default: 5)",
> +    )
> +    parser.add_argument(
> +        "--contexts",
> +        dest="contexts_to_capture",
> +        nargs="*",
> +        required=True,
> +        help="List of patchwork contexts you would like to capture",
> +    )
> +    parser.add_argument(
> +        "-o",
> +        "--out-file",
> +        dest="out_file",
> +        help=(
> +            "Output file where the list of reruns and the timestamp of the"
> +            "last comment in the list of comments"
> +            "(default: rerun_requests.json)."
> +        ),
> +        default="rerun_requests.json",
> +    )
> +    args = parser.parse_args()
> +    rerun_processor = RerunProcessor(args.contexts_to_capture, args.time_since)
> +    rerun_processor.process_reruns()
> +    rerun_processor.write_to_output_file(args.out_file)

