From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 4A08643884; Wed, 10 Jan 2024 17:54:29 +0100 (CET) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 2C0284025E; Wed, 10 Jan 2024 17:54:29 +0100 (CET) Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by mails.dpdk.org (Postfix) with ESMTP id 841304021E for ; Wed, 10 Jan 2024 17:54:27 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1704905667; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=k1Q+uWjIMzCoP/0ulHWbobllau7MaIVR3uVSsLrWWbA=; b=JCVtA+nFSdgwtvO+y1SrIOfZy1UmAuHgxYeP+YN1lKvHj6FuDtbbZdsqCxWe0myziMnscA 5g71nEhmSczpKWXW9tIeiltcWPbq52mXvd7gSiZgnGd8Ts2IQ6QsXHupHKbF9NYbh4SKEI d4stdEqs03b4c+0+d30dbl9ntYz05hU= Received: from mimecast-mx02.redhat.com (mimecast-mx02.redhat.com [66.187.233.88]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-679-NjZ_mY1rNOaq2frbt0bGfA-1; Wed, 10 Jan 2024 11:54:23 -0500 X-MC-Unique: NjZ_mY1rNOaq2frbt0bGfA-1 Received: from smtp.corp.redhat.com (int-mx07.intmail.prod.int.rdu2.redhat.com [10.11.54.7]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256) (No client certificate requested) by mimecast-mx02.redhat.com (Postfix) with ESMTPS id 16F4E83537D; Wed, 10 Jan 2024 16:54:22 +0000 (UTC) Received: from RHTPC1VM0NT (unknown [10.22.34.170]) by smtp.corp.redhat.com (Postfix) with ESMTPS id 6CDB31C06532; Wed, 10 Jan 2024 16:54:21 +0000 (UTC) From: Aaron Conole To: Adam Hassick Cc: ci@dpdk.org, alialnu@nvidia.com Subject: Re: [PATCH 1/2] tools: Add script to create artifacts References: <20240110145715.28157-1-ahassick@iol.unh.edu> <20240110145715.28157-2-ahassick@iol.unh.edu> Date: Wed, 10 Jan 2024 11:54:20 -0500 In-Reply-To: <20240110145715.28157-2-ahassick@iol.unh.edu> (Adam Hassick's message of "Wed, 10 Jan 2024 09:57:14 -0500") Message-ID: User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.3 (gnu/linux) MIME-Version: 1.0 X-Scanned-By: MIMEDefang 3.4.1 on 10.11.54.7 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-BeenThere: ci@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK CI discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: ci-bounces@dpdk.org Adam Hassick writes: > This script takes in a URL to a series on Patchwork and emits a > tarball which may be used for running tests. > > Signed-off-by: Adam Hassick > --- Just a quick glance. Is it possible to include a doc on how it is expected to run this script? Maybe that could help to also evaluate it as well. > tools/create_series_artifact.py | 453 ++++++++++++++++++++++++++++++++ > 1 file changed, 453 insertions(+) > create mode 100755 tools/create_series_artifact.py > > diff --git a/tools/create_series_artifact.py b/tools/create_series_artifact.py > new file mode 100755 > index 0000000..3049aaa > --- /dev/null > +++ b/tools/create_series_artifact.py > @@ -0,0 +1,453 @@ As a general note, you're using lots of calls to subprocess, but the utilities you're calling are python utilities. Is it possible to import these and make direct calls instead? I think that will speed up execution. I also think it prevents weird PATH issues. > +#!/usr/bin/env python3 > + > +import argparse > +import os > +import subprocess > +import requests > +import pathlib > +import yaml > +import pygit2 > +import requests > +import shutil Looks like you imported 'requests' twice. I suggest rewriting this so that it is alphabetical order, which would prevent this kind of mistake as well. > + > +from dataclasses import dataclass > +from typing import Optional, Dict, Tuple, Any, List > + > + > +@dataclass > +class CreateSeriesParameters(object): > + pw_server: str > + pw_project: str > + pw_token: str > + git_user: str > + git_email: str > + series_url: str > + patch_ids: List[int] > + labels: List[str] > + apply_error: Optional[Tuple[str, str]] > + config: Dict > + series: Dict > + pw_mcli_script: pathlib.Path > + patch_parser_script: pathlib.Path > + patch_parser_cfg: pathlib.Path > + lzma: bool > + output_tarball: pathlib.Path > + output_properties: pathlib.Path > + no_depends: bool > + docs_only: bool > + > + def __get_tags(self) -> List[str]: > + series_filename = f"{self.series['id']}.patch" > + > + # Pull down the patch series as a single file. > + pulldown_result = subprocess.run( > + [ > + "git", > + "pw", > + "--server", # Pass in the pw server we wish to download from. > + self.pw_server, > + "series", > + "download", > + "--combined", # Specifies that we want the series in one patch file. > + str(self.series["id"]), > + series_filename, > + ], > + stdout=subprocess.DEVNULL, > + stderr=subprocess.DEVNULL, > + ) > + > + # Assert that this succeeds. > + pulldown_result.check_returncode() > + > + # Call the patch parser script to obtain the tags > + parse_result = subprocess.run( > + [self.patch_parser_script, self.patch_parser_cfg, series_filename], > + stdout=subprocess.PIPE, > + stderr=subprocess.PIPE, > + ) > + > + # Assert that patch parser succeeded. > + parse_result.check_returncode() > + > + # Return the output > + return parse_result.stdout.decode().splitlines() > + > + def __init__(self): > + parser = argparse.ArgumentParser( > + formatter_class=argparse.RawDescriptionHelpFormatter, > + description="""This script will create an artifact given a URL to a Patchwork series. > +Much of the information provided is acquired by this script through the use of a configuration file. > +This configuration file can be found with the other script configs in the config directory of the CI repo. > + > +The configuration file is used to aggregate: > + - Git credentials > + - Patchwork configuration and the user token (user token is optional) > + - The URL of the DPDK Git mirror > + - Locations of dependency scripts and their configuration files > + > +More detail and examples can be found in the default configuration file. > +This default file is located at "config/artifacts.yml" in the dpdk-ci repository. > +""", > + ) > + parser.add_argument( > + "config", > + type=argparse.FileType(), > + help="The config file to load. Must be a path to a YAML file.", > + ) > + parser.add_argument( > + "series_url", type=str, help="The URL to a Patchwork series." > + ) > + parser.add_argument( > + "-t", > + "--pw-token", > + dest="pw_token", > + type=str, > + help="The Patchwork token", > + ) > + parser.add_argument( > + "-l", > + "--lzma", > + action="store_true", > + help="Use LZMA compression rather than GNU zip compression.", > + ) > + parser.add_argument( > + "-nd", > + "--no-depends", > + action="store_true", > + help="Do not use the Depends-on label.", > + ) > + > + args = parser.parse_args() > + > + # Collect basic arguments. > + self.series_url = args.series_url > + self.no_depends = args.no_depends > + self.lzma = args.lzma > + > + # Read the configuration file. > + with args.config as config_file: > + self.config = yaml.safe_load(config_file) > + > + self.pw_server = self.config["patchwork"]["server"] > + self.pw_project = self.config["patchwork"]["project"] > + > + if args.pw_token: > + self.pw_token = args.pw_token > + else: > + self.pw_token = self.config["patchwork"].get("token") > + > + if not self.pw_token: > + print("Failed to obtain the Patchworks token.") > + exit(1) > + > + self.pw_mcli_script = pathlib.Path( > + self.config["pw_maintainers_cli"]["path"] > + ).absolute() > + self.git_user = self.config["git"]["user"] > + self.git_email = self.config["git"]["email"] > + > + self.patch_parser_script = pathlib.Path( > + self.config["patch_parser"]["path"] > + ).absolute() > + self.patch_parser_cfg = pathlib.Path( > + self.config["patch_parser"]["config"] > + ).absolute() > + > + if self.lzma: > + tarball_name = "dpdk.tar.xz" > + else: > + tarball_name = "dpdk.tar.gz" > + > + self.output_tarball = pathlib.Path(tarball_name) > + self.output_properties = pathlib.Path(f"{tarball_name}.properties") > + > + # Pull the series JSON down. > + resp = requests.get(self.series_url) > + resp.raise_for_status() > + self.series = resp.json() > + > + # Get the labels using the patch parser. > + self.labels = self.__get_tags() > + > + # See if this is a documentation-only patch. > + self.docs_only = len(self.labels) == 1 and self.labels[0] == "documentation" > + > + # Get the patch ids in this patch series. > + self.patch_ids = list(map(lambda x: int(x["id"]), self.series["patches"])) > + self.patch_ids.sort() > + > + > +class ProjectTree(object): > + artifact_path: pathlib.Path > + tree: str > + commit_id: str > + path: pathlib.Path > + log_file_path: pathlib.Path > + props_file_path: pathlib.Path > + data: CreateSeriesParameters > + repo: pygit2.Repository > + log_buf: List[str] > + properties: Dict[str, Any] > + > + def log(self, msg: str): > + print(msg) > + self.log_buf.append(msg) > + > + def write_log(self): > + with open(self.log_file_path, "w") as log_file: > + log_file.write("\n".join([msg for msg in self.log_buf])) > + > + def write_properties(self): > + with open(self.props_file_path, "w") as prop_file: > + for key, value in self.properties.items(): > + prop_file.write(f"{key}={value}\n") > + > + def move_logs(self): > + shutil.move(self.log_file_path, pathlib.Path(os.getcwd(), "log.txt")) > + shutil.move( > + self.props_file_path, pathlib.Path(os.getcwd(), self.data.output_properties) > + ) > + > + def __init__(self, data: CreateSeriesParameters): > + self.data = data > + self.path = pathlib.Path(os.curdir, "dpdk").absolute() > + self.log_buf = [] > + self.log_file_path = pathlib.Path(self.path, "log.txt") > + self.props_file_path = pathlib.Path(self.path, data.output_properties) > + self.tree = "main" > + self.properties = {} > + self.artifact_path = data.output_tarball > + > + # Set the range of patch IDs this series (aka patchset) covers. > + self.properties["patchset_range"] = f"{data.patch_ids[0]}-{data.patch_ids[-1]}" > + > + # Set the tags using tags obtained by the params class > + self.properties["tags"] = " ".join(data.labels) > + > + # Record whether this patch is only documentation > + self.properties["is_docs_only"] = str(data.docs_only) > + > + if not self.path.exists(): > + # Find the URL to clone from based on the tree name. > + repo = self.data.config["repo_url"] > + > + self.log(f"Cloning the DPDK mirror at: {repo}") > + > + # Pull down the git repo we found. > + repo = pygit2.clone_repository(repo, self.path) > + else: > + # Fetch any changes. > + repo = pygit2.Repository(self.path) > + > + self.log(f"Fetching the remote for tree: {self.tree}") > + > + origin: pygit2.Remote = repo.remotes["origin"] > + > + progress = origin.fetch() > + > + self.log( > + f"Received objects: {progress.received_objects} of {progress.total_objects}" > + ) > + > + self.log("Cleaning repository state...") > + > + repo.state_cleanup() > + > + # Initially, check out to main. > + self.repo = repo > + self.checkout("main") > + > + self.log(f"Done: {self.tree} commit {self.commit_id}") > + > + def checkout(self, branch: str) -> Optional[str]: > + """ > + Check out to some branch. > + Returns true if successful, false otherwise. > + """ > + if branch not in self.repo.branches: > + return None > + > + git_branch = self.repo.branches[branch] > + self.log(f"Trying to checkout branch: {git_branch.branch_name}") > + reference = self.repo.resolve_refish(git_branch.branch_name) > + self.commit_id = str(reference[0].id) > + self.repo.checkout(reference[1]) > + self.tree = branch > + > + return branch > + > + def guess_git_tree(self) -> Optional[str]: > + """ > + Run pw_maintainers_cli to guess the git tree of the patch series we are applying. > + Returns None if the pw_maintainers_cli failed. > + """ > + > + if "id" not in self.data.series: > + raise Exception("ID was not found in the series JSON") > + > + result = subprocess.run( > + [ > + self.data.pw_mcli_script, > + "--type", > + "series", > + "--pw-server", > + self.data.pw_server, > + "--pw-project", > + self.data.pw_project, > + "list-trees", > + str(self.data.series["id"]), > + ], > + stdout=subprocess.PIPE, > + cwd=self.path, > + env={ > + "MAINTAINERS_FILE_PATH": "MAINTAINERS", > + "PW_TOKEN": self.data.pw_token, > + }, > + ) > + > + if result.returncode == 0: > + branch = result.stdout.decode().strip() > + > + if branch in ["main", "dpdk"]: > + branch = "main" > + else: > + return None > + > + if branch[0:5] == "dpdk-": > + branch = branch[5 : len(branch)] I think this can be rewritten like: if branch.startswith("dpdk-"): branch = branch[5:] I think there is a conditional strip for that as well, but I can't recall right now. > + > + return self.checkout(branch) > + > + def set_properties(self): > + self.properties["tree"] = self.tree > + self.properties["applied_commit_id"] = self.commit_id > + > + def apply_patch_series(self) -> bool: > + # Run git-pw to apply the series. > + > + # Configure the tree to point at the given patchwork server and project > + self.repo.config["pw.server"] = self.data.pw_server > + self.repo.config["pw.project"] = self.data.pw_project > + self.repo.config["user.email"] = self.data.git_email > + self.repo.config["user.name"] = self.data.git_user > + > + result = subprocess.run( > + ["git", "pw", "series", "apply", str(self.data.series["id"])], > + cwd=self.path, > + stdout=subprocess.PIPE, > + stderr=subprocess.PIPE, > + ) > + > + # Write the log from the apply process to disk. > + self.log(f"Applying patch...") As far as I understand, f-strings are supposed to have parameters but it looks like this isn't a proper f-string (according to my understanding). At least, it should look like: f'{"Applying patch..."}' Something like that. But also, why do we use an f-string here at all? > + self.log(result.stdout.decode()) > + self.log(result.stderr.decode()) > + > + # Store whether there was an error, and return the flag. > + error = result.returncode != 0 > + self.properties["apply_error"] = error > + return not error > + > + def test_build(self) -> bool: > + ninja_result: Optional[subprocess.CompletedProcess] = None > + meson_result: subprocess.CompletedProcess = subprocess.run( > + ["meson", "setup", "build"], > + cwd=self.path, > + stdout=subprocess.PIPE, > + stderr=subprocess.PIPE, > + ) > + > + build_error = meson_result.returncode != 0 > + > + self.log("Running test build...") > + self.log(meson_result.stdout.decode()) > + > + if not build_error: > + ninja_result = subprocess.run( > + ["ninja", "-C", "build"], > + cwd=self.path, > + stdout=subprocess.PIPE, > + stderr=subprocess.PIPE, > + ) > + > + build_error = build_error or ninja_result.returncode != 0 > + shutil.rmtree(pathlib.Path(self.path, "build")) > + > + self.log(ninja_result.stdout.decode()) > + self.log(ninja_result.stderr.decode()) > + > + self.log(meson_result.stderr.decode()) > + > + if build_error: > + self.log("Test build failed.") > + > + self.properties["build_error"] = build_error > + return not build_error > + > + def create_tarball(self): > + # Copy the logs into the artifact tarball. > + self.write_log() > + self.write_properties() > + > + tar_args = ["tar"] > + > + if self.data.lzma: > + tar_args.append("--lzma") > + else: > + tar_args.append("-z") > + > + tar_args.extend(["-cf", self.artifact_path, "-C", self.path, "."]) > + > + result = subprocess.run( > + tar_args, > + stdout=subprocess.DEVNULL, > + stderr=subprocess.DEVNULL, > + ) > + > + if result.returncode != 0: > + return False > + > + print("Successfully created artifact:", self.artifact_path) > + > + # Move the log file out of the working directory. > + self.move_logs() > + > + return True > + > + > +def try_to_apply(tree: ProjectTree) -> bool: > + tree.set_properties() > + return tree.apply_patch_series() and tree.test_build() and tree.create_tarball() > + > + > +def main() -> int: > + data = CreateSeriesParameters() > + > + # Get the main branch. > + # We solve the chicken and egg problem of pw_maintainers_cli needing > + # information in the repository by always pulling down the main tree. > + tree = ProjectTree(data) > + > + # Try to guess the Git tree for this patchset. > + guessed_tree = tree.guess_git_tree() > + > + # Try to apply this patch. > + if not ( > + try_to_apply(tree) # First, try to apply on the guessed tree. > + or guessed_tree != "main" # If that fails, and the guessed tree was not main > + and tree.checkout("main") # Checkout to main, then > + and try_to_apply(tree) # Try to apply on main > + ): > + tree.write_log() > + tree.write_properties() > + tree.move_logs() > + > + print("FAILURE") > + > + return 1 > + > + return 0 > + > + > +if __name__ == "__main__": > + exit(main())