From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-ot1-f67.google.com (mail-ot1-f67.google.com [209.85.210.67]) by dpdk.org (Postfix) with ESMTP id B6DFB1B4CF for ; Fri, 15 Feb 2019 22:29:08 +0100 (CET) Received: by mail-ot1-f67.google.com with SMTP id m1so18977839otf.5 for ; Fri, 15 Feb 2019 13:29:08 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=iol.unh.edu; s=unh-iol; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Mc5oEl7rlPyAhFDfnHJOz4bDRK6LdsiJhQqxQsxzgdo=; b=FxXQNIqBYx6zdTiQ6I4MiXdvc/ep+XOuCH93HZRpKQiVAInTEVJvRrb8jaTlVGUhh7 wd1+8fkQVH9KFCVbv0Qb1h1tbc89hb7HVWiDFttHK9Yh/Mp2KgcapF9cRC2igu4gBzkJ TudnXMxRjmyh0Go8/0eoS3sFviQvuHOi+Ce60= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Mc5oEl7rlPyAhFDfnHJOz4bDRK6LdsiJhQqxQsxzgdo=; b=Ni+CfkM1TgBthaWbPPljxTHjLqqY0QshztEyY4lKAj1+aEUAeAYr+RGQ4HUVY8BCf8 mualmyrFD+smdPJR6IesGK1nT06LESz/q8wbmw7yZ70O2WjJeIK+pG1lN82VvJiKOc8K RLir5demPjTPZaNQfZzD4aSLmlm0SVKsuinUhL9UJZgD3ITkuhNLAWjMQSWlN/j0U9QI hSAbOziEiiq3aJ1KEOASUIm3JgmCMRWVVWdx2X8Th3vAUx5fbHpNCbVUO5imMbUAOZQF Bx1JZN3gq1hTEu+BrqXpMkCVxi+b2dhcVRh5OKuzcT6pX+icQYDqHzuyGLTjLFN1DT4O Cm+g== X-Gm-Message-State: AHQUAuZQRQNT3VeNnmwUhTSmLJ6XARRTMU+okQndKYU63z0eDWfO8nco 0NuNpsD2T1/pkaaQQwUNmQC97ut5dcnMk1FfIPlsbA== X-Google-Smtp-Source: AHgI3IbWDwdHk3yCke/2jmfW1xlxwDLiHaw4Snbvdsf1ldHF1mgGwB3h/nGtH56Q6gkYzX4AXaGWkRaw7NSJ9UCY+8w= X-Received: by 2002:a9d:730b:: with SMTP id e11mr22924otk.277.1550266147853; Fri, 15 Feb 2019 13:29:07 -0800 (PST) MIME-Version: 1.0 References: <20190212144828.18122-1-alialnu@mellanox.com> In-Reply-To: <20190212144828.18122-1-alialnu@mellanox.com> From: Jeremy Plsek Date: Fri, 15 Feb 2019 16:28:31 -0500 Message-ID: To: Ali Alnubani Cc: "ci@dpdk.org" , Thomas Monjalon , "ferruh.yigit@intel.com" , Ori Kam Content-Type: text/plain; charset="UTF-8" Subject: Re: [dpdk-ci] [PATCH v2] add script to decide best tree match for patches X-BeenThere: ci@dpdk.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: DPDK CI discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Fri, 15 Feb 2019 21:29:09 -0000 Thanks for the script! I tested it locally and seems to work. Review inline. Extra comments at the end. On Tue, Feb 12, 2019 at 9:48 AM Ali Alnubani wrote: > > The script can be used to get the trees that best match > a patch or a series. > > Signed-off-by: Ali Alnubani > Signed-off-by: Ori Kam > --- > Changes in v2: > - Renamed script. > - Updated license. > > tools/guess-git-tree.py | 244 ++++++++++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 244 insertions(+) > create mode 100755 tools/guess-git-tree.py > > diff --git a/tools/guess-git-tree.py b/tools/guess-git-tree.py > new file mode 100755 > index 0000000..f4caef5 > --- /dev/null > +++ b/tools/guess-git-tree.py > @@ -0,0 +1,244 @@ > +#!/usr/bin/env python > + > +# SPDX-License-Identifier: (BSD-3-Clause AND GPL-2.0-or-later AND MIT) > +# Copyright 2019 Mellanox Technologies, Ltd > + > +import os > +import sys > +import re > +import argparse > +import copy > +import fnmatch > + > +from requests.exceptions import HTTPError > + > +from git_pw import config > +from git_pw import api > +from git_pw import utils > + > +""" > +This script uses the git-pw API to retrieve Patchwork's series/patches, > +and find a list of trees/repos that best match the series/patch. > + > +The rules on which matches are based, are taken from the MAINTAINERS file, > +and currently only based on the paths of the changed files. Results can be > +improved by adding more information to the MAINTAINERS file. > + > +TODO: > + - Match using the subject of the patch/series. > + - Add a configuration file to specify the priority of each tree. > + > +Configurations: > +The script uses tokens for authentication. > +If the arguments pw_{server,project,token} aren't passed, the environment > +variables PW_{SERVER,PROJECT,TOKEN} should be set. If not, the script will try > +to load the git configurations pw.{server,project,token}. > + > +Example usage: > + ./guess-git-tree.py --command list_trees_for_series 2054 > + ./guess-git-tree.py --command list_trees_for_patch 2054 > + > +The output will be a list of trees sorted based on number of matches, > +with the first line having the highest count. > +""" > + > +CONF = config.CONF > +CONF.debug = False > + > +MAINTAINERS_FILE_PATH = os.environ.get('MAINTAINERS_FILE_PATH') > +if not MAINTAINERS_FILE_PATH: > + print('MAINTAINERS_FILE_PATH is not set.') > + sys.exit(1) > +RULES = {} > + > +ignored_files_re = re.compile(r'^doc/|\.sh$|\.py$') > + > +def configure_git_pw(args=None): > + """Configure git-pw.""" > + conf = {} > + conf_keys = ['server', 'project', 'token'] See comment in the argument parser section. > + for key in conf_keys: > + value = getattr(args, 'pw_{}'.format(key)) There are two spaces between = and getattr, > + if not value: > + print('--{} is a required git-pw configuration'.format(arg)) I think this was meant to be '--pw_{}' instead of '--{}'. > + sys.exit(1) > + else: > + setattr(CONF, key, value) > + > +def find_filenames(diff): > + """Find file changes in a given diff. > + > + Source: https://github.com/getpatchwork/patchwork/blob/master/patchwork/parser.py > + Changes from source: > + - Moved _filename_re into the method. > + - Reduced newlines. > + """ > + _filename_re = re.compile(r'^(---|\+\+\+) (\S+)') > + # normalise spaces > + diff = diff.replace('\r', '') > + diff = diff.strip() + '\n' > + filenames = {} > + for line in diff.split('\n'): > + if len(line) <= 0: > + continue > + filename_match = _filename_re.match(line) > + if not filename_match: > + continue > + filename = filename_match.group(2) > + if filename.startswith('/dev/null'): > + continue > + filename = '/'.join(filename.split('/')[1:]) > + filenames[filename] = True > + filenames = sorted(filenames.keys()) > + return filenames > + > +def construct_rules(): > + """Build a dictionary of rules from the MAINTAINERS file.""" > + with open(MAINTAINERS_FILE_PATH) as fd: > + maintainers = fd.read() > + # Split into blocks of text for easier search. > + maintainers = maintainers.split('\n\n') > + > + # Extract blocks that have a tree and files. > + tree_file_blocks = [_item for _item in maintainers \ > + if 'T: git://dpdk.org' in _item and 'F: ' in _item] > + _dict = {} > + for _item in tree_file_blocks: > + # Get the tree url. > + tree_match = re.search(r'T: (git://dpdk\.org[^\n]+)', _item) > + if tree_match: > + tree = tree_match.group(1) > + else: > + continue > + if tree not in _dict: > + _dict[tree] = {} > + _dict[tree]['paths'] = [] > + paths = re.findall(r'F: ([^\n]+)', _item) > + _paths = copy.deepcopy(paths) > + for path in paths: > + # Remove don't-care paths > + if ignored_files_re.search(path): > + _paths.remove(path) > + _dict[tree]['paths'] += _paths > + return _dict > + > +def get_subject(resource): > + """Get subject from patch/series object, > + remove its prefix and strip it. > + """ > + name = resource['name'] > + return re.sub('^\[.*\]', '', name).strip() > + > +def find_matches(files): > + """Find trees that the changed files in a patch match, > + and stop at first match for each file.""" > + matches = [] > + for _file in files: > + if ignored_files_re.search(_file): > + continue > + match_found = False > + for tree in RULES.keys(): > + for rule in RULES[tree]['paths']: > + if rule.endswith('/'): > + rule = '{}*'.format(rule) > + if fnmatch.fnmatch(_file, rule): > + matches.append(tree) > + match_found = True > + break > + if match_found: > + break > + return matches > + > +def get_ordered_matches(matches): > + """Order matches by occurrences.""" > + match_counts = {item:matches.count(item) for item in matches} > + return sorted(match_counts, key=match_counts.get, reverse=True) > + > +def list_trees_for_patch(patch): > + """Find matching trees for a specific patch. > + For a patch to match a tree, both its subject and > + at least one changed path has to match the tree. > + """ > + subject = get_subject(patch) subject is not used and can be removed. > + files = find_filenames(patch['diff']) > + > + matches = find_matches(files) > + return matches > + > +def list_trees_for_series(series): > + """Find matching trees for a series.""" > + patch_list = series['patches'] > + > + matches = [] > + > + for patch in patch_list: > + matches = matches + \ > + list_trees_for_patch(api_get('patches', patch['id'])) > + > + return matches > + > +def parse_args(): > + """Parse command-line arguments.""" > + parser = argparse.ArgumentParser() > + git_pw_conf_parser = parser.add_argument_group('git-pw configurations') > + options_parser = parser.add_argument_group('optional arguments') > + > + options_parser.add_argument('--command', > + choices=('list_trees_for_patch', > + 'list_trees_for_series'), > + required=True, help='command to perform on patch/series') > + > + git_pw_conf_parser.add_argument('--pw_server', type=str, > + default=os.environ.get('PW_SERVER', utils.git_config('pw.server')), > + help='PW.SERVER') > + git_pw_conf_parser.add_argument('--pw_project', type=str, > + default=os.environ.get('PW_PROJECT', utils.git_config('pw.project')), > + help='PW.PROJECT') > + git_pw_conf_parser.add_argument('--pw_token', type=str, > + default=os.environ.get('PW_TOKEN', utils.git_config('pw.token')), > + help='PW.TOKEN') Depending on the action taken from https://github.com/getpatchwork/git-pw/pull/37 this argument may be removed, since the API calls don't require authentication. Then "token" can be removed from conf_keys. You'll also probably need to add and set CONF.token/username/password to None, similar to the CONF.debug = False line. But since authentication in the python library is required now, this part can be done later. > + > + parser.add_argument('id', type=int, > + help='patch/series id') > + > + args = parser.parse_args() > + > + return args > + > +def main(): > + """Main procedure.""" > + args = parse_args() > + configure_git_pw(args) > + > + command = args.command > + _id = args.id > + > + global RULES > + RULES = construct_rules() > + > + tree_list = [] > + > + if command == 'list_trees_for_patch': > + patch = api_get('patches', _id) > + tree_list = list_trees_for_patch(patch) > + > + elif command == 'list_trees_for_series': > + series = api_get('series', _id) > + tree_list = list_trees_for_series(series) > + > + tree_list = get_ordered_matches(tree_list) > + > + print('{}'.format('\n'.join(tree_list))) > + > +def api_get(resource_type, resource_id): > + """Retrieve an API resource.""" > + try: > + return api.detail(resource_type, resource_id) > + except HTTPError as err: > + if '404' in str(err): > + sys.exit(1) > + else: > + raise > + > +if __name__ == '__main__': > + main() > -- > 2.11.0 > There should also be two blank lines above/under method definitions outside of classes to follow pep8 more closely. We have a python script that we use to apply patches. With this current script, it's a little bit awkward to use inside our apply script. Right now it would look something like this inside of our apply script: (ggt = guess-git-tree) ggt.configure_git_pw(args) ggt.RULES = construct_rules() series = ggt.api_get('series', id) tree_list = list_trees_for_series(series) tree_list = get_ordered_matches(tree_list) # now attempt to apply patch to each tree until successful I'd rather see this implemented in a class, so that things like the construct_rules() and configure_git_pw() happens in the constructor, then I'd like to call something like get_ordered_tree_for_series(id) to get the list. The constructor arguments would probably contain the server address and project. (Or you could make the server address also an environment variable like PATCHWORK_API_URL = os.getenv("PATCHWORK_API_URL", "http://patches.dpdk.org/api/1.0/"), which is what we do in our scripts.) As a note, this my first in-email review. If I was supposed to do something different, please let me know! Thanks -- Jeremy Plsek UNH InterOperability Laboratory