A bit of fuzz testing found some edge cases where this script crashes or
fails to parse the patch file properly. I am currently working on a rewrite
that uses a dedicated library to avoid these and similar issues.

On Fri, Dec 4, 2020 at 2:45 PM Owen Hilyard wrote:

> This commit contains a script, patch_parser.py, and a config file,
> patch_parser.cfg. These are tooling that the UNH CI team has been
> testing in order to reduce the number of tests that need to be run
> per patch. This resulted from our push to increase the number of
> functional tests running in the CI. While working on expanding test
> coverage, we found that DTS could easily take over 6 hours to run, so
> we decided to begin work on tagging patches and then only running the
> required tests.
>
> The script works by taking in a path to the config file and then
> a list of patch files, which it will parse and then produce a list of
> tags for that list of patches based on the config file. The config file
> is designed to work as a mapping from a base path to a set of tags. It
> also contains an ordered list of priorities for tags so that this may
> also be used by hierarchical tools rather than modular ones.
>
> The intention of the UNH team with giving this tooling to the wider
> DPDK community is to have people more familiar with the internal
> functionality of DPDK provide most of the tagging. This would allow
> UNH to have a better turnaround time for testing by eliminating
> unnecessary tests, while still increasing the number of tests in the
> CI.
>
> The different patch tags are currently defined as such:
>
> core:
>     Core DPDK functionality. Examples include kernel modules and
>     librte_eal. This tag should be used sparingly as it is intended
>     to signal to automated test suites that it is necessary to
>     run most of the tests for DPDK and as such will consume CI
>     resources for a long period of time.
>
> driver:
>     For NIC drivers and other hardware interface code. This should be
>     used as a generic tag with each driver getting its own tag.
>
> application:
>     Used in a similar manner to "driver". This tag is intended for
>     code used only in applications that DPDK provides, such as
>     testpmd or helloworld. This tag should be accompanied by a tag
>     which denotes which application specifically has been changed.
>
> documentation:
>     This is intended to be used as a tag for paths which only contain
>     documentation, such as "doc/". Its intended use is as a way to
>     trigger the automatic re-building of the documentation website.
>
> Signed-off-by: Owen Hilyard
> ---
>  config/patch_parser.cfg | 25 ++++++++++++++++
>  tools/patch_parser.py   | 64 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 89 insertions(+)
>  create mode 100644 config/patch_parser.cfg
>  create mode 100755 tools/patch_parser.py
>
> diff --git a/config/patch_parser.cfg b/config/patch_parser.cfg
> new file mode 100644
> index 0000000..5757f9a
> --- /dev/null
> +++ b/config/patch_parser.cfg
> @@ -0,0 +1,25 @@
> +# Description of the categories as initially designed
> +
> +[Paths]
> +drivers =
> +    driver,
> +    core
> +kernel = core
> +doc = documentation
> +lib = core
> +meson_options.txt = core
> +examples = application
> +app = application
> +license = documentation
> +VERSION = documentation
> +build = core
> +
> +# This is an ordered list of the importance of each patch classification.
> +# It should be used to determine which classification to use on tools which
> +# do not support multiple patch classifications.
> +[Priority]
> +priority_list =
> +    core,
> +    driver,
> +    application,
> +    documentation
> diff --git a/tools/patch_parser.py b/tools/patch_parser.py
> new file mode 100755
> index 0000000..01fc55d
> --- /dev/null
> +++ b/tools/patch_parser.py
> @@ -0,0 +1,64 @@
> +#!/usr/bin/env python3
> +
> +import itertools
> +import sys
> +from configparser import ConfigParser
> +from typing import List, Dict, Set
> +
> +
> +def get_patch_files(patch_file: str) -> List[str]:
> +    with open(patch_file, 'r') as f:
> +        lines = list(itertools.takewhile(
> +            lambda line: line.strip().endswith('+') or line.strip().endswith('-'),
> +            itertools.dropwhile(
> +                lambda line: not line.strip().startswith("---"),
> +                f.readlines()
> +            )
> +        ))
> +        filenames = map(lambda line: line.strip().split(' ')[0], lines)
> +        # takewhile includes the --- which starts the filenames
> +        return list(filenames)[1:]
> +
> +
> +def get_all_files_from_patches(patch_files: List[str]) -> Set[str]:
> +    return set(itertools.chain.from_iterable(map(get_patch_files, patch_files)))
> +
> +
> +def parse_comma_delimited_list_from_string(mod_str: str) -> List[str]:
> +    return list(map(str.strip, mod_str.split(',')))
> +
> +
> +def get_dictionary_attributes_from_config_file(conf_obj: ConfigParser) -> Dict[str, Set[str]]:
> +    return {
> +        directory: parse_comma_delimited_list_from_string(module_string) for directory, module_string in
> +        conf_obj['Paths'].items()
> +    }
> +
> +
> +def get_tags_for_patch_file(patch_file: str, dir_attrs: Dict[str, Set[str]]) -> Set[str]:
> +    return set(itertools.chain.from_iterable(
> +        tags for directory, tags in dir_attrs.items() if patch_file.startswith(directory)
> +    ))
> +
> +
> +def get_tags_for_patches(patch_files: Set[str], dir_attrs: Dict[str, Set[str]]) -> Set[str]:
> +    return set(itertools.chain.from_iterable(
> +        map(lambda patch_file: get_tags_for_patch_file(patch_file, dir_attrs), patch_files)
> +    ))
> +
> +
> +if len(sys.argv) < 3:
> +    print("usage: patch_parser.py file>...")
> +    exit(1)
> +
> +conf_obj = ConfigParser()
> +conf_obj.read(sys.argv[1])
> +
> +patch_files = get_all_files_from_patches(sys.argv[2:])
> +dir_attrs = get_dictionary_attributes_from_config_file(conf_obj)
> +priority_list = parse_comma_delimited_list_from_string(conf_obj['Priority']['priority_list'])
> +
> +unordered_tags: Set[str] = get_tags_for_patches(patch_files, dir_attrs)
> +ordered_tags: List[str] = [tag for tag in priority_list if tag in unordered_tags]
> +
> +print("\n".join(ordered_tags))
> --
> 2.27.0
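
For illustration only, the path-to-tag mapping described in the commit
message can be sketched with just the Python standard library. This is
neither the script from the patch nor the planned rewrite. It assumes
git-style "--- a/" and "+++ b/" file headers are present in the patch,
and every function and constant name below is hypothetical.

```python
import re
from configparser import ConfigParser
from typing import Dict, Iterable, List, Set, Tuple

# Matches the "--- a/<path>" and "+++ b/<path>" headers of a git-style diff.
# "/dev/null" (used for created or deleted files) is deliberately not matched.
FILE_HEADER = re.compile(r"^(?:\+\+\+ b|--- a)/(.+?)\s*$")


def split_list(value: str) -> List[str]:
    """Split a comma-delimited (possibly multi-line) config value."""
    return [item.strip() for item in value.split(",") if item.strip()]


def changed_paths(patch_text: str) -> Set[str]:
    """Collect file paths from the diff headers rather than the diffstat."""
    paths = set()
    for line in patch_text.splitlines():
        match = FILE_HEADER.match(line)
        if match:
            paths.add(match.group(1))
    return paths


def load_rules(config_text: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Read the [Paths] mapping and the [Priority] ordering."""
    parser = ConfigParser()
    # ConfigParser lowercases keys by default; keep "VERSION" as written.
    parser.optionxform = str
    parser.read_string(config_text)
    rules = {base: split_list(tags) for base, tags in parser["Paths"].items()}
    priority = split_list(parser["Priority"]["priority_list"])
    return rules, priority


def tags_for_path(path: str, rules: Dict[str, List[str]]) -> Set[str]:
    """Return the tags of every base path that is the path or a parent of it."""
    tags: Set[str] = set()
    for base, base_tags in rules.items():
        if path == base or path.startswith(base.rstrip("/") + "/"):
            tags.update(base_tags)
    return tags


def ordered_tags(paths: Iterable[str], rules: Dict[str, List[str]],
                 priority: List[str]) -> List[str]:
    """Union the tags of all paths and sort them by the priority list."""
    found: Set[str] = set()
    for path in paths:
        found |= tags_for_path(path, rules)
    return [tag for tag in priority if tag in found]


EXAMPLE_CONFIG = """
[Paths]
drivers =
    driver,
    core
doc = documentation
VERSION = documentation

[Priority]
priority_list =
    core,
    driver,
    application,
    documentation
"""

EXAMPLE_PATCH = """\
--- a/doc/guides/index.rst
+++ b/doc/guides/index.rst
@@ -1 +1 @@
-old
+new
"""

if __name__ == "__main__":
    rules, priority = load_rules(EXAMPLE_CONFIG)
    print("\n".join(ordered_tags(changed_paths(EXAMPLE_PATCH), rules, priority)))
```

Two choices in this sketch differ from the quoted script. Matching is
done on whole path components, so a base path of "doc" does not also
match a hypothetical "docs/" directory, and option names are kept
case-sensitive so that an entry such as "VERSION" can match the file
name as it appears in a patch. A removed line whose content itself
begins with "-- a/" would still be misread as a header, so this is not
a robust patch parser.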