From mboxrd@z Thu Jan 1 00:00:00 1970
MIME-Version: 1.0
References: <20201204194512.14666-1-ohilyard@iol.unh.edu>
In-Reply-To: <20201204194512.14666-1-ohilyard@iol.unh.edu>
From: Owen Hilyard
Date: Thu, 14 Jan 2021 11:53:04 -0500
To: ci@dpdk.org
Cc: Thomas Monjalon
Content-Type: multipart/alternative; boundary="000000000000fa504a05b8df18e8"
Subject: Re: [dpdk-ci] [PATCH] ci: added patch parser for patch files
X-BeenThere: ci@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK CI discussions
Errors-To: ci-bounces@dpdk.org
Sender: "ci"

--000000000000fa504a05b8df18e8
Content-Type: text/plain; charset="UTF-8"

A bit of fuzz testing found some edge cases where this script crashes or
fails to properly parse the patch file. I am currently working on a rewrite
using a dedicated library to avoid these and similar issues.

On Fri, Dec 4, 2020 at 2:45 PM Owen Hilyard <ohilyard@iol.unh.edu> wrote:

> This commit contains a script, patch_parser.py, and a config file,
> patch_parser.cfg. These are tooling that the UNH CI team has been
> testing in order to reduce the number of tests that need to be run
> per patch. This resulted from our push to increase the number of
> functional tests running in the CI. While working on expanding test
> coverage, we found that DTS could easily take over 6 hours to run, so
> we decided to begin work on tagging patches and then only running the
> required tests.
>
> The script works by taking in an address for the config file and then
> a list of patch files, which it will parse and then produce a list of
> tags for that list of patches based on the config file. The config file
> is designed to work as a mapping for a base path to a set of tags. It
> also contains an ordered list of priorities for tags so that this may
> also be used by hierarchical tools rather than modular ones.
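
As an illustration of the mapping and priority list described above, here
is a minimal sketch. It is not the code from the patch. The names
CONFIG_TEXT, split_tags, load_config and tags_for_paths are hypothetical,
and the config is a trimmed inline copy of patch_parser.cfg. Two choices
are assumptions rather than behaviour of the submitted script: it matches
on the first path component instead of str.startswith, so that a "build"
entry does not also match "buildtools/", and it sets optionxform to str
because ConfigParser lowercases option names by default, which would
otherwise turn the VERSION key into "version".

```python
from configparser import ConfigParser
from typing import Dict, Iterable, List, Set, Tuple

# Trimmed inline copy of config/patch_parser.cfg, for illustration only.
CONFIG_TEXT = """
[Paths]
drivers =
    driver,
    core
doc = documentation
app = application
VERSION = documentation
build = core

[Priority]
priority_list =
    core,
    driver,
    application,
    documentation
"""


def split_tags(value: str) -> List[str]:
    """Split a comma-delimited config value, dropping empty entries."""
    return [tag.strip() for tag in value.split(",") if tag.strip()]


def load_config(text: str) -> Tuple[Dict[str, List[str]], List[str]]:
    """Return (base path -> tags, ordered priority list) from config text."""
    parser = ConfigParser()
    # Assumption: keep the case of keys so that VERSION stays VERSION.
    parser.optionxform = str
    parser.read_string(text)
    paths = {base: split_tags(value) for base, value in parser["Paths"].items()}
    priority = split_tags(parser["Priority"]["priority_list"])
    return paths, priority


def tags_for_paths(
    changed: Iterable[str],
    paths: Dict[str, List[str]],
    priority: List[str],
) -> List[str]:
    """Collect the tags for all changed files, ordered by priority."""
    found: Set[str] = set()
    for path in changed:
        # Assumption: match on the first path component only.
        top = path.split("/", 1)[0]
        found.update(paths.get(top, []))
    return [tag for tag in priority if tag in found]


if __name__ == "__main__":
    paths, priority = load_config(CONFIG_TEXT)
    changed = ["doc/guides/index.rst", "app/test-pmd/cmdline.c"]
    print("\n".join(tags_for_paths(changed, paths, priority)))
```

A tool that supports only one classification per patch could take the
first element of the returned list, since the list follows the order of
priority_list.
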
>
> The intention of the UNH team with giving this tooling to the wider
> DPDK community is to have people more familiar with the internal
> functionality of DPDK provide most of the tagging. This would allow
> UNH to have a better turnaround time for testing by eliminating
> unnecessary tests, while still increasing the number of tests in the
> CI.
>
> The different patch tags are currently defined as such:
>
> core:
>     Core DPDK functionality. Examples include kernel modules and
>     librte_eal. This tag should be used sparingly as it is intended
>     to signal to automated test suites that it is necessary to
>     run most of the tests for DPDK and as such will consume CI
>     resources for a long period of time.
>
> driver:
>     For NIC drivers and other hardware interface code. This should be
>     used as a generic tag with each driver getting its own tag.
>
> application:
>     Used in a similar manner to "driver". This tag is intended for
>     code used only in applications that DPDK provides, such as
>     testpmd or helloworld. This tag should be accompanied by a tag
>     which denotes which application specifically has been changed.
>
> documentation:
>     This is intended to be used as a tag for paths which only contain
>     documentation, such as "doc/". Its intended use is as a way to
>     trigger the automatic re-building of the documentation website.
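
The definitions above say that "driver" and "application" are generic tags
to be paired with a tag naming the specific driver or application. The
patch does not define how those specific tags are named or derived, so the
following is only a sketch under stated assumptions: the function name
specific_tags, the "driver:<class>/<name>" and "application:<name>" tag
formats, and the idea of reading them from path components are all
hypothetical.

```python
from typing import Set


def specific_tags(path: str) -> Set[str]:
    """Return the generic tag plus a hypothetical per-component tag."""
    parts = path.split("/")
    tags: Set[str] = set()
    if parts[0] == "drivers":
        tags.add("driver")
        # Assumed layout: drivers/<class>/<name>/<file>
        if len(parts) >= 4:
            tags.add(f"driver:{parts[1]}/{parts[2]}")
    elif parts[0] in ("app", "examples"):
        tags.add("application")
        # Assumed layout: app/<name>/<file> or examples/<name>/<file>
        if len(parts) >= 3:
            tags.add(f"application:{parts[1]}")
    return tags


if __name__ == "__main__":
    for changed in ("drivers/net/ixgbe/ixgbe_ethdev.c", "examples/helloworld/main.c"):
        print(changed, sorted(specific_tags(changed)))
```

Files directly under drivers/ or app/ get only the generic tag here,
because no specific component can be read from the path.
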
>
> Signed-off-by: Owen Hilyard <ohilyard@iol.unh.edu>
> ---
>  config/patch_parser.cfg | 25 ++++++++++++++++
>  tools/patch_parser.py   | 64 +++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 89 insertions(+)
>  create mode 100644 config/patch_parser.cfg
>  create mode 100755 tools/patch_parser.py
>
> diff --git a/config/patch_parser.cfg b/config/patch_parser.cfg
> new file mode 100644
> index 0000000..5757f9a
> --- /dev/null
> +++ b/config/patch_parser.cfg
> @@ -0,0 +1,25 @@
> +# Description of the categories as initially designed
> +
> +[Paths]
> +drivers =
> +    driver,
> +    core
> +kernel = core
> +doc = documentation
> +lib = core
> +meson_options.txt = core
> +examples = application
> +app = application
> +license = documentation
> +VERSION = documentation
> +build = core
> +
> +# This is an ordered list of the importance of each patch classification.
> +# It should be used to determine which classification to use on tools which
> +# do not support multiple patch classifications.
> +[Priority]
> +priority_list =
> +    core,
> +    driver,
> +    application,
> +    documentation
> diff --git a/tools/patch_parser.py b/tools/patch_parser.py
> new file mode 100755
> index 0000000..01fc55d
> --- /dev/null
> +++ b/tools/patch_parser.py
> @@ -0,0 +1,64 @@
> +#!/usr/bin/env python3
> +
> +import itertools
> +import sys
> +from configparser import ConfigParser
> +from typing import List, Dict, Set
> +
> +
> +def get_patch_files(patch_file: str) -> List[str]:
> +    with open(patch_file, 'r') as f:
> +        lines = list(itertools.takewhile(
> +            lambda line: line.strip().endswith('+') or line.strip().endswith('-'),
> +            itertools.dropwhile(
> +                lambda line: not line.strip().startswith("---"),
> +                f.readlines()
> +            )
> +        ))
> +        filenames = map(lambda line: line.strip().split(' ')[0], lines)
> +        # takewhile includes the --- which starts the filenames
> +        return list(filenames)[1:]
> +
> +
> +def get_all_files_from_patches(patch_files: List[str]) -> Set[str]:
> +    return set(itertools.chain.from_iterable(map(get_patch_files, patch_files)))
> +
> +
> +def parse_comma_delimited_list_from_string(mod_str: str) -> List[str]:
> +    return list(map(str.strip, mod_str.split(',')))
> +
> +
> +def get_dictionary_attributes_from_config_file(conf_obj: ConfigParser) -> Dict[str, Set[str]]:
> +    return {
> +        directory: parse_comma_delimited_list_from_string(module_string) for directory, module_string in
> +        conf_obj['Paths'].items()
> +    }
> +
> +
> +def get_tags_for_patch_file(patch_file: str, dir_attrs: Dict[str, Set[str]]) -> Set[str]:
> +    return set(itertools.chain.from_iterable(
> +        tags for directory, tags in dir_attrs.items() if patch_file.startswith(directory)
> +    ))
> +
> +
> +def get_tags_for_patches(patch_files: Set[str], dir_attrs: Dict[str, Set[str]]) -> Set[str]:
> +    return set(itertools.chain.from_iterable(
> +        map(lambda patch_file: get_tags_for_patch_file(patch_file, dir_attrs), patch_files)
> +    ))
> +
> +
> +if len(sys.argv) < 3:
> +    print("usage: patch_parser.py <path to patch_parser.cfg> <patch file>...")
> +    exit(1)
> +
> +conf_obj = ConfigParser()
> +conf_obj.read(sys.argv[1])
> +
> +patch_files = get_all_files_from_patches(sys.argv[2:])
> +dir_attrs = get_dictionary_attributes_from_config_file(conf_obj)
> +priority_list = parse_comma_delimited_list_from_string(conf_obj['Priority']['priority_list'])
> +
> +unordered_tags: Set[str] = get_tags_for_patches(patch_files, dir_attrs)
> +ordered_tags: List[str] = [tag for tag in priority_list if tag in unordered_tags]
> +
> +print("\n".join(ordered_tags))
> --
> 2.27.0
>

--000000000000fa504a05b8df18e8
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

--000000000000fa504a05b8df18e8--
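
The reply above does not say which inputs made the script crash or
misparse, and it does not name the dedicated library planned for the
rewrite. The quoted script reads file names from the diffstat block that
follows the first "---" line, and that block is fragile in known ways: git
abbreviates long paths there, and entries for binary files do not end in
"+" or "-". As a hedged illustration only, and not the planned rewrite,
the sketch below reads the "diff --git" headers instead, using only the
standard library. The name changed_files is hypothetical, and the regex
does not handle quoted paths or paths that themselves contain " b/".

```python
import re
from typing import List

# Header git writes before each file's hunks, for example:
#   diff --git a/tools/patch_parser.py b/tools/patch_parser.py
_DIFF_HEADER = re.compile(r"^diff --git a/(?P<old>.+) b/(?P<new>.+)$")


def changed_files(patch_text: str) -> List[str]:
    """Return the post-image paths named in the diff headers, in order."""
    files: List[str] = []
    for line in patch_text.splitlines():
        match = _DIFF_HEADER.match(line)
        if match and match.group("new") not in files:
            files.append(match.group("new"))
    return files


if __name__ == "__main__":
    sample = (
        "Subject: [PATCH] example\n"
        "---\n"
        " lib/a.c | 2 +-\n"
        "\n"
        "diff --git a/lib/a.c b/lib/a.c\n"
        "--- a/lib/a.c\n"
        "+++ b/lib/a.c\n"
        "@@ -1 +1 @@\n"
        "-x\n"
        "+y\n"
    )
    print(changed_files(sample))
```

Because every content line inside a hunk starts with "+", "-" or a space,
a header match at column 0 cannot come from the changed code itself,
although a commit message that quotes a "diff --git" line would still be
picked up by this sketch.
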