[PATCH v9 01/23] devtools: add script to check for non inclusive naming

From: Stephen Hemminger <stephen@networkplumber.org>
To: dev@dpdk.org
Cc: Stephen Hemminger <stephen@networkplumber.org>,
	Thomas Monjalon <thomas@monjalon.net>
Subject: [PATCH v9 01/23] devtools: add script to check for non inclusive naming
Date: Mon,  5 Feb 2024 09:43:29 -0800
Message-ID: <20240205180328.131019-2-stephen@networkplumber.org>
In-Reply-To: <20240205180328.131019-1-stephen@networkplumber.org>

Add a new script to find words that should not be used.
It is a wrapper around the git grep command.
By default it prints the matching lines, but it can also display
per-file match counts or only the names of matching files.
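
As a rough sketch of the wrapper idea (not the script's exact code), the
command line can be assembled as a list and handed to git grep. The helper
name build_git_grep_cmd and its parameters are made up for illustration;
the -i, -c, -l and -e options and the ':^path' exclude pathspec are
standard git features.

```python
def build_git_grep_cmd(words, excludes=(), paths=(),
                       count=False, names_only=False):
    """Return an argv list for a case-insensitive git grep over `words`.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ['git', 'grep', '-i']
    if names_only:
        cmd.append('-l')        # print only the names of matching files
    if count:
        cmd.append('-c')        # print the number of matching lines per file
    for word in words:
        cmd += ['-e', word]     # one -e per pattern; git ORs them together
    cmd.append('--')            # end of options, pathspecs follow
    cmd += [f':^{path}' for path in excludes]   # exclude pathspecs
    cmd += list(paths)
    return cmd
```

The resulting list could then be passed to subprocess.run(), which avoids
any shell quoting of the patterns.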

It uses the word lists from the Inclusive Naming Initiative;
see https://inclusivenaming.org/word-lists/

Note: the JSON list has an extra comma at the end of its list of elements.
This is not valid in standard JSON but is allowed in the more permissive
JSON5 format (https://json5.org/). To handle this, the tool uses the json5
package from PyPI to parse the list.
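
To illustrate the trailing-comma problem, here is a small sketch that uses
only Python's standard json module. The sample text is invented to mimic
the shape of the word list and is not the real data, and the helper name
strict_json_rejects is hypothetical.

```python
import json

# Invented sample with a trailing comma after the last member and element.
SAMPLE = '{"data": [{"term": "abort", "tier": "1",},]}'


def strict_json_rejects(text):
    """Return True if the standard json module refuses to parse `text`."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return True
    return False


if __name__ == '__main__':
    print(strict_json_rejects(SAMPLE))
```

A JSON5 parser accepts such trailing commas, which is why the script
depends on one instead of the standard module.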

Examples:
$ ./devtools/check-inclusive-naming.py -c | head -5
app/test/test_common.c:1
app/test/test_eal_flags.c:8
app/test/test_hash.c:1
app/test/test_hash_readwrite_lf_perf.c:1
app/test/test_link_bonding_mode4.c:1

$ ./devtools/check-inclusive-naming.py lib/pcapng
lib/pcapng/rte_pcapng.c:		/* sanity check that is really a pcapng mbuf */

$ ./devtools/check-inclusive-naming.py -l lib/eal
lib/eal/common/eal_common_memory.c
lib/eal/common/eal_common_proc.c
lib/eal/common/eal_common_trace.c
lib/eal/common/eal_memcfg.h
lib/eal/common/rte_malloc.c
lib/eal/freebsd/eal.c
lib/eal/linux/eal.c
lib/eal/windows/eal.c
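
The lib/pcapng hit above is on a two-word phrase. A minimal sketch of how a
hyphenated term can be turned into a pattern that also matches the spaced
form is shown below. The term 'sanity-check' is an assumed example entry,
and term_to_pattern and line_matches are hypothetical names; the sketch
uses Python's re module rather than git grep, and it does not escape other
regex metacharacters.

```python
import re


def term_to_pattern(term):
    """Lower-case a term and let each hyphen match a hyphen or a space."""
    return term.lower().replace('-', '[- ]')


def line_matches(term, line):
    """Case-insensitive search, similar in spirit to git grep -i -e."""
    return re.search(term_to_pattern(term), line, re.IGNORECASE) is not None
```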

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 MAINTAINERS                        |   1 +
 devtools/check-inclusive-naming.py | 135 +++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)
 create mode 100755 devtools/check-inclusive-naming.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 5fb3a73f840e..dbf7ea2d916d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -88,6 +88,7 @@ F: devtools/check-doc-vs-code.sh
 F: devtools/check-dup-includes.sh
 F: devtools/check-maintainers.sh
 F: devtools/check-forbidden-tokens.awk
+F: devtools/check-inclusive-naming.py
 F: devtools/check-git-log.sh
 F: devtools/check-spdx-tag.sh
 F: devtools/check-symbol-change.sh
diff --git a/devtools/check-inclusive-naming.py b/devtools/check-inclusive-naming.py
new file mode 100755
index 000000000000..e8989c3c9b79
--- /dev/null
+++ b/devtools/check-inclusive-naming.py
@@ -0,0 +1,135 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2023 Stephen Hemminger
+#
+# This script scans the source tree and reports lines (or files)
+# containing words that are recommended to be avoided by the
+# Inclusive Naming Initiative.
+# See: https://inclusivenaming.org/word-lists/
+
+import argparse
+import subprocess
+from urllib.request import urlopen
+
+# Need JSON5 to be able to handle extra comma
+import json5
+
+DEFAULT_URL = 'https://inclusivenaming.org/word-lists/index.json'
+
+# These give false positives
+skip_files = [
+    'doc/guides/rel_notes/', 'doc/guides/contributing/coding_style.rst',
+    'doc/guides/prog_guide/glossary.rst'
+]
+
+# These are allowed for now
+allow_words = ['abort']
+
+
+def args_parse():
+    "parse arguments and return the argument object back to main"
+
+    parser = argparse.ArgumentParser(
+        description="Identify word usage not aligned with inclusive naming")
+    parser.add_argument('-c',
+                        '--count',
+                        help="Show the number of lines that match",
+                        action='store_true')
+    parser.add_argument('-d',
+                        '--debug',
+                        default=False,
+                        help="Debug this script",
+                        action='store_true')
+    parser.add_argument('-l',
+                        '--files-with-matches',
+                        help="Show only names of files with hits",
+                        action='store_true')
+    # note: tier 0 is "ok to use"
+    parser.add_argument('-t',
+                        '--tier',
+                        type=int,
+                        choices=range(0, 4),
+                        action='append',
+                        help="Show non-conforming words of particular tier")
+    parser.add_argument('-x',
+                        '--exclude',
+                        default=skip_files,
+                        action='append',
+                        help="Exclude path from scan")
+    parser.add_argument('-a',
+                        '--allow',
+                        default=allow_words,
+                        action='append',
+                        help="Ignore these words")
+    parser.add_argument('--url',
+                        default=DEFAULT_URL,
+                        help="URL for the non-inclusive naming word list")
+    parser.add_argument('paths', nargs='*', help='files and directories to scan')
+
+    return parser.parse_args()
+
+
+def fetch_wordlist(url, tiers):
+    "Read list of words from inclusivenaming.org"
+
+    # The wordlist is returned as JSON like:
+    # {
+    # "data" :
+    #         [
+    #             {
+    #                 "term": "abort",
+    #                 "tier" : "1",
+    #                 "recommendation": "Replace when possible.",
+    # ...
+    with urlopen(url) as response:
+        entries = json5.loads(response.read())['data']
+
+    wordlist = []
+    for item in entries:
+        tier = int(item['tier'])
+        if tier in tiers:
+            # convert each hyphen to a regex matching hyphen or space
+            pattern = item['term'].lower().replace('-', '[- ]')
+            if pattern not in allow_words:
+                wordlist.append(pattern)
+
+    return wordlist
+
+
+def process(args):
+    "Find matching words"
+
+    # Default to Tier 1, 2 and 3.
+    if args.tier:
+        tiers = args.tier
+    else:
+        tiers = list(range(1, 4))
+
+    wordlist = fetch_wordlist(args.url, tiers)
+    if args.debug:
+        print(f'Matching on {len(wordlist)} words')
+
+    cmd = ['git', 'grep', '-i']
+    if args.files_with_matches:
+        cmd.append('-l')
+    if args.count:
+        cmd.append('-c')
+    for word in wordlist:
+        if word not in args.allow:  # honor words added with --allow
+            cmd += ['-e', word]
+    cmd.append('--')
+    for path in args.exclude:
+        cmd.append(f':^{path}')
+    cmd += args.paths
+    if args.debug:
+        print(cmd)
+    subprocess.run(cmd, check=False)
+
+
+def main():
+    '''program main function'''
+    process(args_parse())
+
+
+if __name__ == "__main__":
+    main()
-- 
2.43.0

Thread overview: 44+ messages
     [not found] <0230331200824.195294-1-stephen@networkplumber.org>
2023-04-05 23:29 ` [PATCH v3] " Stephen Hemminger
2023-08-17 14:58   ` Stephen Hemminger
2023-04-19 15:00 ` [PATCH] " Stephen Hemminger
2023-10-30 21:33 ` [PATCH v4] " Stephen Hemminger
2023-10-30 22:17 ` [PATCH v5] " Stephen Hemminger
2023-10-30 22:22 ` [PATCH v6] " Stephen Hemminger
2023-10-30 22:32 ` [PATCH v7] " Stephen Hemminger
2023-11-02 20:57   ` [PATCH v8] " Stephen Hemminger
2024-02-05 17:43 ` [PATCH v9 00/23] Use inclusive naming in DPDK Stephen Hemminger
2024-02-05 17:43   ` Stephen Hemminger [this message]
2024-02-05 17:43   ` [PATCH v9 02/23] test: replace use of term segregate Stephen Hemminger
2024-11-26 22:55     ` Thomas Monjalon
2024-02-05 17:43   ` [PATCH v9 03/23] examples/ptp: replace terms master and slave Stephen Hemminger
2024-06-14 15:41     ` [PATCH v10] " Stephen Hemminger
2024-10-22 16:39       ` Stephen Hemminger
2024-10-22 17:26         ` Ajit Khaparde
2024-10-24  2:06           ` Ajit Khaparde
2024-11-13 17:33             ` Thomas Monjalon
2024-11-13 17:52               ` Stephen Hemminger
2024-11-13 19:11                 ` Thomas Monjalon
2024-02-05 17:43   ` [PATCH v9 04/23] test: remove use of word master in test_red Stephen Hemminger
2024-11-26 22:52     ` Thomas Monjalon
2024-02-05 17:43   ` [PATCH v9 05/23] mbuf: replace term sanity check Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 06/23] eal: replace use of sanity check in comments and messages Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 07/23] test: replace use word sanity Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 08/23] examples: remove term sanity Stephen Hemminger
2024-02-06 10:05     ` [EXT] " Akhil Goyal
2024-02-05 17:43   ` [PATCH v9 09/23] lib: replace use of sanity check in comments and messages Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 10/23] doc/eventdev_pipeline: remove sanity Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 11/23] net/ring: replace use of sanity Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 12/23] net/fm10k, net/ixgbe: remove word sanity Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 13/23] net/mlx[45]: " Stephen Hemminger
2024-02-05 19:22     ` Dariusz Sosnowski
2024-02-05 17:43   ` [PATCH v9 14/23] net/sfc: remove term "sanity check" Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 15/23] net/ark: replace use of term sanity Stephen Hemminger
2024-02-05 21:12     ` Ed Czeck
2024-02-05 17:43   ` [PATCH v9 16/23] net/bnxt: " Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 17/23] net/bnx2x: remove reference to sanity Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 18/23] cnxk: replace term sanity Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 19/23] event/opdl: remove " Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 20/23] net/txgbe: replace " Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 21/23] net/cxgbe: remove use of " Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 22/23] crypto/bcmfs: replace term sanity check Stephen Hemminger
2024-02-05 17:43   ` [PATCH v9 23/23] drivers: remove use of " Stephen Hemminger
