From mboxrd@z Thu Jan  1 00:00:00 1970
From: Stephen Hemminger <stephen@networkplumber.org>
To: dev@dpdk.org
Cc: Stephen Hemminger <stephen@networkplumber.org>
Subject: [PATCH v5] devtools: add script to check for non inclusive naming
Date: Mon, 30 Oct 2023 15:17:48 -0700
Message-ID: <20231030221813.63826-1-stephen@networkplumber.org>
X-Mailer: git-send-email 2.41.0
In-Reply-To: <0230331200824.195294-1-stephen@networkplumber.org>
References: <0230331200824.195294-1-stephen@networkplumber.org>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>

Add a script to find words that should not be used. It is really just
a wrapper around the git grep command. By default it prints the
matching lines.

Uses the word lists from the Inclusive Naming Initiative;
see https://inclusivenaming.org/word-lists/
Note: the list has a trailing comma after the last element of a list,
which is not valid in standard JSON but is allowed in the more lenient
JSON5 format (https://json5.org/). Parsing it therefore needs the json5
package from PyPI.
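
To illustrate the trailing-comma problem, here is a minimal sketch using
only the standard library. The SAMPLE string is a hypothetical snippet
shaped like the published list, not a copy of it, and the helper name
strict_json_accepts is made up for this example.

```python
import json

# Hypothetical snippet shaped like the word list: note the comma
# after the last element of the "data" array.
SAMPLE = '{"data": [{"term": "abort", "tier": "1"},]}'


def strict_json_accepts(text):
    """Return True if the standard-library json module can parse text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


# The trailing comma makes the strict parser reject the document.
print(strict_json_accepts(SAMPLE))                        # False
# Without the trailing comma the same data parses fine.
print(strict_json_accepts(SAMPLE.replace("},]", "}]")))   # True
```

The script avoids this by calling json5.loads() from the json5 package
instead of json.loads().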

Examples:
$ ./devtools/check-inclusive-naming.py -c
app/test-compress-perf/comp_perf_test_cyclecount.c:1
app/test-compress-perf/comp_perf_test_throughput.c:1
app/test-compress-perf/comp_perf_test_verify.c:1
app/test/test_common.c:1
...

$ ./devtools/check-inclusive-naming.py lib/pcapng
lib/pcapng/rte_pcapng.c:                /* sanity check that is really a pcapng mbuf */

$ ./devtools/check-inclusive-naming.py -l lib/eal
lib/eal/common/eal_common_memory.c
lib/eal/common/eal_common_proc.c
lib/eal/common/eal_common_trace.c
lib/eal/common/eal_memcfg.h
lib/eal/common/rte_malloc.c
lib/eal/freebsd/eal.c
lib/eal/include/generic/rte_power_intrinsics.h
lib/eal/include/generic/rte_rwlock.h
lib/eal/include/generic/rte_spinlock.h
lib/eal/include/rte_debug.h
lib/eal/include/rte_seqlock.h
lib/eal/linux/eal.c
lib/eal/windows/eal.c
lib/eal/x86/include/rte_rtm.h
lib/eal/x86/include/rte_spinlock.h
lib/eal/x86/rte_power_intrinsics.c

$ ./devtools/check-inclusive-naming.py -h
usage: check-inclusive-naming.py [-h] [-c] [-d] [-l] [-t {0,1,2,3}]
                                 [-x EXCLUDE] [--url URL]
                                 [paths ...]

Identify word usage not aligned with inclusive naming

positional arguments:
  paths                 files and directory to scan

options:
  -h, --help            show this help message and exit
  -c, --count           Show the number of lines that match
  -d, --debug           Debug this script
  -l, --files-with-matches
                        Show only names of files with hits
  -t {0,1,2,3}, --tier {0,1,2,3}
                        Show non-conforming words of particular tier
  -x EXCLUDE, --exclude EXCLUDE
                        Exclude path from scan
  --url URL             URL for the non-inclusive naming word list
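
A note on how -x combines with the built-in exclusions: with
action='append' and a list default, argparse appends user values to a
copy of the default rather than replacing it. A minimal sketch, where
DEFAULT_EXCLUDES and parse_excludes are illustrative names and not part
of the script:

```python
import argparse

# Illustrative stand-in for the script's built-in exclusion list.
DEFAULT_EXCLUDES = ['doc/guides/rel_notes/']


def parse_excludes(argv):
    """Parse -x/--exclude the same way the script declares it."""
    parser = argparse.ArgumentParser()
    parser.add_argument('-x', '--exclude',
                        default=DEFAULT_EXCLUDES,
                        action='append')
    return parser.parse_args(argv).exclude


# No -x given: only the defaults are excluded.
print(parse_excludes([]))
# prints ['doc/guides/rel_notes/']

# -x given: the new path is added after the defaults.
print(parse_excludes(['-x', 'drivers/']))
# prints ['doc/guides/rel_notes/', 'drivers/']
```

The module-level default list itself is left unmodified, so the
combined list has to be read from the parsed arguments.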

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v5 - resolve spelling and python lint errors
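
For reviewers, a rough sketch of how the word list turns into a git grep
command line. It mirrors the logic of the patch but is not the script
itself: the entries below are made-up sample data (only "abort" appears
in the patch comments), and build_grep_command and term_to_pattern are
names chosen for this illustration.

```python
def term_to_pattern(term):
    """Let a hyphenated term also match when written with a space."""
    return term.replace('-', '[- ]').lower()


def build_grep_command(entries, tiers, excludes, paths):
    """Assemble the argument list for a case-insensitive git grep."""
    cmd = ['git', 'grep', '-i']
    for item in entries:
        # The tier is stored as a string in the word list.
        if int(item['tier']) in tiers:
            cmd += ['-e', term_to_pattern(item['term'])]
    # Everything after '--' is a pathspec, not a pattern.
    cmd.append('--')
    # ':^path' is git pathspec syntax for excluding a path.
    cmd += [':^{}'.format(path) for path in excludes]
    cmd += paths
    return cmd


# Hypothetical sample entries; tier 0 means "ok to use".
entries = [
    {'term': 'abort', 'tier': '1'},
    {'term': 'Sanity-Check', 'tier': '1'},
    {'term': 'example-ok', 'tier': '0'},
]

print(build_grep_command(entries, [1, 2, 3],
                         ['doc/guides/rel_notes/'], ['lib/pcapng']))
# prints ['git', 'grep', '-i', '-e', 'abort', '-e', 'sanity[- ]check',
#         '--', ':^doc/guides/rel_notes/', 'lib/pcapng']
```

Passing the command as a list to subprocess.run() avoids any shell
quoting of the patterns.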

 MAINTAINERS                        |   1 +
 devtools/check-inclusive-naming.py | 127 +++++++++++++++++++++++++++++
 2 files changed, 128 insertions(+)
 create mode 100755 devtools/check-inclusive-naming.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 4083658697..b53600ff51 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -89,6 +89,7 @@ F: devtools/check-doc-vs-code.sh
 F: devtools/check-dup-includes.sh
 F: devtools/check-maintainers.sh
 F: devtools/check-forbidden-tokens.awk
+F: devtools/check-inclusive-naming.py
 F: devtools/check-git-log.sh
 F: devtools/check-spdx-tag.sh
 F: devtools/check-symbol-change.sh
diff --git a/devtools/check-inclusive-naming.py b/devtools/check-inclusive-naming.py
new file mode 100755
index 0000000000..092d2c5625
--- /dev/null
+++ b/devtools/check-inclusive-naming.py
@@ -0,0 +1,127 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2023 Stephen Hemminger
+#
+# This script scans the source tree and creates a list of files
+# containing words that are recommended to be avoided by the
+# Inclusive Naming Initiative.
+# See: https://inclusivenaming.org/word-lists/
+
+import argparse
+import subprocess
+from urllib.request import urlopen
+
+# Need JSON5 to be able to handle extra comma
+import json5
+
+naming_url = 'https://inclusivenaming.org/word-lists/index.json'
+
+# These give false positives
+dont_scan = [
+    'doc/guides/rel_notes/',
+    'doc/guides/contributing/coding_style.rst',
+    'doc/guides/prog_guide/glossary.rst'
+]
+
+
+def args_parse():
+    "parse arguments and return the argument object back to main"
+
+    parser = argparse.ArgumentParser(
+        description="Identify word usage not aligned with inclusive naming")
+    parser.add_argument("-c",
+                        "--count",
+                        help="Show the number of lines that match",
+                        action='store_true')
+    parser.add_argument("-d",
+                        "--debug",
+                        default=False,
+                        help="Debug this script",
+                        action='store_true')
+    parser.add_argument("-l",
+                        "--files-with-matches",
+                        help="Show only names of files with hits",
+                        action='store_true')
+    # note: tier 0 is "ok to use"
+    parser.add_argument("-t",
+                        "--tier",
+                        type=int,
+                        choices=range(0, 4),
+                        action='append',
+                        help="Show non-conforming words of particular tier")
+    parser.add_argument('-x',
+                        "--exclude",
+                        default=dont_scan,
+                        action='append',
+                        help="Exclude path from scan")
+    parser.add_argument('--url',
+                        default=naming_url,
+                        help="URL for the non-inclusive naming word list")
+    parser.add_argument('paths', nargs='*',
+                        help='files and directory to scan')
+
+    return parser.parse_args()
+
+
+def fetch_wordlist(url, tiers):
+    "Read list of words from inclusivenaming.org"
+
+    response = urlopen(url)
+    # The wordlist is returned as JSON like:
+    # {
+    # "data" :
+    #         [
+    #             {
+    #                 "term": "abort",
+    #                 "tier" : "1",
+    #                 "recommendation": "Replace when possible.",
+    # ...
+    entries = json5.loads(response.read())['data']
+
+    wordlist = []
+    for item in entries:
+        tier = int(item['tier'])
+        if tier in tiers:
+            # let a hyphen in the term match either a hyphen or a space
+            pattern = item['term'].replace('-', '[- ]')
+            wordlist.append(pattern.lower())
+    return wordlist
+
+
+def process(args):
+    "Find matching words"
+
+    # Default to Tier 1, 2 and 3.
+    if (args.tier):
+        tiers = args.tier
+    else:
+        tiers = list(range(1, 4))
+
+    wordlist = fetch_wordlist(args.url, tiers)
+    if (args.debug):
+        print("Matching on {} words".format(len(wordlist)))
+
+    cmd = ['git', 'grep', '-i']
+    if (args.files_with_matches):
+        cmd.append('-l')
+    if (args.count):
+        cmd.append('-c')
+    for word in wordlist:
+        cmd.append('-e')
+        cmd.append(word)
+    cmd.append('--')
+    # exclude paths using git pathspec exclusion magic (":^path")
+    for path in args.exclude:
+        cmd.append(':^{}'.format(path))
+    cmd += args.paths
+    if args.debug:
+        print(cmd)
+    subprocess.run(cmd)
+
+
+def main():
+    process(args_parse())
+
+
+if __name__ == "__main__":
+    main()
-- 
2.41.0