From: Stephen Hemminger
To: dev@dpdk.org
Cc: Stephen Hemminger, Thomas Monjalon
Subject: [PATCH v9 01/23] devtools: add script to check for non inclusive naming
Date: Mon, 5 Feb 2024 09:43:29 -0800
Message-ID: <20240205180328.131019-2-stephen@networkplumber.org>
In-Reply-To: <20240205180328.131019-1-stephen@networkplumber.org>
References: <0230331200824.195294-1-stephen@networkplumber.org>
 <20240205180328.131019-1-stephen@networkplumber.org>

Add a new script to find words that should not be used.
It is a wrapper around the git grep command.
By default it prints matches but can also display counts.

Uses the word lists from the Inclusive Naming Initiative;
see https://inclusivenaming.org/word-lists/

Note: the JSON list has an extra comma at the end of the list of
elements, which is not valid in standard JSON but is allowed in
JSON5 (https://json5.org/). To handle this, the tool uses the
json5 package from PyPI.

Examples:
 $ ./devtools/check-inclusive-naming.py -c | head -5
 app/test/test_common.c:1
 app/test/test_eal_flags.c:8
 app/test/test_hash.c:1
 app/test/test_hash_readwrite_lf_perf.c:1
 app/test/test_link_bonding_mode4.c:1

 $ ./devtools/check-inclusive-naming.py lib/pcapng
 lib/pcapng/rte_pcapng.c: /* sanity check that is really a pcapng mbuf */

 $ ./devtools/check-inclusive-naming.py -l lib/eal
 lib/eal/common/eal_common_memory.c
 lib/eal/common/eal_common_proc.c
 lib/eal/common/eal_common_trace.c
 lib/eal/common/eal_memcfg.h
 lib/eal/common/rte_malloc.c
 lib/eal/freebsd/eal.c
 lib/eal/linux/eal.c
 lib/eal/windows/eal.c

Signed-off-by: Stephen Hemminger
---
 MAINTAINERS                        |   1 +
 devtools/check-inclusive-naming.py | 135 +++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)
 create mode 100755 devtools/check-inclusive-naming.py

diff --git a/MAINTAINERS b/MAINTAINERS
index 5fb3a73f840e..dbf7ea2d916d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -88,6 +88,7 @@ F: devtools/check-doc-vs-code.sh
 F: devtools/check-dup-includes.sh
 F: devtools/check-maintainers.sh
 F: devtools/check-forbidden-tokens.awk
+F: devtools/check-inclusive-naming.py
 F: devtools/check-git-log.sh
 F: devtools/check-spdx-tag.sh
 F: devtools/check-symbol-change.sh
diff --git a/devtools/check-inclusive-naming.py b/devtools/check-inclusive-naming.py
new file mode 100755
index 000000000000..e8989c3c9b79
--- /dev/null
+++ b/devtools/check-inclusive-naming.py
@@ -0,0 +1,135 @@
+#!/usr/bin/env python3
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2023 Stephen Hemminger
+#
+# This script scans the source tree and creates list of files
+# containing words that are recommended to be avoided by the
+# Inclusive Naming Initiative.
+# See: https://inclusivenaming.org/word-lists/
+
+import argparse
+import subprocess
+from urllib.request import urlopen
+
+# Need JSON5 to be able to handle extra comma
+import json5
+
+DEFAULT_URL = 'https://inclusivenaming.org/word-lists/index.json'
+
+# These give false positives
+skip_files = [
+    'doc/guides/rel_notes/', 'doc/guides/contributing/coding_style.rst',
+    'doc/guides/prog_guide/glossary.rst'
+]
+
+# These are allowed for now
+allow_words = ['abort']
+
+
+def args_parse():
+    "parse arguments and return the argument object back to main"
+
+    parser = argparse.ArgumentParser(
+        description="Identify word usage not aligned with inclusive naming")
+    parser.add_argument('-c',
+                        '--count',
+                        help="Show the number of lines that match",
+                        action='store_true')
+    parser.add_argument('-d',
+                        '--debug',
+                        default=False,
+                        help="Debug this script",
+                        action='store_true')
+    parser.add_argument('-l',
+                        '--files-with-matches',
+                        help="Show only names of files with hits",
+                        action='store_true')
+    # note: tier 0 is "ok to use"
+    parser.add_argument('-t',
+                        '--tier',
+                        type=int,
+                        choices=range(0, 4),
+                        action='append',
+                        help="Show non-conforming words of particular tier")
+    parser.add_argument('-x',
+                        '--exclude',
+                        default=skip_files,
+                        action='append',
+                        help="Exclude path from scan")
+    parser.add_argument('-a',
+                        '--allow',
+                        default=allow_words,
+                        action='append',
+                        help="Ignore these words")
+    parser.add_argument('--url',
+                        default=DEFAULT_URL,
+                        help="URL for the non-inclusive naming word list")
+    parser.add_argument('paths', nargs='*', help='files and directory to scan')
+
+    return parser.parse_args()
+
+
+def fetch_wordlist(url, tiers):
+    "Read list of words from inclusivenaming.org"
+
+    # The wordlist is returned as JSON like:
+    # {
+    #   "data" :
+    #   [
+    #     {
+    #       "term": "abort",
+    #       "tier" : "1",
+    #       "recommendation": "Replace when possible.",
+    #       ...
+    with urlopen(url) as response:
+        entries = json5.loads(response.read())['data']
+
+    wordlist = []
+    for item in entries:
+        tier = int(item['tier'])
+        if tiers.count(tier) > 0:
+            # convert minus sign to minus or space regex
+            pattern = item['term'].replace('-', '[- ]')
+            if pattern not in allow_words:
+                wordlist.append(pattern.lower())
+
+    return wordlist
+
+
+def process(args):
+    "Find matching words"
+
+    # Default to Tier 1, 2 and 3.
+    if args.tier:
+        tiers = args.tier
+    else:
+        tiers = list(range(1, 4))
+
+    wordlist = fetch_wordlist(args.url, tiers)
+    if args.debug:
+        print(f'Matching on {len(wordlist)} words')
+
+    cmd = ['git', 'grep', '-i']
+    if args.files_with_matches:
+        cmd.append('-l')
+    if args.count:
+        cmd.append('-c')
+    for word in wordlist:
+        cmd.append('-e')
+        cmd.append(word)
+    cmd.append('--')
+    for path in args.exclude:
+        cmd.append(f':^{path}')
+    cmd += args.paths
+    if args.debug:
+        print(cmd)
+    subprocess.run(cmd, check=False)
+
+
+def main():
+    '''program main function'''
+    process(args_parse())
+
+
+if __name__ == "__main__":
+    main()
-- 
2.43.0
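
Review note, offered as a sketch and not as part of the patch: the
pattern list and the git grep argument list could each be built by a
small pure function, so both can be checked without network access or
a git tree. The names terms_to_patterns and build_grep_cmd, their
parameters, and the sample entries are hypothetical. The entry shape
('term' and 'tier' keys) is assumed from the comment in
fetch_wordlist.

```python
def terms_to_patterns(entries, tiers, allowed):
    """Turn word-list entries into lowercase grep patterns.

    entries: list of dicts with 'term' and 'tier' keys (assumed shape).
    tiers:   tiers to report, e.g. [1, 2, 3].
    allowed: terms to skip, compared case-insensitively.
    """
    allowed = {word.lower() for word in allowed}
    patterns = []
    for item in entries:
        term = item['term'].lower()
        if int(item['tier']) in tiers and term not in allowed:
            # A hyphen in the term may appear as a hyphen or a space.
            patterns.append(term.replace('-', '[- ]'))
    return patterns


def build_grep_cmd(patterns, exclude, paths, count=False, names_only=False):
    """Assemble the 'git grep' argument list without running it."""
    cmd = ['git', 'grep', '-i']
    if names_only:
        cmd.append('-l')
    if count:
        cmd.append('-c')
    for pattern in patterns:
        cmd += ['-e', pattern]
    cmd.append('--')
    # ':^path' is the git pathspec short form for excluding a path.
    cmd += [f':^{path}' for path in exclude]
    cmd += paths
    return cmd


# Made-up sample entries, for illustration only.
entries = [
    {'term': 'abort', 'tier': '1'},
    {'term': 'sanity-check', 'tier': '2'},
    {'term': 'fair hiring practice', 'tier': '0'},
]
patterns = terms_to_patterns(entries, tiers=[1, 2, 3], allowed=['abort'])
cmd = build_grep_cmd(patterns, ['doc/guides/rel_notes/'], ['lib/eal'],
                     count=True)
print(cmd)
```

Taking the allow list and the exclude paths as parameters means that
whatever the caller passes, for example the values collected by
argparse for -a and -x, is what ends up in the command.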