Date: Mon, 26 Apr 2021 15:05:59 +0100
From: Bruce Richardson
To: "Burakov, Anatoly"
Cc: dev@dpdk.org, thomas@monjalon.net
Subject: Re: [dpdk-dev] [PATCH v2 1/2] devtools: script to check meson indentation of lists

On Mon, Apr 26, 2021 at 02:40:25PM +0100, Burakov, Anatoly wrote:
> On 26-Apr-21 11:54 AM, Bruce Richardson wrote:
> > This is a script to fix up minor formatting issues in meson files.
> > It scans for, and can optionally fix, indentation issues and missing
> > trailing commas in the lists in meson.build files. It also detects,
> > and can fix, multi-line lists where more than one entry appears on a
> > line.
> >
> > Signed-off-by: Bruce Richardson
> > ---
>
> > +def split_code_comments(line):
> > +    'splits a line into a code part and a comment part, returns (code, comment) tuple'
> > +    if line.lstrip().startswith('#'):
> > +        return ('', line)
> > +    elif '#' in line and '#include' not in line:  # catch 99% of cases, not 100%
> > +        idx = line.index('#')
> > +        while (line[idx - 1].isspace()):
> > +            idx -= 1
> > +        return line[:idx], line[idx:]
>
> I think this could be simplified with regex:
>
> # find any occurrences of '#' but only if it's not an '#include'
> if not re.search(r'#(?!include)', line):
>     return line, ''
> return line.split('#', maxsplit=1)

Not sure that is simpler, and just splitting on '#' is actually not what
we want either.

Firstly, while r'#(?!include)' is not a massively complex regex, just
checking for "'#' in line and '#include' not in line" is easier to read
for most mortals.

In terms of the split, I did initially do as you have here and split on
'#', but we don't actually want that, because we want to preserve the
whitespace in the line before the comment too - as part of the comment,
not the code. This is why, after finding the '#', we walk backwards to
find the end of the code and use that as the split point. It then saves
us worrying about any strip() calls breaking any comment alignment the
user has explicitly set up. Not using split also means that we can just
merge the strings back with '+' rather than having to use "'#'.join()".
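To make the difference concrete, here is a minimal sketch rather than the
patch itself. split_code_comments() follows the quoted hunk, while
split_on_hash() and the sample line are hypothetical, added purely for
illustration. It shows that walking back from the '#' keeps the alignment
whitespace with the comment, so the two parts can be rejoined with a plain
'+':

```python
def split_code_comments(line):
    """Split a line into (code, comment).

    Whitespace between the code and the '#' stays with the comment, so any
    alignment the user set up survives a later rewrite of the code part.
    """
    if line.lstrip().startswith('#'):
        return '', line
    if '#' in line and '#include' not in line:
        idx = line.index('#')
        # Walk back over the whitespace that precedes the comment.
        while line[idx - 1].isspace():
            idx -= 1
        return line[:idx], line[idx:]
    return line, ''


def split_on_hash(line):
    """Hypothetical alternative: plain str.split() on the first '#'."""
    if '#' not in line or '#include' in line:
        return line, ''
    code, comment = line.split('#', maxsplit=1)
    return code, comment


sample = "        'foo.c',    # aligned comment"

# Walk-back split: the padding belongs to the comment part.
code, comment = split_code_comments(sample)
print(repr(code), repr(comment))

# Plain split: the padding stays on the code part and the '#' is dropped,
# so rebuilding the line needs '#'.join() instead of '+'.
alt_code, alt_comment = split_on_hash(sample)
print(repr(alt_code), repr(alt_comment))
```

With the walk-back version the code part can be re-indented or given a
trailing comma without touching the spacing in front of the comment.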
>
> > +    else:
> > +        return (line, '')
> > +
> > +
> > +def setline(contents, index, value):
> > +    'sets the contents[index] to value. Returns the line, along with code and comments part'
> > +    line = contents[index] = value
> > +    code, comments = split_code_comments(line)
> > +    return line, code, comments
> > +
> > +
> > +def check_indentation(filename, contents):
> > +    '''check that a list or files() is correctly indented'''
> > +    infiles = False
> > +    inlist = False
> > +    edit_count = 0
> > +    for lineno, line in enumerate(contents):
> > +        code, comments = split_code_comments(line)
>
> Nitpicking, but maybe instead of calling strip() all over the place, just
> count the number of spaces and strip right at the outset? E.g. something
> like:
>
> stripped = code.strip()
> line_indent = len(code) - len(stripped)
>
> You can then reason about indent levels by comparing stripped to code
> afterwards, and avoid doing this:
>
> > +            # skip further subarrays or lists
> > +            if '[' in code or ']' in code:
> > +                continue
> > +            if not code.startswith(indent) or code[len(indent)] == ' ':
>
> Opting to just check the indent size you calculated initially. Unless i'm
> missing something :)
>
> You could also increment edit_count if `calculated indent + stripped` is
> equal to `code`. Seems easier logic than raw string manipulation you're
> going for here...
>

Interesting. That could be a good approach alright. If I do a V3 (not
guaranteed for this release) I can try taking that idea on board.

> > +def process_file(filename, fix):
> > +    '''run checks on file "filename"'''
> > +    if VERBOSE:
> > +        print(f'Processing {filename}')
> > +    with open(filename) as f:
> > +        contents = [ln.rstrip() for ln in f.readlines()]
>
> So any trailing whitespace gets automatically and silently fixed?
>

Hadn't actually thought of that, but yes, that will happen if --fix is
given and other changes are made to the file.
Ideally, that should be fixed to "non-silently" do so, but I'd view it as
low priority since other tools tend to be good at flagging trailing
whitespace issues anyway.

> > +
> > +    if check_indentation(filename, contents) > 0 and fix:
> > +        print(f"Fixing {filename}")
> > +        with open(filename, 'w') as f:
> > +            f.writelines([f'{ln}\n' for ln in contents])
>
> Something seems suspect here. So, if `fix` is *not* specified, the script
> just opens the file, reads it, and... does nothing else?
>

No, it prints out all the errors without actually fixing them.

Regards,
/Bruce