From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <dev-bounces@dpdk.org>
Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124])
	by inbox.dpdk.org (Postfix) with ESMTP id 384D8A0548;
	Mon, 26 Apr 2021 16:49:03 +0200 (CEST)
Received: from [217.70.189.124] (localhost [127.0.0.1])
	by mails.dpdk.org (Postfix) with ESMTP id ADA0F41110;
	Mon, 26 Apr 2021 16:49:02 +0200 (CEST)
Received: from mga03.intel.com (mga03.intel.com [134.134.136.65])
 by mails.dpdk.org (Postfix) with ESMTP id 93D0C41104
 for <dev@dpdk.org>; Mon, 26 Apr 2021 16:49:00 +0200 (CEST)
IronPort-SDR: SlPSHYTBisXNGUH609ijV6twnNzwStPpXoPodTCvlVlUhiejYCDFIvtlbSZ/rWilAL4nCkU+oa
 lPxVd/KpOgkQ==
X-IronPort-AV: E=McAfee;i="6200,9189,9966"; a="196406139"
X-IronPort-AV: E=Sophos;i="5.82,252,1613462400"; d="scan'208";a="196406139"
Received: from orsmga004.jf.intel.com ([10.7.209.38])
 by orsmga103.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 26 Apr 2021 07:48:58 -0700
IronPort-SDR: Wa4Faf+tWrPbIOgNP3yzso4UxqrodbPPWDL1UHdxCpo2tynK7yNZKQVYv1HNCAB3sPcauaMKUN
 NNkeqjzVfBfQ==
X-IronPort-AV: E=Sophos;i="5.82,252,1613462400"; d="scan'208";a="536248219"
Received: from aburakov-mobl.ger.corp.intel.com (HELO [10.213.222.4])
 ([10.213.222.4])
 by orsmga004-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384;
 26 Apr 2021 07:48:57 -0700
To: Bruce Richardson <bruce.richardson@intel.com>
Cc: dev@dpdk.org, thomas@monjalon.net
References: <20210422090211.320855-1-bruce.richardson@intel.com>
 <20210426105403.226004-1-bruce.richardson@intel.com>
 <32b16ecf-be20-1ad2-43b1-df0f4864f681@intel.com>
 <YIbIx8H6p2kMz9RA@bricha3-MOBL.ger.corp.intel.com>
From: "Burakov, Anatoly" <anatoly.burakov@intel.com>
Message-ID: <58a49a64-62c5-d633-08c1-d57e5e9acd4c@intel.com>
Date: Mon, 26 Apr 2021 15:48:54 +0100
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:78.0) Gecko/20100101
 Thunderbird/78.10.0
MIME-Version: 1.0
In-Reply-To: <YIbIx8H6p2kMz9RA@bricha3-MOBL.ger.corp.intel.com>
Content-Type: text/plain; charset=utf-8; format=flowed
Content-Language: en-US
Content-Transfer-Encoding: 7bit
Subject: Re: [dpdk-dev] [PATCH v2 1/2] devtools: script to check meson
 indentation of lists
X-BeenThere: dev@dpdk.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: DPDK patches and discussions <dev.dpdk.org>
List-Unsubscribe: <https://mails.dpdk.org/options/dev>,
 <mailto:dev-request@dpdk.org?subject=unsubscribe>
List-Archive: <http://mails.dpdk.org/archives/dev/>
List-Post: <mailto:dev@dpdk.org>
List-Help: <mailto:dev-request@dpdk.org?subject=help>
List-Subscribe: <https://mails.dpdk.org/listinfo/dev>,
 <mailto:dev-request@dpdk.org?subject=subscribe>
Errors-To: dev-bounces@dpdk.org
Sender: "dev" <dev-bounces@dpdk.org>

On 26-Apr-21 3:05 PM, Bruce Richardson wrote:
> On Mon, Apr 26, 2021 at 02:40:25PM +0100, Burakov, Anatoly wrote:
>> On 26-Apr-21 11:54 AM, Bruce Richardson wrote:
>>> This is a script to fix up minor formatting issues in meson files.
>>> It scans for, and can optionally fix, indentation issues and missing
>>> trailing commas in the lists in meson.build files. It also detects,
>>> and can fix, multi-line lists where more than one entry appears on a
>>> line.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>
>> <snip>
>>
>>> +def split_code_comments(line):
>>> +    'splits a line into a code part and a comment part, returns (code, comment) tuple'
>>> +    if line.lstrip().startswith('#'):
>>> +        return ('', line)
>>> +    elif '#' in line and '#include' not in line:  # catch 99% of cases, not 100% > +        idx = line.index('#')
>>> +        while (line[idx - 1].isspace()):
>>> +            idx -= 1
>>> +        return line[:idx], line[idx:]
>>
>>
>> I think this could be simplified with regex:
>>
>> # find any occurrences of '#' but only if it's not an '#include'
>> if not re.search(r'#(?!include)', line)
>>      return line, ''
>> return line.split('#', maxsplit=1)
> 
> Not sure that is simpler, and just splitting on '#' is actually not what we
> want either.
> 
> Firstly, while r'#(?!include)' is not a massively complex regex, just
> checking for "'#' in line and '#include' not in line" is just easier to
> read for most mortals.
> 
> In terms of the split, I did initially do as you have here and split on
> '#', but we don't actually want that, because we want to preserve the
> whitespace in the line before the comment too - as part of the comment, not
> the code. This is why after finding the '#' we walk backwards to find the
> end of the code and find that as the split point. It then saves us worrying
> about any strips() breaking any comment alignment the user has explicitly
> set up. Not using split also means that we can just merge the strings back
> with '+' rather than having to use "'#'.join()".

Fair enough so!

> 
>>
>>> +    else:
>>> +        return (line, '')
>>> +
>>> +
>>> +def setline(contents, index, value):
>>> +    'sets the contents[index] to value. Returns the line, along with code and comments part'
>>> +    line = contents[index] = value
>>> +    code, comments = split_code_comments(line)
>>> +    return line, code, comments
>>> +
>>> +
>>> +def check_indentation(filename, contents):
>>> +    '''check that a list or files() is correctly indented'''
>>> +    infiles = False
>>> +    inlist = False
>>> +    edit_count = 0
>>> +    for lineno, line in enumerate(contents):
>>> +        code, comments = split_code_comments(line)
>>
>> Nitpicking, but maybe instead of calling strip() all over the place, just
>> count the number of spaces and strip right at the outset? E.g. something
>> like:
>>
>> stripped = code.strip()
>> line_indent = len(code) - len(stripped)
>>
>> You can then reason about indent levels by comparing stripped to code
>> afterwards, and avoid doing this:
>>
>>> +            # skip further subarrays or lists
>>> +            if '[' in code or ']' in code:
>>> +                continue
>>> +            if not code.startswith(indent) or code[len(indent)] == ' ':
>>
>> Opting to just check the indent size you calculated initially. Unless i'm
>> missing something :)
>>
>> You could also increment edit_count if `calculated indent + stripped` is
>> equal to `code`. Seems easier logic than raw string manipulation you're
>> going for here...
>>
>> <snip>
> 
> Interesting. That could be a good approach alright. If I do a V3 (not
> guaranteed for this release) I can try taking that idea on board.
> 
>>
>>> +def process_file(filename, fix):
>>> +    '''run checks on file "filename"'''
>>> +    if VERBOSE:
>>> +        print(f'Processing {filename}')
>>> +    with open(filename) as f:
>>> +        contents = [ln.rstrip() for ln in f.readlines()]
>>
>> So any trailing whitespace gets automatically and silently fixed?
>>
> Hadn't actually thought of that, but yes, that will happen if --fix is
> given and other changes are made to the file. Ideally, that should be fixed
> to "non-silently" do so, but I'd view it as low priority since other tools
> tend to be good at flagging trailing whitespace issues anyway.

Yeah, i think there's not a lot of trailing whitespace there in the 
first place, so probably a non issue.

> 
>>> +
>>> +    if check_indentation(filename, contents) > 0 and fix:
>>> +        print(f"Fixing {filename}")
>>> +        with open(filename, 'w') as f:
>>> +            f.writelines([f'{ln}\n' for ln in contents])
>>
>> Something seems suspect here. So, if `fix` is *not* specified, the script
>> just opens the file, reads it, and... does nothing else?
>>
> No, it prints out all the errors without actually fixing them.
> 

Oh, right, check_indentation will run first. Never mind!

In any case,

Reviewed-by: Anatoly Burakov <anatoly.burakov@intel.com>

-- 
Thanks,
Anatoly