From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mails.dpdk.org (mails.dpdk.org [217.70.189.124]) by inbox.dpdk.org (Postfix) with ESMTP id 98F5C44155; Tue, 4 Jun 2024 17:13:12 +0200 (CEST) Received: from mails.dpdk.org (localhost [127.0.0.1]) by mails.dpdk.org (Postfix) with ESMTP id 67F4B40A6D; Tue, 4 Jun 2024 17:13:12 +0200 (CEST) Received: from mail-lf1-f53.google.com (mail-lf1-f53.google.com [209.85.167.53]) by mails.dpdk.org (Postfix) with ESMTP id A7C814026E for ; Tue, 4 Jun 2024 17:13:10 +0200 (CEST) Received: by mail-lf1-f53.google.com with SMTP id 2adb3069b0e04-52b936c958dso3800490e87.0 for ; Tue, 04 Jun 2024 08:13:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=pantheon.tech; s=google; t=1717513990; x=1718118790; darn=dpdk.org; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :from:to:cc:subject:date:message-id:reply-to; bh=XDWs2RF8OfUJRtK10SJ7baFmK/3NO/xA8SZsBgmVcxY=; b=KQxvzKigRCLJqd/UPfSN9HF2G+FFLZFy6rx3Ruux7eiVHweA6IwZgoUHqQ+ICmW/5F nova1kjAXRoRfHo14PERUkuV6SmEzTatXc3Z0eFUfb2veZHgrg1QWAahTF2+pBE4sGES ocgLNtaCgdNfVVv78+SLjTrQRC2QyUDcHxSGezA8pmD3x43oxRe2wNxSR9MeGckQFf3s ROODZ1VG7oqUFQkikqW0w64GZ1gE8LzhcYBr3TpjJCsXFytcAPbpT7BE8Ei1ZpV9KJ1S i7vKX4aZXeZ0e1NMAJ6mfm/qcnaUfFpDzA1AAiBfO89pw1vHrcaW9zWRdpNg40O/rUHF OU3A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1717513990; x=1718118790; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=XDWs2RF8OfUJRtK10SJ7baFmK/3NO/xA8SZsBgmVcxY=; b=lfRILMze9TWaUrVJvOJ86nqp4lDzOMMfW6e89CgHcorBzE1POfSYe6G6PidUOJGhSI p48zTaEEmQl9MSZtGRdeFI/FRIiYH9VG5ihQ+mC0YgkUkaNscTkWMtntbtqkeExEUWZm Ox0H58D8nnpq6HLMb63KReFpx5uxhUtOxYxYZjbLrjMU5zJDj5bMPJ0aT4ysQNlWRriD zVQhTUwgDHcWYTnNpMpnCZ/Jw/1A4kW36Uw3yw2jh6lX7S9Ww6cdAdugoFEoh+In10r5 Kq4d138uVnCzkCw9RtxyvgRpjDUttmNJaS4iloRsNz0VsBEk+UfLhGbMkGHxgGVI0GLn x4uA== X-Forwarded-Encrypted: i=1; AJvYcCW9j6Uzvjac3v4y7rhz0RuJmzyuvg8dLniW8wnhFJcnQLLzVQu2DtUO6GCkzTx0Ui6f499YluS98l7qYA8= X-Gm-Message-State: AOJu0YxJznkZkgzEJsHs3tktMIl6b2ZhoafJd18UYbIAKSOlVdFFHHFx pPM6+h9UatRFzylqSQk9zXtK5N4ux2usxCFRdkwVyV51y7IPszN1HOpD/RwhUzY= X-Google-Smtp-Source: AGHT+IEcr/7wIibHzMctwXdtM5KM5r1CXLjxkS4/xTcEQ1dpwVsnpi6bS8ufRcl2GJKC1eNCmRmL+Q== X-Received: by 2002:a05:6512:3126:b0:52b:84bd:345d with SMTP id 2adb3069b0e04-52b896bf8d7mr8294453e87.49.1717513989855; Tue, 04 Jun 2024 08:13:09 -0700 (PDT) Received: from [192.168.1.113] ([84.245.121.236]) by smtp.gmail.com with ESMTPSA id a640c23a62f3a-a67eb34441bsm633972766b.210.2024.06.04.08.13.08 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Tue, 04 Jun 2024 08:13:08 -0700 (PDT) Message-ID: <685b9f60-7b5d-407f-be26-69115e23a9d2@pantheon.tech> Date: Tue, 4 Jun 2024 17:13:07 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH v2 3/5] dts: add parsing utility module To: Luca Vizzarro , dev@dpdk.org Cc: Jeremy Spewock , Paul Szczepanek References: <20240412111136.3470304-1-luca.vizzarro@arm.com> <20240509112635.1170557-1-luca.vizzarro@arm.com> <20240509112635.1170557-4-luca.vizzarro@arm.com> Content-Language: en-US From: =?UTF-8?Q?Juraj_Linke=C5=A1?= In-Reply-To: <20240509112635.1170557-4-luca.vizzarro@arm.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit X-BeenThere: dev@dpdk.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: DPDK patches and discussions List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: dev-bounces@dpdk.org There are mostly documentation and basically inconsequential minor comments. On 9. 5. 2024 13:26, Luca Vizzarro wrote: > Adds parsing text into a custom dataclass. It provides a new > `TextParser` dataclass to be inherited. This implements the `parse` > method, which combined with the parser functions, it can automatically > parse the value for each field. > > This new utility will facilitate and simplify the parsing of complex > command outputs, while ensuring that the codebase does not get bloated > and stays flexible. > > Signed-off-by: Luca Vizzarro > Reviewed-by: Paul Szczepanek > --- > dts/framework/parser.py | 199 ++++++++++++++++++++++++++++++++++++++++ > 1 file changed, 199 insertions(+) > create mode 100644 dts/framework/parser.py > > diff --git a/dts/framework/parser.py b/dts/framework/parser.py > new file mode 100644 > index 0000000000..5b4acddead > --- /dev/null > +++ b/dts/framework/parser.py > @@ -0,0 +1,199 @@ > +# SPDX-License-Identifier: BSD-3-Clause > +# Copyright(c) 2024 Arm Limited > + > +r"""Parsing utility module. > + > +This module provides :class:`~TextParser` which can be used to model any dataclass to a block of > +text. > + > +Usage example:: I'd like to see a high level explanation of what the key pieces of the parsing are and how they're tied to the implementation: 1. The text we're about to parse, passed to the instance of a subclass 2. What we're parsing, the fields of the subclass 3. How we're parsing the fields, the functions of TextParser This could be part of the example or mentioned before the example or in the class. I had to study the code to understand the API, so that should be better documented. > +..code:: python > + > + from dataclasses import dataclass, field > + from enum import Enum > + from framework.parser import TextParser > + > + class Colour(Enum): > + BLACK = 1 > + WHITE = 2 > + > + @classmethod > + def from_str(cls, text: str): > + match text: > + case "black": > + return cls.BLACK > + case "white": > + return cls.WHITE > + case _: > + return None # unsupported colour > + > + @classmethod > + def make_parser(cls): > + # make a parser function that finds a match and > + # then makes it a Colour object through Colour.from_str > + return TextParser.compose(cls.from_str, TextParser.find(r"is a (\w+)")) > + > + @dataclass > + class Animal(TextParser): > + kind: str = field(metadata=TextParser.find(r"is a \w+ (\w+)")) > + name: str = field(metadata=TextParser.find(r"^(\w+)")) > + colour: Colour = field(metadata=Colour.make_parser()) > + age: int = field(metadata=TextParser.find_int(r"aged (\d+)")) > + > + steph = Animal.parse("Stephanie is a white cat aged 10") > + print(steph) # Animal(kind='cat', name='Stephanie', colour=, age=10) > +""" > + > +import re > +from abc import ABC > +from dataclasses import MISSING, dataclass, fields > +from functools import partial > +from typing import Any, Callable, TypedDict, cast > + > +from typing_extensions import Self > + > + > +class ParserFn(TypedDict): > + """Parser function in a dict compatible with the :func:`dataclasses.field` metadata param.""" > + > + #: > + TextParser_fn: Callable[[str], Any] > + > + > +@dataclass > +class TextParser(ABC): > + """Helper abstract dataclass that parses a text according to the fields' rules. > + > + This class provides a selection of parser functions and a function to compose generic functions > + with parser functions. Parser functions are designed to be passed to the fields' metadata param > + to enable parsing. > + """ We should add that this class should be subclassed; basically what I've laid out above so that devs know how to actually use the class. I'm not sure which is the better place to put it, but probably here. > + > + """============ BEGIN PARSER FUNCTIONS ============""" > + > + @staticmethod > + def compose(f: Callable, parser_fn: ParserFn) -> ParserFn: > + """Makes a composite parser function. > + > + The parser function is run and if a non-None value was returned, f is called with it. Let's use `parser_fn` instead of "The parser function". Let's also put f into backticks. It's also peculiar that we're passing the functions in the reverse order - first the second one is applied and then the first one is applied. > + Otherwise the function returns early with None. > + > + Metadata modifier for :func:`dataclasses.field`. This sentence is all alone here and I don't understand what it's saying without more context. > + > + Args: > + f: the function to apply to the parser's result This now refers to just parser's results instead of parser functions's result, which is confusing. But let's also use `parser_fn` here. Also, the arg descriptions should end with a dot. And start with a capital letter. https://www.sphinx-doc.org/en/master/usage/extensions/example_google.html#example-google > + parser_fn: the dictionary storing the parser function > + """ The docstring is missing a Returns: section. > + g = parser_fn["TextParser_fn"] This is a matter of preference, but I like more descriptive variable names, something like initial_func. And then I'd change f to subsequent_func, which would make the function signature a bit more readable. > + > + def _composite_parser_fn(text: str) -> Any: > + intermediate_value = g(text) > + if intermediate_value is None: > + return None > + return f(intermediate_value) > + > + return ParserFn(TextParser_fn=_composite_parser_fn) > + > + @staticmethod > + def find( > + pattern: str | re.Pattern[str], > + flags: re.RegexFlag = re.RegexFlag(0), > + named: bool = False, > + ) -> ParserFn: > + """Makes a parser function that finds a regular expression match in the text. > + > + If the pattern has capturing groups, it returns None if no match was found. If the pattern > + has only one capturing group and a match was found, its value is returned. If the pattern > + has no capturing groups then either True or False is returned if the pattern had a match or > + not. > + This description would be a better fit in a Returns: section. The body could explain the various scenarios in which the function can be used. At least I imagine there are various scenarios based on the different things the function returns, but I don't really want to go around digging in the code to verify that. :-) Also a note: what does the function return if the pattern has capturing groups and a match is found? > + Metadata modifier for :func:`dataclasses.field`. > + > + Args: > + pattern: the regular expression pattern > + flags: the regular expression flags. Not used if the given pattern is already compiled > + named: if set to True only the named capture groups will be returned as a dictionary > + """ This is also missing the Returns: sections. > + if isinstance(pattern, str): > + pattern = re.compile(pattern, flags) > + > + def _find(text: str) -> Any: > + m = pattern.search(text) > + if m is None: > + return None if pattern.groups > 0 else False > + > + if pattern.groups == 0: > + return True > + > + if named: > + return m.groupdict() > + > + matches = m.groups() > + if len(matches) == 1: > + return matches[0] > + > + return matches > + > + return ParserFn(TextParser_fn=_find) > + > + @classmethod Is there a reason why find_int() is a classmethod while the rest are staticmethods? Looks like it could also be a staticmethod. > + def find_int( > + cls, > + pattern: str | re.Pattern[str], > + flags: re.RegexFlag = re.RegexFlag(0), > + int_base: int = 0, > + ) -> ParserFn: > + """Makes a parser function that converts the match of :meth:`~find` to int. > + > + This function is compatible only with a pattern containing one capturing group. > + > + Metadata modifier for :func:`dataclasses.field`. > + > + Args: > + pattern: the regular expression pattern > + flags: the regular expression flags > + int_base: the base of the number to convert from > + Raises: > + RuntimeError: if the pattern does not have exactly one capturing group > + """ > + if isinstance(pattern, str): > + pattern = re.compile(pattern, flags) > + > + if pattern.groups != 1: > + raise RuntimeError("only one capturing group is allowed with this parser function") > + Have you considered using a subclass of DTSError so that we can add a severity to this error? The severity is there to "rank" DTS errors from worst to least worst. > + return cls.compose(partial(int, base=int_base), cls.find(pattern)) > + > + """============ END PARSER FUNCTIONS ============""" > + > + @classmethod > + def parse(cls, text: str) -> Self: Would converting this into __init__(self, text: str) work? Sounds like we could just use "for field in fields(self)" and then setattr() to populate the variables. > + """Creates a new instance of the class from the given text. > + > + A new class instance is created with all the fields that have a parser function in their > + metadata. Fields without one are ignored and are expected to have a default value, otherwise > + the class initialization will fail. > + > + A field is populated with the value returned by its corresponding parser function. > + > + Args: > + text: the text to parse > + Raises: > + RuntimeError: if the parser did not find a match and the field does not have a default > + value or default factory. > + """ > + fields_values = {} > + for field in fields(cls): > + parse = cast(ParserFn, field.metadata).get("TextParser_fn") > + if parse is None: > + continue > + > + value = parse(text) > + if value is not None: > + fields_values[field.name] = value If we convert the method into a constructor, we would just use setattr() here. > + elif field.default is MISSING and field.default_factory is MISSING: > + raise RuntimeError( > + f"parser for field {field.name} returned None, but the field has no default" > + ) > + > + return cls(**fields_values)