DPDK patches and discussions
 help / color / mirror / Atom feed
From: "Juraj Linkeš" <juraj.linkes@pantheon.tech>
To: Luca Vizzarro <luca.vizzarro@arm.com>, dev@dpdk.org
Cc: Honnappa Nagarahalli <honnappa.nagarahalli@arm.com>,
	Paul Szczepanek <paul.szczepanek@arm.com>
Subject: Re: [PATCH 3/5] dts: use Pydantic in the configuration
Date: Tue, 17 Sep 2024 13:13:11 +0200	[thread overview]
Message-ID: <955abe98-9b96-44c5-8ec2-969a8da20365@pantheon.tech> (raw)
In-Reply-To: <20240822163941.1390326-4-luca.vizzarro@arm.com>



On 22. 8. 2024 18:39, Luca Vizzarro wrote:
> This change brings in Pydantic in place of Warlock. Pydantic offers
> a built-in model validation system in the classes, which allows for
> a more resilient and simpler code. As a consequence of this change:
> 
> - most validation is now built-in
> - further validation is added to verify:
>    - cross referencing of node names and ports
>    - test suite and test cases names
> - dictionaries representing the config schema are removed
> - the config schema is no longer used for validation but kept as an
>    alternative format for the developer

If it's not used, we should remove it right away (in this patch). I see 
that it's updated in v5, but we can just add it back.

> - the config schema can now be generated automatically from the
>    Pydantic models
> - the TrafficGeneratorType enum has been changed from inheriting
>    StrEnum to the native str and Enum. This change was necessary to
>    enable the discriminator for object unions
> - the structure of the classes has been slightly changed to perfectly
>    match the structure of the configuration files
> - updates the test suite argument to catch the ValidationError that
>    TestSuiteConfig can now raise

Passive voice is not used here, but the rest of the bullet points are 
using it.

>   delete mode 100644 dts/framework/config/types.py

A note, don't forget to remove this from doc sources if those get merged 
before this.

> 
> diff --git a/dts/framework/config/__init__.py b/dts/framework/config/__init__.py

> @@ -2,17 +2,19 @@

> -the YAML test run configuration file
> -and validates it according to :download:`the schema <conf_yaml_schema.json>`.
> +the YAML test run configuration file and validates it against the :class:`Configuration` Pydantic
> +dataclass model. The Pydantic model is also available as
> +:download:`JSON schema <conf_yaml_schema.json>`.

This second sentence should be moved to the last patch.


> @@ -33,29 +35,31 @@

> +)
> +from pydantic.config import JsonDict
> +from pydantic.dataclasses import dataclass

We should probably distinguish between built-in dataclasses and pydantic 
dataclasses (as pydantic adds the extra argument). Importing them as 
pydantic_dataclass seems like the easiest way to achieve this.


> @@ -116,14 +120,14 @@ class Compiler(StrEnum):

>   @unique
> -class TrafficGeneratorType(StrEnum):
> +class TrafficGeneratorType(str, Enum):
>       """The supported traffic generators."""
>   
>       #:
> -    SCAPY = auto()
> +    SCAPY = "SCAPY"

Do discriminators not work with auto()?


> -@dataclass(slots=True, frozen=True)
> +@dataclass(slots=True, frozen=True, kw_only=True, config=ConfigDict(extra="forbid"))

Is there any special reason for kw_only? Maybe we should add the reason 
for this (and also the config arg) to the module dosctring and commit msg.


> @@ -136,12 +140,17 @@ class HugepageConfiguration:

> -@dataclass(slots=True, frozen=True)
> +PciAddress = Annotated[
> +    str, StringConstraints(pattern=r"^[\da-fA-F]{4}:[\da-fA-F]{2}:[\da-fA-F]{2}.\d:?\w*$")

We have a pattern for this in utils.py. We can reuse and maybe update it 
if needed.

> +]
> +"""A constrained string type representing a PCI address."""

This should be above the var and I think regular comment (with #) should 
suffice.


> @@ -150,69 +159,53 @@ class PortConfig:

> +TrafficGeneratorConfigTypes = Annotated[ScapyTrafficGeneratorConfig, Field(discriminator="type")]
>   
> -@dataclass(slots=True, frozen=True)
> -class ScapyTrafficGeneratorConfig(TrafficGeneratorConfig):
> -    """Scapy traffic generator specific configuration."""
>   
> -    pass
> +LogicalCores = Annotated[
> +    str,
> +    StringConstraints(pattern=r"^(([0-9]+|([0-9]+-[0-9]+))(,([0-9]+|([0-9]+-[0-9]+)))*)?$"),
> +    Field(
> +        description="Comma-separated list of logical cores to use. "
> +        "An empty string means use all lcores.",
> +        examples=["1,2,3,4,5,18-22", "10-15"],
> +    ),
> +]

These two types don't have have a docstring, but others have.


> @@ -232,69 +225,25 @@ class NodeConfiguration:

>       arch: Architecture
>       os: OS

Adding the descriptions to all fields would be beneficial. Do we want to 
do that in this patch?


> @@ -313,10 +264,14 @@ class TGNodeConfiguration(NodeConfiguration):

> -    traffic_generator: TrafficGeneratorConfig
> +    traffic_generator: TrafficGeneratorConfigTypes
> +
>   
> +NodeConfigurationTypes = TGNodeConfiguration | SutNodeConfiguration
> +"""Union type for all the node configuration types."""

Same note as with PciAddress.


> @@ -405,31 +369,63 @@ class TestSuiteConfig:

> +    test_suite_name: str = Field(
> +        title="Test suite name",
> +        description="The identifying name of the test suite.",

I think we need to update this to mention that it's the test suite 
module name. Maybe we can also update the field, as it's only used in 
this object.

> +        alias="test_suite",
> +    )
> +    test_cases_names: list[str] = Field(
> +        default_factory=list,
> +        title="Test cases by name",
> +        description="The identifying name of the test cases of the test suite.",
> +        alias="test_cases",
> +    )

The attributes under Attributes need to be updated.

> +
> +    @cached_property
> +    def test_suite_spec(self) -> "TestSuiteSpec":
> +        """The specification of the requested test suite."""
> +        from framework.test_suite import find_by_name
> +
> +        test_suite_spec = find_by_name(self.test_suite_name)
> +        assert test_suite_spec is not None, f"{self.test_suite_name} is not a valid test suite name"

Doesn't end with a dot; the message should also mention that we're 
dealing with module name.

> +        return test_suite_spec
> +
> +    @model_validator(mode="before")

I think it makes sense to exlude these from docs. I tried putting :meta 
private: into a docstring and it seems to be working.

>       @classmethod
> -    def from_dict(
> -        cls,
> -        entry: str | TestSuiteConfigDict,
> -    ) -> Self:
> -        """Create an instance from two different types.
> +    def convert_from_string(cls, data: Any) -> Any:
> +        """Convert the string representation into a valid mapping."""
> +        if isinstance(data, str):
> +            [test_suite, *test_cases] = data.split()
> +            return dict(test_suite=test_suite, test_cases=test_cases)
> +        return data
> +

Why is this here? To unify the format with the one accepted by the 
--test-suite argument? Do we want to add an alternative format? If so, 
we need to make sure we document clearly that there are two alternatives 
and that they're equivalent.

> +    @model_validator(mode="after")
> +    def validate_names(self) -> Self:
> +        """Validate the supplied test suite and test cases names."""

In Configuration.validate_test_runs_with_nodes() the docstring mentions 
the use of the cached property, let's also do that here.

> +        available_test_cases = map(lambda t: t.name, self.test_suite_spec.test_cases)
> +        for requested_test_case in self.test_cases_names:
> +            assert requested_test_case in available_test_cases, (
> +                f"{requested_test_case} is not a valid test case "
> +                f"for test suite {self.test_suite_name}"

for test suite -> of test suite; also end with a dot. The dot is missing 
in a lot of places (and capital letters where the message doesn't start 
with a var value).


> @@ -442,143 +438,132 @@ class TestRunConfiguration:

> -@dataclass(slots=True, frozen=True)
> +
> +@dataclass(frozen=True, kw_only=True)
>   class Configuration:
>       """DTS testbed and test configuration.
>   
> -    The node configuration is not stored in this object. Rather, all used node configurations
> -    are stored inside the test run configuration where the nodes are actually used.
> -

I think it makes sense to explain the extra validation (with the 
@*_validator decorators) that's being done in the docstring (if we 
remove the validation methods from the generated docs). The docstring 
should be updated for each model that doing the extra validation.

>       Attributes:
>           test_runs: Test run configurations.
> +        nodes: Node configurations.
>       """
>   
> -    test_runs: list[TestRunConfiguration]
> +    test_runs: list[TestRunConfiguration] = Field(min_length=1)
> +    nodes: list[NodeConfigurationTypes] = Field(min_length=1)
>   
> +    @field_validator("nodes")
>       @classmethod
> -    def from_dict(cls, d: ConfigurationDict) -> Self:
> -        """A convenience method that processes the inputs before creating an instance.
> +    def validate_node_names(cls, nodes: list[NodeConfiguration]) -> list[NodeConfiguration]:
> +        """Validate that the node names are unique."""
> +        nodes_by_name: dict[str, int] = {}
> +        for node_no, node in enumerate(nodes):
> +            assert node.name not in nodes_by_name, (
> +                f"node {node_no} cannot have the same name as node {nodes_by_name[node.name]} "
> +                f"({node.name})"
> +            )
> +            nodes_by_name[node.name] = node_no
> +
> +        return nodes
> +
> +    @model_validator(mode="after")
> +    def validate_ports(self) -> Self:
> +        """Validate that the ports are all linked to valid ones."""
> +        port_links: dict[tuple[str, str], Literal[False] | tuple[int, int]] = {
> +            (node.name, port.pci): False for node in self.nodes for port in node.ports
> +        }
> +
> +        for node_no, node in enumerate(self.nodes):

I could see why we're use enumeration for nodes in validate_node_names, 
but here we can just use node names in assert messages. At least that 
should be the case if nodes get validated before this model validator 
runs - it that the case?

> +            for port_no, port in enumerate(node.ports):
> +                peer_port_identifier = (port.peer_node, port.peer_pci)
> +                peer_port = port_links.get(peer_port_identifier, None)
> +                assert peer_port is not None, (
> +                    "invalid peer port specified for " f"nodes.{node_no}.ports.{port_no}"
> +                )
> +                assert peer_port is False, (
> +                    f"the peer port specified for nodes.{node_no}.ports.{port_no} "
> +                    f"is already linked to nodes.{peer_port[0]}.ports.{peer_port[1]}"
> +                )
> +                port_links[peer_port_identifier] = (node_no, port_no)
>   

> +    @cached_property
> +    def test_runs_with_nodes(self) -> list[TestRunWithNodesConfiguration]:

Let's move the property to be the first member of the class, to unify 
the order it with TestSuiteConfig.

> +        """List test runs with the associated nodes."""

This doesn't list the test runs. I think the docstring should say a bit 
more to make it obvious that this is the main attribute to use with this 
class. Or maybe that could be in the the class's docstring.

We're also missing the Returns: section.

> +        test_runs_with_nodes = []
>   
> -        Returns:
> -            The whole configuration instance.
> -        """
> -        nodes: list[SutNodeConfiguration | TGNodeConfiguration] = list(
> -            map(NodeConfiguration.from_dict, d["nodes"])
> -        )
> -        assert len(nodes) > 0, "There must be a node to test"
> +        for test_run_no, test_run in enumerate(self.test_runs):
> +            sut_node_name = test_run.system_under_test_node.node_name
> +            sut_node = next(filter(lambda n: n.name == sut_node_name, self.nodes), None)

There are a number of these instead of a list comprehension (I mentioned 
this in a previous patch). I still don't really see a reason to not use 
list comprehensions in all these cases.


> +
> +
> +ConfigurationType = TypeAdapter(Configuration)

This new transformed class exists only for validation purposes, right? I 
think we can move this to load_config, as it's not going to be used 
anywhere else.

Also I'd rename it to something else, it's not a type. Maybe 
ConfigurationAdapter or PydanticConfiguration or ConfigurationModel (as 
the adapter adds some methods from BaseModel). Or something else, but 
the type in the name confused me.

> diff --git a/dts/framework/runner.py b/dts/framework/runner.py
> @@ -231,10 +234,10 @@ def _get_test_suites_with_cases(
>           test_suites_with_cases = []
>   
>           for test_suite_config in test_suite_configs:
> -            test_suite_class = self._get_test_suite_class(test_suite_config.test_suite)
> +            test_suite_class = self._get_test_suite_class(test_suite_config.test_suite_name)

We've already done all the validation and importing at this point and we 
should be able to use test_suite_config.test_suite_spec, right? The same 
is true for TestSuiteWithCases, which holds the same information.

Looks like you removed _get_test_suite_class in a subsequent patch, but 
we should think about getting rid of TestSuiteWithCases, as it was 
conceived to do what TestSuiteSpec is doing.


  reply	other threads:[~2024-09-17 11:13 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-22 16:39 [PATCH 0/5] dts: Pydantic configuration Luca Vizzarro
2024-08-22 16:39 ` [PATCH 1/5] dts: add TestSuiteSpec class and discovery Luca Vizzarro
2024-09-16 13:00   ` Juraj Linkeš
2024-08-22 16:39 ` [PATCH 2/5] dts: add Pydantic and remove Warlock Luca Vizzarro
2024-09-16 13:17   ` Juraj Linkeš
2024-08-22 16:39 ` [PATCH 3/5] dts: use Pydantic in the configuration Luca Vizzarro
2024-09-17 11:13   ` Juraj Linkeš [this message]
2024-08-22 16:39 ` [PATCH 4/5] dts: use TestSuiteSpec class imports Luca Vizzarro
2024-09-17 11:39   ` Juraj Linkeš
2024-08-22 16:39 ` [PATCH 5/5] dts: add JSON schema generation script Luca Vizzarro
2024-09-17 11:59   ` Juraj Linkeš

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=955abe98-9b96-44c5-8ec2-969a8da20365@pantheon.tech \
    --to=juraj.linkes@pantheon.tech \
    --cc=dev@dpdk.org \
    --cc=honnappa.nagarahalli@arm.com \
    --cc=luca.vizzarro@arm.com \
    --cc=paul.szczepanek@arm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).