DPDK patches and discussions
* [PATCH 1/9] doc: reword design section in contributors guidelines
  @ 2024-05-13 15:59  5% ` Nandini Persad
  0 siblings, 0 replies; 200+ results
From: Nandini Persad @ 2024-05-13 15:59 UTC (permalink / raw)
  To: dev; +Cc: Thomas Monjalon

minor editing for grammar and syntax of design section

Signed-off-by: Nandini Persad <nandinipersad361@gmail.com>
---
 .mailmap                           |  1 +
 doc/guides/contributing/design.rst | 79 ++++++++++++++----------------
 doc/guides/linux_gsg/sys_reqs.rst  |  2 +-
 3 files changed, 38 insertions(+), 44 deletions(-)

diff --git a/.mailmap b/.mailmap
index 66ebc20666..7d4929c5d1 100644
--- a/.mailmap
+++ b/.mailmap
@@ -1002,6 +1002,7 @@ Naga Suresh Somarowthu <naga.sureshx.somarowthu@intel.com>
 Nalla Pradeep <pnalla@marvell.com>
 Na Na <nana.nn@alibaba-inc.com>
 Nan Chen <whutchennan@gmail.com>
+Nandini Persad <nandinipersad361@gmail.com>
 Nannan Lu <nannan.lu@intel.com>
 Nan Zhou <zhounan14@huawei.com>
 Narcisa Vasile <navasile@linux.microsoft.com> <navasile@microsoft.com> <narcisa.vasile@microsoft.com>
diff --git a/doc/guides/contributing/design.rst b/doc/guides/contributing/design.rst
index b724177ba1..921578aec5 100644
--- a/doc/guides/contributing/design.rst
+++ b/doc/guides/contributing/design.rst
@@ -8,22 +8,26 @@ Design
 Environment or Architecture-specific Sources
 --------------------------------------------
 
-In DPDK and DPDK applications, some code is specific to an architecture (i686, x86_64) or to an executive environment (freebsd or linux) and so on.
-As far as is possible, all such instances of architecture or env-specific code should be provided via standard APIs in the EAL.
+In DPDK and DPDK applications, some code is architecture-specific (i686, x86_64) or environment-specific (FreeBSD, Linux, etc.).
+When feasible, such architecture- or environment-specific code should be provided via standard APIs in the EAL.
 
-By convention, a file is common if it is not located in a directory indicating that it is specific.
-For instance, a file located in a subdir of "x86_64" directory is specific to this architecture.
+By convention, a file is specific if its directory indicates an architecture or environment. Otherwise, it is common.
+
+For example:
+
+A file located in a subdirectory of the "x86_64" directory is specific to this architecture.
 A file located in a subdir of "linux" is specific to this execution environment.
 
 .. note::
 
    Code in DPDK libraries and applications should be generic.
-   The correct location for architecture or executive environment specific code is in the EAL.
+   The correct location for architecture or executive environment-specific code is in the EAL.
+
+When necessary, there are several ways to handle specific code:
 
-When absolutely necessary, there are several ways to handle specific code:
 
-* Use a ``#ifdef`` with a build definition macro in the C code.
-  This can be done when the differences are small and they can be embedded in the same C file:
+* When the differences are small and they can be embedded in the same C file, use a ``#ifdef`` with a build definition macro in the C code.
+
 
   .. code-block:: c
 
@@ -33,9 +37,9 @@ When absolutely necessary, there are several ways to handle specific code:
      titi();
      #endif
 
-* Use build definition macros and conditions in the Meson build file. This is done when the differences are more significant.
-  In this case, the code is split into two separate files that are architecture or environment specific.
-  This should only apply inside the EAL library.
+* When the differences are more significant, use build definition macros and conditions in the Meson build file.
+  In this case, the code is split into two separate files that are architecture or environment specific.
+  This should only apply inside the EAL library.
 
 Per Architecture Sources
 ~~~~~~~~~~~~~~~~~~~~~~~~
@@ -43,7 +47,7 @@ Per Architecture Sources
 The following macro options can be used:
 
 * ``RTE_ARCH`` is a string that contains the name of the architecture.
-* ``RTE_ARCH_I686``, ``RTE_ARCH_X86_64``, ``RTE_ARCH_X86_X32``, ``RTE_ARCH_PPC_64``, ``RTE_ARCH_RISCV``, ``RTE_ARCH_LOONGARCH``, ``RTE_ARCH_ARM``, ``RTE_ARCH_ARMv7`` or ``RTE_ARCH_ARM64`` are defined only if we are building for those architectures.
+* ``RTE_ARCH_I686``, ``RTE_ARCH_X86_64``, ``RTE_ARCH_X86_X32``, ``RTE_ARCH_PPC_64``, ``RTE_ARCH_RISCV``, ``RTE_ARCH_LOONGARCH``, ``RTE_ARCH_ARM``, ``RTE_ARCH_ARMv7`` or ``RTE_ARCH_ARM64`` are defined when building for these architectures.
 
 Per Execution Environment Sources
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -51,30 +55,21 @@ Per Execution Environment Sources
 The following macro options can be used:
 
 * ``RTE_EXEC_ENV`` is a string that contains the name of the executive environment.
-* ``RTE_EXEC_ENV_FREEBSD``, ``RTE_EXEC_ENV_LINUX`` or ``RTE_EXEC_ENV_WINDOWS`` are defined only if we are building for this execution environment.
+* ``RTE_EXEC_ENV_FREEBSD``, ``RTE_EXEC_ENV_LINUX`` or ``RTE_EXEC_ENV_WINDOWS`` are defined only when building for this execution environment.
 
 Mbuf features
 -------------
 
-The ``rte_mbuf`` structure must be kept small (128 bytes).
-
-In order to add new features without wasting buffer space for unused features,
-some fields and flags can be registered dynamically in a shared area.
-The "dynamic" mbuf area is the default choice for the new features.
-
-The "dynamic" area is eating the remaining space in mbuf,
-and some existing "static" fields may need to become "dynamic".
+A designated area in the mbuf stores "dynamically" registered fields and flags. It is the default choice for accommodating new features. The "dynamic" area consumes the remaining space in the mbuf. However, the ``rte_mbuf`` structure must be kept small (128 bytes).
 
-Adding a new static field or flag must be an exception matching many criteria
-like (non exhaustive): wide usage, performance, size.
+As more features are added, the space for existing "static" fields (fields that are allocated statically) may need to be reconsidered and possibly converted to "dynamic" allocation. Adding a new static field or flag should be an exception. It must meet specific criteria including widespread usage, performance impact, and size considerations. Before adding a new static field, its necessity and its impact on the system's efficiency must be justified.
 
 
 Runtime Information - Logging, Tracing and Telemetry
 ----------------------------------------------------
 
-It is often desirable to provide information to the end-user
-as to what is happening to the application at runtime.
-DPDK provides a number of built-in mechanisms to provide this introspection:
+The end user often wants to know what is happening in the application at runtime.
+DPDK provides several built-in mechanisms to provide these insights:
 
 * :ref:`Logging <dynamic_logging>`
 * :doc:`Tracing <../prog_guide/trace_lib>`
@@ -82,11 +77,11 @@ DPDK provides a number of built-in mechanisms to provide this introspection:
 
 Each of these has its own strengths and suitabilities for use within DPDK components.
 
-Below are some guidelines for when each should be used:
+Here are guidelines for when each mechanism should be used:
 
 * For reporting error conditions, or other abnormal runtime issues, *logging* should be used.
-  Depending on the severity of the issue, the appropriate log level, for example,
-  ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.
+  Depending on the severity of the issue, the appropriate log level,
+  for example ``ERROR``, ``WARNING`` or ``NOTICE``, should be used.
 
 .. note::
 
@@ -96,22 +91,21 @@ Below are some guidelines for when each should be used:
 
 * For component initialization, or other cases where a path through the code
   is only likely to be taken once,
-  either *logging* at ``DEBUG`` level or *tracing* may be used, or potentially both.
+  either *logging* at ``DEBUG`` level or *tracing* may be used, or both.
   In the latter case, tracing can provide basic information as to the code path taken,
   with debug-level logging providing additional details on internal state,
-  not possible to emit via tracing.
+  which is not possible to emit via tracing.
 
 * For a component's data-path, where a path is to be taken multiple times within a short timeframe,
   *tracing* should be used.
   Since DPDK tracing uses `Common Trace Format <https://diamon.org/ctf/>`_ for its tracing logs,
   post-analysis can be done using a range of external tools.
 
-* For numerical or statistical data generated by a component, for example, per-packet statistics,
+* For numerical or statistical data generated by a component, such as per-packet statistics,
   *telemetry* should be used.
 
-* For any data where the data may need to be gathered at any point in the execution
-  to help assess the state of the application component,
-  for example, core configuration, device information, *telemetry* should be used.
+* For any data that may need to be gathered at any point during the execution
+  to help assess the state of the application component (for example, core configuration, device information), *telemetry* should be used.
   Telemetry callbacks should not modify any program state, but be "read-only".
 
 Many libraries also include a ``rte_<libname>_dump()`` function as part of their API,
@@ -135,13 +129,12 @@ requirements for preventing ABI changes when implementing statistics.
 Mechanism to allow the application to turn library statistics on and off
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Having runtime support for enabling/disabling library statistics is recommended,
-as build-time options should be avoided. However, if build-time options are used,
-for example as in the table library, the options can be set using c_args.
-When this flag is set, all the counters supported by current library are
+Having runtime support for enabling/disabling library statistics is recommended,
+as build-time options should be avoided. However, if build-time options are used, as in the table library, the options can be set using c_args.
+When this flag is set, all the counters supported by the current library are
 collected for all the instances of every object type provided by the library.
 When this flag is cleared, none of the counters supported by the current library
-are collected for any instance of any object type provided by the library:
+are collected for any instance of any object type provided by the library.
 
 
 Prevention of ABI changes due to library statistics support
@@ -165,8 +158,8 @@ Motivation to allow the application to turn library statistics on and off
 
 It is highly recommended that each library provides statistics counters to allow
 an application to monitor the library-level run-time events. Typical counters
-are: number of packets received/dropped/transmitted, number of buffers
-allocated/freed, number of occurrences for specific events, etc.
+are: the number of packets received/dropped/transmitted, the number of buffers
+allocated/freed, the number of occurrences for specific events, etc.
 
 However, the resources consumed for library-level statistics counter collection
 have to be spent out of the application budget and the counters collected by
@@ -229,5 +222,5 @@ Developers should work with the Linux Kernel community to get the required
 functionality upstream. PF functionality should only be added to DPDK for
 testing and prototyping purposes while the kernel work is ongoing. It should
 also be marked with an "EXPERIMENTAL" tag. If the functionality isn't
-upstreamable then a case can be made to maintain the PF functionality in DPDK
+upstreamable, then a case can be made to maintain the PF functionality in DPDK
 without the EXPERIMENTAL tag.
diff --git a/doc/guides/linux_gsg/sys_reqs.rst b/doc/guides/linux_gsg/sys_reqs.rst
index 13be715933..0569c5cae6 100644
--- a/doc/guides/linux_gsg/sys_reqs.rst
+++ b/doc/guides/linux_gsg/sys_reqs.rst
@@ -99,7 +99,7 @@ e.g. :doc:`../nics/index`
 Running DPDK Applications
 -------------------------
 
-To run a DPDK application, some customization may be required on the target machine.
+To run a DPDK application, customization may be required on the target machine.
 
 System Software
 ~~~~~~~~~~~~~~~
-- 
2.34.1


^ permalink raw reply	[relevance 5%]

* Re: [RFC v2] net/af_packet: make stats reset reliable
  2024-05-07 16:00  3%           ` Morten Brørup
  2024-05-07 16:54  0%             ` Ferruh Yigit
@ 2024-05-08  7:48  0%             ` Mattias Rönnblom
  1 sibling, 0 replies; 200+ results
From: Mattias Rönnblom @ 2024-05-08  7:48 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger, Ferruh Yigit
  Cc: John W. Linville, Thomas Monjalon, dev, Mattias Rönnblom

On 2024-05-07 18:00, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Tuesday, 7 May 2024 16.51
> 
>> I would prefer that the SW statistics be handled generically by ethdev
>> layers and used by all such drivers.
> 
> I agree.
> 
> Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)
> 
> Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.
> 
>>
>> The most complete version of SW stats now is in the virtio driver.
> 
> It looks like the virtio PMD maintains the counters; they are not retrieved from the host.
> 
> Considering a DPDK application running as a virtual machine (guest) on a host server...
> 
> If the host is unable to put a packet onto the guest's virtio RX queue - like when a HW NIC is out of RX descriptors - is it counted somewhere visible to the guest?
> 
> Similarly, if the guest is unable to put a packet onto its virtio TX queue, is it counted somewhere visible to the host?
> 
>> If reset needs to be reliable (debatable), then it needs to be done without
>> atomics.
> 
> Let's modify that slightly: Without performance degradation in the fast path.
> I'm not sure that all atomic operations are slow.

Relaxed atomic loads from and stores to naturally aligned addresses are 
for free on ARM and x86_64 up to at least 64 bits.

"For free" is not entirely true, since both C11 relaxed stores and 
stores through volatile may prevent vectorization in GCC. I don't see 
why, but in practice that seems to be the case. That is very much a 
corner case.

Also, as mentioned before, C11 atomic store effectively has volatile 
semantics, which in turn may prevent some compiler optimizations.

On 32-bit x86, 64-bit atomic stores use xmm registers, but those are 
going to be used anyway, since you'll have a 64-bit add.

> But you are right that it needs to be done without _Atomic counters; they seem to be slow.
> 

_Atomic is not slower than atomics without _Atomic, when you actually 
need atomic operations.


* Re: [RFC v2] net/af_packet: make stats reset reliable
  2024-05-07 16:54  0%             ` Ferruh Yigit
@ 2024-05-07 18:47  0%               ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-07 18:47 UTC (permalink / raw)
  To: Ferruh Yigit
  Cc: Morten Brørup, Mattias Rönnblom, John W. Linville,
	Thomas Monjalon, dev, Mattias Rönnblom

On Tue, 7 May 2024 17:54:18 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 5/7/2024 5:00 PM, Morten Brørup wrote:
> >> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> >> Sent: Tuesday, 7 May 2024 16.51  
> >   
> >> I would prefer that the SW statistics be handled generically by ethdev
> >> layers and used by all such drivers.  
> > 
> > I agree.
> > 
> > Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)
> > 
> > Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.
> >   
> 
> I am against the ethdev layer being aware of SW drivers and behaving
> differently for them.
> This is dev_ops and can be managed per driver. We can add helper
> functions for drivers if there is a common pattern.

It is more about having a set of helper routines for SW only drivers.
I have something in progress for this.


* Re: [RFC v2] net/af_packet: make stats reset reliable
  2024-05-07 16:00  3%           ` Morten Brørup
@ 2024-05-07 16:54  0%             ` Ferruh Yigit
  2024-05-07 18:47  0%               ` Stephen Hemminger
  2024-05-08  7:48  0%             ` Mattias Rönnblom
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-05-07 16:54 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger
  Cc: Mattias Rönnblom, John W. Linville, Thomas Monjalon, dev,
	Mattias Rönnblom

On 5/7/2024 5:00 PM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Tuesday, 7 May 2024 16.51
> 
>> I would prefer that the SW statistics be handled generically by ethdev
>> layers and used by all such drivers.
> 
> I agree.
> 
> Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)
> 
> Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.
> 

I am against the ethdev layer being aware of SW drivers and behaving
differently for them.
This is dev_ops and can be managed per driver. We can add helper
functions for drivers if there is a common pattern.

>>
>> The most complete version of SW stats now is in the virtio driver.
> 
> It looks like the virtio PMD maintains the counters; they are not retrieved from the host.
> 
> Considering a DPDK application running as a virtual machine (guest) on a host server...
> 
> If the host is unable to put a packet onto the guest's virtio RX queue - like when a HW NIC is out of RX descriptors - is it counted somewhere visible to the guest?
> 
> Similarly, if the guest is unable to put a packet onto its virtio TX queue, is it counted somewhere visible to the host?
> 
>> If reset needs to be reliable (debatable), then it needs to be done without
>> atomics.
> 
> Let's modify that slightly: Without performance degradation in the fast path.
> I'm not sure that all atomic operations are slow.
> But you are right that it needs to be done without _Atomic counters; they seem to be slow.
> 



* RE: [RFC v2] net/af_packet: make stats reset reliable
  @ 2024-05-07 16:00  3%           ` Morten Brørup
  2024-05-07 16:54  0%             ` Ferruh Yigit
  2024-05-08  7:48  0%             ` Mattias Rönnblom
  0 siblings, 2 replies; 200+ results
From: Morten Brørup @ 2024-05-07 16:00 UTC (permalink / raw)
  To: Stephen Hemminger, Ferruh Yigit
  Cc: Mattias Rönnblom, John W. Linville, Thomas Monjalon, dev,
	Mattias Rönnblom

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Tuesday, 7 May 2024 16.51

> I would prefer that the SW statistics be handled generically by ethdev
> layers and used by all such drivers.

I agree.

Please note that maintaining counters in the ethdev layer might cause more cache misses than maintaining them in the hot parts of the individual drivers' data structures, so it's not all that simple. ;-)

Until then, let's find a short term solution, viable to implement across all software NIC drivers without API/ABI breakage.

> 
> The most complete version of SW stats now is in the virtio driver.

It looks like the virtio PMD maintains the counters; they are not retrieved from the host.

Considering a DPDK application running as a virtual machine (guest) on a host server...

If the host is unable to put a packet onto the guest's virtio RX queue - like when a HW NIC is out of RX descriptors - is it counted somewhere visible to the guest?

Similarly, if the guest is unable to put a packet onto its virtio TX queue, is it counted somewhere visible to the host?

> If reset needs to be reliable (debatable), then it needs to be done without
> atomics.

Let's modify that slightly: Without performance degradation in the fast path.
I'm not sure that all atomic operations are slow.
But you are right that it needs to be done without _Atomic counters; they seem to be slow.



* Re: [PATCH] freebsd: Add support for multiple dpdk instances on FreeBSD
  @ 2024-05-03 13:24  3%       ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2024-05-03 13:24 UTC (permalink / raw)
  To: Tom Jones; +Cc: dev

On Fri, May 03, 2024 at 02:12:58PM +0100, Tom Jones wrote:
> Hi Bruce,
> 
> thanks for letting me know
> 
> I'm not tied to anything particularly. This change isn't compatible with the previous API, but I'm not against making it so if that is really the best thing to do. As is, the dpdk changes and the contigmem changes need to come together because the API changes for getting the physical addresses.
> 

I don't think it's a major problem if the new kernel code doesn't work with the
older DPDK userspace code, we can apply both together in one patch.
However, it would count as an API/ABI change so would need to be deferred
for merge to 24.11 release, I think.


> It is just the sysctl paths that differ. I'm not sure what the compatibility needs to be for DPDK, for all of my usage I have built the kernel module with the package - making API changes easy.
> 
> I'm happy to follow which ever path you think is best.
> 

I'll maybe give more thoughts on this once I try the patch out. Hopefully
I'll get to test it out this afternoon. Don't bother trying to rework
anything until then! :-)

> Sorry for the patch confusion, I'll try to keep the sequence obvious going forward.
> 

No problem. Thanks for the contribution here. FreeBSD support has sadly
been lacking a number of features for some time now, so all changes to
close the feature gap vs linux are very welcome!

/Bruce



* [PATCH v12 06/12] net/tap: rewrite the RSS BPF program
  @ 2024-05-02 21:31  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-02 21:31 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite of the BPF program used to do queue based RSS.

Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF code.
	- the resulting queue is placed in skb by using skb mark
	  rather than requiring a second pass through the classifier step.

Note: This version only works with later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  49 +++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 267 +++++++++++++++++++++++++
 9 files changed, 397 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by the
+ingress queue discipline that then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for RSS queue
+action the second stage is the kernel skbedit action which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+  as a target architecture and bpftool to convert object to headers.
+
+  Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+   from tap_rss.bpf.o. This header contains wrapper functions for
+   managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+   the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute the Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute the RSS hash for an IPv4 packet.
+ * Returns 0 if no RSS hash could be computed.
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update the offset and return the next proto.
+ * Returns the next proto on success, -1 on a malformed header.
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute the RSS hash for an IPv6 packet.
+ * Returns 0 if no RSS hash could be computed.
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Scale the value into the range [0, n).
+ * Assumes val is uniformly distributed (i.e. the hash covers the whole u32 range).
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	const __u32 *key;
+	__be16 proto;
+	__u32 mark;
+	__u32 hash;
+	__u16 queue;
+
+	__builtin_preserve_access_index(({
+		mark = skb->mark;
+		proto = skb->protocol;
+	}));
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	key = (const __u32 *)rsskey->key;
+
+	if (proto == bpf_htons(ETH_P_IP))
+		hash = parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (proto == bpf_htons(ETH_P_IPV6))
+		hash = parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		hash = 0;
+
+	if (hash == 0)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+	__builtin_preserve_access_index(({
+		skb->queue_mapping = queue;
+	}));
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
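The reciprocal_scale() helper introduced above folds the hash into the configured number of queues without a modulo operation. A standalone sketch of the same arithmetic (illustrative only, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Map a uniformly distributed 32-bit hash onto [0, n) without a modulo,
 * mirroring the reciprocal_scale() helper in tap_rss.c. */
static inline uint32_t
reciprocal_scale(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}
```

Because the multiply-shift only needs the top bits of the product, it avoids the division the old `hash % nb_queues` indexing required, which matters under BPF instruction constraints.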
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* Re: [PATCH] net/af_packet: fix statistics
  2024-05-01 18:18  0%     ` Morten Brørup
@ 2024-05-02 13:47  0%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-05-02 13:47 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger
  Cc: dev, John W. Linville, Mattias Rönnblom

On 5/1/2024 7:18 PM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Wednesday, 1 May 2024 18.45
>>
>> On Wed, 1 May 2024 17:25:59 +0100
>> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
>>
>>>>  - Remove the tx_error counter since it was not correct.
>>>>    When transmit ring is full it is not an error and
>>>>    the driver correctly returns only the number sent.
>>>>
>>>
>>> nack
>>> Transmit full is not only return case here.
>>> There are actual errors continue to process relying this error
>> calculation.
>>> Also there are error cases like interface down.
>>> Those error cases should be handled individually if we remove this.
>>> I suggest split this change to separate patch.
>>
>> I see multiple drivers have copy/pasted same code and consider
>> transmit full as an error. It is not.
> 
> +1
> Transmit full is certainly not an error!
> 

I am not referring to the transmit-full case; there are real error cases
in the driver:
- oversized packets
- vlan inserting failure

In the above cases the Tx loop continues, relying on these packets being
counted as errors at the end of the loop. We can't just remove the error
counter; these cases need to be handled first.


- poll on fd fails
- poll on fd returns POLLERR (if down)

In the above cases the driver's Tx loop breaks and all remaining packets
are counted as errors.


- sendto() fails

All packets sent to the af_packet frame are counted as errors.


As you can see, there are real error cases handled in the driver.
That is why, instead of just removing the error counter, I suggest
handling it more properly in a separate patch.
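The distinction being argued here (a full ring is backpressure; oversize packets, vlan-insert failures, and sendto() failures are real errors) could be tracked with split counters. A hypothetical sketch, not the actual af_packet PMD fields:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-queue TX counters, split the way the thread suggests:
 * a full transmit ring is backpressure, not an error. None of these
 * names are actual af_packet PMD fields. */
struct tx_stats {
	uint64_t opackets; /* successfully transmitted */
	uint64_t oerrors;  /* real failures: oversize, vlan insert, sendto() */
	uint64_t nodescr;  /* ring full: the application may retry these */
};

/* Account for one burst: nb_pkts offered, nb_sent transmitted,
 * nb_bad dropped due to real errors; the rest hit a full ring. */
static uint16_t
account_tx(struct tx_stats *st, uint16_t nb_pkts,
	   uint16_t nb_sent, uint16_t nb_bad)
{
	st->opackets += nb_sent;
	st->oerrors += nb_bad;
	st->nodescr += (uint16_t)(nb_pkts - nb_sent - nb_bad);
	return nb_sent;
}
```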

>>
>> There should be a new statistic at ethdev layer that does record
>> transmit full, and make it across all drivers, but that would have
>> to wait for ABI change.
> 
> What happens to these non-transmittable packets depend on the application.
> Our application discards them and count them in a (per-port, per-queue) application level counter tx_nodescr, which eventually becomes IF-MIB::ifOutDiscards in SNMP. I think many applications behave similarly, so having an ethdev layer tx_nodescr counter might be helpful.
> Other applications could try to retransmit them; if there are still no TX descriptors, they will be counted again.
> 
> In case anyone gets funny ideas: The PMD should still not free those non-transmitted packet mbufs, because the application might want to treat them differently than the transmitted packets, e.g. for latency stats or packet capture.
> 


^ permalink raw reply	[relevance 0%]

* [PATCH v11 5/9] net/tap: rewrite the RSS BPF program
  @ 2024-05-02  2:49  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-02  2:49 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite of the BPF program used to do queue-based RSS.

Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF code.
	- the resulting queue is placed in the skb by using the skb mark
	  rather than requiring a second pass through the classifier step.

Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  49 +++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 267 +++++++++++++++++++++++++
 9 files changed, 397 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..6d323d2051
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are used with the TAP device, packets are first handled by
+the ingress queue discipline, which then runs a series of classifier filter
+rules. The first stage is the flow-based classifier (flower); for an RSS
+queue action the second stage is the kernel skbedit action, which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program, which then maps the flow id and packet headers
+to a queue id.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+  as a target architecture and bpftool to convert object to headers.
+
+  Some older versions of Ubuntu do not have a working bpftool package.
+
+- only standard Toeplitz hash with standard 40 byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+   from tap_rss.bpf.o. This header contains wrapper functions for
+   managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+   the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..025b831b5c
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,267 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/* IP header More Fragments flag */
+#define IP_OFFSET	0x1FFF		/* IP header fragment offset mask */
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be() in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __always_inline __u32
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute the RSS hash for an IPv4 packet.
+ * Returns 0 if no RSS hash is possible.
+ */
+static __always_inline __u32
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update the offset and return the next protocol.
+ * Returns the next protocol on success, -1 on a malformed header.
+ */
+static __always_inline int
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute the RSS hash for an IPv6 packet.
+ * Returns 0 if no RSS hash is possible.
+ */
+static __always_inline __u32
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. the hash covers the whole u32 range).
+ */
+static __always_inline __u32
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	const __u32 *key;
+	__be16 proto;
+	__u32 mark;
+	__u32 hash;
+	__u16 queue;
+
+	__builtin_preserve_access_index(({
+		mark = skb->mark;
+		proto = skb->protocol;
+	}));
+
+	/* Look up the RSS configuration for this BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	key = (const __u32 *)rsskey->key;
+
+	if (proto == bpf_htons(ETH_P_IP))
+		hash = parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (proto == bpf_htons(ETH_P_IPV6))
+		hash = parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		hash = 0;
+
+	if (hash == 0)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	queue = reciprocal_scale(hash, rsskey->nb_queues);
+
+	__builtin_preserve_access_index(({
+		skb->queue_mapping = queue;
+	}));
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* RE: [PATCH] net/af_packet: fix statistics
  2024-05-01 16:44  3%   ` Stephen Hemminger
@ 2024-05-01 18:18  0%     ` Morten Brørup
  2024-05-02 13:47  0%       ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2024-05-01 18:18 UTC (permalink / raw)
  To: Stephen Hemminger, Ferruh Yigit
  Cc: dev, John W. Linville, Mattias Rönnblom

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 1 May 2024 18.45
> 
> On Wed, 1 May 2024 17:25:59 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
> > >  - Remove the tx_error counter since it was not correct.
> > >    When transmit ring is full it is not an error and
> > >    the driver correctly returns only the number sent.
> > >
> >
> > nack
> > Transmit full is not the only return case here.
> > There are actual errors, and processing continues relying on this error
> > calculation.
> > There are also error cases like the interface being down.
> > Those error cases should be handled individually if we remove this.
> > I suggest splitting this change into a separate patch.
> 
> I see multiple drivers have copy/pasted the same code and consider
> transmit full an error. It is not.

+1
Transmit full is certainly not an error!

> 
> There should be a new statistic at the ethdev layer that records
> transmit full, applied across all drivers, but that would have
> to wait for an ABI change.

What happens to these non-transmittable packets depends on the application.
Our application discards them and counts them in a (per-port, per-queue) application-level counter, tx_nodescr, which eventually becomes IF-MIB::ifOutDiscards in SNMP. I think many applications behave similarly, so having an ethdev-layer tx_nodescr counter might be helpful.
Other applications could try to retransmit them; if there are still no TX descriptors, they will be counted again.

In case anyone gets funny ideas: The PMD should still not free those non-transmitted packet mbufs, because the application might want to treat them differently than the transmitted packets, e.g. for latency stats or packet capture.



* Re: [PATCH] net/af_packet: fix statistics
  @ 2024-05-01 16:44  3%   ` Stephen Hemminger
  2024-05-01 18:18  0%     ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-05-01 16:44 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, John W. Linville, Mattias Rönnblom

On Wed, 1 May 2024 17:25:59 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> >  - Remove the tx_error counter since it was not correct.
> >    When transmit ring is full it is not an error and
> >    the driver correctly returns only the number sent.
> >   
> 
> nack
> Transmit full is not the only return case here.
> There are actual errors, and processing continues relying on this error calculation.
> There are also error cases like the interface being down.
> Those error cases should be handled individually if we remove this.
> I suggest splitting this change into a separate patch.

I see multiple drivers have copy/pasted the same code and consider
transmit full an error. It is not.

There should be a new statistic at the ethdev layer that records
transmit full, applied across all drivers, but that would have
to wait for an ABI change.

^ permalink raw reply	[relevance 3%]

* [PATCH v10 5/9] net/tap: rewrite the RSS BPF program
  @ 2024-05-01 16:12  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-05-01 16:12 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite of the BPF program used to do queue based RSS.

Important changes:
	- uses the newer BTF map format
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF code.
	- the resulting queue is placed in the skb by using the skb mark
	  rather than requiring a second pass through the classifier step.

Note: This version only works with the later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  49 +++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 394 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..181f76a134
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,49 @@
+This is the BPF program used to implement Receive Side Scaling (RSS)
+across multiple queues if required by a flow action. The program is
+loaded into the kernel when the first RSS flow rule is created and is never unloaded.
+
+When flow rules are created on the TAP device, packets are first handled by the
+ingress queue discipline, which then runs a series of classifier filter rules.
+The first stage is the flow based classifier (flower); for the RSS queue
+action the second stage is the kernel skbedit action, which sets
+the skb mark to a key based on the flow id; the final stage
+is this BPF program, which then maps flow id and packet header
+into a queue id.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+
+- rebuilding the BPF requires the clang compiler with bpf available
+  as a target architecture and bpftool to convert the object to headers.
+
+  Some older versions of Ubuntu do not have a working bpftool package.
+
+- only the standard Toeplitz hash with the standard 40-byte key is supported.
+
+- the number of flow rules using RSS is limited to 32.
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything works then
+BPF RSS is enabled.
+
+The steps are:
+
+1. Uses clang to compile tap_rss.c to produce tap_rss.bpf.o
+
+2. Uses bpftool to generate a skeleton header file tap_rss.skel.h
+   from tap_rss.bpf.o. This header contains wrapper functions for
+   managing the BPF and the actual BPF code as a large byte array.
+
+3. The header file is included in tap_flow.c so that it can load
+   the BPF code (via libbpf).
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian install this in /usr/sbin which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for the multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute the RSS hash for an IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update the offset and return the next proto.
+ * Returns the next proto on success, -1 on a malformed header.
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute the RSS hash for an IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. the hash covers the whole u32 range).
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* [PATCH v9 5/9] net/tap: rewrite the RSS BPF program
  @ 2024-04-26 15:48  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-26 15:48 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in skb rather
	  than requiring a second pass through classifier step.

Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  38 ++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 383 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle), which is used to look up
+the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+  Some older versions of Ubuntu do not have a working bpftool package.
+  A version of Clang that can compile to the BPF target is also needed.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process the meson build file checks that
+libbpf, bpftool, and clang are available. If everything is
+present then BPF RSS is enabled.
+
+1. Clang compiles tap_rss.c into the tap_rss.bpf.o object file.
+
+2. bpftool generates a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array which contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed, the BPF byte array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
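
For reference, the numbered build steps above correspond roughly to the following commands (flags taken from the meson.build added by this patch; paths are illustrative):

```shell
# Step 1: compile the restricted C program to a BPF object with BTF debug info
clang -O2 -Wall -Wextra -g -target bpf -c tap_rss.c -o tap_rss.o

# Step 2: generate the skeleton header embedding that object
bpftool gen skeleton tap_rss.o > tap_rss.skel.h
```

The resulting tap_rss.skel.h is what the tap PMD compiles in and hands to libbpf at load time.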
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for the multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/* IPv4 more-fragments flag */
+#define IP_OFFSET	0x1FFF		/* IPv4 fragment offset mask */
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update the offset and return the next proto.
+ * Returns the next proto on success, -1 on a malformed header.
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if no RSS hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. the hash covers the whole u32 range).
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* Re: Minutes of DPDK Technical Board Meeting, 2024-04-03
  2024-04-24 17:25  3% ` Morten Brørup
@ 2024-04-24 19:10  0%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-04-24 19:10 UTC (permalink / raw)
  To: Morten Brørup; +Cc: dev, techboard

24/04/2024 19:25, Morten Brørup:
> > Inlining should be avoided in public headers because of ABI
> > compatibility issues
> > and structures being exported because of the inline requirement.
> 
> This sounds like a techboard decision, which I don't think it was.
> Suggested wording:
> 
> A disadvantage of inlining in public headers is ABI compatibility issues and structures being exported because of inline requirement.
> 
> 
> Perhaps I'm being paranoid, and the phrase "should be" already suffices.
> 
> Whichever wording you prefer,
> ACK

This is the final report sent to dev@dpdk.org :)
Yes, I think the word "should" reflects what was said
during the meeting without any formal vote.



^ permalink raw reply	[relevance 0%]

* Re: getting rid of type argument to rte_malloc().
  2024-04-24 10:29  0% ` Ferruh Yigit
  2024-04-24 16:23  0%   ` Stephen Hemminger
@ 2024-04-24 19:06  0%   ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 19:06 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Wed, 24 Apr 2024 11:29:51 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > For the 24.11 release, I want to remove the unused type string argument
> > that shows up in rte_malloc() and related functions, then percolates down
> > through.  It was an idea in the 1.0 release of DPDK, never implemented and
> > never removed.  Yes it will cause API breakage, a large sweeping change;
> > probably easily scripted with coccinelle.
> > 
> > Maybe doing ABI version now?
> >  
> 
> Won't this impact many applications, is there big enough motivation to
> force many DPDK applications to update their code, living with it looks
> simpler.
> 


Something like this script, and fix up the result.

From 13ec14dff523f6e896ab55a17a3c66b45bd90bbc Mon Sep 17 00:00:00 2001
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Wed, 24 Apr 2024 09:39:27 -0700
Subject: [PATCH] devtools/cocci: add script to find unnecessary malloc type

The malloc type argument is unused and should be NULL.
This script finds and fixes those places.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/cocci/malloc-type.cocci | 33 ++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)
 create mode 100644 devtools/cocci/malloc-type.cocci

diff --git a/devtools/cocci/malloc-type.cocci b/devtools/cocci/malloc-type.cocci
new file mode 100644
index 0000000000..cd74797ecb
--- /dev/null
+++ b/devtools/cocci/malloc-type.cocci
@@ -0,0 +1,33 @@
+//
+// The type string field in the malloc routines was never
+// implemented and should be NULL
+//
+@@
+expression T != NULL;
+expression num, socket, size, align;
+@@
+(
+- rte_malloc(T, size, align)
++ rte_malloc(NULL, size, align)
+|
+- rte_zmalloc(T, size, align)
++ rte_zmalloc(NULL,  size, align)
+|
+- rte_calloc(T, num, size, align)
++ rte_calloc(NULL, num, size, align)
+|
+- rte_realloc(T, size, align)
++ rte_realloc(NULL, size, align)
+|
+- rte_realloc_socket(T, size, align, socket)
++ rte_realloc_socket(NULL, size, align, socket)
+|
+- rte_malloc_socket(T, size, align, socket)
++ rte_malloc_socket(NULL, size, align, socket)
+|
+- rte_zmalloc_socket(T, size, align, socket)
++ rte_zmalloc_socket(NULL, size, align, socket)
+|
+- rte_calloc_socket(T, num, size, align, socket)
++ rte_calloc_socket(NULL, num, size, align, socket)
+)
-- 
2.43.0


^ permalink raw reply	[relevance 0%]

* Re: getting rid of type argument to rte_malloc().
  2024-04-24 17:09  0%     ` Morten Brørup
@ 2024-04-24 19:05  0%       ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 19:05 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Ferruh Yigit, dev

On Wed, 24 Apr 2024 19:09:24 +0200
Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Wednesday, 24 April 2024 18.24
> > 
> > On Wed, 24 Apr 2024 11:29:51 +0100
> > Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> >   
> > > On 4/24/2024 5:08 AM, Stephen Hemminger wrote:  
> > > > For the 24.11 release, I want to remove the unused type string  
> > argument  
> > > > that shows up in rte_malloc() and related functions, then percolates  
> > down  
> > > > through.  It was an idea in the 1.0 release of DPDK, never
> > implemented and  
> > > > never removed.  Yes it will cause API breakage, a large sweeping  
> > change;  
> > > > probably easily scripted with coccinelle.
> > > >
> > > > Maybe doing ABI version now?
> > > >  
> > >
> > > Won't this impact many applications, is there big enough motivation to
> > > force many DPDK applications to update their code, living with it  
> > looks  
> > > simpler.
> > >  
> > 
> > Yeah, probably too big an impact but at least:
> >   - change the documentation to say "do not use" should be NULL
> >   - add script to remove all usage inside of DPDK
> >   - get rid of places where useless arg is passed around inside
> >     of the allocator internals.  
> 
> For the sake of discussion:
> Do we want to get rid of the "name" parameter to the memzone allocation functions too? It's somewhat weird that they differ.

The name is used by memzone lookup for secondary process etc.

> 
> Or are rte_memzone allocations considered init and control path, while rte_malloc allocations are considered fast path?
> 

Not really.

^ permalink raw reply	[relevance 0%]

* RE: Minutes of DPDK Technical Board Meeting, 2024-04-03
  2024-04-24 15:24  3% Minutes of DPDK Technical Board Meeting, 2024-04-03 Thomas Monjalon
@ 2024-04-24 17:25  3% ` Morten Brørup
  2024-04-24 19:10  0%   ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2024-04-24 17:25 UTC (permalink / raw)
  To: Thomas Monjalon, dev; +Cc: techboard

> Inlining should be avoided in public headers because of ABI
> compatibility issues
> and structures being exported because of the inline requirement.

This sounds like a techboard decision, which I don't think it was.
Suggested wording:

A disadvantage of inlining in public headers is ABI compatibility issues and structures being exported because of inline requirement.


Perhaps I'm being paranoid, and the phrase "should be" already suffices.

Whichever wording you prefer,
ACK


^ permalink raw reply	[relevance 3%]

* RE: getting rid of type argument to rte_malloc().
  2024-04-24 16:23  0%   ` Stephen Hemminger
  2024-04-24 16:23  0%     ` Stephen Hemminger
@ 2024-04-24 17:09  0%     ` Morten Brørup
  2024-04-24 19:05  0%       ` Stephen Hemminger
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2024-04-24 17:09 UTC (permalink / raw)
  To: Stephen Hemminger, Ferruh Yigit; +Cc: dev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Wednesday, 24 April 2024 18.24
> 
> On Wed, 24 Apr 2024 11:29:51 +0100
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
> > On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > > For the 24.11 release, I want to remove the unused type string
> argument
> > > that shows up in rte_malloc() and related functions, then percolates
> down
> > > through.  It was an idea in the 1.0 release of DPDK, never
> implemented and
> > > never removed.  Yes it will cause API breakage, a large sweeping
> change;
> > > probably easily scripted with coccinelle.
> > >
> > > Maybe doing ABI version now?
> > >
> >
> > Won't this impact many applications, is there big enough motivation to
> > force many DPDK applications to update their code, living with it
> looks
> > simpler.
> >
> 
> Yeah, probably too big an impact but at least:
>   - change the documentation to say "do not use" should be NULL
>   - add script to remove all usage inside of DPDK
>   - get rid of places where useless arg is passed around inside
>     of the allocator internals.

For the sake of discussion:
Do we want to get rid of the "name" parameter to the memzone allocation functions too? It's somewhat weird that they differ.

Or are rte_memzone allocations considered init and control path, while rte_malloc allocations are considered fast path?


^ permalink raw reply	[relevance 0%]

* Re: getting rid of type argument to rte_malloc().
  2024-04-24 16:23  0%   ` Stephen Hemminger
@ 2024-04-24 16:23  0%     ` Stephen Hemminger
  2024-04-24 17:09  0%     ` Morten Brørup
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 16:23 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Wed, 24 Apr 2024 11:29:51 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > For the 24.11 release, I want to remove the unused type string argument
> > that shows up in rte_malloc() and related functions, then percolates down
> > through.  It was an idea in the 1.0 release of DPDK, never implemented and
> > never removed.  Yes it will cause API breakage, a large sweeping change;
> > probably easily scripted with coccinelle.
> > 
> > Maybe doing ABI version now?
> >  
> 
> Won't this impact many applications, is there big enough motivation to
> force many DPDK applications to update their code, living with it looks
> simpler.
> 

Yeah, probably too big an impact but at least:
  - change the documentation to say "do not use" should be NULL
  - add script to remove all usage inside of DPDK
  - get rid of places where useless arg is passed around inside
    of the allocator internals.

^ permalink raw reply	[relevance 0%]

* Re: getting rid of type argument to rte_malloc().
  2024-04-24 10:29  0% ` Ferruh Yigit
@ 2024-04-24 16:23  0%   ` Stephen Hemminger
  2024-04-24 16:23  0%     ` Stephen Hemminger
  2024-04-24 17:09  0%     ` Morten Brørup
  2024-04-24 19:06  0%   ` Stephen Hemminger
  1 sibling, 2 replies; 200+ results
From: Stephen Hemminger @ 2024-04-24 16:23 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Wed, 24 Apr 2024 11:29:51 +0100
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> > For the 24.11 release, I want to remove the unused type string argument
> > that shows up in rte_malloc() and related functions, then percolates down
> > through.  It was an idea in the 1.0 release of DPDK, never implemented and
> > never removed.  Yes it will cause API breakage, a large sweeping change;
> > probably easily scripted with coccinelle.
> > 
> > Maybe doing ABI version now?
> >  
> 
> Won't this impact many applications, is there big enough motivation to
> force many DPDK applications to update their code, living with it looks
> simpler.
> 

Yeah, probably too big an impact but at least:
  - change the documentation to say "do not use" should be NULL
  - add script to remove all usage inside of DPDK
  - get rid of places where useless arg is passed around inside
    of the allocator internals.

^ permalink raw reply	[relevance 0%]

* Minutes of DPDK Technical Board Meeting, 2024-04-03
@ 2024-04-24 15:24  3% Thomas Monjalon
  2024-04-24 17:25  3% ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-04-24 15:24 UTC (permalink / raw)
  To: dev; +Cc: techboard

Members Attending: 10/11
	- Aaron Conole
	- Bruce Richardson
	- Hemant Agrawal
	- Honnappa Nagarahalli
	- Kevin Traynor
	- Konstantin Ananyev
	- Maxime Coquelin
	- Morten Brørup
	- Stephen Hemminger
	- Thomas Monjalon (Chair)

NOTE: The Technical Board meetings take place every second Wednesday at 3 pm UTC
on https://zoom-lfx.platform.linuxfoundation.org/meeting/96459488340?password=d808f1f6-0a28-4165-929e-5a5bcae7efeb
Meetings are public, and DPDK community members are welcome to attend.
Agenda and minutes can be found at http://core.dpdk.org/techboard/minutes


1/ MSVC

Work to be able to compile DPDK with MSVC is progressing.

Regarding the tooling, UNH CI is testing MSVC in a Windows Server 2022 job.
There was an ask for a GHA job building with MSVC.
Example:
	https://github.com/danielzsh/spark/blob/master/.github/workflows/compile.yml

We should not break MSVC compilation for enabled libraries.
When creating a new library, we should require MSVC support where it makes sense.
Some guidelines could be added in doc/guides/contributing/design.rst


2/ function inlining

There are pros and cons for function inlining.

There should not be inlining in control path functions.
Inlining should be avoided in public headers because of ABI
compatibility issues
and structures being exported because of the inline requirement.

Inlining should be used with care, with benchmarks as a proof of efficiency.
Having too much inlining will have a drawback on the instruction cache,
which is why we should justify any new usage of inline.

Note that the same recommendations apply with the use of prefetch and likely/unlikely.



^ permalink raw reply	[relevance 3%]

* Re: getting rid of type argument to rte_malloc().
  2024-04-24  4:08  3% getting rid of type argument to rte_malloc() Stephen Hemminger
@ 2024-04-24 10:29  0% ` Ferruh Yigit
  2024-04-24 16:23  0%   ` Stephen Hemminger
  2024-04-24 19:06  0%   ` Stephen Hemminger
  0 siblings, 2 replies; 200+ results
From: Ferruh Yigit @ 2024-04-24 10:29 UTC (permalink / raw)
  To: Stephen Hemminger, dev

On 4/24/2024 5:08 AM, Stephen Hemminger wrote:
> For the 24.11 release, I want to remove the unused type string argument
> that shows up in rte_malloc() and related functions, then percolates down
> through.  It was an idea in the 1.0 release of DPDK, never implemented and
> never removed.  Yes it will cause API breakage, a large sweeping change;
> probably easily scripted with coccinelle.
> 
> Maybe doing ABI version now?
>

Won't this impact many applications, is there big enough motivation to
force many DPDK applications to update their code, living with it looks
simpler.


^ permalink raw reply	[relevance 0%]

* getting rid of type argument to rte_malloc().
@ 2024-04-24  4:08  3% Stephen Hemminger
  2024-04-24 10:29  0% ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-04-24  4:08 UTC (permalink / raw)
  To: dev

For the 24.11 release, I want to remove the unused type string argument
that shows up in rte_malloc() and related functions, then percolates down
through.  It was an idea in the 1.0 release of DPDK, never implemented and
never removed.  Yes it will cause API breakage, a large sweeping change;
probably easily scripted with coccinelle.

Maybe doing ABI version now?

^ permalink raw reply	[relevance 3%]

* Re: [PATCH v7 0/5] app/testpmd: support multiple process attach and detach port
    2024-03-08 10:38  0%   ` lihuisong (C)
@ 2024-04-23 11:17  0%   ` lihuisong (C)
  1 sibling, 0 replies; 200+ results
From: lihuisong (C) @ 2024-04-23 11:17 UTC (permalink / raw)
  To: dev, thomas, ferruh.yigit
  Cc: andrew.rybchenko, fengchengwen, liudongdong3, liuyonglong

Hi Ferruh and Thomas,

It's been almost two years since this issue was reported.
We have discussed this a lot before, and also made some progress and reached consensus.
Can you take a look at it again?  Looking forward to your reply.

BR/
Huisong


在 2024/1/30 14:36, Huisong Li 写道:
> This patchset fixes some bugs and supports attaching and detaching ports
> in primary and secondary processes.
>
> ---
>   -v7: fix conflicts
>   -v6: adjust rte_eth_dev_is_used position based on alphabetical order
>        in version.map
>   -v5: move 'ALLOCATED' state to the back of 'REMOVED' to avoid abi break.
>   -v4: fix a misspelling.
>   -v3:
>     #1 merge patch 1/6 and patch 2/6 into patch 1/5, and add modification
>        for other bus type.
>     #2 add a RTE_ETH_DEV_ALLOCATED state in rte_eth_dev_state to resolve
>        the problem in patch 2/5.
>   -v2: resend due to CI unexplained failure.
>
> Huisong Li (5):
>    drivers/bus: restore driver assignment at front of probing
>    ethdev: fix skip valid port in probing callback
>    app/testpmd: check the validity of the port
>    app/testpmd: add attach and detach port for multiple process
>    app/testpmd: stop forwarding in new or destroy event
>
>   app/test-pmd/testpmd.c                   | 47 +++++++++++++++---------
>   app/test-pmd/testpmd.h                   |  1 -
>   drivers/bus/auxiliary/auxiliary_common.c |  9 ++++-
>   drivers/bus/dpaa/dpaa_bus.c              |  9 ++++-
>   drivers/bus/fslmc/fslmc_bus.c            |  8 +++-
>   drivers/bus/ifpga/ifpga_bus.c            | 12 ++++--
>   drivers/bus/pci/pci_common.c             |  9 ++++-
>   drivers/bus/vdev/vdev.c                  | 10 ++++-
>   drivers/bus/vmbus/vmbus_common.c         |  9 ++++-
>   drivers/net/bnxt/bnxt_ethdev.c           |  3 +-
>   drivers/net/bonding/bonding_testpmd.c    |  1 -
>   drivers/net/mlx5/mlx5.c                  |  2 +-
>   lib/ethdev/ethdev_driver.c               | 13 +++++--
>   lib/ethdev/ethdev_driver.h               | 12 ++++++
>   lib/ethdev/ethdev_pci.h                  |  2 +-
>   lib/ethdev/rte_class_eth.c               |  2 +-
>   lib/ethdev/rte_ethdev.c                  |  4 +-
>   lib/ethdev/rte_ethdev.h                  |  4 +-
>   lib/ethdev/version.map                   |  1 +
>   19 files changed, 114 insertions(+), 44 deletions(-)
>

^ permalink raw reply	[relevance 0%]

* Community CI Meeting Minutes - April 18, 2024
@ 2024-04-18 17:49  3% Patrick Robb
  0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-04-18 17:49 UTC (permalink / raw)
  To: ci; +Cc: dev, dts

April 18, 2024

#####################################################################
Attendees
1. Patrick Robb
2. Paul Szczepanek
3. Juraj Linkeš
4. Aaron Conole
5. Ali Alnubani

#####################################################################
Minutes

=====================================================================
General Announcements
* GB is wrapping up voting on the UNH Lab server refresh proposal -
should have more info on this by end of week
   * Patrick Robb to share the list of current servers and servers to be
acquired with Paul
* UNH lab is working on updates to get_reruns.py for retests v2, and
will upstream this when ready.
   * UNH will also start pre-populating all environments with PENDING,
and then overwriting those as new results come in.
   * Reminder - Final conclusion on policy is:
      * A) If retest is requested without rebase key, then retest
"original" dpdk artifact (either by re-using the existing tarball (unh
lab) or tracking the commit from submit time and re-applying onto dpdk
at that commit (loongson)).
      * B) If rebase key is included, apply to tip of the indicated
branch. If, because the branch has changed, the patch no longer
applies, then we can report an apply failure. Then, submitter has to
refactor their patch and resubmit.
      * In either case, report the new results with an updated test
result in the email (i.e. report "_Testing PASS RETEST #1" instead of
"_Testing PASS" in the email body).

=====================================================================
CI Status

---------------------------------------------------------------------
UNH-IOL Community Lab
* ABI binaries got sent to Dodji Seketeli after some ABI failures came
in, and he confirmed moving to libabigail 2.4 resolves the issue. Cody
Cheng is working on this now.
   * To be submitted to upstream template engine:
https://git.dpdk.org/tools/dpdk-ci/tree/containers/template_engine
* SPDK: Working on these compile jobs
   * Currently compile with:
      * Ubuntu 22.04
      * Debian 11
      * Debian 12
      * CentOS 8
      * CentOS 9
      * Fedora 37
      * Fedora 38
      * Fedora 39
      * Opensuse-Leap 15 but with a warning
   * Cannot compile with:
      * Rhel 8
      * Rhel 9
      * SPDK docs state rhel is “best effort”
   * Questions:
      * Should we run with werror enabled?
      * What versions of SPDK do we test?
      * What versions of DPDK do we test SPDK against?
   * Unit tests pass with the distros which are compiling
   * UPDATE: We are polling SPDK people on their Slack, but the current
plan is to bring testing online only for distros which currently work
with a werror compile. So, no RHEL, no OpenSUSE.
* OvS DPDK testing:
   * OvS compile still passing on some distros but failing on others -
Adam is going to circle back on this when he gets time
      * Submit tickets for any outstanding issues
      * Bring ovs compile testing online
* Plans for performance testing are still pending discussion between Aaron & David
* Code coverage for fast tests is now running in CI, 1x per month. You
can download the latest reports here:
https://dpdkdashboard.iol.unh.edu/results/dashboard/code-coverage
   * Open out/coveragereport/index.html
   * Do we need code coverage reports for the other unit tests suites?
(not just fast test)
      * UNH to dry run this, share results
* NVIDIA: Gal has offered to send two CX7 NICs to the UNH lab. This
should allow us to install two CX7 NICs on the DUT, and start
forwarding between the two NICs.
* Pcapng_autotest
   * UNH has some spurious failures reported to patchwork for Debian
12. Need to reconnect with Stephen to debug this further.
* Updating Coverity binaries at UNH

---------------------------------------------------------------------
Intel Lab
* Patrick pinged John M again about a lab contact

---------------------------------------------------------------------
Github Actions
* No new updates

---------------------------------------------------------------------
Loongarch Lab
* None

=====================================================================
DTS Improvements & Test Development
* API docs generation:
   * Reviews are needed for this. Need an ACK from
bruce.richardson@intel.com now that there are some new changes on the
meson side
   * Thomas wants to link the DTS API docs from Doxygen in the DPDK docs
   * UNH folks should provide a review
* Jeremy is switching back to DTS next week, and will be working more
on the 2nd scatter case for MLNX, which will rely on the capabilities
querying (and testcase skipping) patch. Will provide feedback to Juraj
on that patch soon.
* Hugepages patch is updated based on feedback from Morten, but
essentially the same (in approach) as last week.

=====================================================================
Any other business
* DPDK Summit in Montreal will now be late September. This plan is
still being finalized.
* Next Meeting: May 1, 2024

^ permalink raw reply	[relevance 3%]

* [PATCH v2 0/3] cryptodev: add API to get used queue pair depth
  2024-04-11  8:22  3% [PATCH 0/3] cryptodev: add API to get used queue pair depth Akhil Goyal
@ 2024-04-12 11:57  3% ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2024-04-12 11:57 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, hemant.agrawal, anoobj,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	g.singh, fanzhang.oss, jianjay.zhou, asomalap, ruifeng.wang,
	konstantin.v.ananyev, radu.nicolau, ajit.khaparde, rnagadheeraj,
	ciara.power, Akhil Goyal

Added a new fast path API to get the used depth of a crypto device
queue pair at any given point.

An implementation in cnxk crypto driver is also added along with
a test case in test app.

The addition of the new API causes an ABI warning.
This is suppressed, as the updated struct rte_crypto_fp_ops is
an internal structure and is not to be used by applications directly.

v2: fixed shared and clang build issues.

Akhil Goyal (3):
  cryptodev: add API to get used queue pair depth
  crypto/cnxk: support queue pair depth API
  test/crypto: add QP depth used count case

 app/test/test_cryptodev.c                | 117 +++++++++++++++++++++++
 devtools/libabigail.abignore             |   3 +
 drivers/crypto/cnxk/cn10k_cryptodev.c    |   1 +
 drivers/crypto/cnxk/cn9k_cryptodev.c     |   2 +
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c |  16 ++++
 drivers/crypto/cnxk/cnxk_cryptodev_ops.h |   2 +
 lib/cryptodev/cryptodev_pmd.c            |   1 +
 lib/cryptodev/cryptodev_pmd.h            |   2 +
 lib/cryptodev/cryptodev_trace_points.c   |   3 +
 lib/cryptodev/rte_cryptodev.h            |  45 +++++++++
 lib/cryptodev/rte_cryptodev_core.h       |   7 +-
 lib/cryptodev/rte_cryptodev_trace_fp.h   |   7 ++
 lib/cryptodev/version.map                |   3 +
 13 files changed, 208 insertions(+), 1 deletion(-)

-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* [PATCH v4 3/3] dts: add API doc generation
  @ 2024-04-12 10:14  2%   ` Juraj Linkeš
  0 siblings, 0 replies; 200+ results
From: Juraj Linkeš @ 2024-04-12 10:14 UTC (permalink / raw)
  To: thomas, Honnappa.Nagarahalli, bruce.richardson, jspewock, probb,
	paul.szczepanek, Luca.Vizzarro, npratte
  Cc: dev, Juraj Linkeš

The tool used to generate DTS API docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style with one
DTS-specific configuration (so that the DPDK docs are unchanged) that
modifies how the sidebar displays the content.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to object in external documentations, such as the Python documentation.

There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
  code.
* Also the same Python packages as DTS, for the same reason.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
Reviewed-by: Jeremy Spewock <jspewock@iol.unh.edu>
Tested-by: Nicholas Pratte <npratte@iol.unh.edu>
---
 buildtools/call-sphinx-build.py | 33 +++++++++++++++++++---------
 doc/api/doxy-api-index.md       |  3 +++
 doc/api/doxy-api.conf.in        |  2 ++
 doc/api/meson.build             | 11 +++++++---
 doc/guides/conf.py              | 39 ++++++++++++++++++++++++++++-----
 doc/guides/meson.build          |  1 +
 doc/guides/tools/dts.rst        | 34 +++++++++++++++++++++++++++-
 dts/doc/meson.build             | 27 +++++++++++++++++++++++
 dts/meson.build                 | 16 ++++++++++++++
 meson.build                     |  1 +
 10 files changed, 148 insertions(+), 19 deletions(-)
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 39a60d09fa..aea771a64e 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,37 +3,50 @@
 # Copyright(c) 2019 Intel Corporation
 #
 
+import argparse
 import sys
 import os
 from os.path import join
 from subprocess import run, PIPE, STDOUT
 from packaging.version import Version
 
-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
+args, extra_args = parser.parse_known_args()
 
 # set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+    os.environ['DTS_ROOT'] = args.dts_root
 
 # for sphinx version >= 1.7 add parallelism using "-j auto"
-ver = run([sphinx, '--version'], stdout=PIPE,
+ver = run([args.sphinx, '--version'], stdout=PIPE,
           stderr=STDOUT).stdout.decode().split()[-1]
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
 if Version(ver) >= Version('1.7'):
     sphinx_cmd += ['-j', 'auto']
 
 # find all the files sphinx will process so we can write them as dependencies
 srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
     srcfiles.extend([join(root, f) for f in files])
 
+if not os.path.exists(args.dst):
+    os.makedirs(args.dst)
+
 # run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
-    process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
-                  stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+    process = run(
+        sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+        stdout=out
+    )
 
 # create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
     d.write('html: ' + ' '.join(srcfiles) + '\n')
 
 sys.exit(process.returncode)
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index 8c1eb8fafa..d5f823b7f0 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -243,3 +243,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index 27afec8b3b..2e08c6a452 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -123,6 +123,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES
 
+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>
 
+doc_api_build_dir = meson.current_build_dir()
 doxygen = find_program('doxygen', required: get_option('enable_docs'))
 
 if not doxygen.found()
@@ -32,14 +33,18 @@ example = custom_target('examples.dox',
 # set up common Doxygen configuration
 cdata = configuration_data()
 cdata.set('VERSION', meson.project_version())
-cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
-cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
+cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
+cdata.set('OUTPUT', doc_api_build_dir)
 cdata.set('TOPDIR', dpdk_source_root)
-cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
+cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
 cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))
 
 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
 from sphinx import __version__ as sphinx_version
 from os import listdir
 from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path
 
 import configparser
 
@@ -24,6 +23,37 @@
           file=stderr)
     pass
 
+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+    path.append(dts_root)
+    # DTS Sidebar config
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,
+    }
+
 stop_on_error = ('-W' in argv)
 
 project = 'Data Plane Development Kit'
@@ -35,8 +65,7 @@
 html_show_copyright = False
 highlight_language = 'none'
 
-release = environ.setdefault('DPDK_VERSION', "None")
-version = release
+version = environ.setdefault('DPDK_VERSION', "None")
 
 master_doc = 'index'
 
diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation
 
+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))
 
 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 47b218b2c6..d1c3c2af7a 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -280,7 +280,12 @@ and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.
 
-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure, the corresponding changes must
+be made to DTS API doc sources in ``dts/doc``.
+
+The code itself must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -415,6 +420,33 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.
 
 
+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+   poetry install --no-root --with docs
+   poetry shell
+
+The documentation is built using the standard DPDK build system. After executing the meson command
+and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings. Also make sure to run
+   the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
 Configuration Schema
 --------------------
 
diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: false,
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 8b248d4505..835973a0ce 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')
 
 # build docs
 subdir('doc')
+subdir('dts')
 
 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
-- 
2.34.1


^ permalink raw reply	[relevance 2%]

* [PATCH 0/3] cryptodev: add API to get used queue pair depth
@ 2024-04-11  8:22  3% Akhil Goyal
  2024-04-12 11:57  3% ` [PATCH v2 " Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Akhil Goyal @ 2024-04-11  8:22 UTC (permalink / raw)
  To: dev
  Cc: thomas, david.marchand, hemant.agrawal, anoobj,
	pablo.de.lara.guarch, fiona.trahe, declan.doherty, matan,
	g.singh, fanzhang.oss, jianjay.zhou, asomalap, ruifeng.wang,
	konstantin.v.ananyev, radu.nicolau, ajit.khaparde, rnagadheeraj,
	ciara.power, Akhil Goyal

Added a new fast path API to get the used depth of a crypto device
queue pair at any given point.

An implementation in cnxk crypto driver is also added along with
a test case in test app.

The addition of new API causes an ABI warning.
This is suppressed as the updated struct rte_crypto_fp_ops is
an internal structure and not to be used by application directly.

Akhil Goyal (3):
  cryptodev: add API to get used queue pair depth
  crypto/cnxk: support queue pair depth API
  test/crypto: add QP depth used count case

 app/test/test_cryptodev.c                | 117 +++++++++++++++++++++++
 devtools/libabigail.abignore             |   3 +
 drivers/crypto/cnxk/cn10k_cryptodev.c    |   1 +
 drivers/crypto/cnxk/cn9k_cryptodev.c     |   2 +
 drivers/crypto/cnxk/cnxk_cryptodev_ops.c |  15 +++
 drivers/crypto/cnxk/cnxk_cryptodev_ops.h |   2 +
 lib/cryptodev/cryptodev_pmd.c            |   1 +
 lib/cryptodev/cryptodev_pmd.h            |   2 +
 lib/cryptodev/cryptodev_trace_points.c   |   3 +
 lib/cryptodev/rte_cryptodev.h            |  45 +++++++++
 lib/cryptodev/rte_cryptodev_core.h       |   7 +-
 lib/cryptodev/rte_cryptodev_trace_fp.h   |   7 ++
 12 files changed, 204 insertions(+), 1 deletion(-)

-- 
2.25.1


^ permalink raw reply	[relevance 3%]

* Re: Strict aliasing problem with rte_eth_linkstatus_set()
  2024-04-10 19:58  3%     ` Tyler Retzlaff
@ 2024-04-11  3:20  0%       ` fengchengwen
  0 siblings, 0 replies; 200+ results
From: fengchengwen @ 2024-04-11  3:20 UTC (permalink / raw)
  To: Tyler Retzlaff, Morten Brørup
  Cc: Stephen Hemminger, Ferruh Yigit, dev, Dengdui Huang

Hi All,

On 2024/4/11 3:58, Tyler Retzlaff wrote:
> On Wed, Apr 10, 2024 at 07:54:27PM +0200, Morten Brørup wrote:
>>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>>> Sent: Wednesday, 10 April 2024 17.27
>>>
>>> On Wed, 10 Apr 2024 17:33:53 +0800
>>> fengchengwen <fengchengwen@huawei.com> wrote:
>>>
>>>> Last: We think there are two ways to solve this problem.
>>>> 1. Add the compilation option '-fno-strict-aliasing' for hold DPDK
>>> project.
>>>> 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please
>>> see above).
>>>> PS: We prefer first way.
>>>>
>>>
>>> Please send a patch to replace alias with union.
>>
>> +1
>>
>> Fixing this specific bug would be good.

OK for this,
and I already send a bugfix which use union.

Thanks

>>
>> Instinctively, I think we should build with -fno-strict-aliasing, so the compiler doesn't make the same mistake with similar code elsewhere in DPDK. I fear there is more than this instance.
>> I also wonder if -Wstrict-aliasing could help us instead, if we don't want -fno-strict-aliasing.
> 
> agree, union is the correct way to get defined behavior. there are
> valuable optimizations that the compiler can make with strict aliasing
> enabled so -Wstrict-aliasing is a good suggestion as opposed to
> disabling it.
> 
> also the union won't break the abi if introduced correctly.
> .
> 

^ permalink raw reply	[relevance 0%]

* Re: Strict aliasing problem with rte_eth_linkstatus_set()
  @ 2024-04-10 19:58  3%     ` Tyler Retzlaff
  2024-04-11  3:20  0%       ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-10 19:58 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Stephen Hemminger, fengchengwen, Ferruh Yigit, dev, Dengdui Huang

On Wed, Apr 10, 2024 at 07:54:27PM +0200, Morten Brørup wrote:
> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Wednesday, 10 April 2024 17.27
> > 
> > On Wed, 10 Apr 2024 17:33:53 +0800
> > fengchengwen <fengchengwen@huawei.com> wrote:
> > 
> > > Last: We think there are two ways to solve this problem.
> > > 1. Add the compilation option '-fno-strict-aliasing' for hold DPDK
> > project.
> > > 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please
> > see above).
> > > PS: We prefer first way.
> > >
> > 
> > Please send a patch to replace alias with union.
> 
> +1
> 
> Fixing this specific bug would be good.
> 
> Instinctively, I think we should build with -fno-strict-aliasing, so the compiler doesn't make the same mistake with similar code elsewhere in DPDK. I fear there is more than this instance.
> I also wonder if -Wstrict-aliasing could help us instead, if we don't want -fno-strict-aliasing.

agree, union is the correct way to get defined behavior. there are
valuable optimizations that the compiler can make with strict aliasing
enabled so -Wstrict-aliasing is a good suggestion as opposed to
disabling it.

also the union won't break the abi if introduced correctly.

^ permalink raw reply	[relevance 3%]

* Re: Strict aliasing problem with rte_eth_linkstatus_set()
  @ 2024-04-10 15:58  3%   ` Ferruh Yigit
    1 sibling, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-04-10 15:58 UTC (permalink / raw)
  To: Stephen Hemminger, fengchengwen; +Cc: dev, Dengdui Huang

On 4/10/2024 4:27 PM, Stephen Hemminger wrote:
> On Wed, 10 Apr 2024 17:33:53 +0800
> fengchengwen <fengchengwen@huawei.com> wrote:
> 
>> Last: We think there are two ways to solve this problem.
>> 1. Add the compilation option '-fno-strict-aliasing' for hold DPDK project.
>> 2. Use union to avoid such aliasing in rte_eth_linkstatus_set (please see above).
>> PS: We prefer first way.
>>
> 
> Please send a patch to replace alias with union.
> 

+1

I am not sure about the ABI implications; as the size is not changing I expect
it won't be an issue, but it may be good to verify with libabigail.

> PS: you can also override aliasing for a few lines of code with either pragma's
> or lots of casting. Both are messy and hard to maintain.


^ permalink raw reply	[relevance 3%]

* [PATCH v8 5/8] net/tap: rewrite the RSS BPF program
  @ 2024-04-09  3:40  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-09  3:40 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in skb rather
	  than requiring a second pass through classifier step.

Note: This version only works with a later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  38 ++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 383 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built using the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+  Some older versions of Ubuntu do not have working bpftool package.
+  Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process the meson build file checks whether
+libbpf, bpftool, and clang are available. If all of them are found,
+BPF RSS is enabled.
+
+1. Use clang to compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Use bpftool to generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array which contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed, the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update offset and return next proto.
+ * Returns next proto on success, -1 on malformed header.
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers; give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is well distributed (i.e. the hash covers the whole u32 range).
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, then just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0


^ permalink raw reply	[relevance 2%]
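As a reading aid for the patch above, the Toeplitz hash and the queue-folding step can be sketched as plain host-side C. The function names mirror the BPF code; the key and the expected hash values used below are the widely published Microsoft RSS verification vectors, not values taken from this patch:

```c
#include <stdint.h>

/* The standard 40-byte RSS key from the Microsoft verification suite,
 * stored as ten big-endian 32-bit words. */
static const uint32_t ms_rss_key[10] = {
	0x6d5a56da, 0x255b0ec2, 0x4167253d, 0x43a38fb0,
	0xd0ca2bcb, 0xae7b30b4, 0x77cb2da3, 0x8030f20c,
	0x6a42b73b, 0xbeac01fa,
};

/* Toeplitz hash over big-endian 32-bit input words, as in softrss_be().
 * Unlike the BPF version, the i == 0 case is split out because a 32-bit
 * shift by 32 is undefined behavior in standard C. */
static uint32_t
softrss_be(const uint32_t *input_tuple, uint32_t input_len, const uint32_t *key)
{
	uint32_t i, j, hash = 0;

	for (j = 0; j < input_len; j++)
		for (i = 0; i < 32; i++)
			if (input_tuple[j] & (1U << (31 - i)))
				hash ^= i ? (key[j] << i | key[j + 1] >> (32 - i))
					  : key[j];
	return hash;
}

/* Fold a well-distributed 32-bit hash into [0, n) without a modulo,
 * as done for skb->queue_mapping in the BPF program. */
static uint32_t
reciprocal_scale(uint32_t val, uint32_t n)
{
	return (uint32_t)(((uint64_t)val * n) >> 32);
}
```

For the verification vector src 66.9.149.187:2794, dst 161.142.100.80:1766, the input tuple is { 0x420995bb, 0xa18e6450, 0x0aea06e6 } (addresses as big-endian words, then (sport << 16) | dport), matching the in-memory layout the BPF struct produces on a little-endian host.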

* [PATCH v7 5/8] net/tap: rewrite the RSS BPF program
  @ 2024-04-08 21:18  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-08 21:18 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in skb rather
	  than requiring a second pass through classifier step.

Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  38 ++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 383 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+  Some older versions of Ubuntu do not have a working bpftool package.
+  Need a version of Clang that can compile to BPF.
+- only the standard Toeplitz hash with the standard 40-byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process, the meson build file checks whether
+libbpf, bpftool, and clang are available. If all three are
+present, then BPF RSS is enabled.
+
+1. Using clang, compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Using bpftool, generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array that contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed, the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Workaround by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The map is keyed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be() in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update the offset and return the next proto.
+ * Returns the next proto on success, -1 on a malformed header.
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value to be in the range [0, n).
+ * Assumes val is large (i.e. the hash covers the whole u32 range).
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
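
As a review aid, the bit-by-bit Toeplitz loop in softrss_be() above can be cross-checked with a short host-side Python sketch of the same computation (the function name and test values here are illustrative only, not part of the patch):

```python
def softrss_be(input_tuple, key):
    """Bit-by-bit Toeplitz hash, mirroring the softrss_be() loop in tap_rss.c.

    input_tuple: list of 32-bit words (addresses/ports, host bit order).
    key: list of 32-bit key words; must be one word longer than input_tuple.
    """
    hash_ = 0
    for j, word in enumerate(input_tuple):
        for i in range(32):
            if word & (1 << (31 - i)):
                # Sliding 32-bit window over the key, starting at bit i of word j.
                # (In Python, key[j + 1] >> 32 is simply 0 when i == 0.)
                window = ((key[j] << i) | (key[j + 1] >> (32 - i))) & 0xFFFFFFFF
                hash_ ^= window
    return hash_
```

Because the hash is an XOR over key windows selected by set input bits, it is linear over GF(2), which gives an easy self-check: hash(a XOR b) equals hash(a) XOR hash(b).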
-- 
2.43.0



* [PATCH v6 6/8] net/tap: rewrite the RSS BPF program
  @ 2024-04-05 21:14  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-05 21:14 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses the newer BTF BPF map format
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in the skb rather
	  than requiring a second pass through the classifier step.

Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
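
For context on the "resulting queue is placed in skb" change: the new program folds the 32-bit hash into the configured queue range with a multiply-shift (reciprocal_scale() in tap_rss.c) instead of a modulo. A minimal Python sketch of that folding, for illustration only:

```python
def reciprocal_scale(val, n):
    """Map a 32-bit value uniformly into [0, n) without a modulo,
    mirroring reciprocal_scale() in tap_rss.c."""
    return (val * n) >> 32

# Every 32-bit hash value lands on a valid queue index in [0, 4).
queues = [reciprocal_scale(h, 4) for h in (0, 0x40000000, 0x80000000, 0xFFFFFFFF)]
```

This avoids the BPF verifier and performance costs of a division while keeping the distribution uniform for large hash values.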

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  38 ++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 383 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across-queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+  Some older versions of Ubuntu do not have a working bpftool package.
+  Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process, the meson build file checks whether
+libbpf, bpftool, and clang are available. If everything is
+there, then BPF RSS is enabled.
+
+1. Using clang, compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Using bpftool, generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array which contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed, the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
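
Step 2 of the build process above embeds the compiled BPF object in a C header as a byte array. bpftool's generated skeleton does much more (map and program accessors, load/attach wrappers), but the core embedding idea can be sketched with a toy Python helper (names here are illustrative, not what bpftool emits):

```python
def to_c_byte_array(name, data):
    """Render raw bytes as a C array initializer, the general idea
    behind embedding a BPF object file in a skeleton header."""
    body = ", ".join(f"0x{b:02x}" for b in data)
    return f"static const unsigned char {name}[] = {{ {body} }};"
```

The PMD then compiles that header into the driver object, so no external .o file needs to be shipped or located at runtime.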
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>,
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute RSS hash for an IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If packet is fragmented then no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update the offset and return the next proto.
+ * Returns the next proto on success, -1 on a malformed header.
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute RSS hash for an IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If packet is a fragment then no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value to be in the range [0, n).
+ * Assumes val is large (i.e. the hash covers the whole u32 range).
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0


^ permalink raw reply	[relevance 2%]
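The BPF program in the patch above computes a Toeplitz-style hash over the flow tuple (`softrss_be`) and then folds the result onto the configured queues with `reciprocal_scale`. A rough Python model of that logic follows; the key and tuple values are made up for illustration and this is a sketch of the algorithm, not the DPDK implementation:

```python
M32 = 0xFFFFFFFF

def softrss_be(input_tuple, key):
    """Toeplitz-style hash over 32-bit words, mirroring the patch's softrss_be().

    `key` must hold one more 32-bit word than `input_tuple`, because the
    inner loop slides a 32-bit window across key[j] and key[j + 1].
    """
    h = 0
    for j, word in enumerate(input_tuple):
        for i in range(32):
            if word & (1 << (31 - i)):
                # 32-bit window over the key, shifted one bit per step.
                h ^= ((key[j] << i) | (key[j + 1] >> (32 - i))) & M32
    return h

def reciprocal_scale(val, n):
    """Map a full-range 32-bit hash onto [0, n) without a modulo."""
    return (val * n) >> 32

# Pick an RSS queue the way rss_flow_action() does (hypothetical values).
key = [0xD181C62C, 0xF7F4DB5B, 0x1983A2FC, 0x943E1ADB, 0xD9389E6B]
tuple4 = [0xC0A80001, 0xC0A80002, 0x1F900050]  # src addr, dst addr, ports
queue = reciprocal_scale(softrss_be(tuple4, key), 4)
print(queue)
```

The same window-sliding loop appears in `rte_softrss_be` in lib/hash; the BPF version only differs in needing `#pragma unroll` to satisfy the verifier.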

* RE: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
  2024-04-05 15:15  3%   ` Stephen Hemminger
@ 2024-04-05 18:17  3%     ` Chautru, Nicolas
  0 siblings, 0 replies; 200+ results
From: Chautru, Nicolas @ 2024-04-05 18:17 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, maxime.coquelin, hemant.agrawal, Marchand, David, Vargas, Hernan

Hi Stephen, 

It is not strictly ABI compatible since the size of the structure increases, hence only updating for 24.11. 


> -----Original Message-----
> From: Stephen Hemminger <stephen@networkplumber.org>
> Sent: Friday, April 5, 2024 8:15 AM
> To: Chautru, Nicolas <nicolas.chautru@intel.com>
> Cc: dev@dpdk.org; maxime.coquelin@redhat.com; hemant.agrawal@nxp.com;
> Marchand, David <david.marchand@redhat.com>; Vargas, Hernan
> <hernan.vargas@intel.com>
> Subject: Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
> 
> On Thu,  4 Apr 2024 14:04:45 -0700
> Nicolas Chautru <nicolas.chautru@intel.com> wrote:
> 
> > Capturing additional queue stats counter for the depth of enqueue
> > batch still available on the given queue. This can help application to
> > monitor that depth at run time.
> >
> > Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> > ---
> >  lib/bbdev/rte_bbdev.h | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h index
> > 0cbfdd1c95..25514c58ac 100644
> > --- a/lib/bbdev/rte_bbdev.h
> > +++ b/lib/bbdev/rte_bbdev.h
> > @@ -283,6 +283,8 @@ struct rte_bbdev_stats {
> >  	 *     bbdev operation
> >  	 */
> >  	uint64_t acc_offload_cycles;
> > +	/** Available number of enqueue batch on that queue. */
> > +	uint16_t enqueue_depth_avail;
> >  };
> >
> >  /**
> 
> Doesn't this break the ABI?

^ permalink raw reply	[relevance 3%]

* Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
    2024-04-05  0:46  3%   ` Stephen Hemminger
@ 2024-04-05 15:15  3%   ` Stephen Hemminger
  2024-04-05 18:17  3%     ` Chautru, Nicolas
  1 sibling, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-04-05 15:15 UTC (permalink / raw)
  To: Nicolas Chautru
  Cc: dev, maxime.coquelin, hemant.agrawal, david.marchand, hernan.vargas

On Thu,  4 Apr 2024 14:04:45 -0700
Nicolas Chautru <nicolas.chautru@intel.com> wrote:

> Capturing additional queue stats counter for the
> depth of enqueue batch still available on the given
> queue. This can help application to monitor that depth
> at run time.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>
> ---
>  lib/bbdev/rte_bbdev.h | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/bbdev/rte_bbdev.h b/lib/bbdev/rte_bbdev.h
> index 0cbfdd1c95..25514c58ac 100644
> --- a/lib/bbdev/rte_bbdev.h
> +++ b/lib/bbdev/rte_bbdev.h
> @@ -283,6 +283,8 @@ struct rte_bbdev_stats {
>  	 *     bbdev operation
>  	 */
>  	uint64_t acc_offload_cycles;
> +	/** Available number of enqueue batch on that queue. */
> +	uint16_t enqueue_depth_avail;
>  };
>  
>  /**

Doesn't this break the ABI?

^ permalink raw reply	[relevance 3%]
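The question in this thread is whether appending `enqueue_depth_avail` to `rte_bbdev_stats` breaks the ABI. A minimal ctypes sketch shows why it does: appending even a `uint16_t` to an 8-byte-aligned struct grows its padded size, so a binary built against the old layout under-allocates the structure. The field names below are trimmed stand-ins, not the full DPDK layout:

```python
import ctypes

class StatsOld(ctypes.Structure):
    # Two 8-byte counters, as in the pre-patch struct (trimmed).
    _fields_ = [("enqueued_count", ctypes.c_uint64),
                ("acc_offload_cycles", ctypes.c_uint64)]

class StatsNew(ctypes.Structure):
    # Same counters plus the appended 16-bit field from the patch.
    _fields_ = [("enqueued_count", ctypes.c_uint64),
                ("acc_offload_cycles", ctypes.c_uint64),
                ("enqueue_depth_avail", ctypes.c_uint16)]

# 8-byte alignment pads the new struct out past the old size.
print(ctypes.sizeof(StatsOld), ctypes.sizeof(StatsNew))
```

This size change is exactly why the maintainers defer such additions to an ABI-breaking release window (24.11 here).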

* Re: [PATCH] lib: add get/set link settings interface
  2024-04-05  0:55  0%       ` Tyler Retzlaff
  2024-04-05  0:56  0%         ` Tyler Retzlaff
@ 2024-04-05  8:58  0%         ` David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2024-04-05  8:58 UTC (permalink / raw)
  To: Tyler Retzlaff, Dodji Seketeli; +Cc: Thomas Monjalon, dev

On Fri, Apr 5, 2024 at 2:55 AM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
> On Thu, Apr 04, 2024 at 09:09:40AM +0200, David Marchand wrote:
> > On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
> > > this breaks the abi. David does libabigail pick this up i wonder?
> >
> > Yes, the CI flagged it.
> >
> > Looking at the UNH report (in patchwork):
> > http://mails.dpdk.org/archives/test-report/2024-April/631222.html
>
> i'm jealous we don't have libabigail on windows, so helpful.

libabigail is written in C++ and relies on the elfutils and libxml2 libraries.
I am unclear about what binary format is used in Windows... so I am
not sure how much work would be required to have it on Windows.

That's more something to discuss with Dodji :-).


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [PATCH] lib: add get/set link settings interface
  2024-04-05  0:55  0%       ` Tyler Retzlaff
@ 2024-04-05  0:56  0%         ` Tyler Retzlaff
  2024-04-05  8:58  0%         ` David Marchand
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-05  0:56 UTC (permalink / raw)
  To: David Marchand
  Cc: Marek Pazdan, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev

On Thu, Apr 04, 2024 at 05:55:18PM -0700, Tyler Retzlaff wrote:
> On Thu, Apr 04, 2024 at 09:09:40AM +0200, David Marchand wrote:
> > Hello Tyler, Marek,
> > 
> > On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
> > <roretzla@linux.microsoft.com> wrote:
> > >
> > > On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
> > > >  There are link settings parameters available from PMD drivers level
> > > >  which are currently not exposed to the user via consistent interface.
> > > >  When interface is available for system level those information can
> > > >  be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
> > > >  ETHTOOL_GLINKSETTINGS). There are use cases where
> > > >  physical interface is passthrough to dpdk driver and is not available
> > > >  from system level. Information provided by ioctl carries information
> > > >  useful for link auto negotiation settings among others.
> > > >
> > > > Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> > > > ---
> > > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > > index 147257d6a2..66aad925d0 100644
> > > > --- a/lib/ethdev/rte_ethdev.h
> > > > +++ b/lib/ethdev/rte_ethdev.h
> > > > @@ -335,7 +335,7 @@ struct rte_eth_stats {
> > > >  __extension__
> > > >  struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
> > > >       uint32_t link_speed;        /**< RTE_ETH_SPEED_NUM_ */
> > > > -     uint16_t link_duplex  : 1;  /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> > > > +     uint16_t link_duplex  : 2;  /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
> > > >       uint16_t link_autoneg : 1;  /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
> > > >       uint16_t link_status  : 1;  /**< RTE_ETH_LINK_[DOWN/UP] */
> > > >  };
> > >
> > > this breaks the abi. David does libabigail pick this up i wonder?
> > >
> > 
> > Yes, the CI flagged it.
> > 
> > Looking at the UNH report (in patchwork):
> > http://mails.dpdk.org/archives/test-report/2024-April/631222.html
> 
> i'm jealous we don't have libabigail on windows, so helpfull.

s/ll/l/ end of day bah.

> 
> > 
> > 1 function with some indirect sub-type change:
> > 
> > [C] 'function int rte_eth_link_get(uint16_t, rte_eth_link*)' at
> > rte_ethdev.c:2972:1 has some indirect sub-type changes:
> > parameter 2 of type 'rte_eth_link*' has sub-type changes:
> > in pointed to type 'struct rte_eth_link' at rte_ethdev.h:336:1:
> > type size hasn't changed
> > 2 data member changes:
> > 'uint16_t link_autoneg' offset changed from 33 to 34 (in bits) (by +1 bits)
> > 'uint16_t link_status' offset changed from 34 to 35 (in bits) (by +1 bits)
> > 
> > Error: ABI issue reported for abidiff --suppr
> > /home-local/jenkins-local/jenkins-agent/workspace/Generic-DPDK-Compile-ABI
> > at 3/dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1
> > reference/usr/local/include --headers-dir2
> > build_install/usr/local/include
> > reference/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.0
> > build_install/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.2
> > ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> > this as a potential issue).
> > 
> > 
> > GHA would have caught it too, but the documentation generation failed
> > before reaching the ABI check.
> > http://mails.dpdk.org/archives/test-report/2024-April/631086.html
> > 
> > 
> > -- 
> > David Marchand

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] lib: add get/set link settings interface
  2024-04-04  7:09  4%     ` David Marchand
@ 2024-04-05  0:55  0%       ` Tyler Retzlaff
  2024-04-05  0:56  0%         ` Tyler Retzlaff
  2024-04-05  8:58  0%         ` David Marchand
  0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-05  0:55 UTC (permalink / raw)
  To: David Marchand
  Cc: Marek Pazdan, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev

On Thu, Apr 04, 2024 at 09:09:40AM +0200, David Marchand wrote:
> Hello Tyler, Marek,
> 
> On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
> <roretzla@linux.microsoft.com> wrote:
> >
> > On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
> > >  There are link settings parameters available from PMD drivers level
> > >  which are currently not exposed to the user via consistent interface.
> > >  When interface is available for system level those information can
> > >  be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
> > >  ETHTOOL_GLINKSETTINGS). There are use cases where
> > >  physical interface is passthrough to dpdk driver and is not available
> > >  from system level. Information provided by ioctl carries information
> > >  useful for link auto negotiation settings among others.
> > >
> > > Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> > > ---
> > > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > > index 147257d6a2..66aad925d0 100644
> > > --- a/lib/ethdev/rte_ethdev.h
> > > +++ b/lib/ethdev/rte_ethdev.h
> > > @@ -335,7 +335,7 @@ struct rte_eth_stats {
> > >  __extension__
> > >  struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
> > >       uint32_t link_speed;        /**< RTE_ETH_SPEED_NUM_ */
> > > -     uint16_t link_duplex  : 1;  /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> > > +     uint16_t link_duplex  : 2;  /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
> > >       uint16_t link_autoneg : 1;  /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
> > >       uint16_t link_status  : 1;  /**< RTE_ETH_LINK_[DOWN/UP] */
> > >  };
> >
> > this breaks the abi. David does libabigail pick this up i wonder?
> >
> 
> Yes, the CI flagged it.
> 
> Looking at the UNH report (in patchwork):
> http://mails.dpdk.org/archives/test-report/2024-April/631222.html

i'm jealous we don't have libabigail on windows, so helpfull.

> 
> 1 function with some indirect sub-type change:
> 
> [C] 'function int rte_eth_link_get(uint16_t, rte_eth_link*)' at
> rte_ethdev.c:2972:1 has some indirect sub-type changes:
> parameter 2 of type 'rte_eth_link*' has sub-type changes:
> in pointed to type 'struct rte_eth_link' at rte_ethdev.h:336:1:
> type size hasn't changed
> 2 data member changes:
> 'uint16_t link_autoneg' offset changed from 33 to 34 (in bits) (by +1 bits)
> 'uint16_t link_status' offset changed from 34 to 35 (in bits) (by +1 bits)
> 
> Error: ABI issue reported for abidiff --suppr
> /home-local/jenkins-local/jenkins-agent/workspace/Generic-DPDK-Compile-ABI
> at 3/dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1
> reference/usr/local/include --headers-dir2
> build_install/usr/local/include
> reference/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.0
> build_install/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.2
> ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> this as a potential issue).
> 
> 
> GHA would have caught it too, but the documentation generation failed
> before reaching the ABI check.
> http://mails.dpdk.org/archives/test-report/2024-April/631086.html
> 
> 
> -- 
> David Marchand

^ permalink raw reply	[relevance 0%]
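The abidiff report quoted above flags that widening `link_duplex` from 1 to 2 bits shifts `link_autoneg` and `link_status` by one bit, even though the struct size is unchanged. A rough ctypes model of the two layouts illustrates this; it assumes a GCC-style, LSB-first bitfield layout on a little-endian Linux host, and is a sketch rather than the real `rte_eth_link` definition:

```python
import ctypes

class LinkV1(ctypes.Structure):
    # Original layout: link_duplex is 1 bit wide.
    _fields_ = [("link_speed", ctypes.c_uint32),
                ("link_duplex", ctypes.c_uint16, 1),
                ("link_autoneg", ctypes.c_uint16, 1),
                ("link_status", ctypes.c_uint16, 1)]

class LinkV2(ctypes.Structure):
    # Proposed layout: link_duplex widened to 2 bits.
    _fields_ = [("link_speed", ctypes.c_uint32),
                ("link_duplex", ctypes.c_uint16, 2),
                ("link_autoneg", ctypes.c_uint16, 1),
                ("link_status", ctypes.c_uint16, 1)]

v1, v2 = LinkV1(), LinkV2()
v1.link_autoneg = v2.link_autoneg = 1

# Same overall size, but the bit encoding link_autoneg has moved,
# so old and new binaries disagree about what the raw bytes mean.
print(ctypes.sizeof(LinkV1), ctypes.sizeof(LinkV2))
print(bytes(v1).hex(), bytes(v2).hex())
```

This is the "offset changed ... by +1 bits" that abidiff reports: no size change, but an in-place reinterpretation of existing bits, which is still an ABI break.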

* Re: [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth
  @ 2024-04-05  0:46  3%   ` Stephen Hemminger
  2024-04-05 15:15  3%   ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-05  0:46 UTC (permalink / raw)
  To: Nicolas Chautru
  Cc: dev, maxime.coquelin, hemant.agrawal, david.marchand, hernan.vargas

On Thu,  4 Apr 2024 14:04:45 -0700
Nicolas Chautru <nicolas.chautru@intel.com> wrote:

> Capturing additional queue stats counter for the
> depth of enqueue batch still available on the given
> queue. This can help application to monitor that depth
> at run time.
> 
> Signed-off-by: Nicolas Chautru <nicolas.chautru@intel.com>

Adding a field is an ABI change and will have to wait until the 24.11 release

^ permalink raw reply	[relevance 3%]

* [PATCH v11 2/4] mbuf: remove rte marker fields
  2024-04-04 17:51  3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-04 17:51  2%   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-04 17:51 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields from rte_mbuf struct.

Maintain alignment of the fields after the removed cacheline1 marker by placing
C11 alignas(RTE_CACHE_LINE_MIN_SIZE).

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single-element arrays with types matching the original
markers to maintain API compatibility.

This change breaks the API for the cacheline{0,1} fields that have been
removed from rte_mbuf, but it does not break the ABI. To address the
false positives for the removed (but zero-size) fields, provide the minimum
libabigail.abignore for type = rte_mbuf.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/libabigail.abignore           |   6 +
 doc/guides/rel_notes/release_24_07.rst |   3 +
 lib/mbuf/rte_mbuf.h                    |   4 +-
 lib/mbuf/rte_mbuf_core.h               | 202 +++++++++++++++++----------------
 4 files changed, 116 insertions(+), 99 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 645d289..ad13179 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -37,3 +37,9 @@
 [suppress_type]
 	name = rte_eth_fp_ops
 	has_data_member_inserted_between = {offset_of(reserved2), end}
+
+[suppress_type]
+	name = rte_mbuf
+	type_kind = struct
+	has_size_change = no
+	has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24c..b240ee5 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -68,6 +68,9 @@ Removed Items
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* mbuf: ``RTE_MARKER`` fields ``cacheline0`` and ``cacheline1``
+  have been removed from ``struct rte_mbuf``.
+
 
 API Changes
 -----------
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b..4c4722e 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -108,7 +108,7 @@
 static inline void
 rte_mbuf_prefetch_part1(struct rte_mbuf *m)
 {
-	rte_prefetch0(&m->cacheline0);
+	rte_prefetch0(m);
 }
 
 /**
@@ -126,7 +126,7 @@
 rte_mbuf_prefetch_part2(struct rte_mbuf *m)
 {
 #if RTE_CACHE_LINE_SIZE == 64
-	rte_prefetch0(&m->cacheline1);
+	rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
 #else
 	RTE_SET_USED(m);
 #endif
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9f58076..726c2cf 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -465,8 +465,6 @@ enum {
  * The generic rte_mbuf, containing a packet mbuf.
  */
 struct __rte_cache_aligned rte_mbuf {
-	RTE_MARKER cacheline0;
-
 	void *buf_addr;           /**< Virtual address of segment buffer. */
 #if RTE_IOVA_IN_MBUF
 	/**
@@ -488,127 +486,138 @@ struct __rte_cache_aligned rte_mbuf {
 #endif
 
 	/* next 8 bytes are initialised on RX descriptor rearm */
-	RTE_MARKER64 rearm_data;
-	uint16_t data_off;
-
-	/**
-	 * Reference counter. Its size should at least equal to the size
-	 * of port field (16 bits), to support zero-copy broadcast.
-	 * It should only be accessed using the following functions:
-	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
-	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
-	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
-	 */
-	RTE_ATOMIC(uint16_t) refcnt;
+	union {
+		uint64_t rearm_data[1];
+		__extension__
+		struct {
+			uint16_t data_off;
+
+			/**
+			 * Reference counter. Its size should at least equal to the size
+			 * of port field (16 bits), to support zero-copy broadcast.
+			 * It should only be accessed using the following functions:
+			 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+			 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+			 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+			 */
+			RTE_ATOMIC(uint16_t) refcnt;
 
-	/**
-	 * Number of segments. Only valid for the first segment of an mbuf
-	 * chain.
-	 */
-	uint16_t nb_segs;
+			/**
+			 * Number of segments. Only valid for the first segment of an mbuf
+			 * chain.
+			 */
+			uint16_t nb_segs;
 
-	/** Input port (16 bits to support more than 256 virtual ports).
-	 * The event eth Tx adapter uses this field to specify the output port.
-	 */
-	uint16_t port;
+			/** Input port (16 bits to support more than 256 virtual ports).
+			 * The event eth Tx adapter uses this field to specify the output port.
+			 */
+			uint16_t port;
+		};
+	};
 
 	uint64_t ol_flags;        /**< Offload features. */
 
-	/* remaining bytes are set on RX when pulling packet from descriptor */
-	RTE_MARKER rx_descriptor_fields1;
-
-	/*
-	 * The packet type, which is the combination of outer/inner L2, L3, L4
-	 * and tunnel types. The packet_type is about data really present in the
-	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
-	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
-	 * vlan is stripped from the data.
-	 */
+	/* remaining 24 bytes are set on RX when pulling packet from descriptor */
 	union {
-		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		/* void * type of the array elements is retained for driver compatibility. */
+		void *rx_descriptor_fields1[24 / sizeof(void *)];
 		__extension__
 		struct {
-			uint8_t l2_type:4;   /**< (Outer) L2 type. */
-			uint8_t l3_type:4;   /**< (Outer) L3 type. */
-			uint8_t l4_type:4;   /**< (Outer) L4 type. */
-			uint8_t tun_type:4;  /**< Tunnel type. */
+			/*
+			 * The packet type, which is the combination of outer/inner L2, L3, L4
+			 * and tunnel types. The packet_type is about data really present in the
+			 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+			 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+			 * vlan is stripped from the data.
+			 */
 			union {
-				uint8_t inner_esp_next_proto;
-				/**< ESP next protocol type, valid if
-				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
-				 * on both Tx and Rx.
-				 */
+				uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
 				__extension__
 				struct {
-					uint8_t inner_l2_type:4;
-					/**< Inner L2 type. */
-					uint8_t inner_l3_type:4;
-					/**< Inner L3 type. */
+					uint8_t l2_type:4;   /**< (Outer) L2 type. */
+					uint8_t l3_type:4;   /**< (Outer) L3 type. */
+					uint8_t l4_type:4;   /**< (Outer) L4 type. */
+					uint8_t tun_type:4;  /**< Tunnel type. */
+					union {
+						/** ESP next protocol type, valid if
+						 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
+						 * on both Tx and Rx.
+						 */
+						uint8_t inner_esp_next_proto;
+						__extension__
+						struct {
+							/** Inner L2 type. */
+							uint8_t inner_l2_type:4;
+							/** Inner L3 type. */
+							uint8_t inner_l3_type:4;
+						};
+					};
+					uint8_t inner_l4_type:4; /**< Inner L4 type. */
 				};
 			};
-			uint8_t inner_l4_type:4; /**< Inner L4 type. */
-		};
-	};
 
-	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
-	uint16_t data_len;        /**< Amount of data in segment buffer. */
-	/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
-	uint16_t vlan_tci;
+			uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
 
-	union {
-		union {
-			uint32_t rss;     /**< RSS hash result if RSS enabled */
-			struct {
+			uint16_t data_len;        /**< Amount of data in segment buffer. */
+			/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+			uint16_t vlan_tci;
+
+			union {
 				union {
+					uint32_t rss;     /**< RSS hash result if RSS enabled */
 					struct {
-						uint16_t hash;
-						uint16_t id;
-					};
-					uint32_t lo;
-					/**< Second 4 flexible bytes */
-				};
-				uint32_t hi;
-				/**< First 4 flexible bytes or FD ID, dependent
-				 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
-				 */
-			} fdir;	/**< Filter identifier if FDIR enabled */
-			struct rte_mbuf_sched sched;
-			/**< Hierarchical scheduler : 8 bytes */
-			struct {
-				uint32_t reserved1;
-				uint16_t reserved2;
-				uint16_t txq;
-				/**< The event eth Tx adapter uses this field
-				 * to store Tx queue id.
-				 * @see rte_event_eth_tx_adapter_txq_set()
-				 */
-			} txadapter; /**< Eventdev ethdev Tx adapter */
-			uint32_t usr;
-			/**< User defined tags. See rte_distributor_process() */
-		} hash;                   /**< hash information */
-	};
+						union {
+							struct {
+								uint16_t hash;
+								uint16_t id;
+							};
+							/** Second 4 flexible bytes */
+							uint32_t lo;
+						};
+						/** First 4 flexible bytes or FD ID, dependent
+						 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+						 */
+						uint32_t hi;
+					} fdir;	/**< Filter identifier if FDIR enabled */
+					/** Hierarchical scheduler : 8 bytes */
+					struct rte_mbuf_sched sched;
+					struct {
+						uint32_t reserved1;
+						uint16_t reserved2;
+						/** The event eth Tx adapter uses this field
+						 * to store Tx queue id.
+						 * @see rte_event_eth_tx_adapter_txq_set()
+						 */
+						uint16_t txq;
+					} txadapter; /**< Eventdev ethdev Tx adapter */
+					/** User defined tags. See rte_distributor_process() */
+					uint32_t usr;
+				} hash;                   /**< hash information */
+			};
 
-	/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
-	uint16_t vlan_tci_outer;
+			/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+			uint16_t vlan_tci_outer;
 
-	uint16_t buf_len;         /**< Length of segment buffer. */
+			uint16_t buf_len;         /**< Length of segment buffer. */
+		};
+	};
 
 	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
 
 	/* second cache line - fields only used in slow path or on TX */
-	alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
-
 #if RTE_IOVA_IN_MBUF
 	/**
 	 * Next segment of scattered packet. Must be NULL in the last
 	 * segment or in case of non-segmented packet.
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	struct rte_mbuf *next;
 #else
 	/**
 	 * Reserved for dynamic fields
 	 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	uint64_t dynfield2;
 #endif
 
@@ -617,17 +626,16 @@ struct __rte_cache_aligned rte_mbuf {
 		uint64_t tx_offload;       /**< combined for easy fetch */
 		__extension__
 		struct {
-			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
-			/**< L2 (MAC) Header Length for non-tunneling pkt.
+			/** L2 (MAC) Header Length for non-tunneling pkt.
 			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
 			 */
+			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
+			/** L3 (IP) Header Length. */
 			uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
-			/**< L3 (IP) Header Length. */
+			/** L4 (TCP/UDP) Header Length. */
 			uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
-			/**< L4 (TCP/UDP) Header Length. */
+			/** TCP TSO segment size */
 			uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
-			/**< TCP TSO segment size */
-
 			/*
 			 * Fields for Tx offloading of tunnels.
 			 * These are undefined for packets which don't request
@@ -640,10 +648,10 @@ struct __rte_cache_aligned rte_mbuf {
 			 * Applications are expected to set appropriate tunnel
 			 * offload flags when they fill in these fields.
 			 */
+			/** Outer L3 (IP) Hdr Length. */
 			uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
-			/**< Outer L3 (IP) Hdr Length. */
+			/** Outer L2 (MAC) Hdr Length. */
 			uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
-			/**< Outer L2 (MAC) Hdr Length. */
 
 			/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
 		};
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries
                     ` (3 preceding siblings ...)
  2024-04-03 17:53  3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-04 17:51  3% ` Tyler Retzlaff
  2024-04-04 17:51  2%   ` [PATCH v11 2/4] mbuf: remove rte marker fields Tyler Retzlaff
  4 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-04 17:51 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

As per the techboard meeting of 2024/03/20, adopt the hybrid proposal of
adapting descriptor fields and removing cacheline fields.

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields.

For the cacheline{0,1} fields, remove the fields entirely and use inline
functions to prefetch.

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single-element arrays with types matching the original
markers to maintain API compatibility.

Note: diff is easier viewed with -b due to additional nesting from
      unions / structs that have been introduced.

v11:
  * correct doxygen comment style for field documentation.

v10:
  * move removal notices in release notes from 24.03 to 24.07.

v9:
  * provide narrowest possible libabigail.abignore to suppress
    removal of fields that were agreed are not actual abi changes.

v8:
  * rx_descriptor_fields1 array is now constexpr sized to
    24 / sizeof(void *) so that the array encompasses fields
    accessed via the array.
  * add a comment to rx_descriptor_fields1 array site noting
    that void * type of elements is retained for compatibility
    with existing drivers.
  * clean up comments of fields in rte_mbuf to be before the
    field they apply to instead of after.
  * duplicate alignas(RTE_CACHE_LINE_MIN_SIZE) into both legs of
    conditional compile for first field of cacheline 1 instead of
    once before conditional compile block.

v7:
  * complete re-write of series, previous versions not noted. all
    reviewed-by and acked-by tags (if any) were removed.

Tyler Retzlaff (4):
  net/i40e: use inline prefetch function
  mbuf: remove rte marker fields
  security: remove rte marker fields
  cryptodev: remove rte marker fields

 devtools/libabigail.abignore            |   6 +
 doc/guides/rel_notes/release_24_07.rst  |   9 ++
 drivers/net/i40e/i40e_rxtx_vec_avx512.c |   2 +-
 lib/cryptodev/cryptodev_pmd.h           |   5 +-
 lib/mbuf/rte_mbuf.h                     |   4 +-
 lib/mbuf/rte_mbuf_core.h                | 202 +++++++++++++++++---------------
 lib/security/rte_security_driver.h      |   5 +-
 7 files changed, 129 insertions(+), 104 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* Community CI Meeting Minutes - April 4, 2024
@ 2024-04-04 16:29  3% Patrick Robb
  0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-04-04 16:29 UTC (permalink / raw)
  To: ci; +Cc: dev, dts

April 4, 2024

#####################################################################
Attendees
1. Patrick Robb
2. Juraj Linkeš
3. Paul Szczepanek
4. Luca Vizzarro

#####################################################################
Minutes

=====================================================================
General Announcements
* DPDK 24.03 has been released
* UNH Community Lab is experiencing power outages, and we are shutting
down testing for the day after this meeting.
   * Will put in retests once we’re back up and running
* Daylight saving time has hit North America, and will also happen in
Europe between this meeting and the next one. Should we adjust?
   * We will adjust 1 hour earlier
* Server Refresh:
   * GB will vote on this soon (I think over email)
   * Patrick sent Nathan some new information about the ARM Grace
server that ARM is requesting, which Nathan is passing along to GB
* UNH lab is working on updates to get_reruns.py for retests v2, and
will upstream this when ready.
   * UNH will also start pre-populating all environments with PENDING,
and then overwriting those as new results come in.
   * Reminder - Final conclusion on policy is:
      * A) If retest is requested without rebase key, then retest
"original" dpdk artifact (either by re-using the existing tarball (unh
lab) or tracking the commit from submit time and re-applying onto dpdk
at that commit (loongson)).
      * B) If rebase key is included, apply to tip of the indicated
branch. If, because the branch has changed, the patch no longer
applies, then we can report an apply failure. Then, submitter has to
refactor their patch and resubmit.
      * In either case, report the new results with an updated test
result in the email (i.e. report "_Testing PASS RETEST #1" instead of
"_Testing PASS" in the email body).
* Depends-on support: Patrick pinged Thomas about this this morning.
   * https://github.com/getpatchwork/patchwork/issues/583 and
https://github.com/getpatchwork/git-pw/issues/71
* MSVC: Tech board discussed extending the dpdk libraries which
compile with MSVC in CI testing, and making all new libraries which
will be used by Windows require compile using MSVC
   * Some members mentioned difficulty due to burden of running
Windows VM to test their patches against before CI
      * One solution is GitHub actions
      * Honnappa requested lab host a windows VM as a community
resource. Users could SSH onto the lab VPN, and use that machine.
         * Patrick Robb will follow up on the mailing list to see
whether the ci group approves of this idea.
* DPDK Summit will most likely be in Montreal
   * Once we have a date, Patrick will suggest to GB and TB that
anyone who is interested can visit the lab the date after
   * CFP:
      * Should probably give a DTS update, which can be from Patrick,
other UNH people, Honnappa, maybe Juraj (remotely)
      * UNH folks can probably do a CI testing update
         * Discuss new hardware
         * Discuss new testing
         * Discuss new reporting functionality, retests, depends-on,
other qol stuff

=====================================================================
CI Status

---------------------------------------------------------------------
UNH-IOL Community Lab
* Dodji Seketeli is requesting information about the Community Lab’s
ABI jobs to investigate an error on his patch
   * Libabigail version is 2.2.0
   * Patrick will send him the .so abi ref dirs this morning.
* Marvell CN10K:
   * TG is working, Octeon DUT can run DPDK apps and forward packets.
   * Can’t figure out how to reconfigure the link speed on the QSFP
port (want 2x100GbE, not 4x50GbE) - will ask Marvell people to SSH on
to set this
   * Also need to verify the correct meson options for native builds on the DUT
      * right now just using “meson setup -Dplatform=cn10k build” from dpdk docs
      * Juraj states that for ARM cpus (which is on this board) you
should be able to natively compile with default options
* SPDK: Working on these compile jobs
   * Currently compile with:
      * Ubuntu 22.04
      * Debian 11
      * Debian 12
      * CentOS 8
      * CentOS 9
      * Fedora 37
      * Fedora 38
      * Fedora 39
      * Opensuse-Leap 15 but with a warning
   * Cannot compile with:
      * Rhel 8
      * Rhel 9
      * SPDK docs state rhel is “best effort”
   * Questions:
      * Should we run with werror enabled?
      * What versions of SPDK do we test?
      * What versions of DPDK do we test SPDK against?
   * Unit tests pass with the distros which are compiling
* OvS DPDK testing:
   * Lab sent an email to test-report which got blocked because it
was just above 500 kB, which is the limit
* Ts-factory redirect added to dpdk community lab dashboard navbar

---------------------------------------------------------------------
Intel Lab
* None

---------------------------------------------------------------------
Github Actions
* None

---------------------------------------------------------------------
Loongarch Lab
* None

---------------------------------------------------------------------
DTS Improvements & Test Development
* Nick’s hugepages patch will be submitted today (or already is).
   * Forces 2mb hugepages
* Nick is starting on porting the jumboframes testsuite now
   * Starting by manually running scapy, testpmd, tcpdump to verify
the function works, then writing the suite in DTS
* Jeremy is working on the context manager for testpmd to ensure it
closes completely before we attempt to start it again for a subsequent
testcase
* Juraj has provided an initial review of Luca’s testpmd params patch,
the implementation may need to be refactored, but the idea of
simplifying the developer user experience is a good goal
* Jeremy Spewock will write to Juraj about the capabilities patch. UNH
can test this if needed.
* Other than the testcase capabilities check patch, Juraj will be
renaming the dts execution and doing work for supporting pre-built
DPDK for the SUT
* Luca ran into what may have been a paramiko race condition from when
the interactive shell closes. We are unsure what exactly is happening
but we will probably need to hotfix this. Would likely require some
checks when closing the session.
* Luca tried to run from two intel nics, and could bind to vfio-pci,
but then timed out when trying to rebind to i40e. Left with 1
interface bound to vfio, one interface bound to i40e.
   * Can try rebinding the ports with 1 command, instead of 1 by 1
   * Maybe tried to run dpdk-devbind before all DPDK resources had
been released (just speculation)

=====================================================================
Any other business
* Next Meeting: April 20, 2024

^ permalink raw reply	[relevance 3%]

* Re: [PATCH] lib: add get/set link settings interface
  2024-04-03 16:49  3%   ` Tyler Retzlaff
@ 2024-04-04  7:09  4%     ` David Marchand
  2024-04-05  0:55  0%       ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-04-04  7:09 UTC (permalink / raw)
  To: Tyler Retzlaff, Marek Pazdan
  Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev

Hello Tyler, Marek,

On Wed, Apr 3, 2024 at 6:49 PM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
> >  There are link settings parameters available at the PMD driver level
> >  which are currently not exposed to the user via a consistent interface.
> >  When the interface is available at the system level, this information can
> >  be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
> >  ETHTOOL_GLINKSETTINGS). There are use cases where the
> >  physical interface is passed through to the DPDK driver and is not
> >  available at the system level. The information provided by the ioctl
> >  is useful for link auto-negotiation settings, among others.
> >
> > Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> > ---
> > diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> > index 147257d6a2..66aad925d0 100644
> > --- a/lib/ethdev/rte_ethdev.h
> > +++ b/lib/ethdev/rte_ethdev.h
> > @@ -335,7 +335,7 @@ struct rte_eth_stats {
> >  __extension__
> >  struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
> >       uint32_t link_speed;        /**< RTE_ETH_SPEED_NUM_ */
> > -     uint16_t link_duplex  : 1;  /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> > +     uint16_t link_duplex  : 2;  /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
> >       uint16_t link_autoneg : 1;  /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
> >       uint16_t link_status  : 1;  /**< RTE_ETH_LINK_[DOWN/UP] */
> >  };
>
> this breaks the abi. David does libabigail pick this up i wonder?
>

Yes, the CI flagged it.

Looking at the UNH report (in patchwork):
http://mails.dpdk.org/archives/test-report/2024-April/631222.html

1 function with some indirect sub-type change:

[C] 'function int rte_eth_link_get(uint16_t, rte_eth_link*)' at
rte_ethdev.c:2972:1 has some indirect sub-type changes:
parameter 2 of type 'rte_eth_link*' has sub-type changes:
in pointed to type 'struct rte_eth_link' at rte_ethdev.h:336:1:
type size hasn't changed
2 data member changes:
'uint16_t link_autoneg' offset changed from 33 to 34 (in bits) (by +1 bits)
'uint16_t link_status' offset changed from 34 to 35 (in bits) (by +1 bits)

Error: ABI issue reported for abidiff --suppr
/home-local/jenkins-local/jenkins-agent/workspace/Generic-DPDK-Compile-ABI
at 3/dpdk/devtools/libabigail.abignore --no-added-syms --headers-dir1
reference/usr/local/include --headers-dir2
build_install/usr/local/include
reference/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.0
build_install/usr/local/lib/x86_64-linux-gnu/librte_ethdev.so.24.2
ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
this as a potential issue).


GHA would have caught it too, but the documentation generation failed
before reaching the ABI check.
http://mails.dpdk.org/archives/test-report/2024-April/631086.html


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* Re: [PATCH v10 2/4] mbuf: remove rte marker fields
  2024-04-03 19:32  0%     ` Morten Brørup
@ 2024-04-03 22:45  0%       ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-03 22:45 UTC (permalink / raw)
  To: Morten Brørup
  Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
	David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
	Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
	Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
	Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
	Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang

On Wed, Apr 03, 2024 at 09:32:21PM +0200, Morten Brørup wrote:
> > From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> > Sent: Wednesday, 3 April 2024 19.54
> > 
> > RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> > RTE_MARKER fields from rte_mbuf struct.
> > 
> > Maintain alignment of fields after removed cacheline1 marker by placing
> > C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> > 
> > Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> > unions as single-element arrays with types matching the original
> > markers to maintain API compatibility.
> > 
> > This change breaks the API for cacheline{0,1} fields that have been
> > removed from rte_mbuf but it does not break the ABI, to address the
> > false positives of the removed (but 0 size fields) provide the minimum
> > libabigail.abignore for type = rte_mbuf.
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> 
> [...]
> 
> > +	/* remaining 24 bytes are set on RX when pulling packet from
> > descriptor */
> 
> Good.
> 
> >  	union {
> > +		/* void * type of the array elements is retained for driver
> > compatibility. */
> > +		void *rx_descriptor_fields1[24 / sizeof(void *)];
> 
> Good, also the description.
> 
> >  		__extension__
> >  		struct {
> > -			uint8_t l2_type:4;   /**< (Outer) L2 type. */
> > -			uint8_t l3_type:4;   /**< (Outer) L3 type. */
> > -			uint8_t l4_type:4;   /**< (Outer) L4 type. */
> > -			uint8_t tun_type:4;  /**< Tunnel type. */
> > +			/*
> > +			 * The packet type, which is the combination of
> > outer/inner L2, L3, L4
> > +			 * and tunnel types. The packet_type is about data
> > really present in the
> > +			 * mbuf. Example: if vlan stripping is enabled, a
> > received vlan packet
> > +			 * would have RTE_PTYPE_L2_ETHER and not
> > RTE_PTYPE_L2_VLAN because the
> > +			 * vlan is stripped from the data.
> > +			 */
> >  			union {
> > -				uint8_t inner_esp_next_proto;
> > -				/**< ESP next protocol type, valid if
> > -				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
> > -				 * on both Tx and Rx.
> > -				 */
> 
> [...]
> 
> > +						/**< ESP next protocol type, valid
> > if
> > +						 * RTE_PTYPE_TUNNEL_ESP tunnel type
> > is set
> > +						 * on both Tx and Rx.
> > +						 */
> > +						uint8_t inner_esp_next_proto;
> 
> Thank you for moving the comments up before the fields.
> 
> Please note that "/**<" means that the description is related to the field preceding the comment, so it should be replaced by "/**" when moving the description up above a field.

ooh, i'll fix it i'm not well versed in doxygen documentation.

> 
> Maybe moving the descriptions as part of this patch was not a good idea after all; it doesn't improve the readability of the patch itself. I regret suggesting it.
> If you leave the descriptions at their originals positions (relative to the fields), we can clean up the formatting of the descriptions in a later patch.

it's easy enough for me to fix the comments in place and bring in a new
version of the series, assuming other reviewers don't object i'll do that.
the diff is already kind of annoying to review in mail without -b
anyway.

> 
> [...]
> 
> >  	/* second cache line - fields only used in slow path or on TX */
> > -	alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
> > -
> >  #if RTE_IOVA_IN_MBUF
> >  	/**
> >  	 * Next segment of scattered packet. Must be NULL in the last
> >  	 * segment or in case of non-segmented packet.
> >  	 */
> > +	alignas(RTE_CACHE_LINE_MIN_SIZE)
> >  	struct rte_mbuf *next;
> >  #else
> >  	/**
> >  	 * Reserved for dynamic fields
> >  	 * when the next pointer is in first cache line (i.e.
> > RTE_IOVA_IN_MBUF is 0).
> >  	 */
> > +	alignas(RTE_CACHE_LINE_MIN_SIZE)
> 
> Good positioning of the alignas().
> 
> I like everything in this patch.
> 
> Please fix the descriptions preceding the fields "/**<" -> "/**" or move them back to their location after the fields; then you may add Reviewed-by: Morten Brørup <mb@smartsharesystems.com> to the next version.

ack, next rev.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v10 2/4] mbuf: remove rte marker fields
  2024-04-03 17:53  2%   ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
  2024-04-03 19:32  0%     ` Morten Brørup
@ 2024-04-03 21:49  0%     ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-03 21:49 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
	David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
	Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
	Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
	Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
	Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb

On Wed,  3 Apr 2024 10:53:34 -0700
Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:

> RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> RTE_MARKER fields from rte_mbuf struct.
> 
> Maintain alignment of fields after removed cacheline1 marker by placing
> C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> 
> Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> unions as single-element arrays with types matching the original
> markers to maintain API compatibility.
> 
> > This change breaks the API for the cacheline{0,1} fields that have been
> > removed from rte_mbuf, but it does not break the ABI. To address the
> > false positives from the removed (but zero-size) fields, provide the
> > minimal libabigail.abignore rule for type = rte_mbuf.
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v10 2/4] mbuf: remove rte marker fields
  2024-04-03 17:53  2%   ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
@ 2024-04-03 19:32  0%     ` Morten Brørup
  2024-04-03 22:45  0%       ` Tyler Retzlaff
  2024-04-03 21:49  0%     ` Stephen Hemminger
  1 sibling, 1 reply; 200+ results
From: Morten Brørup @ 2024-04-03 19:32 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Wednesday, 3 April 2024 19.54
> 
> RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> RTE_MARKER fields from rte_mbuf struct.
> 
> Maintain alignment of fields after removed cacheline1 marker by placing
> C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> 
> Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> unions as single-element arrays with types matching the original
> markers to maintain API compatibility.
> 
> This change breaks the API for the cacheline{0,1} fields that have been
> removed from rte_mbuf, but it does not break the ABI. To address the
> false positives from the removed (but zero-size) fields, provide the
> minimal libabigail.abignore rule for type = rte_mbuf.
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---

[...]

> +	/* remaining 24 bytes are set on RX when pulling packet from
> descriptor */

Good.

>  	union {
> +		/* void * type of the array elements is retained for driver
> compatibility. */
> +		void *rx_descriptor_fields1[24 / sizeof(void *)];

Good, also the description.

>  		__extension__
>  		struct {
> -			uint8_t l2_type:4;   /**< (Outer) L2 type. */
> -			uint8_t l3_type:4;   /**< (Outer) L3 type. */
> -			uint8_t l4_type:4;   /**< (Outer) L4 type. */
> -			uint8_t tun_type:4;  /**< Tunnel type. */
> +			/*
> +			 * The packet type, which is the combination of
> outer/inner L2, L3, L4
> +			 * and tunnel types. The packet_type is about data
> really present in the
> +			 * mbuf. Example: if vlan stripping is enabled, a
> received vlan packet
> +			 * would have RTE_PTYPE_L2_ETHER and not
> RTE_PTYPE_L2_VLAN because the
> +			 * vlan is stripped from the data.
> +			 */
>  			union {
> -				uint8_t inner_esp_next_proto;
> -				/**< ESP next protocol type, valid if
> -				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
> -				 * on both Tx and Rx.
> -				 */

[...]

> +						/**< ESP next protocol type, valid
> if
> +						 * RTE_PTYPE_TUNNEL_ESP tunnel type
> is set
> +						 * on both Tx and Rx.
> +						 */
> +						uint8_t inner_esp_next_proto;

Thank you for moving the comments up before the fields.

Please note that "/**<" means that the description is related to the field preceding the comment, so it should be replaced by "/**" when moving the description up above a field.

Maybe moving the descriptions as part of this patch was not a good idea after all; it doesn't improve the readability of the patch itself. I regret suggesting it.
If you leave the descriptions at their original positions (relative to the fields), we can clean up the formatting of the descriptions in a later patch.

[...]

>  	/* second cache line - fields only used in slow path or on TX */
> -	alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
> -
>  #if RTE_IOVA_IN_MBUF
>  	/**
>  	 * Next segment of scattered packet. Must be NULL in the last
>  	 * segment or in case of non-segmented packet.
>  	 */
> +	alignas(RTE_CACHE_LINE_MIN_SIZE)
>  	struct rte_mbuf *next;
>  #else
>  	/**
>  	 * Reserved for dynamic fields
>  	 * when the next pointer is in first cache line (i.e.
> RTE_IOVA_IN_MBUF is 0).
>  	 */
> +	alignas(RTE_CACHE_LINE_MIN_SIZE)

Good positioning of the alignas().

I like everything in this patch.

Please fix the descriptions preceding the fields "/**<" -> "/**" or move them back to their location after the fields; then you may add Reviewed-by: Morten Brørup <mb@smartsharesystems.com> to the next version.


^ permalink raw reply	[relevance 0%]

* [PATCH v10 2/4] mbuf: remove rte marker fields
  2024-04-03 17:53  3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-03 17:53  2%   ` Tyler Retzlaff
  2024-04-03 19:32  0%     ` Morten Brørup
  2024-04-03 21:49  0%     ` Stephen Hemminger
  0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-03 17:53 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields from rte_mbuf struct.

Maintain alignment of fields after removed cacheline1 marker by placing
C11 alignas(RTE_CACHE_LINE_MIN_SIZE).

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single-element arrays with types matching the original
markers to maintain API compatibility.

This change breaks the API for the cacheline{0,1} fields that have been
removed from rte_mbuf, but it does not break the ABI. To address the
false positives from the removed (but zero-size) fields, provide the
minimal libabigail.abignore rule for type = rte_mbuf.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 devtools/libabigail.abignore           |   6 +
 doc/guides/rel_notes/release_24_07.rst |   3 +
 lib/mbuf/rte_mbuf.h                    |   4 +-
 lib/mbuf/rte_mbuf_core.h               | 200 +++++++++++++++++----------------
 4 files changed, 115 insertions(+), 98 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 645d289..ad13179 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -37,3 +37,9 @@
 [suppress_type]
 	name = rte_eth_fp_ops
 	has_data_member_inserted_between = {offset_of(reserved2), end}
+
+[suppress_type]
+	name = rte_mbuf
+	type_kind = struct
+	has_size_change = no
+	has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
index a69f24c..b240ee5 100644
--- a/doc/guides/rel_notes/release_24_07.rst
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -68,6 +68,9 @@ Removed Items
    Also, make sure to start the actual text at the margin.
    =======================================================
 
+* mbuf: ``RTE_MARKER`` fields ``cacheline0`` and ``cacheline1``
+  have been removed from ``struct rte_mbuf``.
+
 
 API Changes
 -----------
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b..4c4722e 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -108,7 +108,7 @@
 static inline void
 rte_mbuf_prefetch_part1(struct rte_mbuf *m)
 {
-	rte_prefetch0(&m->cacheline0);
+	rte_prefetch0(m);
 }
 
 /**
@@ -126,7 +126,7 @@
 rte_mbuf_prefetch_part2(struct rte_mbuf *m)
 {
 #if RTE_CACHE_LINE_SIZE == 64
-	rte_prefetch0(&m->cacheline1);
+	rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
 #else
 	RTE_SET_USED(m);
 #endif
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9f58076..9d838b8 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -465,8 +465,6 @@ enum {
  * The generic rte_mbuf, containing a packet mbuf.
  */
 struct __rte_cache_aligned rte_mbuf {
-	RTE_MARKER cacheline0;
-
 	void *buf_addr;           /**< Virtual address of segment buffer. */
 #if RTE_IOVA_IN_MBUF
 	/**
@@ -488,127 +486,138 @@ struct __rte_cache_aligned rte_mbuf {
 #endif
 
 	/* next 8 bytes are initialised on RX descriptor rearm */
-	RTE_MARKER64 rearm_data;
-	uint16_t data_off;
-
-	/**
-	 * Reference counter. Its size should at least equal to the size
-	 * of port field (16 bits), to support zero-copy broadcast.
-	 * It should only be accessed using the following functions:
-	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
-	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
-	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
-	 */
-	RTE_ATOMIC(uint16_t) refcnt;
+	union {
+		uint64_t rearm_data[1];
+		__extension__
+		struct {
+			uint16_t data_off;
+
+			/**
+			 * Reference counter. Its size should at least equal to the size
+			 * of port field (16 bits), to support zero-copy broadcast.
+			 * It should only be accessed using the following functions:
+			 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+			 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+			 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+			 */
+			RTE_ATOMIC(uint16_t) refcnt;
 
-	/**
-	 * Number of segments. Only valid for the first segment of an mbuf
-	 * chain.
-	 */
-	uint16_t nb_segs;
+			/**
+			 * Number of segments. Only valid for the first segment of an mbuf
+			 * chain.
+			 */
+			uint16_t nb_segs;
 
-	/** Input port (16 bits to support more than 256 virtual ports).
-	 * The event eth Tx adapter uses this field to specify the output port.
-	 */
-	uint16_t port;
+			/** Input port (16 bits to support more than 256 virtual ports).
+			 * The event eth Tx adapter uses this field to specify the output port.
+			 */
+			uint16_t port;
+		};
+	};
 
 	uint64_t ol_flags;        /**< Offload features. */
 
-	/* remaining bytes are set on RX when pulling packet from descriptor */
-	RTE_MARKER rx_descriptor_fields1;
-
-	/*
-	 * The packet type, which is the combination of outer/inner L2, L3, L4
-	 * and tunnel types. The packet_type is about data really present in the
-	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
-	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
-	 * vlan is stripped from the data.
-	 */
+	/* remaining 24 bytes are set on RX when pulling packet from descriptor */
 	union {
-		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		/* void * type of the array elements is retained for driver compatibility. */
+		void *rx_descriptor_fields1[24 / sizeof(void *)];
 		__extension__
 		struct {
-			uint8_t l2_type:4;   /**< (Outer) L2 type. */
-			uint8_t l3_type:4;   /**< (Outer) L3 type. */
-			uint8_t l4_type:4;   /**< (Outer) L4 type. */
-			uint8_t tun_type:4;  /**< Tunnel type. */
+			/*
+			 * The packet type, which is the combination of outer/inner L2, L3, L4
+			 * and tunnel types. The packet_type is about data really present in the
+			 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+			 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+			 * vlan is stripped from the data.
+			 */
 			union {
-				uint8_t inner_esp_next_proto;
-				/**< ESP next protocol type, valid if
-				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
-				 * on both Tx and Rx.
-				 */
+				uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
 				__extension__
 				struct {
-					uint8_t inner_l2_type:4;
-					/**< Inner L2 type. */
-					uint8_t inner_l3_type:4;
-					/**< Inner L3 type. */
+					uint8_t l2_type:4;   /**< (Outer) L2 type. */
+					uint8_t l3_type:4;   /**< (Outer) L3 type. */
+					uint8_t l4_type:4;   /**< (Outer) L4 type. */
+					uint8_t tun_type:4;  /**< Tunnel type. */
+					union {
+						/**< ESP next protocol type, valid if
+						 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
+						 * on both Tx and Rx.
+						 */
+						uint8_t inner_esp_next_proto;
+						__extension__
+						struct {
+							/**< Inner L2 type. */
+							uint8_t inner_l2_type:4;
+							/**< Inner L3 type. */
+							uint8_t inner_l3_type:4;
+						};
+					};
+					uint8_t inner_l4_type:4; /**< Inner L4 type. */
 				};
 			};
-			uint8_t inner_l4_type:4; /**< Inner L4 type. */
-		};
-	};
 
-	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
-	uint16_t data_len;        /**< Amount of data in segment buffer. */
-	/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
-	uint16_t vlan_tci;
+			uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
 
-	union {
-		union {
-			uint32_t rss;     /**< RSS hash result if RSS enabled */
-			struct {
+			uint16_t data_len;        /**< Amount of data in segment buffer. */
+			/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+			uint16_t vlan_tci;
+
+			union {
 				union {
+					uint32_t rss;     /**< RSS hash result if RSS enabled */
 					struct {
-						uint16_t hash;
-						uint16_t id;
-					};
-					uint32_t lo;
-					/**< Second 4 flexible bytes */
-				};
-				uint32_t hi;
-				/**< First 4 flexible bytes or FD ID, dependent
-				 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
-				 */
-			} fdir;	/**< Filter identifier if FDIR enabled */
-			struct rte_mbuf_sched sched;
-			/**< Hierarchical scheduler : 8 bytes */
-			struct {
-				uint32_t reserved1;
-				uint16_t reserved2;
-				uint16_t txq;
-				/**< The event eth Tx adapter uses this field
-				 * to store Tx queue id.
-				 * @see rte_event_eth_tx_adapter_txq_set()
-				 */
-			} txadapter; /**< Eventdev ethdev Tx adapter */
-			uint32_t usr;
-			/**< User defined tags. See rte_distributor_process() */
-		} hash;                   /**< hash information */
-	};
+						union {
+							struct {
+								uint16_t hash;
+								uint16_t id;
+							};
+							/**< Second 4 flexible bytes */
+							uint32_t lo;
+						};
+						/**< First 4 flexible bytes or FD ID, dependent
+						 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+						 */
+						uint32_t hi;
+					} fdir;	/**< Filter identifier if FDIR enabled */
+					struct rte_mbuf_sched sched;
+					/**< Hierarchical scheduler : 8 bytes */
+					struct {
+						uint32_t reserved1;
+						uint16_t reserved2;
+						/**< The event eth Tx adapter uses this field
+						 * to store Tx queue id.
+						 * @see rte_event_eth_tx_adapter_txq_set()
+						 */
+						uint16_t txq;
+					} txadapter; /**< Eventdev ethdev Tx adapter */
+					/**< User defined tags. See rte_distributor_process() */
+					uint32_t usr;
+				} hash;                   /**< hash information */
+			};
 
-	/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
-	uint16_t vlan_tci_outer;
+			/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+			uint16_t vlan_tci_outer;
 
-	uint16_t buf_len;         /**< Length of segment buffer. */
+			uint16_t buf_len;         /**< Length of segment buffer. */
+		};
+	};
 
 	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
 
 	/* second cache line - fields only used in slow path or on TX */
-	alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
-
 #if RTE_IOVA_IN_MBUF
 	/**
 	 * Next segment of scattered packet. Must be NULL in the last
 	 * segment or in case of non-segmented packet.
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	struct rte_mbuf *next;
 #else
 	/**
 	 * Reserved for dynamic fields
 	 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	uint64_t dynfield2;
 #endif
 
@@ -617,17 +626,16 @@ struct __rte_cache_aligned rte_mbuf {
 		uint64_t tx_offload;       /**< combined for easy fetch */
 		__extension__
 		struct {
-			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
 			/**< L2 (MAC) Header Length for non-tunneling pkt.
 			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
 			 */
-			uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
+			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
 			/**< L3 (IP) Header Length. */
-			uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
+			uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
 			/**< L4 (TCP/UDP) Header Length. */
-			uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
+			uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
 			/**< TCP TSO segment size */
-
+			uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
 			/*
 			 * Fields for Tx offloading of tunnels.
 			 * These are undefined for packets which don't request
@@ -640,10 +648,10 @@ struct __rte_cache_aligned rte_mbuf {
 			 * Applications are expected to set appropriate tunnel
 			 * offload flags when they fill in these fields.
 			 */
-			uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
 			/**< Outer L3 (IP) Hdr Length. */
-			uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
+			uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
 			/**< Outer L2 (MAC) Hdr Length. */
+			uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
 
 			/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
 		};
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries
                     ` (2 preceding siblings ...)
  2024-04-02 20:08  3% ` [PATCH v9 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-03 17:53  3% ` Tyler Retzlaff
  2024-04-03 17:53  2%   ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
  2024-04-04 17:51  3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
  4 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-03 17:53 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

As per the techboard meeting of 2024/03/20, adopt the hybrid proposal of
adapting the descriptor fields and removing the cacheline fields.

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields.

For the cacheline{0,1} fields, remove the fields entirely and use inline
functions to prefetch.

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single-element arrays with types matching the original
markers to maintain API compatibility.

Note: the diff is easier to view with -b due to the additional nesting from
      the unions / structs that have been introduced.

v10:
  * move removal notices in in release notes from 24.03 to 24.07

v9:
  * provide narrowest possible libabigail.abignore to suppress
    removal of fields that were agreed are not actual abi changes.

v8:
  * rx_descriptor_fields1 array is now constexpr sized to
    24 / sizeof(void *) so that the array encompasses fields
    accessed via the array.
  * add a comment to rx_descriptor_fields1 array site noting
    that void * type of elements is retained for compatibility
    with existing drivers.
  * clean up comments of fields in rte_mbuf to be before the
    field they apply to instead of after.
  * duplicate alignas(RTE_CACHE_LINE_MIN_SIZE) into both legs of
    conditional compile for first field of cacheline 1 instead of
    once before conditional compile block.

v7:
  * complete re-write of series, previous versions not noted. all
    reviewed-by and acked-by tags (if any) were removed.

Tyler Retzlaff (4):
  net/i40e: use inline prefetch function
  mbuf: remove rte marker fields
  security: remove rte marker fields
  cryptodev: remove rte marker fields

 devtools/libabigail.abignore            |   6 +
 doc/guides/rel_notes/release_24_07.rst  |   9 ++
 drivers/net/i40e/i40e_rxtx_vec_avx512.c |   2 +-
 lib/cryptodev/cryptodev_pmd.h           |   5 +-
 lib/mbuf/rte_mbuf.h                     |   4 +-
 lib/mbuf/rte_mbuf_core.h                | 200 +++++++++++++++++---------------
 lib/security/rte_security_driver.h      |   5 +-
 7 files changed, 128 insertions(+), 103 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH]  lib: add get/set link settings interface
  @ 2024-04-03 16:49  3%   ` Tyler Retzlaff
  2024-04-04  7:09  4%     ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-03 16:49 UTC (permalink / raw)
  To: Marek Pazdan
  Cc: Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko, dev, david.marchand

On Wed, Apr 03, 2024 at 06:40:24AM -0700, Marek Pazdan wrote:
>  There are link settings parameters available at the PMD driver level
>  which are currently not exposed to the user via a consistent interface.
>  When the interface is available at the system level, that information can
>  be acquired with 'ethtool DEVNAME' (ioctl: ETHTOOL_SLINKSETTINGS/
>  ETHTOOL_GLINKSETTINGS). There are use cases where the
>  physical interface is passed through to the DPDK driver and is not
>  available at the system level. The information provided by the ioctl is
>  useful for link auto-negotiation settings, among others.
> 
> Signed-off-by: Marek Pazdan <mpazdan@arista.com>
> ---
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 147257d6a2..66aad925d0 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -335,7 +335,7 @@ struct rte_eth_stats {
>  __extension__
>  struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
>  	uint32_t link_speed;        /**< RTE_ETH_SPEED_NUM_ */
> -	uint16_t link_duplex  : 1;  /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
> +	uint16_t link_duplex  : 2;  /**< RTE_ETH_LINK_[HALF/FULL/UNKNOWN]_DUPLEX */
>  	uint16_t link_autoneg : 1;  /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
>  	uint16_t link_status  : 1;  /**< RTE_ETH_LINK_[DOWN/UP] */
>  };

this breaks the ABI. David, does libabigail pick this up, I wonder?

^ permalink raw reply	[relevance 3%]

* RE: The effect of inlining
  2024-04-01 15:20  3%           ` Mattias Rönnblom
@ 2024-04-03 16:01  3%             ` Morten Brørup
  0 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2024-04-03 16:01 UTC (permalink / raw)
  To: Mattias Rönnblom, Maxime Coquelin, Stephen Hemminger,
	Andrey Ignatov
  Cc: dev, Chenbo Xia, Wei Shen, techboard

> From: Mattias Rönnblom [mailto:hofors@lysator.liu.se]
> Sent: Monday, 1 April 2024 17.20
> 
> On 2024-03-29 14:42, Morten Brørup wrote:
> > +CC techboard
> >
> >> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> >> Sent: Friday, 29 March 2024 14.05
> >>
> >> Hi Stephen,
> >>
> >> On 3/29/24 03:53, Stephen Hemminger wrote:
> >>> On Thu, 28 Mar 2024 17:10:42 -0700
> >>> Andrey Ignatov <rdna@apple.com> wrote:
> >>>
> >>>>>
> >>>>> You don't need always inline, the compiler will do it anyway.
> >>>>
> >>>> I can remove it in v2, but it's not completely obvious to me how is
> >> it
> >>>> decided when to specify it explicitly and when not?
> >>>>
> >>>> I see plenty of __rte_always_inline in this file:
> >>>>
> >>>> % git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
> >>>> lib/vhost/virtio_net.c:66
> >>>
> >>>
> >>> Cargo cult really.
> >>>
> >>
> >> Cargo cult... really?
> >>
> >> Well, I just did a quick test by comparing IO forwarding with testpmd
> >> between main branch and with adding a patch that removes all the
> >> inline/noinline in lib/vhost/virtio_net.c [0].
> >>
> >> main branch: 14.63Mpps
> >> main branch - inline/noinline: 10.24Mpps
> >
> > Thank you for testing this, Maxime. Very interesting!
> >
> > It is sometimes suggested at techboard meetings that we should convert
> > more inline functions to non-inline for improved API/ABI stability, with
> > the argument that the performance of inlining is negligible.
> >
> 
> I think you are mixing two different (but related) things here.
> 1) marking functions with the inline family of keywords/attributes
> 2) keeping function definitions in header files

I'm talking about 2. The reason for wanting to avoid inline function definitions in header files is to hide more of the implementation behind the API, thus making it easier to change the implementation without breaking the API/ABI. Sorry about not making this clear.

> 
> 1) does not affect the ABI, while 2) does. Neither 1) nor 2) affects the
> API (i.e., source-level compatibility).
> 
> 2) *allows* for function inlining even in non-LTO builds, but doesn't
> force it.
> 
> If you don't believe 2) makes a difference performance-wise, it follows
> that you also don't believe LTO makes much of a difference. Both have
> the same effect: allowing the compiler to reason over a larger chunk of
> your program.
> 
> Allowing the compiler to inline small, often-called functions is crucial
> for performance, in my experience. If the target symbol tends to be in a
> shared object, the difference is even larger. It's also quite common
> that you see no effect of LTO (other than a reduction of code
> footprint).
> 
> As LTO becomes more practical to use, 2) loses much of its appeal.
> 
> If PGO ever becomes practical to use, maybe 1) will as well.
> 
> > I think this test proves that the sum of many small (negligible)
> > performance differences is not negligible!
> >
> >>
> >> Andrey, thanks for the patch, I'll have a look at it next week.
> >>
> >> Maxime
> >>
> >> [0]: https://pastebin.com/72P2npZ0
> >

^ permalink raw reply	[relevance 3%]

* Re: Issues around packet capture when secondary process is doing rx/tx
  @ 2024-04-03 11:43  0%   ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-04-03 11:43 UTC (permalink / raw)
  To: Morten Brørup, Stephen Hemminger, dev
  Cc: arshdeep.kaur, Gowda, Sandesh, Reshma Pattan, Konstantin Ananyev

On 1/8/2024 10:41 AM, Morten Brørup wrote:
>> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
>> Sent: Monday, 8 January 2024 02.59
>>
>> I have been looking at a problem reported by Sandesh
>> where packet capture does not work if rx/tx burst is done in secondary
>> process.
>>
>> The root cause is that existing rx/tx callback model just doesn't work
>> unless the process doing the rx/tx burst calls is the same one that
>> registered the callbacks.
> 
> So, callbacks don't work across processes, because code might differ across processes.
> 
> If process A is running, and RX'ing and TX'ing, and process B wants to install its own callbacks (e.g. packet capture) on RX and TX, we basically want process A to execute code residing in process B, which is impossible.
> 

Callbacks are stored in "struct rte_eth_dev", so they are per process, which
means the primary and secondaries have their own copies of the callbacks, as
Konstantin explained.

So, how does pdump work? :) It uses MP support and a shared ring, similar
to what you mentioned below. More detail:
- The primary registers an MP handler
- The pdump secondary process sends an MP message carrying a ring and a
mempool
- When the primary receives the MP message, it registers its *own* callbacks,
which get the 'ring' as a parameter
- The callbacks clone packets to the 'ring'; that is how the pdump secondary
process gets access to the packets

> An alternative could be to pass the packets through a ring in shared memory. However, this method would add the ring processing latency of process B to the RX/TX latency of process A.
> 
> I think we can conclude that callbacks are one of the things that don't work with secondary processes.
> 
> With this decided, we can then consider how to best add packet capture. The concept of passing "data" (instead of calling functions) across processes obviously applies to this use case.
> 
>>
>> An example sequence would be:
>> 	1. dumpcap (or pdump) as secondary tells pdump in primary to
>> register callback
>> 	2. secondary process calls rx_burst.
>> 	3. rx_burst sees the callback but it has pointer pdump_rx which
>> is not necessarily
>> 	   at same location in primary and secondary process.
>> 	4. indirect function call in secondary to bad location likely
>> causes crash.
>>
>> Some possible workarounds.
>> 	1. Keep callback list per-process: messy, but won't crash.
>> Capture won't work
>>            without other changes. In this primary would register
>> callback, but secondaries
>>            would not use them in rx/tx burst.
>>
>> 	2. Replace use of rx/tx callback in pdump with change to
>> rte_ethdev to have
>>            a capture flag. (i.e. don't use indirection).  Likely ABI
>> problems.
>>            Basically, ignore the rx/tx callback mechanism. This is my
>> preferred
>> 	   solution.
>>
>> 	3. Some fix up mechanism (in EAL mp support?) to have each
>> process fixup
>>            its callback mechanism.
>>
>> 	4. Do something in pdump_init to register the callback in same
>> process context
>> 	   (probably need callbacks to be per-process). Would mean
>> callback is always
>>            on independent of capture being enabled.
>>
>>         5. Get rid of indirect function call pointer, and replace it by
>> index into
>>            a static table of callback functions. Every process would
>> have same code
>>            (in this case pdump_rx) but at different address.  Requires
>> all callbacks
>>            to be statically defined at build time.
>>
>> The existing rx/tx callback is not safe id rx/tx burst is called from
>> different process
>> than where callback is registered.
>>
> 


^ permalink raw reply	[relevance 0%]

* Re: Issues around packet capture when secondary process is doing rx/tx
    2024-04-03  0:14  4%   ` Stephen Hemminger
@ 2024-04-03 11:42  0%   ` Ferruh Yigit
  1 sibling, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-04-03 11:42 UTC (permalink / raw)
  To: Konstantin Ananyev, Stephen Hemminger, dev
  Cc: arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On 1/8/2024 3:13 PM, Konstantin Ananyev wrote:
> 
> 
>> I have been looking at a problem reported by Sandesh
>> where packet capture does not work if rx/tx burst is done in secondary process.
>>
>> The root cause is that existing rx/tx callback model just doesn't work
>> unless the process doing the rx/tx burst calls is the same one that
>> registered the callbacks.
>>
>> An example sequence would be:
>> 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
>> 	2. secondary process calls rx_burst.
>> 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
>> 	   at same location in primary and secondary process.
>> 	4. indirect function call in secondary to bad location likely causes crash.
> 
> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> Right now RX/TX callbacks are private to the process; a different process simply should not
> see/execute them.
> I.e. the callbacks list is part of 'struct rte_eth_dev' itself, not of the rte_eth_dev.data that is shared
> between processes.
> It should be normal when, for the same port/queue, you end up with different lists of callbacks
> for different processes.
> So, unless I am missing something, I don't see how we can end up with 3) and 4) from above:
> From my understanding the secondary process will never see/call the primary's callbacks.
> 

Ack. There should be another reason for the crash.


> About pdump itself, it has been a while since I last looked at it, but as I remember, for it to work,
> the server process has to call rte_pdump_init(), which in turn registers the PDUMP_MP handler.
> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> though I am not sure such an option is supported right now.
>  

Currently testpmd calls 'rte_pdump_init()', and both the primary and
secondary testpmd processes call this API and register the PDUMP_MP
handler; I think this is OK.

When the pdump secondary process sends an MP message, both the primary and
secondary testpmd processes should register callbacks with the provided
ring and mempool information.

I don't know if both primary and secondary process callbacks running
simultaneously is causing this problem; otherwise I expect it to work.
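The ring/mempool handoff described above can be sketched, very loosely, as a single-producer/single-consumer ring in shared memory: the capture callback enqueues packets, and the dumpcap/pdump process drains them. A plain array stands in for rte_ring; all names here are illustrative, not the DPDK API.

```c
#include <assert.h>
#include <stdint.h>

#define RING_SZ 8

/* Hypothetical stand-in for an rte_ring living in shared memory. */
struct shared_ring {
	uint32_t pkts[RING_SZ];
	uint32_t head, tail;	/* single producer / single consumer */
};

/* Called from the capture callback: copy a packet handle into the ring. */
static int
capture_enqueue(struct shared_ring *r, uint32_t pkt)
{
	if (r->head - r->tail == RING_SZ)
		return -1;			/* ring full, packet dropped */
	r->pkts[r->head % RING_SZ] = pkt;
	r->head++;
	return 0;
}

/* Called from the pdump/dumpcap side: drain one packet from the ring. */
static int
capture_dequeue(struct shared_ring *r, uint32_t *pkt)
{
	if (r->head == r->tail)
		return -1;			/* ring empty */
	*pkt = r->pkts[r->tail % RING_SZ];
	r->tail++;
	return 0;
}
```

Because only indices and data live in the shared region (no function pointers), either process can safely produce or consume regardless of which one registered the capture.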

>>
>> Some possible workarounds.
>> 	1. Keep callback list per-process: messy, but won't crash. Capture won't work
>>            without other changes. In this scheme the primary would register the callback, but secondaries
>>            would not use them in rx/tx burst.
>>
>> 	2. Replace use of rx/tx callback in pdump with change to rte_ethdev to have
>>            a capture flag. (i.e. don't use indirection).  Likely ABI problems.
>>            Basically, ignore the rx/tx callback mechanism. This is my preferred
>> 	   solution.
> 
> It is not only the capture flag, it is also what to do with the captured packets
> (copy? If yes, then where to? examine? drop? do something else?).
> It is probably not the best choice to add all these things into ethdev API.
> 
>> 	3. Some fix up mechanism (in EAL mp support?) to have each process fixup
>>            its callback mechanism.
>  
> Probably the easiest way to fix that is to pass rte_pdump_enable() extra information
> that would allow it to distinguish on what exact process (local or remote)
> we want to enable pdump functionality. Then it could act accordingly.
> 
>>
>> 	4. Do something in pdump_init to register the callback in same process context
>> 	   (probably need callbacks to be per-process). Would mean the callback is always
>>            on, independent of capture being enabled.
>>
>>         5. Get rid of indirect function call pointer, and replace it by index into
>>            a static table of callback functions. Every process would have same code
>>            (in this case pdump_rx) but at different address.  Requires all callbacks
>>            to be statically defined at build time.
> 
> Doesn't look like a good approach - it will break many things. 
>  
>> The existing rx/tx callback is not safe if rx/tx burst is called from a different process
>> than where the callback is registered.
>  
> 

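For reference, the static-table idea in workaround 5 above can be sketched as follows (purely illustrative names, not DPDK code): every process compiles the same table, so an index stored in shared memory is valid everywhere even though the absolute function addresses differ per process.

```c
#include <assert.h>
#include <stdint.h>

typedef uint16_t (*rx_cb_fn)(uint16_t port, uint16_t queue);

/* Hypothetical callbacks; bodies are placeholders for illustration. */
static uint16_t pdump_cb(uint16_t port, uint16_t queue)   { return port + queue; }
static uint16_t latency_cb(uint16_t port, uint16_t queue) { return port * queue; }

/* The same table is statically defined at build time in every process,
 * so rx_cb_table[i] resolves to that process's own copy of the code. */
static const rx_cb_fn rx_cb_table[] = { pdump_cb, latency_cb };

/* What would live in shared memory: an index, never a raw pointer. */
static uint32_t shared_cb_index = 0;

static uint16_t
invoke_rx_callback(uint16_t port, uint16_t queue)
{
	return rx_cb_table[shared_cb_index](port, queue);
}
```

The crash described in step 4) above comes precisely from skipping this indirection: a raw pointer such as pdump_rx registered by one process points at a different (likely invalid) address in another.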

^ permalink raw reply	[relevance 0%]

* Re: Issues around packet capture when secondary process is doing rx/tx
  @ 2024-04-03  0:14  4%   ` Stephen Hemminger
  2024-04-03 11:42  0%   ` Ferruh Yigit
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-03  0:14 UTC (permalink / raw)
  To: Konstantin Ananyev; +Cc: dev, arshdeep.kaur, Gowda, Sandesh, Reshma Pattan

On Mon, 8 Jan 2024 15:13:25 +0000
Konstantin Ananyev <konstantin.ananyev@huawei.com> wrote:

> > I have been looking at a problem reported by Sandesh
> > where packet capture does not work if rx/tx burst is done in secondary process.
> > 
> > The root cause is that the existing rx/tx callback model just doesn't work
> > unless the process doing the rx/tx burst calls is the same one that
> > registered the callbacks.
> > 
> > An example sequence would be:
> > 	1. dumpcap (or pdump) as secondary tells pdump in primary to register callback
> > 	2. secondary process calls rx_burst.
> > 	3. rx_burst sees the callback but it has pointer pdump_rx which is not necessarily
> > 	   at same location in primary and secondary process.
> > 	4. indirect function call in secondary to bad location likely causes crash.  
> 
> As I remember, RX/TX callbacks were never intended to work over multiple processes.
> Right now RX/TX callbacks are private for the process, different process simply should not
> see/execute them.
> I.e. the callbacks list is part of 'struct rte_eth_dev' itself, not the rte_eth_dev.data that is shared
> between processes.
> It should be normal when, for the same port/queue, you end up with a different list of callbacks
> for different processes.
> So, unless I am missing something, I don't see how we can end up with 3) and 4) from above:
> From my understanding the secondary process will never see/call the primary's callbacks.
> 
> About pdump itself, it has been a while since I last looked at it, but as I remember, for it to work
> the server process has to call rte_pdump_init() which in turn registers the PDUMP_MP handler.
> I suppose for the secondary process to act as a 'pdump server' it needs to call rte_pdump_init() itself,
> though I am not sure such an option is supported right now.
>  
> > 
> > Some possible workarounds.
> > 	1. Keep callback list per-process: messy, but won't crash. Capture won't work
> >            without other changes. In this scheme the primary would register the callback, but secondaries
> >            would not use them in rx/tx burst.
> > 
> > 	2. Replace use of rx/tx callback in pdump with change to rte_ethdev to have
> >            a capture flag. (i.e. don't use indirection).  Likely ABI problems.
> >            Basically, ignore the rx/tx callback mechanism. This is my preferred
> > 	   solution.  
> 
> It is not only the capture flag, it is also what to do with the captured packets
> (copy? If yes, then where to? examine? drop? do something else?).
> It is probably not the best choice to add all these things into ethdev API.
> 
> > 	3. Some fix up mechanism (in EAL mp support?) to have each process fixup
> >            its callback mechanism.  
>  
> Probably the easiest way to fix that is to pass rte_pdump_enable() extra information
> that would allow it to distinguish on what exact process (local or remote)
> we want to enable pdump functionality. Then it could act accordingly.
> 
> > 
> > 	4. Do something in pdump_init to register the callback in same process context
> > 	   (probably need callbacks to be per-process). Would mean the callback is always
> >            on, independent of capture being enabled.
> > 
> >         5. Get rid of indirect function call pointer, and replace it by index into
> >            a static table of callback functions. Every process would have same code
> >            (in this case pdump_rx) but at different address.  Requires all callbacks
> >            to be statically defined at build time.  
> 
> Doesn't look like a good approach - it will break many things. 
>  
> > The existing rx/tx callback is not safe if rx/tx burst is called from a different process
> > than where the callback is registered.
>  
> 

I have been looking into the best way to fix this, and the real answer is not to use
callbacks but instead a per-queue flag. The natural place to put these is in
rte_ethdev_driver. BUT this will mean an ABI breakage, so it will have to wait for the
24.11 release. Sometimes fixing a design flaw means an ABI change.
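A loose sketch of that per-queue flag idea (hypothetical struct and function names, not the actual rte_ethdev_driver layout): rx_burst tests a flag kept in shared per-queue state instead of calling through a per-process function pointer, so it behaves the same in whichever process runs the burst.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative per-queue state that would live in shared memory. */
struct rxq_shared_state {
	bool capture_enabled;	/* flag is valid in any process */
	uint64_t captured;	/* packets handed to the capture path */
};

/* Simplified rx_burst: no indirection, just a flag test. */
static uint16_t
rx_burst(struct rxq_shared_state *q, uint16_t nb_pkts)
{
	if (q->capture_enabled)
		q->captured += nb_pkts;	/* a real driver would copy to a ring */
	return nb_pkts;
}
```

Flipping the flag from the pdump control path enables or disables capture without any process-local pointers, which is why this avoids the crash even when rx_burst runs in a secondary process.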

^ permalink raw reply	[relevance 4%]

* Re: [PATCH v9 2/4] mbuf: remove rte marker fields
  2024-04-02 20:45  0%     ` Stephen Hemminger
@ 2024-04-02 20:51  0%       ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-04-02 20:51 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
	David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
	Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
	Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
	Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
	Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb

On Tue, Apr 02, 2024 at 01:45:49PM -0700, Stephen Hemminger wrote:
> On Tue,  2 Apr 2024 13:08:48 -0700
> Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:
> 
> > RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> > RTE_MARKER fields from rte_mbuf struct.
> > 
> > Maintain alignment of fields after removed cacheline1 marker by placing
> > C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> > 
> > Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> > unions as single element arrays with types matching the original
> > markers to maintain API compatibility.
> > 
> > This change breaks the API for cacheline{0,1} fields that have been
> > removed from rte_mbuf but it does not break the ABI, to address the
> > false positives of the removed (but 0 size fields) provide the minimum
> > libabigail.abignore for type = rte_mbuf.
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> 
> Release note should be for 24.07 not 24.03.

yeah, pressed send and noticed it seconds later. when the new empty
release notes are added i'll move the notes to 24.07.

thanks.

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v9 2/4] mbuf: remove rte marker fields
  2024-04-02 20:08  2%   ` [PATCH v9 2/4] mbuf: remove rte marker fields Tyler Retzlaff
@ 2024-04-02 20:45  0%     ` Stephen Hemminger
  2024-04-02 20:51  0%       ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-04-02 20:45 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
	David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
	Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
	Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
	Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
	Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb

On Tue,  2 Apr 2024 13:08:48 -0700
Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:

> RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> RTE_MARKER fields from rte_mbuf struct.
> 
> Maintain alignment of fields after removed cacheline1 marker by placing
> C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> 
> Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
> unions as single element arrays with types matching the original
> markers to maintain API compatibility.
> 
> This change breaks the API for cacheline{0,1} fields that have been
> removed from rte_mbuf but it does not break the ABI, to address the
> false positives of the removed (but 0 size fields) provide the minimum
> libabigail.abignore for type = rte_mbuf.
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

Release note should be for 24.07 not 24.03.

^ permalink raw reply	[relevance 0%]

* [PATCH v9 2/4] mbuf: remove rte marker fields
  2024-04-02 20:08  3% ` [PATCH v9 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
@ 2024-04-02 20:08  2%   ` Tyler Retzlaff
  2024-04-02 20:45  0%     ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-02 20:08 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields from rte_mbuf struct.

Maintain alignment of fields after removed cacheline1 marker by placing
C11 alignas(RTE_CACHE_LINE_MIN_SIZE).

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays with types matching the original
markers to maintain API compatibility.

This change breaks the API for cacheline{0,1} fields that have been
removed from rte_mbuf but it does not break the ABI, to address the
false positives of the removed (but 0 size fields) provide the minimum
libabigail.abignore for type = rte_mbuf.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 devtools/libabigail.abignore           |   6 +
 doc/guides/rel_notes/release_24_03.rst |   2 +
 lib/mbuf/rte_mbuf.h                    |   4 +-
 lib/mbuf/rte_mbuf_core.h               | 200 +++++++++++++++++----------------
 4 files changed, 114 insertions(+), 98 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 645d289..ad13179 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -37,3 +37,9 @@
 [suppress_type]
 	name = rte_eth_fp_ops
 	has_data_member_inserted_between = {offset_of(reserved2), end}
+
+[suppress_type]
+	name = rte_mbuf
+	type_kind = struct
+	has_size_change = no
+	has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 013c12f..ffc0d62 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -161,6 +161,8 @@ Removed Items
 
 * acc101: Removed obsolete code for non productized HW variant.
 
+* mbuf: ``RTE_MARKER`` fields ``cacheline0`` and ``cacheline1``
+  have been removed from ``struct rte_mbuf``.
 
 API Changes
 -----------
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b..4c4722e 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -108,7 +108,7 @@
 static inline void
 rte_mbuf_prefetch_part1(struct rte_mbuf *m)
 {
-	rte_prefetch0(&m->cacheline0);
+	rte_prefetch0(m);
 }
 
 /**
@@ -126,7 +126,7 @@
 rte_mbuf_prefetch_part2(struct rte_mbuf *m)
 {
 #if RTE_CACHE_LINE_SIZE == 64
-	rte_prefetch0(&m->cacheline1);
+	rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
 #else
 	RTE_SET_USED(m);
 #endif
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 9f58076..9d838b8 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -465,8 +465,6 @@ enum {
  * The generic rte_mbuf, containing a packet mbuf.
  */
 struct __rte_cache_aligned rte_mbuf {
-	RTE_MARKER cacheline0;
-
 	void *buf_addr;           /**< Virtual address of segment buffer. */
 #if RTE_IOVA_IN_MBUF
 	/**
@@ -488,127 +486,138 @@ struct __rte_cache_aligned rte_mbuf {
 #endif
 
 	/* next 8 bytes are initialised on RX descriptor rearm */
-	RTE_MARKER64 rearm_data;
-	uint16_t data_off;
-
-	/**
-	 * Reference counter. Its size should at least equal to the size
-	 * of port field (16 bits), to support zero-copy broadcast.
-	 * It should only be accessed using the following functions:
-	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
-	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
-	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
-	 */
-	RTE_ATOMIC(uint16_t) refcnt;
+	union {
+		uint64_t rearm_data[1];
+		__extension__
+		struct {
+			uint16_t data_off;
+
+			/**
+			 * Reference counter. Its size should at least equal to the size
+			 * of port field (16 bits), to support zero-copy broadcast.
+			 * It should only be accessed using the following functions:
+			 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+			 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
+			 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
+			 */
+			RTE_ATOMIC(uint16_t) refcnt;
 
-	/**
-	 * Number of segments. Only valid for the first segment of an mbuf
-	 * chain.
-	 */
-	uint16_t nb_segs;
+			/**
+			 * Number of segments. Only valid for the first segment of an mbuf
+			 * chain.
+			 */
+			uint16_t nb_segs;
 
-	/** Input port (16 bits to support more than 256 virtual ports).
-	 * The event eth Tx adapter uses this field to specify the output port.
-	 */
-	uint16_t port;
+			/** Input port (16 bits to support more than 256 virtual ports).
+			 * The event eth Tx adapter uses this field to specify the output port.
+			 */
+			uint16_t port;
+		};
+	};
 
 	uint64_t ol_flags;        /**< Offload features. */
 
-	/* remaining bytes are set on RX when pulling packet from descriptor */
-	RTE_MARKER rx_descriptor_fields1;
-
-	/*
-	 * The packet type, which is the combination of outer/inner L2, L3, L4
-	 * and tunnel types. The packet_type is about data really present in the
-	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
-	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
-	 * vlan is stripped from the data.
-	 */
+	/* remaining 24 bytes are set on RX when pulling packet from descriptor */
 	union {
-		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
+		/* void * type of the array elements is retained for driver compatibility. */
+		void *rx_descriptor_fields1[24 / sizeof(void *)];
 		__extension__
 		struct {
-			uint8_t l2_type:4;   /**< (Outer) L2 type. */
-			uint8_t l3_type:4;   /**< (Outer) L3 type. */
-			uint8_t l4_type:4;   /**< (Outer) L4 type. */
-			uint8_t tun_type:4;  /**< Tunnel type. */
+			/*
+			 * The packet type, which is the combination of outer/inner L2, L3, L4
+			 * and tunnel types. The packet_type is about data really present in the
+			 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
+			 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+			 * vlan is stripped from the data.
+			 */
 			union {
-				uint8_t inner_esp_next_proto;
-				/**< ESP next protocol type, valid if
-				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
-				 * on both Tx and Rx.
-				 */
+				uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
 				__extension__
 				struct {
-					uint8_t inner_l2_type:4;
-					/**< Inner L2 type. */
-					uint8_t inner_l3_type:4;
-					/**< Inner L3 type. */
+					uint8_t l2_type:4;   /**< (Outer) L2 type. */
+					uint8_t l3_type:4;   /**< (Outer) L3 type. */
+					uint8_t l4_type:4;   /**< (Outer) L4 type. */
+					uint8_t tun_type:4;  /**< Tunnel type. */
+					union {
+						/**< ESP next protocol type, valid if
+						 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
+						 * on both Tx and Rx.
+						 */
+						uint8_t inner_esp_next_proto;
+						__extension__
+						struct {
+							/**< Inner L2 type. */
+							uint8_t inner_l2_type:4;
+							/**< Inner L3 type. */
+							uint8_t inner_l3_type:4;
+						};
+					};
+					uint8_t inner_l4_type:4; /**< Inner L4 type. */
 				};
 			};
-			uint8_t inner_l4_type:4; /**< Inner L4 type. */
-		};
-	};
 
-	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
-	uint16_t data_len;        /**< Amount of data in segment buffer. */
-	/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
-	uint16_t vlan_tci;
+			uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
 
-	union {
-		union {
-			uint32_t rss;     /**< RSS hash result if RSS enabled */
-			struct {
+			uint16_t data_len;        /**< Amount of data in segment buffer. */
+			/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+			uint16_t vlan_tci;
+
+			union {
 				union {
+					uint32_t rss;     /**< RSS hash result if RSS enabled */
 					struct {
-						uint16_t hash;
-						uint16_t id;
-					};
-					uint32_t lo;
-					/**< Second 4 flexible bytes */
-				};
-				uint32_t hi;
-				/**< First 4 flexible bytes or FD ID, dependent
-				 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
-				 */
-			} fdir;	/**< Filter identifier if FDIR enabled */
-			struct rte_mbuf_sched sched;
-			/**< Hierarchical scheduler : 8 bytes */
-			struct {
-				uint32_t reserved1;
-				uint16_t reserved2;
-				uint16_t txq;
-				/**< The event eth Tx adapter uses this field
-				 * to store Tx queue id.
-				 * @see rte_event_eth_tx_adapter_txq_set()
-				 */
-			} txadapter; /**< Eventdev ethdev Tx adapter */
-			uint32_t usr;
-			/**< User defined tags. See rte_distributor_process() */
-		} hash;                   /**< hash information */
-	};
+						union {
+							struct {
+								uint16_t hash;
+								uint16_t id;
+							};
+							/**< Second 4 flexible bytes */
+							uint32_t lo;
+						};
+						/**< First 4 flexible bytes or FD ID, dependent
+						 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+						 */
+						uint32_t hi;
+					} fdir;	/**< Filter identifier if FDIR enabled */
+					struct rte_mbuf_sched sched;
+					/**< Hierarchical scheduler : 8 bytes */
+					struct {
+						uint32_t reserved1;
+						uint16_t reserved2;
+						/**< The event eth Tx adapter uses this field
+						 * to store Tx queue id.
+						 * @see rte_event_eth_tx_adapter_txq_set()
+						 */
+						uint16_t txq;
+					} txadapter; /**< Eventdev ethdev Tx adapter */
+					/**< User defined tags. See rte_distributor_process() */
+					uint32_t usr;
+				} hash;                   /**< hash information */
+			};
 
-	/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
-	uint16_t vlan_tci_outer;
+			/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+			uint16_t vlan_tci_outer;
 
-	uint16_t buf_len;         /**< Length of segment buffer. */
+			uint16_t buf_len;         /**< Length of segment buffer. */
+		};
+	};
 
 	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
 
 	/* second cache line - fields only used in slow path or on TX */
-	alignas(RTE_CACHE_LINE_MIN_SIZE) RTE_MARKER cacheline1;
-
 #if RTE_IOVA_IN_MBUF
 	/**
 	 * Next segment of scattered packet. Must be NULL in the last
 	 * segment or in case of non-segmented packet.
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	struct rte_mbuf *next;
 #else
 	/**
 	 * Reserved for dynamic fields
 	 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
 	 */
+	alignas(RTE_CACHE_LINE_MIN_SIZE)
 	uint64_t dynfield2;
 #endif
 
@@ -617,17 +626,16 @@ struct __rte_cache_aligned rte_mbuf {
 		uint64_t tx_offload;       /**< combined for easy fetch */
 		__extension__
 		struct {
-			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
 			/**< L2 (MAC) Header Length for non-tunneling pkt.
 			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
 			 */
-			uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
+			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
 			/**< L3 (IP) Header Length. */
-			uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
+			uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
 			/**< L4 (TCP/UDP) Header Length. */
-			uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
+			uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
 			/**< TCP TSO segment size */
-
+			uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
 			/*
 			 * Fields for Tx offloading of tunnels.
 			 * These are undefined for packets which don't request
@@ -640,10 +648,10 @@ struct __rte_cache_aligned rte_mbuf {
 			 * Applications are expected to set appropriate tunnel
 			 * offload flags when they fill in these fields.
 			 */
-			uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
 			/**< Outer L3 (IP) Hdr Length. */
-			uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
+			uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
 			/**< Outer L2 (MAC) Hdr Length. */
+			uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
 
 			/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
 		};
-- 
1.8.3.1


^ permalink raw reply	[relevance 2%]

* [PATCH v9 0/4] remove use of RTE_MARKER fields in libraries
      @ 2024-04-02 20:08  3% ` Tyler Retzlaff
  2024-04-02 20:08  2%   ` [PATCH v9 2/4] mbuf: remove rte marker fields Tyler Retzlaff
  2024-04-03 17:53  3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
  2024-04-04 17:51  3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
  4 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-04-02 20:08 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

As per the techboard meeting of 2024/03/20, adopt the hybrid proposal of
adapting descriptor fields and removing cacheline fields.

RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
RTE_MARKER fields.

For the cacheline{0,1} fields, remove the fields entirely and use inline
functions to prefetch.

Provide new rearm_data and rx_descriptor_fields1 fields in anonymous
unions as single element arrays with types matching the original
markers to maintain API compatibility.
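The single-element-array union trick described above can be illustrated with a simplified stand-in struct (not the real rte_mbuf layout): the zero-size RTE_MARKER is replaced by an anonymous union whose first member is a one-element array of the marker's type overlaying the real fields, so existing code addressing the old marker name still names the same bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified illustration only; field set and sizes are made up. */
struct mini_mbuf {
	union {
		/* replaces the old 'RTE_MARKER64 rearm_data;' */
		uint64_t rearm_data[1];
		__extension__
		struct {
			uint16_t data_off;
			uint16_t refcnt;
			uint16_t nb_segs;
			uint16_t port;
		};
	};
};
```

Code that previously took the address of the marker can use &m->rearm_data[0] (or the array name itself) and gets the same address as data_off, with no size or offset change, which is why the ABI stays intact.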

Note: diff is easier viewed with -b due to additional nesting from
      unions / structs that have been introduced.

v9:
  * provide narrowest possible libabigail.abignore to suppress
    removal of fields that were agreed are not actual abi changes.

v8:
  * rx_descriptor_fields1 array is now constexpr sized to
    24 / sizeof(void *) so that the array encompasses fields
    accessed via the array.
  * add a comment to rx_descriptor_fields1 array site noting
    that void * type of elements is retained for compatibility
    with existing drivers.
  * clean up comments of fields in rte_mbuf to be before the
    field they apply to instead of after.
  * duplicate alignas(RTE_CACHE_LINE_MIN_SIZE) into both legs of
    conditional compile for first field of cacheline 1 instead of
    once before conditional compile block.

v7:
  * complete re-write of series, previous versions not noted. all
    reviewed-by and acked-by tags (if any) were removed.

Tyler Retzlaff (4):
  net/i40e: use inline prefetch function
  mbuf: remove rte marker fields
  security: remove rte marker fields
  cryptodev: remove rte marker fields

 devtools/libabigail.abignore            |   6 +
 doc/guides/rel_notes/release_24_03.rst  |   8 ++
 drivers/net/i40e/i40e_rxtx_vec_avx512.c |   2 +-
 lib/cryptodev/cryptodev_pmd.h           |   5 +-
 lib/mbuf/rte_mbuf.h                     |   4 +-
 lib/mbuf/rte_mbuf_core.h                | 200 +++++++++++++++++---------------
 lib/security/rte_security_driver.h      |   5 +-
 7 files changed, 127 insertions(+), 103 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* [PATCH v5 6/8] net/tap: rewrite the RSS BPF program
  @ 2024-04-02 17:12  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-04-02 17:12 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in the skb rather
	  than requiring a second pass through the classifier step.

Note: This version only works with the later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
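For background, the standard Toeplitz hash with the standard 40-byte key that the BPF program computes can be sketched in plain C (a reference sketch, not the eBPF code): for each set bit of the input, XOR the hash with a 32-bit window of the key that slides one bit per input bit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The widely published default Microsoft RSS key (40 bytes). */
static const uint8_t default_rss_key[40] = {
	0x6d, 0x5a, 0x56, 0xda, 0x25, 0x5b, 0x0e, 0xc2,
	0x41, 0x67, 0x25, 0x3d, 0x43, 0xa3, 0x8f, 0xb0,
	0xd0, 0xca, 0x2b, 0xcb, 0xae, 0x7b, 0x30, 0xb4,
	0x77, 0xcb, 0x2d, 0xa3, 0x80, 0x30, 0xf2, 0x0c,
	0x6a, 0x42, 0xb7, 0x3b, 0xbe, 0xac, 0x01, 0xfa,
};

static uint32_t
toeplitz_hash(const uint8_t key[40], const uint8_t *data, size_t len)
{
	/* 32-bit sliding window over the key, advanced one bit at a time */
	uint32_t window = ((uint32_t)key[0] << 24) | ((uint32_t)key[1] << 16) |
			  ((uint32_t)key[2] << 8) | key[3];
	uint32_t hash = 0;

	for (size_t i = 0; i < len; i++) {
		for (int bit = 7; bit >= 0; bit--) {
			if (data[i] & (1u << bit))
				hash ^= window;
			window <<= 1;
			/* shift in the next key bit, zeros past the key end */
			if (i + 4 < 40 && (key[i + 4] & (1u << bit)))
				window |= 1;
		}
	}
	return hash;
}
```

In the real flow the input is the concatenated source/destination addresses (and ports for L4 hashing) in network byte order; the BPF version produces the same values but is structured for the verifier.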

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  38 ++++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 264 ++++++++++++++++++++++++
 9 files changed, 383 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace..01a47a7606 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc7..0000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 0000000000..1d421ff42c
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,38 @@
+This is the BPF program used to implement the RSS across queues flow action.
+The program is loaded when the first RSS flow rule is created and is never unloaded.
+
+Each flow rule creates a unique key (handle) and this is used as the key
+for finding the RSS information for that flow rule.
+
+This version is built with the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+-----------
+- requires libbpf to run
+- rebuilding the BPF requires Clang and bpftool.
+  Some older versions of Ubuntu do not have working bpftool package.
+  Need a version of Clang that can compile to BPF.
+- only standard Toeplitz hash with standard 40 byte key is supported
+- the number of flow rules using RSS is limited to 32
+
+Building
+--------
+During the DPDK build process the meson build file checks whether
+libbpf, bpftool, and clang are available. If everything is
+present then BPF RSS is enabled.
+
+1. Use clang to compile tap_rss.c into the tap_rss.bpf.o file.
+
+2. Use bpftool to generate a skeleton header file tap_rss.skel.h from tap_rss.bpf.o.
+   This skeleton header is a large byte array which contains the
+   BPF binary and wrappers to load and use it.
+
+3. The tap flow code then compiles that BPF byte array into the PMD object.
+
+4. When needed, the BPF array is loaded by libbpf.
+
+References
+----------
+BPF and XDP reference guide
+https://docs.cilium.io/en/latest/bpf/progtypes/
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 4cd25fa593..0000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	__extension__ ({						\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c..0000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4e..0000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 0000000000..f2c03a19fd
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>,
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c..0000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 0000000000..888b3bdc24
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,264 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows which need BPF RSS.
+ *
+ * The hash is indexed by the skb mark.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u32));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_RSS_MAX);
+} rss_map SEC(".maps");
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute the Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/*
+ * Compute the RSS hash for an IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* If the packet is fragmented, no L4 hash is possible */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP protocols */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/*
+ * Parse IPv6 extension headers, update the offset and return the next proto.
+ * Returns the next proto on success, or -1 on a malformed header.
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+/*
+ * Compute the RSS hash for an IPv6 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;	/* missing IPv6 header */
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	/* If only doing L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	/* Skip extension headers if present */
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	/* If the packet is a fragment, no L4 hash is possible */
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/*
+ * Scale value into the range [0, n).
+ * Assumes val is large (i.e. the hash covers the whole u32 range).
+ */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change skb->queue_mapping and return TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u32 mark = skb->mark;
+	__u32 hash;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &mark);
+	if (rsskey == NULL)
+		return TC_ACT_OK;
+
+	hash = calculate_rss_hash(skb, rsskey);
+	if (!hash)
+		return TC_ACT_OK;
+
+	/* Fold hash to the number of queues configured */
+	skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+	return TC_ACT_PIPE;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* Re: [PATCH] version: 24.07-rc0
  2024-04-02  8:52  0% ` Thomas Monjalon
@ 2024-04-02  9:25  0%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-04-02  9:25 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Aaron Conole, Michael Santana, Thomas Monjalon

On Tue, Apr 2, 2024 at 10:52 AM Thomas Monjalon <thomas@monjalon.net> wrote:
>
> 30/03/2024 18:54, David Marchand:
> > Start a new release cycle with empty release notes.
> > Bump version and ABI minor.
> >
> > Signed-off-by: David Marchand <david.marchand@redhat.com>
> Acked-by: Thomas Monjalon <thomas@monjalon.net>

Applied, thanks.


-- 
David Marchand



* Re: [PATCH] version: 24.07-rc0
  2024-03-30 17:54 18% [PATCH] version: 24.07-rc0 David Marchand
@ 2024-04-02  8:52  0% ` Thomas Monjalon
  2024-04-02  9:25  0%   ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-04-02  8:52 UTC (permalink / raw)
  To: David Marchand; +Cc: dev, Aaron Conole, Michael Santana

30/03/2024 18:54, David Marchand:
> Start a new release cycle with empty release notes.
> Bump version and ABI minor.
> 
> Signed-off-by: David Marchand <david.marchand@redhat.com>

Acked-by: Thomas Monjalon <thomas@monjalon.net>





* Re: The effect of inlining
  2024-03-29 13:42  3%         ` The effect of inlining Morten Brørup
  2024-03-29 20:26  0%           ` Tyler Retzlaff
@ 2024-04-01 15:20  3%           ` Mattias Rönnblom
  2024-04-03 16:01  3%             ` Morten Brørup
  1 sibling, 1 reply; 200+ results
From: Mattias Rönnblom @ 2024-04-01 15:20 UTC (permalink / raw)
  To: Morten Brørup, Maxime Coquelin, Stephen Hemminger, Andrey Ignatov
  Cc: dev, Chenbo Xia, Wei Shen, techboard

On 2024-03-29 14:42, Morten Brørup wrote:
> +CC techboard
> 
>> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
>> Sent: Friday, 29 March 2024 14.05
>>
>> Hi Stephen,
>>
>> On 3/29/24 03:53, Stephen Hemminger wrote:
>>> On Thu, 28 Mar 2024 17:10:42 -0700
>>> Andrey Ignatov <rdna@apple.com> wrote:
>>>
>>>>>
>>>>> You don't need always inline, the compiler will do it anyway.
>>>>
>>>> I can remove it in v2, but it's not completely obvious to me how is
>> it
>>>> decided when to specify it explicitly and when not?
>>>>
>>>> I see plenty of __rte_always_inline in this file:
>>>>
>>>> % git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
>>>> lib/vhost/virtio_net.c:66
>>>
>>>
>>> Cargo cult really.
>>>
>>
>> Cargo cult... really?
>>
>> Well, I just did a quick test by comparing IO forwarding with testpmd
>> between main branch and with adding a patch that removes all the
>> inline/noinline in lib/vhost/virtio_net.c [0].
>>
>> main branch: 14.63Mpps
>> main branch - inline/noinline: 10.24Mpps
> 
> Thank you for testing this, Maxime. Very interesting!
> 
> It is sometimes suggested on techboard meetings that we should convert more inline functions to non-inline for improved API/ABI stability, with the argument that the performance of inlining is negligible.
> 

I think you are mixing two different (but related) things here.
1) marking functions with the inline family of keywords/attributes
2) keeping function definitions in header files

1) does not affect the ABI, while 2) does. Neither 1) nor 2) affects the 
API (i.e., source-level compatibility).

2) *allows* for function inlining even in non-LTO builds, but doesn't 
force it.

If you don't believe 2) makes a difference performance-wise, it follows 
that you also don't believe LTO makes much of a difference. Both have 
the same effect: allowing the compiler to reason over a larger chunk of 
your program.

Allowing the compiler to inline small, often-called functions is crucial 
for performance, in my experience. If the target symbol tends to be in a 
shared object, the difference is even larger. It's also quite common 
that you see no effect of LTO (other than a reduction of code footprint).

As LTO becomes more practical to use, 2) loses much of its appeal.

If PGO ever becomes practical to use, maybe 1) will as well.

> I think this test proves that the sum of many small (negligible) performance differences is not negligible!
> 
>>
>> Andrey, thanks for the patch, I'll have a look at it next week.
>>
>> Maxime
>>
>> [0]: https://pastebin.com/72P2npZ0
> 


* [PATCH] version: 24.07-rc0
@ 2024-03-30 17:54 18% David Marchand
  2024-04-02  8:52  0% ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-03-30 17:54 UTC (permalink / raw)
  To: dev; +Cc: thomas, Aaron Conole, Michael Santana

Start a new release cycle with empty release notes.
Bump version and ABI minor.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 .github/workflows/build.yml            |   2 +-
 ABI_VERSION                            |   2 +-
 VERSION                                |   2 +-
 doc/guides/rel_notes/index.rst         |   1 +
 doc/guides/rel_notes/release_24_03.rst | 110 --------------------
 doc/guides/rel_notes/release_24_07.rst | 138 +++++++++++++++++++++++++
 6 files changed, 142 insertions(+), 113 deletions(-)
 create mode 100644 doc/guides/rel_notes/release_24_07.rst

diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
index 2c308d5e9d..dbf25626d4 100644
--- a/.github/workflows/build.yml
+++ b/.github/workflows/build.yml
@@ -27,7 +27,7 @@ jobs:
       MINGW: ${{ matrix.config.cross == 'mingw' }}
       MINI: ${{ matrix.config.mini != '' }}
       PPC64LE: ${{ matrix.config.cross == 'ppc64le' }}
-      REF_GIT_TAG: v23.11
+      REF_GIT_TAG: v24.03
       RISCV64: ${{ matrix.config.cross == 'riscv64' }}
       RUN_TESTS: ${{ contains(matrix.config.checks, 'tests') }}
       STDATOMIC: ${{ contains(matrix.config.checks, 'stdatomic') }}
diff --git a/ABI_VERSION b/ABI_VERSION
index 0dad123924..9dc0ade502 100644
--- a/ABI_VERSION
+++ b/ABI_VERSION
@@ -1 +1 @@
-24.1
+24.2
diff --git a/VERSION b/VERSION
index 58dfef16ef..2081979127 100644
--- a/VERSION
+++ b/VERSION
@@ -1 +1 @@
-24.03.0
+24.07.0-rc0
diff --git a/doc/guides/rel_notes/index.rst b/doc/guides/rel_notes/index.rst
index 88f2b30b03..77a92b308f 100644
--- a/doc/guides/rel_notes/index.rst
+++ b/doc/guides/rel_notes/index.rst
@@ -8,6 +8,7 @@ Release Notes
     :maxdepth: 1
     :numbered:
 
+    release_24_07
     release_24_03
     release_23_11
     release_23_07
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 8e7ad8f99f..013c12f801 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -6,55 +6,9 @@
 DPDK Release 24.03
 ==================
 
-.. **Read this first.**
-
-   The text in the sections below explains how to update the release notes.
-
-   Use proper spelling, capitalization and punctuation in all sections.
-
-   Variable and config names should be quoted as fixed width text:
-   ``LIKE_THIS``.
-
-   Build the docs and view the output file to ensure the changes are correct::
-
-      ninja -C build doc
-      xdg-open build/doc/guides/html/rel_notes/release_24_03.html
-
-
 New Features
 ------------
 
-.. This section should contain new features added in this release.
-   Sample format:
-
-   * **Add a title in the past tense with a full stop.**
-
-     Add a short 1-2 sentence description in the past tense.
-     The description should be enough to allow someone scanning
-     the release notes to understand the new feature.
-
-     If the feature adds a lot of sub-features you can use a bullet list
-     like this:
-
-     * Added feature foo to do something.
-     * Enhanced feature bar to do something else.
-
-     Refer to the previous release notes for examples.
-
-     Suggested order in release notes items:
-     * Core libs (EAL, mempool, ring, mbuf, buses)
-     * Device abstraction libs and PMDs (ordered alphabetically by vendor name)
-       - ethdev (lib, PMDs)
-       - cryptodev (lib, PMDs)
-       - eventdev (lib, PMDs)
-       - etc
-     * Other libs
-     * Apps, Examples, Tools (if significant)
-
-     This section is a comment. Do not overwrite or remove it.
-     Also, make sure to start the actual text at the margin.
-     =======================================================
-
 * **Added HiSilicon UACCE bus support.**
 
   Added UACCE (Unified/User-space-access-intended Accelerator Framework) bus
@@ -200,15 +154,6 @@ New Features
 Removed Items
 -------------
 
-.. This section should contain removed items in this release. Sample format:
-
-   * Add a short 1-2 sentence description of the removed item
-     in the past tense.
-
-   This section is a comment. Do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =======================================================
-
 * log: Removed the statically defined logtypes that were used internally by DPDK.
   All code should be using the dynamic logtypes (see ``RTE_LOG_REGISTER()``).
   The application reserved statically defined logtypes ``RTE_LOGTYPE_USER1..RTE_LOGTYPE_USER8``
@@ -220,18 +165,6 @@ Removed Items
 API Changes
 -----------
 
-.. This section should contain API changes. Sample format:
-
-   * sample: Add a short 1-2 sentence description of the API change
-     which was announced in the previous releases and made in this release.
-     Start with a scope label like "ethdev:".
-     Use fixed width quotes for ``function_names`` or ``struct_names``.
-     Use the past tense.
-
-   This section is a comment. Do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =======================================================
-
 * eal: Removed ``typeof(type)`` from the expansion of ``RTE_DEFINE_PER_LCORE``
   and ``RTE_DECLARE_PER_LCORE`` macros aligning them with their intended design.
   If use with an expression is desired applications can adapt by supplying
@@ -249,55 +182,12 @@ API Changes
 ABI Changes
 -----------
 
-.. This section should contain ABI changes. Sample format:
-
-   * sample: Add a short 1-2 sentence description of the ABI change
-     which was announced in the previous releases and made in this release.
-     Start with a scope label like "ethdev:".
-     Use fixed width quotes for ``function_names`` or ``struct_names``.
-     Use the past tense.
-
-   This section is a comment. Do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =======================================================
-
 * No ABI change that would break compatibility with 23.11.
 
 
-Known Issues
-------------
-
-.. This section should contain new known issues in this release. Sample format:
-
-   * **Add title in present tense with full stop.**
-
-     Add a short 1-2 sentence description of the known issue
-     in the present tense. Add information on any known workarounds.
-
-   This section is a comment. Do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =======================================================
-
-
 Tested Platforms
 ----------------
 
-.. This section should contain a list of platforms that were tested
-   with this release.
-
-   The format is:
-
-   * <vendor> platform with <vendor> <type of devices> combinations
-
-     * List of CPU
-     * List of OS
-     * List of devices
-     * Other relevant details...
-
-   This section is a comment. Do not overwrite or remove it.
-   Also, make sure to start the actual text at the margin.
-   =======================================================
-
 * AMD platforms
 
   * CPU
diff --git a/doc/guides/rel_notes/release_24_07.rst b/doc/guides/rel_notes/release_24_07.rst
new file mode 100644
index 0000000000..a69f24cf99
--- /dev/null
+++ b/doc/guides/rel_notes/release_24_07.rst
@@ -0,0 +1,138 @@
+.. SPDX-License-Identifier: BSD-3-Clause
+   Copyright 2024 The DPDK contributors
+
+.. include:: <isonum.txt>
+
+DPDK Release 24.07
+==================
+
+.. **Read this first.**
+
+   The text in the sections below explains how to update the release notes.
+
+   Use proper spelling, capitalization and punctuation in all sections.
+
+   Variable and config names should be quoted as fixed width text:
+   ``LIKE_THIS``.
+
+   Build the docs and view the output file to ensure the changes are correct::
+
+      ninja -C build doc
+      xdg-open build/doc/guides/html/rel_notes/release_24_07.html
+
+
+New Features
+------------
+
+.. This section should contain new features added in this release.
+   Sample format:
+
+   * **Add a title in the past tense with a full stop.**
+
+     Add a short 1-2 sentence description in the past tense.
+     The description should be enough to allow someone scanning
+     the release notes to understand the new feature.
+
+     If the feature adds a lot of sub-features you can use a bullet list
+     like this:
+
+     * Added feature foo to do something.
+     * Enhanced feature bar to do something else.
+
+     Refer to the previous release notes for examples.
+
+     Suggested order in release notes items:
+     * Core libs (EAL, mempool, ring, mbuf, buses)
+     * Device abstraction libs and PMDs (ordered alphabetically by vendor name)
+       - ethdev (lib, PMDs)
+       - cryptodev (lib, PMDs)
+       - eventdev (lib, PMDs)
+       - etc
+     * Other libs
+     * Apps, Examples, Tools (if significant)
+
+     This section is a comment. Do not overwrite or remove it.
+     Also, make sure to start the actual text at the margin.
+     =======================================================
+
+
+Removed Items
+-------------
+
+.. This section should contain removed items in this release. Sample format:
+
+   * Add a short 1-2 sentence description of the removed item
+     in the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+API Changes
+-----------
+
+.. This section should contain API changes. Sample format:
+
+   * sample: Add a short 1-2 sentence description of the API change
+     which was announced in the previous releases and made in this release.
+     Start with a scope label like "ethdev:".
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+ABI Changes
+-----------
+
+.. This section should contain ABI changes. Sample format:
+
+   * sample: Add a short 1-2 sentence description of the ABI change
+     which was announced in the previous releases and made in this release.
+     Start with a scope label like "ethdev:".
+     Use fixed width quotes for ``function_names`` or ``struct_names``.
+     Use the past tense.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+* No ABI change that would break compatibility with 23.11.
+
+
+Known Issues
+------------
+
+.. This section should contain new known issues in this release. Sample format:
+
+   * **Add title in present tense with full stop.**
+
+     Add a short 1-2 sentence description of the known issue
+     in the present tense. Add information on any known workarounds.
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
+
+
+Tested Platforms
+----------------
+
+.. This section should contain a list of platforms that were tested
+   with this release.
+
+   The format is:
+
+   * <vendor> platform with <vendor> <type of devices> combinations
+
+     * List of CPU
+     * List of OS
+     * List of devices
+     * Other relevant details...
+
+   This section is a comment. Do not overwrite or remove it.
+   Also, make sure to start the actual text at the margin.
+   =======================================================
-- 
2.44.0


^ permalink raw reply	[relevance 18%]

* Re: The effect of inlining
  2024-03-29 13:42  3%         ` The effect of inlining Morten Brørup
@ 2024-03-29 20:26  0%           ` Tyler Retzlaff
  2024-04-01 15:20  3%           ` Mattias Rönnblom
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-29 20:26 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Maxime Coquelin, Stephen Hemminger, Andrey Ignatov, dev,
	Chenbo Xia, Wei Shen, techboard

On Fri, Mar 29, 2024 at 02:42:49PM +0100, Morten Brørup wrote:
> +CC techboard
> 
> > From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> > Sent: Friday, 29 March 2024 14.05
> > 
> > Hi Stephen,
> > 
> > On 3/29/24 03:53, Stephen Hemminger wrote:
> > > On Thu, 28 Mar 2024 17:10:42 -0700
> > > Andrey Ignatov <rdna@apple.com> wrote:
> > >
> > >>>
> > >>> You don't need always inline, the compiler will do it anyway.
> > >>
> > >> I can remove it in v2, but it's not completely obvious to me how is
> > it
> > >> decided when to specify it explicitly and when not?
> > >>
> > >> I see plenty of __rte_always_inline in this file:
> > >>
> > >> % git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
> > >> lib/vhost/virtio_net.c:66
> > >
> > >
> > > Cargo cult really.
> > >
> > 
> > Cargo cult... really?
> > 
> > Well, I just did a quick test by comparing IO forwarding with testpmd
> > between main branch and with adding a patch that removes all the
> > inline/noinline in lib/vhost/virtio_net.c [0].
> > 
> > main branch: 14.63Mpps
> > main branch - inline/noinline: 10.24Mpps
> 
> Thank you for testing this, Maxime. Very interesting!
> 
> It is sometimes suggested at techboard meetings that we should convert more inline functions to non-inline for improved API/ABI stability, with the argument that the performance of inlining is negligible.

Removing inline functions probably has an even more profound negative
impact when using dynamic linking. For all the value of MSVC's
DLL-scoped security features, they do have per-call overheads that
can't be wished away; I imagine the equivalents in GCC are the same.

> 
> I think this test proves that the sum of many small (negligible) performance differences is not negligible!

Sure looks that way, though I think there is some distinction to be made
between inline and *forced* inline.

Forced inline may be losing us some opportunity for the compiler to
optimize better than is obvious to us.
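That distinction can be shown directly (a sketch using the GCC/Clang attribute; in DPDK, ``__rte_always_inline`` wraps this attribute, per rte_common.h):

```c
/* "inline" is only a hint: the optimizer may still keep the function
 * out of line if it judges that better for the surrounding code. */
static inline int twice_hint(int x)
{
	return 2 * x;
}

/* always_inline forces substitution at every call site, even at -O0,
 * removing that judgment call from the compiler. */
static inline __attribute__((always_inline)) int twice_forced(int x)
{
	return 2 * x;
}
```

Both compile to the same result here; the difference only shows in how much freedom the optimizer retains for larger functions.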

> 
> > 
> > Andrey, thanks for the patch, I'll have a look at it next week.
> > 
> > Maxime
> > 
> > [0]: https://pastebin.com/72P2npZ0
> 

^ permalink raw reply	[relevance 0%]

* The effect of inlining
  @ 2024-03-29 13:42  3%         ` Morten Brørup
  2024-03-29 20:26  0%           ` Tyler Retzlaff
  2024-04-01 15:20  3%           ` Mattias Rönnblom
  0 siblings, 2 replies; 200+ results
From: Morten Brørup @ 2024-03-29 13:42 UTC (permalink / raw)
  To: Maxime Coquelin, Stephen Hemminger, Andrey Ignatov
  Cc: dev, Chenbo Xia, Wei Shen, techboard

+CC techboard

> From: Maxime Coquelin [mailto:maxime.coquelin@redhat.com]
> Sent: Friday, 29 March 2024 14.05
> 
> Hi Stephen,
> 
> On 3/29/24 03:53, Stephen Hemminger wrote:
> > On Thu, 28 Mar 2024 17:10:42 -0700
> > Andrey Ignatov <rdna@apple.com> wrote:
> >
> >>>
> >>> You don't need always inline, the compiler will do it anyway.
> >>
> >> I can remove it in v2, but it's not completely obvious to me how is
> it
> >> decided when to specify it explicitly and when not?
> >>
> >> I see plenty of __rte_always_inline in this file:
> >>
> >> % git grep -c '^static __rte_always_inline' lib/vhost/virtio_net.c
> >> lib/vhost/virtio_net.c:66
> >
> >
> > Cargo cult really.
> >
> 
> Cargo cult... really?
> 
> Well, I just did a quick test by comparing IO forwarding with testpmd
> between main branch and with adding a patch that removes all the
> inline/noinline in lib/vhost/virtio_net.c [0].
> 
> main branch: 14.63Mpps
> main branch - inline/noinline: 10.24Mpps

Thank you for testing this, Maxime. Very interesting!

It is sometimes suggested at techboard meetings that we should convert more inline functions to non-inline for improved API/ABI stability, with the argument that the performance of inlining is negligible.

I think this test proves that the sum of many small (negligible) performance differences is not negligible!

> 
> Andrey, thanks for the patch, I'll have a look at it next week.
> 
> Maxime
> 
> [0]: https://pastebin.com/72P2npZ0


^ permalink raw reply	[relevance 3%]

* DPDK 24.03 released
@ 2024-03-28 21:46  3% Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-03-28 21:46 UTC (permalink / raw)
  To: announce

A new major release is available:
	https://fast.dpdk.org/rel/dpdk-24.03.tar.xz

This is the work we did during the last months:
	987 commits from 154 authors
	1334 files changed, 79260 insertions(+), 22824 deletions(-)


It is not planned to start a maintenance branch for 24.03.
This version is ABI-compatible with 23.11.

Below are some new features:
	- argument parsing library
	- dynamic logging standardized
	- HiSilicon UACCE bus
	- Tx queue query
	- flow matching with random and field comparison
	- flow action NAT64
	- flow template table resizing
	- more cleanups to prepare MSVC build
	- more DTS tests and cleanups

More details in the release notes:
	https://doc.dpdk.org/guides/rel_notes/release_24_03.html


There are 31 new contributors (including authors, reviewers and testers).
Welcome to Akshay Dorwat, Alan Elder, Bhuvan Mital, Brad Larson,
Christian Koue Muf, Chuanyu Xue, Emi Aoki, Fidel Castro, Flore Norceide,
Gavin Li, Holly Nichols, Jack Bond-Preston, Lewis Donzis, Liangxing Wang,
Luca Vizzarro, Masoumeh Farhadi Nia, Mykola Kostenok, Nicholas Pratte,
Nishikant Nayak, Oleksandr Kolomeiets, Parthakumar Roy, Qian Hao,
Shani Peretz, Shaowei Sun, Ting-Kai Ku, Tingting Liao, Tom Jones,
Vamsi Krishna Atluri, Venkat Kumar Ande, Vinh Tran,
and Wathsala Vithanage.

Below is the number of commits per employer (with authors count):
	202     Marvell (26)
	166     NVIDIA (23)
	125     Intel (31)
	 80     networkplumber.org (1)
	 77     Corigine (6)
	 64     Red Hat (5)
	 56     Huawei (7)
	 52     Broadcom (6)
	 33     AMD (9)
	 32     Amazon (1)
	 27     Microsoft (4)
	 14     PANTHEON.tech (1)
	 14     Arm (5)
	  7     Google (2)
	  6     UNH (1)
	        ...

A big thank you to all the courageous people who reviewed others' work.
Based on Reviewed-by and Acked-by tags, the top non-PMD reviewers are:
	 50     Akhil Goyal <gakhil@marvell.com>
	 44     Ferruh Yigit <ferruh.yigit@amd.com>
	 40     Chengwen Feng <fengchengwen@huawei.com>
	 36     Anoob Joseph <anoobj@marvell.com>
	 32     Morten Brørup <mb@smartsharesystems.com>
	 26     Tyler Retzlaff <roretzla@linux.microsoft.com>
	 21     Dariusz Sosnowski <dsosnowski@nvidia.com>
	 18     Ori Kam <orika@nvidia.com>
	 18     Bruce Richardson <bruce.richardson@intel.com>

The next challenge is to reduce open bugs drastically.


The next version will be 24.07 in July.
The new features for 24.07 can be submitted during the next 4 weeks:
	http://core.dpdk.org/roadmap#dates
Please share your roadmap.


Don't forget to register for the webinar about DPDK in the cloud:
	https://zoom.us/webinar/register/WN_IG21wHwlTEGTv3sAXqcoFg


Thanks everyone



^ permalink raw reply	[relevance 3%]

* Minutes of Technical Board meeting 20-March-2024
@ 2024-03-28  2:19  3% Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-03-28  2:19 UTC (permalink / raw)
  To: dev

Members Attending
=================
Aaron Conole
Bruce Richardson
Hemant Agrawal
Jerin Jacob
Kevin Traynor
Konstantin Ananyev
Maxime Coquelin
Morten Brørup
Stephen Hemminger (chair)
Thomas Monjalon

NOTE
====
The technical board meetings are on every second Wednesday at 3 pm UTC.
Meetings are public. DPDK community members are welcome to attend on Zoom:
https://zoom-lfx.platform.linuxfoundation.org/meeting/96459488340?
password=d808f1f6-0a28-4165-929e-5a5bcae7efeb
Agenda: https://annuel.framapad.org/p/r.0c3cc4d1e011214183872a98f6b5c7db
Minutes of previous meetings: http://core.dpdk.org/techboard/minutes

Next meeting will be on Wednesday 3-April-2024 at 3pm UTC, and will be
chaired by Thomas.


Agenda Items
============
1. Lcore variables (Mattias)

This patch series proposes an alternative for per-lcore variables.
Lots of code uses the per-lcore-data pattern.
This solution is simple but uses lots of padding to handle cache lines,
and it is easy to overlook some cache patterns and get false sharing.
A recent example was hardware pre-fetching across cache lines.
The per-lcore array model also doesn't handle non-EAL threads well.
One other issue is that accessing thread local storage variables
in another thread works but is undefined according to standards.

The proposal defines yet another allocator for per-thread data.
It is available early in startup so other libraries can use it.
Per-thread allocated storage can safely be used by unregistered threads.
Limitations: it does not completely solve the HW pre-fetch issue,
and it is not a true heap since there is no free or collection function.

Performance is about the same. All implementations have the same access pattern.
An added benefit is that there is less chance of false sharing bugs.

Issues:
Valgrind will report leaks. It uses aligned_alloc() and that
function is different on Windows.

Questions:
Should it use hugepages?
startup issue before/after memory setup?
Or hint the OS?
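The per-lcore-data pattern being replaced can be sketched as follows (a minimal illustration, not the proposed allocator API):

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 64
#define MAX_LCORES 128

/* Classic pattern: a statically sized array with one cache-line-aligned
 * slot per lcore. The padding wastes memory, and a field accidentally
 * straddling a line (or a HW prefetcher pulling in the neighbouring
 * line) silently reintroduces false sharing. */
struct lcore_counters {
	uint64_t rx_pkts;
	uint64_t tx_pkts;
} __attribute__((aligned(CACHE_LINE_SIZE)));

static struct lcore_counters counters[MAX_LCORES];

static inline void count_rx(unsigned int lcore_id)
{
	counters[lcore_id].rx_pkts++;
}
```

A per-thread allocator replaces the fixed `MAX_LCORES` array and its padding with storage handed out per registered or unregistered thread.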

2. 2024 Events update (Nathan)
Planning for North American DPDK summit in Montreal - week of 9 September.
Still pending governing board approval.

Discussions around an Asia-Pacific event in Bangkok, Thailand.
This location was suggested as a neutral location (fewer visa issues).
Early-stage discussions; still needs more work by the governing board.
There has been a large amount of interest in DPDK webinars in APAC.

Are there budget or visa issues with travel for Indian companies?

3. Code Challenge (Ben)
Asked Jim Zemlin about cross-project challenges; he agreed it is a good idea but hard to pull off.
What next steps are needed to make this event work?
Proposal to have DPDK branded merch as part of this.

4. Changes to use markers (Tyler)
GCC zero-sized arrays are used for alignment, reference, and prefetch;
they are not available in MSVC. Existing code uses markers inconsistently.

Proposed solutions (both maintain ABI):
Option 1: anonymous unions in C11, same API.
Option 2: break API, use existing cacheline prefetch functions.

Existing usage appears to be for prefetch and hardware rearming.
A hybrid approach is to remove markers for the prefetch usage,
and use anonymous unions for the hardware rearm.

Need a way to prevent introduction of new usage of markers (checkpatch).
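The marker problem and the option-1 fix can be sketched like this (an illustrative layout, not the real rte_mbuf):

```c
#include <stdint.h>
#include <stddef.h>

/* Old style: a GCC zero-sized array marks the start of the "rearm"
 * region so it can be addressed as a unit. MSVC rejects the [0] array. */
struct desc_old {
	uint64_t flags;
	uint8_t rearm_data[0];	/* marker, occupies no space */
	uint64_t addr;
	uint64_t len;
};

/* Option 1: a C11 anonymous union keeps the same field offsets (so the
 * ABI is unchanged) while still giving an addressable byte region, with
 * no zero-sized array involved. */
struct desc_new {
	uint64_t flags;
	union {
		struct {
			uint64_t addr;
			uint64_t len;
		};
		uint8_t rearm_data[16];
	};
};
```

The union variant compiles with MSVC as well, which is the point of the proposal.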

^ permalink raw reply	[relevance 3%]

* [PATCH v5] graph: expose node context as pointers
  @ 2024-03-27  9:14  4% ` Robin Jarry
  0 siblings, 0 replies; 200+ results
From: Robin Jarry @ 2024-03-27  9:14 UTC (permalink / raw)
  To: dev, Jerin Jacob, Kiran Kumar K, Nithin Dabilpuram, Zhirun Yan
  Cc: Tyler Retzlaff

In some cases, the node context data is used to store two pointers
because the data is larger than the reserved 16 bytes. Having to define
intermediate structures just to be able to cast is tedious. And without
intermediate structures, casting to opaque pointers is hard without
violating strict aliasing rules.

Add an unnamed union to allow storing opaque pointers in the node
context. Unfortunately, aligning an unnamed union that contains an array
produces inconsistent results between C and C++. To preserve ABI/API
compatibility in both C and C++, move all fast-path area fields into an
unnamed struct which is cache aligned. Use __rte_cache_min_aligned to
preserve existing alignment on architectures where cache lines are 128
bytes.

Add a static assert to ensure that the unnamed union is not larger than
the context array (RTE_NODE_CTX_SZ).
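The layout trick described above reduces to a few standalone lines (illustrative names, not the real rte_graph structure):

```c
#include <stdint.h>
#include <stddef.h>

#define NODE_CTX_SZ 16

struct node {
	/* The context is simultaneously a raw byte array and, via the
	 * unnamed union, two opaque pointers usable without casting or
	 * intermediate structures. */
	union {
		uint8_t ctx[NODE_CTX_SZ];
		struct {
			void *ctx_ptr;
			void *ctx_ptr2;
		};
	};
	uint16_t size;
};

/* Guard against the union silently growing past the reserved bytes. */
_Static_assert(offsetof(struct node, size) - offsetof(struct node, ctx)
	       == NODE_CTX_SZ, "context must be NODE_CTX_SZ bytes exactly");
```

On 64-bit targets the two pointers fill the 16 bytes exactly; on 32-bit targets the byte array keeps the union at 16 bytes, so the assert holds either way.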

Signed-off-by: Robin Jarry <rjarry@redhat.com>
---

Notes:
    v5:
    
    * Helper functions to hide casting proved to be harder than expected.
      Naive casting may even be impossible without breaking strict aliasing
      rules. The only other option would be to use explicit memcpy calls.
    * Unnamed union tentative again. As suggested by Tyler (thank you!),
      using an intermediate unnamed struct to carry the alignment produces
      consistent ABI in C and C++.
    * Also, Tyler (thank you!) suggested that the fast path area alignment
      size may be incorrect for architectures where the cache line is not 64
      bytes. There will be a 64 bytes hole in the structure at the end of
      the unnamed struct before the zero length next nodes array. Use
      __rte_cache_min_aligned to preserve existing alignment.
    
    v4:
    
    * Replaced the unnamed union with helper inline functions.
    
    v3:
    
    * Added __extension__ to the unnamed struct inside the union.
    * Fixed C++ header checks.
    * Replaced alignas() with an explicit static_assert.

 lib/graph/rte_graph_worker_common.h | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h
index 36d864e2c14e..84d4997bbbf6 100644
--- a/lib/graph/rte_graph_worker_common.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -12,7 +12,9 @@
  * process, enqueue and move streams of objects to the next nodes.
  */
 
+#include <assert.h>
 #include <stdalign.h>
+#include <stddef.h>
 
 #include <rte_common.h>
 #include <rte_cycles.h>
@@ -111,14 +113,21 @@ struct __rte_cache_aligned rte_node {
 		} dispatch;
 	};
 	/* Fast path area  */
+	__extension__ struct __rte_cache_min_aligned {
 #define RTE_NODE_CTX_SZ 16
-	alignas(RTE_CACHE_LINE_SIZE) uint8_t ctx[RTE_NODE_CTX_SZ]; /**< Node Context. */
-	uint16_t size;		/**< Total number of objects available. */
-	uint16_t idx;		/**< Number of objects used. */
-	rte_graph_off_t off;	/**< Offset of node in the graph reel. */
-	uint64_t total_cycles;	/**< Cycles spent in this node. */
-	uint64_t total_calls;	/**< Calls done to this node. */
-	uint64_t total_objs;	/**< Objects processed by this node. */
+		union {
+			uint8_t ctx[RTE_NODE_CTX_SZ];
+			__extension__ struct {
+				void *ctx_ptr;
+				void *ctx_ptr2;
+			};
+		}; /**< Node Context. */
+		uint16_t size;		/**< Total number of objects available. */
+		uint16_t idx;		/**< Number of objects used. */
+		rte_graph_off_t off;	/**< Offset of node in the graph reel. */
+		uint64_t total_cycles;	/**< Cycles spent in this node. */
+		uint64_t total_calls;	/**< Calls done to this node. */
+		uint64_t total_objs;	/**< Objects processed by this node. */
 		union {
 			void **objs;	   /**< Array of object pointers. */
 			uint64_t objs_u64;
@@ -127,9 +136,13 @@ struct __rte_cache_aligned rte_node {
 			rte_node_process_t process; /**< Process function. */
 			uint64_t process_u64;
 		};
+	};
 	alignas(RTE_CACHE_LINE_MIN_SIZE) struct rte_node *nodes[]; /**< Next nodes. */
 };
 
+static_assert(offsetof(struct rte_node, size) - offsetof(struct rte_node, ctx) == RTE_NODE_CTX_SZ,
+	"rte_node context must be RTE_NODE_CTX_SZ bytes exactly");
+
 /**
  * @internal
  *
-- 
2.44.0


^ permalink raw reply	[relevance 4%]

* [PATCH v2 1/6] ethdev: support setting lanes
  @ 2024-03-22  7:09  5%   ` Dengdui Huang
  0 siblings, 0 replies; 200+ results
From: Dengdui Huang @ 2024-03-22  7:09 UTC (permalink / raw)
  To: dev
  Cc: ferruh.yigit, aman.deep.singh, yuying.zhang, thomas,
	andrew.rybchenko, damodharam.ammepalli, stephen, jerinjacobk,
	ajit.khaparde, liuyonglong, fengchengwen, haijie1, lihuisong

Some speeds can be achieved with different numbers of lanes. For example,
100Gbps can be achieved using two lanes of 50Gbps or four lanes of 25Gbps.
When different lanes are used, the port cannot be brought up. This patch
adds support for setting and reporting lanes.

In addition, add a device capability RTE_ETH_DEV_CAPA_SETTING_LANES.
When the device does not support it and a speed supports different
numbers of lanes, the application does not know which lane number
is used by the device.
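Per the diff, the extended ``rte_eth_speed_bitflag`` takes a lanes argument between speed and duplex. A toy stand-in (names and encoding are illustrative, not the real ethdev implementation) shows why lanes must be part of the flag:

```c
#include <stdint.h>

#define ETH_LANES_UNKNOWN 0u

/* 100G over 4x25G and 100G over 2x50G must map to different flags,
 * otherwise the two link partners may pick incompatible lane
 * configurations and the port cannot come up. */
static uint32_t speed_bitflag(uint32_t speed_mbps, uint32_t lanes,
			      int duplex)
{
	uint32_t speed_bit = speed_mbps / 1000;	/* illustrative only */

	(void)duplex;
	return (lanes << 24) | speed_bit;
}
```

With ``ETH_LANES_UNKNOWN`` the lane bits stay zero, matching the patch's strategy of preserving the old single-argument behaviour for drivers that do not report lanes.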

Signed-off-by: Dengdui Huang <huangdengdui@huawei.com>
---
 doc/guides/rel_notes/release_24_03.rst |   6 +
 drivers/net/bnxt/bnxt_ethdev.c         |   3 +-
 drivers/net/hns3/hns3_ethdev.c         |   1 +
 lib/ethdev/ethdev_linux_ethtool.c      | 208 ++++++++++++-------------
 lib/ethdev/ethdev_private.h            |   4 +
 lib/ethdev/ethdev_trace.h              |   4 +-
 lib/ethdev/meson.build                 |   2 +
 lib/ethdev/rte_ethdev.c                |  85 +++++++---
 lib/ethdev/rte_ethdev.h                |  75 ++++++---
 lib/ethdev/version.map                 |   6 +
 10 files changed, 250 insertions(+), 144 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 7bd9ceab27..4621689c68 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -76,6 +76,9 @@ New Features
   * Added a fath path function ``rte_eth_tx_queue_count``
     to get the number of used descriptors of a Tx queue.
 
+* **Support setting lanes for ethdev.**
+  * Support setting lanes by extended ``RTE_ETH_LINK_SPEED_*``.
+
 * **Added hash calculation of an encapsulated packet as done by the HW.**
 
   Added function to calculate hash when doing tunnel encapsulation:
@@ -254,6 +257,9 @@ ABI Changes
 
 * No ABI change that would break compatibility with 23.11.
 
+* ethdev: Convert a numerical speed to a bitmap flag with lanes:
+  The function ``rte_eth_speed_bitflag`` add lanes parameters.
+
 
 Known Issues
 ------------
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index ba31ae9286..e881a7f3cc 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -711,7 +711,8 @@ static int bnxt_update_phy_setting(struct bnxt *bp)
 	}
 
 	/* convert to speedbit flag */
-	curr_speed_bit = rte_eth_speed_bitflag((uint32_t)link->link_speed, 1);
+	curr_speed_bit = rte_eth_speed_bitflag((uint32_t)link->link_speed,
+					       RTE_ETH_LANES_UNKNOWN, 1);
 
 	/*
 	 * Device is not obliged link down in certain scenarios, even
diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index b10d1216d2..ecd3b2ef64 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -5969,6 +5969,7 @@ hns3_get_speed_fec_capa(struct rte_eth_fec_capa *speed_fec_capa,
 	for (i = 0; i < RTE_DIM(speed_fec_capa_tbl); i++) {
 		speed_bit =
 			rte_eth_speed_bitflag(speed_fec_capa_tbl[i].speed,
+					      RTE_ETH_LANES_UNKNOWN,
 					      RTE_ETH_LINK_FULL_DUPLEX);
 		if ((speed_capa & speed_bit) == 0)
 			continue;
diff --git a/lib/ethdev/ethdev_linux_ethtool.c b/lib/ethdev/ethdev_linux_ethtool.c
index e792204b01..6412845161 100644
--- a/lib/ethdev/ethdev_linux_ethtool.c
+++ b/lib/ethdev/ethdev_linux_ethtool.c
@@ -7,6 +7,10 @@
 #include "rte_ethdev.h"
 #include "ethdev_linux_ethtool.h"
 
+#define RTE_ETH_LINK_MODES_INDEX_SPEED	0
+#define RTE_ETH_LINK_MODES_INDEX_DUPLEX	1
+#define RTE_ETH_LINK_MODES_INDEX_LANES	2
+
 /* Link modes sorted with index as defined in ethtool.
  * Values are speed in Mbps with LSB indicating duplex.
  *
@@ -15,123 +19,119 @@
  * and allows to compile with new bits included even on an old kernel.
  *
  * The array below is built from bit definitions with this shell command:
- *   sed -rn 's;.*(ETHTOOL_LINK_MODE_)([0-9]+)([0-9a-zA-Z_]*).*= *([0-9]*).*;'\
- *           '[\4] = \2, /\* \1\2\3 *\/;p' /usr/include/linux/ethtool.h |
- *   awk '/_Half_/{$3=$3+1","}1'
+ *   sed -rn 's;.*(ETHTOOL_LINK_MODE_)([0-9]+)([a-zA-Z]+)([0-9_]+)([0-9a-zA-Z_]*)
+ *   .*= *([0-9]*).*;'\ '[\6] = {\2, 1, \4}, /\* \1\2\3\4\5 *\/;p'
+ *    /usr/include/linux/ethtool.h | awk '/_Half_/{$4=0","}1' |
+ *    awk '/, _}/{$5=1"},"}1' | awk '{sub(/_}/,"\}");}1'
  */
-static const uint32_t link_modes[] = {
-	  [0] =      11, /* ETHTOOL_LINK_MODE_10baseT_Half_BIT */
-	  [1] =      10, /* ETHTOOL_LINK_MODE_10baseT_Full_BIT */
-	  [2] =     101, /* ETHTOOL_LINK_MODE_100baseT_Half_BIT */
-	  [3] =     100, /* ETHTOOL_LINK_MODE_100baseT_Full_BIT */
-	  [4] =    1001, /* ETHTOOL_LINK_MODE_1000baseT_Half_BIT */
-	  [5] =    1000, /* ETHTOOL_LINK_MODE_1000baseT_Full_BIT */
-	 [12] =   10000, /* ETHTOOL_LINK_MODE_10000baseT_Full_BIT */
-	 [15] =    2500, /* ETHTOOL_LINK_MODE_2500baseX_Full_BIT */
-	 [17] =    1000, /* ETHTOOL_LINK_MODE_1000baseKX_Full_BIT */
-	 [18] =   10000, /* ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT */
-	 [19] =   10000, /* ETHTOOL_LINK_MODE_10000baseKR_Full_BIT */
-	 [20] =   10000, /* ETHTOOL_LINK_MODE_10000baseR_FEC_BIT */
-	 [21] =   20000, /* ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT */
-	 [22] =   20000, /* ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT */
-	 [23] =   40000, /* ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT */
-	 [24] =   40000, /* ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT */
-	 [25] =   40000, /* ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT */
-	 [26] =   40000, /* ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT */
-	 [27] =   56000, /* ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT */
-	 [28] =   56000, /* ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT */
-	 [29] =   56000, /* ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT */
-	 [30] =   56000, /* ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT */
-	 [31] =   25000, /* ETHTOOL_LINK_MODE_25000baseCR_Full_BIT */
-	 [32] =   25000, /* ETHTOOL_LINK_MODE_25000baseKR_Full_BIT */
-	 [33] =   25000, /* ETHTOOL_LINK_MODE_25000baseSR_Full_BIT */
-	 [34] =   50000, /* ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT */
-	 [35] =   50000, /* ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT */
-	 [36] =  100000, /* ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT */
-	 [37] =  100000, /* ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT */
-	 [38] =  100000, /* ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT */
-	 [39] =  100000, /* ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT */
-	 [40] =   50000, /* ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT */
-	 [41] =    1000, /* ETHTOOL_LINK_MODE_1000baseX_Full_BIT */
-	 [42] =   10000, /* ETHTOOL_LINK_MODE_10000baseCR_Full_BIT */
-	 [43] =   10000, /* ETHTOOL_LINK_MODE_10000baseSR_Full_BIT */
-	 [44] =   10000, /* ETHTOOL_LINK_MODE_10000baseLR_Full_BIT */
-	 [45] =   10000, /* ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT */
-	 [46] =   10000, /* ETHTOOL_LINK_MODE_10000baseER_Full_BIT */
-	 [47] =    2500, /* ETHTOOL_LINK_MODE_2500baseT_Full_BIT */
-	 [48] =    5000, /* ETHTOOL_LINK_MODE_5000baseT_Full_BIT */
-	 [52] =   50000, /* ETHTOOL_LINK_MODE_50000baseKR_Full_BIT */
-	 [53] =   50000, /* ETHTOOL_LINK_MODE_50000baseSR_Full_BIT */
-	 [54] =   50000, /* ETHTOOL_LINK_MODE_50000baseCR_Full_BIT */
-	 [55] =   50000, /* ETHTOOL_LINK_MODE_50000baseLR_ER_FR_Full_BIT */
-	 [56] =   50000, /* ETHTOOL_LINK_MODE_50000baseDR_Full_BIT */
-	 [57] =  100000, /* ETHTOOL_LINK_MODE_100000baseKR2_Full_BIT */
-	 [58] =  100000, /* ETHTOOL_LINK_MODE_100000baseSR2_Full_BIT */
-	 [59] =  100000, /* ETHTOOL_LINK_MODE_100000baseCR2_Full_BIT */
-	 [60] =  100000, /* ETHTOOL_LINK_MODE_100000baseLR2_ER2_FR2_Full_BIT */
-	 [61] =  100000, /* ETHTOOL_LINK_MODE_100000baseDR2_Full_BIT */
-	 [62] =  200000, /* ETHTOOL_LINK_MODE_200000baseKR4_Full_BIT */
-	 [63] =  200000, /* ETHTOOL_LINK_MODE_200000baseSR4_Full_BIT */
-	 [64] =  200000, /* ETHTOOL_LINK_MODE_200000baseLR4_ER4_FR4_Full_BIT */
-	 [65] =  200000, /* ETHTOOL_LINK_MODE_200000baseDR4_Full_BIT */
-	 [66] =  200000, /* ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT */
-	 [67] =     100, /* ETHTOOL_LINK_MODE_100baseT1_Full_BIT */
-	 [68] =    1000, /* ETHTOOL_LINK_MODE_1000baseT1_Full_BIT */
-	 [69] =  400000, /* ETHTOOL_LINK_MODE_400000baseKR8_Full_BIT */
-	 [70] =  400000, /* ETHTOOL_LINK_MODE_400000baseSR8_Full_BIT */
-	 [71] =  400000, /* ETHTOOL_LINK_MODE_400000baseLR8_ER8_FR8_Full_BIT */
-	 [72] =  400000, /* ETHTOOL_LINK_MODE_400000baseDR8_Full_BIT */
-	 [73] =  400000, /* ETHTOOL_LINK_MODE_400000baseCR8_Full_BIT */
-	 [75] =  100000, /* ETHTOOL_LINK_MODE_100000baseKR_Full_BIT */
-	 [76] =  100000, /* ETHTOOL_LINK_MODE_100000baseSR_Full_BIT */
-	 [77] =  100000, /* ETHTOOL_LINK_MODE_100000baseLR_ER_FR_Full_BIT */
-	 [78] =  100000, /* ETHTOOL_LINK_MODE_100000baseCR_Full_BIT */
-	 [79] =  100000, /* ETHTOOL_LINK_MODE_100000baseDR_Full_BIT */
-	 [80] =  200000, /* ETHTOOL_LINK_MODE_200000baseKR2_Full_BIT */
-	 [81] =  200000, /* ETHTOOL_LINK_MODE_200000baseSR2_Full_BIT */
-	 [82] =  200000, /* ETHTOOL_LINK_MODE_200000baseLR2_ER2_FR2_Full_BIT */
-	 [83] =  200000, /* ETHTOOL_LINK_MODE_200000baseDR2_Full_BIT */
-	 [84] =  200000, /* ETHTOOL_LINK_MODE_200000baseCR2_Full_BIT */
-	 [85] =  400000, /* ETHTOOL_LINK_MODE_400000baseKR4_Full_BIT */
-	 [86] =  400000, /* ETHTOOL_LINK_MODE_400000baseSR4_Full_BIT */
-	 [87] =  400000, /* ETHTOOL_LINK_MODE_400000baseLR4_ER4_FR4_Full_BIT */
-	 [88] =  400000, /* ETHTOOL_LINK_MODE_400000baseDR4_Full_BIT */
-	 [89] =  400000, /* ETHTOOL_LINK_MODE_400000baseCR4_Full_BIT */
-	 [90] =     101, /* ETHTOOL_LINK_MODE_100baseFX_Half_BIT */
-	 [91] =     100, /* ETHTOOL_LINK_MODE_100baseFX_Full_BIT */
-	 [92] =      10, /* ETHTOOL_LINK_MODE_10baseT1L_Full_BIT */
-	 [93] =  800000, /* ETHTOOL_LINK_MODE_800000baseCR8_Full_BIT */
-	 [94] =  800000, /* ETHTOOL_LINK_MODE_800000baseKR8_Full_BIT */
-	 [95] =  800000, /* ETHTOOL_LINK_MODE_800000baseDR8_Full_BIT */
-	 [96] =  800000, /* ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT */
-	 [97] =  800000, /* ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT */
-	 [98] =  800000, /* ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT */
-	 [99] =      10, /* ETHTOOL_LINK_MODE_10baseT1S_Full_BIT */
-	[100] =      11, /* ETHTOOL_LINK_MODE_10baseT1S_Half_BIT */
-	[101] =      11, /* ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT */
+static const uint32_t link_modes[][3] = {
+	[0]   = {10, 0, 1},     /* ETHTOOL_LINK_MODE_10baseT_Half_BIT */
+	[1]   = {10, 1, 1},     /* ETHTOOL_LINK_MODE_10baseT_Full_BIT */
+	[2]   = {100, 0, 1},    /* ETHTOOL_LINK_MODE_100baseT_Half_BIT */
+	[3]   = {100, 1, 1},    /* ETHTOOL_LINK_MODE_100baseT_Full_BIT */
+	[4]   = {1000, 0, 1},   /* ETHTOOL_LINK_MODE_1000baseT_Half_BIT */
+	[5]   = {1000, 1, 1},   /* ETHTOOL_LINK_MODE_1000baseT_Full_BIT */
+	[12]  = {10000, 1, 1},  /* ETHTOOL_LINK_MODE_10000baseT_Full_BIT */
+	[15]  = {2500, 1, 1},   /* ETHTOOL_LINK_MODE_2500baseX_Full_BIT */
+	[17]  = {1000, 1, 1},   /* ETHTOOL_LINK_MODE_1000baseKX_Full_BIT */
+	[18]  = {10000, 1, 4},  /* ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT */
+	[19]  = {10000, 1, 1},  /* ETHTOOL_LINK_MODE_10000baseKR_Full_BIT */
+	[20]  = {10000, 1, 1},  /* ETHTOOL_LINK_MODE_10000baseR_FEC_BIT */
+	[21]  = {20000, 1, 2},  /* ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT */
+	[22]  = {20000, 1, 2},  /* ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT */
+	[23]  = {40000, 1, 4},  /* ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT */
+	[24]  = {40000, 1, 4},  /* ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT */
+	[25]  = {40000, 1, 4},  /* ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT */
+	[26]  = {40000, 1, 4},  /* ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT */
+	[27]  = {56000, 1, 4},  /* ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT */
+	[28]  = {56000, 1, 4},  /* ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT */
+	[29]  = {56000, 1, 4},  /* ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT */
+	[30]  = {56000, 1, 4},  /* ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT */
+	[31]  = {25000, 1, 1},  /* ETHTOOL_LINK_MODE_25000baseCR_Full_BIT */
+	[32]  = {25000, 1, 1},  /* ETHTOOL_LINK_MODE_25000baseKR_Full_BIT */
+	[33]  = {25000, 1, 1},  /* ETHTOOL_LINK_MODE_25000baseSR_Full_BIT */
+	[34]  = {50000, 1, 2},  /* ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT */
+	[35]  = {50000, 1, 2},  /* ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT */
+	[36]  = {100000, 1, 4}, /* ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT */
+	[37]  = {100000, 1, 4}, /* ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT */
+	[38]  = {100000, 1, 4}, /* ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT */
+	[39]  = {100000, 1, 4}, /* ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT */
+	[40]  = {50000, 1, 2},  /* ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT */
+	[41]  = {1000, 1, 1},   /* ETHTOOL_LINK_MODE_1000baseX_Full_BIT */
+	[42]  = {10000, 1, 1},  /* ETHTOOL_LINK_MODE_10000baseCR_Full_BIT */
+	[43]  = {10000, 1, 1},  /* ETHTOOL_LINK_MODE_10000baseSR_Full_BIT */
+	[44]  = {10000, 1, 1},  /* ETHTOOL_LINK_MODE_10000baseLR_Full_BIT */
+	[45]  = {10000, 1, 1},  /* ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT */
+	[46]  = {10000, 1, 1},  /* ETHTOOL_LINK_MODE_10000baseER_Full_BIT */
+	[47]  = {2500, 1, 1},   /* ETHTOOL_LINK_MODE_2500baseT_Full_BIT */
+	[48]  = {5000, 1, 1},   /* ETHTOOL_LINK_MODE_5000baseT_Full_BIT */
+	[52]  = {50000, 1, 1},  /* ETHTOOL_LINK_MODE_50000baseKR_Full_BIT */
+	[53]  = {50000, 1, 1},  /* ETHTOOL_LINK_MODE_50000baseSR_Full_BIT */
+	[54]  = {50000, 1, 1},  /* ETHTOOL_LINK_MODE_50000baseCR_Full_BIT */
+	[55]  = {50000, 1, 1},  /* ETHTOOL_LINK_MODE_50000baseLR_ER_FR_Full_BIT */
+	[56]  = {50000, 1, 1},  /* ETHTOOL_LINK_MODE_50000baseDR_Full_BIT */
+	[57]  = {100000, 1, 2}, /* ETHTOOL_LINK_MODE_100000baseKR2_Full_BIT */
+	[58]  = {100000, 1, 2}, /* ETHTOOL_LINK_MODE_100000baseSR2_Full_BIT */
+	[59]  = {100000, 1, 2}, /* ETHTOOL_LINK_MODE_100000baseCR2_Full_BIT */
+	[60]  = {100000, 1, 2}, /* ETHTOOL_LINK_MODE_100000baseLR2_ER2_FR2_Full_BIT */
+	[61]  = {100000, 1, 2}, /* ETHTOOL_LINK_MODE_100000baseDR2_Full_BIT */
+	[62]  = {200000, 1, 4}, /* ETHTOOL_LINK_MODE_200000baseKR4_Full_BIT */
+	[63]  = {200000, 1, 4}, /* ETHTOOL_LINK_MODE_200000baseSR4_Full_BIT */
+	[64]  = {200000, 1, 4}, /* ETHTOOL_LINK_MODE_200000baseLR4_ER4_FR4_Full_BIT */
+	[65]  = {200000, 1, 4}, /* ETHTOOL_LINK_MODE_200000baseDR4_Full_BIT */
+	[66]  = {200000, 1, 4}, /* ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT */
+	[67]  = {100, 1, 1},    /* ETHTOOL_LINK_MODE_100baseT1_Full_BIT */
+	[68]  = {1000, 1, 1},   /* ETHTOOL_LINK_MODE_1000baseT1_Full_BIT */
+	[69]  = {400000, 1, 8}, /* ETHTOOL_LINK_MODE_400000baseKR8_Full_BIT */
+	[70]  = {400000, 1, 8}, /* ETHTOOL_LINK_MODE_400000baseSR8_Full_BIT */
+	[71]  = {400000, 1, 8}, /* ETHTOOL_LINK_MODE_400000baseLR8_ER8_FR8_Full_BIT */
+	[72]  = {400000, 1, 8}, /* ETHTOOL_LINK_MODE_400000baseDR8_Full_BIT */
+	[73]  = {400000, 1, 8}, /* ETHTOOL_LINK_MODE_400000baseCR8_Full_BIT */
+	[75]  = {100000, 1, 1}, /* ETHTOOL_LINK_MODE_100000baseKR_Full_BIT */
+	[76]  = {100000, 1, 1}, /* ETHTOOL_LINK_MODE_100000baseSR_Full_BIT */
+	[77]  = {100000, 1, 1}, /* ETHTOOL_LINK_MODE_100000baseLR_ER_FR_Full_BIT */
+	[78]  = {100000, 1, 1}, /* ETHTOOL_LINK_MODE_100000baseCR_Full_BIT */
+	[79]  = {100000, 1, 1}, /* ETHTOOL_LINK_MODE_100000baseDR_Full_BIT */
+	[80]  = {200000, 1, 2}, /* ETHTOOL_LINK_MODE_200000baseKR2_Full_BIT */
+	[81]  = {200000, 1, 2}, /* ETHTOOL_LINK_MODE_200000baseSR2_Full_BIT */
+	[82]  = {200000, 1, 2}, /* ETHTOOL_LINK_MODE_200000baseLR2_ER2_FR2_Full_BIT */
+	[83]  = {200000, 1, 2}, /* ETHTOOL_LINK_MODE_200000baseDR2_Full_BIT */
+	[84]  = {200000, 1, 2}, /* ETHTOOL_LINK_MODE_200000baseCR2_Full_BIT */
+	[85]  = {400000, 1, 4}, /* ETHTOOL_LINK_MODE_400000baseKR4_Full_BIT */
+	[86]  = {400000, 1, 4}, /* ETHTOOL_LINK_MODE_400000baseSR4_Full_BIT */
+	[87]  = {400000, 1, 4}, /* ETHTOOL_LINK_MODE_400000baseLR4_ER4_FR4_Full_BIT */
+	[88]  = {400000, 1, 4}, /* ETHTOOL_LINK_MODE_400000baseDR4_Full_BIT */
+	[89]  = {400000, 1, 4}, /* ETHTOOL_LINK_MODE_400000baseCR4_Full_BIT */
+	[90]  = {100, 0, 1},    /* ETHTOOL_LINK_MODE_100baseFX_Half_BIT */
+	[91]  = {100, 1, 1},    /* ETHTOOL_LINK_MODE_100baseFX_Full_BIT */
+	[92]  = {10, 1, 1},     /* ETHTOOL_LINK_MODE_10baseT1L_Full_BIT */
+	[93]  = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseCR8_Full_BIT */
+	[94]  = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseKR8_Full_BIT */
+	[95]  = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseDR8_Full_BIT */
+	[96]  = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT */
+	[97]  = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT */
+	[98]  = {800000, 1, 8}, /* ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT */
+	[99]  = {10, 1, 1},     /* ETHTOOL_LINK_MODE_10baseT1S_Full_BIT */
+	[100] = {10, 0, 1},     /* ETHTOOL_LINK_MODE_10baseT1S_Half_BIT */
+	[101] = {10, 0, 1},     /* ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT */
 };
 
 uint32_t
 rte_eth_link_speed_ethtool(enum ethtool_link_mode_bit_indices bit)
 {
-	uint32_t speed;
-	int duplex;
+	uint32_t speed, duplex, lanes;
 
 	/* get mode from array */
 	if (bit >= RTE_DIM(link_modes))
 		return RTE_ETH_LINK_SPEED_AUTONEG;
-	speed = link_modes[bit];
-	if (speed == 0)
+	if (link_modes[bit][RTE_ETH_LINK_MODES_INDEX_SPEED] == 0)
 		return RTE_ETH_LINK_SPEED_AUTONEG;
 	RTE_BUILD_BUG_ON(RTE_ETH_LINK_SPEED_AUTONEG != 0);
 
-	/* duplex is LSB */
-	duplex = (speed & 1) ?
-			RTE_ETH_LINK_HALF_DUPLEX :
-			RTE_ETH_LINK_FULL_DUPLEX;
-	speed &= RTE_GENMASK32(31, 1);
-
-	return rte_eth_speed_bitflag(speed, duplex);
+	speed = link_modes[bit][RTE_ETH_LINK_MODES_INDEX_SPEED];
+	duplex = link_modes[bit][RTE_ETH_LINK_MODES_INDEX_DUPLEX];
+	lanes = link_modes[bit][RTE_ETH_LINK_MODES_INDEX_LANES];
+	return rte_eth_speed_bitflag(speed, duplex, lanes);
 }
 
 uint32_t
diff --git a/lib/ethdev/ethdev_private.h b/lib/ethdev/ethdev_private.h
index 0d36b9c30f..9092ab3a9e 100644
--- a/lib/ethdev/ethdev_private.h
+++ b/lib/ethdev/ethdev_private.h
@@ -79,4 +79,8 @@ void eth_dev_txq_release(struct rte_eth_dev *dev, uint16_t qid);
 int eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues);
 int eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues);
 
+/* versioned functions */
+uint32_t rte_eth_speed_bitflag_v24(uint32_t speed, int duplex);
+uint32_t rte_eth_speed_bitflag_v25(uint32_t speed, uint8_t lanes, int duplex);
+
 #endif /* _ETH_PRIVATE_H_ */
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 3bec87bfdb..5547b49cab 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -183,8 +183,10 @@ RTE_TRACE_POINT(
 
 RTE_TRACE_POINT(
 	rte_eth_trace_speed_bitflag,
-	RTE_TRACE_POINT_ARGS(uint32_t speed, int duplex, uint32_t ret),
+	RTE_TRACE_POINT_ARGS(uint32_t speed, uint8_t lanes, int duplex,
+			     uint32_t ret),
 	rte_trace_point_emit_u32(speed);
+	rte_trace_point_emit_u8(lanes);
 	rte_trace_point_emit_int(duplex);
 	rte_trace_point_emit_u32(ret);
 )
diff --git a/lib/ethdev/meson.build b/lib/ethdev/meson.build
index f1d2586591..2c9588d0b3 100644
--- a/lib/ethdev/meson.build
+++ b/lib/ethdev/meson.build
@@ -62,3 +62,5 @@ endif
 if get_option('buildtype').contains('debug')
     cflags += ['-DRTE_FLOW_DEBUG']
 endif
+
+use_function_versioning = true
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e..6571116fbf 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -26,6 +26,7 @@
 #include <rte_class.h>
 #include <rte_ether.h>
 #include <rte_telemetry.h>
+#include <rte_function_versioning.h>
 
 #include "rte_ethdev.h"
 #include "rte_ethdev_trace_fp.h"
@@ -991,63 +992,101 @@ rte_eth_dev_tx_queue_stop(uint16_t port_id, uint16_t tx_queue_id)
 	return ret;
 }
 
-uint32_t
-rte_eth_speed_bitflag(uint32_t speed, int duplex)
+uint32_t __vsym
+rte_eth_speed_bitflag_v25(uint32_t speed, uint8_t lanes, int duplex)
 {
-	uint32_t ret;
+	uint32_t ret = 0;
 
 	switch (speed) {
 	case RTE_ETH_SPEED_NUM_10M:
-		ret = duplex ? RTE_ETH_LINK_SPEED_10M : RTE_ETH_LINK_SPEED_10M_HD;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+			ret = duplex ? RTE_ETH_LINK_SPEED_10M : RTE_ETH_LINK_SPEED_10M_HD;
 		break;
 	case RTE_ETH_SPEED_NUM_100M:
-		ret = duplex ? RTE_ETH_LINK_SPEED_100M : RTE_ETH_LINK_SPEED_100M_HD;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+			ret = duplex ? RTE_ETH_LINK_SPEED_100M : RTE_ETH_LINK_SPEED_100M_HD;
 		break;
 	case RTE_ETH_SPEED_NUM_1G:
-		ret = RTE_ETH_LINK_SPEED_1G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+			ret = RTE_ETH_LINK_SPEED_1G;
 		break;
 	case RTE_ETH_SPEED_NUM_2_5G:
-		ret = RTE_ETH_LINK_SPEED_2_5G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+			ret = RTE_ETH_LINK_SPEED_2_5G;
 		break;
 	case RTE_ETH_SPEED_NUM_5G:
-		ret = RTE_ETH_LINK_SPEED_5G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+			ret = RTE_ETH_LINK_SPEED_5G;
 		break;
 	case RTE_ETH_SPEED_NUM_10G:
-		ret = RTE_ETH_LINK_SPEED_10G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+			ret = RTE_ETH_LINK_SPEED_10G;
+		else if (lanes == RTE_ETH_LANES_4)
+			ret = RTE_ETH_LINK_SPEED_10G_4LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_20G:
-		ret = RTE_ETH_LINK_SPEED_20G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_2)
+			ret = RTE_ETH_LINK_SPEED_20G_2LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_25G:
-		ret = RTE_ETH_LINK_SPEED_25G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+			ret = RTE_ETH_LINK_SPEED_25G;
 		break;
 	case RTE_ETH_SPEED_NUM_40G:
-		ret = RTE_ETH_LINK_SPEED_40G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_4)
+			ret = RTE_ETH_LINK_SPEED_40G_4LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_50G:
-		ret = RTE_ETH_LINK_SPEED_50G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+			ret = RTE_ETH_LINK_SPEED_50G;
+		else if (lanes == RTE_ETH_LANES_2)
+			ret = RTE_ETH_LINK_SPEED_50G_2LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_56G:
-		ret = RTE_ETH_LINK_SPEED_56G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_4)
+			ret = RTE_ETH_LINK_SPEED_56G_4LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_100G:
-		ret = RTE_ETH_LINK_SPEED_100G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_1)
+			ret = RTE_ETH_LINK_SPEED_100G;
+		else if (lanes == RTE_ETH_LANES_2)
+			ret = RTE_ETH_LINK_SPEED_100G_2LANES;
+		else if (lanes == RTE_ETH_LANES_4)
+			ret = RTE_ETH_LINK_SPEED_100G_4LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_200G:
-		ret = RTE_ETH_LINK_SPEED_200G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_4)
+			ret = RTE_ETH_LINK_SPEED_200G_4LANES;
+		else if (lanes == RTE_ETH_LANES_2)
+			ret = RTE_ETH_LINK_SPEED_200G_2LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_400G:
-		ret = RTE_ETH_LINK_SPEED_400G;
+		if (lanes == RTE_ETH_LANES_UNKNOWN || lanes == RTE_ETH_LANES_4)
+			ret = RTE_ETH_LINK_SPEED_400G_4LANES;
+		else if (lanes == RTE_ETH_LANES_8)
+			ret = RTE_ETH_LINK_SPEED_400G_8LANES;
 		break;
 	default:
 		ret = 0;
 	}
 
-	rte_eth_trace_speed_bitflag(speed, duplex, ret);
+	rte_eth_trace_speed_bitflag(speed, lanes, duplex, ret);
 
 	return ret;
 }
 
+uint32_t __vsym
+rte_eth_speed_bitflag_v24(uint32_t speed, int duplex)
+{
+	return rte_eth_speed_bitflag_v25(speed, RTE_ETH_LANES_UNKNOWN, duplex);
+}
+
+/* mark the v24 function as the older version, and v25 as the default version */
+VERSION_SYMBOL(rte_eth_speed_bitflag, _v24, 24);
+BIND_DEFAULT_SYMBOL(rte_eth_speed_bitflag, _v25, 25);
+MAP_STATIC_SYMBOL(uint32_t rte_eth_speed_bitflag(uint32_t speed, uint8_t lanes, int duplex),
+		  rte_eth_speed_bitflag_v25);
+
 const char *
 rte_eth_dev_rx_offload_name(uint64_t offload)
 {
@@ -3110,13 +3149,21 @@ rte_eth_link_to_str(char *str, size_t len, const struct rte_eth_link *eth_link)
 
 	if (eth_link->link_status == RTE_ETH_LINK_DOWN)
 		ret = snprintf(str, len, "Link down");
-	else
+	else if (eth_link->link_lanes == RTE_ETH_LANES_UNKNOWN)
 		ret = snprintf(str, len, "Link up at %s %s %s",
 			rte_eth_link_speed_to_str(eth_link->link_speed),
 			(eth_link->link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ?
 			"FDX" : "HDX",
 			(eth_link->link_autoneg == RTE_ETH_LINK_AUTONEG) ?
 			"Autoneg" : "Fixed");
+	else
+		ret = snprintf(str, len, "Link up at %s %u lanes %s %s",
+			rte_eth_link_speed_to_str(eth_link->link_speed),
+			eth_link->link_lanes,
+			(eth_link->link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ?
+			"FDX" : "HDX",
+			(eth_link->link_autoneg == RTE_ETH_LINK_AUTONEG) ?
+			"Autoneg" : "Fixed");
 
 	rte_eth_trace_link_to_str(len, eth_link, str, ret);
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 147257d6a2..123b771046 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -288,24 +288,40 @@ struct rte_eth_stats {
 /**@{@name Link speed capabilities
  * Device supported speeds bitmap flags
  */
-#define RTE_ETH_LINK_SPEED_AUTONEG 0             /**< Autonegotiate (all speeds) */
-#define RTE_ETH_LINK_SPEED_FIXED   RTE_BIT32(0)  /**< Disable autoneg (fixed speed) */
-#define RTE_ETH_LINK_SPEED_10M_HD  RTE_BIT32(1)  /**<  10 Mbps half-duplex */
-#define RTE_ETH_LINK_SPEED_10M     RTE_BIT32(2)  /**<  10 Mbps full-duplex */
-#define RTE_ETH_LINK_SPEED_100M_HD RTE_BIT32(3)  /**< 100 Mbps half-duplex */
-#define RTE_ETH_LINK_SPEED_100M    RTE_BIT32(4)  /**< 100 Mbps full-duplex */
-#define RTE_ETH_LINK_SPEED_1G      RTE_BIT32(5)  /**<   1 Gbps */
-#define RTE_ETH_LINK_SPEED_2_5G    RTE_BIT32(6)  /**< 2.5 Gbps */
-#define RTE_ETH_LINK_SPEED_5G      RTE_BIT32(7)  /**<   5 Gbps */
-#define RTE_ETH_LINK_SPEED_10G     RTE_BIT32(8)  /**<  10 Gbps */
-#define RTE_ETH_LINK_SPEED_20G     RTE_BIT32(9)  /**<  20 Gbps */
-#define RTE_ETH_LINK_SPEED_25G     RTE_BIT32(10) /**<  25 Gbps */
-#define RTE_ETH_LINK_SPEED_40G     RTE_BIT32(11) /**<  40 Gbps */
-#define RTE_ETH_LINK_SPEED_50G     RTE_BIT32(12) /**<  50 Gbps */
-#define RTE_ETH_LINK_SPEED_56G     RTE_BIT32(13) /**<  56 Gbps */
-#define RTE_ETH_LINK_SPEED_100G    RTE_BIT32(14) /**< 100 Gbps */
-#define RTE_ETH_LINK_SPEED_200G    RTE_BIT32(15) /**< 200 Gbps */
-#define RTE_ETH_LINK_SPEED_400G    RTE_BIT32(16) /**< 400 Gbps */
+#define RTE_ETH_LINK_SPEED_AUTONEG        0             /**< Autonegotiate (all speeds) */
+#define RTE_ETH_LINK_SPEED_FIXED          RTE_BIT32(0)  /**< Disable autoneg (fixed speed) */
+#define RTE_ETH_LINK_SPEED_10M_HD         RTE_BIT32(1)  /**<  10 Mbps half-duplex */
+#define RTE_ETH_LINK_SPEED_10M            RTE_BIT32(2)  /**<  10 Mbps full-duplex */
+#define RTE_ETH_LINK_SPEED_100M_HD        RTE_BIT32(3)  /**< 100 Mbps half-duplex */
+#define RTE_ETH_LINK_SPEED_100M           RTE_BIT32(4)  /**< 100 Mbps full-duplex */
+#define RTE_ETH_LINK_SPEED_1G             RTE_BIT32(5)  /**<   1 Gbps */
+#define RTE_ETH_LINK_SPEED_2_5G           RTE_BIT32(6)  /**< 2.5 Gbps */
+#define RTE_ETH_LINK_SPEED_5G             RTE_BIT32(7)  /**<   5 Gbps */
+#define RTE_ETH_LINK_SPEED_10G            RTE_BIT32(8)  /**<  10 Gbps */
+#define RTE_ETH_LINK_SPEED_20G            RTE_BIT32(9)  /**<  20 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_25G            RTE_BIT32(10) /**<  25 Gbps */
+#define RTE_ETH_LINK_SPEED_40G            RTE_BIT32(11) /**<  40 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_50G            RTE_BIT32(12) /**<  50 Gbps */
+#define RTE_ETH_LINK_SPEED_56G            RTE_BIT32(13) /**<  56 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_100G           RTE_BIT32(14) /**< 100 Gbps */
+#define RTE_ETH_LINK_SPEED_200G           RTE_BIT32(15) /**< 200 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_400G           RTE_BIT32(16) /**< 400 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_10G_4LANES     RTE_BIT32(17) /**<  10 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_50G_2LANES     RTE_BIT32(18) /**<  50 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_100G_2LANES    RTE_BIT32(19) /**< 100 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_100G_4LANES    RTE_BIT32(20) /**< 100 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_200G_2LANES    RTE_BIT32(21) /**< 200 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_400G_8LANES    RTE_BIT32(22) /**< 400 Gbps 8 lanes */
+/**@}*/
+
+/**@{@name Link speed capabilities
+ * Default lane counts, kept for compatibility with earlier versions
+ */
+#define RTE_ETH_LINK_SPEED_20G_2LANES	RTE_ETH_LINK_SPEED_20G
+#define RTE_ETH_LINK_SPEED_40G_4LANES	RTE_ETH_LINK_SPEED_40G
+#define RTE_ETH_LINK_SPEED_56G_4LANES	RTE_ETH_LINK_SPEED_56G
+#define RTE_ETH_LINK_SPEED_200G_4LANES	RTE_ETH_LINK_SPEED_200G
+#define RTE_ETH_LINK_SPEED_400G_4LANES	RTE_ETH_LINK_SPEED_400G
 /**@}*/
 
 /**@{@name Link speed
@@ -329,6 +345,16 @@ struct rte_eth_stats {
 #define RTE_ETH_SPEED_NUM_UNKNOWN UINT32_MAX /**< Unknown */
 /**@}*/
 
+/**@{@name Link lane number
+ * Ethernet lane number
+ */
+#define RTE_ETH_LANES_UNKNOWN    0 /**< Unknown */
+#define RTE_ETH_LANES_1          1 /**< 1 lane */
+#define RTE_ETH_LANES_2          2 /**< 2 lanes */
+#define RTE_ETH_LANES_4          4 /**< 4 lanes */
+#define RTE_ETH_LANES_8          8 /**< 8 lanes */
+/**@}*/
+
 /**
  * A structure used to retrieve link-level information of an Ethernet port.
  */
@@ -338,6 +364,7 @@ struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
 	uint16_t link_duplex  : 1;  /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
 	uint16_t link_autoneg : 1;  /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
 	uint16_t link_status  : 1;  /**< RTE_ETH_LINK_[DOWN/UP] */
+	uint16_t link_lanes   : 4;  /**< RTE_ETH_LANES_* */
 };
 
 /**@{@name Link negotiation
@@ -1641,6 +1668,12 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
 /** Device supports keeping shared flow objects across restart. */
 #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
+/**
+ * Device supports setting the number of lanes. When the device does not
+ * support it and a speed can be achieved with different lane counts, the
+ * application cannot know which lane count is used by the device.
+ */
+#define RTE_ETH_DEV_CAPA_SETTING_LANES RTE_BIT64(5)
 /**@}*/
 
 /*
@@ -2301,12 +2334,16 @@ uint16_t rte_eth_dev_count_total(void);
  *
  * @param speed
  *   Numerical speed value in Mbps
+ * @param lanes
+ *   Number of lanes (RTE_ETH_LANES_x).
+ *   RTE_ETH_LANES_UNKNOWN is always used when the device does not
+ *   support setting the number of lanes.
  * @param duplex
  *   RTE_ETH_LINK_[HALF/FULL]_DUPLEX (only for 10/100M speeds)
  * @return
  *   0 if the speed cannot be mapped
  */
-uint32_t rte_eth_speed_bitflag(uint32_t speed, int duplex);
+uint32_t rte_eth_speed_bitflag(uint32_t speed, uint8_t lanes, int duplex);
 
 /**
  * Get RTE_ETH_RX_OFFLOAD_* flag name.
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 79f6f5293b..9fa2439976 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -169,6 +169,12 @@ DPDK_24 {
 	local: *;
 };
 
+DPDK_25 {
+	global:
+
+	rte_eth_speed_bitflag;
+} DPDK_24;
+
 EXPERIMENTAL {
 	global:
 
-- 
2.33.0


^ permalink raw reply	[relevance 5%]

* Re: [PATCH 15/46] net/sfc: use rte stdatomic API
  2024-03-21 18:11  3%   ` Aaron Conole
@ 2024-03-21 18:15  0%     ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-21 18:15 UTC (permalink / raw)
  To: Aaron Conole
  Cc: dev, Mattias Rönnblom, Morten Brørup,
	Abdullah Sevincer, Ajit Khaparde, Alok Prasad, Anatoly Burakov,
	Andrew Rybchenko, Anoob Joseph, Bruce Richardson, Byron Marohn,
	Chenbo Xia, Chengwen Feng, Ciara Loftus, Ciara Power,
	Dariusz Sosnowski, David Hunt, Devendra Singh Rawat,
	Erik Gabriel Carrillo, Guoyang Zhou, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jakub Grajciar,
	Jerin Jacob, Jeroen de Borst, Jian Wang, Jiawen Wu, Jie Hai,
	Jingjing Wu, Joshua Washington, Joyce Kong, Junfeng Guo,
	Kevin Laatz, Konstantin Ananyev, Liang Ma, Long Li,
	Maciej Czekaj, Matan Azrad, Maxime Coquelin, Nicolas Chautru,
	Ori Kam, Pavan Nikhilesh, Peter Mccarthy, Rahul Lakkireddy,
	Reshma Pattan, Rosen Xu, Ruifeng Wang, Rushil Gupta,
	Sameh Gobriel, Sivaprasad Tummala, Somnath Kotur,
	Stephen Hemminger, Suanming Mou, Sunil Kumar Kori,
	Sunil Uttarwar, Tetsuya Mukawa, Vamsi Attunuru,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Xiaoyun Wang,
	Yipeng Wang, Yisen Zhuang, Yuying Zhang, Ziyang Xuan

On Thu, Mar 21, 2024 at 02:11:00PM -0400, Aaron Conole wrote:
> Tyler Retzlaff <roretzla@linux.microsoft.com> writes:
> 
> > Replace the use of gcc builtin __atomic_xxx intrinsics with
> > corresponding rte_atomic_xxx optional rte stdatomic API.
> >
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> >  drivers/net/sfc/meson.build       |  5 ++---
> >  drivers/net/sfc/sfc_mae_counter.c | 30 +++++++++++++++---------------
> >  drivers/net/sfc/sfc_repr_proxy.c  |  8 ++++----
> >  drivers/net/sfc/sfc_stats.h       |  8 ++++----
> >  4 files changed, 25 insertions(+), 26 deletions(-)
> >
> > diff --git a/drivers/net/sfc/meson.build b/drivers/net/sfc/meson.build
> > index 5adde68..d3603a0 100644
> > --- a/drivers/net/sfc/meson.build
> > +++ b/drivers/net/sfc/meson.build
> > @@ -47,9 +47,8 @@ int main(void)
> >      __int128 a = 0;
> >      __int128 b;
> >  
> > -    b = __atomic_load_n(&a, __ATOMIC_RELAXED);
> > -    __atomic_store(&b, &a, __ATOMIC_RELAXED);
> > -    __atomic_store_n(&b, a, __ATOMIC_RELAXED);
> > +    b = rte_atomic_load_explicit(&a, rte_memory_order_relaxed);
> > +    rte_atomic_store_explicit(&b, a, rte_memory_order_relaxed);
> >      return 0;
> >  }
> >  '''
> 
> I think this is a case where simple find/replace is a problem.  For
> example, this is a sample file that the meson build uses to determine if
> libatomic is properly installed.  However, it is very bare-bones.
> 
> Your change is likely causing a compile error when cc.links happens in
> the meson file.  That leads to the ABI error.
> 
> If the goal is to remove all the intrinsics, then maybe a better change
> would be dropping this libatomic check from here completely.
> 
> WDYT?

yeah, actually it wasn't a search replace mistake it was an
unintentionally included file where i was experimenting with keeping the
test (thought i had reverted it).

i shouldn't have added the change to the series thanks for pointing the
mistake out and sorry for the noise.

appreciate it!

^ permalink raw reply	[relevance 0%]

* Re: [PATCH 15/46] net/sfc: use rte stdatomic API
  @ 2024-03-21 18:11  3%   ` Aaron Conole
  2024-03-21 18:15  0%     ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: Aaron Conole @ 2024-03-21 18:11 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Mattias Rönnblom, Morten Brørup,
	Abdullah Sevincer, Ajit Khaparde, Alok Prasad, Anatoly Burakov,
	Andrew Rybchenko, Anoob Joseph, Bruce Richardson, Byron Marohn,
	Chenbo Xia, Chengwen Feng, Ciara Loftus, Ciara Power,
	Dariusz Sosnowski, David Hunt, Devendra Singh Rawat,
	Erik Gabriel Carrillo, Guoyang Zhou, Harman Kalra,
	Harry van Haaren, Honnappa Nagarahalli, Jakub Grajciar,
	Jerin Jacob, Jeroen de Borst, Jian Wang, Jiawen Wu, Jie Hai,
	Jingjing Wu, Joshua Washington, Joyce Kong, Junfeng Guo,
	Kevin Laatz, Konstantin Ananyev, Liang Ma, Long Li,
	Maciej Czekaj, Matan Azrad, Maxime Coquelin, Nicolas Chautru,
	Ori Kam, Pavan Nikhilesh, Peter Mccarthy, Rahul Lakkireddy,
	Reshma Pattan, Rosen Xu, Ruifeng Wang, Rushil Gupta,
	Sameh Gobriel, Sivaprasad Tummala, Somnath Kotur,
	Stephen Hemminger, Suanming Mou, Sunil Kumar Kori,
	Sunil Uttarwar, Tetsuya Mukawa, Vamsi Attunuru,
	Viacheslav Ovsiienko, Vladimir Medvedkin, Xiaoyun Wang,
	Yipeng Wang, Yisen Zhuang, Yuying Zhang, Ziyang Xuan

Tyler Retzlaff <roretzla@linux.microsoft.com> writes:

> Replace the use of gcc builtin __atomic_xxx intrinsics with
> corresponding rte_atomic_xxx optional rte stdatomic API.
>
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>  drivers/net/sfc/meson.build       |  5 ++---
>  drivers/net/sfc/sfc_mae_counter.c | 30 +++++++++++++++---------------
>  drivers/net/sfc/sfc_repr_proxy.c  |  8 ++++----
>  drivers/net/sfc/sfc_stats.h       |  8 ++++----
>  4 files changed, 25 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/net/sfc/meson.build b/drivers/net/sfc/meson.build
> index 5adde68..d3603a0 100644
> --- a/drivers/net/sfc/meson.build
> +++ b/drivers/net/sfc/meson.build
> @@ -47,9 +47,8 @@ int main(void)
>      __int128 a = 0;
>      __int128 b;
>  
> -    b = __atomic_load_n(&a, __ATOMIC_RELAXED);
> -    __atomic_store(&b, &a, __ATOMIC_RELAXED);
> -    __atomic_store_n(&b, a, __ATOMIC_RELAXED);
> +    b = rte_atomic_load_explicit(&a, rte_memory_order_relaxed);
> +    rte_atomic_store_explicit(&b, a, rte_memory_order_relaxed);
>      return 0;
>  }
>  '''

I think this is a case where simple find/replace is a problem.  For
example, this is a sample file that the meson build uses to determine if
libatomic is properly installed.  However, it is very bare-bones.

Your change is likely causing a compile error when cc.links happens in
the meson file.  That leads to the ABI error.

If the goal is to remove all the intrinsics, then maybe a better change
would be dropping this libatomic check from here completely.

WDYT?


^ permalink raw reply	[relevance 3%]

* Re: [PATCH 02/15] eal: pack structures when building with MSVC
  @ 2024-03-21 16:02  3%   ` Bruce Richardson
  0 siblings, 0 replies; 200+ results
From: Bruce Richardson @ 2024-03-21 16:02 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Akhil Goyal, Aman Singh, Anatoly Burakov, Byron Marohn,
	Conor Walsh, Cristian Dumitrescu, Dariusz Sosnowski, David Hunt,
	Jerin Jacob, Jingjing Wu, Kirill Rybalchenko, Konstantin Ananyev,
	Matan Azrad, Ori Kam, Radu Nicolau, Ruifeng Wang, Sameh Gobriel,
	Sivaprasad Tummala, Suanming Mou, Sunil Kumar Kori,
	Vamsi Attunuru, Viacheslav Ovsiienko, Vladimir Medvedkin,
	Yipeng Wang, Yuying Zhang

On Wed, Mar 20, 2024 at 02:05:58PM -0700, Tyler Retzlaff wrote:
> Add __rte_msvc_pack to all __rte_packed structs to cause packing
> when building with MSVC.
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>  lib/eal/common/eal_private.h      | 1 +
>  lib/eal/include/rte_memory.h      | 1 +
>  lib/eal/include/rte_memzone.h     | 1 +
>  lib/eal/include/rte_trace_point.h | 1 +
>  lib/eal/x86/include/rte_memcpy.h  | 3 +++
>  5 files changed, 7 insertions(+)
> 
> diff --git a/lib/eal/common/eal_private.h b/lib/eal/common/eal_private.h
> index 71523cf..21ace2a 100644
> --- a/lib/eal/common/eal_private.h
> +++ b/lib/eal/common/eal_private.h
> @@ -43,6 +43,7 @@ struct lcore_config {
>  /**
>   * The global RTE configuration structure.
>   */
> +__rte_msvc_pack
>  struct rte_config {
>  	uint32_t main_lcore;         /**< Id of the main lcore */
>  	uint32_t lcore_count;        /**< Number of available logical cores. */

This struct almost certainly doesn't need to be packed - since it's in a
private header, I would imagine removing packing wouldn't be an ABI
break.
Also, removing rte_packed doesn't change the size for me for a 64-bit x86
build. Looking at the struct, I don't see why it would change on a 32-bit
build either.

> diff --git a/lib/eal/include/rte_memory.h b/lib/eal/include/rte_memory.h
> index 842362d..73bb00d 100644
> --- a/lib/eal/include/rte_memory.h
> +++ b/lib/eal/include/rte_memory.h
> @@ -46,6 +46,7 @@
>  /**
>   * Physical memory segment descriptor.
>   */
> +__rte_msvc_pack
>  struct rte_memseg {
>  	rte_iova_t iova;            /**< Start IO address. */
>  	union {
> diff --git a/lib/eal/include/rte_memzone.h b/lib/eal/include/rte_memzone.h
> index 931497f..ca312c0 100644
> --- a/lib/eal/include/rte_memzone.h
> +++ b/lib/eal/include/rte_memzone.h
> @@ -45,6 +45,7 @@
>   * A structure describing a memzone, which is a contiguous portion of
>   * physical memory identified by a name.
>   */
> +__rte_msvc_pack
>  struct rte_memzone {
> 

This also doesn't look like it should be packed. It is a public header
though so we may need to be more careful. Checking a 64-bit x86 build shows
no size change when removing the "packed" attribute, though. For 32-bit, I
think the "size_t" field in the middle would be followed by padding on
32-bit if we removed the "packed" attribute, so it may be a no-go for now.
:-(

>  #define RTE_MEMZONE_NAMESIZE 32       /**< Maximum length of memory zone name.*/
> diff --git a/lib/eal/include/rte_trace_point.h b/lib/eal/include/rte_trace_point.h
> index 41e2a7f..63f333c 100644
> --- a/lib/eal/include/rte_trace_point.h
> +++ b/lib/eal/include/rte_trace_point.h
> @@ -292,6 +292,7 @@ int __rte_trace_point_register(rte_trace_point_t *trace, const char *name,
>  #define __RTE_TRACE_FIELD_ENABLE_MASK (1ULL << 63)
>  #define __RTE_TRACE_FIELD_ENABLE_DISCARD (1ULL << 62)
>  
> +__rte_msvc_pack
>  struct __rte_trace_stream_header {
>  	uint32_t magic;
>  	rte_uuid_t uuid;

From code review, this doesn't look like "packed" has any impact, since all
fields should naturally be aligned on both 32-bit and 64-bit builds.

/Bruce


^ permalink raw reply	[relevance 3%]

* DPDK Release Status Meeting 2024-03-21
@ 2024-03-21 14:49  3% Mcnamara, John
  0 siblings, 0 replies; 200+ results
From: Mcnamara, John @ 2024-03-21 14:49 UTC (permalink / raw)
  To: dev; +Cc: thomas, Marchand, David

[-- Attachment #1: Type: text/plain, Size: 2486 bytes --]

Release status meeting minutes 2024-03-21
=========================================

Agenda:
* Release Dates
* Subtrees
* Roadmaps
* LTS
* Defects
* Opens

Participants:
* AMD
* ARM
* Intel
* Marvell
* Nvidia
* Red Hat

Release Dates
-------------

The following are the current/updated working dates for 24.03:

* V1:      29 December 2023
* RC1:     21 February 2024
* RC2:      8 March    2024
* RC3:     18 March    2024
* RC4:     22 March    2024
* Release: 27 March    2024

https://core.dpdk.org/roadmap/#dates


Subtrees
--------

* next-net
  * Some fixes merged.
  * Ready for Pull

* next-net-intel
  * 2 fix/doc patches.

* next-net-mlx
  * Series merged after RC3.

* next-net-mvl
  * No new changes post RC3.

* next-eventdev
  * No new changes post RC3.

* next-baseband
  * No new changes post RC3.

* next-virtio
  * No new changes post RC3.

* next-crypto
  * Some doc patches.
  * Patch for ipsecgw sample app to be postponed to
    next release due to risk of breakage in other PMDs.

* main
  * RH testing for RC3 - no major issues.
  * Looking at Windows patches for next release to
    make sure there aren't any API/ABI breaking changes.
  * Doc fixes and release notes.
  * Proposed 24.03 dates:
    * RC4:     22 March 2024
    * Release: 27 March 2024


LTS
---

Please add acks to confirm validation support for a 3 year LTS window:
http://inbox.dpdk.org/dev/20240117161804.223582-1-ktraynor@redhat.com/

* 23.11.1 - In progress.
* 22.11.5 - In progress.
* 21.11.7 - In progress.
* 20.11.10 - Will only be updated with CVE and critical fixes.
* 19.11.15 - Will only be updated with CVE and critical fixes.


* Distros
  * Debian 12 contains DPDK v22.11
  * Ubuntu 24.04-LTS will contain DPDK v23.11
  * Ubuntu 23.04 contains DPDK v22.11

Defects
-------

* Bugzilla links, 'Bugs',  added for hosted projects
  * https://www.dpdk.org/hosted-projects/



DPDK Release Status Meetings
----------------------------

The DPDK Release Status Meeting is intended for DPDK Committers to discuss the
status of the master tree and sub-trees, and for project managers to track
progress or milestone dates.

The meeting occurs every Thursday at 9:30 UTC over Jitsi on https://meet.jit.si/DPDK

You don't need an invite to join the meeting, but if you want a calendar reminder just
send an email to "John McNamara john.mcnamara@intel.com" for the invite.



^ permalink raw reply	[relevance 3%]

* Re: [PATCH v6 02/23] mbuf: consolidate driver asserts for mbuf struct
  @ 2024-03-14 16:51  4%     ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-14 16:51 UTC (permalink / raw)
  To: dev, techboard
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb

We've gone around in circles a little on this series.  Let's discuss it
at the next techboard meeting; please put it on the agenda.

Summary

MSVC does not support the typedef of zero-sized typed arrays used in
struct rte_mbuf and a handful of other structs built on Windows, better
known as ``RTE_MARKER`` fields.

There are two competing solutions we would like to know which to move
forward with.

1. Use C11 anonymous unions and anonymous struct extensions to replace
the RTE_MARKER fields which can maintain ABI and API compatibility.

2. Provide inline accessors for struct rte_mbuf for some fields and
removing others which maintains ABI but breaks API.

I'm proposing a mix of 1 & 2 to maintain ABI and some API for struct
rte_mbuf fields but remove (API breaking) cacheline{0,1} RTE_MARKER
fields in favor of existing inline functions for prefetching.

Thanks!

On Mon, Feb 26, 2024 at 09:41:18PM -0800, Tyler Retzlaff wrote:
> Collect duplicated RTE_BUILD_BUG_ON checks from drivers and place them
> at global scope with struct rte_mbuf definition using static_assert.
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>  lib/mbuf/rte_mbuf_core.h | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index 7000c04..36551c2 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -16,8 +16,11 @@
>   * New fields and flags should fit in the "dynamic space".
>   */
>  
> +#include <assert.h>
> +#include <stddef.h>
>  #include <stdint.h>
>  
> +#include <rte_common.h>
>  #include <rte_byteorder.h>
>  #include <rte_stdatomic.h>
>  
> @@ -673,6 +676,37 @@ struct rte_mbuf {
>  	uint32_t dynfield1[9]; /**< Reserved for dynamic fields. */
>  } __rte_cache_aligned;
>  
> +static_assert(!(offsetof(struct rte_mbuf, ol_flags) !=
> +	offsetof(struct rte_mbuf, rearm_data) + 8), "ol_flags");
> +static_assert(!(offsetof(struct rte_mbuf, rearm_data) !=
> +	RTE_ALIGN(offsetof(struct rte_mbuf, rearm_data), 16)), "rearm_data");
> +static_assert(!(offsetof(struct rte_mbuf, data_off) !=
> +	offsetof(struct rte_mbuf, rearm_data)), "data_off");
> +static_assert(!(offsetof(struct rte_mbuf, data_off) <
> +	offsetof(struct rte_mbuf, rearm_data)), "data_off");
> +static_assert(!(offsetof(struct rte_mbuf, refcnt) <
> +	offsetof(struct rte_mbuf, rearm_data)), "refcnt");
> +static_assert(!(offsetof(struct rte_mbuf, nb_segs) <
> +	offsetof(struct rte_mbuf, rearm_data)), "nb_segs");
> +static_assert(!(offsetof(struct rte_mbuf, port) <
> +	offsetof(struct rte_mbuf, rearm_data)), "port");
> +static_assert(!(offsetof(struct rte_mbuf, data_off) -
> +	offsetof(struct rte_mbuf, rearm_data) > 6), "data_off");
> +static_assert(!(offsetof(struct rte_mbuf, refcnt) -
> +	offsetof(struct rte_mbuf, rearm_data) > 6), "refcnt");
> +static_assert(!(offsetof(struct rte_mbuf, nb_segs) -
> +	offsetof(struct rte_mbuf, rearm_data) > 6), "nb_segs");
> +static_assert(!(offsetof(struct rte_mbuf, port) -
> +	offsetof(struct rte_mbuf, rearm_data) > 6), "port");
> +static_assert(!(offsetof(struct rte_mbuf, pkt_len) !=
> +	offsetof(struct rte_mbuf, rx_descriptor_fields1) + 4), "pkt_len");
> +static_assert(!(offsetof(struct rte_mbuf, data_len) !=
> +	offsetof(struct rte_mbuf, rx_descriptor_fields1) + 8), "data_len");
> +static_assert(!(offsetof(struct rte_mbuf, vlan_tci) !=
> +	offsetof(struct rte_mbuf, rx_descriptor_fields1) + 10), "vlan_tci");
> +static_assert(!(offsetof(struct rte_mbuf, hash) !=
> +	offsetof(struct rte_mbuf, rx_descriptor_fields1) + 12), "hash");
> +
>  /**
>   * Function typedef of callback to free externally attached buffer.
>   */
> -- 
> 1.8.3.1

^ permalink raw reply	[relevance 4%]

* Re: [RFC v2] eal: increase passed max multi-process file descriptors
  2024-03-14 15:23  0%   ` Stephen Hemminger
@ 2024-03-14 15:38  3%     ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-03-14 15:38 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Anatoly Burakov, Jianfeng Tan

On Thu, Mar 14, 2024 at 4:23 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
> Rather than mess with versioning everything, probably better to just
> hold off to 24.11 release and do the change there.
>
> It will limit xdp and tap PMD's to 8 queues but no user has been
> demanding more yet.

IIUC, this limitation only applies to multiprocess setups.

Waiting for next ABI seems the simpler approach if there is no
explicit ask for this change.
Until someone wants to send more than 253 fds :-).


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [RFC v2] eal: increase passed max multi-process file descriptors
  2024-03-08 20:36  8% ` [RFC v2] eal: increase passed max multi-process file descriptors Stephen Hemminger
@ 2024-03-14 15:23  0%   ` Stephen Hemminger
  2024-03-14 15:38  3%     ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-03-14 15:23 UTC (permalink / raw)
  To: dev; +Cc: Anatoly Burakov, Jianfeng Tan

On Fri,  8 Mar 2024 12:36:39 -0800
Stephen Hemminger <stephen@networkplumber.org> wrote:

> Both XDP and TAP device are limited in the number of queues
> because of limitations on the number of file descriptors that
> are allowed. The original choice of 8 was too low; the allowed
> maximum is 253 according to unix(7) man page.
> 
> This may look like a serious ABI breakage but it is not.
> It is simpler for everyone if the limit is increased rather than
> building a parallel set of calls.
> 
> The case that matters is older application registering MP support
> with the newer version of EAL. In this case, since the old application
> will always send the more compact structure (less possible fd's)
> it is OK.
> 
> Request (for up to 8 fds) sent to EAL.
>    - EAL only references up to num_fds.
>    - The area past the old fd array is not accessed.
> 
> Reply callback:
>    - EAL will pass pointer to the new (larger structure),
>      the old callback will only look at the first part of
>      the fd array (num_fds <= 8).
> 
>    - Since primary and secondary must both be from same DPDK version
>      there is no normal way that a reply with more fd's could be possible.
>      The only case is the same as above, where application requested
>      something that would break in old version and now succeeds.
> 
> The one possible incompatibility is that if application passed
> a larger number of fd's (32?) and expected an error. Now it will
> succeed and get passed through.
> 
> Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
> v2 - show the simpler way to address with some minor ABI issue
> 
>  doc/guides/rel_notes/release_24_03.rst | 4 ++++
>  lib/eal/include/rte_eal.h              | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
> index 932688ca4d82..1d33cfa15dfb 100644
> --- a/doc/guides/rel_notes/release_24_03.rst
> +++ b/doc/guides/rel_notes/release_24_03.rst
> @@ -225,6 +225,10 @@ API Changes
>  * ethdev: Renamed structure ``rte_flow_action_modify_data`` to be
>    ``rte_flow_field_data`` for more generic usage.
>  
> +* eal: The maximum number of file descriptors allowed to be passed in
> +  multi-process requests is increased from 8 to the maximum possible on
> +  Linux unix domain sockets 253. This allows for more queues on XDP and
> +  TAP device.
>  
>  ABI Changes
>  -----------
> diff --git a/lib/eal/include/rte_eal.h b/lib/eal/include/rte_eal.h
> index c2256f832e51..cd84fcdd1bdb 100644
> --- a/lib/eal/include/rte_eal.h
> +++ b/lib/eal/include/rte_eal.h
> @@ -155,7 +155,7 @@ int rte_eal_primary_proc_alive(const char *config_file_path);
>   */
>  bool rte_mp_disable(void);
>  
> -#define RTE_MP_MAX_FD_NUM	8    /* The max amount of fds */
> +#define RTE_MP_MAX_FD_NUM	253    /* The max amount of fds */
>  #define RTE_MP_MAX_NAME_LEN	64   /* The max length of action name */
>  #define RTE_MP_MAX_PARAM_LEN	256  /* The max length of param */
>  struct rte_mp_msg {


Rather than mess with versioning everything, probably better to just
hold off to 24.11 release and do the change there.

It will limit xdp and tap PMD's to 8 queues but no user has been
demanding more yet.

^ permalink raw reply	[relevance 0%]

* Re: [RFC] eal: increase the number of available file descriptors for MP
  2024-03-08 18:54  2% [RFC] eal: increase the number of available file descriptors for MP Stephen Hemminger
  2024-03-08 20:36  8% ` [RFC v2] eal: increase passed max multi-process file descriptors Stephen Hemminger
  2024-03-09 18:12  2% ` [RFC v3] tap: do not duplicate fd's Stephen Hemminger
@ 2024-03-14 14:40  4% ` David Marchand
  2 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-03-14 14:40 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Anatoly Burakov, Ferruh Yigit, Thomas Monjalon

On Fri, Mar 8, 2024 at 7:54 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> The current limit of file descriptors is too low, it should have
> been set to the maximum possible to send across an unix domain
> socket.
>
> This is an attempt to allow increasing it without breaking ABI.
> But in the process it exposes what is broken about how symbol
> versions are checked in check-symbol-maps.sh. That script is
> broken in that it won't allow adding a backwards compatible
> version hook like this.

- It could be enhanced maybe, but I see no problem with the script.

The versions for compat symbols in this patch are wrong.
We want to keep compat with ABI 24, not 23.
And next ABI will be 25.

- rte_mp_old_msg does not have to be exported as public in rte_eal.h.


- I think the patch is not complete:
 * rte_mp_action_register and rte_mp_request_async need versioning too,
 * because of the former point, handling of msg requests probably
needs to keep track of accepted length per registered callbacks,


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* Re: [PATCH 1/1] eal: add C++ include guard in generic/rte_vect.h
  @ 2024-03-14  3:45  3%         ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-14  3:45 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: Ashish Sadanandan, Bruce Richardson, Thomas Monjalon, dev,
	nelio.laranjeiro, stable, honnappa.nagarahalli,
	konstantin.v.ananyev, david.marchand, ruifeng.wang

On Wed, Mar 13, 2024 at 04:45:36PM -0700, Stephen Hemminger wrote:
> On Fri, 2 Feb 2024 13:58:19 -0700
> Ashish Sadanandan <ashish.sadanandan@gmail.com> wrote:
> 
> > > I think just having the extern "C" guard in all files is the safest choice,
> > > because it's immediately obvious in each and every file that it is correct.
> > > Taking the other option, to check any indirect include file you need to go
> > > finding what other files include it and check there that a) they have
> > > include guards and b) the include for the indirect header is contained
> > > within it.
> > >
> > > Adopting the policy of putting the guard in each and every header is also a
> > > lot easier to do basic automated sanity checks on. If the file ends in .h,
> > > we just use grep to quickly verify it's not missing the guards. [Naturally,
> > > we can do more complete checks than that if we want, but 99% percent of
> > > misses can be picked up by a grep for the 'extern "C"' bit]
> > >
> > > /Bruce
> > >  
> > 
> > 100% agree with Bruce. It's a valid ideological argument that private
> > headers
> > don't need such safeguards, but it's difficult to enforce and easy to break
> > during refactoring.
> > 
> > - Ashish
> 
> But splashing this across all the internal driver headers is bad idea.
> It should only apply to header files that exported in final package.

While we don't provide API/ABI stability promises for driver headers, we
do optionally install them with -Denable_driver_sdk=true.

The driver SDK allows drivers to be developed outside of the DPDK source
tree; many such drivers are explicitly authored in C++ and live outside of
the DPDK source tree because DPDK does not allow C++ drivers in tree.

^ permalink raw reply	[relevance 3%]

* Re: [PATCH v3 07/33] net/ena: restructure the llq policy setting process
  2024-03-10 14:29  0%     ` Brandes, Shai
@ 2024-03-13 11:21  0%       ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-03-13 11:21 UTC (permalink / raw)
  To: Brandes, Shai; +Cc: dev

On 3/10/2024 2:29 PM, Brandes, Shai wrote:
> 
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Friday, March 8, 2024 7:24 PM
>> To: Brandes, Shai <shaibran@amazon.com>
>> Cc: dev@dpdk.org
>> Subject: RE: [EXTERNAL] [PATCH v3 07/33] net/ena: restructure the llq policy
>> setting process
>>
>> CAUTION: This email originated from outside of the organization. Do not click
>> links or open attachments unless you can confirm the sender and know the
>> content is safe.
>>
>>
>>
>> On 3/6/2024 12:24 PM, shaibran@amazon.com wrote:
>>> From: Shai Brandes <shaibran@amazon.com>
>>>
>>> The driver will set the size of the LLQ header size according to the
>>> recommendation from the device.
>>> Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg
>>> `llq_policy` that accepts the following values:
>>> 0 - Disable LLQ.
>>>     Use with extreme caution as it leads to a huge performance
>>>     degradation on AWS instances from 6th generation onwards.
>>> 1 - Accept device recommended LLQ policy (Default).
>>>     Device can recommend normal or large LLQ policy.
>>> 2 - Enforce normal LLQ policy.
>>> 3 - Enforce large LLQ policy.
>>>     Required for packets with header that exceed 96 bytes on
>>>     AWS instances prior to 5th generation.
>>>
>>
>> We had a similar discussion before, although dev_args is not part of the ABI, it
>> is a user interface, and changes in the devargs will impact users directly.
>>
>> What would you think to either keep backward compatibility in the devargs
>> (like not remove old one but add new one), or do this change in
>> 24.11 release?
> [Brandes, Shai] understood. 
> The new devarg replaced the old ones and added an option to enforce normal-llq mode which is critical for our release.
> As you suggested, we will keep backward compatibility and add an additional devarg for enforcing normal-llq policy.
> That way, we can easily replace it in future releases with a common devarg without the need to make major logic changes.
> 

ack.

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v6 1/5] ci: replace IPsec-mb package install
  2024-03-12 16:13  3%       ` David Marchand
@ 2024-03-12 17:07  0%         ` Power, Ciara
  0 siblings, 0 replies; 200+ results
From: Power, Ciara @ 2024-03-12 17:07 UTC (permalink / raw)
  To: Marchand, David
  Cc: Dooley, Brian, Aaron Conole, Michael Santana, dev, gakhil,
	De Lara Guarch, Pablo, probb, wathsala.vithanage,
	Thomas Monjalon, Richardson, Bruce



> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, March 12, 2024 4:14 PM
> To: Power, Ciara <ciara.power@intel.com>
> Cc: Dooley, Brian <brian.dooley@intel.com>; Aaron Conole
> <aconole@redhat.com>; Michael Santana <maicolgabriel@hotmail.com>;
> dev@dpdk.org; gakhil@marvell.com; De Lara Guarch, Pablo
> <pablo.de.lara.guarch@intel.com>; probb@iol.unh.edu;
> wathsala.vithanage@arm.com; Thomas Monjalon <thomas@monjalon.net>;
> Richardson, Bruce <bruce.richardson@intel.com>
> Subject: Re: [PATCH v6 1/5] ci: replace IPsec-mb package install
> 
> On Tue, Mar 12, 2024 at 4:26 PM Power, Ciara <ciara.power@intel.com> wrote:
> > > From: David Marchand <david.marchand@redhat.com> On Tue, Mar 12,
> > > 2024 at 2:50 PM Brian Dooley <brian.dooley@intel.com>
> > > wrote:
> > > >
> > > > From: Ciara Power <ciara.power@intel.com>
> > > >
> > > > The IPsec-mb version that is available through current package
> > > > managers is 1.2.
> > > > This release moves the minimum required IPsec-mb version for
> > > > IPsec-mb based SW PMDs to 1.4.
> > > > To compile these PMDs, a manual step is added to install IPsec-mb
> > > > v1.4 using dpkg.
> > > >
> > > > Signed-off-by: Ciara Power <ciara.power@intel.com>
> > > > ---
> > > >  .github/workflows/build.yml | 25 ++++++++++++++++++++++---
> > > >  1 file changed, 22 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/.github/workflows/build.yml
> > > > b/.github/workflows/build.yml index 776fbf6f30..ed44b1f730 100644
> > > > --- a/.github/workflows/build.yml
> > > > +++ b/.github/workflows/build.yml
> > > > @@ -106,9 +106,15 @@ jobs:
> > > >        run: sudo apt update || true
> > > >      - name: Install packages
> > > >        run: sudo apt install -y ccache libarchive-dev libbsd-dev libbpf-dev
> > > > -        libfdt-dev libibverbs-dev libipsec-mb-dev libisal-dev libjansson-dev
> > > > +        libfdt-dev libibverbs-dev libisal-dev libjansson-dev
> > > >          libnuma-dev libpcap-dev libssl-dev ninja-build pkg-config python3-
> pip
> > > >          python3-pyelftools python3-setuptools python3-wheel
> > > > zlib1g-dev
> > > > +    - name: Install ipsec-mb library
> > > > +      run: |
> > > > +        wget
> > > > + "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> > > mb-dev_1.4-3_amd64.deb"
> > > > +        wget
> > > > + "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> > > mb1_1.4-3_amd64.deb"
> > > > +        sudo dpkg -i libipsec-mb1_1.4-3_amd64.deb
> > > > +        sudo dpkg -i libipsec-mb-dev_1.4-3_amd64.deb
> > >
> > > I am not enthusiastic at advertising a kind of out of tree approach.
> > > That's a bit like if NVIDIA asked us to stop testing distribution
> > > rdma-core packages and instead rely on MOFED.
> > >
> > > Why are we removing support for versions that are packaged by the
> > > main distributions?
> >
> > With Ubuntu 22.04, ipsec-mb v1.2 is the version available through the
> package manager.
> > We were aiming to make v1.4 the minimum version for ipsec-mb PMDs from
> > this release onwards, removing the many ifdef codepaths in the PMDs
> > for older versions. (patch included in this patchset)
> >
> > Some of the other CI environments were updated to install v1.4 already
> > to support this change, but we found the github CI robot was limited for ipsec-
> mb versions when using the package manager.
> > It had some failures comparing ABI with v1.2 installed (SW PMDs compiled in
> reference build, but not compiled after patch).
> 
> Such a change means that users of the Ubuntu/Fedora dpdk package lose access
> to those drivers hypothetically.
> "Hypothetically", because in reality, Ubuntu and others distributions won't
> update to non LTS versions.
> 
> On the other hand, if a user was building DPDK (and not the one provided by
> the distribution), now the user has to stop using the ipsec mb provided by the
> distribution: building/packaging/maintaining the ipsec mb library is now forced
> on the user's plate.
> 
> I am unclear if this qualifies as an ABI breakage, but I am not comfortable with this
> change.

Hi David,

Ah, okay - thanks for the explanation.
Those are points I had missed, but it makes sense.

We will drop the version bump to v1.4 for this release, and revisit in a later release when suitable.

Thanks,
Ciara

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v6 1/5] ci: replace IPsec-mb package install
  2024-03-12 15:26  3%     ` Power, Ciara
@ 2024-03-12 16:13  3%       ` David Marchand
  2024-03-12 17:07  0%         ` Power, Ciara
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-03-12 16:13 UTC (permalink / raw)
  To: Power, Ciara
  Cc: Dooley, Brian, Aaron Conole, Michael Santana, dev, gakhil,
	De Lara Guarch, Pablo, probb, wathsala.vithanage,
	Thomas Monjalon, Bruce Richardson

On Tue, Mar 12, 2024 at 4:26 PM Power, Ciara <ciara.power@intel.com> wrote:
> > From: David Marchand <david.marchand@redhat.com>
> > On Tue, Mar 12, 2024 at 2:50 PM Brian Dooley <brian.dooley@intel.com>
> > wrote:
> > >
> > > From: Ciara Power <ciara.power@intel.com>
> > >
> > > The IPsec-mb version that is available through current package
> > > managers is 1.2.
> > > This release moves the minimum required IPsec-mb version for IPsec-mb
> > > based SW PMDs to 1.4.
> > > To compile these PMDs, a manual step is added to install IPsec-mb v1.4
> > > using dpkg.
> > >
> > > Signed-off-by: Ciara Power <ciara.power@intel.com>
> > > ---
> > >  .github/workflows/build.yml | 25 ++++++++++++++++++++++---
> > >  1 file changed, 22 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
> > > index 776fbf6f30..ed44b1f730 100644
> > > --- a/.github/workflows/build.yml
> > > +++ b/.github/workflows/build.yml
> > > @@ -106,9 +106,15 @@ jobs:
> > >        run: sudo apt update || true
> > >      - name: Install packages
> > >        run: sudo apt install -y ccache libarchive-dev libbsd-dev libbpf-dev
> > > -        libfdt-dev libibverbs-dev libipsec-mb-dev libisal-dev libjansson-dev
> > > +        libfdt-dev libibverbs-dev libisal-dev libjansson-dev
> > >          libnuma-dev libpcap-dev libssl-dev ninja-build pkg-config python3-pip
> > >          python3-pyelftools python3-setuptools python3-wheel
> > > zlib1g-dev
> > > +    - name: Install ipsec-mb library
> > > +      run: |
> > > +        wget "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> > mb-dev_1.4-3_amd64.deb"
> > > +        wget "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> > mb1_1.4-3_amd64.deb"
> > > +        sudo dpkg -i libipsec-mb1_1.4-3_amd64.deb
> > > +        sudo dpkg -i libipsec-mb-dev_1.4-3_amd64.deb
> >
> > I am not enthusiastic at advertising a kind of out of tree approach.
> > That's a bit like if NVIDIA asked us to stop testing distribution rdma-core
> > packages and instead rely on MOFED.
> >
> > Why are we removing support for versions that are packaged by the main
> > distributions?
>
> With Ubuntu 22.04, ipsec-mb v1.2 is the version available through the package manager.
> We were aiming to make v1.4 the minimum version for ipsec-mb PMDs from this release onwards,
> removing the many ifdef codepaths in the PMDs for older versions. (patch included in this patchset)
>
> Some of the other CI environments were updated to install v1.4 already to support this change,
> but we found the github CI robot was limited for ipsec-mb versions when using the package manager.
> It had some failures comparing ABI with v1.2 installed (SW PMDs compiled in reference build, but not compiled after patch).

Such a change means that users of the Ubuntu/Fedora dpdk package lose
access to those drivers hypothetically.
"Hypothetically", because in reality, Ubuntu and others distributions
won't update to non LTS versions.

On the other hand, if a user was building DPDK (and not the one
provided by the distribution), now the user has to stop using the
ipsec mb provided by the distribution: building/packaging/maintaining
the ipsec mb library is now forced on the user's plate.

I am unclear if this qualifies as an ABI breakage, but I am not
comfortable with this change.


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* RE: [PATCH v6 1/5] ci: replace IPsec-mb package install
  @ 2024-03-12 15:26  3%     ` Power, Ciara
  2024-03-12 16:13  3%       ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Power, Ciara @ 2024-03-12 15:26 UTC (permalink / raw)
  To: Marchand, David, Dooley, Brian
  Cc: Aaron Conole, Michael Santana, dev, gakhil, De Lara Guarch,
	Pablo, probb, wathsala.vithanage

Hi David,

> -----Original Message-----
> From: David Marchand <david.marchand@redhat.com>
> Sent: Tuesday, March 12, 2024 1:54 PM
> To: Dooley, Brian <brian.dooley@intel.com>
> Cc: Aaron Conole <aconole@redhat.com>; Michael Santana
> <maicolgabriel@hotmail.com>; dev@dpdk.org; gakhil@marvell.com; De Lara
> Guarch, Pablo <pablo.de.lara.guarch@intel.com>; probb@iol.unh.edu;
> wathsala.vithanage@arm.com; Power, Ciara <ciara.power@intel.com>
> Subject: Re: [PATCH v6 1/5] ci: replace IPsec-mb package install
> 
> Hello,
> 
> On Tue, Mar 12, 2024 at 2:50 PM Brian Dooley <brian.dooley@intel.com>
> wrote:
> >
> > From: Ciara Power <ciara.power@intel.com>
> >
> > The IPsec-mb version that is available through current package
> > managers is 1.2.
> > This release moves the minimum required IPsec-mb version for IPsec-mb
> > based SW PMDs to 1.4.
> > To compile these PMDs, a manual step is added to install IPsec-mb v1.4
> > using dpkg.
> >
> > Signed-off-by: Ciara Power <ciara.power@intel.com>
> > ---
> >  .github/workflows/build.yml | 25 ++++++++++++++++++++++---
> >  1 file changed, 22 insertions(+), 3 deletions(-)
> >
> > diff --git a/.github/workflows/build.yml b/.github/workflows/build.yml
> > index 776fbf6f30..ed44b1f730 100644
> > --- a/.github/workflows/build.yml
> > +++ b/.github/workflows/build.yml
> > @@ -106,9 +106,15 @@ jobs:
> >        run: sudo apt update || true
> >      - name: Install packages
> >        run: sudo apt install -y ccache libarchive-dev libbsd-dev libbpf-dev
> > -        libfdt-dev libibverbs-dev libipsec-mb-dev libisal-dev libjansson-dev
> > +        libfdt-dev libibverbs-dev libisal-dev libjansson-dev
> >          libnuma-dev libpcap-dev libssl-dev ninja-build pkg-config python3-pip
> >          python3-pyelftools python3-setuptools python3-wheel
> > zlib1g-dev
> > +    - name: Install ipsec-mb library
> > +      run: |
> > +        wget "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> mb-dev_1.4-3_amd64.deb"
> > +        wget "https://launchpad.net/ubuntu/+archive/primary/+files/libipsec-
> mb1_1.4-3_amd64.deb"
> > +        sudo dpkg -i libipsec-mb1_1.4-3_amd64.deb
> > +        sudo dpkg -i libipsec-mb-dev_1.4-3_amd64.deb
> 
> I am not enthusiastic about advertising a kind of out-of-tree approach.
> That's a bit like if NVIDIA asked us to stop testing distribution rdma-core
> packages and instead rely on MOFED.
> 
> Why are we removing support for versions that are packaged by the main
> distributions?

With Ubuntu 22.04, ipsec-mb v1.2 is the version available through the package manager.
We were aiming to make v1.4 the minimum version for ipsec-mb PMDs from this release onwards,
removing the many ifdef codepaths in the PMDs for older versions. (patch included in this patchset)

Some of the other CI environments were already updated to install v1.4 to support this change,
but we found the GitHub CI robot was limited in the ipsec-mb versions available via the package manager.
It had some failures comparing ABI with v1.2 installed (SW PMDs were compiled in the reference build, but not after the patch).

To support the new minimum SW PMD ipsec-mb version for this CI, we thought installing v1.4 like this would suffice.

Thanks,
Ciara



^ permalink raw reply	[relevance 3%]

* [PATCH 1/3] ethdev: support setting lanes
  @ 2024-03-12  7:52  5% ` Dengdui Huang
  0 siblings, 0 replies; 200+ results
From: Dengdui Huang @ 2024-03-12  7:52 UTC (permalink / raw)
  To: dev
  Cc: ferruh.yigit, aman.deep.singh, yuying.zhang, thomas,
	andrew.rybchenko, liuyonglong, fengchengwen, haijie1, lihuisong

Some speeds can be achieved with different numbers of lanes. For example,
100Gbps can be achieved using two lanes of 50Gbps or four lanes of 25Gbps.
When the two ends use different numbers of lanes, the port cannot come up.
This patch adds support for setting and reporting lanes.

Signed-off-by: Dengdui Huang <huangdengdui@huawei.com>
---
 doc/guides/rel_notes/release_24_03.rst |   8 +-
 drivers/net/bnxt/bnxt_ethdev.c         |   3 +-
 drivers/net/hns3/hns3_ethdev.c         |   1 +
 lib/ethdev/ethdev_driver.h             |   1 -
 lib/ethdev/ethdev_linux_ethtool.c      | 101 ++++++++-
 lib/ethdev/ethdev_private.h            |   4 +
 lib/ethdev/ethdev_trace.h              |   4 +-
 lib/ethdev/meson.build                 |   2 +
 lib/ethdev/rte_ethdev.c                | 272 +++++++++++++++++++++++--
 lib/ethdev/rte_ethdev.h                |  99 +++++++--
 lib/ethdev/version.map                 |   7 +
 11 files changed, 466 insertions(+), 36 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 932688ca4d..75d93ee965 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -76,6 +76,10 @@ New Features
   * Added a fast path function ``rte_eth_tx_queue_count``
     to get the number of used descriptors of a Tx queue.
 
+* **Support setting lanes for ethdev.**
+  * Support setting lanes by extending ``RTE_ETH_LINK_SPEED_*``.
+  * Added a function to convert a bitmap flag to a link speed info structure.
+
 * **Added hash calculation of an encapsulated packet as done by the HW.**
 
   Added function to calculate hash when doing tunnel encapsulation:
@@ -240,9 +244,11 @@ ABI Changes
    This section is a comment. Do not overwrite or remove it.
    Also, make sure to start the actual text at the margin.
    =======================================================
-
 * No ABI change that would break compatibility with 23.11.
 
+* ethdev: Convert a numerical speed to a bitmap flag with lanes:
+  The function ``rte_eth_speed_bitflag`` gained a lanes parameter.
+
 
 Known Issues
 ------------
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index ba31ae9286..e881a7f3cc 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -711,7 +711,8 @@ static int bnxt_update_phy_setting(struct bnxt *bp)
 	}
 
 	/* convert to speedbit flag */
-	curr_speed_bit = rte_eth_speed_bitflag((uint32_t)link->link_speed, 1);
+	curr_speed_bit = rte_eth_speed_bitflag((uint32_t)link->link_speed,
+					       RTE_ETH_LANES_UNKNOWN, 1);
 
 	/*
 	 * Device is not obliged link down in certain scenarios, even
diff --git a/drivers/net/hns3/hns3_ethdev.c b/drivers/net/hns3/hns3_ethdev.c
index b10d1216d2..ecd3b2ef64 100644
--- a/drivers/net/hns3/hns3_ethdev.c
+++ b/drivers/net/hns3/hns3_ethdev.c
@@ -5969,6 +5969,7 @@ hns3_get_speed_fec_capa(struct rte_eth_fec_capa *speed_fec_capa,
 	for (i = 0; i < RTE_DIM(speed_fec_capa_tbl); i++) {
 		speed_bit =
 			rte_eth_speed_bitflag(speed_fec_capa_tbl[i].speed,
+					      RTE_ETH_LANES_UNKNOWN,
 					      RTE_ETH_LINK_FULL_DUPLEX);
 		if ((speed_capa & speed_bit) == 0)
 			continue;
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 0dbf2dd6a2..bb7dc7acb7 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -2003,7 +2003,6 @@ __rte_internal
 int
 rte_eth_ip_reassembly_dynfield_register(int *field_offset, int *flag);
 
-
 /*
  * Legacy ethdev API used internally by drivers.
  */
diff --git a/lib/ethdev/ethdev_linux_ethtool.c b/lib/ethdev/ethdev_linux_ethtool.c
index e792204b01..b776ec6173 100644
--- a/lib/ethdev/ethdev_linux_ethtool.c
+++ b/lib/ethdev/ethdev_linux_ethtool.c
@@ -111,10 +111,107 @@ static const uint32_t link_modes[] = {
 	[101] =      11, /* ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT */
 };
 
+/*
+ * Link modes sorted with index as defined in ethtool.
+ * Values are lanes.
+ */
+static const uint32_t link_modes_lanes[] = {
+	  [0] =      1, /* ETHTOOL_LINK_MODE_10baseT_Half_BIT */
+	  [1] =      1, /* ETHTOOL_LINK_MODE_10baseT_Full_BIT */
+	  [2] =      1, /* ETHTOOL_LINK_MODE_100baseT_Half_BIT */
+	  [3] =      1, /* ETHTOOL_LINK_MODE_100baseT_Full_BIT */
+	  [4] =      1, /* ETHTOOL_LINK_MODE_1000baseT_Half_BIT */
+	  [5] =      1, /* ETHTOOL_LINK_MODE_1000baseT_Full_BIT */
+	 [12] =      1, /* ETHTOOL_LINK_MODE_10000baseT_Full_BIT */
+	 [15] =      1, /* ETHTOOL_LINK_MODE_2500baseX_Full_BIT */
+	 [17] =      1, /* ETHTOOL_LINK_MODE_1000baseKX_Full_BIT */
+	 [18] =      4, /* ETHTOOL_LINK_MODE_10000baseKX4_Full_BIT */
+	 [19] =      1, /* ETHTOOL_LINK_MODE_10000baseKR_Full_BIT */
+	 [20] =      1, /* ETHTOOL_LINK_MODE_10000baseR_FEC_BIT */
+	 [21] =      2, /* ETHTOOL_LINK_MODE_20000baseMLD2_Full_BIT */
+	 [22] =      2, /* ETHTOOL_LINK_MODE_20000baseKR2_Full_BIT */
+	 [23] =      4, /* ETHTOOL_LINK_MODE_40000baseKR4_Full_BIT */
+	 [24] =      4, /* ETHTOOL_LINK_MODE_40000baseCR4_Full_BIT */
+	 [25] =      4, /* ETHTOOL_LINK_MODE_40000baseSR4_Full_BIT */
+	 [26] =      4, /* ETHTOOL_LINK_MODE_40000baseLR4_Full_BIT */
+	 [27] =      4, /* ETHTOOL_LINK_MODE_56000baseKR4_Full_BIT */
+	 [28] =      4, /* ETHTOOL_LINK_MODE_56000baseCR4_Full_BIT */
+	 [29] =      4, /* ETHTOOL_LINK_MODE_56000baseSR4_Full_BIT */
+	 [30] =      4, /* ETHTOOL_LINK_MODE_56000baseLR4_Full_BIT */
+	 [31] =      1, /* ETHTOOL_LINK_MODE_25000baseCR_Full_BIT */
+	 [32] =      1, /* ETHTOOL_LINK_MODE_25000baseKR_Full_BIT */
+	 [33] =      1, /* ETHTOOL_LINK_MODE_25000baseSR_Full_BIT */
+	 [34] =      2, /* ETHTOOL_LINK_MODE_50000baseCR2_Full_BIT */
+	 [35] =      2, /* ETHTOOL_LINK_MODE_50000baseKR2_Full_BIT */
+	 [36] =      4, /* ETHTOOL_LINK_MODE_100000baseKR4_Full_BIT */
+	 [37] =      4, /* ETHTOOL_LINK_MODE_100000baseSR4_Full_BIT */
+	 [38] =      4, /* ETHTOOL_LINK_MODE_100000baseCR4_Full_BIT */
+	 [39] =      4, /* ETHTOOL_LINK_MODE_100000baseLR4_ER4_Full_BIT */
+	 [40] =      2, /* ETHTOOL_LINK_MODE_50000baseSR2_Full_BIT */
+	 [41] =      1, /* ETHTOOL_LINK_MODE_1000baseX_Full_BIT */
+	 [42] =      1, /* ETHTOOL_LINK_MODE_10000baseCR_Full_BIT */
+	 [43] =      1, /* ETHTOOL_LINK_MODE_10000baseSR_Full_BIT */
+	 [44] =      1, /* ETHTOOL_LINK_MODE_10000baseLR_Full_BIT */
+	 [45] =      1, /* ETHTOOL_LINK_MODE_10000baseLRM_Full_BIT */
+	 [46] =      1, /* ETHTOOL_LINK_MODE_10000baseER_Full_BIT */
+	 [47] =      1, /* ETHTOOL_LINK_MODE_2500baseT_Full_BIT */
+	 [48] =      1, /* ETHTOOL_LINK_MODE_5000baseT_Full_BIT */
+	 [52] =      1, /* ETHTOOL_LINK_MODE_50000baseKR_Full_BIT */
+	 [53] =      1, /* ETHTOOL_LINK_MODE_50000baseSR_Full_BIT */
+	 [54] =      1, /* ETHTOOL_LINK_MODE_50000baseCR_Full_BIT */
+	 [55] =      1, /* ETHTOOL_LINK_MODE_50000baseLR_ER_FR_Full_BIT */
+	 [56] =      1, /* ETHTOOL_LINK_MODE_50000baseDR_Full_BIT */
+	 [57] =      2, /* ETHTOOL_LINK_MODE_100000baseKR2_Full_BIT */
+	 [58] =      2, /* ETHTOOL_LINK_MODE_100000baseSR2_Full_BIT */
+	 [59] =      2, /* ETHTOOL_LINK_MODE_100000baseCR2_Full_BIT */
+	 [60] =      2, /* ETHTOOL_LINK_MODE_100000baseLR2_ER2_FR2_Full_BIT */
+	 [61] =      2, /* ETHTOOL_LINK_MODE_100000baseDR2_Full_BIT */
+	 [62] =      4, /* ETHTOOL_LINK_MODE_200000baseKR4_Full_BIT */
+	 [63] =      4, /* ETHTOOL_LINK_MODE_200000baseSR4_Full_BIT */
+	 [64] =      4, /* ETHTOOL_LINK_MODE_200000baseLR4_ER4_FR4_Full_BIT */
+	 [65] =      4, /* ETHTOOL_LINK_MODE_200000baseDR4_Full_BIT */
+	 [66] =      4, /* ETHTOOL_LINK_MODE_200000baseCR4_Full_BIT */
+	 [67] =      1, /* ETHTOOL_LINK_MODE_100baseT1_Full_BIT */
+	 [68] =      1, /* ETHTOOL_LINK_MODE_1000baseT1_Full_BIT */
+	 [69] =      8, /* ETHTOOL_LINK_MODE_400000baseKR8_Full_BIT */
+	 [70] =      8, /* ETHTOOL_LINK_MODE_400000baseSR8_Full_BIT */
+	 [71] =      8, /* ETHTOOL_LINK_MODE_400000baseLR8_ER8_FR8_Full_BIT */
+	 [72] =      8, /* ETHTOOL_LINK_MODE_400000baseDR8_Full_BIT */
+	 [73] =      8, /* ETHTOOL_LINK_MODE_400000baseCR8_Full_BIT */
+	 [75] =      1, /* ETHTOOL_LINK_MODE_100000baseKR_Full_BIT */
+	 [76] =      1, /* ETHTOOL_LINK_MODE_100000baseSR_Full_BIT */
+	 [77] =      1, /* ETHTOOL_LINK_MODE_100000baseLR_ER_FR_Full_BIT */
+	 [78] =      1, /* ETHTOOL_LINK_MODE_100000baseCR_Full_BIT */
+	 [79] =      1, /* ETHTOOL_LINK_MODE_100000baseDR_Full_BIT */
+	 [80] =      2, /* ETHTOOL_LINK_MODE_200000baseKR2_Full_BIT */
+	 [81] =      2, /* ETHTOOL_LINK_MODE_200000baseSR2_Full_BIT */
+	 [82] =      2, /* ETHTOOL_LINK_MODE_200000baseLR2_ER2_FR2_Full_BIT */
+	 [83] =      2, /* ETHTOOL_LINK_MODE_200000baseDR2_Full_BIT */
+	 [84] =      2, /* ETHTOOL_LINK_MODE_200000baseCR2_Full_BIT */
+	 [85] =      4, /* ETHTOOL_LINK_MODE_400000baseKR4_Full_BIT */
+	 [86] =      4, /* ETHTOOL_LINK_MODE_400000baseSR4_Full_BIT */
+	 [87] =      4, /* ETHTOOL_LINK_MODE_400000baseLR4_ER4_FR4_Full_BIT */
+	 [88] =      4, /* ETHTOOL_LINK_MODE_400000baseDR4_Full_BIT */
+	 [89] =      4, /* ETHTOOL_LINK_MODE_400000baseCR4_Full_BIT */
+	 [90] =      1, /* ETHTOOL_LINK_MODE_100baseFX_Half_BIT */
+	 [91] =      1, /* ETHTOOL_LINK_MODE_100baseFX_Full_BIT */
+	 [92] =      1, /* ETHTOOL_LINK_MODE_10baseT1L_Full_BIT */
+	 [93] =      8, /* ETHTOOL_LINK_MODE_800000baseCR8_Full_BIT */
+	 [94] =      8, /* ETHTOOL_LINK_MODE_800000baseKR8_Full_BIT */
+	 [95] =      8, /* ETHTOOL_LINK_MODE_800000baseDR8_Full_BIT */
+	 [96] =      8, /* ETHTOOL_LINK_MODE_800000baseDR8_2_Full_BIT */
+	 [97] =      8, /* ETHTOOL_LINK_MODE_800000baseSR8_Full_BIT */
+	 [98] =      8, /* ETHTOOL_LINK_MODE_800000baseVR8_Full_BIT */
+	 [99] =      1, /* ETHTOOL_LINK_MODE_10baseT1S_Full_BIT */
+	[100] =      1, /* ETHTOOL_LINK_MODE_10baseT1S_Half_BIT */
+	[101] =      1, /* ETHTOOL_LINK_MODE_10baseT1S_P2MP_Half_BIT */
+};
+
 uint32_t
 rte_eth_link_speed_ethtool(enum ethtool_link_mode_bit_indices bit)
 {
 	uint32_t speed;
+	uint8_t lanes;
 	int duplex;
 
 	/* get mode from array */
@@ -131,7 +228,9 @@ rte_eth_link_speed_ethtool(enum ethtool_link_mode_bit_indices bit)
 			RTE_ETH_LINK_FULL_DUPLEX;
 	speed &= RTE_GENMASK32(31, 1);
 
-	return rte_eth_speed_bitflag(speed, duplex);
+	lanes = link_modes_lanes[bit];
+
+	return rte_eth_speed_bitflag(speed, lanes, duplex);
 }
 
 uint32_t
diff --git a/lib/ethdev/ethdev_private.h b/lib/ethdev/ethdev_private.h
index 0d36b9c30f..9092ab3a9e 100644
--- a/lib/ethdev/ethdev_private.h
+++ b/lib/ethdev/ethdev_private.h
@@ -79,4 +79,8 @@ void eth_dev_txq_release(struct rte_eth_dev *dev, uint16_t qid);
 int eth_dev_rx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues);
 int eth_dev_tx_queue_config(struct rte_eth_dev *dev, uint16_t nb_queues);
 
+/* versioned functions */
+uint32_t rte_eth_speed_bitflag_v24(uint32_t speed, int duplex);
+uint32_t rte_eth_speed_bitflag_v25(uint32_t speed, uint8_t lanes, int duplex);
+
 #endif /* _ETH_PRIVATE_H_ */
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 3bec87bfdb..5547b49cab 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -183,8 +183,10 @@ RTE_TRACE_POINT(
 
 RTE_TRACE_POINT(
 	rte_eth_trace_speed_bitflag,
-	RTE_TRACE_POINT_ARGS(uint32_t speed, int duplex, uint32_t ret),
+	RTE_TRACE_POINT_ARGS(uint32_t speed, uint8_t lanes, int duplex,
+			     uint32_t ret),
 	rte_trace_point_emit_u32(speed);
+	rte_trace_point_emit_u8(lanes);
 	rte_trace_point_emit_int(duplex);
 	rte_trace_point_emit_u32(ret);
 )
diff --git a/lib/ethdev/meson.build b/lib/ethdev/meson.build
index f1d2586591..2c9588d0b3 100644
--- a/lib/ethdev/meson.build
+++ b/lib/ethdev/meson.build
@@ -62,3 +62,5 @@ endif
 if get_option('buildtype').contains('debug')
     cflags += ['-DRTE_FLOW_DEBUG']
 endif
+
+use_function_versioning = true
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e..522f8796b1 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -26,6 +26,7 @@
 #include <rte_class.h>
 #include <rte_ether.h>
 #include <rte_telemetry.h>
+#include <rte_function_versioning.h>
 
 #include "rte_ethdev.h"
 #include "rte_ethdev_trace_fp.h"
@@ -991,63 +992,111 @@ rte_eth_dev_tx_queue_stop(uint16_t port_id, uint16_t tx_queue_id)
 	return ret;
 }
 
-uint32_t
-rte_eth_speed_bitflag(uint32_t speed, int duplex)
+uint32_t __vsym
+rte_eth_speed_bitflag_v25(uint32_t speed, uint8_t lanes, int duplex)
 {
-	uint32_t ret;
+	uint32_t ret = 0;
 
 	switch (speed) {
 	case RTE_ETH_SPEED_NUM_10M:
+		if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+			break;
 		ret = duplex ? RTE_ETH_LINK_SPEED_10M : RTE_ETH_LINK_SPEED_10M_HD;
 		break;
 	case RTE_ETH_SPEED_NUM_100M:
+		if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+			break;
 		ret = duplex ? RTE_ETH_LINK_SPEED_100M : RTE_ETH_LINK_SPEED_100M_HD;
 		break;
 	case RTE_ETH_SPEED_NUM_1G:
+		if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+			break;
 		ret = RTE_ETH_LINK_SPEED_1G;
 		break;
 	case RTE_ETH_SPEED_NUM_2_5G:
+		if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+			break;
 		ret = RTE_ETH_LINK_SPEED_2_5G;
 		break;
 	case RTE_ETH_SPEED_NUM_5G:
+		if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+			break;
 		ret = RTE_ETH_LINK_SPEED_5G;
 		break;
 	case RTE_ETH_SPEED_NUM_10G:
-		ret = RTE_ETH_LINK_SPEED_10G;
+		if (lanes == RTE_ETH_LANES_1)
+			ret = RTE_ETH_LINK_SPEED_10G;
+		if (lanes == RTE_ETH_LANES_4)
+			ret = RTE_ETH_LINK_SPEED_10G_4LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_20G:
-		ret = RTE_ETH_LINK_SPEED_20G;
+		if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_2)
+			break;
+		ret = RTE_ETH_LINK_SPEED_20G_2LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_25G:
+		if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_1)
+			break;
 		ret = RTE_ETH_LINK_SPEED_25G;
 		break;
 	case RTE_ETH_SPEED_NUM_40G:
-		ret = RTE_ETH_LINK_SPEED_40G;
+		if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_4)
+			break;
+		ret = RTE_ETH_LINK_SPEED_40G_4LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_50G:
-		ret = RTE_ETH_LINK_SPEED_50G;
+		if (lanes == RTE_ETH_LANES_1)
+			ret = RTE_ETH_LINK_SPEED_50G;
+		if (lanes == RTE_ETH_LANES_2)
+			ret = RTE_ETH_LINK_SPEED_50G_2LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_56G:
-		ret = RTE_ETH_LINK_SPEED_56G;
+		if (lanes != RTE_ETH_LANES_UNKNOWN && lanes != RTE_ETH_LANES_4)
+			break;
+		ret = RTE_ETH_LINK_SPEED_56G_4LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_100G:
-		ret = RTE_ETH_LINK_SPEED_100G;
+		if (lanes == RTE_ETH_LANES_1)
+			ret = RTE_ETH_LINK_SPEED_100G;
+		if (lanes == RTE_ETH_LANES_2)
+			ret = RTE_ETH_LINK_SPEED_100G_2LANES;
+		if (lanes == RTE_ETH_LANES_4)
+			ret = RTE_ETH_LINK_SPEED_100G_4LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_200G:
-		ret = RTE_ETH_LINK_SPEED_200G;
+		if (lanes == RTE_ETH_LANES_2)
+			ret = RTE_ETH_LINK_SPEED_200G_2LANES;
+		if (lanes == RTE_ETH_LANES_4)
+			ret = RTE_ETH_LINK_SPEED_200G_4LANES;
 		break;
 	case RTE_ETH_SPEED_NUM_400G:
+		if (lanes == RTE_ETH_LANES_4)
+			ret = RTE_ETH_LINK_SPEED_400G_4LANES;
+		if (lanes == RTE_ETH_LANES_8)
+			ret = RTE_ETH_LINK_SPEED_400G_8LANES;
-		ret = RTE_ETH_LINK_SPEED_400G;
 		break;
 	default:
 		ret = 0;
 	}
 
-	rte_eth_trace_speed_bitflag(speed, duplex, ret);
+	rte_eth_trace_speed_bitflag(speed, lanes, duplex, ret);
 
 	return ret;
 }
 
+uint32_t __vsym
+rte_eth_speed_bitflag_v24(uint32_t speed, int duplex)
+{
+	return rte_eth_speed_bitflag_v25(speed, RTE_ETH_LANES_UNKNOWN, duplex);
+}
+
+/* mark the v24 function as the older version, and v25 as the default version */
+VERSION_SYMBOL(rte_eth_speed_bitflag, _v24, 24);
+BIND_DEFAULT_SYMBOL(rte_eth_speed_bitflag, _v25, 25);
+MAP_STATIC_SYMBOL(uint32_t rte_eth_speed_bitflag(uint32_t speed, uint8_t lanes, int duplex),
+		  rte_eth_speed_bitflag_v25);
+
 const char *
 rte_eth_dev_rx_offload_name(uint64_t offload)
 {
@@ -1066,6 +1115,204 @@ rte_eth_dev_rx_offload_name(uint64_t offload)
 	return name;
 }
 
+int
+rte_eth_speed_capa_to_info(uint32_t link_speed,
+			   struct rte_eth_speed_capa_info *capa_info)
+{
+	const struct {
+		uint32_t speed_capa;
+		struct rte_eth_speed_capa_info capa_info;
+	} speed_capa_info_map[] = {
+		{
+			RTE_ETH_LINK_SPEED_10M_HD,
+				{
+					RTE_ETH_SPEED_NUM_10M,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_HALF_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_10M,
+				{
+					RTE_ETH_SPEED_NUM_10M,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_100M_HD,
+				{
+					RTE_ETH_SPEED_NUM_100M,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_HALF_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_100M,
+				{
+					RTE_ETH_SPEED_NUM_100M,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_1G,
+				{
+					RTE_ETH_SPEED_NUM_1G,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_2_5G,
+				{
+					RTE_ETH_SPEED_NUM_2_5G,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_5G,
+				{
+					RTE_ETH_SPEED_NUM_5G,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_10G,
+				{
+					RTE_ETH_SPEED_NUM_10G,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_20G_2LANES,
+				{
+					RTE_ETH_SPEED_NUM_20G,
+					RTE_ETH_LANES_2,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_25G,
+				{
+					RTE_ETH_SPEED_NUM_25G,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_40G_4LANES,
+				{
+					RTE_ETH_SPEED_NUM_40G,
+					RTE_ETH_LANES_4,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_50G,
+				{
+					RTE_ETH_SPEED_NUM_50G,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_56G_4LANES,
+				{
+					RTE_ETH_SPEED_NUM_56G,
+					RTE_ETH_LANES_4,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_100G,
+				{
+					RTE_ETH_SPEED_NUM_100G,
+					RTE_ETH_LANES_1,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_200G_4LANES,
+				{
+					RTE_ETH_SPEED_NUM_200G,
+					RTE_ETH_LANES_4,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_400G_4LANES,
+				{
+					RTE_ETH_SPEED_NUM_400G,
+					RTE_ETH_LANES_4,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_10G_4LANES,
+				{
+					RTE_ETH_SPEED_NUM_10G,
+					RTE_ETH_LANES_4,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_50G_2LANES,
+				{
+					RTE_ETH_SPEED_NUM_50G,
+					RTE_ETH_LANES_2,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_100G_2LANES,
+				{
+					RTE_ETH_SPEED_NUM_100G,
+					RTE_ETH_LANES_2,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_100G_4LANES,
+				{
+					RTE_ETH_SPEED_NUM_100G,
+					RTE_ETH_LANES_4,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_200G_2LANES,
+				{
+					RTE_ETH_SPEED_NUM_200G,
+					RTE_ETH_LANES_2,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		},
+		{
+			RTE_ETH_LINK_SPEED_400G_8LANES,
+				{
+					RTE_ETH_SPEED_NUM_400G,
+					RTE_ETH_LANES_8,
+					RTE_ETH_LINK_FULL_DUPLEX
+				}
+		}
+	};
+	uint32_t i;
+
+	for (i = 0; i < RTE_DIM(speed_capa_info_map); i++)
+		if (link_speed == speed_capa_info_map[i].speed_capa) {
+			capa_info->speed = speed_capa_info_map[i].capa_info.speed;
+			capa_info->lanes = speed_capa_info_map[i].capa_info.lanes;
+			capa_info->duplex = speed_capa_info_map[i].capa_info.duplex;
+			return 0;
+		}
+
+	return -EINVAL;
+}
+
 const char *
 rte_eth_dev_tx_offload_name(uint64_t offload)
 {
@@ -3111,8 +3358,9 @@ rte_eth_link_to_str(char *str, size_t len, const struct rte_eth_link *eth_link)
 	if (eth_link->link_status == RTE_ETH_LINK_DOWN)
 		ret = snprintf(str, len, "Link down");
 	else
-		ret = snprintf(str, len, "Link up at %s %s %s",
+		ret = snprintf(str, len, "Link up at %s %u lanes %s %s",
 			rte_eth_link_speed_to_str(eth_link->link_speed),
+			eth_link->link_lanes,
 			(eth_link->link_duplex == RTE_ETH_LINK_FULL_DUPLEX) ?
 			"FDX" : "HDX",
 			(eth_link->link_autoneg == RTE_ETH_LINK_AUTONEG) ?
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 147257d6a2..3383ad8495 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -288,24 +288,40 @@ struct rte_eth_stats {
 /**@{@name Link speed capabilities
  * Device supported speeds bitmap flags
  */
-#define RTE_ETH_LINK_SPEED_AUTONEG 0             /**< Autonegotiate (all speeds) */
-#define RTE_ETH_LINK_SPEED_FIXED   RTE_BIT32(0)  /**< Disable autoneg (fixed speed) */
-#define RTE_ETH_LINK_SPEED_10M_HD  RTE_BIT32(1)  /**<  10 Mbps half-duplex */
-#define RTE_ETH_LINK_SPEED_10M     RTE_BIT32(2)  /**<  10 Mbps full-duplex */
-#define RTE_ETH_LINK_SPEED_100M_HD RTE_BIT32(3)  /**< 100 Mbps half-duplex */
-#define RTE_ETH_LINK_SPEED_100M    RTE_BIT32(4)  /**< 100 Mbps full-duplex */
-#define RTE_ETH_LINK_SPEED_1G      RTE_BIT32(5)  /**<   1 Gbps */
-#define RTE_ETH_LINK_SPEED_2_5G    RTE_BIT32(6)  /**< 2.5 Gbps */
-#define RTE_ETH_LINK_SPEED_5G      RTE_BIT32(7)  /**<   5 Gbps */
-#define RTE_ETH_LINK_SPEED_10G     RTE_BIT32(8)  /**<  10 Gbps */
-#define RTE_ETH_LINK_SPEED_20G     RTE_BIT32(9)  /**<  20 Gbps */
-#define RTE_ETH_LINK_SPEED_25G     RTE_BIT32(10) /**<  25 Gbps */
-#define RTE_ETH_LINK_SPEED_40G     RTE_BIT32(11) /**<  40 Gbps */
-#define RTE_ETH_LINK_SPEED_50G     RTE_BIT32(12) /**<  50 Gbps */
-#define RTE_ETH_LINK_SPEED_56G     RTE_BIT32(13) /**<  56 Gbps */
-#define RTE_ETH_LINK_SPEED_100G    RTE_BIT32(14) /**< 100 Gbps */
-#define RTE_ETH_LINK_SPEED_200G    RTE_BIT32(15) /**< 200 Gbps */
-#define RTE_ETH_LINK_SPEED_400G    RTE_BIT32(16) /**< 400 Gbps */
+#define RTE_ETH_LINK_SPEED_AUTONEG        0             /**< Autonegotiate (all speeds) */
+#define RTE_ETH_LINK_SPEED_FIXED          RTE_BIT32(0)  /**< Disable autoneg (fixed speed) */
+#define RTE_ETH_LINK_SPEED_10M_HD         RTE_BIT32(1)  /**<  10 Mbps half-duplex */
+#define RTE_ETH_LINK_SPEED_10M            RTE_BIT32(2)  /**<  10 Mbps full-duplex */
+#define RTE_ETH_LINK_SPEED_100M_HD        RTE_BIT32(3)  /**< 100 Mbps half-duplex */
+#define RTE_ETH_LINK_SPEED_100M           RTE_BIT32(4)  /**< 100 Mbps full-duplex */
+#define RTE_ETH_LINK_SPEED_1G             RTE_BIT32(5)  /**<   1 Gbps */
+#define RTE_ETH_LINK_SPEED_2_5G           RTE_BIT32(6)  /**< 2.5 Gbps */
+#define RTE_ETH_LINK_SPEED_5G             RTE_BIT32(7)  /**<   5 Gbps */
+#define RTE_ETH_LINK_SPEED_10G            RTE_BIT32(8)  /**<  10 Gbps */
+#define RTE_ETH_LINK_SPEED_20G_2LANES     RTE_BIT32(9)  /**<  20 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_25G            RTE_BIT32(10) /**<  25 Gbps */
+#define RTE_ETH_LINK_SPEED_40G_4LANES     RTE_BIT32(11) /**<  40 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_50G            RTE_BIT32(12) /**<  50 Gbps */
+#define RTE_ETH_LINK_SPEED_56G_4LANES     RTE_BIT32(13) /**<  56 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_100G           RTE_BIT32(14) /**< 100 Gbps */
+#define RTE_ETH_LINK_SPEED_200G_4LANES    RTE_BIT32(15) /**< 200 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_400G_4LANES    RTE_BIT32(16) /**< 400 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_10G_4LANES     RTE_BIT32(17) /**<  10 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_50G_2LANES     RTE_BIT32(18) /**<  50 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_100G_2LANES    RTE_BIT32(19) /**< 100 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_100G_4LANES    RTE_BIT32(20) /**< 100 Gbps 4 lanes */
+#define RTE_ETH_LINK_SPEED_200G_2LANES    RTE_BIT32(21) /**< 200 Gbps 2 lanes */
+#define RTE_ETH_LINK_SPEED_400G_8LANES    RTE_BIT32(22) /**< 400 Gbps 8 lanes */
+/**@}*/
+
+/**@{@name Link speed capabilities
+ * Aliases with default lane counts, kept for compatibility with earlier versions
+ */
+#define RTE_ETH_LINK_SPEED_20G	RTE_ETH_LINK_SPEED_20G_2LANES
+#define RTE_ETH_LINK_SPEED_40G	RTE_ETH_LINK_SPEED_40G_4LANES
+#define RTE_ETH_LINK_SPEED_56G	RTE_ETH_LINK_SPEED_56G_4LANES
+#define RTE_ETH_LINK_SPEED_200G	RTE_ETH_LINK_SPEED_200G_4LANES
+#define RTE_ETH_LINK_SPEED_400G	RTE_ETH_LINK_SPEED_400G_4LANES
 /**@}*/
 
 /**@{@name Link speed
@@ -329,6 +345,25 @@ struct rte_eth_stats {
 #define RTE_ETH_SPEED_NUM_UNKNOWN UINT32_MAX /**< Unknown */
 /**@}*/
 
+/**@{@name Link lane number
+ * Ethernet lane number
+ */
+#define RTE_ETH_LANES_UNKNOWN    0 /**< Unknown */
+#define RTE_ETH_LANES_1          1 /**< 1 lane */
+#define RTE_ETH_LANES_2          2 /**< 2 lanes */
+#define RTE_ETH_LANES_4          4 /**< 4 lanes */
+#define RTE_ETH_LANES_8          8 /**< 8 lanes */
+/**@}*/
+
+/**
+ * A structure used to store information of link speed capability.
+ */
+struct rte_eth_speed_capa_info {
+	uint32_t speed;        /**< RTE_ETH_SPEED_NUM_ */
+	uint8_t  lanes;        /**< RTE_ETH_LANES_ */
+	uint8_t  duplex;       /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
+};
+
 /**
  * A structure used to retrieve link-level information of an Ethernet port.
  */
@@ -338,6 +373,7 @@ struct __rte_aligned(8) rte_eth_link { /**< aligned for atomic64 read/write */
 	uint16_t link_duplex  : 1;  /**< RTE_ETH_LINK_[HALF/FULL]_DUPLEX */
 	uint16_t link_autoneg : 1;  /**< RTE_ETH_LINK_[AUTONEG/FIXED] */
 	uint16_t link_status  : 1;  /**< RTE_ETH_LINK_[DOWN/UP] */
+	uint16_t link_lanes   : 8;  /**< RTE_ETH_LANES_ */
 };
 
 /**@{@name Link negotiation
@@ -1641,6 +1677,13 @@ struct rte_eth_conf {
 #define RTE_ETH_DEV_CAPA_FLOW_RULE_KEEP         RTE_BIT64(3)
 /** Device supports keeping shared flow objects across restart. */
 #define RTE_ETH_DEV_CAPA_FLOW_SHARED_OBJECT_KEEP RTE_BIT64(4)
+/**
+ * Device supports setting lanes. When the driver does not support setting
+ * lanes, the lane count in the speed capability it reports may be incorrect;
+ * for example, if the driver reports the 200G speed capability
+ * (@see RTE_ETH_LINK_SPEED_200G), the number of lanes used may be 2 or 4.
+ */
+#define RTE_ETH_DEV_CAPA_SETTING_LANES RTE_BIT64(5)
 /**@}*/
 
 /*
@@ -2301,12 +2344,30 @@ uint16_t rte_eth_dev_count_total(void);
  *
  * @param speed
  *   Numerical speed value in Mbps
+ * @param lanes
+ *   RTE_ETH_LANES_x
  * @param duplex
  *   RTE_ETH_LINK_[HALF/FULL]_DUPLEX (only for 10/100M speeds)
  * @return
  *   0 if the speed cannot be mapped
  */
-uint32_t rte_eth_speed_bitflag(uint32_t speed, int duplex);
+uint32_t rte_eth_speed_bitflag(uint32_t speed, uint8_t lanes, int duplex);
+
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice
+ *
+ * Convert a link speed bitmap flag to a link speed capability info structure
+ *
+ * @param link_speed
+ *   speed bitmap (RTE_ETH_LINK_SPEED_)
+ * @return
+ *   - (0) if successful.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_speed_capa_to_info(uint32_t link_speed,
+				struct rte_eth_speed_capa_info *capa_info);
 
 /**
  * Get RTE_ETH_RX_OFFLOAD_* flag name.
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 79f6f5293b..0e9c560920 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -169,6 +169,12 @@ DPDK_24 {
 	local: *;
 };
 
+DPDK_25 {
+	global:
+
+	rte_eth_speed_bitflag;
+} DPDK_24;
+
 EXPERIMENTAL {
 	global:
 
@@ -320,6 +326,7 @@ EXPERIMENTAL {
 	# added in 24.03
 	__rte_eth_trace_tx_queue_count;
 	rte_eth_find_rss_algo;
+	rte_eth_speed_capa_to_info;
 	rte_flow_async_update_resized;
 	rte_flow_calc_encap_hash;
 	rte_flow_template_table_resizable;
-- 
2.33.0


^ permalink raw reply	[relevance 5%]

* Community CI Meeting Minutes - March 7, 2024
@ 2024-03-11 22:36  2% Patrick Robb
  0 siblings, 0 replies; 200+ results
From: Patrick Robb @ 2024-03-11 22:36 UTC (permalink / raw)
  To: ci; +Cc: dev, dts

Sorry, I forgot to send these last week.

March 7, 2024

#####################################################################
Attendees
1. Patrick Robb
2. Ali Alnubani
3. Paul Szczepanek
4. David Marchand
5. Aaron Conole

#####################################################################
Minutes

=====================================================================
General Announcements
* IPSEC-MB requirement increase:
   * Aaron has some questions about whether this new requirement has
been properly documented - having a conversation with Ciara to that
end on the mailing list currently
   * Arm did publish an updated tag for this repo - Ciara has some
ideas for what may be going wrong and started a conversation on the
mailing list
      * Patrick Robb will forward this conversation to Paul
   * Building under OpenSSL is still supported
* Server Refresh:
   * See the mailing list for the most recent ideas, but we will be
putting various options in front of GB in the March meeting
* Idea is to try to support as many arches as possible (Intel x86,
AMD x86, Arm Grace-Grace)

=====================================================================
CI Status

---------------------------------------------------------------------
UNH-IOL Community Lab
* Hardware Refresh:
   * NVIDIA CX7:
      * Without writing out the whole background for the cx7 testing
on our NVIDIA DUT, we are being bandwidth capped by the server with
this performance testing, but this can be worked around by acquiring a
2nd CX7 NIC for the DUT server.
         * For one thing, this corresponds to the testing NVIDIA
publishes: https://fast.dpdk.org/doc/perf/DPDK_23_07_NVIDIA_NIC_performance_report.pdf
         * Patrick has asked whether NVIDIA can donate this NIC. We
can also go to the DPDK project asking for it, but they have already
provided two cx7 to the Community Lab, so it is not ideal.
         * Over email we have noted that the Broadwell CPU is old and
may not be adequate for higher bandwidth testing
   * QAT 8970 on Ampere server: has been dry run and is working
      * Requires a few changes in DTS which Patrick can submit once
David/Dharmik give approval (basically relates to loading vfio with
custom options for certain QAT devices only)
      * If there are no objections, UNH folks can write up the
automation scripts today and get the testing online today or next
week.
* Test Coverage changes:
   * OpenSSL driver test has been added to our unit testing jobs
   * Marvell mvneta build test has been added, per:
https://doc.dpdk.org/guides/nics/mvneta.html
* Debian 12 has been added to the CI template engine, and we’re
running testing from this now
   * Need to upstream this.
* Robin Jarry noted on Slack UNH has been sending out results to
test-reports mailing list without setting in-reply-to message-id for
the patchseries. Adam has resolved this.
   * Ferruh also notes that in looking at this he noticed duplicate
emails being sent by UNH, which we still need to resolve
* Cody at UNH has been making updates to testing on Windows:
   * Did modify the 2022 build test this week, moving it from the MSVC
preview compiler to the MSVC standard compiler (which with v17.9.2 has
now caught up to the build features previously only available in the
preview version)
   * Cody is also adding the Clang and Mingw64 compile jobs to the
2022 server (they are only on server 2019 right now) and also is
adding DPDK unit test/fast tests to the 2022 server.
* David Marchand noticed a bug with the create artifact script: After
failing to apply on the recommended branch and trying to fall back on
applying to main, it did not checkout to tip of main. Patrick will
look.
* A Bugzilla ticket was created noting that we need to add
23.11-staging to our CI

---------------------------------------------------------------------
Intel Lab
* None

---------------------------------------------------------------------
Github Actions
* Need to double-check the ipsec-mb requirement and how we generate ABI
symbols, and confirm that the right version is being pulled.
* Migration back to the original server this ran on (before it was
physically moved to another location) is in progress
   * Going to completely re-image/update the server
* Posted a series for adding Cirrus-CI to the robot monitoring
   * Comments are welcome on the mailing list
   * Need to add a Cirrus YAML config for the DPDK repo

---------------------------------------------------------------------
Loongarch Lab
* Zhoumin has stated on the mailing list that he can support the
email-based retest framework
   * Possible to store the commit hash when a series is submitted, and
recreate those artifacts as needed
   * Also can support re-apply on tip of branch X
   * There is an ongoing conversation on CI mailing list for this

=====================================================================
DTS Improvements & Test Development
24.07 Roadmap
* Ethdev testsuites:
   * Nicholas:
      * Jumboframes:
https://git.dpdk.org/tools/dts/tree/test_plans/jumboframes_test_plan.rst
      * Mac Filter:
https://git.dpdk.org/tools/dts/tree/test_plans/mac_filter_test_plan.rst
   * Prince:
      * Dynamic Queue:
https://git.dpdk.org/tools/dts/tree/test_plans/dynamic_queue_test_plan.rst
   * Need to vet the testsuites. It may be possible to add additional
testcases or refactor existing ones. We want to exercise the same
capabilities as the old testsuites, but make improvements where possible.
      * We should loop in ethdev maintainers and ask for their review
on the testcases
      * David sent an email a couple of years ago which priority-ranked
some ethdev capabilities and testsuites, and if we can find this email
we should use it.
         * https://inbox.dpdk.org/ci/CAJFAV8y8-LSh5vniZXR812ckKNa2ELEJVRKRzT53PVu2zO902w@mail.gmail.com/
* Configuration schema updates:
   * Nicholas:
      * Working on the Hugepages allocation first, then will do the
other config updates (ripping out some unneeded keys from the schema)
      * Will follow up with the ethdev testsuites
* API Docs generation:
   * Juraj: Needs review from Thomas (the Doxygen integration part),
may need to be addressed when Juraj gets back from vacation.
* Skip test cases based on testbed capabilities:
   * Juraj: RFC should be ready before Juraj leaves on vacation. 24.07
shouldn't be a problem.
   * RFC Patch:
https://patches.dpdk.org/project/dpdk/patch/20240301155416.96960-1-juraj.linkes@pantheon.tech/
      * The patch requires
https://patches.dpdk.org/project/dpdk/list/?series=31329
   * Bugzilla: https://bugs.dpdk.org/show_bug.cgi?id=1351
* Rename the execution section/stage:
   * Juraj: Juraj will work on this in 24.07 and submit a patch to
continue the discussion. The v1 patch will be ready for 24.07, but the
discussion/review could push the patch to 24.11.
   * Bugzilla: https://bugs.dpdk.org/show_bug.cgi?id=1355
* Add support for externally compiled DPDK:
   * Juraj: Juraj will start working on this in 24.07. There's a small
chance we'll get this in 24.07, but Juraj wants to target this for
24.11.
   * Bugzilla: https://bugs.dpdk.org/show_bug.cgi?id=1365
* Jeremy has a bugzilla ticket for refactoring how we handle scapy on
the TG (no more XMLRPC server), and will do this in 24.07
* We will finalize at next DTS meeting

=====================================================================
Any other business
* Next Meeting: March 21, 2024

^ permalink raw reply	[relevance 2%]

* RE: [PATCH v3 07/33] net/ena: restructure the llq policy setting process
  2024-03-08 17:24  3%   ` Ferruh Yigit
@ 2024-03-10 14:29  0%     ` Brandes, Shai
  2024-03-13 11:21  0%       ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Brandes, Shai @ 2024-03-10 14:29 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, March 8, 2024 7:24 PM
> To: Brandes, Shai <shaibran@amazon.com>
> Cc: dev@dpdk.org
> Subject: RE: [EXTERNAL] [PATCH v3 07/33] net/ena: restructure the llq policy
> setting process
> 
> CAUTION: This email originated from outside of the organization. Do not click
> links or open attachments unless you can confirm the sender and know the
> content is safe.
> 
> 
> 
> On 3/6/2024 12:24 PM, shaibran@amazon.com wrote:
> > From: Shai Brandes <shaibran@amazon.com>
> >
> > The driver will set the size of the LLQ header size according to the
> > recommendation from the device.
> > Replaced `enable_llq` and `large_llq_hdr` devargs with a new devarg
> > `llq_policy` that accepts the following values:
> > 0 - Disable LLQ.
> >     Use with extreme caution as it leads to a huge performance
> >     degradation on AWS instances from 6th generation onwards.
> > 1 - Accept device recommended LLQ policy (Default).
> >     Device can recommend normal or large LLQ policy.
> > 2 - Enforce normal LLQ policy.
> > 3 - Enforce large LLQ policy.
> >     Required for packets with header that exceed 96 bytes on
> >     AWS instances prior to 5th generation.
> >
> 
> We had similar discussion before, although dev_args is not part of the ABI, it
> is an user interface, and changes in the devargs will impact users directly.
> 
> What would you think to either keep backward compatilibity in the devargs
> (like not remove old one but add new one), or do this change in
> 24.11 release?
[Brandes, Shai] understood. 
The new devarg replaced the old ones and added option to enforce normal-llq mode which is critical for our release.
As you suggested, we will keep backward compatibility and add an additional devarg for enforcing normal-llq policy.
That way, we can easily replace it in future releases with a common devarg without the need to make major logic changes.
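As a rough illustration of how such a devarg is consumed (the `llq_policy` key and its values come from the patch; the device address and the invocation itself are hypothetical, not taken from the thread), a PMD devarg is a comma-separated key=value list appended to the device address:

```shell
# Hypothetical invocation (requires DPDK and an ENA device; not runnable here):
#   dpdk-testpmd -a 00:06.0,llq_policy=1 -- -i
# Minimal sketch of how such a comma-separated devarg string splits up:
devargs="00:06.0,llq_policy=1"
addr=${devargs%%,*}      # device address: 00:06.0
opts=${devargs#*,}       # driver option:  llq_policy=1
key=${opts%%=*}          # llq_policy
val=${opts#*=}           # 1
echo "$addr $key $val"
```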


^ permalink raw reply	[relevance 0%]

* [RFC v3] tap: do not duplicate fd's
  2024-03-08 18:54  2% [RFC] eal: increase the number of availble file descriptors for MP Stephen Hemminger
  2024-03-08 20:36  8% ` [RFC v2] eal: increase passed max multi-process file descriptors Stephen Hemminger
@ 2024-03-09 18:12  2% ` Stephen Hemminger
  2024-03-14 14:40  4% ` [RFC] eal: increase the number of availble file descriptors for MP David Marchand
  2 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-03-09 18:12 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The TAP device can use the same file descriptor for both rx and tx queues.
This allows up to 8 queues (versus 4) and reduces some resource consumption.
Also, reduce the TAP_MAX_QUEUES to what the multi-process restrictions
will allow.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v3 - This is more limited patch, only addresses tap device
     and only gets tap up from 4 to 8 queues.

     Still better to fix underlying EAL issue, but that requires
     overriding strict ABI rules.

 drivers/net/tap/meson.build   |   2 +-
 drivers/net/tap/rte_eth_tap.c | 197 +++++++++++++++-------------------
 drivers/net/tap/rte_eth_tap.h |   3 +-
 drivers/net/tap/tap_flow.c    |   3 +-
 drivers/net/tap/tap_intr.c    |  12 ++-
 5 files changed, 95 insertions(+), 122 deletions(-)

diff --git a/drivers/net/tap/meson.build b/drivers/net/tap/meson.build
index 5099ccdff11b..9cd124d53e23 100644
--- a/drivers/net/tap/meson.build
+++ b/drivers/net/tap/meson.build
@@ -16,7 +16,7 @@ sources = files(
 
 deps = ['bus_vdev', 'gso', 'hash']
 
-cflags += '-DTAP_MAX_QUEUES=16'
+cflags += '-DTAP_MAX_QUEUES=8'
 
 # input array for meson symbol search:
 # [ "MACRO to define if found", "header for the search",
diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 69d9da695bed..38a1b2d825f9 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -124,8 +124,7 @@ enum ioctl_mode {
 /* Message header to synchronize queues via IPC */
 struct ipc_queues {
 	char port_name[RTE_DEV_NAME_MAX_LEN];
-	int rxq_count;
-	int txq_count;
+	int q_count;
 	/*
 	 * The file descriptors are in the dedicated part
 	 * of the Unix message to be translated by the kernel.
@@ -446,7 +445,7 @@ pmd_rx_burst(void *queue, struct rte_mbuf **bufs, uint16_t nb_pkts)
 		uint16_t data_off = rte_pktmbuf_headroom(mbuf);
 		int len;
 
-		len = readv(process_private->rxq_fds[rxq->queue_id],
+		len = readv(process_private->fds[rxq->queue_id],
 			*rxq->iovecs,
 			1 + (rxq->rxmode->offloads & RTE_ETH_RX_OFFLOAD_SCATTER ?
 			     rxq->nb_rx_desc : 1));
@@ -643,7 +642,7 @@ tap_write_mbufs(struct tx_queue *txq, uint16_t num_mbufs,
 		}
 
 		/* copy the tx frame data */
-		n = writev(process_private->txq_fds[txq->queue_id], iovecs, k);
+		n = writev(process_private->fds[txq->queue_id], iovecs, k);
 		if (n <= 0)
 			return -1;
 
@@ -851,7 +850,6 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
 	struct rte_mp_msg msg;
 	struct ipc_queues *request_param = (struct ipc_queues *)msg.param;
 	int err;
-	int fd_iterator = 0;
 	struct pmd_process_private *process_private = dev->process_private;
 	int i;
 
@@ -859,16 +857,13 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
 	strlcpy(msg.name, TAP_MP_REQ_START_RXTX, sizeof(msg.name));
 	strlcpy(request_param->port_name, dev->data->name, sizeof(request_param->port_name));
 	msg.len_param = sizeof(*request_param);
-	for (i = 0; i < dev->data->nb_tx_queues; i++) {
-		msg.fds[fd_iterator++] = process_private->txq_fds[i];
-		msg.num_fds++;
-		request_param->txq_count++;
-	}
-	for (i = 0; i < dev->data->nb_rx_queues; i++) {
-		msg.fds[fd_iterator++] = process_private->rxq_fds[i];
-		msg.num_fds++;
-		request_param->rxq_count++;
-	}
+
+	/* rx and tx share file descriptors and nb_tx_queues == nb_rx_queues */
+	for (i = 0; i < dev->data->nb_rx_queues; i++)
+		msg.fds[i] = process_private->fds[i];
+
+	request_param->q_count = dev->data->nb_rx_queues;
+	msg.num_fds = dev->data->nb_rx_queues;
 
 	err = rte_mp_sendmsg(&msg);
 	if (err < 0) {
@@ -910,8 +905,6 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
 	struct rte_eth_dev *dev;
 	const struct ipc_queues *request_param =
 		(const struct ipc_queues *)request->param;
-	int fd_iterator;
-	int queue;
 	struct pmd_process_private *process_private;
 
 	dev = rte_eth_dev_get_by_name(request_param->port_name);
@@ -920,14 +913,13 @@ tap_mp_req_start_rxtx(const struct rte_mp_msg *request, __rte_unused const void
 			request_param->port_name);
 		return -1;
 	}
+
 	process_private = dev->process_private;
-	fd_iterator = 0;
-	TAP_LOG(DEBUG, "tap_attach rx_q:%d tx_q:%d\n", request_param->rxq_count,
-		request_param->txq_count);
-	for (queue = 0; queue < request_param->txq_count; queue++)
-		process_private->txq_fds[queue] = request->fds[fd_iterator++];
-	for (queue = 0; queue < request_param->rxq_count; queue++)
-		process_private->rxq_fds[queue] = request->fds[fd_iterator++];
+	TAP_LOG(DEBUG, "tap_attach q:%d\n", request_param->q_count);
+
+	for (int q = 0; q < request_param->q_count; q++)
+		process_private->fds[q] = request->fds[q];
+
 
 	return 0;
 }
@@ -1121,7 +1113,6 @@ tap_dev_close(struct rte_eth_dev *dev)
 	int i;
 	struct pmd_internals *internals = dev->data->dev_private;
 	struct pmd_process_private *process_private = dev->process_private;
-	struct rx_queue *rxq;
 
 	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
 		rte_free(dev->process_private);
@@ -1141,19 +1132,18 @@ tap_dev_close(struct rte_eth_dev *dev)
 	}
 
 	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
-		if (process_private->rxq_fds[i] != -1) {
-			rxq = &internals->rxq[i];
-			close(process_private->rxq_fds[i]);
-			process_private->rxq_fds[i] = -1;
-			tap_rxq_pool_free(rxq->pool);
-			rte_free(rxq->iovecs);
-			rxq->pool = NULL;
-			rxq->iovecs = NULL;
-		}
-		if (process_private->txq_fds[i] != -1) {
-			close(process_private->txq_fds[i]);
-			process_private->txq_fds[i] = -1;
-		}
+		struct rx_queue *rxq = &internals->rxq[i];
+
+		if (process_private->fds[i] == -1)
+			continue;
+
+		close(process_private->fds[i]);
+		process_private->fds[i] = -1;
+
+		tap_rxq_pool_free(rxq->pool);
+		rte_free(rxq->iovecs);
+		rxq->pool = NULL;
+		rxq->iovecs = NULL;
 	}
 
 	if (internals->remote_if_index) {
@@ -1198,6 +1188,15 @@ tap_dev_close(struct rte_eth_dev *dev)
 	return 0;
 }
 
+static void
+tap_queue_close(struct pmd_process_private *process_private, uint16_t qid)
+{
+	if (process_private->fds[qid] != -1) {
+		close(process_private->fds[qid]);
+		process_private->fds[qid] = -1;
+	}
+}
+
 static void
 tap_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 {
@@ -1206,15 +1205,16 @@ tap_rx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	if (!rxq)
 		return;
+
 	process_private = rte_eth_devices[rxq->in_port].process_private;
-	if (process_private->rxq_fds[rxq->queue_id] != -1) {
-		close(process_private->rxq_fds[rxq->queue_id]);
-		process_private->rxq_fds[rxq->queue_id] = -1;
-		tap_rxq_pool_free(rxq->pool);
-		rte_free(rxq->iovecs);
-		rxq->pool = NULL;
-		rxq->iovecs = NULL;
-	}
+
+	tap_rxq_pool_free(rxq->pool);
+	rte_free(rxq->iovecs);
+	rxq->pool = NULL;
+	rxq->iovecs = NULL;
+
+	if (dev->data->tx_queues[qid] == NULL)
+		tap_queue_close(process_private, qid);
 }
 
 static void
@@ -1225,12 +1225,10 @@ tap_tx_queue_release(struct rte_eth_dev *dev, uint16_t qid)
 
 	if (!txq)
 		return;
-	process_private = rte_eth_devices[txq->out_port].process_private;
 
-	if (process_private->txq_fds[txq->queue_id] != -1) {
-		close(process_private->txq_fds[txq->queue_id]);
-		process_private->txq_fds[txq->queue_id] = -1;
-	}
+	process_private = rte_eth_devices[txq->out_port].process_private;
+	if (dev->data->rx_queues[qid] == NULL)
+		tap_queue_close(process_private, qid);
 }
 
 static int
@@ -1482,52 +1480,34 @@ tap_setup_queue(struct rte_eth_dev *dev,
 		uint16_t qid,
 		int is_rx)
 {
-	int ret;
-	int *fd;
-	int *other_fd;
-	const char *dir;
+	int fd, ret;
 	struct pmd_internals *pmd = dev->data->dev_private;
 	struct pmd_process_private *process_private = dev->process_private;
 	struct rx_queue *rx = &internals->rxq[qid];
 	struct tx_queue *tx = &internals->txq[qid];
-	struct rte_gso_ctx *gso_ctx;
+	struct rte_gso_ctx *gso_ctx = NULL;
+	const char *dir = is_rx ? "rx" : "tx";
 
-	if (is_rx) {
-		fd = &process_private->rxq_fds[qid];
-		other_fd = &process_private->txq_fds[qid];
-		dir = "rx";
-		gso_ctx = NULL;
-	} else {
-		fd = &process_private->txq_fds[qid];
-		other_fd = &process_private->rxq_fds[qid];
-		dir = "tx";
+	if (is_rx)
 		gso_ctx = &tx->gso_ctx;
-	}
-	if (*fd != -1) {
+
+	fd = process_private->fds[qid];
+	if (fd != -1) {
 		/* fd for this queue already exists */
 		TAP_LOG(DEBUG, "%s: fd %d for %s queue qid %d exists",
-			pmd->name, *fd, dir, qid);
+			pmd->name, fd, dir, qid);
 		gso_ctx = NULL;
-	} else if (*other_fd != -1) {
-		/* Only other_fd exists. dup it */
-		*fd = dup(*other_fd);
-		if (*fd < 0) {
-			*fd = -1;
-			TAP_LOG(ERR, "%s: dup() failed.", pmd->name);
-			return -1;
-		}
-		TAP_LOG(DEBUG, "%s: dup fd %d for %s queue qid %d (%d)",
-			pmd->name, *other_fd, dir, qid, *fd);
 	} else {
-		/* Both RX and TX fds do not exist (equal -1). Create fd */
-		*fd = tun_alloc(pmd, 0, 0);
-		if (*fd < 0) {
-			*fd = -1; /* restore original value */
+		fd = tun_alloc(pmd, 0, 0);
+		if (fd < 0) {
 			TAP_LOG(ERR, "%s: tun_alloc() failed.", pmd->name);
 			return -1;
 		}
+
 		TAP_LOG(DEBUG, "%s: add %s queue for qid %d fd %d",
-			pmd->name, dir, qid, *fd);
+			pmd->name, dir, qid, fd);
+
+		process_private->fds[qid] = fd;
 	}
 
 	tx->mtu = &dev->data->mtu;
@@ -1540,7 +1520,7 @@ tap_setup_queue(struct rte_eth_dev *dev,
 
 	tx->type = pmd->type;
 
-	return *fd;
+	return fd;
 }
 
 static int
@@ -1620,7 +1600,7 @@ tap_rx_queue_setup(struct rte_eth_dev *dev,
 
 	TAP_LOG(DEBUG, "  RX TUNTAP device name %s, qid %d on fd %d",
 		internals->name, rx_queue_id,
-		process_private->rxq_fds[rx_queue_id]);
+		process_private->fds[rx_queue_id]);
 
 	return 0;
 
@@ -1664,7 +1644,7 @@ tap_tx_queue_setup(struct rte_eth_dev *dev,
 	TAP_LOG(DEBUG,
 		"  TX TUNTAP device name %s, qid %d on fd %d csum %s",
 		internals->name, tx_queue_id,
-		process_private->txq_fds[tx_queue_id],
+		process_private->fds[tx_queue_id],
 		txq->csum ? "on" : "off");
 
 	return 0;
@@ -2001,10 +1981,9 @@ eth_dev_tap_create(struct rte_vdev_device *vdev, const char *tap_name,
 	dev->intr_handle = pmd->intr_handle;
 
 	/* Presetup the fds to -1 as being not valid */
-	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++) {
-		process_private->rxq_fds[i] = -1;
-		process_private->txq_fds[i] = -1;
-	}
+	for (i = 0; i < RTE_PMD_TAP_MAX_QUEUES; i++)
+		process_private->fds[i] = -1;
+
 
 	if (pmd->type == ETH_TUNTAP_TYPE_TAP) {
 		if (rte_is_zero_ether_addr(mac_addr))
@@ -2332,7 +2311,6 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
 	struct ipc_queues *request_param = (struct ipc_queues *)request.param;
 	struct ipc_queues *reply_param;
 	struct pmd_process_private *process_private = dev->process_private;
-	int queue, fd_iterator;
 
 	/* Prepare the request */
 	memset(&request, 0, sizeof(request));
@@ -2352,18 +2330,17 @@ tap_mp_attach_queues(const char *port_name, struct rte_eth_dev *dev)
 	TAP_LOG(DEBUG, "Received IPC reply for %s", reply_param->port_name);
 
 	/* Attach the queues from received file descriptors */
-	if (reply_param->rxq_count + reply_param->txq_count != reply->num_fds) {
+	if (reply_param->q_count != reply->num_fds) {
 		TAP_LOG(ERR, "Unexpected number of fds received");
 		return -1;
 	}
 
-	dev->data->nb_rx_queues = reply_param->rxq_count;
-	dev->data->nb_tx_queues = reply_param->txq_count;
-	fd_iterator = 0;
-	for (queue = 0; queue < reply_param->rxq_count; queue++)
-		process_private->rxq_fds[queue] = reply->fds[fd_iterator++];
-	for (queue = 0; queue < reply_param->txq_count; queue++)
-		process_private->txq_fds[queue] = reply->fds[fd_iterator++];
+	dev->data->nb_rx_queues = reply_param->q_count;
+	dev->data->nb_tx_queues = reply_param->q_count;
+
+	for (int q = 0; q < reply_param->q_count; q++)
+		process_private->fds[q] = reply->fds[q];
+
 	free(reply);
 	return 0;
 }
@@ -2393,25 +2370,19 @@ tap_mp_sync_queues(const struct rte_mp_msg *request, const void *peer)
 
 	/* Fill file descriptors for all queues */
 	reply.num_fds = 0;
-	reply_param->rxq_count = 0;
-	if (dev->data->nb_rx_queues + dev->data->nb_tx_queues >
-			RTE_MP_MAX_FD_NUM){
-		TAP_LOG(ERR, "Number of rx/tx queues exceeds max number of fds");
+	reply_param->q_count = 0;
+
+	RTE_ASSERT(dev->data->nb_rx_queues == dev->data->nb_tx_queues);
+	if (dev->data->nb_rx_queues > RTE_MP_MAX_FD_NUM) {
+		TAP_LOG(ERR, "Number of rx/tx queues %u exceeds max number of fds %u",
+			dev->data->nb_rx_queues, RTE_MP_MAX_FD_NUM);
 		return -1;
 	}
 
 	for (queue = 0; queue < dev->data->nb_rx_queues; queue++) {
-		reply.fds[reply.num_fds++] = process_private->rxq_fds[queue];
-		reply_param->rxq_count++;
-	}
-	RTE_ASSERT(reply_param->rxq_count == dev->data->nb_rx_queues);
-
-	reply_param->txq_count = 0;
-	for (queue = 0; queue < dev->data->nb_tx_queues; queue++) {
-		reply.fds[reply.num_fds++] = process_private->txq_fds[queue];
-		reply_param->txq_count++;
+		reply.fds[reply.num_fds++] = process_private->fds[queue];
+		reply_param->q_count++;
 	}
-	RTE_ASSERT(reply_param->txq_count == dev->data->nb_tx_queues);
 
 	/* Send reply */
 	strlcpy(reply.name, request->name, sizeof(reply.name));
diff --git a/drivers/net/tap/rte_eth_tap.h b/drivers/net/tap/rte_eth_tap.h
index 5ac93f93e961..dc8201020b5f 100644
--- a/drivers/net/tap/rte_eth_tap.h
+++ b/drivers/net/tap/rte_eth_tap.h
@@ -96,8 +96,7 @@ struct pmd_internals {
 };
 
 struct pmd_process_private {
-	int rxq_fds[RTE_PMD_TAP_MAX_QUEUES];
-	int txq_fds[RTE_PMD_TAP_MAX_QUEUES];
+	int fds[RTE_PMD_TAP_MAX_QUEUES];
 };
 
 /* tap_intr.c */
diff --git a/drivers/net/tap/tap_flow.c b/drivers/net/tap/tap_flow.c
index fa50fe45d7b7..a78fd50cd494 100644
--- a/drivers/net/tap/tap_flow.c
+++ b/drivers/net/tap/tap_flow.c
@@ -1595,8 +1595,9 @@ tap_flow_isolate(struct rte_eth_dev *dev,
 	 * If netdevice is there, setup appropriate flow rules immediately.
 	 * Otherwise it will be set when bringing up the netdevice (tun_alloc).
 	 */
-	if (!process_private->rxq_fds[0])
+	if (process_private->fds[0] == -1)
 		return 0;
+
 	if (set) {
 		struct rte_flow *remote_flow;
 
diff --git a/drivers/net/tap/tap_intr.c b/drivers/net/tap/tap_intr.c
index a9097def1a32..bc953791635e 100644
--- a/drivers/net/tap/tap_intr.c
+++ b/drivers/net/tap/tap_intr.c
@@ -68,20 +68,22 @@ tap_rx_intr_vec_install(struct rte_eth_dev *dev)
 	}
 	for (i = 0; i < n; i++) {
 		struct rx_queue *rxq = pmd->dev->data->rx_queues[i];
+		int fd = process_private->fds[i];
 
 		/* Skip queues that cannot request interrupts. */
-		if (!rxq || process_private->rxq_fds[i] == -1) {
+		if (!rxq || fd == -1) {
 			/* Use invalid intr_vec[] index to disable entry. */
 			if (rte_intr_vec_list_index_set(intr_handle, i,
-			RTE_INTR_VEC_RXTX_OFFSET + RTE_MAX_RXTX_INTR_VEC_ID))
+						RTE_INTR_VEC_RXTX_OFFSET + RTE_MAX_RXTX_INTR_VEC_ID))
 				return -rte_errno;
 			continue;
 		}
+
 		if (rte_intr_vec_list_index_set(intr_handle, i,
-					RTE_INTR_VEC_RXTX_OFFSET + count))
+						RTE_INTR_VEC_RXTX_OFFSET + count))
 			return -rte_errno;
-		if (rte_intr_efds_index_set(intr_handle, count,
-						   process_private->rxq_fds[i]))
+
+		if (rte_intr_efds_index_set(intr_handle, count, fd))
 			return -rte_errno;
 		count++;
 	}
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* [RFC v2] eal: increase passed max multi-process file descriptors
  2024-03-08 18:54  2% [RFC] eal: increase the number of availble file descriptors for MP Stephen Hemminger
@ 2024-03-08 20:36  8% ` Stephen Hemminger
  2024-03-14 15:23  0%   ` Stephen Hemminger
  2024-03-09 18:12  2% ` [RFC v3] tap: do not duplicate fd's Stephen Hemminger
  2024-03-14 14:40  4% ` [RFC] eal: increase the number of availble file descriptors for MP David Marchand
  2 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-03-08 20:36 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Anatoly Burakov, Jianfeng Tan

Both the XDP and TAP devices are limited in the number of queues
because of limitations on the number of file descriptors that
are allowed. The original choice of 8 was too low; the allowed
maximum is 253 according to the unix(7) man page.
This may look like a serious ABI breakage but it is not.
It is simpler for everyone if the limit is increased rather than
building a parallel set of calls.

The case that matters is an older application registering MP support
with a newer version of EAL. In this case, since the old application
will always send the more compact structure (fewer possible fd's)
it is OK.

Request (for up to 8 fds) sent to EAL.
   - EAL only references up to num_fds.
   - The area past the old fd array is not accessed.

Reply callback:
   - EAL will pass pointer to the new (larger structure),
     the old callback will only look at the first part of
     the fd array (num_fds <= 8).

   - Since primary and secondary must both be from the same DPDK version
     there is no normal way that a reply with more fd's could be possible.
     The only case is the same as above, where application requested
     something that would break in old version and now succeeds.

The one possible incompatibility is that if application passed
a larger number of fd's (32?) and expected an error. Now it will
succeed and get passed through.

Fixes: bacaa2754017 ("eal: add channel for multi-process communication")
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
v2 - show the simpler way to address with some minor ABI issue

 doc/guides/rel_notes/release_24_03.rst | 4 ++++
 lib/eal/include/rte_eal.h              | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 932688ca4d82..1d33cfa15dfb 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -225,6 +225,10 @@ API Changes
 * ethdev: Renamed structure ``rte_flow_action_modify_data`` to be
   ``rte_flow_field_data`` for more generic usage.
 
+* eal: The maximum number of file descriptors allowed to be passed in
+  multi-process requests is increased from 8 to the maximum possible on
+  Linux unix domain sockets (253). This allows for more queues on XDP and
+  TAP devices.
 
 ABI Changes
 -----------
diff --git a/lib/eal/include/rte_eal.h b/lib/eal/include/rte_eal.h
index c2256f832e51..cd84fcdd1bdb 100644
--- a/lib/eal/include/rte_eal.h
+++ b/lib/eal/include/rte_eal.h
@@ -155,7 +155,7 @@ int rte_eal_primary_proc_alive(const char *config_file_path);
  */
 bool rte_mp_disable(void);
 
-#define RTE_MP_MAX_FD_NUM	8    /* The max amount of fds */
+#define RTE_MP_MAX_FD_NUM	253    /* The max amount of fds */
 #define RTE_MP_MAX_NAME_LEN	64   /* The max length of action name */
 #define RTE_MP_MAX_PARAM_LEN	256  /* The max length of param */
 struct rte_mp_msg {
-- 
2.43.0


^ permalink raw reply	[relevance 8%]

* [RFC] eal: increase the number of availble file descriptors for MP
@ 2024-03-08 18:54  2% Stephen Hemminger
  2024-03-08 20:36  8% ` [RFC v2] eal: increase passed max multi-process file descriptors Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 200+ results
From: Stephen Hemminger @ 2024-03-08 18:54 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger, Anatoly Burakov

The current limit of file descriptors is too low; it should have
been set to the maximum possible to send across a unix domain
socket.

This is an attempt to allow increasing it without breaking ABI.
But in the process it exposes what is broken about how symbol
versions are checked in check-symbol-maps.sh. That script is
broken in that it won't allow adding a backwards compatiable
version hook like this.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/eal/common/eal_common_proc.c | 118 ++++++++++++++++++++++++++-----
 lib/eal/common/meson.build       |   2 +
 lib/eal/include/rte_eal.h        |  13 +++-
 lib/eal/version.map              |   9 +++
 4 files changed, 125 insertions(+), 17 deletions(-)

diff --git a/lib/eal/common/eal_common_proc.c b/lib/eal/common/eal_common_proc.c
index d24093937c1d..c08113a8d9e0 100644
--- a/lib/eal/common/eal_common_proc.c
+++ b/lib/eal/common/eal_common_proc.c
@@ -27,6 +27,7 @@
 #include <rte_lcore.h>
 #include <rte_log.h>
 #include <rte_thread.h>
+#include <rte_function_versioning.h>
 
 #include "eal_memcfg.h"
 #include "eal_private.h"
@@ -796,7 +797,7 @@ mp_send(struct rte_mp_msg *msg, const char *peer, int type)
 }
 
 static int
-check_input(const struct rte_mp_msg *msg)
+check_input(const struct rte_mp_msg *msg, int max_fd)
 {
 	if (msg == NULL) {
 		EAL_LOG(ERR, "Msg cannot be NULL");
@@ -825,9 +826,8 @@ check_input(const struct rte_mp_msg *msg)
 		return -1;
 	}
 
-	if (msg->num_fds > RTE_MP_MAX_FD_NUM) {
-		EAL_LOG(ERR, "Cannot send more than %d FDs",
-			RTE_MP_MAX_FD_NUM);
+	if (msg->num_fds > max_fd) {
+		EAL_LOG(ERR, "Cannot send more than %d FDs", max_fd);
 		rte_errno = E2BIG;
 		return -1;
 	}
@@ -835,13 +835,13 @@ check_input(const struct rte_mp_msg *msg)
 	return 0;
 }
 
-int
-rte_mp_sendmsg(struct rte_mp_msg *msg)
+static int
+mp_sendmsg(struct rte_mp_msg *msg, int max_fd)
 {
 	const struct internal_config *internal_conf =
 		eal_get_internal_configuration();
 
-	if (check_input(msg) != 0)
+	if (check_input(msg, max_fd) != 0)
 		return -1;
 
 	if (internal_conf->no_shconf) {
@@ -854,6 +854,24 @@ rte_mp_sendmsg(struct rte_mp_msg *msg)
 	return mp_send(msg, NULL, MP_MSG);
 }
 
+int rte_mp_sendmsg_V23(struct rte_mp_old_msg *msg);
+int rte_mp_sendmsg_V24(struct rte_mp_msg *msg);
+
+int
+rte_mp_sendmsg_V23(struct rte_mp_old_msg *omsg)
+{
+	return mp_sendmsg((struct rte_mp_msg *)omsg, RTE_MP_MAX_OLD_FD_NUM);
+}
+VERSION_SYMBOL(rte_mp_sendmsg, _V23, 23);
+
+int
+rte_mp_sendmsg_V24(struct rte_mp_msg *msg)
+{
+	return mp_sendmsg(msg, RTE_MP_MAX_FD_NUM);
+}
+BIND_DEFAULT_SYMBOL(rte_mp_sendmsg, _V24, 24);
+MAP_STATIC_SYMBOL(int rte_mp_sendmsg(struct rte_mp_msg *msg), rte_mp_sendmsg_V24);
+
 static int
 mp_request_async(const char *dst, struct rte_mp_msg *req,
 		struct async_request_param *param, const struct timespec *ts)
@@ -988,9 +1006,9 @@ mp_request_sync(const char *dst, struct rte_mp_msg *req,
 	return 0;
 }
 
-int
-rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
-		const struct timespec *ts)
+static int
+__rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
+		      const struct timespec *ts, int max_fd)
 {
 	int dir_fd, ret = -1;
 	DIR *mp_dir;
@@ -1005,7 +1023,7 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
 	reply->nb_received = 0;
 	reply->msgs = NULL;
 
-	if (check_input(req) != 0)
+	if (check_input(req, max_fd) != 0)
 		goto end;
 
 	if (internal_conf->no_shconf) {
@@ -1085,9 +1103,34 @@ rte_mp_request_sync(struct rte_mp_msg *req, struct rte_mp_reply *reply,
 	return ret;
 }
 
+int rte_mp_request_sync_V23(struct rte_mp_old_msg *req, struct rte_mp_reply *reply,
+			    const struct timespec *ts);
+int rte_mp_request_sync_V24(struct rte_mp_msg *req, struct rte_mp_reply *reply,
+			    const struct timespec *ts);
+
+
+int
+rte_mp_request_sync_V23(struct rte_mp_old_msg *req, struct rte_mp_reply *reply,
+		    const struct timespec *ts)
+{
+	return __rte_mp_request_sync((struct rte_mp_msg *)req, reply, ts, RTE_MP_MAX_OLD_FD_NUM);
+}
+VERSION_SYMBOL(rte_mp_request_sync, _V23, 23);
+
 int
-rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
-		rte_mp_async_reply_t clb)
+rte_mp_request_sync_V24(struct rte_mp_msg *req, struct rte_mp_reply *reply,
+			const struct timespec *ts)
+{
+	return __rte_mp_request_sync(req, reply, ts, RTE_MP_MAX_FD_NUM);
+}
+BIND_DEFAULT_SYMBOL(rte_mp_request_sync, _V24, 24);
+MAP_STATIC_SYMBOL(int rte_mp_request_sync(struct rte_mp_msg *req, \
+					  struct rte_mp_reply *reply, \
+					  const struct timespec *ts), rte_mp_request_sync_V24);
+
+static int
+__rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
+		       rte_mp_async_reply_t clb, int max_fd)
 {
 	struct rte_mp_msg *copy;
 	struct pending_request *dummy;
@@ -1104,7 +1147,7 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 
 	EAL_LOG(DEBUG, "request: %s", req->name);
 
-	if (check_input(req) != 0)
+	if (check_input(req, max_fd) != 0)
 		return -1;
 
 	if (internal_conf->no_shconf) {
@@ -1237,14 +1280,38 @@ rte_mp_request_async(struct rte_mp_msg *req, const struct timespec *ts,
 	return -1;
 }
 
+int rte_mp_request_async_V23(struct rte_mp_old_msg *req, const struct timespec *ts,
+			     rte_mp_async_reply_t clb);
+int rte_mp_request_async_V24(struct rte_mp_msg *req, const struct timespec *ts,
+			     rte_mp_async_reply_t clb);
+
 int
-rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
+rte_mp_request_async_V23(struct rte_mp_old_msg *req, const struct timespec *ts,
+			 rte_mp_async_reply_t clb)
+{
+	return __rte_mp_request_async((struct rte_mp_msg *)req, ts, clb, RTE_MP_MAX_OLD_FD_NUM);
+}
+VERSION_SYMBOL(rte_mp_request_async, _V23, 23);
+
+int
+rte_mp_request_async_V24(struct rte_mp_msg *req, const struct timespec *ts,
+			 rte_mp_async_reply_t clb)
+{
+	return __rte_mp_request_async(req, ts, clb, RTE_MP_MAX_FD_NUM);
+}
+BIND_DEFAULT_SYMBOL(rte_mp_request_async, _V24, 24);
+MAP_STATIC_SYMBOL(int rte_mp_request_async(struct rte_mp_msg *req,	\
+					   const struct timespec *ts,	\
+					   rte_mp_async_reply_t clb), rte_mp_request_async_V24);
+
+static int
+mp_reply(struct rte_mp_msg *msg, const char *peer, int max_fd)
 {
 	EAL_LOG(DEBUG, "reply: %s", msg->name);
 	const struct internal_config *internal_conf =
 		eal_get_internal_configuration();
 
-	if (check_input(msg) != 0)
+	if (check_input(msg, max_fd) != 0)
 		return -1;
 
 	if (peer == NULL) {
@@ -1261,6 +1328,25 @@ rte_mp_reply(struct rte_mp_msg *msg, const char *peer)
 	return mp_send(msg, peer, MP_REP);
 }
 
+int rte_mp_reply_V23(struct rte_mp_old_msg *msg, const char *peer);
+int rte_mp_reply_V24(struct rte_mp_msg *msg, const char *peer);
+
+int
+rte_mp_reply_V23(struct rte_mp_old_msg *msg, const char *peer)
+{
+	return mp_reply((struct rte_mp_msg *)msg, peer, RTE_MP_MAX_OLD_FD_NUM);
+}
+VERSION_SYMBOL(rte_mp_reply, _V23, 23);
+
+int
+rte_mp_reply_V24(struct rte_mp_msg *msg, const char *peer)
+{
+	return mp_reply(msg, peer, RTE_MP_MAX_FD_NUM);
+}
+BIND_DEFAULT_SYMBOL(rte_mp_reply, _V24, 24);
+MAP_STATIC_SYMBOL(int rte_mp_reply(struct rte_mp_msg *msg, const char *peer), rte_mp_reply_V24);
+
+
 /* Internally, the status of the mp feature is represented as a three-state:
  * - "unknown" as long as no secondary process attached to a primary process
  *   and there was no call to rte_mp_disable yet,
diff --git a/lib/eal/common/meson.build b/lib/eal/common/meson.build
index 22a626ba6fc7..3faf0c20e798 100644
--- a/lib/eal/common/meson.build
+++ b/lib/eal/common/meson.build
@@ -3,6 +3,8 @@
 
 includes += include_directories('.')
 
+use_function_versioning = true
+
 cflags += [ '-DABI_VERSION="@0@"'.format(abi_version) ]
 
 sources += files(
diff --git a/lib/eal/include/rte_eal.h b/lib/eal/include/rte_eal.h
index c2256f832e51..0d0761c50409 100644
--- a/lib/eal/include/rte_eal.h
+++ b/lib/eal/include/rte_eal.h
@@ -155,9 +155,10 @@ int rte_eal_primary_proc_alive(const char *config_file_path);
  */
 bool rte_mp_disable(void);
 
-#define RTE_MP_MAX_FD_NUM	8    /* The max amount of fds */
+#define RTE_MP_MAX_FD_NUM	253  /* The max number of fds (SCM_MAX_FD) */
 #define RTE_MP_MAX_NAME_LEN	64   /* The max length of action name */
 #define RTE_MP_MAX_PARAM_LEN	256  /* The max length of param */
+
 struct rte_mp_msg {
 	char name[RTE_MP_MAX_NAME_LEN];
 	int len_param;
@@ -166,6 +167,16 @@ struct rte_mp_msg {
 	int fds[RTE_MP_MAX_FD_NUM];
 };
 
+/* Legacy API version */
+#define RTE_MP_MAX_OLD_FD_NUM	8    /* The legacy limit on fds */
+struct rte_mp_old_msg {
+	char name[RTE_MP_MAX_NAME_LEN];
+	int len_param;
+	int num_fds;
+	uint8_t param[RTE_MP_MAX_PARAM_LEN];
+	int fds[RTE_MP_MAX_OLD_FD_NUM];
+};
+
 struct rte_mp_reply {
 	int nb_sent;
 	int nb_received;
diff --git a/lib/eal/version.map b/lib/eal/version.map
index c06ceaad5097..264ff2d0818b 100644
--- a/lib/eal/version.map
+++ b/lib/eal/version.map
@@ -344,6 +344,15 @@ DPDK_24 {
 	local: *;
 };
 
+DPDK_23 {
+	global:
+
+	rte_mp_reply;
+	rte_mp_request_async;
+	rte_mp_request_sync;
+	rte_mp_sendmsg;
+} DPDK_24;
+
 EXPERIMENTAL {
 	global:
 
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v3 07/33] net/ena: restructure the llq policy setting process
  @ 2024-03-08 17:24  3%   ` Ferruh Yigit
  2024-03-10 14:29  0%     ` Brandes, Shai
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-03-08 17:24 UTC (permalink / raw)
  To: shaibran; +Cc: dev

On 3/6/2024 12:24 PM, shaibran@amazon.com wrote:
> From: Shai Brandes <shaibran@amazon.com>
> 
> The driver will set the size of the LLQ header size according to the
> recommendation from the device.
> Replaced `enable_llq` and `large_llq_hdr` devargs with
> a new devarg `llq_policy` that accepts the following values:
> 0 - Disable LLQ.
>     Use with extreme caution as it leads to a huge performance
>     degradation on AWS instances from 6th generation onwards.
> 1 - Accept device recommended LLQ policy (Default).
>     Device can recommend normal or large LLQ policy.
> 2 - Enforce normal LLQ policy.
> 3 - Enforce large LLQ policy.
>     Required for packets with header that exceed 96 bytes on
>     AWS instances prior to 5th generation.
> 

We had a similar discussion before: although devargs are not part of the
ABI, they are a user interface, and changes in the devargs will impact
users directly.

What would you think of either keeping backward compatibility in the
devargs (i.e. not removing the old ones but adding the new one), or doing
this change in the 24.11 release?


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v7 0/5] app/testpmd: support multiple process attach and detach port
  @ 2024-03-08 10:38  0%   ` lihuisong (C)
  2024-04-23 11:17  0%   ` lihuisong (C)
  1 sibling, 0 replies; 200+ results
From: lihuisong (C) @ 2024-03-08 10:38 UTC (permalink / raw)
  To: thomas, ferruh.yigit
  Cc: andrew.rybchenko, fengchengwen, liudongdong3, liuyonglong, dev

Hi Ferruh and Thomas,

Kindly ping for review.


On 2024/1/30 14:36, Huisong Li wrote:
> This patchset fix some bugs and support attaching and detaching port
> in primary and secondary.
>
> ---
>   -v7: fix conflicts
>   -v6: adjust rte_eth_dev_is_used position based on alphabetical order
>        in version.map
>   -v5: move 'ALLOCATED' state to the back of 'REMOVED' to avoid abi break.
>   -v4: fix a misspelling.
>   -v3:
>     #1 merge patch 1/6 and patch 2/6 into patch 1/5, and add modification
>        for other bus type.
>     #2 add a RTE_ETH_DEV_ALLOCATED state in rte_eth_dev_state to resolve
>        the probelm in patch 2/5.
>   -v2: resend due to CI unexplained failure.
>
> Huisong Li (5):
>    drivers/bus: restore driver assignment at front of probing
>    ethdev: fix skip valid port in probing callback
>    app/testpmd: check the validity of the port
>    app/testpmd: add attach and detach port for multiple process
>    app/testpmd: stop forwarding in new or destroy event
>
>   app/test-pmd/testpmd.c                   | 47 +++++++++++++++---------
>   app/test-pmd/testpmd.h                   |  1 -
>   drivers/bus/auxiliary/auxiliary_common.c |  9 ++++-
>   drivers/bus/dpaa/dpaa_bus.c              |  9 ++++-
>   drivers/bus/fslmc/fslmc_bus.c            |  8 +++-
>   drivers/bus/ifpga/ifpga_bus.c            | 12 ++++--
>   drivers/bus/pci/pci_common.c             |  9 ++++-
>   drivers/bus/vdev/vdev.c                  | 10 ++++-
>   drivers/bus/vmbus/vmbus_common.c         |  9 ++++-
>   drivers/net/bnxt/bnxt_ethdev.c           |  3 +-
>   drivers/net/bonding/bonding_testpmd.c    |  1 -
>   drivers/net/mlx5/mlx5.c                  |  2 +-
>   lib/ethdev/ethdev_driver.c               | 13 +++++--
>   lib/ethdev/ethdev_driver.h               | 12 ++++++
>   lib/ethdev/ethdev_pci.h                  |  2 +-
>   lib/ethdev/rte_class_eth.c               |  2 +-
>   lib/ethdev/rte_ethdev.c                  |  4 +-
>   lib/ethdev/rte_ethdev.h                  |  4 +-
>   lib/ethdev/version.map                   |  1 +
>   19 files changed, 114 insertions(+), 44 deletions(-)
>

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] net/tap: allow more that 4 queues
  2024-03-07 10:25  0%     ` Ferruh Yigit
@ 2024-03-07 16:53  0%       ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-03-07 16:53 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Thu, 7 Mar 2024 10:25:48 +0000
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> > I got 4 queues setup, but looks like they are trash in secondary.
> > Probably best to revert this and fix it by bumping RTE_MP_MAX_FD_NUM.
> > This is better, but does take some ABI issue handling.
> >  
> 
> We can increase RTE_MP_MAX_FD_NUM, but there will still be a limit.
> 
> Wouldn't it be possible to update this patch to support multiple
> 'rte_mp_sendmsg()' calls?
> 
> Also, we need to check that the number of fds is less than
> 'RTE_PMD_TAP_MAX_QUEUES' with multiple 'rte_mp_sendmsg()' call support.

The kernel allows up to 253 fds to be passed per message.
For tap that would limit it to 126 queues, because TAP duplicates the
fds for rx and tx, though that could be fixable.

Tap should have a static assert covering both the max queue count and
this limit.

Increasing RTE_MP_MAX_FD_NUM would also fix similar issues in af_xdp PMD
and when af_packet gets MP support.

^ permalink raw reply	[relevance 0%]

* Re: [EXTERNAL] Re: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional transfer
  2024-03-01 10:59  0%               ` Amit Prakash Shukla
@ 2024-03-07 13:41  0%                 ` fengchengwen
  0 siblings, 0 replies; 200+ results
From: fengchengwen @ 2024-03-07 13:41 UTC (permalink / raw)
  To: Amit Prakash Shukla, Cheng Jiang, Gowrishankar Muthukrishnan
  Cc: dev, Jerin Jacob, Anoob Joseph, Kevin Laatz, Bruce Richardson,
	Pavan Nikhilesh Bhagavatula

Hi Amit,

On 2024/3/1 18:59, Amit Prakash Shukla wrote:
> Hi Chengwen,
> 
> Please find my reply in-line.
> 
> Thanks,
> Amit Shukla
> 
>> Hi Amit,
>>
>> On 2024/3/1 16:31, Amit Prakash Shukla wrote:
>>> Hi Chengwen,
>>>
>>> If I'm not wrong, your concern was about config file additions and not
>>> about the test as such. If the config file is getting complicated and
>>> there are better alternatives, we can minimize the config file changes
>>> with this patch and just provide minimum functionality as required and
>>> leave it open for future changes. For now, I can document the existing
>>> behavior in documentation as "Constraints". Similar approach is
>>> followed in other application such as ipsec-secgw
>>> https://urldefense.proofpoint.com/v2/url?u=https-3A__doc.dpdk.org_guid
>>> es_sample-5Fapp-5Fug_ipsec-5Fsecgw.html-
>> 23constraints&d=DwICaQ&c=nKjWe
>>>
>> c2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR69VnJLdSnADun7zLaXG1p5Rs7pXihE
>> &m=UXlZ
>>>
>> 1CWj8uotMMmYQ4e7wtBXj4geBwcMUirlqFw0pZzSlOIIAVjWaPgcaXtni370&
>> s=haaehrX
>>> QSEG6EFRW8w2sHKUTU75aJX7ML8vM-0mJsAI&e=
>>
>> Yes, I prefer enable different test just by modify configuration file, and then
>> limit the number of entries at the same time.
>>
>> This commit is bi-direction transfer, it is fixed, maybe later we should test 3/4
>> for mem2dev while 1/4 for dev2mem.
> 
> Agreed. We will add this later after the base functionality is merged. I will send next version with constraints listed. Can I assume next version is good for merge?

I suggest doing it all at once in [1].

[1] https://patches.dpdk.org/project/dpdk/cover/cover.1709210551.git.gmuthukrishn@marvell.com/

Thanks

> 
>>
>> sometime we may need evaluate performance of one dma channel for
>> mem2mem, while another channel for mem2dev, we can't do this in current
>> implement (because vchan_dev is for all DMA channel).
> 
> We are okay with extending it later. As you said, we are still deciding how the configuration file should look like.
> 
>>
>> So I prefer restrict DMA non-mem2mem's config (include
>> dir/type/coreid/pfid/vfid/raddr) as the dma device's private configuration.
>>
>> Thanks
>>
>>>
>>> Constraints:
>>> 1. vchan_dev config will be same for all the configured DMA devices.
>>> 2. Alternate DMA device will do dev2mem and mem2dev implicitly.
>>> Example:
>>> xfer_mode=1
>>> vchan_dev=raddr=0x200000000,coreid=1,pfid=2,vfid=3
>>> lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3,
>>> lcore12@0000:00:04.4, lcore13@0000:00:04.5
>>>
>>> lcore10@0000:00:04.2, lcore12@0000:00:04.4 will do dev2mem and
>> lcore11@0000:00:04.3, lcore13@0000:00:04.5 will do mem2dev.
>>>
>>> Thanks,
>>> Amit Shukla
>>>
>>>> -----Original Message-----
>>>> From: fengchengwen <fengchengwen@huawei.com>
>>>> Sent: Friday, March 1, 2024 7:16 AM
>>>> To: Amit Prakash Shukla <amitprakashs@marvell.com>; Cheng Jiang
>>>> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
>>>> <gmuthukrishn@marvell.com>
>>>> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
>>>> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
>>>> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
>>>> <pbhagavatula@marvell.com>
>>>> Subject: [EXTERNAL] Re: [EXT] Re: [PATCH v2] app/dma-perf: support
>>>> bi- directional transfer
>>>>
>>>> Prioritize security for external emails: Confirm sender and content
>>>> safety before clicking links or opening attachments
>>>>
>>>> ---------------------------------------------------------------------
>>>> -
>>>> Hi Amit,
>>>>
>>>> I think this commit will complicated the test, plus futer we may add
>>>> more test (e.g. fill)
>>>>
>>>> I agree Bruce's advise in the [1], let also support "lcore_dma0/1/2",
>>>>
>>>> User could provide dma info by two way:
>>>> 1) lcore_dma=, which seperate each dma with ", ", but a maximum of a
>>>> certain number is allowed.
>>>> 2) lcore_dma0/1/2/..., each dma device take one line
>>>>
>>>> [1] https://urldefense.proofpoint.com/v2/url?u=https-
>>>> 3A__patchwork.dpdk.org_project_dpdk_patch_20231206112952.1588-
>>>> 2D1-2Dvipin.varghese-
>>>>
>> 40amd.com_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR6
>>>> 9VnJLdSnADun7zLaXG1p5Rs7pXihE&m=OwrvdPIi-
>>>>
>> TQ2UEH3cztfXDzT8YkOB099Pl1mfUzGaq9td0fEWrRBLQQBzAFkjQSU&s=kKin
>>>> YsGoNyTxuLEyPJ0LppT17Yq64CvFBtJMirGEISI&e=
>>>>
>>>> Thanks
>>>>
>>>> On 2024/2/29 22:03, Amit Prakash Shukla wrote:
>>>>> Hi Chengwen,
>>>>>
>>>>> I liked your suggestion and tried making changes, but encountered
>>>>> parsing
>>>> issue for CFG files with line greater than CFG_VALUE_LEN=256(current
>>>> value set).
>>>>>
>>>>> There is a discussion on the similar lines in another patch set:
>>>> https://urldefense.proofpoint.com/v2/url?u=https-
>>>> 3A__patchwork.dpdk.org_project_dpdk_patch_20231206112952.1588-
>>>> 2D1-2Dvipin.varghese-
>>>>
>> 40amd.com_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR6
>>>> 9VnJLdSnADun7zLaXG1p5Rs7pXihE&m=OwrvdPIi-
>>>>
>> TQ2UEH3cztfXDzT8YkOB099Pl1mfUzGaq9td0fEWrRBLQQBzAFkjQSU&s=kKin
>>>> YsGoNyTxuLEyPJ0LppT17Yq64CvFBtJMirGEISI&e= .
>>>>>
>>>>> I believe this patch can be taken as-is and we can come up with the
>>>>> solution
>>>> when we can increase the CFG_VALUE_LEN as changing CFG_VALUE_LEN in
>>>> this release is causing ABI breakage.
>>>>>
>>>>> Thanks,
>>>>> Amit Shukla
>>>>>
> 
> <snip>
> 

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] hash: make gfni stubs inline
  2024-03-05 17:53  3%   ` Tyler Retzlaff
  2024-03-05 18:44  0%     ` Stephen Hemminger
@ 2024-03-07 10:32  0%     ` David Marchand
  1 sibling, 0 replies; 200+ results
From: David Marchand @ 2024-03-07 10:32 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: Stephen Hemminger, dev, Yipeng Wang, Sameh Gobriel,
	Bruce Richardson, Vladimir Medvedkin

On Tue, Mar 5, 2024 at 6:53 PM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> On Tue, Mar 05, 2024 at 11:14:45AM +0100, David Marchand wrote:
> > On Mon, Mar 4, 2024 at 7:45 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:
> > >
> > > This reverts commit 07d836e5929d18ad6640ebae90dd2f81a2cafb71.
> > >
> > > Tyler found build issues with MSVC and the thash gfni stubs.
> > > The problem would be link errors from missing symbols.
> >
> > Trying to understand this link error.
> > Does it come from the fact that rte_thash_gfni/rte_thash_gfni_bulk
> > declarations are hidden under RTE_THASH_GFNI_DEFINED in
> > rte_thash_gfni.h?
> >
> > If so, why not always expose those two symbols unconditionnally and
> > link with the stub only when ! RTE_THASH_GFNI_DEFINED.
>
> So I don't have a lot of background of this lib.
>
> I think we understand that we can't conditionally expose symbols. That's
> what windows was picking up because it seems none of our CI's ever end
> up with RTE_THASH_GFNI_DEFINED but my local test system did and failed.
> (my experiments showed that Linux would complain too if it was defined)

I can't reproduce a problem when I build (gcc/clang) for a target that
has GFNI/AVX512F.
binutils ld seems to just ignore unknown symbols in the map.

With current main:
[dmarchan@dmarchan main]$ nm build/lib/librte_hash.so.24.1 | grep rte_thash_gfni
00000000000088b0 T rte_thash_gfni_supported
[dmarchan@dmarchan main]$ nm build-nogfni/lib/librte_hash.so.24.1 |
grep rte_thash_gfni
00000000000102c0 T rte_thash_gfni
00000000000102d0 T rte_thash_gfni_bulk
000000000000294e t rte_thash_gfni_bulk.cold
0000000000002918 t rte_thash_gfni.cold
000000000000d3c0 T rte_thash_gfni_supported


>
> If we always expose the symbols then as you point out we have to
> conditionally link with the stub otherwise the inline (non-stub) will be
> duplicate and build / link will fail.
>
> I guess the part I don't understand with your suggestion is how we would
> conditionally link with just the stub? We have to link with rte_hash to
> get the rest of hash and the stub. I've probably missed something here.

No, we can't; Stephen's suggestion is a full solution.

>
> Since we never had a release exposing the new symbols introduced by
> Stephen in question my suggestion was that we just revert for 24.03 so
> we don't end up with an ABI break later if we choose to solve the
> problem without exports.
>
> I don't know what else to do, but I think we need to decide for 24.03.

I am fully aware that we must fix this for 24.03.

I would like to be sure Stephen's fix (see v3) works for you, so have a
look because I am not able to reproduce an issue and validate the fix
myself.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* Re: [PATCH] net/tap: allow more that 4 queues
  2024-03-06 20:21  3%   ` Stephen Hemminger
@ 2024-03-07 10:25  0%     ` Ferruh Yigit
  2024-03-07 16:53  0%       ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-03-07 10:25 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev

On 3/6/2024 8:21 PM, Stephen Hemminger wrote:
> On Wed, 6 Mar 2024 16:14:51 +0000
> Ferruh Yigit <ferruh.yigit@amd.com> wrote:
> 
>> On 2/29/2024 5:56 PM, Stephen Hemminger wrote:
>>> The tap device needs to exchange file descriptors for tx and rx.
>>> But the EAL MP layer has limit of 8 file descriptors per message.
>>> The ideal resolution would be to increase the number of file
>>> descriptors allowed for rte_mp_sendmsg(), but this would break
>>> the ABI. Workaround the constraint by breaking into multiple messages.
>>>
>>> Do not hide errors about MP message failures.
>>>
>>> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
>>> ---
>>>  drivers/net/tap/rte_eth_tap.c | 40 +++++++++++++++++++++++++++++------
>>>  1 file changed, 33 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
>>> index 69d9da695bed..df18c328f498 100644
>>> --- a/drivers/net/tap/rte_eth_tap.c
>>> +++ b/drivers/net/tap/rte_eth_tap.c
>>> @@ -863,21 +863,44 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
>>>  		msg.fds[fd_iterator++] = process_private->txq_fds[i];
>>>  		msg.num_fds++;
>>>  		request_param->txq_count++;
>>> +
>>> +		/* Need to break request into chunks */
>>> +		if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
>>> +			err = rte_mp_sendmsg(&msg);
>>> +			if (err < 0)
>>> +				goto fail;
>>> +
>>> +			fd_iterator = 0;
>>> +			msg.num_fds = 0;
>>> +			request_param->txq_count = 0;
>>> +		}
>>>  	}
>>>  	for (i = 0; i < dev->data->nb_rx_queues; i++) {
>>>  		msg.fds[fd_iterator++] = process_private->rxq_fds[i];
>>>  		msg.num_fds++;
>>>  		request_param->rxq_count++;
>>> +
>>> +		if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
>>> +			err = rte_mp_sendmsg(&msg);
>>> +			if (err < 0)
>>> +				goto fail;
>>> +
>>> +			fd_iterator = 0;
>>> +			msg.num_fds = 0;
>>> +			request_param->rxq_count = 0;
>>> +		}
>>>  	}  
>>
>> Hi Stephen,
>>
>> Were you able to verify with more than 4 queues?
>>
>> As far as I can see, in the secondary counterpart of the
>> 'rte_mp_sendmsg()', each time secondary index starts from 0, and
>> subsequent calls overwrites the fds in secondary.
>> So practically still only 4 queues works.
> 
> I got 4 queues setup, but looks like they are trash in secondary.
> Probably best to revert this and fix it by bumping RTE_MP_MAX_FD_NUM.
> This is better, but does take some ABI issue handling.
>

We can increase RTE_MP_MAX_FD_NUM, but there will still be a limit.

Wouldn't it be possible to update this patch to support multiple
'rte_mp_sendmsg()' calls?

Also, we need to check that the number of fds is less than
'RTE_PMD_TAP_MAX_QUEUES' with multiple 'rte_mp_sendmsg()' call support.


^ permalink raw reply	[relevance 0%]

* [PATCH v5 1/7] ethdev: support report register names and filter
  @ 2024-03-07  3:02  8%   ` Jie Hai
  0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2024-03-07  3:02 UTC (permalink / raw)
  To: dev, Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: lihuisong, fengchengwen, haijie1

This patch adds "filter" and "names" fields to "rte_dev_reg_info"
structure. Names of registers in data fields can be reported and
the registers can be filtered by their names.

The new API rte_eth_dev_get_reg_info_ext() is added to support
reporting names and filtering by names, while the original API
rte_eth_dev_get_reg_info() does not use the names and filter fields.
A local variable is used in rte_eth_dev_get_reg_info() for
compatibility. If the driver does not report the names, they are set
to "offset_XXX".

Signed-off-by: Jie Hai <haijie1@huawei.com>
---
 doc/guides/rel_notes/release_24_03.rst |  9 ++++++
 lib/ethdev/rte_dev_info.h              | 11 ++++++++
 lib/ethdev/rte_ethdev.c                | 38 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 29 ++++++++++++++++++++
 lib/ethdev/version.map                 |  1 +
 5 files changed, 88 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 78590c047b2e..e491579ca984 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -161,6 +161,12 @@ New Features
   * Added power-saving during polling within the ``rte_event_dequeue_burst()`` API.
   * Added support for DMA adapter.
 
+* **Added support for dumping registers with names and filter.**
+
+  * Added a new API function ``rte_eth_dev_get_reg_info_ext()`` to filter
+    the registers by their names and get register information (names,
+    values and other attributes).
+
 
 Removed Items
 -------------
@@ -228,6 +234,9 @@ ABI Changes
 
 * No ABI change that would break compatibility with 23.11.
 
+* ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
+  structure for reporting names of registers and filtering them by names.
+
 
 Known Issues
 ------------
diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
index 67cf0ae52668..0badb92432ae 100644
--- a/lib/ethdev/rte_dev_info.h
+++ b/lib/ethdev/rte_dev_info.h
@@ -11,6 +11,11 @@ extern "C" {
 
 #include <stdint.h>
 
+#define RTE_ETH_REG_NAME_SIZE 64
+struct rte_eth_reg_name {
+	char name[RTE_ETH_REG_NAME_SIZE];
+};
+
 /*
  * Placeholder for accessing device registers
  */
@@ -20,6 +25,12 @@ struct rte_dev_reg_info {
 	uint32_t length; /**< Number of registers to fetch */
 	uint32_t width; /**< Size of device register */
 	uint32_t version; /**< Device version */
+	/**
+	 * Filter for target subset of registers.
+	 * This field could affect register selection for data/length/names.
+	 */
+	const char *filter;
+	struct rte_eth_reg_name *names; /**< Registers name saver */
 };
 
 /*
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e80..82d228790692 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6388,8 +6388,37 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
 
 int
 rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
+{
+	struct rte_dev_reg_info reg_info = { 0 };
+	int ret;
+
+	if (info == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Cannot get ethdev port %u register info to NULL",
+			port_id);
+		return -EINVAL;
+	}
+
+	reg_info.length = info->length;
+	reg_info.data = info->data;
+
+	ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
+	if (ret != 0)
+		return ret;
+
+	info->length = reg_info.length;
+	info->width = reg_info.width;
+	info->version = reg_info.version;
+	info->offset = reg_info.offset;
+
+	return 0;
+}
+
+int
+rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
 {
 	struct rte_eth_dev *dev;
+	uint32_t i;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
@@ -6402,12 +6431,21 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
 		return -EINVAL;
 	}
 
+	if (info->names != NULL && info->length != 0)
+		memset(info->names, 0,
+			sizeof(struct rte_eth_reg_name) * info->length);
+
 	if (*dev->dev_ops->get_reg == NULL)
 		return -ENOTSUP;
 	ret = eth_err(port_id, (*dev->dev_ops->get_reg)(dev, info));
 
 	rte_ethdev_trace_get_reg_info(port_id, info, ret);
 
+	/* Report the default names if drivers not report. */
+	if (info->names != NULL && strlen(info->names[0].name) == 0)
+		for (i = 0; i < info->length; i++)
+			snprintf(info->names[i].name, RTE_ETH_REG_NAME_SIZE,
+				"offset_%u", info->offset + i * info->width);
 	return ret;
 }
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index ed27360447a3..cd95a0d51038 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -5066,6 +5066,35 @@ __rte_experimental
 int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
 		struct rte_power_monitor_cond *pmc);
 
+/**
+ * Retrieve the filtered device registers (values and names) and
+ * register attributes (number of registers and register size)
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param info
+ *   Pointer to rte_dev_reg_info structure to fill in.
+ *   - If info->filter is NULL, return info for all registers (seen as filter
+ *     none).
+ *   - If info->filter is not NULL, return error if the driver does not support
+ *     names or filter.
+ *   - If info->data is NULL, the function fills in the width and length fields.
+ *   - If info->data is not NULL, ethdev considers there is enough space to
+ *     store the registers, and the values of registers whose name contains the
+ *     filter string are put into the buffer pointed at by info->data.
+ *   - If info->names is not NULL, drivers should fill it or the ethdev fills it
+ *     with default names.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - others depends on the specific operations implementation.
+ */
+__rte_experimental
+int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
+
 /**
  * Retrieve device registers and register attributes (number of registers and
  * register size)
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 79f6f5293b5c..e5ec2a2a9741 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -319,6 +319,7 @@ EXPERIMENTAL {
 
 	# added in 24.03
 	__rte_eth_trace_tx_queue_count;
+	rte_eth_dev_get_reg_info_ext;
 	rte_eth_find_rss_algo;
 	rte_flow_async_update_resized;
 	rte_flow_calc_encap_hash;
-- 
2.30.0


^ permalink raw reply	[relevance 8%]

* Re: [PATCH] net/tap: allow more that 4 queues
  2024-03-06 16:14  0% ` Ferruh Yigit
@ 2024-03-06 20:21  3%   ` Stephen Hemminger
  2024-03-07 10:25  0%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-03-06 20:21 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev

On Wed, 6 Mar 2024 16:14:51 +0000
Ferruh Yigit <ferruh.yigit@amd.com> wrote:

> On 2/29/2024 5:56 PM, Stephen Hemminger wrote:
> > The tap device needs to exchange file descriptors for tx and rx.
> > But the EAL MP layer has limit of 8 file descriptors per message.
> > The ideal resolution would be to increase the number of file
> > descriptors allowed for rte_mp_sendmsg(), but this would break
> > the ABI. Workaround the constraint by breaking into multiple messages.
> > 
> > Do not hide errors about MP message failures.
> > 
> > Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> > ---
> >  drivers/net/tap/rte_eth_tap.c | 40 +++++++++++++++++++++++++++++------
> >  1 file changed, 33 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> > index 69d9da695bed..df18c328f498 100644
> > --- a/drivers/net/tap/rte_eth_tap.c
> > +++ b/drivers/net/tap/rte_eth_tap.c
> > @@ -863,21 +863,44 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
> >  		msg.fds[fd_iterator++] = process_private->txq_fds[i];
> >  		msg.num_fds++;
> >  		request_param->txq_count++;
> > +
> > +		/* Need to break request into chunks */
> > +		if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
> > +			err = rte_mp_sendmsg(&msg);
> > +			if (err < 0)
> > +				goto fail;
> > +
> > +			fd_iterator = 0;
> > +			msg.num_fds = 0;
> > +			request_param->txq_count = 0;
> > +		}
> >  	}
> >  	for (i = 0; i < dev->data->nb_rx_queues; i++) {
> >  		msg.fds[fd_iterator++] = process_private->rxq_fds[i];
> >  		msg.num_fds++;
> >  		request_param->rxq_count++;
> > +
> > +		if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
> > +			err = rte_mp_sendmsg(&msg);
> > +			if (err < 0)
> > +				goto fail;
> > +
> > +			fd_iterator = 0;
> > +			msg.num_fds = 0;
> > +			request_param->rxq_count = 0;
> > +		}
> >  	}  
> 
> Hi Stephen,
> 
> Were you able to verify with more than 4 queues?
> 
> As far as I can see, in the secondary counterpart of the
> 'rte_mp_sendmsg()', each time secondary index starts from 0, and
> subsequent calls overwrites the fds in secondary.
> So practically still only 4 queues works.

I got 4 queues setup, but looks like they are trash in secondary.
Probably best to revert this and fix it by bumping RTE_MP_MAX_FD_NUM.
This is better, but does take some ABI issue handling.

^ permalink raw reply	[relevance 3%]

* [PATCH v5 4/6] pipeline: replace zero length array with flex array
  @ 2024-03-06 20:13  4%   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-06 20:13 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Cristian Dumitrescu, Honnappa Nagarahalli,
	Sameh Gobriel, Vladimir Medvedkin, Yipeng Wang, mb, fengchengwen,
	Tyler Retzlaff

Zero length arrays are GNU extension. Replace with
standard flex array.

Add a temporary suppression for rte_pipeline_table_entry
libabigail bug:

Bugzilla ID: https://sourceware.org/bugzilla/show_bug.cgi?id=31377

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/libabigail.abignore      | 2 ++
 lib/pipeline/rte_pipeline.h       | 2 +-
 lib/pipeline/rte_port_in_action.c | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 25c73a5..5292b63 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -33,6 +33,8 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+[suppress_type]
+	name = rte_pipeline_table_entry
 
 [suppress_type]
 	name = rte_rcu_qsbr
diff --git a/lib/pipeline/rte_pipeline.h b/lib/pipeline/rte_pipeline.h
index ec51b9b..0c7994b 100644
--- a/lib/pipeline/rte_pipeline.h
+++ b/lib/pipeline/rte_pipeline.h
@@ -220,7 +220,7 @@ struct rte_pipeline_table_entry {
 		uint32_t table_id;
 	};
 	/** Start of table entry area for user defined actions and meta-data */
-	__extension__ uint8_t action_data[0];
+	uint8_t action_data[];
 };
 
 /**
diff --git a/lib/pipeline/rte_port_in_action.c b/lib/pipeline/rte_port_in_action.c
index bbacaff..4127bd2 100644
--- a/lib/pipeline/rte_port_in_action.c
+++ b/lib/pipeline/rte_port_in_action.c
@@ -283,7 +283,7 @@ struct rte_port_in_action_profile *
 struct rte_port_in_action {
 	struct ap_config cfg;
 	struct ap_data data;
-	alignas(RTE_CACHE_LINE_SIZE) uint8_t memory[0];
+	alignas(RTE_CACHE_LINE_SIZE) uint8_t memory[];
 };
 
 static __rte_always_inline void *
-- 
1.8.3.1


^ permalink raw reply	[relevance 4%]

* Re: [PATCH] net/tap: allow more that 4 queues
  2024-02-29 17:56  3% [PATCH] net/tap: allow more that 4 queues Stephen Hemminger
@ 2024-03-06 16:14  0% ` Ferruh Yigit
  2024-03-06 20:21  3%   ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-03-06 16:14 UTC (permalink / raw)
  To: Stephen Hemminger, dev

On 2/29/2024 5:56 PM, Stephen Hemminger wrote:
> The tap device needs to exchange file descriptors for tx and rx.
> But the EAL MP layer has a limit of 8 file descriptors per message.
> The ideal resolution would be to increase the number of file
> descriptors allowed for rte_mp_sendmsg(), but this would break
> the ABI. Work around the constraint by breaking the request into multiple messages.
> 
> Do not hide errors about MP message failures.
> 
> Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
> ---
>  drivers/net/tap/rte_eth_tap.c | 40 +++++++++++++++++++++++++++++------
>  1 file changed, 33 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
> index 69d9da695bed..df18c328f498 100644
> --- a/drivers/net/tap/rte_eth_tap.c
> +++ b/drivers/net/tap/rte_eth_tap.c
> @@ -863,21 +863,44 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
>  		msg.fds[fd_iterator++] = process_private->txq_fds[i];
>  		msg.num_fds++;
>  		request_param->txq_count++;
> +
> +		/* Need to break request into chunks */
> +		if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
> +			err = rte_mp_sendmsg(&msg);
> +			if (err < 0)
> +				goto fail;
> +
> +			fd_iterator = 0;
> +			msg.num_fds = 0;
> +			request_param->txq_count = 0;
> +		}
>  	}
>  	for (i = 0; i < dev->data->nb_rx_queues; i++) {
>  		msg.fds[fd_iterator++] = process_private->rxq_fds[i];
>  		msg.num_fds++;
>  		request_param->rxq_count++;
> +
> +		if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
> +			err = rte_mp_sendmsg(&msg);
> +			if (err < 0)
> +				goto fail;
> +
> +			fd_iterator = 0;
> +			msg.num_fds = 0;
> +			request_param->rxq_count = 0;
> +		}
>  	}

Hi Stephen,

Were you able to verify with more than 4 queues?

As far as I can see, in the secondary counterpart of
'rte_mp_sendmsg()', the secondary index starts from 0 each time, and
subsequent calls overwrite the fds in the secondary.
So in practice, only 4 queues still work.

^ permalink raw reply	[relevance 0%]

* RE: [EXTERNAL] [PATCH v5 1/4] crypto/ipsec_mb: bump minimum IPsec Multi-buffer version
  @ 2024-03-06 11:12  4%     ` Power, Ciara
  0 siblings, 0 replies; 200+ results
From: Power, Ciara @ 2024-03-06 11:12 UTC (permalink / raw)
  To: Akhil Goyal, Dooley, Brian, Ji, Kai, De Lara Guarch, Pablo,
	Patrick Robb, Aaron Conole
  Cc: dev, Sivaramakrishnan, VenkatX, Wathsala Vithanage, thomas,
	Marchand, David



> -----Original Message-----
> From: Akhil Goyal <gakhil@marvell.com>
> Sent: Tuesday, March 5, 2024 7:12 PM
> To: Dooley, Brian <brian.dooley@intel.com>; Ji, Kai <kai.ji@intel.com>; De Lara
> Guarch, Pablo <pablo.de.lara.guarch@intel.com>; Patrick Robb
> <probb@iol.unh.edu>; Aaron Conole <aconole@redhat.com>
> Cc: dev@dpdk.org; Sivaramakrishnan, VenkatX
> <venkatx.sivaramakrishnan@intel.com>; Power, Ciara <ciara.power@intel.com>;
> Wathsala Vithanage <wathsala.vithanage@arm.com>; thomas@monjalon.net;
> Marchand, David <david.marchand@redhat.com>
> Subject: RE: [EXTERNAL] [PATCH v5 1/4] crypto/ipsec_mb: bump minimum IPsec
> Multi-buffer version
> 
> > Subject: [EXTERNAL] [PATCH v5 1/4] crypto/ipsec_mb: bump minimum IPsec
> > Multi-buffer version
> >
> > From: Sivaramakrishnan Venkat <venkatx.sivaramakrishnan@intel.com>
> >
> > SW PMDs bump the minimum IPsec Multi-buffer version to 1.4.
> > A minimum IPsec Multi-buffer version of 1.4 or greater is now required.
> >
> > Signed-off-by: Sivaramakrishnan Venkat
> > <venkatx.sivaramakrishnan@intel.com>
> > Acked-by: Ciara Power <ciara.power@intel.com>
> > Acked-by: Pablo de Lara <pablo.de.lara.guarch@intel.com>
> > Acked-by: Wathsala Vithanage <wathsala.vithanage@arm.com>
> please check these:
> https://github.com/ovsrobot/dpdk/actions/runs/8160942783/job/223086396
> 70#step:19:19411
> Error: cannot find librte_crypto_ipsec_mb.so.24.0 in install
> You need to get this fixed, or else CI would fail for every patch once this series is applied.

I am having trouble reproducing this one.
I have run the commands that I saw in the CI log, both before and after the patches were applied, with ipsec-mb v1.2 on the system, as in the CI.

meson configure  build -Denable_docs=true -Dexamples=all -Dplatform=generic -Ddefault_library=shared -Dbuildtype=debug -Dcheck_includes=true  -Dlibdir=lib -Dwerror=true
meson install -C build
ninja -C build

It compiles OK both times: the first time it compiles the ipsec-mb PMDs, and after the patches are applied it skips compiling the PMDs, as expected.

I am wondering, could this error have to do with the ABI reference/install comparison?
Maybe the reference has the ipsec_mb .so file from a build that supported it, and the check can't find the equivalent in the new install because it is no longer compiled:
+ devtools/check-abi.sh reference install
Error: cannot find librte_crypto_ipsec_mb.so.24.0 in install

Aaron, could that be the case?
Or, maybe my steps to reproduce the build setup are incorrect?


> And this is also failing http://mails.dpdk.org/archives/test-report/2024-
> March/601301.html
> These need to be fixed in CI infra.

The function that throws the error is available in the recently tagged 1.4-equivalent Arm repo, so I am unsure why it can't be found.
Could there be some old installed ipsec-mb version in the environment that is being picked up by DPDK?
Sometimes the meson configure step will pick up the correct ipsec-mb version, but then the ninja step links to an older version that still exists and hadn't been uninstalled.
I am not sure whether that could be the case for the CI container, though - Patrick, maybe you can verify there are no 1.3 or earlier versions on the system that could be picked up:
I usually use something like:  find /usr -name libIPSec_MB.so\*


Thanks for the help,
Ciara


^ permalink raw reply	[relevance 4%]

* Re: [PATCH v4 1/7] ethdev: support report register names and filter
  2024-02-26  8:01  0%     ` fengchengwen
@ 2024-03-06  7:22  0%       ` Jie Hai
  0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2024-03-06  7:22 UTC (permalink / raw)
  To: fengchengwen, dev; +Cc: lihuisong, liuyonglong, huangdengdui, ferruh.yigit

Hi, fengchengwen,
On 2024/2/26 16:01, fengchengwen wrote:
> Hi Jie,
> 
> On 2024/2/26 11:07, Jie Hai wrote:
>> This patch adds "filter" and "names" fields to the "rte_dev_reg_info"
>> structure. Names of the registers in the data field can be reported,
>> and the registers can be filtered by their names.
>>
>> The new API rte_eth_dev_get_reg_info_ext() is added to support
>> reporting names and filtering by names, while the original API
>> rte_eth_dev_get_reg_info() does not use the name and filter fields.
>> A local variable is used in rte_eth_dev_get_reg_info for
>> compatibility. If the driver does not report the names, they are set
>> to "offset_XXX".
>>
>> Signed-off-by: Jie Hai <haijie1@huawei.com>
>> ---
>>   doc/guides/rel_notes/release_24_03.rst |  8 ++++++
>>   lib/ethdev/rte_dev_info.h              | 11 +++++++++
>>   lib/ethdev/rte_ethdev.c                | 34 ++++++++++++++++++++++++++
>>   lib/ethdev/rte_ethdev.h                | 28 +++++++++++++++++++++
>>   lib/ethdev/version.map                 |  1 +
>>   5 files changed, 82 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
>> index 32d0ad8cf6a7..fa46da427dca 100644
>> --- a/doc/guides/rel_notes/release_24_03.rst
>> +++ b/doc/guides/rel_notes/release_24_03.rst
>> @@ -132,6 +132,11 @@ New Features
>>       to support TLS v1.2, TLS v1.3 and DTLS v1.2.
>>     * Added PMD API to allow raw submission of instructions to CPT.
>>   
>> +  * **Added support for dumping registers with names and filter.**
>> +
>> +    * Added a new API function ``rte_eth_dev_get_reg_info_ext()`` to filter
>> +      the registers by their names and to get register information (names,
>> +      values and other attributes).
>>   
>>   Removed Items
>>   -------------
>> @@ -197,6 +202,9 @@ ABI Changes
>>   
>>   * No ABI change that would break compatibility with 23.11.
>>   
>> +* ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
>> +  structure for reporting names of registers and filtering them by names.
>> +
>>   
>>   Known Issues
>>   ------------
>> diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
>> index 67cf0ae52668..0ad4a43b9526 100644
>> --- a/lib/ethdev/rte_dev_info.h
>> +++ b/lib/ethdev/rte_dev_info.h
>> @@ -11,6 +11,11 @@ extern "C" {
>>   
>>   #include <stdint.h>
>>   
>> +#define RTE_ETH_REG_NAME_SIZE 128
> 
> Almost all stats name sizes are 64; why not keep this consistent?
> 
will correct.
>> +struct rte_eth_reg_name {
>> +	char name[RTE_ETH_REG_NAME_SIZE];
>> +};
>> +
>>   /*
>>    * Placeholder for accessing device registers
>>    */
>> @@ -20,6 +25,12 @@ struct rte_dev_reg_info {
>>   	uint32_t length; /**< Number of registers to fetch */
>>   	uint32_t width; /**< Size of device register */
>>   	uint32_t version; /**< Device version */
>> +	/**
>> +	 * Filter for target subset of registers.
>> +	 * This field could affect register selection for data/length/names.
>> +	 */
>> +	const char *filter;
>> +	struct rte_eth_reg_name *names; /**< Registers name saver */
>>   };
>>   
>>   /*
>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>> index f1c658f49e80..9ef50c633ce3 100644
>> --- a/lib/ethdev/rte_ethdev.c
>> +++ b/lib/ethdev/rte_ethdev.c
>> @@ -6388,8 +6388,37 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
>>   
>>   int
>>   rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
>> +{
>> +	struct rte_dev_reg_info reg_info = { 0 };
>> +	int ret;
>> +
>> +	if (info == NULL) {
>> +		RTE_ETHDEV_LOG_LINE(ERR,
>> +			"Cannot get ethdev port %u register info to NULL",
>> +			port_id);
>> +		return -EINVAL;
>> +	}
>> +
>> +	reg_info.length = info->length;
>> +	reg_info.data = info->data;
>> +
>> +	ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
>> +	if (ret != 0)
>> +		return ret;
>> +
>> +	info->length = reg_info.length;
>> +	info->width = reg_info.width;
>> +	info->version = reg_info.version;
>> +	info->offset = reg_info.offset;
>> +
>> +	return 0;
>> +}
>> +
>> +int
>> +rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
>>   {
>>   	struct rte_eth_dev *dev;
>> +	uint32_t i;
>>   	int ret;
>>   
>>   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> @@ -6408,6 +6437,11 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
>>   
>>   	rte_ethdev_trace_get_reg_info(port_id, info, ret);
>>   
>> +	/* Report the default names if the driver does not report them. */
>> +	if (info->names != NULL && strlen(info->names[0].name) == 0)
>> +		for (i = 0; i < info->length; i++)
>> +			snprintf(info->names[i].name, RTE_ETH_REG_NAME_SIZE,
>> +				"offset_%x", info->offset + i * info->width);
> 
> %x has no "0x" prefix, which may be confusing.
> How about using %u?
> 
That sounds better.
> Another question: if the app doesn't zero the names' memory, its value is random, so it will not enter this logic.
> Suggest memsetting item[0]'s name memory before invoking the PMD ops.
> 
>>   	return ret;
>>   }
>>   
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>> index ed27360447a3..09e2d5fdb49b 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -5066,6 +5066,34 @@ __rte_experimental
>>   int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
>>   		struct rte_power_monitor_cond *pmc);
>>   
>> +/**
>> + * Retrieve the filtered device registers (values and names) and
>> + * register attributes (number of registers and register size)
>> + *
>> + * @param port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param info
>> + *   Pointer to rte_dev_reg_info structure to fill in.
>> + *   If info->filter is not NULL and the driver does not support names or
>> + *   filter, return error. If info->filter is NULL, return info for all
>> + *   registers (seen as filter none).
>> + *   If info->data is NULL, the function fills in the width and length fields.
>> + *   If non-NULL, ethdev considers there is enough space to store the
>> + *   registers, and the values of registers whose name contains the filter
>> + *   string are put into the buffer pointed at by the data field. Do the same
>> + *   for the names of registers if info->names is not NULL. If drivers do not
>> + *   report names, default names are given by ethdev.
> 
> It's a little hard to understand. Suggest using '-' for each field, just like rte_eth_remove_tx_callback.
> 
I will try.
>> + * @return
>> + *   - (0) if successful.
>> + *   - (-ENOTSUP) if hardware doesn't support.
>> + *   - (-EINVAL) if bad parameter.
>> + *   - (-ENODEV) if *port_id* invalid.
>> + *   - (-EIO) if device is removed.
>> + *   - others depends on the specific operations implementation.
>> + */
>> +__rte_experimental
>> +int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
>> +
>>   /**
>>    * Retrieve device registers and register attributes (number of registers and
>>    * register size)
>> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
>> index 17e4eac8a4cc..c41a64e404db 100644
>> --- a/lib/ethdev/version.map
>> +++ b/lib/ethdev/version.map
>> @@ -325,6 +325,7 @@ EXPERIMENTAL {
>>   	rte_flow_template_table_resizable;
>>   	rte_flow_template_table_resize;
>>   	rte_flow_template_table_resize_complete;
>> +	rte_eth_dev_get_reg_info_ext;
> 
> should place with alphabetical order.
> 
> Thanks
Ok.
> 
>>   };
>>   
>>   INTERNAL {
>>
> .
Thanks,
Jie Hai

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] hash: make gfni stubs inline
  2024-03-05 17:53  3%   ` Tyler Retzlaff
@ 2024-03-05 18:44  0%     ` Stephen Hemminger
  2024-03-07 10:32  0%     ` David Marchand
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-03-05 18:44 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: David Marchand, dev, Yipeng Wang, Sameh Gobriel,
	Bruce Richardson, Vladimir Medvedkin

On Tue, 5 Mar 2024 09:53:00 -0800
Tyler Retzlaff <roretzla@linux.microsoft.com> wrote:

> On Tue, Mar 05, 2024 at 11:14:45AM +0100, David Marchand wrote:
> > On Mon, Mar 4, 2024 at 7:45 PM Stephen Hemminger
> > <stephen@networkplumber.org> wrote:  
> > >
> > > This reverts commit 07d836e5929d18ad6640ebae90dd2f81a2cafb71.
> > >
> > > Tyler found build issues with MSVC and the thash gfni stubs.
> > > The problem would be link errors from missing symbols.  
> > 
> > Trying to understand this link error.
> > Does it come from the fact that rte_thash_gfni/rte_thash_gfni_bulk
> > declarations are hidden under RTE_THASH_GFNI_DEFINED in
> > rte_thash_gfni.h?
> > 
> > If so, why not always expose those two symbols unconditionnally and
> > link with the stub only when ! RTE_THASH_GFNI_DEFINED.  
> 
> So I don't have a lot of background on this lib.
> 
> I think we understand that we can't conditionally expose symbols. That's
> what Windows was picking up, because it seems none of our CIs ever end
> up with RTE_THASH_GFNI_DEFINED, but my local test system did, and it failed.
> (My experiments showed that Linux would complain too if it was defined.)
> 
> If we always expose the symbols then, as you point out, we have to
> conditionally link with the stub; otherwise the inline (non-stub) will be
> a duplicate and the build/link will fail.
> 
> I guess the part I don't understand with your suggestion is how we would
> conditionally link with just the stub? We have to link with rte_hash to
> get the rest of hash and the stub. I've probably missed something here.
> 
> Since we never had a release exposing the new symbols in question
> introduced by Stephen, my suggestion was that we just revert for 24.03,
> so we don't end up with an ABI break later if we choose to solve the
> problem without exports.
> 
> I don't know what else to do, but I think we need to decide for 24.03.
> 
> ty

Another option would be to introduce dead-code stubs all the time,
then have inline wrappers that redirect to the dead stub if needed.

Something like:
From 7bb972d342e939200f8f993a9074b20794941f6a Mon Sep 17 00:00:00 2001
From: Stephen Hemminger <stephen@networkplumber.org>
Date: Tue, 5 Mar 2024 10:42:48 -0800
Subject: [PATCH] hash: rename GFNI stubs

Make the GFNI stub functions always be built. This solves the conditional
linking problem. If GFNI is available, they will never be used.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 lib/hash/rte_thash_gfni.c | 11 +++++------
 lib/hash/rte_thash_gfni.h | 23 ++++++++++++++++++-----
 lib/hash/version.map      |  9 +++++++--
 3 files changed, 30 insertions(+), 13 deletions(-)

diff --git a/lib/hash/rte_thash_gfni.c b/lib/hash/rte_thash_gfni.c
index f1525f9838de..de67abb8b211 100644
--- a/lib/hash/rte_thash_gfni.c
+++ b/lib/hash/rte_thash_gfni.c
@@ -4,18 +4,18 @@
 
 #include <stdbool.h>
 
+#include <rte_common.h>
 #include <rte_log.h>
 #include <rte_thash_gfni.h>
 
-#ifndef RTE_THASH_GFNI_DEFINED
-
 RTE_LOG_REGISTER_SUFFIX(hash_gfni_logtype, gfni, INFO);
 #define RTE_LOGTYPE_HASH hash_gfni_logtype
 #define HASH_LOG(level, ...) \
 	RTE_LOG_LINE(level, HASH, "" __VA_ARGS__)
 
+__rte_internal
 uint32_t
-rte_thash_gfni(const uint64_t *mtrx __rte_unused,
+___rte_thash_gfni(const uint64_t *mtrx __rte_unused,
 	const uint8_t *key __rte_unused, int len __rte_unused)
 {
 	static bool warned;
@@ -29,8 +29,9 @@ rte_thash_gfni(const uint64_t *mtrx __rte_unused,
 	return 0;
 }
 
+__rte_internal
 void
-rte_thash_gfni_bulk(const uint64_t *mtrx __rte_unused,
+___rte_thash_gfni_bulk(const uint64_t *mtrx __rte_unused,
 	int len __rte_unused, uint8_t *tuple[] __rte_unused,
 	uint32_t val[], uint32_t num)
 {
@@ -47,5 +48,3 @@ rte_thash_gfni_bulk(const uint64_t *mtrx __rte_unused,
 	for (i = 0; i < num; i++)
 		val[i] = 0;
 }
-
-#endif
diff --git a/lib/hash/rte_thash_gfni.h b/lib/hash/rte_thash_gfni.h
index eed55fc86c86..1cb61cf39675 100644
--- a/lib/hash/rte_thash_gfni.h
+++ b/lib/hash/rte_thash_gfni.h
@@ -9,7 +9,16 @@
 extern "C" {
 #endif
 
-#include <rte_log.h>
+#include <rte_common.h>
+/*
+ * @internal
+ * Stubs defined for use when GFNI is not available
+ */
+uint32_t
+___rte_thash_gfni(const uint64_t *mtrx, const uint8_t *key, int len);
+void
+___rte_thash_gfni_bulk(const uint64_t *mtrx, int len, uint8_t *tuple[],
+		       uint32_t val[], uint32_t num);
 
 #ifdef RTE_ARCH_X86
 
@@ -18,10 +27,8 @@ extern "C" {
 #endif
 
 #ifndef RTE_THASH_GFNI_DEFINED
-
 /**
  * Calculate Toeplitz hash.
- * Dummy implementation.
  *
  * @param m
  *  Pointer to the matrices generated from the corresponding
@@ -34,7 +41,10 @@ extern "C" {
  *  Calculated Toeplitz hash value.
  */
 uint32_t
-rte_thash_gfni(const uint64_t *mtrx, const uint8_t *key, int len);
+rte_thash_gfni(const uint64_t *mtrx, const uint8_t *key, int len)
+{
+	return ___rte_thash_gfni(mtrx, key, len);
+}
 
 /**
  * Bulk implementation for Toeplitz hash.
@@ -55,7 +65,10 @@ rte_thash_gfni(const uint64_t *mtrx, const uint8_t *key, int len);
  */
 void
 rte_thash_gfni_bulk(const uint64_t *mtrx, int len, uint8_t *tuple[],
-	uint32_t val[], uint32_t num);
+	uint32_t val[], uint32_t num)
+{
+	___rte_thash_gfni_bulk(mtrx, len, tuple, val, num);
+}
 
 #endif /* RTE_THASH_GFNI_DEFINED */
 
diff --git a/lib/hash/version.map b/lib/hash/version.map
index 6b2afebf6b46..942e2998578f 100644
--- a/lib/hash/version.map
+++ b/lib/hash/version.map
@@ -41,10 +41,15 @@ DPDK_24 {
 	rte_thash_get_gfni_matrices;
 	rte_thash_get_helper;
 	rte_thash_get_key;
-	rte_thash_gfni;
-	rte_thash_gfni_bulk;
 	rte_thash_gfni_supported;
 	rte_thash_init_ctx;
 
 	local: *;
 };
+
+INTERNAL {
+	global:
+
+	___rte_thash_gfni;
+	___rte_thash_gfni_bulk;
+};
-- 
2.43.0


^ permalink raw reply	[relevance 0%]

* Re: [PATCH] hash: make gfni stubs inline
  @ 2024-03-05 17:53  3%   ` Tyler Retzlaff
  2024-03-05 18:44  0%     ` Stephen Hemminger
  2024-03-07 10:32  0%     ` David Marchand
  0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-05 17:53 UTC (permalink / raw)
  To: David Marchand
  Cc: Stephen Hemminger, dev, Yipeng Wang, Sameh Gobriel,
	Bruce Richardson, Vladimir Medvedkin

On Tue, Mar 05, 2024 at 11:14:45AM +0100, David Marchand wrote:
> On Mon, Mar 4, 2024 at 7:45 PM Stephen Hemminger
> <stephen@networkplumber.org> wrote:
> >
> > This reverts commit 07d836e5929d18ad6640ebae90dd2f81a2cafb71.
> >
> > Tyler found build issues with MSVC and the thash gfni stubs.
> > The problem would be link errors from missing symbols.
> 
> Trying to understand this link error.
> Does it come from the fact that rte_thash_gfni/rte_thash_gfni_bulk
> declarations are hidden under RTE_THASH_GFNI_DEFINED in
> rte_thash_gfni.h?
> 
> If so, why not always expose those two symbols unconditionnally and
> link with the stub only when ! RTE_THASH_GFNI_DEFINED.

So I don't have a lot of background on this lib.

I think we understand that we can't conditionally expose symbols. That's
what Windows was picking up, because it seems none of our CIs ever end
up with RTE_THASH_GFNI_DEFINED, but my local test system did, and it failed.
(My experiments showed that Linux would complain too if it was defined.)

If we always expose the symbols then, as you point out, we have to
conditionally link with the stub; otherwise the inline (non-stub) will be
a duplicate and the build/link will fail.

I guess the part I don't understand with your suggestion is how we would
conditionally link with just the stub? We have to link with rte_hash to
get the rest of hash and the stub. I've probably missed something here.

Since we never had a release exposing the new symbols in question
introduced by Stephen, my suggestion was that we just revert for 24.03,
so we don't end up with an ABI break later if we choose to solve the
problem without exports.

I don't know what else to do, but I think we need to decide for 24.03.

ty

> 
> -- 
> David Marchand

^ permalink raw reply	[relevance 3%]

* [PATCH] devtools: require version for experimental symbols
@ 2024-03-05 13:49  4% David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-03-05 13:49 UTC (permalink / raw)
  To: dev; +Cc: thomas

Add versions to all symbol maps, and a check that any experimental symbol
is versioned.

Signed-off-by: David Marchand <david.marchand@redhat.com>
---
 buildtools/map-list-symbol.sh              |  8 ++++++--
 devtools/check-symbol-maps.sh              | 15 +++++++++++++++
 doc/guides/contributing/abi_policy.rst     | 17 ++++++++++++++++-
 drivers/baseband/acc/version.map           |  1 +
 drivers/baseband/fpga_5gnr_fec/version.map |  1 +
 drivers/baseband/fpga_lte_fec/version.map  |  2 +-
 drivers/bus/pci/version.map                |  1 +
 drivers/dma/dpaa2/version.map              |  3 +++
 drivers/event/dlb2/version.map             |  1 +
 drivers/mempool/cnxk/version.map           |  2 ++
 drivers/net/atlantic/version.map           |  1 +
 drivers/net/i40e/version.map               |  7 ++++++-
 drivers/net/ixgbe/version.map              |  1 +
 lib/argparse/version.map                   |  1 +
 lib/metrics/version.map                    |  2 +-
 lib/mldev/version.map                      |  1 +
 lib/regexdev/version.map                   |  9 ++++++---
 lib/reorder/version.map                    |  2 ++
 18 files changed, 66 insertions(+), 9 deletions(-)

diff --git a/buildtools/map-list-symbol.sh b/buildtools/map-list-symbol.sh
index a834399816..b76e2417c5 100755
--- a/buildtools/map-list-symbol.sh
+++ b/buildtools/map-list-symbol.sh
@@ -61,8 +61,12 @@ for file in $@; do
 		if (current_section == "") {
 			next;
 		}
-		if ("'$version'" != "" && "'$version'" != current_version) {
-			next;
+		if ("'$version'" != "") {
+			if ("'$version'" == "unset" && current_version != "") {
+				next;
+			} else if ("'$version'" != "unset" && "'$version'" != current_version) {
+				next;
+			}
 		}
 		gsub(";","");
 		if ("'$symbol'" == "all" || $1 == "'$symbol'") {
diff --git a/devtools/check-symbol-maps.sh b/devtools/check-symbol-maps.sh
index ba2f892f56..6121f78ec6 100755
--- a/devtools/check-symbol-maps.sh
+++ b/devtools/check-symbol-maps.sh
@@ -97,4 +97,19 @@ if [ -n "$bad_format_maps" ] ; then
     ret=1
 fi
 
+find_non_versioned_maps ()
+{
+    for map in $@ ; do
+        [ $(buildtools/map-list-symbol.sh -S EXPERIMENTAL -V unset $map | wc -l) = '0' ] ||
+            echo $map
+    done
+}
+
+non_versioned_maps=$(find_non_versioned_maps $@)
+if [ -n "$non_versioned_maps" ] ; then
+    echo "Found non versioned maps:"
+    echo "$non_versioned_maps"
+    ret=1
+fi
+
 exit $ret
diff --git a/doc/guides/contributing/abi_policy.rst b/doc/guides/contributing/abi_policy.rst
index 5fd4052585..3c4478692a 100644
--- a/doc/guides/contributing/abi_policy.rst
+++ b/doc/guides/contributing/abi_policy.rst
@@ -331,7 +331,22 @@ become part of a tracked ABI version.
 Note that marking an API as experimental is a multi step process.
 To mark an API as experimental, the symbols which are desired to be exported
 must be placed in an EXPERIMENTAL version block in the corresponding libraries'
-version map script.
+version map script. Experimental symbols must be commented so
+that it is clear in which DPDK version they were introduced.
+
+.. code-block:: none
+
+ EXPERIMENTAL {
+        global:
+
+        # added in 20.11
+        rte_foo_init;
+        rte_foo_configure;
+
+        # added in 21.02
+        rte_foo_cleanup;
+ ...
+
 Secondly, the corresponding prototypes of those exported functions (in the
 development header files), must be marked with the ``__rte_experimental`` tag
 (see ``rte_compat.h``).
diff --git a/drivers/baseband/acc/version.map b/drivers/baseband/acc/version.map
index 1b6b1cd10d..fa39a63f0f 100644
--- a/drivers/baseband/acc/version.map
+++ b/drivers/baseband/acc/version.map
@@ -5,5 +5,6 @@ DPDK_24 {
 EXPERIMENTAL {
 	global:
 
+	# added in 22.11
 	rte_acc_configure;
 };
diff --git a/drivers/baseband/fpga_5gnr_fec/version.map b/drivers/baseband/fpga_5gnr_fec/version.map
index 2da20cabc1..855ce55703 100644
--- a/drivers/baseband/fpga_5gnr_fec/version.map
+++ b/drivers/baseband/fpga_5gnr_fec/version.map
@@ -5,6 +5,7 @@ DPDK_24 {
 EXPERIMENTAL {
 	global:
 
+	# added in 20.11
 	rte_fpga_5gnr_fec_configure;
 
 };
diff --git a/drivers/baseband/fpga_lte_fec/version.map b/drivers/baseband/fpga_lte_fec/version.map
index 83f3a8a267..2c8e60375d 100644
--- a/drivers/baseband/fpga_lte_fec/version.map
+++ b/drivers/baseband/fpga_lte_fec/version.map
@@ -5,6 +5,6 @@ DPDK_24 {
 EXPERIMENTAL {
 	global:
 
+	# added in 20.11
 	rte_fpga_lte_fec_configure;
-
 };
diff --git a/drivers/bus/pci/version.map b/drivers/bus/pci/version.map
index 9e4d8f5e54..5d9dced5b2 100644
--- a/drivers/bus/pci/version.map
+++ b/drivers/bus/pci/version.map
@@ -17,6 +17,7 @@ DPDK_24 {
 EXPERIMENTAL {
 	global:
 
+	# added in 20.11
 	rte_pci_find_ext_capability;
 
 	# added in 21.08
diff --git a/drivers/dma/dpaa2/version.map b/drivers/dma/dpaa2/version.map
index 7dc2d6e185..713ed41f0c 100644
--- a/drivers/dma/dpaa2/version.map
+++ b/drivers/dma/dpaa2/version.map
@@ -3,6 +3,9 @@ DPDK_24 {
 };
 
 EXPERIMENTAL {
+	global:
+
+	# added in 22.07
 	rte_dpaa2_qdma_completed_multi;
 	rte_dpaa2_qdma_copy_multi;
 	rte_dpaa2_qdma_vchan_fd_us_enable;
diff --git a/drivers/event/dlb2/version.map b/drivers/event/dlb2/version.map
index 8aabf8b727..1d0a0a75d7 100644
--- a/drivers/event/dlb2/version.map
+++ b/drivers/event/dlb2/version.map
@@ -5,5 +5,6 @@ DPDK_24 {
 EXPERIMENTAL {
 	global:
 
+	# added in 20.11
 	rte_pmd_dlb2_set_token_pop_mode;
 };
diff --git a/drivers/mempool/cnxk/version.map b/drivers/mempool/cnxk/version.map
index 775d46d934..8249417527 100644
--- a/drivers/mempool/cnxk/version.map
+++ b/drivers/mempool/cnxk/version.map
@@ -4,6 +4,8 @@ DPDK_24 {
 
 EXPERIMENTAL {
 	global:
+
+	# added in 23.07
 	rte_pmd_cnxk_mempool_is_hwpool;
 	rte_pmd_cnxk_mempool_mbuf_exchange;
 	rte_pmd_cnxk_mempool_range_check_disable;
diff --git a/drivers/net/atlantic/version.map b/drivers/net/atlantic/version.map
index b063baa7a4..cbe9ee9263 100644
--- a/drivers/net/atlantic/version.map
+++ b/drivers/net/atlantic/version.map
@@ -5,6 +5,7 @@ DPDK_24 {
 EXPERIMENTAL {
 	global:
 
+	# added in 19.05
 	rte_pmd_atl_macsec_enable;
 	rte_pmd_atl_macsec_disable;
 	rte_pmd_atl_macsec_config_txsc;
diff --git a/drivers/net/i40e/version.map b/drivers/net/i40e/version.map
index 3ba31f4768..52b7a3269a 100644
--- a/drivers/net/i40e/version.map
+++ b/drivers/net/i40e/version.map
@@ -42,9 +42,14 @@ DPDK_24 {
 EXPERIMENTAL {
 	global:
 
+	# added in 19.11
+	rte_pmd_i40e_set_switch_dev;
+
+	# added in 20.08
 	rte_pmd_i40e_get_fdir_info;
 	rte_pmd_i40e_get_fdir_stats;
 	rte_pmd_i40e_set_gre_key_len;
+
+	# added in 23.07
 	rte_pmd_i40e_set_pf_src_prune;
-	rte_pmd_i40e_set_switch_dev;
 };
diff --git a/drivers/net/ixgbe/version.map b/drivers/net/ixgbe/version.map
index 2c9d977f5c..9a6ef29b1d 100644
--- a/drivers/net/ixgbe/version.map
+++ b/drivers/net/ixgbe/version.map
@@ -43,6 +43,7 @@ DPDK_24 {
 EXPERIMENTAL {
 	global:
 
+	# added in 20.08
 	rte_pmd_ixgbe_get_fdir_info;
 	rte_pmd_ixgbe_get_fdir_stats;
 };
diff --git a/lib/argparse/version.map b/lib/argparse/version.map
index 9b68464600..46da99a3e2 100644
--- a/lib/argparse/version.map
+++ b/lib/argparse/version.map
@@ -1,6 +1,7 @@
 EXPERIMENTAL {
 	global:
 
+	# added in 24.03
 	rte_argparse_parse;
 	rte_argparse_parse_type;
 
diff --git a/lib/metrics/version.map b/lib/metrics/version.map
index 4763ac6b8b..9766a1af5b 100644
--- a/lib/metrics/version.map
+++ b/lib/metrics/version.map
@@ -16,11 +16,11 @@ DPDK_24 {
 EXPERIMENTAL {
 	global:
 
+	# added in 20.05
 	rte_metrics_tel_encode_json_format;
 	rte_metrics_tel_reg_all_ethdev;
 	rte_metrics_tel_get_global_stats;
 	rte_metrics_tel_get_port_stats_ids;
 	rte_metrics_tel_get_ports_stats_json;
 	rte_metrics_tel_extract_data;
-
 };
diff --git a/lib/mldev/version.map b/lib/mldev/version.map
index 1978695314..84bdd6c300 100644
--- a/lib/mldev/version.map
+++ b/lib/mldev/version.map
@@ -1,6 +1,7 @@
 EXPERIMENTAL {
 	global:
 
+	# added in 22.11
 	rte_ml_dequeue_burst;
 	rte_ml_dev_close;
 	rte_ml_dev_configure;
diff --git a/lib/regexdev/version.map b/lib/regexdev/version.map
index 3c6e9fffa1..4c0435180c 100644
--- a/lib/regexdev/version.map
+++ b/lib/regexdev/version.map
@@ -1,7 +1,7 @@
 EXPERIMENTAL {
 	global:
 
-	rte_regex_devices;
+	# added in 20.08
 	rte_regexdev_attr_get;
 	rte_regexdev_attr_set;
 	rte_regexdev_close;
@@ -12,8 +12,6 @@ EXPERIMENTAL {
 	rte_regexdev_enqueue_burst;
 	rte_regexdev_get_dev_id;
 	rte_regexdev_info_get;
-	rte_regexdev_is_valid_dev;
-	rte_regexdev_logtype;
 	rte_regexdev_queue_pair_setup;
 	rte_regexdev_rule_db_compile_activate;
 	rte_regexdev_rule_db_export;
@@ -27,6 +25,11 @@ EXPERIMENTAL {
 	rte_regexdev_xstats_names_get;
 	rte_regexdev_xstats_reset;
 
+	# added in 22.03
+	rte_regex_devices;
+	rte_regexdev_is_valid_dev;
+	rte_regexdev_logtype;
+
 	local: *;
 };
 
diff --git a/lib/reorder/version.map b/lib/reorder/version.map
index ea60759106..5baeab56f8 100644
--- a/lib/reorder/version.map
+++ b/lib/reorder/version.map
@@ -15,11 +15,13 @@ DPDK_24 {
 EXPERIMENTAL {
 	global:
 
+	# added in 20.11
 	rte_reorder_seqn_dynfield_offset;
 
 	# added in 23.03
 	rte_reorder_drain_up_to_seqn;
 	rte_reorder_min_seqn_set;
+
 	# added in 23.07
 	rte_reorder_memory_footprint_get;
 };
-- 
2.43.0


^ permalink raw reply	[relevance 4%]

* Re: [PATCH v4 1/7] ethdev: support report register names and filter
  2024-02-29  9:52  3%     ` Thomas Monjalon
@ 2024-03-05  7:45  5%       ` Jie Hai
  0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2024-03-05  7:45 UTC (permalink / raw)
  To: Thomas Monjalon, ferruh.yigit
  Cc: dev, lihuisong, fengchengwen, liuyonglong, huangdengdui

Hi Thomas,

Thanks for your review.
On 2024/2/29 17:52, Thomas Monjalon wrote:
> 26/02/2024 04:07, Jie Hai:
>> This patch adds "filter" and "names" fields to "rte_dev_reg_info"
>> structure. Names of registers in data fields can be reported and
>> the registers can be filtered by their names.
>>
>> The new API rte_eth_dev_get_reg_info_ext() is added to support
>> reporting names and filtering by names. And the original API
>> rte_eth_dev_get_reg_info() does not use the name and filter fields.
>> A local variable is used in rte_eth_dev_get_reg_info for
>> compatibility. If the drivers does not report the names, set them
>> to "offset_XXX".
> 
> Isn't it possible to implement filtering in the original function?
> What would it break?
> 
If we implement filtering in the original function
rte_eth_dev_get_reg_info(), the ABI would be broken and
an old application could not run when linked against the new library.

Existing binary applications keep backwards compatibility with our
current patch.
Although the ABI is modified, an old application still behaves normally
when dynamically linked against the new library.
And a new application using rte_eth_dev_get_reg_info() works well with
the old library.
>> @@ -20,6 +25,12 @@ struct rte_dev_reg_info {
>>   	uint32_t length; /**< Number of registers to fetch */
>>   	uint32_t width; /**< Size of device register */
>>   	uint32_t version; /**< Device version */
>> +	/**
>> +	 * Filter for target subset of registers.
>> +	 * This field could affects register selection for data/length/names.
>> +	 */
>> +	const char *filter;
>> +	struct rte_eth_reg_name *names; /**< Registers name saver */
>>   };
> 
> I suppose this is an ABI break?
> Confirmed: http://mails.dpdk.org/archives/test-report/2024-February/587314.html
> 
I think it is an ABI change but not an ABI break; please see above.
> 
> .
Best regards,
Jie Hai

^ permalink raw reply	[relevance 5%]

* RE: [RFC PATCH 1/2] power: refactor core power management library
  2024-03-01  2:56  3%   ` lihuisong (C)
  2024-03-01 10:39  0%     ` Hunt, David
@ 2024-03-05  4:35  3%     ` Tummala, Sivaprasad
  1 sibling, 0 replies; 200+ results
From: Tummala, Sivaprasad @ 2024-03-05  4:35 UTC (permalink / raw)
  To: lihuisong (C),
	david.hunt, anatoly.burakov, jerinj, radu.nicolau, gakhil,
	cristian.dumitrescu, Yigit, Ferruh, konstantin.ananyev
  Cc: dev

Hi Lihuisong,

> -----Original Message-----
> From: lihuisong (C) <lihuisong@huawei.com>
> Sent: Friday, March 1, 2024 8:27 AM
> To: Tummala, Sivaprasad <Sivaprasad.Tummala@amd.com>;
> david.hunt@intel.com; anatoly.burakov@intel.com; jerinj@marvell.com;
> radu.nicolau@intel.com; gakhil@marvell.com; cristian.dumitrescu@intel.com; Yigit,
> Ferruh <Ferruh.Yigit@amd.com>; konstantin.ananyev@huawei.com
> Cc: dev@dpdk.org
> Subject: Re: [RFC PATCH 1/2] power: refactor core power management library
>
> 在 2024/2/20 23:33, Sivaprasad Tummala 写道:
> > This patch introduces a comprehensive refactor to the core power
> > management library. The primary focus is on improving modularity and
> > organization by relocating specific driver implementations from the
> > 'lib/power' directory to dedicated directories within
> > 'drivers/power/core/*'. The adjustment of meson.build files enables
> > the selective activation of individual drivers.
> >
> > These changes contribute to a significant enhancement in code
> > organization, providing a clearer structure for driver implementations.
> > The refactor aims to improve overall code clarity and boost
> > maintainability. Additionally, it establishes a foundation for future
> > development, allowing for more focused work on individual drivers and
> > seamless integration of forthcoming enhancements.
>
> Good job. +1 to refacotor.
>
> <...>
>
> > diff --git a/drivers/meson.build b/drivers/meson.build index
> > f2be71bc05..e293c3945f 100644
> > --- a/drivers/meson.build
> > +++ b/drivers/meson.build
> > @@ -28,6 +28,7 @@ subdirs = [
> >           'event',          # depends on common, bus, mempool and net.
> >           'baseband',       # depends on common and bus.
> >           'gpu',            # depends on common and bus.
> > +        'power',          # depends on common (in future).
> >   ]
> >
> >   if meson.is_cross_build()
> > diff --git a/drivers/power/core/acpi/meson.build
> > b/drivers/power/core/acpi/meson.build
> > new file mode 100644
> > index 0000000000..d10ec8ee94
> > --- /dev/null
> > +++ b/drivers/power/core/acpi/meson.build
> > @@ -0,0 +1,8 @@
> > +# SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2024 AMD
> > +Limited
> > +
> > +sources = files('power_acpi_cpufreq.c')
> > +
> > +headers = files('power_acpi_cpufreq.h')
> > +
> > +deps += ['power']
> > diff --git a/lib/power/power_acpi_cpufreq.c
> > b/drivers/power/core/acpi/power_acpi_cpufreq.c
> > similarity index 95%
> > rename from lib/power/power_acpi_cpufreq.c rename to
> > drivers/power/core/acpi/power_acpi_cpufreq.c
> This file is in power lib.
> How about remove the 'power' prefix of this file name?
> like acpi_cpufreq.c, cppc_cpufreq.c.
ACK

> > index f8d978d03d..69d80ad2ae 100644
> > --- a/lib/power/power_acpi_cpufreq.c
> > +++ b/drivers/power/core/acpi/power_acpi_cpufreq.c
> > @@ -577,3 +577,22 @@ int power_acpi_get_capabilities(unsigned int
> > lcore_id,
> >
> >       return 0;
> >   }
> > +
> > +static struct rte_power_ops acpi_ops = {
> How about use the following structure name?
> "struct rte_power_cpufreq_ops" or "struct rte_power_core_ops"
> After all, we also have other power ops, like uncore, right?
Agreed.
> > +     .init = power_acpi_cpufreq_init,
> > +     .exit = power_acpi_cpufreq_exit,
> > +     .check_env_support = power_acpi_cpufreq_check_supported,
> > +     .get_avail_freqs = power_acpi_cpufreq_freqs,
> > +     .get_freq = power_acpi_cpufreq_get_freq,
> > +     .set_freq = power_acpi_cpufreq_set_freq,
> > +     .freq_down = power_acpi_cpufreq_freq_down,
> > +     .freq_up = power_acpi_cpufreq_freq_up,
> > +     .freq_max = power_acpi_cpufreq_freq_max,
> > +     .freq_min = power_acpi_cpufreq_freq_min,
> > +     .turbo_status = power_acpi_turbo_status,
> > +     .enable_turbo = power_acpi_enable_turbo,
> > +     .disable_turbo = power_acpi_disable_turbo,
> > +     .get_caps = power_acpi_get_capabilities };
> > +
> > +RTE_POWER_REGISTER_OPS(acpi_ops);
> > diff --git a/lib/power/power_acpi_cpufreq.h
> > b/drivers/power/core/acpi/power_acpi_cpufreq.h
> > similarity index 100%
> > rename from lib/power/power_acpi_cpufreq.h rename to
> > drivers/power/core/acpi/power_acpi_cpufreq.h
> > diff --git a/drivers/power/core/amd-pstate/meson.build
> > b/drivers/power/core/amd-pstate/meson.build
> > new file mode 100644
> > index 0000000000..8ec4c960f5
> > --- /dev/null
> > +++ b/drivers/power/core/amd-pstate/meson.build
> > @@ -0,0 +1,8 @@
> > +# SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2024 AMD
> > +Limited
> > +
> > +sources = files('power_amd_pstate_cpufreq.c')
> > +
> > +headers = files('power_amd_pstate_cpufreq.h')
> > +
> > +deps += ['power']
> > diff --git a/lib/power/power_amd_pstate_cpufreq.c
> > b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c
> > similarity index 95%
> > rename from lib/power/power_amd_pstate_cpufreq.c
> > rename to drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c
> > index 028f84416b..9938de72a6 100644
> > --- a/lib/power/power_amd_pstate_cpufreq.c
> > +++ b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c
> > @@ -700,3 +700,22 @@ power_amd_pstate_get_capabilities(unsigned int
> > lcore_id,
> >
> >       return 0;
> >   }
> > +
> > +static struct rte_power_ops amd_pstate_ops = {
> > +     .init = power_amd_pstate_cpufreq_init,
> > +     .exit = power_amd_pstate_cpufreq_exit,
> > +     .check_env_support = power_amd_pstate_cpufreq_check_supported,
> > +     .get_avail_freqs = power_amd_pstate_cpufreq_freqs,
> > +     .get_freq = power_amd_pstate_cpufreq_get_freq,
> > +     .set_freq = power_amd_pstate_cpufreq_set_freq,
> > +     .freq_down = power_amd_pstate_cpufreq_freq_down,
> > +     .freq_up = power_amd_pstate_cpufreq_freq_up,
> > +     .freq_max = power_amd_pstate_cpufreq_freq_max,
> > +     .freq_min = power_amd_pstate_cpufreq_freq_min,
> > +     .turbo_status = power_amd_pstate_turbo_status,
> > +     .enable_turbo = power_amd_pstate_enable_turbo,
> > +     .disable_turbo = power_amd_pstate_disable_turbo,
> > +     .get_caps = power_amd_pstate_get_capabilities };
> > +
> > +RTE_POWER_REGISTER_OPS(amd_pstate_ops);
> > diff --git a/lib/power/power_amd_pstate_cpufreq.h
> > b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.h
> > similarity index 100%
> > rename from lib/power/power_amd_pstate_cpufreq.h
> > rename to drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.h
> > diff --git a/drivers/power/core/cppc/meson.build
> > b/drivers/power/core/cppc/meson.build
> > new file mode 100644
> > index 0000000000..06f3b99bb8
> > --- /dev/null
> > +++ b/drivers/power/core/cppc/meson.build
> > @@ -0,0 +1,8 @@
> > +# SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2024 AMD
> > +Limited
> > +
> > +sources = files('power_cppc_cpufreq.c')
> > +
> > +headers = files('power_cppc_cpufreq.h')
> > +
> > +deps += ['power']
> > diff --git a/lib/power/power_cppc_cpufreq.c
> > b/drivers/power/core/cppc/power_cppc_cpufreq.c
> > similarity index 96%
> > rename from lib/power/power_cppc_cpufreq.c rename to
> > drivers/power/core/cppc/power_cppc_cpufreq.c
> > index 3ddf39bd76..605f633309 100644
> > --- a/lib/power/power_cppc_cpufreq.c
> > +++ b/drivers/power/core/cppc/power_cppc_cpufreq.c
> > @@ -685,3 +685,22 @@ power_cppc_get_capabilities(unsigned int
> > lcore_id,
> >
> >       return 0;
> >   }
> > +
> > +static struct rte_power_ops cppc_ops = {
> > +     .init = power_cppc_cpufreq_init,
> > +     .exit = power_cppc_cpufreq_exit,
> > +     .check_env_support = power_cppc_cpufreq_check_supported,
> > +     .get_avail_freqs = power_cppc_cpufreq_freqs,
> > +     .get_freq = power_cppc_cpufreq_get_freq,
> > +     .set_freq = power_cppc_cpufreq_set_freq,
> > +     .freq_down = power_cppc_cpufreq_freq_down,
> > +     .freq_up = power_cppc_cpufreq_freq_up,
> > +     .freq_max = power_cppc_cpufreq_freq_max,
> > +     .freq_min = power_cppc_cpufreq_freq_min,
> > +     .turbo_status = power_cppc_turbo_status,
> > +     .enable_turbo = power_cppc_enable_turbo,
> > +     .disable_turbo = power_cppc_disable_turbo,
> > +     .get_caps = power_cppc_get_capabilities };
> > +
> > +RTE_POWER_REGISTER_OPS(cppc_ops);
> > diff --git a/lib/power/power_cppc_cpufreq.h
> > b/drivers/power/core/cppc/power_cppc_cpufreq.h
> > similarity index 100%
> > rename from lib/power/power_cppc_cpufreq.h rename to
> > drivers/power/core/cppc/power_cppc_cpufreq.h
> > diff --git a/lib/power/guest_channel.c
> > b/drivers/power/core/kvm-vm/guest_channel.c
> > similarity index 100%
> > rename from lib/power/guest_channel.c
> > rename to drivers/power/core/kvm-vm/guest_channel.c
> > diff --git a/lib/power/guest_channel.h
> > b/drivers/power/core/kvm-vm/guest_channel.h
> > similarity index 100%
> > rename from lib/power/guest_channel.h
> > rename to drivers/power/core/kvm-vm/guest_channel.h
> > diff --git a/drivers/power/core/kvm-vm/meson.build
> > b/drivers/power/core/kvm-vm/meson.build
> > new file mode 100644
> > index 0000000000..3150c6674b
> > --- /dev/null
> > +++ b/drivers/power/core/kvm-vm/meson.build
> > @@ -0,0 +1,20 @@
> > +# SPDX-License-Identifier: BSD-3-Clause # Copyright(C) 2024 AMD
> > +Limited.
> > +#
> > +
> > +if not is_linux
> > +    build = false
> > +    reason = 'only supported on Linux'
> > +    subdir_done()
> > +endif
> > +
> > +sources = files(
> > +        'guest_channel.c',
> > +        'power_kvm_vm.c',
> > +)
> > +
> > +headers = files(
> > +        'guest_channel.h',
> > +        'power_kvm_vm.h',
> > +)
> > +deps += ['power']
> > diff --git a/lib/power/power_kvm_vm.c
> > b/drivers/power/core/kvm-vm/power_kvm_vm.c
> > similarity index 83%
> > rename from lib/power/power_kvm_vm.c
> > rename to drivers/power/core/kvm-vm/power_kvm_vm.c
> > index f15be8fac5..a5d6984d26 100644
> > --- a/lib/power/power_kvm_vm.c
> > +++ b/drivers/power/core/kvm-vm/power_kvm_vm.c
> > @@ -137,3 +137,22 @@ int power_kvm_vm_get_capabilities(__rte_unused
> unsigned int lcore_id,
> >       POWER_LOG(ERR, "rte_power_get_capabilities is not implemented for Virtual
> Machine Power Management");
> >       return -ENOTSUP;
> >   }
> > +
> > +static struct rte_power_ops kvm_vm_ops = {
> > +     .init = power_kvm_vm_init,
> > +     .exit = power_kvm_vm_exit,
> > +     .check_env_support = power_kvm_vm_check_supported,
> > +     .get_avail_freqs = power_kvm_vm_freqs,
> > +     .get_freq = power_kvm_vm_get_freq,
> > +     .set_freq = power_kvm_vm_set_freq,
> > +     .freq_down = power_kvm_vm_freq_down,
> > +     .freq_up = power_kvm_vm_freq_up,
> > +     .freq_max = power_kvm_vm_freq_max,
> > +     .freq_min = power_kvm_vm_freq_min,
> > +     .turbo_status = power_kvm_vm_turbo_status,
> > +     .enable_turbo = power_kvm_vm_enable_turbo,
> > +     .disable_turbo = power_kvm_vm_disable_turbo,
> > +     .get_caps = power_kvm_vm_get_capabilities };
> > +
> > +RTE_POWER_REGISTER_OPS(kvm_vm_ops);
> > diff --git a/lib/power/power_kvm_vm.h
> > b/drivers/power/core/kvm-vm/power_kvm_vm.h
> > similarity index 100%
> > rename from lib/power/power_kvm_vm.h
> > rename to drivers/power/core/kvm-vm/power_kvm_vm.h
> > diff --git a/drivers/power/core/meson.build
> > b/drivers/power/core/meson.build new file mode 100644 index
> > 0000000000..4081dafaa0
> > --- /dev/null
> > +++ b/drivers/power/core/meson.build
> > @@ -0,0 +1,12 @@
> > +# SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2024 AMD
> > +Limited
> > +
> > +drivers = [
> > +        'acpi',
> > +        'amd-pstate',
> > +        'cppc',
> > +        'kvm-vm',
> > +        'pstate'
> > +]
> > +
> > +std_deps = ['power']
> > diff --git a/drivers/power/core/pstate/meson.build
> > b/drivers/power/core/pstate/meson.build
> > new file mode 100644
> > index 0000000000..1025c64e48
> > --- /dev/null
> > +++ b/drivers/power/core/pstate/meson.build
> > @@ -0,0 +1,8 @@
> > +# SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2024 AMD
> > +Limited
> > +
> > +sources = files('power_pstate_cpufreq.c')
> > +
> > +headers = files('power_pstate_cpufreq.h')
> > +
> > +deps += ['power']
> > diff --git a/lib/power/power_pstate_cpufreq.c
> > b/drivers/power/core/pstate/power_pstate_cpufreq.c
> > similarity index 96%
> > rename from lib/power/power_pstate_cpufreq.c rename to
> > drivers/power/core/pstate/power_pstate_cpufreq.c
> > index 73138dc4e4..d4c3645ff8 100644
> > --- a/lib/power/power_pstate_cpufreq.c
> > +++ b/drivers/power/core/pstate/power_pstate_cpufreq.c
> > @@ -888,3 +888,22 @@ int power_pstate_get_capabilities(unsigned int
> > lcore_id,
> >
> >       return 0;
> >   }
> > +
> > +static struct rte_power_ops pstate_ops = {
> > +     .init = power_pstate_cpufreq_init,
> > +     .exit = power_pstate_cpufreq_exit,
> > +     .check_env_support = power_pstate_cpufreq_check_supported,
> > +     .get_avail_freqs = power_pstate_cpufreq_freqs,
> > +     .get_freq = power_pstate_cpufreq_get_freq,
> > +     .set_freq = power_pstate_cpufreq_set_freq,
> > +     .freq_down = power_pstate_cpufreq_freq_down,
> > +     .freq_up = power_pstate_cpufreq_freq_up,
> > +     .freq_max = power_pstate_cpufreq_freq_max,
> > +     .freq_min = power_pstate_cpufreq_freq_min,
> > +     .turbo_status = power_pstate_turbo_status,
> > +     .enable_turbo = power_pstate_enable_turbo,
> > +     .disable_turbo = power_pstate_disable_turbo,
> > +     .get_caps = power_pstate_get_capabilities };
> > +
> > +RTE_POWER_REGISTER_OPS(pstate_ops);
> > diff --git a/lib/power/power_pstate_cpufreq.h
> > b/drivers/power/core/pstate/power_pstate_cpufreq.h
> > similarity index 100%
> > rename from lib/power/power_pstate_cpufreq.h rename to
> > drivers/power/core/pstate/power_pstate_cpufreq.h
> > diff --git a/drivers/power/meson.build b/drivers/power/meson.build new
> > file mode 100644 index 0000000000..7d9034c7ac
> > --- /dev/null
> > +++ b/drivers/power/meson.build
> > @@ -0,0 +1,8 @@
> > +# SPDX-License-Identifier: BSD-3-Clause # Copyright(c) 2024 AMD
> > +Limited
> > +
> > +drivers = [
> > +        'core',
> > +]
> > +
> > +std_deps = ['power']
> > diff --git a/lib/power/meson.build b/lib/power/meson.build index
> > b8426589b2..207d96d877 100644
> > --- a/lib/power/meson.build
> > +++ b/lib/power/meson.build
> > @@ -12,14 +12,8 @@ if not is_linux
> >       reason = 'only supported on Linux'
> >   endif
> >   sources = files(
> > -        'guest_channel.c',
> > -        'power_acpi_cpufreq.c',
> > -        'power_amd_pstate_cpufreq.c',
> >           'power_common.c',
> > -        'power_cppc_cpufreq.c',
> > -        'power_kvm_vm.c',
> >           'power_intel_uncore.c',
> > -        'power_pstate_cpufreq.c',
> >           'rte_power.c',
> >           'rte_power_uncore.c',
> >           'rte_power_pmd_mgmt.c',
> > diff --git a/lib/power/power_common.h b/lib/power/power_common.h index
> > 30966400ba..c90b611f4f 100644
> > --- a/lib/power/power_common.h
> > +++ b/lib/power/power_common.h
> > @@ -23,13 +23,24 @@ extern int power_logtype;
> >   #endif
> >
> >   /* check if scaling driver matches one we want */
> > +__rte_internal
> >   int cpufreq_check_scaling_driver(const char *driver);
> > +
> > +__rte_internal
> >   int power_set_governor(unsigned int lcore_id, const char *new_governor,
> >               char *orig_governor, size_t orig_governor_len);
> > +
> > +__rte_internal
> >   int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...)
> >               __rte_format_printf(3, 4);
> > +
> > +__rte_internal
> >   int read_core_sysfs_u32(FILE *f, uint32_t *val);
> > +
> > +__rte_internal
> >   int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
> > +
> > +__rte_internal
> >   int write_core_sysfs_s(FILE *f, const char *str);
> >
> >   #endif /* _POWER_COMMON_H_ */
> > diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c index
> > 36c3f3da98..70176807f4 100644
> > --- a/lib/power/rte_power.c
> > +++ b/lib/power/rte_power.c
> > @@ -8,64 +8,80 @@
> >   #include <rte_spinlock.h>
> >
> >   #include "rte_power.h"
> > -#include "power_acpi_cpufreq.h"
> > -#include "power_cppc_cpufreq.h"
> >   #include "power_common.h"
> > -#include "power_kvm_vm.h"
> > -#include "power_pstate_cpufreq.h"
> > -#include "power_amd_pstate_cpufreq.h"
> >
> >   enum power_management_env global_default_env = PM_ENV_NOT_SET;
> use a pointer to save the current power cpufreq ops?
ACK

> >
> >   static rte_spinlock_t global_env_cfg_lock =
> > RTE_SPINLOCK_INITIALIZER;
> > +static struct rte_power_ops rte_power_ops[PM_ENV_MAX];
> >
> > -/* function pointers */
> > -rte_power_freqs_t rte_power_freqs  = NULL; -rte_power_get_freq_t
> > rte_power_get_freq = NULL; -rte_power_set_freq_t rte_power_set_freq =
> > NULL; -rte_power_freq_change_t rte_power_freq_up = NULL;
> > -rte_power_freq_change_t rte_power_freq_down = NULL;
> > -rte_power_freq_change_t rte_power_freq_max = NULL;
> > -rte_power_freq_change_t rte_power_freq_min = NULL;
> > -rte_power_freq_change_t rte_power_turbo_status;
> > -rte_power_freq_change_t rte_power_freq_enable_turbo;
> > -rte_power_freq_change_t rte_power_freq_disable_turbo;
> > -rte_power_get_capabilities_t rte_power_get_capabilities;
> > -
> > -static void
> > -reset_power_function_ptrs(void)
> > +/* register the ops struct in rte_power_ops, return 0 on success. */
> > +int rte_power_register_ops(const struct rte_power_ops *op) {
> > +     struct rte_power_ops *ops;
> > +
> > +     if (op->env >= PM_ENV_MAX) {
> > +             POWER_LOG(ERR, "Unsupported power management environment\n");
> > +             return -EINVAL;
> > +     }
> > +
> > +     if (op->status != 0) {
> > +             POWER_LOG(ERR, "Power management env[%d] ops registered
> already\n",
> > +                     op->env);
> > +             return -EINVAL;
> > +     }
> > +
> > +     if (!op->init || !op->exit || !op->check_env_support ||
> > +             !op->get_avail_freqs || !op->get_freq || !op->set_freq ||
> > +             !op->freq_up || !op->freq_down || !op->freq_max ||
> > +             !op->freq_min || !op->turbo_status || !op->enable_turbo ||
> > +             !op->disable_turbo || !op->get_caps) {
> > +             POWER_LOG(ERR, "Missing callbacks while registering power ops\n");
> > +             return -EINVAL;
> > +     }
> > +
> > +     ops = &rte_power_ops[op->env];
> It is better to use a global linked list instead of an array.
> And we should extract a list structure including this ops structure and this ops's
> owner.
> > +     ops->env = op->env;
> > +     ops->init = op->init;
> > +     ops->exit = op->exit;
> > +     ops->check_env_support = op->check_env_support;
> > +     ops->get_avail_freqs = op->get_avail_freqs;
> > +     ops->get_freq = op->get_freq;
> > +     ops->set_freq = op->set_freq;
> > +     ops->freq_up = op->freq_up;
> > +     ops->freq_down = op->freq_down;
> > +     ops->freq_max = op->freq_max;
> > +     ops->freq_min = op->freq_min;
> > +     ops->turbo_status = op->turbo_status;
> > +     ops->enable_turbo = op->enable_turbo;
> > +     ops->disable_turbo = op->disable_turbo;
> *ops = *op?
> > +     ops->status = 1; /* registered */
> status --> registered?
> But if use ops linked list, this flag also can be removed.
> > +
> > +     return 0;
> > +}
> > +
> > +struct rte_power_ops *
> > +rte_power_get_ops(int ops_index)
> AFAICS, there is only one cpufreq driver on one platform and just have one
> power_cpufreq_ops to use for user.
> We don't need user to get other power ops, and user just want to know the power
> ops using currently, right?
> So using 'index' toget this ops is not good.
Agreed! I will rework this to make it global.
> >   {
> > -     rte_power_freqs  = NULL;
> > -     rte_power_get_freq = NULL;
> > -     rte_power_set_freq = NULL;
> > -     rte_power_freq_up = NULL;
> > -     rte_power_freq_down = NULL;
> > -     rte_power_freq_max = NULL;
> > -     rte_power_freq_min = NULL;
> > -     rte_power_turbo_status = NULL;
> > -     rte_power_freq_enable_turbo = NULL;
> > -     rte_power_freq_disable_turbo = NULL;
> > -     rte_power_get_capabilities = NULL;
> > +     RTE_VERIFY((ops_index >= PM_ENV_NOT_SET) && (ops_index <
> PM_ENV_MAX));
> > +     RTE_VERIFY(rte_power_ops[ops_index].status != 0);
> > +
> > +     return &rte_power_ops[ops_index];
> >   }
> >
> >   int
> >   rte_power_check_env_supported(enum power_management_env env)
> >   {
> > -     switch (env) {
> > -     case PM_ENV_ACPI_CPUFREQ:
> > -             return power_acpi_cpufreq_check_supported();
> > -     case PM_ENV_PSTATE_CPUFREQ:
> > -             return power_pstate_cpufreq_check_supported();
> > -     case PM_ENV_KVM_VM:
> > -             return power_kvm_vm_check_supported();
> > -     case PM_ENV_CPPC_CPUFREQ:
> > -             return power_cppc_cpufreq_check_supported();
> > -     case PM_ENV_AMD_PSTATE_CPUFREQ:
> > -             return power_amd_pstate_cpufreq_check_supported();
> > -     default:
> > -             rte_errno = EINVAL;
> > -             return -1;
> > +     struct rte_power_ops *ops;
> > +
> > +     if ((env > PM_ENV_NOT_SET) && (env < PM_ENV_MAX)) {
> > +             ops = rte_power_get_ops(env);
> > +             return ops->check_env_support();
> >       }
> > +
> > +     rte_errno = EINVAL;
> > +     return -1;
> >   }
> >
> >   int
> > @@ -80,80 +96,26 @@ rte_power_set_env(enum power_management_env
> env)
> >       }
> >
> >       int ret = 0;
> > +     struct rte_power_ops *ops;
> > +
> > +     if ((env == PM_ENV_NOT_SET) || (env >= PM_ENV_MAX)) {
> > +             POWER_LOG(ERR, "Invalid Power Management Environment(%d)"
> > +                             " set\n", env);
> > +             ret = -1;
> > +     }
> >
> <...>
> > +     ops = rte_power_get_ops(env);
> To find the target ops from the global list according to the env?
> > +     if (ops->status == 0) {
> > +             POWER_LOG(ERR, WER,
> > +                     "Power Management Environment(%d) not"
> > +                     " registered\n", env);
> >               ret = -1;
> >       }
> >
> >       if (ret == 0)
> >               global_default_env = env;
> It is more convenient to use a global variable to point to the default power_cpufreq
> ops or its list node.
Agreed
> > -     else {
> > +     else
> >               global_default_env = PM_ENV_NOT_SET;
> > -             reset_power_function_ptrs();
> > -     }
> >
> >       rte_spinlock_unlock(&global_env_cfg_lock);
> >       return ret;
> > @@ -164,7 +126,6 @@ rte_power_unset_env(void)
> >   {
> >       rte_spinlock_lock(&global_env_cfg_lock);
> >       global_default_env = PM_ENV_NOT_SET;
> > -     reset_power_function_ptrs();
> >       rte_spinlock_unlock(&global_env_cfg_lock);
> >   }
> >
> > @@ -177,59 +138,76 @@ int
> >   rte_power_init(unsigned int lcore_id)
> >   {
> >       int ret = -1;
> > +     struct rte_power_ops *ops;
> >
> > -     switch (global_default_env) {
> > -     case PM_ENV_ACPI_CPUFREQ:
> > -             return power_acpi_cpufreq_init(lcore_id);
> > -     case PM_ENV_KVM_VM:
> > -             return power_kvm_vm_init(lcore_id);
> > -     case PM_ENV_PSTATE_CPUFREQ:
> > -             return power_pstate_cpufreq_init(lcore_id);
> > -     case PM_ENV_CPPC_CPUFREQ:
> > -             return power_cppc_cpufreq_init(lcore_id);
> > -     case PM_ENV_AMD_PSTATE_CPUFREQ:
> > -             return power_amd_pstate_cpufreq_init(lcore_id);
> > -     default:
> > -             POWER_LOG(INFO, "Env isn't set yet!");
> > +     if (global_default_env != PM_ENV_NOT_SET) {
> > +             ops = &rte_power_ops[global_default_env];
> > +             if (!ops->status) {
> > +                     POWER_LOG(ERR, "Power management env[%d] not"
> > +                             " supported\n", global_default_env);
> > +                     goto out;
> > +             }
> > +             return ops->init(lcore_id);
> >       }
> > +     POWER_LOG(INFO, POWER, "Env isn't set yet!\n");
> >
> >       /* Auto detect Environment */
> > -     POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq power
> management...");
> > -     ret = power_acpi_cpufreq_init(lcore_id);
> > -     if (ret == 0) {
> > -             rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
> > -             goto out;
> > +     POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq"
> > +                     " power management...\n");
> > +     ops = &rte_power_ops[PM_ENV_ACPI_CPUFREQ];
> > +     if (ops->status) {
> > +             ret = ops->init(lcore_id);
> > +             if (ret == 0) {
> > +                     rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
> > +                     goto out;
> > +             }
> >       }
> >
> > -     POWER_LOG(INFO, "Attempting to initialise PSTAT power management...");
> > -     ret = power_pstate_cpufreq_init(lcore_id);
> > -     if (ret == 0) {
> > -             rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
> > -             goto out;
> > +     POWER_LOG(INFO, "Attempting to initialise PSTAT"
> > +                     " power management...\n");
> > +     ops = &rte_power_ops[PM_ENV_PSTATE_CPUFREQ];
> > +     if (ops->status) {
> > +             ret = ops->init(lcore_id);
> > +             if (ret == 0) {
> > +                     rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
> > +                     goto out;
> > +             }
> >       }
> >
> > -     POWER_LOG(INFO, "Attempting to initialise AMD PSTATE power
> management...");
> > -     ret = power_amd_pstate_cpufreq_init(lcore_id);
> > -     if (ret == 0) {
> > -             rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
> > -             goto out;
> > +     POWER_LOG(INFO, "Attempting to initialise AMD PSTATE"
> > +                     " power management...\n");
> > +     ops = &rte_power_ops[PM_ENV_AMD_PSTATE_CPUFREQ];
> > +     if (ops->status) {
> > +             ret = ops->init(lcore_id);
> > +             if (ret == 0) {
> > +                     rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
> > +                     goto out;
> > +             }
> >       }
> >
> > -     POWER_LOG(INFO, "Attempting to initialise CPPC power management...");
> > -     ret = power_cppc_cpufreq_init(lcore_id);
> > -     if (ret == 0) {
> > -             rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
> > -             goto out;
> > +     POWER_LOG(INFO, "Attempting to initialise CPPC power"
> > +                     " management...\n");
> > +     ops = &rte_power_ops[PM_ENV_CPPC_CPUFREQ];
> > +     if (ops->status) {
> > +             ret = ops->init(lcore_id);
> > +             if (ret == 0) {
> > +                     rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
> > +                     goto out;
> > +             }
> >       }
> >
> > -     POWER_LOG(INFO, "Attempting to initialise VM power management...");
> > -     ret = power_kvm_vm_init(lcore_id);
> > -     if (ret == 0) {
> > -             rte_power_set_env(PM_ENV_KVM_VM);
> > -             goto out;
> > +     POWER_LOG(INFO, "Attempting to initialise VM power"
> > +                     " management...\n");
> > +     ops = &rte_power_ops[PM_ENV_KVM_VM];
> > +     if (ops->status) {
> > +             ret = ops->init(lcore_id);
> > +             if (ret == 0) {
> > +                     rte_power_set_env(PM_ENV_KVM_VM);
> > +                     goto out;
> > +             }
> >       }
> If we use a linked list, above code can be simpled like this:
> ->
> for_each_power_cpufreq_ops(ops, ...) {
>      ret = ops->init()
>      if (ret) {
>          ....
>      }
> }
ACK
> > -     POWER_LOG(ERR, "Unable to set Power Management Environment for lcore "
> > -                     "%u", lcore_id);
> > +     POWER_LOG(ERR, "Unable to set Power Management Environment"
> > +                     " for lcore %u\n", lcore_id);
> >   out:
> >       return ret;
> >   }
> > @@ -237,21 +215,14 @@ rte_power_init(unsigned int lcore_id)
> >   int
> >   rte_power_exit(unsigned int lcore_id)
> >   {
> > -     switch (global_default_env) {
> > -     case PM_ENV_ACPI_CPUFREQ:
> > -             return power_acpi_cpufreq_exit(lcore_id);
> > -     case PM_ENV_KVM_VM:
> > -             return power_kvm_vm_exit(lcore_id);
> > -     case PM_ENV_PSTATE_CPUFREQ:
> > -             return power_pstate_cpufreq_exit(lcore_id);
> > -     case PM_ENV_CPPC_CPUFREQ:
> > -             return power_cppc_cpufreq_exit(lcore_id);
> > -     case PM_ENV_AMD_PSTATE_CPUFREQ:
> > -             return power_amd_pstate_cpufreq_exit(lcore_id);
> > -     default:
> > -             POWER_LOG(ERR, "Environment has not been set, unable to exit gracefully");
> > +     struct rte_power_ops *ops;
> >
> > +     if (global_default_env != PM_ENV_NOT_SET) {
> > +             ops = &rte_power_ops[global_default_env];
> > +             return ops->exit(lcore_id);
> >       }
> > -     return -1;
> > +     POWER_LOG(ERR, "Environment has not been set, unable "
> > +                     "to exit gracefully\n");
> >
> > +     return -1;
> >   }
> > diff --git a/lib/power/rte_power.h b/lib/power/rte_power.h
> > index 4fa4afe399..749bb823ab 100644
> > --- a/lib/power/rte_power.h
> > +++ b/lib/power/rte_power.h
> > @@ -1,5 +1,6 @@
> >   /* SPDX-License-Identifier: BSD-3-Clause
> >    * Copyright(c) 2010-2014 Intel Corporation
> > + * Copyright(c) 2024 AMD Limited
> >    */
> >
> >   #ifndef _RTE_POWER_H
> > @@ -21,7 +22,7 @@ extern "C" {
> >   /* Power Management Environment State */
> >   enum power_management_env {PM_ENV_NOT_SET, PM_ENV_ACPI_CPUFREQ, PM_ENV_KVM_VM,
> >               PM_ENV_PSTATE_CPUFREQ, PM_ENV_CPPC_CPUFREQ,
> > -             PM_ENV_AMD_PSTATE_CPUFREQ};
> > +             PM_ENV_AMD_PSTATE_CPUFREQ, PM_ENV_MAX};
> "enum power_management_env" is not a good name. Maybe something like "enum
> power_cpufreq_driver_type"?
> With the linked list structure suggested earlier, it may be better to use a
> string name directly instead of a fixed enum,
> because the new "PM_ENV_MAX" will break the ABI whenever a new cpufreq
> driver is added.
I will rework this to remove the max macro.
However, changing the enum power_management_env would require ABI versioning.
Will consider this change in the future.
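The ABI concern can be made concrete with simplified stand-in enums (not the real header): inserting a new driver before the trailing sentinel renumbers it, so a binary that compiled against the old sentinel value sizes its tables differently from the new library.

```c
#include <assert.h>

/* Simplified stand-ins for two versions of the enum; not the real
 * DPDK header. v2 inserts a new driver before the trailing sentinel. */
enum env_v1 { V1_NOT_SET, V1_ACPI_CPUFREQ, V1_KVM_VM, V1_ENV_MAX };
enum env_v2 { V2_NOT_SET, V2_ACPI_CPUFREQ, V2_KVM_VM,
	      V2_AMD_PSTATE_CPUFREQ, V2_ENV_MAX };

/* An application that sized an array with the old sentinel (3 slots)
 * now undersizes it for a library that expects 4. */
static const int v1_max = V1_ENV_MAX;   /* 3 */
static const int v2_max = V2_ENV_MAX;   /* 4 */
```

This is why sentinel-terminated public enums are generally avoided in stable ABIs, and why a name-keyed registration list sidesteps the problem.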
> >
> >   /**
> >    * Check if a specific power management environment type is
> > supported on a @@ -66,6 +67,97 @@ void rte_power_unset_env(void);
> >    */
> >   enum power_management_env rte_power_get_env(void);
> >
> > +typedef int (*rte_power_cpufreq_init_t)(unsigned int lcore_id);
> > +typedef int (*rte_power_cpufreq_exit_t)(unsigned int lcore_id);
> > +typedef int (*rte_power_check_env_support_t)(void);
> > +
> > +typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
> > +                                     uint32_t num);
> > +typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id);
> > +typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t index);
> > +typedef int (*rte_power_freq_change_t)(unsigned int lcore_id);
> > +
> > +/**
> > + * Function pointer definition for generic frequency change functions.
> > + * Review each environment's specific documentation for usage.
> > + *
> > + * @param lcore_id
> > + *  lcore id.
> > + *
> > + * @return
> > + *  - 1 on success with frequency changed.
> > + *  - 0 on success without frequency changed.
> > + *  - Negative on error.
> > + */
> > +
> > +/**
> > + * Power capabilities summary.
> > + */
> > +struct rte_power_core_capabilities {
> > +     union {
> > +             uint64_t capabilities;
> > +             struct {
> > +                     uint64_t turbo:1;       /**< Turbo can be enabled. */
> > +                     uint64_t priority:1;    /**< SST-BF high freq core */
> > +             };
> > +     };
> > +};
> > +
> > +typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id,
> > +                             struct rte_power_core_capabilities *caps);
> > +
> > +/** Structure defining core power operations */
> > +struct rte_power_ops {
> > +     uint8_t status;                         /**< ops register status. */
> > +     enum power_management_env env;          /**< power mgmt env. */
> > +     rte_power_cpufreq_init_t init;    /**< Initialize power management. */
> > +     rte_power_cpufreq_exit_t exit;    /**< Exit power management. */
> > +     rte_power_check_env_support_t check_env_support; /**< verify env is supported. */
> > +     rte_power_freqs_t get_avail_freqs; /**< Get the available frequencies. */
> > +     rte_power_get_freq_t get_freq; /**< Get frequency index. */
> > +     rte_power_set_freq_t set_freq; /**< Set frequency index. */
> > +     rte_power_freq_change_t freq_up;   /**< Scale up frequency. */
> > +     rte_power_freq_change_t freq_down; /**< Scale down frequency. */
> > +     rte_power_freq_change_t freq_max;  /**< Scale up frequency to highest. */
> > +     rte_power_freq_change_t freq_min;  /**< Scale up frequency to lowest. */
> > +     rte_power_freq_change_t turbo_status; /**< Get Turbo status. */
> > +     rte_power_freq_change_t enable_turbo; /**< Enable Turbo. */
> > +     rte_power_freq_change_t disable_turbo; /**< Disable Turbo. */
> > +     rte_power_get_capabilities_t get_caps; /**< power capabilities. */
> > +} __rte_cache_aligned;
> Suggest fixing this structure, like:
> struct rte_power_cpufreq_list {
>      char name[];   // like "cppc_cpufreq", "pstate_cpufreq"
>      struct rte_power_cpufreq *ops;
>      struct rte_power_cpufreq_list *node;
> };
ACK
> > +
> > +/**
> > + * Register power cpu frequency operations.
> > + *
> > + * @param ops
> > + *   Pointer to an ops structure to register.
> > + * @return
> > + *   - >=0: Success; return the index of the ops struct in the table.
> > + *   - -EINVAL - error while registering ops struct.
> > + */
> > +__rte_internal
> > +int rte_power_register_ops(const struct rte_power_ops *ops);
> > +
> > +/**
> > + * Macro to statically register the ops of a cpufreq driver.
> > + */
> > +#define RTE_POWER_REGISTER_OPS(ops)          \
> > +     (RTE_INIT(power_hdlr_init_##ops)        \
> > +     {                                       \
> > +             rte_power_register_ops(&ops);   \
> > +     })
> > +
> > +/**
> > + * @internal Get the power ops struct from its index.
> > + *
> > + * @param ops_index
> > + *   The index of the ops struct in the ops struct table.
> > + * @return
> > + *   The pointer to the ops struct in the table if registered.
> > + */
> > +struct rte_power_ops *
> > +rte_power_get_ops(int ops_index);
> > +
> >   /**
> >    * Initialize power management for a specific lcore. If rte_power_set_env() has
> >    * not been called then an auto-detect of the environment will start
> > and @@ -108,10 +200,14 @@ int rte_power_exit(unsigned int lcore_id);
> >    * @return
> >    *  The number of available frequencies.
> >    */
> > -typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
> > -             uint32_t num);
> > +static inline uint32_t
> > +rte_power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n)
> > +{
> > +     struct rte_power_ops *ops;
> >
> > -extern rte_power_freqs_t rte_power_freqs;
> > +     ops = rte_power_get_ops(rte_power_get_env());
> > +     return ops->get_avail_freqs(lcore_id, freqs, n);
> > +}
> nice.
> <...>
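The dispatch pattern the patch introduces — a public inline wrapper making one indirect call through an env-indexed ops table — can be sketched like this. These are simplified stand-ins, not the actual rte_power definitions:

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for the env-indexed ops-table dispatch; the real
 * rte_power_ops struct and rte_power_get_ops() differ in detail. */
typedef uint32_t (*get_avail_freqs_t)(unsigned int lcore_id,
		uint32_t *freqs, uint32_t num);

struct power_ops {
	get_avail_freqs_t get_avail_freqs;
};

#define ENV_MAX 4

static struct power_ops ops_table[ENV_MAX];
static unsigned int current_env;

/* Public entry point: one indirect call through the selected ops. */
static inline uint32_t
power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n)
{
	return ops_table[current_env].get_avail_freqs(lcore_id, freqs, n);
}

/* Demo backend reporting two fixed frequencies (in kHz). */
static uint32_t
demo_get_avail_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t num)
{
	(void)lcore_id;
	if (num < 2)
		return 0;
	freqs[0] = 2400000;
	freqs[1] = 1800000;
	return 2;
}

/* Installs the demo backend and queries it through the wrapper. */
static uint32_t
power_freqs_demo(void)
{
	uint32_t freqs[4];

	ops_table[1].get_avail_freqs = demo_get_avail_freqs;
	current_env = 1;
	return power_freqs(0, freqs, 4);
}
```

The win over the old design is that the switch statements disappear: every public API becomes the same one-line lookup, and environments plug in by registering their ops.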


* [PATCH 3/5] net/nfp: uniform function name format
    2024-03-05  2:29  4% ` [PATCH 1/5] net/nfp: create " Chaoyong He
@ 2024-03-05  2:29  3% ` Chaoyong He
  1 sibling, 0 replies; 200+ results
From: Chaoyong He @ 2024-03-05  2:29 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, Long Wu, Chaoyong He

From: Long Wu <long.wu@corigine.com>

Unify the function name format and add the same prefix.

Signed-off-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
---
 drivers/net/nfp/nfd3/nfp_nfd3_dp.c |  5 ++--
 drivers/net/nfp/nfdk/nfp_nfdk_dp.c |  5 ++--
 drivers/net/nfp/nfp_net_common.c   |  2 +-
 drivers/net/nfp/nfp_net_meta.c     | 38 +++++++++++++++---------------
 drivers/net/nfp/nfp_net_meta.h     |  8 +++----
 drivers/net/nfp/nfp_rxtx.c         |  2 +-
 6 files changed, 29 insertions(+), 31 deletions(-)

diff --git a/drivers/net/nfp/nfd3/nfp_nfd3_dp.c b/drivers/net/nfp/nfd3/nfp_nfd3_dp.c
index 5fb76ae9d7..7e281ae498 100644
--- a/drivers/net/nfp/nfd3/nfp_nfd3_dp.c
+++ b/drivers/net/nfp/nfd3/nfp_nfd3_dp.c
@@ -194,8 +194,7 @@ nfp_net_nfd3_set_meta_data(struct nfp_net_meta_raw *meta_data,
 				PMD_DRV_LOG(ERR, "At most 1 layers of vlan is supported");
 				return -EINVAL;
 			}
-
-			nfp_net_set_meta_vlan(meta_data, pkt, layer);
+			nfp_net_meta_set_vlan(meta_data, pkt, layer);
 			vlan_layer++;
 			break;
 		case NFP_NET_META_IPSEC:
@@ -204,7 +203,7 @@ nfp_net_nfd3_set_meta_data(struct nfp_net_meta_raw *meta_data,
 				return -EINVAL;
 			}
 
-			nfp_net_set_meta_ipsec(meta_data, txq, pkt, layer, ipsec_layer);
+			nfp_net_meta_set_ipsec(meta_data, txq, pkt, layer, ipsec_layer);
 			ipsec_layer++;
 			break;
 		default:
diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c
index 8bdab5d463..b8592b1767 100644
--- a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c
+++ b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c
@@ -228,8 +228,7 @@ nfp_net_nfdk_set_meta_data(struct rte_mbuf *pkt,
 				PMD_DRV_LOG(ERR, "At most 1 layers of vlan is supported");
 				return -EINVAL;
 			}
-
-			nfp_net_set_meta_vlan(&meta_data, pkt, layer);
+			nfp_net_meta_set_vlan(&meta_data, pkt, layer);
 			vlan_layer++;
 			break;
 		case NFP_NET_META_IPSEC:
@@ -238,7 +237,7 @@ nfp_net_nfdk_set_meta_data(struct rte_mbuf *pkt,
 				return -EINVAL;
 			}
 
-			nfp_net_set_meta_ipsec(&meta_data, txq, pkt, layer, ipsec_layer);
+			nfp_net_meta_set_ipsec(&meta_data, txq, pkt, layer, ipsec_layer);
 			ipsec_layer++;
 			break;
 		default:
diff --git a/drivers/net/nfp/nfp_net_common.c b/drivers/net/nfp/nfp_net_common.c
index 384e042dfd..71038d6be9 100644
--- a/drivers/net/nfp/nfp_net_common.c
+++ b/drivers/net/nfp/nfp_net_common.c
@@ -1312,7 +1312,7 @@ nfp_net_common_init(struct rte_pci_device *pci_dev,
 	hw->max_mtu = nn_cfg_readl(&hw->super, NFP_NET_CFG_MAX_MTU);
 	hw->flbufsz = DEFAULT_FLBUF_SIZE;
 
-	nfp_net_init_metadata_format(hw);
+	nfp_net_meta_init_format(hw);
 
 	/* Read the Rx offset configured from firmware */
 	if (hw->ver.major < 2)
diff --git a/drivers/net/nfp/nfp_net_meta.c b/drivers/net/nfp/nfp_net_meta.c
index 0fd5ba17a0..2ec20aba7d 100644
--- a/drivers/net/nfp/nfp_net_meta.c
+++ b/drivers/net/nfp/nfp_net_meta.c
@@ -17,7 +17,7 @@ enum nfp_net_meta_ipsec_layer {
 
 /* Parse the chained metadata from packet */
 static bool
-nfp_net_parse_chained_meta(uint8_t *meta_base,
+nfp_net_meta_parse_chained(uint8_t *meta_base,
 		rte_be32_t meta_header,
 		struct nfp_net_meta_parsed *meta)
 {
@@ -73,7 +73,7 @@ nfp_net_parse_chained_meta(uint8_t *meta_base,
  * Get it from metadata area.
  */
 static inline void
-nfp_net_parse_single_meta(uint8_t *meta_base,
+nfp_net_meta_parse_single(uint8_t *meta_base,
 		rte_be32_t meta_header,
 		struct nfp_net_meta_parsed *meta)
 {
@@ -83,7 +83,7 @@ nfp_net_parse_single_meta(uint8_t *meta_base,
 
 /* Set mbuf hash data based on the metadata info */
 static void
-nfp_net_parse_meta_hash(const struct nfp_net_meta_parsed *meta,
+nfp_net_meta_parse_hash(const struct nfp_net_meta_parsed *meta,
 		struct nfp_net_rxq *rxq,
 		struct rte_mbuf *mbuf)
 {
@@ -98,7 +98,7 @@ nfp_net_parse_meta_hash(const struct nfp_net_meta_parsed *meta,
 
 /* Set mbuf vlan_strip data based on metadata info */
 static void
-nfp_net_parse_meta_vlan(const struct nfp_net_meta_parsed *meta,
+nfp_net_meta_parse_vlan(const struct nfp_net_meta_parsed *meta,
 		struct nfp_net_rx_desc *rxd,
 		struct nfp_net_rxq *rxq,
 		struct rte_mbuf *mb)
@@ -146,7 +146,7 @@ nfp_net_parse_meta_vlan(const struct nfp_net_meta_parsed *meta,
  * qinq not set & vlan not set: meta->vlan_layer=0
  */
 static void
-nfp_net_parse_meta_qinq(const struct nfp_net_meta_parsed *meta,
+nfp_net_meta_parse_qinq(const struct nfp_net_meta_parsed *meta,
 		struct nfp_net_rxq *rxq,
 		struct rte_mbuf *mb)
 {
@@ -175,7 +175,7 @@ nfp_net_parse_meta_qinq(const struct nfp_net_meta_parsed *meta,
  * Extract and decode metadata info and set the mbuf ol_flags.
  */
 static void
-nfp_net_parse_meta_ipsec(struct nfp_net_meta_parsed *meta,
+nfp_net_meta_parse_ipsec(struct nfp_net_meta_parsed *meta,
 		struct nfp_net_rxq *rxq,
 		struct rte_mbuf *mbuf)
 {
@@ -202,7 +202,7 @@ nfp_net_parse_meta_ipsec(struct nfp_net_meta_parsed *meta,
 }
 
 static void
-nfp_net_parse_meta_mark(const struct nfp_net_meta_parsed *meta,
+nfp_net_meta_parse_mark(const struct nfp_net_meta_parsed *meta,
 		struct rte_mbuf *mbuf)
 {
 	if (((meta->flags >> NFP_NET_META_MARK) & 0x1) == 0)
@@ -214,7 +214,7 @@ nfp_net_parse_meta_mark(const struct nfp_net_meta_parsed *meta,
 
 /* Parse the metadata from packet */
 void
-nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
+nfp_net_meta_parse(struct nfp_net_rx_desc *rxds,
 		struct nfp_net_rxq *rxq,
 		struct nfp_net_hw *hw,
 		struct rte_mbuf *mb,
@@ -231,20 +231,20 @@ nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
 
 	switch (hw->meta_format) {
 	case NFP_NET_METAFORMAT_CHAINED:
-		if (nfp_net_parse_chained_meta(meta_base, meta_header, meta)) {
-			nfp_net_parse_meta_hash(meta, rxq, mb);
-			nfp_net_parse_meta_vlan(meta, rxds, rxq, mb);
-			nfp_net_parse_meta_qinq(meta, rxq, mb);
-			nfp_net_parse_meta_ipsec(meta, rxq, mb);
-			nfp_net_parse_meta_mark(meta, mb);
+		if (nfp_net_meta_parse_chained(meta_base, meta_header, meta)) {
+			nfp_net_meta_parse_hash(meta, rxq, mb);
+			nfp_net_meta_parse_vlan(meta, rxds, rxq, mb);
+			nfp_net_meta_parse_qinq(meta, rxq, mb);
+			nfp_net_meta_parse_ipsec(meta, rxq, mb);
+			nfp_net_meta_parse_mark(meta, mb);
 		} else {
 			PMD_RX_LOG(DEBUG, "RX chained metadata format is wrong!");
 		}
 		break;
 	case NFP_NET_METAFORMAT_SINGLE:
 		if ((rxds->rxd.flags & PCIE_DESC_RX_RSS) != 0) {
-			nfp_net_parse_single_meta(meta_base, meta_header, meta);
-			nfp_net_parse_meta_hash(meta, rxq, mb);
+			nfp_net_meta_parse_single(meta_base, meta_header, meta);
+			nfp_net_meta_parse_hash(meta, rxq, mb);
 		}
 		break;
 	default:
@@ -253,7 +253,7 @@ nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
 }
 
 void
-nfp_net_init_metadata_format(struct nfp_net_hw *hw)
+nfp_net_meta_init_format(struct nfp_net_hw *hw)
 {
 	/*
 	 * ABI 4.x and ctrl vNIC always use chained metadata, in other cases we allow use of
@@ -276,7 +276,7 @@ nfp_net_init_metadata_format(struct nfp_net_hw *hw)
 }
 
 void
-nfp_net_set_meta_vlan(struct nfp_net_meta_raw *meta_data,
+nfp_net_meta_set_vlan(struct nfp_net_meta_raw *meta_data,
 		struct rte_mbuf *pkt,
 		uint8_t layer)
 {
@@ -290,7 +290,7 @@ nfp_net_set_meta_vlan(struct nfp_net_meta_raw *meta_data,
 }
 
 void
-nfp_net_set_meta_ipsec(struct nfp_net_meta_raw *meta_data,
+nfp_net_meta_set_ipsec(struct nfp_net_meta_raw *meta_data,
 		struct nfp_net_txq *txq,
 		struct rte_mbuf *pkt,
 		uint8_t layer,
diff --git a/drivers/net/nfp/nfp_net_meta.h b/drivers/net/nfp/nfp_net_meta.h
index 46caa777da..1d26b089d5 100644
--- a/drivers/net/nfp/nfp_net_meta.h
+++ b/drivers/net/nfp/nfp_net_meta.h
@@ -90,16 +90,16 @@ struct nfp_net_meta_parsed {
 	} vlan[NFP_NET_META_MAX_VLANS];
 };
 
-void nfp_net_init_metadata_format(struct nfp_net_hw *hw);
-void nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
+void nfp_net_meta_init_format(struct nfp_net_hw *hw);
+void nfp_net_meta_parse(struct nfp_net_rx_desc *rxds,
 		struct nfp_net_rxq *rxq,
 		struct nfp_net_hw *hw,
 		struct rte_mbuf *mb,
 		struct nfp_net_meta_parsed *meta);
-void nfp_net_set_meta_vlan(struct nfp_net_meta_raw *meta_data,
+void nfp_net_meta_set_vlan(struct nfp_net_meta_raw *meta_data,
 		struct rte_mbuf *pkt,
 		uint8_t layer);
-void nfp_net_set_meta_ipsec(struct nfp_net_meta_raw *meta_data,
+void nfp_net_meta_set_ipsec(struct nfp_net_meta_raw *meta_data,
 		struct nfp_net_txq *txq,
 		struct rte_mbuf *pkt,
 		uint8_t layer,
diff --git a/drivers/net/nfp/nfp_rxtx.c b/drivers/net/nfp/nfp_rxtx.c
index e863c42039..716a6af34f 100644
--- a/drivers/net/nfp/nfp_rxtx.c
+++ b/drivers/net/nfp/nfp_rxtx.c
@@ -498,7 +498,7 @@ nfp_net_recv_pkts(void *rx_queue,
 		mb->port = rxq->port_id;
 
 		struct nfp_net_meta_parsed meta = {};
-		nfp_net_parse_meta(rxds, rxq, hw, mb, &meta);
+		nfp_net_meta_parse(rxds, rxq, hw, mb, &meta);
 
 		nfp_net_parse_ptype(rxq, rxds, mb);
 
-- 
2.39.1



* [PATCH 1/5] net/nfp: create new meta data module
  @ 2024-03-05  2:29  4% ` Chaoyong He
  2024-03-05  2:29  3% ` [PATCH 3/5] net/nfp: uniform function name format Chaoyong He
  1 sibling, 0 replies; 200+ results
From: Chaoyong He @ 2024-03-05  2:29 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, Long Wu, Chaoyong He

From: Long Wu <long.wu@corigine.com>

Move the Rx meta data code to the new 'nfp_net_meta' module, which
makes the related logic cleaner and the code architecture more
reasonable.
There is no functional change, just moving verbatim code around.

Signed-off-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
---
 drivers/common/nfp/nfp_common_ctrl.h     |  41 ---
 drivers/net/nfp/flower/nfp_flower.c      |   1 +
 drivers/net/nfp/flower/nfp_flower_cmsg.c |   1 +
 drivers/net/nfp/flower/nfp_flower_ctrl.c |   1 +
 drivers/net/nfp/meson.build              |   1 +
 drivers/net/nfp/nfd3/nfp_nfd3_dp.c       |   1 +
 drivers/net/nfp/nfdk/nfp_nfdk_dp.c       |   1 +
 drivers/net/nfp/nfp_ipsec.c              |   1 +
 drivers/net/nfp/nfp_ipsec.h              |   6 -
 drivers/net/nfp/nfp_net_common.c         |  24 +-
 drivers/net/nfp/nfp_net_common.h         |   7 +-
 drivers/net/nfp/nfp_net_meta.c           | 320 +++++++++++++++++++++++
 drivers/net/nfp/nfp_net_meta.h           | 108 ++++++++
 drivers/net/nfp/nfp_rxtx.c               | 305 +--------------------
 drivers/net/nfp/nfp_rxtx.h               |  20 --
 15 files changed, 438 insertions(+), 400 deletions(-)
 create mode 100644 drivers/net/nfp/nfp_net_meta.c
 create mode 100644 drivers/net/nfp/nfp_net_meta.h

diff --git a/drivers/common/nfp/nfp_common_ctrl.h b/drivers/common/nfp/nfp_common_ctrl.h
index 7749ba6459..6badf769fc 100644
--- a/drivers/common/nfp/nfp_common_ctrl.h
+++ b/drivers/common/nfp/nfp_common_ctrl.h
@@ -13,47 +13,6 @@
  */
 #define NFP_NET_CFG_BAR_SZ              (32 * 1024)
 
-/* Offset in Freelist buffer where packet starts on RX */
-#define NFP_NET_RX_OFFSET               32
-
-/* Working with metadata api (NFD version > 3.0) */
-#define NFP_NET_META_FIELD_SIZE         4
-#define NFP_NET_META_FIELD_MASK ((1 << NFP_NET_META_FIELD_SIZE) - 1)
-#define NFP_NET_META_HEADER_SIZE        4
-#define NFP_NET_META_NFDK_LENGTH        8
-
-/* Working with metadata vlan api (NFD version >= 2.0) */
-#define NFP_NET_META_VLAN_INFO          16
-#define NFP_NET_META_VLAN_OFFLOAD       31
-#define NFP_NET_META_VLAN_TPID          3
-#define NFP_NET_META_VLAN_MASK          ((1 << NFP_NET_META_VLAN_INFO) - 1)
-#define NFP_NET_META_VLAN_TPID_MASK     ((1 << NFP_NET_META_VLAN_TPID) - 1)
-#define NFP_NET_META_TPID(d)            (((d) >> NFP_NET_META_VLAN_INFO) & \
-						NFP_NET_META_VLAN_TPID_MASK)
-
-/* Prepend field types */
-#define NFP_NET_META_HASH               1 /* Next field carries hash type */
-#define NFP_NET_META_MARK               2
-#define NFP_NET_META_VLAN               4
-#define NFP_NET_META_PORTID             5
-#define NFP_NET_META_IPSEC              9
-
-#define NFP_META_PORT_ID_CTRL           ~0U
-
-/* Hash type prepended when a RSS hash was computed */
-#define NFP_NET_RSS_NONE                0
-#define NFP_NET_RSS_IPV4                1
-#define NFP_NET_RSS_IPV6                2
-#define NFP_NET_RSS_IPV6_EX             3
-#define NFP_NET_RSS_IPV4_TCP            4
-#define NFP_NET_RSS_IPV6_TCP            5
-#define NFP_NET_RSS_IPV6_EX_TCP         6
-#define NFP_NET_RSS_IPV4_UDP            7
-#define NFP_NET_RSS_IPV6_UDP            8
-#define NFP_NET_RSS_IPV6_EX_UDP         9
-#define NFP_NET_RSS_IPV4_SCTP           10
-#define NFP_NET_RSS_IPV6_SCTP           11
-
 /*
  * @NFP_NET_TXR_MAX:         Maximum number of TX rings
  * @NFP_NET_TXR_MASK:        Mask for TX rings
diff --git a/drivers/net/nfp/flower/nfp_flower.c b/drivers/net/nfp/flower/nfp_flower.c
index c6a744e868..97219ff379 100644
--- a/drivers/net/nfp/flower/nfp_flower.c
+++ b/drivers/net/nfp/flower/nfp_flower.c
@@ -16,6 +16,7 @@
 #include "../nfp_cpp_bridge.h"
 #include "../nfp_logs.h"
 #include "../nfp_mtr.h"
+#include "../nfp_net_meta.h"
 #include "nfp_flower_ctrl.h"
 #include "nfp_flower_representor.h"
 #include "nfp_flower_service.h"
diff --git a/drivers/net/nfp/flower/nfp_flower_cmsg.c b/drivers/net/nfp/flower/nfp_flower_cmsg.c
index 8effe9474d..f78bfba332 100644
--- a/drivers/net/nfp/flower/nfp_flower_cmsg.c
+++ b/drivers/net/nfp/flower/nfp_flower_cmsg.c
@@ -7,6 +7,7 @@
 
 #include "../nfpcore/nfp_nsp.h"
 #include "../nfp_logs.h"
+#include "../nfp_net_meta.h"
 #include "nfp_flower_ctrl.h"
 #include "nfp_flower_representor.h"
 
diff --git a/drivers/net/nfp/flower/nfp_flower_ctrl.c b/drivers/net/nfp/flower/nfp_flower_ctrl.c
index bcb325d475..720a0d9495 100644
--- a/drivers/net/nfp/flower/nfp_flower_ctrl.c
+++ b/drivers/net/nfp/flower/nfp_flower_ctrl.c
@@ -10,6 +10,7 @@
 #include "../nfd3/nfp_nfd3.h"
 #include "../nfdk/nfp_nfdk.h"
 #include "../nfp_logs.h"
+#include "../nfp_net_meta.h"
 #include "nfp_flower_representor.h"
 #include "nfp_mtr.h"
 #include "nfp_flower_service.h"
diff --git a/drivers/net/nfp/meson.build b/drivers/net/nfp/meson.build
index 959ca01844..d805644ec5 100644
--- a/drivers/net/nfp/meson.build
+++ b/drivers/net/nfp/meson.build
@@ -41,6 +41,7 @@ sources = files(
         'nfp_net_common.c',
         'nfp_net_ctrl.c',
         'nfp_net_flow.c',
+        'nfp_net_meta.c',
         'nfp_rxtx.c',
         'nfp_service.c',
 )
diff --git a/drivers/net/nfp/nfd3/nfp_nfd3_dp.c b/drivers/net/nfp/nfd3/nfp_nfd3_dp.c
index be31f4ac33..5fb76ae9d7 100644
--- a/drivers/net/nfp/nfd3/nfp_nfd3_dp.c
+++ b/drivers/net/nfp/nfd3/nfp_nfd3_dp.c
@@ -10,6 +10,7 @@
 
 #include "../flower/nfp_flower.h"
 #include "../nfp_logs.h"
+#include "../nfp_net_meta.h"
 
 /* Flags in the host TX descriptor */
 #define NFD3_DESC_TX_CSUM               RTE_BIT32(7)
diff --git a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c
index daf5ac5b30..8bdab5d463 100644
--- a/drivers/net/nfp/nfdk/nfp_nfdk_dp.c
+++ b/drivers/net/nfp/nfdk/nfp_nfdk_dp.c
@@ -11,6 +11,7 @@
 
 #include "../flower/nfp_flower.h"
 #include "../nfp_logs.h"
+#include "../nfp_net_meta.h"
 
 #define NFDK_TX_DESC_GATHER_MAX         17
 
diff --git a/drivers/net/nfp/nfp_ipsec.c b/drivers/net/nfp/nfp_ipsec.c
index 0b815fa983..0bf146b9be 100644
--- a/drivers/net/nfp/nfp_ipsec.c
+++ b/drivers/net/nfp/nfp_ipsec.c
@@ -18,6 +18,7 @@
 #include "nfp_net_common.h"
 #include "nfp_net_ctrl.h"
 #include "nfp_rxtx.h"
+#include "nfp_net_meta.h"
 
 #define NFP_UDP_ESP_PORT            4500
 
diff --git a/drivers/net/nfp/nfp_ipsec.h b/drivers/net/nfp/nfp_ipsec.h
index d7a729398a..4ef0e196be 100644
--- a/drivers/net/nfp/nfp_ipsec.h
+++ b/drivers/net/nfp/nfp_ipsec.h
@@ -168,12 +168,6 @@ struct nfp_net_ipsec_data {
 	struct nfp_ipsec_session *sa_entries[NFP_NET_IPSEC_MAX_SA_CNT];
 };
 
-enum nfp_ipsec_meta_layer {
-	NFP_IPSEC_META_SAIDX,       /**< Order of SA index in metadata */
-	NFP_IPSEC_META_SEQLOW,      /**< Order of Sequence Number (low 32bits) in metadata */
-	NFP_IPSEC_META_SEQHI,       /**< Order of Sequence Number (high 32bits) in metadata */
-};
-
 int nfp_ipsec_init(struct rte_eth_dev *dev);
 void nfp_ipsec_uninit(struct rte_eth_dev *dev);
 
diff --git a/drivers/net/nfp/nfp_net_common.c b/drivers/net/nfp/nfp_net_common.c
index 20e628bfd1..384e042dfd 100644
--- a/drivers/net/nfp/nfp_net_common.c
+++ b/drivers/net/nfp/nfp_net_common.c
@@ -15,6 +15,7 @@
 #include "nfpcore/nfp_mip.h"
 #include "nfpcore/nfp_nsp.h"
 #include "nfp_logs.h"
+#include "nfp_net_meta.h"
 
 #define NFP_TX_MAX_SEG       UINT8_MAX
 #define NFP_TX_MAX_MTU_SEG   8
@@ -2038,29 +2039,6 @@ nfp_net_check_dma_mask(struct nfp_net_hw *hw,
 	return 0;
 }
 
-void
-nfp_net_init_metadata_format(struct nfp_net_hw *hw)
-{
-	/*
-	 * ABI 4.x and ctrl vNIC always use chained metadata, in other cases we allow use of
-	 * single metadata if only RSS(v1) is supported by hw capability, and RSS(v2)
-	 * also indicate that we are using chained metadata.
-	 */
-	if (hw->ver.major == 4) {
-		hw->meta_format = NFP_NET_METAFORMAT_CHAINED;
-	} else if ((hw->super.cap & NFP_NET_CFG_CTRL_CHAIN_META) != 0) {
-		hw->meta_format = NFP_NET_METAFORMAT_CHAINED;
-		/*
-		 * RSS is incompatible with chained metadata. hw->super.cap just represents
-		 * firmware's ability rather than the firmware's configuration. We decide
-		 * to reduce the confusion to allow us can use hw->super.cap to identify RSS later.
-		 */
-		hw->super.cap &= ~NFP_NET_CFG_CTRL_RSS;
-	} else {
-		hw->meta_format = NFP_NET_METAFORMAT_SINGLE;
-	}
-}
-
 void
 nfp_net_cfg_read_version(struct nfp_net_hw *hw)
 {
diff --git a/drivers/net/nfp/nfp_net_common.h b/drivers/net/nfp/nfp_net_common.h
index 628c0d3491..49a5a84044 100644
--- a/drivers/net/nfp/nfp_net_common.h
+++ b/drivers/net/nfp/nfp_net_common.h
@@ -16,6 +16,7 @@
 #include "nfpcore/nfp_sync.h"
 #include "nfp_net_ctrl.h"
 #include "nfp_service.h"
+#include "nfp_net_meta.h"
 
 /* Interrupt definitions */
 #define NFP_NET_IRQ_LSC_IDX             0
@@ -67,11 +68,6 @@ enum nfp_app_fw_id {
 	NFP_APP_FW_FLOWER_NIC             = 0x3,
 };
 
-enum nfp_net_meta_format {
-	NFP_NET_METAFORMAT_SINGLE,
-	NFP_NET_METAFORMAT_CHAINED,
-};
-
 /* Parsed control BAR TLV capabilities */
 struct nfp_net_tlv_caps {
 	uint32_t mbox_off;               /**< VNIC mailbox area offset */
@@ -306,7 +302,6 @@ void nfp_net_tx_desc_limits(struct nfp_net_hw *hw,
 		uint16_t *min_tx_desc,
 		uint16_t *max_tx_desc);
 int nfp_net_check_dma_mask(struct nfp_net_hw *hw, char *name);
-void nfp_net_init_metadata_format(struct nfp_net_hw *hw);
 void nfp_net_cfg_read_version(struct nfp_net_hw *hw);
 int nfp_net_firmware_version_get(struct rte_eth_dev *dev, char *fw_version, size_t fw_size);
 bool nfp_net_is_valid_nfd_version(struct nfp_net_fw_ver version);
diff --git a/drivers/net/nfp/nfp_net_meta.c b/drivers/net/nfp/nfp_net_meta.c
new file mode 100644
index 0000000000..0bc22b2f88
--- /dev/null
+++ b/drivers/net/nfp/nfp_net_meta.c
@@ -0,0 +1,320 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2023 Corigine, Inc.
+ * All rights reserved.
+ */
+
+#include "nfp_net_meta.h"
+
+#include "nfp_net_common.h"
+#include "nfp_ipsec.h"
+#include "nfp_logs.h"
+
+enum nfp_ipsec_meta_layer {
+	NFP_IPSEC_META_SAIDX,       /**< Order of SA index in metadata */
+	NFP_IPSEC_META_SEQLOW,      /**< Order of Sequence Number (low 32bits) in metadata */
+	NFP_IPSEC_META_SEQHI,       /**< Order of Sequence Number (high 32bits) in metadata */
+};
+
+/* Parse the chained metadata from packet */
+static bool
+nfp_net_parse_chained_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
+{
+	uint32_t meta_info;
+	uint32_t vlan_info;
+	uint8_t *meta_offset;
+
+	meta_info = rte_be_to_cpu_32(meta_header);
+	meta_offset = meta_base + 4;
+
+	for (; meta_info != 0; meta_info >>= NFP_NET_META_FIELD_SIZE, meta_offset += 4) {
+		switch (meta_info & NFP_NET_META_FIELD_MASK) {
+		case NFP_NET_META_PORTID:
+			meta->port_id = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
+			break;
+		case NFP_NET_META_HASH:
+			/* Next field type is about the hash type */
+			meta_info >>= NFP_NET_META_FIELD_SIZE;
+			/* Hash value is in the data field */
+			meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
+			meta->hash_type = meta_info & NFP_NET_META_FIELD_MASK;
+			break;
+		case NFP_NET_META_VLAN:
+			vlan_info = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
+			meta->vlan[meta->vlan_layer].offload =
+					vlan_info >> NFP_NET_META_VLAN_OFFLOAD;
+			meta->vlan[meta->vlan_layer].tci =
+					vlan_info & NFP_NET_META_VLAN_MASK;
+			meta->vlan[meta->vlan_layer].tpid = NFP_NET_META_TPID(vlan_info);
+			meta->vlan_layer++;
+			break;
+		case NFP_NET_META_IPSEC:
+			meta->sa_idx = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
+			meta->ipsec_type = meta_info & NFP_NET_META_FIELD_MASK;
+			break;
+		case NFP_NET_META_MARK:
+			meta->flags |= (1 << NFP_NET_META_MARK);
+			meta->mark_id = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
+			break;
+		default:
+			/* Unsupported metadata can be a performance issue */
+			return false;
+		}
+	}
+
+	return true;
+}
+
+/*
+ * Parse the single metadata
+ *
+ * The RSS hash and hash-type are prepended to the packet data.
+ * Get it from metadata area.
+ */
+static inline void
+nfp_net_parse_single_meta(uint8_t *meta_base,
+		rte_be32_t meta_header,
+		struct nfp_meta_parsed *meta)
+{
+	meta->hash_type = rte_be_to_cpu_32(meta_header);
+	meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)(meta_base + 4));
+}
+
+/* Set mbuf hash data based on the metadata info */
+static void
+nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
+		struct nfp_net_rxq *rxq,
+		struct rte_mbuf *mbuf)
+{
+	struct nfp_net_hw *hw = rxq->hw;
+
+	if ((hw->super.ctrl & NFP_NET_CFG_CTRL_RSS_ANY) == 0)
+		return;
+
+	mbuf->hash.rss = meta->hash;
+	mbuf->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
+}
+
+/* Set mbuf vlan_strip data based on metadata info */
+static void
+nfp_net_parse_meta_vlan(const struct nfp_meta_parsed *meta,
+		struct nfp_net_rx_desc *rxd,
+		struct nfp_net_rxq *rxq,
+		struct rte_mbuf *mb)
+{
+	uint32_t ctrl = rxq->hw->super.ctrl;
+
+	/* Skip if hardware don't support setting vlan. */
+	if ((ctrl & (NFP_NET_CFG_CTRL_RXVLAN | NFP_NET_CFG_CTRL_RXVLAN_V2)) == 0)
+		return;
+
+	/*
+	 * The firmware support two ways to send the VLAN info (with priority) :
+	 * 1. Using the metadata when NFP_NET_CFG_CTRL_RXVLAN_V2 is set,
+	 * 2. Using the descriptor when NFP_NET_CFG_CTRL_RXVLAN is set.
+	 */
+	if ((ctrl & NFP_NET_CFG_CTRL_RXVLAN_V2) != 0) {
+		if (meta->vlan_layer > 0 && meta->vlan[0].offload != 0) {
+			mb->vlan_tci = rte_cpu_to_le_32(meta->vlan[0].tci);
+			mb->ol_flags |= RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_VLAN_STRIPPED;
+		}
+	} else if ((ctrl & NFP_NET_CFG_CTRL_RXVLAN) != 0) {
+		if ((rxd->rxd.flags & PCIE_DESC_RX_VLAN) != 0) {
+			mb->vlan_tci = rte_cpu_to_le_32(rxd->rxd.offload_info);
+			mb->ol_flags |= RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_VLAN_STRIPPED;
+		}
+	}
+}
+
+/*
+ * Set mbuf qinq_strip data based on metadata info
+ *
+ * The out VLAN tci are prepended to the packet data.
+ * Extract and decode it and set the mbuf fields.
+ *
+ * If both RTE_MBUF_F_RX_VLAN and NFP_NET_CFG_CTRL_RXQINQ are set, the 2 VLANs
+ *   have been stripped by the hardware and their TCIs are saved in
+ *   mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
+ * If NFP_NET_CFG_CTRL_RXQINQ is set and RTE_MBUF_F_RX_VLAN is unset, only the
+ *   outer VLAN is removed from packet data, but both tci are saved in
+ *   mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
+ *
+ * qinq set & vlan set : meta->vlan_layer>=2, meta->vlan[0].offload=1, meta->vlan[1].offload=1
+ * qinq set & vlan not set: meta->vlan_layer>=2, meta->vlan[1].offload=1,meta->vlan[0].offload=0
+ * qinq not set & vlan set: meta->vlan_layer=1, meta->vlan[0].offload=1
+ * qinq not set & vlan not set: meta->vlan_layer=0
+ */
+static void
+nfp_net_parse_meta_qinq(const struct nfp_meta_parsed *meta,
+		struct nfp_net_rxq *rxq,
+		struct rte_mbuf *mb)
+{
+	struct nfp_hw *hw = &rxq->hw->super;
+
+	if ((hw->ctrl & NFP_NET_CFG_CTRL_RXQINQ) == 0 ||
+			(hw->cap & NFP_NET_CFG_CTRL_RXQINQ) == 0)
+		return;
+
+	if (meta->vlan_layer < NFP_META_MAX_VLANS)
+		return;
+
+	if (meta->vlan[0].offload == 0)
+		mb->vlan_tci = rte_cpu_to_le_16(meta->vlan[0].tci);
+
+	mb->vlan_tci_outer = rte_cpu_to_le_16(meta->vlan[1].tci);
+	PMD_RX_LOG(DEBUG, "Received outer vlan TCI is %u inner vlan TCI is %u",
+			mb->vlan_tci_outer, mb->vlan_tci);
+	mb->ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
+}
+
+/*
+ * Set mbuf IPsec Offload features based on metadata info.
+ *
+ * The IPsec Offload features is prepended to the mbuf ol_flags.
+ * Extract and decode metadata info and set the mbuf ol_flags.
+ */
+static void
+nfp_net_parse_meta_ipsec(struct nfp_meta_parsed *meta,
+		struct nfp_net_rxq *rxq,
+		struct rte_mbuf *mbuf)
+{
+	int offset;
+	uint32_t sa_idx;
+	struct nfp_net_hw *hw;
+	struct nfp_tx_ipsec_desc_msg *desc_md;
+
+	hw = rxq->hw;
+	sa_idx = meta->sa_idx;
+
+	if (meta->ipsec_type != NFP_NET_META_IPSEC)
+		return;
+
+	if (sa_idx >= NFP_NET_IPSEC_MAX_SA_CNT) {
+		mbuf->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED;
+	} else {
+		mbuf->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD;
+		offset = hw->ipsec_data->pkt_dynfield_offset;
+		desc_md = RTE_MBUF_DYNFIELD(mbuf, offset, struct nfp_tx_ipsec_desc_msg *);
+		desc_md->sa_idx = sa_idx;
+		desc_md->enc = 0;
+	}
+}
+
+static void
+nfp_net_parse_meta_mark(const struct nfp_meta_parsed *meta,
+		struct rte_mbuf *mbuf)
+{
+	if (((meta->flags >> NFP_NET_META_MARK) & 0x1) == 0)
+		return;
+
+	mbuf->hash.fdir.hi = meta->mark_id;
+	mbuf->ol_flags |= RTE_MBUF_F_RX_FDIR | RTE_MBUF_F_RX_FDIR_ID;
+}
+
+/* Parse the metadata from packet */
+void
+nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
+		struct nfp_net_rxq *rxq,
+		struct nfp_net_hw *hw,
+		struct rte_mbuf *mb,
+		struct nfp_meta_parsed *meta)
+{
+	uint8_t *meta_base;
+	rte_be32_t meta_header;
+
+	if (unlikely(NFP_DESC_META_LEN(rxds) == 0))
+		return;
+
+	meta_base = rte_pktmbuf_mtod_offset(mb, uint8_t *, -NFP_DESC_META_LEN(rxds));
+	meta_header = *(rte_be32_t *)meta_base;
+
+	switch (hw->meta_format) {
+	case NFP_NET_METAFORMAT_CHAINED:
+		if (nfp_net_parse_chained_meta(meta_base, meta_header, meta)) {
+			nfp_net_parse_meta_hash(meta, rxq, mb);
+			nfp_net_parse_meta_vlan(meta, rxds, rxq, mb);
+			nfp_net_parse_meta_qinq(meta, rxq, mb);
+			nfp_net_parse_meta_ipsec(meta, rxq, mb);
+			nfp_net_parse_meta_mark(meta, mb);
+		} else {
+			PMD_RX_LOG(DEBUG, "RX chained metadata format is wrong!");
+		}
+		break;
+	case NFP_NET_METAFORMAT_SINGLE:
+		if ((rxds->rxd.flags & PCIE_DESC_RX_RSS) != 0) {
+			nfp_net_parse_single_meta(meta_base, meta_header, meta);
+			nfp_net_parse_meta_hash(meta, rxq, mb);
+		}
+		break;
+	default:
+		PMD_RX_LOG(DEBUG, "RX metadata do not exist.");
+	}
+}
+
+void
+nfp_net_init_metadata_format(struct nfp_net_hw *hw)
+{
+	/*
+	 * ABI 4.x and the ctrl vNIC always use chained metadata. Otherwise, single
+	 * metadata is allowed only when RSS(v1) is the sole hw capability; the
+	 * RSS(v2) capability also indicates chained metadata.
+	 */
+	if (hw->ver.major == 4) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHAINED;
+	} else if ((hw->super.cap & NFP_NET_CFG_CTRL_CHAIN_META) != 0) {
+		hw->meta_format = NFP_NET_METAFORMAT_CHAINED;
+		/*
+		 * RSS is incompatible with chained metadata. hw->super.cap just represents
+		 * the firmware's capability rather than its configuration. Clear the flag
+		 * here to reduce confusion, so hw->super.cap can identify RSS usage later.
+		 */
+		hw->super.cap &= ~NFP_NET_CFG_CTRL_RSS;
+	} else {
+		hw->meta_format = NFP_NET_METAFORMAT_SINGLE;
+	}
+}
+
+void
+nfp_net_set_meta_vlan(struct nfp_net_meta_raw *meta_data,
+		struct rte_mbuf *pkt,
+		uint8_t layer)
+{
+	uint16_t tpid;
+	uint16_t vlan_tci;
+
+	tpid = RTE_ETHER_TYPE_VLAN;
+	vlan_tci = pkt->vlan_tci;
+
+	meta_data->data[layer] = rte_cpu_to_be_32(tpid << 16 | vlan_tci);
+}
+
+void
+nfp_net_set_meta_ipsec(struct nfp_net_meta_raw *meta_data,
+		struct nfp_net_txq *txq,
+		struct rte_mbuf *pkt,
+		uint8_t layer,
+		uint8_t ipsec_layer)
+{
+	int offset;
+	struct nfp_net_hw *hw;
+	struct nfp_tx_ipsec_desc_msg *desc_md;
+
+	hw = txq->hw;
+	offset = hw->ipsec_data->pkt_dynfield_offset;
+	desc_md = RTE_MBUF_DYNFIELD(pkt, offset, struct nfp_tx_ipsec_desc_msg *);
+
+	switch (ipsec_layer) {
+	case NFP_IPSEC_META_SAIDX:
+		meta_data->data[layer] = desc_md->sa_idx;
+		break;
+	case NFP_IPSEC_META_SEQLOW:
+		meta_data->data[layer] = desc_md->esn.low;
+		break;
+	case NFP_IPSEC_META_SEQHI:
+		meta_data->data[layer] = desc_md->esn.hi;
+		break;
+	default:
+		break;
+	}
+}
diff --git a/drivers/net/nfp/nfp_net_meta.h b/drivers/net/nfp/nfp_net_meta.h
new file mode 100644
index 0000000000..da2091ce9f
--- /dev/null
+++ b/drivers/net/nfp/nfp_net_meta.h
@@ -0,0 +1,108 @@
+/* SPDX-License-Identifier: BSD-3-Clause
+ * Copyright (c) 2014, 2015 Netronome Systems, Inc.
+ * All rights reserved.
+ */
+
+#ifndef __NFP_NET_META_H__
+#define __NFP_NET_META_H__
+
+#include "nfp_rxtx.h"
+
+/* Hash type prepended when a RSS hash was computed */
+#define NFP_NET_RSS_NONE                0
+#define NFP_NET_RSS_IPV4                1
+#define NFP_NET_RSS_IPV6                2
+#define NFP_NET_RSS_IPV6_EX             3
+#define NFP_NET_RSS_IPV4_TCP            4
+#define NFP_NET_RSS_IPV6_TCP            5
+#define NFP_NET_RSS_IPV6_EX_TCP         6
+#define NFP_NET_RSS_IPV4_UDP            7
+#define NFP_NET_RSS_IPV6_UDP            8
+#define NFP_NET_RSS_IPV6_EX_UDP         9
+#define NFP_NET_RSS_IPV4_SCTP           10
+#define NFP_NET_RSS_IPV6_SCTP           11
+
+/* Offset in Freelist buffer where packet starts on RX */
+#define NFP_NET_RX_OFFSET               32
+
+/* Working with metadata api (NFD version > 3.0) */
+#define NFP_NET_META_FIELD_SIZE         4
+#define NFP_NET_META_FIELD_MASK ((1 << NFP_NET_META_FIELD_SIZE) - 1)
+#define NFP_NET_META_HEADER_SIZE        4
+#define NFP_NET_META_NFDK_LENGTH        8
+
+/* Working with metadata vlan api (NFD version >= 2.0) */
+#define NFP_NET_META_VLAN_INFO          16
+#define NFP_NET_META_VLAN_OFFLOAD       31
+#define NFP_NET_META_VLAN_TPID          3
+#define NFP_NET_META_VLAN_MASK          ((1 << NFP_NET_META_VLAN_INFO) - 1)
+#define NFP_NET_META_VLAN_TPID_MASK     ((1 << NFP_NET_META_VLAN_TPID) - 1)
+#define NFP_NET_META_TPID(d)            (((d) >> NFP_NET_META_VLAN_INFO) & \
+						NFP_NET_META_VLAN_TPID_MASK)
+
+/* Prepend field types */
+#define NFP_NET_META_HASH               1 /* Next field carries hash type */
+#define NFP_NET_META_MARK               2
+#define NFP_NET_META_VLAN               4
+#define NFP_NET_META_PORTID             5
+#define NFP_NET_META_IPSEC              9
+
+#define NFP_META_PORT_ID_CTRL           ~0U
+
+#define NFP_DESC_META_LEN(d) ((d)->rxd.meta_len_dd & PCIE_DESC_RX_META_LEN_MASK)
+
+/* Maximum number of NFP packet metadata fields. */
+#define NFP_META_MAX_FIELDS      8
+
+/* Maximum number of supported VLANs in parsed form packet metadata. */
+#define NFP_META_MAX_VLANS       2
+
+enum nfp_net_meta_format {
+	NFP_NET_METAFORMAT_SINGLE,
+	NFP_NET_METAFORMAT_CHAINED,
+};
+
+/* Describe the raw metadata format. */
+struct nfp_net_meta_raw {
+	uint32_t header; /**< Field type header (see format in nfp.rst) */
+	uint32_t data[NFP_META_MAX_FIELDS]; /**< Array of each fields data member */
+	uint8_t length; /**< Number of valid fields in @header */
+};
+
+/* Record metadata parsed from packet */
+struct nfp_meta_parsed {
+	uint32_t port_id;         /**< Port id value */
+	uint32_t sa_idx;          /**< IPsec SA index */
+	uint32_t hash;            /**< RSS hash value */
+	uint32_t mark_id;         /**< Mark id value */
+	uint16_t flags;           /**< Bitmap to indicate if meta exist */
+	uint8_t hash_type;        /**< RSS hash type */
+	uint8_t ipsec_type;       /**< IPsec type */
+	uint8_t vlan_layer;       /**< The valid number of value in @vlan[] */
+	/**
+	 * Holds information parsed from NFP_NET_META_VLAN.
+	 * The innermost VLAN starts at position 0.
+	 */
+	struct {
+		uint8_t offload;  /**< Flag indicates whether VLAN is offloaded */
+		uint8_t tpid;     /**< Vlan TPID */
+		uint16_t tci;     /**< Vlan TCI (PCP + Priority + VID) */
+	} vlan[NFP_META_MAX_VLANS];
+};
+
+void nfp_net_init_metadata_format(struct nfp_net_hw *hw);
+void nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
+		struct nfp_net_rxq *rxq,
+		struct nfp_net_hw *hw,
+		struct rte_mbuf *mb,
+		struct nfp_meta_parsed *meta);
+void nfp_net_set_meta_vlan(struct nfp_net_meta_raw *meta_data,
+		struct rte_mbuf *pkt,
+		uint8_t layer);
+void nfp_net_set_meta_ipsec(struct nfp_net_meta_raw *meta_data,
+		struct nfp_net_txq *txq,
+		struct rte_mbuf *pkt,
+		uint8_t layer,
+		uint8_t ipsec_layer);
+
+#endif /* __NFP_NET_META_H__ */
diff --git a/drivers/net/nfp/nfp_rxtx.c b/drivers/net/nfp/nfp_rxtx.c
index cbcf57d769..0256eba456 100644
--- a/drivers/net/nfp/nfp_rxtx.c
+++ b/drivers/net/nfp/nfp_rxtx.c
@@ -16,30 +16,7 @@
 
 #include "nfp_ipsec.h"
 #include "nfp_logs.h"
-
-/* Maximum number of supported VLANs in parsed form packet metadata. */
-#define NFP_META_MAX_VLANS       2
-
-/* Record metadata parsed from packet */
-struct nfp_meta_parsed {
-	uint32_t port_id;         /**< Port id value */
-	uint32_t sa_idx;          /**< IPsec SA index */
-	uint32_t hash;            /**< RSS hash value */
-	uint32_t mark_id;         /**< Mark id value */
-	uint16_t flags;           /**< Bitmap to indicate if meta exist */
-	uint8_t hash_type;        /**< RSS hash type */
-	uint8_t ipsec_type;       /**< IPsec type */
-	uint8_t vlan_layer;       /**< The valid number of value in @vlan[] */
-	/**
-	 * Holds information parses from NFP_NET_META_VLAN.
-	 * The inner most vlan starts at position 0
-	 */
-	struct {
-		uint8_t offload;  /**< Flag indicates whether VLAN is offloaded */
-		uint8_t tpid;     /**< Vlan TPID */
-		uint16_t tci;     /**< Vlan TCI (PCP + Priority + VID) */
-	} vlan[NFP_META_MAX_VLANS];
-};
+#include "nfp_net_meta.h"
 
 /*
  * The bit format and map of nfp packet type for rxd.offload_info in Rx descriptor.
@@ -254,242 +231,6 @@ nfp_net_rx_queue_count(void *rx_queue)
 	return count;
 }
 
-/* Parse the chained metadata from packet */
-static bool
-nfp_net_parse_chained_meta(uint8_t *meta_base,
-		rte_be32_t meta_header,
-		struct nfp_meta_parsed *meta)
-{
-	uint32_t meta_info;
-	uint32_t vlan_info;
-	uint8_t *meta_offset;
-
-	meta_info = rte_be_to_cpu_32(meta_header);
-	meta_offset = meta_base + 4;
-
-	for (; meta_info != 0; meta_info >>= NFP_NET_META_FIELD_SIZE, meta_offset += 4) {
-		switch (meta_info & NFP_NET_META_FIELD_MASK) {
-		case NFP_NET_META_PORTID:
-			meta->port_id = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-			break;
-		case NFP_NET_META_HASH:
-			/* Next field type is about the hash type */
-			meta_info >>= NFP_NET_META_FIELD_SIZE;
-			/* Hash value is in the data field */
-			meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-			meta->hash_type = meta_info & NFP_NET_META_FIELD_MASK;
-			break;
-		case NFP_NET_META_VLAN:
-			vlan_info = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-			meta->vlan[meta->vlan_layer].offload =
-					vlan_info >> NFP_NET_META_VLAN_OFFLOAD;
-			meta->vlan[meta->vlan_layer].tci =
-					vlan_info & NFP_NET_META_VLAN_MASK;
-			meta->vlan[meta->vlan_layer].tpid = NFP_NET_META_TPID(vlan_info);
-			meta->vlan_layer++;
-			break;
-		case NFP_NET_META_IPSEC:
-			meta->sa_idx = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-			meta->ipsec_type = meta_info & NFP_NET_META_FIELD_MASK;
-			break;
-		case NFP_NET_META_MARK:
-			meta->flags |= (1 << NFP_NET_META_MARK);
-			meta->mark_id = rte_be_to_cpu_32(*(rte_be32_t *)meta_offset);
-			break;
-		default:
-			/* Unsupported metadata can be a performance issue */
-			return false;
-		}
-	}
-
-	return true;
-}
-
-/* Set mbuf hash data based on the metadata info */
-static void
-nfp_net_parse_meta_hash(const struct nfp_meta_parsed *meta,
-		struct nfp_net_rxq *rxq,
-		struct rte_mbuf *mbuf)
-{
-	struct nfp_net_hw *hw = rxq->hw;
-
-	if ((hw->super.ctrl & NFP_NET_CFG_CTRL_RSS_ANY) == 0)
-		return;
-
-	mbuf->hash.rss = meta->hash;
-	mbuf->ol_flags |= RTE_MBUF_F_RX_RSS_HASH;
-}
-
-/*
- * Parse the single metadata
- *
- * The RSS hash and hash-type are prepended to the packet data.
- * Get it from metadata area.
- */
-static inline void
-nfp_net_parse_single_meta(uint8_t *meta_base,
-		rte_be32_t meta_header,
-		struct nfp_meta_parsed *meta)
-{
-	meta->hash_type = rte_be_to_cpu_32(meta_header);
-	meta->hash = rte_be_to_cpu_32(*(rte_be32_t *)(meta_base + 4));
-}
-
-/* Set mbuf vlan_strip data based on metadata info */
-static void
-nfp_net_parse_meta_vlan(const struct nfp_meta_parsed *meta,
-		struct nfp_net_rx_desc *rxd,
-		struct nfp_net_rxq *rxq,
-		struct rte_mbuf *mb)
-{
-	uint32_t ctrl = rxq->hw->super.ctrl;
-
-	/* Skip if hardware don't support setting vlan. */
-	if ((ctrl & (NFP_NET_CFG_CTRL_RXVLAN | NFP_NET_CFG_CTRL_RXVLAN_V2)) == 0)
-		return;
-
-	/*
-	 * The firmware support two ways to send the VLAN info (with priority) :
-	 * 1. Using the metadata when NFP_NET_CFG_CTRL_RXVLAN_V2 is set,
-	 * 2. Using the descriptor when NFP_NET_CFG_CTRL_RXVLAN is set.
-	 */
-	if ((ctrl & NFP_NET_CFG_CTRL_RXVLAN_V2) != 0) {
-		if (meta->vlan_layer > 0 && meta->vlan[0].offload != 0) {
-			mb->vlan_tci = rte_cpu_to_le_32(meta->vlan[0].tci);
-			mb->ol_flags |= RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_VLAN_STRIPPED;
-		}
-	} else if ((ctrl & NFP_NET_CFG_CTRL_RXVLAN) != 0) {
-		if ((rxd->rxd.flags & PCIE_DESC_RX_VLAN) != 0) {
-			mb->vlan_tci = rte_cpu_to_le_32(rxd->rxd.offload_info);
-			mb->ol_flags |= RTE_MBUF_F_RX_VLAN | RTE_MBUF_F_RX_VLAN_STRIPPED;
-		}
-	}
-}
-
-/*
- * Set mbuf qinq_strip data based on metadata info
- *
- * The out VLAN tci are prepended to the packet data.
- * Extract and decode it and set the mbuf fields.
- *
- * If both RTE_MBUF_F_RX_VLAN and NFP_NET_CFG_CTRL_RXQINQ are set, the 2 VLANs
- *   have been stripped by the hardware and their TCIs are saved in
- *   mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
- * If NFP_NET_CFG_CTRL_RXQINQ is set and RTE_MBUF_F_RX_VLAN is unset, only the
- *   outer VLAN is removed from packet data, but both tci are saved in
- *   mbuf->vlan_tci (inner) and mbuf->vlan_tci_outer (outer).
- *
- * qinq set & vlan set : meta->vlan_layer>=2, meta->vlan[0].offload=1, meta->vlan[1].offload=1
- * qinq set & vlan not set: meta->vlan_layer>=2, meta->vlan[1].offload=1,meta->vlan[0].offload=0
- * qinq not set & vlan set: meta->vlan_layer=1, meta->vlan[0].offload=1
- * qinq not set & vlan not set: meta->vlan_layer=0
- */
-static void
-nfp_net_parse_meta_qinq(const struct nfp_meta_parsed *meta,
-		struct nfp_net_rxq *rxq,
-		struct rte_mbuf *mb)
-{
-	struct nfp_hw *hw = &rxq->hw->super;
-
-	if ((hw->ctrl & NFP_NET_CFG_CTRL_RXQINQ) == 0)
-		return;
-
-	if (meta->vlan_layer < NFP_META_MAX_VLANS)
-		return;
-
-	if (meta->vlan[0].offload == 0)
-		mb->vlan_tci = rte_cpu_to_le_16(meta->vlan[0].tci);
-
-	mb->vlan_tci_outer = rte_cpu_to_le_16(meta->vlan[1].tci);
-	PMD_RX_LOG(DEBUG, "Received outer vlan TCI is %u inner vlan TCI is %u",
-			mb->vlan_tci_outer, mb->vlan_tci);
-	mb->ol_flags |= RTE_MBUF_F_RX_QINQ | RTE_MBUF_F_RX_QINQ_STRIPPED;
-}
-
-/*
- * Set mbuf IPsec Offload features based on metadata info.
- *
- * The IPsec Offload features is prepended to the mbuf ol_flags.
- * Extract and decode metadata info and set the mbuf ol_flags.
- */
-static void
-nfp_net_parse_meta_ipsec(struct nfp_meta_parsed *meta,
-		struct nfp_net_rxq *rxq,
-		struct rte_mbuf *mbuf)
-{
-	int offset;
-	uint32_t sa_idx;
-	struct nfp_net_hw *hw;
-	struct nfp_tx_ipsec_desc_msg *desc_md;
-
-	hw = rxq->hw;
-	sa_idx = meta->sa_idx;
-
-	if (meta->ipsec_type != NFP_NET_META_IPSEC)
-		return;
-
-	if (sa_idx >= NFP_NET_IPSEC_MAX_SA_CNT) {
-		mbuf->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD_FAILED;
-	} else {
-		mbuf->ol_flags |= RTE_MBUF_F_RX_SEC_OFFLOAD;
-		offset = hw->ipsec_data->pkt_dynfield_offset;
-		desc_md = RTE_MBUF_DYNFIELD(mbuf, offset, struct nfp_tx_ipsec_desc_msg *);
-		desc_md->sa_idx = sa_idx;
-		desc_md->enc = 0;
-	}
-}
-
-static void
-nfp_net_parse_meta_mark(const struct nfp_meta_parsed *meta,
-		struct rte_mbuf *mbuf)
-{
-	if (((meta->flags >> NFP_NET_META_MARK) & 0x1) == 0)
-		return;
-
-	mbuf->hash.fdir.hi = meta->mark_id;
-	mbuf->ol_flags |= RTE_MBUF_F_RX_FDIR | RTE_MBUF_F_RX_FDIR_ID;
-}
-
-/* Parse the metadata from packet */
-static void
-nfp_net_parse_meta(struct nfp_net_rx_desc *rxds,
-		struct nfp_net_rxq *rxq,
-		struct nfp_net_hw *hw,
-		struct rte_mbuf *mb,
-		struct nfp_meta_parsed *meta)
-{
-	uint8_t *meta_base;
-	rte_be32_t meta_header;
-
-	if (unlikely(NFP_DESC_META_LEN(rxds) == 0))
-		return;
-
-	meta_base = rte_pktmbuf_mtod_offset(mb, uint8_t *, -NFP_DESC_META_LEN(rxds));
-	meta_header = *(rte_be32_t *)meta_base;
-
-	switch (hw->meta_format) {
-	case NFP_NET_METAFORMAT_CHAINED:
-		if (nfp_net_parse_chained_meta(meta_base, meta_header, meta)) {
-			nfp_net_parse_meta_hash(meta, rxq, mb);
-			nfp_net_parse_meta_vlan(meta, rxds, rxq, mb);
-			nfp_net_parse_meta_qinq(meta, rxq, mb);
-			nfp_net_parse_meta_ipsec(meta, rxq, mb);
-			nfp_net_parse_meta_mark(meta, mb);
-		} else {
-			PMD_RX_LOG(DEBUG, "RX chained metadata format is wrong!");
-		}
-		break;
-	case NFP_NET_METAFORMAT_SINGLE:
-		if ((rxds->rxd.flags & PCIE_DESC_RX_RSS) != 0) {
-			nfp_net_parse_single_meta(meta_base, meta_header, meta);
-			nfp_net_parse_meta_hash(meta, rxq, mb);
-		}
-		break;
-	default:
-		PMD_RX_LOG(DEBUG, "RX metadata do not exist.");
-	}
-}
-
 /**
  * Set packet type to mbuf based on parsed structure.
  *
@@ -1038,50 +779,6 @@ nfp_net_reset_tx_queue(struct nfp_net_txq *txq)
 	txq->rd_p = 0;
 }
 
-void
-nfp_net_set_meta_vlan(struct nfp_net_meta_raw *meta_data,
-		struct rte_mbuf *pkt,
-		uint8_t layer)
-{
-	uint16_t tpid;
-	uint16_t vlan_tci;
-
-	tpid = RTE_ETHER_TYPE_VLAN;
-	vlan_tci = pkt->vlan_tci;
-
-	meta_data->data[layer] = rte_cpu_to_be_32(tpid << 16 | vlan_tci);
-}
-
-void
-nfp_net_set_meta_ipsec(struct nfp_net_meta_raw *meta_data,
-		struct nfp_net_txq *txq,
-		struct rte_mbuf *pkt,
-		uint8_t layer,
-		uint8_t ipsec_layer)
-{
-	int offset;
-	struct nfp_net_hw *hw;
-	struct nfp_tx_ipsec_desc_msg *desc_md;
-
-	hw = txq->hw;
-	offset = hw->ipsec_data->pkt_dynfield_offset;
-	desc_md = RTE_MBUF_DYNFIELD(pkt, offset, struct nfp_tx_ipsec_desc_msg *);
-
-	switch (ipsec_layer) {
-	case NFP_IPSEC_META_SAIDX:
-		meta_data->data[layer] = desc_md->sa_idx;
-		break;
-	case NFP_IPSEC_META_SEQLOW:
-		meta_data->data[layer] = desc_md->esn.low;
-		break;
-	case NFP_IPSEC_META_SEQHI:
-		meta_data->data[layer] = desc_md->esn.hi;
-		break;
-	default:
-		break;
-	}
-}
-
 int
 nfp_net_tx_queue_setup(struct rte_eth_dev *dev,
 		uint16_t queue_idx,
diff --git a/drivers/net/nfp/nfp_rxtx.h b/drivers/net/nfp/nfp_rxtx.h
index 5695a31636..6ecabc232c 100644
--- a/drivers/net/nfp/nfp_rxtx.h
+++ b/drivers/net/nfp/nfp_rxtx.h
@@ -8,18 +8,6 @@
 
 #include <ethdev_driver.h>
 
-#define NFP_DESC_META_LEN(d) ((d)->rxd.meta_len_dd & PCIE_DESC_RX_META_LEN_MASK)
-
-/* Maximum number of NFP packet metadata fields. */
-#define NFP_META_MAX_FIELDS      8
-
-/* Describe the raw metadata format. */
-struct nfp_net_meta_raw {
-	uint32_t header; /**< Field type header (see format in nfp.rst) */
-	uint32_t data[NFP_META_MAX_FIELDS]; /**< Array of each fields data member */
-	uint8_t length; /**< Number of valid fields in @header */
-};
-
 /* Descriptor alignment */
 #define NFP_ALIGN_RING_DESC 128
 
@@ -238,13 +226,5 @@ int nfp_net_tx_queue_setup(struct rte_eth_dev *dev,
 		unsigned int socket_id,
 		const struct rte_eth_txconf *tx_conf);
 uint32_t nfp_net_tx_free_bufs(struct nfp_net_txq *txq);
-void nfp_net_set_meta_vlan(struct nfp_net_meta_raw *meta_data,
-		struct rte_mbuf *pkt,
-		uint8_t layer);
-void nfp_net_set_meta_ipsec(struct nfp_net_meta_raw *meta_data,
-		struct nfp_net_txq *txq,
-		struct rte_mbuf *pkt,
-		uint8_t layer,
-		uint8_t ipsec_layer);
 
 #endif /* __NFP_RXTX_H__ */
-- 
2.39.1


^ permalink raw reply	[relevance 4%]

* [PATCH v7 27/39] mempool: use C11 alignas
  @ 2024-03-04 17:52  3%   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-03-04 17:52 UTC (permalink / raw)
  To: dev
  Cc: Andrew Rybchenko, Bruce Richardson, Chengwen Feng,
	Cristian Dumitrescu, David Christensen, David Hunt, Ferruh Yigit,
	Honnappa Nagarahalli, Jasvinder Singh, Jerin Jacob, Kevin Laatz,
	Konstantin Ananyev, Min Zhou, Ruifeng Wang, Sameh Gobriel,
	Stanislaw Kardach, Thomas Monjalon, Vladimir Medvedkin,
	Yipeng Wang, Tyler Retzlaff

The current location used for __rte_aligned(a) for alignment of types
and variables is not compatible with MSVC. There is only a single
location accepted by both toolchains.

For variables standard C11 offers alignas(a) supported by conformant
compilers i.e. both MSVC and GCC.

For types, the standard offers no alignment facility that compatibly
interoperates with C and C++, but the same effect may be achieved by
relocating the placement of __rte_aligned(a) to the aforementioned
location accepted by all currently supported toolchains.

To allow alignment for both compilers do the following:

* Move __rte_aligned from the end of {struct,union} definitions to
  be between {struct,union} and tag.

  The placement between {struct,union} and the tag allows the desired
  alignment to be imparted on the type with all currently supported
  toolchains (GCC, LLVM, MSVC), building both C and C++.

* Replace use of __rte_aligned(a) on variables/fields with alignas(a).

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
---
 lib/mempool/rte_mempool.h | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 6fa4d48..23fd5c8 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -34,6 +34,7 @@
  * user cache created with rte_mempool_cache_create().
  */
 
+#include <stdalign.h>
 #include <stdio.h>
 #include <stdint.h>
 #include <inttypes.h>
@@ -66,7 +67,7 @@
  * captured since they can be calculated from other stats.
  * For example: put_cache_objs = put_objs - put_common_pool_objs.
  */
-struct rte_mempool_debug_stats {
+struct __rte_cache_aligned rte_mempool_debug_stats {
 	uint64_t put_bulk;             /**< Number of puts. */
 	uint64_t put_objs;             /**< Number of objects successfully put. */
 	uint64_t put_common_pool_bulk; /**< Number of bulks enqueued in common pool. */
@@ -80,13 +81,13 @@ struct rte_mempool_debug_stats {
 	uint64_t get_success_blks;     /**< Successful allocation number of contiguous blocks. */
 	uint64_t get_fail_blks;        /**< Failed allocation number of contiguous blocks. */
 	RTE_CACHE_GUARD;
-} __rte_cache_aligned;
+};
 #endif
 
 /**
  * A structure that stores a per-core object cache.
  */
-struct rte_mempool_cache {
+struct __rte_cache_aligned rte_mempool_cache {
 	uint32_t size;	      /**< Size of the cache */
 	uint32_t flushthresh; /**< Threshold before we flush excess elements */
 	uint32_t len;	      /**< Current cache count */
@@ -109,8 +110,8 @@ struct rte_mempool_cache {
 	 * Cache is allocated to this size to allow it to overflow in certain
 	 * cases to avoid needless emptying of cache.
 	 */
-	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
-} __rte_cache_aligned;
+	alignas(RTE_CACHE_LINE_SIZE) void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2];
+};
 
 /**
  * A structure that stores the size of mempool elements.
@@ -218,15 +219,15 @@ struct rte_mempool_memhdr {
  * The structure is cache-line aligned to avoid ABI breakages in
  * a number of cases when something small is added.
  */
-struct rte_mempool_info {
+struct __rte_cache_aligned rte_mempool_info {
 	/** Number of objects in the contiguous block */
 	unsigned int contig_block_size;
-} __rte_cache_aligned;
+};
 
 /**
  * The RTE mempool structure.
  */
-struct rte_mempool {
+struct __rte_cache_aligned rte_mempool {
 	char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
 	union {
 		void *pool_data;         /**< Ring or pool to store objects. */
@@ -268,7 +269,7 @@ struct rte_mempool {
 	 */
 	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE + 1];
 #endif
-}  __rte_cache_aligned;
+};
 
 /** Spreading among memory channels not required. */
 #define RTE_MEMPOOL_F_NO_SPREAD		0x0001
@@ -688,7 +689,7 @@ typedef int (*rte_mempool_get_info_t)(const struct rte_mempool *mp,
 
 
 /** Structure defining mempool operations structure */
-struct rte_mempool_ops {
+struct __rte_cache_aligned rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
 	rte_mempool_alloc_t alloc;       /**< Allocate private data. */
 	rte_mempool_free_t free;         /**< Free the external pool. */
@@ -713,7 +714,7 @@ struct rte_mempool_ops {
 	 * Dequeue a number of contiguous object blocks.
 	 */
 	rte_mempool_dequeue_contig_blocks_t dequeue_contig_blocks;
-} __rte_cache_aligned;
+};
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
 
@@ -726,14 +727,14 @@ struct rte_mempool_ops {
  * any function pointers stored directly in the mempool struct would not be.
  * This results in us simply having "ops_index" in the mempool struct.
  */
-struct rte_mempool_ops_table {
+struct __rte_cache_aligned rte_mempool_ops_table {
 	rte_spinlock_t sl;     /**< Spinlock for add/delete. */
 	uint32_t num_ops;      /**< Number of used ops structs in the table. */
 	/**
 	 * Storage for all possible ops structs.
 	 */
 	struct rte_mempool_ops ops[RTE_MEMPOOL_MAX_OPS_IDX];
-} __rte_cache_aligned;
+};
 
 /** Array of registered ops structs. */
 extern struct rte_mempool_ops_table rte_mempool_ops_table;
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* RE: [PATCH 0/7] add Nitrox compress device support
  2024-03-02  9:38  3% Nagadheeraj Rottela
@ 2024-03-04  7:14  0% ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2024-03-04  7:14 UTC (permalink / raw)
  To: Nagadheeraj Rottela, fanzhang.oss, Ashish Gupta; +Cc: dev, Nagadheeraj Rottela

> Subject: [PATCH 0/7] add Nitrox compress device support
> 
> Add the Nitrox PMD to support Nitrox compress device.
> ---
> v5:
> * Added missing entry for nitrox folder in compress meson.json
> 
> v4:
> * Fixed checkpatch warnings.
> * Updated release notes.
> 
> v3:
> * Fixed ABI compatibility issue.
> 
> v2:
> * Reformatted patches to minimize number of changes.
> * Removed empty file with only copyright.
> * Updated all feature flags in nitrox.ini file.
> * Added separate gotos in nitrox_pci_probe() function.
> 
> Nagadheeraj Rottela (7):
>   crypto/nitrox: move common code
>   drivers/compress: add Nitrox driver
>   common/nitrox: add compress hardware queue management
>   crypto/nitrox: set queue type during queue pair setup
>   compress/nitrox: add software queue management
>   compress/nitrox: support stateless request
>   compress/nitrox: support stateful request
> 
>  MAINTAINERS                                   |    8 +
>  doc/guides/compressdevs/features/nitrox.ini   |   17 +
>  doc/guides/compressdevs/index.rst             |    1 +
>  doc/guides/compressdevs/nitrox.rst            |   50 +
>  doc/guides/rel_notes/release_24_03.rst        |    3 +
>  drivers/common/nitrox/meson.build             |   19 +
>  .../{crypto => common}/nitrox/nitrox_csr.h    |   12 +
>  .../{crypto => common}/nitrox/nitrox_device.c |   51 +-
>  .../{crypto => common}/nitrox/nitrox_device.h |    4 +-
>  .../{crypto => common}/nitrox/nitrox_hal.c    |  116 ++
>  .../{crypto => common}/nitrox/nitrox_hal.h    |  115 ++
>  .../{crypto => common}/nitrox/nitrox_logs.c   |    0
>  .../{crypto => common}/nitrox/nitrox_logs.h   |    0
>  drivers/{crypto => common}/nitrox/nitrox_qp.c |   56 +-
>  drivers/{crypto => common}/nitrox/nitrox_qp.h |   60 +-
>  drivers/common/nitrox/version.map             |    9 +
>  drivers/compress/meson.build                  |    1 +
>  drivers/compress/nitrox/meson.build           |   16 +
>  drivers/compress/nitrox/nitrox_comp.c         |  604 +++++++++
>  drivers/compress/nitrox/nitrox_comp.h         |   35 +
>  drivers/compress/nitrox/nitrox_comp_reqmgr.c  | 1194 +++++++++++++++++
>  drivers/compress/nitrox/nitrox_comp_reqmgr.h  |   58 +
>  drivers/crypto/nitrox/meson.build             |   11 +-
>  drivers/crypto/nitrox/nitrox_sym.c            |    1 +
>  drivers/meson.build                           |    1 +
>  25 files changed, 2412 insertions(+), 30 deletions(-)
>  create mode 100644 doc/guides/compressdevs/features/nitrox.ini
>  create mode 100644 doc/guides/compressdevs/nitrox.rst
>  create mode 100644 drivers/common/nitrox/meson.build
>  rename drivers/{crypto => common}/nitrox/nitrox_csr.h (67%)
>  rename drivers/{crypto => common}/nitrox/nitrox_device.c (77%)
>  rename drivers/{crypto => common}/nitrox/nitrox_device.h (81%)
>  rename drivers/{crypto => common}/nitrox/nitrox_hal.c (65%)
>  rename drivers/{crypto => common}/nitrox/nitrox_hal.h (59%)
>  rename drivers/{crypto => common}/nitrox/nitrox_logs.c (100%)
>  rename drivers/{crypto => common}/nitrox/nitrox_logs.h (100%)
>  rename drivers/{crypto => common}/nitrox/nitrox_qp.c (67%)
>  rename drivers/{crypto => common}/nitrox/nitrox_qp.h (55%)
>  create mode 100644 drivers/common/nitrox/version.map
>  create mode 100644 drivers/compress/nitrox/meson.build
>  create mode 100644 drivers/compress/nitrox/nitrox_comp.c
>  create mode 100644 drivers/compress/nitrox/nitrox_comp.h
>  create mode 100644 drivers/compress/nitrox/nitrox_comp_reqmgr.c
>  create mode 100644 drivers/compress/nitrox/nitrox_comp_reqmgr.h
> 
Applied to dpdk-next-crypto.

Reworked and moved release notes changes to last patch.

^ permalink raw reply	[relevance 0%]

* [RFC 0/7] Improve EAL bit operations API
@ 2024-03-02 13:53  2% Mattias Rönnblom
  0 siblings, 0 replies; 200+ results
From: Mattias Rönnblom @ 2024-03-02 13:53 UTC (permalink / raw)
  To: dev; +Cc: hofors, Heng Wang, Mattias Rönnblom

This patch set represent an attempt to improve and extend the RTE
bitops API, in particular for functions that operate on individual
bits.

RFCv1 is submitted primarily to 1) receive general feedback on whether
improvements in this area are worth working on, and 2) receive feedback
on the API.

No test cases are included in v1 and the various functions may well
not do what they are intended to.

The legacy <rte_bitops.h> rte_bit_relaxed_*() family of functions is
replaced with three families:

rte_bit_[test|set|clear|assign][32|64]() which provides no memory
ordering or atomicity guarantees and no read-once or write-once
semantics (e.g., no use of volatile), but does provide the best
performance. The performance degradation resulting from the use of
volatile (e.g., forcing loads and stores to actually occur and in the
number specified) and atomics (e.g., LOCK instructions on x86) may be
significant.

rte_bit_once_*() which guarantees that program-level loads and stores
actually occur (i.e., prevents certain compiler optimizations). The
primary use of these functions is in the context of memory-mapped
I/O. Feedback on the details (semantics, naming) here would be greatly
appreciated, since the author is not much of a driver developer.

rte_bit_atomic_*() which provides atomic bit-level operations,
including the possibility of specifying memory ordering constraints
(or the lack thereof).

The atomic functions take non-_Atomic pointers, to be flexible, just
like the GCC builtins and default <rte_stdatomic.h>. The issue with
_Atomic APIs is that it may well be the case that the user wants to
perform both non-atomic and atomic operations on the same word.

Having _Atomic-marked addresses would complicate supporting atomic
bit-level operations in the proposed bitset API (and potentially other
APIs depending on RTE bitops for atomic bit-level ops). Either one
needs two bitset variants, one _Atomic bitset and one non-atomic one,
or the bitset code needs to cast the non-_Atomic pointer to an _Atomic
one. Having a separate _Atomic bitset would be bloat and would also
prevent the user from, in some situations, doing atomic operations
against a bit set while, in other situations (e.g., when MT safety is
not a concern), operating on the same words in a non-atomic manner.
That said, all this is still unclear to the author and much depends on
the future path of DPDK atomics.

Unlike rte_bit_relaxed_*(), individual bits are represented by bool,
not uint32_t or uint64_t. The author found the use of such large types
confusing, and also failed to see any performance benefits.

A set of functions, rte_bit_*_assign*(), is added to assign a
particular boolean value to a particular bit.

All functions have properly documented semantics.

All functions are available in uint32_t and uint64_t variants.

In addition, for every function there is a generic selection variant
which operates on both 32-bit and 64-bit words (depending on the
pointer type). The use of C11 generic selection is the first in the
DPDK code base.

_Generic allows the user code to be a little more compact. Having a
generic atomic test/set/clear/assign bit API also seems consistent
with the "core" (word-size) atomics API, which is generic (both the
GCC builtins and <rte_stdatomic.h> are).

The _Generic versions may also avoid the need for explicit unsigned
long versions of all functions. If you have an unsigned long, it's
safe to use the generic version (e.g., rte_bit_set()) and _Generic
will pick the right function, provided long is either 32 or 64 bits on
your platform (which it is on all DPDK-supported ABIs).

The generic rte_bit_set() is a macro, not a function, but has
nevertheless been given a lower-case name. That's how C11 does it (for
atomics and other _Generic-based APIs), and how <rte_stdatomic.h> does
it. Its address can't be taken, but it does not evaluate its
parameters more than once.

Things that are left out of this patch set, that may be included
in future versions:

 * Make all functions that return a bit number use the same return
   type (i.e., unsigned int).
 * Harmonize naming of some GCC builtin wrappers (i.e., rte_fls_u32()).
 * Add __builtin_ffsll()/ffs() wrapper and potentially other wrappers
   for useful/used bit-level GCC builtins.
 * Eliminate the MSVC #ifdef-induced documentation duplication.
 * _Generic versions of things like rte_popcount32(). (?)

ABI-breaking patches should probably go into a separate patch set (?).

Mattias Rönnblom (7):
  eal: extend bit manipulation functions
  eal: add generic bit manipulation macros
  eal: add bit manipulation functions which read or write once
  eal: add generic once-type bit operations macros
  eal: add atomic bit operations
  eal: add generic atomic bit operations
  eal: deprecate relaxed family of bit operations

 lib/eal/include/rte_bitops.h | 1115 +++++++++++++++++++++++++++++++++-
 1 file changed, 1113 insertions(+), 2 deletions(-)

-- 
2.34.1


^ permalink raw reply	[relevance 2%]

* [PATCH 0/7] add Nitrox compress device support
@ 2024-03-02  9:38  3% Nagadheeraj Rottela
  2024-03-04  7:14  0% ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Nagadheeraj Rottela @ 2024-03-02  9:38 UTC (permalink / raw)
  To: gakhil, fanzhang.oss, ashishg; +Cc: dev, Nagadheeraj Rottela

Add the Nitrox PMD to support Nitrox compress device.
---
v5:
* Added missing entry for nitrox folder in compress meson.json

v4:
* Fixed checkpatch warnings.
* Updated release notes.

v3:
* Fixed ABI compatibility issue.

v2:
* Reformatted patches to minimize number of changes.
* Removed empty file with only copyright.
* Updated all feature flags in nitrox.ini file.
* Added separate gotos in nitrox_pci_probe() function.

Nagadheeraj Rottela (7):
  crypto/nitrox: move common code
  drivers/compress: add Nitrox driver
  common/nitrox: add compress hardware queue management
  crypto/nitrox: set queue type during queue pair setup
  compress/nitrox: add software queue management
  compress/nitrox: support stateless request
  compress/nitrox: support stateful request

 MAINTAINERS                                   |    8 +
 doc/guides/compressdevs/features/nitrox.ini   |   17 +
 doc/guides/compressdevs/index.rst             |    1 +
 doc/guides/compressdevs/nitrox.rst            |   50 +
 doc/guides/rel_notes/release_24_03.rst        |    3 +
 drivers/common/nitrox/meson.build             |   19 +
 .../{crypto => common}/nitrox/nitrox_csr.h    |   12 +
 .../{crypto => common}/nitrox/nitrox_device.c |   51 +-
 .../{crypto => common}/nitrox/nitrox_device.h |    4 +-
 .../{crypto => common}/nitrox/nitrox_hal.c    |  116 ++
 .../{crypto => common}/nitrox/nitrox_hal.h    |  115 ++
 .../{crypto => common}/nitrox/nitrox_logs.c   |    0
 .../{crypto => common}/nitrox/nitrox_logs.h   |    0
 drivers/{crypto => common}/nitrox/nitrox_qp.c |   56 +-
 drivers/{crypto => common}/nitrox/nitrox_qp.h |   60 +-
 drivers/common/nitrox/version.map             |    9 +
 drivers/compress/meson.build                  |    1 +
 drivers/compress/nitrox/meson.build           |   16 +
 drivers/compress/nitrox/nitrox_comp.c         |  604 +++++++++
 drivers/compress/nitrox/nitrox_comp.h         |   35 +
 drivers/compress/nitrox/nitrox_comp_reqmgr.c  | 1194 +++++++++++++++++
 drivers/compress/nitrox/nitrox_comp_reqmgr.h  |   58 +
 drivers/crypto/nitrox/meson.build             |   11 +-
 drivers/crypto/nitrox/nitrox_sym.c            |    1 +
 drivers/meson.build                           |    1 +
 25 files changed, 2412 insertions(+), 30 deletions(-)
 create mode 100644 doc/guides/compressdevs/features/nitrox.ini
 create mode 100644 doc/guides/compressdevs/nitrox.rst
 create mode 100644 drivers/common/nitrox/meson.build
 rename drivers/{crypto => common}/nitrox/nitrox_csr.h (67%)
 rename drivers/{crypto => common}/nitrox/nitrox_device.c (77%)
 rename drivers/{crypto => common}/nitrox/nitrox_device.h (81%)
 rename drivers/{crypto => common}/nitrox/nitrox_hal.c (65%)
 rename drivers/{crypto => common}/nitrox/nitrox_hal.h (59%)
 rename drivers/{crypto => common}/nitrox/nitrox_logs.c (100%)
 rename drivers/{crypto => common}/nitrox/nitrox_logs.h (100%)
 rename drivers/{crypto => common}/nitrox/nitrox_qp.c (67%)
 rename drivers/{crypto => common}/nitrox/nitrox_qp.h (55%)
 create mode 100644 drivers/common/nitrox/version.map
 create mode 100644 drivers/compress/nitrox/meson.build
 create mode 100644 drivers/compress/nitrox/nitrox_comp.c
 create mode 100644 drivers/compress/nitrox/nitrox_comp.h
 create mode 100644 drivers/compress/nitrox/nitrox_comp_reqmgr.c
 create mode 100644 drivers/compress/nitrox/nitrox_comp_reqmgr.h

-- 
2.42.0


^ permalink raw reply	[relevance 3%]

* [PATCH 0/7] add Nitrox compress device support
@ 2024-03-01 16:25  3% Nagadheeraj Rottela
  0 siblings, 0 replies; 200+ results
From: Nagadheeraj Rottela @ 2024-03-01 16:25 UTC (permalink / raw)
  To: gakhil, fanzhang.oss, ashishg; +Cc: dev, Nagadheeraj Rottela

Add the Nitrox PMD to support Nitrox compress device.
---
v4:
* Fixed checkpatch warnings.
* Updated release notes.

v3:
* Fixed ABI compatibility issue.

v2:
* Reformatted patches to minimize number of changes.
* Removed empty file with only copyright.
* Updated all feature flags in nitrox.ini file.
* Added separate gotos in nitrox_pci_probe() function.

Nagadheeraj Rottela (7):
  crypto/nitrox: move common code
  drivers/compress: add Nitrox driver
  common/nitrox: add compress hardware queue management
  crypto/nitrox: set queue type during queue pair setup
  compress/nitrox: add software queue management
  compress/nitrox: support stateless request
  compress/nitrox: support stateful request

 MAINTAINERS                                   |    8 +
 doc/guides/compressdevs/features/nitrox.ini   |   17 +
 doc/guides/compressdevs/index.rst             |    1 +
 doc/guides/compressdevs/nitrox.rst            |   50 +
 doc/guides/rel_notes/release_24_03.rst        |    3 +
 drivers/common/nitrox/meson.build             |   19 +
 .../{crypto => common}/nitrox/nitrox_csr.h    |   12 +
 .../{crypto => common}/nitrox/nitrox_device.c |   51 +-
 .../{crypto => common}/nitrox/nitrox_device.h |    4 +-
 .../{crypto => common}/nitrox/nitrox_hal.c    |  116 ++
 .../{crypto => common}/nitrox/nitrox_hal.h    |  115 ++
 .../{crypto => common}/nitrox/nitrox_logs.c   |    0
 .../{crypto => common}/nitrox/nitrox_logs.h   |    0
 drivers/{crypto => common}/nitrox/nitrox_qp.c |   56 +-
 drivers/{crypto => common}/nitrox/nitrox_qp.h |   60 +-
 drivers/common/nitrox/version.map             |    9 +
 drivers/compress/nitrox/meson.build           |   16 +
 drivers/compress/nitrox/nitrox_comp.c         |  604 +++++++++
 drivers/compress/nitrox/nitrox_comp.h         |   35 +
 drivers/compress/nitrox/nitrox_comp_reqmgr.c  | 1194 +++++++++++++++++
 drivers/compress/nitrox/nitrox_comp_reqmgr.h  |   58 +
 drivers/crypto/nitrox/meson.build             |   11 +-
 drivers/crypto/nitrox/nitrox_sym.c            |    1 +
 drivers/meson.build                           |    1 +
 24 files changed, 2411 insertions(+), 30 deletions(-)
 create mode 100644 doc/guides/compressdevs/features/nitrox.ini
 create mode 100644 doc/guides/compressdevs/nitrox.rst
 create mode 100644 drivers/common/nitrox/meson.build
 rename drivers/{crypto => common}/nitrox/nitrox_csr.h (67%)
 rename drivers/{crypto => common}/nitrox/nitrox_device.c (77%)
 rename drivers/{crypto => common}/nitrox/nitrox_device.h (81%)
 rename drivers/{crypto => common}/nitrox/nitrox_hal.c (65%)
 rename drivers/{crypto => common}/nitrox/nitrox_hal.h (59%)
 rename drivers/{crypto => common}/nitrox/nitrox_logs.c (100%)
 rename drivers/{crypto => common}/nitrox/nitrox_logs.h (100%)
 rename drivers/{crypto => common}/nitrox/nitrox_qp.c (67%)
 rename drivers/{crypto => common}/nitrox/nitrox_qp.h (55%)
 create mode 100644 drivers/common/nitrox/version.map
 create mode 100644 drivers/compress/nitrox/meson.build
 create mode 100644 drivers/compress/nitrox/nitrox_comp.c
 create mode 100644 drivers/compress/nitrox/nitrox_comp.h
 create mode 100644 drivers/compress/nitrox/nitrox_comp_reqmgr.c
 create mode 100644 drivers/compress/nitrox/nitrox_comp_reqmgr.h

-- 
2.42.0


^ permalink raw reply	[relevance 3%]

* RE: [EXTERNAL] Re: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional transfer
  2024-03-01  9:30  0%             ` fengchengwen
@ 2024-03-01 10:59  0%               ` Amit Prakash Shukla
  2024-03-07 13:41  0%                 ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Amit Prakash Shukla @ 2024-03-01 10:59 UTC (permalink / raw)
  To: fengchengwen, Cheng Jiang, Gowrishankar Muthukrishnan
  Cc: dev, Jerin Jacob, Anoob Joseph, Kevin Laatz, Bruce Richardson,
	Pavan Nikhilesh Bhagavatula

Hi Chengwen,

Please find my reply in-line.

Thanks,
Amit Shukla

> Hi Amit,
> 
> On 2024/3/1 16:31, Amit Prakash Shukla wrote:
> > Hi Chengwen,
> >
> > If I'm not wrong, your concern was about config file additions and not
> > about the test as such. If the config file is getting complicated and
> > there are better alternatives, we can minimize the config file changes
> > with this patch and just provide minimum functionality as required and
> > leave it open for future changes. For now, I can document the existing
> > behavior in documentation as "Constraints". Similar approach is
> > followed in other application such as ipsec-secgw
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__doc.dpdk.org_guid
> > es_sample-5Fapp-5Fug_ipsec-5Fsecgw.html-
> 23constraints&d=DwICaQ&c=nKjWe
> >
> c2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR69VnJLdSnADun7zLaXG1p5Rs7pXihE
> &m=UXlZ
> >
> 1CWj8uotMMmYQ4e7wtBXj4geBwcMUirlqFw0pZzSlOIIAVjWaPgcaXtni370&
> s=haaehrX
> > QSEG6EFRW8w2sHKUTU75aJX7ML8vM-0mJsAI&e=
> 
> Yes, I prefer enable different test just by modify configuration file, and then
> limit the number of entries at the same time.
> 
> This commit is bi-direction transfer, it is fixed, maybe later we should test 3/4
> for mem2dev while 1/4 for dev2mem.

Agreed. We will add this later, after the base functionality is merged. I will send the next version with the constraints listed. Can I assume the next version is good to merge?

> 
> sometime we may need evaluate performance of one dma channel for
> mem2mem, while another channel for mem2dev, we can't do this in current
> implement (because vchan_dev is for all DMA channel).

We are okay with extending it later. As you said, we are still deciding how the configuration file should look.

> 
> So I prefer restrict DMA non-mem2mem's config (include
> dir/type/coreid/pfid/vfid/raddr) as the dma device's private configuration.
> 
> Thanks
> 
> >
> > Constraints:
> > 1. vchan_dev config will be same for all the configured DMA devices.
> > 2. Alternate DMA device will do dev2mem and mem2dev implicitly.
> > Example:
> > xfer_mode=1
> > vchan_dev=raddr=0x200000000,coreid=1,pfid=2,vfid=3
> > lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3,
> > lcore12@0000:00:04.4, lcore13@0000:00:04.5
> >
> > lcore10@0000:00:04.2, lcore12@0000:00:04.4 will do dev2mem and
> lcore11@0000:00:04.3, lcore13@0000:00:04.5 will do mem2dev.
> >
> > Thanks,
> > Amit Shukla
> >
> >> -----Original Message-----
> >> From: fengchengwen <fengchengwen@huawei.com>
> >> Sent: Friday, March 1, 2024 7:16 AM
> >> To: Amit Prakash Shukla <amitprakashs@marvell.com>; Cheng Jiang
> >> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
> >> <gmuthukrishn@marvell.com>
> >> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
> >> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
> >> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
> >> <pbhagavatula@marvell.com>
> >> Subject: [EXTERNAL] Re: [EXT] Re: [PATCH v2] app/dma-perf: support
> >> bi- directional transfer
> >>
> >> Prioritize security for external emails: Confirm sender and content
> >> safety before clicking links or opening attachments
> >>
> >> ---------------------------------------------------------------------
> >> -
> >> Hi Amit,
> >>
> >> I think this commit will complicated the test, plus futer we may add
> >> more test (e.g. fill)
> >>
> >> I agree Bruce's advise in the [1], let also support "lcore_dma0/1/2",
> >>
> >> User could provide dma info by two way:
> >> 1) lcore_dma=, which seperate each dma with ", ", but a maximum of a
> >> certain number is allowed.
> >> 2) lcore_dma0/1/2/..., each dma device take one line
> >>
> >> [1] https://urldefense.proofpoint.com/v2/url?u=https-
> >> 3A__patchwork.dpdk.org_project_dpdk_patch_20231206112952.1588-
> >> 2D1-2Dvipin.varghese-
> >>
> 40amd.com_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR6
> >> 9VnJLdSnADun7zLaXG1p5Rs7pXihE&m=OwrvdPIi-
> >>
> TQ2UEH3cztfXDzT8YkOB099Pl1mfUzGaq9td0fEWrRBLQQBzAFkjQSU&s=kKin
> >> YsGoNyTxuLEyPJ0LppT17Yq64CvFBtJMirGEISI&e=
> >>
> >> Thanks
> >>
> >> On 2024/2/29 22:03, Amit Prakash Shukla wrote:
> >>> Hi Chengwen,
> >>>
> >>> I liked your suggestion and tried making changes, but encountered
> >>> parsing
> >> issue for CFG files with line greater than CFG_VALUE_LEN=256(current
> >> value set).
> >>>
> >>> There is a discussion on the similar lines in another patch set:
> >> https://urldefense.proofpoint.com/v2/url?u=https-
> >> 3A__patchwork.dpdk.org_project_dpdk_patch_20231206112952.1588-
> >> 2D1-2Dvipin.varghese-
> >>
> 40amd.com_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR6
> >> 9VnJLdSnADun7zLaXG1p5Rs7pXihE&m=OwrvdPIi-
> >>
> TQ2UEH3cztfXDzT8YkOB099Pl1mfUzGaq9td0fEWrRBLQQBzAFkjQSU&s=kKin
> >> YsGoNyTxuLEyPJ0LppT17Yq64CvFBtJMirGEISI&e= .
> >>>
> >>> I believe this patch can be taken as-is and we can come up with the
> >>> solution
> >> when we can increase the CFG_VALUE_LEN as changing CFG_VALUE_LEN in
> >> this release is causing ABI breakage.
> >>>
> >>> Thanks,
> >>> Amit Shukla
> >>>

<snip>

^ permalink raw reply	[relevance 0%]

* Re: [RFC PATCH 1/2] power: refactor core power management library
  2024-03-01  2:56  3%   ` lihuisong (C)
@ 2024-03-01 10:39  0%     ` Hunt, David
  2024-03-05  4:35  3%     ` Tummala, Sivaprasad
  1 sibling, 0 replies; 200+ results
From: Hunt, David @ 2024-03-01 10:39 UTC (permalink / raw)
  To: lihuisong (C),
	Sivaprasad Tummala, anatoly.burakov, jerinj, radu.nicolau,
	gakhil, cristian.dumitrescu, ferruh.yigit, konstantin.ananyev
  Cc: dev


On 01/03/2024 02:56, lihuisong (C) wrote:
>
> 在 2024/2/20 23:33, Sivaprasad Tummala 写道:
>> This patch introduces a comprehensive refactor to the core power
>> management library. The primary focus is on improving modularity
>> and organization by relocating specific driver implementations
>> from the 'lib/power' directory to dedicated directories within
>> 'drivers/power/core/*'. The adjustment of meson.build files
>> enables the selective activation of individual drivers.
>>
>> These changes contribute to a significant enhancement in code
>> organization, providing a clearer structure for driver implementations.
>> The refactor aims to improve overall code clarity and boost
>> maintainability. Additionally, it establishes a foundation for
>> future development, allowing for more focused work on individual
>> drivers and seamless integration of forthcoming enhancements.
>
> Good job. +1 to refacotor.
>
> <...>
>

Also a +1 from me, looks like a sensible re-organisation of the power code.

Regards,
Dave.



>> diff --git a/drivers/meson.build b/drivers/meson.build
>> index f2be71bc05..e293c3945f 100644
>> --- a/drivers/meson.build
>> +++ b/drivers/meson.build
>> @@ -28,6 +28,7 @@ subdirs = [
>>           'event',          # depends on common, bus, mempool and net.
>>           'baseband',       # depends on common and bus.
>>           'gpu',            # depends on common and bus.
>> +        'power',          # depends on common (in future).
>>   ]
>>     if meson.is_cross_build()
>> diff --git a/drivers/power/core/acpi/meson.build 
>> b/drivers/power/core/acpi/meson.build
>> new file mode 100644
>> index 0000000000..d10ec8ee94
>> --- /dev/null
>> +++ b/drivers/power/core/acpi/meson.build
>> @@ -0,0 +1,8 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2024 AMD Limited
>> +
>> +sources = files('power_acpi_cpufreq.c')
>> +
>> +headers = files('power_acpi_cpufreq.h')
>> +
>> +deps += ['power']
>> diff --git a/lib/power/power_acpi_cpufreq.c 
>> b/drivers/power/core/acpi/power_acpi_cpufreq.c
>> similarity index 95%
>> rename from lib/power/power_acpi_cpufreq.c
>> rename to drivers/power/core/acpi/power_acpi_cpufreq.c
> This file is in power lib.
> How about remove the 'power' prefix of this file name?
> like acpi_cpufreq.c, cppc_cpufreq.c.
>> index f8d978d03d..69d80ad2ae 100644
>> --- a/lib/power/power_acpi_cpufreq.c
>> +++ b/drivers/power/core/acpi/power_acpi_cpufreq.c
>> @@ -577,3 +577,22 @@ int power_acpi_get_capabilities(unsigned int 
>> lcore_id,
>>         return 0;
>>   }
>> +
>> +static struct rte_power_ops acpi_ops = {
> How about use the following structure name?
> "struct rte_power_cpufreq_ops" or "struct rte_power_core_ops"
> After all, we also have other power ops, like uncore, right?
>> +    .init = power_acpi_cpufreq_init,
>> +    .exit = power_acpi_cpufreq_exit,
>> +    .check_env_support = power_acpi_cpufreq_check_supported,
>> +    .get_avail_freqs = power_acpi_cpufreq_freqs,
>> +    .get_freq = power_acpi_cpufreq_get_freq,
>> +    .set_freq = power_acpi_cpufreq_set_freq,
>> +    .freq_down = power_acpi_cpufreq_freq_down,
>> +    .freq_up = power_acpi_cpufreq_freq_up,
>> +    .freq_max = power_acpi_cpufreq_freq_max,
>> +    .freq_min = power_acpi_cpufreq_freq_min,
>> +    .turbo_status = power_acpi_turbo_status,
>> +    .enable_turbo = power_acpi_enable_turbo,
>> +    .disable_turbo = power_acpi_disable_turbo,
>> +    .get_caps = power_acpi_get_capabilities
>> +};
>> +
>> +RTE_POWER_REGISTER_OPS(acpi_ops);
>> diff --git a/lib/power/power_acpi_cpufreq.h 
>> b/drivers/power/core/acpi/power_acpi_cpufreq.h
>> similarity index 100%
>> rename from lib/power/power_acpi_cpufreq.h
>> rename to drivers/power/core/acpi/power_acpi_cpufreq.h
>> diff --git a/drivers/power/core/amd-pstate/meson.build 
>> b/drivers/power/core/amd-pstate/meson.build
>> new file mode 100644
>> index 0000000000..8ec4c960f5
>> --- /dev/null
>> +++ b/drivers/power/core/amd-pstate/meson.build
>> @@ -0,0 +1,8 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2024 AMD Limited
>> +
>> +sources = files('power_amd_pstate_cpufreq.c')
>> +
>> +headers = files('power_amd_pstate_cpufreq.h')
>> +
>> +deps += ['power']
>> diff --git a/lib/power/power_amd_pstate_cpufreq.c 
>> b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c
>> similarity index 95%
>> rename from lib/power/power_amd_pstate_cpufreq.c
>> rename to drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c
>> index 028f84416b..9938de72a6 100644
>> --- a/lib/power/power_amd_pstate_cpufreq.c
>> +++ b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c
>> @@ -700,3 +700,22 @@ power_amd_pstate_get_capabilities(unsigned int 
>> lcore_id,
>>         return 0;
>>   }
>> +
>> +static struct rte_power_ops amd_pstate_ops = {
>> +    .init = power_amd_pstate_cpufreq_init,
>> +    .exit = power_amd_pstate_cpufreq_exit,
>> +    .check_env_support = power_amd_pstate_cpufreq_check_supported,
>> +    .get_avail_freqs = power_amd_pstate_cpufreq_freqs,
>> +    .get_freq = power_amd_pstate_cpufreq_get_freq,
>> +    .set_freq = power_amd_pstate_cpufreq_set_freq,
>> +    .freq_down = power_amd_pstate_cpufreq_freq_down,
>> +    .freq_up = power_amd_pstate_cpufreq_freq_up,
>> +    .freq_max = power_amd_pstate_cpufreq_freq_max,
>> +    .freq_min = power_amd_pstate_cpufreq_freq_min,
>> +    .turbo_status = power_amd_pstate_turbo_status,
>> +    .enable_turbo = power_amd_pstate_enable_turbo,
>> +    .disable_turbo = power_amd_pstate_disable_turbo,
>> +    .get_caps = power_amd_pstate_get_capabilities
>> +};
>> +
>> +RTE_POWER_REGISTER_OPS(amd_pstate_ops);
>> diff --git a/lib/power/power_amd_pstate_cpufreq.h 
>> b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.h
>> similarity index 100%
>> rename from lib/power/power_amd_pstate_cpufreq.h
>> rename to drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.h
>> diff --git a/drivers/power/core/cppc/meson.build 
>> b/drivers/power/core/cppc/meson.build
>> new file mode 100644
>> index 0000000000..06f3b99bb8
>> --- /dev/null
>> +++ b/drivers/power/core/cppc/meson.build
>> @@ -0,0 +1,8 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2024 AMD Limited
>> +
>> +sources = files('power_cppc_cpufreq.c')
>> +
>> +headers = files('power_cppc_cpufreq.h')
>> +
>> +deps += ['power']
>> diff --git a/lib/power/power_cppc_cpufreq.c 
>> b/drivers/power/core/cppc/power_cppc_cpufreq.c
>> similarity index 96%
>> rename from lib/power/power_cppc_cpufreq.c
>> rename to drivers/power/core/cppc/power_cppc_cpufreq.c
>> index 3ddf39bd76..605f633309 100644
>> --- a/lib/power/power_cppc_cpufreq.c
>> +++ b/drivers/power/core/cppc/power_cppc_cpufreq.c
>> @@ -685,3 +685,22 @@ power_cppc_get_capabilities(unsigned int lcore_id,
>>         return 0;
>>   }
>> +
>> +static struct rte_power_ops cppc_ops = {
>> +    .init = power_cppc_cpufreq_init,
>> +    .exit = power_cppc_cpufreq_exit,
>> +    .check_env_support = power_cppc_cpufreq_check_supported,
>> +    .get_avail_freqs = power_cppc_cpufreq_freqs,
>> +    .get_freq = power_cppc_cpufreq_get_freq,
>> +    .set_freq = power_cppc_cpufreq_set_freq,
>> +    .freq_down = power_cppc_cpufreq_freq_down,
>> +    .freq_up = power_cppc_cpufreq_freq_up,
>> +    .freq_max = power_cppc_cpufreq_freq_max,
>> +    .freq_min = power_cppc_cpufreq_freq_min,
>> +    .turbo_status = power_cppc_turbo_status,
>> +    .enable_turbo = power_cppc_enable_turbo,
>> +    .disable_turbo = power_cppc_disable_turbo,
>> +    .get_caps = power_cppc_get_capabilities
>> +};
>> +
>> +RTE_POWER_REGISTER_OPS(cppc_ops);
>> diff --git a/lib/power/power_cppc_cpufreq.h 
>> b/drivers/power/core/cppc/power_cppc_cpufreq.h
>> similarity index 100%
>> rename from lib/power/power_cppc_cpufreq.h
>> rename to drivers/power/core/cppc/power_cppc_cpufreq.h
>> diff --git a/lib/power/guest_channel.c 
>> b/drivers/power/core/kvm-vm/guest_channel.c
>> similarity index 100%
>> rename from lib/power/guest_channel.c
>> rename to drivers/power/core/kvm-vm/guest_channel.c
>> diff --git a/lib/power/guest_channel.h 
>> b/drivers/power/core/kvm-vm/guest_channel.h
>> similarity index 100%
>> rename from lib/power/guest_channel.h
>> rename to drivers/power/core/kvm-vm/guest_channel.h
>> diff --git a/drivers/power/core/kvm-vm/meson.build 
>> b/drivers/power/core/kvm-vm/meson.build
>> new file mode 100644
>> index 0000000000..3150c6674b
>> --- /dev/null
>> +++ b/drivers/power/core/kvm-vm/meson.build
>> @@ -0,0 +1,20 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(C) 2024 AMD Limited.
>> +#
>> +
>> +if not is_linux
>> +    build = false
>> +    reason = 'only supported on Linux'
>> +    subdir_done()
>> +endif
>> +
>> +sources = files(
>> +        'guest_channel.c',
>> +        'power_kvm_vm.c',
>> +)
>> +
>> +headers = files(
>> +        'guest_channel.h',
>> +        'power_kvm_vm.h',
>> +)
>> +deps += ['power']
>> diff --git a/lib/power/power_kvm_vm.c 
>> b/drivers/power/core/kvm-vm/power_kvm_vm.c
>> similarity index 83%
>> rename from lib/power/power_kvm_vm.c
>> rename to drivers/power/core/kvm-vm/power_kvm_vm.c
>> index f15be8fac5..a5d6984d26 100644
>> --- a/lib/power/power_kvm_vm.c
>> +++ b/drivers/power/core/kvm-vm/power_kvm_vm.c
>> @@ -137,3 +137,22 @@ int power_kvm_vm_get_capabilities(__rte_unused 
>> unsigned int lcore_id,
>>       POWER_LOG(ERR, "rte_power_get_capabilities is not implemented 
>> for Virtual Machine Power Management");
>>       return -ENOTSUP;
>>   }
>> +
>> +static struct rte_power_ops kvm_vm_ops = {
>> +    .init = power_kvm_vm_init,
>> +    .exit = power_kvm_vm_exit,
>> +    .check_env_support = power_kvm_vm_check_supported,
>> +    .get_avail_freqs = power_kvm_vm_freqs,
>> +    .get_freq = power_kvm_vm_get_freq,
>> +    .set_freq = power_kvm_vm_set_freq,
>> +    .freq_down = power_kvm_vm_freq_down,
>> +    .freq_up = power_kvm_vm_freq_up,
>> +    .freq_max = power_kvm_vm_freq_max,
>> +    .freq_min = power_kvm_vm_freq_min,
>> +    .turbo_status = power_kvm_vm_turbo_status,
>> +    .enable_turbo = power_kvm_vm_enable_turbo,
>> +    .disable_turbo = power_kvm_vm_disable_turbo,
>> +    .get_caps = power_kvm_vm_get_capabilities
>> +};
>> +
>> +RTE_POWER_REGISTER_OPS(kvm_vm_ops);
>> diff --git a/lib/power/power_kvm_vm.h 
>> b/drivers/power/core/kvm-vm/power_kvm_vm.h
>> similarity index 100%
>> rename from lib/power/power_kvm_vm.h
>> rename to drivers/power/core/kvm-vm/power_kvm_vm.h
>> diff --git a/drivers/power/core/meson.build 
>> b/drivers/power/core/meson.build
>> new file mode 100644
>> index 0000000000..4081dafaa0
>> --- /dev/null
>> +++ b/drivers/power/core/meson.build
>> @@ -0,0 +1,12 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2024 AMD Limited
>> +
>> +drivers = [
>> +        'acpi',
>> +        'amd-pstate',
>> +        'cppc',
>> +        'kvm-vm',
>> +        'pstate'
>> +]
>> +
>> +std_deps = ['power']
>> diff --git a/drivers/power/core/pstate/meson.build 
>> b/drivers/power/core/pstate/meson.build
>> new file mode 100644
>> index 0000000000..1025c64e48
>> --- /dev/null
>> +++ b/drivers/power/core/pstate/meson.build
>> @@ -0,0 +1,8 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2024 AMD Limited
>> +
>> +sources = files('power_pstate_cpufreq.c')
>> +
>> +headers = files('power_pstate_cpufreq.h')
>> +
>> +deps += ['power']
>> diff --git a/lib/power/power_pstate_cpufreq.c 
>> b/drivers/power/core/pstate/power_pstate_cpufreq.c
>> similarity index 96%
>> rename from lib/power/power_pstate_cpufreq.c
>> rename to drivers/power/core/pstate/power_pstate_cpufreq.c
>> index 73138dc4e4..d4c3645ff8 100644
>> --- a/lib/power/power_pstate_cpufreq.c
>> +++ b/drivers/power/core/pstate/power_pstate_cpufreq.c
>> @@ -888,3 +888,22 @@ int power_pstate_get_capabilities(unsigned int 
>> lcore_id,
>>         return 0;
>>   }
>> +
>> +static struct rte_power_ops pstate_ops = {
>> +    .init = power_pstate_cpufreq_init,
>> +    .exit = power_pstate_cpufreq_exit,
>> +    .check_env_support = power_pstate_cpufreq_check_supported,
>> +    .get_avail_freqs = power_pstate_cpufreq_freqs,
>> +    .get_freq = power_pstate_cpufreq_get_freq,
>> +    .set_freq = power_pstate_cpufreq_set_freq,
>> +    .freq_down = power_pstate_cpufreq_freq_down,
>> +    .freq_up = power_pstate_cpufreq_freq_up,
>> +    .freq_max = power_pstate_cpufreq_freq_max,
>> +    .freq_min = power_pstate_cpufreq_freq_min,
>> +    .turbo_status = power_pstate_turbo_status,
>> +    .enable_turbo = power_pstate_enable_turbo,
>> +    .disable_turbo = power_pstate_disable_turbo,
>> +    .get_caps = power_pstate_get_capabilities
>> +};
>> +
>> +RTE_POWER_REGISTER_OPS(pstate_ops);
>> diff --git a/lib/power/power_pstate_cpufreq.h 
>> b/drivers/power/core/pstate/power_pstate_cpufreq.h
>> similarity index 100%
>> rename from lib/power/power_pstate_cpufreq.h
>> rename to drivers/power/core/pstate/power_pstate_cpufreq.h
>> diff --git a/drivers/power/meson.build b/drivers/power/meson.build
>> new file mode 100644
>> index 0000000000..7d9034c7ac
>> --- /dev/null
>> +++ b/drivers/power/meson.build
>> @@ -0,0 +1,8 @@
>> +# SPDX-License-Identifier: BSD-3-Clause
>> +# Copyright(c) 2024 AMD Limited
>> +
>> +drivers = [
>> +        'core',
>> +]
>> +
>> +std_deps = ['power']
>> diff --git a/lib/power/meson.build b/lib/power/meson.build
>> index b8426589b2..207d96d877 100644
>> --- a/lib/power/meson.build
>> +++ b/lib/power/meson.build
>> @@ -12,14 +12,8 @@ if not is_linux
>>       reason = 'only supported on Linux'
>>   endif
>>   sources = files(
>> -        'guest_channel.c',
>> -        'power_acpi_cpufreq.c',
>> -        'power_amd_pstate_cpufreq.c',
>>           'power_common.c',
>> -        'power_cppc_cpufreq.c',
>> -        'power_kvm_vm.c',
>>           'power_intel_uncore.c',
>> -        'power_pstate_cpufreq.c',
>>           'rte_power.c',
>>           'rte_power_uncore.c',
>>           'rte_power_pmd_mgmt.c',
>> diff --git a/lib/power/power_common.h b/lib/power/power_common.h
>> index 30966400ba..c90b611f4f 100644
>> --- a/lib/power/power_common.h
>> +++ b/lib/power/power_common.h
>> @@ -23,13 +23,24 @@ extern int power_logtype;
>>   #endif
>>     /* check if scaling driver matches one we want */
>> +__rte_internal
>>   int cpufreq_check_scaling_driver(const char *driver);
>> +
>> +__rte_internal
>>   int power_set_governor(unsigned int lcore_id, const char 
>> *new_governor,
>>           char *orig_governor, size_t orig_governor_len);
>> +
>> +__rte_internal
>>   int open_core_sysfs_file(FILE **f, const char *mode, const char 
>> *format, ...)
>>           __rte_format_printf(3, 4);
>> +
>> +__rte_internal
>>   int read_core_sysfs_u32(FILE *f, uint32_t *val);
>> +
>> +__rte_internal
>>   int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
>> +
>> +__rte_internal
>>   int write_core_sysfs_s(FILE *f, const char *str);
>>     #endif /* _POWER_COMMON_H_ */
>> diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c
>> index 36c3f3da98..70176807f4 100644
>> --- a/lib/power/rte_power.c
>> +++ b/lib/power/rte_power.c
>> @@ -8,64 +8,80 @@
>>   #include <rte_spinlock.h>
>>     #include "rte_power.h"
>> -#include "power_acpi_cpufreq.h"
>> -#include "power_cppc_cpufreq.h"
>>   #include "power_common.h"
>> -#include "power_kvm_vm.h"
>> -#include "power_pstate_cpufreq.h"
>> -#include "power_amd_pstate_cpufreq.h"
>>     enum power_management_env global_default_env = PM_ENV_NOT_SET;
> use a pointer to save the current power cpufreq ops?
>>     static rte_spinlock_t global_env_cfg_lock = 
>> RTE_SPINLOCK_INITIALIZER;
>> +static struct rte_power_ops rte_power_ops[PM_ENV_MAX];
>>   -/* function pointers */
>> -rte_power_freqs_t rte_power_freqs  = NULL;
>> -rte_power_get_freq_t rte_power_get_freq = NULL;
>> -rte_power_set_freq_t rte_power_set_freq = NULL;
>> -rte_power_freq_change_t rte_power_freq_up = NULL;
>> -rte_power_freq_change_t rte_power_freq_down = NULL;
>> -rte_power_freq_change_t rte_power_freq_max = NULL;
>> -rte_power_freq_change_t rte_power_freq_min = NULL;
>> -rte_power_freq_change_t rte_power_turbo_status;
>> -rte_power_freq_change_t rte_power_freq_enable_turbo;
>> -rte_power_freq_change_t rte_power_freq_disable_turbo;
>> -rte_power_get_capabilities_t rte_power_get_capabilities;
>> -
>> -static void
>> -reset_power_function_ptrs(void)
>> +/* register the ops struct in rte_power_ops, return 0 on success. */
>> +int
>> +rte_power_register_ops(const struct rte_power_ops *op)
>> +{
>> +    struct rte_power_ops *ops;
>> +
>> +    if (op->env >= PM_ENV_MAX) {
>> +        POWER_LOG(ERR, "Unsupported power management environment\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (op->status != 0) {
>> +        POWER_LOG(ERR, "Power management env[%d] ops registered 
>> already\n",
>> +            op->env);
>> +        return -EINVAL;
>> +    }
>> +
>> +    if (!op->init || !op->exit || !op->check_env_support ||
>> +        !op->get_avail_freqs || !op->get_freq || !op->set_freq ||
>> +        !op->freq_up || !op->freq_down || !op->freq_max ||
>> +        !op->freq_min || !op->turbo_status || !op->enable_turbo ||
>> +        !op->disable_turbo || !op->get_caps) {
>> +        POWER_LOG(ERR, "Missing callbacks while registering power 
>> ops\n");
>> +        return -EINVAL;
>> +    }
>> +
>> +    ops = &rte_power_ops[op->env];
> It is better to use a global linked list instead of an array.
> And we should extract a list structure that includes this ops structure
> and the ops's owner.
>> +    ops->env = op->env;
>> +    ops->init = op->init;
>> +    ops->exit = op->exit;
>> +    ops->check_env_support = op->check_env_support;
>> +    ops->get_avail_freqs = op->get_avail_freqs;
>> +    ops->get_freq = op->get_freq;
>> +    ops->set_freq = op->set_freq;
>> +    ops->freq_up = op->freq_up;
>> +    ops->freq_down = op->freq_down;
>> +    ops->freq_max = op->freq_max;
>> +    ops->freq_min = op->freq_min;
>> +    ops->turbo_status = op->turbo_status;
>> +    ops->enable_turbo = op->enable_turbo;
>> +    ops->disable_turbo = op->disable_turbo;
> *ops = *op?
>> +    ops->status = 1; /* registered */
> status --> registered?
> But if we use an ops linked list, this flag can also be removed.
>> +
>> +    return 0;
>> +}
>> +
>> +struct rte_power_ops *
>> +rte_power_get_ops(int ops_index)
> AFAICS, there is only one cpufreq driver on a given platform, so there is
> just one power_cpufreq_ops for the user to use.
> We don't need the user to get other power ops; the user just wants to know
> which power ops is currently in use, right?
> So using an 'index' to get this ops is not good.
>>   {
>> -    rte_power_freqs  = NULL;
>> -    rte_power_get_freq = NULL;
>> -    rte_power_set_freq = NULL;
>> -    rte_power_freq_up = NULL;
>> -    rte_power_freq_down = NULL;
>> -    rte_power_freq_max = NULL;
>> -    rte_power_freq_min = NULL;
>> -    rte_power_turbo_status = NULL;
>> -    rte_power_freq_enable_turbo = NULL;
>> -    rte_power_freq_disable_turbo = NULL;
>> -    rte_power_get_capabilities = NULL;
>> +    RTE_VERIFY((ops_index >= PM_ENV_NOT_SET) && (ops_index < 
>> PM_ENV_MAX));
>> +    RTE_VERIFY(rte_power_ops[ops_index].status != 0);
>> +
>> +    return &rte_power_ops[ops_index];
>>   }
>>     int
>>   rte_power_check_env_supported(enum power_management_env env)
>>   {
>> -    switch (env) {
>> -    case PM_ENV_ACPI_CPUFREQ:
>> -        return power_acpi_cpufreq_check_supported();
>> -    case PM_ENV_PSTATE_CPUFREQ:
>> -        return power_pstate_cpufreq_check_supported();
>> -    case PM_ENV_KVM_VM:
>> -        return power_kvm_vm_check_supported();
>> -    case PM_ENV_CPPC_CPUFREQ:
>> -        return power_cppc_cpufreq_check_supported();
>> -    case PM_ENV_AMD_PSTATE_CPUFREQ:
>> -        return power_amd_pstate_cpufreq_check_supported();
>> -    default:
>> -        rte_errno = EINVAL;
>> -        return -1;
>> +    struct rte_power_ops *ops;
>> +
>> +    if ((env > PM_ENV_NOT_SET) && (env < PM_ENV_MAX)) {
>> +        ops = rte_power_get_ops(env);
>> +        return ops->check_env_support();
>>       }
>> +
>> +    rte_errno = EINVAL;
>> +    return -1;
>>   }
>>     int
>> @@ -80,80 +96,26 @@ rte_power_set_env(enum power_management_env env)
>>       }
>>         int ret = 0;
>> +    struct rte_power_ops *ops;
>> +
>> +    if ((env == PM_ENV_NOT_SET) || (env >= PM_ENV_MAX)) {
>> +        POWER_LOG(ERR, "Invalid Power Management Environment(%d)"
>> +                " set\n", env);
>> +        ret = -1;
>> +    }
> <...>
>> +    ops = rte_power_get_ops(env);
> To find the target ops from the global list according to the env?
>> +    if (ops->status == 0) {
>> +        POWER_LOG(ERR, WER,
>> +            "Power Management Environment(%d) not"
>> +            " registered\n", env);
>>           ret = -1;
>>       }
>>         if (ret == 0)
>>           global_default_env = env;
> It is more convenient to use a global variable to point to the default 
> power_cpufreq ops or its list node.
>> -    else {
>> +    else
>>           global_default_env = PM_ENV_NOT_SET;
>> -        reset_power_function_ptrs();
>> -    }
>>         rte_spinlock_unlock(&global_env_cfg_lock);
>>       return ret;
>> @@ -164,7 +126,6 @@ rte_power_unset_env(void)
>>   {
>>       rte_spinlock_lock(&global_env_cfg_lock);
>>       global_default_env = PM_ENV_NOT_SET;
>> -    reset_power_function_ptrs();
>>       rte_spinlock_unlock(&global_env_cfg_lock);
>>   }
>>   @@ -177,59 +138,76 @@ int
>>   rte_power_init(unsigned int lcore_id)
>>   {
>>       int ret = -1;
>> +    struct rte_power_ops *ops;
>>   -    switch (global_default_env) {
>> -    case PM_ENV_ACPI_CPUFREQ:
>> -        return power_acpi_cpufreq_init(lcore_id);
>> -    case PM_ENV_KVM_VM:
>> -        return power_kvm_vm_init(lcore_id);
>> -    case PM_ENV_PSTATE_CPUFREQ:
>> -        return power_pstate_cpufreq_init(lcore_id);
>> -    case PM_ENV_CPPC_CPUFREQ:
>> -        return power_cppc_cpufreq_init(lcore_id);
>> -    case PM_ENV_AMD_PSTATE_CPUFREQ:
>> -        return power_amd_pstate_cpufreq_init(lcore_id);
>> -    default:
>> -        POWER_LOG(INFO, "Env isn't set yet!");
>> +    if (global_default_env != PM_ENV_NOT_SET) {
>> +        ops = &rte_power_ops[global_default_env];
>> +        if (!ops->status) {
>> +            POWER_LOG(ERR, "Power management env[%d] not"
>> +                " supported\n", global_default_env);
>> +            goto out;
>> +        }
>> +        return ops->init(lcore_id);
>>       }
>> +    POWER_LOG(INFO, POWER, "Env isn't set yet!\n");
>>         /* Auto detect Environment */
>> -    POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq power 
>> management...");
>> -    ret = power_acpi_cpufreq_init(lcore_id);
>> -    if (ret == 0) {
>> -        rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
>> -        goto out;
>> +    POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq"
>> +            " power management...\n");
>> +    ops = &rte_power_ops[PM_ENV_ACPI_CPUFREQ];
>> +    if (ops->status) {
>> +        ret = ops->init(lcore_id);
>> +        if (ret == 0) {
>> +            rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
>> +            goto out;
>> +        }
>>       }
>>   -    POWER_LOG(INFO, "Attempting to initialise PSTAT power 
>> management...");
>> -    ret = power_pstate_cpufreq_init(lcore_id);
>> -    if (ret == 0) {
>> -        rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
>> -        goto out;
>> +    POWER_LOG(INFO, "Attempting to initialise PSTAT"
>> +            " power management...\n");
>> +    ops = &rte_power_ops[PM_ENV_PSTATE_CPUFREQ];
>> +    if (ops->status) {
>> +        ret = ops->init(lcore_id);
>> +        if (ret == 0) {
>> +            rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
>> +            goto out;
>> +        }
>>       }
>>   -    POWER_LOG(INFO, "Attempting to initialise AMD PSTATE power 
>> management...");
>> -    ret = power_amd_pstate_cpufreq_init(lcore_id);
>> -    if (ret == 0) {
>> -        rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
>> -        goto out;
>> +    POWER_LOG(INFO,    "Attempting to initialise AMD PSTATE"
>> +            " power management...\n");
>> +    ops = &rte_power_ops[PM_ENV_AMD_PSTATE_CPUFREQ];
>> +    if (ops->status) {
>> +        ret = ops->init(lcore_id);
>> +        if (ret == 0) {
>> +            rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
>> +            goto out;
>> +        }
>>       }
>>   -    POWER_LOG(INFO, "Attempting to initialise CPPC power 
>> management...");
>> -    ret = power_cppc_cpufreq_init(lcore_id);
>> -    if (ret == 0) {
>> -        rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
>> -        goto out;
>> +    POWER_LOG(INFO, "Attempting to initialise CPPC power"
>> +            " management...\n");
>> +    ops = &rte_power_ops[PM_ENV_CPPC_CPUFREQ];
>> +    if (ops->status) {
>> +        ret = ops->init(lcore_id);
>> +        if (ret == 0) {
>> +            rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
>> +            goto out;
>> +        }
>>       }
>>   -    POWER_LOG(INFO, "Attempting to initialise VM power 
>> management...");
>> -    ret = power_kvm_vm_init(lcore_id);
>> -    if (ret == 0) {
>> -        rte_power_set_env(PM_ENV_KVM_VM);
>> -        goto out;
>> +    POWER_LOG(INFO, "Attempting to initialise VM power"
>> +            " management...\n");
>> +    ops = &rte_power_ops[PM_ENV_KVM_VM];
>> +    if (ops->status) {
>> +        ret = ops->init(lcore_id);
>> +        if (ret == 0) {
>> +            rte_power_set_env(PM_ENV_KVM_VM);
>> +            goto out;
>> +        }
>>       }
> If we use a linked list, the above code can be simplified like this:
> ->
> for_each_power_cpufreq_ops(ops, ...) {
>     ret = ops->init()
>     if (ret) {
>         ....
>     }
> }
>> -    POWER_LOG(ERR, "Unable to set Power Management Environment for 
>> lcore "
>> -            "%u", lcore_id);
>> +    POWER_LOG(ERR, "Unable to set Power Management Environment"
>> +            " for lcore %u\n", lcore_id);
>>   out:
>>       return ret;
>>   }
>> @@ -237,21 +215,14 @@ rte_power_init(unsigned int lcore_id)
>>   int
>>   rte_power_exit(unsigned int lcore_id)
>>   {
>> -    switch (global_default_env) {
>> -    case PM_ENV_ACPI_CPUFREQ:
>> -        return power_acpi_cpufreq_exit(lcore_id);
>> -    case PM_ENV_KVM_VM:
>> -        return power_kvm_vm_exit(lcore_id);
>> -    case PM_ENV_PSTATE_CPUFREQ:
>> -        return power_pstate_cpufreq_exit(lcore_id);
>> -    case PM_ENV_CPPC_CPUFREQ:
>> -        return power_cppc_cpufreq_exit(lcore_id);
>> -    case PM_ENV_AMD_PSTATE_CPUFREQ:
>> -        return power_amd_pstate_cpufreq_exit(lcore_id);
>> -    default:
>> -        POWER_LOG(ERR, "Environment has not been set, unable to exit 
>> gracefully");
>> +    struct rte_power_ops *ops;
>>   +    if (global_default_env != PM_ENV_NOT_SET) {
>> +        ops = &rte_power_ops[global_default_env];
>> +        return ops->exit(lcore_id);
>>       }
>> -    return -1;
>> +    POWER_LOG(ERR, "Environment has not been set, unable "
>> +            "to exit gracefully\n");
>>   +    return -1;
>>   }
>> diff --git a/lib/power/rte_power.h b/lib/power/rte_power.h
>> index 4fa4afe399..749bb823ab 100644
>> --- a/lib/power/rte_power.h
>> +++ b/lib/power/rte_power.h
>> @@ -1,5 +1,6 @@
>>   /* SPDX-License-Identifier: BSD-3-Clause
>>    * Copyright(c) 2010-2014 Intel Corporation
>> + * Copyright(c) 2024 AMD Limited
>>    */
>>     #ifndef _RTE_POWER_H
>> @@ -21,7 +22,7 @@ extern "C" {
>>   /* Power Management Environment State */
>>   enum power_management_env {PM_ENV_NOT_SET, PM_ENV_ACPI_CPUFREQ, 
>> PM_ENV_KVM_VM,
>>           PM_ENV_PSTATE_CPUFREQ, PM_ENV_CPPC_CPUFREQ,
>> -        PM_ENV_AMD_PSTATE_CPUFREQ};
>> +        PM_ENV_AMD_PSTATE_CPUFREQ, PM_ENV_MAX};
> "enum power_management_env" is not good; maybe something like "enum
> power_cpufreq_driver_type"?
> In the linked-list structure to be defined, it may be better to use a
> string name directly instead of a fixed enum,
> because the new "PM_ENV_MAX" will break ABI whenever a new cpufreq
> driver is added.
>>     /**
>>    * Check if a specific power management environment type is 
>> supported on a
>> @@ -66,6 +67,97 @@ void rte_power_unset_env(void);
>>    */
>>   enum power_management_env rte_power_get_env(void);
>>   +typedef int (*rte_power_cpufreq_init_t)(unsigned int lcore_id);
>> +typedef int (*rte_power_cpufreq_exit_t)(unsigned int lcore_id);
>> +typedef int (*rte_power_check_env_support_t)(void);
>> +
>> +typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, 
>> uint32_t *freqs,
>> +                    uint32_t num);
>> +typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id);
>> +typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t 
>> index);
>> +typedef int (*rte_power_freq_change_t)(unsigned int lcore_id);
>> +
>> +/**
>> + * Function pointer definition for generic frequency change 
>> functions. Review
>> + * each environments specific documentation for usage.
>> + *
>> + * @param lcore_id
>> + *  lcore id.
>> + *
>> + * @return
>> + *  - 1 on success with frequency changed.
>> + *  - 0 on success without frequency changed.
>> + *  - Negative on error.
>> + */
>> +
>> +/**
>> + * Power capabilities summary.
>> + */
>> +struct rte_power_core_capabilities {
>> +    union {
>> +        uint64_t capabilities;
>> +        struct {
>> +            uint64_t turbo:1;       /**< Turbo can be enabled. */
>> +            uint64_t priority:1;    /**< SST-BF high freq core */
>> +        };
>> +    };
>> +};
>> +
>> +typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id,
>> +                struct rte_power_core_capabilities *caps);
>> +
>> +/** Structure defining core power operations structure */
>> +struct rte_power_ops {
>> +uint8_t status;                         /**< ops register status. */
>> +    enum power_management_env env;          /**< power mgmt env. */
>> +    rte_power_cpufreq_init_t init;    /**< Initialize power 
>> management. */
>> +    rte_power_cpufreq_exit_t exit;    /**< Exit power management. */
>> +    rte_power_check_env_support_t check_env_support; /**< verify env 
>> is supported. */
>> +    rte_power_freqs_t get_avail_freqs; /**< Get the available 
>> frequencies. */
>> +    rte_power_get_freq_t get_freq; /**< Get frequency index. */
>> +    rte_power_set_freq_t set_freq; /**< Set frequency index. */
>> +    rte_power_freq_change_t freq_up;   /**< Scale up frequency. */
>> +    rte_power_freq_change_t freq_down; /**< Scale down frequency. */
>> +    rte_power_freq_change_t freq_max;  /**< Scale up frequency to 
>> highest. */
>> +    rte_power_freq_change_t freq_min;  /**< Scale up frequency to 
>> lowest. */
>> +    rte_power_freq_change_t turbo_status; /**< Get Turbo status. */
>> +    rte_power_freq_change_t enable_turbo; /**< Enable Turbo. */
>> +    rte_power_freq_change_t disable_turbo; /**< Disable Turbo. */
>> +    rte_power_get_capabilities_t get_caps; /**< power capabilities. */
>> +} __rte_cache_aligned;
> Suggest fixing this structure, like:
> struct rte_power_cpufreq_list {
>     char name[];   // like "cppc_cpufreq", "pstate_cpufreq"
>     struct rte_power_cpufreq *ops;
>     struct rte_power_cpufreq_list *node;
> }
>> +
>> +/**
>> + * Register power cpu frequency operations.
>> + *
>> + * @param ops
>> + *   Pointer to an ops structure to register.
>> + * @return
>> + *   - >=0: Success; return the index of the ops struct in the table.
>> + *   - -EINVAL - error while registering ops struct.
>> + */
>> +__rte_internal
>> +int rte_power_register_ops(const struct rte_power_ops *ops);
>> +
>> +/**
>> + * Macro to statically register the ops of a cpufreq driver.
>> + */
>> +#define RTE_POWER_REGISTER_OPS(ops)        \
>> +    (RTE_INIT(power_hdlr_init_##ops)    \
>> +    {                    \
>> +        rte_power_register_ops(&ops);    \
>> +    })
>> +
>> +/**
>> + * @internal Get the power ops struct from its index.
>> + *
>> + * @param ops_index
>> + *   The index of the ops struct in the ops struct table.
>> + * @return
>> + *   The pointer to the ops struct in the table if registered.
>> + */
>> +struct rte_power_ops *
>> +rte_power_get_ops(int ops_index);
>> +
>>   /**
>>    * Initialize power management for a specific lcore. If 
>> rte_power_set_env() has
>>    * not been called then an auto-detect of the environment will 
>> start and
>> @@ -108,10 +200,14 @@ int rte_power_exit(unsigned int lcore_id);
>>    * @return
>>    *  The number of available frequencies.
>>    */
>> -typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, 
>> uint32_t *freqs,
>> -        uint32_t num);
>> +static inline uint32_t
>> +rte_power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n)
>> +{
>> +    struct rte_power_ops *ops;
>>   -extern rte_power_freqs_t rte_power_freqs;
>> +    ops = rte_power_get_ops(rte_power_get_env());
>> +    return ops->get_avail_freqs(lcore_id, freqs, n);
>> +}
> nice.
> <...>

^ permalink raw reply	[relevance 0%]

* Re: [EXTERNAL] Re: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional transfer
  2024-03-01  8:31  0%           ` [EXTERNAL] " Amit Prakash Shukla
@ 2024-03-01  9:30  0%             ` fengchengwen
  2024-03-01 10:59  0%               ` Amit Prakash Shukla
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2024-03-01  9:30 UTC (permalink / raw)
  To: Amit Prakash Shukla, Cheng Jiang, Gowrishankar Muthukrishnan
  Cc: dev, Jerin Jacob, Anoob Joseph, Kevin Laatz, Bruce Richardson,
	Pavan Nikhilesh Bhagavatula

Hi Amit,

On 2024/3/1 16:31, Amit Prakash Shukla wrote:
> Hi Chengwen,
> 
> If I'm not wrong, your concern was about config file additions and not about the test as such. If the config file is getting complicated and there are better alternatives, we can minimize the config file changes with this patch and just provide minimum functionality as required and leave it open for future changes. For now, I can document the existing behavior in documentation as "Constraints". Similar approach is followed in other application such as ipsec-secgw https://doc.dpdk.org/guides/sample_app_ug/ipsec_secgw.html#constraints

Yes, I prefer enabling different tests just by modifying the configuration file, while limiting the number of entries at the same time.

This commit makes the bi-directional transfer fixed; maybe later we will want to test 3/4 mem2dev while 1/4 dev2mem.

Sometimes we may need to evaluate the performance of one DMA channel doing mem2mem while another channel does mem2dev; we can't do this in the current implementation (because vchan_dev applies to all DMA channels).

So I prefer to restrict the DMA non-mem2mem config (including dir/type/coreid/pfid/vfid/raddr) to the DMA device's private configuration.

Thanks

> 
> Constraints:
> 1. The vchan_dev config will be the same for all the configured DMA devices.
> 2. Alternate DMA devices will do dev2mem and mem2dev implicitly.
> Example:
> xfer_mode=1
> vchan_dev=raddr=0x200000000,coreid=1,pfid=2,vfid=3
> lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3, lcore12@0000:00:04.4, lcore13@0000:00:04.5
> 
> lcore10@0000:00:04.2, lcore12@0000:00:04.4 will do dev2mem and lcore11@0000:00:04.3, lcore13@0000:00:04.5 will do mem2dev.
> 
> Thanks,
> Amit Shukla
> 
>> -----Original Message-----
>> From: fengchengwen <fengchengwen@huawei.com>
>> Sent: Friday, March 1, 2024 7:16 AM
>> To: Amit Prakash Shukla <amitprakashs@marvell.com>; Cheng Jiang
>> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
>> <gmuthukrishn@marvell.com>
>> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
>> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
>> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
>> <pbhagavatula@marvell.com>
>> Subject: [EXTERNAL] Re: [EXT] Re: [PATCH v2] app/dma-perf: support bi-
>> directional transfer
>>
>> Prioritize security for external emails: Confirm sender and content safety
>> before clicking links or opening attachments
>>
>> ----------------------------------------------------------------------
>> Hi Amit,
>>
>> I think this commit will complicate the test; plus, in future we may add more
>> tests (e.g. fill).
>>
>> I agree with Bruce's advice in [1]; let's also support "lcore_dma0/1/2".
>>
>> Users could provide dma info in two ways:
>> 1) lcore_dma=, which separates each dma with ", ", but only up to a certain
>> maximum number is allowed.
>> 2) lcore_dma0/1/2/..., where each dma device takes one line
>>
>> [1] https://urldefense.proofpoint.com/v2/url?u=https-
>> 3A__patchwork.dpdk.org_project_dpdk_patch_20231206112952.1588-
>> 2D1-2Dvipin.varghese-
>> 40amd.com_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR6
>> 9VnJLdSnADun7zLaXG1p5Rs7pXihE&m=OwrvdPIi-
>> TQ2UEH3cztfXDzT8YkOB099Pl1mfUzGaq9td0fEWrRBLQQBzAFkjQSU&s=kKin
>> YsGoNyTxuLEyPJ0LppT17Yq64CvFBtJMirGEISI&e=
>>
>> Thanks
>>
>> On 2024/2/29 22:03, Amit Prakash Shukla wrote:
>>> Hi Chengwen,
>>>
>>> I liked your suggestion and tried making changes, but encountered a parsing
>>> issue for CFG files with lines greater than CFG_VALUE_LEN=256 (the current
>>> value set).
>>>
>>> There is a discussion on the similar lines in another patch set:
>> https://urldefense.proofpoint.com/v2/url?u=https-
>> 3A__patchwork.dpdk.org_project_dpdk_patch_20231206112952.1588-
>> 2D1-2Dvipin.varghese-
>> 40amd.com_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR6
>> 9VnJLdSnADun7zLaXG1p5Rs7pXihE&m=OwrvdPIi-
>> TQ2UEH3cztfXDzT8YkOB099Pl1mfUzGaq9td0fEWrRBLQQBzAFkjQSU&s=kKin
>> YsGoNyTxuLEyPJ0LppT17Yq64CvFBtJMirGEISI&e= .
>>>
>>> I believe this patch can be taken as-is and we can come up with a solution
>>> when we can increase CFG_VALUE_LEN, as changing CFG_VALUE_LEN in
>>> this release would cause ABI breakage.
>>>
>>> Thanks,
>>> Amit Shukla
>>>
>>>> -----Original Message-----
>>>> From: Amit Prakash Shukla
>>>> Sent: Wednesday, February 28, 2024 3:08 PM
>>>> To: fengchengwen <fengchengwen@huawei.com>; Cheng Jiang
>>>> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
>>>> <gmuthukrishn@marvell.com>
>>>> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
>>>> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
>>>> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
>>>> <pbhagavatula@marvell.com>
>>>> Subject: RE: [EXT] Re: [PATCH v2] app/dma-perf: support
>>>> bi-directional transfer
>>>>
>>>> Hi Chengwen,
>>>>
>>>> Please see my reply in-line.
>>>>
>>>> Thanks
>>>> Amit Shukla
>>>>
>>>>> -----Original Message-----
>>>>> From: fengchengwen <fengchengwen@huawei.com>
>>>>> Sent: Wednesday, February 28, 2024 12:34 PM
>>>>> To: Amit Prakash Shukla <amitprakashs@marvell.com>; Cheng Jiang
>>>>> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
>>>>> <gmuthukrishn@marvell.com>
>>>>> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
>>>>> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
>>>>> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
>>>>> <pbhagavatula@marvell.com>
>>>>> Subject: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional
>>>>> transfer
>>>>>
>>>>> External Email
>>>>>
>>>>> --------------------------------------------------------------------
>>>>> --
>>>>> Hi Amit and Gowrishankar,
>>>>>
>>>>> It's natural to support multiple dmadev tests in one testcase, and the
>>>>> original framework supports it.
>>>>> But it seems we both complicated it when adding support for non-mem2mem
>>>>> dma test.
>>>>>
>>>>> The new added "direction" and "vchan_dev" could treat as the
>>>>> dmadev's private configure, some thing like:
>>>>>
>>>>>
>>>>
>> lcore_dma=lcore10@0000:00:04.2,vchan=0,dir=mem2dev,devtype=pcie,radd
>>>>> r=xxx,coreid=1,pfid=2,vfid=3
>>>>>
>>>>> then this bi-directional could impl only with config:
>>>>>
>>>>>
>>>>
>> lcore_dma=lcore10@0000:00:04.2,dir=mem2dev,devtype=pcie,raddr=xxx,cor
>>>>> eid=1,pfid=2,vfid=3,
>>>>>
>>>>
>> lcore11@0000:00:04.3,dir=dev2mem,devtype=pcie,raddr=xxx,coreid=1,pfid
>>>> =
>>>>> 2,vfid=3
>>>>> so that the lcore10 will do mem2dev with device 0000:00:04.2, while
>>>>> lcore11 will do dev2mem with device 0000:00:04.3.
>>>>
>>>> Thanks for the suggestion. I will make the suggested changes and send
>>>> the next version.

^ permalink raw reply	[relevance 0%]

* RE: [EXTERNAL] Re: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional transfer
  2024-03-01  1:46  0%         ` fengchengwen
@ 2024-03-01  8:31  0%           ` Amit Prakash Shukla
  2024-03-01  9:30  0%             ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Amit Prakash Shukla @ 2024-03-01  8:31 UTC (permalink / raw)
  To: fengchengwen, Cheng Jiang, Gowrishankar Muthukrishnan
  Cc: dev, Jerin Jacob, Anoob Joseph, Kevin Laatz, Bruce Richardson,
	Pavan Nikhilesh Bhagavatula

Hi Chengwen,

If I'm not wrong, your concern was about config file additions and not about the test as such. If the config file is getting complicated and there are better alternatives, we can minimize the config file changes with this patch and just provide minimum functionality as required and leave it open for future changes. For now, I can document the existing behavior in documentation as "Constraints". Similar approach is followed in other application such as ipsec-secgw https://doc.dpdk.org/guides/sample_app_ug/ipsec_secgw.html#constraints

Constraints:
1. The vchan_dev config will be the same for all the configured DMA devices.
2. Alternate DMA devices will do dev2mem and mem2dev implicitly.
Example:
xfer_mode=1
vchan_dev=raddr=0x200000000,coreid=1,pfid=2,vfid=3
lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3, lcore12@0000:00:04.4, lcore13@0000:00:04.5

lcore10@0000:00:04.2, lcore12@0000:00:04.4 will do dev2mem and lcore11@0000:00:04.3, lcore13@0000:00:04.5 will do mem2dev.

Thanks,
Amit Shukla

> -----Original Message-----
> From: fengchengwen <fengchengwen@huawei.com>
> Sent: Friday, March 1, 2024 7:16 AM
> To: Amit Prakash Shukla <amitprakashs@marvell.com>; Cheng Jiang
> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
> <gmuthukrishn@marvell.com>
> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
> <pbhagavatula@marvell.com>
> Subject: [EXTERNAL] Re: [EXT] Re: [PATCH v2] app/dma-perf: support bi-
> directional transfer
> 
> Prioritize security for external emails: Confirm sender and content safety
> before clicking links or opening attachments
> 
> ----------------------------------------------------------------------
> Hi Amit,
> 
> I think this commit will complicate the test; plus, in future we may add more
> tests (e.g. fill).
>
> I agree with Bruce's advice in [1]; let's also support "lcore_dma0/1/2".
>
> Users could provide dma info in two ways:
> 1) lcore_dma=, which separates each dma with ", ", but only up to a certain
> maximum number is allowed.
> 2) lcore_dma0/1/2/..., where each dma device takes one line
> 
> [1] https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__patchwork.dpdk.org_project_dpdk_patch_20231206112952.1588-
> 2D1-2Dvipin.varghese-
> 40amd.com_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR6
> 9VnJLdSnADun7zLaXG1p5Rs7pXihE&m=OwrvdPIi-
> TQ2UEH3cztfXDzT8YkOB099Pl1mfUzGaq9td0fEWrRBLQQBzAFkjQSU&s=kKin
> YsGoNyTxuLEyPJ0LppT17Yq64CvFBtJMirGEISI&e=
> 
> Thanks
> 
> On 2024/2/29 22:03, Amit Prakash Shukla wrote:
> > Hi Chengwen,
> >
> > I liked your suggestion and tried making changes, but encountered a parsing
> > issue for CFG files with lines greater than CFG_VALUE_LEN=256 (the current
> > value set).
> >
> > There is a discussion on the similar lines in another patch set:
> https://urldefense.proofpoint.com/v2/url?u=https-
> 3A__patchwork.dpdk.org_project_dpdk_patch_20231206112952.1588-
> 2D1-2Dvipin.varghese-
> 40amd.com_&d=DwICaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=ALGdXl3fZgFGR6
> 9VnJLdSnADun7zLaXG1p5Rs7pXihE&m=OwrvdPIi-
> TQ2UEH3cztfXDzT8YkOB099Pl1mfUzGaq9td0fEWrRBLQQBzAFkjQSU&s=kKin
> YsGoNyTxuLEyPJ0LppT17Yq64CvFBtJMirGEISI&e= .
> >
> > I believe this patch can be taken as-is and we can come up with a solution
> > when we can increase CFG_VALUE_LEN, as changing CFG_VALUE_LEN in
> > this release would cause ABI breakage.
> >
> > Thanks,
> > Amit Shukla
> >
> >> -----Original Message-----
> >> From: Amit Prakash Shukla
> >> Sent: Wednesday, February 28, 2024 3:08 PM
> >> To: fengchengwen <fengchengwen@huawei.com>; Cheng Jiang
> >> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
> >> <gmuthukrishn@marvell.com>
> >> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
> >> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
> >> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
> >> <pbhagavatula@marvell.com>
> >> Subject: RE: [EXT] Re: [PATCH v2] app/dma-perf: support
> >> bi-directional transfer
> >>
> >> Hi Chengwen,
> >>
> >> Please see my reply in-line.
> >>
> >> Thanks
> >> Amit Shukla
> >>
> >>> -----Original Message-----
> >>> From: fengchengwen <fengchengwen@huawei.com>
> >>> Sent: Wednesday, February 28, 2024 12:34 PM
> >>> To: Amit Prakash Shukla <amitprakashs@marvell.com>; Cheng Jiang
> >>> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
> >>> <gmuthukrishn@marvell.com>
> >>> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
> >>> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
> >>> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
> >>> <pbhagavatula@marvell.com>
> >>> Subject: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional
> >>> transfer
> >>>
> >>> External Email
> >>>
> >>> --------------------------------------------------------------------
> >>> --
> >>> Hi Amit and Gowrishankar,
> >>>
> >>> It's natural to support multiple dmadev tests in one testcase, and the
> >>> original framework supports it.
> >>> But it seems we both complicated it when adding support for non-mem2mem
> >>> dma test.
> >>>
> >>> The new added "direction" and "vchan_dev" could treat as the
> >>> dmadev's private configure, some thing like:
> >>>
> >>>
> >>
> lcore_dma=lcore10@0000:00:04.2,vchan=0,dir=mem2dev,devtype=pcie,radd
> >>> r=xxx,coreid=1,pfid=2,vfid=3
> >>>
> >>> then this bi-directional could impl only with config:
> >>>
> >>>
> >>
> lcore_dma=lcore10@0000:00:04.2,dir=mem2dev,devtype=pcie,raddr=xxx,cor
> >>> eid=1,pfid=2,vfid=3,
> >>>
> >>
> lcore11@0000:00:04.3,dir=dev2mem,devtype=pcie,raddr=xxx,coreid=1,pfid
> >> =
> >>> 2,vfid=3
> >>> so that the lcore10 will do mem2dev with device 0000:00:04.2, while
> >>> lcore11 will do dev2mem with device 0000:00:04.3.
> >>
> >> Thanks for the suggestion. I will make the suggested changes and send
> >> the next version.

^ permalink raw reply	[relevance 0%]

* Re: [RFC PATCH 1/2] power: refactor core power management library
  @ 2024-03-01  2:56  3%   ` lihuisong (C)
  2024-03-01 10:39  0%     ` Hunt, David
  2024-03-05  4:35  3%     ` Tummala, Sivaprasad
  0 siblings, 2 replies; 200+ results
From: lihuisong (C) @ 2024-03-01  2:56 UTC (permalink / raw)
  To: Sivaprasad Tummala, david.hunt, anatoly.burakov, jerinj,
	radu.nicolau, gakhil, cristian.dumitrescu, ferruh.yigit,
	konstantin.ananyev
  Cc: dev


在 2024/2/20 23:33, Sivaprasad Tummala 写道:
> This patch introduces a comprehensive refactor to the core power
> management library. The primary focus is on improving modularity
> and organization by relocating specific driver implementations
> from the 'lib/power' directory to dedicated directories within
> 'drivers/power/core/*'. The adjustment of meson.build files
> enables the selective activation of individual drivers.
>
> These changes contribute to a significant enhancement in code
> organization, providing a clearer structure for driver implementations.
> The refactor aims to improve overall code clarity and boost
> maintainability. Additionally, it establishes a foundation for
> future development, allowing for more focused work on individual
> drivers and seamless integration of forthcoming enhancements.

Good job. +1 to refactor.

<...>

> diff --git a/drivers/meson.build b/drivers/meson.build
> index f2be71bc05..e293c3945f 100644
> --- a/drivers/meson.build
> +++ b/drivers/meson.build
> @@ -28,6 +28,7 @@ subdirs = [
>           'event',          # depends on common, bus, mempool and net.
>           'baseband',       # depends on common and bus.
>           'gpu',            # depends on common and bus.
> +        'power',          # depends on common (in future).
>   ]
>   
>   if meson.is_cross_build()
> diff --git a/drivers/power/core/acpi/meson.build b/drivers/power/core/acpi/meson.build
> new file mode 100644
> index 0000000000..d10ec8ee94
> --- /dev/null
> +++ b/drivers/power/core/acpi/meson.build
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2024 AMD Limited
> +
> +sources = files('power_acpi_cpufreq.c')
> +
> +headers = files('power_acpi_cpufreq.h')
> +
> +deps += ['power']
> diff --git a/lib/power/power_acpi_cpufreq.c b/drivers/power/core/acpi/power_acpi_cpufreq.c
> similarity index 95%
> rename from lib/power/power_acpi_cpufreq.c
> rename to drivers/power/core/acpi/power_acpi_cpufreq.c
This file was in the power lib.
How about removing the 'power' prefix from the file name,
e.g. acpi_cpufreq.c, cppc_cpufreq.c?
> index f8d978d03d..69d80ad2ae 100644
> --- a/lib/power/power_acpi_cpufreq.c
> +++ b/drivers/power/core/acpi/power_acpi_cpufreq.c
> @@ -577,3 +577,22 @@ int power_acpi_get_capabilities(unsigned int lcore_id,
>   
>   	return 0;
>   }
> +
> +static struct rte_power_ops acpi_ops = {
How about using one of the following structure names:
"struct rte_power_cpufreq_ops" or "struct rte_power_core_ops"?
After all, we also have other power ops, like uncore, right?
> +	.init = power_acpi_cpufreq_init,
> +	.exit = power_acpi_cpufreq_exit,
> +	.check_env_support = power_acpi_cpufreq_check_supported,
> +	.get_avail_freqs = power_acpi_cpufreq_freqs,
> +	.get_freq = power_acpi_cpufreq_get_freq,
> +	.set_freq = power_acpi_cpufreq_set_freq,
> +	.freq_down = power_acpi_cpufreq_freq_down,
> +	.freq_up = power_acpi_cpufreq_freq_up,
> +	.freq_max = power_acpi_cpufreq_freq_max,
> +	.freq_min = power_acpi_cpufreq_freq_min,
> +	.turbo_status = power_acpi_turbo_status,
> +	.enable_turbo = power_acpi_enable_turbo,
> +	.disable_turbo = power_acpi_disable_turbo,
> +	.get_caps = power_acpi_get_capabilities
> +};
> +
> +RTE_POWER_REGISTER_OPS(acpi_ops);
> diff --git a/lib/power/power_acpi_cpufreq.h b/drivers/power/core/acpi/power_acpi_cpufreq.h
> similarity index 100%
> rename from lib/power/power_acpi_cpufreq.h
> rename to drivers/power/core/acpi/power_acpi_cpufreq.h
> diff --git a/drivers/power/core/amd-pstate/meson.build b/drivers/power/core/amd-pstate/meson.build
> new file mode 100644
> index 0000000000..8ec4c960f5
> --- /dev/null
> +++ b/drivers/power/core/amd-pstate/meson.build
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2024 AMD Limited
> +
> +sources = files('power_amd_pstate_cpufreq.c')
> +
> +headers = files('power_amd_pstate_cpufreq.h')
> +
> +deps += ['power']
> diff --git a/lib/power/power_amd_pstate_cpufreq.c b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c
> similarity index 95%
> rename from lib/power/power_amd_pstate_cpufreq.c
> rename to drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c
> index 028f84416b..9938de72a6 100644
> --- a/lib/power/power_amd_pstate_cpufreq.c
> +++ b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.c
> @@ -700,3 +700,22 @@ power_amd_pstate_get_capabilities(unsigned int lcore_id,
>   
>   	return 0;
>   }
> +
> +static struct rte_power_ops amd_pstate_ops = {
> +	.init = power_amd_pstate_cpufreq_init,
> +	.exit = power_amd_pstate_cpufreq_exit,
> +	.check_env_support = power_amd_pstate_cpufreq_check_supported,
> +	.get_avail_freqs = power_amd_pstate_cpufreq_freqs,
> +	.get_freq = power_amd_pstate_cpufreq_get_freq,
> +	.set_freq = power_amd_pstate_cpufreq_set_freq,
> +	.freq_down = power_amd_pstate_cpufreq_freq_down,
> +	.freq_up = power_amd_pstate_cpufreq_freq_up,
> +	.freq_max = power_amd_pstate_cpufreq_freq_max,
> +	.freq_min = power_amd_pstate_cpufreq_freq_min,
> +	.turbo_status = power_amd_pstate_turbo_status,
> +	.enable_turbo = power_amd_pstate_enable_turbo,
> +	.disable_turbo = power_amd_pstate_disable_turbo,
> +	.get_caps = power_amd_pstate_get_capabilities
> +};
> +
> +RTE_POWER_REGISTER_OPS(amd_pstate_ops);
> diff --git a/lib/power/power_amd_pstate_cpufreq.h b/drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.h
> similarity index 100%
> rename from lib/power/power_amd_pstate_cpufreq.h
> rename to drivers/power/core/amd-pstate/power_amd_pstate_cpufreq.h
> diff --git a/drivers/power/core/cppc/meson.build b/drivers/power/core/cppc/meson.build
> new file mode 100644
> index 0000000000..06f3b99bb8
> --- /dev/null
> +++ b/drivers/power/core/cppc/meson.build
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2024 AMD Limited
> +
> +sources = files('power_cppc_cpufreq.c')
> +
> +headers = files('power_cppc_cpufreq.h')
> +
> +deps += ['power']
> diff --git a/lib/power/power_cppc_cpufreq.c b/drivers/power/core/cppc/power_cppc_cpufreq.c
> similarity index 96%
> rename from lib/power/power_cppc_cpufreq.c
> rename to drivers/power/core/cppc/power_cppc_cpufreq.c
> index 3ddf39bd76..605f633309 100644
> --- a/lib/power/power_cppc_cpufreq.c
> +++ b/drivers/power/core/cppc/power_cppc_cpufreq.c
> @@ -685,3 +685,22 @@ power_cppc_get_capabilities(unsigned int lcore_id,
>   
>   	return 0;
>   }
> +
> +static struct rte_power_ops cppc_ops = {
> +	.init = power_cppc_cpufreq_init,
> +	.exit = power_cppc_cpufreq_exit,
> +	.check_env_support = power_cppc_cpufreq_check_supported,
> +	.get_avail_freqs = power_cppc_cpufreq_freqs,
> +	.get_freq = power_cppc_cpufreq_get_freq,
> +	.set_freq = power_cppc_cpufreq_set_freq,
> +	.freq_down = power_cppc_cpufreq_freq_down,
> +	.freq_up = power_cppc_cpufreq_freq_up,
> +	.freq_max = power_cppc_cpufreq_freq_max,
> +	.freq_min = power_cppc_cpufreq_freq_min,
> +	.turbo_status = power_cppc_turbo_status,
> +	.enable_turbo = power_cppc_enable_turbo,
> +	.disable_turbo = power_cppc_disable_turbo,
> +	.get_caps = power_cppc_get_capabilities
> +};
> +
> +RTE_POWER_REGISTER_OPS(cppc_ops);
> diff --git a/lib/power/power_cppc_cpufreq.h b/drivers/power/core/cppc/power_cppc_cpufreq.h
> similarity index 100%
> rename from lib/power/power_cppc_cpufreq.h
> rename to drivers/power/core/cppc/power_cppc_cpufreq.h
> diff --git a/lib/power/guest_channel.c b/drivers/power/core/kvm-vm/guest_channel.c
> similarity index 100%
> rename from lib/power/guest_channel.c
> rename to drivers/power/core/kvm-vm/guest_channel.c
> diff --git a/lib/power/guest_channel.h b/drivers/power/core/kvm-vm/guest_channel.h
> similarity index 100%
> rename from lib/power/guest_channel.h
> rename to drivers/power/core/kvm-vm/guest_channel.h
> diff --git a/drivers/power/core/kvm-vm/meson.build b/drivers/power/core/kvm-vm/meson.build
> new file mode 100644
> index 0000000000..3150c6674b
> --- /dev/null
> +++ b/drivers/power/core/kvm-vm/meson.build
> @@ -0,0 +1,20 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(C) 2024 AMD Limited.
> +#
> +
> +if not is_linux
> +    build = false
> +    reason = 'only supported on Linux'
> +    subdir_done()
> +endif
> +
> +sources = files(
> +        'guest_channel.c',
> +        'power_kvm_vm.c',
> +)
> +
> +headers = files(
> +        'guest_channel.h',
> +        'power_kvm_vm.h',
> +)
> +deps += ['power']
> diff --git a/lib/power/power_kvm_vm.c b/drivers/power/core/kvm-vm/power_kvm_vm.c
> similarity index 83%
> rename from lib/power/power_kvm_vm.c
> rename to drivers/power/core/kvm-vm/power_kvm_vm.c
> index f15be8fac5..a5d6984d26 100644
> --- a/lib/power/power_kvm_vm.c
> +++ b/drivers/power/core/kvm-vm/power_kvm_vm.c
> @@ -137,3 +137,22 @@ int power_kvm_vm_get_capabilities(__rte_unused unsigned int lcore_id,
>   	POWER_LOG(ERR, "rte_power_get_capabilities is not implemented for Virtual Machine Power Management");
>   	return -ENOTSUP;
>   }
> +
> +static struct rte_power_ops kvm_vm_ops = {
> +	.init = power_kvm_vm_init,
> +	.exit = power_kvm_vm_exit,
> +	.check_env_support = power_kvm_vm_check_supported,
> +	.get_avail_freqs = power_kvm_vm_freqs,
> +	.get_freq = power_kvm_vm_get_freq,
> +	.set_freq = power_kvm_vm_set_freq,
> +	.freq_down = power_kvm_vm_freq_down,
> +	.freq_up = power_kvm_vm_freq_up,
> +	.freq_max = power_kvm_vm_freq_max,
> +	.freq_min = power_kvm_vm_freq_min,
> +	.turbo_status = power_kvm_vm_turbo_status,
> +	.enable_turbo = power_kvm_vm_enable_turbo,
> +	.disable_turbo = power_kvm_vm_disable_turbo,
> +	.get_caps = power_kvm_vm_get_capabilities
> +};
> +
> +RTE_POWER_REGISTER_OPS(kvm_vm_ops);
> diff --git a/lib/power/power_kvm_vm.h b/drivers/power/core/kvm-vm/power_kvm_vm.h
> similarity index 100%
> rename from lib/power/power_kvm_vm.h
> rename to drivers/power/core/kvm-vm/power_kvm_vm.h
> diff --git a/drivers/power/core/meson.build b/drivers/power/core/meson.build
> new file mode 100644
> index 0000000000..4081dafaa0
> --- /dev/null
> +++ b/drivers/power/core/meson.build
> @@ -0,0 +1,12 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2024 AMD Limited
> +
> +drivers = [
> +        'acpi',
> +        'amd-pstate',
> +        'cppc',
> +        'kvm-vm',
> +        'pstate'
> +]
> +
> +std_deps = ['power']
> diff --git a/drivers/power/core/pstate/meson.build b/drivers/power/core/pstate/meson.build
> new file mode 100644
> index 0000000000..1025c64e48
> --- /dev/null
> +++ b/drivers/power/core/pstate/meson.build
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2024 AMD Limited
> +
> +sources = files('power_pstate_cpufreq.c')
> +
> +headers = files('power_pstate_cpufreq.h')
> +
> +deps += ['power']
> diff --git a/lib/power/power_pstate_cpufreq.c b/drivers/power/core/pstate/power_pstate_cpufreq.c
> similarity index 96%
> rename from lib/power/power_pstate_cpufreq.c
> rename to drivers/power/core/pstate/power_pstate_cpufreq.c
> index 73138dc4e4..d4c3645ff8 100644
> --- a/lib/power/power_pstate_cpufreq.c
> +++ b/drivers/power/core/pstate/power_pstate_cpufreq.c
> @@ -888,3 +888,22 @@ int power_pstate_get_capabilities(unsigned int lcore_id,
>   
>   	return 0;
>   }
> +
> +static struct rte_power_ops pstate_ops = {
> +	.init = power_pstate_cpufreq_init,
> +	.exit = power_pstate_cpufreq_exit,
> +	.check_env_support = power_pstate_cpufreq_check_supported,
> +	.get_avail_freqs = power_pstate_cpufreq_freqs,
> +	.get_freq = power_pstate_cpufreq_get_freq,
> +	.set_freq = power_pstate_cpufreq_set_freq,
> +	.freq_down = power_pstate_cpufreq_freq_down,
> +	.freq_up = power_pstate_cpufreq_freq_up,
> +	.freq_max = power_pstate_cpufreq_freq_max,
> +	.freq_min = power_pstate_cpufreq_freq_min,
> +	.turbo_status = power_pstate_turbo_status,
> +	.enable_turbo = power_pstate_enable_turbo,
> +	.disable_turbo = power_pstate_disable_turbo,
> +	.get_caps = power_pstate_get_capabilities
> +};
> +
> +RTE_POWER_REGISTER_OPS(pstate_ops);
> diff --git a/lib/power/power_pstate_cpufreq.h b/drivers/power/core/pstate/power_pstate_cpufreq.h
> similarity index 100%
> rename from lib/power/power_pstate_cpufreq.h
> rename to drivers/power/core/pstate/power_pstate_cpufreq.h
> diff --git a/drivers/power/meson.build b/drivers/power/meson.build
> new file mode 100644
> index 0000000000..7d9034c7ac
> --- /dev/null
> +++ b/drivers/power/meson.build
> @@ -0,0 +1,8 @@
> +# SPDX-License-Identifier: BSD-3-Clause
> +# Copyright(c) 2024 AMD Limited
> +
> +drivers = [
> +        'core',
> +]
> +
> +std_deps = ['power']
> diff --git a/lib/power/meson.build b/lib/power/meson.build
> index b8426589b2..207d96d877 100644
> --- a/lib/power/meson.build
> +++ b/lib/power/meson.build
> @@ -12,14 +12,8 @@ if not is_linux
>       reason = 'only supported on Linux'
>   endif
>   sources = files(
> -        'guest_channel.c',
> -        'power_acpi_cpufreq.c',
> -        'power_amd_pstate_cpufreq.c',
>           'power_common.c',
> -        'power_cppc_cpufreq.c',
> -        'power_kvm_vm.c',
>           'power_intel_uncore.c',
> -        'power_pstate_cpufreq.c',
>           'rte_power.c',
>           'rte_power_uncore.c',
>           'rte_power_pmd_mgmt.c',
> diff --git a/lib/power/power_common.h b/lib/power/power_common.h
> index 30966400ba..c90b611f4f 100644
> --- a/lib/power/power_common.h
> +++ b/lib/power/power_common.h
> @@ -23,13 +23,24 @@ extern int power_logtype;
>   #endif
>   
>   /* check if scaling driver matches one we want */
> +__rte_internal
>   int cpufreq_check_scaling_driver(const char *driver);
> +
> +__rte_internal
>   int power_set_governor(unsigned int lcore_id, const char *new_governor,
>   		char *orig_governor, size_t orig_governor_len);
> +
> +__rte_internal
>   int open_core_sysfs_file(FILE **f, const char *mode, const char *format, ...)
>   		__rte_format_printf(3, 4);
> +
> +__rte_internal
>   int read_core_sysfs_u32(FILE *f, uint32_t *val);
> +
> +__rte_internal
>   int read_core_sysfs_s(FILE *f, char *buf, unsigned int len);
> +
> +__rte_internal
>   int write_core_sysfs_s(FILE *f, const char *str);
>   
>   #endif /* _POWER_COMMON_H_ */
> diff --git a/lib/power/rte_power.c b/lib/power/rte_power.c
> index 36c3f3da98..70176807f4 100644
> --- a/lib/power/rte_power.c
> +++ b/lib/power/rte_power.c
> @@ -8,64 +8,80 @@
>   #include <rte_spinlock.h>
>   
>   #include "rte_power.h"
> -#include "power_acpi_cpufreq.h"
> -#include "power_cppc_cpufreq.h"
>   #include "power_common.h"
> -#include "power_kvm_vm.h"
> -#include "power_pstate_cpufreq.h"
> -#include "power_amd_pstate_cpufreq.h"
>   
>   enum power_management_env global_default_env = PM_ENV_NOT_SET;
How about using a pointer to save the current power cpufreq ops?
>   
>   static rte_spinlock_t global_env_cfg_lock = RTE_SPINLOCK_INITIALIZER;
> +static struct rte_power_ops rte_power_ops[PM_ENV_MAX];
>   
> -/* function pointers */
> -rte_power_freqs_t rte_power_freqs  = NULL;
> -rte_power_get_freq_t rte_power_get_freq = NULL;
> -rte_power_set_freq_t rte_power_set_freq = NULL;
> -rte_power_freq_change_t rte_power_freq_up = NULL;
> -rte_power_freq_change_t rte_power_freq_down = NULL;
> -rte_power_freq_change_t rte_power_freq_max = NULL;
> -rte_power_freq_change_t rte_power_freq_min = NULL;
> -rte_power_freq_change_t rte_power_turbo_status;
> -rte_power_freq_change_t rte_power_freq_enable_turbo;
> -rte_power_freq_change_t rte_power_freq_disable_turbo;
> -rte_power_get_capabilities_t rte_power_get_capabilities;
> -
> -static void
> -reset_power_function_ptrs(void)
> +/* register the ops struct in rte_power_ops, return 0 on success. */
> +int
> +rte_power_register_ops(const struct rte_power_ops *op)
> +{
> +	struct rte_power_ops *ops;
> +
> +	if (op->env >= PM_ENV_MAX) {
> +		POWER_LOG(ERR, "Unsupported power management environment\n");
> +		return -EINVAL;
> +	}
> +
> +	if (op->status != 0) {
> +		POWER_LOG(ERR, "Power management env[%d] ops registered already\n",
> +			op->env);
> +		return -EINVAL;
> +	}
> +
> +	if (!op->init || !op->exit || !op->check_env_support ||
> +		!op->get_avail_freqs || !op->get_freq || !op->set_freq ||
> +		!op->freq_up || !op->freq_down || !op->freq_max ||
> +		!op->freq_min || !op->turbo_status || !op->enable_turbo ||
> +		!op->disable_turbo || !op->get_caps) {
> +		POWER_LOG(ERR, "Missing callbacks while registering power ops\n");
> +		return -EINVAL;
> +	}
> +
> +	ops = &rte_power_ops[op->env];
It is better to use a global linked list instead of an array.
And we should extract a list node structure that holds this ops
structure along with the ops' owner.
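To make the suggestion concrete, a minimal sketch of such a name-keyed registration list is below. All identifiers here (power_cpufreq_node, power_register_cpufreq, etc.) are hypothetical, not existing DPDK API; the real implementation would also need locking and the remaining callbacks:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

struct power_cpufreq_ops {
	int (*init)(unsigned int lcore_id);
	/* ... other callbacks elided ... */
};

/* List node: ops plus its name and owner, as suggested above. */
struct power_cpufreq_node {
	const char *name;	/* e.g. "acpi_cpufreq" */
	const char *owner;	/* driver that registered the ops */
	const struct power_cpufreq_ops *ops;
	TAILQ_ENTRY(power_cpufreq_node) next;
};

static TAILQ_HEAD(, power_cpufreq_node) cpufreq_list =
	TAILQ_HEAD_INITIALIZER(cpufreq_list);

static void
power_register_cpufreq(struct power_cpufreq_node *node)
{
	TAILQ_INSERT_TAIL(&cpufreq_list, node, next);
}

static const struct power_cpufreq_ops *
power_find_cpufreq(const char *name)
{
	struct power_cpufreq_node *n;

	/* Walk the list; no fixed-size array, no PM_ENV_MAX needed. */
	TAILQ_FOREACH(n, &cpufreq_list, next)
		if (strcmp(n->name, name) == 0)
			return n->ops;
	return NULL;
}
```

With this shape, registering a new driver is just another list insert, and auto-detection becomes a walk of the list rather than probing a fixed enum range.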
> +	ops->env = op->env;
> +	ops->init = op->init;
> +	ops->exit = op->exit;
> +	ops->check_env_support = op->check_env_support;
> +	ops->get_avail_freqs = op->get_avail_freqs;
> +	ops->get_freq = op->get_freq;
> +	ops->set_freq = op->set_freq;
> +	ops->freq_up = op->freq_up;
> +	ops->freq_down = op->freq_down;
> +	ops->freq_max = op->freq_max;
> +	ops->freq_min = op->freq_min;
> +	ops->turbo_status = op->turbo_status;
> +	ops->enable_turbo = op->enable_turbo;
> +	ops->disable_turbo = op->disable_turbo;
*ops = *op?
> +	ops->status = 1; /* registered */
status --> registered?
But if an ops linked list is used, this flag can also be removed.
> +
> +	return 0;
> +}
> +
> +struct rte_power_ops *
> +rte_power_get_ops(int ops_index)
AFAICS, there is only one cpufreq driver on a given platform, so there
is just one power_cpufreq_ops for the user to use.
We don't need the user to fetch other power ops; the user only wants to
know which power ops is currently in use, right?
So using an 'index' to get this ops is not good.
>   {
> -	rte_power_freqs  = NULL;
> -	rte_power_get_freq = NULL;
> -	rte_power_set_freq = NULL;
> -	rte_power_freq_up = NULL;
> -	rte_power_freq_down = NULL;
> -	rte_power_freq_max = NULL;
> -	rte_power_freq_min = NULL;
> -	rte_power_turbo_status = NULL;
> -	rte_power_freq_enable_turbo = NULL;
> -	rte_power_freq_disable_turbo = NULL;
> -	rte_power_get_capabilities = NULL;
> +	RTE_VERIFY((ops_index >= PM_ENV_NOT_SET) && (ops_index < PM_ENV_MAX));
> +	RTE_VERIFY(rte_power_ops[ops_index].status != 0);
> +
> +	return &rte_power_ops[ops_index];
>   }
>   
>   int
>   rte_power_check_env_supported(enum power_management_env env)
>   {
> -	switch (env) {
> -	case PM_ENV_ACPI_CPUFREQ:
> -		return power_acpi_cpufreq_check_supported();
> -	case PM_ENV_PSTATE_CPUFREQ:
> -		return power_pstate_cpufreq_check_supported();
> -	case PM_ENV_KVM_VM:
> -		return power_kvm_vm_check_supported();
> -	case PM_ENV_CPPC_CPUFREQ:
> -		return power_cppc_cpufreq_check_supported();
> -	case PM_ENV_AMD_PSTATE_CPUFREQ:
> -		return power_amd_pstate_cpufreq_check_supported();
> -	default:
> -		rte_errno = EINVAL;
> -		return -1;
> +	struct rte_power_ops *ops;
> +
> +	if ((env > PM_ENV_NOT_SET) && (env < PM_ENV_MAX)) {
> +		ops = rte_power_get_ops(env);
> +		return ops->check_env_support();
>   	}
> +
> +	rte_errno = EINVAL;
> +	return -1;
>   }
>   
>   int
> @@ -80,80 +96,26 @@ rte_power_set_env(enum power_management_env env)
>   	}
>   
>   	int ret = 0;
> +	struct rte_power_ops *ops;
> +
> +	if ((env == PM_ENV_NOT_SET) || (env >= PM_ENV_MAX)) {
> +		POWER_LOG(ERR, "Invalid Power Management Environment(%d)"
> +				" set\n", env);
> +		ret = -1;
> +	}
>   
<...>
> +	ops = rte_power_get_ops(env);
To find the target ops from the global list according to the env?
> +	if (ops->status == 0) {
> +		POWER_LOG(ERR, WER,
> +			"Power Management Environment(%d) not"
> +			" registered\n", env);
>   		ret = -1;
>   	}
>   
>   	if (ret == 0)
>   		global_default_env = env;
It is more convenient to use a global variable to point to the default 
power_cpufreq ops or its list node.
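A sketch of that idea, assuming hypothetical names (the real code would take the existing global_env_cfg_lock around the set, and the function names are illustrative, not DPDK API):

```c
#include <assert.h>
#include <stddef.h>

struct power_cpufreq_ops {
	int (*init)(unsigned int lcore_id);
	/* ... other callbacks elided ... */
};

/* Single global pointer to the active ops; NULL means env not set. */
static struct power_cpufreq_ops *global_cpufreq_ops;

static int
power_set_current_ops(struct power_cpufreq_ops *ops)
{
	/* rte_spinlock_lock(&global_env_cfg_lock) in the real library. */
	if (global_cpufreq_ops != NULL)
		return -1;	/* an environment is already configured */
	global_cpufreq_ops = ops;
	return 0;
}

static struct power_cpufreq_ops *
power_get_current_ops(void)
{
	return global_cpufreq_ops;	/* caller checks for NULL */
}
```

Callers then never index into an ops table by env; they only ask for the ops currently in effect.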
> -	else {
> +	else
>   		global_default_env = PM_ENV_NOT_SET;
> -		reset_power_function_ptrs();
> -	}
>   
>   	rte_spinlock_unlock(&global_env_cfg_lock);
>   	return ret;
> @@ -164,7 +126,6 @@ rte_power_unset_env(void)
>   {
>   	rte_spinlock_lock(&global_env_cfg_lock);
>   	global_default_env = PM_ENV_NOT_SET;
> -	reset_power_function_ptrs();
>   	rte_spinlock_unlock(&global_env_cfg_lock);
>   }
>   
> @@ -177,59 +138,76 @@ int
>   rte_power_init(unsigned int lcore_id)
>   {
>   	int ret = -1;
> +	struct rte_power_ops *ops;
>   
> -	switch (global_default_env) {
> -	case PM_ENV_ACPI_CPUFREQ:
> -		return power_acpi_cpufreq_init(lcore_id);
> -	case PM_ENV_KVM_VM:
> -		return power_kvm_vm_init(lcore_id);
> -	case PM_ENV_PSTATE_CPUFREQ:
> -		return power_pstate_cpufreq_init(lcore_id);
> -	case PM_ENV_CPPC_CPUFREQ:
> -		return power_cppc_cpufreq_init(lcore_id);
> -	case PM_ENV_AMD_PSTATE_CPUFREQ:
> -		return power_amd_pstate_cpufreq_init(lcore_id);
> -	default:
> -		POWER_LOG(INFO, "Env isn't set yet!");
> +	if (global_default_env != PM_ENV_NOT_SET) {
> +		ops = &rte_power_ops[global_default_env];
> +		if (!ops->status) {
> +			POWER_LOG(ERR, "Power management env[%d] not"
> +				" supported\n", global_default_env);
> +			goto out;
> +		}
> +		return ops->init(lcore_id);
>   	}
> +	POWER_LOG(INFO, POWER, "Env isn't set yet!\n");
>   
>   	/* Auto detect Environment */
> -	POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq power management...");
> -	ret = power_acpi_cpufreq_init(lcore_id);
> -	if (ret == 0) {
> -		rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
> -		goto out;
> +	POWER_LOG(INFO, "Attempting to initialise ACPI cpufreq"
> +			" power management...\n");
> +	ops = &rte_power_ops[PM_ENV_ACPI_CPUFREQ];
> +	if (ops->status) {
> +		ret = ops->init(lcore_id);
> +		if (ret == 0) {
> +			rte_power_set_env(PM_ENV_ACPI_CPUFREQ);
> +			goto out;
> +		}
>   	}
>   
> -	POWER_LOG(INFO, "Attempting to initialise PSTAT power management...");
> -	ret = power_pstate_cpufreq_init(lcore_id);
> -	if (ret == 0) {
> -		rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
> -		goto out;
> +	POWER_LOG(INFO, "Attempting to initialise PSTAT"
> +			" power management...\n");
> +	ops = &rte_power_ops[PM_ENV_PSTATE_CPUFREQ];
> +	if (ops->status) {
> +		ret = ops->init(lcore_id);
> +		if (ret == 0) {
> +			rte_power_set_env(PM_ENV_PSTATE_CPUFREQ);
> +			goto out;
> +		}
>   	}
>   
> -	POWER_LOG(INFO, "Attempting to initialise AMD PSTATE power management...");
> -	ret = power_amd_pstate_cpufreq_init(lcore_id);
> -	if (ret == 0) {
> -		rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
> -		goto out;
> +	POWER_LOG(INFO,	"Attempting to initialise AMD PSTATE"
> +			" power management...\n");
> +	ops = &rte_power_ops[PM_ENV_AMD_PSTATE_CPUFREQ];
> +	if (ops->status) {
> +		ret = ops->init(lcore_id);
> +		if (ret == 0) {
> +			rte_power_set_env(PM_ENV_AMD_PSTATE_CPUFREQ);
> +			goto out;
> +		}
>   	}
>   
> -	POWER_LOG(INFO, "Attempting to initialise CPPC power management...");
> -	ret = power_cppc_cpufreq_init(lcore_id);
> -	if (ret == 0) {
> -		rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
> -		goto out;
> +	POWER_LOG(INFO, "Attempting to initialise CPPC power"
> +			" management...\n");
> +	ops = &rte_power_ops[PM_ENV_CPPC_CPUFREQ];
> +	if (ops->status) {
> +		ret = ops->init(lcore_id);
> +		if (ret == 0) {
> +			rte_power_set_env(PM_ENV_CPPC_CPUFREQ);
> +			goto out;
> +		}
>   	}
>   
> -	POWER_LOG(INFO, "Attempting to initialise VM power management...");
> -	ret = power_kvm_vm_init(lcore_id);
> -	if (ret == 0) {
> -		rte_power_set_env(PM_ENV_KVM_VM);
> -		goto out;
> +	POWER_LOG(INFO, "Attempting to initialise VM power"
> +			" management...\n");
> +	ops = &rte_power_ops[PM_ENV_KVM_VM];
> +	if (ops->status) {
> +		ret = ops->init(lcore_id);
> +		if (ret == 0) {
> +			rte_power_set_env(PM_ENV_KVM_VM);
> +			goto out;
> +		}
>   	}
If we use a linked list, the code above can be simplified like this:

for_each_power_cpufreq_ops(ops, ...) {
    ret = ops->init(lcore_id);
    if (ret == 0) {
        rte_power_set_env(...);
        break;
    }
}
> -	POWER_LOG(ERR, "Unable to set Power Management Environment for lcore "
> -			"%u", lcore_id);
> +	POWER_LOG(ERR, "Unable to set Power Management Environment"
> +			" for lcore %u\n", lcore_id);
>   out:
>   	return ret;
>   }
> @@ -237,21 +215,14 @@ rte_power_init(unsigned int lcore_id)
>   int
>   rte_power_exit(unsigned int lcore_id)
>   {
> -	switch (global_default_env) {
> -	case PM_ENV_ACPI_CPUFREQ:
> -		return power_acpi_cpufreq_exit(lcore_id);
> -	case PM_ENV_KVM_VM:
> -		return power_kvm_vm_exit(lcore_id);
> -	case PM_ENV_PSTATE_CPUFREQ:
> -		return power_pstate_cpufreq_exit(lcore_id);
> -	case PM_ENV_CPPC_CPUFREQ:
> -		return power_cppc_cpufreq_exit(lcore_id);
> -	case PM_ENV_AMD_PSTATE_CPUFREQ:
> -		return power_amd_pstate_cpufreq_exit(lcore_id);
> -	default:
> -		POWER_LOG(ERR, "Environment has not been set, unable to exit gracefully");
> +	struct rte_power_ops *ops;
>   
> +	if (global_default_env != PM_ENV_NOT_SET) {
> +		ops = &rte_power_ops[global_default_env];
> +		return ops->exit(lcore_id);
>   	}
> -	return -1;
> +	POWER_LOG(ERR, "Environment has not been set, unable "
> +			"to exit gracefully\n");
>   
> +	return -1;
>   }
> diff --git a/lib/power/rte_power.h b/lib/power/rte_power.h
> index 4fa4afe399..749bb823ab 100644
> --- a/lib/power/rte_power.h
> +++ b/lib/power/rte_power.h
> @@ -1,5 +1,6 @@
>   /* SPDX-License-Identifier: BSD-3-Clause
>    * Copyright(c) 2010-2014 Intel Corporation
> + * Copyright(c) 2024 AMD Limited
>    */
>   
>   #ifndef _RTE_POWER_H
> @@ -21,7 +22,7 @@ extern "C" {
>   /* Power Management Environment State */
>   enum power_management_env {PM_ENV_NOT_SET, PM_ENV_ACPI_CPUFREQ, PM_ENV_KVM_VM,
>   		PM_ENV_PSTATE_CPUFREQ, PM_ENV_CPPC_CPUFREQ,
> -		PM_ENV_AMD_PSTATE_CPUFREQ};
> +		PM_ENV_AMD_PSTATE_CPUFREQ, PM_ENV_MAX};
"enum power_management_env" is not a good name; maybe something like
"enum power_cpufreq_driver_type"?
In the linked list structure to be defined, directly using a string name
instead of a fixed enum may be better, because the new "PM_ENV_MAX" will
break the ABI whenever a new cpufreq driver is added.
>   
>   /**
>    * Check if a specific power management environment type is supported on a
> @@ -66,6 +67,97 @@ void rte_power_unset_env(void);
>    */
>   enum power_management_env rte_power_get_env(void);
>   
> +typedef int (*rte_power_cpufreq_init_t)(unsigned int lcore_id);
> +typedef int (*rte_power_cpufreq_exit_t)(unsigned int lcore_id);
> +typedef int (*rte_power_check_env_support_t)(void);
> +
> +typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
> +					uint32_t num);
> +typedef uint32_t (*rte_power_get_freq_t)(unsigned int lcore_id);
> +typedef int (*rte_power_set_freq_t)(unsigned int lcore_id, uint32_t index);
> +typedef int (*rte_power_freq_change_t)(unsigned int lcore_id);
> +
> +/**
> + * Function pointer definition for generic frequency change functions. Review
> + * each environments specific documentation for usage.
> + *
> + * @param lcore_id
> + *  lcore id.
> + *
> + * @return
> + *  - 1 on success with frequency changed.
> + *  - 0 on success without frequency changed.
> + *  - Negative on error.
> + */
> +
> +/**
> + * Power capabilities summary.
> + */
> +struct rte_power_core_capabilities {
> +	union {
> +		uint64_t capabilities;
> +		struct {
> +			uint64_t turbo:1;       /**< Turbo can be enabled. */
> +			uint64_t priority:1;    /**< SST-BF high freq core */
> +		};
> +	};
> +};
> +
> +typedef int (*rte_power_get_capabilities_t)(unsigned int lcore_id,
> +				struct rte_power_core_capabilities *caps);
> +
> +/** Structure defining core power operations structure */
> +struct rte_power_ops {
> +uint8_t status;                         /**< ops register status. */
> +	enum power_management_env env;          /**< power mgmt env. */
> +	rte_power_cpufreq_init_t init;    /**< Initialize power management. */
> +	rte_power_cpufreq_exit_t exit;    /**< Exit power management. */
> +	rte_power_check_env_support_t check_env_support; /**< verify env is supported. */
> +	rte_power_freqs_t get_avail_freqs; /**< Get the available frequencies. */
> +	rte_power_get_freq_t get_freq; /**< Get frequency index. */
> +	rte_power_set_freq_t set_freq; /**< Set frequency index. */
> +	rte_power_freq_change_t freq_up;   /**< Scale up frequency. */
> +	rte_power_freq_change_t freq_down; /**< Scale down frequency. */
> +	rte_power_freq_change_t freq_max;  /**< Scale up frequency to highest. */
> +	rte_power_freq_change_t freq_min;  /**< Scale up frequency to lowest. */
> +	rte_power_freq_change_t turbo_status; /**< Get Turbo status. */
> +	rte_power_freq_change_t enable_turbo; /**< Enable Turbo. */
> +	rte_power_freq_change_t disable_turbo; /**< Disable Turbo. */
> +	rte_power_get_capabilities_t get_caps; /**< power capabilities. */
> +} __rte_cache_aligned;
I suggest reworking this structure, like:
struct rte_power_cpufreq_node {
    char name[32];   /* like "cppc_cpufreq", "pstate_cpufreq" */
    struct rte_power_cpufreq_ops *ops;
    struct rte_power_cpufreq_node *next;
};
> +
> +/**
> + * Register power cpu frequency operations.
> + *
> + * @param ops
> + *   Pointer to an ops structure to register.
> + * @return
> + *   - >=0: Success; return the index of the ops struct in the table.
> + *   - -EINVAL - error while registering ops struct.
> + */
> +__rte_internal
> +int rte_power_register_ops(const struct rte_power_ops *ops);
> +
> +/**
> + * Macro to statically register the ops of a cpufreq driver.
> + */
> +#define RTE_POWER_REGISTER_OPS(ops)		\
> +	(RTE_INIT(power_hdlr_init_##ops)	\
> +	{					\
> +		rte_power_register_ops(&ops);	\
> +	})
> +
> +/**
> + * @internal Get the power ops struct from its index.
> + *
> + * @param ops_index
> + *   The index of the ops struct in the ops struct table.
> + * @return
> + *   The pointer to the ops struct in the table if registered.
> + */
> +struct rte_power_ops *
> +rte_power_get_ops(int ops_index);
> +
>   /**
>    * Initialize power management for a specific lcore. If rte_power_set_env() has
>    * not been called then an auto-detect of the environment will start and
> @@ -108,10 +200,14 @@ int rte_power_exit(unsigned int lcore_id);
>    * @return
>    *  The number of available frequencies.
>    */
> -typedef uint32_t (*rte_power_freqs_t)(unsigned int lcore_id, uint32_t *freqs,
> -		uint32_t num);
> +static inline uint32_t
> +rte_power_freqs(unsigned int lcore_id, uint32_t *freqs, uint32_t n)
> +{
> +	struct rte_power_ops *ops;
>   
> -extern rte_power_freqs_t rte_power_freqs;
> +	ops = rte_power_get_ops(rte_power_get_env());
> +	return ops->get_avail_freqs(lcore_id, freqs, n);
> +}
nice.
<...>


* Re: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional transfer
  2024-02-29 14:03  3%       ` Amit Prakash Shukla
@ 2024-03-01  1:46  0%         ` fengchengwen
  2024-03-01  8:31  0%           ` [EXTERNAL] " Amit Prakash Shukla
  0 siblings, 1 reply; 200+ results
From: fengchengwen @ 2024-03-01  1:46 UTC (permalink / raw)
  To: Amit Prakash Shukla, Cheng Jiang, Gowrishankar Muthukrishnan
  Cc: dev, Jerin Jacob, Anoob Joseph, Kevin Laatz, Bruce Richardson,
	Pavan Nikhilesh Bhagavatula

Hi Amit,

I think this commit will complicate the test, plus in future we may add more tests (e.g. fill).

I agree with Bruce's advice in [1]: let's also support "lcore_dma0/1/2".

Users could provide DMA info in two ways:
1) lcore_dma=, which separates each DMA with ", ", but only up to a certain maximum number is allowed.
2) lcore_dma0/1/2/..., where each DMA device takes one line.

[1] https://patchwork.dpdk.org/project/dpdk/patch/20231206112952.1588-1-vipin.varghese@amd.com/
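For illustration, the two styles might look like the sketch below in the test's config file. This is hypothetical: the device addresses and parameter values are placeholders, and the per-line lcore_dmaN keys are the proposed form, not existing syntax:

```ini
; 1) single key, entries separated by ", " -- limited by CFG_VALUE_LEN
lcore_dma=lcore10@0000:00:04.2, lcore11@0000:00:04.3

; 2) one DMA device per key, so each line stays short
lcore_dma0=lcore10@0000:00:04.2,dir=mem2dev,devtype=pcie,raddr=xxx
lcore_dma1=lcore11@0000:00:04.3,dir=dev2mem,devtype=pcie,raddr=xxx
```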

Thanks

On 2024/2/29 22:03, Amit Prakash Shukla wrote:
> Hi Chengwen,
> 
> I liked your suggestion and tried making the changes, but encountered a parsing issue for CFG files with lines longer than CFG_VALUE_LEN=256 (the current value).
> 
> There is a discussion on the similar lines in another patch set: https://patchwork.dpdk.org/project/dpdk/patch/20231206112952.1588-1-vipin.varghese@amd.com/.
> 
> I believe this patch can be taken as-is, and we can come up with a solution once we can increase CFG_VALUE_LEN, as changing CFG_VALUE_LEN in this release would cause ABI breakage.
> 
> Thanks,
> Amit Shukla
> 
>> -----Original Message-----
>> From: Amit Prakash Shukla
>> Sent: Wednesday, February 28, 2024 3:08 PM
>> To: fengchengwen <fengchengwen@huawei.com>; Cheng Jiang
>> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
>> <gmuthukrishn@marvell.com>
>> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
>> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
>> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
>> <pbhagavatula@marvell.com>
>> Subject: RE: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional
>> transfer
>>
>> Hi Chengwen,
>>
>> Please see my reply in-line.
>>
>> Thanks
>> Amit Shukla
>>
>>> -----Original Message-----
>>> From: fengchengwen <fengchengwen@huawei.com>
>>> Sent: Wednesday, February 28, 2024 12:34 PM
>>> To: Amit Prakash Shukla <amitprakashs@marvell.com>; Cheng Jiang
>>> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
>>> <gmuthukrishn@marvell.com>
>>> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
>>> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
>>> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
>>> <pbhagavatula@marvell.com>
>>> Subject: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional
>>> transfer
>>>
>>> External Email
>>>
>>> ----------------------------------------------------------------------
>>> Hi Amit and Gowrishankar,
>>>
>>> It's natural to support multiple dmadev tests in one testcase, and the
>>> original framework supports it.
>>> But it seems we both complicated it when adding support for non-mem2mem
>>> DMA tests.
>>>
>>> The newly added "direction" and "vchan_dev" could be treated as the
>>> dmadev's private configuration, something like:
>>>
>>> lcore_dma=lcore10@0000:00:04.2,vchan=0,dir=mem2dev,devtype=pcie,raddr=xxx,coreid=1,pfid=2,vfid=3
>>>
>>> then this bi-directional test could be implemented with only the config:
>>>
>>> lcore_dma=lcore10@0000:00:04.2,dir=mem2dev,devtype=pcie,raddr=xxx,coreid=1,pfid=2,vfid=3,
>>> lcore11@0000:00:04.3,dir=dev2mem,devtype=pcie,raddr=xxx,coreid=1,pfid=2,vfid=3
>>> so that lcore10 will do mem2dev with device 0000:00:04.2, while
>>> lcore11 will do dev2mem with device 0000:00:04.3.
>>
>> Thanks for the suggestion. I will make the suggested changes and send the
>> next version.

^ permalink raw reply	[relevance 0%]

* [PATCH v4 4/6] pipeline: replace zero length array with flex array
  @ 2024-02-29 22:58  4%   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-02-29 22:58 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Cristian Dumitrescu, Honnappa Nagarahalli,
	Sameh Gobriel, Vladimir Medvedkin, Yipeng Wang, mb, fengchengwen,
	Tyler Retzlaff

Zero-length arrays are a GNU extension. Replace them with a
standard flexible array member.

Add a temporary suppression for the rte_pipeline_table_entry
libabigail bug:

Bugzilla ID: https://sourceware.org/bugzilla/show_bug.cgi?id=31377

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/libabigail.abignore      | 2 ++
 lib/pipeline/rte_pipeline.h       | 2 +-
 lib/pipeline/rte_port_in_action.c | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 25c73a5..5292b63 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -33,6 +33,8 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+[suppress_type]
+	name = rte_pipeline_table_entry
 
 [suppress_type]
 	name = rte_rcu_qsbr
diff --git a/lib/pipeline/rte_pipeline.h b/lib/pipeline/rte_pipeline.h
index ec51b9b..0c7994b 100644
--- a/lib/pipeline/rte_pipeline.h
+++ b/lib/pipeline/rte_pipeline.h
@@ -220,7 +220,7 @@ struct rte_pipeline_table_entry {
 		uint32_t table_id;
 	};
 	/** Start of table entry area for user defined actions and meta-data */
-	__extension__ uint8_t action_data[0];
+	uint8_t action_data[];
 };
 
 /**
diff --git a/lib/pipeline/rte_port_in_action.c b/lib/pipeline/rte_port_in_action.c
index 5818973..ebd9b9a 100644
--- a/lib/pipeline/rte_port_in_action.c
+++ b/lib/pipeline/rte_port_in_action.c
@@ -282,7 +282,7 @@ struct rte_port_in_action_profile *
 struct rte_port_in_action {
 	struct ap_config cfg;
 	struct ap_data data;
-	uint8_t memory[0] __rte_cache_aligned;
+	uint8_t memory[] __rte_cache_aligned;
 };
 
 static __rte_always_inline void *
-- 
1.8.3.1



* Re: [PATCH v3 3/3] dts: add API doc generation
       [not found]         ` <CAJvnSUCNjo0p-yhROF1MNLKhjiAw2QTyTHO2hpOaVVUn0xnJ0A@mail.gmail.com>
@ 2024-02-29 18:12  2%       ` Nicholas Pratte
  0 siblings, 0 replies; 200+ results
From: Nicholas Pratte @ 2024-02-29 18:12 UTC (permalink / raw)
  To: dev, Jeremy Spewock

Tested-by: Nicholas Pratte <npratte@iol.unh.edu>

----

The tool used to generate developer docs is Sphinx, which is already in
use in DPDK. The same configuration is used to preserve style, but it's
been augmented with doc-generating configuration. A change that
modifies how the sidebar displays the content hierarchy has been put
into an if block so as not to interfere with regular docs.

Sphinx generates the documentation from Python docstrings. The docstring
format is the Google format [0] which requires the sphinx.ext.napoleon
extension. The other extension, sphinx.ext.intersphinx, enables linking
to objects in external documentation, such as the Python documentation.

There are two requirements for building DTS docs:
* The same Python version as DTS or higher, because Sphinx imports the
  code.
* Also the same Python packages as DTS, for the same reason.

[0] https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings

Signed-off-by: Juraj Linkeš <juraj.linkes@pantheon.tech>
---
 buildtools/call-sphinx-build.py | 33 +++++++++++++++++++---------
 doc/api/doxy-api-index.md       |  3 +++
 doc/api/doxy-api.conf.in        |  2 ++
 doc/api/meson.build             | 11 +++++++---
 doc/guides/conf.py              | 39 ++++++++++++++++++++++++++++-----
 doc/guides/meson.build          |  1 +
 doc/guides/tools/dts.rst        | 34 +++++++++++++++++++++++++++-
 dts/doc/meson.build             | 27 +++++++++++++++++++++++
 dts/meson.build                 | 16 ++++++++++++++
 meson.build                     |  1 +
 10 files changed, 148 insertions(+), 19 deletions(-)
 create mode 100644 dts/doc/meson.build
 create mode 100644 dts/meson.build

diff --git a/buildtools/call-sphinx-build.py b/buildtools/call-sphinx-build.py
index 39a60d09fa..aea771a64e 100755
--- a/buildtools/call-sphinx-build.py
+++ b/buildtools/call-sphinx-build.py
@@ -3,37 +3,50 @@
 # Copyright(c) 2019 Intel Corporation
 #

+import argparse
 import sys
 import os
 from os.path import join
 from subprocess import run, PIPE, STDOUT
 from packaging.version import Version

-# assign parameters to variables
-(sphinx, version, src, dst, *extra_args) = sys.argv[1:]
+parser = argparse.ArgumentParser()
+parser.add_argument('sphinx')
+parser.add_argument('version')
+parser.add_argument('src')
+parser.add_argument('dst')
+parser.add_argument('--dts-root', default=None)
+args, extra_args = parser.parse_known_args()

 # set the version in environment for sphinx to pick up
-os.environ['DPDK_VERSION'] = version
+os.environ['DPDK_VERSION'] = args.version
+if args.dts_root:
+    os.environ['DTS_ROOT'] = args.dts_root

 # for sphinx version >= 1.7 add parallelism using "-j auto"
-ver = run([sphinx, '--version'], stdout=PIPE,
+ver = run([args.sphinx, '--version'], stdout=PIPE,
           stderr=STDOUT).stdout.decode().split()[-1]
-sphinx_cmd = [sphinx] + extra_args
+sphinx_cmd = [args.sphinx] + extra_args
 if Version(ver) >= Version('1.7'):
     sphinx_cmd += ['-j', 'auto']

 # find all the files sphinx will process so we can write them as dependencies
 srcfiles = []
-for root, dirs, files in os.walk(src):
+for root, dirs, files in os.walk(args.src):
     srcfiles.extend([join(root, f) for f in files])

+if not os.path.exists(args.dst):
+    os.makedirs(args.dst)
+
 # run sphinx, putting the html output in a "html" directory
-with open(join(dst, 'sphinx_html.out'), 'w') as out:
-    process = run(sphinx_cmd + ['-b', 'html', src, join(dst, 'html')],
-                  stdout=out)
+with open(join(args.dst, 'sphinx_html.out'), 'w') as out:
+    process = run(
+        sphinx_cmd + ['-b', 'html', args.src, join(args.dst, 'html')],
+        stdout=out
+    )

 # create a gcc format .d file giving all the dependencies of this doc build
-with open(join(dst, '.html.d'), 'w') as d:
+with open(join(args.dst, '.html.d'), 'w') as d:
     d.write('html: ' + ' '.join(srcfiles) + '\n')

 sys.exit(process.returncode)
diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
index a6a768bd7c..b49b24acce 100644
--- a/doc/api/doxy-api-index.md
+++ b/doc/api/doxy-api-index.md
@@ -241,3 +241,6 @@ The public API headers are grouped by topics:
   [experimental APIs](@ref rte_compat.h),
   [ABI versioning](@ref rte_function_versioning.h),
   [version](@ref rte_version.h)
+
+- **tests**:
+  [**DTS**](@dts_api_main_page)
diff --git a/doc/api/doxy-api.conf.in b/doc/api/doxy-api.conf.in
index e94c9e4e46..d53edeba57 100644
--- a/doc/api/doxy-api.conf.in
+++ b/doc/api/doxy-api.conf.in
@@ -121,6 +121,8 @@ SEARCHENGINE            = YES
 SORT_MEMBER_DOCS        = NO
 SOURCE_BROWSER          = YES

+ALIASES                 = "dts_api_main_page=@DTS_API_MAIN_PAGE@"
+
 EXAMPLE_PATH            = @TOPDIR@/examples
 EXAMPLE_PATTERNS        = *.c
 EXAMPLE_RECURSIVE       = YES
diff --git a/doc/api/meson.build b/doc/api/meson.build
index 5b50692df9..ffc75d7b5a 100644
--- a/doc/api/meson.build
+++ b/doc/api/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Luca Boccassi <bluca@debian.org>

+doc_api_build_dir = meson.current_build_dir()
 doxygen = find_program('doxygen', required: get_option('enable_docs'))

 if not doxygen.found()
@@ -32,14 +33,18 @@ example = custom_target('examples.dox',
 # set up common Doxygen configuration
 cdata = configuration_data()
 cdata.set('VERSION', meson.project_version())
-cdata.set('API_EXAMPLES', join_paths(dpdk_build_root, 'doc', 'api', 'examples.dox'))
-cdata.set('OUTPUT', join_paths(dpdk_build_root, 'doc', 'api'))
+cdata.set('API_EXAMPLES', join_paths(doc_api_build_dir, 'examples.dox'))
+cdata.set('OUTPUT', doc_api_build_dir)
 cdata.set('TOPDIR', dpdk_source_root)
-cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, join_paths(dpdk_build_root, 'doc', 'api')]))
+cdata.set('STRIP_FROM_PATH', ' '.join([dpdk_source_root, doc_api_build_dir]))
 cdata.set('WARN_AS_ERROR', 'NO')
 if get_option('werror')
     cdata.set('WARN_AS_ERROR', 'YES')
 endif
+# A local reference must be relative to the main index.html page
+# The path below can't be taken from the DTS meson file as that would
+# require recursive subdir traversal (doc, dts, then doc again)
+cdata.set('DTS_API_MAIN_PAGE', join_paths('..', 'dts', 'html', 'index.html'))

 # configure HTML Doxygen run
 html_cdata = configuration_data()
diff --git a/doc/guides/conf.py b/doc/guides/conf.py
index 0f7ff5282d..b442a1f76c 100644
--- a/doc/guides/conf.py
+++ b/doc/guides/conf.py
@@ -7,10 +7,9 @@
 from sphinx import __version__ as sphinx_version
 from os import listdir
 from os import environ
-from os.path import basename
-from os.path import dirname
+from os.path import basename, dirname
 from os.path import join as path_join
-from sys import argv, stderr
+from sys import argv, stderr, path

 import configparser

@@ -24,6 +23,37 @@
           file=stderr)
     pass

+# Napoleon enables the Google format of Python docstrings, used in DTS
+# Intersphinx allows linking to external projects, such as Python docs, also used in DTS
+extensions = ['sphinx.ext.napoleon', 'sphinx.ext.intersphinx']
+
+# DTS Python docstring options
+autodoc_default_options = {
+    'members': True,
+    'member-order': 'bysource',
+    'show-inheritance': True,
+}
+autodoc_class_signature = 'separated'
+autodoc_typehints = 'both'
+autodoc_typehints_format = 'short'
+autodoc_typehints_description_target = 'documented'
+napoleon_numpy_docstring = False
+napoleon_attr_annotations = True
+napoleon_preprocess_types = True
+add_module_names = False
+toc_object_entries = True
+toc_object_entries_show_parents = 'hide'
+intersphinx_mapping = {'python': ('https://docs.python.org/3', None)}
+
+dts_root = environ.get('DTS_ROOT')
+if dts_root:
+    path.append(dts_root)
+    # DTS Sidebar config
+    html_theme_options = {
+        'collapse_navigation': False,
+        'navigation_depth': -1,
+    }
+
 stop_on_error = ('-W' in argv)

 project = 'Data Plane Development Kit'
@@ -35,8 +65,7 @@
 html_show_copyright = False
 highlight_language = 'none'

-release = environ.setdefault('DPDK_VERSION', "None")
-version = release
+version = environ.setdefault('DPDK_VERSION', "None")

 master_doc = 'index'

diff --git a/doc/guides/meson.build b/doc/guides/meson.build
index 51f81da2e3..8933d75f6b 100644
--- a/doc/guides/meson.build
+++ b/doc/guides/meson.build
@@ -1,6 +1,7 @@
 # SPDX-License-Identifier: BSD-3-Clause
 # Copyright(c) 2018 Intel Corporation

+doc_guides_source_dir = meson.current_source_dir()
 sphinx = find_program('sphinx-build', required: get_option('enable_docs'))

 if not sphinx.found()
diff --git a/doc/guides/tools/dts.rst b/doc/guides/tools/dts.rst
index 846696e14e..21d3d89fc2 100644
--- a/doc/guides/tools/dts.rst
+++ b/doc/guides/tools/dts.rst
@@ -278,7 +278,12 @@ and try not to divert much from it.
 The :ref:`DTS developer tools <dts_dev_tools>` will issue warnings
 when some of the basics are not met.

-The code must be properly documented with docstrings.
+The API documentation, which is a helpful reference when developing, may be accessed
+in the code directly or generated with the :ref:`API docs build steps <building_api_docs>`.
+When adding new files or modifying the directory structure, the corresponding changes must
+be made to DTS api doc sources in ``dts/doc``.
+
+Speaking of which, the code must be properly documented with docstrings.
 The style must conform to the `Google style
 <https://google.github.io/styleguide/pyguide.html#38-comments-and-docstrings>`_.
 See an example of the style `here
@@ -413,6 +418,33 @@ the DTS code check and format script.
 Refer to the script for usage: ``devtools/dts-check-format.sh -h``.


+.. _building_api_docs:
+
+Building DTS API docs
+---------------------
+
+To build DTS API docs, install the dependencies with Poetry, then enter its shell:
+
+.. code-block:: console
+
+   poetry install --with docs
+   poetry shell
+
+The documentation is built using the standard DPDK build system. After executing the meson command
+and entering Poetry's shell, build the documentation with:
+
+.. code-block:: console
+
+   ninja -C build dts-doc
+
+The output is generated in ``build/doc/api/dts/html``.
+
+.. Note::
+
+   Make sure to fix any Sphinx warnings when adding or updating docstrings. Also make sure to run
+   the ``devtools/dts-check-format.sh`` script and address any issues it finds.
+
+
 Configuration Schema
 --------------------

diff --git a/dts/doc/meson.build b/dts/doc/meson.build
new file mode 100644
index 0000000000..01b7b51034
--- /dev/null
+++ b/dts/doc/meson.build
@@ -0,0 +1,27 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+sphinx = find_program('sphinx-build', required: false)
+sphinx_apidoc = find_program('sphinx-apidoc', required: false)
+
+if not sphinx.found() or not sphinx_apidoc.found()
+    subdir_done()
+endif
+
+dts_doc_api_build_dir = join_paths(doc_api_build_dir, 'dts')
+
+extra_sphinx_args = ['-E', '-c', doc_guides_source_dir, '--dts-root', dts_dir]
+if get_option('werror')
+    extra_sphinx_args += '-W'
+endif
+
+htmldir = join_paths(get_option('datadir'), 'doc', 'dpdk', 'dts')
+dts_api_html = custom_target('dts_api_html',
+        output: 'html',
+        command: [sphinx_wrapper, sphinx, meson.project_version(),
+            meson.current_source_dir(), dts_doc_api_build_dir, extra_sphinx_args],
+        build_by_default: false,
+        install: get_option('enable_docs'),
+        install_dir: htmldir)
+doc_targets += dts_api_html
+doc_target_names += 'DTS_API_HTML'
diff --git a/dts/meson.build b/dts/meson.build
new file mode 100644
index 0000000000..e8ce0f06ac
--- /dev/null
+++ b/dts/meson.build
@@ -0,0 +1,16 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright(c) 2023 PANTHEON.tech s.r.o.
+
+doc_targets = []
+doc_target_names = []
+dts_dir = meson.current_source_dir()
+
+subdir('doc')
+
+if doc_targets.length() == 0
+    message = 'No docs targets found'
+else
+    message = 'Built docs:'
+endif
+run_target('dts-doc', command: [echo, message, doc_target_names],
+    depends: doc_targets)
diff --git a/meson.build b/meson.build
index 5e161f43e5..001fdcbbbf 100644
--- a/meson.build
+++ b/meson.build
@@ -87,6 +87,7 @@ subdir('app')

 # build docs
 subdir('doc')
+subdir('dts')

 # build any examples explicitly requested - useful for developers - and
 # install any example code into the appropriate install path
--
2.34.1


* [PATCH] net/tap: allow more than 4 queues
@ 2024-02-29 17:56  3% Stephen Hemminger
  2024-03-06 16:14  0% ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-02-29 17:56 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

The tap device needs to exchange file descriptors for tx and rx.
But the EAL MP layer has a limit of 8 file descriptors per message.
The ideal resolution would be to increase the number of file
descriptors allowed for rte_mp_sendmsg(), but this would break
the ABI. Work around the constraint by splitting the request into
multiple messages.

Do not hide errors about MP message failures.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 drivers/net/tap/rte_eth_tap.c | 40 +++++++++++++++++++++++++++++------
 1 file changed, 33 insertions(+), 7 deletions(-)

diff --git a/drivers/net/tap/rte_eth_tap.c b/drivers/net/tap/rte_eth_tap.c
index 69d9da695bed..df18c328f498 100644
--- a/drivers/net/tap/rte_eth_tap.c
+++ b/drivers/net/tap/rte_eth_tap.c
@@ -863,21 +863,44 @@ tap_mp_req_on_rxtx(struct rte_eth_dev *dev)
 		msg.fds[fd_iterator++] = process_private->txq_fds[i];
 		msg.num_fds++;
 		request_param->txq_count++;
+
+		/* Need to break request into chunks */
+		if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
+			err = rte_mp_sendmsg(&msg);
+			if (err < 0)
+				goto fail;
+
+			fd_iterator = 0;
+			msg.num_fds = 0;
+			request_param->txq_count = 0;
+		}
 	}
 	for (i = 0; i < dev->data->nb_rx_queues; i++) {
 		msg.fds[fd_iterator++] = process_private->rxq_fds[i];
 		msg.num_fds++;
 		request_param->rxq_count++;
+
+		if (fd_iterator >= RTE_MP_MAX_FD_NUM) {
+			err = rte_mp_sendmsg(&msg);
+			if (err < 0)
+				goto fail;
+
+			fd_iterator = 0;
+			msg.num_fds = 0;
+			request_param->rxq_count = 0;
+		}
 	}
 
-	err = rte_mp_sendmsg(&msg);
-	if (err < 0) {
-		TAP_LOG(ERR, "Failed to send start req to secondary %d",
-			rte_errno);
-		return -1;
+	if (msg.num_fds > 0) {
+		err = rte_mp_sendmsg(&msg);
+		if (err < 0)
+			goto fail;
 	}
 
 	return 0;
+fail:
+	TAP_LOG(ERR, "Failed to send start req to secondary %d", rte_errno);
+	return err;
 }
 
 static int
@@ -885,8 +908,11 @@ tap_dev_start(struct rte_eth_dev *dev)
 {
 	int err, i;
 
-	if (rte_eal_process_type() == RTE_PROC_PRIMARY)
-		tap_mp_req_on_rxtx(dev);
+	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
+		err = tap_mp_req_on_rxtx(dev);
+		if (err)
+			return err;
+	}
 
 	err = tap_intr_handle_set(dev, 1);
 	if (err)
-- 
2.43.0



* RE: [PATCH 0/7] add Nitrox compress device support
  2024-02-15 12:48  3% [PATCH 0/7] add Nitrox compress device support Nagadheeraj Rottela
@ 2024-02-29 17:22  0% ` Akhil Goyal
  0 siblings, 0 replies; 200+ results
From: Akhil Goyal @ 2024-02-29 17:22 UTC (permalink / raw)
  To: Nagadheeraj Rottela, fanzhang.oss, Ashish Gupta; +Cc: dev, Nagadheeraj Rottela

Please add release notes for the addition of the new PMD.

> -----Original Message-----
> From: Nagadheeraj Rottela <rnagadheeraj@marvell.com>
> Sent: Thursday, February 15, 2024 6:19 PM
> To: Akhil Goyal <gakhil@marvell.com>; fanzhang.oss@gmail.com; Ashish Gupta
> <ashishg@marvell.com>
> Cc: dev@dpdk.org; Nagadheeraj Rottela <rnagadheeraj@marvell.com>
> Subject: [PATCH 0/7] add Nitrox compress device support
> 
> Add the Nitrox PMD to support Nitrox compress device.
> ---
> v3:
> * Fixed ABI compatibility issue.
> 
> v2:
> * Reformatted patches to minimize number of changes.
> * Removed empty file with only copyright.
> * Updated all feature flags in nitrox.ini file.
> * Added separate gotos in nitrox_pci_probe() function.
> 
> Nagadheeraj Rottela (7):
>   crypto/nitrox: move nitrox common code to common folder
>   compress/nitrox: add nitrox compressdev driver
>   common/nitrox: add compress hardware queue management
>   crypto/nitrox: set queue type during queue pair setup
>   compress/nitrox: add software queue management
>   compress/nitrox: add stateless request support
>   compress/nitrox: add stateful request support
> 
>  MAINTAINERS                                   |    8 +
>  doc/guides/compressdevs/features/nitrox.ini   |   17 +
>  doc/guides/compressdevs/index.rst             |    1 +
>  doc/guides/compressdevs/nitrox.rst            |   50 +
>  drivers/common/nitrox/meson.build             |   19 +
>  .../{crypto => common}/nitrox/nitrox_csr.h    |   12 +
>  .../{crypto => common}/nitrox/nitrox_device.c |   51 +-
>  .../{crypto => common}/nitrox/nitrox_device.h |    4 +-
>  .../{crypto => common}/nitrox/nitrox_hal.c    |  116 ++
>  .../{crypto => common}/nitrox/nitrox_hal.h    |  115 ++
>  .../{crypto => common}/nitrox/nitrox_logs.c   |    0
>  .../{crypto => common}/nitrox/nitrox_logs.h   |    0
>  drivers/{crypto => common}/nitrox/nitrox_qp.c |   53 +-
>  drivers/{crypto => common}/nitrox/nitrox_qp.h |   37 +-
>  drivers/common/nitrox/version.map             |    9 +
>  drivers/compress/nitrox/meson.build           |   16 +
>  drivers/compress/nitrox/nitrox_comp.c         |  604 +++++++++
>  drivers/compress/nitrox/nitrox_comp.h         |   35 +
>  drivers/compress/nitrox/nitrox_comp_reqmgr.c  | 1199 +++++++++++++++++
>  drivers/compress/nitrox/nitrox_comp_reqmgr.h  |   58 +
>  drivers/crypto/nitrox/meson.build             |   11 +-
>  drivers/crypto/nitrox/nitrox_sym.c            |    1 +
>  drivers/meson.build                           |    1 +
>  23 files changed, 2396 insertions(+), 21 deletions(-)
>  create mode 100644 doc/guides/compressdevs/features/nitrox.ini
>  create mode 100644 doc/guides/compressdevs/nitrox.rst
>  create mode 100644 drivers/common/nitrox/meson.build
>  rename drivers/{crypto => common}/nitrox/nitrox_csr.h (67%)
>  rename drivers/{crypto => common}/nitrox/nitrox_device.c (77%)
>  rename drivers/{crypto => common}/nitrox/nitrox_device.h (81%)
>  rename drivers/{crypto => common}/nitrox/nitrox_hal.c (65%)
>  rename drivers/{crypto => common}/nitrox/nitrox_hal.h (59%)
>  rename drivers/{crypto => common}/nitrox/nitrox_logs.c (100%)
>  rename drivers/{crypto => common}/nitrox/nitrox_logs.h (100%)
>  rename drivers/{crypto => common}/nitrox/nitrox_qp.c (69%)
>  rename drivers/{crypto => common}/nitrox/nitrox_qp.h (75%)
>  create mode 100644 drivers/common/nitrox/version.map
>  create mode 100644 drivers/compress/nitrox/meson.build
>  create mode 100644 drivers/compress/nitrox/nitrox_comp.c
>  create mode 100644 drivers/compress/nitrox/nitrox_comp.h
>  create mode 100644 drivers/compress/nitrox/nitrox_comp_reqmgr.c
>  create mode 100644 drivers/compress/nitrox/nitrox_comp_reqmgr.h
> 
> --
> 2.42.0



* RE: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional transfer
  @ 2024-02-29 14:03  3%       ` Amit Prakash Shukla
  2024-03-01  1:46  0%         ` fengchengwen
  0 siblings, 1 reply; 200+ results
From: Amit Prakash Shukla @ 2024-02-29 14:03 UTC (permalink / raw)
  To: fengchengwen, Cheng Jiang, Gowrishankar Muthukrishnan
  Cc: dev, Jerin Jacob, Anoob Joseph, Kevin Laatz, Bruce Richardson,
	Pavan Nikhilesh Bhagavatula

Hi Chengwen,

I liked your suggestion and tried making the changes, but encountered a parsing issue for CFG files with lines longer than CFG_VALUE_LEN=256 (the current value).

There is a discussion along similar lines in another patch set: https://patchwork.dpdk.org/project/dpdk/patch/20231206112952.1588-1-vipin.varghese@amd.com/.

I believe this patch can be taken as-is, and we can come up with a solution once CFG_VALUE_LEN can be increased, as changing CFG_VALUE_LEN in this release would cause an ABI breakage.

Thanks,
Amit Shukla

> -----Original Message-----
> From: Amit Prakash Shukla
> Sent: Wednesday, February 28, 2024 3:08 PM
> To: fengchengwen <fengchengwen@huawei.com>; Cheng Jiang
> <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
> <gmuthukrishn@marvell.com>
> Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
> <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
> Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
> <pbhagavatula@marvell.com>
> Subject: RE: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional
> transfer
> 
> Hi Chengwen,
> 
> Please see my reply in-line.
> 
> Thanks
> Amit Shukla
> 
> > -----Original Message-----
> > From: fengchengwen <fengchengwen@huawei.com>
> > Sent: Wednesday, February 28, 2024 12:34 PM
> > To: Amit Prakash Shukla <amitprakashs@marvell.com>; Cheng Jiang
> > <honest.jiang@foxmail.com>; Gowrishankar Muthukrishnan
> > <gmuthukrishn@marvell.com>
> > Cc: dev@dpdk.org; Jerin Jacob <jerinj@marvell.com>; Anoob Joseph
> > <anoobj@marvell.com>; Kevin Laatz <kevin.laatz@intel.com>; Bruce
> > Richardson <bruce.richardson@intel.com>; Pavan Nikhilesh Bhagavatula
> > <pbhagavatula@marvell.com>
> > Subject: [EXT] Re: [PATCH v2] app/dma-perf: support bi-directional
> > transfer
> >
> > External Email
> >
> > ----------------------------------------------------------------------
> > Hi Amit and Gowrishankar,
> >
> > It's natural to support multiple dmadev tests in one testcase, and the
> > original framework supports it.
> > But it seems we both complicated it when adding support for non-mem2mem
> > DMA tests.
> >
> > The newly added "direction" and "vchan_dev" could be treated as the
> > dmadev's private configuration, something like:
> >
> > lcore_dma=lcore10@0000:00:04.2,vchan=0,dir=mem2dev,devtype=pcie,raddr=xxx,coreid=1,pfid=2,vfid=3
> >
> > then this bi-directional test could be implemented with only the config:
> >
> > lcore_dma=lcore10@0000:00:04.2,dir=mem2dev,devtype=pcie,raddr=xxx,coreid=1,pfid=2,vfid=3,
> > lcore11@0000:00:04.3,dir=dev2mem,devtype=pcie,raddr=xxx,coreid=1,pfid=2,vfid=3
> > so that lcore10 will do mem2dev with device 0000:00:04.2, while
> > lcore11 will do dev2mem with device 0000:00:04.3.
> 
> Thanks for the suggestion. I will make the suggested changes and send the
> next version.


* Re: [PATCH v4 1/7] ethdev: support report register names and filter
  2024-02-26  3:07  8%   ` [PATCH v4 1/7] ethdev: support report register names and filter Jie Hai
  2024-02-26  8:01  0%     ` fengchengwen
@ 2024-02-29  9:52  3%     ` Thomas Monjalon
  2024-03-05  7:45  5%       ` Jie Hai
  1 sibling, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-02-29  9:52 UTC (permalink / raw)
  To: ferruh.yigit, Jie Hai
  Cc: dev, lihuisong, fengchengwen, liuyonglong, huangdengdui

26/02/2024 04:07, Jie Hai:
> This patch adds "filter" and "names" fields to "rte_dev_reg_info"
> structure. Names of registers in data fields can be reported and
> the registers can be filtered by their names.
> 
> The new API rte_eth_dev_get_reg_info_ext() is added to support
> reporting names and filtering by names. And the original API
> rte_eth_dev_get_reg_info() does not use the name and filter fields.
> A local variable is used in rte_eth_dev_get_reg_info for
> compatibility. If the driver does not report the names, they are set
> to "offset_XXX".

Isn't it possible to implement filtering in the original function?
What would it break?

> @@ -20,6 +25,12 @@ struct rte_dev_reg_info {
>  	uint32_t length; /**< Number of registers to fetch */
>  	uint32_t width; /**< Size of device register */
>  	uint32_t version; /**< Device version */
> +	/**
> +	 * Filter for target subset of registers.
> +	 * This field could affects register selection for data/length/names.
> +	 */
> +	const char *filter;
> +	struct rte_eth_reg_name *names; /**< Registers name saver */
>  };

I suppose this is an ABI break?
Confirmed: http://mails.dpdk.org/archives/test-report/2024-February/587314.html




* RE: release candidate 24.03-rc1
  @ 2024-02-29  9:18  4% ` Xu, HailinX
  0 siblings, 0 replies; 200+ results
From: Xu, HailinX @ 2024-02-29  9:18 UTC (permalink / raw)
  To: Thomas Monjalon, dev
  Cc: Kovacevic, Marko, Mcnamara, John, Richardson, Bruce,
	Ferruh Yigit, Puttaswamy, Rajesh T, Cui, KaixinX

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Thursday, February 22, 2024 3:36 PM
> To: announce@dpdk.org
> Subject: release candidate 24.03-rc1
> 
> A new DPDK release candidate is ready for testing:
> 	https://git.dpdk.org/dpdk/tag/?id=v24.03-rc1
> 
> There are 521 new patches in this snapshot.
> 
> Release notes:
> 	https://doc.dpdk.org/guides/rel_notes/release_24_03.html
> 
> Highlights of 24.03-rc1:
> 	- argument parsing library
> 	- dynamic logging standardized
> 	- HiSilicon UACCE bus
> 	- Tx queue query
> 	- flow matching with random and field comparison
> 	- flow action NAT64
> 	- more cleanups to prepare MSVC build
> 
> Please test and report issues on bugs.dpdk.org.
> 
> DPDK 24.03-rc2 will be out as soon as possible.
> Priority is on features announced in the roadmap:
> 	https://core.dpdk.org/roadmap/
> 
> Thank you everyone
> 
Here is the test status update for the Intel part. All dpdk 24.03-rc1 testing is done; three new issues were found.

New issues:
1. Bug 1386 - [dpdk-24.03] [ABI][meson test] driver-tests/link_bonding_autotest test failed: Segmentation fault when doing ABI testing  -> not fixed yet
2. Bug 1387 - [dpdk24.03] cbdma: Failed to launch dpdk-dma app  -> has a fix patch
3. dcf_lifecycle/test_one_testpmd_dcf_reset_port: Failed to manually reset vf when using dcf  -> Intel developers are investigating

# Basic Intel(R) NIC testing
* Build or compile:  
 *Build: cover the build test combination with latest GCC/Clang version and the popular OS revision such as Ubuntu23.10, Ubuntu22.04.3, Fedora39, RHEL8.9/9.2, Centos7.9, FreeBSD14.0, SUSE15, OpenAnolis8.8, CBL-Mariner2.0 etc.
  - All test passed.
 *Compile: cover the CFLAGS (O0/O1/O2/O3) with popular OS such as Ubuntu22.04.3 and RHEL9.2.
  - All test passed with latest dpdk.
* PF/VF(i40e, ixgbe): test scenarios including PF/VF-RTE_FLOW/TSO/Jumboframe/checksum offload/VLAN/VXLAN, etc. 
	- All test case is done. No new issue is found.
* PF/VF(ice): test scenarios including Switch features/Package Management/Flow Director/Advanced Tx/Advanced RSS/ACL/DCF/Flexible Descriptor, etc.
	- Execution rate is done. found the third issue.
* Intel NIC single core/NIC performance: test scenarios including PF/VF single core performance test, RFC2544 Zero packet loss performance test, etc.
	- Execution rate is done. No new issue is found.
* Power and IPsec: 
 * Power: test scenarios including bi-direction/Telemetry/Empty Poll Lib/Priority Base Frequency, etc. 
	- Execution rate is done. No new issue is found.
 * IPsec: test scenarios including ipsec/ipsec-gw/ipsec library basic test - QAT&SW/FIB library, etc.
	- Execution rate is done. No new issue is found. 
# Basic cryptodev and virtio testing
* Virtio: both function and performance test are covered. Such as PVP/Virtio_loopback/virtio-user loopback/virtio-net VM2VM perf testing/VMAWARE ESXI 8.0U1, etc.
	- Execution rate is done. found the second issue.
* Cryptodev: 
 *Function test: test scenarios including Cryptodev API testing/CompressDev ISA-L/QAT/ZLIB PMD Testing/FIPS, etc.
	- Execution rate is done. No new issue is found. 
 *Performance test: test scenarios including Throughput Performance /Cryptodev Latency, etc.
	- Execution rate is done. No performance drop.


Regards,
Xu, Hailin


* Re: [PATCH v6 20/23] mbuf: remove and stop using rte marker fields
  2024-02-28 15:01  4%       ` Morten Brørup
@ 2024-02-28 15:33  3%         ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-02-28 15:33 UTC (permalink / raw)
  To: Morten Brørup
  Cc: Tyler Retzlaff, dev, Ajit Khaparde, Andrew Boyer,
	Andrew Rybchenko, Bruce Richardson, Chenbo Xia, Chengwen Feng,
	Dariusz Sosnowski, David Christensen, Hyong Youb Kim,
	Jerin Jacob, Jie Hai, Jingjing Wu, John Daley, Kevin Laatz,
	Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj, Matan Azrad,
	Maxime Coquelin, Nithin Dabilpuram, Ori Kam, Ruifeng Wang,
	Satha Rao, Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang,
	Thomas Monjalon

On Wed, Feb 28, 2024 at 4:01 PM Morten Brørup <mb@smartsharesystems.com> wrote:
>
> > From: David Marchand [mailto:david.marchand@redhat.com]
> > Sent: Wednesday, 28 February 2024 15.19
> >
> > On Tue, Feb 27, 2024 at 6:44 AM Tyler Retzlaff
> > <roretzla@linux.microsoft.com> wrote:
> > >
> > > RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> > > RTE_MARKER fields from rte_mbuf struct.
> > >
> > > Maintain alignment of fields after removed cacheline1 marker by placing
> > > C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> > >
> > > Update the implementation of the rte_mbuf_prefetch_part1() and
> > > rte_mbuf_prefetch_part2() inline functions to calculate the pointers for
> > > prefetch of cacheline0 and cacheline1 without using the removed markers.
> > >
> > > Update static_assert of rte_mbuf struct fields to reference data_off and
> > > packet_type fields that occupy the original offsets of the marker
> > > fields.
> > >
> > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > > ---
> > >  doc/guides/rel_notes/release_24_03.rst |  9 ++++++++
> > >  lib/mbuf/rte_mbuf.h                    |  4 ++--
> > >  lib/mbuf/rte_mbuf_core.h               | 39 +++++++++++++------------------
> > ---
> > >  3 files changed, 26 insertions(+), 26 deletions(-)
> > >
> > > diff --git a/doc/guides/rel_notes/release_24_03.rst
> > b/doc/guides/rel_notes/release_24_03.rst
> > > index 879bb49..67750f2 100644
> > > --- a/doc/guides/rel_notes/release_24_03.rst
> > > +++ b/doc/guides/rel_notes/release_24_03.rst
> > > @@ -156,6 +156,15 @@ Removed Items
> > >    The application reserved statically defined logtypes
> > ``RTE_LOGTYPE_USER1..RTE_LOGTYPE_USER8``
> > >    are still defined.
> > >
> > > +* mbuf: ``RTE_MARKER`` fields ``cacheline0`` ``cacheline1``
> > > +  ``rx_descriptor_fields1`` and ``RTE_MARKER64`` field ``rearm_data``
> > > +  have been removed from ``struct rte_mbuf``.
> > > +  Prefetch of ``cacheline0`` and ``cacheline1`` may be achieved through
> > > +  ``rte_mbuf_prefetch_part1()`` and ``rte_mbuf_prefetch_part2()`` inline
> > > +  functions respectively.
> > > +  Access to ``rearm_data`` and ``rx_descriptor_fields1`` should be
> > > +  through new inline functions ``rte_mbuf_rearm_data()`` and
> > > +  ``rte_mbuf_rx_descriptor_fields1()`` respectively.
> > >
> > >  API Changes
> > >  -----------
> > > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > > index aa7495b..61cda20 100644
> > > --- a/lib/mbuf/rte_mbuf.h
> > > +++ b/lib/mbuf/rte_mbuf.h
> > > @@ -108,7 +108,7 @@
> > >  static inline void
> > >  rte_mbuf_prefetch_part1(struct rte_mbuf *m)
> > >  {
> > > -       rte_prefetch0(&m->cacheline0);
> > > +       rte_prefetch0(m);
> > >  }
> > >
> > >  /**
> > > @@ -126,7 +126,7 @@
> > >  rte_mbuf_prefetch_part2(struct rte_mbuf *m)
> > >  {
> > >  #if RTE_CACHE_LINE_SIZE == 64
> > > -       rte_prefetch0(&m->cacheline1);
> > > +       rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
> > >  #else
> > >         RTE_SET_USED(m);
> > >  #endif
> > > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > > index 36551c2..4e06f15 100644
> > > --- a/lib/mbuf/rte_mbuf_core.h
> > > +++ b/lib/mbuf/rte_mbuf_core.h
> > > @@ -18,6 +18,7 @@
> > >
> > >  #include <assert.h>
> > >  #include <stddef.h>
> > > +#include <stdalign.h>
> > >  #include <stdint.h>
> > >
> > >  #include <rte_common.h>
> > > @@ -467,8 +468,6 @@ enum {
> > >   * The generic rte_mbuf, containing a packet mbuf.
> > >   */
> > >  struct rte_mbuf {
> > > -       RTE_MARKER cacheline0;
> > > -
> > >         void *buf_addr;           /**< Virtual address of segment buffer. */
> > >  #if RTE_IOVA_IN_MBUF
> > >         /**
> > > @@ -495,7 +494,6 @@ struct rte_mbuf {
> > >          * To obtain a pointer to rearm_data use the rte_mbuf_rearm_data()
> > >          * accessor instead of directly referencing through the data_off
> > field.
> > >          */
> > > -       RTE_MARKER64 rearm_data;
> > >         uint16_t data_off;
> >
> > One subtle change from removing the marker is that fields may not be
> > aligned as before.
> >
> > #if RTE_IOVA_IN_MBUF
> >         /**
> >          * Physical address of segment buffer.
> >          * This field is undefined if the build is configured to use only
> >          * virtual address as IOVA (i.e. RTE_IOVA_IN_MBUF is 0).
> >          * Force alignment to 8-bytes, so as to ensure we have the exact
> >          * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
> >          * working on vector drivers easier.
> >          */
> >         rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
> > #else
> >         /**
> >          * Next segment of scattered packet.
> >          * This field is valid when physical address field is undefined.
> >          * Otherwise next pointer in the second cache line will be used.
> >          */
> >         struct rte_mbuf *next;
> > #endif
> >
> > When building with !RTE_IOVA_IN_MBUF on a 32-bit arch, the next pointer
> > is not force-aligned to 64 bits,
> > which has a cascade effect on data_off alignment.
> >
> > In file included from ../lib/mbuf/rte_mbuf_core.h:19,
> >                  from ../lib/mbuf/rte_mbuf.h:42,
> >                  from ../lib/mbuf/rte_mbuf_dyn.c:18:
> > ../lib/mbuf/rte_mbuf_core.h:676:1: error: static assertion failed: "data_off"
> >   676 | static_assert(!(offsetof(struct rte_mbuf, data_off) !=
> >       | ^~~~~~~~~~~~~
> >
> >
> > I hope reviewers pay attention to the alignment changes when removing
> > those markers.
> > This is not trivial to catch in the CI.
>
> Good catch, David.
>
> I wonder about the reason for 64 bit aligning the rearm_data group of fields? Perhaps it's there for (64 bit arch) vector instruction purposes?
>
> Regardless, it's an ABI break, so padding or an alignment attribute must be added to avoid ABI breakage. If there is no valid reason for the 64 bit alignment, it could be noted that the padding (or alignment attribute) is there for 32 bit arch ABI compatibility reasons only.
>
> Please note that only RTE_MARKER64 is affected by this. The other marker types have arch bit-width (or smaller) alignment, i.e. RTE_MARKER is 8 byte aligned on 64 bit arch and 4 byte aligned on 32 bit arch.

Well, strictly speaking other RTE_MARKER users *may* be affected,
depending on the alignment of the following fields.
For example, I think removing the rxq_fastpath_data_end RTE_MARKER in
struct nicvf_rxq
(https://git.dpdk.org/dpdk/tree/drivers/net/thunderx/nicvf_struct.h#n72)
impacts rx_drop_en alignment and subsequent fields.

Now, in practice, and focusing only on what this series touches, either
the markers were coupled with an explicit alignment constraint
(__rte_cache*_aligned), which is preserved by the series, or the
alignment constraint is stronger than that of the marker.
So there is probably only this ABI breakage I reported.


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* RE: [PATCH v6 20/23] mbuf: remove and stop using rte marker fields
  @ 2024-02-28 15:01  4%       ` Morten Brørup
  2024-02-28 15:33  3%         ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2024-02-28 15:01 UTC (permalink / raw)
  To: David Marchand, Tyler Retzlaff
  Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
	David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
	Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
	Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
	Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
	Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang,
	Thomas Monjalon

> From: David Marchand [mailto:david.marchand@redhat.com]
> Sent: Wednesday, 28 February 2024 15.19
> 
> On Tue, Feb 27, 2024 at 6:44 AM Tyler Retzlaff
> <roretzla@linux.microsoft.com> wrote:
> >
> > RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> > RTE_MARKER fields from rte_mbuf struct.
> >
> > Maintain alignment of fields after removed cacheline1 marker by placing
> > C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> >
> > Update the implementation of the rte_mbuf_prefetch_part1() and
> > rte_mbuf_prefetch_part2() inline functions to calculate the pointers for
> > prefetch of cacheline0 and cacheline1 without using the removed markers.
> >
> > Update static_assert of rte_mbuf struct fields to reference data_off and
> > packet_type fields that occupy the original offsets of the marker
> > fields.
> >
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> >  doc/guides/rel_notes/release_24_03.rst |  9 ++++++++
> >  lib/mbuf/rte_mbuf.h                    |  4 ++--
> >  lib/mbuf/rte_mbuf_core.h               | 39 +++++++++++++------------------
> ---
> >  3 files changed, 26 insertions(+), 26 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/release_24_03.rst
> b/doc/guides/rel_notes/release_24_03.rst
> > index 879bb49..67750f2 100644
> > --- a/doc/guides/rel_notes/release_24_03.rst
> > +++ b/doc/guides/rel_notes/release_24_03.rst
> > @@ -156,6 +156,15 @@ Removed Items
> >    The application reserved statically defined logtypes
> ``RTE_LOGTYPE_USER1..RTE_LOGTYPE_USER8``
> >    are still defined.
> >
> > +* mbuf: ``RTE_MARKER`` fields ``cacheline0`` ``cacheline1``
> > +  ``rx_descriptor_fields1`` and ``RTE_MARKER64`` field ``rearm_data``
> > +  have been removed from ``struct rte_mbuf``.
> > +  Prefetch of ``cacheline0`` and ``cacheline1`` may be achieved through
> > +  ``rte_mbuf_prefetch_part1()`` and ``rte_mbuf_prefetch_part2()`` inline
> > +  functions respectively.
> > +  Access to ``rearm_data`` and ``rx_descriptor_fields1`` should be
> > +  through new inline functions ``rte_mbuf_rearm_data()`` and
> > +  ``rte_mbuf_rx_descriptor_fields1()`` respectively.
> >
> >  API Changes
> >  -----------
> > diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
> > index aa7495b..61cda20 100644
> > --- a/lib/mbuf/rte_mbuf.h
> > +++ b/lib/mbuf/rte_mbuf.h
> > @@ -108,7 +108,7 @@
> >  static inline void
> >  rte_mbuf_prefetch_part1(struct rte_mbuf *m)
> >  {
> > -       rte_prefetch0(&m->cacheline0);
> > +       rte_prefetch0(m);
> >  }
> >
> >  /**
> > @@ -126,7 +126,7 @@
> >  rte_mbuf_prefetch_part2(struct rte_mbuf *m)
> >  {
> >  #if RTE_CACHE_LINE_SIZE == 64
> > -       rte_prefetch0(&m->cacheline1);
> > +       rte_prefetch0(RTE_PTR_ADD(m, RTE_CACHE_LINE_MIN_SIZE));
> >  #else
> >         RTE_SET_USED(m);
> >  #endif
> > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > index 36551c2..4e06f15 100644
> > --- a/lib/mbuf/rte_mbuf_core.h
> > +++ b/lib/mbuf/rte_mbuf_core.h
> > @@ -18,6 +18,7 @@
> >
> >  #include <assert.h>
> >  #include <stddef.h>
> > +#include <stdalign.h>
> >  #include <stdint.h>
> >
> >  #include <rte_common.h>
> > @@ -467,8 +468,6 @@ enum {
> >   * The generic rte_mbuf, containing a packet mbuf.
> >   */
> >  struct rte_mbuf {
> > -       RTE_MARKER cacheline0;
> > -
> >         void *buf_addr;           /**< Virtual address of segment buffer. */
> >  #if RTE_IOVA_IN_MBUF
> >         /**
> > @@ -495,7 +494,6 @@ struct rte_mbuf {
> >          * To obtain a pointer to rearm_data use the rte_mbuf_rearm_data()
> >          * accessor instead of directly referencing through the data_off
> field.
> >          */
> > -       RTE_MARKER64 rearm_data;
> >         uint16_t data_off;
> 
> One subtle change from removing the marker is that fields may not be
> aligned as before.
> 
> #if RTE_IOVA_IN_MBUF
>         /**
>          * Physical address of segment buffer.
>          * This field is undefined if the build is configured to use only
>          * virtual address as IOVA (i.e. RTE_IOVA_IN_MBUF is 0).
>          * Force alignment to 8-bytes, so as to ensure we have the exact
>          * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
>          * working on vector drivers easier.
>          */
>         rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
> #else
>         /**
>          * Next segment of scattered packet.
>          * This field is valid when physical address field is undefined.
>          * Otherwise next pointer in the second cache line will be used.
>          */
>         struct rte_mbuf *next;
> #endif
> 
> When building with !RTE_IOVA_IN_MBUF on a 32-bit arch, the next pointer
> is not force-aligned to 64 bits,
> which has a cascade effect on data_off alignment.
> 
> In file included from ../lib/mbuf/rte_mbuf_core.h:19,
>                  from ../lib/mbuf/rte_mbuf.h:42,
>                  from ../lib/mbuf/rte_mbuf_dyn.c:18:
> ../lib/mbuf/rte_mbuf_core.h:676:1: error: static assertion failed: "data_off"
>   676 | static_assert(!(offsetof(struct rte_mbuf, data_off) !=
>       | ^~~~~~~~~~~~~
> 
> 
> I hope reviewers pay attention to the alignment changes when removing
> those markers.
> This is not trivial to catch in the CI.

Good catch, David.

I wonder about the reason for 64 bit aligning the rearm_data group of fields? Perhaps it's there for (64 bit arch) vector instruction purposes?

Regardless, it's an ABI break, so padding or an alignment attribute must be added to avoid ABI breakage. If there is no valid reason for the 64 bit alignment, it could be noted that the padding (or alignment attribute) is there for 32 bit arch ABI compatibility reasons only.

Please note that only RTE_MARKER64 is affected by this. The other marker types have arch bit-width (or smaller) alignment, i.e. RTE_MARKER is 8 byte aligned on 64 bit arch and 4 byte aligned on 32 bit arch.

And RTE_MARKER64 is only used in the rte_mbuf structure.


^ permalink raw reply	[relevance 4%]

* Re: [PATCH v6 20/23] mbuf: remove and stop using rte marker fields
  2024-02-27 15:18  4%     ` David Marchand
  2024-02-27 16:04  3%       ` Morten Brørup
  2024-02-27 17:23  4%       ` Tyler Retzlaff
@ 2024-02-28 14:03  3%       ` Dodji Seketeli
  2 siblings, 0 replies; 200+ results
From: Dodji Seketeli @ 2024-02-28 14:03 UTC (permalink / raw)
  To: David Marchand
  Cc: Dodji Seketeli, dev, Ajit Khaparde, Andrew Boyer,
	Andrew Rybchenko, Bruce Richardson, Chenbo Xia, Chengwen Feng,
	Dariusz Sosnowski, David Christensen, Hyong Youb Kim,
	Jerin Jacob, Jie Hai, Jingjing Wu, John Daley, Kevin Laatz,
	Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj, Matan Azrad,
	Maxime Coquelin, Nithin Dabilpuram, Ori Kam, Ruifeng Wang,
	Satha Rao, Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb,
	Tyler Retzlaff

Hello,

David Marchand <david.marchand@redhat.com> writes:

> Hello Dodji,

o/

[...]


> This change is reported as a potential ABI change.
>
> For the context, this patch
> https://patchwork.dpdk.org/project/dpdk/patch/1709012499-12813-21-git-send-email-roretzla@linux.microsoft.com/
> removes null-sized markers (those fields were using RTE_MARKER, see
> https://git.dpdk.org/dpdk/tree/lib/eal/include/rte_common.h#n583) from
> the rte_mbuf struct.

Thank you for the context.

[...]


> As reported by the CI:

[...]

>   [C] 'function const rte_eth_rxtx_callback*
> rte_eth_add_first_rx_callback(uint16_t, uint16_t, rte_rx_callback_fn,
> void*)' at rte_ethdev.c:5768:1 has some indirect sub-type changes:
>     parameter 3 of type 'typedef rte_rx_callback_fn' has sub-type changes:

[...]

>               in pointed to type 'struct rte_mbuf' at rte_mbuf_core.h:470:1:
>                 type size hasn't changed
>                 4 data member deletions:
>                   'RTE_MARKER cacheline0', at offset 0 (in bits) at
> rte_mbuf_core.h:467:1
>                   'RTE_MARKER64 rearm_data', at offset 128 (in bits)
> at rte_mbuf_core.h:490:1
>                   'RTE_MARKER rx_descriptor_fields1', at offset 256
> (in bits) at rte_mbuf_core.h:517:1
>                   'RTE_MARKER cacheline1', at offset 512 (in bits) at
> rte_mbuf_core.h:598:1
>                 no data member change (1 filtered);

[...]

> I would argue this change do not impact ABI as the layout of the mbuf
> object is not impacted.

I agree that on the /particular platform/ that the checker runs on,
there is no incompatible ABI change because no data member offset from
the 'struct rte_mbuf' type got modified and the size of the type hasn't
changed either.


>
> Error: ABI issue reported for abidiff --suppr
> /home/runner/work/dpdk/dpdk/devtools/libabigail.abignore
> --no-added-syms --headers-dir1 reference/usr/local/include
> --headers-dir2 install/usr/local/include
> reference/usr/local/lib/librte_ethdev.so.24.0
> install/usr/local/lib/librte_ethdev.so.24.1
> ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> this as a potential issue).
>
> Opinions?
>
> Btw, I see no way to suppress this (except a global [suppress_type]
> name = rte_mbuf)...

Right.

To avoid having subsequent changes to that type be "overly"
suppressed, maybe do something like:

    [suppress_type]
     name = rte_mbuf
     has_size_change = no
     has_data_member = {cacheline0, rearm_data, rx_descriptor_fields1, cacheline1}

That way, only size-impacting changes to struct rte_mbuf in its form
that predates this patch would be suppressed, hopefully.

[...]

Cheers,

-- 
		Dodji


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v6 20/23] mbuf: remove and stop using rte marker fields
  2024-02-27 17:23  4%       ` Tyler Retzlaff
@ 2024-02-28 10:42  3%         ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-02-28 10:42 UTC (permalink / raw)
  To: Tyler Retzlaff, Thomas Monjalon
  Cc: Dodji Seketeli, dev, Ajit Khaparde, Andrew Boyer,
	Andrew Rybchenko, Bruce Richardson, Chenbo Xia, Chengwen Feng,
	Dariusz Sosnowski, David Christensen, Hyong Youb Kim,
	Jerin Jacob, Jie Hai, Jingjing Wu, John Daley, Kevin Laatz,
	Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj, Matan Azrad,
	Maxime Coquelin, Nithin Dabilpuram, Ori Kam, Ruifeng Wang,
	Satha Rao, Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb, ci

On Tue, Feb 27, 2024 at 6:23 PM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> On Tue, Feb 27, 2024 at 04:18:10PM +0100, David Marchand wrote:
> > Hello Dodji,
> >
> > On Tue, Feb 27, 2024 at 6:44 AM Tyler Retzlaff
> > <roretzla@linux.microsoft.com> wrote:
> > >
> > > RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> > > RTE_MARKER fields from rte_mbuf struct.
> > >
> > > Maintain alignment of fields after removed cacheline1 marker by placing
> > > C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> > >
> > > Update the implementation of the rte_mbuf_prefetch_part1() and
> > > rte_mbuf_prefetch_part2() inline functions to calculate the pointers for
> > > prefetch of cacheline0 and cacheline1 without using the removed markers.
> > >
> > > Update static_assert of rte_mbuf struct fields to reference data_off and
> > > packet_type fields that occupy the original offsets of the marker
> > > fields.
> > >
> > > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> >
> > This change is reported as a potential ABI change.
> >
> > For the context, this patch
> > https://patchwork.dpdk.org/project/dpdk/patch/1709012499-12813-21-git-send-email-roretzla@linux.microsoft.com/
> > removes null-sized markers (those fields were using RTE_MARKER, see
> > https://git.dpdk.org/dpdk/tree/lib/eal/include/rte_common.h#n583) from
> > the rte_mbuf struct.
> > I would argue this change do not impact ABI as the layout of the mbuf
> > object is not impacted.
>
> It isn't a surprise that the change got flagged, because the 0-sized
> fields being removed are probably not something the checker understands.
> So no ABI change, just an API break (as was requested).
>
> > As reported by the CI:
> >
> >   [C] 'function const rte_eth_rxtx_callback*
> > rte_eth_add_first_rx_callback(uint16_t, uint16_t, rte_rx_callback_fn,
> > void*)' at rte_ethdev.c:5768:1 has some indirect sub-type changes:
> >     parameter 3 of type 'typedef rte_rx_callback_fn' has sub-type changes:
> >       underlying type 'typedef uint16_t (typedef uint16_t, typedef
> > uint16_t, rte_mbuf**, typedef uint16_t, typedef uint16_t, void*)*'
> > changed:
> >         in pointed to type 'function type typedef uint16_t (typedef
> > uint16_t, typedef uint16_t, rte_mbuf**, typedef uint16_t, typedef
> > uint16_t, void*)':
> >           parameter 3 of type 'rte_mbuf**' has sub-type changes:
> >             in pointed to type 'rte_mbuf*':
> >               in pointed to type 'struct rte_mbuf' at rte_mbuf_core.h:470:1:
> >                 type size hasn't changed
> >                 4 data member deletions:
> >                   'RTE_MARKER cacheline0', at offset 0 (in bits) at
> > rte_mbuf_core.h:467:1
> >                   'RTE_MARKER64 rearm_data', at offset 128 (in bits)
> > at rte_mbuf_core.h:490:1
> >                   'RTE_MARKER rx_descriptor_fields1', at offset 256
> > (in bits) at rte_mbuf_core.h:517:1
> >                   'RTE_MARKER cacheline1', at offset 512 (in bits) at
> > rte_mbuf_core.h:598:1
> >                 no data member change (1 filtered);
> >
> > Error: ABI issue reported for abidiff --suppr
> > /home/runner/work/dpdk/dpdk/devtools/libabigail.abignore
> > --no-added-syms --headers-dir1 reference/usr/local/include
> > --headers-dir2 install/usr/local/include
> > reference/usr/local/lib/librte_ethdev.so.24.0
> > install/usr/local/lib/librte_ethdev.so.24.1
> > ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> > this as a potential issue).
> >
> > Opinions?
> >
> > Btw, I see no way to suppress this (except a global [suppress_type]
> > name = rte_mbuf)...
>
> I am unfamiliar with the ABI checker; I'm afraid I have no suggestion to
> offer. Maybe we can just ignore the failure for this one series when we
> decide it is ready to be merged, and not suppress the checker?

The ABI check compares a current build with a (cached) reference build.
There is no "let's ignore this specific error" mechanism at the moment.
And I suspect it would be non-trivial to add (parsing abidiff text
output... brrr).

Changing the check so that it compares against origin/main (for
example) every time is doable *on paper*, but it would consume a lot
of cpu for maintainers (like Thomas, Ferruh or me) and the CI.
CI scripts would have to be updated too.


For now, one thing we can do is to change the reference to point at
the exact commit that introduces a change we know is safe.
This requires a little sync between people (maintainers / users of
test-meson-builds.sh) and UNH CI, but this is doable.

On the other hand, by the time we merge this series, libabigail may
have fixed this for us already? ;-)


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* [DPDK/ethdev Bug 1386] [dpdk-24.03] [ABI][meson test] driver-tests/link_bonding_autotest test failed: Segmentation fault when do ABI testing
@ 2024-02-28  3:18  9% bugzilla
  0 siblings, 0 replies; 200+ results
From: bugzilla @ 2024-02-28  3:18 UTC (permalink / raw)
  To: dev

[-- Attachment #1: Type: text/plain, Size: 4592 bytes --]

https://bugs.dpdk.org/show_bug.cgi?id=1386

            Bug ID: 1386
           Summary: [dpdk-24.03] [ABI][meson test]
                    driver-tests/link_bonding_autotest test failed:
                    Segmentation fault when do ABI testing
           Product: DPDK
           Version: unspecified
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: yux.jiang@intel.com
  Target Milestone: ---

[Environment]

DPDK version: 92c0ad70ca version: 24.03-rc1
OS: RHEL9.0/5.14.0-70.13.1.el9_0.x86_64
Compiler: gcc version 11.2.1
Hardware platform: Intel(R) Xeon(R) Platinum 8180 CPU @ 2.50GHz
NIC hardware: Ethernet Controller XL710 for 40GbE QSFP+ 1583
NIC firmware: 
driver: i40e
version: 2.24.6
firmware-version: 9.40 0x8000ece4 1.3429.0

[Test Setup]
Steps to reproduce
List the steps to reproduce the issue.

1, Build latest main dpdk24.03-rc1
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib  --default-library=shared
x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc
rm -rf /root/tmp/dpdk_share_lib /root/shared_lib_dpdk
DESTDIR=/root/tmp/dpdk_share_lib ninja -C x86_64-native-linuxapp-gcc -j 110
install
mv /root/tmp/dpdk_share_lib/usr/local/lib /root/shared_lib_dpdk
ll /root/shared_lib_dpdk
cat /root/.bashrc | grep LD_LIBRARY_PATH
sed -i 's#export LD_LIBRARY_PATH=.*#export
LD_LIBRARY_PATH=/root/shared_lib_dpdk#g' /root/.bashrc

2, Build LTS dpdk23.11.0
rm /root/dpdk
tar zxvf dpdk_abi.tar.gz -C ~
cd ~/dpdk/
rm -rf x86_64-native-linuxapp-gcc
CC=gcc meson -Denable_kmods=True -Dlibdir=lib  --default-library=shared
x86_64-native-linuxapp-gcc
ninja -C x86_64-native-linuxapp-gcc
rm -rf x86_64-native-linuxapp-gcc/lib
rm -rf x86_64-native-linuxapp-gcc/drivers

3, Bind nic
rmmod vfio_pci
rmmod vfio_iommu_type1
rmmod vfio
modprobe vfio
modprobe vfio-pci
usertools/dpdk-devbind.py --force --bind=vfio-pci 0000:18:00.0 0000:1a:00.0

4, Launch dpdk-test and run link_bonding_autotest
x86_64-native-linuxapp-gcc/app/dpdk-test -c 0xff -d /root/shared_lib_dpdk -a
0000:18:00.0 -a 0000:1a:00.0
RTE>>link_bonding_autotest

Show the output from the previous commands.
[root@ABI-80 dpdk]# x86_64-native-linuxapp-gcc/app/dpdk-test -c 0xff -d
/root/shared_lib_dpdk -a 0000:18:00.0 -a 0000:1a:00.0
EAL: Detected CPU lcores: 112
EAL: Detected NUMA nodes: 2
EAL: Detected shared linkage of DPDK
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: VFIO support initialized
EAL: Using IOMMU type 1 (Type 1)
EAL: Ignore mapping IO port bar(1)
EAL: Ignore mapping IO port bar(4)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0000:18:00.0 (socket 0)
i40e_GLQF_reg_init(): i40e device 0000:18:00.0 changed global register
[0x002689a0]. original: 0x00000021, new: 0x00000029
EAL: Ignore mapping IO port bar(1)
EAL: Ignore mapping IO port bar(4)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0000:1a:00.0 (socket 0)
i40e_GLQF_reg_init(): i40e device 0000:1a:00.0 changed global register
[0x002689a0]. original: 0x00000021, new: 0x00000029
TELEMETRY: No legacy callbacks, legacy socket not created
APP: HPET is not enabled, using TSC as default timer
RTE>>link_bonding_autotest
 + ------------------------------------------------------- +
 + Test Suite : Link Bonding Unit Test Suite
Segmentation fault (core dumped)

[Expected Result]
Test ok.

[Regression]
Is this issue a regression: (Y/N) Y
The first bad commit:
commit d4b9235f95de4f46f368627af256ed8080f20d65
Author: Jerin Jacob <jerinj@marvell.com>
Date:   Thu Jan 18 15:17:42 2024 +0530

    ethdev: add Tx queue used count query

    Introduce a new API to retrieve the number of used descriptors
    in a Tx queue. Applications can leverage this API in the fast path to
    inspect the Tx queue occupancy and take appropriate actions based on the
    available free descriptors.

    A notable use case could be implementing Random Early Discard (RED)
    in software based on Tx queue occupancy.

    Signed-off-by: Jerin Jacob <jerinj@marvell.com>
    Reviewed-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
    Acked-by: Morten Brørup <mb@smartsharesystems.com>
    Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>
    Reviewed-by: Ferruh Yigit <ferruh.yigit@amd.com>

-- 
You are receiving this mail because:
You are the assignee for the bug.


^ permalink raw reply	[relevance 9%]

* [PATCH v3 4/6] pipeline: replace zero length array with flex array
  @ 2024-02-27 23:56  4%   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-02-27 23:56 UTC (permalink / raw)
  To: dev
  Cc: Bruce Richardson, Cristian Dumitrescu, Honnappa Nagarahalli,
	Sameh Gobriel, Vladimir Medvedkin, Yipeng Wang, mb, fengchengwen,
	Tyler Retzlaff

Zero-length arrays are a GNU extension. Replace them with
standard flexible array members.

Add a temporary suppression for rte_pipeline_table_entry
due to a libabigail bug:

Bugzilla ID: https://sourceware.org/bugzilla/show_bug.cgi?id=31377

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Reviewed-by: Morten Brørup <mb@smartsharesystems.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
---
 devtools/libabigail.abignore      | 2 ++
 lib/pipeline/rte_pipeline.h       | 2 +-
 lib/pipeline/rte_port_in_action.c | 2 +-
 3 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 645d289..2a23d53 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -33,6 +33,8 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+[suppress_type]
+	name = rte_pipeline_table_entry
 
 [suppress_type]
 	name = rte_eth_fp_ops
diff --git a/lib/pipeline/rte_pipeline.h b/lib/pipeline/rte_pipeline.h
index ec51b9b..0c7994b 100644
--- a/lib/pipeline/rte_pipeline.h
+++ b/lib/pipeline/rte_pipeline.h
@@ -220,7 +220,7 @@ struct rte_pipeline_table_entry {
 		uint32_t table_id;
 	};
 	/** Start of table entry area for user defined actions and meta-data */
-	__extension__ uint8_t action_data[0];
+	uint8_t action_data[];
 };
 
 /**
diff --git a/lib/pipeline/rte_port_in_action.c b/lib/pipeline/rte_port_in_action.c
index 5818973..ebd9b9a 100644
--- a/lib/pipeline/rte_port_in_action.c
+++ b/lib/pipeline/rte_port_in_action.c
@@ -282,7 +282,7 @@ struct rte_port_in_action_profile *
 struct rte_port_in_action {
 	struct ap_config cfg;
 	struct ap_data data;
-	uint8_t memory[0] __rte_cache_aligned;
+	uint8_t memory[] __rte_cache_aligned;
 };
 
 static __rte_always_inline void *
-- 
1.8.3.1


^ permalink raw reply	[relevance 4%]

* Re: [PATCH v6 20/23] mbuf: remove and stop using rte marker fields
  2024-02-27 15:18  4%     ` David Marchand
  2024-02-27 16:04  3%       ` Morten Brørup
@ 2024-02-27 17:23  4%       ` Tyler Retzlaff
  2024-02-28 10:42  3%         ` David Marchand
  2024-02-28 14:03  3%       ` Dodji Seketeli
  2 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-02-27 17:23 UTC (permalink / raw)
  To: David Marchand
  Cc: Dodji Seketeli, dev, Ajit Khaparde, Andrew Boyer,
	Andrew Rybchenko, Bruce Richardson, Chenbo Xia, Chengwen Feng,
	Dariusz Sosnowski, David Christensen, Hyong Youb Kim,
	Jerin Jacob, Jie Hai, Jingjing Wu, John Daley, Kevin Laatz,
	Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj, Matan Azrad,
	Maxime Coquelin, Nithin Dabilpuram, Ori Kam, Ruifeng Wang,
	Satha Rao, Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb

On Tue, Feb 27, 2024 at 04:18:10PM +0100, David Marchand wrote:
> Hello Dodji,
> 
> On Tue, Feb 27, 2024 at 6:44 AM Tyler Retzlaff
> <roretzla@linux.microsoft.com> wrote:
> >
> > RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> > RTE_MARKER fields from rte_mbuf struct.
> >
> > Maintain alignment of fields after removed cacheline1 marker by placing
> > C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> >
> > Update the implementation of the rte_mbuf_prefetch_part1() and
> > rte_mbuf_prefetch_part2() inline functions to calculate the pointers for
> > prefetch of cacheline0 and cacheline1 without using the removed markers.
> >
> > Update static_assert of rte_mbuf struct fields to reference data_off and
> > packet_type fields that occupy the original offsets of the marker
> > fields.
> >
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> 
> This change is reported as a potential ABI change.
> 
> For the context, this patch
> https://patchwork.dpdk.org/project/dpdk/patch/1709012499-12813-21-git-send-email-roretzla@linux.microsoft.com/
> removes null-sized markers (those fields were using RTE_MARKER, see
> https://git.dpdk.org/dpdk/tree/lib/eal/include/rte_common.h#n583) from
> the rte_mbuf struct.
> I would argue this change do not impact ABI as the layout of the mbuf
> object is not impacted.

It isn't a surprise that the change got flagged, because the 0-sized
fields being removed are probably not something the checker understands.
So no ABI change, just an API break (as was requested).

> As reported by the CI:
> 
>   [C] 'function const rte_eth_rxtx_callback*
> rte_eth_add_first_rx_callback(uint16_t, uint16_t, rte_rx_callback_fn,
> void*)' at rte_ethdev.c:5768:1 has some indirect sub-type changes:
>     parameter 3 of type 'typedef rte_rx_callback_fn' has sub-type changes:
>       underlying type 'typedef uint16_t (typedef uint16_t, typedef
> uint16_t, rte_mbuf**, typedef uint16_t, typedef uint16_t, void*)*'
> changed:
>         in pointed to type 'function type typedef uint16_t (typedef
> uint16_t, typedef uint16_t, rte_mbuf**, typedef uint16_t, typedef
> uint16_t, void*)':
>           parameter 3 of type 'rte_mbuf**' has sub-type changes:
>             in pointed to type 'rte_mbuf*':
>               in pointed to type 'struct rte_mbuf' at rte_mbuf_core.h:470:1:
>                 type size hasn't changed
>                 4 data member deletions:
>                   'RTE_MARKER cacheline0', at offset 0 (in bits) at
> rte_mbuf_core.h:467:1
>                   'RTE_MARKER64 rearm_data', at offset 128 (in bits)
> at rte_mbuf_core.h:490:1
>                   'RTE_MARKER rx_descriptor_fields1', at offset 256
> (in bits) at rte_mbuf_core.h:517:1
>                   'RTE_MARKER cacheline1', at offset 512 (in bits) at
> rte_mbuf_core.h:598:1
>                 no data member change (1 filtered);
> 
> Error: ABI issue reported for abidiff --suppr
> /home/runner/work/dpdk/dpdk/devtools/libabigail.abignore
> --no-added-syms --headers-dir1 reference/usr/local/include
> --headers-dir2 install/usr/local/include
> reference/usr/local/lib/librte_ethdev.so.24.0
> install/usr/local/lib/librte_ethdev.so.24.1
> ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> this as a potential issue).
> 
> Opinions?
> 
> Btw, I see no way to suppress this (except a global [suppress_type]
> name = rte_mbuf)...

I am unfamiliar with the ABI checker, so I'm afraid I have no suggestion
to offer. Maybe we can just ignore the failure for this one series when
we decide it is ready to be merged, rather than suppress the checker?

> 
> 
> -- 
> David Marchand

^ permalink raw reply	[relevance 4%]

* RE: [PATCH v6 20/23] mbuf: remove and stop using rte marker fields
  2024-02-27 15:18  4%     ` David Marchand
@ 2024-02-27 16:04  3%       ` Morten Brørup
  2024-02-27 17:23  4%       ` Tyler Retzlaff
  2024-02-28 14:03  3%       ` Dodji Seketeli
  2 siblings, 0 replies; 200+ results
From: Morten Brørup @ 2024-02-27 16:04 UTC (permalink / raw)
  To: David Marchand, Dodji Seketeli
  Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
	David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
	Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
	Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
	Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
	Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, Tyler Retzlaff

> From: David Marchand [mailto:david.marchand@redhat.com]
> Sent: Tuesday, 27 February 2024 16.18
> 
> Hello Dodji,
> 
> On Tue, Feb 27, 2024 at 6:44 AM Tyler Retzlaff
> <roretzla@linux.microsoft.com> wrote:
> >
> > RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> > RTE_MARKER fields from rte_mbuf struct.
> >
> > Maintain alignment of fields after removed cacheline1 marker by
> placing
> > C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
> >
> > Update implementation of the rte_mbuf_prefetch_part1() and
> > rte_mbuf_prefetch_part2() inline functions to calculate the pointer
> > for prefetch of cacheline0 and cacheline1 without using the removed
> > markers.
> >
> > Update static_assert of rte_mbuf struct fields to reference data_off
> and
> > packet_type fields that occupy the original offsets of the marker
> > fields.
> >
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> 
> This change is reported as a potential ABI change.
> 
> For the context, this patch
> https://patchwork.dpdk.org/project/dpdk/patch/1709012499-12813-21-git-
> send-email-roretzla@linux.microsoft.com/
> removes null-sized markers (those fields were using RTE_MARKER, see
> https://git.dpdk.org/dpdk/tree/lib/eal/include/rte_common.h#n583) from
> the rte_mbuf struct.
> I would argue this change does not impact the ABI, as the layout of
> the mbuf object is not impacted.
> 
> As reported by the CI:
> 
>   [C] 'function const rte_eth_rxtx_callback*
> rte_eth_add_first_rx_callback(uint16_t, uint16_t, rte_rx_callback_fn,
> void*)' at rte_ethdev.c:5768:1 has some indirect sub-type changes:
>     parameter 3 of type 'typedef rte_rx_callback_fn' has sub-type
> changes:
>       underlying type 'typedef uint16_t (typedef uint16_t, typedef
> uint16_t, rte_mbuf**, typedef uint16_t, typedef uint16_t, void*)*'
> changed:
>         in pointed to type 'function type typedef uint16_t (typedef
> uint16_t, typedef uint16_t, rte_mbuf**, typedef uint16_t, typedef
> uint16_t, void*)':
>           parameter 3 of type 'rte_mbuf**' has sub-type changes:
>             in pointed to type 'rte_mbuf*':
>               in pointed to type 'struct rte_mbuf' at
> rte_mbuf_core.h:470:1:
>                 type size hasn't changed
>                 4 data member deletions:
>                   'RTE_MARKER cacheline0', at offset 0 (in bits) at
> rte_mbuf_core.h:467:1
>                   'RTE_MARKER64 rearm_data', at offset 128 (in bits)
> at rte_mbuf_core.h:490:1
>                   'RTE_MARKER rx_descriptor_fields1', at offset 256
> (in bits) at rte_mbuf_core.h:517:1
>                   'RTE_MARKER cacheline1', at offset 512 (in bits) at
> rte_mbuf_core.h:598:1
>                 no data member change (1 filtered);
> 
> Error: ABI issue reported for abidiff --suppr
> /home/runner/work/dpdk/dpdk/devtools/libabigail.abignore
> --no-added-syms --headers-dir1 reference/usr/local/include
> --headers-dir2 install/usr/local/include
> reference/usr/local/lib/librte_ethdev.so.24.0
> install/usr/local/lib/librte_ethdev.so.24.1
> ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
> this as a potential issue).
> 
> Opinions?

Agreed: not an ABI change, only an API change.

> 
> Btw, I see no way to suppress this (except a global [suppress_type]
> name = rte_mbuf)...
> 
> 
> --
> David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v6 20/23] mbuf: remove and stop using rte marker fields
  @ 2024-02-27 15:18  4%     ` David Marchand
  2024-02-27 16:04  3%       ` Morten Brørup
                         ` (2 more replies)
    1 sibling, 3 replies; 200+ results
From: David Marchand @ 2024-02-27 15:18 UTC (permalink / raw)
  To: Dodji Seketeli
  Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
	David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
	Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
	Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
	Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
	Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb,
	Tyler Retzlaff

Hello Dodji,

On Tue, Feb 27, 2024 at 6:44 AM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> RTE_MARKER typedefs are a GCC extension unsupported by MSVC. Remove
> RTE_MARKER fields from rte_mbuf struct.
>
> Maintain alignment of fields after removed cacheline1 marker by placing
> C11 alignas(RTE_CACHE_LINE_MIN_SIZE).
>
> Update implementation of the rte_mbuf_prefetch_part1() and
> rte_mbuf_prefetch_part2() inline functions to calculate the pointer for
> prefetch of cacheline0 and cacheline1 without using the removed markers.
>
> Update static_assert of rte_mbuf struct fields to reference data_off and
> packet_type fields that occupy the original offsets of the marker
> fields.
>
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>

This change is reported as a potential ABI change.

For the context, this patch
https://patchwork.dpdk.org/project/dpdk/patch/1709012499-12813-21-git-send-email-roretzla@linux.microsoft.com/
removes null-sized markers (those fields were using RTE_MARKER, see
https://git.dpdk.org/dpdk/tree/lib/eal/include/rte_common.h#n583) from
the rte_mbuf struct.
I would argue this change does not impact the ABI, as the layout of the
mbuf object is not impacted.

As reported by the CI:

  [C] 'function const rte_eth_rxtx_callback*
rte_eth_add_first_rx_callback(uint16_t, uint16_t, rte_rx_callback_fn,
void*)' at rte_ethdev.c:5768:1 has some indirect sub-type changes:
    parameter 3 of type 'typedef rte_rx_callback_fn' has sub-type changes:
      underlying type 'typedef uint16_t (typedef uint16_t, typedef
uint16_t, rte_mbuf**, typedef uint16_t, typedef uint16_t, void*)*'
changed:
        in pointed to type 'function type typedef uint16_t (typedef
uint16_t, typedef uint16_t, rte_mbuf**, typedef uint16_t, typedef
uint16_t, void*)':
          parameter 3 of type 'rte_mbuf**' has sub-type changes:
            in pointed to type 'rte_mbuf*':
              in pointed to type 'struct rte_mbuf' at rte_mbuf_core.h:470:1:
                type size hasn't changed
                4 data member deletions:
                  'RTE_MARKER cacheline0', at offset 0 (in bits) at
rte_mbuf_core.h:467:1
                  'RTE_MARKER64 rearm_data', at offset 128 (in bits)
at rte_mbuf_core.h:490:1
                  'RTE_MARKER rx_descriptor_fields1', at offset 256
(in bits) at rte_mbuf_core.h:517:1
                  'RTE_MARKER cacheline1', at offset 512 (in bits) at
rte_mbuf_core.h:598:1
                no data member change (1 filtered);

Error: ABI issue reported for abidiff --suppr
/home/runner/work/dpdk/dpdk/devtools/libabigail.abignore
--no-added-syms --headers-dir1 reference/usr/local/include
--headers-dir2 install/usr/local/include
reference/usr/local/lib/librte_ethdev.so.24.0
install/usr/local/lib/librte_ethdev.so.24.1
ABIDIFF_ABI_CHANGE, this change requires a review (abidiff flagged
this as a potential issue).

Opinions?

Btw, I see no way to suppress this (except a global [suppress_type]
name = rte_mbuf)...


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* RE: [PATCH v6 27/39] mempool: use C11 alignas
  2024-02-26 18:25  3%   ` [PATCH v6 27/39] mempool: " Tyler Retzlaff
@ 2024-02-27  9:42  0%     ` Konstantin Ananyev
  0 siblings, 0 replies; 200+ results
From: Konstantin Ananyev @ 2024-02-27  9:42 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Andrew Rybchenko, Bruce Richardson, Fengchengwen,
	Cristian Dumitrescu, David Christensen, David Hunt, Ferruh Yigit,
	Honnappa Nagarahalli, Jasvinder Singh, Jerin Jacob, Kevin Laatz,
	Konstantin Ananyev, Min Zhou, Ruifeng Wang, Sameh Gobriel,
	Stanislaw Kardach, Thomas Monjalon, Vladimir Medvedkin,
	Yipeng Wang



> Subject: [PATCH v6 27/39] mempool: use C11 alignas
> 
> The current location used for __rte_aligned(a) for alignment of types
> and variables is not compatible with MSVC. There is only a single
> location accepted by both toolchains.
> 
> For variables standard C11 offers alignas(a) supported by conformant
> compilers i.e. both MSVC and GCC.
> 
> For types the standard offers no alignment facility that interoperates
> compatibly between C and C++, but the same effect may be achieved by
> relocating the placement of __rte_aligned(a) to the aforementioned
> location accepted by all currently supported toolchains.
> 
> To allow alignment for both compilers, do the following:
> 
> * Move __rte_aligned from the end of {struct,union} definitions to
>   be between {struct,union} and tag.
> 
>   The placement between {struct,union} and the tag allows the desired
>   alignment to be imparted on the type regardless of the toolchain being
>   used for all of GCC, LLVM, MSVC compilers building both C and C++.
> 
> * Replace use of __rte_aligned(a) on variables/fields with alignas(a).
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/mempool/rte_mempool.h | 27 ++++++++++++++-------------
>  1 file changed, 14 insertions(+), 13 deletions(-)
> 
> diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
> index 6fa4d48..23fd5c8 100644
> --- a/lib/mempool/rte_mempool.h
> +++ b/lib/mempool/rte_mempool.h
> @@ -34,6 +34,7 @@
>   * user cache created with rte_mempool_cache_create().
>   */
> 
> +#include <stdalign.h>
>  #include <stdio.h>
>  #include <stdint.h>
>  #include <inttypes.h>
> @@ -66,7 +67,7 @@
>   * captured since they can be calculated from other stats.
>   * For example: put_cache_objs = put_objs - put_common_pool_objs.
>   */
> -struct rte_mempool_debug_stats {
> +struct __rte_cache_aligned rte_mempool_debug_stats {
>  	uint64_t put_bulk;             /**< Number of puts. */
>  	uint64_t put_objs;             /**< Number of objects successfully put. */
>  	uint64_t put_common_pool_bulk; /**< Number of bulks enqueued in common pool. */
> @@ -80,13 +81,13 @@ struct rte_mempool_debug_stats {
>  	uint64_t get_success_blks;     /**< Successful allocation number of contiguous blocks. */
>  	uint64_t get_fail_blks;        /**< Failed allocation number of contiguous blocks. */
>  	RTE_CACHE_GUARD;
> -} __rte_cache_aligned;
> +};
>  #endif
> 
>  /**
>   * A structure that stores a per-core object cache.
>   */
> -struct rte_mempool_cache {
> +struct __rte_cache_aligned rte_mempool_cache {
>  	uint32_t size;	      /**< Size of the cache */
>  	uint32_t flushthresh; /**< Threshold before we flush excess elements */
>  	uint32_t len;	      /**< Current cache count */
> @@ -109,8 +110,8 @@ struct rte_mempool_cache {
>  	 * Cache is allocated to this size to allow it to overflow in certain
>  	 * cases to avoid needless emptying of cache.
>  	 */
> -	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
> -} __rte_cache_aligned;
> +	alignas(RTE_CACHE_LINE_SIZE) void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2];
> +};
> 
>  /**
>   * A structure that stores the size of mempool elements.
> @@ -218,15 +219,15 @@ struct rte_mempool_memhdr {
>   * The structure is cache-line aligned to avoid ABI breakages in
>   * a number of cases when something small is added.
>   */
> -struct rte_mempool_info {
> +struct __rte_cache_aligned rte_mempool_info {
>  	/** Number of objects in the contiguous block */
>  	unsigned int contig_block_size;
> -} __rte_cache_aligned;
> +};
> 
>  /**
>   * The RTE mempool structure.
>   */
> -struct rte_mempool {
> +struct __rte_cache_aligned rte_mempool {
>  	char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
>  	union {
>  		void *pool_data;         /**< Ring or pool to store objects. */
> @@ -268,7 +269,7 @@ struct rte_mempool {
>  	 */
>  	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE + 1];
>  #endif
> -}  __rte_cache_aligned;
> +};
> 
>  /** Spreading among memory channels not required. */
>  #define RTE_MEMPOOL_F_NO_SPREAD		0x0001
> @@ -688,7 +689,7 @@ typedef int (*rte_mempool_get_info_t)(const struct rte_mempool *mp,
> 
> 
>  /** Structure defining mempool operations structure */
> -struct rte_mempool_ops {
> +struct __rte_cache_aligned rte_mempool_ops {
>  	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
>  	rte_mempool_alloc_t alloc;       /**< Allocate private data. */
>  	rte_mempool_free_t free;         /**< Free the external pool. */
> @@ -713,7 +714,7 @@ struct rte_mempool_ops {
>  	 * Dequeue a number of contiguous object blocks.
>  	 */
>  	rte_mempool_dequeue_contig_blocks_t dequeue_contig_blocks;
> -} __rte_cache_aligned;
> +};
> 
>  #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
> 
> @@ -726,14 +727,14 @@ struct rte_mempool_ops {
>   * any function pointers stored directly in the mempool struct would not be.
>   * This results in us simply having "ops_index" in the mempool struct.
>   */
> -struct rte_mempool_ops_table {
> +struct __rte_cache_aligned rte_mempool_ops_table {
>  	rte_spinlock_t sl;     /**< Spinlock for add/delete. */
>  	uint32_t num_ops;      /**< Number of used ops structs in the table. */
>  	/**
>  	 * Storage for all possible ops structs.
>  	 */
>  	struct rte_mempool_ops ops[RTE_MEMPOOL_MAX_OPS_IDX];
> -} __rte_cache_aligned;
> +};
> 
>  /** Array of registered ops structs. */
>  extern struct rte_mempool_ops_table rte_mempool_ops_table;
> --

Acked-by: Konstantin Ananyev <konstantin.ananyev@huawei.com>

> 1.8.3.1


^ permalink raw reply	[relevance 0%]

* [PATCH v6 27/39] mempool: use C11 alignas
  @ 2024-02-26 18:25  3%   ` Tyler Retzlaff
  2024-02-27  9:42  0%     ` Konstantin Ananyev
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-02-26 18:25 UTC (permalink / raw)
  To: dev
  Cc: Andrew Rybchenko, Bruce Richardson, Chengwen Feng,
	Cristian Dumitrescu, David Christensen, David Hunt, Ferruh Yigit,
	Honnappa Nagarahalli, Jasvinder Singh, Jerin Jacob, Kevin Laatz,
	Konstantin Ananyev, Min Zhou, Ruifeng Wang, Sameh Gobriel,
	Stanislaw Kardach, Thomas Monjalon, Vladimir Medvedkin,
	Yipeng Wang, Tyler Retzlaff

The current location used for __rte_aligned(a) for alignment of types
and variables is not compatible with MSVC. There is only a single
location accepted by both toolchains.

For variables standard C11 offers alignas(a) supported by conformant
compilers i.e. both MSVC and GCC.

For types the standard offers no alignment facility that interoperates
compatibly between C and C++, but the same effect may be achieved by
relocating the placement of __rte_aligned(a) to the aforementioned
location accepted by all currently supported toolchains.

To allow alignment for both compilers, do the following:

* Move __rte_aligned from the end of {struct,union} definitions to
  be between {struct,union} and tag.

  The placement between {struct,union} and the tag allows the desired
  alignment to be imparted on the type regardless of the toolchain being
  used for all of GCC, LLVM, MSVC compilers building both C and C++.

* Replace use of __rte_aligned(a) on variables/fields with alignas(a).

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/rte_mempool.h | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 6fa4d48..23fd5c8 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -34,6 +34,7 @@
  * user cache created with rte_mempool_cache_create().
  */
 
+#include <stdalign.h>
 #include <stdio.h>
 #include <stdint.h>
 #include <inttypes.h>
@@ -66,7 +67,7 @@
  * captured since they can be calculated from other stats.
  * For example: put_cache_objs = put_objs - put_common_pool_objs.
  */
-struct rte_mempool_debug_stats {
+struct __rte_cache_aligned rte_mempool_debug_stats {
 	uint64_t put_bulk;             /**< Number of puts. */
 	uint64_t put_objs;             /**< Number of objects successfully put. */
 	uint64_t put_common_pool_bulk; /**< Number of bulks enqueued in common pool. */
@@ -80,13 +81,13 @@ struct rte_mempool_debug_stats {
 	uint64_t get_success_blks;     /**< Successful allocation number of contiguous blocks. */
 	uint64_t get_fail_blks;        /**< Failed allocation number of contiguous blocks. */
 	RTE_CACHE_GUARD;
-} __rte_cache_aligned;
+};
 #endif
 
 /**
  * A structure that stores a per-core object cache.
  */
-struct rte_mempool_cache {
+struct __rte_cache_aligned rte_mempool_cache {
 	uint32_t size;	      /**< Size of the cache */
 	uint32_t flushthresh; /**< Threshold before we flush excess elements */
 	uint32_t len;	      /**< Current cache count */
@@ -109,8 +110,8 @@ struct rte_mempool_cache {
 	 * Cache is allocated to this size to allow it to overflow in certain
 	 * cases to avoid needless emptying of cache.
 	 */
-	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
-} __rte_cache_aligned;
+	alignas(RTE_CACHE_LINE_SIZE) void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2];
+};
 
 /**
  * A structure that stores the size of mempool elements.
@@ -218,15 +219,15 @@ struct rte_mempool_memhdr {
  * The structure is cache-line aligned to avoid ABI breakages in
  * a number of cases when something small is added.
  */
-struct rte_mempool_info {
+struct __rte_cache_aligned rte_mempool_info {
 	/** Number of objects in the contiguous block */
 	unsigned int contig_block_size;
-} __rte_cache_aligned;
+};
 
 /**
  * The RTE mempool structure.
  */
-struct rte_mempool {
+struct __rte_cache_aligned rte_mempool {
 	char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
 	union {
 		void *pool_data;         /**< Ring or pool to store objects. */
@@ -268,7 +269,7 @@ struct rte_mempool {
 	 */
 	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE + 1];
 #endif
-}  __rte_cache_aligned;
+};
 
 /** Spreading among memory channels not required. */
 #define RTE_MEMPOOL_F_NO_SPREAD		0x0001
@@ -688,7 +689,7 @@ typedef int (*rte_mempool_get_info_t)(const struct rte_mempool *mp,
 
 
 /** Structure defining mempool operations structure */
-struct rte_mempool_ops {
+struct __rte_cache_aligned rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
 	rte_mempool_alloc_t alloc;       /**< Allocate private data. */
 	rte_mempool_free_t free;         /**< Free the external pool. */
@@ -713,7 +714,7 @@ struct rte_mempool_ops {
 	 * Dequeue a number of contiguous object blocks.
 	 */
 	rte_mempool_dequeue_contig_blocks_t dequeue_contig_blocks;
-} __rte_cache_aligned;
+};
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
 
@@ -726,14 +727,14 @@ struct rte_mempool_ops {
  * any function pointers stored directly in the mempool struct would not be.
  * This results in us simply having "ops_index" in the mempool struct.
  */
-struct rte_mempool_ops_table {
+struct __rte_cache_aligned rte_mempool_ops_table {
 	rte_spinlock_t sl;     /**< Spinlock for add/delete. */
 	uint32_t num_ops;      /**< Number of used ops structs in the table. */
 	/**
 	 * Storage for all possible ops structs.
 	 */
 	struct rte_mempool_ops ops[RTE_MEMPOOL_MAX_OPS_IDX];
-} __rte_cache_aligned;
+};
 
 /** Array of registered ops structs. */
 extern struct rte_mempool_ops_table rte_mempool_ops_table;
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v4 1/7] ethdev: support report register names and filter
  2024-02-26  3:07  8%   ` [PATCH v4 1/7] ethdev: support report register names and filter Jie Hai
@ 2024-02-26  8:01  0%     ` fengchengwen
  2024-03-06  7:22  0%       ` Jie Hai
  2024-02-29  9:52  3%     ` Thomas Monjalon
  1 sibling, 1 reply; 200+ results
From: fengchengwen @ 2024-02-26  8:01 UTC (permalink / raw)
  To: Jie Hai, dev; +Cc: lihuisong, liuyonglong, huangdengdui, ferruh.yigit

Hi Jie,

On 2024/2/26 11:07, Jie Hai wrote:
> This patch adds "filter" and "names" fields to "rte_dev_reg_info"
> structure. Names of registers in data fields can be reported and
> the registers can be filtered by their names.
> 
> The new API rte_eth_dev_get_reg_info_ext() is added to support
> reporting names and filtering by names, while the original API
> rte_eth_dev_get_reg_info() does not use the name and filter fields.
> A local variable is used in rte_eth_dev_get_reg_info for
> compatibility. If a driver does not report the names, they are set
> to "offset_XXX".
> 
> Signed-off-by: Jie Hai <haijie1@huawei.com>
> ---
>  doc/guides/rel_notes/release_24_03.rst |  8 ++++++
>  lib/ethdev/rte_dev_info.h              | 11 +++++++++
>  lib/ethdev/rte_ethdev.c                | 34 ++++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h                | 28 +++++++++++++++++++++
>  lib/ethdev/version.map                 |  1 +
>  5 files changed, 82 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
> index 32d0ad8cf6a7..fa46da427dca 100644
> --- a/doc/guides/rel_notes/release_24_03.rst
> +++ b/doc/guides/rel_notes/release_24_03.rst
> @@ -132,6 +132,11 @@ New Features
>      to support TLS v1.2, TLS v1.3 and DTLS v1.2.
>    * Added PMD API to allow raw submission of instructions to CPT.
>  
> +  * **Added support for dumping registers with names and filter.**
> +
> +    * Added a new API function ``rte_eth_dev_get_reg_info_ext()`` to filter
> +      the registers by their names and get the information of registers (names,
> +      values and other attributes).
>  
>  Removed Items
>  -------------
> @@ -197,6 +202,9 @@ ABI Changes
>  
>  * No ABI change that would break compatibility with 23.11.
>  
> +* ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
> +  structure for reporting names of registers and filtering them by names.
> +
>  
>  Known Issues
>  ------------
> diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
> index 67cf0ae52668..0ad4a43b9526 100644
> --- a/lib/ethdev/rte_dev_info.h
> +++ b/lib/ethdev/rte_dev_info.h
> @@ -11,6 +11,11 @@ extern "C" {
>  
>  #include <stdint.h>
>  
> +#define RTE_ETH_REG_NAME_SIZE 128

Almost all stats name sizes are 64; why not keep this consistent?

> +struct rte_eth_reg_name {
> +	char name[RTE_ETH_REG_NAME_SIZE];
> +};
> +
>  /*
>   * Placeholder for accessing device registers
>   */
> @@ -20,6 +25,12 @@ struct rte_dev_reg_info {
>  	uint32_t length; /**< Number of registers to fetch */
>  	uint32_t width; /**< Size of device register */
>  	uint32_t version; /**< Device version */
> +	/**
> +	 * Filter for target subset of registers.
> +	 * This field could affects register selection for data/length/names.
> +	 */
> +	const char *filter;
> +	struct rte_eth_reg_name *names; /**< Registers name saver */
>  };
>  
>  /*
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index f1c658f49e80..9ef50c633ce3 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -6388,8 +6388,37 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
>  
>  int
>  rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
> +{
> +	struct rte_dev_reg_info reg_info = { 0 };
> +	int ret;
> +
> +	if (info == NULL) {
> +		RTE_ETHDEV_LOG_LINE(ERR,
> +			"Cannot get ethdev port %u register info to NULL",
> +			port_id);
> +		return -EINVAL;
> +	}
> +
> +	reg_info.length = info->length;
> +	reg_info.data = info->data;
> +
> +	ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	info->length = reg_info.length;
> +	info->width = reg_info.width;
> +	info->version = reg_info.version;
> +	info->offset = reg_info.offset;
> +
> +	return 0;
> +}
> +
> +int
> +rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
>  {
>  	struct rte_eth_dev *dev;
> +	uint32_t i;
>  	int ret;
>  
>  	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> @@ -6408,6 +6437,11 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
>  
>  	rte_ethdev_trace_get_reg_info(port_id, info, ret);
>  
> +	/* Report the default names if drivers not report. */
> +	if (info->names != NULL && strlen(info->names[0].name) == 0)
> +		for (i = 0; i < info->length; i++)
> +			snprintf(info->names[i].name, RTE_ETH_REG_NAME_SIZE,
> +				"offset_%x", info->offset + i * info->width);

%x has no "0x" prefix, which may be confusing.
How about using %u?

Another question: if the app doesn't zero the names' memory, its value is random, so this logic will not be entered.
Suggest memsetting item[0]'s name memory before invoking the PMD ops.

>  	return ret;
>  }
>  
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index ed27360447a3..09e2d5fdb49b 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -5066,6 +5066,34 @@ __rte_experimental
>  int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
>  		struct rte_power_monitor_cond *pmc);
>  
> +/**
> + * Retrieve the filtered device registers (values and names) and
> + * register attributes (number of registers and register size)
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param info
> + *   Pointer to rte_dev_reg_info structure to fill in.
> + *   If info->filter is not NULL and the driver does not support names or
> + *   filter, return error. If info->filter is NULL, return info for all
> + *   registers (seen as filter none).
> + *   If info->data is NULL, the function fills in the width and length fields.
> + *   If non-NULL, ethdev considers there are enough spaces to store the
> + *   registers, and the values of registers whose name contains the filter
> + *   string are put into the buffer pointed at by the data field. Do the same
> + *   for the names of registers if info->names is not NULL. If drivers do not
> + *   report names, default names are given by ethdev.

It's a little hard to understand. Suggest using '-' for each field, just like rte_eth_remove_tx_callback.

> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + *   - (-EINVAL) if bad parameter.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EIO) if device is removed.
> + *   - others depends on the specific operations implementation.
> + */
> +__rte_experimental
> +int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
> +
>  /**
>   * Retrieve device registers and register attributes (number of registers and
>   * register size)
> diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
> index 17e4eac8a4cc..c41a64e404db 100644
> --- a/lib/ethdev/version.map
> +++ b/lib/ethdev/version.map
> @@ -325,6 +325,7 @@ EXPERIMENTAL {
>  	rte_flow_template_table_resizable;
>  	rte_flow_template_table_resize;
>  	rte_flow_template_table_resize_complete;
> +	rte_eth_dev_get_reg_info_ext;

This should be placed in alphabetical order.

Thanks

>  };
>  
>  INTERNAL {
> 

^ permalink raw reply	[relevance 0%]

* [PATCH v4 1/7] ethdev: support report register names and filter
  @ 2024-02-26  3:07  8%   ` Jie Hai
  2024-02-26  8:01  0%     ` fengchengwen
  2024-02-29  9:52  3%     ` Thomas Monjalon
  0 siblings, 2 replies; 200+ results
From: Jie Hai @ 2024-02-26  3:07 UTC (permalink / raw)
  To: dev; +Cc: lihuisong, fengchengwen, liuyonglong, huangdengdui, ferruh.yigit

This patch adds "filter" and "names" fields to "rte_dev_reg_info"
structure. Names of registers in data fields can be reported and
the registers can be filtered by their names.

The new API rte_eth_dev_get_reg_info_ext() is added to support
reporting names and filtering by names, while the original API
rte_eth_dev_get_reg_info() does not use the name and filter fields.
A local variable is used in rte_eth_dev_get_reg_info for
compatibility. If a driver does not report the names, they are set
to "offset_XXX".

Signed-off-by: Jie Hai <haijie1@huawei.com>
---
 doc/guides/rel_notes/release_24_03.rst |  8 ++++++
 lib/ethdev/rte_dev_info.h              | 11 +++++++++
 lib/ethdev/rte_ethdev.c                | 34 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 28 +++++++++++++++++++++
 lib/ethdev/version.map                 |  1 +
 5 files changed, 82 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 32d0ad8cf6a7..fa46da427dca 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -132,6 +132,11 @@ New Features
     to support TLS v1.2, TLS v1.3 and DTLS v1.2.
   * Added PMD API to allow raw submission of instructions to CPT.
 
+  * **Added support for dumping registers with names and filter.**
+
+    * Added a new API function ``rte_eth_dev_get_reg_info_ext()`` to filter
+      the registers by their names and get the information of registers (names,
+      values and other attributes).
 
 Removed Items
 -------------
@@ -197,6 +202,9 @@ ABI Changes
 
 * No ABI change that would break compatibility with 23.11.
 
+* ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
+  structure for reporting names of registers and filtering them by names.
+
 
 Known Issues
 ------------
diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
index 67cf0ae52668..0ad4a43b9526 100644
--- a/lib/ethdev/rte_dev_info.h
+++ b/lib/ethdev/rte_dev_info.h
@@ -11,6 +11,11 @@ extern "C" {
 
 #include <stdint.h>
 
+#define RTE_ETH_REG_NAME_SIZE 128
+struct rte_eth_reg_name {
+	char name[RTE_ETH_REG_NAME_SIZE];
+};
+
 /*
  * Placeholder for accessing device registers
  */
@@ -20,6 +25,12 @@ struct rte_dev_reg_info {
 	uint32_t length; /**< Number of registers to fetch */
 	uint32_t width; /**< Size of device register */
 	uint32_t version; /**< Device version */
+	/**
+	 * Filter for a target subset of registers.
+	 * This field may affect the register selection for data/length/names.
+	 */
+	const char *filter;
+	struct rte_eth_reg_name *names; /**< Buffer for register names. */
 };
 
 /*
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e80..9ef50c633ce3 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6388,8 +6388,37 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
 
 int
 rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
+{
+	struct rte_dev_reg_info reg_info = { 0 };
+	int ret;
+
+	if (info == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Cannot get ethdev port %u register info to NULL",
+			port_id);
+		return -EINVAL;
+	}
+
+	reg_info.length = info->length;
+	reg_info.data = info->data;
+
+	ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
+	if (ret != 0)
+		return ret;
+
+	info->length = reg_info.length;
+	info->width = reg_info.width;
+	info->version = reg_info.version;
+	info->offset = reg_info.offset;
+
+	return 0;
+}
+
+int
+rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
 {
 	struct rte_eth_dev *dev;
+	uint32_t i;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
@@ -6408,6 +6437,11 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
 
 	rte_ethdev_trace_get_reg_info(port_id, info, ret);
 
+	/* Report default names if the driver does not provide them. */
+	if (info->names != NULL && strlen(info->names[0].name) == 0)
+		for (i = 0; i < info->length; i++)
+			snprintf(info->names[i].name, RTE_ETH_REG_NAME_SIZE,
+				"offset_%x", info->offset + i * info->width);
 	return ret;
 }
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index ed27360447a3..09e2d5fdb49b 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -5066,6 +5066,34 @@ __rte_experimental
 int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
 		struct rte_power_monitor_cond *pmc);
 
+/**
+ * Retrieve the filtered device registers (values and names) and
+ * register attributes (number of registers and register size)
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param info
+ *   Pointer to rte_dev_reg_info structure to fill in.
+ *   If info->filter is not NULL and the driver does not support names or
+ *   filtering, an error is returned. If info->filter is NULL, information
+ *   for all registers is returned (treated as no filter).
+ *   If info->data is NULL, the function fills in the width and length fields.
+ *   If non-NULL, ethdev assumes the buffer is large enough to store the
+ *   registers, and the values of the registers whose names contain the filter
+ *   string are put into the buffer pointed to by the data field. The same is
+ *   done for the register names if info->names is not NULL. If the driver
+ *   does not report names, default names are provided by ethdev.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - others depend on the specific device operation implementation.
+ */
+__rte_experimental
+int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
+
 /**
  * Retrieve device registers and register attributes (number of registers and
  * register size)
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 17e4eac8a4cc..c41a64e404db 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -325,6 +325,7 @@ EXPERIMENTAL {
 	rte_flow_template_table_resizable;
 	rte_flow_template_table_resize;
 	rte_flow_template_table_resize_complete;
+	rte_eth_dev_get_reg_info_ext;
 };
 
 INTERNAL {
-- 
2.30.0


^ permalink raw reply	[relevance 8%]

* [PATCH v5 27/39] mempool: use C11 alignas
  @ 2024-02-23 19:04  3%   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-02-23 19:04 UTC (permalink / raw)
  To: dev
  Cc: Andrew Rybchenko, Bruce Richardson, Chengwen Feng,
	Cristian Dumitrescu, David Christensen, David Hunt, Ferruh Yigit,
	Honnappa Nagarahalli, Jasvinder Singh, Jerin Jacob, Kevin Laatz,
	Konstantin Ananyev, Min Zhou, Ruifeng Wang, Sameh Gobriel,
	Stanislaw Kardach, Thomas Monjalon, Vladimir Medvedkin,
	Yipeng Wang, Tyler Retzlaff

The current location used for __rte_aligned(a) for alignment of types
and variables is not compatible with MSVC. There is only a single
location accepted by both toolchains.

For variables standard C11 offers alignas(a) supported by conformant
compilers i.e. both MSVC and GCC.

For types, the standard offers no alignment facility that interoperates
compatibly between C and C++; the same effect may be achieved by relocating
the placement of __rte_aligned(a) to the aforementioned location accepted
by all currently supported toolchains.

To allow alignment for both compilers, do the following:

* Move __rte_aligned from the end of {struct,union} definitions to
  be between {struct,union} and tag.

  The placement between {struct,union} and the tag allows the desired
  alignment to be imparted on the type regardless of the toolchain being
  used for all of GCC, LLVM, MSVC compilers building both C and C++.

* Replace use of __rte_aligned(a) on variables/fields with alignas(a).

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/rte_mempool.h | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 6fa4d48..23fd5c8 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -34,6 +34,7 @@
  * user cache created with rte_mempool_cache_create().
  */
 
+#include <stdalign.h>
 #include <stdio.h>
 #include <stdint.h>
 #include <inttypes.h>
@@ -66,7 +67,7 @@
  * captured since they can be calculated from other stats.
  * For example: put_cache_objs = put_objs - put_common_pool_objs.
  */
-struct rte_mempool_debug_stats {
+struct __rte_cache_aligned rte_mempool_debug_stats {
 	uint64_t put_bulk;             /**< Number of puts. */
 	uint64_t put_objs;             /**< Number of objects successfully put. */
 	uint64_t put_common_pool_bulk; /**< Number of bulks enqueued in common pool. */
@@ -80,13 +81,13 @@ struct rte_mempool_debug_stats {
 	uint64_t get_success_blks;     /**< Successful allocation number of contiguous blocks. */
 	uint64_t get_fail_blks;        /**< Failed allocation number of contiguous blocks. */
 	RTE_CACHE_GUARD;
-} __rte_cache_aligned;
+};
 #endif
 
 /**
  * A structure that stores a per-core object cache.
  */
-struct rte_mempool_cache {
+struct __rte_cache_aligned rte_mempool_cache {
 	uint32_t size;	      /**< Size of the cache */
 	uint32_t flushthresh; /**< Threshold before we flush excess elements */
 	uint32_t len;	      /**< Current cache count */
@@ -109,8 +110,8 @@ struct rte_mempool_cache {
 	 * Cache is allocated to this size to allow it to overflow in certain
 	 * cases to avoid needless emptying of cache.
 	 */
-	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
-} __rte_cache_aligned;
+	alignas(RTE_CACHE_LINE_SIZE) void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2];
+};
 
 /**
  * A structure that stores the size of mempool elements.
@@ -218,15 +219,15 @@ struct rte_mempool_memhdr {
  * The structure is cache-line aligned to avoid ABI breakages in
  * a number of cases when something small is added.
  */
-struct rte_mempool_info {
+struct __rte_cache_aligned rte_mempool_info {
 	/** Number of objects in the contiguous block */
 	unsigned int contig_block_size;
-} __rte_cache_aligned;
+};
 
 /**
  * The RTE mempool structure.
  */
-struct rte_mempool {
+struct __rte_cache_aligned rte_mempool {
 	char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
 	union {
 		void *pool_data;         /**< Ring or pool to store objects. */
@@ -268,7 +269,7 @@ struct rte_mempool {
 	 */
 	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE + 1];
 #endif
-}  __rte_cache_aligned;
+};
 
 /** Spreading among memory channels not required. */
 #define RTE_MEMPOOL_F_NO_SPREAD		0x0001
@@ -688,7 +689,7 @@ typedef int (*rte_mempool_get_info_t)(const struct rte_mempool *mp,
 
 
 /** Structure defining mempool operations structure */
-struct rte_mempool_ops {
+struct __rte_cache_aligned rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
 	rte_mempool_alloc_t alloc;       /**< Allocate private data. */
 	rte_mempool_free_t free;         /**< Free the external pool. */
@@ -713,7 +714,7 @@ struct rte_mempool_ops {
 	 * Dequeue a number of contiguous object blocks.
 	 */
 	rte_mempool_dequeue_contig_blocks_t dequeue_contig_blocks;
-} __rte_cache_aligned;
+};
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
 
@@ -726,14 +727,14 @@ struct rte_mempool_ops {
  * any function pointers stored directly in the mempool struct would not be.
  * This results in us simply having "ops_index" in the mempool struct.
  */
-struct rte_mempool_ops_table {
+struct __rte_cache_aligned rte_mempool_ops_table {
 	rte_spinlock_t sl;     /**< Spinlock for add/delete. */
 	uint32_t num_ops;      /**< Number of used ops structs in the table. */
 	/**
 	 * Storage for all possible ops structs.
 	 */
 	struct rte_mempool_ops ops[RTE_MEMPOOL_MAX_OPS_IDX];
-} __rte_cache_aligned;
+};
 
 /** Array of registered ops structs. */
 extern struct rte_mempool_ops_table rte_mempool_ops_table;
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* [PATCH v2 1/4] ethdev: add function to check representor port
  @ 2024-02-23  2:42  2%   ` Chaoyong He
  0 siblings, 0 replies; 200+ results
From: Chaoyong He @ 2024-02-23  2:42 UTC (permalink / raw)
  To: dev; +Cc: oss-drivers, Long Wu, Chaoyong He, Peng Zhang

From: Long Wu <long.wu@corigine.com>

Add a function to check whether a device is a representor port, and
modify the related code in the PMDs accordingly.

Signed-off-by: Long Wu <long.wu@corigine.com>
Reviewed-by: Chaoyong He <chaoyong.he@corigine.com>
Reviewed-by: Peng Zhang <peng.zhang@corigine.com>
---
 doc/guides/rel_notes/release_24_03.rst     |  3 +++
 drivers/net/bnxt/bnxt.h                    |  3 ---
 drivers/net/bnxt/bnxt_ethdev.c             |  4 ++--
 drivers/net/bnxt/tf_ulp/bnxt_tf_pmd_shim.c | 12 ++++++------
 drivers/net/bnxt/tf_ulp/bnxt_ulp.c         |  4 ++--
 drivers/net/bnxt/tf_ulp/ulp_def_rules.c    |  4 ++--
 drivers/net/cpfl/cpfl_representor.c        |  2 +-
 drivers/net/enic/enic.h                    |  5 -----
 drivers/net/enic/enic_ethdev.c             |  2 +-
 drivers/net/enic/enic_fm_flow.c            | 20 ++++++++++----------
 drivers/net/enic/enic_main.c               |  4 ++--
 drivers/net/i40e/i40e_ethdev.c             |  2 +-
 drivers/net/ice/ice_dcf_ethdev.c           |  2 +-
 drivers/net/ixgbe/ixgbe_ethdev.c           |  2 +-
 drivers/net/nfp/flower/nfp_flower_flow.c   |  2 +-
 drivers/net/nfp/nfp_mtr.c                  |  2 +-
 drivers/net/nfp/nfp_net_common.c           |  4 ++--
 drivers/net/nfp/nfp_net_flow.c             |  2 +-
 lib/ethdev/ethdev_driver.h                 | 17 +++++++++++++++++
 19 files changed, 54 insertions(+), 42 deletions(-)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 879bb4944c..8178417b98 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -185,6 +185,9 @@ API Changes
 * ethdev: Renamed structure ``rte_flow_action_modify_data`` to be
   ``rte_flow_field_data`` for more generic usage.
 
+* ethdev: Added new function ``rte_eth_dev_is_repr()`` to check if a device
+  is a representor port.
+
 
 ABI Changes
 -----------
diff --git a/drivers/net/bnxt/bnxt.h b/drivers/net/bnxt/bnxt.h
index fcf2b8be97..82036a16a1 100644
--- a/drivers/net/bnxt/bnxt.h
+++ b/drivers/net/bnxt/bnxt.h
@@ -1204,9 +1204,6 @@ extern const struct rte_flow_ops bnxt_flow_meter_ops;
 	} \
 } while (0)
 
-#define	BNXT_ETH_DEV_IS_REPRESENTOR(eth_dev)	\
-		((eth_dev)->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
-
 extern int bnxt_logtype_driver;
 #define RTE_LOGTYPE_BNXT bnxt_logtype_driver
 #define PMD_DRV_LOG_RAW(level, fmt, args...) \
diff --git a/drivers/net/bnxt/bnxt_ethdev.c b/drivers/net/bnxt/bnxt_ethdev.c
index f8d83662f4..825e9c1941 100644
--- a/drivers/net/bnxt/bnxt_ethdev.c
+++ b/drivers/net/bnxt/bnxt_ethdev.c
@@ -3525,7 +3525,7 @@ bnxt_flow_ops_get_op(struct rte_eth_dev *dev,
 	if (!bp)
 		return -EIO;
 
-	if (BNXT_ETH_DEV_IS_REPRESENTOR(dev)) {
+	if (rte_eth_dev_is_repr(dev)) {
 		struct bnxt_representor *vfr = dev->data->dev_private;
 		bp = vfr->parent_dev->data->dev_private;
 		/* parent is deleted while children are still valid */
@@ -6781,7 +6781,7 @@ static int bnxt_pci_remove(struct rte_pci_device *pci_dev)
 
 	PMD_DRV_LOG(DEBUG, "BNXT Port:%d pci remove\n", eth_dev->data->port_id);
 	if (rte_eal_process_type() == RTE_PROC_PRIMARY) {
-		if (eth_dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+		if (rte_eth_dev_is_repr(eth_dev))
 			return rte_eth_dev_destroy(eth_dev,
 						   bnxt_representor_uninit);
 		else
diff --git a/drivers/net/bnxt/tf_ulp/bnxt_tf_pmd_shim.c b/drivers/net/bnxt/tf_ulp/bnxt_tf_pmd_shim.c
index 239191e14e..96d61c3ed2 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_tf_pmd_shim.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_tf_pmd_shim.c
@@ -202,7 +202,7 @@ bnxt_pmd_get_svif(uint16_t port_id, bool func_svif,
 	struct bnxt *bp;
 
 	eth_dev = &rte_eth_devices[port_id];
-	if (BNXT_ETH_DEV_IS_REPRESENTOR(eth_dev)) {
+	if (rte_eth_dev_is_repr(eth_dev)) {
 		struct bnxt_representor *vfr = eth_dev->data->dev_private;
 		if (!vfr)
 			return 0;
@@ -260,7 +260,7 @@ bnxt_pmd_get_vnic_id(uint16_t port, enum bnxt_ulp_intf_type type)
 	struct bnxt *bp;
 
 	eth_dev = &rte_eth_devices[port];
-	if (BNXT_ETH_DEV_IS_REPRESENTOR(eth_dev)) {
+	if (rte_eth_dev_is_repr(eth_dev)) {
 		struct bnxt_representor *vfr = eth_dev->data->dev_private;
 		if (!vfr)
 			return 0;
@@ -285,7 +285,7 @@ bnxt_pmd_get_fw_func_id(uint16_t port, enum bnxt_ulp_intf_type type)
 	struct bnxt *bp;
 
 	eth_dev = &rte_eth_devices[port];
-	if (BNXT_ETH_DEV_IS_REPRESENTOR(eth_dev)) {
+	if (rte_eth_dev_is_repr(eth_dev)) {
 		struct bnxt_representor *vfr = eth_dev->data->dev_private;
 		if (!vfr)
 			return 0;
@@ -308,7 +308,7 @@ bnxt_pmd_get_interface_type(uint16_t port)
 	struct bnxt *bp;
 
 	eth_dev = &rte_eth_devices[port];
-	if (BNXT_ETH_DEV_IS_REPRESENTOR(eth_dev))
+	if (rte_eth_dev_is_repr(eth_dev))
 		return BNXT_ULP_INTF_TYPE_VF_REP;
 
 	bp = eth_dev->data->dev_private;
@@ -330,7 +330,7 @@ bnxt_pmd_get_phy_port_id(uint16_t port_id)
 	struct bnxt *bp;
 
 	eth_dev = &rte_eth_devices[port_id];
-	if (BNXT_ETH_DEV_IS_REPRESENTOR(eth_dev)) {
+	if (rte_eth_dev_is_repr(eth_dev)) {
 		vfr = eth_dev->data->dev_private;
 		if (!vfr)
 			return 0;
@@ -350,7 +350,7 @@ bnxt_pmd_get_parif(uint16_t port_id, enum bnxt_ulp_intf_type type)
 	struct bnxt *bp;
 
 	eth_dev = &rte_eth_devices[port_id];
-	if (BNXT_ETH_DEV_IS_REPRESENTOR(eth_dev)) {
+	if (rte_eth_dev_is_repr(eth_dev)) {
 		struct bnxt_representor *vfr = eth_dev->data->dev_private;
 		if (!vfr)
 			return 0;
diff --git a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
index 274e935a1f..33028c470f 100644
--- a/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
+++ b/drivers/net/bnxt/tf_ulp/bnxt_ulp.c
@@ -1559,7 +1559,7 @@ bnxt_ulp_destroy_vfr_default_rules(struct bnxt *bp, bool global)
 	struct rte_eth_dev *vfr_eth_dev;
 	struct bnxt_representor *vfr_bp;
 
-	if (!BNXT_TRUFLOW_EN(bp) || BNXT_ETH_DEV_IS_REPRESENTOR(bp->eth_dev))
+	if (!BNXT_TRUFLOW_EN(bp) || rte_eth_dev_is_repr(bp->eth_dev))
 		return;
 
 	if (!bp->ulp_ctx || !bp->ulp_ctx->cfg_data)
@@ -2316,7 +2316,7 @@ bnxt_ulp_eth_dev_ptr2_cntxt_get(struct rte_eth_dev	*dev)
 {
 	struct bnxt *bp = (struct bnxt *)dev->data->dev_private;
 
-	if (BNXT_ETH_DEV_IS_REPRESENTOR(dev)) {
+	if (rte_eth_dev_is_repr(dev)) {
 		struct bnxt_representor *vfr = dev->data->dev_private;
 
 		bp = vfr->parent_dev->data->dev_private;
diff --git a/drivers/net/bnxt/tf_ulp/ulp_def_rules.c b/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
index fe1f65deb9..8237dbd294 100644
--- a/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
+++ b/drivers/net/bnxt/tf_ulp/ulp_def_rules.c
@@ -449,7 +449,7 @@ bnxt_ulp_destroy_df_rules(struct bnxt *bp, bool global)
 	uint16_t port_id;
 
 	if (!BNXT_TRUFLOW_EN(bp) ||
-	    BNXT_ETH_DEV_IS_REPRESENTOR(bp->eth_dev))
+	    rte_eth_dev_is_repr(bp->eth_dev))
 		return;
 
 	if (!bp->ulp_ctx || !bp->ulp_ctx->cfg_data)
@@ -514,7 +514,7 @@ bnxt_ulp_create_df_rules(struct bnxt *bp)
 	int rc = 0;
 
 	if (!BNXT_TRUFLOW_EN(bp) ||
-	    BNXT_ETH_DEV_IS_REPRESENTOR(bp->eth_dev) || !bp->ulp_ctx)
+	    rte_eth_dev_is_repr(bp->eth_dev) || !bp->ulp_ctx)
 		return 0;
 
 	port_id = bp->eth_dev->data->port_id;
diff --git a/drivers/net/cpfl/cpfl_representor.c b/drivers/net/cpfl/cpfl_representor.c
index e2ed9eda04..60b72b5ec1 100644
--- a/drivers/net/cpfl/cpfl_representor.c
+++ b/drivers/net/cpfl/cpfl_representor.c
@@ -339,7 +339,7 @@ cpfl_repr_link_update(struct rte_eth_dev *ethdev,
 	struct cpfl_vport_id vi;
 	int ret;
 
-	if (!(ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)) {
+	if (!rte_eth_dev_is_repr(ethdev)) {
 		PMD_INIT_LOG(ERR, "This ethdev is not representor.");
 		return -EINVAL;
 	}
diff --git a/drivers/net/enic/enic.h b/drivers/net/enic/enic.h
index 78778704f2..f46903ea9e 100644
--- a/drivers/net/enic/enic.h
+++ b/drivers/net/enic/enic.h
@@ -233,11 +233,6 @@ struct enic_vf_representor {
 #define VF_ENIC_TO_VF_REP(vf_enic) \
 	container_of(vf_enic, struct enic_vf_representor, enic)
 
-static inline int enic_is_vf_rep(struct enic *enic)
-{
-	return !!(enic->rte_dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR);
-}
-
 /* Compute ethdev's max packet size from MTU */
 static inline uint32_t enic_mtu_to_max_rx_pktlen(uint32_t mtu)
 {
diff --git a/drivers/net/enic/enic_ethdev.c b/drivers/net/enic/enic_ethdev.c
index 7e040c36c4..cad8db2f6f 100644
--- a/drivers/net/enic/enic_ethdev.c
+++ b/drivers/net/enic/enic_ethdev.c
@@ -1386,7 +1386,7 @@ static int eth_enic_pci_remove(struct rte_pci_device *pci_dev)
 	ethdev = rte_eth_dev_allocated(pci_dev->device.name);
 	if (!ethdev)
 		return -ENODEV;
-	if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+	if (rte_eth_dev_is_repr(ethdev))
 		return rte_eth_dev_destroy(ethdev, enic_vf_representor_uninit);
 	else
 		return rte_eth_dev_destroy(ethdev, eth_enic_dev_uninit);
diff --git a/drivers/net/enic/enic_fm_flow.c b/drivers/net/enic/enic_fm_flow.c
index 90027dc676..8988148454 100644
--- a/drivers/net/enic/enic_fm_flow.c
+++ b/drivers/net/enic/enic_fm_flow.c
@@ -1535,14 +1535,14 @@ vf_egress_port_id_action(struct enic_flowman *fm,
 	ENICPMD_FUNC_TRACE();
 	src_enic = fm->user_enic;
 	dst_enic = pmd_priv(dst_dev);
-	if (!(src_enic->rte_dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)) {
+	if (!rte_eth_dev_is_repr(src_enic->rte_dev)) {
 		return rte_flow_error_set(error, EINVAL,
 			RTE_FLOW_ERROR_TYPE_ACTION,
 			NULL, "source port is not VF representor");
 	}
 
 	/* VF -> PF uplink. dst is not VF representor */
-	if (!(dst_dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)) {
+	if (!rte_eth_dev_is_repr(dst_dev)) {
 		/* PF is the VF's PF? Then nothing to do */
 		vf = VF_ENIC_TO_VF_REP(src_enic);
 		if (vf->pf == dst_enic) {
@@ -1954,7 +1954,7 @@ enic_fm_copy_action(struct enic_flowman *fm,
 	if (!(overlap & (FATE | PASSTHRU | COUNT | PORT_ID)))
 		goto unsupported;
 	/* Egress from VF: need implicit WQ match */
-	if (enic_is_vf_rep(enic) && !ingress) {
+	if (rte_eth_dev_is_repr(enic->rte_dev) && !ingress) {
 		fmt->ftm_data.fk_wq_id = 0;
 		fmt->ftm_mask.fk_wq_id = 0xffff;
 		fmt->ftm_data.fk_wq_vnic = enic->fm_vnic_handle;
@@ -3226,7 +3226,7 @@ enic_fm_init(struct enic *enic)
 		return 0;
 	ENICPMD_FUNC_TRACE();
 	/* Get vnic handle and save for port-id action */
-	if (enic_is_vf_rep(enic))
+	if (rte_eth_dev_is_repr(enic->rte_dev))
 		addr = &VF_ENIC_TO_VF_REP(enic)->bdf;
 	else
 		addr = &RTE_ETH_DEV_TO_PCI(enic->rte_dev)->addr;
@@ -3240,7 +3240,7 @@ enic_fm_init(struct enic *enic)
 	enic->fm_vnic_uif = vnic_dev_uif(enic->vdev);
 	ENICPMD_LOG(DEBUG, "uif %u", enic->fm_vnic_uif);
 	/* Nothing else to do for representor. It will share the PF flowman */
-	if (enic_is_vf_rep(enic))
+	if (rte_eth_dev_is_repr(enic->rte_dev))
 		return 0;
 	fm = calloc(1, sizeof(*fm));
 	if (fm == NULL) {
@@ -3321,7 +3321,7 @@ enic_fm_destroy(struct enic *enic)
 	struct enic_fm_fet *fet;
 
 	ENICPMD_FUNC_TRACE();
-	if (enic_is_vf_rep(enic)) {
+	if (rte_eth_dev_is_repr(enic->rte_dev)) {
 		delete_rep_flows(enic);
 		return;
 	}
@@ -3358,7 +3358,7 @@ enic_fm_allocate_switch_domain(struct enic *pf)
 	int ret;
 
 	ENICPMD_FUNC_TRACE();
-	if (enic_is_vf_rep(pf))
+	if (rte_eth_dev_is_repr(pf->rte_dev))
 		return -EINVAL;
 	cur = pf;
 	cur_a = &RTE_ETH_DEV_TO_PCI(cur->rte_dev)->addr;
@@ -3367,7 +3367,7 @@ enic_fm_allocate_switch_domain(struct enic *pf)
 		dev = &rte_eth_devices[pid];
 		if (!dev_is_enic(dev))
 			continue;
-		if (dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+		if (rte_eth_dev_is_repr(dev))
 			continue;
 		if (dev == cur->rte_dev)
 			continue;
@@ -3597,7 +3597,7 @@ delete_rep_flows(struct enic *enic)
 	struct rte_eth_dev *dev;
 	uint32_t i;
 
-	RTE_ASSERT(enic_is_vf_rep(enic));
+	RTE_ASSERT(rte_eth_dev_is_repr(enic->rte_dev));
 	vf = VF_ENIC_TO_VF_REP(enic);
 	dev = vf->pf->rte_dev;
 	for (i = 0; i < ARRAY_SIZE(vf->vf2rep_flow); i++) {
@@ -3617,7 +3617,7 @@ begin_fm(struct enic *enic)
 	struct enic_flowman *fm;
 
 	/* Representor uses PF flowman */
-	if (enic_is_vf_rep(enic)) {
+	if (rte_eth_dev_is_repr(enic->rte_dev)) {
 		vf = VF_ENIC_TO_VF_REP(enic);
 		fm = vf->pf->fm;
 	} else {
diff --git a/drivers/net/enic/enic_main.c b/drivers/net/enic/enic_main.c
index a6aaa760ca..2f681315b6 100644
--- a/drivers/net/enic/enic_main.c
+++ b/drivers/net/enic/enic_main.c
@@ -824,7 +824,7 @@ int enic_alloc_rq(struct enic *enic, uint16_t queue_idx,
 	 * Representor uses a reserved PF queue. Translate representor
 	 * queue number to PF queue number.
 	 */
-	if (enic_is_vf_rep(enic)) {
+	if (rte_eth_dev_is_repr(enic->rte_dev)) {
 		RTE_ASSERT(queue_idx == 0);
 		vf = VF_ENIC_TO_VF_REP(enic);
 		sop_queue_idx = vf->pf_rq_sop_idx;
@@ -1053,7 +1053,7 @@ int enic_alloc_wq(struct enic *enic, uint16_t queue_idx,
 	 * Representor uses a reserved PF queue. Translate representor
 	 * queue number to PF queue number.
 	 */
-	if (enic_is_vf_rep(enic)) {
+	if (rte_eth_dev_is_repr(enic->rte_dev)) {
 		RTE_ASSERT(queue_idx == 0);
 		vf = VF_ENIC_TO_VF_REP(enic);
 		queue_idx = vf->pf_wq_idx;
diff --git a/drivers/net/i40e/i40e_ethdev.c b/drivers/net/i40e/i40e_ethdev.c
index 4d21341382..ddbc2962bc 100644
--- a/drivers/net/i40e/i40e_ethdev.c
+++ b/drivers/net/i40e/i40e_ethdev.c
@@ -706,7 +706,7 @@ static int eth_i40e_pci_remove(struct rte_pci_device *pci_dev)
 	if (!ethdev)
 		return 0;
 
-	if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+	if (rte_eth_dev_is_repr(ethdev))
 		return rte_eth_dev_pci_generic_remove(pci_dev,
 					i40e_vf_representor_uninit);
 	else
diff --git a/drivers/net/ice/ice_dcf_ethdev.c b/drivers/net/ice/ice_dcf_ethdev.c
index bebf356f4d..d58ec9d907 100644
--- a/drivers/net/ice/ice_dcf_ethdev.c
+++ b/drivers/net/ice/ice_dcf_ethdev.c
@@ -2131,7 +2131,7 @@ eth_ice_dcf_pci_remove(struct rte_pci_device *pci_dev)
 	if (!eth_dev)
 		return 0;
 
-	if (eth_dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+	if (rte_eth_dev_is_repr(eth_dev))
 		return rte_eth_dev_pci_generic_remove(pci_dev,
 						      ice_dcf_vf_repr_uninit);
 	else
diff --git a/drivers/net/ixgbe/ixgbe_ethdev.c b/drivers/net/ixgbe/ixgbe_ethdev.c
index 0cd3d0b105..c61c52b296 100644
--- a/drivers/net/ixgbe/ixgbe_ethdev.c
+++ b/drivers/net/ixgbe/ixgbe_ethdev.c
@@ -1842,7 +1842,7 @@ static int eth_ixgbe_pci_remove(struct rte_pci_device *pci_dev)
 	if (!ethdev)
 		return 0;
 
-	if (ethdev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR)
+	if (rte_eth_dev_is_repr(ethdev))
 		return rte_eth_dev_pci_generic_remove(pci_dev,
 					ixgbe_vf_representor_uninit);
 	else
diff --git a/drivers/net/nfp/flower/nfp_flower_flow.c b/drivers/net/nfp/flower/nfp_flower_flow.c
index e26be30d18..501a8d87bd 100644
--- a/drivers/net/nfp/flower/nfp_flower_flow.c
+++ b/drivers/net/nfp/flower/nfp_flower_flow.c
@@ -4321,7 +4321,7 @@ int
 nfp_flow_ops_get(struct rte_eth_dev *dev,
 		const struct rte_flow_ops **ops)
 {
-	if ((dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR) == 0) {
+	if (!rte_eth_dev_is_repr(dev)) {
 		*ops = NULL;
 		PMD_DRV_LOG(ERR, "Port is not a representor.");
 		return -EINVAL;
diff --git a/drivers/net/nfp/nfp_mtr.c b/drivers/net/nfp/nfp_mtr.c
index 255977ec22..6abc6dc9bc 100644
--- a/drivers/net/nfp/nfp_mtr.c
+++ b/drivers/net/nfp/nfp_mtr.c
@@ -1066,7 +1066,7 @@ static const struct rte_mtr_ops nfp_mtr_ops = {
 int
 nfp_net_mtr_ops_get(struct rte_eth_dev *dev, void *arg)
 {
-	if ((dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR) == 0) {
+	if (!rte_eth_dev_is_repr(dev)) {
 		PMD_DRV_LOG(ERR, "Port is not a representor");
 		return -EINVAL;
 	}
diff --git a/drivers/net/nfp/nfp_net_common.c b/drivers/net/nfp/nfp_net_common.c
index 99c319eb2d..0ee2811926 100644
--- a/drivers/net/nfp/nfp_net_common.c
+++ b/drivers/net/nfp/nfp_net_common.c
@@ -241,7 +241,7 @@ nfp_net_get_hw(const struct rte_eth_dev *dev)
 {
 	struct nfp_net_hw *hw;
 
-	if ((dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR) != 0) {
+	if (rte_eth_dev_is_repr(dev)) {
 		struct nfp_flower_representor *repr;
 		repr = dev->data->dev_private;
 		hw = repr->app_fw_flower->pf_hw;
@@ -2143,7 +2143,7 @@ nfp_net_firmware_version_get(struct rte_eth_dev *dev,
 
 	hw = nfp_net_get_hw(dev);
 
-	if ((dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR) != 0) {
+	if (rte_eth_dev_is_repr(dev)) {
 		snprintf(vnic_version, FW_VER_LEN, "%d.%d.%d.%d",
 			hw->ver.extend, hw->ver.class,
 			hw->ver.major, hw->ver.minor);
diff --git a/drivers/net/nfp/nfp_net_flow.c b/drivers/net/nfp/nfp_net_flow.c
index 98e8499756..3b33f3b6e9 100644
--- a/drivers/net/nfp/nfp_net_flow.c
+++ b/drivers/net/nfp/nfp_net_flow.c
@@ -932,7 +932,7 @@ nfp_net_flow_ops_get(struct rte_eth_dev *dev,
 {
 	struct nfp_net_hw *hw;
 
-	if ((dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR) != 0) {
+	if (rte_eth_dev_is_repr(dev)) {
 		*ops = NULL;
 		PMD_DRV_LOG(ERR, "Port is a representor.");
 		return -EINVAL;
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 0e4c1f0743..f46c102558 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -2120,6 +2120,23 @@ struct rte_eth_fdir_conf {
 	struct rte_eth_fdir_flex_conf flex_conf;
 };
 
+/**
+ * @internal
+ * Check if the ethdev is a representor port.
+ *
+ * @param dev
+ *  Pointer to struct rte_eth_dev.
+ *
+ * @return
+ *  false the ethdev is not a representor port.
+ *  true  the ethdev is a representor port.
+ */
+static inline bool
+rte_eth_dev_is_repr(const struct rte_eth_dev *dev)
+{
+	return ((dev->data->dev_flags & RTE_ETH_DEV_REPRESENTOR) != 0);
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.39.1


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v3 10/11] eventdev: clarify docs on event object fields and op types
  2024-02-20 17:39  3%         ` Bruce Richardson
@ 2024-02-21  9:31  0%           ` Jerin Jacob
  0 siblings, 0 replies; 200+ results
From: Jerin Jacob @ 2024-02-21  9:31 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: dev, jerinj, mattias.ronnblom, abdullah.sevincer, sachin.saxena,
	hemant.agrawal, pbhagavatula, pravin.pathak

On Tue, Feb 20, 2024 at 11:09 PM Bruce Richardson
<bruce.richardson@intel.com> wrote:
>
> On Fri, Feb 09, 2024 at 02:44:04PM +0530, Jerin Jacob wrote:
> > On Fri, Feb 2, 2024 at 6:11 PM Bruce Richardson
> > <bruce.richardson@intel.com> wrote:
> > >
> > > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > > For the fields in "rte_event" struct, enhance the comments on each to
> > > clarify the field's use, and whether it is preserved between enqueue and
> > > dequeue, and it's role, if any, in scheduling.
> > >
> > > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > > ---
> > > V3: updates following review
> > > ---
> > >  lib/eventdev/rte_eventdev.h | 161 +++++++++++++++++++++++++-----------
> > >  1 file changed, 111 insertions(+), 50 deletions(-)
> > >
> > > diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > > index 8d72765ae7..58219e027e 100644
> > > --- a/lib/eventdev/rte_eventdev.h
> > > +++ b/lib/eventdev/rte_eventdev.h
> > > @@ -1463,47 +1463,54 @@ struct rte_event_vector {
> > >
> > >  /* Event enqueue operations */
> > >  #define RTE_EVENT_OP_NEW                0
> > > -/**< The event producers use this operation to inject a new event to the
> > > - * event device.
> > > +/**< The @ref rte_event.op field must be set to this operation type to inject a new event,
> > > + * i.e. one not previously dequeued, into the event device, to be scheduled
> > > + * for processing.
> > >   */
> > >  #define RTE_EVENT_OP_FORWARD            1
> > > -/**< The CPU use this operation to forward the event to different event queue or
> > > - * change to new application specific flow or schedule type to enable
> > > - * pipelining.
> > > +/**< The application must set the @ref rte_event.op field to this operation type to return a
> > > + * previously dequeued event to the event device to be scheduled for further processing.
> > >   *
> > > - * This operation must only be enqueued to the same port that the
> > > + * This event *must* be enqueued to the same port that the
> > >   * event to be forwarded was dequeued from.
> > > + *
> > > + * The event's fields, including (but not limited to) flow_id, scheduling type,
> > > + * destination queue, and event payload e.g. mbuf pointer, may all be updated as
> > > + * desired by the application, but the @ref rte_event.impl_opaque field must
> > > + * be kept to the same value as was present when the event was dequeued.
> > >   */
> > >  #define RTE_EVENT_OP_RELEASE            2
> > >  /**< Release the flow context associated with the schedule type.
> > >   *
> > > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
> > > - * then this function hints the scheduler that the user has completed critical
> > > - * section processing in the current atomic context.
> > > - * The scheduler is now allowed to schedule events from the same flow from
> > > - * an event queue to another port. However, the context may be still held
> > > - * until the next rte_event_dequeue_burst() call, this call allows but does not
> > > - * force the scheduler to release the context early.
> > > - *
> > > - * Early atomic context release may increase parallelism and thus system
> > > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
> > > + * then this operation type hints the scheduler that the user has completed critical
> > > + * section processing for this event in the current atomic context, and that the
> > > + * scheduler may unlock any atomic locks held for this event.
> > > + * If this is the last event from an atomic flow, i.e. all flow locks are released,
> >
> >
> > Similar comment as other email
> > [Jerin] When there are multiple atomic events dequeue from @ref
> > rte_event_dequeue_burst()
> > for the same event queue, and it has same flow id then the lock is ....
> >
> > [Mattias]
> > Yes, or maybe describing the whole lock/unlock state.
> >
> > "The conceptual per-queue-per-flow lock is in a locked state as long
> > (and only as long) as one or more events pertaining to that flow were
> > scheduled to the port in question, but are not yet released."
> >
> > Maybe it needs to be more meaty, describing what released means. I don't
> > have the full context of the documentation in my head when I'm writing this.
> >
>
> Will take a look to reword a bit
>
> >
> > > + * the scheduler is now allowed to schedule events from that flow from to another port.
> > > + * However, the atomic locks may be still held until the next rte_event_dequeue_burst()
> > > + * call; enqueuing an event with opt type @ref RTE_EVENT_OP_RELEASE allows,
> >
> > Is ";" intended?
> >
> > > + * but does not force, the scheduler to release the atomic locks early.
> >
> > instead of "not force", can use the term _hint_ the driver and reword.
>
> Ok.
> >
> > > + *
> > > + * Early atomic lock release may increase parallelism and thus system
> > >   * performance, but the user needs to design carefully the split into critical
> > >   * vs non-critical sections.
> > >   *
> > > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
> > > - * then this function hints the scheduler that the user has done all that need
> > > - * to maintain event order in the current ordered context.
> > > - * The scheduler is allowed to release the ordered context of this port and
> > > - * avoid reordering any following enqueues.
> > > - *
> > > - * Early ordered context release may increase parallelism and thus system
> > > - * performance.
> > > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ORDERED
> > > + * then this operation type informs the scheduler that the current event has
> > > + * completed processing and will not be returned to the scheduler, i.e.
> > > + * it has been dropped, and so the reordering context for that event
> > > + * should be considered filled.
> > >   *
> > > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*
> > > - * or no scheduling context is held then this function may be an NOOP,
> > > - * depending on the implementation.
> > > + * Events with this operation type must only be enqueued to the same port that the
> > > + * event to be released was dequeued from. The @ref rte_event.impl_opaque
> > > + * field in the release event must have the same value as that in the original dequeued event.
> > >   *
> > > - * This operation must only be enqueued to the same port that the
> > > - * event to be released was dequeued from.
> > > + * If a dequeued event is re-enqueued with operation type of @ref RTE_EVENT_OP_RELEASE,
> > > + * then any subsequent enqueue of that event - or a copy of it - must be done as event of type
> > > + * @ref RTE_EVENT_OP_NEW, not @ref RTE_EVENT_OP_FORWARD. This is because any context for
> > > + * the originally dequeued event, i.e. atomic locks, or reorder buffer entries, will have
> > > + * been removed or invalidated by the release operation.
> > >   */
> > >
> > >  /**
> > > @@ -1517,56 +1524,110 @@ struct rte_event {
> > >                 /** Event attributes for dequeue or enqueue operation */
> > >                 struct {
> > >                         uint32_t flow_id:20;
> > > -                       /**< Targeted flow identifier for the enqueue and
> > > -                        * dequeue operation.
> > > -                        * The value must be in the range of
> > > -                        * [0, nb_event_queue_flows - 1] which
> > > -                        * previously supplied to rte_event_dev_configure().
> > > +                       /**< Target flow identifier for the enqueue and dequeue operation.
> > > +                        *
> > > +                        * For @ref RTE_SCHED_TYPE_ATOMIC, this field is used to identify a
> > > +                        * flow for atomicity within a queue & priority level, such that events
> > > +                        * from each individual flow will only be scheduled to one port at a time.
> > > +                        *
> > > +                        * This field is preserved between enqueue and dequeue when
> > > +                        * a device reports the @ref RTE_EVENT_DEV_CAP_CARRY_FLOW_ID
> > > +                        * capability. Otherwise the value is implementation dependent
> > > +                        * on dequeue.
> > >                          */
> > >                         uint32_t sub_event_type:8;
> > >                         /**< Sub-event types based on the event source.
> > > +                        *
> > > +                        * This field is preserved between enqueue and dequeue.
> > > +                        * This field is for application or event adapter use,
> > > +                        * and is not considered in scheduling decisions.
> >
> >
> > cnxk driver is considering this for scheduling decision to
> > differentiate the producer i.e event adapters.
> > If other drivers are not then we can change the language around it is
> > implementation defined.
> >
> How does the event type influence the scheduling decision? I can drop the

For cnxk, from the HW point of view, the flow ID is a 32-bit value which is
divided between flow_id (20 bits), sub event type (8 bits) and
event type (4 bits)

> last line here

Yes. Please


> but it seems strange to me that the type of event could affect
> things. I would have thought that with the eventdev API only the queue,
> flow id, and priority would be factors in scheduling?

>
> >
> > > +                        *
> > >                          * @see RTE_EVENT_TYPE_CPU
> > >                          */
> > >                         uint32_t event_type:4;
> > > -                       /**< Event type to classify the event source.
> > > -                        * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
> > > +                       /**< Event type to classify the event source. (RTE_EVENT_TYPE_*)
> > > +                        *
> > > +                        * This field is preserved between enqueue and dequeue
> > > +                        * This field is for application or event adapter use,
> > > +                        * and is not considered in scheduling decisions.
> >
> >
> > cnxk driver is considering this for scheduling decision to
> > differentiate the producer i.e event adapters.
> > If other drivers are not then we can change the language around it is
> > implementation defined.
> >
> > >                          */
> > >                         uint8_t op:2;
> > > -                       /**< The type of event enqueue operation - new/forward/
> > > -                        * etc.This field is not preserved across an instance
> > > -                        * and is undefined on dequeue.
> > > -                        * @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
> > > +                       /**< The type of event enqueue operation - new/forward/ etc.
> > > +                        *
> > > +                        * This field is *not* preserved across an instance
> > > +                        * and is implementation dependent on dequeue.
> > > +                        *
> > > +                        * @see RTE_EVENT_OP_NEW
> > > +                        * @see RTE_EVENT_OP_FORWARD
> > > +                        * @see RTE_EVENT_OP_RELEASE
> > >                          */
> > >                         uint8_t rsvd:4;
> > > -                       /**< Reserved for future use */
> > > +                       /**< Reserved for future use.
> > > +                        *
> > > +                        * Should be set to zero on enqueue.
> >
> > I am worried about some application explicitly start setting this to
> > zero on every enqueue.
> > Instead, can we say application should not touch the field, Since every eventdev
> > operations starts with dequeue() driver can fill to the correct value.
> >
>
> I'll set this to zero on "NEW", or untouched on FORWARD/RELEASE.

OK

> If we don't state that it should be zeroed on NEW or untouched
> otherwise we cannot use the space in future without ABI break.
>
> > > +                        */
> > >                         uint8_t sched_type:2;
> > >                         /**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
> > >                          * associated with flow id on a given event queue
> > >                          * for the enqueue and dequeue operation.
> > > +                        *
> > > +                        * This field is used to determine the scheduling type
> > > +                        * for events sent to queues where @ref RTE_EVENT_QUEUE_CFG_ALL_TYPES
> > > +                        * is configured.
> > > +                        * For queues where only a single scheduling type is available,
> > > +                        * this field must be set to match the configured scheduling type.
> > > +                        *
> > > +                        * This field is preserved between enqueue and dequeue.
> > > +                        *
> > > +                        * @see RTE_SCHED_TYPE_ORDERED
> > > +                        * @see RTE_SCHED_TYPE_ATOMIC
> > > +                        * @see RTE_SCHED_TYPE_PARALLEL
> > >                          */
> > >                         uint8_t queue_id;
> > >                         /**< Targeted event queue identifier for the enqueue or
> > >                          * dequeue operation.
> > > -                        * The value must be in the range of
> > > -                        * [0, nb_event_queues - 1] which previously supplied to
> > > -                        * rte_event_dev_configure().
> > > +                        * The value must be less than @ref rte_event_dev_config.nb_event_queues
> > > +                        * which was previously supplied to rte_event_dev_configure().
> >
> > Some reason, similar text got removed for flow_id. Please add the same.
> >
>
> That was deliberate based on discussion on V2. See:
>
> http://inbox.dpdk.org/dev/Zby3nb4NGs8T5odL@bricha3-MOBL.ger.corp.intel.com/
>
> and wider thread discussion starting here:
>
> http://inbox.dpdk.org/dev/ZbvOtAEpzja0gu7b@bricha3-MOBL.ger.corp.intel.com/
>
> Basically, the comment is wrong based on what the code does now. No event
> adapters or apps are limiting the flow-id, and nothing seems broken, so we
> can remove the comment.

OK

>

^ permalink raw reply	[relevance 0%]

* Re: [DPDK/ethdev Bug 1381] TAP device can not support 17 queues
  @ 2024-02-20 19:23  3% ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-02-20 19:23 UTC (permalink / raw)
  To: bugzilla; +Cc: dev

On Tue, 20 Feb 2024 03:29:04 +0000
bugzilla@dpdk.org wrote:

> https://bugs.dpdk.org/show_bug.cgi?id=1381
> 
>             Bug ID: 1381
>            Summary: TAP device can not support 17 queues
>            Product: DPDK
>            Version: 23.11
>           Hardware: All
>                 OS: All
>             Status: UNCONFIRMED
>           Severity: normal
>           Priority: Normal
>          Component: ethdev
>           Assignee: dev@dpdk.org
>           Reporter: stephen@networkplumber.org
>   Target Milestone: ---
> 
> If you try:
> # dpdk-testpmd --log-level=pmd.net.tap:debug -l 1-2 --vdev=net_tap0 -- -i
> --rxq=8 --txq=8
> 
> It will fail because:
> EAL: Detected CPU lcores: 8
> EAL: Detected NUMA nodes: 1
> EAL: Detected static linkage of DPDK
> EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> EAL: Selected IOVA mode 'VA'
> rte_pmd_tap_probe(): Initializing pmd_tap for net_tap0
> eth_dev_tap_create(): TAP device on numa 0
> tun_alloc(): /dev/net/tun Features 00007173
> tun_alloc():   Multi-queue support for 16 queues
> tun_alloc(): Device name is 'dtap0'
> tun_alloc(): Using rt-signal 35
> eth_dev_tap_create(): allocated dtap0
> Interactive-mode selected
> testpmd: create a new mbuf pool <mb_pool_0>: n=155456, size=2176, socket=0
> testpmd: preferred mempool ops selected: ring_mp_mc
> 
> Warning! port-topology=paired and odd forward ports number, the last port will
> pair with itself.
> 
> Configuring Port 0 (socket 0)
> tap_dev_configure(): net_tap0: dtap0: TX configured queues number: 8
> tap_dev_configure(): net_tap0: dtap0: RX configured queues number: 8
> tun_alloc(): /dev/net/tun Features 00007173
> tun_alloc():   Multi-queue support for 16 queues
> tun_alloc(): Device name is 'dtap0'
> tap_setup_queue(): dtap0: add tx queue for qid 0 fd 26
> tap_tx_queue_setup():   TX TUNTAP device name dtap0, qid 0 on fd 26 csum off
> tun_alloc(): /dev/net/tun Features 00007173
> tun_alloc():   Multi-queue support for 16 queues
> tun_alloc(): Device name is 'dtap0'
> tap_setup_queue(): dtap0: add tx queue for qid 1 fd 212
> tap_tx_queue_setup():   TX TUNTAP device name dtap0, qid 1 on fd 212 csum off
> tun_alloc(): /dev/net/tun Features 00007173
> tun_alloc():   Multi-queue support for 16 queues
> tun_alloc(): Device name is 'dtap0'
> tap_setup_queue(): dtap0: add tx queue for qid 2 fd 213
> tap_tx_queue_setup():   TX TUNTAP device name dtap0, qid 2 on fd 213 csum off
> tun_alloc(): /dev/net/tun Features 00007173
> tun_alloc():   Multi-queue support for 16 queues
> tun_alloc(): Device name is 'dtap0'
> tap_setup_queue(): dtap0: add tx queue for qid 3 fd 214
> tap_tx_queue_setup():   TX TUNTAP device name dtap0, qid 3 on fd 214 csum off
> tun_alloc(): /dev/net/tun Features 00007173
> tun_alloc():   Multi-queue support for 16 queues
> tun_alloc(): Device name is 'dtap0'
> tap_setup_queue(): dtap0: add tx queue for qid 4 fd 215
> tap_tx_queue_setup():   TX TUNTAP device name dtap0, qid 4 on fd 215 csum off
> tun_alloc(): /dev/net/tun Features 00007173
> tun_alloc():   Multi-queue support for 16 queues
> tun_alloc(): Device name is 'dtap0'
> tap_setup_queue(): dtap0: add tx queue for qid 5 fd 216
> tap_tx_queue_setup():   TX TUNTAP device name dtap0, qid 5 on fd 216 csum off
> tun_alloc(): /dev/net/tun Features 00007173
> tun_alloc():   Multi-queue support for 16 queues
> tun_alloc(): Device name is 'dtap0'
> tap_setup_queue(): dtap0: add tx queue for qid 6 fd 217
> tap_tx_queue_setup():   TX TUNTAP device name dtap0, qid 6 on fd 217 csum off
> tun_alloc(): /dev/net/tun Features 00007173
> tun_alloc():   Multi-queue support for 16 queues
> tun_alloc(): Device name is 'dtap0'
> tap_setup_queue(): dtap0: add tx queue for qid 7 fd 218
> tap_tx_queue_setup():   TX TUNTAP device name dtap0, qid 7 on fd 218 csum off
> tap_setup_queue(): dtap0: dup fd 26 for rx queue qid 0 (219)
> tap_rx_queue_setup():   RX TUNTAP device name dtap0, qid 0 on fd 219
> tap_setup_queue(): dtap0: dup fd 212 for rx queue qid 1 (220)
> tap_rx_queue_setup():   RX TUNTAP device name dtap0, qid 1 on fd 220
> tap_setup_queue(): dtap0: dup fd 213 for rx queue qid 2 (221)
> tap_rx_queue_setup():   RX TUNTAP device name dtap0, qid 2 on fd 221
> tap_setup_queue(): dtap0: dup fd 214 for rx queue qid 3 (222)
> tap_rx_queue_setup():   RX TUNTAP device name dtap0, qid 3 on fd 222
> tap_setup_queue(): dtap0: dup fd 215 for rx queue qid 4 (223)
> tap_rx_queue_setup():   RX TUNTAP device name dtap0, qid 4 on fd 223
> tap_setup_queue(): dtap0: dup fd 216 for rx queue qid 5 (224)
> tap_rx_queue_setup():   RX TUNTAP device name dtap0, qid 5 on fd 224
> tap_setup_queue(): dtap0: dup fd 217 for rx queue qid 6 (225)
> tap_rx_queue_setup():   RX TUNTAP device name dtap0, qid 6 on fd 225
> tap_setup_queue(): dtap0: dup fd 218 for rx queue qid 7 (226)
> tap_rx_queue_setup():   RX TUNTAP device name dtap0, qid 7 on fd 226
> EAL: Cannot send more than 8 FDs
> tap_mp_req_on_rxtx(): Failed to send start req to secondary 7
> 
> This is a regression caused by:
> commit c36ce7099c2187926cd62cff7ebd479823554929
> Author: Kumara Parameshwaran <kumaraparamesh92@gmail.com>
> Date:   Mon Jan 31 20:02:34 2022 +0530
> 
>     net/tap: fix to populate FDs in secondary process
> 
>     When a tap device is hotplugged to primary process which in turn
>     adds the device to all secondary process, the secondary process
>     does a tap_mp_attach_queues, but the fds are not populated in
>     the primary during the probe they are populated during the queue_setup,
>     added a fix to sync the queues during rte_eth_dev_start
> 
>     Fixes: 4852aa8f6e21 ("drivers/net: enable hotplug on secondary process")
>     Cc: stable@dpdk.org
> 
>     Signed-off-by: Kumara Parameshwaran <kparameshwar@vmware.com>
>     Reviewed-by: Ferruh Yigit <ferruh.yigit@intel.com>
> 

The number of file descriptors allowed in rte_mp_msg should be much larger.
The Linux kernel has an upper limit, SCM_MAX_FD, which is 253 (see net/scm.h).

But fixing this will break ABI because the rte_mp_msg structure was exposed
in rte_eal.h.  It should have been internal!

Alternatively, since fds[] is the last field in rte_mp_msg, and num_fds is
also there, fds[] could have been a flexible array, rather than hard-coded.

^ permalink raw reply	[relevance 3%]

* Re: [PATCH v3 10/11] eventdev: clarify docs on event object fields and op types
  @ 2024-02-20 17:39  3%         ` Bruce Richardson
  2024-02-21  9:31  0%           ` Jerin Jacob
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2024-02-20 17:39 UTC (permalink / raw)
  To: Jerin Jacob
  Cc: dev, jerinj, mattias.ronnblom, abdullah.sevincer, sachin.saxena,
	hemant.agrawal, pbhagavatula, pravin.pathak

On Fri, Feb 09, 2024 at 02:44:04PM +0530, Jerin Jacob wrote:
> On Fri, Feb 2, 2024 at 6:11 PM Bruce Richardson
> <bruce.richardson@intel.com> wrote:
> >
> > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > For the fields in "rte_event" struct, enhance the comments on each to
> > clarify the field's use, and whether it is preserved between enqueue and
> > dequeue, and it's role, if any, in scheduling.
> >
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> > V3: updates following review
> > ---
> >  lib/eventdev/rte_eventdev.h | 161 +++++++++++++++++++++++++-----------
> >  1 file changed, 111 insertions(+), 50 deletions(-)
> >
> > diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > index 8d72765ae7..58219e027e 100644
> > --- a/lib/eventdev/rte_eventdev.h
> > +++ b/lib/eventdev/rte_eventdev.h
> > @@ -1463,47 +1463,54 @@ struct rte_event_vector {
> >
> >  /* Event enqueue operations */
> >  #define RTE_EVENT_OP_NEW                0
> > -/**< The event producers use this operation to inject a new event to the
> > - * event device.
> > +/**< The @ref rte_event.op field must be set to this operation type to inject a new event,
> > + * i.e. one not previously dequeued, into the event device, to be scheduled
> > + * for processing.
> >   */
> >  #define RTE_EVENT_OP_FORWARD            1
> > -/**< The CPU use this operation to forward the event to different event queue or
> > - * change to new application specific flow or schedule type to enable
> > - * pipelining.
> > +/**< The application must set the @ref rte_event.op field to this operation type to return a
> > + * previously dequeued event to the event device to be scheduled for further processing.
> >   *
> > - * This operation must only be enqueued to the same port that the
> > + * This event *must* be enqueued to the same port that the
> >   * event to be forwarded was dequeued from.
> > + *
> > + * The event's fields, including (but not limited to) flow_id, scheduling type,
> > + * destination queue, and event payload e.g. mbuf pointer, may all be updated as
> > + * desired by the application, but the @ref rte_event.impl_opaque field must
> > + * be kept to the same value as was present when the event was dequeued.
> >   */
> >  #define RTE_EVENT_OP_RELEASE            2
> >  /**< Release the flow context associated with the schedule type.
> >   *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
> > - * then this function hints the scheduler that the user has completed critical
> > - * section processing in the current atomic context.
> > - * The scheduler is now allowed to schedule events from the same flow from
> > - * an event queue to another port. However, the context may be still held
> > - * until the next rte_event_dequeue_burst() call, this call allows but does not
> > - * force the scheduler to release the context early.
> > - *
> > - * Early atomic context release may increase parallelism and thus system
> > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
> > + * then this operation type hints the scheduler that the user has completed critical
> > + * section processing for this event in the current atomic context, and that the
> > + * scheduler may unlock any atomic locks held for this event.
> > + * If this is the last event from an atomic flow, i.e. all flow locks are released,
> 
> 
> Similar comment as other email
> [Jerin] When there are multiple atomic events dequeue from @ref
> rte_event_dequeue_burst()
> for the same event queue, and it has same flow id then the lock is ....
> 
> [Mattias]
> Yes, or maybe describing the whole lock/unlock state.
> 
> "The conceptual per-queue-per-flow lock is in a locked state as long
> (and only as long) as one or more events pertaining to that flow were
> scheduled to the port in question, but are not yet released."
> 
> Maybe it needs to be more meaty, describing what released means. I don't
> have the full context of the documentation in my head when I'm writing this.
>

Will take a look to reword a bit
 
> 
> > + * the scheduler is now allowed to schedule events from that flow from to another port.
> > + * However, the atomic locks may be still held until the next rte_event_dequeue_burst()
> > + * call; enqueuing an event with opt type @ref RTE_EVENT_OP_RELEASE allows,
> 
> Is ";" intended?
> 
> > + * but does not force, the scheduler to release the atomic locks early.
> 
> instead of "not force", can use the term _hint_ the driver and reword.

Ok.
> 
> > + *
> > + * Early atomic lock release may increase parallelism and thus system
> >   * performance, but the user needs to design carefully the split into critical
> >   * vs non-critical sections.
> >   *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
> > - * then this function hints the scheduler that the user has done all that need
> > - * to maintain event order in the current ordered context.
> > - * The scheduler is allowed to release the ordered context of this port and
> > - * avoid reordering any following enqueues.
> > - *
> > - * Early ordered context release may increase parallelism and thus system
> > - * performance.
> > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ORDERED
> > + * then this operation type informs the scheduler that the current event has
> > + * completed processing and will not be returned to the scheduler, i.e.
> > + * it has been dropped, and so the reordering context for that event
> > + * should be considered filled.
> >   *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*
> > - * or no scheduling context is held then this function may be an NOOP,
> > - * depending on the implementation.
> > + * Events with this operation type must only be enqueued to the same port that the
> > + * event to be released was dequeued from. The @ref rte_event.impl_opaque
> > + * field in the release event must have the same value as that in the original dequeued event.
> >   *
> > - * This operation must only be enqueued to the same port that the
> > - * event to be released was dequeued from.
> > + * If a dequeued event is re-enqueued with operation type of @ref RTE_EVENT_OP_RELEASE,
> > + * then any subsequent enqueue of that event - or a copy of it - must be done as event of type
> > + * @ref RTE_EVENT_OP_NEW, not @ref RTE_EVENT_OP_FORWARD. This is because any context for
> > + * the originally dequeued event, i.e. atomic locks, or reorder buffer entries, will have
> > + * been removed or invalidated by the release operation.
> >   */
> >
> >  /**
> > @@ -1517,56 +1524,110 @@ struct rte_event {
> >                 /** Event attributes for dequeue or enqueue operation */
> >                 struct {
> >                         uint32_t flow_id:20;
> > -                       /**< Targeted flow identifier for the enqueue and
> > -                        * dequeue operation.
> > -                        * The value must be in the range of
> > -                        * [0, nb_event_queue_flows - 1] which
> > -                        * previously supplied to rte_event_dev_configure().
> > +                       /**< Target flow identifier for the enqueue and dequeue operation.
> > +                        *
> > +                        * For @ref RTE_SCHED_TYPE_ATOMIC, this field is used to identify a
> > +                        * flow for atomicity within a queue & priority level, such that events
> > +                        * from each individual flow will only be scheduled to one port at a time.
> > +                        *
> > +                        * This field is preserved between enqueue and dequeue when
> > +                        * a device reports the @ref RTE_EVENT_DEV_CAP_CARRY_FLOW_ID
> > +                        * capability. Otherwise the value is implementation dependent
> > +                        * on dequeue.
> >                          */
> >                         uint32_t sub_event_type:8;
> >                         /**< Sub-event types based on the event source.
> > +                        *
> > +                        * This field is preserved between enqueue and dequeue.
> > +                        * This field is for application or event adapter use,
> > +                        * and is not considered in scheduling decisions.
> 
> 
> cnxk driver is considering this for scheduling decision to
> differentiate the producer i.e event adapters.
> If other drivers are not then we can change the language around it is
> implementation defined.
> 
How does the event type influence the scheduling decision? I can drop the
last line here, but it seems strange to me that the type of event could affect
things. I would have thought that with the eventdev API only the queue,
flow id, and priority would be factors in scheduling?

> 
> > +                        *
> >                          * @see RTE_EVENT_TYPE_CPU
> >                          */
> >                         uint32_t event_type:4;
> > -                       /**< Event type to classify the event source.
> > -                        * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
> > +                       /**< Event type to classify the event source. (RTE_EVENT_TYPE_*)
> > +                        *
> > +                        * This field is preserved between enqueue and dequeue
> > +                        * This field is for application or event adapter use,
> > +                        * and is not considered in scheduling decisions.
> 
> 
> cnxk driver is considering this for scheduling decision to
> differentiate the producer i.e event adapters.
> If other drivers are not then we can change the language around it is
> implementation defined.
> 
> >                          */
> >                         uint8_t op:2;
> > -                       /**< The type of event enqueue operation - new/forward/
> > -                        * etc.This field is not preserved across an instance
> > -                        * and is undefined on dequeue.
> > -                        * @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
> > +                       /**< The type of event enqueue operation - new/forward/ etc.
> > +                        *
> > +                        * This field is *not* preserved across an instance
> > +                        * and is implementation dependent on dequeue.
> > +                        *
> > +                        * @see RTE_EVENT_OP_NEW
> > +                        * @see RTE_EVENT_OP_FORWARD
> > +                        * @see RTE_EVENT_OP_RELEASE
> >                          */
> >                         uint8_t rsvd:4;
> > -                       /**< Reserved for future use */
> > +                       /**< Reserved for future use.
> > +                        *
> > +                        * Should be set to zero on enqueue.
> 
> I am worried about some application explicitly start setting this to
> zero on every enqueue.
> Instead, can we say application should not touch the field, Since every eventdev
> operations starts with dequeue() driver can fill to the correct value.
> 

I'll set this to zero on "NEW", or leave it untouched on FORWARD/RELEASE.
If we don't state that it should be zeroed on NEW, or left untouched
otherwise, we cannot use the space in future without an ABI break.

> > +                        */
> >                         uint8_t sched_type:2;
> >                         /**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
> >                          * associated with flow id on a given event queue
> >                          * for the enqueue and dequeue operation.
> > +                        *
> > +                        * This field is used to determine the scheduling type
> > +                        * for events sent to queues where @ref RTE_EVENT_QUEUE_CFG_ALL_TYPES
> > +                        * is configured.
> > +                        * For queues where only a single scheduling type is available,
> > +                        * this field must be set to match the configured scheduling type.
> > +                        *
> > +                        * This field is preserved between enqueue and dequeue.
> > +                        *
> > +                        * @see RTE_SCHED_TYPE_ORDERED
> > +                        * @see RTE_SCHED_TYPE_ATOMIC
> > +                        * @see RTE_SCHED_TYPE_PARALLEL
> >                          */
> >                         uint8_t queue_id;
> >                         /**< Targeted event queue identifier for the enqueue or
> >                          * dequeue operation.
> > -                        * The value must be in the range of
> > -                        * [0, nb_event_queues - 1] which previously supplied to
> > -                        * rte_event_dev_configure().
> > +                        * The value must be less than @ref rte_event_dev_config.nb_event_queues
> > +                        * which was previously supplied to rte_event_dev_configure().
> 
> Some reason, similar text got removed for flow_id. Please add the same.
> 

That was deliberate based on discussion on V2. See:

http://inbox.dpdk.org/dev/Zby3nb4NGs8T5odL@bricha3-MOBL.ger.corp.intel.com/

and wider thread discussion starting here:

http://inbox.dpdk.org/dev/ZbvOtAEpzja0gu7b@bricha3-MOBL.ger.corp.intel.com/

Basically, the comment is wrong based on what the code does now. No event
adapters or apps are limiting the flow-id, and nothing seems broken, so we
can remove the comment.


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v4 01/18] mbuf: deprecate GCC marker in rte mbuf struct
  2024-02-18 12:39  0%     ` Thomas Monjalon
  2024-02-18 13:07  0%       ` Morten Brørup
@ 2024-02-20 17:20  0%       ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-02-20 17:20 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: dev, Ajit Khaparde, Andrew Boyer, Andrew Rybchenko,
	Bruce Richardson, Chenbo Xia, Chengwen Feng, Dariusz Sosnowski,
	David Christensen, Hyong Youb Kim, Jerin Jacob, Jie Hai,
	Jingjing Wu, John Daley, Kevin Laatz, Kiran Kumar K,
	Konstantin Ananyev, Maciej Czekaj, Matan Azrad, Maxime Coquelin,
	Nithin Dabilpuram, Ori Kam, Ruifeng Wang, Satha Rao,
	Somnath Kotur, Suanming Mou, Sunil Kumar Kori,
	Viacheslav Ovsiienko, Yisen Zhuang, Yuying Zhang, mb

On Sun, Feb 18, 2024 at 01:39:52PM +0100, Thomas Monjalon wrote:
> 15/02/2024 07:21, Tyler Retzlaff:
> > Provide a macro that allows conditional expansion of RTE_MARKER fields
> > to empty to allow rte_mbuf to be used with MSVC. It is proposed that
> > we announce the fields to be __rte_deprecated (currently disabled).
> > 
> > Introduce C11 anonymous unions to permit aliasing of well-known
> > offsets by name into the rte_mbuf structure by a *new* name and to
> > provide padding for cache alignment.
> > 
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst |  20 ++
> >  lib/eal/include/rte_common.h         |   6 +
> >  lib/mbuf/rte_mbuf_core.h             | 375 +++++++++++++++++++----------------
> >  3 files changed, 233 insertions(+), 168 deletions(-)
> > 
> > diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> > index 10630ba..8594255 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -17,6 +17,26 @@ Other API and ABI deprecation notices are to be posted below.
> >  Deprecation Notices
> >  -------------------
> >  
> > +* mbuf: Named zero sized fields of type ``RTE_MARKER`` and ``RTE_MARKER64``
> > +  will be removed from ``struct rte_mbuf`` and replaced with new fields
> > +  in anonymous unions.
> > +
> > +  The names of the fields impacted are:
> > +
> > +    old name                  new name
> > +
> > +  ``cacheline0``            ``mbuf_cacheline0``
> > +  ``rearm_data``            ``mbuf_rearm_data``
> > +  ``rx_descriptor_fields1`` ``mbuf_rx_descriptor_fields1``
> > +  ``cacheline1``            ``mbuf_cacheline1``
> > +
> > +  Contributions to DPDK should immediately start using the new names;
> > +  applications should adapt to the new names as soon as possible, as the
> > +  old names will be removed in a future DPDK release.
> > +
> > +  Note: types of the new names are not API compatible with the old and
> > +  some code conversion is required to maintain correct behavior.
> > +
> >  * build: The ``enable_kmods`` option is deprecated and will be removed in a future release.
> >    Setting/clearing the option has no impact on the build.
> >    Instead, kernel modules will be always built for OS's where out-of-tree kernel modules
> > diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
> > index d7d6390..af73f67 100644
> > --- a/lib/eal/include/rte_common.h
> > +++ b/lib/eal/include/rte_common.h
> > @@ -582,6 +582,12 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
> >  /** Marker for 8B alignment in a structure. */
> >  __extension__ typedef uint64_t RTE_MARKER64[0];
> >  
> > +#define __rte_marker(type, name) type name /* __rte_deprecated */
> > +
> > +#else
> > +
> > +#define __rte_marker(type, name)
> > +
> >  #endif
> >  
> >  /*********** Macros for calculating min and max **********/
> > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > index 5688683..9e9590b 100644
> > --- a/lib/mbuf/rte_mbuf_core.h
> > +++ b/lib/mbuf/rte_mbuf_core.h
> > @@ -16,7 +16,10 @@
> >   * New fields and flags should fit in the "dynamic space".
> >   */
> >  
> > +#include <assert.h>
> > +#include <stdalign.h>
> >  #include <stdint.h>
> > +#include <stddef.h>
> >  
> >  #include <rte_byteorder.h>
> >  #include <rte_stdatomic.h>
> > @@ -464,204 +467,240 @@ enum {
> >   * The generic rte_mbuf, containing a packet mbuf.
> >   */
> >  struct rte_mbuf {
> > -	RTE_MARKER cacheline0;
> > -
> > -	void *buf_addr;           /**< Virtual address of segment buffer. */
> > +	__rte_marker(RTE_MARKER, cacheline0);
> 
> You don't need to keep the first argument.
> This would be simpler:
> 	__rte_marker(cacheline0);
> You just need to create 2 functions: __rte_marker and __rte_marker64.

no objection, i'll add 2 macros and drop the first argument.

> 
> You should replace all occurrences of RTE_MARKER in DPDK in one patch,
> and mark RTE_MARKER as deprecated (use #pragma GCC poison)

will update to use pragma instead of __rte_deprecated

> 
> > +	union {
> > +		char mbuf_cacheline0[RTE_CACHE_LINE_MIN_SIZE];
> > +		__extension__
> > +		struct {
> > +		struct {
> > +			void *buf_addr;           /**< Virtual address of segment buffer. */
> 
> I think it is ugly.
> 
> Changing mbuf API is a serious issue.

agreed, do you have an alternate proposal to solve the problem?

> 

^ permalink raw reply	[relevance 0%]

* Re: [RFC v3 1/6] eal: add static per-lcore memory allocation facility
  2024-02-20 11:39  0%         ` Bruce Richardson
  2024-02-20 13:37  0%           ` Morten Brørup
@ 2024-02-20 16:26  0%           ` Mattias Rönnblom
  1 sibling, 0 replies; 200+ results
From: Mattias Rönnblom @ 2024-02-20 16:26 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: Mattias Rönnblom, dev, Morten Brørup, Stephen Hemminger

On 2024-02-20 12:39, Bruce Richardson wrote:
> On Tue, Feb 20, 2024 at 11:47:14AM +0100, Mattias Rönnblom wrote:
>> On 2024-02-20 10:11, Bruce Richardson wrote:
>>> On Tue, Feb 20, 2024 at 09:49:03AM +0100, Mattias Rönnblom wrote:
>>>> Introduce DPDK per-lcore id variables, or lcore variables for short.
>>>>
>>>> An lcore variable has one value for every current and future lcore
>>>> id-equipped thread.
>>>>
>>>> The primary <rte_lcore_var.h> use case is for statically allocating
>>>> small chunks of often-used data, which is related logically, but where
>>>> there are performance benefits to reap from having updates being local
>>>> to an lcore.
>>>>
>>>> Lcore variables are similar to thread-local storage (TLS, e.g., C11
>>>> _Thread_local), but decoupling the values' life time with that of the
>>>> threads.
> 
> <snip>
> 
>>>> +/*
>>>> + * Avoid using offset zero, since it would result in a NULL-value
>>>> + * "handle" (offset) pointer, which in principle and per the API
>>>> + * definition shouldn't be an issue, but may confuse some tools and
>>>> + * users.
>>>> + */
>>>> +#define INITIAL_OFFSET 1
>>>> +
>>>> +char rte_lcore_var[RTE_MAX_LCORE][RTE_MAX_LCORE_VAR] __rte_cache_aligned;
>>>> +
>>>
>>> While I like the idea of improved handling for per-core variables, my main
>>> concern with this set is this definition here, which adds yet another
>>> dependency on the compile-time defined RTE_MAX_LCORE value.
>>>
>>
>> lcore variables replaces one RTE_MAX_LCORE-dependent pattern with another.
>>
>> You could even argue the dependency on RTE_MAX_LCORE is reduced with lcore
>> variables, if you look at where/in how many places in the code base this
>> macro is being used. Centralizing per-lcore data management may also provide
>> some opportunity in the future for extending the API to cope with some more
>> dynamic RTE_MAX_LCORE variant. Not without ABI breakage of course, but we
>> are not ever going to change anything related to RTE_MAX_LCORE without
>> breaking the ABI, since this constant is everywhere, including compiled into
>> the application itself.
>>
> 
> Yep, that is true if it's widely used.
> 
>>> I believe we already have an issue with this #define where it's impossible
>>> to come up with a single value that works for all, or nearly all cases. The
>>> current default is still 128, yet DPDK needs to support systems where the
>>> number of cores is well into the hundreds, requiring workarounds of core
>>> mappings or customized builds of DPDK. Upping the value fixes those issues
>>> at the cost to memory footprint explosion for smaller systems.
>>>
>>
>> I agree this is an issue.
>>
>> RTE_MAX_LCORE also need to be sized to accommodate not only all cores used,
>> but the sum of all EAL threads and registered non-EAL threads.
>>
>> So, there is no reliable way to discover what RTE_MAX_LCORE is on a
>> particular piece of hardware, since the actual number of lcore ids needed is
>> up to the application.
>>
>> Why is the default set so low? Linux has MAX_CPUS, which serves the same
>> purpose, which is set to 4096 by default, if I recall correctly. Shouldn't
>> we at least be able to increase it to 256?
> 
> The default is so low because of the mempool caches. These are an array of
> buffer pointers with 512 (IIRC) entries per core up to RTE_MAX_LCORE.
> 
>>
>>> I'm therefore nervous about putting more dependencies on this value, when I
>>> feel we should be moving away from its use, to allow more runtime
>>> configurability of cores.
>>>
>>
>> What more specifically do you have in mind?
>>
> 
> I don't think having a dynamically scaling RTE_MAX_LCORE is feasible, but
> what I would like to see is a runtime specified value. For example, you
> could run DPDK with EAL parameter "--max-lcores=1024" for large systems or
> "--max-lcores=32" for small ones. That would then be used at init-time to
> scale all internal datastructures appropriately.
> 

Sounds reasonable to me, especially if you would take a gradual approach.

By gradual I mean something like adding a function 
rte_lcore_max_possible(), or something like that, returning the EAL 
init-specified value. DPDK libraries/PMDs could then gradually be made 
aware and take advantage of knowing that lcore ids will always be 
below a certain threshold, usually significantly lower than RTE_MAX_LCORE.

The only change required for lcore variables would be that the FOREACH 
macro would use the run-time-max value, rather than RTE_MAX_LCORE, which 
in turn would leave all the higher-numbered lcore id buffers 
untouched/unmapped.

The set of possible lcore ids could also be expressed as a bitset, if 
you have machine with a huge amount of cores, running many small DPDK 
instances.

> /Bruce
> 
> <snip for brevity>

^ permalink raw reply	[relevance 0%]

* RE: [RFC v3 1/6] eal: add static per-lcore memory allocation facility
  2024-02-20 11:39  0%         ` Bruce Richardson
@ 2024-02-20 13:37  0%           ` Morten Brørup
  2024-02-20 16:26  0%           ` Mattias Rönnblom
  1 sibling, 0 replies; 200+ results
From: Morten Brørup @ 2024-02-20 13:37 UTC (permalink / raw)
  To: Bruce Richardson, Mattias Rönnblom
  Cc: Mattias Rönnblom, dev, Stephen Hemminger

> From: Bruce Richardson [mailto:bruce.richardson@intel.com]
> Sent: Tuesday, 20 February 2024 12.39
> 
> On Tue, Feb 20, 2024 at 11:47:14AM +0100, Mattias Rönnblom wrote:
> > On 2024-02-20 10:11, Bruce Richardson wrote:
> > > On Tue, Feb 20, 2024 at 09:49:03AM +0100, Mattias Rönnblom wrote:
> > > > Introduce DPDK per-lcore id variables, or lcore variables for
> short.
> > > >
> > > > An lcore variable has one value for every current and future
> lcore
> > > > id-equipped thread.
> > > >
> > > > The primary <rte_lcore_var.h> use case is for statically
> allocating
> > > > small chunks of often-used data, which is related logically, but
> where
> > > > there are performance benefits to reap from having updates being
> local
> > > > to an lcore.
> > > >
> > > > Lcore variables are similar to thread-local storage (TLS, e.g.,
> C11
> > > > _Thread_local), but decoupling the values' life time with that of
> the
> > > > threads.
> 
> <snip>
> 
> > > > +/*
> > > > + * Avoid using offset zero, since it would result in a NULL-
> value
> > > > + * "handle" (offset) pointer, which in principle and per the API
> > > > + * definition shouldn't be an issue, but may confuse some tools
> and
> > > > + * users.
> > > > + */
> > > > +#define INITIAL_OFFSET 1
> > > > +
> > > > +char rte_lcore_var[RTE_MAX_LCORE][RTE_MAX_LCORE_VAR]
> __rte_cache_aligned;
> > > > +
> > >
> > > While I like the idea of improved handling for per-core variables,
> my main
> > > concern with this set is this definition here, which adds yet
> another
> > > dependency on the compile-time defined RTE_MAX_LCORE value.
> > >
> >
> > lcore variables replaces one RTE_MAX_LCORE-dependent pattern with
> another.
> >
> > You could even argue the dependency on RTE_MAX_LCORE is reduced with
> lcore
> > variables, if you look at where/in how many places in the code base
> this
> > macro is being used. Centralizing per-lcore data management may also
> provide
> > some opportunity in the future for extending the API to cope with
> some more
> > dynamic RTE_MAX_LCORE variant. Not without ABI breakage of course,
> but we
> > are not ever going to change anything related to RTE_MAX_LCORE
> without
> > breaking the ABI, since this constant is everywhere, including
> compiled into
> > the application itself.
> >
> 
> Yep, that is true if it's widely used.
> 
> > > I believe we already have an issue with this #define where it's
> impossible
> > > to come up with a single value that works for all, or nearly all
> cases. The
> > > current default is still 128, yet DPDK needs to support systems
> where the
> > > number of cores is well into the hundreds, requiring workarounds of
> core
> > > mappings or customized builds of DPDK. Upping the value fixes those
> issues
> > > at the cost to memory footprint explosion for smaller systems.
> > >
> >
> > I agree this is an issue.
> >
> > RTE_MAX_LCORE also need to be sized to accommodate not only all cores
> used,
> > but the sum of all EAL threads and registered non-EAL threads.
> >
> > So, there is no reliable way to discover what RTE_MAX_LCORE is on a
> > particular piece of hardware, since the actual number of lcore ids
> needed is
> > up to the application.
> >
> > Why is the default set so low? Linux has MAX_CPUS, which serves the
> same
> > purpose, which is set to 4096 by default, if I recall correctly.
> Shouldn't
> > we at least be able to increase it to 256?

I recall a recent techboard meeting where the default was discussed. The default was agreed so low because it suffices for the vast majority of hardware out there, and applications for bigger platforms can be expected to build DPDK with a different configuration themselves. And as Bruce also mentions, it's a tradeoff for memory consumption.

> 
> The default is so low because of the mempool caches. These are an array
> of
> buffer pointers with 512 (IIRC) entries per core up to RTE_MAX_LCORE.

The decision was based on a need to make a quick decision, so we used narrow guesstimates, not a broader memory consumption analysis.

If we really cared about default memory consumption, we should reduce the default RTE_MAX_QUEUES_PER_PORT from 1024 too. It has quite an effect.

Having hard data about which build time configuration parameters have the biggest effect on memory consumption would be extremely useful for tweaking the parameters for resource limited hardware.
It's a mix of static and dynamic allocation, so it's not obvious which scalable data structures consume the most memory.

> 
> >
> > > I'm therefore nervous about putting more dependencies on this
> value, when I
> > > feel we should be moving away from its use, to allow more runtime
> > > configurability of cores.
> > >
> >
> > What more specifically do you have in mind?
> >
> 
> I don't think having a dynamically scaling RTE_MAX_LCORE is feasible,
> but
> what I would like to see is a runtime specified value. For example, you
> could run DPDK with EAL parameter "--max-lcores=1024" for large systems
> or
> "--max-lcores=32" for small ones. That would then be used at init-time
> to
> scale all internal datastructures appropriately.
> 

I agree 100 % that a better long term solution should be on the general road map.
Memory is a precious resource, but few seem to care about it.

A mix could provide an easy migration path:
Having RTE_MAX_LCORE as the hard upper limit (and default value) for a runtime specified max number ("rte_max_lcores").
With this, the goal would be for modules with very small data sets to continue using RTE_MAX_LCORE fixed size arrays, and for modules with larger data sets to migrate to rte_max_lcores dynamically sized arrays.

I am opposed to blocking a new patch series, only because it adds another RTE_MAX_LCORE sized array. We already have plenty of those.
It can be migrated towards dynamically sized array at a later time, just like the other modules with RTE_MAX_LCORE sized arrays.
Perhaps "fixing" an existing module would free up more memory than fixing this module. Let's spend development resources where they have the biggest impact.


^ permalink raw reply	[relevance 0%]

* Re: [RFC v3 1/6] eal: add static per-lcore memory allocation facility
  2024-02-20 10:47  3%       ` Mattias Rönnblom
@ 2024-02-20 11:39  0%         ` Bruce Richardson
  2024-02-20 13:37  0%           ` Morten Brørup
  2024-02-20 16:26  0%           ` Mattias Rönnblom
  0 siblings, 2 replies; 200+ results
From: Bruce Richardson @ 2024-02-20 11:39 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: Mattias Rönnblom, dev, Morten Brørup, Stephen Hemminger

On Tue, Feb 20, 2024 at 11:47:14AM +0100, Mattias Rönnblom wrote:
> On 2024-02-20 10:11, Bruce Richardson wrote:
> > On Tue, Feb 20, 2024 at 09:49:03AM +0100, Mattias Rönnblom wrote:
> > > Introduce DPDK per-lcore id variables, or lcore variables for short.
> > > 
> > > An lcore variable has one value for every current and future lcore
> > > id-equipped thread.
> > > 
> > > The primary <rte_lcore_var.h> use case is for statically allocating
> > > small chunks of often-used data, which is related logically, but where
> > > there are performance benefits to reap from having updates being local
> > > to an lcore.
> > > 
> > > Lcore variables are similar to thread-local storage (TLS, e.g., C11
> > > _Thread_local), but decoupling the values' life time with that of the
> > > threads.

<snip>

> > > +/*
> > > + * Avoid using offset zero, since it would result in a NULL-value
> > > + * "handle" (offset) pointer, which in principle and per the API
> > > + * definition shouldn't be an issue, but may confuse some tools and
> > > + * users.
> > > + */
> > > +#define INITIAL_OFFSET 1
> > > +
> > > +char rte_lcore_var[RTE_MAX_LCORE][RTE_MAX_LCORE_VAR] __rte_cache_aligned;
> > > +
> > 
> > While I like the idea of improved handling for per-core variables, my main
> > concern with this set is this definition here, which adds yet another
> > dependency on the compile-time defined RTE_MAX_LCORE value.
> > 
> 
> lcore variables replaces one RTE_MAX_LCORE-dependent pattern with another.
> 
> You could even argue the dependency on RTE_MAX_LCORE is reduced with lcore
> variables, if you look at where/in how many places in the code base this
> macro is being used. Centralizing per-lcore data management may also provide
> some opportunity in the future for extending the API to cope with some more
> dynamic RTE_MAX_LCORE variant. Not without ABI breakage of course, but we
> are not ever going to change anything related to RTE_MAX_LCORE without
> breaking the ABI, since this constant is everywhere, including compiled into
> the application itself.
> 

Yep, that is true if it's widely used.

> > I believe we already have an issue with this #define where it's impossible
> > to come up with a single value that works for all, or nearly all cases. The
> > current default is still 128, yet DPDK needs to support systems where the
> > number of cores is well into the hundreds, requiring workarounds of core
> > mappings or customized builds of DPDK. Upping the value fixes those issues
> > at the cost to memory footprint explosion for smaller systems.
> > 
> 
> I agree this is an issue.
> 
> RTE_MAX_LCORE also need to be sized to accommodate not only all cores used,
> but the sum of all EAL threads and registered non-EAL threads.
> 
> So, there is no reliable way to discover what RTE_MAX_LCORE is on a
> particular piece of hardware, since the actual number of lcore ids needed is
> up to the application.
> 
> Why is the default set so low? Linux has MAX_CPUS, which serves the same
> purpose, which is set to 4096 by default, if I recall correctly. Shouldn't
> we at least be able to increase it to 256?

The default is so low because of the mempool caches. These are an array of
buffer pointers with 512 (IIRC) entries per core up to RTE_MAX_LCORE.

> 
> > I'm therefore nervous about putting more dependencies on this value, when I
> > feel we should be moving away from its use, to allow more runtime
> > configurability of cores.
> > 
> 
> What more specifically do you have in mind?
> 

I don't think having a dynamically scaling RTE_MAX_LCORE is feasible, but
what I would like to see is a runtime specified value. For example, you
could run DPDK with EAL parameter "--max-lcores=1024" for large systems or
"--max-lcores=32" for small ones. That would then be used at init-time to
scale all internal datastructures appropriately.

/Bruce

<snip for brevity>

^ permalink raw reply	[relevance 0%]

* [PATCH v3 1/7] ethdev: support report register names and filter
  @ 2024-02-20 10:58  8%   ` Jie Hai
  0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2024-02-20 10:58 UTC (permalink / raw)
  To: dev; +Cc: lihuisong, fengchengwen, liuyonglong, huangdengdui, ferruh.yigit

This patch adds "filter" and "names" fields to "rte_dev_reg_info"
structure. Names of registers in data fields can be reported and
the registers can be filtered by their names.

The new API rte_eth_dev_get_reg_info_ext() is added to support
reporting names and filtering by names. The original API
rte_eth_dev_get_reg_info() does not use the names and filter fields.
A local variable is used in rte_eth_dev_get_reg_info() for
compatibility. If the driver does not report names, they are set
to "offset_XXX".

Signed-off-by: Jie Hai <haijie1@huawei.com>
---
 doc/guides/rel_notes/release_24_03.rst |  9 +++++++
 lib/ethdev/rte_dev_info.h              | 11 ++++++++
 lib/ethdev/rte_ethdev.c                | 36 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 28 ++++++++++++++++++++
 lib/ethdev/version.map                 |  1 +
 5 files changed, 85 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index db8be50c6dfd..f8882ba36bb9 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -102,6 +102,12 @@ New Features
 
   * Added support for comparing result between packet fields or value.
 
+* **Added support for dumping registers with names and filter.**
+
+  * Added a new API function ``rte_eth_dev_get_reg_info_ext()`` to filter
+    the registers by their names and get register information (names,
+    values and other attributes).
+
 
 Removed Items
 -------------
@@ -154,6 +160,9 @@ ABI Changes
 
 * No ABI change that would break compatibility with 23.11.
 
+* ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
+  structure for reporting names of registers and filtering them by names.
+
 
 Known Issues
 ------------
diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
index 67cf0ae52668..2f4541bd46c8 100644
--- a/lib/ethdev/rte_dev_info.h
+++ b/lib/ethdev/rte_dev_info.h
@@ -11,6 +11,11 @@ extern "C" {
 
 #include <stdint.h>
 
+#define RTE_ETH_REG_NAME_SIZE 128
+struct rte_eth_reg_name {
+	char name[RTE_ETH_REG_NAME_SIZE];
+};
+
 /*
  * Placeholder for accessing device registers
  */
@@ -20,6 +25,12 @@ struct rte_dev_reg_info {
 	uint32_t length; /**< Number of registers to fetch */
 	uint32_t width; /**< Size of device register */
 	uint32_t version; /**< Device version */
+	/**
+	 * Filter for target subset of registers.
+	 * This field can affect register selection for data/length/names.
+	 */
+	char *filter;
+	struct rte_eth_reg_name *names; /**< Buffer for register names */
 };
 
 /*
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e80..9eb4e696a51a 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6388,8 +6388,39 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
 
 int
 rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
+{
+	struct rte_dev_reg_info reg_info = { 0 };
+	int ret;
+
+	if (info == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Cannot get ethdev port %u register info to NULL",
+			port_id);
+		return -EINVAL;
+	}
+
+	reg_info.length = info->length;
+	reg_info.data = info->data;
+	reg_info.names = NULL;
+	reg_info.filter = NULL;
+
+	ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
+	if (ret != 0)
+		return ret;
+
+	info->length = reg_info.length;
+	info->width = reg_info.width;
+	info->version = reg_info.version;
+	info->offset = reg_info.offset;
+
+	return 0;
+}
+
+int
+rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
 {
 	struct rte_eth_dev *dev;
+	uint32_t i;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
@@ -6408,6 +6439,11 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
 
 	rte_ethdev_trace_get_reg_info(port_id, info, ret);
 
+	/* Report default names if the driver does not report them. */
+	if (info->names != NULL && strlen(info->names[0].name) == 0)
+		for (i = 0; i < info->length; i++)
+			snprintf(info->names[i].name, RTE_ETH_REG_NAME_SIZE,
+				"offset_%x", info->offset + i * info->width);
 	return ret;
 }
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index a14ca15f34ce..4b4aa8d2152e 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -5066,6 +5066,34 @@ __rte_experimental
 int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
 		struct rte_power_monitor_cond *pmc);
 
+/**
+ * Retrieve the filtered device registers (values and names) and
+ * register attributes (number of registers and register size)
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param info
+ *   Pointer to rte_dev_reg_info structure to fill in.
+ *   If info->filter is not NULL and the driver does not support names or
+ *   filtering, an error is returned. If info->filter is NULL, info for all
+ *   registers is returned (treated as no filter).
+ *   If info->data is NULL, the function fills in the width and length fields.
+ *   If non-NULL, ethdev assumes the buffer is large enough to store the
+ *   registers, and the values of registers whose name contains the filter
+ *   string are put into the buffer pointed to by the data field. The same
+ *   is done for register names if info->names is not NULL. If drivers do
+ *   not report names, default names are given by ethdev.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - others depend on the specific operation's implementation.
+ */
+__rte_experimental
+int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
+
 /**
  * Retrieve device registers and register attributes (number of registers and
  * register size)
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 8678b6991eee..2bdafce693c3 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -325,6 +325,7 @@ EXPERIMENTAL {
 	rte_flow_template_table_resizable;
 	rte_flow_template_table_resize;
 	rte_flow_template_table_resize_complete;
+	rte_eth_dev_get_reg_info_ext;
 };
 
 INTERNAL {
-- 
2.30.0


^ permalink raw reply	[relevance 8%]

* Re: [RFC v3 1/6] eal: add static per-lcore memory allocation facility
  @ 2024-02-20 10:47  3%       ` Mattias Rönnblom
  2024-02-20 11:39  0%         ` Bruce Richardson
  0 siblings, 1 reply; 200+ results
From: Mattias Rönnblom @ 2024-02-20 10:47 UTC (permalink / raw)
  To: Bruce Richardson, Mattias Rönnblom
  Cc: dev, Morten Brørup, Stephen Hemminger

On 2024-02-20 10:11, Bruce Richardson wrote:
> On Tue, Feb 20, 2024 at 09:49:03AM +0100, Mattias Rönnblom wrote:
>> Introduce DPDK per-lcore id variables, or lcore variables for short.
>>
>> An lcore variable has one value for every current and future lcore
>> id-equipped thread.
>>
>> The primary <rte_lcore_var.h> use case is for statically allocating
>> small chunks of often-used data, which is related logically, but where
>> there are performance benefits to reap from having updates being local
>> to an lcore.
>>
>> Lcore variables are similar to thread-local storage (TLS, e.g., C11
>> _Thread_local), but decoupling the values' life time with that of the
>> threads.
>>
>> Lcore variables are also similar in terms of functionality provided by
>> FreeBSD kernel's DPCPU_*() family of macros and the associated
>> build-time machinery. DPCPU uses linker scripts, which effectively
>> prevents the reuse of its, otherwise seemingly viable, approach.
>>
>> The currently-prevailing way to solve the same problem as lcore
>> variables is to keep a module's per-lcore data as RTE_MAX_LCORE-sized
>> array of cache-aligned, RTE_CACHE_GUARDed structs. The benefit of
>> lcore variables over this approach is that data related to the same
>> lcore now is close (spatially, in memory), rather than data used by
>> the same module, which in turn avoid excessive use of padding,
>> polluting caches with unused data.
>>
>> RFC v3:
>>   * Replace use of GCC-specific alignof(<expression>) with alignof(<type>).
>>   * Update example to reflect FOREACH macro name change (in RFC v2).
>>
>> RFC v2:
>>   * Use alignof to derive alignment requirements. (Morten Brørup)
>>   * Change name of FOREACH to make it distinct from <rte_lcore.h>'s
>>     *per-EAL-thread* RTE_LCORE_FOREACH(). (Morten Brørup)
>>   * Allow user-specified alignment, but limit max to cache line size.
>>
>> Signed-off-by: Mattias Rönnblom <mattias.ronnblom@ericsson.com>
>> ---
>>   config/rte_config.h                   |   1 +
>>   doc/api/doxy-api-index.md             |   1 +
>>   lib/eal/common/eal_common_lcore_var.c |  82 ++++++
>>   lib/eal/common/meson.build            |   1 +
>>   lib/eal/include/meson.build           |   1 +
>>   lib/eal/include/rte_lcore_var.h       | 375 ++++++++++++++++++++++++++
>>   lib/eal/version.map                   |   4 +
>>   7 files changed, 465 insertions(+)
>>   create mode 100644 lib/eal/common/eal_common_lcore_var.c
>>   create mode 100644 lib/eal/include/rte_lcore_var.h
>>
>> diff --git a/config/rte_config.h b/config/rte_config.h
>> index da265d7dd2..884482e473 100644
>> --- a/config/rte_config.h
>> +++ b/config/rte_config.h
>> @@ -30,6 +30,7 @@
>>   /* EAL defines */
>>   #define RTE_CACHE_GUARD_LINES 1
>>   #define RTE_MAX_HEAPS 32
>> +#define RTE_MAX_LCORE_VAR 1048576
>>   #define RTE_MAX_MEMSEG_LISTS 128
>>   #define RTE_MAX_MEMSEG_PER_LIST 8192
>>   #define RTE_MAX_MEM_MB_PER_LIST 32768
>> diff --git a/doc/api/doxy-api-index.md b/doc/api/doxy-api-index.md
>> index a6a768bd7c..bb06bb7ca1 100644
>> --- a/doc/api/doxy-api-index.md
>> +++ b/doc/api/doxy-api-index.md
>> @@ -98,6 +98,7 @@ The public API headers are grouped by topics:
>>     [interrupts](@ref rte_interrupts.h),
>>     [launch](@ref rte_launch.h),
>>     [lcore](@ref rte_lcore.h),
>> +  [lcore-variable](@ref rte_lcore_var.h),
>>     [per-lcore](@ref rte_per_lcore.h),
>>     [service cores](@ref rte_service.h),
>>     [keepalive](@ref rte_keepalive.h),
>> diff --git a/lib/eal/common/eal_common_lcore_var.c b/lib/eal/common/eal_common_lcore_var.c
>> new file mode 100644
>> index 0000000000..dfd11cbd0b
>> --- /dev/null
>> +++ b/lib/eal/common/eal_common_lcore_var.c
>> @@ -0,0 +1,82 @@
>> +/* SPDX-License-Identifier: BSD-3-Clause
>> + * Copyright(c) 2024 Ericsson AB
>> + */
>> +
>> +#include <inttypes.h>
>> +
>> +#include <rte_common.h>
>> +#include <rte_debug.h>
>> +#include <rte_log.h>
>> +
>> +#include <rte_lcore_var.h>
>> +
>> +#include "eal_private.h"
>> +
>> +#define WARN_THRESHOLD 75
>> +
>> +/*
>> + * Avoid using offset zero, since it would result in a NULL-value
>> + * "handle" (offset) pointer, which in principle and per the API
>> + * definition shouldn't be an issue, but may confuse some tools and
>> + * users.
>> + */
>> +#define INITIAL_OFFSET 1
>> +
>> +char rte_lcore_var[RTE_MAX_LCORE][RTE_MAX_LCORE_VAR] __rte_cache_aligned;
>> +
> 
> While I like the idea of improved handling for per-core variables, my main
> concern with this set is this definition here, which adds yet another
> dependency on the compile-time defined RTE_MAX_LCORE value.
> 

lcore variables replaces one RTE_MAX_LCORE-dependent pattern with another.

You could even argue the dependency on RTE_MAX_LCORE is reduced with 
lcore variables, if you look at where/in how many places in the code 
base this macro is being used. Centralizing per-lcore data management 
may also provide some opportunity in the future for extending the API to 
cope with some more dynamic RTE_MAX_LCORE variant. Not without ABI 
breakage of course, but we are not ever going to change anything related 
to RTE_MAX_LCORE without breaking the ABI, since this constant is 
everywhere, including compiled into the application itself.

> I believe we already have an issue with this #define where it's impossible
> to come up with a single value that works for all, or nearly all cases. The
> current default is still 128, yet DPDK needs to support systems where the
> number of cores is well into the hundreds, requiring workarounds of core
> mappings or customized builds of DPDK. Upping the value fixes those issues
> at the cost to memory footprint explosion for smaller systems.
> 

I agree this is an issue.

RTE_MAX_LCORE also needs to be sized to accommodate not only all cores 
used, but the sum of all EAL threads and registered non-EAL threads.

So there is no reliable way to determine what RTE_MAX_LCORE should be on 
a particular piece of hardware, since the actual number of lcore ids 
needed is up to the application.

Why is the default set so low? Linux has MAX_CPUS, which serves the same 
purpose and is set to 4096 by default, if I recall correctly. 
Shouldn't we at least be able to increase it to 256?

> I'm therefore nervous about putting more dependencies on this value, when I
> feel we should be moving away from its use, to allow more runtime
> configurability of cores.
> 

What more specifically do you have in mind?

Maybe I'm overly pessimistic, but supporting lcores without any upper 
bound and also allowing them to be added and removed at any point during 
run time seems far-fetched, given where DPDK is today.

Including an actual upper bound, set during DPDK run-time 
initialization and lower than RTE_MAX_LCORE, seems easier. I think there 
is some equivalent in the Linux kernel. Again, you would need to 
accommodate future rte_register_thread() calls.

<rte_lcore_var.h> could be extended with user-specified lcore variable 
init/free callbacks, to allow lazy or late initialization.

If one could have a way to retrieve the maximum possible number of lcore 
ids *for a particular DPDK process* (as opposed to a particular build), it 
would be possible to avoid touching the per-lcore buffers for lcore ids 
that would never be used. With data in BSS, that memory would never be 
mapped/allocated.

An issue with BSS data is that there might be very RT-sensitive 
applications that decide to lock all memory into RAM, to avoid latency 
jitter caused by paging, and those would suffer from a large 
rte_lcore_var (or from all the current static arrays). Lcore variables 
make this worse, since rte_lcore_var is larger than the sum of today's 
static arrays, and must be so, with some margin, since there is no way to 
figure out ahead of time how much memory is actually going to be needed.

> For this set/feature, would it be possible to have a run-time allocated
> (and sized) array rather than using the RTE_MAX_LCORE value?
> 

What I explored was having the per-lcore buffers dynamically allocated. 
What I ran into was that I saw no apparent benefit, and with dynamic 
allocation there were new problems to solve. One was to ensure lcore 
variable buffers were allocated before they were used. In particular, 
if you want to use huge page memory, lcore variables may be 
available only when that machinery is ready to accept requests.

Also, with huge page memory, you won't get the benefit you get from 
demand paging and BSS (i.e., only used memory is actually allocated).

With malloc(), I believe you generally do get that same benefit, if your 
allocation is sufficiently large.

I also considered just allocating chunks, fitting (say) 64 kB worth of 
lcore variables in each. It turned out more complex, and to no benefit 
other than reducing the footprint for mlockall() type apps, which seemed 
like a corner case.

I never considered a dynamic RTE_MAX_LCORE with no upper bound.

> Thanks, (and apologies for the mini-rant!)
> 
> /Bruce

Thanks for the comments. This was nowhere near a rant.


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 1/7] ethdev: support report register names and filter
  2024-02-07 17:00  3%     ` Ferruh Yigit
@ 2024-02-20  8:43  3%       ` Jie Hai
  0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2024-02-20  8:43 UTC (permalink / raw)
  To: Ferruh Yigit, dev; +Cc: lihuisong, fengchengwen, liuyonglong, huangdengdui

Hi, Ferruh,
Thanks for your review.

On 2024/2/8 1:00, Ferruh Yigit wrote:
> On 2/5/2024 10:51 AM, Jie Hai wrote:
>> This patch adds "filter" and "names" fields to "rte_dev_reg_info"
>> structure. Names of registers in data fields can be reported and
>> the registers can be filtered by their names.
>>
>> For compatibility, the original API rte_eth_dev_get_reg_info()
>> does not use the name and filter fields. The new API
>> rte_eth_dev_get_reg_info_ext() is added to support reporting
>> names and filtering by names. If the drivers does not report
>> the names, set them to "offset_XXX".
>>
>> Signed-off-by: Jie Hai <haijie1@huawei.com>
>> ---
>>   doc/guides/rel_notes/release_24_03.rst |  8 ++++++
>>   lib/ethdev/rte_dev_info.h              | 11 ++++++++
>>   lib/ethdev/rte_ethdev.c                | 36 ++++++++++++++++++++++++++
>>   lib/ethdev/rte_ethdev.h                | 22 ++++++++++++++++
>>   4 files changed, 77 insertions(+)
>>
>> diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
>> index 84d3144215c6..5d402341223a 100644
>> --- a/doc/guides/rel_notes/release_24_03.rst
>> +++ b/doc/guides/rel_notes/release_24_03.rst
>> @@ -75,6 +75,11 @@ New Features
>>     * Added support for Atomic Rules' TK242 packet-capture family of devices
>>       with PCI IDs: ``0x1024, 0x1025, 0x1026``.
>>   
>> +* **Added support for dumping regiters with names and filter.**
>>
> 
> s/regiters/registers/
Will correct in v3.
> 
>> +
>> +  * Added new API functions ``rte_eth_dev_get_reg_info_ext()`` to and filter
>> +  * the registers by their names and get the information of registers(names,
>> +  * values and other attributes).
>>   
> 
> '*' makes a bullet, but above seems one sentences, if so please only
> keep the first '*'.
Will correct in v3.
> 
>>   Removed Items
>>   -------------
>> @@ -124,6 +129,9 @@ ABI Changes
>>   
>>   * No ABI change that would break compatibility with 23.11.
>>   
>> +* ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
>> +  structure for reporting names of regiters and filtering them by names.
>> +
>>   
> 
> This will break the ABI.
> 
> Think about a case, an application compiled with an old version of DPDK,
> later same application started to use this version without re-compile,
> application will send old version of 'struct rte_dev_reg_info', but new
> version of DPDK will try to access or update new fields of the 'struct
> rte_dev_reg_info'
> 
Actually, we use a local variable "struct rte_dev_reg_info reg_info" in
'rte_eth_dev_get_reg_info()' to pass to the driver, and the new fields
are set to zero. Old drivers do not access the new fields.
We add the constraint that if filter is NULL, no filtering is done and
info for all registers is returned.
So for an old-version app with a new-version ethdev, the driver will not
access the new fields, and the ABI is not broken.
> One option is:
> - to add a new 'struct rte_dev_reg_info_ext',
> - 'rte_eth_dev_get_reg_info()' still uses old 'struct rte_dev_reg_info'
> - 'get_reg()' dev_ops will use this new 'struct rte_dev_reg_info_ext'
> - Add deprecation notice to update 'rte_eth_dev_get_reg_info()' to use
> new struct in next LTS release
> 

> 
>>   Known Issues
>>   ------------
>> diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
>> index 67cf0ae52668..2f4541bd46c8 100644
>> --- a/lib/ethdev/rte_dev_info.h
>> +++ b/lib/ethdev/rte_dev_info.h
>> @@ -11,6 +11,11 @@ extern "C" {
>>   
>>   #include <stdint.h>
>>   
>> +#define RTE_ETH_REG_NAME_SIZE 128
>> +struct rte_eth_reg_name {
>> +	char name[RTE_ETH_REG_NAME_SIZE];
>> +};
>> +
>>   /*
>>    * Placeholder for accessing device registers
>>    */
>> @@ -20,6 +25,12 @@ struct rte_dev_reg_info {
>>   	uint32_t length; /**< Number of registers to fetch */
>>   	uint32_t width; /**< Size of device register */
>>   	uint32_t version; /**< Device version */
>> +	/**
>> +	 * Filter for target subset of registers.
>> +	 * This field could affects register selection for data/length/names.
>> +	 */
>> +	char *filter;
>> +	struct rte_eth_reg_name *names; /**< Registers name saver */
>>   };
>>   
>>   /*
>> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
>> index f1c658f49e80..3e0294e49092 100644
>> --- a/lib/ethdev/rte_ethdev.c
>> +++ b/lib/ethdev/rte_ethdev.c
>> @@ -6388,8 +6388,39 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
>>   
>>   int
>>   rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
>> +{
>> +	struct rte_dev_reg_info reg_info;
>> +	int ret;
>> +
>> +	if (info == NULL) {
>> +		RTE_ETHDEV_LOG_LINE(ERR,
>> +			"Cannot get ethdev port %u register info to NULL",
>> +			port_id);
>> +		return -EINVAL;
>> +	}
>> +
>> +	reg_info.length = info->length;
>> +	reg_info.data = info->data;
>> +	reg_info.names = NULL;
>> +	reg_info.filter = NULL;
>> +
>> +	ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
>> +	if (ret != 0)
>> +		return ret;
>> +
>> +	info->length = reg_info.length;
>> +	info->width = reg_info.width;
>> +	info->version = reg_info.version;
>> +	info->offset = reg_info.offset;
>> +
>> +	return 0;
>> +}
>> +
>> +int
>> +rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
>>   {
>>   	struct rte_eth_dev *dev;
>> +	uint32_t i;
>>   	int ret;
>>   
>>   	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>> @@ -6408,6 +6439,11 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
>>   
>>   	rte_ethdev_trace_get_reg_info(port_id, info, ret);
>>   
>> +	/* Report the default names if drivers not report. */
>> +	if (info->names != NULL && strlen(info->names[0].name) == 0)
>> +		for (i = 0; i < info->length; i++)
>> +			sprintf(info->names[i].name, "offset_%x",
>> +				info->offset + i * info->width);
>>
> 
> Better to use 'snprintf()'
> 
Will correct in v3.
>>   	return ret;
>>   }
>>   
>> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
>> index 2687c23fa6fb..3abc2ad3f865 100644
>> --- a/lib/ethdev/rte_ethdev.h
>> +++ b/lib/ethdev/rte_ethdev.h
>> @@ -5053,6 +5053,28 @@ __rte_experimental
>>   int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
>>   		struct rte_power_monitor_cond *pmc);
>>   
>> +/**
>> + * Retrieve the filtered device registers (values and names) and
>> + * register attributes (number of registers and register size)
>> + *
>> + * @param port_id
>> + *   The port identifier of the Ethernet device.
>> + * @param info
>> + *   Pointer to rte_dev_reg_info structure to fill in. If info->data is
>> + *   NULL, the function fills in the width and length fields. If non-NULL,
>> + *   the values of registers whose name contains the filter string are put
>> + *   into the buffer pointed at by the data field. Do the same for the names
>> + *   of registers if info->names is not NULL.
>>
> 
> May be good to mention if info->names is not NULL, but driver doesn't
> support names, ehtdev fills the names automatically.
> 
> As both 'rte_eth_dev_get_reg_info()' & 'rte_eth_dev_get_reg_info_ext()'
> use same dev_ops ('get_reg()'), it is possible that driver doesn't
> support filtering, so if the user provides info->filter but driver
> doesn't support it, I think API should return error, what do you think?
> 
> And can you please make it clear above, if filtering is provided with
> info->data = NULL, are the returned width and length fields for filtered
> number of registers or all registers?
> 
> 
Will correct in v3.
>> + * @return
>> + *   - (0) if successful.
>> + *   - (-ENOTSUP) if hardware doesn't support.
>> + *   - (-EINVAL) if bad parameter.
>> + *   - (-ENODEV) if *port_id* invalid.
>> + *   - (-EIO) if device is removed.
>> + *   - others depends on the specific operations implementation.
>> + */
>> +int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
>> +
>>
> 
> Can you please make the new API as experimental. That is the requirement
> for new APIs.
> 
> Also need to add the API to version.map
Will correct in v3.
> 
> 
>>   /**
>>    * Retrieve device registers and register attributes (number of registers and
>>    * register size)
> 
> .

Best regards,
Jie Hai

^ permalink raw reply	[relevance 3%]

* [PATCH v8 0/2]  add telemetry cmds for ring
    @ 2024-02-19  8:32  3% ` Jie Hai
  1 sibling, 0 replies; 200+ results
From: Jie Hai @ 2024-02-19  8:32 UTC (permalink / raw)
  To: dev
  Cc: lihuisong, fengchengwen, huangdengdui, ferruh.yigit, thomas,
	konstantin.v.ananyev, honnappa.nagarahalli, david.marchand,
	Ruifeng.Wang, mb

This patch set adds telemetry commands to list rings and dump the
information of a ring by its name.

v1->v2:
1. Add space after "switch".
2. Fix wrong strlen parameter.

v2->v3:
1. Remove prefix "rte_" for static function.
2. Add Acked-by Konstantin Ananyev for PATCH 1.
3. Introduce functions to return strings instead copy strings.
4. Check pointer to memzone of ring.
5. Remove redundant variable.
6. Hold lock when access ring data.

v3->v4:
1. Update changelog according to reviews of Honnappa Nagarahalli.
2. Add Reviewed-by Honnappa Nagarahalli.
3. Correct grammar in help information.
4. Correct spell warning on "te" reported by checkpatch.pl.
5. Use ring_walk() to query ring info instead of rte_ring_lookup().
6. Fix that type definition the flag field of rte_ring does not match the usage.
7. Use rte_tel_data_add_dict_uint_hex instead of rte_tel_data_add_dict_u64
   for mask and flags.

v4->v5:
1. Add Acked-by Konstantin Ananyev and Chengwen Feng.
2. Add ABI change explanation for commit message of patch 1/3.

v5->v6:
1. Add Acked-by Morten Brørup.
2. Fix incorrect reference of commit.

v6->v7:
1. Remove prod/consumer head/tail info.

v7->v8:
1. Drop patch 1/3 and related codes.

Jie Hai (2):
  ring: add telemetry cmd to list rings
  ring: add telemetry cmd for ring info

 lib/ring/meson.build |   1 +
 lib/ring/rte_ring.c  | 135 +++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 136 insertions(+)

-- 
2.30.0


^ permalink raw reply	[relevance 3%]

* Re: [RESEND v7 1/3] ring: fix unmatched type definition and usage
  2024-02-18 18:11  3%     ` Thomas Monjalon
@ 2024-02-19  8:24  0%       ` Jie Hai
  0 siblings, 0 replies; 200+ results
From: Jie Hai @ 2024-02-19  8:24 UTC (permalink / raw)
  To: Thomas Monjalon
  Cc: honnappa.nagarahalli, konstantin.v.ananyev, dev, david.marchand,
	Ruifeng.Wang, mb, lihuisong, fengchengwen, liudongdong3

On 2024/2/19 2:11, Thomas Monjalon wrote:
> 09/11/2023 11:20, Jie Hai:
>> Field 'flags' of struct rte_ring is defined as int type. However,
>> it is used as unsigned int. To ensure consistency, change the
>> type of flags to unsigned int. Since these two types has the
>> same byte size, this change is not an ABI change.
>>
>> Fixes: af75078fece3 ("first public release")
>>
>> Signed-off-by: Jie Hai <haijie1@huawei.com>
>> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
>> Acked-by: Chengwen Feng <fengchengwen@huawei.com>
>> Acked-by: Morten Brørup <mb@smartsharesystems.com>
>> ---
>>   lib/ring/rte_ring_core.h | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
>> index b7708730658a..14dac6495d83 100644
>> --- a/lib/ring/rte_ring_core.h
>> +++ b/lib/ring/rte_ring_core.h
>> @@ -119,7 +119,7 @@ struct rte_ring_hts_headtail {
>>   struct rte_ring {
>>   	char name[RTE_RING_NAMESIZE] __rte_cache_aligned;
>>   	/**< Name of the ring. */
>> -	int flags;               /**< Flags supplied at creation. */
>> +	uint32_t flags;               /**< Flags supplied at creation. */
> 
> This triggers a warning in our ABI checker:
> 
>        in pointed to type 'struct rte_ring' at rte_ring_core.h:119:1:
>          type size hasn't changed
>          1 data member change:
>            type of 'int flags' changed:
>              entity changed from 'int' to compatible type 'typedef uint32_t' at stdint-uintn.h:26:1
>                type name changed from 'int' to 'unsigned int'
>                type size hasn't changed
> 
> I guess we were supposed to merge this in 23.11, sorry about this.
> 
> How can we proceed?
> 
How about we drop this amendment (patch 1/3) for now?
> 
> .

^ permalink raw reply	[relevance 0%]

* Re: [RESEND v7 1/3] ring: fix unmatched type definition and usage
  @ 2024-02-18 18:11  3%     ` Thomas Monjalon
  2024-02-19  8:24  0%       ` Jie Hai
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-02-18 18:11 UTC (permalink / raw)
  To: Jie Hai
  Cc: honnappa.nagarahalli, konstantin.v.ananyev, dev, david.marchand,
	Ruifeng.Wang, mb, lihuisong, fengchengwen, liudongdong3

09/11/2023 11:20, Jie Hai:
> Field 'flags' of struct rte_ring is defined as int type. However,
> it is used as unsigned int. To ensure consistency, change the
> type of flags to unsigned int. Since these two types has the
> same byte size, this change is not an ABI change.
> 
> Fixes: af75078fece3 ("first public release")
> 
> Signed-off-by: Jie Hai <haijie1@huawei.com>
> Acked-by: Konstantin Ananyev <konstantin.v.ananyev@yandex.ru>
> Acked-by: Chengwen Feng <fengchengwen@huawei.com>
> Acked-by: Morten Brørup <mb@smartsharesystems.com>
> ---
>  lib/ring/rte_ring_core.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/ring/rte_ring_core.h b/lib/ring/rte_ring_core.h
> index b7708730658a..14dac6495d83 100644
> --- a/lib/ring/rte_ring_core.h
> +++ b/lib/ring/rte_ring_core.h
> @@ -119,7 +119,7 @@ struct rte_ring_hts_headtail {
>  struct rte_ring {
>  	char name[RTE_RING_NAMESIZE] __rte_cache_aligned;
>  	/**< Name of the ring. */
> -	int flags;               /**< Flags supplied at creation. */
> +	uint32_t flags;               /**< Flags supplied at creation. */

This triggers a warning in our ABI checker:

      in pointed to type 'struct rte_ring' at rte_ring_core.h:119:1:
        type size hasn't changed
        1 data member change:
          type of 'int flags' changed:
            entity changed from 'int' to compatible type 'typedef uint32_t' at stdint-uintn.h:26:1
              type name changed from 'int' to 'unsigned int'
              type size hasn't changed

I guess we were supposed to merge this in 23.11, sorry about this.

How can we proceed?



^ permalink raw reply	[relevance 3%]

* RE: [PATCH v4 01/18] mbuf: deprecate GCC marker in rte mbuf struct
  2024-02-18 12:39  0%     ` Thomas Monjalon
@ 2024-02-18 13:07  0%       ` Morten Brørup
  2024-02-20 17:20  0%       ` Tyler Retzlaff
  1 sibling, 0 replies; 200+ results
From: Morten Brørup @ 2024-02-18 13:07 UTC (permalink / raw)
  To: Thomas Monjalon, dev, Tyler Retzlaff
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang

> From: Thomas Monjalon [mailto:thomas@monjalon.net]
> Sent: Sunday, 18 February 2024 13.40
> 
> 15/02/2024 07:21, Tyler Retzlaff:
> > Provide a macro that allows conditional expansion of RTE_MARKER
> fields
> > to empty to allow rte_mbuf to be used with MSVC. It is proposed that
> > we announce the fields to be __rte_deprecated (currently disabled).
> >
> > Introduce C11 anonymous unions to permit aliasing of well-known
> > offsets by name into the rte_mbuf structure by a *new* name and to
> > provide padding for cache alignment.
> >
> > Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> > ---
> >  doc/guides/rel_notes/deprecation.rst |  20 ++
> >  lib/eal/include/rte_common.h         |   6 +
> >  lib/mbuf/rte_mbuf_core.h             | 375 +++++++++++++++++++------
> ----------
> >  3 files changed, 233 insertions(+), 168 deletions(-)
> >
> > diff --git a/doc/guides/rel_notes/deprecation.rst
> b/doc/guides/rel_notes/deprecation.rst
> > index 10630ba..8594255 100644
> > --- a/doc/guides/rel_notes/deprecation.rst
> > +++ b/doc/guides/rel_notes/deprecation.rst
> > @@ -17,6 +17,26 @@ Other API and ABI deprecation notices are to be
> posted below.
> >  Deprecation Notices
> >  -------------------
> >
> > +* mbuf: Named zero sized fields of type ``RTE_MARKER`` and
> ``RTE_MARKER64``
> > +  will be removed from ``struct rte_mbuf`` and replaced with new
> fields
> > +  in anonymous unions.
> > +
> > +  The names of the fields impacted are:
> > +
> > +    old name                  new name
> > +
> > +  ``cacheline0``            ``mbuf_cachelin0``
> > +  ``rearm_data``            ``mbuf_rearm_data``
> > +  ``rx_descriptor_fields1`` ``mbuf_rx_descriptor_fields1``
> > +  ``cacheline1``            ``mbuf_cachelin1``
> > +
> > +  Contributions to DPDK should immediately start using the new
> names,
> > +  applications should adapt to new names as soon as possible as the
> > +  old names will be removed in a future DPDK release.
> > +
> > +  Note: types of the new names are not API compatible with the old
> and
> > +  some code conversion is required to maintain correct behavior.
> > +
> >  * build: The ``enable_kmods`` option is deprecated and will be
> removed in a future release.
> >    Setting/clearing the option has no impact on the build.
> >    Instead, kernel modules will be always built for OS's where out-
> of-tree kernel modules
> > diff --git a/lib/eal/include/rte_common.h
> b/lib/eal/include/rte_common.h
> > index d7d6390..af73f67 100644
> > --- a/lib/eal/include/rte_common.h
> > +++ b/lib/eal/include/rte_common.h
> > @@ -582,6 +582,12 @@ static void
> __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
> >  /** Marker for 8B alignment in a structure. */
> >  __extension__ typedef uint64_t RTE_MARKER64[0];
> >
> > +#define __rte_marker(type, name) type name /* __rte_deprecated */
> > +
> > +#else
> > +
> > +#define __rte_marker(type, name)
> > +
> >  #endif
> >
> >  /*********** Macros for calculating min and max **********/
> > diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> > index 5688683..9e9590b 100644
> > --- a/lib/mbuf/rte_mbuf_core.h
> > +++ b/lib/mbuf/rte_mbuf_core.h
> > @@ -16,7 +16,10 @@
> >   * New fields and flags should fit in the "dynamic space".
> >   */
> >
> > +#include <assert.h>
> > +#include <stdalign.h>
> >  #include <stdint.h>
> > +#include <stddef.h>
> >
> >  #include <rte_byteorder.h>
> >  #include <rte_stdatomic.h>
> > @@ -464,204 +467,240 @@ enum {
> >   * The generic rte_mbuf, containing a packet mbuf.
> >   */
> >  struct rte_mbuf {
> > -	RTE_MARKER cacheline0;
> > -
> > -	void *buf_addr;           /**< Virtual address of segment buffer.
> */
> > +	__rte_marker(RTE_MARKER, cacheline0);
> 
> You don't need to keep the first argument.
> This would be simpler:
> 	__rte_marker(cacheline0);
> You just need to create 2 functions: __rte_marker and __rte_marker64.
> 
> You should replace all occurrences of RTE_MARKER in DPDK in one patch,
> and mark RTE_MARKER as deprecated (use #pragma GCC poison)

I like this suggestion.
However, some applications might use RTE_MARKER in their own structures.
Wouldn't it be considered API breakage to mark RTE_MARKER as deprecated?

> 
> > +	union {
> > +		char mbuf_cacheline0[RTE_CACHE_LINE_MIN_SIZE];
> > +		__extension__
> > +		struct {
> > +			void *buf_addr;           /**< Virtual address of
> segment buffer.
> 
> I think it is ugly.
> 
> Changing mbuf API is a serious issue.


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v4 01/18] mbuf: deprecate GCC marker in rte mbuf struct
  2024-02-15  6:21  3%   ` [PATCH v4 01/18] mbuf: deprecate GCC marker in rte mbuf struct Tyler Retzlaff
@ 2024-02-18 12:39  0%     ` Thomas Monjalon
  2024-02-18 13:07  0%       ` Morten Brørup
  2024-02-20 17:20  0%       ` Tyler Retzlaff
  0 siblings, 2 replies; 200+ results
From: Thomas Monjalon @ 2024-02-18 12:39 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

15/02/2024 07:21, Tyler Retzlaff:
> Provide a macro that allows conditional expansion of RTE_MARKER fields
> to empty to allow rte_mbuf to be used with MSVC. It is proposed that
> we announce the fields to be __rte_deprecated (currently disabled).
> 
> Introduce C11 anonymous unions to permit aliasing of well-known
> offsets by name into the rte_mbuf structure by a *new* name and to
> provide padding for cache alignment.
> 
> Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
> ---
>  doc/guides/rel_notes/deprecation.rst |  20 ++
>  lib/eal/include/rte_common.h         |   6 +
>  lib/mbuf/rte_mbuf_core.h             | 375 +++++++++++++++++++----------------
>  3 files changed, 233 insertions(+), 168 deletions(-)
> 
> diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
> index 10630ba..8594255 100644
> --- a/doc/guides/rel_notes/deprecation.rst
> +++ b/doc/guides/rel_notes/deprecation.rst
> @@ -17,6 +17,26 @@ Other API and ABI deprecation notices are to be posted below.
>  Deprecation Notices
>  -------------------
>  
> +* mbuf: Named zero sized fields of type ``RTE_MARKER`` and ``RTE_MARKER64``
> +  will be removed from ``struct rte_mbuf`` and replaced with new fields
> +  in anonymous unions.
> +
> +  The names of the fields impacted are:
> +
> +    old name                  new name
> +
> +  ``cacheline0``            ``mbuf_cachelin0``
> +  ``rearm_data``            ``mbuf_rearm_data``
> +  ``rx_descriptor_fields1`` ``mbuf_rx_descriptor_fields1``
> +  ``cacheline1``            ``mbuf_cachelin1``
> +
> +  Contributions to DPDK should immediately start using the new names,
> +  applications should adapt to new names as soon as possible as the
> +  old names will be removed in a future DPDK release.
> +
> +  Note: types of the new names are not API compatible with the old and
> +  some code conversion is required to maintain correct behavior.
> +
>  * build: The ``enable_kmods`` option is deprecated and will be removed in a future release.
>    Setting/clearing the option has no impact on the build.
>    Instead, kernel modules will be always built for OS's where out-of-tree kernel modules
> diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
> index d7d6390..af73f67 100644
> --- a/lib/eal/include/rte_common.h
> +++ b/lib/eal/include/rte_common.h
> @@ -582,6 +582,12 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
>  /** Marker for 8B alignment in a structure. */
>  __extension__ typedef uint64_t RTE_MARKER64[0];
>  
> +#define __rte_marker(type, name) type name /* __rte_deprecated */
> +
> +#else
> +
> +#define __rte_marker(type, name)
> +
>  #endif
>  
>  /*********** Macros for calculating min and max **********/
> diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
> index 5688683..9e9590b 100644
> --- a/lib/mbuf/rte_mbuf_core.h
> +++ b/lib/mbuf/rte_mbuf_core.h
> @@ -16,7 +16,10 @@
>   * New fields and flags should fit in the "dynamic space".
>   */
>  
> +#include <assert.h>
> +#include <stdalign.h>
>  #include <stdint.h>
> +#include <stddef.h>
>  
>  #include <rte_byteorder.h>
>  #include <rte_stdatomic.h>
> @@ -464,204 +467,240 @@ enum {
>   * The generic rte_mbuf, containing a packet mbuf.
>   */
>  struct rte_mbuf {
> -	RTE_MARKER cacheline0;
> -
> -	void *buf_addr;           /**< Virtual address of segment buffer. */
> +	__rte_marker(RTE_MARKER, cacheline0);

You don't need to keep the first argument.
This would be simpler:
	__rte_marker(cacheline0);
You just need to create 2 functions: __rte_marker and __rte_marker64.

You should replace all occurrences of RTE_MARKER in DPDK in one patch,
and mark RTE_MARKER as deprecated (use #pragma GCC poison)

> +	union {
> +		char mbuf_cacheline0[RTE_CACHE_LINE_MIN_SIZE];
> +		__extension__
> +		struct {
> +			void *buf_addr;           /**< Virtual address of segment buffer. 

I think it is ugly.

Changing mbuf API is a serious issue.



^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 0/4] more replacement of zero length array
  2024-02-16 10:14  0%         ` David Marchand
@ 2024-02-16 20:46  0%           ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-02-16 20:46 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Bruce Richardson, Cristian Dumitrescu, Honnappa Nagarahalli,
	Sameh Gobriel, Vladimir Medvedkin, Yipeng Wang, mb, fengchengwen,
	Dodji Seketeli

On Fri, Feb 16, 2024 at 11:14:27AM +0100, David Marchand wrote:
> On Wed, Feb 14, 2024 at 8:36 AM David Marchand
> <david.marchand@redhat.com> wrote:
> > > I'm okay with the change being merged but if there is concern I can drop
> > > this patch from the series.
> >
> > At least, we can't merge it in the current form.
> >
> > If libabigail gets a fix quickly, DPDK CI will still need a released version.
> > So for this patch to be merged now, we need a libabigail suppression rule.
> > I don't see a way to precisely waive this issue, so my suggestion is
> > to silence any change on the concerned structure here (which should be
> > ok, as the pipeline library data struct have been super stable for a
> > couple of years).
> > Something like:
> >
> > $ git diff
> > diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> > index 21b8cd6113..d667157909 100644
> > --- a/devtools/libabigail.abignore
> > +++ b/devtools/libabigail.abignore
> > @@ -33,3 +33,5 @@
> >  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> >  ; Temporary exceptions till next major ABI version ;
> >  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> > +[suppress_type]
> > +       name = rte_pipeline_table_entry
> 
> Dodji confirmed the issue in libabigail and prepared a fix.
> 
> DPDK still needs a suppression rule (like the one proposed above) if
> we want to merge this change before the libabigail fix makes it to all
> distribs.
> Please resubmit this series with my proposal and a comment pointing at
> libabigail bz squashed in patch 4.

this works out conveniently, i noticed there are a few more instances
that i'll try to add to this series so i'll come back with a new rev.

i've marked the series changes requested in patchwork for now.

> 
> 
> -- 
> David Marchand

^ permalink raw reply	[relevance 0%]

* Re: [PATCH] doc: update minimum Linux kernel version
  2024-02-16  8:29  0%             ` Morten Brørup
@ 2024-02-16 17:22  0%               ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-02-16 17:22 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Patrick Robb, Aaron Conole, dev

On Fri, 16 Feb 2024 09:29:47 +0100
Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Friday, 16 February 2024 04.05
> > 
> > On Thu, 11 Jan 2024 23:38:07 +0100
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> >   
> > > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > > Sent: Thursday, 11 January 2024 20.55
> > > >
> > > > On Thu, 11 Jan 2024 20:26:56 +0100
> > > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > > >  
> > > > >
> > > > >
> > > > > When the documentation specifies a minimum required kernel  
> > version,  
> > > > it implicitly claims that DPDK works with that kernel version.  
> > > > >
> > > > > So we should either test with the specified kernel version (which  
> > > > requires significant effort to set up, so I’m not going to ask for
> > > > it!), or add a big fat disclaimer/warning that DPDK is not tested  
> > with  
> > > > the mentioned kernel version, and list the kernel versions actually
> > > > tested.
> > > >
> > > > It is much less of an issue than it used to be since there should  
> > be no  
> > > > need for
> > > > DPDK specific kernel components. The kernel API/ABI is stable  
> > across  
> > > > releases
> > > > (with the notable exception of BPF). Therefore the DPDK kernel  
> > version  
> > > > dependency
> > > > is much less than it used to be.  
> > 
> > There are three issues here:
> > 
> > 1. Supporting out of date kernel also means supporting out of date
> > build environments
> >    that may be missing headers. The recent example was the TAP device
> > requiring (or cloning
> >    which is worse) the headers to the FLOWER classifier.  If we move
> > the kernel version
> >    to current LTS, then FLOWER is always present.
> > 2. Supporting out of date kernel means more test infrastructure. Some
> > CI needs to build
> >    test on older environments.
> > 3. The place it impacts current CI is the building on CentOS7. CentOS7
> > is end of life
> >    do we have to keep it? The compiler also lacks good C11 support so
> > not sure how CI keeps working.
> > 
> > The way I view it, if you are on an old system, then stick to old DPDK
> > LTS version.
> > We don't want to hear about regressions on end of life systems.  
> 
> The system requirements in the Getting Started Guide [1] says:
> 
> Kernel version >= 4.14
> The kernel version required is based on the oldest long term stable kernel available at kernel.org when the DPDK version is in development.
> Compatibility for recent distribution kernels will be kept, notably RHEL/CentOS 7.

We need to drop CentOS 7 soon.

https://www.redhat.com/en/topics/linux/centos-linux-eol

	CentOS Linux 7 will reach end of life (EOL) on June 30, 2024. 

^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2] lib/hash: new feature adding existing key
  @ 2024-02-16 12:43  3%   ` Thomas Monjalon
  0 siblings, 0 replies; 200+ results
From: Thomas Monjalon @ 2024-02-16 12:43 UTC (permalink / raw)
  To: dev
  Cc: Abdullah Ömer Yamaç,
	Yipeng Wang, Sameh Gobriel, Bruce Richardson, Vladimir Medvedkin,
	David Marchand, Abdullah Ömer Yamaç

Any review please?
If maintainers agree with the idea, we should announce the ABI change.


23/10/2023 10:29, Abdullah Ömer Yamaç:
> From: Abdullah Ömer Yamaç <omer.yamac@ceng.metu.edu.tr>
> 
> In some use cases, data inserted with an existing key shouldn't be
> overwritten. This patch adds a new flag to disable overwriting
> data for an existing key.
> 
> Signed-off-by: Abdullah Ömer Yamaç <omer.yamac@ceng.metu.edu.tr>
> 
> ---
> Cc: Yipeng Wang <yipeng1.wang@intel.com>
> Cc: Sameh Gobriel <sameh.gobriel@intel.com>
> Cc: Bruce Richardson <bruce.richardson@intel.com>
> Cc: Vladimir Medvedkin <vladimir.medvedkin@intel.com>
> Cc: David Marchand <david.marchand@redhat.com>
> ---
>  lib/hash/rte_cuckoo_hash.c | 10 +++++++++-
>  lib/hash/rte_cuckoo_hash.h |  2 ++
>  lib/hash/rte_hash.h        |  4 ++++
>  3 files changed, 15 insertions(+), 1 deletion(-)
> 
> diff --git a/lib/hash/rte_cuckoo_hash.c b/lib/hash/rte_cuckoo_hash.c
> index 19b23f2a97..fe8f21bee4 100644
> --- a/lib/hash/rte_cuckoo_hash.c
> +++ b/lib/hash/rte_cuckoo_hash.c
> @@ -32,7 +32,8 @@
>  				   RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY | \
>  				   RTE_HASH_EXTRA_FLAGS_EXT_TABLE |	\
>  				   RTE_HASH_EXTRA_FLAGS_NO_FREE_ON_DEL | \
> -				   RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF)
> +				   RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF | \
> +				   RTE_HASH_EXTRA_FLAGS_DISABLE_UPDATE_EXISTING_KEY)
>  
>  #define FOR_EACH_BUCKET(CURRENT_BKT, START_BUCKET)                            \
>  	for (CURRENT_BKT = START_BUCKET;                                      \
> @@ -148,6 +149,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	unsigned int readwrite_concur_support = 0;
>  	unsigned int writer_takes_lock = 0;
>  	unsigned int no_free_on_del = 0;
> +	unsigned int no_update_data = 0;
>  	uint32_t *ext_bkt_to_free = NULL;
>  	uint32_t *tbl_chng_cnt = NULL;
>  	struct lcore_cache *local_free_slots = NULL;
> @@ -216,6 +218,9 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  		no_free_on_del = 1;
>  	}
>  
> +	if (params->extra_flag & RTE_HASH_EXTRA_FLAGS_DISABLE_UPDATE_EXISTING_KEY)
> +		no_update_data = 1;
> +
>  	/* Store all keys and leave the first entry as a dummy entry for lookup_bulk */
>  	if (use_local_cache)
>  		/*
> @@ -428,6 +433,7 @@ rte_hash_create(const struct rte_hash_parameters *params)
>  	h->ext_table_support = ext_table_support;
>  	h->writer_takes_lock = writer_takes_lock;
>  	h->no_free_on_del = no_free_on_del;
> +	h->no_update_data = no_update_data;
>  	h->readwrite_concur_lf_support = readwrite_concur_lf_support;
>  
>  #if defined(RTE_ARCH_X86)
> @@ -707,6 +713,8 @@ search_and_update(const struct rte_hash *h, void *data, const void *key,
>  			k = (struct rte_hash_key *) ((char *)keys +
>  					bkt->key_idx[i] * h->key_entry_size);
>  			if (rte_hash_cmp_eq(key, k->key, h) == 0) {
> +				if (h->no_update_data == 1)
> +					return -EINVAL;
>  				/* The store to application data at *data
>  				 * should not leak after the store to pdata
>  				 * in the key store. i.e. pdata is the guard
> diff --git a/lib/hash/rte_cuckoo_hash.h b/lib/hash/rte_cuckoo_hash.h
> index eb2644f74b..e8b7283ec2 100644
> --- a/lib/hash/rte_cuckoo_hash.h
> +++ b/lib/hash/rte_cuckoo_hash.h
> @@ -193,6 +193,8 @@ struct rte_hash {
>  	/**< If read-write concurrency support is enabled */
>  	uint8_t ext_table_support;     /**< Enable extendable bucket table */
>  	uint8_t no_free_on_del;
> +	/**< If update is prohibited on adding same key */
> +	uint8_t no_update_data;
>  	/**< If key index should be freed on calling rte_hash_del_xxx APIs.
>  	 * If this is set, rte_hash_free_key_with_position must be called to
>  	 * free the key index associated with the deleted entry.
> diff --git a/lib/hash/rte_hash.h b/lib/hash/rte_hash.h
> index 7ecc021111..ca5b4841d2 100644
> --- a/lib/hash/rte_hash.h
> +++ b/lib/hash/rte_hash.h
> @@ -55,6 +55,10 @@ extern "C" {
>   */
>  #define RTE_HASH_EXTRA_FLAGS_RW_CONCURRENCY_LF 0x20
>  
> +/** Flag to disable updating data of existing key
> + */
> +#define RTE_HASH_EXTRA_FLAGS_DISABLE_UPDATE_EXISTING_KEY 0x40
> +
>  /**
>   * The type of hash value of a key.
>   * It should be a value of at least 32bit with fully random pattern.
> 






^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 0/4] more replacement of zero length array
  2024-02-14  7:36  4%       ` David Marchand
@ 2024-02-16 10:14  0%         ` David Marchand
  2024-02-16 20:46  0%           ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-02-16 10:14 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Bruce Richardson, Cristian Dumitrescu, Honnappa Nagarahalli,
	Sameh Gobriel, Vladimir Medvedkin, Yipeng Wang, mb, fengchengwen,
	Dodji Seketeli

On Wed, Feb 14, 2024 at 8:36 AM David Marchand
<david.marchand@redhat.com> wrote:
> > I'm okay with the change being merged but if there is concern I can drop
> > this patch from the series.
>
> At least, we can't merge it in the current form.
>
> If libabigail gets a fix quickly, DPDK CI will still need a released version.
> So for this patch to be merged now, we need a libabigail suppression rule.
> I don't see a way to precisely waive this issue, so my suggestion is
> to silence any change on the concerned structure here (which should be
> ok, as the pipeline library data struct have been super stable for a
> couple of years).
> Something like:
>
> $ git diff
> diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
> index 21b8cd6113..d667157909 100644
> --- a/devtools/libabigail.abignore
> +++ b/devtools/libabigail.abignore
> @@ -33,3 +33,5 @@
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
>  ; Temporary exceptions till next major ABI version ;
>  ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
> +[suppress_type]
> +       name = rte_pipeline_table_entry

Dodji confirmed the issue in libabigail and prepared a fix.

DPDK still needs a suppression rule (like the one proposed above) if
we want to merge this change before the libabigail fix makes it to all
distribs.
Please resubmit this series with my proposal and a comment pointing at
libabigail bz squashed in patch 4.


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* RE: [PATCH] doc: update minimum Linux kernel version
  2024-02-16  3:05  0%           ` Stephen Hemminger
@ 2024-02-16  8:29  0%             ` Morten Brørup
  2024-02-16 17:22  0%               ` Stephen Hemminger
  0 siblings, 1 reply; 200+ results
From: Morten Brørup @ 2024-02-16  8:29 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Patrick Robb, Aaron Conole, dev

> From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> Sent: Friday, 16 February 2024 04.05
> 
> On Thu, 11 Jan 2024 23:38:07 +0100
> Morten Brørup <mb@smartsharesystems.com> wrote:
> 
> > > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > > Sent: Thursday, 11 January 2024 20.55
> > >
> > > On Thu, 11 Jan 2024 20:26:56 +0100
> > > Morten Brørup <mb@smartsharesystems.com> wrote:
> > >
> > > >
> > > >
> > > > When the documentation specifies a minimum required kernel
> version,
> > > it implicitly claims that DPDK works with that kernel version.
> > > >
> > > > So we should either test with the specified kernel version (which
> > > requires significant effort to set up, so I’m not going to ask for
> > > it!), or add a big fat disclaimer/warning that DPDK is not tested
> with
> > > the mentioned kernel version, and list the kernel versions actually
> > > tested.
> > >
> > > It is much less of an issue than it used to be since there should
> be no
> > > need for
> > > DPDK specific kernel components. The kernel API/ABI is stable
> across
> > > releases
> > > (with the notable exception of BPF). Therefore the DPDK kernel
> version
> > > dependency
> > > is much less than it used to be.
> 
> There are three issues here:
> 
> 1. Supporting out of date kernel also means supporting out of date
> build environments
>    that may be missing headers. The recent example was the TAP device
> requiring (or cloning
>    which is worse) the headers to the FLOWER classifier.  If we move
> the kernel version
>    to current LTS, then FLOWER is always present.
> 2. Supporting out of date kernel means more test infrastructure. Some
> CI needs to build
>    test on older environments.
> 3. The place it impacts current CI is the building on CentOS7. CentOS7
> is end of life
>    do we have to keep it? The compiler also lacks good C11 support so
> not sure how CI keeps working.
> 
> The way I view it, if you are on an old system, then stick to old DPDK
> LTS version.
> We don't want to hear about regressions on end of life systems.

The system requirements in the Getting Started Guide [1] says:

Kernel version >= 4.14
The kernel version required is based on the oldest long term stable kernel available at kernel.org when the DPDK version is in development.
Compatibility for recent distribution kernels will be kept, notably RHEL/CentOS 7.

[1]: https://doc.dpdk.org/guides/linux_gsg/sys_reqs.html#system-software

If we consider it API breakage to change that, we have to wait until the 24.11 release.
For future DPDK LTS releases, we should be more careful about what we claim to support. And again: If we claim to support something, people expect it to be tested in CI.

Disregarding the API breakage by stopping support for a system we claim to support... RHEL7 testing was changed to LTS only [2]; that should probably have been applied to CentOS 7 too.

[2]: https://inbox.dpdk.org/dev/CAJvnSUBcq3gznQD4k=krQ+gu2OxTxA2YJBc=J=LtidFXqgg_hg@mail.gmail.com/



^ permalink raw reply	[relevance 0%]

* Re: [PATCH] doc: update minimum Linux kernel version
  @ 2024-02-16  3:05  0%           ` Stephen Hemminger
  2024-02-16  8:29  0%             ` Morten Brørup
  0 siblings, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-02-16  3:05 UTC (permalink / raw)
  To: Morten Brørup; +Cc: Patrick Robb, Aaron Conole, dev

On Thu, 11 Jan 2024 23:38:07 +0100
Morten Brørup <mb@smartsharesystems.com> wrote:

> > From: Stephen Hemminger [mailto:stephen@networkplumber.org]
> > Sent: Thursday, 11 January 2024 20.55
> > 
> > On Thu, 11 Jan 2024 20:26:56 +0100
> > Morten Brørup <mb@smartsharesystems.com> wrote:
> >   
> > >
> > >
> > > When the documentation specifies a minimum required kernel version,  
> > it implicitly claims that DPDK works with that kernel version.  
> > >
> > > So we should either test with the specified kernel version (which  
> > requires significant effort to set up, so I’m not going to ask for
> > it!), or add a big fat disclaimer/warning that DPDK is not tested with
> > the mentioned kernel version, and list the kernel versions actually
> > tested.
> > 
> > It is much less of an issue than it used to be since there should be no
> > need for
> > DPDK specific kernel components. The kernel API/ABI is stable across
> > releases
> > (with the notable exception of BPF). Therefore the DPDK kernel version
> > dependency
> > is much less than it used to be.  

There are three issues here:

1. Supporting an out-of-date kernel also means supporting out-of-date build environments
   that may be missing headers. The recent example was the TAP device requiring (or cloning,
   which is worse) the headers for the FLOWER classifier.  If we move the kernel version
   to the current LTS, then FLOWER is always present.
2. Supporting an out-of-date kernel means more test infrastructure. Some CI needs to build and
   test on older environments.
3. The place it impacts current CI is the building on CentOS7. CentOS7 is end of life;
   do we have to keep it? The compiler also lacks good C11 support, so
   not sure how CI keeps working.

The way I view it, if you are on an old system, then stick to an old DPDK LTS version.
We don't want to hear about regressions on end-of-life systems.

^ permalink raw reply	[relevance 0%]

* [PATCH 0/7] add Nitrox compress device support
@ 2024-02-15 12:48  3% Nagadheeraj Rottela
  2024-02-29 17:22  0% ` Akhil Goyal
  0 siblings, 1 reply; 200+ results
From: Nagadheeraj Rottela @ 2024-02-15 12:48 UTC (permalink / raw)
  To: gakhil, fanzhang.oss, ashishg; +Cc: dev, Nagadheeraj Rottela

Add the Nitrox PMD to support the Nitrox compress device.
---
v3:
* Fixed ABI compatibility issue.

v2:
* Reformatted patches to minimize number of changes.
* Removed empty file with only copyright.
* Updated all feature flags in nitrox.ini file.
* Added separate gotos in nitrox_pci_probe() function.

Nagadheeraj Rottela (7):
  crypto/nitrox: move nitrox common code to common folder
  compress/nitrox: add nitrox compressdev driver
  common/nitrox: add compress hardware queue management
  crypto/nitrox: set queue type during queue pair setup
  compress/nitrox: add software queue management
  compress/nitrox: add stateless request support
  compress/nitrox: add stateful request support

 MAINTAINERS                                   |    8 +
 doc/guides/compressdevs/features/nitrox.ini   |   17 +
 doc/guides/compressdevs/index.rst             |    1 +
 doc/guides/compressdevs/nitrox.rst            |   50 +
 drivers/common/nitrox/meson.build             |   19 +
 .../{crypto => common}/nitrox/nitrox_csr.h    |   12 +
 .../{crypto => common}/nitrox/nitrox_device.c |   51 +-
 .../{crypto => common}/nitrox/nitrox_device.h |    4 +-
 .../{crypto => common}/nitrox/nitrox_hal.c    |  116 ++
 .../{crypto => common}/nitrox/nitrox_hal.h    |  115 ++
 .../{crypto => common}/nitrox/nitrox_logs.c   |    0
 .../{crypto => common}/nitrox/nitrox_logs.h   |    0
 drivers/{crypto => common}/nitrox/nitrox_qp.c |   53 +-
 drivers/{crypto => common}/nitrox/nitrox_qp.h |   37 +-
 drivers/common/nitrox/version.map             |    9 +
 drivers/compress/nitrox/meson.build           |   16 +
 drivers/compress/nitrox/nitrox_comp.c         |  604 +++++++++
 drivers/compress/nitrox/nitrox_comp.h         |   35 +
 drivers/compress/nitrox/nitrox_comp_reqmgr.c  | 1199 +++++++++++++++++
 drivers/compress/nitrox/nitrox_comp_reqmgr.h  |   58 +
 drivers/crypto/nitrox/meson.build             |   11 +-
 drivers/crypto/nitrox/nitrox_sym.c            |    1 +
 drivers/meson.build                           |    1 +
 23 files changed, 2396 insertions(+), 21 deletions(-)
 create mode 100644 doc/guides/compressdevs/features/nitrox.ini
 create mode 100644 doc/guides/compressdevs/nitrox.rst
 create mode 100644 drivers/common/nitrox/meson.build
 rename drivers/{crypto => common}/nitrox/nitrox_csr.h (67%)
 rename drivers/{crypto => common}/nitrox/nitrox_device.c (77%)
 rename drivers/{crypto => common}/nitrox/nitrox_device.h (81%)
 rename drivers/{crypto => common}/nitrox/nitrox_hal.c (65%)
 rename drivers/{crypto => common}/nitrox/nitrox_hal.h (59%)
 rename drivers/{crypto => common}/nitrox/nitrox_logs.c (100%)
 rename drivers/{crypto => common}/nitrox/nitrox_logs.h (100%)
 rename drivers/{crypto => common}/nitrox/nitrox_qp.c (69%)
 rename drivers/{crypto => common}/nitrox/nitrox_qp.h (75%)
 create mode 100644 drivers/common/nitrox/version.map
 create mode 100644 drivers/compress/nitrox/meson.build
 create mode 100644 drivers/compress/nitrox/nitrox_comp.c
 create mode 100644 drivers/compress/nitrox/nitrox_comp.h
 create mode 100644 drivers/compress/nitrox/nitrox_comp_reqmgr.c
 create mode 100644 drivers/compress/nitrox/nitrox_comp_reqmgr.h

-- 
2.42.0


^ permalink raw reply	[relevance 3%]

* [PATCH v4 01/18] mbuf: deprecate GCC marker in rte mbuf struct
  @ 2024-02-15  6:21  3%   ` Tyler Retzlaff
  2024-02-18 12:39  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-02-15  6:21 UTC (permalink / raw)
  To: dev
  Cc: Ajit Khaparde, Andrew Boyer, Andrew Rybchenko, Bruce Richardson,
	Chenbo Xia, Chengwen Feng, Dariusz Sosnowski, David Christensen,
	Hyong Youb Kim, Jerin Jacob, Jie Hai, Jingjing Wu, John Daley,
	Kevin Laatz, Kiran Kumar K, Konstantin Ananyev, Maciej Czekaj,
	Matan Azrad, Maxime Coquelin, Nithin Dabilpuram, Ori Kam,
	Ruifeng Wang, Satha Rao, Somnath Kotur, Suanming Mou,
	Sunil Kumar Kori, Viacheslav Ovsiienko, Yisen Zhuang,
	Yuying Zhang, mb, Tyler Retzlaff

Provide a macro that conditionally expands RTE_MARKER fields to empty,
allowing rte_mbuf to be used with MSVC. It is proposed that the fields
be announced as __rte_deprecated (the annotation is currently disabled).

Introduce C11 anonymous unions to alias well-known offsets in the
rte_mbuf structure under *new* names and to
provide padding for cache alignment.

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 doc/guides/rel_notes/deprecation.rst |  20 ++
 lib/eal/include/rte_common.h         |   6 +
 lib/mbuf/rte_mbuf_core.h             | 375 +++++++++++++++++++----------------
 3 files changed, 233 insertions(+), 168 deletions(-)

diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 10630ba..8594255 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -17,6 +17,26 @@ Other API and ABI deprecation notices are to be posted below.
 Deprecation Notices
 -------------------
 
+* mbuf: Named zero sized fields of type ``RTE_MARKER`` and ``RTE_MARKER64``
+  will be removed from ``struct rte_mbuf`` and replaced with new fields
+  in anonymous unions.
+
+  The names of the fields impacted are:
+
+    old name                  new name
+
+  ``cacheline0``            ``mbuf_cacheline0``
+  ``rearm_data``            ``mbuf_rearm_data``
+  ``rx_descriptor_fields1`` ``mbuf_rx_descriptor_fields1``
+  ``cacheline1``            ``mbuf_cacheline1``
+
+  Contributions to DPDK should immediately start using the new names;
+  applications should adapt to the new names as soon as possible, as the
+  old names will be removed in a future DPDK release.
+
+  Note: the types of the new names are not API-compatible with the old ones,
+  and some code conversion is required to maintain correct behavior.
+
 * build: The ``enable_kmods`` option is deprecated and will be removed in a future release.
   Setting/clearing the option has no impact on the build.
   Instead, kernel modules will be always built for OS's where out-of-tree kernel modules
diff --git a/lib/eal/include/rte_common.h b/lib/eal/include/rte_common.h
index d7d6390..af73f67 100644
--- a/lib/eal/include/rte_common.h
+++ b/lib/eal/include/rte_common.h
@@ -582,6 +582,12 @@ static void __attribute__((destructor(RTE_PRIO(prio)), used)) func(void)
 /** Marker for 8B alignment in a structure. */
 __extension__ typedef uint64_t RTE_MARKER64[0];
 
+#define __rte_marker(type, name) type name /* __rte_deprecated */
+
+#else
+
+#define __rte_marker(type, name)
+
 #endif
 
 /*********** Macros for calculating min and max **********/
diff --git a/lib/mbuf/rte_mbuf_core.h b/lib/mbuf/rte_mbuf_core.h
index 5688683..9e9590b 100644
--- a/lib/mbuf/rte_mbuf_core.h
+++ b/lib/mbuf/rte_mbuf_core.h
@@ -16,7 +16,10 @@
  * New fields and flags should fit in the "dynamic space".
  */
 
+#include <assert.h>
+#include <stdalign.h>
 #include <stdint.h>
+#include <stddef.h>
 
 #include <rte_byteorder.h>
 #include <rte_stdatomic.h>
@@ -464,204 +467,240 @@ enum {
  * The generic rte_mbuf, containing a packet mbuf.
  */
 struct rte_mbuf {
-	RTE_MARKER cacheline0;
-
-	void *buf_addr;           /**< Virtual address of segment buffer. */
+	__rte_marker(RTE_MARKER, cacheline0);
+	union {
+		char mbuf_cacheline0[RTE_CACHE_LINE_MIN_SIZE];
+		__extension__
+		struct {
+			void *buf_addr;           /**< Virtual address of segment buffer. */
 #if RTE_IOVA_IN_MBUF
-	/**
-	 * Physical address of segment buffer.
-	 * This field is undefined if the build is configured to use only
-	 * virtual address as IOVA (i.e. RTE_IOVA_IN_MBUF is 0).
-	 * Force alignment to 8-bytes, so as to ensure we have the exact
-	 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
-	 * working on vector drivers easier.
-	 */
-	rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
+			/**
+			 * Physical address of segment buffer.
+			 * This field is undefined if the build is configured to use only
+			 * virtual address as IOVA (i.e. RTE_IOVA_IN_MBUF is 0).
+			 * Force alignment to 8-bytes, so as to ensure we have the exact
+			 * same mbuf cacheline0 layout for 32-bit and 64-bit. This makes
+			 * working on vector drivers easier.
+			 */
+			rte_iova_t buf_iova __rte_aligned(sizeof(rte_iova_t));
 #else
-	/**
-	 * Next segment of scattered packet.
-	 * This field is valid when physical address field is undefined.
-	 * Otherwise next pointer in the second cache line will be used.
-	 */
-	struct rte_mbuf *next;
+			/**
+			 * Next segment of scattered packet.
+			 * This field is valid when physical address field is undefined.
+			 * Otherwise next pointer in the second cache line will be used.
+			 */
+			struct rte_mbuf *next;
 #endif
 
-	/* next 8 bytes are initialised on RX descriptor rearm */
-	RTE_MARKER64 rearm_data;
-	uint16_t data_off;
-
-	/**
-	 * Reference counter. Its size should at least equal to the size
-	 * of port field (16 bits), to support zero-copy broadcast.
-	 * It should only be accessed using the following functions:
-	 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
-	 * rte_mbuf_refcnt_set(). The functionality of these functions (atomic,
-	 * or non-atomic) is controlled by the RTE_MBUF_REFCNT_ATOMIC flag.
-	 */
-	RTE_ATOMIC(uint16_t) refcnt;
-
-	/**
-	 * Number of segments. Only valid for the first segment of an mbuf
-	 * chain.
-	 */
-	uint16_t nb_segs;
-
-	/** Input port (16 bits to support more than 256 virtual ports).
-	 * The event eth Tx adapter uses this field to specify the output port.
-	 */
-	uint16_t port;
-
-	uint64_t ol_flags;        /**< Offload features. */
+			/* next 8 bytes are initialised on RX descriptor rearm */
+			__rte_marker(RTE_MARKER64, rearm_data);
+			union {
+				char mbuf_rearm_data[8];
+				__extension__
+				struct {
+					uint16_t data_off;
+
+					/**
+					 * Reference counter. Its size should at least equal to the
+					 * size of port field (16 bits), to support zero-copy
+					 * broadcast.
+					 * It should only be accessed using the following
+					 * functions:
+					 * rte_mbuf_refcnt_update(), rte_mbuf_refcnt_read(), and
+					 * rte_mbuf_refcnt_set(). The functionality of these
+					 * functions (atomic, or non-atomic) is controlled by the
+					 * RTE_MBUF_REFCNT_ATOMIC flag.
+					 */
+					RTE_ATOMIC(uint16_t) refcnt;
+
+					/**
+					 * Number of segments. Only valid for the first segment of
+					 * an mbuf chain.
+					 */
+					uint16_t nb_segs;
+
+					/**
+					 * Input port (16 bits to support more than 256 virtual
+					 * ports).  The event eth Tx adapter uses this field to
+					 * specify the output port.
+					 */
+					uint16_t port;
+				};
+			};
 
-	/* remaining bytes are set on RX when pulling packet from descriptor */
-	RTE_MARKER rx_descriptor_fields1;
+			uint64_t ol_flags;        /**< Offload features. */
 
-	/*
-	 * The packet type, which is the combination of outer/inner L2, L3, L4
-	 * and tunnel types. The packet_type is about data really present in the
-	 * mbuf. Example: if vlan stripping is enabled, a received vlan packet
-	 * would have RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
-	 * vlan is stripped from the data.
-	 */
-	union {
-		uint32_t packet_type; /**< L2/L3/L4 and tunnel information. */
-		__extension__
-		struct {
-			uint8_t l2_type:4;   /**< (Outer) L2 type. */
-			uint8_t l3_type:4;   /**< (Outer) L3 type. */
-			uint8_t l4_type:4;   /**< (Outer) L4 type. */
-			uint8_t tun_type:4;  /**< Tunnel type. */
+			/* remaining bytes are set on RX when pulling packet from descriptor */
+			__rte_marker(RTE_MARKER, rx_descriptor_fields1);
 			union {
-				uint8_t inner_esp_next_proto;
-				/**< ESP next protocol type, valid if
-				 * RTE_PTYPE_TUNNEL_ESP tunnel type is set
-				 * on both Tx and Rx.
-				 */
+				char mbuf_rx_descriptor_fields1[8];
 				__extension__
 				struct {
-					uint8_t inner_l2_type:4;
-					/**< Inner L2 type. */
-					uint8_t inner_l3_type:4;
-					/**< Inner L3 type. */
+					/*
+					 * The packet type, which is the combination of outer/inner
+					 * L2, L3, L4 and tunnel types. The packet_type is about
+					 * data really present in the mbuf. Example: if vlan
+					 * stripping is enabled, a received vlan packet would have
+					 * RTE_PTYPE_L2_ETHER and not RTE_PTYPE_L2_VLAN because the
+					 * vlan is stripped from the data.
+					 */
+					union {
+						uint32_t packet_type;
+						/**< L2/L3/L4 and tunnel information. */
+						__extension__
+						struct {
+							uint8_t l2_type:4;
+							/**< (Outer) L2 type. */
+							uint8_t l3_type:4;
+							/**< (Outer) L3 type. */
+							uint8_t l4_type:4;
+							/**< (Outer) L4 type. */
+							uint8_t tun_type:4;
+							/**< Tunnel type. */
+							union {
+								uint8_t inner_esp_next_proto;
+								/**< ESP next protocol type, valid
+								 * if RTE_PTYPE_TUNNEL_ESP tunnel
+								 * type is set on both Tx and Rx.
+								 */
+								__extension__
+								struct {
+									uint8_t inner_l2_type:4;
+									/**< Inner L2 type. */
+									uint8_t inner_l3_type:4;
+									/**< Inner L3 type. */
+								};
+							};
+							uint8_t inner_l4_type:4;
+							/**< Inner L4 type. */
+						};
+					};
+					uint32_t pkt_len;
+					/**< Total pkt len: sum of all segments. */
 				};
 			};
-			uint8_t inner_l4_type:4; /**< Inner L4 type. */
-		};
-	};
 
-	uint32_t pkt_len;         /**< Total pkt len: sum of all segments. */
-	uint16_t data_len;        /**< Amount of data in segment buffer. */
-	/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
-	uint16_t vlan_tci;
+			uint16_t data_len;        /**< Amount of data in segment buffer. */
+			/** VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_VLAN is set. */
+			uint16_t vlan_tci;
 
-	union {
-		union {
-			uint32_t rss;     /**< RSS hash result if RSS enabled */
-			struct {
+			union {
 				union {
+					uint32_t rss;     /**< RSS hash result if RSS enabled */
 					struct {
-						uint16_t hash;
-						uint16_t id;
-					};
-					uint32_t lo;
-					/**< Second 4 flexible bytes */
-				};
-				uint32_t hi;
-				/**< First 4 flexible bytes or FD ID, dependent
-				 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
-				 */
-			} fdir;	/**< Filter identifier if FDIR enabled */
-			struct rte_mbuf_sched sched;
-			/**< Hierarchical scheduler : 8 bytes */
-			struct {
-				uint32_t reserved1;
-				uint16_t reserved2;
-				uint16_t txq;
-				/**< The event eth Tx adapter uses this field
-				 * to store Tx queue id.
-				 * @see rte_event_eth_tx_adapter_txq_set()
-				 */
-			} txadapter; /**< Eventdev ethdev Tx adapter */
-			uint32_t usr;
-			/**< User defined tags. See rte_distributor_process() */
-		} hash;                   /**< hash information */
-	};
+						union {
+							__extension__
+							struct {
+								uint16_t hash;
+								uint16_t id;
+							};
+							uint32_t lo;
+							/**< Second 4 flexible bytes */
+						};
+						uint32_t hi;
+						/**< First 4 flexible bytes or FD ID, dependent
+						 * on RTE_MBUF_F_RX_FDIR_* flag in ol_flags.
+						 */
+					} fdir;	/**< Filter identifier if FDIR enabled */
+					struct rte_mbuf_sched sched;
+					/**< Hierarchical scheduler : 8 bytes */
+					struct {
+						uint32_t reserved1;
+						uint16_t reserved2;
+						uint16_t txq;
+						/**< The event eth Tx adapter uses this field
+						 * to store Tx queue id.
+						 * @see rte_event_eth_tx_adapter_txq_set()
+						 */
+					} txadapter; /**< Eventdev ethdev Tx adapter */
+					uint32_t usr;
+					/**< User defined tags. See rte_distributor_process() */
+				} hash;                   /**< hash information */
+			};
 
-	/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
-	uint16_t vlan_tci_outer;
+			/** Outer VLAN TCI (CPU order), valid if RTE_MBUF_F_RX_QINQ is set. */
+			uint16_t vlan_tci_outer;
 
-	uint16_t buf_len;         /**< Length of segment buffer. */
+			uint16_t buf_len;         /**< Length of segment buffer. */
 
-	struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
+			struct rte_mempool *pool; /**< Pool from which mbuf was allocated. */
+		};
+	};
 
 	/* second cache line - fields only used in slow path or on TX */
-	RTE_MARKER cacheline1 __rte_cache_min_aligned;
-
-#if RTE_IOVA_IN_MBUF
-	/**
-	 * Next segment of scattered packet. Must be NULL in the last
-	 * segment or in case of non-segmented packet.
-	 */
-	struct rte_mbuf *next;
-#else
-	/**
-	 * Reserved for dynamic fields
-	 * when the next pointer is in first cache line (i.e. RTE_IOVA_IN_MBUF is 0).
-	 */
-	uint64_t dynfield2;
-#endif
-
-	/* fields to support TX offloads */
+	__rte_marker(RTE_MARKER, cacheline1);
 	union {
-		uint64_t tx_offload;       /**< combined for easy fetch */
+		char mbuf_cacheline1[RTE_CACHE_LINE_MIN_SIZE];
 		__extension__
 		struct {
-			uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
-			/**< L2 (MAC) Header Length for non-tunneling pkt.
-			 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
+#if RTE_IOVA_IN_MBUF
+			/**
+			 * Next segment of scattered packet. Must be NULL in the last
+			 * segment or in case of non-segmented packet.
 			 */
-			uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
-			/**< L3 (IP) Header Length. */
-			uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
-			/**< L4 (TCP/UDP) Header Length. */
-			uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
-			/**< TCP TSO segment size */
-
-			/*
-			 * Fields for Tx offloading of tunnels.
-			 * These are undefined for packets which don't request
-			 * any tunnel offloads (outer IP or UDP checksum,
-			 * tunnel TSO).
-			 *
-			 * PMDs should not use these fields unconditionally
-			 * when calculating offsets.
-			 *
-			 * Applications are expected to set appropriate tunnel
-			 * offload flags when they fill in these fields.
+			struct rte_mbuf *next;
+#else
+			/**
+			 * Reserved for dynamic fields
+			 * when the next pointer is in first cache line
+			 * (i.e. RTE_IOVA_IN_MBUF is 0).
 			 */
-			uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
-			/**< Outer L3 (IP) Hdr Length. */
-			uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
-			/**< Outer L2 (MAC) Hdr Length. */
+			uint64_t dynfield2;
+#endif
 
-			/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
-		};
-	};
+			/* fields to support TX offloads */
+			union {
+				uint64_t tx_offload;       /**< combined for easy fetch */
+				__extension__
+				struct {
+					uint64_t l2_len:RTE_MBUF_L2_LEN_BITS;
+					/**< L2 (MAC) Header Length for non-tunneling pkt.
+					 * Outer_L4_len + ... + Inner_L2_len for tunneling pkt.
+					 */
+					uint64_t l3_len:RTE_MBUF_L3_LEN_BITS;
+					/**< L3 (IP) Header Length. */
+					uint64_t l4_len:RTE_MBUF_L4_LEN_BITS;
+					/**< L4 (TCP/UDP) Header Length. */
+					uint64_t tso_segsz:RTE_MBUF_TSO_SEGSZ_BITS;
+					/**< TCP TSO segment size */
+
+					/*
+					 * Fields for Tx offloading of tunnels.
+					 * These are undefined for packets which don't request
+					 * any tunnel offloads (outer IP or UDP checksum,
+					 * tunnel TSO).
+					 *
+					 * PMDs should not use these fields unconditionally
+					 * when calculating offsets.
+					 *
+					 * Applications are expected to set appropriate tunnel
+					 * offload flags when they fill in these fields.
+					 */
+					uint64_t outer_l3_len:RTE_MBUF_OUTL3_LEN_BITS;
+					/**< Outer L3 (IP) Hdr Length. */
+					uint64_t outer_l2_len:RTE_MBUF_OUTL2_LEN_BITS;
+					/**< Outer L2 (MAC) Hdr Length. */
+
+					/* uint64_t unused:RTE_MBUF_TXOFLD_UNUSED_BITS; */
+				};
+			};
 
-	/** Shared data for external buffer attached to mbuf. See
-	 * rte_pktmbuf_attach_extbuf().
-	 */
-	struct rte_mbuf_ext_shared_info *shinfo;
+			/** Shared data for external buffer attached to mbuf. See
+			 * rte_pktmbuf_attach_extbuf().
+			 */
+			struct rte_mbuf_ext_shared_info *shinfo;
 
-	/** Size of the application private data. In case of an indirect
-	 * mbuf, it stores the direct mbuf private data size.
-	 */
-	uint16_t priv_size;
+			/** Size of the application private data. In case of an indirect
+			 * mbuf, it stores the direct mbuf private data size.
+			 */
+			uint16_t priv_size;
 
-	/** Timesync flags for use with IEEE1588. */
-	uint16_t timesync;
+			/** Timesync flags for use with IEEE1588. */
+			uint16_t timesync;
 
-	uint32_t dynfield1[9]; /**< Reserved for dynamic fields. */
+			uint32_t dynfield1[9]; /**< Reserved for dynamic fields. */
+		};
+	};
 } __rte_cache_aligned;
 
 /**
-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* [PATCH v4 27/39] mempool: use C11 alignas
  @ 2024-02-14 16:35  4%   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-02-14 16:35 UTC (permalink / raw)
  To: dev
  Cc: Andrew Rybchenko, Bruce Richardson, Chengwen Feng,
	Cristian Dumitrescu, David Christensen, David Hunt, Ferruh Yigit,
	Honnappa Nagarahalli, Jasvinder Singh, Jerin Jacob, Kevin Laatz,
	Konstantin Ananyev, Min Zhou, Ruifeng Wang, Sameh Gobriel,
	Stanislaw Kardach, Thomas Monjalon, Vladimir Medvedkin,
	Yipeng Wang, Tyler Retzlaff

* Move __rte_aligned from the end of {struct,union} definitions to
  be between {struct,union} and tag.

  The placement between {struct,union} and the tag allows the desired
  alignment to be imparted on the type regardless of the toolchain being
  used for all of GCC, LLVM, MSVC compilers building both C and C++.

* Replace use of __rte_aligned(a) on variables/fields with alignas(a).

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 lib/mempool/rte_mempool.h | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 6fa4d48..23fd5c8 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -34,6 +34,7 @@
  * user cache created with rte_mempool_cache_create().
  */
 
+#include <stdalign.h>
 #include <stdio.h>
 #include <stdint.h>
 #include <inttypes.h>
@@ -66,7 +67,7 @@
  * captured since they can be calculated from other stats.
  * For example: put_cache_objs = put_objs - put_common_pool_objs.
  */
-struct rte_mempool_debug_stats {
+struct __rte_cache_aligned rte_mempool_debug_stats {
 	uint64_t put_bulk;             /**< Number of puts. */
 	uint64_t put_objs;             /**< Number of objects successfully put. */
 	uint64_t put_common_pool_bulk; /**< Number of bulks enqueued in common pool. */
@@ -80,13 +81,13 @@ struct rte_mempool_debug_stats {
 	uint64_t get_success_blks;     /**< Successful allocation number of contiguous blocks. */
 	uint64_t get_fail_blks;        /**< Failed allocation number of contiguous blocks. */
 	RTE_CACHE_GUARD;
-} __rte_cache_aligned;
+};
 #endif
 
 /**
  * A structure that stores a per-core object cache.
  */
-struct rte_mempool_cache {
+struct __rte_cache_aligned rte_mempool_cache {
 	uint32_t size;	      /**< Size of the cache */
 	uint32_t flushthresh; /**< Threshold before we flush excess elements */
 	uint32_t len;	      /**< Current cache count */
@@ -109,8 +110,8 @@ struct rte_mempool_cache {
 	 * Cache is allocated to this size to allow it to overflow in certain
 	 * cases to avoid needless emptying of cache.
 	 */
-	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
-} __rte_cache_aligned;
+	alignas(RTE_CACHE_LINE_SIZE) void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2];
+};
 
 /**
  * A structure that stores the size of mempool elements.
@@ -218,15 +219,15 @@ struct rte_mempool_memhdr {
  * The structure is cache-line aligned to avoid ABI breakages in
  * a number of cases when something small is added.
  */
-struct rte_mempool_info {
+struct __rte_cache_aligned rte_mempool_info {
 	/** Number of objects in the contiguous block */
 	unsigned int contig_block_size;
-} __rte_cache_aligned;
+};
 
 /**
  * The RTE mempool structure.
  */
-struct rte_mempool {
+struct __rte_cache_aligned rte_mempool {
 	char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
 	union {
 		void *pool_data;         /**< Ring or pool to store objects. */
@@ -268,7 +269,7 @@ struct rte_mempool {
 	 */
 	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE + 1];
 #endif
-}  __rte_cache_aligned;
+};
 
 /** Spreading among memory channels not required. */
 #define RTE_MEMPOOL_F_NO_SPREAD		0x0001
@@ -688,7 +689,7 @@ typedef int (*rte_mempool_get_info_t)(const struct rte_mempool *mp,
 
 
 /** Structure defining mempool operations structure */
-struct rte_mempool_ops {
+struct __rte_cache_aligned rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
 	rte_mempool_alloc_t alloc;       /**< Allocate private data. */
 	rte_mempool_free_t free;         /**< Free the external pool. */
@@ -713,7 +714,7 @@ struct rte_mempool_ops {
 	 * Dequeue a number of contiguous object blocks.
 	 */
 	rte_mempool_dequeue_contig_blocks_t dequeue_contig_blocks;
-} __rte_cache_aligned;
+};
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
 
@@ -726,14 +727,14 @@ struct rte_mempool_ops {
  * any function pointers stored directly in the mempool struct would not be.
  * This results in us simply having "ops_index" in the mempool struct.
  */
-struct rte_mempool_ops_table {
+struct __rte_cache_aligned rte_mempool_ops_table {
 	rte_spinlock_t sl;     /**< Spinlock for add/delete. */
 	uint32_t num_ops;      /**< Number of used ops structs in the table. */
 	/**
 	 * Storage for all possible ops structs.
 	 */
 	struct rte_mempool_ops ops[RTE_MEMPOOL_MAX_OPS_IDX];
-} __rte_cache_aligned;
+};
 
 /** Array of registered ops structs. */
 extern struct rte_mempool_ops_table rte_mempool_ops_table;
-- 
1.8.3.1


^ permalink raw reply	[relevance 4%]

* Re: [PATCH v2 0/4] more replacement of zero length array
  2024-02-13 19:20  3%     ` Tyler Retzlaff
@ 2024-02-14  7:36  4%       ` David Marchand
  2024-02-16 10:14  0%         ` David Marchand
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-02-14  7:36 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Bruce Richardson, Cristian Dumitrescu, Honnappa Nagarahalli,
	Sameh Gobriel, Vladimir Medvedkin, Yipeng Wang, mb, fengchengwen,
	Dodji Seketeli

On Tue, Feb 13, 2024 at 8:20 PM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> On Tue, Feb 13, 2024 at 02:14:28PM +0100, David Marchand wrote:
> > On Mon, Feb 12, 2024 at 11:36 PM Tyler Retzlaff
> > <roretzla@linux.microsoft.com> wrote:
> > >
> > > Replace some missed zero length arrays not captured in the
> > > original series.
> > > https://patchwork.dpdk.org/project/dpdk/list/?series=30410&state=*
> > >
> > > Zero length arrays are a GNU extension that has been
> > > superseded by flex arrays.
> > >
> > > https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
> > >
> > > v2:
> > >     * added additional patches for fib & pipeline libs.
> > >       series-acks have been placed only against original
> > >       hash and rcu patches.
> >
> > There seems to be an issue with the ABI check on those changes.
> > After a quick chat with Dodji, I opened a bug for libabigail.
> >
> > https://sourceware.org/bugzilla/show_bug.cgi?id=31377
>
> I double checked again and I don't see the struct in question being
> embedded as a field of another struct/union.  So I don't think there should
> be an ABI change here.

That was and is still my understanding too.

The message we get when testing this series is:

                      type size hasn't changed
                      1 data member change:
                        'uint8_t action_data[]' has *some* difference
- please report as a bug

which is why I reached out to Dodji (libabigail maintainer).

Dodji explained to me that zero-length / flex-array conversion is
something he has been working on, and there are still some rough
edges.
This message is there so that the libabigail community gets more input
on real-life cases to handle.


>
> I'm okay with the change being merged but if there is concern I can drop
> this patch from the series.

At least, we can't merge it in the current form.

If libabigail gets a fix quickly, DPDK CI will still need a released version.
So for this patch to be merged now, we need a libabigail suppression rule.
I don't see a way to precisely waive this issue, so my suggestion is
to silence any change on the concerned structure here (which should be
ok, as the pipeline library data structs have been super stable for a
couple of years).
Something like:

$ git diff
diff --git a/devtools/libabigail.abignore b/devtools/libabigail.abignore
index 21b8cd6113..d667157909 100644
--- a/devtools/libabigail.abignore
+++ b/devtools/libabigail.abignore
@@ -33,3 +33,5 @@
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 ; Temporary exceptions till next major ABI version ;
 ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
+[suppress_type]
+       name = rte_pipeline_table_entry


-- 
David Marchand


^ permalink raw reply	[relevance 4%]

* [PATCH v3 27/39] mempool: use C11 alignas
  @ 2024-02-14  7:06  4%   ` Tyler Retzlaff
  0 siblings, 0 replies; 200+ results
From: Tyler Retzlaff @ 2024-02-14  7:06 UTC (permalink / raw)
  To: dev
  Cc: Andrew Rybchenko, Bruce Richardson, Chengwen Feng,
	Cristian Dumitrescu, David Christensen, David Hunt, Ferruh Yigit,
	Honnappa Nagarahalli, Jasvinder Singh, Jerin Jacob, Kevin Laatz,
	Konstantin Ananyev, Min Zhou, Ruifeng Wang, Sameh Gobriel,
	Stanislaw Kardach, Thomas Monjalon, Vladimir Medvedkin,
	Yipeng Wang, Tyler Retzlaff

* Move __rte_aligned from the end of {struct,union} definitions to
  be between {struct,union} and tag.

  The placement between {struct,union} and the tag allows the desired
  alignment to be imparted on the type regardless of the toolchain being
  used for all of GCC, LLVM, MSVC compilers building both C and C++.

* Replace use of __rte_aligned(a) on variables/fields with alignas(a).

Signed-off-by: Tyler Retzlaff <roretzla@linux.microsoft.com>
---
 lib/mempool/rte_mempool.h | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/lib/mempool/rte_mempool.h b/lib/mempool/rte_mempool.h
index 6fa4d48..23fd5c8 100644
--- a/lib/mempool/rte_mempool.h
+++ b/lib/mempool/rte_mempool.h
@@ -34,6 +34,7 @@
  * user cache created with rte_mempool_cache_create().
  */
 
+#include <stdalign.h>
 #include <stdio.h>
 #include <stdint.h>
 #include <inttypes.h>
@@ -66,7 +67,7 @@
  * captured since they can be calculated from other stats.
  * For example: put_cache_objs = put_objs - put_common_pool_objs.
  */
-struct rte_mempool_debug_stats {
+struct __rte_cache_aligned rte_mempool_debug_stats {
 	uint64_t put_bulk;             /**< Number of puts. */
 	uint64_t put_objs;             /**< Number of objects successfully put. */
 	uint64_t put_common_pool_bulk; /**< Number of bulks enqueued in common pool. */
@@ -80,13 +81,13 @@ struct rte_mempool_debug_stats {
 	uint64_t get_success_blks;     /**< Successful allocation number of contiguous blocks. */
 	uint64_t get_fail_blks;        /**< Failed allocation number of contiguous blocks. */
 	RTE_CACHE_GUARD;
-} __rte_cache_aligned;
+};
 #endif
 
 /**
  * A structure that stores a per-core object cache.
  */
-struct rte_mempool_cache {
+struct __rte_cache_aligned rte_mempool_cache {
 	uint32_t size;	      /**< Size of the cache */
 	uint32_t flushthresh; /**< Threshold before we flush excess elements */
 	uint32_t len;	      /**< Current cache count */
@@ -109,8 +110,8 @@ struct rte_mempool_cache {
 	 * Cache is allocated to this size to allow it to overflow in certain
 	 * cases to avoid needless emptying of cache.
 	 */
-	void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2] __rte_cache_aligned;
-} __rte_cache_aligned;
+	alignas(RTE_CACHE_LINE_SIZE) void *objs[RTE_MEMPOOL_CACHE_MAX_SIZE * 2];
+};
 
 /**
  * A structure that stores the size of mempool elements.
@@ -218,15 +219,15 @@ struct rte_mempool_memhdr {
  * The structure is cache-line aligned to avoid ABI breakages in
  * a number of cases when something small is added.
  */
-struct rte_mempool_info {
+struct __rte_cache_aligned rte_mempool_info {
 	/** Number of objects in the contiguous block */
 	unsigned int contig_block_size;
-} __rte_cache_aligned;
+};
 
 /**
  * The RTE mempool structure.
  */
-struct rte_mempool {
+struct __rte_cache_aligned rte_mempool {
 	char name[RTE_MEMPOOL_NAMESIZE]; /**< Name of mempool. */
 	union {
 		void *pool_data;         /**< Ring or pool to store objects. */
@@ -268,7 +269,7 @@ struct rte_mempool {
 	 */
 	struct rte_mempool_debug_stats stats[RTE_MAX_LCORE + 1];
 #endif
-}  __rte_cache_aligned;
+};
 
 /** Spreading among memory channels not required. */
 #define RTE_MEMPOOL_F_NO_SPREAD		0x0001
@@ -688,7 +689,7 @@ typedef int (*rte_mempool_get_info_t)(const struct rte_mempool *mp,
 
 
 /** Structure defining mempool operations structure */
-struct rte_mempool_ops {
+struct __rte_cache_aligned rte_mempool_ops {
 	char name[RTE_MEMPOOL_OPS_NAMESIZE]; /**< Name of mempool ops struct. */
 	rte_mempool_alloc_t alloc;       /**< Allocate private data. */
 	rte_mempool_free_t free;         /**< Free the external pool. */
@@ -713,7 +714,7 @@ struct rte_mempool_ops {
 	 * Dequeue a number of contiguous object blocks.
 	 */
 	rte_mempool_dequeue_contig_blocks_t dequeue_contig_blocks;
-} __rte_cache_aligned;
+};
 
 #define RTE_MEMPOOL_MAX_OPS_IDX 16  /**< Max registered ops structs */
 
@@ -726,14 +727,14 @@ struct rte_mempool_ops {
  * any function pointers stored directly in the mempool struct would not be.
  * This results in us simply having "ops_index" in the mempool struct.
  */
-struct rte_mempool_ops_table {
+struct __rte_cache_aligned rte_mempool_ops_table {
 	rte_spinlock_t sl;     /**< Spinlock for add/delete. */
 	uint32_t num_ops;      /**< Number of used ops structs in the table. */
 	/**
 	 * Storage for all possible ops structs.
 	 */
 	struct rte_mempool_ops ops[RTE_MEMPOOL_MAX_OPS_IDX];
-} __rte_cache_aligned;
+};
 
 /** Array of registered ops structs. */
 extern struct rte_mempool_ops_table rte_mempool_ops_table;
-- 
1.8.3.1


^ permalink raw reply	[relevance 4%]

* Re: [PATCH v2 0/4] more replacement of zero length array
  2024-02-13 13:14  3%   ` David Marchand
@ 2024-02-13 19:20  3%     ` Tyler Retzlaff
  2024-02-14  7:36  4%       ` David Marchand
  0 siblings, 1 reply; 200+ results
From: Tyler Retzlaff @ 2024-02-13 19:20 UTC (permalink / raw)
  To: David Marchand
  Cc: dev, Bruce Richardson, Cristian Dumitrescu, Honnappa Nagarahalli,
	Sameh Gobriel, Vladimir Medvedkin, Yipeng Wang, mb, fengchengwen,
	Dodji Seketeli

On Tue, Feb 13, 2024 at 02:14:28PM +0100, David Marchand wrote:
> On Mon, Feb 12, 2024 at 11:36 PM Tyler Retzlaff
> <roretzla@linux.microsoft.com> wrote:
> >
> > Replace some missed zero length arrays not captured in the
> > original series.
> > https://patchwork.dpdk.org/project/dpdk/list/?series=30410&state=*
> >
> > Zero length arrays are a GNU extension that has been
> > superseded by flex arrays.
> >
> > https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
> >
> > v2:
> >     * added additional patches for fib & pipeline libs.
> >       series-acks have been placed only against original
> >       hash and rcu patches.
> 
> There seems to be an issue with the ABI check on those changes.
> After a quick chat with Dodji, I opened a bug for libabigail.
> 
> https://sourceware.org/bugzilla/show_bug.cgi?id=31377

I double checked again and I don't see the struct in question being
embedded as a field of another struct/union.  So I don't think there should
be an ABI change here.

I'm okay with the change being merged but if there is concern I can drop
this patch from the series.

> 
> 
> -- 
> David Marchand

^ permalink raw reply	[relevance 3%]

* RE: [PATCH v2] RFC: replace GCC marker extension with C11 anonymous unions
  2024-02-13  6:45  3% ` [PATCH v2] RFC: " Tyler Retzlaff
  2024-02-13  8:57  0%   ` Bruce Richardson
@ 2024-02-13 17:09  0%   ` Morten Brørup
  1 sibling, 0 replies; 200+ results
From: Morten Brørup @ 2024-02-13 17:09 UTC (permalink / raw)
  To: Tyler Retzlaff, dev
  Cc: Andrew Boyer, Andrew Rybchenko, Bruce Richardson, Chenbo Xia,
	Konstantin Ananyev, Maxime Coquelin

> From: Tyler Retzlaff [mailto:roretzla@linux.microsoft.com]
> Sent: Tuesday, 13 February 2024 07.46
> 
> The zero sized RTE_MARKER<n> typedefs are a GCC extension unsupported
> by
> MSVC.  Replace the use of the RTE_MARKER typedefs with anonymous
> unions.
> 
> Note:
> 
> v1 of the series tried to maintain the API after some study it has been
> discovered that some existing uses of the markers do not produce
> compilation
> failure but evaluate to unintended values in the absence of adaptation.
> For this reason the existing markers cannot be removed because it is
> too hard
> to identify what needs to be changed by consumers. While the ABI has
> been
> maintained the subtle API change is just too risky.
> 
> The question I'm asking now is how to gracefully deprecate the markers
> while allowing consumption of the struct on Windows.
> 
> I propose the following:
> 
> * Introduce the unions as per-this series except instead of adding
> members
>   that match the original RTE_MARKER field names provide *new* names.
> * Retain (conditionally compiled away on Windows) the existing
> RTE_MARKER
>   fields with their original names.
> * Convert in-tree code to use the new names in the unions.
> 
> The old names & markers would be announced for deprecation and
> eventually
> removed and when they are the conditional compilation would also go
> away.
> 
> Thoughts?

Seems like the right thing to do!

The modified type of rearm_data might not be noticed by out-of-tree PMD developers, so using a new name for the new type reduces the risk.

If some of the markers maintain their type or get a compatible type (from an API perspective), they can keep their names.


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v2 0/4] more replacement of zero length array
  @ 2024-02-13 13:14  3%   ` David Marchand
  2024-02-13 19:20  3%     ` Tyler Retzlaff
  0 siblings, 1 reply; 200+ results
From: David Marchand @ 2024-02-13 13:14 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Bruce Richardson, Cristian Dumitrescu, Honnappa Nagarahalli,
	Sameh Gobriel, Vladimir Medvedkin, Yipeng Wang, mb, fengchengwen,
	Dodji Seketeli

On Mon, Feb 12, 2024 at 11:36 PM Tyler Retzlaff
<roretzla@linux.microsoft.com> wrote:
>
> Replace some missed zero length arrays not captured in the
> original series.
> https://patchwork.dpdk.org/project/dpdk/list/?series=30410&state=*
>
> Zero length arrays are a GNU extension that has been
> superseded by flex arrays.
>
> https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
>
> v2:
>     * added additional patches for fib & pipeline libs.
>       series-acks have been placed only against original
>       hash and rcu patches.

There seems to be an issue with the ABI check on those changes.
After a quick chat with Dodji, I opened a bug for libabigail.

https://sourceware.org/bugzilla/show_bug.cgi?id=31377


-- 
David Marchand


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2] RFC: replace GCC marker extension with C11 anonymous unions
  2024-02-13  6:45  3% ` [PATCH v2] RFC: " Tyler Retzlaff
@ 2024-02-13  8:57  0%   ` Bruce Richardson
  2024-02-13 17:09  0%   ` Morten Brørup
  1 sibling, 0 replies; 200+ results
From: Bruce Richardson @ 2024-02-13  8:57 UTC (permalink / raw)
  To: Tyler Retzlaff
  Cc: dev, Andrew Boyer, Andrew Rybchenko, Chenbo Xia,
	Konstantin Ananyev, Maxime Coquelin, mb

On Mon, Feb 12, 2024 at 10:45:40PM -0800, Tyler Retzlaff wrote:
> The zero sized RTE_MARKER<n> typedefs are a GCC extension unsupported by
> MSVC.  Replace the use of the RTE_MARKER typedefs with anonymous unions.
> 
> Note:
> 
> v1 of the series tried to maintain the API after some study it has been
> discovered that some existing uses of the markers do not produce compilation
> failure but evaluate to unintended values in the absence of adaptation.
> For this reason the existing markers cannot be removed because it is too hard
> to identify what needs to be changed by consumers. While the ABI has been
> maintained the subtle API change is just too risky.
> 
> The question I'm asking now is how to gracefully deprecate the markers
> while allowing consumption of the struct on Windows.
> 
> I propose the following:
> 
> * Introduce the unions as per-this series except instead of adding members
>   that match the original RTE_MARKER field names provide *new* names.
> * Retain (conditionally compiled away on Windows) the existing RTE_MARKER
>   fields with their original names.
> * Convert in-tree code to use the new names in the unions.
> 
> The old names & markers would be announced for deprecation and eventually
> removed and when they are the conditional compilation would also go away.
> 
> Thoughts?
> 
This seems a good approach. +1 from me for the idea.

/Bruce

^ permalink raw reply	[relevance 0%]

* RE: [PATCH v2 1/4] ethdev: introduce encap hash calculation
  2024-02-12 20:09  3%           ` Ferruh Yigit
@ 2024-02-13  7:05  0%             ` Ori Kam
  0 siblings, 0 replies; 200+ results
From: Ori Kam @ 2024-02-13  7:05 UTC (permalink / raw)
  To: Ferruh Yigit, Dariusz Sosnowski, cristian.dumitrescu,
	andrew.rybchenko, stephen, NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: dev, Raslan Darawsheh



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Monday, February 12, 2024 10:10 PM
> 
> On 2/12/2024 6:44 PM, Ori Kam wrote:
> > Hi Ferruh
> >
> >> -----Original Message-----
> >> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >> Sent: Monday, February 12, 2024 7:05 PM
> >>
> >> On 2/11/2024 7:29 AM, Ori Kam wrote:
> >>> Hi Ferruh,
> >>>
> >>>> -----Original Message-----
> >>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
> >>>> Sent: Thursday, February 8, 2024 7:13 PM
> >>>> To: Ori Kam <orika@nvidia.com>; Dariusz Sosnowski
> >>>>
> >>>> On 2/8/2024 9:09 AM, Ori Kam wrote:
> >>>>> During encapsulation of a packet, it is possible to change some
> >>>>> outer headers to improve flow distribution.
> >>>>> For example, from VXLAN RFC:
> >>>>> "It is recommended that the UDP source port number
> >>>>> be calculated using a hash of fields from the inner packet --
> >>>>> one example being a hash of the inner Ethernet frame's headers.
> >>>>> This is to enable a level of entropy for the ECMP/load-balancing"
> >>>>>
> >>>>> The tunnel protocol defines which outer field should hold this hash,
> >>>>> but it doesn't define the hash calculation algorithm.
> >>>>>
> >>>>> An application that uses flow offloads gets the first few packets
> >>>>> (exception path) and then decides to offload the flow.
> >>>>> As a result, there are two
> >>>>> different paths that a packet from a given flow may take.
> >>>>> SW for the first few packets or HW for the rest.
> >>>>> When the packet goes through the SW, the SW encapsulates the
> packet
> >>>>> and must use the same hash calculation as the HW will do for
> >>>>> the rest of the packets in this flow.
> >>>>>
> >>>>> the new function rte_flow_calc_encap_hash can query the hash value
> >>>>> from the driver for a given packet as if the packet was passed
> >>>>> through the HW.
> >>>>>
> >>>>> Signed-off-by: Ori Kam <orika@nvidia.com>
> >>>>> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
> >>>>>
> >>>>
> >>>> <...>
> >>>>
> >>>>> +int
> >>>>> +rte_flow_calc_encap_hash(uint16_t port_id, const struct
> rte_flow_item
> >>>> pattern[],
> >>>>> +			 enum rte_flow_encap_hash_field dest_field,
> uint8_t
> >>>> hash_len,
> >>>>> +			 uint8_t *hash, struct rte_flow_error *error)
> >>>>> +{
> >>>>> +	int ret;
> >>>>> +	struct rte_eth_dev *dev;
> >>>>> +	const struct rte_flow_ops *ops;
> >>>>> +
> >>>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> >>>>> +	ops = rte_flow_ops_get(port_id, error);
> >>>>> +	if (!ops || !ops->flow_calc_encap_hash)
> >>>>> +		return rte_flow_error_set(error, ENOTSUP,
> >>>>> +
> >>>> RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> >>>>> +					  "calc encap hash is not
> supported");
> >>>>> +	if ((dest_field == RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT
> &&
> >>>> hash_len != 2) ||
> >>>>> +	    (dest_field ==
> RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID
> >>>> && hash_len != 1))
> >>>>>
> >>>>
> >>>> If there is a fixed mapping with the dest_field and the size, instead of
> >>>> putting this information into check code, what do you think to put it
> >>>> into the data structure?
> >>>>
> >>>> I mean instead of using enum for dest_filed, it can be a struct that is
> >>>> holding enum and its expected size, this clarifies what the expected
> >>>> size for that field.
> >>>>
> >>>
> >>> From my original email I think we only need the type, we don't need the
> >> size.
> >>> On the RFC thread there was an objection. So I added the size,
> >>> If you think it is not needed lets remove it.
> >>>
> >>
> >> I am not saying length is not needed, but
> >> API gets 'dest_field' & 'hash_len', and according checks in the API for
> >> each 'dest_field' there is an exact 'hash_len' requirement, this
> >> requirement is something impacts user but this information is embedded
> >> in the API, my suggestion is make it more visible to user.
> >>
> >> My initial suggestion was put this into an object, like:
> >> ```
> >> struct x {
> >> 	enum rte_flow_encap_hash_field dest_field;
> >> 	size_t expected size;
> >> } y[] = {
> >> 	{ RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT, 2 },
> >> 	{ RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID, 1 }
> >> };
> >> ```
> >>
> >> But as you mentioned this is a limited set, perhaps it is sufficient to
> >> document size requirement in the "enum rte_flow_encap_hash_field" API
> >> doxygen comment.
> >
> > Will add it to the doxygen.
> >
> >>
> >>
> >>
> >>>>> +		return rte_flow_error_set(error, EINVAL,
> >>>>> +
> >>>> RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
> >>>>> +					  "hash len doesn't match the
> >>>> requested field len");
> >>>>> +	dev = &rte_eth_devices[port_id];
> >>>>> +	ret = ops->flow_calc_encap_hash(dev, pattern, dest_field,
> hash,
> >>>> error);
> >>>>>
> >>>>
> >>>> 'hash_len' is get by API, but it is not passed to dev_ops, does this
> >>>> mean this information hardcoded in the driver as well, if so why
> >>>> duplicate this information in driver instead off passing hash_len to
> driver?
> >>>
> >>> Not sure I understand, like I wrote above this is pure verification from my
> >> point of view.
> >>> The driver knows the size based on the dest.
> >>>
> >>
> >> My intention was similar to above comment, like dest_field type
> >> RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT implies that required size
> should
> >> be
> >> 2 bytes, and it seems driver already knows about this requirement.
> >
> > That is correct, that is why I don't think we need the size, and added it
> > only for validation due to community request.
> >
> >>
> >> Instead, it can be possible to verify 'hash_len' in the API level, pass
> >> this information to the driver and driver use 'hash_len' directly for
> >> its size parameter, so driver will rely on API provided 'hash_len' value
> >> instead of storing this information within driver.
> >>
> >> Lets assume 10 drivers are implementing this feature, should all of them
> >> define MLX5DR_CRC_ENCAP_ENTROPY_HASH_SIZE_16 equivalent
> >> enum/define
> >> withing the driver?
> >
> > No, the driver implements hard-coded logic, which means that it just needs
> to know
> > the dest field, in order to know what hash to calculate
> > It is possible that for each field the HW will calculate the hash using
> different algorithm.
> >
> 
> OK if HW already needs to know the size in advance, lets go with enum
> doxygen update only.
> 
> > Also it is possible that the HW doesn't support writing to the expected field,
> in which case we
> > want the driver call to fail.
> >
> > Field implies size.
> > Size doesn't implies field.
> >
> >>
> >>>>
> >>>>
> >>>>> +	return flow_err(port_id, ret, error);
> >>>>> +}
> >>>>> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> >>>>> index 1267c146e5..2bdf3a4a17 100644
> >>>>> --- a/lib/ethdev/rte_flow.h
> >>>>> +++ b/lib/ethdev/rte_flow.h
> >>>>> @@ -6783,6 +6783,57 @@ rte_flow_calc_table_hash(uint16_t
> port_id,
> >>>> const struct rte_flow_template_table
> >>>>>  			 const struct rte_flow_item pattern[], uint8_t
> >>>> pattern_template_index,
> >>>>>  			 uint32_t *hash, struct rte_flow_error *error);
> >>>>>
> >>>>> +/**
> >>>>> + * @warning
> >>>>> + * @b EXPERIMENTAL: this API may change without prior notice.
> >>>>> + *
> >>>>> + * Destination field type for the hash calculation, when encap action
> is
> >>>> used.
> >>>>> + *
> >>>>> + * @see function rte_flow_calc_encap_hash
> >>>>> + */
> >>>>> +enum rte_flow_encap_hash_field {
> >>>>> +	/* Calculate hash placed in UDP source port field. */
> >>>>>
> >>
> >> Just recognized that comments are not doxygen comments.
> >
> > Thanks,
> > Will fix.
> >>
> >>>>> +	RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT,
> >>>>> +	/* Calculate hash placed in NVGRE flow ID field. */
> >>>>> +	RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID,
> >>>>> +};
> >>>>>
> >>>>
> >>>> Indeed above enum represents a field in a network protocol, right?
> >>>> Instead of having a 'RTE_FLOW_ENCAP_HASH_' specific one, can re-
> using
> >>>> 'enum rte_flow_field_id' work?
> >>>
> >>> Since the option are really limited and defined by standard, I prefer to
> have
> >> dedicated options.
> >>>
> >>
> >> OK, my intention is to reduce the duplication. Just to brainstorm, what
> >> is the benefit of having 'RTE_FLOW_ENCAP_HASH_' specific enums, if we
> >> can present them as generic protocol fields, like
> >> 'RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT' vs
> >> 'RTE_FLOW_FIELD_UDP_PORT_SRC'?
> >
> > I guess you want to go with 'RTE_FLOW_FIELD_UDP_PORT_SRC',
> > right?
> >
> 
> I just want to discuss if redundancy can be eliminated.
> 
> > The main issue is that the options are really limited and used for a very
> > dedicated function.
> > When app developers / DPDK developers look at it, it will be very
> > unclear what the use of this enum is.
> > We already have an enum for fields. Like you suggested, we could have
> > used it, but it would show many more options than there really are.
> >
> 
> OK, let's use dedicated enums to clarify to the users the specific fields
> available for this set of APIs.
> 
> Btw, is a boundary check like the following required for the APIs:
> ```
> if (dest_field > RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID)
> 	return -EINVAL;
> ```
> In case the user passes an invalid value as 'dest_field'
> 
> (Note: I intentionally did not use a MAX enum, something like
> 'RTE_FLOW_ENCAP_HASH_FIELD_MAX', to avoid having to deal with ABI issues
> in the future.)
Good idea, will add.
Best,
Ori


^ permalink raw reply	[relevance 0%]

* [PATCH v2] RFC: replace GCC marker extension with C11 anonymous unions
  @ 2024-02-13  6:45  3% ` Tyler Retzlaff
  2024-02-13  8:57  0%   ` Bruce Richardson
  2024-02-13 17:09  0%   ` Morten Brørup
  0 siblings, 2 replies; 200+ results
From: Tyler Retzlaff @ 2024-02-13  6:45 UTC (permalink / raw)
  To: dev
  Cc: Andrew Boyer, Andrew Rybchenko, Bruce Richardson, Chenbo Xia,
	Konstantin Ananyev, Maxime Coquelin, mb, Tyler Retzlaff

The zero sized RTE_MARKER<n> typedefs are a GCC extension unsupported by
MSVC.  Replace the use of the RTE_MARKER typedefs with anonymous unions.

Note:

v1 of the series tried to maintain the API. After some study, it was
discovered that some existing uses of the markers do not produce compilation
failures but evaluate to unintended values in the absence of adaptation.
For this reason the existing markers cannot be removed, because it is too hard
to identify what needs to be changed by consumers. While the ABI has been
maintained, the subtle API change is just too risky.

The question I'm asking now is how to gracefully deprecate the markers
while allowing consumption of the struct on Windows.

I propose the following:

* Introduce the unions as per this series, except instead of adding members
  that match the original RTE_MARKER field names, provide *new* names.
* Retain (conditionally compiled away on Windows) the existing RTE_MARKER
  fields with their original names.
* Convert in-tree code to use the new names in the unions.

The old names & markers would be announced for deprecation and eventually
removed, and when they are, the conditional compilation would also go away.

Thoughts?

v2:
    * Introduce additional union/struct to agnostically pad cacheline0 to
      RTE_CACHE_LINE_MIN_SIZE without conditional compilation.
    * Adapt ixgbe access of rearm_data field.
    * Move ol_flags field out of rearm_data union where it didn't belong.
    * Added a couple of static_asserts for offset of cacheline1 and
      sizeof struct rte_mbuf.

Tyler Retzlaff (1):
  mbuf: replace GCC marker extension with C11 anonymous unions

 drivers/net/ionic/ionic_lif.c               |   8 +-
 drivers/net/ionic/ionic_rxtx_sg.c           |   4 +-
 drivers/net/ionic/ionic_rxtx_simple.c       |   2 +-
 drivers/net/ixgbe/ixgbe_rxtx_vec_sse.c      |   8 +-
 drivers/net/sfc/sfc_ef100_rx.c              |   8 +-
 drivers/net/sfc/sfc_ef10_rx.c               |  12 +-
 drivers/net/virtio/virtio_rxtx_packed_avx.h |   8 +-
 lib/mbuf/rte_mbuf_core.h                    | 276 ++++++++++++++++------------
 8 files changed, 179 insertions(+), 147 deletions(-)

-- 
1.8.3.1


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 1/4] ethdev: introduce encap hash calculation
  @ 2024-02-12 20:09  3%           ` Ferruh Yigit
  2024-02-13  7:05  0%             ` Ori Kam
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-02-12 20:09 UTC (permalink / raw)
  To: Ori Kam, Dariusz Sosnowski, cristian.dumitrescu,
	andrew.rybchenko, stephen, NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: dev, Raslan Darawsheh

On 2/12/2024 6:44 PM, Ori Kam wrote:
> Hi Ferruh
> 
>> -----Original Message-----
>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>> Sent: Monday, February 12, 2024 7:05 PM
>>
>> On 2/11/2024 7:29 AM, Ori Kam wrote:
>>> Hi Ferruh,
>>>
>>>> -----Original Message-----
>>>> From: Ferruh Yigit <ferruh.yigit@amd.com>
>>>> Sent: Thursday, February 8, 2024 7:13 PM
>>>> To: Ori Kam <orika@nvidia.com>; Dariusz Sosnowski
>>>>
>>>> On 2/8/2024 9:09 AM, Ori Kam wrote:
>>>>> During encapsulation of a packet, it is possible to change some
>>>>> outer headers to improve flow distribution.
>>>>> For example, from VXLAN RFC:
>>>>> "It is recommended that the UDP source port number
>>>>> be calculated using a hash of fields from the inner packet --
>>>>> one example being a hash of the inner Ethernet frame's headers.
>>>>> This is to enable a level of entropy for the ECMP/load-balancing"
>>>>>
>>>>> The tunnel protocol defines which outer field should hold this hash,
>>>>> but it doesn't define the hash calculation algorithm.
>>>>>
>>>>> An application that uses flow offloads gets the first few packets
>>>>> (exception path) and then decides to offload the flow.
>>>>> As a result, there are two
>>>>> different paths that a packet from a given flow may take.
>>>>> SW for the first few packets or HW for the rest.
>>>>> When the packet goes through the SW, the SW encapsulates the packet
>>>>> and must use the same hash calculation as the HW will do for
>>>>> the rest of the packets in this flow.
>>>>>
>>>>> The new function rte_flow_calc_encap_hash can query the hash value
>>>>> from the driver for a given packet as if the packet was passed
>>>>> through the HW.
>>>>>
>>>>> Signed-off-by: Ori Kam <orika@nvidia.com>
>>>>> Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
>>>>>
>>>>
>>>> <...>
>>>>
>>>>> +int
>>>>> +rte_flow_calc_encap_hash(uint16_t port_id, const struct rte_flow_item
>>>> pattern[],
>>>>> +			 enum rte_flow_encap_hash_field dest_field, uint8_t
>>>> hash_len,
>>>>> +			 uint8_t *hash, struct rte_flow_error *error)
>>>>> +{
>>>>> +	int ret;
>>>>> +	struct rte_eth_dev *dev;
>>>>> +	const struct rte_flow_ops *ops;
>>>>> +
>>>>> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
>>>>> +	ops = rte_flow_ops_get(port_id, error);
>>>>> +	if (!ops || !ops->flow_calc_encap_hash)
>>>>> +		return rte_flow_error_set(error, ENOTSUP,
>>>>> +
>>>> RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
>>>>> +					  "calc encap hash is not supported");
>>>>> +	if ((dest_field == RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT &&
>>>> hash_len != 2) ||
>>>>> +	    (dest_field == RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID
>>>> && hash_len != 1))
>>>>>
>>>>
>>>> If there is a fixed mapping with the dest_field and the size, instead of
>>>> putting this information into check code, what do you think to put it
>>>> into the data structure?
>>>>
>>>> I mean instead of using enum for dest_filed, it can be a struct that is
>>>> holding enum and its expected size, this clarifies what the expected
>>>> size for that field.
>>>>
>>>
>>> From my original email I think we only need the type, we don't need the
>> size.
>>> On the RFC thread there was an objection. So I added the size,
>>> If you think it is not needed lets remove it.
>>>
>>
>> I am not saying length is not needed, but the
>> API gets 'dest_field' & 'hash_len', and according to the checks in the API,
>> for each 'dest_field' there is an exact 'hash_len' requirement. This
>> requirement is something that impacts the user but the information is embedded
>> in the API; my suggestion is to make it more visible to the user.
>>
>> My initial suggestion was put this into an object, like:
>> ```
>> struct x {
>> 	enum rte_flow_encap_hash_field dest_field;
>> 	size_t expected_size;
>> } y[] = {
>> 	{ RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT, 2 },
>> 	{ RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID, 1 }
>> };
>> ```
>>
>> But as you mentioned this is a limited set, perhaps it is sufficient to
>> document size requirement in the "enum rte_flow_encap_hash_field" API
>> doxygen comment.
> 
> Will add it to the doxygen.
> 
>>
>>
>>
>>>>> +		return rte_flow_error_set(error, EINVAL,
>>>>> +
>>>> RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
>>>>> +					  "hash len doesn't match the
>>>> requested field len");
>>>>> +	dev = &rte_eth_devices[port_id];
>>>>> +	ret = ops->flow_calc_encap_hash(dev, pattern, dest_field, hash,
>>>> error);
>>>>>
>>>>
>>>> 'hash_len' is taken by the API, but it is not passed to dev_ops; does this
>>>> mean this information is hardcoded in the driver as well? If so, why
>>>> duplicate this information in the driver instead of passing hash_len to the driver?
>>>
>>> Not sure I understand, like I wrote above this is pure verification from my
>> point of view.
>>> The driver knows the size based on the dest.
>>>
>>
>> My intention was similar to above comment, like dest_field type
>> RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT implies that required size should
>> be
>> 2 bytes, and it seems driver already knows about this requirement.
> 
> That is correct, that is why I don't think we need the size, and added it
> only for validation due to a community request.
> 
>>
>> Instead, it would be possible to verify 'hash_len' at the API level, pass
>> this information to the driver, and have the driver use 'hash_len' directly
>> for its size parameter, so the driver relies on the API-provided 'hash_len'
>> value instead of storing this information within the driver.
>>
>> Let's assume 10 drivers are implementing this feature; should all of them
>> define an MLX5DR_CRC_ENCAP_ENTROPY_HASH_SIZE_16 equivalent
>> enum/define within the driver?
> 
> No, the driver implements hard-coded logic, which means that it just needs to know
> the dest field in order to know what hash to calculate.
> It is possible that for each field the HW will calculate the hash using different algorithm.
> 

OK, if the HW already needs to know the size in advance, let's go with the enum
doxygen update only.

> Also it is possible that the HW doesn't support writing to the expected field, in which case we 
> want the driver call to fail.
> 
> Field implies size.
> Size doesn't imply field.
> 
>>
>>>>
>>>>
>>>>> +	return flow_err(port_id, ret, error);
>>>>> +}
>>>>> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
>>>>> index 1267c146e5..2bdf3a4a17 100644
>>>>> --- a/lib/ethdev/rte_flow.h
>>>>> +++ b/lib/ethdev/rte_flow.h
>>>>> @@ -6783,6 +6783,57 @@ rte_flow_calc_table_hash(uint16_t port_id,
>>>> const struct rte_flow_template_table
>>>>>  			 const struct rte_flow_item pattern[], uint8_t
>>>> pattern_template_index,
>>>>>  			 uint32_t *hash, struct rte_flow_error *error);
>>>>>
>>>>> +/**
>>>>> + * @warning
>>>>> + * @b EXPERIMENTAL: this API may change without prior notice.
>>>>> + *
>>>>> + * Destination field type for the hash calculation, when encap action is
>>>> used.
>>>>> + *
>>>>> + * @see function rte_flow_calc_encap_hash
>>>>> + */
>>>>> +enum rte_flow_encap_hash_field {
>>>>> +	/* Calculate hash placed in UDP source port field. */
>>>>>
>>
>> Just recognized that comments are not doxygen comments.
> 
> Thanks,
> Will fix.
>>
>>>>> +	RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT,
>>>>> +	/* Calculate hash placed in NVGRE flow ID field. */
>>>>> +	RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID,
>>>>> +};
>>>>>
>>>>
>>>> Indeed above enum represents a field in a network protocol, right?
>>>> Instead of having a 'RTE_FLOW_ENCAP_HASH_' specific one, can re-using
>>>> 'enum rte_flow_field_id' work?
>>>
>>> Since the options are really limited and defined by the standard, I prefer
>>> to have dedicated options.
>>>
>>
>> OK, my intention is to reduce the duplication. Just to brainstorm, what
>> is the benefit of having 'RTE_FLOW_ENCAP_HASH_' specific enums, if we
>> can present them as generic protocol fields, like
>> 'RTE_FLOW_ENCAP_HASH_FIELD_SRC_PORT' vs
>> 'RTE_FLOW_FIELD_UDP_PORT_SRC'?
> 
> I guess you want to go with 'RTE_FLOW_FIELD_UDP_PORT_SRC',
> right?
>

I just want to discuss if redundancy can be eliminated.

> The main issue is that the options are really limited and used for a very dedicated function.
> When app developers / DPDK developers look at it, it will be very unclear what the use of this enum is.
> We already have an enum for fields. Like you suggested, we could have used it,
> but it would show many more options than there really are.
> 

OK, let's use dedicated enums to clarify to the users the specific fields
available for this set of APIs.

Btw, is a boundary check like the following required for the APIs:
```
if (dest_field > RTE_FLOW_ENCAP_HASH_FIELD_NVGRE_FLOW_ID)
	return -EINVAL;
```
In case the user passes an invalid value as 'dest_field'

(Note: I intentionally did not use a MAX enum, something like
'RTE_FLOW_ENCAP_HASH_FIELD_MAX', to avoid having to deal with ABI issues
in the future.)


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v7 00/19] Replace use of PMD logtype
  2024-02-03  4:10  3% ` [PATCH v7 00/19] Replace use of PMD logtype Stephen Hemminger
@ 2024-02-12 14:45  0%   ` David Marchand
  0 siblings, 0 replies; 200+ results
From: David Marchand @ 2024-02-12 14:45 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: dev, Thomas Monjalon

On Sat, Feb 3, 2024 at 5:11 AM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> Many of the uses of PMD logtype have already been fixed.
> But there are still some leftovers, mostly places where
> drivers had a logtype but did not use them.
>
> Note: this is not an ABI break, but could break out of
>       tree drivers that never updated to use dynamic logtype.
>       DPDK never guaranteed that that would not happen.
>
> v7 - drop changes to newlines
>      drop changes related to RTE_LOG_DP
>      rebase now that other stuff has changed

Series applied.

Edits I did:
- fixed crypto/armv8 (compilation broken because of typo),
- fixed one missed use of PMD in crypto/caam_jr,
- fixed net/nfb build (thanks to Thomas for reporting),
- preferred per level macros instead of CAAM_JR_LOG, like in the rest
of the crypto/caam_jr driver,
- dropped more unrelated changes on \n in crypto/dpaa*,
- I also reorganised the commits, fixed (well dropped) wrong commit
title, typos, tried to use more consistent wording,


-- 
David Marchand


^ permalink raw reply	[relevance 0%]

* RE: [v7 1/1] net/af_xdp: fix multi interface support for K8s
  2024-02-07 23:24  0%             ` Ferruh Yigit
@ 2024-02-09 12:40  0%               ` Loftus, Ciara
  0 siblings, 0 replies; 200+ results
From: Loftus, Ciara @ 2024-02-09 12:40 UTC (permalink / raw)
  To: Ferruh Yigit, Tahhan, Maryam, stephen, lihuisong, fengchengwen,
	liuyonglong, Marchand, David
  Cc: dev, Koikkara Reeny, Shibin, Kevin Traynor, Luca Boccassi

> 
> On 1/11/2024 2:21 PM, Ferruh Yigit wrote:
> > On 1/11/2024 12:21 PM, Maryam Tahhan wrote:
> >> On 11/01/2024 11:35, Ferruh Yigit wrote:
> >>> Devarg is user interface, changing it impacts the user.
> >>>
> >>> Assume that user of '22.11.3' using 'use_cni' dev_arg, it will be broken
> >>> when user upgrades DPDK to '22.11.4', which is not expected.
> >>>
> >>> dev_arg is not API/ABI but as it impacts the user, it is in the gray
> >>> area to backport to the LTS release.
> >> Fair enough
> >>> Current patch doesn't have Fixes tag or stable tag, so it doesn't
> >>> request to be backported to LTS release. I took this as an improvement,
> >>> more than a fix.
> >>
> >> This was overlooked by me apologies. It's been a while since I've
> >> contributed to DPDK and I must've missed this detail in the contribution
> >> guide.
> >>> As far as I understand, existing code (that uses the 'use_cni' dev_arg)
> >>> supports only a single netdev; this patch adds support for multiple netdevs.
> >>
> >> The use_cni implementation as originally implemented will no longer work
> >> with the AF_XDP DP, as it has hard-coded what's now an
> >> incorrect path for the UDS.
> >>
> >>> So what do you think: keep LTS with the 'use_cni' dev_arg? Is there a
> >>> requirement to update the LTS release?
> >>> If so, can it be an option to keep 'use_cni' for backward compatibility
> >>> but only add 'uds_path' and remove 'use_cni' in the next LTS?
> >>
> >>
> >> Yeah we can go back to the version of the patch that had the 'use_cni'
> >> flag that was used in combination with the path argument. We can add
> >> better documentation re the "use_cni" misnomer... What we can then do is
> >> if no path argument is set by the user assume their intent and and
> >> generate the path internally in the AF_XDP PMD (which was suggested by
> >> Shibin at some stage). That way there should be no surprises to the End
> >> User.
> >>
> >
> > Ack, this keeps backward compatibility,
> >
> > BUT if 'use_cni' is already broken in v23.11 (that is what I understand
> > from your above comment), it means there is no user of it in LTS, and we
> > can be more pragmatic and replace the dev_args, by backporting this
> > patch, assuming LTS maintainer is also OK with it.
> >
> 
> Hi Maryam,
> 
> How do you want to continue with the patch? I think these are the options we considered:
> 
> 1. Fix 'use_cni' documentation (which we can backport to LTS) and
> overload the argument for new purpose. This will enable new feature by
> keeping backward compatibility. And requires new version of this patch.
> 
> 2. If the 'use_cni' is completely broken in the 23.11 LTS, which means
> there is no user or backward compatibility to worry about, we can merge
> this patch and backport it to LTS.
> 
> 3. Don't backport this fix to LTS, merge only to the current release, which
> means your new feature won't be available to some users for as long as a few
> years.
> 
> 
> (1.) is most user friendly, but if 'use_cni' is already broken in LTS we
> can go with option (2.). What do you think?
> 
> 
> 
> btw, @Ciara, @Maryam, if (2.) is true, how did we end up having a feature
> ('use_cni' dev_args) completely broken in an LTS release?

My understanding is that the use_cni implementation that is available in the 23.11 LTS is compatible with a particular version of the afxdp-plugins-for-kubernetes source. Maryam's change makes it compatible with the latest version. @Maryam can you confirm this?
If my understanding is correct then I think we should include the version/tag/commit-id of afxdp-plugins-for-kubernetes that the code is compatible with, including backporting a patch to LTS to specify what version that code is compatible with.

> 
> 
> 
> >
> >> Long term I would like to keep a (renamed) path argument (in case the
> >> path does ever change from the AF_XDP DP POV) and use it also in
> >> combination with another (maybe boolean) param for passing pinned bpf
> >> maps rather than another separate path.
> >>
> >> WDYT? Would this work for the LTS release?
> >>
> >>
> >


^ permalink raw reply	[relevance 0%]

* [PATCH v4 4/7] net/tap: rewrite the RSS BPF program
  @ 2024-02-08 19:05  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-02-08 19:05 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in skb rather
	  than requiring a second pass through classifier step.

Note: This version only works with a later patch to enable it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  12 ++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 272 +++++++++++++++++++++++++
 9 files changed, 365 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace2e..01a47a760660 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc704..000000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 000000000000..960a10da73b8
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,12 @@
+This is the BPF program used to implement the RSS across queues
+flow action. It works like the skbedit tc filter but instead of mapping
+to only one queue, it maps to multiple queues based on the RSS hash.
+
+This version is built using the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+- requires libbpf version XX or later
+- rebuilding the BPF requires clang and bpftool
+- only Toeplitz hash with standard 40 byte key is supported
+- the number of queues per RSS action is limited to 16
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 2638a8a4ac9a..000000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	({								\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c0f..000000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4eca..000000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 000000000000..f2c03a19fd4d
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>,
+# but <asm/types.h> is not defined for the multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c30..000000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 000000000000..1abd18cb606e
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows
+ * which need BPF RSS.
+ *
+ * The hash is indexed by the tc_index.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u16));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_MAX_QUEUES);
+} rss_map SEC(".maps");
+
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
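As a reading aid, the bit-by-bit Toeplitz loop above can be sketched in Python (an illustration only, not part of the patch; `key_words` is assumed to hold at least one more 32-bit word than the input tuple, matching the `key[j + 1]` access in the BPF code):

```python
def softrss_be(tuple_words, key_words):
    """Toeplitz hash over 32-bit big-endian words, mirroring the BPF loop."""
    h = 0
    for j, word in enumerate(tuple_words):
        for i in range(32):
            if word & (1 << (31 - i)):
                # key[j] << i | key[j + 1] >> (32 - i), kept to 32 bits
                h ^= ((key_words[j] << i) | (key_words[j + 1] >> (32 - i))) & 0xFFFFFFFF
    return h
```

With only bit 31 of a single input word set, the hash reduces to `key_words[0]`, which makes the sliding-window structure of the key easy to see.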
+
+/* Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* No L4 hash if the packet is fragmented */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP ports */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/* Parse IPv6 extension headers, updating the offset.
+ * Returns the next protocol on success, -1 on a malformed header.
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
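The extension-header walk above can be modelled in Python for clarity (a sketch with a hypothetical `hdrs` representation, not part of the patch; headers are given as a mapping from offset to `(next_hdr, len)` pairs as they appear on the wire):

```python
IPPROTO_HOPOPTS, IPPROTO_ROUTING, IPPROTO_FRAGMENT, IPPROTO_DSTOPTS = 0, 43, 44, 60
MAX_EXT_HDRS = 5

def skip_ip6_ext(proto, hdrs, off=0):
    """Walk IPv6 extension headers; return (next_proto, offset, frag), or -1 on overrun."""
    for _ in range(MAX_EXT_HDRS):
        if proto in (IPPROTO_HOPOPTS, IPPROTO_ROUTING, IPPROTO_DSTOPTS):
            next_hdr, length = hdrs[off]
            off += (length + 1) * 8   # length is in 8-byte units, excluding the first 8
            proto = next_hdr
        elif proto == IPPROTO_FRAGMENT:
            next_hdr, _ = hdrs[off]
            off += 8
            return next_hdr, off, 1   # fragment header is always the last ext hdr
        else:
            return proto, off, 0
    return -1, off, 0                 # too many extension headers, give up
```

A non-extension protocol returns immediately, a hop-by-hop header advances past itself, and a fragment header sets the fragment flag and stops the walk, matching the three branches of the BPF loop.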
+
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP ports */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/* Scale value into the range [0, n); assumes val is large */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
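reciprocal_scale() maps a full-range 32-bit hash onto [0, n) with a multiply and a shift instead of a modulo; a quick Python sketch (illustrative only, not part of the patch):

```python
def reciprocal_scale(val, n):
    """Map a 32-bit value onto [0, n) via (val * n) >> 32."""
    assert 0 <= val <= 0xFFFFFFFF  # val is assumed to be a 32-bit hash
    return (val * n) >> 32
```

Because the product of a 32-bit value and n is at most (2^32 - 1) * n, shifting right by 32 always yields a result strictly below n, so no bounds check on the queue index is needed afterwards.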
+
+/* layout of qdisc skb cb (from sch_generic.h) */
+struct qdisc_skb_cb {
+	struct {
+		unsigned int	pkt_len;
+		__u16		dev_queue_mapping;
+		__u16		tc_classid;
+	};
+#define QDISC_CB_PRIV_LEN 20
+	unsigned char		data[QDISC_CB_PRIV_LEN];
+};
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u16 classid;
+	__u32 hash;
+
+	/* TC layer puts the BPF_CLASSID into the skb cb area */
+	classid = ((const struct qdisc_skb_cb *)skb->cb)->tc_classid;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &classid);
+	if (rsskey == NULL) {
+		bpf_printk("hash(): rss not configured");
+		return TC_ACT_OK;
+	}
+
+	hash = calculate_rss_hash(skb, rsskey);
+	bpf_printk("hash %u\n", hash);
+	if (hash) {
+		/* Fold hash to the number of queues configured */
+		skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+		bpf_printk("queue %u\n", skb->queue_mapping);
+		return TC_ACT_PIPE;
+	}
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0



* [PATCH v3 4/7] net/tap: rewrite the RSS BPF program
  @ 2024-02-08 17:41  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-02-08 17:41 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses newer BPF map format BTF
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in skb rather
	  than requiring a second pass through classifier step.

Note: this version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  12 ++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 272 +++++++++++++++++++++++++
 9 files changed, 365 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c

diff --git a/.gitignore b/.gitignore
index 3f444dcace2e..01a47a760660 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc704..000000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 000000000000..960a10da73b8
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,12 @@
+This is the BPF program used to implement the RSS-across-queues
+flow action. It works like the skbedit tc filter, but instead of mapping
+to only one queue, it maps to multiple queues based on the RSS hash.
+
+This version is built using the BPF Compile Once — Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+- requires libbpf version XX or later
+- rebuilding the BPF requires clang and bpftool
+- only Toeplitz hash with standard 40 byte key is supported
+- the number of queues per RSS action is limited to 16
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 2638a8a4ac9a..000000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	({								\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c0f..000000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4eca..000000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 000000000000..f2c03a19fd4d
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support, missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support, missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support, missing clang BPF support')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>,
+# but <asm/types.h> is not defined for a multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c30..000000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 000000000000..1abd18cb606e
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows
+ * which need BPF RSS.
+ *
+ * The hash is indexed by the tc_index.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u16));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_MAX_QUEUES);
+} rss_map SEC(".maps");
+
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is the same as rte_softrss_be() in lib/hash,
+ * but the loop needs to be set up to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/* Compute RSS hash for IPv4 packet.
+ * Returns 0 if the hash cannot be computed.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* No L4 hash if the packet is fragmented */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP ports */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/* Parse IPv6 extension headers, update the offset and return the next proto.
+ * Returns the next proto on success, -1 on malformed header.
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP ports */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/* Scale value into the range [0, n); assumes val spans the full 32-bit range */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/* layout of qdisc skb cb (from sch_generic.h) */
+struct qdisc_skb_cb {
+	struct {
+		unsigned int	pkt_len;
+		__u16		dev_queue_mapping;
+		__u16		tc_classid;
+	};
+#define QDISC_CB_PRIV_LEN 20
+	unsigned char		data[QDISC_CB_PRIV_LEN];
+};
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT, so the section needs to be "action".
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u16 classid;
+	__u32 hash;
+
+	/* TC layer puts the BPF_CLASSID into the skb cb area */
+	classid = ((const struct qdisc_skb_cb *)skb->cb)->tc_classid;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &classid);
+	if (rsskey == NULL) {
+		bpf_printk("hash(): rss not configured");
+		return TC_ACT_OK;
+	}
+
+	hash = calculate_rss_hash(skb, rsskey);
+	bpf_printk("hash %u\n", hash);
+	if (hash) {
+		/* Fold hash to the number of queues configured */
+		skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+		bpf_printk("queue %u\n", skb->queue_mapping);
+		return TC_ACT_PIPE;
+	}
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* Re: [v7 1/1] net/af_xdp: fix multi interface support for K8s
  @ 2024-02-07 23:24  0%             ` Ferruh Yigit
  2024-02-09 12:40  0%               ` Loftus, Ciara
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-02-07 23:24 UTC (permalink / raw)
  To: Maryam Tahhan, stephen, lihuisong, fengchengwen, liuyonglong,
	david.marchand
  Cc: dev, Ciara Loftus, Shibin Koikkara Reeny, Kevin Traynor, Luca Boccassi

On 1/11/2024 2:21 PM, Ferruh Yigit wrote:
> On 1/11/2024 12:21 PM, Maryam Tahhan wrote:
>> On 11/01/2024 11:35, Ferruh Yigit wrote:
>>> Devarg is user interface, changing it impacts the user.
>>>
>>> Assume that user of '22.11.3' using 'use_cni' dev_arg, it will be broken
>>> when user upgrades DPDK to '22.11.4', which is not expected.
>>>
>>> dev_arg is not API/ABI but as it impacts the user, it is in the gray
>>> area to backport to the LTS release.
>> Fair enough
>>> Current patch doesn't have Fixes tag or stable tag, so it doesn't
>>> request to be backported to LTS release. I took this as an improvement,
>>> more than a fix.
>>
>> This was overlooked by me apologies. It's been a while since I've
>> contributed to DPDK and I must've missed this detail in the contribution
>> guide.
>>> As far as I understand existing code (that use 'use_cni' dev_arg)
>>> supports only single netdev, this patch adds support for multiple netdevs.
>>
>> The use_cni implementation will no longer work with the AF_XDP DP as the
>> use_cni was originally implemented as it has hard coded what's now an
>> incorrect path for the UDS.
>>
>>> So what do you think keep LTS with 'use_cni' dev_arg, is there a
>>> requirement to update LTS release?
>>> If so, can it be an option to keep 'use_cni' for backward compatibility
>>> but add only add 'uds_path' and remove 'use_cni' in next LTS?
>>
>>
>> Yeah we can go back to the version of the patch that had the 'use_cni'
>> flag that was used in combination with the path argument. We can add
>> better documentation re the "use_cni" misnomer... What we can then do is
>> if no path argument is set by the user assume their intent and and
>> generate the path internally in the AF_XDP PMD (which was suggested by
>> Shibin at some stage). That way there should be no surprises to the End
>> User.
>>
> 
> Ack, this keeps backward compatibility,
> 
> BUT if 'use_cni' is already broken in v23.11 (that is what I understand
> from your above comment), means there is no user of it in LTS, and we
> can be more pragmatic and replace the dev_args, by backporting this
> patch, assuming LTS maintainer is also OK with it.
> 

Hi Maryam,

How do you want to continue with the patch? I think these are the options we considered:

1. Fix 'use_cni' documentation (which we can backport to LTS) and
overload the argument for a new purpose. This enables the new feature while
keeping backward compatibility, and requires a new version of this patch.

2. If the 'use_cni' is completely broken in the 23.11 LTS, which means
there is no user or backward compatibility to worry about, we can merge
this patch and backport it to LTS.

3. Don't backport this fix to LTS, merge only to current release, which
means your new feature won't be available to some users for as long as a few
years.


(1.) is the most user-friendly, but if 'use_cni' is already broken in LTS we
can go with option (2.). What do you think?



btw, @Ciara, @Maryam, if (2.) is true, how did we end up having a feature
('use_cni' dev_args) completely broken in an LTS release?



> 
>> Long term I would like to keep a (renamed) path argument (in case the
>> path does ever change from the AF_XDP DP POV) and use it also in
>> combination with another (maybe boolean) param for passing pinned bpf
>> maps rather than another separate path.
>>
>> WDYT? Would this work for the LTS release?
>>
>>
> 


^ permalink raw reply	[relevance 0%]

* [PATCH v2 4/7] net/tap: rewrite the RSS BPF program
  @ 2024-02-07 22:11  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-02-07 22:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Rewrite the BPF program used to do queue based RSS.
Important changes:
	- uses the newer BTF-based BPF map format
	- accepts key as parameter rather than constant default
	- can do L3 or L4 hashing
	- supports IPv4 options
	- supports IPv6 extension headers
	- restructured for readability

The usage of BPF is different as well:
	- the incoming configuration is looked up based on
	  class parameters rather than patching the BPF.
	- the resulting queue is placed in the skb rather
	  than requiring a second pass through the classifier step.

Note: This version only works with a later patch that enables it on
the DPDK driver side. It is submitted as an incremental patch
to allow for easier review. Bisection still works because
the old instructions are still present for now.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
---
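For reviewers: the Toeplitz loop implemented by softrss_be() in the new
BPF program can be cross-checked from user space. The sketch below is
illustrative Python, not applied by this patch; the key shown is the
commonly cited default 40-byte RSS key, and all names are chosen only to
mirror the BPF code.

```python
# Commonly cited default 40-byte Toeplitz RSS key, as ten 32-bit words.
RSS_KEY_WORDS = [0x6d5a56da, 0x255b0ec2, 0x4167253d, 0x43a38fb0,
                 0xd0ca2bcb, 0xae7b30b4, 0x77cb2da3, 0x8030f20c,
                 0x6a42b73b, 0xbeac01fa]

def softrss_be(tuple_words, key_words=RSS_KEY_WORDS):
    """Toeplitz hash over big-endian 32-bit words, mirroring the BPF loop."""
    hash_ = 0
    for j, word in enumerate(tuple_words):
        for i in range(32):
            if word & (1 << (31 - i)):
                # 32-bit window sliding over the key, one bit per input bit
                window = (key_words[j] << i) | (key_words[j + 1] >> (32 - i))
                hash_ ^= window & 0xFFFFFFFF
    return hash_
```

For an IPv4 L3+L4 hash the input is three words (saddr, daddr, ports),
matching the v4_tuple layout in tap_rss.c.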
 .gitignore                            |   3 -
 drivers/net/tap/bpf/Makefile          |  19 --
 drivers/net/tap/bpf/README            |  12 ++
 drivers/net/tap/bpf/bpf_api.h         | 276 --------------------------
 drivers/net/tap/bpf/bpf_elf.h         |  53 -----
 drivers/net/tap/bpf/bpf_extract.py    |  85 --------
 drivers/net/tap/bpf/meson.build       |  81 ++++++++
 drivers/net/tap/bpf/tap_bpf_program.c | 255 ------------------------
 drivers/net/tap/bpf/tap_rss.c         | 272 +++++++++++++++++++++++++
 9 files changed, 365 insertions(+), 691 deletions(-)
 delete mode 100644 drivers/net/tap/bpf/Makefile
 create mode 100644 drivers/net/tap/bpf/README
 delete mode 100644 drivers/net/tap/bpf/bpf_api.h
 delete mode 100644 drivers/net/tap/bpf/bpf_elf.h
 delete mode 100644 drivers/net/tap/bpf/bpf_extract.py
 create mode 100644 drivers/net/tap/bpf/meson.build
 delete mode 100644 drivers/net/tap/bpf/tap_bpf_program.c
 create mode 100644 drivers/net/tap/bpf/tap_rss.c
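One detail from the changes above worth calling out: the new program folds
the hash onto the configured queues with a multiplicative scale instead of
a plain modulo. A minimal, illustrative Python sketch of that fold:

```python
def reciprocal_scale(val, n):
    """Map a full-range 32-bit value uniformly into [0, n) without a modulo."""
    return ((val & 0xFFFFFFFF) * n) >> 32

# Example: fold a hash onto 4 configured queues
queue = reciprocal_scale(0xDEADBEEF, 4)
```

This matches the reciprocal_scale() helper in tap_rss.c and assumes the
hash values are spread across the full 32-bit range.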

diff --git a/.gitignore b/.gitignore
index 3f444dcace2e..01a47a760660 100644
--- a/.gitignore
+++ b/.gitignore
@@ -36,9 +36,6 @@ TAGS
 # ignore python bytecode files
 *.pyc
 
-# ignore BPF programs
-drivers/net/tap/bpf/tap_bpf_program.o
-
 # DTS results
 dts/output
 
diff --git a/drivers/net/tap/bpf/Makefile b/drivers/net/tap/bpf/Makefile
deleted file mode 100644
index 9efeeb1bc704..000000000000
--- a/drivers/net/tap/bpf/Makefile
+++ /dev/null
@@ -1,19 +0,0 @@
-# SPDX-License-Identifier: BSD-3-Clause
-# This file is not built as part of normal DPDK build.
-# It is used to generate the eBPF code for TAP RSS.
-
-CLANG=clang
-CLANG_OPTS=-O2
-TARGET=../tap_bpf_insns.h
-
-all: $(TARGET)
-
-clean:
-	rm tap_bpf_program.o $(TARGET)
-
-tap_bpf_program.o: tap_bpf_program.c
-	$(CLANG) $(CLANG_OPTS) -emit-llvm -c $< -o - | \
-	llc -march=bpf -filetype=obj -o $@
-
-$(TARGET): tap_bpf_program.o
-	python3 bpf_extract.py -stap_bpf_program.c -o $@ $<
diff --git a/drivers/net/tap/bpf/README b/drivers/net/tap/bpf/README
new file mode 100644
index 000000000000..960a10da73b8
--- /dev/null
+++ b/drivers/net/tap/bpf/README
@@ -0,0 +1,12 @@
+This is the BPF program used to implement the RSS across queues
+flow action. It works like the skbedit tc filter, but instead of mapping
+to only one queue, it maps to multiple queues based on the RSS hash.
+
+This version is built using the BPF Compile Once - Run Everywhere (CO-RE)
+framework and uses libbpf and bpftool.
+
+Limitations
+- requires libbpf version XX or later
+- rebuilding the BPF requires clang and bpftool
+- only Toeplitz hash with standard 40 byte key is supported
+- the number of queues per RSS action is limited to 16
diff --git a/drivers/net/tap/bpf/bpf_api.h b/drivers/net/tap/bpf/bpf_api.h
deleted file mode 100644
index 2638a8a4ac9a..000000000000
--- a/drivers/net/tap/bpf/bpf_api.h
+++ /dev/null
@@ -1,276 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-
-#ifndef __BPF_API__
-#define __BPF_API__
-
-/* Note:
- *
- * This file can be included into eBPF kernel programs. It contains
- * a couple of useful helper functions, map/section ABI (bpf_elf.h),
- * misc macros and some eBPF specific LLVM built-ins.
- */
-
-#include <stdint.h>
-
-#include <linux/pkt_cls.h>
-#include <linux/bpf.h>
-#include <linux/filter.h>
-
-#include <asm/byteorder.h>
-
-#include "bpf_elf.h"
-
-/** libbpf pin type. */
-enum libbpf_pin_type {
-	LIBBPF_PIN_NONE,
-	/* PIN_BY_NAME: pin maps by name (in /sys/fs/bpf by default) */
-	LIBBPF_PIN_BY_NAME,
-};
-
-/** Type helper macros. */
-
-#define __uint(name, val) int (*name)[val]
-#define __type(name, val) typeof(val) *name
-#define __array(name, val) typeof(val) *name[]
-
-/** Misc macros. */
-
-#ifndef __stringify
-# define __stringify(X)		#X
-#endif
-
-#ifndef __maybe_unused
-# define __maybe_unused		__attribute__((__unused__))
-#endif
-
-#ifndef offsetof
-# define offsetof(TYPE, MEMBER)	__builtin_offsetof(TYPE, MEMBER)
-#endif
-
-#ifndef likely
-# define likely(X)		__builtin_expect(!!(X), 1)
-#endif
-
-#ifndef unlikely
-# define unlikely(X)		__builtin_expect(!!(X), 0)
-#endif
-
-#ifndef htons
-# define htons(X)		__constant_htons((X))
-#endif
-
-#ifndef ntohs
-# define ntohs(X)		__constant_ntohs((X))
-#endif
-
-#ifndef htonl
-# define htonl(X)		__constant_htonl((X))
-#endif
-
-#ifndef ntohl
-# define ntohl(X)		__constant_ntohl((X))
-#endif
-
-#ifndef __inline__
-# define __inline__		__attribute__((always_inline))
-#endif
-
-/** Section helper macros. */
-
-#ifndef __section
-# define __section(NAME)						\
-	__attribute__((section(NAME), used))
-#endif
-
-#ifndef __section_tail
-# define __section_tail(ID, KEY)					\
-	__section(__stringify(ID) "/" __stringify(KEY))
-#endif
-
-#ifndef __section_xdp_entry
-# define __section_xdp_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_cls_entry
-# define __section_cls_entry						\
-	__section(ELF_SECTION_CLASSIFIER)
-#endif
-
-#ifndef __section_act_entry
-# define __section_act_entry						\
-	__section(ELF_SECTION_ACTION)
-#endif
-
-#ifndef __section_lwt_entry
-# define __section_lwt_entry						\
-	__section(ELF_SECTION_PROG)
-#endif
-
-#ifndef __section_license
-# define __section_license						\
-	__section(ELF_SECTION_LICENSE)
-#endif
-
-#ifndef __section_maps
-# define __section_maps							\
-	__section(ELF_SECTION_MAPS)
-#endif
-
-/** Declaration helper macros. */
-
-#ifndef BPF_LICENSE
-# define BPF_LICENSE(NAME)						\
-	char ____license[] __section_license = NAME
-#endif
-
-/** Classifier helper */
-
-#ifndef BPF_H_DEFAULT
-# define BPF_H_DEFAULT	-1
-#endif
-
-/** BPF helper functions for tc. Individual flags are in linux/bpf.h */
-
-#ifndef __BPF_FUNC
-# define __BPF_FUNC(NAME, ...)						\
-	(* NAME)(__VA_ARGS__) __maybe_unused
-#endif
-
-#ifndef BPF_FUNC
-# define BPF_FUNC(NAME, ...)						\
-	__BPF_FUNC(NAME, __VA_ARGS__) = (void *) BPF_FUNC_##NAME
-#endif
-
-/* Map access/manipulation */
-static void *BPF_FUNC(map_lookup_elem, void *map, const void *key);
-static int BPF_FUNC(map_update_elem, void *map, const void *key,
-		    const void *value, uint32_t flags);
-static int BPF_FUNC(map_delete_elem, void *map, const void *key);
-
-/* Time access */
-static uint64_t BPF_FUNC(ktime_get_ns);
-
-/* Debugging */
-
-/* FIXME: __attribute__ ((format(printf, 1, 3))) not possible unless
- * llvm bug https://llvm.org/bugs/show_bug.cgi?id=26243 gets resolved.
- * It would require ____fmt to be made const, which generates a reloc
- * entry (non-map).
- */
-static void BPF_FUNC(trace_printk, const char *fmt, int fmt_size, ...);
-
-#ifndef printt
-# define printt(fmt, ...)						\
-	({								\
-		char ____fmt[] = fmt;					\
-		trace_printk(____fmt, sizeof(____fmt), ##__VA_ARGS__);	\
-	})
-#endif
-
-/* Random numbers */
-static uint32_t BPF_FUNC(get_prandom_u32);
-
-/* Tail calls */
-static void BPF_FUNC(tail_call, struct __sk_buff *skb, void *map,
-		     uint32_t index);
-
-/* System helpers */
-static uint32_t BPF_FUNC(get_smp_processor_id);
-static uint32_t BPF_FUNC(get_numa_node_id);
-
-/* Packet misc meta data */
-static uint32_t BPF_FUNC(get_cgroup_classid, struct __sk_buff *skb);
-static int BPF_FUNC(skb_under_cgroup, void *map, uint32_t index);
-
-static uint32_t BPF_FUNC(get_route_realm, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(get_hash_recalc, struct __sk_buff *skb);
-static uint32_t BPF_FUNC(set_hash_invalid, struct __sk_buff *skb);
-
-/* Packet redirection */
-static int BPF_FUNC(redirect, int ifindex, uint32_t flags);
-static int BPF_FUNC(clone_redirect, struct __sk_buff *skb, int ifindex,
-		    uint32_t flags);
-
-/* Packet manipulation */
-static int BPF_FUNC(skb_load_bytes, struct __sk_buff *skb, uint32_t off,
-		    void *to, uint32_t len);
-static int BPF_FUNC(skb_store_bytes, struct __sk_buff *skb, uint32_t off,
-		    const void *from, uint32_t len, uint32_t flags);
-
-static int BPF_FUNC(l3_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(l4_csum_replace, struct __sk_buff *skb, uint32_t off,
-		    uint32_t from, uint32_t to, uint32_t flags);
-static int BPF_FUNC(csum_diff, const void *from, uint32_t from_size,
-		    const void *to, uint32_t to_size, uint32_t seed);
-static int BPF_FUNC(csum_update, struct __sk_buff *skb, uint32_t wsum);
-
-static int BPF_FUNC(skb_change_type, struct __sk_buff *skb, uint32_t type);
-static int BPF_FUNC(skb_change_proto, struct __sk_buff *skb, uint32_t proto,
-		    uint32_t flags);
-static int BPF_FUNC(skb_change_tail, struct __sk_buff *skb, uint32_t nlen,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_pull_data, struct __sk_buff *skb, uint32_t len);
-
-/* Event notification */
-static int __BPF_FUNC(skb_event_output, struct __sk_buff *skb, void *map,
-		      uint64_t index, const void *data, uint32_t size) =
-		      (void *) BPF_FUNC_perf_event_output;
-
-/* Packet vlan encap/decap */
-static int BPF_FUNC(skb_vlan_push, struct __sk_buff *skb, uint16_t proto,
-		    uint16_t vlan_tci);
-static int BPF_FUNC(skb_vlan_pop, struct __sk_buff *skb);
-
-/* Packet tunnel encap/decap */
-static int BPF_FUNC(skb_get_tunnel_key, struct __sk_buff *skb,
-		    struct bpf_tunnel_key *to, uint32_t size, uint32_t flags);
-static int BPF_FUNC(skb_set_tunnel_key, struct __sk_buff *skb,
-		    const struct bpf_tunnel_key *from, uint32_t size,
-		    uint32_t flags);
-
-static int BPF_FUNC(skb_get_tunnel_opt, struct __sk_buff *skb,
-		    void *to, uint32_t size);
-static int BPF_FUNC(skb_set_tunnel_opt, struct __sk_buff *skb,
-		    const void *from, uint32_t size);
-
-/** LLVM built-ins, mem*() routines work for constant size */
-
-#ifndef lock_xadd
-# define lock_xadd(ptr, val)	((void) __sync_fetch_and_add(ptr, val))
-#endif
-
-#ifndef memset
-# define memset(s, c, n)	__builtin_memset((s), (c), (n))
-#endif
-
-#ifndef memcpy
-# define memcpy(d, s, n)	__builtin_memcpy((d), (s), (n))
-#endif
-
-#ifndef memmove
-# define memmove(d, s, n)	__builtin_memmove((d), (s), (n))
-#endif
-
-/* FIXME: __builtin_memcmp() is not yet fully usable unless llvm bug
- * https://llvm.org/bugs/show_bug.cgi?id=26218 gets resolved. Also
- * this one would generate a reloc entry (non-map), otherwise.
- */
-#if 0
-#ifndef memcmp
-# define memcmp(a, b, n)	__builtin_memcmp((a), (b), (n))
-#endif
-#endif
-
-unsigned long long load_byte(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.byte");
-
-unsigned long long load_half(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.half");
-
-unsigned long long load_word(void *skb, unsigned long long off)
-	asm ("llvm.bpf.load.word");
-
-#endif /* __BPF_API__ */
diff --git a/drivers/net/tap/bpf/bpf_elf.h b/drivers/net/tap/bpf/bpf_elf.h
deleted file mode 100644
index ea8a11c95c0f..000000000000
--- a/drivers/net/tap/bpf/bpf_elf.h
+++ /dev/null
@@ -1,53 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 or BSD-3-Clause */
-#ifndef __BPF_ELF__
-#define __BPF_ELF__
-
-#include <asm/types.h>
-
-/* Note:
- *
- * Below ELF section names and bpf_elf_map structure definition
- * are not (!) kernel ABI. It's rather a "contract" between the
- * application and the BPF loader in tc. For compatibility, the
- * section names should stay as-is. Introduction of aliases, if
- * needed, are a possibility, though.
- */
-
-/* ELF section names, etc */
-#define ELF_SECTION_LICENSE	"license"
-#define ELF_SECTION_MAPS	"maps"
-#define ELF_SECTION_PROG	"prog"
-#define ELF_SECTION_CLASSIFIER	"classifier"
-#define ELF_SECTION_ACTION	"action"
-
-#define ELF_MAX_MAPS		64
-#define ELF_MAX_LICENSE_LEN	128
-
-/* Object pinning settings */
-#define PIN_NONE		0
-#define PIN_OBJECT_NS		1
-#define PIN_GLOBAL_NS		2
-
-/* ELF map definition */
-struct bpf_elf_map {
-	__u32 type;
-	__u32 size_key;
-	__u32 size_value;
-	__u32 max_elem;
-	__u32 flags;
-	__u32 id;
-	__u32 pinning;
-	__u32 inner_id;
-	__u32 inner_idx;
-};
-
-#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val)		\
-	struct ____btf_map_##name {				\
-		type_key key;					\
-		type_val value;					\
-	};							\
-	struct ____btf_map_##name				\
-	    __attribute__ ((section(".maps." #name), used))	\
-	    ____btf_map_##name = { }
-
-#endif /* __BPF_ELF__ */
diff --git a/drivers/net/tap/bpf/bpf_extract.py b/drivers/net/tap/bpf/bpf_extract.py
deleted file mode 100644
index 73c4dafe4eca..000000000000
--- a/drivers/net/tap/bpf/bpf_extract.py
+++ /dev/null
@@ -1,85 +0,0 @@
-#!/usr/bin/env python3
-# SPDX-License-Identifier: BSD-3-Clause
-# Copyright (c) 2023 Stephen Hemminger <stephen@networkplumber.org>
-
-import argparse
-import sys
-import struct
-from tempfile import TemporaryFile
-from elftools.elf.elffile import ELFFile
-
-
-def load_sections(elffile):
-    """Get sections of interest from ELF"""
-    result = []
-    parts = [("cls_q", "cls_q_insns"), ("l3_l4", "l3_l4_hash_insns")]
-    for name, tag in parts:
-        section = elffile.get_section_by_name(name)
-        if section:
-            insns = struct.iter_unpack('<BBhL', section.data())
-            result.append([tag, insns])
-    return result
-
-
-def dump_section(name, insns, out):
-    """Dump the array of BPF instructions"""
-    print(f'\nstatic struct bpf_insn {name}[] = {{', file=out)
-    for bpf in insns:
-        code = bpf[0]
-        src = bpf[1] >> 4
-        dst = bpf[1] & 0xf
-        off = bpf[2]
-        imm = bpf[3]
-        print(f'\t{{{code:#04x}, {dst:4d}, {src:4d}, {off:8d}, {imm:#010x}}},',
-              file=out)
-    print('};', file=out)
-
-
-def parse_args():
-    """Parse command line arguments"""
-    parser = argparse.ArgumentParser()
-    parser.add_argument('-s',
-                        '--source',
-                        type=str,
-                        help="original source file")
-    parser.add_argument('-o', '--out', type=str, help="output C file path")
-    parser.add_argument("file",
-                        nargs='+',
-                        help="object file path or '-' for stdin")
-    return parser.parse_args()
-
-
-def open_input(path):
-    """Open the file or stdin"""
-    if path == "-":
-        temp = TemporaryFile()
-        temp.write(sys.stdin.buffer.read())
-        return temp
-    return open(path, 'rb')
-
-
-def write_header(out, source):
-    """Write file intro header"""
-    print("/* SPDX-License-Identifier: BSD-3-Clause", file=out)
-    if source:
-        print(f' * Auto-generated from {source}', file=out)
-    print(" * This not the original source file. Do NOT edit it.", file=out)
-    print(" */\n", file=out)
-
-
-def main():
-    '''program main function'''
-    args = parse_args()
-
-    with open(args.out, 'w',
-              encoding="utf-8") if args.out else sys.stdout as out:
-        write_header(out, args.source)
-        for path in args.file:
-            elffile = ELFFile(open_input(path))
-            sections = load_sections(elffile)
-            for name, insns in sections:
-                dump_section(name, insns, out)
-
-
-if __name__ == "__main__":
-    main()
diff --git a/drivers/net/tap/bpf/meson.build b/drivers/net/tap/bpf/meson.build
new file mode 100644
index 000000000000..f2c03a19fd4d
--- /dev/null
+++ b/drivers/net/tap/bpf/meson.build
@@ -0,0 +1,81 @@
+# SPDX-License-Identifier: BSD-3-Clause
+# Copyright 2024 Stephen Hemminger <stephen@networkplumber.org>
+
+enable_tap_rss = false
+
+libbpf = dependency('libbpf', required: false, method: 'pkg-config')
+if not libbpf.found()
+    message('net/tap: no RSS support missing libbpf')
+    subdir_done()
+endif
+
+# Debian installs this in /usr/sbin, which is not in $PATH
+bpftool = find_program('bpftool', '/usr/sbin/bpftool', required: false, version: '>= 5.6.0')
+if not bpftool.found()
+    message('net/tap: no RSS support missing bpftool')
+    subdir_done()
+endif
+
+clang_supports_bpf = false
+clang = find_program('clang', required: false)
+if clang.found()
+    clang_supports_bpf = run_command(clang, '-target', 'bpf', '--print-supported-cpus',
+                                     check: false).returncode() == 0
+endif
+
+if not clang_supports_bpf
+    message('net/tap: no RSS support missing clang BPF')
+    subdir_done()
+endif
+
+enable_tap_rss = true
+
+libbpf_include_dir = libbpf.get_variable(pkgconfig : 'includedir')
+
+# The include files <linux/bpf.h> and others include <asm/types.h>
+# but <asm/types.h> is not defined for multi-lib environment target.
+# Work around this by using the include directory from the host build environment.
+machine_name = run_command('uname', '-m').stdout().strip()
+march_include_dir = '/usr/include/' + machine_name + '-linux-gnu'
+
+clang_flags = [
+    '-O2',
+    '-Wall',
+    '-Wextra',
+    '-target',
+    'bpf',
+    '-g',
+    '-c',
+]
+
+bpf_o_cmd = [
+    clang,
+    clang_flags,
+    '-idirafter',
+    libbpf_include_dir,
+    '-idirafter',
+    march_include_dir,
+    '@INPUT@',
+    '-o',
+    '@OUTPUT@'
+]
+
+skel_h_cmd = [
+    bpftool,
+    'gen',
+    'skeleton',
+    '@INPUT@'
+]
+
+tap_rss_o = custom_target(
+    'tap_rss.bpf.o',
+    input: 'tap_rss.c',
+    output: 'tap_rss.o',
+    command: bpf_o_cmd)
+
+tap_rss_skel_h = custom_target(
+    'tap_rss.skel.h',
+    input: tap_rss_o,
+    output: 'tap_rss.skel.h',
+    command: skel_h_cmd,
+    capture: true)
diff --git a/drivers/net/tap/bpf/tap_bpf_program.c b/drivers/net/tap/bpf/tap_bpf_program.c
deleted file mode 100644
index f05aed021c30..000000000000
--- a/drivers/net/tap/bpf/tap_bpf_program.c
+++ /dev/null
@@ -1,255 +0,0 @@
-/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
- * Copyright 2017 Mellanox Technologies, Ltd
- */
-
-#include <stdint.h>
-#include <stdbool.h>
-#include <sys/types.h>
-#include <sys/socket.h>
-#include <asm/types.h>
-#include <linux/in.h>
-#include <linux/if.h>
-#include <linux/if_ether.h>
-#include <linux/ip.h>
-#include <linux/ipv6.h>
-#include <linux/if_tunnel.h>
-#include <linux/filter.h>
-
-#include "bpf_api.h"
-#include "bpf_elf.h"
-#include "../tap_rss.h"
-
-/** Create IPv4 address */
-#define IPv4(a, b, c, d) ((__u32)(((a) & 0xff) << 24) | \
-		(((b) & 0xff) << 16) | \
-		(((c) & 0xff) << 8)  | \
-		((d) & 0xff))
-
-#define PORT(a, b) ((__u16)(((a) & 0xff) << 8) | \
-		((b) & 0xff))
-
-/*
- * The queue number is offset by a unique QUEUE_OFFSET, to distinguish
- * packets that have gone through this rule (skb->cb[1] != 0) from others.
- */
-#define QUEUE_OFFSET		0x7cafe800
-#define PIN_GLOBAL_NS		2
-
-#define KEY_IDX			0
-#define BPF_MAP_ID_KEY	1
-
-struct vlan_hdr {
-	__be16 proto;
-	__be16 tci;
-};
-
-struct bpf_elf_map __attribute__((section("maps"), used))
-map_keys = {
-	.type           =       BPF_MAP_TYPE_HASH,
-	.id             =       BPF_MAP_ID_KEY,
-	.size_key       =       sizeof(__u32),
-	.size_value     =       sizeof(struct rss_key),
-	.max_elem       =       256,
-	.pinning        =       PIN_GLOBAL_NS,
-};
-
-__section("cls_q") int
-match_q(struct __sk_buff *skb)
-{
-	__u32 queue = skb->cb[1];
-	/* queue is set by tap_flow_bpf_cls_q() before load */
-	volatile __u32 q = 0xdeadbeef;
-	__u32 match_queue = QUEUE_OFFSET + q;
-
-	/* printt("match_q$i() queue = %d\n", queue); */
-
-	if (queue != match_queue)
-		return TC_ACT_OK;
-
-	/* queue match */
-	skb->cb[1] = 0;
-	return TC_ACT_UNSPEC;
-}
-
-
-struct ipv4_l3_l4_tuple {
-	__u32    src_addr;
-	__u32    dst_addr;
-	__u16    dport;
-	__u16    sport;
-} __attribute__((packed));
-
-struct ipv6_l3_l4_tuple {
-	__u8        src_addr[16];
-	__u8        dst_addr[16];
-	__u16       dport;
-	__u16       sport;
-} __attribute__((packed));
-
-static const __u8 def_rss_key[TAP_RSS_HASH_KEY_SIZE] = {
-	0xd1, 0x81, 0xc6, 0x2c,
-	0xf7, 0xf4, 0xdb, 0x5b,
-	0x19, 0x83, 0xa2, 0xfc,
-	0x94, 0x3e, 0x1a, 0xdb,
-	0xd9, 0x38, 0x9e, 0x6b,
-	0xd1, 0x03, 0x9c, 0x2c,
-	0xa7, 0x44, 0x99, 0xad,
-	0x59, 0x3d, 0x56, 0xd9,
-	0xf3, 0x25, 0x3c, 0x06,
-	0x2a, 0xdc, 0x1f, 0xfc,
-};
-
-static __u32  __attribute__((always_inline))
-rte_softrss_be(const __u32 *input_tuple, const uint8_t *rss_key,
-		__u8 input_len)
-{
-	__u32 i, j, hash = 0;
-#pragma unroll
-	for (j = 0; j < input_len; j++) {
-#pragma unroll
-		for (i = 0; i < 32; i++) {
-			if (input_tuple[j] & (1U << (31 - i))) {
-				hash ^= ((const __u32 *)def_rss_key)[j] << i |
-				(__u32)((uint64_t)
-				(((const __u32 *)def_rss_key)[j + 1])
-					>> (32 - i));
-			}
-		}
-	}
-	return hash;
-}
-
-static int __attribute__((always_inline))
-rss_l3_l4(struct __sk_buff *skb)
-{
-	void *data_end = (void *)(long)skb->data_end;
-	void *data = (void *)(long)skb->data;
-	__u16 proto = (__u16)skb->protocol;
-	__u32 key_idx = 0xdeadbeef;
-	__u32 hash;
-	struct rss_key *rsskey;
-	__u64 off = ETH_HLEN;
-	int j;
-	__u8 *key = 0;
-	__u32 len;
-	__u32 queue = 0;
-	bool mf = 0;
-	__u16 frag_off = 0;
-
-	rsskey = map_lookup_elem(&map_keys, &key_idx);
-	if (!rsskey) {
-		printt("hash(): rss key is not configured\n");
-		return TC_ACT_OK;
-	}
-	key = (__u8 *)rsskey->key;
-
-	/* Get correct proto for 802.1ad */
-	if (skb->vlan_present && skb->vlan_proto == htons(ETH_P_8021AD)) {
-		if (data + ETH_ALEN * 2 + sizeof(struct vlan_hdr) +
-		    sizeof(proto) > data_end)
-			return TC_ACT_OK;
-		proto = *(__u16 *)(data + ETH_ALEN * 2 +
-				   sizeof(struct vlan_hdr));
-		off += sizeof(struct vlan_hdr);
-	}
-
-	if (proto == htons(ETH_P_IP)) {
-		if (data + off + sizeof(struct iphdr) + sizeof(__u32)
-			> data_end)
-			return TC_ACT_OK;
-
-		__u8 *src_dst_addr = data + off + offsetof(struct iphdr, saddr);
-		__u8 *frag_off_addr = data + off + offsetof(struct iphdr, frag_off);
-		__u8 *prot_addr = data + off + offsetof(struct iphdr, protocol);
-		__u8 *src_dst_port = data + off + sizeof(struct iphdr);
-		struct ipv4_l3_l4_tuple v4_tuple = {
-			.src_addr = IPv4(*(src_dst_addr + 0),
-					*(src_dst_addr + 1),
-					*(src_dst_addr + 2),
-					*(src_dst_addr + 3)),
-			.dst_addr = IPv4(*(src_dst_addr + 4),
-					*(src_dst_addr + 5),
-					*(src_dst_addr + 6),
-					*(src_dst_addr + 7)),
-			.sport = 0,
-			.dport = 0,
-		};
-		/** Fetch the L4-payer port numbers only in-case of TCP/UDP
-		 ** and also if the packet is not fragmented. Since fragmented
-		 ** chunks do not have L4 TCP/UDP header.
-		 **/
-		if (*prot_addr == IPPROTO_UDP || *prot_addr == IPPROTO_TCP) {
-			frag_off = PORT(*(frag_off_addr + 0),
-					*(frag_off_addr + 1));
-			mf = frag_off & 0x2000;
-			frag_off = frag_off & 0x1fff;
-			if (mf == 0 && frag_off == 0) {
-				v4_tuple.sport = PORT(*(src_dst_port + 0),
-						*(src_dst_port + 1));
-				v4_tuple.dport = PORT(*(src_dst_port + 2),
-						*(src_dst_port + 3));
-			}
-		}
-		__u8 input_len = sizeof(v4_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV4_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v4_tuple, key, 3);
-	} else if (proto == htons(ETH_P_IPV6)) {
-		if (data + off + sizeof(struct ipv6hdr) +
-					sizeof(__u32) > data_end)
-			return TC_ACT_OK;
-		__u8 *src_dst_addr = data + off +
-					offsetof(struct ipv6hdr, saddr);
-		__u8 *src_dst_port = data + off +
-					sizeof(struct ipv6hdr);
-		__u8 *next_hdr = data + off +
-					offsetof(struct ipv6hdr, nexthdr);
-
-		struct ipv6_l3_l4_tuple v6_tuple;
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.src_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + j));
-		for (j = 0; j < 4; j++)
-			*((uint32_t *)&v6_tuple.dst_addr + j) =
-				__builtin_bswap32(*((uint32_t *)
-						src_dst_addr + 4 + j));
-
-		/** Fetch the L4 header port-numbers only if next-header
-		 * is TCP/UDP **/
-		if (*next_hdr == IPPROTO_UDP || *next_hdr == IPPROTO_TCP) {
-			v6_tuple.sport = PORT(*(src_dst_port + 0),
-				      *(src_dst_port + 1));
-			v6_tuple.dport = PORT(*(src_dst_port + 2),
-				      *(src_dst_port + 3));
-		} else {
-			v6_tuple.sport = 0;
-			v6_tuple.dport = 0;
-		}
-
-		__u8 input_len = sizeof(v6_tuple) / sizeof(__u32);
-		if (rsskey->hash_fields & (1 << HASH_FIELD_IPV6_L3))
-			input_len--;
-		hash = rte_softrss_be((__u32 *)&v6_tuple, key, 9);
-	} else {
-		return TC_ACT_PIPE;
-	}
-
-	queue = rsskey->queues[(hash % rsskey->nb_queues) &
-				       (TAP_MAX_QUEUES - 1)];
-	skb->cb[1] = QUEUE_OFFSET + queue;
-	/* printt(">>>>> rss_l3_l4 hash=0x%x queue=%u\n", hash, queue); */
-
-	return TC_ACT_RECLASSIFY;
-}
-
-#define RSS(L)						\
-	__section(#L) int				\
-		L ## _hash(struct __sk_buff *skb)	\
-	{						\
-		return rss_ ## L (skb);			\
-	}
-
-RSS(l3_l4)
-
-BPF_LICENSE("Dual BSD/GPL");
diff --git a/drivers/net/tap/bpf/tap_rss.c b/drivers/net/tap/bpf/tap_rss.c
new file mode 100644
index 000000000000..1abd18cb606e
--- /dev/null
+++ b/drivers/net/tap/bpf/tap_rss.c
@@ -0,0 +1,272 @@
+/* SPDX-License-Identifier: BSD-3-Clause OR GPL-2.0
+ * Copyright 2017 Mellanox Technologies, Ltd
+ */
+
+#include <linux/in.h>
+#include <linux/if_ether.h>
+#include <linux/ip.h>
+#include <linux/ipv6.h>
+#include <linux/pkt_cls.h>
+#include <linux/bpf.h>
+
+#include <bpf/bpf_helpers.h>
+#include <bpf/bpf_endian.h>
+
+#include "../tap_rss.h"
+
+/*
+ * This map provides configuration information about flows
+ * which need BPF RSS.
+ *
+ * The hash is indexed by the tc_index.
+ */
+struct {
+	__uint(type, BPF_MAP_TYPE_HASH);
+	__uint(key_size, sizeof(__u16));
+	__uint(value_size, sizeof(struct rss_key));
+	__uint(max_entries, TAP_MAX_QUEUES);
+} rss_map SEC(".maps");
+
+
+#define IP_MF		0x2000		/** IP header Flags **/
+#define IP_OFFSET	0x1FFF		/** IP header fragment offset **/
+
+/*
+ * Compute Toeplitz hash over the input tuple.
+ * This is same as rte_softrss_be in lib/hash
+ * but loop needs to be setup to match BPF restrictions.
+ */
+static __u32 __attribute__((always_inline))
+softrss_be(const __u32 *input_tuple, __u32 input_len, const __u32 *key)
+{
+	__u32 i, j, hash = 0;
+
+#pragma unroll
+	for (j = 0; j < input_len; j++) {
+#pragma unroll
+		for (i = 0; i < 32; i++) {
+			if (input_tuple[j] & (1U << (31 - i)))
+				hash ^= key[j] << i | key[j + 1] >> (32 - i);
+		}
+	}
+	return hash;
+}
+
+/* Compute RSS hash for IPv4 packet.
+ * Returns 0 if RSS is not specified.
+ */
+static __u32 __attribute__((always_inline))
+parse_ipv4(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct iphdr iph;
+	__u32 off = 0;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &iph, sizeof(iph), BPF_HDR_START_NET))
+		return 0;	/* no IP header present */
+
+	struct {
+		__u32    src_addr;
+		__u32    dst_addr;
+		__u16    dport;
+		__u16    sport;
+	} v4_tuple = {
+		.src_addr = bpf_ntohl(iph.saddr),
+		.dst_addr = bpf_ntohl(iph.daddr),
+	};
+
+	/* If only calculating L3 hash, do it now */
+	if (hash_type & (1 << HASH_FIELD_IPV4_L3))
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32) - 1, key);
+
+	/* No L4 hash if the packet is fragmented */
+	if ((iph.frag_off & bpf_htons(IP_MF | IP_OFFSET)) != 0)
+		return 0;
+
+	/* Do RSS on UDP or TCP ports */
+	if (iph.protocol == IPPROTO_UDP || iph.protocol == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		off += iph.ihl * 4;
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0; /* TCP or UDP header missing */
+
+		v4_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v4_tuple.dport = bpf_ntohs(src_dst_port[1]);
+		return softrss_be((__u32 *)&v4_tuple, sizeof(v4_tuple) / sizeof(__u32), key);
+	}
+
+	/* Other protocol */
+	return 0;
+}
+
+/* parse ipv6 extended headers, update offset and return next proto.
+ * returns next proto on success, -1 on malformed header
+ */
+static int __attribute__((always_inline))
+skip_ip6_ext(__u16 proto, const struct __sk_buff *skb, __u32 *off, int *frag)
+{
+	struct ext_hdr {
+		__u8 next_hdr;
+		__u8 len;
+	} xh;
+	unsigned int i;
+
+	*frag = 0;
+
+#define MAX_EXT_HDRS 5
+#pragma unroll
+	for (i = 0; i < MAX_EXT_HDRS; i++) {
+		switch (proto) {
+		case IPPROTO_HOPOPTS:
+		case IPPROTO_ROUTING:
+		case IPPROTO_DSTOPTS:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += (xh.len + 1) * 8;
+			proto = xh.next_hdr;
+			break;
+		case IPPROTO_FRAGMENT:
+			if (bpf_skb_load_bytes_relative(skb, *off, &xh, sizeof(xh),
+							BPF_HDR_START_NET))
+				return -1;
+
+			*off += 8;
+			proto = xh.next_hdr;
+			*frag = 1;
+			return proto; /* this is always the last ext hdr */
+		default:
+			return proto;
+		}
+	}
+
+	/* too many extension headers, give up */
+	return -1;
+}
+
+static __u32 __attribute__((always_inline))
+parse_ipv6(const struct __sk_buff *skb, __u32 hash_type, const __u32 *key)
+{
+	struct {
+		__u32       src_addr[4];
+		__u32       dst_addr[4];
+		__u16       dport;
+		__u16       sport;
+	} v6_tuple = { };
+	struct ipv6hdr ip6h;
+	__u32 off = 0, j;
+	int proto, frag;
+
+	if (bpf_skb_load_bytes_relative(skb, off, &ip6h, sizeof(ip6h), BPF_HDR_START_NET))
+		return 0;
+
+#pragma unroll
+	for (j = 0; j < 4; j++) {
+		v6_tuple.src_addr[j] = bpf_ntohl(ip6h.saddr.in6_u.u6_addr32[j]);
+		v6_tuple.dst_addr[j] = bpf_ntohl(ip6h.daddr.in6_u.u6_addr32[j]);
+	}
+
+	if (hash_type & (1 << HASH_FIELD_IPV6_L3))
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32) - 1, key);
+
+	off += sizeof(ip6h);
+	proto = skip_ip6_ext(ip6h.nexthdr, skb, &off, &frag);
+	if (proto < 0)
+		return 0;
+
+	if (frag)
+		return 0;
+
+	/* Do RSS on UDP or TCP ports */
+	if (proto == IPPROTO_UDP || proto == IPPROTO_TCP) {
+		__u16 src_dst_port[2];
+
+		if (bpf_skb_load_bytes_relative(skb, off, &src_dst_port, sizeof(src_dst_port),
+						BPF_HDR_START_NET))
+			return 0;
+
+		v6_tuple.sport = bpf_ntohs(src_dst_port[0]);
+		v6_tuple.dport = bpf_ntohs(src_dst_port[1]);
+
+		return softrss_be((__u32 *)&v6_tuple, sizeof(v6_tuple) / sizeof(__u32), key);
+	}
+
+	return 0;
+}
+
+/*
+ * Compute RSS hash for packets.
+ * Returns 0 if no hash is possible.
+ */
+static __u32 __attribute__((always_inline))
+calculate_rss_hash(const struct __sk_buff *skb, const struct rss_key *rsskey)
+{
+	const __u32 *key = (const __u32 *)rsskey->key;
+
+	if (skb->protocol == bpf_htons(ETH_P_IP))
+		return parse_ipv4(skb, rsskey->hash_fields, key);
+	else if (skb->protocol == bpf_htons(ETH_P_IPV6))
+		return parse_ipv6(skb, rsskey->hash_fields, key);
+	else
+		return 0;
+}
+
+/* scale value into the range [0, n); assumes val is large */
+static __u32  __attribute__((always_inline))
+reciprocal_scale(__u32 val, __u32 n)
+{
+	return (__u32)(((__u64)val * n) >> 32);
+}
+
+/* layout of qdisc skb cb (from sch_generic.h) */
+struct qdisc_skb_cb {
+	struct {
+		unsigned int	pkt_len;
+		__u16		dev_queue_mapping;
+		__u16		tc_classid;
+	};
+#define QDISC_CB_PRIV_LEN 20
+	unsigned char		data[QDISC_CB_PRIV_LEN];
+};
+
+/*
+ * When this BPF program is run by tc from the filter classifier,
+ * it is able to read skb metadata and packet data.
+ *
+ * For packets where RSS is not possible, just return TC_ACT_OK.
+ * When RSS is desired, change the skb->queue_mapping and set TC_ACT_PIPE
+ * to continue processing.
+ *
+ * This should be BPF_PROG_TYPE_SCHED_ACT so section needs to be "action"
+ */
+SEC("action") int
+rss_flow_action(struct __sk_buff *skb)
+{
+	const struct rss_key *rsskey;
+	__u16 classid;
+	__u32 hash;
+
+	/* TC layer puts the BPF_CLASSID into the skb cb area */
+	classid = ((const struct qdisc_skb_cb *)skb->cb)->tc_classid;
+
+	/* Lookup RSS configuration for that BPF class */
+	rsskey = bpf_map_lookup_elem(&rss_map, &classid);
+	if (rsskey == NULL) {
+		bpf_printk("hash(): rss not configured");
+		return TC_ACT_OK;
+	}
+
+	hash = calculate_rss_hash(skb, rsskey);
+	bpf_printk("hash %u\n", hash);
+	if (hash) {
+		/* Fold hash to the number of queues configured */
+		skb->queue_mapping = reciprocal_scale(hash, rsskey->nb_queues);
+		bpf_printk("queue %u\n", skb->queue_mapping);
+		return TC_ACT_PIPE;
+	}
+	return TC_ACT_OK;
+}
+
+char _license[] SEC("license") = "Dual BSD/GPL";
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* Re: [PATCH v2 1/7] ethdev: support report register names and filter
  2024-02-05 10:51  8%   ` [PATCH v2 1/7] ethdev: support report register names and filter Jie Hai
@ 2024-02-07 17:00  3%     ` Ferruh Yigit
  2024-02-20  8:43  3%       ` Jie Hai
  0 siblings, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-02-07 17:00 UTC (permalink / raw)
  To: Jie Hai, dev; +Cc: lihuisong, fengchengwen, liuyonglong, huangdengdui

On 2/5/2024 10:51 AM, Jie Hai wrote:
> This patch adds "filter" and "names" fields to "rte_dev_reg_info"
> structure. Names of registers in data fields can be reported and
> the registers can be filtered by their names.
> 
> For compatibility, the original API rte_eth_dev_get_reg_info()
> does not use the name and filter fields. The new API
> rte_eth_dev_get_reg_info_ext() is added to support reporting
> names and filtering by names. If the drivers does not report
> the names, set them to "offset_XXX".
> 
> Signed-off-by: Jie Hai <haijie1@huawei.com>
> ---
>  doc/guides/rel_notes/release_24_03.rst |  8 ++++++
>  lib/ethdev/rte_dev_info.h              | 11 ++++++++
>  lib/ethdev/rte_ethdev.c                | 36 ++++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h                | 22 ++++++++++++++++
>  4 files changed, 77 insertions(+)
> 
> diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
> index 84d3144215c6..5d402341223a 100644
> --- a/doc/guides/rel_notes/release_24_03.rst
> +++ b/doc/guides/rel_notes/release_24_03.rst
> @@ -75,6 +75,11 @@ New Features
>    * Added support for Atomic Rules' TK242 packet-capture family of devices
>      with PCI IDs: ``0x1024, 0x1025, 0x1026``.
>  
> +* **Added support for dumping regiters with names and filter.**
>

s/regiters/registers/

> +
> +  * Added new API functions ``rte_eth_dev_get_reg_info_ext()`` to and filter
> +  * the registers by their names and get the information of registers(names,
> +  * values and other attributes).
>  

'*' makes a bullet, but the above seems to be one sentence; if so, please
keep only the first '*'.

>  Removed Items
>  -------------
> @@ -124,6 +129,9 @@ ABI Changes
>  
>  * No ABI change that would break compatibility with 23.11.
>  
> +* ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
> +  structure for reporting names of regiters and filtering them by names.
> +
>  

This will break the ABI.

Consider the case of an application compiled against an old version of DPDK
that later runs against this version without being recompiled: the
application will pass the old version of 'struct rte_dev_reg_info', but the
new version of DPDK will try to access or update the new fields of 'struct
rte_dev_reg_info'.

One option is:
- to add a new 'struct rte_dev_reg_info_ext',
- 'rte_eth_dev_get_reg_info()' still uses old 'struct rte_dev_reg_info'
- 'get_reg()' dev_ops will use this new 'struct rte_dev_reg_info_ext'
- Add deprecation notice to update 'rte_eth_dev_get_reg_info()' to use
new struct in next LTS release


>  Known Issues
>  ------------
> diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
> index 67cf0ae52668..2f4541bd46c8 100644
> --- a/lib/ethdev/rte_dev_info.h
> +++ b/lib/ethdev/rte_dev_info.h
> @@ -11,6 +11,11 @@ extern "C" {
>  
>  #include <stdint.h>
>  
> +#define RTE_ETH_REG_NAME_SIZE 128
> +struct rte_eth_reg_name {
> +	char name[RTE_ETH_REG_NAME_SIZE];
> +};
> +
>  /*
>   * Placeholder for accessing device registers
>   */
> @@ -20,6 +25,12 @@ struct rte_dev_reg_info {
>  	uint32_t length; /**< Number of registers to fetch */
>  	uint32_t width; /**< Size of device register */
>  	uint32_t version; /**< Device version */
> +	/**
> +	 * Filter for target subset of registers.
> +	 * This field could affects register selection for data/length/names.
> +	 */
> +	char *filter;
> +	struct rte_eth_reg_name *names; /**< Registers name saver */
>  };
>  
>  /*
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
> index f1c658f49e80..3e0294e49092 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -6388,8 +6388,39 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
>  
>  int
>  rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
> +{
> +	struct rte_dev_reg_info reg_info;
> +	int ret;
> +
> +	if (info == NULL) {
> +		RTE_ETHDEV_LOG_LINE(ERR,
> +			"Cannot get ethdev port %u register info to NULL",
> +			port_id);
> +		return -EINVAL;
> +	}
> +
> +	reg_info.length = info->length;
> +	reg_info.data = info->data;
> +	reg_info.names = NULL;
> +	reg_info.filter = NULL;
> +
> +	ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
> +	if (ret != 0)
> +		return ret;
> +
> +	info->length = reg_info.length;
> +	info->width = reg_info.width;
> +	info->version = reg_info.version;
> +	info->offset = reg_info.offset;
> +
> +	return 0;
> +}
> +
> +int
> +rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
>  {
>  	struct rte_eth_dev *dev;
> +	uint32_t i;
>  	int ret;
>  
>  	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> @@ -6408,6 +6439,11 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
>  
>  	rte_ethdev_trace_get_reg_info(port_id, info, ret);
>  
> +	/* Report the default names if drivers not report. */
> +	if (info->names != NULL && strlen(info->names[0].name) == 0)
> +		for (i = 0; i < info->length; i++)
> +			sprintf(info->names[i].name, "offset_%x",
> +				info->offset + i * info->width);
>

Better to use 'snprintf()'

>  	return ret;
>  }
>  
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
> index 2687c23fa6fb..3abc2ad3f865 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -5053,6 +5053,28 @@ __rte_experimental
>  int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
>  		struct rte_power_monitor_cond *pmc);
>  
> +/**
> + * Retrieve the filtered device registers (values and names) and
> + * register attributes (number of registers and register size)
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param info
> + *   Pointer to rte_dev_reg_info structure to fill in. If info->data is
> + *   NULL, the function fills in the width and length fields. If non-NULL,
> + *   the values of registers whose name contains the filter string are put
> + *   into the buffer pointed at by the data field. Do the same for the names
> + *   of registers if info->names is not NULL.
>

It may be good to mention that if info->names is not NULL but the driver
doesn't report names, ethdev fills in the names automatically.

As both 'rte_eth_dev_get_reg_info()' & 'rte_eth_dev_get_reg_info_ext()'
use the same dev_ops ('get_reg()'), it is possible that a driver doesn't
support filtering; if the user provides info->filter but the driver
doesn't support it, I think the API should return an error. What do you think?

And can you please make it clear above: if filtering is requested with
info->data = NULL, are the returned width and length fields for the filtered
set of registers or for all registers?


> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support.
> + *   - (-EINVAL) if bad parameter.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EIO) if device is removed.
> + *   - others depends on the specific operations implementation.
> + */
> +int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
> +
>

Can you please mark the new API as experimental? That is the requirement
for new APIs.

Also need to add the API to version.map
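For reference, new experimental symbols typically go into the EXPERIMENTAL
section of lib/ethdev/version.map; a sketch (the release comment is an
assumption, and the declaration in rte_ethdev.h would also carry the
__rte_experimental attribute):

```
EXPERIMENTAL {
	global:

	# added in an upcoming release
	rte_eth_dev_get_reg_info_ext;
};
```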


>  /**
>   * Retrieve device registers and register attributes (number of registers and
>   * register size)


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v7 0/4] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE
  2024-02-06  2:06  3% ` [PATCH v7 0/4] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Suanming Mou
  2024-02-06  2:06  3%   ` [PATCH v7 1/4] ethdev: rename action modify field data structure Suanming Mou
@ 2024-02-06 21:24  0%   ` Ferruh Yigit
  1 sibling, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-02-06 21:24 UTC (permalink / raw)
  To: Suanming Mou, thomas; +Cc: dev, orika

On 2/6/2024 2:06 AM, Suanming Mou wrote:
> The new item type is added for the case where the user wants to match
> traffic based on the result of comparing one packet field with another
> field or with an immediate value.
> 
> e.g. taking advantage of the compare item, the user will be able to
> accumulate an IPv4/TCP packet's TCP data_offset and IPv4 IHL fields
> into a tag register, then compare the tag register with the IPv4 header
> total length to determine whether the packet has payload or not.
> 
> The supported operations can be as below:
>  - RTE_FLOW_ITEM_COMPARE_EQ (equal)
>  - RTE_FLOW_ITEM_COMPARE_NE (not equal)
>  - RTE_FLOW_ITEM_COMPARE_LT (less than)
>  - RTE_FLOW_ITEM_COMPARE_LE (less than or equal)
>  - RTE_FLOW_ITEM_COMPARE_GT (greater than)
>  - RTE_FLOW_ITEM_COMPARE_GE (greater than or equal)
> 
> V7:
>  - Moved release notes to API.
>  - Optimize comment descriptions.
> 
> V6:
>  - fix typo and style issue.
>  - adjust flow_field description.
> 
> V5:
>  - rebase on top of next-net
>  - add sample detail for rte_flow_field.
> 
> V4:
>  - rebase on top of the latest version.
>  - move ACTION_MODIFY_PATTERN_SIZE and modify_field_ids rename
>    to first patch.
>  - add comparison flow create sample in testpmd_funcs.rst.
> 
> V3:
>  - fix code style missing empty line in rte_flow.rst.
>  - fix missing the ABI change release notes.
> 
> V2:
>  - Since modify field data struct is experiment, rename modify
>    field data directly instead of adding new flow field struct.
> 
> 
> Suanming Mou (4):
>   ethdev: rename action modify field data structure
>   ethdev: move flow field data structures
>   ethdev: add compare item
>   net/mlx5: add compare item support
>

Series applied to dpdk-next-net/main, thanks.


^ permalink raw reply	[relevance 0%]

* [PATCH v3] ethdev: fast path async flow API
  @ 2024-02-06 17:36  1% ` Dariusz Sosnowski
  0 siblings, 0 replies; 200+ results
From: Dariusz Sosnowski @ 2024-02-06 17:36 UTC (permalink / raw)
  To: Matan Azrad, Viacheslav Ovsiienko, Ori Kam, Suanming Mou,
	Thomas Monjalon, Ferruh Yigit, Andrew Rybchenko
  Cc: dev

This patch reworks the async flow API functions called in the data path
to reduce the overhead of flow operations at the library level.
The main source of the overhead was the indirection and checks done
while the ethdev library was fetching rte_flow_ops from a given driver.

This patch introduces the rte_flow_fp_ops struct, which holds callbacks
to the driver's implementation of fast path async flow API functions.
Each driver implementing these functions must populate the flow_fp_ops
field inside the rte_eth_dev structure with a reference to
its own implementation.
By default, the ethdev library provides dummy callbacks whose
implementations return ENOSYS.
This design rests on a few assumptions:

- rte_flow_fp_ops struct for given port is always available.
- Each callback is either:
    - Default provided by library.
    - Set up by driver.

As a result, no checks for availability of the implementation
are needed at the library level in the data path.
Any library-level validation checks in the async flow API are compiled
if and only if the RTE_FLOW_DEBUG macro is defined.

This design was based on changes in ethdev library introduced in [1].

These changes apply only to the following API functions:

- rte_flow_async_create()
- rte_flow_async_create_by_index()
- rte_flow_async_actions_update()
- rte_flow_async_destroy()
- rte_flow_push()
- rte_flow_pull()
- rte_flow_async_action_handle_create()
- rte_flow_async_action_handle_destroy()
- rte_flow_async_action_handle_update()
- rte_flow_async_action_handle_query()
- rte_flow_async_action_handle_query_update()
- rte_flow_async_action_list_handle_create()
- rte_flow_async_action_list_handle_destroy()
- rte_flow_async_action_list_handle_query_update()

This patch also adjusts the mlx5 PMD to the introduced flow API changes.

[1]
commit c87d435a4d79 ("ethdev: copy fast-path API into separate structure")

Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
---
v3:
- Documented RTE_FLOW_DEBUG build option.
- Enabled RTE_FLOW_DEBUG automatically on debug builds.
- Fixed pointer checks to compare against NULL explicitly.

v2:
- Fixed mlx5 PMD build issue with older versions of rdma-core.
---
 doc/guides/nics/build_and_test.rst     |   9 +-
 doc/guides/rel_notes/release_24_03.rst |  37 ++
 drivers/net/mlx5/mlx5_flow.c           | 608 +------------------------
 drivers/net/mlx5/mlx5_flow_hw.c        |  25 +
 lib/ethdev/ethdev_driver.c             |   4 +
 lib/ethdev/ethdev_driver.h             |   4 +
 lib/ethdev/meson.build                 |   4 +
 lib/ethdev/rte_flow.c                  | 519 ++++++++++++++++-----
 lib/ethdev/rte_flow_driver.h           | 277 ++++++-----
 lib/ethdev/version.map                 |   2 +
 10 files changed, 647 insertions(+), 842 deletions(-)

diff --git a/doc/guides/nics/build_and_test.rst b/doc/guides/nics/build_and_test.rst
index e8b29c2277..453fa74b39 100644
--- a/doc/guides/nics/build_and_test.rst
+++ b/doc/guides/nics/build_and_test.rst
@@ -36,11 +36,16 @@ The ethdev layer supports below build options for debug purpose:

   Build with debug code on Tx path.

+- ``RTE_FLOW_DEBUG`` (default **disabled**; enabled automatically on debug builds)
+
+  Build with debug code in asynchronous flow APIs.
+
 .. Note::

-   The ethdev library use above options to wrap debug code to trace invalid parameters
+   The ethdev library uses the above options to wrap debug code to trace invalid parameters
    on data path APIs, so performance downgrade is expected when enabling those options.
-   Each PMD can decide to reuse them to wrap their own debug code in the Rx/Tx path.
+   Each PMD can decide to reuse them to wrap its own debug code in the Rx/Tx path
+   and in the asynchronous flow API implementation.

 Running testpmd in Linux
 ------------------------
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 6f8ad27808..b62330b8b1 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -86,6 +86,43 @@ API Changes

 * gso: ``rte_gso_segment`` now returns -ENOTSUP for unknown protocols.

+* ethdev: PMDs implementing asynchronous flow operations are required to provide the implementation
+  of the relevant functions through the ``rte_flow_fp_ops`` struct, instead of the ``rte_flow_ops`` struct.
+  A pointer to the device-dependent ``rte_flow_fp_ops`` should be provided in ``rte_eth_dev.flow_fp_ops``.
+  This change applies to the following API functions:
+
+   * ``rte_flow_async_create``
+   * ``rte_flow_async_create_by_index``
+   * ``rte_flow_async_actions_update``
+   * ``rte_flow_async_destroy``
+   * ``rte_flow_push``
+   * ``rte_flow_pull``
+   * ``rte_flow_async_action_handle_create``
+   * ``rte_flow_async_action_handle_destroy``
+   * ``rte_flow_async_action_handle_update``
+   * ``rte_flow_async_action_handle_query``
+   * ``rte_flow_async_action_handle_query_update``
+   * ``rte_flow_async_action_list_handle_create``
+   * ``rte_flow_async_action_list_handle_destroy``
+   * ``rte_flow_async_action_list_handle_query_update``
+
+* ethdev: Removed the following fields from ``rte_flow_ops`` struct:
+
+   * ``async_create``
+   * ``async_create_by_index``
+   * ``async_actions_update``
+   * ``async_destroy``
+   * ``push``
+   * ``pull``
+   * ``async_action_handle_create``
+   * ``async_action_handle_destroy``
+   * ``async_action_handle_update``
+   * ``async_action_handle_query``
+   * ``async_action_handle_query_update``
+   * ``async_action_list_handle_create``
+   * ``async_action_list_handle_destroy``
+   * ``async_action_list_handle_query_update``
+

 ABI Changes
 -----------
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 85e8c77c81..0ff3b91596 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -1055,98 +1055,13 @@ mlx5_flow_group_set_miss_actions(struct rte_eth_dev *dev,
 				 const struct rte_flow_group_attr *attr,
 				 const struct rte_flow_action actions[],
 				 struct rte_flow_error *error);
-static struct rte_flow *
-mlx5_flow_async_flow_create(struct rte_eth_dev *dev,
-			    uint32_t queue,
-			    const struct rte_flow_op_attr *attr,
-			    struct rte_flow_template_table *table,
-			    const struct rte_flow_item items[],
-			    uint8_t pattern_template_index,
-			    const struct rte_flow_action actions[],
-			    uint8_t action_template_index,
-			    void *user_data,
-			    struct rte_flow_error *error);
-static struct rte_flow *
-mlx5_flow_async_flow_create_by_index(struct rte_eth_dev *dev,
-			    uint32_t queue,
-			    const struct rte_flow_op_attr *attr,
-			    struct rte_flow_template_table *table,
-			    uint32_t rule_index,
-			    const struct rte_flow_action actions[],
-			    uint8_t action_template_index,
-			    void *user_data,
-			    struct rte_flow_error *error);
-static int
-mlx5_flow_async_flow_update(struct rte_eth_dev *dev,
-			     uint32_t queue,
-			     const struct rte_flow_op_attr *attr,
-			     struct rte_flow *flow,
-			     const struct rte_flow_action actions[],
-			     uint8_t action_template_index,
-			     void *user_data,
-			     struct rte_flow_error *error);
-static int
-mlx5_flow_async_flow_destroy(struct rte_eth_dev *dev,
-			     uint32_t queue,
-			     const struct rte_flow_op_attr *attr,
-			     struct rte_flow *flow,
-			     void *user_data,
-			     struct rte_flow_error *error);
-static int
-mlx5_flow_pull(struct rte_eth_dev *dev,
-	       uint32_t queue,
-	       struct rte_flow_op_result res[],
-	       uint16_t n_res,
-	       struct rte_flow_error *error);
-static int
-mlx5_flow_push(struct rte_eth_dev *dev,
-	       uint32_t queue,
-	       struct rte_flow_error *error);
-
-static struct rte_flow_action_handle *
-mlx5_flow_async_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
-				 const struct rte_flow_op_attr *attr,
-				 const struct rte_flow_indir_action_conf *conf,
-				 const struct rte_flow_action *action,
-				 void *user_data,
-				 struct rte_flow_error *error);
-
-static int
-mlx5_flow_async_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
-				 const struct rte_flow_op_attr *attr,
-				 struct rte_flow_action_handle *handle,
-				 const void *update,
-				 void *user_data,
-				 struct rte_flow_error *error);

 static int
-mlx5_flow_async_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
-				  const struct rte_flow_op_attr *attr,
-				  struct rte_flow_action_handle *handle,
-				  void *user_data,
-				  struct rte_flow_error *error);
-
-static int
-mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
-				 const struct rte_flow_op_attr *attr,
-				 const struct rte_flow_action_handle *handle,
-				 void *data,
-				 void *user_data,
-				 struct rte_flow_error *error);
-static int
 mlx5_action_handle_query_update(struct rte_eth_dev *dev,
 				struct rte_flow_action_handle *handle,
 				const void *update, void *query,
 				enum rte_flow_query_update_mode qu_mode,
 				struct rte_flow_error *error);
-static int
-mlx5_flow_async_action_handle_query_update
-	(struct rte_eth_dev *dev, uint32_t queue_id,
-	 const struct rte_flow_op_attr *op_attr,
-	 struct rte_flow_action_handle *action_handle,
-	 const void *update, void *query,
-	 enum rte_flow_query_update_mode qu_mode,
-	 void *user_data, struct rte_flow_error *error);

 static struct rte_flow_action_list_handle *
 mlx5_action_list_handle_create(struct rte_eth_dev *dev,
@@ -1159,20 +1074,6 @@ mlx5_action_list_handle_destroy(struct rte_eth_dev *dev,
 				struct rte_flow_action_list_handle *handle,
 				struct rte_flow_error *error);

-static struct rte_flow_action_list_handle *
-mlx5_flow_async_action_list_handle_create(struct rte_eth_dev *dev, uint32_t queue_id,
-					  const struct rte_flow_op_attr *attr,
-					  const struct
-					  rte_flow_indir_action_conf *conf,
-					  const struct rte_flow_action *actions,
-					  void *user_data,
-					  struct rte_flow_error *error);
-static int
-mlx5_flow_async_action_list_handle_destroy
-			(struct rte_eth_dev *dev, uint32_t queue_id,
-			 const struct rte_flow_op_attr *op_attr,
-			 struct rte_flow_action_list_handle *action_handle,
-			 void *user_data, struct rte_flow_error *error);
 static int
 mlx5_flow_action_list_handle_query_update(struct rte_eth_dev *dev,
 					  const
@@ -1180,17 +1081,7 @@ mlx5_flow_action_list_handle_query_update(struct rte_eth_dev *dev,
 					  const void **update, void **query,
 					  enum rte_flow_query_update_mode mode,
 					  struct rte_flow_error *error);
-static int
-mlx5_flow_async_action_list_handle_query_update(struct rte_eth_dev *dev,
-						uint32_t queue_id,
-						const struct rte_flow_op_attr *attr,
-						const struct
-						rte_flow_action_list_handle *handle,
-						const void **update,
-						void **query,
-						enum rte_flow_query_update_mode mode,
-						void *user_data,
-						struct rte_flow_error *error);
+
 static int
 mlx5_flow_calc_table_hash(struct rte_eth_dev *dev,
 			  const struct rte_flow_template_table *table,
@@ -1232,26 +1123,8 @@ static const struct rte_flow_ops mlx5_flow_ops = {
 	.template_table_create = mlx5_flow_table_create,
 	.template_table_destroy = mlx5_flow_table_destroy,
 	.group_set_miss_actions = mlx5_flow_group_set_miss_actions,
-	.async_create = mlx5_flow_async_flow_create,
-	.async_create_by_index = mlx5_flow_async_flow_create_by_index,
-	.async_destroy = mlx5_flow_async_flow_destroy,
-	.pull = mlx5_flow_pull,
-	.push = mlx5_flow_push,
-	.async_action_handle_create = mlx5_flow_async_action_handle_create,
-	.async_action_handle_update = mlx5_flow_async_action_handle_update,
-	.async_action_handle_query_update =
-		mlx5_flow_async_action_handle_query_update,
-	.async_action_handle_query = mlx5_flow_async_action_handle_query,
-	.async_action_handle_destroy = mlx5_flow_async_action_handle_destroy,
-	.async_actions_update = mlx5_flow_async_flow_update,
-	.async_action_list_handle_create =
-		mlx5_flow_async_action_list_handle_create,
-	.async_action_list_handle_destroy =
-		mlx5_flow_async_action_list_handle_destroy,
 	.action_list_handle_query_update =
 		mlx5_flow_action_list_handle_query_update,
-	.async_action_list_handle_query_update =
-		mlx5_flow_async_action_list_handle_query_update,
 	.flow_calc_table_hash = mlx5_flow_calc_table_hash,
 };

@@ -9427,424 +9300,6 @@ mlx5_flow_group_set_miss_actions(struct rte_eth_dev *dev,
 	return fops->group_set_miss_actions(dev, group_id, attr, actions, error);
 }

-/**
- * Enqueue flow creation.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev structure.
- * @param[in] queue_id
- *   The queue to create the flow.
- * @param[in] attr
- *   Pointer to the flow operation attributes.
- * @param[in] items
- *   Items with flow spec value.
- * @param[in] pattern_template_index
- *   The item pattern flow follows from the table.
- * @param[in] actions
- *   Action with flow spec value.
- * @param[in] action_template_index
- *   The action pattern flow follows from the table.
- * @param[in] user_data
- *   Pointer to the user_data.
- * @param[out] error
- *   Pointer to error structure.
- *
- * @return
- *    Flow pointer on success, NULL otherwise and rte_errno is set.
- */
-static struct rte_flow *
-mlx5_flow_async_flow_create(struct rte_eth_dev *dev,
-			    uint32_t queue_id,
-			    const struct rte_flow_op_attr *attr,
-			    struct rte_flow_template_table *table,
-			    const struct rte_flow_item items[],
-			    uint8_t pattern_template_index,
-			    const struct rte_flow_action actions[],
-			    uint8_t action_template_index,
-			    void *user_data,
-			    struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow_attr fattr = {0};
-
-	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
-		rte_flow_error_set(error, ENOTSUP,
-				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				NULL,
-				"flow_q create with incorrect steering mode");
-		return NULL;
-	}
-	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-	return fops->async_flow_create(dev, queue_id, attr, table,
-				       items, pattern_template_index,
-				       actions, action_template_index,
-				       user_data, error);
-}
-
-/**
- * Enqueue flow creation by index.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev structure.
- * @param[in] queue_id
- *   The queue to create the flow.
- * @param[in] attr
- *   Pointer to the flow operation attributes.
- * @param[in] rule_index
- *   The item pattern flow follows from the table.
- * @param[in] actions
- *   Action with flow spec value.
- * @param[in] action_template_index
- *   The action pattern flow follows from the table.
- * @param[in] user_data
- *   Pointer to the user_data.
- * @param[out] error
- *   Pointer to error structure.
- *
- * @return
- *    Flow pointer on success, NULL otherwise and rte_errno is set.
- */
-static struct rte_flow *
-mlx5_flow_async_flow_create_by_index(struct rte_eth_dev *dev,
-			    uint32_t queue_id,
-			    const struct rte_flow_op_attr *attr,
-			    struct rte_flow_template_table *table,
-			    uint32_t rule_index,
-			    const struct rte_flow_action actions[],
-			    uint8_t action_template_index,
-			    void *user_data,
-			    struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow_attr fattr = {0};
-
-	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW) {
-		rte_flow_error_set(error, ENOTSUP,
-				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				NULL,
-				"flow_q create with incorrect steering mode");
-		return NULL;
-	}
-	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-	return fops->async_flow_create_by_index(dev, queue_id, attr, table,
-				       rule_index, actions, action_template_index,
-				       user_data, error);
-}
-
-/**
- * Enqueue flow update.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev structure.
- * @param[in] queue
- *   The queue to destroy the flow.
- * @param[in] attr
- *   Pointer to the flow operation attributes.
- * @param[in] flow
- *   Pointer to the flow to be destroyed.
- * @param[in] actions
- *   Action with flow spec value.
- * @param[in] action_template_index
- *   The action pattern flow follows from the table.
- * @param[in] user_data
- *   Pointer to the user_data.
- * @param[out] error
- *   Pointer to error structure.
- *
- * @return
- *    0 on success, negative value otherwise and rte_errno is set.
- */
-static int
-mlx5_flow_async_flow_update(struct rte_eth_dev *dev,
-			     uint32_t queue,
-			     const struct rte_flow_op_attr *attr,
-			     struct rte_flow *flow,
-			     const struct rte_flow_action actions[],
-			     uint8_t action_template_index,
-			     void *user_data,
-			     struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow_attr fattr = {0};
-
-	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW)
-		return rte_flow_error_set(error, ENOTSUP,
-				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				NULL,
-				"flow_q update with incorrect steering mode");
-	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-	return fops->async_flow_update(dev, queue, attr, flow,
-					actions, action_template_index, user_data, error);
-}
-
-/**
- * Enqueue flow destruction.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev structure.
- * @param[in] queue
- *   The queue to destroy the flow.
- * @param[in] attr
- *   Pointer to the flow operation attributes.
- * @param[in] flow
- *   Pointer to the flow to be destroyed.
- * @param[in] user_data
- *   Pointer to the user_data.
- * @param[out] error
- *   Pointer to error structure.
- *
- * @return
- *    0 on success, negative value otherwise and rte_errno is set.
- */
-static int
-mlx5_flow_async_flow_destroy(struct rte_eth_dev *dev,
-			     uint32_t queue,
-			     const struct rte_flow_op_attr *attr,
-			     struct rte_flow *flow,
-			     void *user_data,
-			     struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow_attr fattr = {0};
-
-	if (flow_get_drv_type(dev, &fattr) != MLX5_FLOW_TYPE_HW)
-		return rte_flow_error_set(error, ENOTSUP,
-				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				NULL,
-				"flow_q destroy with incorrect steering mode");
-	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-	return fops->async_flow_destroy(dev, queue, attr, flow,
-					user_data, error);
-}
-
-/**
- * Pull the enqueued flows.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev structure.
- * @param[in] queue
- *   The queue to pull the result.
- * @param[in/out] res
- *   Array to save the results.
- * @param[in] n_res
- *   Available result with the array.
- * @param[out] error
- *   Pointer to error structure.
- *
- * @return
- *    Result number on success, negative value otherwise and rte_errno is set.
- */
-static int
-mlx5_flow_pull(struct rte_eth_dev *dev,
-	       uint32_t queue,
-	       struct rte_flow_op_result res[],
-	       uint16_t n_res,
-	       struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow_attr attr = {0};
-
-	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
-		return rte_flow_error_set(error, ENOTSUP,
-				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				NULL,
-				"flow_q pull with incorrect steering mode");
-	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-	return fops->pull(dev, queue, res, n_res, error);
-}
-
-/**
- * Push the enqueued flows.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev structure.
- * @param[in] queue
- *   The queue to push the flows.
- * @param[out] error
- *   Pointer to error structure.
- *
- * @return
- *    0 on success, negative value otherwise and rte_errno is set.
- */
-static int
-mlx5_flow_push(struct rte_eth_dev *dev,
-	       uint32_t queue,
-	       struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops;
-	struct rte_flow_attr attr = {0};
-
-	if (flow_get_drv_type(dev, &attr) != MLX5_FLOW_TYPE_HW)
-		return rte_flow_error_set(error, ENOTSUP,
-				RTE_FLOW_ERROR_TYPE_UNSPECIFIED,
-				NULL,
-				"flow_q push with incorrect steering mode");
-	fops = flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-	return fops->push(dev, queue, error);
-}
-
-/**
- * Create shared action.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev structure.
- * @param[in] queue
- *   Which queue to be used..
- * @param[in] attr
- *   Operation attribute.
- * @param[in] conf
- *   Indirect action configuration.
- * @param[in] action
- *   rte_flow action detail.
- * @param[in] user_data
- *   Pointer to the user_data.
- * @param[out] error
- *   Pointer to error structure.
- *
- * @return
- *   Action handle on success, NULL otherwise and rte_errno is set.
- */
-static struct rte_flow_action_handle *
-mlx5_flow_async_action_handle_create(struct rte_eth_dev *dev, uint32_t queue,
-				 const struct rte_flow_op_attr *attr,
-				 const struct rte_flow_indir_action_conf *conf,
-				 const struct rte_flow_action *action,
-				 void *user_data,
-				 struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops =
-			flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-
-	return fops->async_action_create(dev, queue, attr, conf, action,
-					 user_data, error);
-}
-
-/**
- * Update shared action.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev structure.
- * @param[in] queue
- *   Which queue to be used..
- * @param[in] attr
- *   Operation attribute.
- * @param[in] handle
- *   Action handle to be updated.
- * @param[in] update
- *   Update value.
- * @param[in] user_data
- *   Pointer to the user_data.
- * @param[out] error
- *   Pointer to error structure.
- *
- * @return
- *   0 on success, negative value otherwise and rte_errno is set.
- */
-static int
-mlx5_flow_async_action_handle_update(struct rte_eth_dev *dev, uint32_t queue,
-				     const struct rte_flow_op_attr *attr,
-				     struct rte_flow_action_handle *handle,
-				     const void *update,
-				     void *user_data,
-				     struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops =
-			flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-
-	return fops->async_action_update(dev, queue, attr, handle,
-					 update, user_data, error);
-}
-
-static int
-mlx5_flow_async_action_handle_query_update
-	(struct rte_eth_dev *dev, uint32_t queue_id,
-	 const struct rte_flow_op_attr *op_attr,
-	 struct rte_flow_action_handle *action_handle,
-	 const void *update, void *query,
-	 enum rte_flow_query_update_mode qu_mode,
-	 void *user_data, struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops =
-		flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-
-	if (!fops || !fops->async_action_query_update)
-		return rte_flow_error_set(error, ENOTSUP,
-					  RTE_FLOW_ERROR_TYPE_ACTION, NULL,
-					  "async query_update not supported");
-	return fops->async_action_query_update
-			   (dev, queue_id, op_attr, action_handle,
-			    update, query, qu_mode, user_data, error);
-}
-
-/**
- * Query shared action.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev structure.
- * @param[in] queue
- *   Which queue to be used..
- * @param[in] attr
- *   Operation attribute.
- * @param[in] handle
- *   Action handle to be updated.
- * @param[in] data
- *   Pointer query result data.
- * @param[in] user_data
- *   Pointer to the user_data.
- * @param[out] error
- *   Pointer to error structure.
- *
- * @return
- *   0 on success, negative value otherwise and rte_errno is set.
- */
-static int
-mlx5_flow_async_action_handle_query(struct rte_eth_dev *dev, uint32_t queue,
-				    const struct rte_flow_op_attr *attr,
-				    const struct rte_flow_action_handle *handle,
-				    void *data,
-				    void *user_data,
-				    struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops =
-			flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-
-	return fops->async_action_query(dev, queue, attr, handle,
-					data, user_data, error);
-}
-
-/**
- * Destroy shared action.
- *
- * @param[in] dev
- *   Pointer to the rte_eth_dev structure.
- * @param[in] queue
- *   Which queue to be used..
- * @param[in] attr
- *   Operation attribute.
- * @param[in] handle
- *   Action handle to be destroyed.
- * @param[in] user_data
- *   Pointer to the user_data.
- * @param[out] error
- *   Pointer to error structure.
- *
- * @return
- *   0 on success, negative value otherwise and rte_errno is set.
- */
-static int
-mlx5_flow_async_action_handle_destroy(struct rte_eth_dev *dev, uint32_t queue,
-				      const struct rte_flow_op_attr *attr,
-				      struct rte_flow_action_handle *handle,
-				      void *user_data,
-				      struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops =
-			flow_get_drv_ops(MLX5_FLOW_TYPE_HW);
-
-	return fops->async_action_destroy(dev, queue, attr, handle,
-					  user_data, error);
-}
-
 /**
  * Allocate a new memory for the counter values wrapped by all the needed
  * management.
@@ -11015,41 +10470,6 @@ mlx5_action_list_handle_destroy(struct rte_eth_dev *dev,
 	return fops->action_list_handle_destroy(dev, handle, error);
 }

-static struct rte_flow_action_list_handle *
-mlx5_flow_async_action_list_handle_create(struct rte_eth_dev *dev,
-					  uint32_t queue_id,
-					  const struct
-					  rte_flow_op_attr *op_attr,
-					  const struct
-					  rte_flow_indir_action_conf *conf,
-					  const struct rte_flow_action *actions,
-					  void *user_data,
-					  struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops;
-
-	MLX5_DRV_FOPS_OR_ERR(dev, fops, async_action_list_handle_create, NULL);
-	return fops->async_action_list_handle_create(dev, queue_id, op_attr,
-						     conf, actions, user_data,
-						     error);
-}
-
-static int
-mlx5_flow_async_action_list_handle_destroy
-	(struct rte_eth_dev *dev, uint32_t queue_id,
-	 const struct rte_flow_op_attr *op_attr,
-	 struct rte_flow_action_list_handle *action_handle,
-	 void *user_data, struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops;
-
-	MLX5_DRV_FOPS_OR_ERR(dev, fops,
-			     async_action_list_handle_destroy, ENOTSUP);
-	return fops->async_action_list_handle_destroy(dev, queue_id, op_attr,
-						      action_handle, user_data,
-						      error);
-}
-
 static int
 mlx5_flow_action_list_handle_query_update(struct rte_eth_dev *dev,
 					  const
@@ -11065,32 +10485,6 @@ mlx5_flow_action_list_handle_query_update(struct rte_eth_dev *dev,
 	return fops->action_list_handle_query_update(dev, handle, update, query,
 						     mode, error);
 }
-
-static int
-mlx5_flow_async_action_list_handle_query_update(struct rte_eth_dev *dev,
-						uint32_t queue_id,
-						const
-						struct rte_flow_op_attr *op_attr,
-						const struct
-						rte_flow_action_list_handle *handle,
-						const void **update,
-						void **query,
-						enum
-						rte_flow_query_update_mode mode,
-						void *user_data,
-						struct rte_flow_error *error)
-{
-	const struct mlx5_flow_driver_ops *fops;
-
-	MLX5_DRV_FOPS_OR_ERR(dev, fops,
-			     async_action_list_handle_query_update, ENOTSUP);
-	return fops->async_action_list_handle_query_update(dev, queue_id, op_attr,
-							   handle, update,
-							   query, mode,
-							   user_data, error);
-}
-
-
 static int
 mlx5_flow_calc_table_hash(struct rte_eth_dev *dev,
 			  const struct rte_flow_template_table *table,
diff --git a/drivers/net/mlx5/mlx5_flow_hw.c b/drivers/net/mlx5/mlx5_flow_hw.c
index da873ae2e2..c65ebfbba2 100644
--- a/drivers/net/mlx5/mlx5_flow_hw.c
+++ b/drivers/net/mlx5/mlx5_flow_hw.c
@@ -3,6 +3,7 @@
  */

 #include <rte_flow.h>
+#include <rte_flow_driver.h>

 #include <mlx5_malloc.h>

@@ -14,6 +15,9 @@
 #if defined(HAVE_IBV_FLOW_DV_SUPPORT) || !defined(HAVE_INFINIBAND_VERBS_H)
 #include "mlx5_hws_cnt.h"

+/** Fast path async flow API functions. */
+static struct rte_flow_fp_ops mlx5_flow_hw_fp_ops;
+
 /* The maximum actions support in the flow. */
 #define MLX5_HW_MAX_ACTS 16

@@ -9543,6 +9547,7 @@ flow_hw_configure(struct rte_eth_dev *dev,
 		mlx5_free(_queue_attr);
 	if (port_attr->flags & RTE_FLOW_PORT_FLAG_STRICT_QUEUE)
 		priv->hws_strict_queue = 1;
+	dev->flow_fp_ops = &mlx5_flow_hw_fp_ops;
 	return 0;
 err:
 	if (priv->hws_ctpool) {
@@ -9617,6 +9622,7 @@ flow_hw_resource_release(struct rte_eth_dev *dev)

 	if (!priv->dr_ctx)
 		return;
+	dev->flow_fp_ops = &rte_flow_fp_default_ops;
 	flow_hw_rxq_flag_set(dev, false);
 	flow_hw_flush_all_ctrl_flows(dev);
 	flow_hw_cleanup_tx_repr_tagging(dev);
@@ -12992,4 +12998,23 @@ mlx5_reformat_action_destroy(struct rte_eth_dev *dev,
 	mlx5_free(handle);
 	return 0;
 }
+
+static struct rte_flow_fp_ops mlx5_flow_hw_fp_ops = {
+	.async_create = flow_hw_async_flow_create,
+	.async_create_by_index = flow_hw_async_flow_create_by_index,
+	.async_actions_update = flow_hw_async_flow_update,
+	.async_destroy = flow_hw_async_flow_destroy,
+	.push = flow_hw_push,
+	.pull = flow_hw_pull,
+	.async_action_handle_create = flow_hw_action_handle_create,
+	.async_action_handle_destroy = flow_hw_action_handle_destroy,
+	.async_action_handle_update = flow_hw_action_handle_update,
+	.async_action_handle_query = flow_hw_action_handle_query,
+	.async_action_handle_query_update = flow_hw_async_action_handle_query_update,
+	.async_action_list_handle_create = flow_hw_async_action_list_handle_create,
+	.async_action_list_handle_destroy = flow_hw_async_action_list_handle_destroy,
+	.async_action_list_handle_query_update =
+		flow_hw_async_action_list_handle_query_update,
+};
+
 #endif
diff --git a/lib/ethdev/ethdev_driver.c b/lib/ethdev/ethdev_driver.c
index bd917a15fc..34909a3018 100644
--- a/lib/ethdev/ethdev_driver.c
+++ b/lib/ethdev/ethdev_driver.c
@@ -10,6 +10,7 @@

 #include "ethdev_driver.h"
 #include "ethdev_private.h"
+#include "rte_flow_driver.h"

 /**
  * A set of values to describe the possible states of a switch domain.
@@ -110,6 +111,7 @@ rte_eth_dev_allocate(const char *name)
 	}

 	eth_dev = eth_dev_get(port_id);
+	eth_dev->flow_fp_ops = &rte_flow_fp_default_ops;
 	strlcpy(eth_dev->data->name, name, sizeof(eth_dev->data->name));
 	eth_dev->data->port_id = port_id;
 	eth_dev->data->backer_port_id = RTE_MAX_ETHPORTS;
@@ -245,6 +247,8 @@ rte_eth_dev_release_port(struct rte_eth_dev *eth_dev)

 	eth_dev_fp_ops_reset(rte_eth_fp_ops + eth_dev->data->port_id);

+	eth_dev->flow_fp_ops = &rte_flow_fp_default_ops;
+
 	rte_spinlock_lock(rte_mcfg_ethdev_get_lock());

 	eth_dev->state = RTE_ETH_DEV_UNUSED;
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index b482cd12bb..b2e879ae1d 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -71,6 +71,10 @@ struct rte_eth_dev {
 	struct rte_eth_dev_data *data;
 	void *process_private; /**< Pointer to per-process device data */
 	const struct eth_dev_ops *dev_ops; /**< Functions exported by PMD */
+	/**
+	 * Fast path flow API functions exported by PMD.
+	 */
+	const struct rte_flow_fp_ops *flow_fp_ops;
 	struct rte_device *device; /**< Backing device */
 	struct rte_intr_handle *intr_handle; /**< Device interrupt handle */

diff --git a/lib/ethdev/meson.build b/lib/ethdev/meson.build
index 3497aa1548..b8859de11b 100644
--- a/lib/ethdev/meson.build
+++ b/lib/ethdev/meson.build
@@ -49,3 +49,7 @@ deps += ['net', 'kvargs', 'meter', 'telemetry']
 if is_freebsd
     annotate_locks = false
 endif
+
+if get_option('buildtype').contains('debug')
+    cflags += ['-DRTE_FLOW_DEBUG']
+endif
diff --git a/lib/ethdev/rte_flow.c b/lib/ethdev/rte_flow.c
index f49d1d3767..02522730b3 100644
--- a/lib/ethdev/rte_flow.c
+++ b/lib/ethdev/rte_flow.c
@@ -2013,16 +2013,26 @@ rte_flow_async_create(uint16_t port_id,
 		      struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
 	struct rte_flow *flow;

-	flow = ops->async_create(dev, queue_id,
-				 op_attr, template_table,
-				 pattern, pattern_template_index,
-				 actions, actions_template_index,
-				 user_data, error);
-	if (flow == NULL)
-		flow_err(port_id, -rte_errno, error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   rte_strerror(ENODEV));
+		return NULL;
+	}
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_create == NULL) {
+		rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   rte_strerror(ENOSYS));
+		return NULL;
+	}
+#endif
+
+	flow = dev->flow_fp_ops->async_create(dev, queue_id,
+					      op_attr, template_table,
+					      pattern, pattern_template_index,
+					      actions, actions_template_index,
+					      user_data, error);

 	rte_flow_trace_async_create(port_id, queue_id, op_attr, template_table,
 				    pattern, pattern_template_index, actions,
@@ -2043,16 +2053,24 @@ rte_flow_async_create_by_index(uint16_t port_id,
 			       struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
-	struct rte_flow *flow;

-	flow = ops->async_create_by_index(dev, queue_id,
-					  op_attr, template_table, rule_index,
-					  actions, actions_template_index,
-					  user_data, error);
-	if (flow == NULL)
-		flow_err(port_id, -rte_errno, error);
-	return flow;
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   rte_strerror(ENODEV));
+		return NULL;
+	}
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_create_by_index == NULL) {
+		rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   rte_strerror(ENOSYS));
+		return NULL;
+	}
+#endif
+
+	return dev->flow_fp_ops->async_create_by_index(dev, queue_id,
+						       op_attr, template_table, rule_index,
+						       actions, actions_template_index,
+						       user_data, error);
 }

 int
@@ -2064,14 +2082,20 @@ rte_flow_async_destroy(uint16_t port_id,
 		       struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
 	int ret;

-	ret = flow_err(port_id,
-		       ops->async_destroy(dev, queue_id,
-					  op_attr, flow,
-					  user_data, error),
-		       error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENODEV));
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_destroy == NULL)
+		return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENOSYS));
+#endif
+
+	ret = dev->flow_fp_ops->async_destroy(dev, queue_id,
+					      op_attr, flow,
+					      user_data, error);

 	rte_flow_trace_async_destroy(port_id, queue_id, op_attr, flow,
 				     user_data, ret);
@@ -2090,15 +2114,21 @@ rte_flow_async_actions_update(uint16_t port_id,
 			      struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
 	int ret;

-	ret = flow_err(port_id,
-		       ops->async_actions_update(dev, queue_id, op_attr,
-						 flow, actions,
-						 actions_template_index,
-						 user_data, error),
-		       error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENODEV));
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_actions_update == NULL)
+		return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENOSYS));
+#endif
+
+	ret = dev->flow_fp_ops->async_actions_update(dev, queue_id, op_attr,
+						     flow, actions,
+						     actions_template_index,
+						     user_data, error);

 	rte_flow_trace_async_actions_update(port_id, queue_id, op_attr, flow,
 					    actions, actions_template_index,
@@ -2113,12 +2143,18 @@ rte_flow_push(uint16_t port_id,
 	      struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
 	int ret;

-	ret = flow_err(port_id,
-		       ops->push(dev, queue_id, error),
-		       error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENODEV));
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->push == NULL)
+		return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENOSYS));
+#endif
+
+	ret = dev->flow_fp_ops->push(dev, queue_id, error);

 	rte_flow_trace_push(port_id, queue_id, ret);

@@ -2133,16 +2169,22 @@ rte_flow_pull(uint16_t port_id,
 	      struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
 	int ret;
-	int rc;

-	ret = ops->pull(dev, queue_id, res, n_res, error);
-	rc = ret ? ret : flow_err(port_id, ret, error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENODEV));
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->pull == NULL)
+		return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENOSYS));
+#endif
+
+	ret = dev->flow_fp_ops->pull(dev, queue_id, res, n_res, error);

-	rte_flow_trace_pull(port_id, queue_id, res, n_res, rc);
+	rte_flow_trace_pull(port_id, queue_id, res, n_res, ret);

-	return rc;
+	return ret;
 }

 struct rte_flow_action_handle *
@@ -2155,13 +2197,24 @@ rte_flow_async_action_handle_create(uint16_t port_id,
 		struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
 	struct rte_flow_action_handle *handle;

-	handle = ops->async_action_handle_create(dev, queue_id, op_attr,
-					     indir_action_conf, action, user_data, error);
-	if (handle == NULL)
-		flow_err(port_id, -rte_errno, error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   rte_strerror(ENODEV));
+		return NULL;
+	}
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_action_handle_create == NULL) {
+		rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   rte_strerror(ENOSYS));
+		return NULL;
+	}
+#endif
+
+	handle = dev->flow_fp_ops->async_action_handle_create(dev, queue_id, op_attr,
+							      indir_action_conf, action,
+							      user_data, error);

 	rte_flow_trace_async_action_handle_create(port_id, queue_id, op_attr,
 						  indir_action_conf, action,
@@ -2179,12 +2232,19 @@ rte_flow_async_action_handle_destroy(uint16_t port_id,
 		struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
 	int ret;

-	ret = ops->async_action_handle_destroy(dev, queue_id, op_attr,
-					   action_handle, user_data, error);
-	ret = flow_err(port_id, ret, error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENODEV));
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_action_handle_destroy == NULL)
+		return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENOSYS));
+#endif
+
+	ret = dev->flow_fp_ops->async_action_handle_destroy(dev, queue_id, op_attr,
+							    action_handle, user_data, error);

 	rte_flow_trace_async_action_handle_destroy(port_id, queue_id, op_attr,
 						   action_handle, user_data, ret);
@@ -2202,12 +2262,19 @@ rte_flow_async_action_handle_update(uint16_t port_id,
 		struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
 	int ret;

-	ret = ops->async_action_handle_update(dev, queue_id, op_attr,
-					  action_handle, update, user_data, error);
-	ret = flow_err(port_id, ret, error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENODEV));
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_action_handle_update == NULL)
+		return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENOSYS));
+#endif
+
+	ret = dev->flow_fp_ops->async_action_handle_update(dev, queue_id, op_attr,
+							   action_handle, update, user_data, error);

 	rte_flow_trace_async_action_handle_update(port_id, queue_id, op_attr,
 						  action_handle, update,
@@ -2226,14 +2293,19 @@ rte_flow_async_action_handle_query(uint16_t port_id,
 		struct rte_flow_error *error)
 {
 	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
-	const struct rte_flow_ops *ops = rte_flow_ops_get(port_id, error);
 	int ret;

-	if (unlikely(!ops))
-		return -rte_errno;
-	ret = ops->async_action_handle_query(dev, queue_id, op_attr,
-					  action_handle, data, user_data, error);
-	ret = flow_err(port_id, ret, error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENODEV));
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_action_handle_query == NULL)
+		return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENOSYS));
+#endif
+
+	ret = dev->flow_fp_ops->async_action_handle_query(dev, queue_id, op_attr,
+							  action_handle, data, user_data, error);

 	rte_flow_trace_async_action_handle_query(port_id, queue_id, op_attr,
 						 action_handle, data, user_data,
@@ -2276,24 +2348,21 @@ rte_flow_async_action_handle_query_update(uint16_t port_id, uint32_t queue_id,
 					  void *user_data,
 					  struct rte_flow_error *error)
 {
-	int ret;
-	struct rte_eth_dev *dev;
-	const struct rte_flow_ops *ops;
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];

-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-	if (!handle)
-		return -EINVAL;
-	if (!update && !query)
-		return -EINVAL;
-	dev = &rte_eth_devices[port_id];
-	ops = rte_flow_ops_get(port_id, error);
-	if (!ops || !ops->async_action_handle_query_update)
-		return -ENOTSUP;
-	ret = ops->async_action_handle_query_update(dev, queue_id, attr,
-						    handle, update,
-						    query, mode,
-						    user_data, error);
-	return flow_err(port_id, ret, error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENODEV));
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_action_handle_query_update == NULL)
+		return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENOSYS));
+#endif
+
+	return dev->flow_fp_ops->async_action_handle_query_update(dev, queue_id, attr,
+								  handle, update,
+								  query, mode,
+								  user_data, error);
 }

 struct rte_flow_action_list_handle *
@@ -2353,24 +2422,28 @@ rte_flow_async_action_list_handle_create(uint16_t port_id, uint32_t queue_id,
 					 void *user_data,
 					 struct rte_flow_error *error)
 {
-	int ret;
-	struct rte_eth_dev *dev;
-	const struct rte_flow_ops *ops;
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 	struct rte_flow_action_list_handle *handle;
+	int ret;

-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, NULL);
-	ops = rte_flow_ops_get(port_id, error);
-	if (!ops || !ops->async_action_list_handle_create) {
-		rte_flow_error_set(error, ENOTSUP,
-				   RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-				   "action_list handle not supported");
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id)) {
+		rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   rte_strerror(ENODEV));
 		return NULL;
 	}
-	dev = &rte_eth_devices[port_id];
-	handle = ops->async_action_list_handle_create(dev, queue_id, attr, conf,
-						      actions, user_data,
-						      error);
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_action_list_handle_create == NULL) {
+		rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				   rte_strerror(ENOSYS));
+		return NULL;
+	}
+#endif
+
+	handle = dev->flow_fp_ops->async_action_list_handle_create(dev, queue_id, attr, conf,
+								   actions, user_data,
+								   error);
 	ret = flow_err(port_id, -rte_errno, error);
+
 	rte_flow_trace_async_action_list_handle_create(port_id, queue_id, attr,
 						       conf, actions, user_data,
 						       ret);
@@ -2383,20 +2456,21 @@ rte_flow_async_action_list_handle_destroy(uint16_t port_id, uint32_t queue_id,
 				 struct rte_flow_action_list_handle *handle,
 				 void *user_data, struct rte_flow_error *error)
 {
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 	int ret;
-	struct rte_eth_dev *dev;
-	const struct rte_flow_ops *ops;

-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-	ops = rte_flow_ops_get(port_id, error);
-	if (!ops || !ops->async_action_list_handle_destroy)
-		return rte_flow_error_set(error, ENOTSUP,
-					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-					  "async action_list handle not supported");
-	dev = &rte_eth_devices[port_id];
-	ret = ops->async_action_list_handle_destroy(dev, queue_id, op_attr,
-						    handle, user_data, error);
-	ret = flow_err(port_id, ret, error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENODEV));
+	if (dev->flow_fp_ops == NULL || dev->flow_fp_ops->async_action_list_handle_destroy == NULL)
+		return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENOSYS));
+#endif
+
+	ret = dev->flow_fp_ops->async_action_list_handle_destroy(dev, queue_id, op_attr,
+								 handle, user_data, error);
+
 	rte_flow_trace_async_action_list_handle_destroy(port_id, queue_id,
 							op_attr, handle,
 							user_data, ret);
@@ -2437,22 +2511,24 @@ rte_flow_async_action_list_handle_query_update(uint16_t port_id, uint32_t queue_
 			 enum rte_flow_query_update_mode mode,
 			 void *user_data, struct rte_flow_error *error)
 {
+	struct rte_eth_dev *dev = &rte_eth_devices[port_id];
 	int ret;
-	struct rte_eth_dev *dev;
-	const struct rte_flow_ops *ops;

-	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
-	ops = rte_flow_ops_get(port_id, error);
-	if (!ops || !ops->async_action_list_handle_query_update)
-		return rte_flow_error_set(error, ENOTSUP,
-					  RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
-					  "action_list async query_update not supported");
-	dev = &rte_eth_devices[port_id];
-	ret = ops->async_action_list_handle_query_update(dev, queue_id, attr,
-							 handle, update, query,
-							 mode, user_data,
-							 error);
-	ret = flow_err(port_id, ret, error);
+#ifdef RTE_FLOW_DEBUG
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return rte_flow_error_set(error, ENODEV, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENODEV));
+	if (dev->flow_fp_ops == NULL ||
+	    dev->flow_fp_ops->async_action_list_handle_query_update == NULL)
+		return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+					  rte_strerror(ENOSYS));
+#endif
+
+	ret = dev->flow_fp_ops->async_action_list_handle_query_update(dev, queue_id, attr,
+								      handle, update, query,
+								      mode, user_data,
+								      error);
+
 	rte_flow_trace_async_action_list_handle_query_update(port_id, queue_id,
 							     attr, handle,
 							     update, query,
@@ -2481,3 +2557,216 @@ rte_flow_calc_table_hash(uint16_t port_id, const struct rte_flow_template_table
 					hash, error);
 	return flow_err(port_id, ret, error);
 }
+
+static struct rte_flow *
+rte_flow_dummy_async_create(struct rte_eth_dev *dev __rte_unused,
+			    uint32_t queue __rte_unused,
+			    const struct rte_flow_op_attr *attr __rte_unused,
+			    struct rte_flow_template_table *table __rte_unused,
+			    const struct rte_flow_item items[] __rte_unused,
+			    uint8_t pattern_template_index __rte_unused,
+			    const struct rte_flow_action actions[] __rte_unused,
+			    uint8_t action_template_index __rte_unused,
+			    void *user_data __rte_unused,
+			    struct rte_flow_error *error)
+{
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			   rte_strerror(ENOSYS));
+	return NULL;
+}
+
+static struct rte_flow *
+rte_flow_dummy_async_create_by_index(struct rte_eth_dev *dev __rte_unused,
+				     uint32_t queue __rte_unused,
+				     const struct rte_flow_op_attr *attr __rte_unused,
+				     struct rte_flow_template_table *table __rte_unused,
+				     uint32_t rule_index __rte_unused,
+				     const struct rte_flow_action actions[] __rte_unused,
+				     uint8_t action_template_index __rte_unused,
+				     void *user_data __rte_unused,
+				     struct rte_flow_error *error)
+{
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			   rte_strerror(ENOSYS));
+	return NULL;
+}
+
+static int
+rte_flow_dummy_async_actions_update(struct rte_eth_dev *dev __rte_unused,
+				    uint32_t queue_id __rte_unused,
+				    const struct rte_flow_op_attr *op_attr __rte_unused,
+				    struct rte_flow *flow __rte_unused,
+				    const struct rte_flow_action actions[] __rte_unused,
+				    uint8_t actions_template_index __rte_unused,
+				    void *user_data __rte_unused,
+				    struct rte_flow_error *error)
+{
+	return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  rte_strerror(ENOSYS));
+}
+
+static int
+rte_flow_dummy_async_destroy(struct rte_eth_dev *dev __rte_unused,
+			     uint32_t queue_id __rte_unused,
+			     const struct rte_flow_op_attr *op_attr __rte_unused,
+			     struct rte_flow *flow __rte_unused,
+			     void *user_data __rte_unused,
+			     struct rte_flow_error *error)
+{
+	return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  rte_strerror(ENOSYS));
+}
+
+static int
+rte_flow_dummy_push(struct rte_eth_dev *dev __rte_unused,
+		    uint32_t queue_id __rte_unused,
+		    struct rte_flow_error *error)
+{
+	return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  rte_strerror(ENOSYS));
+}
+
+static int
+rte_flow_dummy_pull(struct rte_eth_dev *dev __rte_unused,
+		    uint32_t queue_id __rte_unused,
+		    struct rte_flow_op_result res[] __rte_unused,
+		    uint16_t n_res __rte_unused,
+		    struct rte_flow_error *error)
+{
+	return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  rte_strerror(ENOSYS));
+}
+
+static struct rte_flow_action_handle *
+rte_flow_dummy_async_action_handle_create(
+	struct rte_eth_dev *dev __rte_unused,
+	uint32_t queue_id __rte_unused,
+	const struct rte_flow_op_attr *op_attr __rte_unused,
+	const struct rte_flow_indir_action_conf *indir_action_conf __rte_unused,
+	const struct rte_flow_action *action __rte_unused,
+	void *user_data __rte_unused,
+	struct rte_flow_error *error)
+{
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			   rte_strerror(ENOSYS));
+	return NULL;
+}
+
+static int
+rte_flow_dummy_async_action_handle_destroy(
+	struct rte_eth_dev *dev __rte_unused,
+	uint32_t queue_id __rte_unused,
+	const struct rte_flow_op_attr *op_attr __rte_unused,
+	struct rte_flow_action_handle *action_handle __rte_unused,
+	void *user_data __rte_unused,
+	struct rte_flow_error *error)
+{
+	return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  rte_strerror(ENOSYS));
+}
+
+static int
+rte_flow_dummy_async_action_handle_update(
+	struct rte_eth_dev *dev __rte_unused,
+	uint32_t queue_id __rte_unused,
+	const struct rte_flow_op_attr *op_attr __rte_unused,
+	struct rte_flow_action_handle *action_handle __rte_unused,
+	const void *update __rte_unused,
+	void *user_data __rte_unused,
+	struct rte_flow_error *error)
+{
+	return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  rte_strerror(ENOSYS));
+}
+
+static int
+rte_flow_dummy_async_action_handle_query(
+	struct rte_eth_dev *dev __rte_unused,
+	uint32_t queue_id __rte_unused,
+	const struct rte_flow_op_attr *op_attr __rte_unused,
+	const struct rte_flow_action_handle *action_handle __rte_unused,
+	void *data __rte_unused,
+	void *user_data __rte_unused,
+	struct rte_flow_error *error)
+{
+	return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  rte_strerror(ENOSYS));
+}
+
+static int
+rte_flow_dummy_async_action_handle_query_update(
+	struct rte_eth_dev *dev __rte_unused,
+	uint32_t queue_id __rte_unused,
+	const struct rte_flow_op_attr *attr __rte_unused,
+	struct rte_flow_action_handle *handle __rte_unused,
+	const void *update __rte_unused,
+	void *query __rte_unused,
+	enum rte_flow_query_update_mode mode __rte_unused,
+	void *user_data __rte_unused,
+	struct rte_flow_error *error)
+{
+	return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  rte_strerror(ENOSYS));
+}
+
+static struct rte_flow_action_list_handle *
+rte_flow_dummy_async_action_list_handle_create(
+	struct rte_eth_dev *dev __rte_unused,
+	uint32_t queue_id __rte_unused,
+	const struct rte_flow_op_attr *attr __rte_unused,
+	const struct rte_flow_indir_action_conf *conf __rte_unused,
+	const struct rte_flow_action *actions __rte_unused,
+	void *user_data __rte_unused,
+	struct rte_flow_error *error)
+{
+	rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+			   rte_strerror(ENOSYS));
+	return NULL;
+}
+
+static int
+rte_flow_dummy_async_action_list_handle_destroy(
+	struct rte_eth_dev *dev __rte_unused,
+	uint32_t queue_id __rte_unused,
+	const struct rte_flow_op_attr *op_attr __rte_unused,
+	struct rte_flow_action_list_handle *handle __rte_unused,
+	void *user_data __rte_unused,
+	struct rte_flow_error *error)
+{
+	return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  rte_strerror(ENOSYS));
+}
+
+static int
+rte_flow_dummy_async_action_list_handle_query_update(
+	struct rte_eth_dev *dev __rte_unused,
+	uint32_t queue_id __rte_unused,
+	const struct rte_flow_op_attr *attr __rte_unused,
+	const struct rte_flow_action_list_handle *handle __rte_unused,
+	const void **update __rte_unused,
+	void **query __rte_unused,
+	enum rte_flow_query_update_mode mode __rte_unused,
+	void *user_data __rte_unused,
+	struct rte_flow_error *error)
+{
+	return rte_flow_error_set(error, ENOSYS, RTE_FLOW_ERROR_TYPE_UNSPECIFIED, NULL,
+				  rte_strerror(ENOSYS));
+}
+
+struct rte_flow_fp_ops rte_flow_fp_default_ops = {
+	.async_create = rte_flow_dummy_async_create,
+	.async_create_by_index = rte_flow_dummy_async_create_by_index,
+	.async_actions_update = rte_flow_dummy_async_actions_update,
+	.async_destroy = rte_flow_dummy_async_destroy,
+	.push = rte_flow_dummy_push,
+	.pull = rte_flow_dummy_pull,
+	.async_action_handle_create = rte_flow_dummy_async_action_handle_create,
+	.async_action_handle_destroy = rte_flow_dummy_async_action_handle_destroy,
+	.async_action_handle_update = rte_flow_dummy_async_action_handle_update,
+	.async_action_handle_query = rte_flow_dummy_async_action_handle_query,
+	.async_action_handle_query_update = rte_flow_dummy_async_action_handle_query_update,
+	.async_action_list_handle_create = rte_flow_dummy_async_action_list_handle_create,
+	.async_action_list_handle_destroy = rte_flow_dummy_async_action_list_handle_destroy,
+	.async_action_list_handle_query_update =
+		rte_flow_dummy_async_action_list_handle_query_update,
+};
diff --git a/lib/ethdev/rte_flow_driver.h b/lib/ethdev/rte_flow_driver.h
index f35f659503..dd9d01045d 100644
--- a/lib/ethdev/rte_flow_driver.h
+++ b/lib/ethdev/rte_flow_driver.h
@@ -234,122 +234,12 @@ struct rte_flow_ops {
 		 const struct rte_flow_group_attr *attr,
 		 const struct rte_flow_action actions[],
 		 struct rte_flow_error *err);
-	/** See rte_flow_async_create() */
-	struct rte_flow *(*async_create)
-		(struct rte_eth_dev *dev,
-		 uint32_t queue_id,
-		 const struct rte_flow_op_attr *op_attr,
-		 struct rte_flow_template_table *template_table,
-		 const struct rte_flow_item pattern[],
-		 uint8_t pattern_template_index,
-		 const struct rte_flow_action actions[],
-		 uint8_t actions_template_index,
-		 void *user_data,
-		 struct rte_flow_error *err);
-	/** See rte_flow_async_create_by_index() */
-	struct rte_flow *(*async_create_by_index)
-		(struct rte_eth_dev *dev,
-		 uint32_t queue_id,
-		 const struct rte_flow_op_attr *op_attr,
-		 struct rte_flow_template_table *template_table,
-		 uint32_t rule_index,
-		 const struct rte_flow_action actions[],
-		 uint8_t actions_template_index,
-		 void *user_data,
-		 struct rte_flow_error *err);
-	/** See rte_flow_async_destroy() */
-	int (*async_destroy)
-		(struct rte_eth_dev *dev,
-		 uint32_t queue_id,
-		 const struct rte_flow_op_attr *op_attr,
-		 struct rte_flow *flow,
-		 void *user_data,
-		 struct rte_flow_error *err);
-	/** See rte_flow_push() */
-	int (*push)
-		(struct rte_eth_dev *dev,
-		 uint32_t queue_id,
-		 struct rte_flow_error *err);
-	/** See rte_flow_pull() */
-	int (*pull)
-		(struct rte_eth_dev *dev,
-		 uint32_t queue_id,
-		 struct rte_flow_op_result res[],
-		 uint16_t n_res,
-		 struct rte_flow_error *error);
-	/** See rte_flow_async_action_handle_create() */
-	struct rte_flow_action_handle *(*async_action_handle_create)
-		(struct rte_eth_dev *dev,
-		 uint32_t queue_id,
-		 const struct rte_flow_op_attr *op_attr,
-		 const struct rte_flow_indir_action_conf *indir_action_conf,
-		 const struct rte_flow_action *action,
-		 void *user_data,
-		 struct rte_flow_error *err);
-	/** See rte_flow_async_action_handle_destroy() */
-	int (*async_action_handle_destroy)
-		(struct rte_eth_dev *dev,
-		 uint32_t queue_id,
-		 const struct rte_flow_op_attr *op_attr,
-		 struct rte_flow_action_handle *action_handle,
-		 void *user_data,
-		 struct rte_flow_error *error);
-	/** See rte_flow_async_action_handle_update() */
-	int (*async_action_handle_update)
-		(struct rte_eth_dev *dev,
-		 uint32_t queue_id,
-		 const struct rte_flow_op_attr *op_attr,
-		 struct rte_flow_action_handle *action_handle,
-		 const void *update,
-		 void *user_data,
-		 struct rte_flow_error *error);
-	/** See rte_flow_async_action_handle_query() */
-	int (*async_action_handle_query)
-		(struct rte_eth_dev *dev,
-		 uint32_t queue_id,
-		 const struct rte_flow_op_attr *op_attr,
-		 const struct rte_flow_action_handle *action_handle,
-		 void *data,
-		 void *user_data,
-		 struct rte_flow_error *error);
-	/** See rte_flow_async_action_handle_query_update */
-	int (*async_action_handle_query_update)
-		(struct rte_eth_dev *dev, uint32_t queue_id,
-		 const struct rte_flow_op_attr *op_attr,
-		 struct rte_flow_action_handle *action_handle,
-		 const void *update, void *query,
-		 enum rte_flow_query_update_mode qu_mode,
-		 void *user_data, struct rte_flow_error *error);
 	/** See rte_flow_actions_update(). */
 	int (*actions_update)
 		(struct rte_eth_dev *dev,
 		 struct rte_flow *flow,
 		 const struct rte_flow_action actions[],
 		 struct rte_flow_error *error);
-	/** See rte_flow_async_actions_update() */
-	int (*async_actions_update)
-		(struct rte_eth_dev *dev,
-		 uint32_t queue_id,
-		 const struct rte_flow_op_attr *op_attr,
-		 struct rte_flow *flow,
-		 const struct rte_flow_action actions[],
-		 uint8_t actions_template_index,
-		 void *user_data,
-		 struct rte_flow_error *error);
-	/** @see rte_flow_async_action_list_handle_create() */
-	struct rte_flow_action_list_handle *
-	(*async_action_list_handle_create)
-		(struct rte_eth_dev *dev, uint32_t queue_id,
-		 const struct rte_flow_op_attr *attr,
-		 const struct rte_flow_indir_action_conf *conf,
-		 const struct rte_flow_action *actions,
-		 void *user_data, struct rte_flow_error *error);
-	/** @see rte_flow_async_action_list_handle_destroy() */
-	int (*async_action_list_handle_destroy)
-		(struct rte_eth_dev *dev, uint32_t queue_id,
-		 const struct rte_flow_op_attr *op_attr,
-		 struct rte_flow_action_list_handle *action_handle,
-		 void *user_data, struct rte_flow_error *error);
 	/** @see rte_flow_action_list_handle_query_update() */
 	int (*action_list_handle_query_update)
 		(struct rte_eth_dev *dev,
@@ -357,14 +247,6 @@ struct rte_flow_ops {
 		 const void **update, void **query,
 		 enum rte_flow_query_update_mode mode,
 		 struct rte_flow_error *error);
-	/** @see rte_flow_async_action_list_handle_query_update() */
-	int (*async_action_list_handle_query_update)
-		(struct rte_eth_dev *dev, uint32_t queue_id,
-		 const struct rte_flow_op_attr *attr,
-		 const struct rte_flow_action_list_handle *handle,
-		 const void **update, void **query,
-		 enum rte_flow_query_update_mode mode,
-		 void *user_data, struct rte_flow_error *error);
 	/** @see rte_flow_calc_table_hash() */
 	int (*flow_calc_table_hash)
 		(struct rte_eth_dev *dev, const struct rte_flow_template_table *table,
@@ -394,6 +276,165 @@ rte_flow_ops_get(uint16_t port_id, struct rte_flow_error *error);
 int
 rte_flow_restore_info_dynflag_register(void);

+/** @internal Enqueue rule creation operation. */
+typedef struct rte_flow *(*rte_flow_async_create_t)(struct rte_eth_dev *dev,
+						    uint32_t queue,
+						    const struct rte_flow_op_attr *attr,
+						    struct rte_flow_template_table *table,
+						    const struct rte_flow_item *items,
+						    uint8_t pattern_template_index,
+						    const struct rte_flow_action *actions,
+						    uint8_t action_template_index,
+						    void *user_data,
+						    struct rte_flow_error *error);
+
+/** @internal Enqueue rule creation by index operation. */
+typedef struct rte_flow *(*rte_flow_async_create_by_index_t)(struct rte_eth_dev *dev,
+							     uint32_t queue,
+							     const struct rte_flow_op_attr *attr,
+							     struct rte_flow_template_table *table,
+							     uint32_t rule_index,
+							     const struct rte_flow_action *actions,
+							     uint8_t action_template_index,
+							     void *user_data,
+							     struct rte_flow_error *error);
+
+/** @internal Enqueue rule update operation. */
+typedef int (*rte_flow_async_actions_update_t)(struct rte_eth_dev *dev,
+					       uint32_t queue_id,
+					       const struct rte_flow_op_attr *op_attr,
+					       struct rte_flow *flow,
+					       const struct rte_flow_action *actions,
+					       uint8_t actions_template_index,
+					       void *user_data,
+					       struct rte_flow_error *error);
+
+/** @internal Enqueue rule destruction operation. */
+typedef int (*rte_flow_async_destroy_t)(struct rte_eth_dev *dev,
+					uint32_t queue_id,
+					const struct rte_flow_op_attr *op_attr,
+					struct rte_flow *flow,
+					void *user_data,
+					struct rte_flow_error *error);
+
+/** @internal Push all internally stored rules to the HW. */
+typedef int (*rte_flow_push_t)(struct rte_eth_dev *dev,
+			       uint32_t queue_id,
+			       struct rte_flow_error *error);
+
+/** @internal Pull the flow rule operations results from the HW. */
+typedef int (*rte_flow_pull_t)(struct rte_eth_dev *dev,
+			       uint32_t queue_id,
+			       struct rte_flow_op_result *res,
+			       uint16_t n_res,
+			       struct rte_flow_error *error);
+
+/** @internal Enqueue indirect action creation operation. */
+typedef struct rte_flow_action_handle *(*rte_flow_async_action_handle_create_t)(
+					struct rte_eth_dev *dev,
+					uint32_t queue_id,
+					const struct rte_flow_op_attr *op_attr,
+					const struct rte_flow_indir_action_conf *indir_action_conf,
+					const struct rte_flow_action *action,
+					void *user_data,
+					struct rte_flow_error *error);
+
+/** @internal Enqueue indirect action destruction operation. */
+typedef int (*rte_flow_async_action_handle_destroy_t)(struct rte_eth_dev *dev,
+						      uint32_t queue_id,
+						      const struct rte_flow_op_attr *op_attr,
+						      struct rte_flow_action_handle *action_handle,
+						      void *user_data,
+						      struct rte_flow_error *error);
+
+/** @internal Enqueue indirect action update operation. */
+typedef int (*rte_flow_async_action_handle_update_t)(struct rte_eth_dev *dev,
+						     uint32_t queue_id,
+						     const struct rte_flow_op_attr *op_attr,
+						     struct rte_flow_action_handle *action_handle,
+						     const void *update,
+						     void *user_data,
+						     struct rte_flow_error *error);
+
+/** @internal Enqueue indirect action query operation. */
+typedef int (*rte_flow_async_action_handle_query_t)
+		(struct rte_eth_dev *dev,
+		 uint32_t queue_id,
+		 const struct rte_flow_op_attr *op_attr,
+		 const struct rte_flow_action_handle *action_handle,
+		 void *data,
+		 void *user_data,
+		 struct rte_flow_error *error);
+
+/** @internal Enqueue indirect action query and/or update operation. */
+typedef int (*rte_flow_async_action_handle_query_update_t)(struct rte_eth_dev *dev,
+							   uint32_t queue_id,
+							   const struct rte_flow_op_attr *attr,
+							   struct rte_flow_action_handle *handle,
+							   const void *update, void *query,
+							   enum rte_flow_query_update_mode mode,
+							   void *user_data,
+							   struct rte_flow_error *error);
+
+/** @internal Enqueue indirect action list creation operation. */
+typedef struct rte_flow_action_list_handle *(*rte_flow_async_action_list_handle_create_t)(
+	struct rte_eth_dev *dev,
+	uint32_t queue_id,
+	const struct rte_flow_op_attr *attr,
+	const struct rte_flow_indir_action_conf *conf,
+	const struct rte_flow_action *actions,
+	void *user_data,
+	struct rte_flow_error *error);
+
+/** @internal Enqueue indirect action list destruction operation. */
+typedef int (*rte_flow_async_action_list_handle_destroy_t)(
+	struct rte_eth_dev *dev,
+	uint32_t queue_id,
+	const struct rte_flow_op_attr *op_attr,
+	struct rte_flow_action_list_handle *handle,
+	void *user_data,
+	struct rte_flow_error *error);
+
+/** @internal Enqueue indirect action list query and/or update operation. */
+typedef int (*rte_flow_async_action_list_handle_query_update_t)(
+	struct rte_eth_dev *dev,
+	uint32_t queue_id,
+	const struct rte_flow_op_attr *attr,
+	const struct rte_flow_action_list_handle *handle,
+	const void **update,
+	void **query,
+	enum rte_flow_query_update_mode mode,
+	void *user_data,
+	struct rte_flow_error *error);
+
+/**
+ * @internal
+ *
+ * Fast path async flow functions are held in a flat array, one entry per ethdev.
+ */
+struct rte_flow_fp_ops {
+	rte_flow_async_create_t async_create;
+	rte_flow_async_create_by_index_t async_create_by_index;
+	rte_flow_async_actions_update_t async_actions_update;
+	rte_flow_async_destroy_t async_destroy;
+	rte_flow_push_t push;
+	rte_flow_pull_t pull;
+	rte_flow_async_action_handle_create_t async_action_handle_create;
+	rte_flow_async_action_handle_destroy_t async_action_handle_destroy;
+	rte_flow_async_action_handle_update_t async_action_handle_update;
+	rte_flow_async_action_handle_query_t async_action_handle_query;
+	rte_flow_async_action_handle_query_update_t async_action_handle_query_update;
+	rte_flow_async_action_list_handle_create_t async_action_list_handle_create;
+	rte_flow_async_action_list_handle_destroy_t async_action_list_handle_destroy;
+	rte_flow_async_action_list_handle_query_update_t async_action_list_handle_query_update;
+} __rte_cache_aligned;
+
+/**
+ * @internal
+ * Default implementation of fast path flow API functions.
+ */
+extern struct rte_flow_fp_ops rte_flow_fp_default_ops;
+
 #ifdef __cplusplus
 }
 #endif
diff --git a/lib/ethdev/version.map b/lib/ethdev/version.map
index 5c4917c020..a8758084f6 100644
--- a/lib/ethdev/version.map
+++ b/lib/ethdev/version.map
@@ -345,4 +345,6 @@ INTERNAL {
 	rte_eth_representor_id_get;
 	rte_eth_switch_domain_alloc;
 	rte_eth_switch_domain_free;
+
+	rte_flow_fp_default_ops;
 };
--
2.25.1


^ permalink raw reply	[relevance 1%]

* [PATCH v7 1/4] ethdev: rename action modify field data structure
  2024-02-06  2:06  3% ` [PATCH v7 0/4] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Suanming Mou
@ 2024-02-06  2:06  3%   ` Suanming Mou
  2024-02-06 21:24  0%   ` [PATCH v7 0/4] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Ferruh Yigit
  1 sibling, 0 replies; 200+ results
From: Suanming Mou @ 2024-02-06  2:06 UTC (permalink / raw)
  To: thomas, ferruh.yigit, Ori Kam, Aman Singh, Yuying Zhang,
	Dariusz Sosnowski, Viacheslav Ovsiienko, Matan Azrad,
	Andrew Rybchenko
  Cc: dev

The rte_flow_action_modify_data struct describes the packet
field well, but is currently used only in actions.

It is planned to be used for items as well. This commit renames
it to "rte_flow_field_data", making it suitable for use by items too.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 app/test-pmd/cmdline_flow.c            | 22 +++++++++++-----------
 doc/guides/prog_guide/rte_flow.rst     |  2 +-
 doc/guides/rel_notes/release_24_03.rst |  3 +++
 drivers/net/mlx5/mlx5_flow.c           |  4 ++--
 drivers/net/mlx5/mlx5_flow.h           |  6 +++---
 drivers/net/mlx5/mlx5_flow_dv.c        | 10 +++++-----
 lib/ethdev/rte_flow.h                  | 10 +++++-----
 7 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 4d26e81d26..35030b5c47 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -744,13 +744,13 @@ enum index {
 #define ITEM_RAW_SIZE \
 	(sizeof(struct rte_flow_item_raw) + ITEM_RAW_PATTERN_SIZE)
 
-/** Maximum size for external pattern in struct rte_flow_action_modify_data. */
-#define ACTION_MODIFY_PATTERN_SIZE 32
+/** Maximum size for external pattern in struct rte_flow_field_data. */
+#define FLOW_FIELD_PATTERN_SIZE 32
 
 /** Storage size for struct rte_flow_action_modify_field including pattern. */
 #define ACTION_MODIFY_SIZE \
 	(sizeof(struct rte_flow_action_modify_field) + \
-	ACTION_MODIFY_PATTERN_SIZE)
+	FLOW_FIELD_PATTERN_SIZE)
 
 /** Maximum number of queue indices in struct rte_flow_action_rss. */
 #define ACTION_RSS_QUEUE_NUM 128
@@ -944,7 +944,7 @@ static const char *const modify_field_ops[] = {
 	"set", "add", "sub", NULL
 };
 
-static const char *const modify_field_ids[] = {
+static const char *const flow_field_ids[] = {
 	"start", "mac_dst", "mac_src",
 	"vlan_type", "vlan_id", "mac_type",
 	"ipv4_dscp", "ipv4_ttl", "ipv4_src", "ipv4_dst",
@@ -6995,7 +6995,7 @@ static const struct token token_list[] = {
 			     ARGS_ENTRY_ARB(0, 0),
 			     ARGS_ENTRY_ARB
 				(sizeof(struct rte_flow_action_modify_field),
-				 ACTION_MODIFY_PATTERN_SIZE)),
+				 FLOW_FIELD_PATTERN_SIZE)),
 		.call = parse_vc_conf,
 	},
 	[ACTION_MODIFY_FIELD_WIDTH] = {
@@ -9821,10 +9821,10 @@ parse_vc_modify_field_id(struct context *ctx, const struct token *token,
 	if (ctx->curr != ACTION_MODIFY_FIELD_DST_TYPE_VALUE &&
 		ctx->curr != ACTION_MODIFY_FIELD_SRC_TYPE_VALUE)
 		return -1;
-	for (i = 0; modify_field_ids[i]; ++i)
-		if (!strcmp_partial(modify_field_ids[i], str, len))
+	for (i = 0; flow_field_ids[i]; ++i)
+		if (!strcmp_partial(flow_field_ids[i], str, len))
 			break;
-	if (!modify_field_ids[i])
+	if (!flow_field_ids[i])
 		return -1;
 	if (!ctx->object)
 		return len;
@@ -12051,10 +12051,10 @@ comp_set_modify_field_id(struct context *ctx, const struct token *token,
 
 	RTE_SET_USED(token);
 	if (!buf)
-		return RTE_DIM(modify_field_ids);
-	if (ent >= RTE_DIM(modify_field_ids) - 1)
+		return RTE_DIM(flow_field_ids);
+	if (ent >= RTE_DIM(flow_field_ids) - 1)
 		return -1;
-	name = modify_field_ids[ent];
+	name = flow_field_ids[ent];
 	if (ctx->curr == ACTION_MODIFY_FIELD_SRC_TYPE ||
 	    (strcmp(name, "pointer") && strcmp(name, "value")))
 		return strlcpy(buf, name, size);
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 7af329bd93..9192d6ab01 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3185,7 +3185,7 @@ destination offset as ``48``, and provide immediate value ``0xXXXX85XX``.
    | ``width``     | number of bits to use   |
    +---------------+-------------------------+
 
-.. _table_rte_flow_action_modify_data:
+.. _table_rte_flow_field_data:
 
 .. table:: destination/source field definition
 
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 84d3144215..222a091e8b 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -106,6 +106,9 @@ API Changes
 
 * gso: ``rte_gso_segment`` now returns -ENOTSUP for unknown protocols.
 
+* ethdev: Renamed structure ``rte_flow_action_modify_data`` to be
+  ``rte_flow_field_data`` for more generic usage.
+
 
 ABI Changes
 -----------
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 85e8c77c81..5788a7fb57 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2493,7 +2493,7 @@ mlx5_validate_action_ct(struct rte_eth_dev *dev,
  * Validate the level value for modify field action.
  *
  * @param[in] data
- *   Pointer to the rte_flow_action_modify_data structure either src or dst.
+ *   Pointer to the rte_flow_field_data structure either src or dst.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2501,7 +2501,7 @@ mlx5_validate_action_ct(struct rte_eth_dev *dev,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-flow_validate_modify_field_level(const struct rte_flow_action_modify_data *data,
+flow_validate_modify_field_level(const struct rte_flow_field_data *data,
 				 struct rte_flow_error *error)
 {
 	if (data->level == 0)
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 6dde9de688..ecfb04ead2 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1121,7 +1121,7 @@ flow_items_to_tunnel(const struct rte_flow_item items[])
  *   Tag array index.
  */
 static inline uint8_t
-flow_tag_index_get(const struct rte_flow_action_modify_data *data)
+flow_tag_index_get(const struct rte_flow_field_data *data)
 {
 	return data->tag_index ? data->tag_index : data->level;
 }
@@ -2523,7 +2523,7 @@ int mlx5_flow_validate_action_default_miss(uint64_t action_flags,
 				const struct rte_flow_attr *attr,
 				struct rte_flow_error *error);
 int flow_validate_modify_field_level
-			(const struct rte_flow_action_modify_data *data,
+			(const struct rte_flow_field_data *data,
 			 struct rte_flow_error *error);
 int mlx5_flow_item_acceptable(const struct rte_flow_item *item,
 			      const uint8_t *mask,
@@ -2828,7 +2828,7 @@ size_t flow_dv_get_item_hdr_len(const enum rte_flow_item_type item_type);
 int flow_dv_convert_encap_data(const struct rte_flow_item *items, uint8_t *buf,
 			   size_t *size, struct rte_flow_error *error);
 void mlx5_flow_field_id_to_modify_info
-		(const struct rte_flow_action_modify_data *data,
+		(const struct rte_flow_field_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
 		 uint32_t width, struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr, struct rte_flow_error *error);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 115d730317..52620be262 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1441,7 +1441,7 @@ flow_modify_info_mask_32_masked(uint32_t length, uint32_t off, uint32_t post_mas
 }
 
 static __rte_always_inline enum mlx5_modification_field
-mlx5_mpls_modi_field_get(const struct rte_flow_action_modify_data *data)
+mlx5_mpls_modi_field_get(const struct rte_flow_field_data *data)
 {
 	return MLX5_MODI_IN_MPLS_LABEL_0 + data->tag_index;
 }
@@ -1449,7 +1449,7 @@ mlx5_mpls_modi_field_get(const struct rte_flow_action_modify_data *data)
 static void
 mlx5_modify_flex_item(const struct rte_eth_dev *dev,
 		      const struct mlx5_flex_item *flex,
-		      const struct rte_flow_action_modify_data *data,
+		      const struct rte_flow_field_data *data,
 		      struct field_modify_info *info,
 		      uint32_t *mask, uint32_t width)
 {
@@ -1573,7 +1573,7 @@ mlx5_modify_flex_item(const struct rte_eth_dev *dev,
 
 void
 mlx5_flow_field_id_to_modify_info
-		(const struct rte_flow_action_modify_data *data,
+		(const struct rte_flow_field_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
 		 uint32_t width, struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr, struct rte_flow_error *error)
@@ -5284,8 +5284,8 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	struct mlx5_sh_config *config = &priv->sh->config;
 	struct mlx5_hca_attr *hca_attr = &priv->sh->cdev->config.hca_attr;
 	const struct rte_flow_action_modify_field *conf = action->conf;
-	const struct rte_flow_action_modify_data *src_data = &conf->src;
-	const struct rte_flow_action_modify_data *dst_data = &conf->dst;
+	const struct rte_flow_field_data *src_data = &conf->src;
+	const struct rte_flow_field_data *dst_data = &conf->dst;
 	uint32_t dst_width, src_width, width = conf->width;
 
 	ret = flow_dv_validate_action_modify_hdr(action_flags, action, error);
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 1dded812ec..eb46cfe09e 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -3893,7 +3893,7 @@ struct rte_flow_action_ethdev {
 };
 
 /**
- * Field IDs for MODIFY_FIELD action.
+ * Packet header field IDs, used by RTE_FLOW_ACTION_TYPE_MODIFY_FIELD.
  */
 enum rte_flow_field_id {
 	RTE_FLOW_FIELD_START = 0,	/**< Start of a packet. */
@@ -3947,9 +3947,9 @@ enum rte_flow_field_id {
  * @warning
  * @b EXPERIMENTAL: this structure may change without prior notice
  *
- * Field description for MODIFY_FIELD action.
+ * Packet header field descriptions, used by RTE_FLOW_ACTION_TYPE_MODIFY_FIELD.
  */
-struct rte_flow_action_modify_data {
+struct rte_flow_field_data {
 	enum rte_flow_field_id field; /**< Field or memory type ID. */
 	union {
 		struct {
@@ -4058,8 +4058,8 @@ enum rte_flow_modify_op {
  */
 struct rte_flow_action_modify_field {
 	enum rte_flow_modify_op operation; /**< Operation to perform. */
-	struct rte_flow_action_modify_data dst; /**< Destination field. */
-	struct rte_flow_action_modify_data src; /**< Source field. */
+	struct rte_flow_field_data dst; /**< Destination field. */
+	struct rte_flow_field_data src; /**< Source field. */
 	uint32_t width; /**< Number of bits to use from a source field. */
 };
 
-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* [PATCH v7 0/4] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE
    2024-02-01 12:29  3% ` [PATCH v5 0/3] " Suanming Mou
  2024-02-02  0:42  3% ` [PATCH v6 " Suanming Mou
@ 2024-02-06  2:06  3% ` Suanming Mou
  2024-02-06  2:06  3%   ` [PATCH v7 1/4] ethdev: rename action modify field data structure Suanming Mou
  2024-02-06 21:24  0%   ` [PATCH v7 0/4] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Ferruh Yigit
  2 siblings, 2 replies; 200+ results
From: Suanming Mou @ 2024-02-06  2:06 UTC (permalink / raw)
  To: thomas, ferruh.yigit; +Cc: dev, orika

The new item type is added for the case where a user wants to match
traffic based on the result of comparing a packet field with other
fields or with an immediate value.

For example, taking advantage of the compare item, a user can
accumulate an IPv4/TCP packet's TCP data_offset and IPv4 IHL fields
into a tag register, then compare the tag register with the IPv4
header total length to determine whether the packet has a payload.

The supported operations can be as below:
 - RTE_FLOW_ITEM_COMPARE_EQ (equal)
 - RTE_FLOW_ITEM_COMPARE_NE (not equal)
 - RTE_FLOW_ITEM_COMPARE_LT (less than)
 - RTE_FLOW_ITEM_COMPARE_LE (less than or equal)
 - RTE_FLOW_ITEM_COMPARE_GT (greater than)
 - RTE_FLOW_ITEM_COMPARE_GE (greater than or equal)

V7:
 - Moved release notes to API.
 - Optimize comment descriptions.

V6:
 - fix typo and style issue.
 - adjust flow_field description.

V5:
 - rebase on top of next-net
 - add sample detail for rte_flow_field.

V4:
 - rebase on top of the latest version.
 - move ACTION_MODIFY_PATTERN_SIZE and modify_field_ids rename
   to first patch.
 - add comparison flow create sample in testpmd_funcs.rst.

V3:
 - fix code style missing empty line in rte_flow.rst.
 - fix missing the ABI change release notes.

V2:
 - Since modify field data struct is experiment, rename modify
   field data directly instead of adding new flow field struct.


Suanming Mou (4):
  ethdev: rename action modify field data structure
  ethdev: move flow field data structures
  ethdev: add compare item
  net/mlx5: add compare item support

 app/test-pmd/cmdline_flow.c                 | 416 +++++++++++++++++++-
 doc/guides/nics/features/default.ini        |   1 +
 doc/guides/nics/features/mlx5.ini           |   1 +
 doc/guides/nics/mlx5.rst                    |   7 +
 doc/guides/prog_guide/rte_flow.rst          |   9 +-
 doc/guides/rel_notes/release_24_03.rst      |  10 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  24 ++
 drivers/net/mlx5/mlx5_flow.c                |   4 +-
 drivers/net/mlx5/mlx5_flow.h                |   9 +-
 drivers/net/mlx5/mlx5_flow_dv.c             |  10 +-
 drivers/net/mlx5/mlx5_flow_hw.c             |  73 ++++
 lib/ethdev/rte_flow.c                       |   1 +
 lib/ethdev/rte_flow.h                       | 330 +++++++++-------
 13 files changed, 726 insertions(+), 169 deletions(-)

-- 
2.34.1


^ permalink raw reply	[relevance 3%]

* [PATCH v9 05/23] mbuf: replace term sanity check
  @ 2024-02-05 17:43  2%   ` Stephen Hemminger
  0 siblings, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-02-05 17:43 UTC (permalink / raw)
  To: dev
  Cc: Stephen Hemminger, Andrew Rybchenko, Morten Brørup,
	Steven Webster, Matt Peters

Replace rte_mbuf_sanity_check() with rte_mbuf_verify()
to match the similar macro RTE_VERIFY() in rte_debug.h

Good wording from a discussion on english.stackexchange.com:
  The phrase "sanity check" is ableist language as it implies that
  there is something wrong with people who have mental illnesses,
  and the word "sanity" has been used to discriminate against such people.
  Therefore, it should be avoided.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
Acked-by: Morten Brørup <mb@smartsharesystems.com>
---
 app/test/test_mbuf.c                 | 28 +++++------
 doc/guides/prog_guide/mbuf_lib.rst   |  4 +-
 doc/guides/rel_notes/deprecation.rst |  3 ++
 drivers/net/avp/avp_ethdev.c         | 18 +++----
 drivers/net/sfc/sfc_ef100_rx.c       |  6 +--
 drivers/net/sfc/sfc_ef10_essb_rx.c   |  4 +-
 drivers/net/sfc/sfc_ef10_rx.c        |  4 +-
 drivers/net/sfc/sfc_rx.c             |  2 +-
 examples/ipv4_multicast/main.c       |  2 +-
 lib/mbuf/rte_mbuf.c                  | 23 +++++----
 lib/mbuf/rte_mbuf.h                  | 71 +++++++++++++++-------------
 lib/mbuf/version.map                 |  1 +
 12 files changed, 90 insertions(+), 76 deletions(-)

diff --git a/app/test/test_mbuf.c b/app/test/test_mbuf.c
index d7393df7eb5d..261c6e5d71e9 100644
--- a/app/test/test_mbuf.c
+++ b/app/test/test_mbuf.c
@@ -261,8 +261,8 @@ test_one_pktmbuf(struct rte_mempool *pktmbuf_pool)
 		GOTO_FAIL("Buffer should be continuous");
 	memset(hdr, 0x55, MBUF_TEST_HDR2_LEN);
 
-	rte_mbuf_sanity_check(m, 1);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 1);
+	rte_mbuf_verify(m, 0);
 	rte_pktmbuf_dump(stdout, m, 0);
 
 	/* this prepend should fail */
@@ -1161,7 +1161,7 @@ test_refcnt_mbuf(void)
 
 #ifdef RTE_EXEC_ENV_WINDOWS
 static int
-test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
+test_failing_mbuf_verify(struct rte_mempool *pktmbuf_pool)
 {
 	RTE_SET_USED(pktmbuf_pool);
 	return TEST_SKIPPED;
@@ -1180,12 +1180,12 @@ mbuf_check_pass(struct rte_mbuf *buf)
 }
 
 static int
-test_failing_mbuf_sanity_check(struct rte_mempool *pktmbuf_pool)
+test_failing_mbuf_verify(struct rte_mempool *pktmbuf_pool)
 {
 	struct rte_mbuf *buf;
 	struct rte_mbuf badbuf;
 
-	printf("Checking rte_mbuf_sanity_check for failure conditions\n");
+	printf("Checking rte_mbuf_verify for failure conditions\n");
 
 	/* get a good mbuf to use to make copies */
 	buf = rte_pktmbuf_alloc(pktmbuf_pool);
@@ -1707,7 +1707,7 @@ test_mbuf_validate_tx_offload(const char *test_name,
 		GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 	m->ol_flags = ol_flags;
 	m->tso_segsz = segsize;
 	ret = rte_validate_tx_offload(m);
@@ -1914,7 +1914,7 @@ test_pktmbuf_read(struct rte_mempool *pktmbuf_pool)
 		GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 
 	data = rte_pktmbuf_append(m, MBUF_TEST_DATA_LEN2);
 	if (data == NULL)
@@ -1963,7 +1963,7 @@ test_pktmbuf_read_from_offset(struct rte_mempool *pktmbuf_pool)
 
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 
 	/* prepend an ethernet header */
 	hdr = (struct ether_hdr *)rte_pktmbuf_prepend(m, hdr_len);
@@ -2108,7 +2108,7 @@ create_packet(struct rte_mempool *pktmbuf_pool,
 			GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 		if (rte_pktmbuf_pkt_len(pkt_seg) != 0)
 			GOTO_FAIL("%s: Bad packet length\n", __func__);
-		rte_mbuf_sanity_check(pkt_seg, 0);
+		rte_mbuf_verify(pkt_seg, 0);
 		/* Add header only for the first segment */
 		if (test_data->flags == MBUF_HEADER && seg == 0) {
 			hdr_len = sizeof(struct rte_ether_hdr);
@@ -2320,7 +2320,7 @@ test_pktmbuf_ext_shinfo_init_helper(struct rte_mempool *pktmbuf_pool)
 		GOTO_FAIL("%s: mbuf allocation failed!\n", __func__);
 	if (rte_pktmbuf_pkt_len(m) != 0)
 		GOTO_FAIL("%s: Bad packet length\n", __func__);
-	rte_mbuf_sanity_check(m, 0);
+	rte_mbuf_verify(m, 0);
 
 	ext_buf_addr = rte_malloc("External buffer", buf_len,
 			RTE_CACHE_LINE_SIZE);
@@ -2484,8 +2484,8 @@ test_pktmbuf_ext_pinned_buffer(struct rte_mempool *std_pool)
 		GOTO_FAIL("%s: test_pktmbuf_copy(pinned) failed\n",
 			  __func__);
 
-	if (test_failing_mbuf_sanity_check(pinned_pool) < 0)
-		GOTO_FAIL("%s: test_failing_mbuf_sanity_check(pinned)"
+	if (test_failing_mbuf_verify(pinned_pool) < 0)
+		GOTO_FAIL("%s: test_failing_mbuf_verify(pinned)"
 			  " failed\n", __func__);
 
 	if (test_mbuf_linearize_check(pinned_pool) < 0)
@@ -2859,8 +2859,8 @@ test_mbuf(void)
 		goto err;
 	}
 
-	if (test_failing_mbuf_sanity_check(pktmbuf_pool) < 0) {
-		printf("test_failing_mbuf_sanity_check() failed\n");
+	if (test_failing_mbuf_verify(pktmbuf_pool) < 0) {
+		printf("test_failing_mbuf_verify() failed\n");
 		goto err;
 	}
 
diff --git a/doc/guides/prog_guide/mbuf_lib.rst b/doc/guides/prog_guide/mbuf_lib.rst
index 049357c75563..0accb51a98c7 100644
--- a/doc/guides/prog_guide/mbuf_lib.rst
+++ b/doc/guides/prog_guide/mbuf_lib.rst
@@ -266,8 +266,8 @@ can be found in several of the sample applications, for example, the IPv4 Multic
 Debug
 -----
 
-In debug mode, the functions of the mbuf library perform sanity checks before any operation (such as, buffer corruption,
-bad type, and so on).
+In debug mode, the functions of the mbuf library perform consistency checks
+before any operation (such as, buffer corruption, bad type, and so on).
 
 Use Cases
 ---------
diff --git a/doc/guides/rel_notes/deprecation.rst b/doc/guides/rel_notes/deprecation.rst
index 81b93515cbd9..1e1544b5b644 100644
--- a/doc/guides/rel_notes/deprecation.rst
+++ b/doc/guides/rel_notes/deprecation.rst
@@ -146,3 +146,6 @@ Deprecation Notices
   will be deprecated and subsequently removed in DPDK 24.11 release.
   Before this, the new port library API (functions rte_swx_port_*)
   will gradually transition from experimental to stable status.
+
+* mbuf: The function ``rte_mbuf_sanity_check`` is deprecated.
+  Use the new function ``rte_mbuf_verify`` instead.
diff --git a/drivers/net/avp/avp_ethdev.c b/drivers/net/avp/avp_ethdev.c
index 53d9e38c939b..ae76fad84948 100644
--- a/drivers/net/avp/avp_ethdev.c
+++ b/drivers/net/avp/avp_ethdev.c
@@ -1231,7 +1231,7 @@ _avp_mac_filter(struct avp_dev *avp, struct rte_mbuf *m)
 
 #ifdef RTE_LIBRTE_AVP_DEBUG_BUFFERS
 static inline void
-__avp_dev_buffer_sanity_check(struct avp_dev *avp, struct rte_avp_desc *buf)
+__avp_dev_buffer_check(struct avp_dev *avp, struct rte_avp_desc *buf)
 {
 	struct rte_avp_desc *first_buf;
 	struct rte_avp_desc *pkt_buf;
@@ -1272,12 +1272,12 @@ __avp_dev_buffer_sanity_check(struct avp_dev *avp, struct rte_avp_desc *buf)
 			  first_buf->pkt_len, pkt_len);
 }
 
-#define avp_dev_buffer_sanity_check(a, b) \
-	__avp_dev_buffer_sanity_check((a), (b))
+#define avp_dev_buffer_check(a, b) \
+	__avp_dev_buffer_check((a), (b))
 
 #else /* RTE_LIBRTE_AVP_DEBUG_BUFFERS */
 
-#define avp_dev_buffer_sanity_check(a, b) do {} while (0)
+#define avp_dev_buffer_check(a, b) do {} while (0)
 
 #endif
 
@@ -1302,7 +1302,7 @@ avp_dev_copy_from_buffers(struct avp_dev *avp,
 	void *pkt_data;
 	unsigned int i;
 
-	avp_dev_buffer_sanity_check(avp, buf);
+	avp_dev_buffer_check(avp, buf);
 
 	/* setup the first source buffer */
 	pkt_buf = avp_dev_translate_buffer(avp, buf);
@@ -1370,7 +1370,7 @@ avp_dev_copy_from_buffers(struct avp_dev *avp,
 	rte_pktmbuf_pkt_len(m) = total_length;
 	m->vlan_tci = vlan_tci;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	return m;
 }
@@ -1614,7 +1614,7 @@ avp_dev_copy_to_buffers(struct avp_dev *avp,
 	char *pkt_data;
 	unsigned int i;
 
-	__rte_mbuf_sanity_check(mbuf, 1);
+	__rte_mbuf_verify(mbuf, 1);
 
 	m = mbuf;
 	src_offset = 0;
@@ -1680,7 +1680,7 @@ avp_dev_copy_to_buffers(struct avp_dev *avp,
 		first_buf->vlan_tci = mbuf->vlan_tci;
 	}
 
-	avp_dev_buffer_sanity_check(avp, buffers[0]);
+	avp_dev_buffer_check(avp, buffers[0]);
 
 	return total_length;
 }
@@ -1798,7 +1798,7 @@ avp_xmit_scattered_pkts(void *tx_queue,
 
 #ifdef RTE_LIBRTE_AVP_DEBUG_BUFFERS
 	for (i = 0; i < nb_pkts; i++)
-		avp_dev_buffer_sanity_check(avp, tx_bufs[i]);
+		avp_dev_buffer_check(avp, tx_bufs[i]);
 #endif
 
 	/* send the packets */
diff --git a/drivers/net/sfc/sfc_ef100_rx.c b/drivers/net/sfc/sfc_ef100_rx.c
index 2677003da326..8199b56f2740 100644
--- a/drivers/net/sfc/sfc_ef100_rx.c
+++ b/drivers/net/sfc/sfc_ef100_rx.c
@@ -179,7 +179,7 @@ sfc_ef100_rx_qrefill(struct sfc_ef100_rxq *rxq)
 			struct sfc_ef100_rx_sw_desc *rxd;
 			rte_iova_t dma_addr;
 
-			__rte_mbuf_raw_sanity_check(m);
+			__rte_mbuf_raw_verify(m);
 
 			dma_addr = rte_mbuf_data_iova_default(m);
 			if (rxq->flags & SFC_EF100_RXQ_NIC_DMA_MAP) {
@@ -551,7 +551,7 @@ sfc_ef100_rx_process_ready_pkts(struct sfc_ef100_rxq *rxq,
 		rxq->ready_pkts--;
 
 		pkt = sfc_ef100_rx_next_mbuf(rxq);
-		__rte_mbuf_raw_sanity_check(pkt);
+		__rte_mbuf_raw_verify(pkt);
 
 		RTE_BUILD_BUG_ON(sizeof(pkt->rearm_data[0]) !=
 				 sizeof(rxq->rearm_data));
@@ -575,7 +575,7 @@ sfc_ef100_rx_process_ready_pkts(struct sfc_ef100_rxq *rxq,
 			struct rte_mbuf *seg;
 
 			seg = sfc_ef100_rx_next_mbuf(rxq);
-			__rte_mbuf_raw_sanity_check(seg);
+			__rte_mbuf_raw_verify(seg);
 
 			seg->data_off = RTE_PKTMBUF_HEADROOM;
 
diff --git a/drivers/net/sfc/sfc_ef10_essb_rx.c b/drivers/net/sfc/sfc_ef10_essb_rx.c
index 78bd430363b1..74647e2792b1 100644
--- a/drivers/net/sfc/sfc_ef10_essb_rx.c
+++ b/drivers/net/sfc/sfc_ef10_essb_rx.c
@@ -125,7 +125,7 @@ sfc_ef10_essb_next_mbuf(const struct sfc_ef10_essb_rxq *rxq,
 	struct rte_mbuf *m;
 
 	m = (struct rte_mbuf *)((uintptr_t)mbuf + rxq->buf_stride);
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	return m;
 }
 
@@ -136,7 +136,7 @@ sfc_ef10_essb_mbuf_by_index(const struct sfc_ef10_essb_rxq *rxq,
 	struct rte_mbuf *m;
 
 	m = (struct rte_mbuf *)((uintptr_t)mbuf + idx * rxq->buf_stride);
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	return m;
 }
 
diff --git a/drivers/net/sfc/sfc_ef10_rx.c b/drivers/net/sfc/sfc_ef10_rx.c
index 30a320d0791c..72b03b3bba7a 100644
--- a/drivers/net/sfc/sfc_ef10_rx.c
+++ b/drivers/net/sfc/sfc_ef10_rx.c
@@ -148,7 +148,7 @@ sfc_ef10_rx_qrefill(struct sfc_ef10_rxq *rxq)
 			struct sfc_ef10_rx_sw_desc *rxd;
 			rte_iova_t phys_addr;
 
-			__rte_mbuf_raw_sanity_check(m);
+			__rte_mbuf_raw_verify(m);
 
 			SFC_ASSERT((id & ~ptr_mask) == 0);
 			rxd = &rxq->sw_ring[id];
@@ -297,7 +297,7 @@ sfc_ef10_rx_process_event(struct sfc_ef10_rxq *rxq, efx_qword_t rx_ev,
 		rxd = &rxq->sw_ring[pending++ & ptr_mask];
 		m = rxd->mbuf;
 
-		__rte_mbuf_raw_sanity_check(m);
+		__rte_mbuf_raw_verify(m);
 
 		m->data_off = RTE_PKTMBUF_HEADROOM;
 		rte_pktmbuf_data_len(m) = seg_len;
diff --git a/drivers/net/sfc/sfc_rx.c b/drivers/net/sfc/sfc_rx.c
index 1dde2c111001..645c8643d1c1 100644
--- a/drivers/net/sfc/sfc_rx.c
+++ b/drivers/net/sfc/sfc_rx.c
@@ -120,7 +120,7 @@ sfc_efx_rx_qrefill(struct sfc_efx_rxq *rxq)
 		     ++i, id = (id + 1) & rxq->ptr_mask) {
 			m = objs[i];
 
-			__rte_mbuf_raw_sanity_check(m);
+			__rte_mbuf_raw_verify(m);
 
 			rxd = &rxq->sw_desc[id];
 			rxd->mbuf = m;
diff --git a/examples/ipv4_multicast/main.c b/examples/ipv4_multicast/main.c
index 6d0a8501eff5..f39658f4e249 100644
--- a/examples/ipv4_multicast/main.c
+++ b/examples/ipv4_multicast/main.c
@@ -258,7 +258,7 @@ mcast_out_pkt(struct rte_mbuf *pkt, int use_clone)
 	hdr->pkt_len = (uint16_t)(hdr->data_len + pkt->pkt_len);
 	hdr->nb_segs = pkt->nb_segs + 1;
 
-	__rte_mbuf_sanity_check(hdr, 1);
+	__rte_mbuf_verify(hdr, 1);
 	return hdr;
 }
 /* >8 End of mcast_out_kt. */
diff --git a/lib/mbuf/rte_mbuf.c b/lib/mbuf/rte_mbuf.c
index 559d5ad8a71c..fc5d4ba29db1 100644
--- a/lib/mbuf/rte_mbuf.c
+++ b/lib/mbuf/rte_mbuf.c
@@ -367,9 +367,9 @@ rte_pktmbuf_pool_create_extbuf(const char *name, unsigned int n,
 	return mp;
 }
 
-/* do some sanity checks on a mbuf: panic if it fails */
+/* do some checks on a mbuf: panic if a check fails */
 void
-rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
+rte_mbuf_verify(const struct rte_mbuf *m, int is_header)
 {
 	const char *reason;
 
@@ -377,6 +377,13 @@ rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
 		rte_panic("%s\n", reason);
 }
 
+/* For ABI compatibility, to be removed in the next release */
+void
+rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header)
+{
+	rte_mbuf_verify(m, is_header);
+}
+
 int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
 		   const char **reason)
 {
@@ -496,7 +503,7 @@ void rte_pktmbuf_free_bulk(struct rte_mbuf **mbufs, unsigned int count)
 		if (unlikely(m == NULL))
 			continue;
 
-		__rte_mbuf_sanity_check(m, 1);
+		__rte_mbuf_verify(m, 1);
 
 		do {
 			m_next = m->next;
@@ -546,7 +553,7 @@ rte_pktmbuf_clone(struct rte_mbuf *md, struct rte_mempool *mp)
 		return NULL;
 	}
 
-	__rte_mbuf_sanity_check(mc, 1);
+	__rte_mbuf_verify(mc, 1);
 	return mc;
 }
 
@@ -596,7 +603,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
 	struct rte_mbuf *mc, *m_last, **prev;
 
 	/* garbage in check */
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	/* check for request to copy at offset past end of mbuf */
 	if (unlikely(off >= m->pkt_len))
@@ -660,7 +667,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
 	}
 
 	/* garbage out check */
-	__rte_mbuf_sanity_check(mc, 1);
+	__rte_mbuf_verify(mc, 1);
 	return mc;
 }
 
@@ -671,7 +678,7 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
 	unsigned int len;
 	unsigned int nb_segs;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	fprintf(f, "dump mbuf at %p, iova=%#" PRIx64 ", buf_len=%u\n", m, rte_mbuf_iova_get(m),
 		m->buf_len);
@@ -689,7 +696,7 @@ rte_pktmbuf_dump(FILE *f, const struct rte_mbuf *m, unsigned dump_len)
 	nb_segs = m->nb_segs;
 
 	while (m && nb_segs != 0) {
-		__rte_mbuf_sanity_check(m, 0);
+		__rte_mbuf_verify(m, 0);
 
 		fprintf(f, "  segment at %p, data=%p, len=%u, off=%u, refcnt=%u\n",
 			m, rte_pktmbuf_mtod(m, void *),
diff --git a/lib/mbuf/rte_mbuf.h b/lib/mbuf/rte_mbuf.h
index 286b32b788a5..380663a0893b 100644
--- a/lib/mbuf/rte_mbuf.h
+++ b/lib/mbuf/rte_mbuf.h
@@ -339,13 +339,13 @@ rte_pktmbuf_priv_flags(struct rte_mempool *mp)
 
 #ifdef RTE_LIBRTE_MBUF_DEBUG
 
-/**  check mbuf type in debug mode */
-#define __rte_mbuf_sanity_check(m, is_h) rte_mbuf_sanity_check(m, is_h)
+/**  check mbuf consistency in debug mode */
+#define __rte_mbuf_verify(m, is_h) rte_mbuf_verify(m, is_h)
 
 #else /*  RTE_LIBRTE_MBUF_DEBUG */
 
-/**  check mbuf type in debug mode */
-#define __rte_mbuf_sanity_check(m, is_h) do { } while (0)
+/**  ignore mbuf checks if not in debug mode */
+#define __rte_mbuf_verify(m, is_h) do { } while (0)
 
 #endif /*  RTE_LIBRTE_MBUF_DEBUG */
 
@@ -514,10 +514,9 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
 
 
 /**
- * Sanity checks on an mbuf.
+ * Check that the mbuf is valid and panic if corrupted.
  *
- * Check the consistency of the given mbuf. The function will cause a
- * panic if corruption is detected.
+ * Acts as an assertion that the mbuf is consistent. If not, it calls rte_panic().
  *
  * @param m
  *   The mbuf to be checked.
@@ -526,13 +525,17 @@ rte_mbuf_ext_refcnt_update(struct rte_mbuf_ext_shared_info *shinfo,
  *   of a packet (in this case, some fields like nb_segs are not checked)
  */
 void
+rte_mbuf_verify(const struct rte_mbuf *m, int is_header);
+
+/* Older deprecated name for rte_mbuf_verify() */
+void __rte_deprecated
 rte_mbuf_sanity_check(const struct rte_mbuf *m, int is_header);
 
 /**
- * Sanity checks on a mbuf.
+ * Do consistency checks on a mbuf.
  *
- * Almost like rte_mbuf_sanity_check(), but this function gives the reason
- * if corruption is detected rather than panic.
+ * Check the consistency of the given mbuf and, if it is not valid,
+ * return the reason.
  *
  * @param m
  *   The mbuf to be checked.
@@ -551,7 +554,7 @@ int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
 		   const char **reason);
 
 /**
- * Sanity checks on a reinitialized mbuf in debug mode.
+ * Do checks on a reinitialized mbuf in debug mode.
  *
  * Check the consistency of the given reinitialized mbuf.
  * The function will cause a panic if corruption is detected.
@@ -563,16 +566,16 @@ int rte_mbuf_check(const struct rte_mbuf *m, int is_header,
  *   The mbuf to be checked.
  */
 static __rte_always_inline void
-__rte_mbuf_raw_sanity_check(__rte_unused const struct rte_mbuf *m)
+__rte_mbuf_raw_verify(__rte_unused const struct rte_mbuf *m)
 {
 	RTE_ASSERT(rte_mbuf_refcnt_read(m) == 1);
 	RTE_ASSERT(m->next == NULL);
 	RTE_ASSERT(m->nb_segs == 1);
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 }
 
 /** For backwards compatibility. */
-#define MBUF_RAW_ALLOC_CHECK(m) __rte_mbuf_raw_sanity_check(m)
+#define MBUF_RAW_ALLOC_CHECK(m) __rte_mbuf_raw_verify(m)
 
 /**
  * Allocate an uninitialized mbuf from mempool *mp*.
@@ -599,7 +602,7 @@ static inline struct rte_mbuf *rte_mbuf_raw_alloc(struct rte_mempool *mp)
 
 	if (rte_mempool_get(mp, (void **)&m) < 0)
 		return NULL;
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	return m;
 }
 
@@ -622,7 +625,7 @@ rte_mbuf_raw_free(struct rte_mbuf *m)
 {
 	RTE_ASSERT(!RTE_MBUF_CLONED(m) &&
 		  (!RTE_MBUF_HAS_EXTBUF(m) || RTE_MBUF_HAS_PINNED_EXTBUF(m)));
-	__rte_mbuf_raw_sanity_check(m);
+	__rte_mbuf_raw_verify(m);
 	rte_mempool_put(m->pool, m);
 }
 
@@ -885,7 +888,7 @@ static inline void rte_pktmbuf_reset(struct rte_mbuf *m)
 	rte_pktmbuf_reset_headroom(m);
 
 	m->data_len = 0;
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 }
 
 /**
@@ -941,22 +944,22 @@ static inline int rte_pktmbuf_alloc_bulk(struct rte_mempool *pool,
 	switch (count % 4) {
 	case 0:
 		while (idx != count) {
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
 	case 3:
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
 	case 2:
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
 	case 1:
-			__rte_mbuf_raw_sanity_check(mbufs[idx]);
+			__rte_mbuf_raw_verify(mbufs[idx]);
 			rte_pktmbuf_reset(mbufs[idx]);
 			idx++;
 			/* fall-through */
@@ -1184,8 +1187,8 @@ static inline void rte_pktmbuf_attach(struct rte_mbuf *mi, struct rte_mbuf *m)
 	mi->pkt_len = mi->data_len;
 	mi->nb_segs = 1;
 
-	__rte_mbuf_sanity_check(mi, 1);
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(mi, 1);
+	__rte_mbuf_verify(m, 0);
 }
 
 /**
@@ -1340,7 +1343,7 @@ static inline int __rte_pktmbuf_pinned_extbuf_decref(struct rte_mbuf *m)
 static __rte_always_inline struct rte_mbuf *
 rte_pktmbuf_prefree_seg(struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 
 	if (likely(rte_mbuf_refcnt_read(m) == 1)) {
 
@@ -1411,7 +1414,7 @@ static inline void rte_pktmbuf_free(struct rte_mbuf *m)
 	struct rte_mbuf *m_next;
 
 	if (m != NULL)
-		__rte_mbuf_sanity_check(m, 1);
+		__rte_mbuf_verify(m, 1);
 
 	while (m != NULL) {
 		m_next = m->next;
@@ -1492,7 +1495,7 @@ rte_pktmbuf_copy(const struct rte_mbuf *m, struct rte_mempool *mp,
  */
 static inline void rte_pktmbuf_refcnt_update(struct rte_mbuf *m, int16_t v)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	do {
 		rte_mbuf_refcnt_update(m, v);
@@ -1509,7 +1512,7 @@ static inline void rte_pktmbuf_refcnt_update(struct rte_mbuf *m, int16_t v)
  */
 static inline uint16_t rte_pktmbuf_headroom(const struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 	return m->data_off;
 }
 
@@ -1523,7 +1526,7 @@ static inline uint16_t rte_pktmbuf_headroom(const struct rte_mbuf *m)
  */
 static inline uint16_t rte_pktmbuf_tailroom(const struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 0);
+	__rte_mbuf_verify(m, 0);
 	return (uint16_t)(m->buf_len - rte_pktmbuf_headroom(m) -
 			  m->data_len);
 }
@@ -1538,7 +1541,7 @@ static inline uint16_t rte_pktmbuf_tailroom(const struct rte_mbuf *m)
  */
 static inline struct rte_mbuf *rte_pktmbuf_lastseg(struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 	while (m->next != NULL)
 		m = m->next;
 	return m;
@@ -1582,7 +1585,7 @@ static inline struct rte_mbuf *rte_pktmbuf_lastseg(struct rte_mbuf *m)
 static inline char *rte_pktmbuf_prepend(struct rte_mbuf *m,
 					uint16_t len)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	if (unlikely(len > rte_pktmbuf_headroom(m)))
 		return NULL;
@@ -1617,7 +1620,7 @@ static inline char *rte_pktmbuf_append(struct rte_mbuf *m, uint16_t len)
 	void *tail;
 	struct rte_mbuf *m_last;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	m_last = rte_pktmbuf_lastseg(m);
 	if (unlikely(len > rte_pktmbuf_tailroom(m_last)))
@@ -1645,7 +1648,7 @@ static inline char *rte_pktmbuf_append(struct rte_mbuf *m, uint16_t len)
  */
 static inline char *rte_pktmbuf_adj(struct rte_mbuf *m, uint16_t len)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	if (unlikely(len > m->data_len))
 		return NULL;
@@ -1677,7 +1680,7 @@ static inline int rte_pktmbuf_trim(struct rte_mbuf *m, uint16_t len)
 {
 	struct rte_mbuf *m_last;
 
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 
 	m_last = rte_pktmbuf_lastseg(m);
 	if (unlikely(len > m_last->data_len))
@@ -1699,7 +1702,7 @@ static inline int rte_pktmbuf_trim(struct rte_mbuf *m, uint16_t len)
  */
 static inline int rte_pktmbuf_is_contiguous(const struct rte_mbuf *m)
 {
-	__rte_mbuf_sanity_check(m, 1);
+	__rte_mbuf_verify(m, 1);
 	return m->nb_segs == 1;
 }
 
diff --git a/lib/mbuf/version.map b/lib/mbuf/version.map
index daa65e2bbdb2..c85370e430b2 100644
--- a/lib/mbuf/version.map
+++ b/lib/mbuf/version.map
@@ -31,6 +31,7 @@ DPDK_24 {
 	rte_mbuf_set_platform_mempool_ops;
 	rte_mbuf_set_user_mempool_ops;
 	rte_mbuf_user_mempool_ops;
+	rte_mbuf_verify;
 	rte_pktmbuf_clone;
 	rte_pktmbuf_copy;
 	rte_pktmbuf_dump;
-- 
2.43.0


^ permalink raw reply	[relevance 2%]

* RE: [PATCH v6 1/3] ethdev: rename action modify field data structure
  2024-02-05 11:23  0%     ` Thomas Monjalon
@ 2024-02-05 11:49  0%       ` Suanming Mou
  0 siblings, 0 replies; 200+ results
From: Suanming Mou @ 2024-02-05 11:49 UTC (permalink / raw)
  To: NBU-Contact-Thomas Monjalon (EXTERNAL)
  Cc: ferruh.yigit, Ori Kam, Aman Singh, Yuying Zhang,
	Dariusz Sosnowski, Slava Ovsiienko, Matan Azrad,
	Andrew Rybchenko, dev

Hi Thomas,

> -----Original Message-----
> From: Thomas Monjalon <thomas@monjalon.net>
> Sent: Monday, February 5, 2024 7:23 PM
> To: Suanming Mou <suanmingm@nvidia.com>
> Cc: ferruh.yigit@amd.com; Ori Kam <orika@nvidia.com>; Aman Singh
> <aman.deep.singh@intel.com>; Yuying Zhang <yuying.zhang@intel.com>; Dariusz
> Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Matan Azrad <matan@nvidia.com>; Andrew
> Rybchenko <andrew.rybchenko@oktetlabs.ru>; dev@dpdk.org
> Subject: Re: [PATCH v6 1/3] ethdev: rename action modify field data structure
> 
> 02/02/2024 01:42, Suanming Mou:
> > --- a/doc/guides/rel_notes/release_24_03.rst
> > +++ b/doc/guides/rel_notes/release_24_03.rst
> > @@ -124,6 +124,8 @@ ABI Changes
> >
> >  * No ABI change that would break compatibility with 23.11.
> >
> > +* ethdev: Rename the experimental ``struct
> > +rte_flow_action_modify_data`` to be ``struct rte_flow_field_data``
> 
> It should be in API change section.
> Please use past tense as recommended in comments in the file.
OK.

> 
> > --- a/lib/ethdev/rte_flow.h
> > +++ b/lib/ethdev/rte_flow.h
> > @@ -3894,6 +3894,7 @@ struct rte_flow_action_ethdev {
> >
> >  /**
> >   * Field IDs for MODIFY_FIELD action.
> > + * e.g. the packet field IDs used in RTE_FLOW_ACTION_TYPE_MODIFY_FIELD.
> 
> Better to give the full name in the first line, so no need to add a second line of
> comment.

So maybe " Field IDs for packet field, used by RTE_FLOW_ACTION_TYPE_MODIFY_FIELD."?
But when the COMPARE item is added, it will be " Field IDs for packet field, used by RTE_FLOW_ACTION_TYPE_MODIFY_FIELD and RTE_FLOW_ITEM_TYPE_COMPARE."  And I assume that will still need a second line since it is too long.

> 
> [...]
> > - * Field description for MODIFY_FIELD action.
> > + * Field description for packet field.
> > + * e.g. the packet fields used in RTE_FLOW_ACTION_TYPE_MODIFY_FIELD.
> 
> Same here, can be one simple line with full name.
> 
> 


^ permalink raw reply	[relevance 0%]

* Re: [PATCH v6 1/3] ethdev: rename action modify field data structure
  2024-02-02  0:42  7%   ` [PATCH v6 1/3] ethdev: rename action modify field data structure Suanming Mou
@ 2024-02-05 11:23  0%     ` Thomas Monjalon
  2024-02-05 11:49  0%       ` Suanming Mou
  0 siblings, 1 reply; 200+ results
From: Thomas Monjalon @ 2024-02-05 11:23 UTC (permalink / raw)
  To: Suanming Mou
  Cc: ferruh.yigit, Ori Kam, Aman Singh, Yuying Zhang,
	Dariusz Sosnowski, Viacheslav Ovsiienko, Matan Azrad,
	Andrew Rybchenko, dev

02/02/2024 01:42, Suanming Mou:
> --- a/doc/guides/rel_notes/release_24_03.rst
> +++ b/doc/guides/rel_notes/release_24_03.rst
> @@ -124,6 +124,8 @@ ABI Changes
>  
>  * No ABI change that would break compatibility with 23.11.
>  
> +* ethdev: Rename the experimental ``struct rte_flow_action_modify_data`` to be ``struct rte_flow_field_data``

It should be in API change section.
Please use past tense as recommended in comments in the file.

> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -3894,6 +3894,7 @@ struct rte_flow_action_ethdev {
>  
>  /**
>   * Field IDs for MODIFY_FIELD action.
> + * e.g. the packet field IDs used in RTE_FLOW_ACTION_TYPE_MODIFY_FIELD.

Better to give the full name in the first line,
so no need to add a second line of comment.

[...]
> - * Field description for MODIFY_FIELD action.
> + * Field description for packet field.
> + * e.g. the packet fields used in RTE_FLOW_ACTION_TYPE_MODIFY_FIELD.

Same here, can be one simple line with full name.




^ permalink raw reply	[relevance 0%]

* [PATCH v2 1/7] ethdev: support report register names and filter
  @ 2024-02-05 10:51  8%   ` Jie Hai
  2024-02-07 17:00  3%     ` Ferruh Yigit
  0 siblings, 1 reply; 200+ results
From: Jie Hai @ 2024-02-05 10:51 UTC (permalink / raw)
  To: dev; +Cc: lihuisong, fengchengwen, liuyonglong, huangdengdui, ferruh.yigit

This patch adds "filter" and "names" fields to "rte_dev_reg_info"
structure. Names of registers in data fields can be reported and
the registers can be filtered by their names.

For compatibility, the original API rte_eth_dev_get_reg_info()
does not use the name and filter fields. The new API
rte_eth_dev_get_reg_info_ext() is added to support reporting
names and filtering by names. If the drivers does not report
the names, set them to "offset_XXX".

Signed-off-by: Jie Hai <haijie1@huawei.com>
---
 doc/guides/rel_notes/release_24_03.rst |  8 ++++++
 lib/ethdev/rte_dev_info.h              | 11 ++++++++
 lib/ethdev/rte_ethdev.c                | 36 ++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 22 ++++++++++++++++
 4 files changed, 77 insertions(+)

diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 84d3144215c6..5d402341223a 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -75,6 +75,11 @@ New Features
   * Added support for Atomic Rules' TK242 packet-capture family of devices
     with PCI IDs: ``0x1024, 0x1025, 0x1026``.
 
+* **Added support for dumping registers with names and filtering.**
+
+  * Added new API function ``rte_eth_dev_get_reg_info_ext()`` to filter
+    the registers by their names and get register information (names,
+    values and other attributes).
 
 Removed Items
 -------------
@@ -124,6 +129,9 @@ ABI Changes
 
 * No ABI change that would break compatibility with 23.11.
 
+* ethdev: Added ``filter`` and ``names`` fields to ``rte_dev_reg_info``
+  structure for reporting names of registers and filtering them by name.
+
 
 Known Issues
 ------------
diff --git a/lib/ethdev/rte_dev_info.h b/lib/ethdev/rte_dev_info.h
index 67cf0ae52668..2f4541bd46c8 100644
--- a/lib/ethdev/rte_dev_info.h
+++ b/lib/ethdev/rte_dev_info.h
@@ -11,6 +11,11 @@ extern "C" {
 
 #include <stdint.h>
 
+#define RTE_ETH_REG_NAME_SIZE 128
+struct rte_eth_reg_name {
+	char name[RTE_ETH_REG_NAME_SIZE];
+};
+
 /*
  * Placeholder for accessing device registers
  */
@@ -20,6 +25,12 @@ struct rte_dev_reg_info {
 	uint32_t length; /**< Number of registers to fetch */
 	uint32_t width; /**< Size of device register */
 	uint32_t version; /**< Device version */
+	/**
+	 * Filter for target subset of registers.
+	 * This field can affect register selection for data/length/names.
+	 */
+	char *filter;
+	struct rte_eth_reg_name *names; /**< Buffer for register names */
 };
 
 /*
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index f1c658f49e80..3e0294e49092 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -6388,8 +6388,39 @@ rte_eth_read_clock(uint16_t port_id, uint64_t *clock)
 
 int
 rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
+{
+	struct rte_dev_reg_info reg_info;
+	int ret;
+
+	if (info == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Cannot get ethdev port %u register info to NULL",
+			port_id);
+		return -EINVAL;
+	}
+
+	reg_info.length = info->length;
+	reg_info.data = info->data;
+	reg_info.names = NULL;
+	reg_info.filter = NULL;
+
+	ret = rte_eth_dev_get_reg_info_ext(port_id, &reg_info);
+	if (ret != 0)
+		return ret;
+
+	info->length = reg_info.length;
+	info->width = reg_info.width;
+	info->version = reg_info.version;
+	info->offset = reg_info.offset;
+
+	return 0;
+}
+
+int
+rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info)
 {
 	struct rte_eth_dev *dev;
+	uint32_t i;
 	int ret;
 
 	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
@@ -6408,6 +6439,11 @@ rte_eth_dev_get_reg_info(uint16_t port_id, struct rte_dev_reg_info *info)
 
 	rte_ethdev_trace_get_reg_info(port_id, info, ret);
 
+	/* Report default names if the driver does not provide them. */
+	if (info->names != NULL && strlen(info->names[0].name) == 0)
+		for (i = 0; i < info->length; i++)
+			sprintf(info->names[i].name, "offset_%x",
+				info->offset + i * info->width);
 	return ret;
 }
 
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 2687c23fa6fb..3abc2ad3f865 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -5053,6 +5053,28 @@ __rte_experimental
 int rte_eth_get_monitor_addr(uint16_t port_id, uint16_t queue_id,
 		struct rte_power_monitor_cond *pmc);
 
+/**
+ * Retrieve the filtered device registers (values and names) and
+ * register attributes (number of registers and register size)
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param info
+ *   Pointer to rte_dev_reg_info structure to fill in. If info->data is
+ *   NULL, the function fills in the width and length fields. If non-NULL,
+ *   the values of registers whose name contains the filter string are put
+ *   into the buffer pointed to by the data field. The same is done for the
+ *   names of those registers if info->names is not NULL.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support.
+ *   - (-EINVAL) if bad parameter.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - others depends on the specific operations implementation.
+ */
+int rte_eth_dev_get_reg_info_ext(uint16_t port_id, struct rte_dev_reg_info *info);
+
 /**
  * Retrieve device registers and register attributes (number of registers and
  * register size)
-- 
2.30.0


^ permalink raw reply	[relevance 8%]

* [PATCH v7 00/19] Replace uses of RTE_LOGTYPE_PMD
    2024-02-03  4:10  3% ` [PATCH v7 00/19] Replace use of PMD logtype Stephen Hemminger
@ 2024-02-03  4:11  3% ` Stephen Hemminger
  1 sibling, 0 replies; 200+ results
From: Stephen Hemminger @ 2024-02-03  4:11 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Many of the uses of PMD logtype have already been fixed.
But there are still some leftovers, mostly places where
drivers had a logtype but did not use them.

Note: this is not an ABI break, but could break out of
      tree drivers that never updated to use dynamic logtype.
      DPDK never guaranteed that that would not happen.

v7 - drop changes to newlines
     drop changes related to RTE_LOG_DP
     rebase now that other stuff has changed

Stephen Hemminger (19):
  common/sfc_efx: remove use of PMD logtype
  mempool/dpaa2: use driver logtype not PMD
  net/dpaa: use dedicated logtype not PMD
  net/dpaa2: used dedicated logtype not PMD
  net/mrvl: do not use PMD logtype
  net/mvpp2: use dedicated logtype
  net/nfb: use dynamic logtype
  net/vmxnet3: used dedicated logtype not PMD
  raw/cnxk: replace PMD logtype with dynamic type
  crypto/scheduler: replace use of logtype PMD
  crypto/armv8: do not use PMD logtype
  crypto/caam_jr: use dedicated logtype
  crypto/ccp: do not use PMD logtype
  crypto/dpaa_sec, crypto/dpaa2_sec: use dedicated logtype
  event/dpaa, event/dpaa2: use dedicated logtype
  event/dlb2: use dedicated logtype
  event/skeleton: replace logtype PMD with dynamic type
  examples/fips_validation: replace use of PMD logtype
  log: remove PMD log type

 drivers/common/cnxk/roc_platform.h            | 16 ++++---
 drivers/common/sfc_efx/sfc_efx.c              | 11 +----
 drivers/common/sfc_efx/sfc_efx_log.h          |  2 +-
 drivers/crypto/armv8/rte_armv8_pmd.c          |  4 +-
 drivers/crypto/caam_jr/caam_jr.c              |  5 +--
 drivers/crypto/ccp/rte_ccp_pmd.c              | 11 +++--
 drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c   |  6 +--
 drivers/crypto/dpaa_sec/dpaa_sec.c            | 30 ++++++-------
 drivers/crypto/scheduler/scheduler_pmd.c      |  4 +-
 drivers/event/dlb2/dlb2.c                     |  5 +--
 drivers/event/dpaa/dpaa_eventdev.c            |  2 +-
 drivers/event/dpaa2/dpaa2_eventdev.c          |  4 +-
 drivers/event/dpaa2/dpaa2_eventdev_selftest.c |  6 +--
 drivers/event/skeleton/skeleton_eventdev.c    |  4 +-
 drivers/event/skeleton/skeleton_eventdev.h    |  8 +++-
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c      |  4 +-
 drivers/net/dpaa/dpaa_ethdev.c                |  6 +--
 drivers/net/dpaa2/dpaa2_ethdev.c              |  2 +-
 drivers/net/dpaa2/dpaa2_sparser.c             |  4 +-
 drivers/net/mvpp2/mrvl_ethdev.c               |  7 ++-
 drivers/net/nfb/nfb.h                         |  5 +++
 drivers/net/nfb/nfb_ethdev.c                  | 20 ++++-----
 drivers/net/nfb/nfb_rx.c                      | 10 ++---
 drivers/net/nfb/nfb_rx.h                      |  2 +-
 drivers/net/nfb/nfb_tx.c                      | 10 ++---
 drivers/net/nfb/nfb_tx.h                      |  2 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c          |  2 +-
 drivers/raw/cnxk_bphy/cnxk_bphy.c             |  3 +-
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |  2 +-
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx_test.c    | 31 +++++++------
 drivers/raw/cnxk_bphy/rte_pmd_bphy.h          |  6 +++
 drivers/raw/cnxk_gpio/cnxk_gpio.c             | 21 ++++-----
 drivers/raw/cnxk_gpio/cnxk_gpio.h             |  5 +++
 drivers/raw/cnxk_gpio/cnxk_gpio_selftest.c    | 17 ++++---
 examples/fips_validation/fips_dev_self_test.c | 44 +++++++++----------
 lib/log/log.c                                 |  2 +-
 lib/log/rte_log.h                             |  2 +-
 37 files changed, 166 insertions(+), 159 deletions(-)

-- 
2.43.0


^ permalink raw reply	[relevance 3%]

* [PATCH v7 00/19] Replace use of PMD logtype
  @ 2024-02-03  4:10  3% ` Stephen Hemminger
  2024-02-12 14:45  0%   ` David Marchand
  2024-02-03  4:11  3% ` [PATCH v7 00/19] Replace uses of RTE_LOGTYPE_PMD Stephen Hemminger
  1 sibling, 1 reply; 200+ results
From: Stephen Hemminger @ 2024-02-03  4:10 UTC (permalink / raw)
  To: dev; +Cc: Stephen Hemminger

Many of the uses of PMD logtype have already been fixed.
But there are still some leftovers, mostly places where
drivers had a logtype but did not use them.

Note: this is not an ABI break, but could break out of
      tree drivers that never updated to use dynamic logtype.
      DPDK never guaranteed that that would not happen.

v7 - drop changes to newlines
     drop changes related to RTE_LOG_DP
     rebase now that other stuff has changed

Stephen Hemminger (19):
  common/sfc_efx: remove use of PMD logtype
  mempool/dpaa2: use driver logtype not PMD
  net/dpaa: use dedicated logtype not PMD
  net/dpaa2: used dedicated logtype not PMD
  net/mrvl: do not use PMD logtype
  net/mvpp2: use dedicated logtype
  net/nfb: use dynamic logtype
  net/vmxnet3: used dedicated logtype not PMD
  raw/cnxk: replace PMD logtype with dynamic type
  crypto/scheduler: replace use of logtype PMD
  crypto/armv8: do not use PMD logtype
  crypto/caam_jr: use dedicated logtype
  crypto/ccp: do not use PMD logtype
  crypto/dpaa_sec, crypto/dpaa2_sec: use dedicated logtype
  event/dpaa, event/dpaa2: use dedicated logtype
  event/dlb2: use dedicated logtype
  event/skeleton: replace logtype PMD with dynamic type
  examples/fips_validation: replace use of PMD logtype
  log: remove PMD log type

 drivers/common/cnxk/roc_platform.h            | 16 ++++---
 drivers/common/sfc_efx/sfc_efx.c              | 11 +----
 drivers/common/sfc_efx/sfc_efx_log.h          |  2 +-
 drivers/crypto/armv8/rte_armv8_pmd.c          |  4 +-
 drivers/crypto/caam_jr/caam_jr.c              |  5 +--
 drivers/crypto/ccp/rte_ccp_pmd.c              | 11 +++--
 drivers/crypto/dpaa2_sec/dpaa2_sec_dpseci.c   |  6 +--
 drivers/crypto/dpaa_sec/dpaa_sec.c            | 30 ++++++-------
 drivers/crypto/scheduler/scheduler_pmd.c      |  4 +-
 drivers/event/dlb2/dlb2.c                     |  5 +--
 drivers/event/dpaa/dpaa_eventdev.c            |  2 +-
 drivers/event/dpaa2/dpaa2_eventdev.c          |  4 +-
 drivers/event/dpaa2/dpaa2_eventdev_selftest.c |  6 +--
 drivers/event/skeleton/skeleton_eventdev.c    |  4 +-
 drivers/event/skeleton/skeleton_eventdev.h    |  8 +++-
 drivers/mempool/dpaa2/dpaa2_hw_mempool.c      |  4 +-
 drivers/net/dpaa/dpaa_ethdev.c                |  6 +--
 drivers/net/dpaa2/dpaa2_ethdev.c              |  2 +-
 drivers/net/dpaa2/dpaa2_sparser.c             |  4 +-
 drivers/net/mvpp2/mrvl_ethdev.c               |  7 ++-
 drivers/net/nfb/nfb.h                         |  5 +++
 drivers/net/nfb/nfb_ethdev.c                  | 20 ++++-----
 drivers/net/nfb/nfb_rx.c                      | 10 ++---
 drivers/net/nfb/nfb_rx.h                      |  2 +-
 drivers/net/nfb/nfb_tx.c                      | 10 ++---
 drivers/net/nfb/nfb_tx.h                      |  2 +-
 drivers/net/vmxnet3/vmxnet3_ethdev.c          |  2 +-
 drivers/raw/cnxk_bphy/cnxk_bphy.c             |  3 +-
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx.c         |  2 +-
 drivers/raw/cnxk_bphy/cnxk_bphy_cgx_test.c    | 31 +++++++------
 drivers/raw/cnxk_bphy/rte_pmd_bphy.h          |  6 +++
 drivers/raw/cnxk_gpio/cnxk_gpio.c             | 21 ++++-----
 drivers/raw/cnxk_gpio/cnxk_gpio.h             |  5 +++
 drivers/raw/cnxk_gpio/cnxk_gpio_selftest.c    | 17 ++++---
 examples/fips_validation/fips_dev_self_test.c | 44 +++++++++----------
 lib/log/log.c                                 |  2 +-
 lib/log/rte_log.h                             |  2 +-
 37 files changed, 166 insertions(+), 159 deletions(-)

-- 
2.43.0


^ permalink raw reply	[relevance 3%]

* Re: [PATCH v2 11/11] eventdev: RFC clarify docs on event object fields
  2024-02-01 16:59  0%       ` Bruce Richardson
@ 2024-02-02  9:38  0%         ` Mattias Rönnblom
  0 siblings, 0 replies; 200+ results
From: Mattias Rönnblom @ 2024-02-02  9:38 UTC (permalink / raw)
  To: Bruce Richardson
  Cc: dev, jerinj, mattias.ronnblom, abdullah.sevincer, sachin.saxena,
	hemant.agrawal, pbhagavatula, pravin.pathak

On 2024-02-01 17:59, Bruce Richardson wrote:
> On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
>> On 2024-01-19 18:43, Bruce Richardson wrote:
>>> Clarify the meaning of the NEW, FORWARD and RELEASE event types.
>>> For the fields in "rte_event" struct, enhance the comments on each to
>>> clarify the field's use, and whether it is preserved between enqueue and
>>> dequeue, and it's role, if any, in scheduling.
>>>
>>> Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
>>> ---
>>>
>>> As with the previous patch, please review this patch to ensure that the
>>> expected semantics of the various event types and event fields have not
>>> changed in an unexpected way.
>>> ---
>>>    lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
>>>    1 file changed, 77 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
>>> index cb13602ffb..4eff1c4958 100644
>>> --- a/lib/eventdev/rte_eventdev.h
>>> +++ b/lib/eventdev/rte_eventdev.h
>>> @@ -1416,21 +1416,25 @@ struct rte_event_vector {
>>>
>>>    /* Event enqueue operations */
>>>    #define RTE_EVENT_OP_NEW                0
>>> -/**< The event producers use this operation to inject a new event to the
>>> +/**< The @ref rte_event.op field should be set to this type to inject a new event to the
>>>     * event device.
>>>     */
>>
>> "type" -> "value"
>>
>> "to" -> "into"?
>>
>> You could also say "to mark the event as new".
>>
>> What is new? Maybe "new (as opposed to a forwarded) event." or "new (i.e.,
>> not previously dequeued).".
>>
> 
> Using this latter suggested wording in V3.
> 
>> "The application sets the @ref rte_event.op field of an enqueued event to
>> this value to mark the event as new (i.e., not previously dequeued)."
>>
>>>    #define RTE_EVENT_OP_FORWARD            1
>>> -/**< The CPU use this operation to forward the event to different event queue or
>>> - * change to new application specific flow or schedule type to enable
>>> - * pipelining.
>>> +/**< SW should set the @ref rte_event.op filed to this type to return a
>>> + * previously dequeued event to the event device for further processing.
>>
>> "filed" -> "field"
>>
>> "SW" -> "The application"
>>
>> "to be scheduled for further processing (or transmission)"
>>
>> The wording should otherwise be the same as NEW, whatever you choose there.
>>
> Ack.
> 
>>>     *
>>> - * This operation must only be enqueued to the same port that the
>>> + * This event *must* be enqueued to the same port that the
>>>     * event to be forwarded was dequeued from.
>>
>> OK, so you "should" mark a new event RTE_EVENT_OP_FORWARD but you "*must*"
>> enqueue it to the same port.
>>
>> I think you "must" do both.
>>
> Ack
> 
>>> + *
>>> + * The event's fields, including (but not limited to) flow_id, scheduling type,
>>> + * destination queue, and event payload e.g. mbuf pointer, may all be updated as
>>> + * desired by software, but the @ref rte_event.impl_opaque field must
>>
>> "software" -> "application"
>>
> Ack
>   
>>> + * be kept to the same value as was present when the event was dequeued.
>>>     */
>>>    #define RTE_EVENT_OP_RELEASE            2
>>>    /**< Release the flow context associated with the schedule type.
>>>     *
>>> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
>>> + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
>>>     * then this function hints the scheduler that the user has completed critical
>>>     * section processing in the current atomic context.
>>>     * The scheduler is now allowed to schedule events from the same flow from
>>> @@ -1442,21 +1446,19 @@ struct rte_event_vector {
>>>     * performance, but the user needs to design carefully the split into critical
>>>     * vs non-critical sections.
>>>     *
>>> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
>>> - * then this function hints the scheduler that the user has done all that need
>>> - * to maintain event order in the current ordered context.
>>> - * The scheduler is allowed to release the ordered context of this port and
>>> - * avoid reordering any following enqueues.
>>> - *
>>> - * Early ordered context release may increase parallelism and thus system
>>> - * performance.
>>> + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ORDERED
>>
>> Isn't a missing "or @ref RTE_SCHED_TYPE_ATOMIC" just an oversight (in the
>> original API wording)?
>>
> 
> No, I don't think so, because ATOMIC is described above.
> 
>>> + * then this function informs the scheduler that the current event has
>>> + * completed processing and will not be returned to the scheduler, i.e.
>>> + * it has been dropped, and so the reordering context for that event
>>> + * should be considered filled.
>>>     *
>>> - * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*
>>> + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_PARALLEL
>>>     * or no scheduling context is held then this function may be an NOOP,
>>>     * depending on the implementation.
>>
>> Maybe you can also fix this "function" -> "operation". I suggest you delete
>> that sentence, because it makes no sense.
>>
>> What it says in a somewhat vague manner is that you tread into the realm of
>> undefined behavior if you release PARALLEL events.
>>
> 
> Agree. Just deleting.
> 
>>>     *
>>>     * This operation must only be enqueued to the same port that the
>>> - * event to be released was dequeued from.
>>> + * event to be released was dequeued from. The @ref rte_event.impl_opaque
>>> + * field in the release event must match that in the original dequeued event.
>>
>> I would say "same value" rather than "match".
>>
>> "The @ref rte_event.impl_opaque field in the release event must have the same
>> value as in the original dequeued event."
>>
> Ack.
> 
>>>     */
>>>
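[Editorial note: the FORWARD/RELEASE rules discussed above — the application may rewrite most fields of a dequeued event, but must return @ref rte_event.impl_opaque untouched, on the same port — can be sketched in stand-alone C. The struct below is a simplified mock for illustration only, not the real rte_event definition from rte_eventdev.h.]

```c
#include <assert.h>
#include <stdint.h>

/* Simplified stand-in for struct rte_event (illustrative only). */
struct ev {
	uint8_t  op;          /* NEW = 0, FORWARD = 1, RELEASE = 2 */
	uint8_t  queue_id;
	uint32_t flow_id;
	uint8_t  impl_opaque; /* device-owned; must be returned unmodified */
};

#define OP_FORWARD 1

/* Prepare a dequeued event for re-enqueue as a FORWARD: the application
 * may retarget the queue and flow as desired, but must keep impl_opaque
 * exactly as it was on dequeue (and enqueue on the same port). */
static struct ev
forward_event(struct ev dequeued, uint8_t new_queue, uint32_t new_flow)
{
	struct ev out = dequeued;  /* start from the dequeued event */
	out.op = OP_FORWARD;       /* mark it as forwarded, not new */
	out.queue_id = new_queue;  /* fields may be rewritten freely... */
	out.flow_id = new_flow;
	/* ...but out.impl_opaque is deliberately left untouched */
	return out;
}
```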
>>>    /**
>>> @@ -1473,53 +1475,100 @@ struct rte_event {
>>>    			/**< Targeted flow identifier for the enqueue and
>>>    			 * dequeue operation.
>>>    			 * The value must be in the range of
>>> -			 * [0, nb_event_queue_flows - 1] which
>>> +			 * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
>>
>> The same comment as I had before about ranges for unsigned types.
>>
> Ack.
> 
>>>    			 * previously supplied to rte_event_dev_configure().
>>> +			 *
>>> +			 * For @ref RTE_SCHED_TYPE_ATOMIC, this field is used to identify a
>>> +			 * flow context for atomicity, such that events from each individual flow
>>> +			 * will only be scheduled to one port at a time.
>>
>> flow_id alone doesn't identify an atomic flow. It's queue_id + flow_id. I'm
>> not sure I think "flow context" adds much, unless it's defined somewhere.
>> Sounds like some assumed implementation detail.
>>
> Removing the word context, and adding that it identifies a flow "within a
> queue and priority level", to make it clear that it's not just the flow_id
> involved here, as you say.
> 
>>> +			 *
>>> +			 * This field is preserved between enqueue and dequeue when
>>> +			 * a device reports the @ref RTE_EVENT_DEV_CAP_CARRY_FLOW_ID
>>> +			 * capability. Otherwise the value is implementation dependent
>>> +			 * on dequeue.
>>> +			 */
>>>    			uint32_t sub_event_type:8;
>>>    			/**< Sub-event types based on the event source.
>>> +			 *
>>> +			 * This field is preserved between enqueue and dequeue.
>>> +			 * This field is for SW or event adapter use,
>>
>> "SW" -> "application"
>>
> Ack.
> 
>>> +			 * and is unused in scheduling decisions.
>>> +			 *
>>>    			 * @see RTE_EVENT_TYPE_CPU
>>>    			 */
>>>    			uint32_t event_type:4;
>>> -			/**< Event type to classify the event source.
>>> -			 * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
>>> +			/**< Event type to classify the event source. (RTE_EVENT_TYPE_*)
>>> +			 *
>>> +			 * This field is preserved between enqueue and dequeue
>>> +			 * This field is for SW or event adapter use,
>>> +			 * and is unused in scheduling decisions.
>>
>> "unused" -> "is not considered"?
>>
> Ack.
> 
>>>    			 */
>>>    			uint8_t op:2;
>>> -			/**< The type of event enqueue operation - new/forward/
>>> -			 * etc.This field is not preserved across an instance
>>> +			/**< The type of event enqueue operation - new/forward/ etc.
>>> +			 *
>>> +			 * This field is *not* preserved across an instance
>>>    			 * and is undefined on dequeue.
>>
>> Maybe you should use "undefined" rather than "implementation dependent", or
>> change this instance of undefined to implementation dependent. Now two
>> different terms are used for the same thing.
>>
> 
> Using implementation dependent.
> Ideally, I think we should update all drivers to set this to "FORWARD" by
> default on dequeue, but for now it's "implementation dependent".
> 

That would make a lot of sense.

>>> -			 * @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
>>> +			 *
>>> +			 * @see RTE_EVENT_OP_NEW
>>> +			 * @see RTE_EVENT_OP_FORWARD
>>> +			 * @see RTE_EVENT_OP_RELEASE
>>>    			 */
>>>    			uint8_t rsvd:4;
>>> -			/**< Reserved for future use */
>>> +			/**< Reserved for future use.
>>> +			 *
>>> +			 * Should be set to zero on enqueue. Zero on dequeue.
>>> +			 */
>>
>> Why say they should be zero on dequeue? Doesn't this defeat the purpose of
>> having reserved bits, for future use? With your suggested wording, you can't
>> use these bits without breaking the ABI.
> 
> Good point. Removing the dequeue value bit.
> 
>>
>>>    			uint8_t sched_type:2;
>>>    			/**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
>>>    			 * associated with flow id on a given event queue
>>>    			 * for the enqueue and dequeue operation.
>>> +			 *
>>> +			 * This field is used to determine the scheduling type
>>> +			 * for events sent to queues where @ref RTE_EVENT_QUEUE_CFG_ALL_TYPES
>>> +			 * is supported.
>>
>> "supported" -> "configured"
>>
> Ack.
> 
>>> +			 * For queues where only a single scheduling type is available,
>>> +			 * this field must be set to match the configured scheduling type.
>>> +			 *
>>
>> Why is the API/event device asking this of the application?
>>
> Historical reasons. I agree that it shouldn't, this should just be marked
> as ignored on fixed-type queues, but the spec up till now says it must
> match and some drivers do check this. Once we update the drivers to stop
> checking then we can change the spec without affecting apps.
> 
>>> +			 * This field is preserved between enqueue and dequeue.
>>> +			 *
>>> +			 * @see RTE_SCHED_TYPE_ORDERED
>>> +			 * @see RTE_SCHED_TYPE_ATOMIC
>>> +			 * @see RTE_SCHED_TYPE_PARALLEL
>>> +			 */
>>>    			uint8_t queue_id;
>>>    			/**< Targeted event queue identifier for the enqueue or
>>>    			 * dequeue operation.
>>>    			 * The value must be in the range of
>>> -			 * [0, nb_event_queues - 1] which previously supplied to
>>> -			 * rte_event_dev_configure().
>>> +			 * [0, @ref rte_event_dev_config.nb_event_queues - 1] which was
>>> +			 * previously supplied to rte_event_dev_configure().
>>> +			 *
>>> +			 * This field is preserved between enqueue on dequeue.
>>>    			 */
>>>    			uint8_t priority;
>>>    			/**< Event priority relative to other events in the
>>>    			 * event queue. The requested priority should in the
>>> -			 * range of  [RTE_EVENT_DEV_PRIORITY_HIGHEST,
>>> -			 * RTE_EVENT_DEV_PRIORITY_LOWEST].
>>> +			 * range of  [@ref RTE_EVENT_DEV_PRIORITY_HIGHEST,
>>> +			 * @ref RTE_EVENT_DEV_PRIORITY_LOWEST].
>>>    			 * The implementation shall normalize the requested
>>>    			 * priority to supported priority value.
>>> +			 *
>>>    			 * Valid when the device has
>>> -			 * RTE_EVENT_DEV_CAP_EVENT_QOS capability.
>>> +			 * @ref RTE_EVENT_DEV_CAP_EVENT_QOS capability.
>>> +			 * Ignored otherwise.
>>> +			 *
>>> +			 * This field is preserved between enqueue and dequeue.
>>
>> Is the normalized or unnormalized value that is preserved?
>>
> Very good point. It's the normalized & then denormalized version that is
> guaranteed to be preserved, I suspect. SW eventdevs probably preserve
> as-is, but HW eventdevs may lose precision. Rather than making this
> "implementation defined" or "not preserved" which would be annoying for
> apps, I think, I'm going to document this as "preserved, but with possible
> loss of precision".
> 

This makes me again think it may be worth noting that Eventdev -> API 
priority normalization is (event.priority * PMD_LEVELS) / 
EVENTDEV_LEVELS (rounded down) - assuming that's how it's supposed to be 
done - or something to that effect.
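[Editorial note: the normalization formula sketched above, with an assumed 8-level device — PMD_LEVELS is illustrative, not a DPDK constant — and the precision loss on the round trip, could look like this in C:]

```c
#include <assert.h>
#include <stdint.h>

#define EVENTDEV_LEVELS 256u  /* RTE_EVENT_DEV_PRIORITY_* span 0..255 */
#define PMD_LEVELS      8u    /* hypothetical device with 8 priority levels */

/* Map an eventdev priority down to the device's range, rounding down. */
static uint8_t
normalize_priority(uint8_t prio)
{
	return (uint8_t)(((uint32_t)prio * PMD_LEVELS) / EVENTDEV_LEVELS);
}

/* Map a device priority back to the eventdev range. All priorities that
 * fell into the same bucket come back as the same representative value,
 * which is the "preserved, with possible loss of precision" behavior. */
static uint8_t
denormalize_priority(uint8_t pmd_prio)
{
	return (uint8_t)(((uint32_t)pmd_prio * EVENTDEV_LEVELS) / PMD_LEVELS);
}
```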

>>>    			 */
>>>    			uint8_t impl_opaque;
>>>    			/**< Implementation specific opaque value.
>>
>> Maybe you can also fix "implementation" here to be something more specific.
>> Implementation, of what?
>>
>> "Event device implementation" or just "event device".
>>
> "Opaque field for event device use"
> 
>>> +			 *
>>>    			 * An implementation may use this field to hold
>>>    			 * implementation specific value to share between
>>>    			 * dequeue and enqueue operation.
>>> +			 *
>>>    			 * The application should not modify this field.
>>> +			 * Its value is implementation dependent on dequeue,
>>> +			 * and must be returned unmodified on enqueue when
>>> +			 * op type is @ref RTE_EVENT_OP_FORWARD or @ref RTE_EVENT_OP_RELEASE
>>
>> Should it be mentioned that impl_opaque is ignored by the event device for
>> NEW events?
>>
> Added in V3.
> 
>>>    			 */
>>>    		};
>>>    	};
>>> --
>>> 2.40.1
>>>


* [PATCH v6 1/3] ethdev: rename action modify field data structure
  2024-02-02  0:42  3% ` [PATCH v6 " Suanming Mou
@ 2024-02-02  0:42  7%   ` Suanming Mou
  2024-02-05 11:23  0%     ` Thomas Monjalon
  0 siblings, 1 reply; 200+ results
From: Suanming Mou @ 2024-02-02  0:42 UTC (permalink / raw)
  To: ferruh.yigit, Ori Kam, Aman Singh, Yuying Zhang,
	Dariusz Sosnowski, Viacheslav Ovsiienko, Matan Azrad,
	Thomas Monjalon, Andrew Rybchenko
  Cc: dev

The current rte_flow_action_modify_data struct describes the packet
field perfectly and is used only in actions.

It is planned to be used for items as well. This commit renames
it to "rte_flow_field_data", making it suitable for use by items.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 app/test-pmd/cmdline_flow.c            | 22 +++++++++++-----------
 doc/guides/prog_guide/rte_flow.rst     |  2 +-
 doc/guides/rel_notes/release_24_03.rst |  2 ++
 drivers/net/mlx5/mlx5_flow.c           |  4 ++--
 drivers/net/mlx5/mlx5_flow.h           |  6 +++---
 drivers/net/mlx5/mlx5_flow_dv.c        | 10 +++++-----
 lib/ethdev/rte_flow.h                  | 10 ++++++----
 7 files changed, 30 insertions(+), 26 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 4d26e81d26..35030b5c47 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -744,13 +744,13 @@ enum index {
 #define ITEM_RAW_SIZE \
 	(sizeof(struct rte_flow_item_raw) + ITEM_RAW_PATTERN_SIZE)
 
-/** Maximum size for external pattern in struct rte_flow_action_modify_data. */
-#define ACTION_MODIFY_PATTERN_SIZE 32
+/** Maximum size for external pattern in struct rte_flow_field_data. */
+#define FLOW_FIELD_PATTERN_SIZE 32
 
 /** Storage size for struct rte_flow_action_modify_field including pattern. */
 #define ACTION_MODIFY_SIZE \
 	(sizeof(struct rte_flow_action_modify_field) + \
-	ACTION_MODIFY_PATTERN_SIZE)
+	FLOW_FIELD_PATTERN_SIZE)
 
 /** Maximum number of queue indices in struct rte_flow_action_rss. */
 #define ACTION_RSS_QUEUE_NUM 128
@@ -944,7 +944,7 @@ static const char *const modify_field_ops[] = {
 	"set", "add", "sub", NULL
 };
 
-static const char *const modify_field_ids[] = {
+static const char *const flow_field_ids[] = {
 	"start", "mac_dst", "mac_src",
 	"vlan_type", "vlan_id", "mac_type",
 	"ipv4_dscp", "ipv4_ttl", "ipv4_src", "ipv4_dst",
@@ -6995,7 +6995,7 @@ static const struct token token_list[] = {
 			     ARGS_ENTRY_ARB(0, 0),
 			     ARGS_ENTRY_ARB
 				(sizeof(struct rte_flow_action_modify_field),
-				 ACTION_MODIFY_PATTERN_SIZE)),
+				 FLOW_FIELD_PATTERN_SIZE)),
 		.call = parse_vc_conf,
 	},
 	[ACTION_MODIFY_FIELD_WIDTH] = {
@@ -9821,10 +9821,10 @@ parse_vc_modify_field_id(struct context *ctx, const struct token *token,
 	if (ctx->curr != ACTION_MODIFY_FIELD_DST_TYPE_VALUE &&
 		ctx->curr != ACTION_MODIFY_FIELD_SRC_TYPE_VALUE)
 		return -1;
-	for (i = 0; modify_field_ids[i]; ++i)
-		if (!strcmp_partial(modify_field_ids[i], str, len))
+	for (i = 0; flow_field_ids[i]; ++i)
+		if (!strcmp_partial(flow_field_ids[i], str, len))
 			break;
-	if (!modify_field_ids[i])
+	if (!flow_field_ids[i])
 		return -1;
 	if (!ctx->object)
 		return len;
@@ -12051,10 +12051,10 @@ comp_set_modify_field_id(struct context *ctx, const struct token *token,
 
 	RTE_SET_USED(token);
 	if (!buf)
-		return RTE_DIM(modify_field_ids);
-	if (ent >= RTE_DIM(modify_field_ids) - 1)
+		return RTE_DIM(flow_field_ids);
+	if (ent >= RTE_DIM(flow_field_ids) - 1)
 		return -1;
-	name = modify_field_ids[ent];
+	name = flow_field_ids[ent];
 	if (ctx->curr == ACTION_MODIFY_FIELD_SRC_TYPE ||
 	    (strcmp(name, "pointer") && strcmp(name, "value")))
 		return strlcpy(buf, name, size);
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 7af329bd93..9192d6ab01 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3185,7 +3185,7 @@ destination offset as ``48``, and provide immediate value ``0xXXXX85XX``.
    | ``width``     | number of bits to use   |
    +---------------+-------------------------+
 
-.. _table_rte_flow_action_modify_data:
+.. _table_rte_flow_field_data:
 
 .. table:: destination/source field definition
 
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 84d3144215..5f3ceeccab 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -124,6 +124,8 @@ ABI Changes
 
 * No ABI change that would break compatibility with 23.11.
 
+* ethdev: Rename the experimental ``struct rte_flow_action_modify_data`` to be ``struct rte_flow_field_data``
+
 
 Known Issues
 ------------
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 85e8c77c81..5788a7fb57 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2493,7 +2493,7 @@ mlx5_validate_action_ct(struct rte_eth_dev *dev,
  * Validate the level value for modify field action.
  *
  * @param[in] data
- *   Pointer to the rte_flow_action_modify_data structure either src or dst.
+ *   Pointer to the rte_flow_field_data structure either src or dst.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2501,7 +2501,7 @@ mlx5_validate_action_ct(struct rte_eth_dev *dev,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-flow_validate_modify_field_level(const struct rte_flow_action_modify_data *data,
+flow_validate_modify_field_level(const struct rte_flow_field_data *data,
 				 struct rte_flow_error *error)
 {
 	if (data->level == 0)
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 6dde9de688..ecfb04ead2 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1121,7 +1121,7 @@ flow_items_to_tunnel(const struct rte_flow_item items[])
  *   Tag array index.
  */
 static inline uint8_t
-flow_tag_index_get(const struct rte_flow_action_modify_data *data)
+flow_tag_index_get(const struct rte_flow_field_data *data)
 {
 	return data->tag_index ? data->tag_index : data->level;
 }
@@ -2523,7 +2523,7 @@ int mlx5_flow_validate_action_default_miss(uint64_t action_flags,
 				const struct rte_flow_attr *attr,
 				struct rte_flow_error *error);
 int flow_validate_modify_field_level
-			(const struct rte_flow_action_modify_data *data,
+			(const struct rte_flow_field_data *data,
 			 struct rte_flow_error *error);
 int mlx5_flow_item_acceptable(const struct rte_flow_item *item,
 			      const uint8_t *mask,
@@ -2828,7 +2828,7 @@ size_t flow_dv_get_item_hdr_len(const enum rte_flow_item_type item_type);
 int flow_dv_convert_encap_data(const struct rte_flow_item *items, uint8_t *buf,
 			   size_t *size, struct rte_flow_error *error);
 void mlx5_flow_field_id_to_modify_info
-		(const struct rte_flow_action_modify_data *data,
+		(const struct rte_flow_field_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
 		 uint32_t width, struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr, struct rte_flow_error *error);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 115d730317..52620be262 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1441,7 +1441,7 @@ flow_modify_info_mask_32_masked(uint32_t length, uint32_t off, uint32_t post_mas
 }
 
 static __rte_always_inline enum mlx5_modification_field
-mlx5_mpls_modi_field_get(const struct rte_flow_action_modify_data *data)
+mlx5_mpls_modi_field_get(const struct rte_flow_field_data *data)
 {
 	return MLX5_MODI_IN_MPLS_LABEL_0 + data->tag_index;
 }
@@ -1449,7 +1449,7 @@ mlx5_mpls_modi_field_get(const struct rte_flow_action_modify_data *data)
 static void
 mlx5_modify_flex_item(const struct rte_eth_dev *dev,
 		      const struct mlx5_flex_item *flex,
-		      const struct rte_flow_action_modify_data *data,
+		      const struct rte_flow_field_data *data,
 		      struct field_modify_info *info,
 		      uint32_t *mask, uint32_t width)
 {
@@ -1573,7 +1573,7 @@ mlx5_modify_flex_item(const struct rte_eth_dev *dev,
 
 void
 mlx5_flow_field_id_to_modify_info
-		(const struct rte_flow_action_modify_data *data,
+		(const struct rte_flow_field_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
 		 uint32_t width, struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr, struct rte_flow_error *error)
@@ -5284,8 +5284,8 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	struct mlx5_sh_config *config = &priv->sh->config;
 	struct mlx5_hca_attr *hca_attr = &priv->sh->cdev->config.hca_attr;
 	const struct rte_flow_action_modify_field *conf = action->conf;
-	const struct rte_flow_action_modify_data *src_data = &conf->src;
-	const struct rte_flow_action_modify_data *dst_data = &conf->dst;
+	const struct rte_flow_field_data *src_data = &conf->src;
+	const struct rte_flow_field_data *dst_data = &conf->dst;
 	uint32_t dst_width, src_width, width = conf->width;
 
 	ret = flow_dv_validate_action_modify_hdr(action_flags, action, error);
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 1dded812ec..5e66b2af1d 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -3894,6 +3894,7 @@ struct rte_flow_action_ethdev {
 
 /**
  * Field IDs for MODIFY_FIELD action.
+ * e.g. the packet field IDs used in RTE_FLOW_ACTION_TYPE_MODIFY_FIELD.
  */
 enum rte_flow_field_id {
 	RTE_FLOW_FIELD_START = 0,	/**< Start of a packet. */
@@ -3947,9 +3948,10 @@ enum rte_flow_field_id {
  * @warning
  * @b EXPERIMENTAL: this structure may change without prior notice
  *
- * Field description for MODIFY_FIELD action.
+ * Field description for packet field.
+ * e.g. the packet fields used in RTE_FLOW_ACTION_TYPE_MODIFY_FIELD.
  */
-struct rte_flow_action_modify_data {
+struct rte_flow_field_data {
 	enum rte_flow_field_id field; /**< Field or memory type ID. */
 	union {
 		struct {
@@ -4058,8 +4060,8 @@ enum rte_flow_modify_op {
  */
 struct rte_flow_action_modify_field {
 	enum rte_flow_modify_op operation; /**< Operation to perform. */
-	struct rte_flow_action_modify_data dst; /**< Destination field. */
-	struct rte_flow_action_modify_data src; /**< Source field. */
+	struct rte_flow_field_data dst; /**< Destination field. */
+	struct rte_flow_field_data src; /**< Source field. */
 	uint32_t width; /**< Number of bits to use from a source field. */
 };
 
-- 
2.34.1



* [PATCH v6 0/3] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE
    2024-02-01 12:29  3% ` [PATCH v5 0/3] " Suanming Mou
@ 2024-02-02  0:42  3% ` Suanming Mou
  2024-02-02  0:42  7%   ` [PATCH v6 1/3] ethdev: rename action modify field data structure Suanming Mou
  2024-02-06  2:06  3% ` [PATCH v7 0/4] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Suanming Mou
  2 siblings, 1 reply; 200+ results
From: Suanming Mou @ 2024-02-02  0:42 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev, orika

The new item type is added for the case where the user wants to match
traffic based on the result of comparing a packet field with other
fields or with an immediate value.

e.g. taking advantage of the compare item, the user will be able to
accumulate an IPv4/TCP packet's TCP data_offset and IPv4 IHL fields
into a tag register, then compare the tag register with the IPv4
header total length to determine whether the packet has payload or not.
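[Editorial note: in plain C, the check that the tag-register accumulation and compare would implement is the following; IHL and data_offset are both counted in 32-bit words per the IPv4 and TCP header formats. The function name is illustrative.]

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* An IPv4/TCP packet carries TCP payload iff the IPv4 total length
 * exceeds the combined IPv4 + TCP header lengths. */
static bool
tcp_has_payload(uint8_t ihl, uint8_t data_offset, uint16_t total_length)
{
	/* Accumulate both header lengths (as into a tag register),
	 * converting from 32-bit words to bytes. */
	uint16_t headers = (uint16_t)(ihl + data_offset) * 4u;

	/* The RTE_FLOW_ITEM_COMPARE_GT step. */
	return total_length > headers;
}
```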

The supported operations can be as below:
 - RTE_FLOW_ITEM_COMPARE_EQ (equal)
 - RTE_FLOW_ITEM_COMPARE_NE (not equal)
 - RTE_FLOW_ITEM_COMPARE_LT (less than)
 - RTE_FLOW_ITEM_COMPARE_LE (less than or equal)
 - RTE_FLOW_ITEM_COMPARE_GT (greater than)
 - RTE_FLOW_ITEM_COMPARE_GE (greater than or equal)

V6:
 - fix typo and style issue.
 - adjust flow_field description.

V5:
 - rebase on top of next-net
 - add sample detail for rte_flow_field.

V4:
 - rebase on top of the latest version.
 - move ACTION_MODIFY_PATTERN_SIZE and modify_field_ids rename
   to first patch.
 - add comparison flow create sample in testpmd_funcs.rst.

V3:
 - fix code style missing empty line in rte_flow.rst.
 - fix missing the ABI change release notes.

V2:
 - Since modify field data struct is experiment, rename modify
   field data directly instead of adding new flow field struct.


Suanming Mou (3):
  ethdev: rename action modify field data structure
  ethdev: add compare item
  net/mlx5: add compare item support

 app/test-pmd/cmdline_flow.c                 | 416 +++++++++++++++++++-
 doc/guides/nics/features/default.ini        |   1 +
 doc/guides/nics/features/mlx5.ini           |   1 +
 doc/guides/nics/mlx5.rst                    |   7 +
 doc/guides/prog_guide/rte_flow.rst          |   9 +-
 doc/guides/rel_notes/release_24_03.rst      |   9 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  24 ++
 drivers/net/mlx5/mlx5_flow.c                |   4 +-
 drivers/net/mlx5/mlx5_flow.h                |   9 +-
 drivers/net/mlx5/mlx5_flow_dv.c             |  10 +-
 drivers/net/mlx5/mlx5_flow_hw.c             |  73 ++++
 lib/ethdev/rte_flow.c                       |   1 +
 lib/ethdev/rte_flow.h                       | 332 +++++++++-------
 13 files changed, 727 insertions(+), 169 deletions(-)

-- 
2.34.1



* RE: [PATCH v5 0/3] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE
  2024-02-01 18:56  0%   ` [PATCH v5 0/3] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Ferruh Yigit
@ 2024-02-02  0:32  0%     ` Suanming Mou
  0 siblings, 0 replies; 200+ results
From: Suanming Mou @ 2024-02-02  0:32 UTC (permalink / raw)
  To: Ferruh Yigit; +Cc: dev, Ori Kam



> -----Original Message-----
> From: Ferruh Yigit <ferruh.yigit@amd.com>
> Sent: Friday, February 2, 2024 2:56 AM
> To: Suanming Mou <suanmingm@nvidia.com>
> Cc: dev@dpdk.org; Ori Kam <orika@nvidia.com>
> Subject: Re: [PATCH v5 0/3] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE
> 
> On 2/1/2024 12:29 PM, Suanming Mou wrote:
> > The new item type is added for the case user wants to match traffic
> > based on packet field compare result with other fields or immediate
> > value.
> >
> > e.g. take advantage the compare item user will be able to accumulate a
> > IPv4/TCP packet's TCP data_offset and IPv4 IHL field to a tag
> > register, then compare the tag register with IPv4 header total length
> > to understand the packet has payload or not.
> >
> > The supported operations can be as below:
> >  - RTE_FLOW_ITEM_COMPARE_EQ (equal)
> >  - RTE_FLOW_ITEM_COMPARE_NE (not equal)
> >  - RTE_FLOW_ITEM_COMPARE_LT (less than)
> >  - RTE_FLOW_ITEM_COMPARE_LE (less than or equal)
> >  - RTE_FLOW_ITEM_COMPARE_GT (great than)
> >  - RTE_FLOW_ITEM_COMPARE_GE (great than or equal)
> >
> > V5:
> >  - rebase on top of next-net
> >  - add sample detail for rte_flow_field.
> >
> > V4:
> >  - rebase on top of the latest version.
> >  - move ACTION_MODIFY_PATTERN_SIZE and modify_field_ids rename
> >    to first patch.
> >  - add comparison flow create sample in testpmd_funcs.rst.
> >
> > V3:
> >  - fix code style missing empty line in rte_flow.rst.
> >  - fix missing the ABI change release notes.
> >
> > V2:
> >  - Since modify field data struct is experiment, rename modify
> >    field data directly instead of adding new flow field struct.
> >
> >
> > Suanming Mou (3):
> >   ethdev: rename action modify field data structure
> >   ethdev: add compare item
> >   net/mlx5: add compare item support
> >
> 
> Mostly looks good, please find comments on a few minor issues on patches.

Sure, thanks.



* Re: [PATCH v5 1/3] ethdev: rename action modify field data structure
  2024-02-01 12:29  7%   ` [PATCH v5 1/3] ethdev: rename action modify field data structure Suanming Mou
@ 2024-02-01 18:57  0%     ` Ferruh Yigit
  0 siblings, 0 replies; 200+ results
From: Ferruh Yigit @ 2024-02-01 18:57 UTC (permalink / raw)
  To: Suanming Mou, Ori Kam, Aman Singh, Yuying Zhang,
	Dariusz Sosnowski, Viacheslav Ovsiienko, Matan Azrad,
	Thomas Monjalon, Andrew Rybchenko
  Cc: dev

On 2/1/2024 12:29 PM, Suanming Mou wrote:
> Current rte_flow_action_modify_data struct describes the pkt
> field perfectly and is used only in action.
> 
> It is planned to be used for item as well. This commit renames
> it to "rte_flow_field_data" making it compatible to be used by item.
> 
> Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
> Acked-by: Ori Kam <orika@nvidia.com>
> Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>

<...>

> diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
> index 84d3144215..efeda6ea97 100644
> --- a/doc/guides/rel_notes/release_24_03.rst
> +++ b/doc/guides/rel_notes/release_24_03.rst
> @@ -124,6 +124,7 @@ ABI Changes
>  
>  * No ABI change that would break compatibility with 23.11.
>  
> +* ethdev: Rename the experimental ``struct rte_flow_action_modify_data`` to be ``struct rte_flow_field_data``
>  

Please put one more empty line after your change, to have two empty
lines before next section.

<...>

> diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
> index 1267c146e5..a143ecb194 100644
> --- a/lib/ethdev/rte_flow.h
> +++ b/lib/ethdev/rte_flow.h
> @@ -3887,6 +3887,8 @@ struct rte_flow_action_ethdev {
>  
>  /**
>   * Field IDs for MODIFY_FIELD action.
> + * e.g. the packet field IDs used in RTE_FLOW_ACTION_TYPE_MODIFY_FIELD
> + * and RTE_FLOW_ITEM_TYPE_COMPARE.
>

In this patch there is no RTE_FLOW_ITEM_TYPE_COMPARE yet, can you please
update to have RTE_FLOW_ACTION_TYPE_MODIFY_FIELD in this patch and add
RTE_FLOW_ITEM_TYPE_COMPARE in next patch?


>   */
>  enum rte_flow_field_id {
>  	RTE_FLOW_FIELD_START = 0,	/**< Start of a packet. */
> @@ -3940,9 +3942,11 @@ enum rte_flow_field_id {
>   * @warning
>   * @b EXPERIMENTAL: this structure may change without prior notice
>   *
> - * Field description for MODIFY_FIELD action.
> + * Field description for packet field.
> + * e.g. the packet fields used in RTE_FLOW_ACTION_TYPE_MODIFY_FIELD
> + * and RTE_FLOW_ITEM_TYPE_COMPARE.
>

> Same here, can you please mention RTE_FLOW_ITEM_TYPE_COMPARE in
next patch.



* Re: [PATCH v5 0/3] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE
  2024-02-01 12:29  3% ` [PATCH v5 0/3] " Suanming Mou
  2024-02-01 12:29  7%   ` [PATCH v5 1/3] ethdev: rename action modify field data structure Suanming Mou
@ 2024-02-01 18:56  0%   ` Ferruh Yigit
  2024-02-02  0:32  0%     ` Suanming Mou
  1 sibling, 1 reply; 200+ results
From: Ferruh Yigit @ 2024-02-01 18:56 UTC (permalink / raw)
  To: Suanming Mou; +Cc: dev, orika

On 2/1/2024 12:29 PM, Suanming Mou wrote:
> The new item type is added for the case user wants to match traffic
> based on packet field compare result with other fields or immediate
> value.
> 
> e.g. take advantage the compare item user will be able to accumulate
> a IPv4/TCP packet's TCP data_offset and IPv4 IHL field to a tag
> register, then compare the tag register with IPv4 header total length
> to understand the packet has payload or not.
> 
> The supported operations can be as below:
>  - RTE_FLOW_ITEM_COMPARE_EQ (equal)
>  - RTE_FLOW_ITEM_COMPARE_NE (not equal)
>  - RTE_FLOW_ITEM_COMPARE_LT (less than)
>  - RTE_FLOW_ITEM_COMPARE_LE (less than or equal)
>  - RTE_FLOW_ITEM_COMPARE_GT (great than)
>  - RTE_FLOW_ITEM_COMPARE_GE (great than or equal)
> 
> V5:
>  - rebase on top of next-net
>  - add sample detail for rte_flow_field.
> 
> V4:
>  - rebase on top of the latest version.
>  - move ACTION_MODIFY_PATTERN_SIZE and modify_field_ids rename
>    to first patch.
>  - add comparison flow create sample in testpmd_funcs.rst.
> 
> V3:
>  - fix code style missing empty line in rte_flow.rst.
>  - fix missing the ABI change release notes.
> 
> V2:
>  - Since modify field data struct is experiment, rename modify
>    field data directly instead of adding new flow field struct.
> 
> 
> Suanming Mou (3):
>   ethdev: rename action modify field data structure
>   ethdev: add compare item
>   net/mlx5: add compare item support
> 

Mostly looks good, please find comments on a few minor issues on patches.



* Re: [PATCH v2 11/11] eventdev: RFC clarify docs on event object fields
  @ 2024-02-01 16:59  0%       ` Bruce Richardson
  2024-02-02  9:38  0%         ` Mattias Rönnblom
  0 siblings, 1 reply; 200+ results
From: Bruce Richardson @ 2024-02-01 16:59 UTC (permalink / raw)
  To: Mattias Rönnblom
  Cc: dev, jerinj, mattias.ronnblom, abdullah.sevincer, sachin.saxena,
	hemant.agrawal, pbhagavatula, pravin.pathak

On Wed, Jan 24, 2024 at 12:34:50PM +0100, Mattias Rönnblom wrote:
> On 2024-01-19 18:43, Bruce Richardson wrote:
> > Clarify the meaning of the NEW, FORWARD and RELEASE event types.
> > For the fields in "rte_event" struct, enhance the comments on each to
> > clarify the field's use, and whether it is preserved between enqueue and
> > dequeue, and it's role, if any, in scheduling.
> > 
> > Signed-off-by: Bruce Richardson <bruce.richardson@intel.com>
> > ---
> > 
> > As with the previous patch, please review this patch to ensure that the
> > expected semantics of the various event types and event fields have not
> > changed in an unexpected way.
> > ---
> >   lib/eventdev/rte_eventdev.h | 105 ++++++++++++++++++++++++++----------
> >   1 file changed, 77 insertions(+), 28 deletions(-)
> > 
> > diff --git a/lib/eventdev/rte_eventdev.h b/lib/eventdev/rte_eventdev.h
> > index cb13602ffb..4eff1c4958 100644
> > --- a/lib/eventdev/rte_eventdev.h
> > +++ b/lib/eventdev/rte_eventdev.h
> > @@ -1416,21 +1416,25 @@ struct rte_event_vector {
> > 
> >   /* Event enqueue operations */
> >   #define RTE_EVENT_OP_NEW                0
> > -/**< The event producers use this operation to inject a new event to the
> > +/**< The @ref rte_event.op field should be set to this type to inject a new event to the
> >    * event device.
> >    */
> 
> "type" -> "value"
> 
> "to" -> "into"?
> 
> You could also say "to mark the event as new".
> 
> What is new? Maybe "new (as opposed to a forwarded) event." or "new (i.e.,
> not previously dequeued).".
> 

Using this latter suggested wording in V3.

> "The application sets the @ref rte_event.op field of an enqueued event to
> this value to mark the event as new (i.e., not previously dequeued)."
> 
> >   #define RTE_EVENT_OP_FORWARD            1
> > -/**< The CPU use this operation to forward the event to different event queue or
> > - * change to new application specific flow or schedule type to enable
> > - * pipelining.
> > +/**< SW should set the @ref rte_event.op filed to this type to return a
> > + * previously dequeued event to the event device for further processing.
> 
> "filed" -> "field"
> 
> "SW" -> "The application"
> 
> "to be scheduled for further processing (or transmission)"
> 
> The wording should otherwise be the same as NEW, whatever you choose there.
> 
Ack.

> >    *
> > - * This operation must only be enqueued to the same port that the
> > + * This event *must* be enqueued to the same port that the
> >    * event to be forwarded was dequeued from.
> 
> OK, so you "should" mark a new event RTE_EVENT_OP_FORWARD but you "*must*"
> enqueue it to the same port.
> 
> I think you "must" do both.
> 
Ack

> > + *
> > + * The event's fields, including (but not limited to) flow_id, scheduling type,
> > + * destination queue, and event payload e.g. mbuf pointer, may all be updated as
> > + * desired by software, but the @ref rte_event.impl_opaque field must
> 
> "software" -> "application"
>
Ack
 
> > + * be kept to the same value as was present when the event was dequeued.
> >    */
> >   #define RTE_EVENT_OP_RELEASE            2
> >   /**< Release the flow context associated with the schedule type.
> >    *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ATOMIC*
> > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ATOMIC
> >    * then this function hints the scheduler that the user has completed critical
> >    * section processing in the current atomic context.
> >    * The scheduler is now allowed to schedule events from the same flow from
> > @@ -1442,21 +1446,19 @@ struct rte_event_vector {
> >    * performance, but the user needs to design carefully the split into critical
> >    * vs non-critical sections.
> >    *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_ORDERED*
> > - * then this function hints the scheduler that the user has done all that need
> > - * to maintain event order in the current ordered context.
> > - * The scheduler is allowed to release the ordered context of this port and
> > - * avoid reordering any following enqueues.
> > - *
> > - * Early ordered context release may increase parallelism and thus system
> > - * performance.
> > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_ORDERED
> 
> Isn't a missing "or @ref RTE_SCHED_TYPE_ATOMIC" just an oversight (in the
> original API wording)?
> 

No, I don't think so, because ATOMIC is described above.

> > + * then this function informs the scheduler that the current event has
> > + * completed processing and will not be returned to the scheduler, i.e.
> > + * it has been dropped, and so the reordering context for that event
> > + * should be considered filled.
> >    *
> > - * If current flow's scheduler type method is *RTE_SCHED_TYPE_PARALLEL*
> > + * If current flow's scheduler type method is @ref RTE_SCHED_TYPE_PARALLEL
> >    * or no scheduling context is held then this function may be an NOOP,
> >    * depending on the implementation.
> 
> Maybe you can also fix this "function" -> "operation". I suggest you delete
> that sentence, because it makes no sense.
> 
> What is says in a somewhat vague manner is that you tread into the realm of
> undefined behavior if you release PARALLEL events.
> 

Agree. Just deleting.

> >    *
> >    * This operation must only be enqueued to the same port that the
> > - * event to be released was dequeued from.
> > + * event to be released was dequeued from. The @ref rte_event.impl_opaque
> > + * field in the release event must match that in the original dequeued event.
> 
> I would say "same value" rather than "match".
> 
> "The @ref rte_event.impl_opaque field in the release event have the same
> value as in the original dequeued event."
> 
Ack.

> >    */
> > 
> >   /**
> > @@ -1473,53 +1475,100 @@ struct rte_event {
> >   			/**< Targeted flow identifier for the enqueue and
> >   			 * dequeue operation.
> >   			 * The value must be in the range of
> > -			 * [0, nb_event_queue_flows - 1] which
> > +			 * [0, @ref rte_event_dev_config.nb_event_queue_flows - 1] which
> 
> The same comment as I had before about ranges for unsigned types.
> 
Ack.

> >   			 * previously supplied to rte_event_dev_configure().
> > +			 *
> > +			 * For @ref RTE_SCHED_TYPE_ATOMIC, this field is used to identify a
> > +			 * flow context for atomicity, such that events from each individual flow
> > +			 * will only be scheduled to one port at a time.
> 
> flow_id alone doesn't identify an atomic flow. It's queue_id + flow_id. I'm
> not sure I think "flow context" adds much, unless it's defined somewhere.
> Sounds like some assumed implementation detail.
> 
Removing the word context, and adding that it identifies a flow "within a
queue and priority level", to make it clear that it's not just the flow_id
involved here, as you say.

> > +			 *
> > +			 * This field is preserved between enqueue and dequeue when
> > +			 * a device reports the @ref RTE_EVENT_DEV_CAP_CARRY_FLOW_ID
> > +			 * capability. Otherwise the value is implementation dependent
> > +			 * on dequeue.
> >   			 */
> >   			uint32_t sub_event_type:8;
> >   			/**< Sub-event types based on the event source.
> > +			 *
> > +			 * This field is preserved between enqueue and dequeue.
> > +			 * This field is for SW or event adapter use,
> 
> "SW" -> "application"
> 
Ack.

> > +			 * and is unused in scheduling decisions.
> > +			 *
> >   			 * @see RTE_EVENT_TYPE_CPU
> >   			 */
> >   			uint32_t event_type:4;
> > -			/**< Event type to classify the event source.
> > -			 * @see RTE_EVENT_TYPE_ETHDEV, (RTE_EVENT_TYPE_*)
> > +			/**< Event type to classify the event source. (RTE_EVENT_TYPE_*)
> > +			 *
> > +			 * This field is preserved between enqueue and dequeue
> > +			 * This field is for SW or event adapter use,
> > +			 * and is unused in scheduling decisions.
> 
> "unused" -> "is not considered"?
> 
Ack.

> >   			 */
> >   			uint8_t op:2;
> > -			/**< The type of event enqueue operation - new/forward/
> > -			 * etc.This field is not preserved across an instance
> > +			/**< The type of event enqueue operation - new/forward/ etc.
> > +			 *
> > +			 * This field is *not* preserved across an instance
> >   			 * and is undefined on dequeue.
> 
> Maybe you should use "undefined" rather than "implementation dependent", or
> change this instance of undefined to implementation dependent. Now two
> different terms are used for the same thing.
> 

Using implementation dependent.
Ideally, I think we should update all drivers to set this to "FORWARD" by
default on dequeue, but for now it's "implementation dependent".

> > -			 * @see RTE_EVENT_OP_NEW, (RTE_EVENT_OP_*)
> > +			 *
> > +			 * @see RTE_EVENT_OP_NEW
> > +			 * @see RTE_EVENT_OP_FORWARD
> > +			 * @see RTE_EVENT_OP_RELEASE
> >   			 */
> >   			uint8_t rsvd:4;
> > -			/**< Reserved for future use */
> > +			/**< Reserved for future use.
> > +			 *
> > +			 * Should be set to zero on enqueue. Zero on dequeue.
> > +			 */
> 
> Why say they should be zero on dequeue? Doesn't this defeat the purpose of
> having reserved bits, for future use? With you suggested wording, you can't
> use these bits without breaking the ABI.

Good point. Removing the dequeue value bit.

> 
> >   			uint8_t sched_type:2;
> >   			/**< Scheduler synchronization type (RTE_SCHED_TYPE_*)
> >   			 * associated with flow id on a given event queue
> >   			 * for the enqueue and dequeue operation.
> > +			 *
> > +			 * This field is used to determine the scheduling type
> > +			 * for events sent to queues where @ref RTE_EVENT_QUEUE_CFG_ALL_TYPES
> > +			 * is supported.
> 
> "supported" -> "configured"
> 
Ack.

> > +			 * For queues where only a single scheduling type is available,
> > +			 * this field must be set to match the configured scheduling type.
> > +			 *
> 
> Why is the API/event device asking this of the application?
> 
Historical reasons. I agree that it shouldn't; this should just be marked
as ignored on fixed-type queues, but the spec up till now says it must
match, and some drivers do check this. Once we update the drivers to stop
checking, we can change the spec without affecting apps.

> > +			 * This field is preserved between enqueue and dequeue.
> > +			 *
> > +			 * @see RTE_SCHED_TYPE_ORDERED
> > +			 * @see RTE_SCHED_TYPE_ATOMIC
> > +			 * @see RTE_SCHED_TYPE_PARALLEL
> > +			 */
> >  			uint8_t queue_id;
> >  			/**< Targeted event queue identifier for the enqueue or
> >  			 * dequeue operation.
> >  			 * The value must be in the range of
> > -			 * [0, nb_event_queues - 1] which previously supplied to
> > -			 * rte_event_dev_configure().
> > +			 * [0, @ref rte_event_dev_config.nb_event_queues - 1] which was
> > +			 * previously supplied to rte_event_dev_configure().
> > +			 *
> > +			 * This field is preserved between enqueue on dequeue.
> >  			 */
> >  			uint8_t priority;
> >  			/**< Event priority relative to other events in the
> >  			 * event queue. The requested priority should in the
> > -			 * range of  [RTE_EVENT_DEV_PRIORITY_HIGHEST,
> > -			 * RTE_EVENT_DEV_PRIORITY_LOWEST].
> > +			 * range of  [@ref RTE_EVENT_DEV_PRIORITY_HIGHEST,
> > +			 * @ref RTE_EVENT_DEV_PRIORITY_LOWEST].
> >  			 * The implementation shall normalize the requested
> >  			 * priority to supported priority value.
> > +			 *
> >  			 * Valid when the device has
> > -			 * RTE_EVENT_DEV_CAP_EVENT_QOS capability.
> > +			 * @ref RTE_EVENT_DEV_CAP_EVENT_QOS capability.
> > +			 * Ignored otherwise.
> > +			 *
> > +			 * This field is preserved between enqueue and dequeue.
> 
> Is the normalized or unnormalized value that is preserved?
> 
Very good point. It's the normalized and then denormalized version that is
guaranteed to be preserved, I suspect. SW eventdevs probably preserve it
as-is, but HW eventdevs may lose precision. Rather than making this
"implementation defined" or "not preserved", which would be annoying for
apps, I think I'm going to document this as "preserved, but with possible
loss of precision".

> >   			 */
> >   			uint8_t impl_opaque;
> >   			/**< Implementation specific opaque value.
> 
> Maybe you can also fix "implementation" here to be something more specific.
> Implementation, of what?
> 
> "Event device implementation" or just "event device".
> 
"Opaque field for event device use"

> > +			 *
> >   			 * An implementation may use this field to hold
> >   			 * implementation specific value to share between
> >   			 * dequeue and enqueue operation.
> > +			 *
> >   			 * The application should not modify this field.
> > +			 * Its value is implementation dependent on dequeue,
> > +			 * and must be returned unmodified on enqueue when
> > +			 * op type is @ref RTE_EVENT_OP_FORWARD or @ref RTE_EVENT_OP_RELEASE
> 
> Should it be mentioned that impl_opaque is ignored by the event device for
> NEW events?
> 
Added in V3.

> >   			 */
> >   		};
> >   	};
> > --
> > 2.40.1
> > 

^ permalink raw reply	[relevance 0%]

* RE: [EXT] [PATCH 1/3] cryptodev: add ec points to sm2 op
  @ 2024-02-01 13:25  0%   ` Kusztal, ArkadiuszX
  0 siblings, 0 replies; 200+ results
From: Kusztal, ArkadiuszX @ 2024-02-01 13:25 UTC (permalink / raw)
  To: Akhil Goyal, dev; +Cc: Power, Ciara



> -----Original Message-----
> From: Akhil Goyal <gakhil@marvell.com>
> Sent: Thursday, February 1, 2024 9:08 AM
> To: Kusztal, ArkadiuszX <arkadiuszx.kusztal@intel.com>; dev@dpdk.org
> Cc: Power, Ciara <ciara.power@intel.com>
> Subject: RE: [EXT] [PATCH 1/3] cryptodev: add ec points to sm2 op
> 
> > ----------------------------------------------------------------------
> > In the case when PMD cannot support full process of the SM2, but
> > elliptic curve computation only, additional fields are needed to
> > handle such a case.
> >
> 
> Asym crypto APIs are no longer experimental.
> Hence adding new fields would lead to ABI break.

It seems that the
`__rte_crypto_op_reset` and
`rte_crypto_op_pool_create`
functions do not need versioning, and we could easily add it if needed.
But the `flags` field changes its offset, and this is actually problematic,
which means that we cannot make this change before 24.11.

> 
> > Points C1, kP therefore were added to the SM2 crypto operation struct.
> >
> > Signed-off-by: Arkadiusz Kusztal <arkadiuszx.kusztal@intel.com>

^ permalink raw reply	[relevance 0%]

* [PATCH v5 1/3] ethdev: rename action modify field data structure
  2024-02-01 12:29  3% ` [PATCH v5 0/3] " Suanming Mou
@ 2024-02-01 12:29  7%   ` Suanming Mou
  2024-02-01 18:57  0%     ` Ferruh Yigit
  2024-02-01 18:56  0%   ` [PATCH v5 0/3] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Ferruh Yigit
  1 sibling, 1 reply; 200+ results
From: Suanming Mou @ 2024-02-01 12:29 UTC (permalink / raw)
  To: ferruh.yigit, Ori Kam, Aman Singh, Yuying Zhang,
	Dariusz Sosnowski, Viacheslav Ovsiienko, Matan Azrad,
	Thomas Monjalon, Andrew Rybchenko
  Cc: dev

The current rte_flow_action_modify_data struct describes the packet
field perfectly and is used only in actions.

It is planned to be used for items as well. This commit renames it to
"rte_flow_field_data", making it suitable for use by items.

Signed-off-by: Suanming Mou <suanmingm@nvidia.com>
Acked-by: Ori Kam <orika@nvidia.com>
Acked-by: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>
---
 app/test-pmd/cmdline_flow.c            | 22 +++++++++++-----------
 doc/guides/prog_guide/rte_flow.rst     |  2 +-
 doc/guides/rel_notes/release_24_03.rst |  1 +
 drivers/net/mlx5/mlx5_flow.c           |  4 ++--
 drivers/net/mlx5/mlx5_flow.h           |  6 +++---
 drivers/net/mlx5/mlx5_flow_dv.c        | 10 +++++-----
 lib/ethdev/rte_flow.h                  | 12 ++++++++----
 7 files changed, 31 insertions(+), 26 deletions(-)

diff --git a/app/test-pmd/cmdline_flow.c b/app/test-pmd/cmdline_flow.c
index 359c187b3c..972c6ae490 100644
--- a/app/test-pmd/cmdline_flow.c
+++ b/app/test-pmd/cmdline_flow.c
@@ -742,13 +742,13 @@ enum index {
 #define ITEM_RAW_SIZE \
 	(sizeof(struct rte_flow_item_raw) + ITEM_RAW_PATTERN_SIZE)
 
-/** Maximum size for external pattern in struct rte_flow_action_modify_data. */
-#define ACTION_MODIFY_PATTERN_SIZE 32
+/** Maximum size for external pattern in struct rte_flow_field_data. */
+#define FLOW_FIELD_PATTERN_SIZE 32
 
 /** Storage size for struct rte_flow_action_modify_field including pattern. */
 #define ACTION_MODIFY_SIZE \
 	(sizeof(struct rte_flow_action_modify_field) + \
-	ACTION_MODIFY_PATTERN_SIZE)
+	FLOW_FIELD_PATTERN_SIZE)
 
 /** Maximum number of queue indices in struct rte_flow_action_rss. */
 #define ACTION_RSS_QUEUE_NUM 128
@@ -942,7 +942,7 @@ static const char *const modify_field_ops[] = {
 	"set", "add", "sub", NULL
 };
 
-static const char *const modify_field_ids[] = {
+static const char *const flow_field_ids[] = {
 	"start", "mac_dst", "mac_src",
 	"vlan_type", "vlan_id", "mac_type",
 	"ipv4_dscp", "ipv4_ttl", "ipv4_src", "ipv4_dst",
@@ -6986,7 +6986,7 @@ static const struct token token_list[] = {
 			     ARGS_ENTRY_ARB(0, 0),
 			     ARGS_ENTRY_ARB
 				(sizeof(struct rte_flow_action_modify_field),
-				 ACTION_MODIFY_PATTERN_SIZE)),
+				 FLOW_FIELD_PATTERN_SIZE)),
 		.call = parse_vc_conf,
 	},
 	[ACTION_MODIFY_FIELD_WIDTH] = {
@@ -9798,10 +9798,10 @@ parse_vc_modify_field_id(struct context *ctx, const struct token *token,
 	if (ctx->curr != ACTION_MODIFY_FIELD_DST_TYPE_VALUE &&
 		ctx->curr != ACTION_MODIFY_FIELD_SRC_TYPE_VALUE)
 		return -1;
-	for (i = 0; modify_field_ids[i]; ++i)
-		if (!strcmp_partial(modify_field_ids[i], str, len))
+	for (i = 0; flow_field_ids[i]; ++i)
+		if (!strcmp_partial(flow_field_ids[i], str, len))
 			break;
-	if (!modify_field_ids[i])
+	if (!flow_field_ids[i])
 		return -1;
 	if (!ctx->object)
 		return len;
@@ -12028,10 +12028,10 @@ comp_set_modify_field_id(struct context *ctx, const struct token *token,
 
 	RTE_SET_USED(token);
 	if (!buf)
-		return RTE_DIM(modify_field_ids);
-	if (ent >= RTE_DIM(modify_field_ids) - 1)
+		return RTE_DIM(flow_field_ids);
+	if (ent >= RTE_DIM(flow_field_ids) - 1)
 		return -1;
-	name = modify_field_ids[ent];
+	name = flow_field_ids[ent];
 	if (ctx->curr == ACTION_MODIFY_FIELD_SRC_TYPE ||
 	    (strcmp(name, "pointer") && strcmp(name, "value")))
 		return strlcpy(buf, name, size);
diff --git a/doc/guides/prog_guide/rte_flow.rst b/doc/guides/prog_guide/rte_flow.rst
index 900fdaefb6..f936a9ba19 100644
--- a/doc/guides/prog_guide/rte_flow.rst
+++ b/doc/guides/prog_guide/rte_flow.rst
@@ -3185,7 +3185,7 @@ destination offset as ``48``, and provide immediate value ``0xXXXX85XX``.
    | ``width``     | number of bits to use   |
    +---------------+-------------------------+
 
-.. _table_rte_flow_action_modify_data:
+.. _table_rte_flow_field_data:
 
 .. table:: destination/source field definition
 
diff --git a/doc/guides/rel_notes/release_24_03.rst b/doc/guides/rel_notes/release_24_03.rst
index 84d3144215..efeda6ea97 100644
--- a/doc/guides/rel_notes/release_24_03.rst
+++ b/doc/guides/rel_notes/release_24_03.rst
@@ -124,6 +124,7 @@ ABI Changes
 
 * No ABI change that would break compatibility with 23.11.
 
+* ethdev: Rename the experimental ``struct rte_flow_action_modify_data`` to be ``struct rte_flow_field_data``
 
 Known Issues
 ------------
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 85e8c77c81..5788a7fb57 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -2493,7 +2493,7 @@ mlx5_validate_action_ct(struct rte_eth_dev *dev,
  * Validate the level value for modify field action.
  *
  * @param[in] data
- *   Pointer to the rte_flow_action_modify_data structure either src or dst.
+ *   Pointer to the rte_flow_field_data structure either src or dst.
  * @param[out] error
  *   Pointer to error structure.
  *
@@ -2501,7 +2501,7 @@ mlx5_validate_action_ct(struct rte_eth_dev *dev,
  *   0 on success, a negative errno value otherwise and rte_errno is set.
  */
 int
-flow_validate_modify_field_level(const struct rte_flow_action_modify_data *data,
+flow_validate_modify_field_level(const struct rte_flow_field_data *data,
 				 struct rte_flow_error *error)
 {
 	if (data->level == 0)
diff --git a/drivers/net/mlx5/mlx5_flow.h b/drivers/net/mlx5/mlx5_flow.h
index 6dde9de688..ecfb04ead2 100644
--- a/drivers/net/mlx5/mlx5_flow.h
+++ b/drivers/net/mlx5/mlx5_flow.h
@@ -1121,7 +1121,7 @@ flow_items_to_tunnel(const struct rte_flow_item items[])
  *   Tag array index.
  */
 static inline uint8_t
-flow_tag_index_get(const struct rte_flow_action_modify_data *data)
+flow_tag_index_get(const struct rte_flow_field_data *data)
 {
 	return data->tag_index ? data->tag_index : data->level;
 }
@@ -2523,7 +2523,7 @@ int mlx5_flow_validate_action_default_miss(uint64_t action_flags,
 				const struct rte_flow_attr *attr,
 				struct rte_flow_error *error);
 int flow_validate_modify_field_level
-			(const struct rte_flow_action_modify_data *data,
+			(const struct rte_flow_field_data *data,
 			 struct rte_flow_error *error);
 int mlx5_flow_item_acceptable(const struct rte_flow_item *item,
 			      const uint8_t *mask,
@@ -2828,7 +2828,7 @@ size_t flow_dv_get_item_hdr_len(const enum rte_flow_item_type item_type);
 int flow_dv_convert_encap_data(const struct rte_flow_item *items, uint8_t *buf,
 			   size_t *size, struct rte_flow_error *error);
 void mlx5_flow_field_id_to_modify_info
-		(const struct rte_flow_action_modify_data *data,
+		(const struct rte_flow_field_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
 		 uint32_t width, struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr, struct rte_flow_error *error);
diff --git a/drivers/net/mlx5/mlx5_flow_dv.c b/drivers/net/mlx5/mlx5_flow_dv.c
index 115d730317..52620be262 100644
--- a/drivers/net/mlx5/mlx5_flow_dv.c
+++ b/drivers/net/mlx5/mlx5_flow_dv.c
@@ -1441,7 +1441,7 @@ flow_modify_info_mask_32_masked(uint32_t length, uint32_t off, uint32_t post_mas
 }
 
 static __rte_always_inline enum mlx5_modification_field
-mlx5_mpls_modi_field_get(const struct rte_flow_action_modify_data *data)
+mlx5_mpls_modi_field_get(const struct rte_flow_field_data *data)
 {
 	return MLX5_MODI_IN_MPLS_LABEL_0 + data->tag_index;
 }
@@ -1449,7 +1449,7 @@ mlx5_mpls_modi_field_get(const struct rte_flow_action_modify_data *data)
 static void
 mlx5_modify_flex_item(const struct rte_eth_dev *dev,
 		      const struct mlx5_flex_item *flex,
-		      const struct rte_flow_action_modify_data *data,
+		      const struct rte_flow_field_data *data,
 		      struct field_modify_info *info,
 		      uint32_t *mask, uint32_t width)
 {
@@ -1573,7 +1573,7 @@ mlx5_modify_flex_item(const struct rte_eth_dev *dev,
 
 void
 mlx5_flow_field_id_to_modify_info
-		(const struct rte_flow_action_modify_data *data,
+		(const struct rte_flow_field_data *data,
 		 struct field_modify_info *info, uint32_t *mask,
 		 uint32_t width, struct rte_eth_dev *dev,
 		 const struct rte_flow_attr *attr, struct rte_flow_error *error)
@@ -5284,8 +5284,8 @@ flow_dv_validate_action_modify_field(struct rte_eth_dev *dev,
 	struct mlx5_sh_config *config = &priv->sh->config;
 	struct mlx5_hca_attr *hca_attr = &priv->sh->cdev->config.hca_attr;
 	const struct rte_flow_action_modify_field *conf = action->conf;
-	const struct rte_flow_action_modify_data *src_data = &conf->src;
-	const struct rte_flow_action_modify_data *dst_data = &conf->dst;
+	const struct rte_flow_field_data *src_data = &conf->src;
+	const struct rte_flow_field_data *dst_data = &conf->dst;
 	uint32_t dst_width, src_width, width = conf->width;
 
 	ret = flow_dv_validate_action_modify_hdr(action_flags, action, error);
diff --git a/lib/ethdev/rte_flow.h b/lib/ethdev/rte_flow.h
index 1267c146e5..a143ecb194 100644
--- a/lib/ethdev/rte_flow.h
+++ b/lib/ethdev/rte_flow.h
@@ -3887,6 +3887,8 @@ struct rte_flow_action_ethdev {
 
 /**
  * Field IDs for MODIFY_FIELD action.
+ * e.g. the packet field IDs used in RTE_FLOW_ACTION_TYPE_MODIFY_FIELD
+ * and RTE_FLOW_ITEM_TYPE_COMPARE.
  */
 enum rte_flow_field_id {
 	RTE_FLOW_FIELD_START = 0,	/**< Start of a packet. */
@@ -3940,9 +3942,11 @@ enum rte_flow_field_id {
  * @warning
  * @b EXPERIMENTAL: this structure may change without prior notice
  *
- * Field description for MODIFY_FIELD action.
+ * Field description for packet field.
+ * e.g. the packet fields used in RTE_FLOW_ACTION_TYPE_MODIFY_FIELD
+ * and RTE_FLOW_ITEM_TYPE_COMPARE.
  */
-struct rte_flow_action_modify_data {
+struct rte_flow_field_data {
 	enum rte_flow_field_id field; /**< Field or memory type ID. */
 	union {
 		struct {
@@ -4051,8 +4055,8 @@ enum rte_flow_modify_op {
  */
 struct rte_flow_action_modify_field {
 	enum rte_flow_modify_op operation; /**< Operation to perform. */
-	struct rte_flow_action_modify_data dst; /**< Destination field. */
-	struct rte_flow_action_modify_data src; /**< Source field. */
+	struct rte_flow_field_data dst; /**< Destination field. */
+	struct rte_flow_field_data src; /**< Source field. */
 	uint32_t width; /**< Number of bits to use from a source field. */
 };
 
-- 
2.34.1


^ permalink raw reply	[relevance 7%]

* [PATCH v5 0/3] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE
  @ 2024-02-01 12:29  3% ` Suanming Mou
  2024-02-01 12:29  7%   ` [PATCH v5 1/3] ethdev: rename action modify field data structure Suanming Mou
  2024-02-01 18:56  0%   ` [PATCH v5 0/3] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Ferruh Yigit
  2024-02-02  0:42  3% ` [PATCH v6 " Suanming Mou
  2024-02-06  2:06  3% ` [PATCH v7 0/4] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Suanming Mou
  2 siblings, 2 replies; 200+ results
From: Suanming Mou @ 2024-02-01 12:29 UTC (permalink / raw)
  To: ferruh.yigit; +Cc: dev, orika

The new item type is added for the case where the user wants to match
traffic based on the result of comparing a packet field with another
field or with an immediate value.

e.g. taking advantage of the compare item, the user will be able to
accumulate an IPv4/TCP packet's TCP data_offset and IPv4 IHL fields
into a tag register, then compare the tag register with the IPv4
header total length to determine whether the packet has a payload.

The supported operations can be as below:
 - RTE_FLOW_ITEM_COMPARE_EQ (equal)
 - RTE_FLOW_ITEM_COMPARE_NE (not equal)
 - RTE_FLOW_ITEM_COMPARE_LT (less than)
 - RTE_FLOW_ITEM_COMPARE_LE (less than or equal)
 - RTE_FLOW_ITEM_COMPARE_GT (greater than)
 - RTE_FLOW_ITEM_COMPARE_GE (greater than or equal)

V5:
 - rebase on top of next-net
 - add sample detail for rte_flow_field.

V4:
 - rebase on top of the latest version.
 - move ACTION_MODIFY_PATTERN_SIZE and modify_field_ids rename
   to first patch.
 - add comparison flow create sample in testpmd_funcs.rst.

V3:
 - fix code style missing empty line in rte_flow.rst.
 - fix missing the ABI change release notes.

V2:
 - Since modify field data struct is experiment, rename modify
   field data directly instead of adding new flow field struct.


Suanming Mou (3):
  ethdev: rename action modify field data structure
  ethdev: add compare item
  net/mlx5: add compare item support

 app/test-pmd/cmdline_flow.c                 | 416 +++++++++++++++++++-
 doc/guides/nics/features/default.ini        |   1 +
 doc/guides/nics/features/mlx5.ini           |   1 +
 doc/guides/nics/mlx5.rst                    |   7 +
 doc/guides/prog_guide/rte_flow.rst          |   9 +-
 doc/guides/rel_notes/release_24_03.rst      |   8 +
 doc/guides/testpmd_app_ug/testpmd_funcs.rst |  24 ++
 drivers/net/mlx5/mlx5_flow.c                |   4 +-
 drivers/net/mlx5/mlx5_flow.h                |   9 +-
 drivers/net/mlx5/mlx5_flow_dv.c             |  10 +-
 drivers/net/mlx5/mlx5_flow_hw.c             |  73 ++++
 lib/ethdev/rte_flow.c                       |   1 +
 lib/ethdev/rte_flow.h                       | 332 +++++++++-------
 13 files changed, 726 insertions(+), 169 deletions(-)

-- 
2.34.1


^ permalink raw reply	[relevance 3%]

-- links below jump to the message on this page --
2023-01-17  9:10     [PATCH 0/2] add ring telemetry cmds Jie Hai
2023-11-09 10:20     ` [RESEND v7 0/3] add telemetry cmds for ring Jie Hai
2023-11-09 10:20       ` [RESEND v7 1/3] ring: fix unmatched type definition and usage Jie Hai
2024-02-18 18:11  3%     ` Thomas Monjalon
2024-02-19  8:24  0%       ` Jie Hai
2024-02-19  8:32  3% ` [PATCH v8 0/2] add telemetry cmds for ring Jie Hai
2023-09-29 15:42     [PATCH] lib/hash: new feature adding existing key David Marchand
2023-10-23  8:29     ` [PATCH v2] " Abdullah Ömer Yamaç
2024-02-16 12:43  3%   ` Thomas Monjalon
2023-11-15 13:36     [PATCH v1 0/2] dts: api docs generation Juraj Linkeš
2024-01-22 16:35     ` [PATCH v3 0/3] dts: API " Juraj Linkeš
2024-01-22 16:35       ` [PATCH v3 3/3] dts: add API doc generation Juraj Linkeš
     [not found]         ` <CAJvnSUCNjo0p-yhROF1MNLKhjiAw2QTyTHO2hpOaVVUn0xnJ0A@mail.gmail.com>
2024-02-29 18:12  2%       ` Nicholas Pratte
2024-04-12 10:14     ` [PATCH v4 0/3] dts: API docs generation Juraj Linkeš
2024-04-12 10:14  2%   ` [PATCH v4 3/3] dts: add API doc generation Juraj Linkeš
2023-12-12 15:36     [PATCH v1] crypto/ipsec_mb: unified IPsec MB interface Brian Dooley
2024-03-05 17:42     ` [PATCH v5 1/4] crypto/ipsec_mb: bump minimum IPsec Multi-buffer version Brian Dooley
2024-03-05 19:11       ` [EXTERNAL] " Akhil Goyal
2024-03-06 11:12  4%     ` Power, Ciara
2024-03-12 13:50     ` [PATCH v6 1/5] ci: replace IPsec-mb package install Brian Dooley
2024-03-12 13:54       ` David Marchand
2024-03-12 15:26  3%     ` Power, Ciara
2024-03-12 16:13  3%       ` David Marchand
2024-03-12 17:07  0%         ` Power, Ciara
2023-12-13  1:42     [PATCH 00/26] Replace uses of RTE_LOGTYPE_PMD Stephen Hemminger
2024-02-03  4:10  3% ` [PATCH v7 00/19] Replace use of PMD logtype Stephen Hemminger
2024-02-12 14:45  0%   ` David Marchand
2024-02-03  4:11  3% ` [PATCH v7 00/19] Replace uses of RTE_LOGTYPE_PMD Stephen Hemminger
2023-12-14  1:56     [PATCH] ethdev: add dump regs for telemetry Jie Hai
2024-02-05 10:51     ` [PATCH v2 0/7] support dump register names and filter them Jie Hai
2024-02-05 10:51  8%   ` [PATCH v2 1/7] ethdev: support report register names and filter Jie Hai
2024-02-07 17:00  3%     ` Ferruh Yigit
2024-02-20  8:43  3%       ` Jie Hai
2024-02-20 10:58     ` [PATCH v3 0/7] support dump register names and filter them Jie Hai
2024-02-20 10:58  8%   ` [PATCH v3 1/7] ethdev: support report register names and filter Jie Hai
2024-02-26  3:07     ` [PATCH v4 0/7] support dump register names and filter them Jie Hai
2024-02-26  3:07  8%   ` [PATCH v4 1/7] ethdev: support report register names and filter Jie Hai
2024-02-26  8:01  0%     ` fengchengwen
2024-03-06  7:22  0%       ` Jie Hai
2024-02-29  9:52  3%     ` Thomas Monjalon
2024-03-05  7:45  5%       ` Jie Hai
2024-03-07  3:02     ` [PATCH v5 0/7] support dump register names and filter them Jie Hai
2024-03-07  3:02  8%   ` [PATCH v5 1/7] ethdev: support report register names and filter Jie Hai
2023-12-14  3:12     [PATCH 0/2] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Suanming Mou
2024-02-01 12:29  3% ` [PATCH v5 0/3] " Suanming Mou
2024-02-01 12:29  7%   ` [PATCH v5 1/3] ethdev: rename action modify field data structure Suanming Mou
2024-02-01 18:57  0%     ` Ferruh Yigit
2024-02-01 18:56  0%   ` [PATCH v5 0/3] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Ferruh Yigit
2024-02-02  0:32  0%     ` Suanming Mou
2024-02-02  0:42  3% ` [PATCH v6 " Suanming Mou
2024-02-02  0:42  7%   ` [PATCH v6 1/3] ethdev: rename action modify field data structure Suanming Mou
2024-02-05 11:23  0%     ` Thomas Monjalon
2024-02-05 11:49  0%       ` Suanming Mou
2024-02-06  2:06  3% ` [PATCH v7 0/4] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Suanming Mou
2024-02-06  2:06  3%   ` [PATCH v7 1/4] ethdev: rename action modify field data structure Suanming Mou
2024-02-06 21:24  0%   ` [PATCH v7 0/4] ethdev: add RTE_FLOW_ITEM_TYPE_COMPARE Ferruh Yigit
2023-12-22 11:04     [v7 1/1] net/af_xdp: fix multi interface support for K8s Maryam Tahhan
2024-01-10 14:58     ` Maryam Tahhan
2024-01-10 15:21       ` Ferruh Yigit
2024-01-11  9:01         ` Maryam Tahhan
2024-01-11 11:35           ` Ferruh Yigit
2024-01-11 12:21             ` Maryam Tahhan
2024-01-11 14:21               ` Ferruh Yigit
2024-02-07 23:24  0%             ` Ferruh Yigit
2024-02-09 12:40  0%               ` Loftus, Ciara
2024-01-08  1:59     Issues around packet capture when secondary process is doing rx/tx Stephen Hemminger
2024-01-08 10:41     ` Morten Brørup
2024-04-03 11:43  0%   ` Ferruh Yigit
2024-01-08 15:13     ` Konstantin Ananyev
2024-04-03  0:14  4%   ` Stephen Hemminger
2024-04-03 11:42  0%   ` Ferruh Yigit
2024-01-08  8:27     [PATCH] app/dma-perf: support bi-directional transfer Amit Prakash Shukla
2024-02-27 19:26     ` [PATCH v2] " Amit Prakash Shukla
2024-02-28  7:03       ` fengchengwen
2024-02-28  9:38         ` [EXT] " Amit Prakash Shukla
2024-02-29 14:03  3%       ` Amit Prakash Shukla
2024-03-01  1:46  0%         ` fengchengwen
2024-03-01  8:31  0%           ` [EXTERNAL] " Amit Prakash Shukla
2024-03-01  9:30  0%             ` fengchengwen
2024-03-01 10:59  0%               ` Amit Prakash Shukla
2024-03-07 13:41  0%                 ` fengchengwen
2024-01-10 16:57     [PATCH] doc: update minimum Linux kernel version Stephen Hemminger
2024-01-11  9:18     ` Morten Brørup
2024-01-11 19:02       ` Patrick Robb
2024-01-11 19:26         ` Morten Brørup
2024-01-11 19:54           ` Stephen Hemminger
2024-01-11 22:38             ` Morten Brørup
2024-02-16  3:05  0%           ` Stephen Hemminger
2024-02-16  8:29  0%             ` Morten Brørup
2024-02-16 17:22  0%               ` Stephen Hemminger
2024-01-18 13:45     [PATCH v1 0/7] improve eventdev API specification/documentation Bruce Richardson
2024-01-19 17:43     ` [PATCH v2 00/11] " Bruce Richardson
2024-01-19 17:43       ` [PATCH v2 11/11] eventdev: RFC clarify docs on event object fields Bruce Richardson
2024-01-24 11:34         ` Mattias Rönnblom
2024-02-01 16:59  0%       ` Bruce Richardson
2024-02-02  9:38  0%         ` Mattias Rönnblom
2024-02-02 12:39       ` [PATCH v3 00/11] improve eventdev API specification/documentation Bruce Richardson
2024-02-02 12:39         ` [PATCH v3 10/11] eventdev: clarify docs on event object fields and op types Bruce Richardson
2024-02-09  9:14           ` Jerin Jacob
2024-02-20 17:39  3%         ` Bruce Richardson
2024-02-21  9:31  0%           ` Jerin Jacob
2024-01-24 22:17     [PATCH 0/2] more replacement of zero length array Tyler Retzlaff
2024-02-12 22:36     ` [PATCH v2 0/4] " Tyler Retzlaff
2024-02-13 13:14  3%   ` David Marchand
2024-02-13 19:20  3%     ` Tyler Retzlaff
2024-02-14  7:36  4%       ` David Marchand
2024-02-16 10:14  0%         ` David Marchand
2024-02-16 20:46  0%           ` Tyler Retzlaff
2024-02-27 23:56     ` [PATCH v3 0/6] " Tyler Retzlaff
2024-02-27 23:56  4%   ` [PATCH v3 4/6] pipeline: replace zero length array with flex array Tyler Retzlaff
2024-02-29 22:58     ` [PATCH v4 0/6] more replacement of zero length array Tyler Retzlaff
2024-02-29 22:58  4%   ` [PATCH v4 4/6] pipeline: replace zero length array with flex array Tyler Retzlaff
2024-03-06 20:13     ` [PATCH v5 0/6] more replacement of zero length array Tyler Retzlaff
2024-03-06 20:13  4%   ` [PATCH v5 4/6] pipeline: replace zero length array with flex array Tyler Retzlaff
2024-01-28  9:39     [PATCH 0/4] introduce encap hash calculation Ori Kam
2024-02-08  9:09     ` [PATCH v2 1/4] ethdev: " Ori Kam
2024-02-08 17:13       ` Ferruh Yigit
2024-02-11  7:29         ` Ori Kam
2024-02-12 17:05           ` Ferruh Yigit
2024-02-12 18:44             ` Ori Kam
2024-02-12 20:09  3%           ` Ferruh Yigit
2024-02-13  7:05  0%             ` Ori Kam
2024-01-29 18:59     [PATCH 1/3] cryptodev: add ec points to sm2 op Arkadiusz Kusztal
2024-02-01  8:07     ` [EXT] " Akhil Goyal
2024-02-01 13:25  0%   ` Kusztal, ArkadiuszX
2024-01-30  3:46     [RFC 0/2] net/tap RSS BPF rewrite Stephen Hemminger
2024-02-07 22:11     ` [PATCH v2 0/7] net/tap: RSS using BPF overhaul Stephen Hemminger
2024-02-07 22:11  2%   ` [PATCH v2 4/7] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-02-08 17:41     ` [PATCH v3 0/7] net/tap: RSS using BPF overhaul Stephen Hemminger
2024-02-08 17:41  2%   ` [PATCH v3 4/7] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-02-08 19:05     ` [PATCH v4 0/7] net/tap: queue flow action RSS using BPF redo Stephen Hemminger
2024-02-08 19:05  2%   ` [PATCH v4 4/7] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-02 17:12     ` [PATCH v5 0/8] net/tap: cleanups and fix BPF flow Stephen Hemminger
2024-04-02 17:12  2%   ` [PATCH v5 6/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-05 21:14     ` [PATCH v6 0/8] net/tap: cleanup and fix BPF flow support Stephen Hemminger
2024-04-05 21:14  2%   ` [PATCH v6 6/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-08 21:18     ` [PATCH v7 0/8] net/tap: cleanups and fix BPF support Stephen Hemminger
2024-04-08 21:18  2%   ` [PATCH v7 5/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-09  3:40     ` [PATCH v8 0/8] net/tap: cleanups and fix BPF support Stephen Hemminger
2024-04-09  3:40  2%   ` [PATCH v8 5/8] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-04-26 15:48     ` [PATCH v9 0/9] net/tap: fix RSS (BPF) support Stephen Hemminger
2024-04-26 15:48  2%   ` [PATCH v9 5/9] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-01 16:11     ` [PATCH v10 0/9] net/tap: fix RSS (BPF) flow support Stephen Hemminger
2024-05-01 16:12  2%   ` [PATCH v10 5/9] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-02  2:49     ` [PATCH v11 0/9] net/tap fix RSS (BPF) flow support Stephen Hemminger
2024-05-02  2:49  2%   ` [PATCH v11 5/9] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-05-02 21:31     ` [PATCH v12 00/12] net/tap: RSS and other fixes Stephen Hemminger
2024-05-02 21:31  2%   ` [PATCH v12 06/12] net/tap: rewrite the RSS BPF program Stephen Hemminger
2024-01-30 23:26     [PATCH] replace GCC marker extension with C11 anonymous unions Tyler Retzlaff
2024-02-15  6:21     ` [PATCH v4 00/18] stop using zero sized marker fields Tyler Retzlaff
2024-02-15  6:21  3%   ` [PATCH v4 01/18] mbuf: deprecate GCC marker in rte mbuf struct Tyler Retzlaff
2024-02-18 12:39  0%     ` Thomas Monjalon
2024-02-18 13:07  0%       ` Morten Brørup
2024-02-20 17:20  0%       ` Tyler Retzlaff
2024-02-27  5:41     ` [PATCH v6 00/23] stop and remove RTE_MARKER typedefs Tyler Retzlaff
2024-02-27  5:41       ` [PATCH v6 02/23] mbuf: consolidate driver asserts for mbuf struct Tyler Retzlaff
2024-03-14 16:51  4%     ` Tyler Retzlaff
2024-02-27  5:41       ` [PATCH v6 20/23] mbuf: remove and stop using rte marker fields Tyler Retzlaff
2024-02-27 15:18  4%     ` David Marchand
2024-02-27 16:04  3%       ` Morten Brørup
2024-02-27 17:23  4%       ` Tyler Retzlaff
2024-02-28 10:42  3%         ` David Marchand
2024-02-28 14:03  3%       ` Dodji Seketeli
2024-02-28 14:18         ` David Marchand
2024-02-28 15:01  4%       ` Morten Brørup
2024-02-28 15:33  3%         ` David Marchand
2024-04-02 20:08  3% ` [PATCH v9 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
2024-04-02 20:08  2%   ` [PATCH v9 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-04-02 20:45  0%     ` Stephen Hemminger
2024-04-02 20:51  0%       ` Tyler Retzlaff
2024-04-03 17:53  3% ` [PATCH v10 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
2024-04-03 17:53  2%   ` [PATCH v10 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-04-03 19:32  0%     ` Morten Brørup
2024-04-03 22:45  0%       ` Tyler Retzlaff
2024-04-03 21:49  0%     ` Stephen Hemminger
2024-04-04 17:51  3% ` [PATCH v11 0/4] remove use of RTE_MARKER fields in libraries Tyler Retzlaff
2024-04-04 17:51  2%   ` [PATCH v11 2/4] mbuf: remove rte marker fields Tyler Retzlaff
2024-01-30 23:26     [PATCH] mbuf: replace GCC marker extension with C11 anonymous unions Tyler Retzlaff
2024-02-13  6:45  3% ` [PATCH v2] RFC: " Tyler Retzlaff
2024-02-13  8:57  0%   ` Bruce Richardson
2024-02-13 17:09  0%   ` Morten Brørup
2024-01-31  9:35     [PATCH v2] ethdev: fast path async flow API Dariusz Sosnowski
2024-02-06 17:36  1% ` [PATCH v3] " Dariusz Sosnowski
2024-02-02  5:13     [PATCH 1/1] eal: add C++ include guard in generic/rte_vect.h Ashish Sadanandan
2024-02-02  9:18     ` Thomas Monjalon
2024-02-02  9:40       ` Bruce Richardson
2024-02-02 20:58         ` Ashish Sadanandan
2024-03-13 23:45           ` Stephen Hemminger
2024-03-14  3:45  3%         ` Tyler Retzlaff
     [not found]     <0230331200824.195294-1-stephen@networkplumber.org>
2024-02-05 17:43     ` [PATCH v9 00/23] Use inclusive naming in DPDK Stephen Hemminger
2024-02-05 17:43  2%   ` [PATCH v9 05/23] mbuf: replace term sanity check Stephen Hemminger
2024-02-10 10:42     [PATCH 0/4] add support of partial offload Chaoyong He
2024-02-23  2:42     ` [PATCH v2 0/4] add support of MARK and RSS flow action Chaoyong He
2024-02-23  2:42  2%   ` [PATCH v2 1/4] ethdev: add function to check representor port Chaoyong He
2024-02-14  1:26     [PATCH 00/14] use C11 alignas and normalize type alignment Tyler Retzlaff
2024-02-14  7:05     ` [PATCH v3 00/39] " Tyler Retzlaff
2024-02-14  7:06  4%   ` [PATCH v3 27/39] mempool: use C11 alignas Tyler Retzlaff
2024-02-14 16:35     ` [PATCH v4 00/39] use C11 alignas and normalize type alignment Tyler Retzlaff
2024-02-14 16:35  4%   ` [PATCH v4 27/39] mempool: use C11 alignas Tyler Retzlaff
2024-02-23 19:03     ` [PATCH v5 00/39] " Tyler Retzlaff
2024-02-23 19:04  3%   ` [PATCH v5 27/39] mempool: " Tyler Retzlaff
2024-02-26 18:25     ` [PATCH v6 00/39] " Tyler Retzlaff
2024-02-26 18:25  3%   ` [PATCH v6 27/39] mempool: " Tyler Retzlaff
2024-02-27  9:42  0%     ` Konstantin Ananyev
2024-03-04 17:52     ` [PATCH v7 00/39] " Tyler Retzlaff
2024-03-04 17:52  3%   ` [PATCH v7 27/39] mempool: " Tyler Retzlaff
2024-02-15 12:48  3% [PATCH 0/7] add Nitrox compress device support Nagadheeraj Rottela
2024-02-29 17:22  0% ` Akhil Goyal
2024-02-19  9:40     [RFC v2 1/5] eal: add static per-lcore memory allocation facility Mattias Rönnblom
2024-02-20  8:49     ` [RFC v3 0/6] Lcore variables Mattias Rönnblom
2024-02-20  8:49       ` [RFC v3 1/6] eal: add static per-lcore memory allocation facility Mattias Rönnblom
2024-02-20  9:11         ` Bruce Richardson
2024-02-20 10:47  3%       ` Mattias Rönnblom
2024-02-20 11:39  0%         ` Bruce Richardson
2024-02-20 13:37  0%           ` Morten Brørup
2024-02-20 16:26  0%           ` Mattias Rönnblom
2024-02-20  3:29     [DPDK/ethdev Bug 1381] TAP device can not support 17 queues bugzilla
2024-02-20 19:23  3% ` Stephen Hemminger
2024-02-20 15:33     [RFC PATCH 0/2] power: refactor power management library Sivaprasad Tummala
2024-02-20 15:33     ` [RFC PATCH 1/2] power: refactor core " Sivaprasad Tummala
2024-03-01  2:56  3%   ` lihuisong (C)
2024-03-01 10:39  0%     ` Hunt, David
2024-03-05  4:35  3%     ` Tummala, Sivaprasad
2024-02-22  7:35     release candidate 24.03-rc1 Thomas Monjalon
2024-02-29  9:18  4% ` Xu, HailinX
2024-02-28  3:18  9% [DPDK/ethdev Bug 1386] [dpdk-24.03] [ABI][meson test] driver-tests/link_bonding_autotest test failed: Segmentation fault when do ABI testing bugzilla
2024-02-29 17:56  3% [PATCH] net/tap: allow more than 4 queues Stephen Hemminger
2024-03-06 16:14  0% ` Ferruh Yigit
2024-03-06 20:21  3%   ` Stephen Hemminger
2024-03-07 10:25  0%     ` Ferruh Yigit
2024-03-07 16:53  0%       ` Stephen Hemminger
2024-03-01 16:25  3% [PATCH 0/7] add Nitrox compress device support Nagadheeraj Rottela
2024-03-02  9:38  3% Nagadheeraj Rottela
2024-03-04  7:14  0% ` Akhil Goyal
2024-03-02 13:53  2% [RFC 0/7] Improve EAL bit operations API Mattias Rönnblom
2024-03-04 18:45     [PATCH] hash: make gfni stubs inline Stephen Hemminger
2024-03-05 10:14     ` David Marchand
2024-03-05 17:53  3%   ` Tyler Retzlaff
2024-03-05 18:44  0%     ` Stephen Hemminger
2024-03-07 10:32  0%     ` David Marchand
2024-03-05  2:29     [PATCH 0/5] add new meta data module Chaoyong He
2024-03-05  2:29  4% ` [PATCH 1/5] net/nfp: create " Chaoyong He
2024-03-05  2:29  3% ` [PATCH 3/5] net/nfp: uniform function name format Chaoyong He
2024-03-05 13:49  4% [PATCH] devtools: require version for experimental symbols David Marchand
2024-03-06 12:24     [PATCH v3 00/33] net/ena: v2.9.0 driver release shaibran
2024-03-06 12:24     ` [PATCH v3 07/33] net/ena: restructure the llq policy setting process shaibran
2024-03-08 17:24  3%   ` Ferruh Yigit
2024-03-10 14:29  0%     ` Brandes, Shai
2024-03-13 11:21  0%       ` Ferruh Yigit
     [not found]     <20220825024425.10534-1-lihuisong@huawei.com>
2024-01-30  6:36     ` [PATCH v7 0/5] app/testpmd: support multiple process attach and detach port Huisong Li
2024-03-08 10:38  0%   ` lihuisong (C)
2024-04-23 11:17  0%   ` lihuisong (C)
2024-03-08 18:54  2% [RFC] eal: increase the number of available file descriptors for MP Stephen Hemminger
2024-03-08 20:36  8% ` [RFC v2] eal: increase passed max multi-process file descriptors Stephen Hemminger
2024-03-14 15:23  0%   ` Stephen Hemminger
2024-03-14 15:38  3%     ` David Marchand
2024-03-09 18:12  2% ` [RFC v3] tap: do not duplicate fd's Stephen Hemminger
2024-03-14 14:40  4% ` [RFC] eal: increase the number of available file descriptors for MP David Marchand
2024-03-11 22:36  2% Community CI Meeting Minutes - March 7, 2024 Patrick Robb
2024-03-12  7:52     [PATCH 0/3] support setting lanes Dengdui Huang
2024-03-12  7:52  5% ` [PATCH 1/3] ethdev: " Dengdui Huang
2024-03-12  7:52     [PATCH 3/3] app/testpmd: " Dengdui Huang
2024-03-22  7:09     ` [PATCH v2 0/6] " Dengdui Huang
2024-03-22  7:09  5%   ` [PATCH v2 1/6] ethdev: " Dengdui Huang
2024-03-20 20:50     [PATCH 00/46] use stdatomic API Tyler Retzlaff
2024-03-20 20:51     ` [PATCH 15/46] net/sfc: use rte " Tyler Retzlaff
2024-03-21 18:11  3%   ` Aaron Conole
2024-03-21 18:15  0%     ` Tyler Retzlaff
2024-03-20 21:05     [PATCH 00/15] fix packing of structs when building with MSVC Tyler Retzlaff
2024-03-20 21:05     ` [PATCH 02/15] eal: pack structures " Tyler Retzlaff
2024-03-21 16:02  3%   ` Bruce Richardson
2024-03-21 14:49  3% DPDK Release Status Meeting 2024-03-21 Mcnamara, John
2024-03-25 10:05     [PATCH v3] graph: expose node context as pointers Robin Jarry
2024-03-27  9:14  4% ` [PATCH v5] " Robin Jarry
2024-03-26 23:59     [PATCH] igc/ixgbe: add get/set link settings interface Marek Pazdan
2024-04-03 13:40     ` [PATCH] lib: " Marek Pazdan
2024-04-03 16:49  3%   ` Tyler Retzlaff
2024-04-04  7:09  4%     ` David Marchand
2024-04-05  0:55  0%       ` Tyler Retzlaff
2024-04-05  0:56  0%         ` Tyler Retzlaff
2024-04-05  8:58  0%         ` David Marchand
2024-03-28  2:19  3% Minutes of Technical Board meeting 20-March-2024 Stephen Hemminger
2024-03-28 21:46  3% DPDK 24.03 released Thomas Monjalon
2024-03-28 23:33     [PATCH] vhost: optimize mbuf allocation in virtio Tx packed path Andrey Ignatov
2024-03-28 23:44     ` Stephen Hemminger
2024-03-29  0:10       ` Andrey Ignatov
2024-03-29  2:53         ` Stephen Hemminger
2024-03-29 13:04           ` Maxime Coquelin
2024-03-29 13:42  3%         ` The effect of inlining Morten Brørup
2024-03-29 20:26  0%           ` Tyler Retzlaff
2024-04-01 15:20  3%           ` Mattias Rönnblom
2024-04-03 16:01  3%             ` Morten Brørup
2024-03-30 17:54 18% [PATCH] version: 24.07-rc0 David Marchand
2024-04-02  8:52  0% ` Thomas Monjalon
2024-04-02  9:25  0%   ` David Marchand
2024-04-04 16:29  3% Community CI Meeting Minutes - April 4, 2024 Patrick Robb
2024-04-04 21:04     [PATCH v1 0/3] Additional queue stats Nicolas Chautru
2024-04-04 21:04     ` [PATCH v1 1/3] bbdev: new queue stat for available enqueue depth Nicolas Chautru
2024-04-05  0:46  3%   ` Stephen Hemminger
2024-04-05 15:15  3%   ` Stephen Hemminger
2024-04-05 18:17  3%     ` Chautru, Nicolas
2024-04-10  9:33     Strict aliasing problem with rte_eth_linkstatus_set() fengchengwen
2024-04-10 15:27     ` Stephen Hemminger
2024-04-10 15:58  3%   ` Ferruh Yigit
2024-04-10 17:54       ` Morten Brørup
2024-04-10 19:58  3%     ` Tyler Retzlaff
2024-04-11  3:20  0%       ` fengchengwen
2024-04-11  8:22  3% [PATCH 0/3] cryptodev: add API to get used queue pair depth Akhil Goyal
2024-04-12 11:57  3% ` [PATCH v2 " Akhil Goyal
2024-04-18 17:49  3% Community CI Meeting Minutes - April 18, 2024 Patrick Robb
2024-04-24  4:08  3% getting rid of type argument to rte_malloc() Stephen Hemminger
2024-04-24 10:29  0% ` Ferruh Yigit
2024-04-24 16:23  0%   ` Stephen Hemminger
2024-04-24 16:23  0%     ` Stephen Hemminger
2024-04-24 17:09  0%     ` Morten Brørup
2024-04-24 19:05  0%       ` Stephen Hemminger
2024-04-24 19:06  0%   ` Stephen Hemminger
2024-04-24 15:24  3% Minutes of DPDK Technical Board Meeting, 2024-04-03 Thomas Monjalon
2024-04-24 17:25  3% ` Morten Brørup
2024-04-24 19:10  0%   ` Thomas Monjalon
2024-04-25 17:46     [RFC] net/af_packet: make stats reset reliable Ferruh Yigit
2024-04-26 14:38     ` [RFC v2] " Ferruh Yigit
2024-04-28 15:11       ` Mattias Rönnblom
2024-05-07  7:23         ` Mattias Rönnblom
2024-05-07 13:49           ` Ferruh Yigit
2024-05-07 14:51             ` Stephen Hemminger
2024-05-07 16:00  3%           ` Morten Brørup
2024-05-07 16:54  0%             ` Ferruh Yigit
2024-05-07 18:47  0%               ` Stephen Hemminger
2024-05-08  7:48  0%             ` Mattias Rönnblom
2024-04-30 15:39     [PATCH] net/af_packet: fix statistics Stephen Hemminger
2024-05-01 16:25     ` Ferruh Yigit
2024-05-01 16:44  3%   ` Stephen Hemminger
2024-05-01 18:18  0%     ` Morten Brørup
2024-05-02 13:47  0%       ` Ferruh Yigit
2024-05-02 13:55     [PATCH] freebsd: Add support for multiple dpdk instances on FreeBSD Tom Jones
2024-05-03  9:46     ` Tom Jones
2024-05-03 13:03       ` Bruce Richardson
2024-05-03 13:12         ` Tom Jones
2024-05-03 13:24  3%       ` Bruce Richardson
2024-05-13 15:59     [PATCH 0/9] reword in prog guide Nandini Persad
2024-05-13 15:59  5% ` [PATCH 1/9] doc: reword design section in contributors guidelines Nandini Persad